University of Surrey


Compiler Extensions towards Reliable Multicore Processors

Nezzari, Yasser and Bridges, Christopher (2017) Compiler Extensions towards Reliable Multicore Processors. In: 2017 IEEE Aerospace Conference, 2017-03-04 - 2017-03-11, Montana, USA.

2017_IEEE_AIAA_Nezzari_Final.pdf - Accepted version Manuscript
Available under License : See the attached licence file.



The current trend in commercial processors is towards multi-core architectures, which pose both an opportunity and a challenge for future space-based processing. The opportunity is to leverage multi-core processors for high-intensity computing applications and thus provide an order-of-magnitude increase in onboard processing capability with less size, mass, and power. The challenge is to provide the requisite safety and reliability in an extremely harsh radiation environment. The objective is to advance from the multiple single-processor systems typically flown to a fault-tolerant multi-core system. Software-based methods for multi-core processor fault tolerance to single event effects (SEEs) causing interrupts or ‘bit-flips’ are investigated, and we propose to utilize additional cores and memory resources together with newly developed software protection techniques. This work also assesses the optimal trade space between reliability and performance. Our work is based on the modern compiler framework LLVM, as it is ported to many architectures; we implement optimization passes that automatically add protection techniques, including N-modular redundancy (NMR) and error detection and correction (EDAC), at assembly/instruction level for the supported languages. The optimization passes modify the intermediate representation of the source code, meaning they can be applied to any high-level language and any processor architecture supported by the LLVM framework. In our initial experiments, we separately implement triple modular redundancy (TMR) and error detection and correction codes (Hamming and BCH) at instruction level. We combine these two methods for critical applications: we first apply TMR to our instructions and then use EDAC as a further measure when TMR is not able to correct the errors originating from the SEE.
Our initial experiments show good performance (about 10% overhead) when protecting the memory of code using a single-error-correction, double-error-detection (SEC-DED) Hamming code and TMR; further work is needed to improve performance when protecting the memory of code using the BCH code. This work would be highly valuable both to satellites and space systems and to general computing, such as in aircraft, automotive, server farms, and medical equipment (or anywhere that needs safety-critical performance), as hardware gets smaller and more susceptible to such effects.
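
To make the combination of TMR and EDAC described in the abstract more concrete, the following is a minimal sketch in Python, not the authors' LLVM passes. It assumes a 4-bit data value protected by an extended Hamming (8,4) SEC-DED code, three stored copies of each codeword, and a bitwise majority vote followed by a Hamming decode of the voted word. The function names (tmr_vote, hamming_encode, hamming_decode, protected_read) and the word size are hypothetical choices made for illustration only; the paper applies its techniques at the compiler intermediate-representation level rather than to Python integers.

```python
from typing import Tuple


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


def hamming_encode(nibble: int) -> int:
    """Encode a 4-bit value into an 8-bit extended Hamming (SEC-DED) codeword.

    Bit positions 1, 2 and 4 hold Hamming parity, positions 3, 5, 6 and 7
    hold data, and position 0 holds an overall parity bit.
    """
    bits = [0] * 8
    bits[3], bits[5], bits[6], bits[7] = [(nibble >> i) & 1 for i in range(4)]
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    bits[0] = sum(bits[1:]) & 1  # makes the parity of the whole word even
    return sum(bit << pos for pos, bit in enumerate(bits))


def hamming_decode(word: int) -> Tuple[int, str]:
    """Decode an 8-bit codeword; return (value, status).

    status is "ok", "corrected" (one bit repaired) or "uncorrectable"
    (a double-bit error was detected; the returned value is not reliable).
    """
    bits = [(word >> pos) & 1 for pos in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    overall_odd = sum(bits) & 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # Odd overall parity means a single flipped bit; the syndrome gives
        # its position (0 means the overall parity bit itself).
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        status = "uncorrectable"
    value = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return value, status


def protected_read(copy_a: int, copy_b: int, copy_c: int) -> Tuple[int, str]:
    """Vote across three encoded copies, then let the code fix what is left."""
    return hamming_decode(tmr_vote(copy_a, copy_b, copy_c))


if __name__ == "__main__":
    codeword = hamming_encode(0b1011)
    # The same bit is flipped in two of the three copies, so the vote alone
    # picks the wrong bit; the Hamming decode then repairs it.
    flipped = codeword ^ (1 << 5)
    print(protected_read(flipped, flipped, codeword))
```

In this sketch the majority vote masks any fault confined to one copy, and the SEC-DED decode acts as a second line of defence when two copies are corrupted in the same bit position. Whether the paper orders or combines the two mechanisms in exactly this way is not stated in the abstract.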

Item Type: Conference or Workshop Item (Conference Paper)
Subjects : Surrey Space Centre
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering > Surrey Space Centre
Date : 8 June 2017
DOI : 10.1109/AERO.2017.7943714
Copyright Disclaimer : © 2017 Crown. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Depositing User : Symplectic Elements
Date Deposited : 13 Jan 2017 15:35
Last Modified : 16 Jan 2019 17:11


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800