Concurrent Online Testing for Many Core Systems-on-Chips
Shrinking transistor sizes have introduced new challenges and opportunities for system-on-chip (SoC) design and reliability. Smaller transistors are more susceptible to early lifetime failure and electronic wear-out, greatly reducing their reliable lifetimes. However, smaller transistors will also a...
Main Author: | Lee, Jason Daniel |
---|---|
Other Authors: | Mahapatra, Rabinarayan N., Walker, Duncan M., Kim, Eun J., Choi, Seong G. |
Format: | Thesis |
Language: | English |
Published: | 2010 |
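Subjects: | concurrent online testing, many core, safety-critical systems, in-field testing, electronic wearout |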
Online Access: | https://hdl.handle.net/1969.1/ETD-TAMU-2010-12-8615 |
id |
fttexasamuniv:oai:oaktrust.library.tamu.edu:1969.1/ETD-TAMU-2010-12-8615 |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
Texas A&M University Digital Repository |
op_collection_id |
fttexasamuniv |
language |
English |
topic |
concurrent online testing many core safety-critical systems in-field testing electronic wearout |
spellingShingle |
concurrent online testing many core safety-critical systems in-field testing electronic wearout Lee, Jason Daniel Concurrent Online Testing for Many Core Systems-on-Chips |
topic_facet |
concurrent online testing many core safety-critical systems in-field testing electronic wearout |
description |
Shrinking transistor sizes have introduced new challenges and opportunities for system-on-chip (SoC) design and reliability. Smaller transistors are more susceptible to early lifetime failure and electronic wear-out, greatly reducing their reliable lifetimes. However, smaller transistors will also allow SoC to contain hundreds of processing cores and other infrastructure components with the potential for increased reliability through massive structural redundancy. Concurrent online testing (COLT) can provide sufficient reliability and availability to systems with this redundancy. COLT manages the process of testing a subset of processing cores while the rest of the system remains operational. This can be considered a temporary, graceful degradation of system performance that increases reliability while maintaining availability. In this dissertation, techniques to assist COLT are proposed and analyzed. The techniques described in this dissertation focus on two major aspects of COLT feasibility: recovery time and test delivery costs. To reduce the time between failure and recovery, and thereby increase system availability, an anomaly-based test triggering unit (ATTU) is proposed to initiate COLT when anomalous network behavior is detected. Previous COLT techniques have relied on initiating tests periodically. However, determining the testing period is based on a device's mean time between failures (MTBF), and calculating MTBF is exceedingly difficult and imprecise. To address the test delivery costs associated with COLT, a distributed test vector storage (DTVS) technique is proposed to eliminate the dependency of test delivery costs on core location. Previous COLT techniques have relied on a single location to store test vectors, and it has been demonstrated that centralized storage of tests scales poorly as the number of cores per SoC grows. Assuming that the SoC organizes its processing cores with a regular topology, DTVS uses an interleaving technique to optimally distribute the test vectors across the entire ... |
author2 |
Mahapatra, Rabinarayan N. Walker, Duncan M. Kim, Eun J. Choi, Seong G. |
format |
Thesis |
author |
Lee, Jason Daniel |
author_facet |
Lee, Jason Daniel |
author_sort |
Lee, Jason Daniel |
title |
Concurrent Online Testing for Many Core Systems-on-Chips |
title_short |
Concurrent Online Testing for Many Core Systems-on-Chips |
title_full |
Concurrent Online Testing for Many Core Systems-on-Chips |
title_fullStr |
Concurrent Online Testing for Many Core Systems-on-Chips |
title_full_unstemmed |
Concurrent Online Testing for Many Core Systems-on-Chips |
title_sort |
concurrent online testing for many core systems-on-chips |
publishDate |
2010 |
url |
https://hdl.handle.net/1969.1/ETD-TAMU-2010-12-8615 |
genre |
Attu |
genre_facet |
Attu |
op_relation |
https://hdl.handle.net/1969.1/ETD-TAMU-2010-12-8615 |
_version_ |
1771544177606131712 |