WHOLE: A Low Energy I-Cache with Separate Way History

Bibliographic Details
Published in: 2009 IEEE International Conference on Computer Design
Main Authors: Xie, Zichao, Tong, Dong, Cheng, Xu
Other Authors: Xie, ZC (reprint author), Peking Univ, Microprocessor Res & Dev Ctr, Beijing 100871, Peoples R China.
Format: Conference Object
Language: English
Published: 2009
Subjects: Computer Science, Hardware & Architecture; Computer Science, Theory & Methods; Engineering, Electrical & Electronic
Indexed in: EI; CPCI-S(ISTP)
Online Access: https://hdl.handle.net/20.500.11897/260967
https://doi.org/10.1109/ICCD.2009.5413162
Description
Summary: Set-associative instruction caches achieve low miss rates at the expense of significant energy dissipation. Previous energy-efficient approaches usually suffer from performance degradation and redundant extension bits. In this paper, we propose a Way History Oriented Low Energy Instruction Cache (WHOLE-Cache) design for single-issue, in-order processors. The WHOLE-Cache design not only achieves a significant energy reduction by effectively reducing the dynamic energy dissipation of a set-associative instruction cache, but also incurs no additional cycle penalties. Tag comparison results are stored in either the Branch Target Buffer (BTB) or the Instruction Cache (I-Cache) to avoid tag checks and unnecessary way activation for subsequent accesses to visited cache lines. The extended BTB uses way history bits for branch instructions, while the I-Cache extension bits are used when fetching consecutive instructions that reside in different cache lines. A valid flag is associated with each stored tag comparison result to indicate whether the instruction to be fetched resides in the recorded location. A simple invalidation scheme is implemented in the cache miss replacement operation: whenever a cache line is replaced, the pointers to it, which reside in the BTB or in other I-Cache lines, are invalidated accordingly. We model the WHOLE-Cache design in Verilog. Deriving basic parameters from TSMC 65nm technology, we use the Wattch simulator to evaluate the performance and energy reduction of the WHOLE-Cache in the instruction fetch stage, with SPEC2000 and Mediabench as benchmarks. Compared with a conventional 4-way set-associative I-Cache, the WHOLE-Cache reduces energy consumption by 65% without any performance penalty.
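
The mechanism described in the summary (stored way history, a valid flag per stored result, and invalidation on replacement) can be illustrated with a small behavioral sketch. This is not the authors' Verilog model: the names `WayHint`, `Line`, `ToyWayHistoryCache` and `run_loop`, the round-robin replacement policy, the cache geometry and the branch address `0x2C` are all assumptions made for illustration, and the counters only count way activations and tag checks rather than modeling energy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WayHint:
    """A stored tag-comparison result: where the target line was last found."""
    set_index: int = 0
    way: int = 0
    valid: bool = False  # the valid flag mentioned in the summary


@dataclass
class Line:
    tag: Optional[int] = None
    # Extension bits for sequential fetch into the next cache line (assumed layout).
    next_hint: WayHint = field(default_factory=WayHint)


class ToyWayHistoryCache:
    def __init__(self, num_sets=4, num_ways=4):
        self.num_sets, self.num_ways = num_sets, num_ways
        self.sets = [[Line() for _ in range(num_ways)] for _ in range(num_sets)]
        self.victim = [0] * num_sets  # round-robin replacement (assumption)
        self.btb = {}                 # branch PC -> WayHint for the branch target
        self.ways_activated = 0
        self.tag_checks = 0

    def _locate(self, line_addr):
        return line_addr % self.num_sets, line_addr // self.num_sets

    def _invalidate_pointers_to(self, set_index, way):
        # On replacement, clear every hint (in the BTB or in other lines)
        # that still points at the line being evicted.
        hints = [ln.next_hint for s in self.sets for ln in s] + list(self.btb.values())
        for h in hints:
            if h.valid and (h.set_index, h.way) == (set_index, way):
                h.valid = False

    def _full_lookup(self, line_addr):
        # Conventional access: activate all ways and compare all tags.
        set_index, tag = self._locate(line_addr)
        self.ways_activated += self.num_ways
        self.tag_checks += self.num_ways
        for way, line in enumerate(self.sets[set_index]):
            if line.tag == tag:
                return set_index, way
        # Miss: pick a victim, invalidate pointers to it, then refill.
        way = self.victim[set_index]
        self.victim[set_index] = (way + 1) % self.num_ways
        self._invalidate_pointers_to(set_index, way)
        line = self.sets[set_index][way]
        line.tag = tag
        line.next_hint = WayHint()  # the new line has no history yet
        return set_index, way

    def fetch(self, line_addr, hint):
        """Fetch a line by following `hint`, the link that leads to it."""
        if hint.valid:
            self.ways_activated += 1  # only the recorded way, no tag check
            return hint.set_index, hint.way
        set_index, way = self._full_lookup(line_addr)
        hint.set_index, hint.way, hint.valid = set_index, way, True
        return set_index, way


def run_loop(iterations=10):
    """A three-line loop body: lines 0, 1, 2, then a branch back to line 0."""
    cache = ToyWayHistoryCache()
    for _ in range(iterations):
        # For simplicity, every entry to line 0 goes through the BTB hint
        # of a hypothetical backward branch at PC 0x2C.
        s, w = cache.fetch(0, cache.btb.setdefault(0x2C, WayHint()))
        for nxt in (1, 2):
            s, w = cache.fetch(nxt, cache.sets[s][w].next_hint)
    return cache
```

With these toy parameters, `run_loop(10)` performs 30 line fetches and ends with `ways_activated == 39` and `tag_checks == 12`, where a conventional 4-way lookup on every fetch would give 120 of each. These counts describe only this toy loop and are not comparable to the 65% energy reduction reported in the summary.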