WHOLE: A Low Energy I-Cache with Separate Way History
Set-associative instruction caches achieve low miss rates at the expense of significant energy dissipation. Previous energy-efficient approaches usually suffer from performance degradation and redundant extension bits. In this paper, we propose a Way History Oriented Low Energy Instruction Cache (WH...
Published in: | 2009 IEEE International Conference on Computer Design |
---|---|
Main Authors: | , , |
Other Authors: | , |
Format: | Conference Object |
Language: | English |
Published: |
2009
|
Subjects: | |
Online Access: | https://hdl.handle.net/20.500.11897/260967 https://doi.org/10.1109/ICCD.2009.5413162 |
Summary: | Set-associative instruction caches achieve low miss rates at the expense of significant energy dissipation. Previous energy-efficient approaches usually suffer from performance degradation and redundant extension bits. In this paper, we propose a Way History Oriented Low Energy Instruction Cache (WHOLE-Cache) design for single issue and in-order execution processors. The WHOLE-Cache design not only achieves a significant portion of energy reduction by effectively reducing dynamic energy dissipation of set-associative instruction cache, but also leads to no additional cycle penalties. Tag comparison results are stored into either the Branch Target Buffer (BTB) or the Instruction Cache (I-Cache) to avoid tag checks and unnecessary way activation for subsequent accesses to visited cache lines. The extended BTB uses way history bits for branch instructions, while the I-Cache extension bits are used in case of fetching consecutive instructions resided in different cache lines. A valid flag is associated with each stored tag comparison result to indicate whether the instruction to be fetched is resided in the recorded location. A simple invalidation scheme is implemented in the cache miss replacement operation. Whenever a cache line is replaced, the pointers to it, which reside in the BTB or other I-cache lines, will be invalidated accordingly. We model the WHOLE-Cache design in Verilog. By deriving basic parameters from TSMC 65nm technology, we use Wattch simulator to evaluate the performance and energy reduction of the WHOLE-Cache in the instruction fetch stage. We use SPEC2000 and Mediabench as benchmarks. It is observed that compared with a conventional 4-way set-associative I-Cache, the energy consumption of the WHOLE-Cache is reduced by 65% without any performance penalty. Computer Science, Hardware & Architecture Computer Science, Theory & Methods Engineering, Electrical & Electronic EI CPCI-S(ISTP) 1 |
---|