The motivation of this research is to study different cache designs for on-chip
caches that improve processor performance and at the same time
minimize the degradation to system performance caused by an increase in
the processor memory traffic. As VLSI technology advances we can have
bigger and more complex on-chip...
Conventional register files spread porting resources uniformly across all registers. This paper proposes a method called Asymmetric Clustering using a Register Cache (ACRC). ACRC utilizes a fast register cache that concentrates valuable register file ports to the most active registers thereby reducing the total register file area and power consumption....
Computers using the tagged-token dataflow model are among the best candidates for delivering extremely high levels of performance required in the future. Instruction scheduling in these computers is determined by associatively matching data-bearing tokens in a Waiting-Matching Unit (W-M unit). At the W-M unit, incoming tokens with matching contexts are...
The amount of instruction level parallelism (ILP) that can be exploited depends
greatly on the size of the instruction window and the number of in-flight instructions
the processor can support. However, this requires a register file with a large set of
physical registers for renaming and multiple ports to provide...
Memory hierarchy design is becoming more important as the speed gap be- tween processor and memory continues to grow. Investigations of memory perfor- mance have typically been conducted using trace-driven emulation, which could take tremendous resources (e.g. long emulation time, large storage requirements for traces, and high overall cost). Recent...
The purpose of this thesis is to explore methods which can reduce the power dissipation of a mobile system while decoding MPEG video. MPEG decoding is a microprocessor intensive process that makes heavy use of both the L1 and L2 caches as well as main memory. The heavy load placed...