To meet the growing demand for computing performance, modern superscalar processors exploit instruction-level parallelism (ILP) by issuing multiple instructions per cycle. However, stalls caused by cache misses severely degrade performance because they limit how much ILP can be exploited. Multiprocessors exacerbate the memory latency problem further. In SMPs,...