The purpose of this thesis is to explore dependency speculation in Dynamic Simultaneous Multi-Threading (DSMT). DSMT is a microprocessor architecture that attempts to extract Thread-Level Parallelism (TLP) from single-threaded programs at run time. This is accomplished by running multiple iterations of program loops in parallel. The DSMT architecture was originally...
Conventional register files spread porting resources uniformly across all registers. This paper proposes a method called Asymmetric Clustering using a Register Cache (ACRC). ACRC uses a fast register cache that concentrates valuable register file ports on the most active registers, thereby reducing the total register file area and power consumption....
The amount of instruction-level parallelism (ILP) that can be exploited depends
greatly on the size of the instruction window and the number of in-flight instructions
the processor can support. However, this requires a register file with a large set of
physical registers for renaming and multiple ports to provide...
Dynamic multithreaded processors attempt to increase the performance of a single
sequential program by dynamically extracting threads from sources such as loop
iterations. The scheduling of instructions in such a processor plays a vital role in the
amount of thread-level parallelism that can be extracted and thus the overall...
The purpose of this thesis is to explore methods that can reduce the power dissipation of a mobile system while decoding MPEG video. MPEG decoding is a microprocessor-intensive process that makes heavy use of both the L1 and L2 caches as well as main memory. The heavy load placed...
Wireless networks have been widely adopted as a major part of today's network infrastructure. They have become a popular technology not only for expanding the coverage of wired networks but also for interconnecting large wireless networks, i.e., wireless mesh networks. As they allow more flexible communication than traditional wired networks...
The advent of multi-core processors allows programs to be executed much faster than before. Cryptographic algorithms use long-bit words, so parallelizing these operations on multi-cores can achieve significant performance improvement. However, not all long-bit word operations in cryptosystems are suitable for parallel execution on multi-cores. In particular, long-bit words used in Elliptic...
This thesis presents a novel methodology that enables power-efficient video decoding
in an embedded system based on MPSoC (Multiprocessor System on Chip). This
methodology is a physical combination of parallel processing, which reduces the power
consumption of processors by exploiting thread-level parallelism, and Dynamic
Voltage Frequency Scaling (DVFS), which allows...
For many years, the von Neumann bottleneck has imposed speed limits on the execution of a program. Because of their sequential nature, von Neumann computers can only execute a single instruction at a time. Instructions that are side-effect free and can be executed in parallel must wait. In an effort...
Designing and building a home Media Center can be a daunting task given the multitude of available options and configurations. The goal of this project is to build a home Media Center, and in the process, create a guide that a less technically knowledgeable person could use to build their...