With the growing popularity of new computing platforms such as Ultrabooks, tablets, and smartphones, and the shift to multi-core computing, power is now the key performance limiter, a departure from the traditional frequency limitation. As a result, low-power design solutions increasingly feature prominently in early architectural and design space exploration in CPU/SoC design. On a high-performance CPU, the majority of these early studies involve memories, especially register files (RFs). RFs are the preferred memory element for fast data access and are therefore ubiquitous in modern microprocessor design, contributing approximately 30% of Intel's 32nm CPU core power. The goal of this research is two-fold. First, it explores low-power design techniques to reduce RF leakage and dynamic power at minimal delay and area cost. We analyze RF power distribution, data residencies, signal activities, and logic dependencies in modern 32nm/22nm high-performance microprocessors. We then propose new circuit techniques to reduce power in critical memory logic blocks such as the bitcell, write data distribution, read access data path, and decoder. We use innovative transistor stack-forcing techniques to reduce RF read bitline and decoder leakage by as much as 90% and delay by 30%, at minimal to no area overhead compared to existing stacking approaches. An essential component of low-power design is an accurate predictive model (power, area, and timing) for early architectural and design space tradeoff analysis. On a high-performance CPU, more than 75% of RFs are custom designed due to design complexities and constraints (power, area, timing, and low-voltage operation requirements). Existing models are particularly unsuited to custom RFs because they typically assume a generic RF circuit implementation and therefore cannot accurately predict unique RF topologies without new model development.
Furthermore, these models do not accurately address common design optimizations such as device sizing, data gating, segmentation, and device stacking that significantly impact the power profile of an RF. In the second part of this research, we develop a customizable predictive model that addresses these key limitations. The proposed model is a hybrid of empirical reference data and analytical equations. We use empirical reference implementation data for the topology under study to capture topology-specific characteristics, and we analytically model the impact of cross-topology features such as changes in bit-width, entry count, and ports, as well as common circuit-level design optimizations such as segmentation, gating, device stacking, and sizing. We show how the proposed model can be customized for different RF topologies and for other memory structures such as SRAM and ROM using the same model equations. We also demonstrate how the new predictive model, with less than 10% error, is used in real-world tradeoff analysis in the design of state-of-the-art high-performance CPUs and SoCs.