A method for improving performance/watt of an embedded single-instruction multiple-data (SIMD) architecture using application-guided a priori scheduling of hardware resources is presented. A multi-core architectural simulator is adopted that accurately estimates power, performance, and utilization of various processor components (logic, interconnect and memory). A greedy search is then performed on each algorithm block of a signal processing chain in order to schedule each component's throughput and power. The proposed software-directed hardware rebalancing, applied to a typical electroencephalography (EEG) filtering chain, is analyzed for two different SIMD architectures. The first, representing a super V[subscript th] processor demonstrates a 51%-86% improvement in performance/watt at 1%-10% throughput reduction using block level or algorithm level a priori scheduling. The second architecture used is Synctium, a near V[subscript th] processor which demonstrates 50%-99% performance/watt improvement across the same throughput reduction range and optimization techniques.