General-purpose Graphics Processing Units (GPGPUs) have become a critical component of high-performance computing (HPC) systems for executing modern computational workloads. Their high thread-level parallelism (TLP) and programmable shader cores allow thousands of threads to execute in parallel. The rapid scaling of GPGPUs has increased the demand for performance optimizations on...