Many natural language processing (NLP) tasks require deep models that are fast, efficient, and small, depending on their end application at the edge or elsewhere. While extensive research has improved the efficiency and reduced the size of these models, lowering their downstream latency without significant trade-offs remains difficult....