Title: Attaining Sparsity in Large Language Models: Is It Easy or Hard?
Time: Friday, September 8, 2:30 PM
Location: CSIP Library (Room 5126), 5th Floor, Centergy One Building
BIO: Atlas Wang teaches and researches at UT Austin ECE (primary), CS, and Oden CSEM. He usually declares his research interest as machine learning, but is never too sure what that means concretely. He has won some awards, but is mainly proud of three things: (1) he has done some (hopefully) thought-provoking and practically meaningful work on sparsity, from inverse problems to deep learning; his recent favorites include essential sparsity (https://arxiv.org/abs/2306.03805), heavy-hitter oracle (https://arxiv.org/abs/2306.14048), and sparsity-may-cry (https://openreview.net/forum?id=J6F3lLg4Kdp); (2) he co-founded the Conference on Parsimony and Learning (CPAL), known to its community as the new “conference for sparsity,” and serves as its inaugural program chair (https://cpal.cc/); (3) he is fortunate enough to work with a sizable group of world-class students, all smarter than himself. He has so far graduated 13 Ph.D. students and postdocs who have been well placed, including three (assistant) professors; and his students have altogether won seven prestigious PhD fellowships (NSF GRFP, IBM, Apple, Adobe, Amazon, Qualcomm, and Snap), among many other honors.
ABSTRACT: In the realm of contemporary deep learning, large pre-trained transformers have seized the spotlight, and understanding the frugal structures underlying these burgeoning models has become imperative. Although the tools of sparsity, like pruning, the lottery ticket hypothesis, and sparse training, have enjoyed popularity and success in traditional deep networks, their efficacy in the new era of colossal pre-trained models, such as Large Language Models (LLMs), remains uncertain. This presentation aims to elucidate two seemingly contradictory perspectives. On one hand, we explore the notion that compressing LLMs is “easier” than compressing earlier deep models; on the other hand, we delve into the aspects that make this endeavor “harder” in its own unique way. My goal is to convince you that I am indeed not contradicting myself.