Speaker: Rachel Ward (University of Texas at Austin)
Title: Finding globally optimal k-means clusterings through convex relaxation
We study exact recovery conditions for convex relaxations of point cloud clustering problems, focusing on the most common optimization problem in the unsupervised setting: k-means clustering. We consider the distributional setting where there are k clusters in m-dimensional euclidean space and data from each cluster consists of n points sampled from a symmetric distribution within a ball of unit radius. We ask: what is the minimal cluster separation distance needed for convex relaxations of the k-means problem to exactly recover these clusters as the optimal integral solution? For a linear programming relaxation, we show that a certain minimal separation is needed for the relaxation to be tight, even as the number of points goes to infinity. By contrast, an SDP relaxation of k-means will recover the k clusters with high probability at cluster separation (2k/m)^(1/2) + 1/n^(1/4).
In the same setting, we show that common heuristics such as Lloyd's algorithm (a.k.a. the k-means algorithm) can fail to exactly recover such clusters with exponentially high probability. We conclude by discussing several open problems.
This is joint work with Pranjal Awasthi, Afonso Bandeira, Moses Charikar, and Ravishankar Krishnaswamy (Princeton) and Soledad Villar (UT Austin).
Rachel Ward is an assistant professor of mathematics at the University of Texas at Austin. Dr. Ward received the Ph.D. in Applied and Computational Mathematics from Princeton University in 2009 under the supervision of Ingrid Daubechies. She was an NSF postdoctoral fellow at the Courant Institute, NYU, between 2009 and 2011, before joining UT Austin in the fall of 2011. Dr. Ward received a Sloan research fellowship in 2012 and an NSF CAREER award in 2013. She is also a 2013 recipient of the AFOSR Young Investigator Program (YIP) Award.