“Latent Variable Identification using Identifiable Matrix Factorization Methods”
Thursday, Sept. 26 at 1:00 pm
Latent variable identification is a unifying problem formulation for unsupervised machine learning and big data analytics. Interesting applications include topic modeling, community detection, hyperspectral unmixing, and many more. Identifiability arises as a fundamental issue, since it amounts to answering whether the latent structure can truly be learned without the help of labeled data. Among the many approaches that have identifiability guarantees, this talk focuses on nonnegative matrix factorization (NMF)-type methods. NMF is widely and successfully used in many applications, but the theoretical understanding of why it is able to identify latent variables used to be very limited. The take-home point of this talk is that a latent variable can be uniquely identified if it is sufficiently scattered, an assumption inspired by convex geometry, using either the plain NMF model or NMF with an additional “volume” regularization. This principle is demonstrated in the application of hidden Markov model (HMM) identification, which shows that an HMM can be uniquely identified from the pairwise co-occurrence probability of consecutive observations if the emission probability is sufficiently scattered. This is the first method that guarantees identifiability of an HMM from pairwise co-occurrences, which makes it particularly suitable for applications where the number of possible observation outcomes is relatively large, as in topic modeling. We show that higher-quality topics can be learned if documents are modeled as observations of HMMs sharing the same emission (topic) probability, compared to the simple but widely used bag-of-words model.
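For readers unfamiliar with the setup, the pairwise co-occurrence probability mentioned above can be illustrated with a minimal sketch. The notation (M for the emission matrix, A for the transition matrix, Theta for the joint probability of consecutive states, Omega for the co-occurrence matrix) and the toy numbers are assumptions made here for illustration and do not come from the talk. The sketch only builds Omega = M Theta M^T for a small stationary HMM; it does not perform the factorization or check the sufficiently scattered condition.

```python
# Hypothetical toy HMM: 2 hidden states, 3 observable symbols.
# M[i][k] = P(observation i | state k); each column sums to 1.
M = [[0.7, 0.1],
     [0.2, 0.3],
     [0.1, 0.6]]

# A[j][k] = P(next state j | current state k); each column sums to 1.
A = [[0.9, 0.2],
     [0.1, 0.8]]


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def stationary(A, iters=200):
    """Approximate pi with A pi = pi by power iteration."""
    k = len(A)
    pi = [1.0 / k] * k
    for _ in range(iters):
        pi = [sum(A[j][i] * pi[i] for i in range(k)) for j in range(k)]
    return pi


pi = stationary(A)
K = len(A)

# Theta[j][k] = P(state at t+1 is j, state at t is k) under stationarity.
Theta = [[A[j][k] * pi[k] for k in range(K)] for j in range(K)]

# Omega[i][l] = P(observation at t+1 is i, observation at t is l).
# Omega is 3x3 but has rank at most 2, the number of hidden states.
Omega = matmul(matmul(M, Theta), transpose(M))

for row in Omega:
    print(["%.4f" % v for v in row])
```

In practice Omega would be estimated by counting consecutive observation pairs in data, and the identification problem is to recover M and Theta from that estimate alone.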
Dr. Kejun Huang is currently an assistant professor in the Department of Computer and Information Science and Engineering at the University of Florida. He received his Ph.D. degree in Electrical Engineering from the University of Minnesota in 2016. He was a Postdoctoral Associate in the Department of Electrical and Computer Engineering at the University of Minnesota from 2016 to 2018. His research interests include machine learning, signal processing, optimization, and statistics, with a special focus on identifiability analysis and non-convex algorithm design for latent variable models.