Discriminative Bayesian Filtering for the Semi-Supervised Augmentation of Sequential Observation Data
EasyChair Preprint 5723
14 pages • Date: June 4, 2021

Abstract

We aim to construct a probabilistic classifier to predict a latent, time-dependent boolean label given an observed vector of measurements. Our training data consists of sequences of observations paired with a label for precisely one of the observations in each sequence. As an initial approach, we learn a baseline supervised classifier by training on the labeled observations alone, ignoring the unlabeled observations in each sequence. We then leverage this first classifier and the sequential structure of our data to build a second training set as follows: (1) we apply the first classifier to each unlabeled observation and then (2) we filter the resulting estimates to incorporate information from the labeled observations and create a much larger training set. We describe a Bayesian filtering framework that can be used to perform step 2 and show how a second classifier built using the latter, filtered training set can outperform the initial classifier.

At Adobe, our motivating application entails predicting customer segment membership from readily available proprietary features. We administer surveys to collect label data for our subscribers and then generate feature data for these customers at regular intervals around the survey time. While we can train a supervised classifier using paired feature and label data from the survey time alone, the availability of nearby feature data and the relative expense of polling motivate this semi-supervised approach. We perform an ablation study comparing both a baseline classifier and a likelihood-based augmentation approach to our proposed method and show how our method best improves predictive performance for an in-house classifier.
Keyphrases: data augmentation, discriminative Bayesian filtering, semi-supervised learning, user classification from survey data
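
The two-step augmentation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the state model, so everything below is an assumption made for illustration: the latent label is taken to follow a two-state Markov chain that keeps its value with probability `stay` and otherwise resamples from the prior `base_rate`, and the first classifier's output p(z | x) is converted into a likelihood-like weight by dividing by the prior, as in a discriminative filter. The names `filter_step`, `augment_sequence`, `stay`, and `base_rate` are hypothetical and do not come from the paper.

```python
def filter_step(prior_pos, clf_pos, base_rate, stay):
    """One predict/update step for a boolean latent label.

    prior_pos: belief that the neighboring label is True
    clf_pos:   first classifier's estimate p(z = True | x) at this step
    Assumes 0 < base_rate < 1 and 0 <= stay < 1 (illustrative model only).
    """
    # Predict: keep the previous state with probability `stay`,
    # otherwise resample the label from the base rate.
    pred = stay * prior_pos + (1.0 - stay) * base_rate
    # Update: p(x | z) is proportional to p(z | x) / p(z) as a function of z,
    # so the classifier output is reweighted by the prior.
    w_pos = pred * clf_pos / base_rate
    w_neg = (1.0 - pred) * (1.0 - clf_pos) / (1.0 - base_rate)
    return w_pos / (w_pos + w_neg)


def augment_sequence(clf_probs, labeled_idx, label, base_rate=0.5, stay=0.9):
    """Return a soft label for every observation in one sequence.

    clf_probs:   first-classifier outputs p(z = True | x_t), one per step
    labeled_idx: index of the single observation with a known label
    label:       the known boolean label at labeled_idx
    """
    n = len(clf_probs)
    soft = [0.0] * n
    # Anchor the belief at the surveyed observation.
    soft[labeled_idx] = 1.0 if label else 0.0
    # Filter forward in time, away from the labeled observation.
    for t in range(labeled_idx + 1, n):
        soft[t] = filter_step(soft[t - 1], clf_probs[t], base_rate, stay)
    # Filter backward in time; a two-state chain is reversible, so the
    # same transition model is reused in the other direction.
    for t in range(labeled_idx - 1, -1, -1):
        soft[t] = filter_step(soft[t + 1], clf_probs[t], base_rate, stay)
    return soft


if __name__ == "__main__":
    # Hypothetical classifier outputs for a five-step sequence whose
    # middle observation was labeled True by a survey.
    print(augment_sequence([0.4, 0.6, 0.7, 0.8, 0.3], labeled_idx=2, label=True))
```

Under these assumptions, the soft labels decay toward the classifier's own estimates as the distance from the surveyed observation grows, and the resulting (observation, soft label) pairs could serve as the enlarged training set for the second classifier. The paper's actual filtering model may differ from this sketch.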
