Title: Semi-Supervised Learning
Abstract: In many domains, automatic methods for collecting data have
far outstripped the pace of human annotation. In machine learning,
this has led to a growing interest in semi-supervised learning:
algorithms that combine labeled and unlabeled data, using a large
body of unlabeled data to improve learning from a small labeled
sample. In this talk, I will
survey a number of very different learning algorithms that have been
designed for this task (including Co-Training, Semi-Supervised SVM, and
graph-based methods). I will then describe a new theoretical
framework for semi-supervised learning that can be used to analyze
when unlabeled data can help and how much help it can provide, and
to place these algorithms in a common context. I will also discuss
a number of conceptual issues that this model raises.
Portions of this talk are joint work with Nina Balcan.