Bootstrapping for Batch Active Sampling

Heinrich Jiang, Maya R. Gupta

The goal of active learning is to select, from an unlabeled pool, the examples whose labels will most improve a model retrained with those newly labeled examples. We discuss a real-world use case for batch active sampling that works at larger scales. The standard margin algorithm has repeatedly been shown to be difficult to beat in practice for the classic active sampling setup, but for larger batches and candidate pools, we show that margin sampling may not provide enough diversity. We present a simple variant of margin sampling for the batch setting that scores each candidate sample by its minimum margin across a set of bootstrapped models. We explain how this proposal increases diversity in a supervised and efficient way, and why it differs from the usual ensemble methods for active sampling. Experiments on benchmark datasets show that the proposed min-margin sampling consistently works better than margin sampling as the batch size grows, and better than the five other diversity-encouraging active sampling methods we tested. Two real-world case studies illustrate the practical value and help highlight the challenges of applying and deploying batch active sampling.
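The core scoring rule described above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes each bootstrapped model exposes per-class probabilities for the candidate pool, defines the margin as the gap between the top two class probabilities, and selects the batch with the smallest minimum margin across models. The function names (`margin`, `min_margin_scores`, `select_batch`) are hypothetical.

```python
import numpy as np

def margin(probs):
    """Margin per example: top-class probability minus runner-up probability.

    probs: (N, C) array of class probabilities for N candidates, C classes.
    """
    part = np.sort(probs, axis=1)
    return part[:, -1] - part[:, -2]

def min_margin_scores(prob_list):
    """Score each candidate by its minimum margin across bootstrapped models.

    prob_list: list of K (N, C) probability arrays, one per bootstrapped model.
    """
    margins = np.stack([margin(p) for p in prob_list], axis=0)  # shape (K, N)
    return margins.min(axis=0)

def select_batch(prob_list, batch_size):
    """Return indices of the batch_size candidates with the smallest scores."""
    scores = min_margin_scores(prob_list)
    return np.argsort(scores)[:batch_size]

# Toy example with two bootstrapped models and three candidates:
model_a = np.array([[0.9, 0.1], [0.6, 0.4], [0.55, 0.45]])
model_b = np.array([[0.8, 0.2], [0.9, 0.1], [0.50, 0.50]])
batch = select_batch([model_a, model_b], batch_size=2)
print(batch)  # candidates ordered by smallest min-margin: [2 1]
```

Because a candidate needs only one bootstrapped model to be uncertain about it to receive a low score, the selected batch tends to cover the disagreement regions of several models rather than clustering near a single decision boundary, which is the diversity effect the abstract refers to.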