Weakly-Supervised Temporal Localization via Occurrence Count Learning


We propose a novel model for temporal detection and localization which allows the training of deep neural networks using only counts of event occurrences as training labels. This powerful weakly-supervised framework alleviates the burden of the imprecise and time-consuming process of annotating event locations in temporal data. Unlike existing methods, in which localization is explicitly achieved by design, our model learns localization implicitly as a byproduct of learning to count instances. This unique feature is a direct consequence of the model’s theoretical properties. We validate the effectiveness of our approach in a number of experiments (drum hit and piano onset detection in audio, digit detection in images) and demonstrate performance comparable to that of fully-supervised state-of-the-art methods, despite much weaker training requirements.



Figure 1. Out-of-Sample Piano Onset Detection. The predictions are well centered despite the model having no localization prior for training (only occurrence counts).

Figure 2. Computation of the Estimated Count Distribution
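This page does not spell out how the estimated count distribution is computed, so the following is only a minimal sketch under a stated assumption: that the network emits an independent event probability for each time step, in which case the number of events follows a Poisson-binomial distribution. The function names `count_distribution` and `count_nll` are hypothetical and chosen for illustration; they are not taken from the authors' code.

```python
import math


def count_distribution(probs):
    """Distribution over the total number of events.

    Assumption (not stated on this page): each time step fires
    independently with probability probs[t], so the total count is
    Poisson-binomial. Returns a list where entry k is P(count = k).
    """
    dist = [1.0]  # before any time step, the count is 0 with certainty
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)  # no event at this step
            new[k + 1] += mass * p      # one event at this step
        dist = new
    return dist


def count_nll(probs, observed_count, eps=1e-12):
    """Negative log-likelihood of the observed count (illustrative loss)."""
    dist = count_distribution(probs)
    return -math.log(dist[observed_count] + eps)


if __name__ == "__main__":
    print(count_distribution([0.5, 0.5]))  # [0.25, 0.5, 0.25]
    print(count_nll([0.9, 0.1, 0.8, 0.05], 2))
```

Under this assumption, a loss on the count alone still depends on every per-step probability, which gives some intuition for how localization could emerge without location labels.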

More information coming soon!

ICML 2019