Large-Scale Unsupervised Object Discovery

Huy V. Vo
INRIA, Valeo.ai, ENS

Elena Sizikova
NYU

Cordelia Schmid
INRIA

Patrick Pérez
Valeo.ai

Jean Ponce
INRIA, NYU

Code [Pytorch, Matlab]

Abstract

Existing approaches to unsupervised object discovery (UOD) do not scale up to large datasets without approximations that compromise their performance. We propose a novel formulation of UOD as a ranking problem, amenable to the arsenal of distributed methods available for eigenvalue problems and link analysis. Through the use of self-supervised features, we also demonstrate the first effective fully unsupervised pipeline for UOD. Extensive experiments on COCO [42] and OpenImages [35] show that, in the single-object discovery setting where a single prominent object is sought in each image, the proposed LOD (Large-scale Object Discovery) approach is on par with, or better than the state of the art for mediumscale datasets (up to 120K images), and over 37% better than the only other algorithms capable of scaling up to 1.7M images. In the multi-object discovery setting where multiple objects are sought in each image, the proposed LOD is over 14% better in average precision (AP) than all other methods for datasets ranging from 20K to 1.7M images. Using self-supervised features, we also show that the proposed method obtains state-of-the-art UOD performance on OpenImages. .

BibTex

@inproceedings{Vo21LOD,
  title     = {Large-Scale Unsupervised Object Discovery},
  author    = {Vo, Huy V. and Sizikova, Elena and Schmid, 
               Cordelia and P{\'e}rez, Patrick and Ponce, Jean},
  booktitle = {Advances in Neural Information Processing Systems 35 ({NeurIPS})},
  year      = {2021},
}

Acknowledgments

This work was supported in part by the Inria/NYU collaboration, the Louis Vuitton/ENS chair on artificial intelligence and the French government under management of Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR19-P3IA-0001 (PRAIRIE 3IA Institute). Elena Sizikova was supported by the Moore-Sloan Data Science Environment initiative (funded by the Alfred P. Sloan Foundation and the Gordon and Betty Moore Foundation) through the NYU Center for Data Science. Huy V. Vo was supported in part by a Valeo/Prairie CIFRE PhD Fellowship. We thank Spyros Gidaris for providing the VGG16-based OBoW model. Finally, we thank anonymous reviewers for their helpful suggestions and feedback for the paper.