Bolei Zhou, Aditya Khosla, Agata Lapedriza, Antonio Torralba, Aude Oliva
Massachusetts Institute of Technology

The dataset is designed following principles of human visual cognition. Our goal is to build a core of visual knowledge that can be used to train artificial systems for high-level visual understanding tasks, such as scene context, object recognition, action and event prediction, and theory-of-mind inference. The semantic categories of Places are defined by their function: the labels represent the entry-level categories of an environment. To illustrate, the dataset has different categories of bedrooms, streets, etc., as one does not act the same way, and does not make the same predictions of what can happen next, in a home bedroom, a hotel bedroom, or a nursery.

In total, Places contains more than 10 million images comprising 400+ unique scene categories. The dataset features 5,000 to 30,000 training images per class, consistent with real-world frequencies of occurrence. Using convolutional neural networks (CNNs), the Places dataset allows learning of deep scene features for various scene recognition tasks, with the goal of establishing new state-of-the-art performance on scene-centric benchmarks. Here we provide the Places Database and the trained CNNs for academic research and education purposes.
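As a rough illustration only (not the released models or training code), the following PyTorch sketch adapts a standard ImageNet-pretrained CNN to 365 scene categories by replacing its final classification layer; the network and all names here are placeholders:

    import torch
    import torch.nn as nn
    import torchvision.models as models

    # Hypothetical sketch: repurpose an ImageNet-pretrained ResNet-18 for
    # scene classification by swapping in a 365-way output layer.
    # The actual released Places CNNs and training setup may differ.
    NUM_SCENE_CLASSES = 365  # Places365

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, NUM_SCENE_CLASSES)

    # A single forward pass on a dummy batch of 224x224 RGB images.
    x = torch.randn(4, 3, 224, 224)
    logits = model(x)            # shape: (4, 365)
    pred = logits.argmax(dim=1)  # predicted scene category indices

In practice, such a network would then be trained or fine-tuned on the Places training images to learn scene-centric features.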

News (Sep 26, 2016): The results of the Places2 Challenge 2016 have been released.

News (May 16, 2016): The Places365 data and the pre-trained CNNs are now available for download.

News (May 12, 2016): The Places Challenge 2016 will be held jointly with the ILSVRC and COCO workshops at ECCV 2016.


Download our paper

Please cite the following paper if you use this service:

B. Zhou, A. Khosla, A. Lapedriza, A. Torralba and A. Oliva
arXiv, 2016 (PDF coming soon)

Scene Recognition API

Usage: http://places2.csail.mit.edu/cgi-bin/image.py?url=IMG_URL

Example: http://places2.csail.mit.edu/cgi-bin/image.py?url=http://places2.csail.mit.edu/imgs/1.jpg
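For instance, here is a minimal Python sketch of calling the API with the third-party requests library; the response format is not documented on this page, so the sketch simply prints the raw server reply:

    import requests

    # Query the scene recognition API with the URL of an image.
    API = "http://places2.csail.mit.edu/cgi-bin/image.py"
    img_url = "http://places2.csail.mit.edu/imgs/1.jpg"

    resp = requests.get(API, params={"url": img_url}, timeout=30)
    resp.raise_for_status()

    # The reply format is undocumented here; print it as-is.
    print(resp.text)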

Notice: Please do not overload our server by querying repeatedly in a short period of time. This is a free service for academic research and education purposes only, and it comes with no guarantee of any kind.


Acknowledgements

The development of the Places dataset has been partly supported by the National Science Foundation CISE directorate (#1016862), the McGovern Institute Neurotechnology Program (MINT), ONR MURI N000141010933, the MIT Big Data Initiative at CSAIL, and by Google, Xerox, Amazon, and NVIDIA. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the other funding agencies.