Bolei Zhou, Agata Lapedriza, Aditya Khosla, Antonio Torralba, Aude Oliva
Massachusetts Institute of Technology

The dataset is designed following principles of human visual cognition. Our goal is to build a core of visual knowledge that can be used to train artificial systems for high-level visual understanding tasks, such as scene context, object recognition, action and event prediction, and theory-of-mind inference. The semantic categories of Places are defined by their function: the labels represent the entry-level names of environments. For example, the dataset has different categories of bedrooms or streets, because one does not act the same way, or make the same predictions about what can happen next, in a home bedroom, a hotel bedroom, or a nursery.

In total, Places contains more than 10 million images comprising 400+ unique scene categories. The dataset features 5,000 to 30,000 training images per class, consistent with real-world frequencies of occurrence. Using convolutional neural networks (CNNs), the Places dataset enables learning deep scene features for various scene recognition tasks, with the goal of establishing new state-of-the-art performance on scene-centric benchmarks. Here we provide the database and the trained CNNs for academic research and education purposes.

News (Sept 4, 2017): The PlacesCNN demo has been upgraded to predict scene categories, scene attributes, and the class activation map in a single pass! The PyTorch source code of the prediction model has also been released.
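A class activation map (CAM) can come out of the same forward pass as the scene scores because both reuse the last convolutional feature maps. The plain-Python sketch below illustrates this for a network assumed to end in global average pooling followed by a linear layer (bias omitted). It is not the released PyTorch code; the function names and toy numbers are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def global_average_pool(feature_maps: Sequence[Matrix]) -> List[float]:
    """Average each of the K feature maps over all spatial positions."""
    pooled = []
    for fmap in feature_maps:
        total = sum(sum(row) for row in fmap)
        count = sum(len(row) for row in fmap)
        pooled.append(total / count)
    return pooled


def class_scores(pooled: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Linear layer without bias: score_c = sum_k weights[c][k] * pooled[k]."""
    return [sum(w_ck * f_k for w_ck, f_k in zip(w_c, pooled)) for w_c in weights]


def class_activation_map(feature_maps: Sequence[Matrix], weights_c: Sequence[float]) -> Matrix:
    """CAM_c(x, y) = sum_k weights[c][k] * f_k(x, y), using the same weights as the scores."""
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for w_ck, fmap in zip(weights_c, feature_maps):
        for y in range(height):
            for x in range(width):
                cam[y][x] += w_ck * fmap[y][x]
    return cam


# Toy example: K = 2 feature maps of size 2x2 and C = 2 classes (all values invented).
feature_maps = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
weights = [
    [2.0, 0.5],  # class 0
    [0.0, 1.0],  # class 1
]

pooled = global_average_pool(feature_maps)           # [0.5, 1.0]
scores = class_scores(pooled, weights)               # [1.5, 1.0]
best = max(range(len(scores)), key=scores.__getitem__)
cam = class_activation_map(feature_maps, weights[best])  # [[2.0, 1.0], [1.0, 2.0]]
```

Because the pooling and the linear layer are both linear, the spatial average of the CAM for a class equals that class's score, so no second pass through the network is needed.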

News (June 21, 2017): Places Challenge 2017 is online!

Download our paper

Please cite the following paper if you use our data or CNNs:

B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017

CNNs and demo

Download the Places365-CNNs: CNN models such as AlexNet, VGG, GoogLeNet, and ResNet trained on Places.

Scene recognition demo: Upload images (either from the web or from a mobile phone) to recognize their scene categories.
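A scene recognition demo typically reports the few most likely categories for an image. Assuming the CNN has already produced one raw score (logit) per category, that last step can be sketched as a softmax followed by a top-k sort. The labels and scores below are invented for illustration, and this is not the demo's actual code.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: Sequence[float], labels: Sequence[str], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical category names and CNN outputs for one uploaded image.
labels = ["bedroom", "hotel_room", "nursery", "street"]
logits = [2.0, 1.0, 0.5, -1.0]

for name, prob in top_k(logits, labels, k=2):
    print(f"{name}: {prob:.3f}")
```

Subtracting the maximum logit does not change the softmax result but avoids overflow in `math.exp` when scores are large.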


Places dataset development has been partly supported by the National Science Foundation CISE directorate (#1016862), the McGovern Institute Neurotechnology Program (MINT), ONR MURI N000141010933, the MIT Big Data Initiative at CSAIL, and by Google, Xerox, Amazon, and NVIDIA. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation and other funding agencies.