Crowdsourced (OpenStreetMap) dataset for land use/land cover mapping


Researchers can use this dataset to test different land use/land cover mapping methods that use crowdsourced geographic data and Landsat satellite imagery. The study area is the Laguna de Bay area of the Philippines, and the satellite data cover 2014-2015. This dataset will also be made available on the UCI Machine Learning Repository in the near future.

The "training.csv" file contains training data with NDVI (normalized difference vegetation index) values from Landsat time-series images. These training samples were extracted using crowdsourced geographic data (the OpenStreetMap "landuse" and "natural" datasets). The "testing.csv" file contains ground-truth data for 300 random point locations and should be used for accuracy assessment.
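A minimal sketch of how the two files might be used together: fit a model on "training.csv", predict the classes of the "testing.csv" points, and score the predictions. It assumes both files share the column layout in the legend below and that every column other than "LULC_class" is a numeric NDVI feature. The nearest-centroid classifier is a hypothetical stand-in chosen only to keep the example dependency-free; it is not the method used in the paper, and all function names are illustrative.

```python
import csv
import math
from collections import Counter

TARGET = "LULC_class"  # target column named in the dataset legend


def load_rows(path):
    """Read a CSV file (e.g. training.csv or testing.csv) into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def feature_columns(rows):
    """Assumption: every column except the target is a numeric NDVI feature."""
    return [c for c in rows[0].keys() if c != TARGET]


def to_vector(row, cols):
    return [float(row[c]) for c in cols]


def fit_centroids(rows, cols):
    """Hypothetical baseline: mean NDVI profile for each land use/land cover class."""
    sums, counts = {}, Counter()
    for row in rows:
        label = row[TARGET]
        vec = to_vector(row, cols)
        total = sums.setdefault(label, [0.0] * len(cols))
        sums[label] = [s + v for s, v in zip(total, vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, row, cols):
    """Assign the class whose mean profile is closest in Euclidean distance."""
    vec = to_vector(row, cols)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


def overall_accuracy(y_true, y_pred):
    """Fraction of test points whose predicted class matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred):
    """Counts keyed by (reference class, predicted class)."""
    return Counter(zip(y_true, y_pred))


# Example usage (assumes the CSV files are in the working directory):
#   train = load_rows("training.csv")
#   test = load_rows("testing.csv")
#   cols = feature_columns(train)
#   model = fit_centroids(train, cols)
#   preds = [predict(model, r, cols) for r in test]
#   print(overall_accuracy([r[TARGET] for r in test], preds))
```

Overall accuracy is a single summary figure; the confusion matrix counts (reference class, predicted class) pairs, from which per-class producer's and user's accuracies can be derived.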

LEGEND (column names)
"max_ndvi": maximum NDVI value derived from the time-series images
"20150720_N" - "20140101_N": NDVI values for each image date, from July 20, 2015 (20150720_N) back to January 1, 2014 (20140101_N).
"LULC_class": land use/land cover class (i.e. the target variable to classify)
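Because each per-date column encodes its acquisition date in its name, the time series can be recovered and put in chronological order programmatically. The sketch below assumes every per-date column follows the YYYYMMDD_N pattern used in the legend; the helper names are hypothetical. The max_ndvi check further assumes that "max_ndvi" was computed over exactly these per-date columns, which the legend does not state outright, so a mismatch is better read as a sign that this assumption does not hold than as a data error.

```python
from datetime import date, datetime


def ndvi_time_series_columns(header) -> list[tuple[date, str]]:
    """Return (acquisition date, column name) pairs, oldest first.

    Assumption: per-date NDVI columns are named like '20150720_N'.
    """
    series = []
    for col in header:
        stem, sep, suffix = col.partition("_")
        if sep and suffix == "N" and len(stem) == 8 and stem.isdigit():
            series.append((datetime.strptime(stem, "%Y%m%d").date(), col))
    return sorted(series)


def check_max_ndvi(row, series, tol=1e-6):
    """Assumption: max_ndvi equals the maximum over the per-date NDVI columns."""
    values = [float(row[col]) for _, col in series]
    return abs(max(values) - float(row["max_ndvi"])) <= tol


# Example usage (header taken from csv.DictReader(f).fieldnames):
#   series = ndvi_time_series_columns(reader.fieldnames)
#   dates = [d for d, _ in series]
```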

Please see the following paper for more information, and please cite it if you use the dataset:
Johnson, B. A., Iizuka, K. (2016) "Integrating OpenStreetMap crowdsourced data and Landsat time-series imagery for rapid land use/land cover mapping: Case study of the Laguna de Bay area of the Philippines". Applied Geography 67, 140-149.