HarvardX and MITx offered 16 MOOCs on the edX platform in their first year, 2012-2013. The learning data from the more than one million students enrolled in those courses have been de-identified and released for public reuse.


Image: screenshot from MITx Insights, world map of certificate attainment


A research team at Harvard and MIT prepared the data for public use by removing all personally identifiable information. The dataset is the basis for a set of open-source interactive data visualisation tools with which anyone can explore various aspects of the learners' characteristics. The research team also published a set of working papers containing an analysis of the data by course and in aggregate for each university.


“Learning data from open online courses hold great promise for research, but good research must be replicable by others,” commented Andrew Ho, an associate professor at the Harvard Graduate School of Education and co-chair of the HarvardX Research Committee. “By sharing these de-identified data, we hope to show that we can protect information about individuals while still enabling replicable research about what works in online learning.”


Image: screenshot from HarvardX Insights, age composition


Anyone can now download the dataset, explore the data visualisation tools for HarvardX or MITx, or read the working papers (MITx, HarvardX), all based on the multitude of records collected in just one year from the two universities' MOOCs.


Area Of Interest: 

  • Higher education
  • Learning and society
  • Training and work