Here is a collection of big data sources on various topics.
Natality (Google BigQuery): 137 million records for all births registered in the United States and the District of Columbia from 1969 to 2008.
Global Burden of Metabolic Risk Factors (Imperial College London): Body Mass Index (BMI), blood pressure, serum cholesterol, fasting plasma glucose, and diabetes. This data is used in Dr. Hans Rosling’s Gapminder program.
The U.N.’s World Population Prospects: Hans Rosling calls this the “best source for population data”. See also World Urbanization Prospects 2014.
U.N. World Population Prospects, 2012 Revision
Popular U.S. Baby Names
NYC Open Data
London Datastore: London’s city dashboard and public data sets
NYPD Crash Data Band-Aid: The latest New York City Police Department (NYPD) traffic crash data, as a geocoded CSV.
Real-time Transit Data
Hubway Data Visualization Challenge: Public release of five years of user data from Hubway, Boston’s bike-sharing system.
Google Ngrams (from Google Books): Every one- to five-word phrase that has appeared in the millions of books that Google has scanned, including the number of times each phrase appeared each year.
Shakespeare (from Google BigQuery): 164,000 records including each word that William Shakespeare ever wrote, which work it appeared in, and how many times.
Harvard Library Open Metadata
Million Song Dataset
US Census – Business & Industry Statistics
MIT’s Observatory of Economic Complexity
World Top Incomes Database: This is the source of the data used by Thomas Piketty in his book Capital in the Twenty-First Century.
World Bank DataBank
World Bank Commodity Markets (Pink Sheet): Monthly prices worldwide for various commodities: oil, coal, natural gas, rice, wheat, sugar, copper, gold, silver, steel and more.
Commodity Price Indices
MATCH: Metadata Access Tool for Climate and Health
Carbon Dioxide Information Analysis Center
NASA: Climate Change Key Indicators
NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP)
US Drought Monitor
Ecological Data Wiki
EM-DAT: International Disaster Database
NOAA Digital Coast
US IRS Reports of Exempt Organizations
Povcalnet: poverty analysis tool
Statistical Abstract of the United States: Discontinued as a single publication, but historical versions are archived and there are links to sources.
Full Fact Finder
UCI Machine Learning Datasets: YouTube, gas collection drift, climate modeling crashes, wine quality, and other sets
Air Transportation Multiplex
U.S. Federal Government Administrative Datasets
Global Terrorism Database
Social Progress Imperative
WomanStats (free registration required)
Wikipedia pageview data
Northern Sea Route (NSR) transit statistics
Bloomberg Visual Data
Norwegian Petroleum Directorate – FactPages
If you have suggestions for other big data sources, please let me know in the Comments field below.