This page collects sources of big data on a variety of topics.


Natality (Google BigQuery): 137 million records for all births registered in the United States and the District of Columbia from 1969 to 2008.

Global Burden of Metabolic Risk Factors (Imperial College London): Body Mass Index (BMI), blood pressure, serum cholesterol, fasting plasma glucose, and diabetes. This data is used in Dr. Hans Rosling’s Gapminder program.

The U.N.’s World Population Prospects: Hans Rosling calls this the “best source for population data”. See also World Urbanization Prospects 2014.

U.N. World Population Prospects, 2012 Revision

Popular U.S. Baby Names


NYC Open Data

London Datastore: London’s city dashboard and public data sets.

NYPD Crash Data Band-Aid: The latest New York City Police Department (NYPD) traffic crash data, as a geocoded CSV.

Real-time Transit Data

Hubway Data Visualization Challenge: A public release of five years of user data from Hubway, Boston’s bike-sharing system.


Google Ngrams (from Google Books): Every one- to five-word phrase that has appeared in millions of books that Google has scanned, including number of times each phrase appeared each year.

Shakespeare (from Google BigQuery): 164,000 records including each word that William Shakespeare ever wrote, which work it appeared in, and how many times.

Harvard Library Open Metadata

Open Library

Million Song Dataset
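Data sets shaped like the Shakespeare entry above (one record per word and work, with a count) are easy to summarize once exported. The following is only a minimal sketch: the column names (word, corpus, word_count) are assumptions for illustration, and the sample rows are invented placeholders, not real counts from the data set.

```python
import csv
import io
from collections import Counter

# Invented sample rows in the shape the entry describes: word, work, count.
# Column names and values are assumptions for illustration only.
SAMPLE_CSV = """word,corpus,word_count
love,hamlet,3
love,sonnets,5
crown,hamlet,2
crown,macbeth,4
"""


def total_counts(csv_text):
    """Sum each word's count across all works it appears in."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["word"]] += int(row["word_count"])
    return totals


totals = total_counts(SAMPLE_CSV)
# Show the words with the highest totals across all works.
print(totals.most_common(2))
```

With a real export, the same function could be pointed at the file's contents instead of the inline sample.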


US Census – Business & Industry Statistics

MIT’s Observatory of Economic Complexity


World Top Incomes Database: This is the source of the data used by Thomas Piketty in his book Capital in the Twenty-First Century.


World Bank
World Bank DataBank

World Bank Commodity Markets (pink sheet): Monthly prices worldwide for various commodities: oil, coal, natural gas, rice, wheat, sugar, copper, gold, silver, steel and more.

Commodity Price Indices


MATCH: Metadata Access Tool for Climate and Health

Carbon Dioxide Information Analysis Center

NASA: Climate Change Key Indicators

NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP)

Berkeley Earth

US Drought Monitor

Ecological Data Wiki

EM-DAT: International Disaster Database

NOAA Digital Coast



US IRS Reports of Exempt Organizations

Povcalnet: poverty analysis tool

Statistical Abstract of the United States: Discontinued as a single publication, but historical versions are archived and there are links to sources.


Our Airports

Full Fact Finder

Statistics Norway

UCI Machine Learning Datasets: YouTube, gas collection drift, climate modeling crashes, wine quality and other sets.

Air Transportation Multiplex

U.S. Federal Government Administrative Datasets


Global Terrorism Database



Social Progress Imperative

WomanStats (free registration required)

OECD iLibrary

Wikipedia pageview data

Northern Sea Route (NSR) transit statistics

Bloomberg Visual Data

Norwegian Petroleum Directorate – FactPages


If you have suggestions for other big data sources, please let me know in the Comments field below.

