Here are collected some sources of big data on various topics.

Population

Natality (Google BigQuery): 137 million records for all births registered in the United States and the District of Columbia from 1969 to 2008.

Global Burden of Metabolic Risk Factors (Imperial College London): Body Mass Index (BMI), blood pressure, serum cholesterol, fasting plasma glucose, and diabetes. This data is used in Dr. Hans Rosling’s Gapminder program.

The U.N.’s World Population Prospects: Hans Rosling calls this the “best source for population data.” See also World Urbanization Prospects 2014.

HealthData.gov

U.N. World Population Prospects, 2012 Revision

Popular U.S. Baby Names

Urban

NYC Open Data

London Datastore: London’s city dashboard and public data sets

NYPD Crash Data Band-Aid: The latest New York City Police Department (NYPD) traffic crash data, as a geocoded CSV.

Real-time Transit Data

Hubway Data Visualization Challenge: Hubway, Boston’s bike-sharing system, publicly released five years of user data.

Writing

Google Ngrams (from Google Books): Every one- to five-word phrase that has appeared in millions of books that Google has scanned, including number of times each phrase appeared each year.

Shakespeare (from Google BigQuery): 164,000 records including each word that William Shakespeare ever wrote, which work it appeared in, and how many times.

Harvard Library Open Metadata

Open Library

Million Song Dataset

Economics

US Census – Business & Industry Statistics

MIT’s Observatory of Economic Complexity

Eurostat

World Top Incomes Database: the source of the data used by Thomas Piketty in his book Capital in the Twenty-First Century.

World Bank

World Bank DataBank

World Bank Commodity Markets (Pink Sheet): Monthly worldwide prices for commodities including oil, coal, natural gas, rice, wheat, sugar, copper, gold, silver, steel, and more.

Commodity Price Indices

Environment

MATCH: Metadata Access Tool for Climate and Health

Carbon Dioxide Information Analysis Center

NASA: Climate Change Key Indicators

Data.gov: Climate

NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP)

Berkeley Earth

US Drought Monitor

Ecological Data Wiki

EM-DAT: International Disaster Database

NOAA Digital Coast

Unsorted

OpenStates

US IRS Reports of Exempt Organizations

PovcalNet: poverty analysis tool

Statistical Abstract of the United States: Discontinued as a single publication, but historical editions remain archived, along with links to the underlying sources.

AggData

Our Airports

Full Fact Finder

Statistics Norway

UCI Machine Learning Datasets: YouTube, gas sensor array drift, climate model simulation crashes, wine quality, and other data sets

Air Transportation Multiplex

U.S. Federal Government Administrative Datasets

Knoema

Global Terrorism Database

ProPublica

Visualizing.org

Nationmaster

Social Progress Imperative

WomanStats (free registration required)

OECD iLibrary

Wikipedia pageview data

Northern Sea Route (NSR) transit statistics

Bloomberg Visual Data

Norwegian Petroleum Directorate – FactPages

~~~

If you have suggestions for other big data sources, please let me know in the comments below.