Learn History the Google Way

Google scanned the text of millions of books and made statistics about them freely available. You can see how frequently a word or phrase has appeared in print simply by doing a search at the Google Ngrams site.

However, I am interested in how often different dates are referenced. I would like to know which years in the Second Millennium were the most important.

So, I scanned through the data files for the year 2000 and recorded the number of times that the 4-digit sequence ‘1000’ occurred, ‘1001’, ‘1002’, and so forth, all the way up to ‘2000’.
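For readers who want to try this themselves, here is a minimal Python sketch of that kind of tally. It assumes each row of a downloaded 1-gram file is tab-separated, with the token, the publication year, and the match count as the first three fields. The function name count_year_tokens and the sample rows are made up for illustration, and this is not necessarily how the counts in this post were produced.

```python
from collections import Counter


def count_year_tokens(lines, pub_year=2000, lo=1000, hi=2000):
    """Sum match counts for the tokens '1000'..'2000' in books from pub_year.

    Assumed row layout (tab-separated): token, publication year, match count,
    followed by any further columns, which are ignored.
    """
    wanted = {str(n) for n in range(lo, hi + 1)}
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed rows
        token, year, match_count = fields[0], fields[1], fields[2]
        if token in wanted and year == str(pub_year):
            counts[int(token)] += int(match_count)
    return counts


# Invented sample rows, only to show the shape of the input.
sample = [
    "1492\t1999\t500\t400\t300",
    "1492\t2000\t900\t700\t350",
    "1491\t2000\t120\t100\t90",
    "apple\t2000\t5000\t4000\t2000",
]
counts = count_year_tokens(sample)
print(dict(counts))
```

With a real file, the list of sample rows would be replaced by an open file handle, since the function only needs something it can iterate over line by line.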

You can see the result in the graph below.


Of course, not every usage of ‘1500’ refers to the year. So, it would be better to look for the phrase ‘year 1492’ and not just ‘1492’. However, Google’s records for 2-word phrases are spread across 100 different data files, and I didn’t think it was worth downloading them all. Instead, I have simply ignored all numbers evenly divisible by 10.
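That filter is short to write down. The sketch below assumes the counts are held in a dictionary keyed by the integer value of each 4-digit number; the name drop_round_numbers and the sample values are hypothetical.

```python
def drop_round_numbers(counts):
    """Return a copy of counts without the numbers evenly divisible by 10."""
    return {n: c for n, c in counts.items() if n % 10 != 0}


# Invented counts: 1500 and 1490 are dropped, 1492 and 1501 are kept.
raw = {1490: 300, 1492: 900, 1500: 4000, 1501: 210}
kept = drop_round_numbers(raw)
print(kept)
```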

Some 4-digit sequences cause spikes for reasons other than the year. For example, we see a large spike for 1099 probably because it is a US tax form. Similar spikes are seen for 1011 (a binary number), 1024 (which is the number of bytes in a kilobyte), 1111, 1234 and some others. I have noted some of these in gray.

There are a few interesting things in this graph. One is that the different periods of Second Millennium history are clearly visible. There seems to be a step up around the year 1200, followed by a slow drop over the next 250 years or so. Personally, I see 5 distinct phases:

  • Early (before 1200)
  • Medieval (1201 – 1468)
  • Renaissance (1469 – 1737)
  • Enlightenment (1738 – 1920)
  • Modern (after 1920)

The transition years are open to some debate, and they are not exactly the ones that most historians would assign to these periods. For example, the Enlightenment is typically considered to have started in the 1630s and 1640s, but the graph doesn’t show a transition until the 1730s. Similarly, it supposedly ended in the early 1800s, but I see no sharp discontinuity until 1920.

I have dated the start of the Renaissance to the death of Gutenberg.

Another important thing this graph shows is which individual 4-digit numbers experienced the greatest jump in references compared to the previous 4-digit number. For example, ‘1492’ saw far more references than ‘1491’.

I have marked the top 20 numbers according to the degree of step-up in reference volume. I guess if aliens landed on Earth and demanded that you tell them the 20 most-important years in the last 1000, the ones marked on the graph above would be a good-enough response.
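One way such a ranking could be computed is sketched below. Here the "degree of step-up" is taken to be the ratio of a number's count to the count of the nearest smaller number still in the data. That reading is an assumption; a plain difference in counts would be an equally plausible one. The function name top_step_ups and the sample counts are invented.

```python
def top_step_ups(counts, k=20):
    """Rank numbers by how much their count jumps over the previous number.

    counts maps each 4-digit number (int) to its reference count.
    The jump is measured as count[current] / count[previous], where
    "previous" is the nearest smaller key present in counts (an assumption).
    """
    years = sorted(counts)
    jumps = []
    for prev, cur in zip(years, years[1:]):
        if counts[prev] > 0:  # avoid dividing by zero
            jumps.append((counts[cur] / counts[prev], cur))
    jumps.sort(reverse=True)  # largest ratio first
    return [year for _, year in jumps[:k]]


# Invented counts around 1492; 1490 is absent, as multiples of 10 are ignored.
sample = {1489: 100, 1491: 110, 1492: 330, 1493: 120, 1494: 115}
print(top_step_ups(sample, k=2))  # prints [1492, 1491]
```

A ratio favors sharp relative spikes in sparsely referenced centuries, while a difference would favor recent years where the raw counts are much larger, so the choice of metric changes which years make the list.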

You can quibble about whether the metric I used is a good one or not, but take a few minutes, if you have the time, and see if you can guess the major event that caused each of the red years to make the cut. (Keep in mind that the books are mostly American and British).

Here are my responses, based on prior knowledge and some Googling:

1066 – Battle of Hastings
1095 – First Crusade starts
1215 – Magna Carta
1325 – who knows?
1377 – coronation of Richard II
1415 – Battle of Agincourt
1453 – fall of Constantinople
1483 – start of the Spanish Inquisition
1485 – death of Richard III, coronation of Henry VII
1492 – Columbus
1545 – Council of Trent
1603 – coronation of James I
1648 – Peace of Westphalia (end of the Thirty Years’ War)
1688 – The Glorious Revolution
1776 – US Declaration of Independence, Wealth of Nations published
1787 – US Constitution signed
1789 – Washington elected first US president, French Revolution
1812 – War of 1812
1848 – European revolutions, Communist Manifesto published
1914 – start of World War I
1945 – end of World War II

I am a little dismayed to see nothing here related to science or math. The number with the largest jump up was 1492. Surprisingly to me, the second largest was 1688, the Glorious Revolution (the overthrow of King James II).

In case the 1325 date (or some other) was a fluke, the number that just missed the cut for the top 20 was 1189, the start of the Third Crusade. (Maybe the fact that no date from the 1100s made the top 20 is why there are not too many students of 12th Century history.)

The US Civil War appears as a large peak from 1861 to 1865. However, because this was a multi-year phenomenon, no individual year shows a significant spike over the previous year.

It might be interesting to repeat this analysis for German, Russian, French, etc. and see which other dates are important.


  1. All about politics and war, huh? Maybe that’s because children are being asked to write about them in school?

    • All of the numbers were taken from books written in the year 2000, so yes, if children were being taught that history is war and coronations in the 1950s, 60s and 70s, then when they were writing books in 2000, they likely wrote about the dates they learned were important as children.
