Google digitized millions of books and published statistics about them, along with a tool for exploring those statistics: the Google Ngram Viewer.
German History the Google Way: (13 April 2016) The frequency of 4-digit numbers between ‘1000’ and ‘2000’ in German-language texts reveals some of the most-important individual years from a German perspective.
Is Google’s OCR getting worse in some ways?: (23 March 2013) In some ways, such as the recognition of ‘beft’ as the word ‘best’, the 2012 edition of Google’s statistics is superior to the 2009 edition. But in other ways, like the recognition of chemical formulas, Google’s statistics appear to have gotten worse.
Separated by a Common Language: (16 February 2013) It is easy to find odd words and phrases that Americans use more than Brits, or vice-versa. But this post is about finding common phrases that each uses more than the other.
One More New Time-Out: (19 January 2013) Suppose you knew only, say, the 500 most-popular words in US English. Over time, the percentage of words in a random passage of text that you would recognize has declined. That is, our language is getting more complex, not simpler.
Colors According to Google Books: (16 January 2013) An image I made by coloring squares according to how frequently that color’s name was used as an adjective in U.S. English books published between 1990 and 2005.
Our Colorful Language: (22 August 2012) More languages have a word for ‘red’ than for ‘green’. And more have a word for ‘green’ than for ‘pink’. Similarly, ‘red’ is more popular in English than ‘green’ and ‘green’ is more popular than ‘pink’.
Relative usage of the exclamation point: (6 August 2012) German usage of the exclamation point rose dramatically in the years leading up to the Second World War, while usage in American and British English texts did not show the same trend.
Learn History the Google Way: (19 July 2012) The frequency of 4-digit numbers between ‘1000’ and ‘2000’ shows the major eras of Second Millennium history and makes for a good list of the most-important individual years.
The Fall of Fiction: (5 June 2012) Google’s data suggests that works of fiction, as a percentage of all works published in English, were in decline for much of the 20th century.
Google Ngrams mysteries: word length and decimal frequency in US and UK texts: (14 May 2012) The average length of words in US and UK texts grew for much of the period from 1860 to 2000, except for about the last 20 years of the 20th century in the US. Google’s data also shows that US works use decimal numbers far more frequently than UK works do. Perhaps these mysteries are due primarily to the quality of Google’s data.
Nice Going, Einstein!: (16 April 2012) The percentage of German-language books that use the word ‘Einstein’ at least once has fluctuated wildly since the 1960s. But the fraction of German-language words that are ‘Einstein’ has stayed constant. Google’s Ngram Viewer doesn’t make it easy to figure out why this difference exists.
ETAION SRHLDC: (9 April 2012) The frequency of the most common letters in English-language books has changed over time. So, the list of most-common letters changes depending on which years the books you look at were published.
Looking Back at the Future: (6 April 2012) A study of the rate at which people in different countries perform future-oriented searches relative to past-oriented searches showed a strong correlation with the financial well-being of each country. By looking at Google Ngram data, I find evidence that the ratio for the US and UK holds up in published books, too, although the ratio seems to be much higher than for Google searches.
Do You Know Who I Am?: (22 March 2012) An analysis of millions of published books found that the names of politicians and authors tended to appear more frequently than those of scientists and mathematicians. But there is a simple explanation for why that might be.
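
Several of the posts above, such as the two on 4-digit numbers, come down to counting tokens in Google's raw 1-gram files. The following is only a rough sketch of that kind of tally, not the code used for the posts. It assumes the tab-separated layout of the 2012 edition (ngram, publication year, match count, volume count). The function name `year_token_counts`, its parameters, and the sample rows are all hypothetical, and the sketch sums raw counts rather than normalizing by the total number of words published each year.

```python
import re
from collections import Counter

# A 4-digit token starting with 1 or 2; the numeric range is checked separately.
YEAR_TOKEN = re.compile(r"[12]\d{3}")


def year_token_counts(lines, first_pub_year=1800, last_pub_year=2000):
    """Sum match counts for the tokens '1000'..'2000' over a range of publication years.

    Each input line is assumed to look like:
        ngram <TAB> publication_year <TAB> match_count <TAB> volume_count
    """
    totals = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4:
            continue  # skip rows that do not fit the assumed layout
        ngram, pub_year, match_count, _volume_count = parts
        if not YEAR_TOKEN.fullmatch(ngram):
            continue  # not a 4-digit number token
        if not 1000 <= int(ngram) <= 2000:
            continue  # outside the range discussed in the posts
        if first_pub_year <= int(pub_year) <= last_pub_year:
            totals[ngram] += int(match_count)
    return totals


# Invented sample rows for illustration only; these are not real Ngram counts.
sample = [
    "1789\t1900\t120\t40\n",
    "1789\t1950\t80\t30\n",
    "1848\t1950\t50\t20\n",
    "2001\t1990\t999\t99\n",
    "best\t1950\t5000\t900\n",
]

totals = year_token_counts(sample)
print(totals.most_common(2))
```

To turn these sums into frequencies comparable across decades, each count would still need to be divided by the total number of words published in the matching year.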