Nice Going, Einstein!

 

Google Ngrams, statistics on the frequency of words published in millions of books over the past couple of hundred years, is such a wonderful data source that I am awed it is possible to obtain at all, let alone freely available. I am also a bit disappointed that Google hasn’t done more to make it accessible.

Google has set up a website, Google Ngram Viewer, where you can type in a word and see how frequently it has appeared in print each year (as a fraction of all words published) in different languages and countries. You can even enter multiple words to compare them. But if you want to compare the same word in, say, US English and German, or if you want to see how many books a word has appeared in at least once as a fraction of all books published, you need to download the full dataset and sift through it on your own.

In the graph below I have plotted the percentage of all works published in US English, British English and German that include the word ‘Einstein’ at least once. (Google Ngrams tells us in how many works ‘Einstein’ appeared at least once in each year, so I simply divided this number by the number of works in which a period (‘.’) appears at least once, which serves as a stand-in for the total number of works.)


(click image to enlarge)
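
The division described above can be sketched in a few lines of Python. This is only a minimal sketch, not necessarily the code behind the graph: it assumes each row of the downloaded data is laid out as ngram, year, match count and volume count separated by tabs, and the function names and the sample numbers are invented for illustration.

```python
# Toy rows in an ASSUMED layout: ngram <TAB> year <TAB> match_count <TAB> volume_count.
# The numbers are invented for illustration; they are not real Ngram data.
SAMPLE_ROWS = (
    "Einstein\t1950\t120\t40\n"
    "Einstein\t1951\t150\t60\n"
    ".\t1950\t90000\t400\n"
    ".\t1951\t95000\t500\n"
)


def volume_counts(tsv_text, term):
    """Map year -> number of works containing `term` at least once."""
    counts = {}
    for line in tsv_text.splitlines():
        if not line:
            continue  # ignore blank lines
        ngram, year, _match_count, volume_count = line.split("\t")
        if ngram == term:
            counts[int(year)] = int(volume_count)
    return counts


def percent_of_works(term_counts, baseline_counts):
    """Percentage of works that mention the term, per year."""
    return {
        year: 100.0 * n / baseline_counts[year]
        for year, n in sorted(term_counts.items())
        if baseline_counts.get(year)  # skip years with no baseline count
    }


einstein = volume_counts(SAMPLE_ROWS, "Einstein")
all_works = volume_counts(SAMPLE_ROWS, ".")  # '.' stands in for "every work"
percentages = percent_of_works(einstein, all_works)
print(percentages)
```

Using the period as the denominator rests on the assumption that nearly every work contains at least one, so its volume count approximates the number of works published that year.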

You can see that Einstein was virtually unknown until 1905, his ‘annus mirabilis’, the year in which he published several seminal papers on different subjects: special relativity, Brownian motion and the photoelectric effect (for which he would later win the Nobel Prize in Physics).

In the period from 1905 through 1920, he became better known in German texts, but was still relatively unknown in the US and UK until the year he won the Prize.

When the Nazis took power in Germany in 1933, you can clearly see the impact they had on the fraction of works that mention his name. (Similar suppression effects have already been noted in Google’s data for other people in Germany who were disliked by the Nazi regime.) In contrast, the fraction of works that mention his name was undamped in the US and UK.

One striking thing about the US and UK trends is how they diverge after the Second World War. Einstein’s popularity continues to rise in the US, but levels off in the UK. We tend to think of suppression of certain people as something that the Nazis did, but perhaps the British are somewhat guilty of that too, though I suspect for less sinister reasons than the Nazis. My suspicion is that throughout the 1940s, 50s and 60s, the term ‘Einstein’ gained usage in the US as a synonym for ‘genius’, so it started appearing in American works of fiction in usages like, “Nice going, Einstein!” Perhaps in the UK this non-literal usage of Einstein’s name wasn’t quite as popular, so his name appears in a consistently smaller percentage of works.

Meanwhile back in post-war Germany, the writers who grew up in the 1930s and 40s perhaps realized they had some catching up to do and wrote about him with increasing frequency until about 1980, when an astonishing 1 in 5 books written in German mentioned his name at least once.

With everyone in Germany perhaps getting a little sick of hearing his name, the fraction of works mentioning it dropped off over the next 20 years. Then, as those works grew old, a new wave of works mentioning ‘Einstein’ began to grow.

It would be interesting to see in 20 or 30 or 40 years if German writing continues to undergo oscillations in the usage of his name. In all three datasets, the average frequency seems to be somewhere between about 1 book in 10 and about 1 in 8. Perhaps German-language books will settle down around this level in another few decades.

We tend to think of World War II as something that ended many decades ago, but in German texts, at least, this one word seems to still be affected by that initial dampening.

The graph below shows the frequency of the use of ‘Einstein’ as a fraction of all words (not works) in German. If we hadn’t taken the time to dig into Google’s data, it might look like the popularity of ‘Einstein’ in German had bounced back by 1960 and then simply leveled off after that.

(click image to enlarge)
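
To see how the two measures can disagree, here is a small sketch in Python. The numbers are invented purely for illustration and are not taken from Google’s data, and the helper name `yearly_share` is hypothetical. In this toy case the fraction of works that mention the term doubles while the term’s share of all words stays flat.

```python
def yearly_share(numerators, denominators):
    """Divide two per-year counts, skipping years with a missing or zero denominator."""
    return {
        year: numerators[year] / denominators[year]
        for year in sorted(numerators)
        if denominators.get(year)
    }


# Invented counts for two years (NOT real Ngram data).
works_with_term = {1960: 10, 1980: 20}      # works mentioning the term at least once
works_total = {1960: 100, 1980: 100}        # all works published
term_occurrences = {1960: 50, 1980: 50}     # total occurrences of the term
words_total = {1960: 1_000_000, 1980: 1_000_000}  # all words published

share_of_works = yearly_share(works_with_term, works_total)
share_of_words = yearly_share(term_occurrences, words_total)

print(share_of_works)  # fraction of works mentioning the term
print(share_of_words)  # fraction of all words that are the term
```

One way this could happen is if the same number of mentions were spread across more books, for example fewer long biographies and more passing references.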

Like I said, I find it amazing that Google would do all of the work to digitize millions of books and then just give away the data for free. They are really to be commended for that. But it seems like a few simple enhancements to Google’s Ngram Viewer would make a world of difference. One would be the ability to compare words in different languages, as I have done above, or maybe x-y diagrams. Another would be the ability to look at the percentage of works that mention a term, or the number of times a word or phrase appears in each work (or in the title of each work), so we could see which fraction of these books mention Einstein many, many times (biographies about him) versus those that mention his name only a few times.

These are only a few ideas and I am certain that other people have had the same thoughts about this. The two graphs above tell such different stories (one saying that the popularity of ‘Einstein’ has oscillated wildly and the other that it has not) that I wish there were more definitive ways of saying which is correct.


  1. An extension of this is investigating different English texts. The Code Book CD-ROM includes a video clip (‘How frequency analysis works’) that shows Simon Singh comparing the distribution of letters in a Shakespeare sonnet with an article in the Sun. The words may be different, but the distribution of letters should be very similar.



