Looking Back at the Future

I saw this fascinating study, ‘Quantifying the Advantage of Looking Forward’, by Tobias Preis, Helen Susannah Moat, et al. at nature.com.

The authors took Google search records from recent years and computed the ratio of searches that included the next year to searches that included the previous year. So, for example, in 2010, the number of searches that included ‘2011’ relative to the number that included ‘2009’.

The authors broke up the data by country and found that, in wealthy countries, people tend to search more for the coming year than the previous year, while, in less-wealthy countries, people tend to search more for the previous year than the coming year.

They presented a graph that shows this quite dramatically:

They call this ratio (the number of searches that include the next year divided by the number that include the previous year) the ‘future orientation index’.

A value of 1 means that people in that country tended in 2010 to use ‘2011’ in their queries as often as they used ‘2009’. Values greater than 1 mean the searches are biased toward the future and values less than 1 mean the searches are biased toward the past.
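To make the definition concrete, here is a minimal sketch of the calculation. The function name and the search counts are made up for illustration; they are not taken from the study's data.

```python
def future_orientation_index(search_counts, year):
    """Searches mentioning next year divided by searches mentioning last year.

    `search_counts` is a hypothetical dict mapping a year that appears in
    queries (e.g. 2011) to how many searches mentioned it during `year`.
    """
    next_count = search_counts[year + 1]
    prev_count = search_counts[year - 1]
    if prev_count == 0:
        raise ValueError("no searches for the previous year; ratio undefined")
    return next_count / prev_count


# Made-up counts for one country in 2010 (illustration only).
example_counts = {2009: 1000, 2011: 1500}
index = future_orientation_index(example_counts, 2010)
print(index)  # 1.5, i.e. biased toward the future
```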

The whole study reminds me of the presentation ‘A Healthy Take on Time’ by the psychologist Philip Zimbardo, where he found that people who are future-oriented tend to be happier than those who dwell on the past.

In the Google search study by Preis, the authors say the subject of their study had previously “been considered in an investigation of a large corpus of text from books” and they provide a link to the study I blogged about recently that looked at millions of digitized books.

I am not certain that I agree with Preis’s assertion that the digitized book study was looking at the same thing that Preis et al call the ‘future orientation index’.

In the digitized book study, the authors, Jean-Baptiste Michel and Erez Lieberman Aiden, looked at the frequency of usage of, say, the number ‘1950’ in the various years of the 19th and 20th centuries. Prior to 1950, the number ‘1950’ was not used very often. In 1950, it was used very frequently, and in the years following 1950, the frequency of references to 1950 dropped off over time.

Here is the graph the authors presented:

What is interesting to note is that references to 1950 drop off faster after 1950 than references to 1910 did after 1910. And references to 1910 dropped off faster after 1910 than references to 1883 did after 1883.

In other words, we seem to be ignoring the past faster. The implication is that we are getting more ‘future-oriented’ as time goes on.

But what Michel and Aiden looked at (the frequency of references in millions of books to ‘1950’ in many years surrounding 1950) is not quite the same as what Preis and Moat looked at (the ratio of Google searches for ‘2011’ and ‘2009’ in the year 2010).

So I took Google’s digitized books data files (available at Google Ngram Viewer datasets) and tabulated for each year, the ratio of references to years up to a decade in the future compared to references up to a decade in the past. So, for example, for books published in 1925, I added up the number of times ‘1926’, ‘1927’, ‘1928’ … ‘1935’ were used and then I added up the number of times ‘1924’, ‘1923’, ‘1922’ … ‘1915’ were used.
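The tabulation can be sketched roughly as follows. This is not the script I actually ran; it assumes the counts have already been loaded into a dict keyed by (token, publication year), and the numbers in the example are invented.

```python
def book_future_orientation(counts, pub_year, window=10):
    """Ratio of references to the next `window` years vs. the previous `window`.

    `counts` is an assumed dict mapping (token, publication_year) to the
    number of times that token appears in books published that year,
    e.g. ("1926", 1925) -> 30. Missing entries count as zero.
    """
    future = sum(counts.get((str(pub_year + k), pub_year), 0)
                 for k in range(1, window + 1))
    past = sum(counts.get((str(pub_year - k), pub_year), 0)
               for k in range(1, window + 1))
    if past == 0:
        return None  # ratio undefined when the past is never mentioned
    return future / past


# Invented counts for books published in 1925 (illustration only).
toy_counts = {
    ("1926", 1925): 30,
    ("1930", 1925): 10,
    ("1924", 1925): 15,
    ("1920", 1925): 5,
}
ratio_1925 = book_future_orientation(toy_counts, 1925)
print(ratio_1925)  # 2.0: forty future references against twenty past ones
```

Running this for every publication year, once per corpus, gives one index series per corpus.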

If we were ‘future-oriented’ in 1925, we should make more references to the coming 10 years than to the past 10. Unfortunately, I couldn’t break up this data by country the way that Preis and Moat did, but Google has split English-language books into those published in the US and those in the UK.

The graph below shows the book-data ‘future orientation index’ for the US and UK:


There’s a lot about this graph that’s intriguing.

First, it indicates that over the past 150 years or so the British have generally made more references to the coming 10 years, relative to the previous 10, than Americans have. My personal bias would have told me that Americans were more future-oriented, but I guess not. Reassuringly, though, this is in line with Preis and Moat’s findings from Google searches.

Second, it is interesting to see how times of war tend to cause more references to the recent past. You can see what looks like the effect of the US Civil War (1861-1865) in the US, and World Wars I and II in the UK quite strongly. Since about 1960, changes in the US and UK indices seem to move in lock-step, perhaps due to more cross-cultural communication.

Finally, the scale of the y-axis surprises me. Preis and Moat found that in even the most future-oriented countries, the ratio of ‘2011’ searches to ‘2009’ searches was only about 1.5. However, in printed books over the past 150 years, the number of references to the next 10 years is typically at least 10 times (and sometimes upward of 80 times) the number of references to the previous 10 years.

This last point really surprised me and makes me worry about the quality of the data. Michel and Aiden mentioned needing to do a large-scale inspection by hand of the data they looked at, to see what portion might be erroneous, and they found that perhaps half of the references were digitization errors or other such problems.

I have not been able to do this sort of check, but it seems reasonable to me that many usages of, say, ‘1950’ in texts might not be references to the year 1950. It might be interesting to do this analysis using Google’s data for the phrases ‘year 1950’ and ‘year 1951’, rather than just ‘1950’ and ‘1951’, but I have neither the time nor the hard disk space to do that right now by myself.


