The Fall of Fiction

I have been looking a little more into a trend I observed in a previous post (see ‘Google Ngrams mysteries: word length and decimal frequency in US and UK texts’): there has been a general increase in the average length of words used in both US and UK English published works over the 20th Century. That is, thanks to the Google Books project, which has scanned in the text of millions of books, we have some hard evidence that authors are using longer words, on average.

A commenter suggested: “I’m guessing that it has something to do with trends in books for children versus books for adults – that could explain short vs. long words”

I have not yet thought of a way to investigate that hypothesis, but Google’s data does make it easy to find the fraction of scanned works that are fiction. A data summary is available that gives, for each year, the number of books Google scanned overall and the number of those it classified as fiction, so I simply divided the second number by the first to find the percentage of books published in English that are fiction.
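For anyone who wants to repeat the division, here is a minimal sketch in Python. The counts in it are made-up placeholders, not Google's numbers, and the tuple layout (year, total books, fiction books) is just my assumption about how you might hold the data once you have read it in.

```python
# Placeholder per-year counts: (year, total_books, fiction_books).
# These values are invented for illustration; they are NOT Google's data.
counts = [
    (1850, 1000, 300),
    (1950, 5000, 1000),
]


def fiction_share(total_books, fiction_books):
    """Return fiction as a percentage of all scanned books for one year."""
    if total_books == 0:
        # No books scanned for this year, so the share is undefined.
        return None
    return 100.0 * fiction_books / total_books


# Map each year to its fiction percentage.
shares = {year: fiction_share(total, fiction) for year, total, fiction in counts}

for year, pct in sorted(shares.items()):
    print(f"{year}: {pct:.1f}% fiction")
```

Years with no scanned books come back as None rather than raising a division error, which matters if you plot the early years where the data is sparse.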

[Graph: percentage of scanned English-language books classified as fiction, by year]

In the graph above we can see the general decline in the percentage of published English-language books that are fiction, from about 30% for most of the 19th Century down to about 20% in the latter half of the 20th Century.

First, this makes me wonder: if at most about 30% of works were fiction, why does fiction get its own label, while everything else (which makes up the bulk of published works) is simply labeled ‘non-fiction’? Wouldn’t it be better to call ‘non-fiction’ something like “factuals” and everything else “non-factuals”?

I don’t want to take the time to try to attribute historical events to each dip and peak, but it is interesting to see a small shift downward in the proportion of works that are fiction during World War II. Also, there is a strange dip in fiction from about 1960 through the mid-1970s, which runs contrary to everything I have heard about that era.

I have included another copy of the average word length graph below:

[Graph: average word length in US and UK English texts, by year]

It is also interesting to see that the average word length began to rise right around the year 1900, just as the percentage of works that are “non-factuals” started to drop.

So, perhaps ‘The Fall of Fiction’ and the rise in the share of works that are fact-based is a piece of this puzzle. Unfortunately, Google’s data does not factor in the popularity of each work. Each scanned work, to the best of my knowledge, is counted only once, even if it has sold millions of copies. Also, self-help books and diet books and all sorts of pseudoscience books are often classified as ‘non-fiction’, so that muddles things a bit too.

Update: Page 49 of Shapiro and Varian’s Information Rules has this quote: “more than 70 percent of public library circulation is fiction, a figure that has remained constant for 200 years or more.” So, though only about 20% of works are fiction, they account for more than 70% of library circulation.
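To get a feel for how lopsided that is, here is a rough back-of-the-envelope sketch. It rests on a big assumption of mine: that library shelves hold fiction in about the same 20% proportion as Google's scanned books. The function name and inputs are hypothetical, chosen only for this illustration.

```python
def per_title_ratio(fiction_title_share, fiction_circulation_share):
    """How many times more often the average fiction title circulates
    than the average non-fiction title, given fiction's share of titles
    and its share of circulation (both as fractions between 0 and 1).
    """
    # Circulation per unit of titles, for each category.
    fiction_rate = fiction_circulation_share / fiction_title_share
    nonfiction_rate = (1 - fiction_circulation_share) / (1 - fiction_title_share)
    return fiction_rate / nonfiction_rate


# Assumed inputs: 20% of titles are fiction, 70% of circulation is fiction.
ratio = per_title_ratio(0.20, 0.70)
print(f"Average fiction title circulates about {ratio:.1f}x as often")
```

Under those assumptions the average fiction title would be checked out roughly nine times as often as the average non-fiction title. Since the quote says "more than 70 percent", that would be a lower bound, and it only holds if the library's mix of titles really does resemble the scanned collection.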


