I saw this graph in the article ‘Spain is Beyond Doomed’ by Matthew O’Brien at The Atlantic. It shows the dramatic upswing in unemployment in Spain during the past few years, especially among those who have been unemployed for 2 or more years.
As a chemical engineer, though, one thing that struck me about the graph is that it resembles a simple model of liquid flowing through a leaky pipeline. Say that we have a pipeline made of segments, each representing people who have been unemployed for a certain number of months. Each month, everyone in each segment has a chance of finding work (though not yet starting it) and thus leaking out of the pipeline into a bin called ‘Already Found Work’. One month later, these people join the ranks of the Employed. Anyone who does not leak out of the pipe simply advances to the next segment down the line.
Also, each month each Employed person has a chance of becoming unemployed, thereby getting put into the start of the pipeline.
Here’s what the model looks like visually:
If we assume:
(1) that the ‘finding work probability’ is such that 2/3rds of the people who enter the blue, red and green sections (that is, 0-6 months, 6-12 months and 1-2 years unemployed) leak out at some point before reaching the end of that section,
(2) that the ‘becoming unemployed probability’ – the chance of an Employed person becoming unemployed each month – is 1%, and
(3) that anyone in the purple (2+ years unemployed) section has a 5% chance per month of finding work,
then the behavior of the model looks like what we see in the graph below:
This scenario also assumes that, starting 20 months after the start of 2005, the ‘becoming unemployed probability’ doubled from 1% to 2% over the following 20 months. Over the same time period, the ‘finding work probability’ dropped (for everyone unemployed less than 2 years) from 2/3rds to 1/3rd.
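For anyone who wants to play with the pipeline, here is a minimal Python sketch of the model as described above. It is not the code behind the graphs. Several details are my assumptions: the ‘2/3rds leak out before the end of a section’ rule is converted to a constant per-month probability within each section, the sections are taken to be 6, 6 and 12 months long, and people sitting in the ‘Already Found Work’ bin are not counted as unemployed. The function names are made up for illustration.

```python
def monthly_leak(fraction_leaving, months):
    """Per-month exit probability such that `fraction_leaving` of the people
    entering a section leave before the end of its `months` months."""
    return 1.0 - (1.0 - fraction_leaving) ** (1.0 / months)


def step(state, p_lose, frac_find, p_long=0.05):
    """Advance the model by one month.

    state = (employed, found_work, pipeline, long_term), where pipeline is a
    list of 24 monthly cohorts (0-6, 6-12 and 12-24 months unemployed).
    """
    employed, found_work, pipeline, long_term = state
    # Assumed section lengths in months: blue 6, red 6, green 12.
    probs = [monthly_leak(frac_find, 6)] * 12 + [monthly_leak(frac_find, 12)] * 12
    leaked = [cohort * p for cohort, p in zip(pipeline, probs)]
    stayed = [cohort - out for cohort, out in zip(pipeline, leaked)]
    long_leak = long_term * p_long
    newly_unemployed = employed * p_lose
    return (
        employed - newly_unemployed + found_work,  # last month's finders start work
        sum(leaked) + long_leak,                   # this month's finders wait a month
        [newly_unemployed] + stayed[:-1],          # everyone else moves one segment on
        long_term - long_leak + stayed[-1],        # month-24 survivors join the 2+ group
    )


def simulate(months, p_lose, frac_find, state=None):
    """Run the model for `months` months with fixed parameters."""
    if state is None:
        state = (1.0, 0.0, [0.0] * 24, 0.0)  # assumption: everyone starts employed
    for _ in range(months):
        state = step(state, p_lose, frac_find)
    return state


def unemployment_rate(state):
    """Share of the population in the pipeline or the 2+ years group."""
    employed, found_work, pipeline, long_term = state
    unemployed = sum(pipeline) + long_term
    return unemployed / (employed + found_work + unemployed)
```

Calling `unemployment_rate(simulate(600, 0.01, 2/3))` gives the long-run rate under the original parameters, and swapping in `0.02` and `1/3` gives the long-run rate under the deteriorated ones. A gradual change can be mimicked by calling `step` in a loop with parameters interpolated month by month.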
The model does not fit the observed data precisely, but it does not take into account things like emigration, an aging population, dropping out of the labor force and differences in education, experience, age and gender.
If these ‘becoming unemployed’ and ‘finding work’ probabilities, starting right now – essentially May 2013 – return to their previous values (1% and 2/3rds) over the next 20 months, then we can see from the graph that it might still take years for Spanish unemployment to return to anything resembling pre-crash levels. In this scenario, it would take until 2015 for unemployment to drop below 20%.
These probabilities are shown in the graphs below:
In the graph below, I show the total unemployment rate under this ‘gradual restoration’ scenario and also for the case where conditions do not deteriorate any further from the current parameter values of 2% and 1/3. If the current parameters persist, then unemployment should level out at just under 30%.
It is sort of a truism that if conditions do not deteriorate any further, then the unemployment situation should not get worse. But this simple model suggests that, without a recovery of the ‘becoming unemployed probability’ and the ‘finding work probability’, the unemployment rate might not go below 20% at any foreseeable time in the future.
Edge.org has an annual feature where they ask 100 or more scientists, artists, authors and other certifiably Smart People (plus Terry Gilliam) a big question about the world. Each person responds with about 1000 words, so the responses to this year’s question, “What Should We Be Worried About?”, run to more than 110,000 words. (Previous years have asked things like ‘What do you believe is true even though you cannot prove it?’ and ‘What is the most important invention of the past 2000 years?’.)
They’ve been doing this since 1998, which makes this collective body of text very tempting for analysis. Below, I took just the responses to this year’s question and picked out every 2-word phrase. I then tossed out phrases with so-called ‘stop words’ in them (like ‘like, of, and, but, is’ and so on) to get a list of the top things this year’s respondents to the Edge.org’s question think are worth mentioning, in order of how often they were mentioned. (Of course, if one person said the same phrase 10 times, then that phrase will rate high on this list. Perhaps it would have been better to calculate the fraction of respondents using each phrase, but that would have taken more effort than I care to put in right now.)
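The counting step described above can be sketched in a few lines of Python. This is only an illustration, not the script I ran: the stop-word set is a tiny stand-in for a real list, and the tokenizer is a simple regular expression that I am assuming is good enough for plain English text.

```python
import re
from collections import Counter

# Illustrative stop words only; a real list would be much longer.
STOP_WORDS = {"like", "of", "and", "but", "is", "the", "a", "to", "in"}


def top_bigrams(texts, n=10):
    """Count adjacent word pairs across all texts, skipping any pair that
    contains a stop word, and return the n most common pairs."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        for first, second in zip(words, words[1:]):
            if first in STOP_WORDS or second in STOP_WORDS:
                continue
            counts[(first, second)] += 1
    return counts.most_common(n)
```

To get the fraction of respondents using each phrase instead, one could add each response's set of distinct pairs to the counter rather than every occurrence.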
So, here is the list of the 155 top two-word phrases that Smart People are worrying about right now:
I got this letter recently:
“Standing on the deck. The cacophony from the frogs sounds good. A welcoming reminder of spring. Where have these apparently full grown frogs been hanging out all winter? Is it suspended animation like Austin Powers or have they miraculously grown overnight? And why are there so many songs about rainbows?”
Kermit, too, wondered why there are so many songs about rainbows. I’m not certain he’s right.
I downloaded a list of song titles (with artist name and year recorded) from the Million Song Dataset. This file contains 515,576 tracks – mostly from the 1980s through 2011 – and 466,801 after I de-duped it.
So, as far as rainbow songs go, the set has Johnny Mathis’ version of Rainbow Connection, along with the covers by Me First and the Gimme Gimmes and by the Carpenters, but not Jim Henson’s original. (Likewise, it has World of Twist’s really good cover version of She’s a Rainbow, but not the Rolling Stones’ original.)
I counted the number of times each word appears, filtering out so-called ‘stop words’ (so “so”, like “like” and “and”. “Too”, too). I also filtered out words that appear in song names but aren’t really part of the name, like ‘remix’, ‘remastered’ and ‘featuring’. I then sorted those words from the one that appears the most (“love”) to the second-most (“don’t”) and so forth, stopping at number 100,000 (“jawbone”).
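Roughly, the ranking works like the Python sketch below. The two filter sets are short illustrative stand-ins, and the function names are mine; the real lists and the Million Song Dataset file handling are left out.

```python
import re
from collections import Counter

# Illustrative filters only; the real lists are longer.
STOP_WORDS = {"so", "like", "and", "too", "the", "a", "of", "on"}
NOT_TITLE_WORDS = {"remix", "remastered", "featuring"}


def rank_title_words(titles):
    """Return (word, count) pairs sorted from most to least frequent."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"[a-z']+", title.lower()):
            if word not in STOP_WORDS and word not in NOT_TITLE_WORDS:
                counts[word] += 1
    return counts.most_common()


def rank_of(word, ranked):
    """1-based position of `word` in a ranked list, or None if absent."""
    for position, (candidate, _) in enumerate(ranked, start=1):
        if candidate == word:
            return position
    return None
```

With the full title list loaded, `rank_of("rainbow", rank_title_words(titles))` would return the word's position in the ranking.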
Here’s the result:
Written text often follows the Zipf-Mandelbrot distribution (named after George Zipf (pronounced ‘ziff’) and Benoit Mandelbrot, the mathematician who studied fractals (where fractals are iterated (that is, nested or (some might say) repeated) structures that exhibit ‘self-similarity’)) and song title words seem to be no exception. You can see that “rainbow” is relatively popular (#645), appearing in Elvis Presley’s Pocketful of Rainbows and the more-contemporary Angus and Julia Stone’s Living on a Rainbow. Certainly not out-of-line with thousands of other popular words. (The file only contains song titles, not lyrics, so these counts don’t include songs like John Prine’s excellent In Spite of Ourselves, which mentions ‘rainbow’ in the lyrics but not the title.)
“Rainbows” is on the graph too, though down at position 3330.
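For reference, the Zipf-Mandelbrot law says that the frequency of the word at rank r is proportional to 1/(r + q)^s for some constants q and s. Here is a small Python sketch of one way to compare a ranked word list against it. The values of q and s have to be fitted to the data; I am not giving fitted values here, so they are left as parameters to supply, and the function names are mine.

```python
def zipf_mandelbrot(n_ranks, q, s):
    """Predicted share of all occurrences for ranks 1..n_ranks."""
    weights = [1.0 / (rank + q) ** s for rank in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def observed_over_predicted(ranked_counts, q, s):
    """Map each word to (observed share) / (predicted share).

    `ranked_counts` is a list of (word, count) pairs sorted by count,
    largest first. Ratios above 1 mean the word shows up more often than
    the law predicts for its rank.
    """
    total = sum(count for _, count in ranked_counts)
    predicted = zipf_mandelbrot(len(ranked_counts), q, s)
    return {
        word: (count / total) / share
        for (word, count), share in zip(ranked_counts, predicted)
    }
```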
But the first 40 or so words on the list are seen more frequently than George’s calculations would suggest. Here are those outlier words, written in order of popularity, though as if they were song titles. Effectively, they are the Most-Overused Words in Song Titles:
1-3. Love, Don’t Live
4-7. Time One-Up Song
8-11. Man Down & Out Blues
12-13. I’m Life
14-15. Night World
16-18. Day-Go Heart
19-21. Back Little Way
22-27. Come, It’s Good Baby Girl
29-31. Never Know Home
32-33. Away Take
34-36. Black Can’t Blue
37-38. Over Last
39-41. Want More Now
As for where the frogs go, I’m certain you already know the answer to that question.
There’s this path I walk on a regular basis. To make it less slippery in the wintertime, they scatter little rocks around.
Then, one day in the spring they clean the rocks up, exposing the beautiful stone pathway beneath.
I sometimes bring a heavy case full of tools along this path and the rocks make it very difficult to roll the case along the ground and they have damaged the case’s wheels. So typically I carry the case, even though it means stopping every dozen steps or so.
Now that it’s spring and the snow is melting, the rocks act as a dam to trap the meltwater, causing portions of the path to become flooded.
Someone who came along before me was kind enough to drag their heel through the rocks in different directions, allowing the meltwater to mostly drain away.
I find it ironic that these rocks, which are scattered around to make it easier to walk this path in the wintertime, actually make it more difficult in two different ways: by directly hindering the movement of things like my tool case, and by trapping water, which makes it difficult for everyone to walk this path.
I suppose that, when it is above freezing in the daytime but cold enough at night for the water to refreeze, this trapped meltwater might on occasion turn back into patches of flat, slippery ice, so that the rocks contribute to the very problem they are meant to alleviate.
Google uses Optical Character Recognition (OCR) to turn scanned pages from millions of printed books into digital text.
But OCR is not always accurate. For example, before about the year 1800 it was common to write the letter ‘s’ in an elongated form that looks a lot like the letter ‘f’.
So, in the image below from John Norris’ 1710 book ‘A Collection of Miscellanies’ you can see the passage: “The first Thing that I observe, is, that ’tis generally agreed upon among them, That this Fruition of God consists in some Operation; and I think with very good Reason.”
But if you look closely you’ll see that the ‘s’ in words like ‘first’ and ‘observe’ looks a lot like an ‘f’, so that the words look like ‘firft’ and ‘obferve’. Just in this short passage we can also see ‘confists’, ‘fome’, ‘Reafon’, ‘Happinefs’, ‘underftand’, ‘beft’, ‘laft’, ‘fo’ and ‘underftood’.
Obviously, it can also be difficult for a computer program to understand that these are s’s, not f’s.
In 2009, when Google released summarized statistics about the occurrence of different words in the books they scanned, there were many instances of words like ‘beft’ and ‘firft’, especially in text written before 1800. However, they updated their analysis in 2012, reclassifying many of those occurrences of ‘beft’ as ‘best’.
But for some words, the newer version of Google’s statistics looks like it has gotten worse at identifying the correct word. For example, the capital letter ‘O’ looks a lot like the number ‘0’, so the chemical formulas for compounds that contain oxygen can easily be misread: ‘H2O’ (h-two-oh) becomes ‘H20’ (h-twenty).
For cases like ‘H2S04’ and ‘Fe203’ (which contain zeroes instead of the letter ‘O’), it is highly likely that these words are simply misreadings of ‘H2SO4’ (sulfuric acid) and ‘Fe2O3’ (iron oxide).
I have been able to find many more examples like this using the names of chemical compounds that contain oxygen. In each of these cases, Google’s character recognition for the 2009 statistics seems better than their 2012 values.
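Both kinds of misreading discussed above (‘f’ for the long ‘s’, and ‘0’ for ‘O’) can be checked with the same simple trick: try swapping the suspect character back and see whether the result is a known word or formula. The Python sketch below is only an illustration of that idea. I have no knowledge of how Google’s pipeline actually works, and the `known` sets are tiny hypothetical stand-ins for a real dictionary or list of compounds.

```python
from itertools import product


def candidates(token, misread, intended):
    """Yield every spelling of `token` in which some occurrences of
    `misread` are swapped for `intended` (including no swaps at all)."""
    positions = [i for i, ch in enumerate(token) if ch == misread]
    for choice in product((misread, intended), repeat=len(positions)):
        chars = list(token)
        for pos, ch in zip(positions, choice):
            chars[pos] = ch
        yield "".join(chars)


def repair(token, known, misread, intended):
    """Return the corrected token if exactly one candidate is in `known`;
    otherwise leave the token alone."""
    if token in known:
        return token
    matches = [c for c in candidates(token, misread, intended) if c in known]
    return matches[0] if len(matches) == 1 else token
```

For example, `repair("firft", {"first", "best"}, "f", "s")` tries ‘firft’, ‘first’, ‘sirft’ and ‘sirst’ and keeps the only one found in the word list. The number of candidates doubles with every suspect character, so this is only sensible for short tokens.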
I am interested in words that were used frequently in books published in the US and the UK in the year 2000.
It turns out that since 1980, Americans and Brits have used nouns at nearly the same rate, and Brits have used adjectives more than Americans. So any differences in usage of the nouns listed below are not due to Americans or Brits generally using nouns more than the other. And any adjectives that Americans use more are not explained by Americans’ overall usage of adjectives.
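The comparison behind each claim is a ratio of relative frequencies: how often a word appears per word of American text, divided by how often it appears per word of British text. A minimal Python sketch follows; the function name and the counts in the example are made up for illustration and are not taken from the data.

```python
def usage_ratio(us_count, us_total, uk_count, uk_total):
    """Relative frequency of a word in US text divided by its relative
    frequency in UK text. Values above 1 mean Americans use it more."""
    return (us_count * uk_total) / (uk_count * us_total)
```

For instance, a word seen 300 times in 1,000,000 American words and 100 times in 2,000,000 British words has `usage_ratio(300, 1_000_000, 100, 2_000_000) == 6.0`. Using the total number of nouns (or adjectives) as the denominators instead of all words would separate a word’s popularity from its part of speech’s overall popularity.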
Here are some things I found:
3. Americans write more about diseases and body parts than Brits: diabetes, thyroid, chin, chest, breathing, insulin, medication, fingers, cardiovascular, abdominal, waist, knees, dental, healing, seizures, finger, breast, breasts, throat, elbow, forehead, hormone, mouth, arthritis, kidney, asthma, lung, lungs, belly, hearts, fatty, hypertension, jaw, vomiting, pain, trauma, anatomy, abortion, sick, diagnosis, glucose, nasal, pregnancy, artery, ultrasound and more.
4. Americans write first names more than Brits: Jack, James, Joshua, Daniel (Dan), Harold, Samuel, Joseph (Joe), Matthew (Matt), Mark, Tim, Robert, Luke, John, George, Chloe, Emily, Charlotte, Jessica, Lauren, Laura, Sarah, Sophie, Olivia, Hannah, Amy, Ann, Emma, Rachel, Mary, Rebecca. These are actually the most popular British names, and yet Americans still use them more often. Think about it – Americans actually use Charles, Elizabeth, William, Harry and Kate more than Brits do!
Thus, combining this point with point #3 above, if you are trying to write a cowboy story and want the main character to sound as American as possible, call him “Bleeding Feet Billy”. It’s mathematically proven to be nearly the most American name you can imagine.
6. Americans write more about beverages than Brits: water, beer, wine, juice, liquor, rum, vodka, coffee, cola, and milk. Only for tea and toddy do the Brits compare. (Even then, we still beat them at ‘tea’.)
What do Brits write about more than Americans?
1. The British write more than Americans about royalty: king, queen (almost), prince (but not princess), duke, duchess (barely), lord, earl, royal, castle, crown, emperor, and empress. (Interestingly, Americans write ‘royalty’ itself more than Brits, though probably in the financial sense.)
2. The British write more than Americans the last names of people who were never President of the United States: Unwin, Darwin, Bentham, Thatcher, Ruskin, Wittgenstein, Brecht, Wordsworth, Levinas, Nero, Foucault, Lenin, Claudius, Kant, Coleridge, Chomsky, Hume, Khan, Boyle, Machiavelli, Woolf, Livingstone, Franco, Caesar, Derrida, Chamberlain, Hegel, Marx, Mussolini, Goethe, Yeats, Dickens, Chaucer, Freud, Hardy, Hobbes, Rousseau, Shakespeare and Hitler.
So, basically, the Brits write about themselves, their royalty and their neighbors more than Americans, who, in turn, write about their name, their rectum and which day of the week it is more than Brits.