I saw this graph in the article ‘Spain is Beyond Doomed’ by Matthew O’Brien at The Atlantic. It shows the dramatic upswing in unemployment in Spain during the past few years, especially among those who have been unemployed for 2 or more years.

SpainUnemployment1-thumb-570x410-119984

As a chemical engineer, though, one thing that struck me about the graph is that it resembles a simple model of the behavior of the flow of liquid through a leaky pipeline. Say that we have a pipeline made of segments representing people who have been unemployed for a certain number of months. Each month, everyone in each segment can find work (but not yet start working) and thus, leak out of the pipeline into a bin called ‘Already Found Work’. One month later, these people will join the ranks of the Employed. Alternatively, if you do not leak out of the pipe, you simply advance to the next step down the line.

Also, each month each Employed person has a chance of becoming unemployed, thereby getting put into the start of the pipeline.

Here’s what the model looks like visually:

leaky-pipeline

If we assume:

(1) that the ‘finding work probability’ is such that 2/3rds of the people who enter the blue, red and green sections (that is, 0-6 months, 6-12 months unemployed, and 1-2 years unemployed) leak out at some point before reaching the end of that section,

(2) that the ‘becoming unemployed probability’ – the chance of an Employed person becoming unemployed each month – is 1%, and

(3) that anyone in the purple (2+ years unemployed) section has a 5% chance per month of finding work,

then the behavior of the model looks like what we see in the graph below:

simple-model-of-spanish-unemployment

This scenario also assumes that, starting 20 months after the beginning of 2005, the ‘becoming unemployed probability’ doubled from 1% to 2% over the following 20 months. Over the same time period, the ‘finding work probability’ dropped (for everyone unemployed less than 2 years) from 2/3rds to 1/3rd.
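The pipeline described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code behind the graphs: the function names, the 24 one-month segments, the choice to start with everyone Employed, and the choice to count the ‘Already Found Work’ bin as part of the labor force but not as unemployed are all my assumptions. It holds the probabilities constant rather than ramping them over 20 months.

```python
def monthly_leak(fraction_leaving, months_in_section):
    """Per-month probability such that `fraction_leaving` of the people who
    enter a section leak out before reaching its end."""
    return 1.0 - (1.0 - fraction_leaving) ** (1.0 / months_in_section)


def step(state, lose_prob, fraction_leaving, long_term_find):
    """Advance the whole pipeline by one month and return the new state."""
    pipe = state["pipe"]  # pipe[i] = people unemployed for i months (0..23)
    probs = []
    for length in (6, 6, 12):  # blue, red and green sections, in months
        probs += [monthly_leak(fraction_leaving, length)] * length

    found = [n * p for n, p in zip(pipe, probs)]    # leak out of the pipe
    stay = [n - f for n, f in zip(pipe, found)]     # advance one segment
    found_long = state["long_term"] * long_term_find
    newly_unemployed = state["employed"] * lose_prob

    return {
        # last month's 'Already Found Work' bin joins the Employed
        "employed": state["employed"] - newly_unemployed + state["found_work"],
        "found_work": sum(found) + found_long,
        "pipe": [newly_unemployed] + stay[:-1],
        # whoever survives month 23 moves into the purple (2+ years) pool
        "long_term": state["long_term"] - found_long + stay[-1],
    }


def unemployment_rate(state):
    """Unemployed people as a fraction of everyone in the model."""
    unemployed = sum(state["pipe"]) + state["long_term"]
    return unemployed / (unemployed + state["employed"] + state["found_work"])


def run(months, lose_prob, fraction_leaving, long_term_find=0.05):
    """Start with everyone Employed and run with constant probabilities."""
    state = {"employed": 1.0, "found_work": 0.0,
             "pipe": [0.0] * 24, "long_term": 0.0}
    for _ in range(months):
        state = step(state, lose_prob, fraction_leaving, long_term_find)
    return state
```

Calling `unemployment_rate(run(600, 0.01, 2/3))` and `unemployment_rate(run(600, 0.02, 1/3))` gives the long-run rates for the two sets of parameters. Reproducing the shape of the graph would also need the 20-month ramp between them.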

The model does not fit the observed data precisely, but it does not take into account things like emigration, an aging population, dropping out of the labor force and differences in education, experience, age and gender.

If these ‘becoming unemployed’ and ‘finding work’ probabilities, starting right now – essentially May 2013 – return to their previous values (1% and 2/3rds) over the next 20 months, then we can see from the graph that it might still take years for Spanish unemployment to return to anything resembling pre-crash levels. In this scenario, it would take until 2015 for unemployment to drop below 20%.

These probabilities are shown in the graphs below:

find-work-become-unemployed-probabilities

In the graph below, I show the total unemployment rate under this ‘gradual restoration’ scenario and also for the case where conditions do not deteriorate any further from the current parameter values of 2% and 1/3. If the current parameters persist, then unemployment should level out at just under 30%.

leaky-pipeline-unemployment-spain

It is sort of a truism that if conditions do not deteriorate any further, then the unemployment situation should not get worse. But this simple model suggests that, without a recovery of the ‘becoming unemployed probability’ and the ‘finding work probability’, the unemployment rate might not go below 20% at any foreseeable time in the future.

The Edge.org has an annual feature where they ask 100 or more scientists, artists, authors and other certifiably Smart People (plus Terry Gilliam) a big question about the world. Each person responds with about 1000 words, so the responses to this year’s question, “What Should We Be Worried About?”, run to more than 110,000 words. (Previous years have asked things like ‘What do you believe is true even though you cannot prove it?’ and ‘What is the most important invention of the past 2000 years?’.)

They’ve been doing this since 1998, which makes this collective body of text very tempting for analysis. Below, I took just the responses to this year’s question and picked out every 2-word phrase. I then tossed out phrases with so-called ‘stop words’ in them (like ‘like, of, and, but, is’ and so on) to get a list of the top things this year’s respondents to the Edge.org’s question think are worth mentioning, in order of how often they were mentioned. (Of course, if one person said the same phrase 10 times, then that phrase will rate high on this list. Perhaps it would have been better to calculate the fraction of respondents using each phrase, but that would have taken more effort than I care to put in right now.)
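For anyone who wants to try the same thing, the phrase counting can be sketched roughly as follows. This is an illustration rather than the script I actually ran: the `top_bigrams` name, the tiny stop-word set and the tokenizing pattern are assumptions. It pairs words across sentence boundaries, so a real run would want a longer stop-word list and probably some sentence splitting.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stop-word list.
STOP_WORDS = {"like", "of", "and", "but", "is", "the", "a", "to", "in"}


def top_bigrams(text, stop_words=STOP_WORDS, n=10):
    """Return the n most common 2-word phrases containing no stop word."""
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    counts = Counter(
        f"{first} {second}"
        for first, second in zip(words, words[1:])
        if first not in stop_words and second not in stop_words
    )
    return counts.most_common(n)
```

Counting the fraction of respondents who use each phrase would mean running this once per response on a set of phrases instead of a running count.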

So, here is the list of the 155 top two-word phrases that Smart People are worrying about right now:

quantum mechanics
climate change
economic growth
neural data
global cooperation
black hole
good life
social media
natural selection
human nature
long term
human beings
mental disorders
mate value
standard model
complex systems
synthetic biology
sex differences
hubristic pride
video games
united states
population growth
two cultures
20th century
young people
new technologies
fourth culture
solar system
higgs boson
specific heat
data privacy
data streams
general public
real world
world war
science fiction
search engines
conscious experience
21st century
side effects
genomic instability
years old
living standards
life science
breast cancer
lamplight probabilities
liberal democracy
medical care
bad news
fire departments
scientific method
new ideas
privacy rights
fertility rates
human behavior
social sciences
local cooperation
educated people
less educated
next generation
higher education
human genome
screw driver
human population
human rights
financial crisis
subjective experience
quantum information
human brain
natural world
unmarried men
older people
nuclear war
health care
brain regions
young men
billion years
risk factors
global population
cell phones
brain development
social networks
scientific research
science-media complex
global warming
lung cancer
modern technology
dark age
time scales
steven pinker
heart disease
silicon valley
limited resources
public policy
industrial revolution
decision makers
social reality
taboo words
urn model
public sphere
birth rates
world population
regulatory capture
psychoactive substances
public health
soviet union
human life
larger families
arable land
high school
todays world
biometric data
modern states
new information
digital technologies
mutated cell
bad guys
evolutionary psychology
material progress
status quo
boundary conditions
environmental factors
information technology
phase space
artificial intelligence
theoretical physics
jehovahs witnesses
technological innovation
general relativity
population size
business school
frontal cortex
annual cost
bell curve
thousand years
homo sapiens
conscious worries
social scientists
scientific discovery
cultural practices
developed world
mass destruction
big data
western world
error catastrophe
unknown unknowns
collective delusions
modern society
making decisions
public debate
conscious experiences
global culture
losing touch
brain activity
water resources

I got this letter recently:

“Standing on the deck. The cacophony from the frogs sounds good. A welcoming reminder of spring. Where have these apparently full grown frogs been hanging out all winter?  Is it suspended animation like Austin Powers or have they miraculously grown overnight? And why are there so many songs about rainbows?”

Kermit, too, wondered why there are so many songs about rainbows. I’m not certain he’s right.

I downloaded a list of song titles (with artist name and year recorded) from the Million Song Dataset. This file contains 515,576 tracks – mostly from the 1980s through 2011 – and 466,801 after I de-duped it.

So, as far as rainbow songs go, the set has Johnny Mathis’ version of Rainbow Connection, and the covers by Me First and the Gimme Gimmes and by the Carpenters, but not Jim Henson’s original. (Likewise, it has World of Twist’s really good cover version of She’s a Rainbow, but not the Rolling Stones’ original.)

I counted the number of times each word appears, filtering out so-called ‘stop words’ (so “so”, like “like” and “and”. “Too”, too). I also filtered out words that appear in song names but aren’t really part of the name, like ‘remix’, ‘remastered’ and ‘featuring’. I then sorted those words from the one that appears the most (“love”) to the second-most (“don’t”) and so forth, stopping at number 100,000 (“jawbone”).

Here’s the result:

popularity-of-rainbow-in-song-titles

Written text often follows the Zipf-Mandelbrot distribution (named after George Zipf (pronounced ‘ziff’) and Benoit Mandelbrot, the mathematician who studied fractals (where fractals are iterated (that is, nested or (some might say) repeated) structures that exhibit ‘self-similarity’)) and song title words seem to be no exception. You can see that “rainbow” is relatively popular (#645), appearing in Elvis Presley’s Pocketful of Rainbows and the more-contemporary Angus and Julia Stone’s Living on a Rainbow. Certainly not out-of-line with thousands of other popular words. (The file only contains song titles, not lyrics, so these counts don’t include songs like John Prine’s excellent In Spite of Ourselves, which mentions ‘rainbow’ in the lyrics but not the title.)

“Rainbows” is on the graph too, though down at position 3330.

But the first 40 or so words on the list are seen more-frequently than George’s calculations would suggest. Here are those outlier words, written in order of popularity, though as if they are song titles. Effectively, they are the Most-Overused Words in Song Titles:

1-3.    Love, Don’t Live
4-7.    Time One-Up Song
8-11.   Man Down & Out Blues
12-13.  I’m Life
14-15.  Night World
16-18.  Day-Go Heart
19-21.  Back Little Way
22-27.  Come, It’s Good Baby Girl
28.      New
29-31.  Never Know Home
32-33.  Away Take
34-36.  Black Can’t Blue
37-38.  Over Last
39-41.  Want More Now
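
Picking out words that sit above the Zipf-Mandelbrot curve can be sketched like this. The formula count = scale / (rank + shift)^exponent is the standard form of the distribution, but the function names and the `tolerance` parameter are my own, and the fitted parameter values are not given here, so they have to be supplied by whoever runs it.

```python
def zipf_mandelbrot(rank, scale, shift, exponent):
    """Expected count for the word at `rank` (1 = most common)."""
    return scale / (rank + shift) ** exponent


def overused(ranked_counts, scale, shift, exponent, tolerance=1.0):
    """Return (rank, word, observed, expected) for each word whose observed
    count exceeds `tolerance` times the Zipf-Mandelbrot expectation.

    `ranked_counts` is a list of (word, count) pairs, most common first.
    """
    result = []
    for rank, (word, observed) in enumerate(ranked_counts, start=1):
        expected = zipf_mandelbrot(rank, scale, shift, exponent)
        if observed > tolerance * expected:
            result.append((rank, word, observed, expected))
    return result
```

The `scale`, `shift` and `exponent` values would come from fitting the curve to the bulk of the ranked word list, which this sketch leaves out.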

As for where the frogs go, I’m certain you already know the answer to that question.

There’s this path I walk on a regular basis. To make it less slippery in the wintertime, they scatter little rocks around.

Then, one day in the spring they clean the rocks up, exposing the beautiful stone pathway beneath.

the-shortcut-pathway
I sometimes bring a heavy case full of tools along this path and the rocks make it very difficult to roll the case along the ground and they have damaged the case’s wheels. So typically I carry the case, even though it means stopping every dozen steps or so.

Now that it’s spring and the snow is melting, the rocks act as a dam to trap the meltwater, causing portions of the path to become flooded.

Someone who came along before me was kind enough to drag their heel through the rocks in different directions, allowing the meltwater to mostly drain away.

I find it ironic that these rocks, which are scattered around to make it easier to walk this path in the wintertime, actually make it more difficult in two different ways: by directly hindering the movement of things like my tool case and also by trapping water, which makes it difficult for everyone to walk this path.

I suppose that, when it is above freezing in the daytime but cold enough at night for the water to refreeze, this trapped meltwater might again on occasion become patches of flat slippery ice; thus making the rocks contribute to the very problem they are trying to alleviate.

Google uses Optical Character Recognition (OCR) to turn scanned pages from millions of printed books into digital text.

But OCR is not always accurate. For example, before about the year 1800 it was common to sometimes write the letter ‘s’ in an elongated form that looks a lot like the letter ‘f’.

So, in the image below from John Norris’ 1710 book ‘A Collection of Miscellanies’ you can see the passage: “The first Thing that I observe, is, that ’tis generally agreed upon among them, That this Fruition of God consists in some Operation; and I think with very good Reason.”

Norris-1710
But if you look closely you’ll see that the ‘s’ in words like ‘first’ and ‘observe’ looks a lot like an ‘f’, so that the words look like ‘firft’ and ‘obferve’. Just in this short passage we can also see ‘confists’, ‘fome’, ‘Reafon’, ‘Happinefs’, ‘underftand’, ‘beft’, ‘laft’, ‘fo’ and ‘underftood’.

Obviously, it can also be difficult for a computer program to understand that these are s’s, not f’s.

In 2009, when Google released summarized statistics about the occurrence of different words in the books they scanned, there were many instances of words like ‘beft’ and ‘firft’, especially in text written before 1800. However, they updated their analysis in 2012, reclassifying many of those occurrences of ‘beft’ as ‘best’.

beft-compare

But for some words, the newer version of Google’s statistics seems to have gotten worse at identifying the correct word. For example, the capital letter ‘O’ looks a lot like the number ‘0’, so chemical formulas for compounds that contain oxygen, like ‘H2O’ (h-two-oh), can easily be misread as ‘H20’ (h-twenty).

For cases like ‘H2S04’ and ‘Fe203’ (which contain zeroes instead of the letter ‘O’), it is highly likely that these words are simply misreadings of ‘H2SO4’ (sulfuric acid) and ‘Fe2O3’ (iron oxide).
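
One way to hunt for these misreadings is to swap zeroes for the letter ‘O’ and see whether the result is a formula you recognize. The sketch below is only an illustration of that idea: the `KNOWN_FORMULAS` set and both function names are made up for the example, and a real check would need a much longer list of compounds.

```python
from itertools import product

# Illustrative list; a real one would hold many more oxygen compounds.
KNOWN_FORMULAS = {"H2O", "H2SO4", "Fe2O3"}


def zero_to_o_candidates(token):
    """Yield every spelling of `token` with each '0' kept or turned into 'O'."""
    positions = [i for i, ch in enumerate(token) if ch == "0"]
    for choice in product("0O", repeat=len(positions)):
        chars = list(token)
        for pos, ch in zip(positions, choice):
            chars[pos] = ch
        yield "".join(chars)


def likely_misreading(token, known=KNOWN_FORMULAS):
    """Return the known formula that `token` probably misreads, or None."""
    if token in known:
        return None  # already a valid formula, nothing to correct
    for candidate in zero_to_o_candidates(token):
        if candidate != token and candidate in known:
            return candidate
    return None
```

A token like ‘C10H8’ comes back as `None` here, because no zero-for-O swap turns it into a formula on the list.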

chemical-compare

I have been able to find many more examples like this using the names of chemical compounds that contain oxygen. In each of these cases, Google’s character recognition for the 2009 statistics seems better than their 2012 values.

I am interested in words that were used frequently in books published in the US and the UK in the year 2000.

It turns out that since 1980, Americans and Brits have used nouns at nearly the same rate. And, Brits use adjectives more than Americans. So, any differences in usage of the nouns listed below are not due to Americans or Brits generally using nouns more than the other. And any adjectives that Americans use more are not explained by Americans’ overall usage of adjectives.
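
The comparison behind the lists that follow boils down to a ratio of rates. As a rough sketch (the function name and arguments are hypothetical, not taken from any Google tool), each word’s count is divided by the total number of words in that country’s books for the year, and then the two rates are compared:

```python
def usage_ratio(us_count, us_total, uk_count, uk_total):
    """How many times more often a word appears in US books than UK books.

    Each count is divided by the total words for that country and year,
    so a bigger corpus does not automatically look like heavier usage.
    """
    us_rate = us_count / us_total
    uk_rate = uk_count / uk_total
    return us_rate / uk_rate
```

A ratio above 1 means Americans use the word more and a ratio below 1 means Brits do.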

Here are some things I found:

US_flag-425x100-strip

1. Americans write about colors (and colours) more than Brits: black, white, red, blue, green, brown, orange, pink, purple. (Americans write more about ‘gray‘ while Brits write ‘grey‘ more often.)

2. Americans write more about days of the week than Brits: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday.

3. Americans write more about diseases and body parts than Brits: diabetes, thyroid, chin, chest, breathing, insulin, medication, fingers, cardiovascular, abdominal, waist, knees, dental, healing, seizures, finger, breast, breasts, throat, elbow, forehead, hormone, mouth, arthritis, kidney, asthma, lung, lungs, belly, hearts, fatty, hypertension, jaw, vomiting, pain, trauma, anatomy, abortion, sick, diagnosis, glucose, nasal, pregnancy, artery, ultrasound and more.

4. Americans write first names more than Brits: Jack, James, Joshua, Daniel (Dan), Harold, Samuel, Joseph (Joe), Matthew (Matt), Mark, Tim, Robert, Luke, John, George, Chloe, Emily, Charlotte, Jessica, Lauren, Laura, Sarah, Sophie, Olivia, Hannah, Amy, Ann, Emma, Rachel, Mary, Rebecca. These are actually the most popular British names, and yet Americans still use them more often. Think about it – Americans actually use Charles, Elizabeth, William, Harry and Kate more than Brits do!

But Brits use titles way more than Americans do: Mr., Mrs., Dr., Ms., Sir, and Professor.

Thus, combining this point with point #3 above, if you are trying to write a cowboy story and want the main character to sound as American as possible, call him “Bleeding Feet Billy“. It’s mathematically proven to be nearly the most American name you can imagine.

5. Americans write more about animals than Brits: horse, horses, cat, cats, dog, bird, chickens, cows, monkey, chimp, and whales.

6. Americans write more about beverages than Brits: water, beer, wine, juice, liquor, rum, vodka, coffee, cola, and milk. Only for tea and toddy do the Brits compare. (Even then, we still beat them at ‘tea’.)

7. Americans write the words for numbers more than Brits: one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, twenty, hundred, thousand, million, shitload, and dozen.

UK_flag_425x100

What do Brits write about more than Americans?

1. The British write more than Americans about royalty: king, queen (almost), prince (but not princess), duke, duchess (barely), lord, earl, royal, castle, crown, emperor, and empress. (Interestingly, Americans write ‘royalty‘ itself more than Brits, though probably in the financial sense.)

2. The British write more than Americans the last names of people who were never President of the United States: Unwin, Darwin, Bentham, Thatcher, Ruskin, Wittgenstein, Brecht, Wordsworth, Levinas, Nero, Foucault, Lenin, Claudius, Kant, Coleridge, Chomsky, Hume, Khan, Boyle, Machiavelli, Woolf, Livingstone, Franco, Caesar, Derrida, Chamberlain, Hegel, Marx, Mussolini, Goethe, Yeats, Dickens, Chaucer, Freud, Hardy, Hobbes, Rousseau, Shakespeare and Hitler.

And, 3. distant lands: Norway, Denmark, India, Ghana, Africa, Arabia, Iran, Russia, Japan, Australia and others.

So, basically, the Brits write more than Americans about themselves, their royalty and their neighbors, while Americans, in turn, write more than Brits about their name, their rectum and which day of the week it is.

What do Americans and Brits write about equally? Canada, goats, July and stereotypes.
