Thoughts on politics and life from a liberal perspective

Wednesday 28 October 2009

Is Arnold Schwarzenegger a fan of acrostics? 200 billion to one chance he isn't

Paul on Liberal Burblings has drawn my attention to a letter written by Californian Governor Arnold Schwarzenegger to the members of the California State Assembly refusing to sign an Assembly bill. The thing is, there is an interesting message if you just read the first letter of each line in two of the letter's paragraphs. In order, they are F, u, c, k, Y, o and u. Is Mr Schwarzenegger a fan of acrostics, I wonder?


A rather crude calculation by Gary Langer from ABC News puts the odds of this happening by chance at one in 10 billion. However, he assumes that letters are uniformly distributed throughout the language and simply uses (1/26)^7, which is of course a bit simplistic (and actually gives around one in 8 billion; he's rounding up by 2 billion).
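As a quick check on the uniform-distribution figure, this short Python sketch computes 26^7 directly. It assumes, as Langer does, that each of the 26 letters is equally likely to start each of the seven lines, independently of the others.

```python
# Uniform model (Langer's assumption): each of the 26 letters is equally
# likely to start each of the 7 lines, independently of the others.
ALPHABET_SIZE = 26
MESSAGE = "fuckyou"

outcomes = ALPHABET_SIZE ** len(MESSAGE)  # 26^7 equally likely sequences
probability = 1 / outcomes                # chance of one specific sequence

print(f"1 in {outcomes:,}")  # 1 in 8,031,810,176
print(f"probability = {probability:.3e}")
```

That is just over 8 billion, the figure Langer rounds up to 10 billion.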

A slightly more sophisticated way of doing this would be to weight the letters using letter frequency within English. According to this Wikipedia article, the letter frequencies are: f = 2.228%, u = 2.758%, c = 2.782%, k = 0.772%, y = 1.974%, o = 7.507%.

So I calculate the probability (using 2.758% for both u's) as 0.02228 x 0.02758 x 0.02782 x 0.00772 x 0.01974 x 0.07507 x 0.02758 ≈ 5.394 x 10^-12

Or, to put it another way, odds of roughly one in 185 billion (not far short of 200 billion).

This, to put it mildly, seems vanishingly unlikely...
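The weighted calculation can be reproduced with a few lines of Python. This is a sketch under the same assumption as the calculation above: each line-initial letter is drawn independently with its overall English frequency (the Wikipedia figures listed above), so both u's take 2.758%. The names `LETTER_FREQ_PERCENT` and `acrostic_probability` are just illustrative.

```python
from math import prod

# Percentage frequencies of the relevant letters in English text,
# as quoted above from Wikipedia's letter-frequency article.
LETTER_FREQ_PERCENT = {
    "f": 2.228,
    "u": 2.758,
    "c": 2.782,
    "k": 0.772,
    "y": 1.974,
    "o": 7.507,
}

def acrostic_probability(message: str, freq_percent: dict[str, float]) -> float:
    """Probability that independent line-initial letters spell `message`,
    assuming each letter appears with its overall English frequency.
    Raises KeyError for a letter missing from the table."""
    return prod(freq_percent[ch] / 100 for ch in message.lower())

p = acrostic_probability("fuckyou", LETTER_FREQ_PERCENT)
print(f"probability = {p:.4e}")  # probability = 5.3938e-12
print(f"odds: about 1 in {1 / p:,.0f}")
```

Swapping in a different frequency table (for example, frequencies of word-initial letters) only means changing the dictionary.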

Here's the original letter:

Ammiano Veto Message

4 comments:

Duncan Stott said...

I think it is even less likely than that...

Your calculation gives the probability of producing the sequence 'fuckyou'. However, a new paragraph was inserted between the 'k' and the 'y' to create 'fuck you'. So the probability of having a new paragraph in this position also needs to be taken into account.

I don't know how you would include this in the equation, but it would definitely lower the probability even further.

Kalvis Jansons said...

I agree with the mathematics!

However, there is another factor. Even if a random letter turned out like that, you might notice it and, for obvious reasons, not send it. So the chances of getting such a letter without it having been constructed would be even smaller than you have calculated.

Kalvis Jansons said...

Note that you can improve your estimate by looking at the frequencies of letters at the start of words, which will change it a bit. However, your estimate will be of the right order of magnitude.

Then you can improve on even this by looking at the frequencies of letters at the start of words in other text from the same author.

The right reply to this letter would be something like "I'll be back" done in the same way.
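Dr Jansons' suggestion of using the frequencies of word-initial letters can be sketched in Python roughly as follows. This is an illustrative sketch, not a method anyone in this thread actually used: the function names are hypothetical, words are split with a simple regular expression, and the sample text is a placeholder for a real corpus (ideally, as suggested, other writing by the same author).

```python
import re
from collections import Counter
from math import prod

def initial_letter_frequencies(text: str) -> dict[str, float]:
    """Fraction of words in `text` that begin with each letter a-z."""
    # Words are runs of letters, optionally joined by apostrophes ("don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(word[0] for word in words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {letter: n / total for letter, n in counts.items()}

def acrostic_probability(message: str, freqs: dict[str, float]) -> float:
    """Probability that independent initial letters spell `message`.
    Letters never seen in the sample are given probability 0."""
    return prod(freqs.get(ch, 0.0) for ch in message.lower())

# Placeholder sample text; a real estimate needs a large corpus,
# ideally other letters by the same author.
sample = "The cat sat on the mat"
freqs = initial_letter_frequencies(sample)
print(freqs["t"])  # 0.3333333333333333
print(acrostic_probability("to", freqs))
```

With a sample this tiny, any letter that never starts a word (such as 'k' here) makes the whole product zero, which is why a large body of text matters for a meaningful estimate.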

Chris Paul said...

You asked here how I got to a trillion to one. Well, as Dr Jansons says, it is a matter of the frequency of initial letters in words within the working vocabulary of the Swearminator. Words starting with all the letters apart from U and K are reasonably numerous, though in a couple of missives whose initial letters I quote in my post, I got 4 As, 4 Os, 2 Is, 2 Ys and an E among the vowels, and an assortment of consonants including 4 Cs and 1 F.

And not even the slightest instance of any sensible hidden words: oh, ha, yo, id, oid (which may appeal to Dr Jansons), ac, ba and ad are your lot as far as I can see.

A factor of just 5 is, I suspect, rather low to allow for the frequency of *initial* letters within the Swearminator's vocabulary. So, strictly speaking, the good doctor may be wrong about the order of magnitude.

I'd also add another potential weighting factor to the mix: long words may be more likely to appear at the start of lines than short ones.

And an observation too. The Swearminator's other letters in the public domain (there are loads; he, or the post holder, is prolific) show a significant resistance to "widows and orphans", i.e. the odd single words that often arise when paras are allowed to break as they naturally do on a particular measure.

In other words, someone on the staff is either casting off carefully and tweaking point size to avoid these, or redrafting to avoid them. I often do the latter myself on my blog, for my preferred browser. A little anal, actually: I publish a post, look at it, then go back and try to get rid of widows and orphans. It comes from being an ex-sub and typographer.

That active process militates against FU moments.