Let me take you back to 1881. Everyone is excited because not only is the year palindromic, it is also the same upside-down and that has to mean something, right? Meanwhile, in America, Mr Newcomb is even grumpier than usual because, once again, he has had to clear up the books of log tables after a really unsuccessful statistics party. As he brushes half-eaten vol-au-vents off the pages, he notices something really strange.
All the pages used to look up the logs of numbers starting with a ‘1’ are dog-eared and worn. They also seem to have a lot of what will later be determined to be jam on them. The pages for numbers beginning with ‘2’ are less aurally canine and less preserved.
Most people would have left it at that and gone to the pub, but not Mr Grump, er, Newcomb. He strokes his beard, which takes a full twenty minutes, after which he goes and writes his thoughts down.
Only some of those thoughts are about the folly of serving jam vol-au-vents at statistics parties.
The Universe is a pretty strange place, right? Be it quantum physics, peanut-shaped comets or the popularity of Keith Chegwin, as a species we should live in a constant state of amazement and wonder.
We don’t, of course. We go about our lives worrying about the electricity bill, superstitious nutters taking over the world and the popularity of Keith Chegwin. So much of the wondrous stuff doesn’t apply to us, to our existences – we sit and watch a DVD, text a friend or play with lasers, unaware of the Universal strangeness they all rely on.*
Occasionally, I find a way to marvel at some of this weirdness. The other day, I was starting the process of finishing up at work and preparing to go back to my flat for the other half of Monday’s pasta sauce and an evening of writing.**
I was working with a large set of data covering two hundred thousand people, meaning that analysing it in its entirety can take time – even opening the data can be a chore, let alone saving it and closing the program down. As I waited for one set to save and close, what popped into my head but that weird thing about how the frequency of leading digits in large sets of numbers is predictable, provided the data can be considered random in some way. I had clearly encountered it before, somewhere, but now was the moment it sort of metaphorically fell out of my head and sat on my desk taunting me.
I searched on Google (other search engines are available but probably don’t have amusing doodles on their home pages) for “occurrence of numbers in datasets” and, with equal strangeness to the whole idea, the first hit was the Wikipedia entry for “Benford’s Law”.
This law is quite scary because it turns out that if you take almost any large set of numbers that can be considered random, then the proportion of numbers starting with a ‘1’ is about 30%, the proportion starting with a ‘2’ is just under 18% and so on, down to less than 5% for numbers starting with a ‘9’. It doesn’t matter what the set of numbers is describing, or what units they use. So, the lengths of all the main rivers in the world, be they in miles or kilometres, broadly agree with the law. As long as the numbers you are dealing with span several orders of magnitude and aren’t artificially constrained in some way (such as UK ‘phone numbers, which always start with a ‘0’), then the law will probably hold.
The law, named after the physicist Frank Benford, who rediscovered it in 1938, was actually first discovered by Simon Newcomb, astronomer, all-round clever clogs and winner of “Grumpiest Man with a Beard” in 1880. This is the bit that really appealed to me – how did he discover it?
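The percentages come from a neat formula: the expected proportion of numbers whose leading digit is d is log10(1 + 1/d). Here is a minimal Python sketch, standard library only, that prints the expected share for each digit (the function name is just illustrative):

```python
import math


def benford_expected() -> dict[int, float]:
    """Return Benford's expected proportion for each leading digit 1 to 9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    # Print each digit alongside its expected share, e.g. "1: 30.1%".
    for digit, share in benford_expected().items():
        print(f"{digit}: {share:.1%}")
```

The nine shares add up to exactly 1, because the logarithms telescope to log10(10).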
Well, as the opening paragraph of this blog alludes to, he really did think about the condition of the books of log tables*** and come up with what would eventually be called Benford’s Law.**** Would anyone have spotted this today, I wonder? Probably, but it would have been found in the digital world and felt less tangible, less relevant to the Universe somehow.
Now the law is applied to all sorts of things including forensic accounting, where a couple of years ago the financial submissions of European countries to the EU were analysed. Guess what? The countries in most trouble appeared to have accounts that did not follow Benford’s Law, suggesting they had been cooked in some way.
I took the data I was looking at and ran a quick check. Fortunately, but slightly scarily, the observed proportions were within 0.5% of those predicted by the law. My mind boggled at the fact. What did this mean in the context of the Universe in which I live?
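I don’t know exactly how the EU figures were tested, so this is only a sketch of one common way of doing such a check: Pearson’s chi-squared statistic, comparing the observed count of each leading digit with the count Benford’s Law predicts. It assumes the counts for digits 1 to 9 have already been tallied, and flags a dataset when the statistic exceeds 15.507, the 5% critical value of the chi-squared distribution with 8 degrees of freedom. The function names are hypothetical.

```python
import math

# 95th percentile of the chi-squared distribution with 8 degrees of freedom
# (nine leading-digit categories minus one).
CRITICAL_95_DF8 = 15.507


def benford_chi_squared(counts: list[int]) -> float:
    """Pearson chi-squared statistic of leading-digit counts against Benford's Law.

    counts[0] is how many values start with '1', ..., counts[8] with '9'.
    """
    if len(counts) != 9:
        raise ValueError("expected nine counts, one per leading digit 1 to 9")
    total = sum(counts)
    stat = 0.0
    for digit, observed in enumerate(counts, start=1):
        expected = total * math.log10(1 + 1 / digit)
        stat += (observed - expected) ** 2 / expected
    return stat


def looks_suspicious(counts: list[int]) -> bool:
    """True if the counts depart from Benford's Law at the 5% significance level."""
    return benford_chi_squared(counts) > CRITICAL_95_DF8
```

Feeding it perfectly even counts such as [100] * 9, the kind of flat spread made-up figures can have, lands far above the threshold. A high score is a reason to look closer, not proof of fiddling on its own.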
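A quick check of that sort can be sketched in a few lines of Python. My dataset isn’t something I can share, so this sketch assumes a stand-in: the powers of two from 2^1 to 2^1000, whose leading digits are known to follow Benford’s Law closely. Pretending those are distances in miles and multiplying each by 1.609344 to get kilometres shows the “units don’t matter” point. The helper names are hypothetical.

```python
import math
from collections import Counter


def leading_digit(x: float) -> int:
    """Return the first significant digit of a positive number (0.052 -> 5)."""
    if x <= 0:
        raise ValueError("leading_digit expects a positive number")
    # repr() gives the shortest exact round-trip form, e.g. '0.052', '1024.0' or '7e-08';
    # stripping leading zeros and the decimal point leaves the first significant digit.
    return int(repr(float(x)).lstrip("0.")[0])


def digit_proportions(values) -> dict[int, float]:
    """Observed proportion of values starting with each digit 1 to 9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


def max_deviation(values) -> float:
    """Largest gap between the observed and Benford-expected proportions."""
    observed = digit_proportions(values)
    return max(abs(observed[d] - math.log10(1 + 1 / d)) for d in range(1, 10))


if __name__ == "__main__":
    # Hypothetical stand-in data: pretend these powers of two are distances in miles.
    miles = [2**n for n in range(1, 1001)]
    kilometres = [m * 1.609344 for m in miles]
    print(f"miles:      max deviation {max_deviation(miles):.2%}")
    print(f"kilometres: max deviation {max_deviation(kilometres):.2%}")
```

Both lists should come out close to the law, because rescaling every value by the same factor barely disturbs the spread of leading digits.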
It meant it was time to go home. Perhaps there would be jam vol-au-vents for tea. Perhaps not.
* They mostly depend on weird quantum tunnelling effects that just prove that the Universe is laughing at us.
** Well, half of this turned out to be true.
*** Before we turned the mathematical parts of our brains over to Casio and Microsoft, we used brilliant tables of numbers that allowed us to do hard sums (multiplication and division) by doing simple sums (addition and subtraction). Everyone should be taught how to use log tables because they are magical and a testament to Human ingenuity.
**** Benford’s Law itself obeys Stigler’s Law, which states that scientific laws are never named after the people who first discovered them. Except for Fred Cole, who did stuff with salad dressing and cabbage, obviously.
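The log-table trick from the third footnote is easy to mimic. This minimal Python sketch assumes a four-figure table, simulated by rounding log10 to four decimal places. Real tables list only the fractional part of the log (the mantissa) and you work out the whole-number part yourself; the sketch skips that step. The function names are hypothetical.

```python
import math


def table_log(x: float, places: int = 4) -> float:
    """Look up log10(x) roughly as a four-figure table would: rounded to `places` decimals."""
    return round(math.log10(x), places)


def multiply_with_logs(a: float, b: float) -> float:
    """Multiply two positive numbers the log-table way: add their logs, then take the antilog."""
    return 10 ** (table_log(a) + table_log(b))


def divide_with_logs(a: float, b: float) -> float:
    """Divide two positive numbers by subtracting their logs instead."""
    return 10 ** (table_log(a) - table_log(b))
```

multiply_with_logs(37, 42) comes out a little under 1554 rather than exactly on it, the same small price in accuracy that anyone using real four-figure tables paid.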