Proverb Proof by Markov Walker
January 31st, 2010 11:05 AM
I believe other players have addressed this proverb, but not to my satisfaction. Addressing this proverb raises several questions. How do we compare images to words? Which pictures are worth which thousand words?
I'll motivate my answer to these preliminary questions with the following picture:

This picture can be summarized as "One full period of the sine function." That is far fewer than 1000 words. In fact, I could specify this exact image by adding words describing the image resolution, the size of the margins, the location of the axes, and the fact that the background is white while the function and the axes are black, and the description would still be fewer than 1000 words.
The best way to quantify this is as information. Discovering a pattern in something lets you describe that thing with less information. Another way to say this is that the information in a message (a picture or a sequence of words) is, roughly, what you have left once you've removed all the redundancy, predictability, or patterns. In other words, the more unexpected the message, the more information.
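One standard way to make "more unexpected means more information" concrete is Shannon's surprisal: an outcome with probability p carries -log2(p) bits. The sketch below is illustrative only; the function name `surprisal_bits` is made up for this example and is not part of the original analysis.

```python
import math

def surprisal_bits(probability):
    """Bits of information carried by an outcome with the given probability."""
    return -math.log2(probability)

# A fair coin flip is mildly surprising; a 1-in-1024 event is far more so.
print(surprisal_bits(0.5))       # 1.0
print(surprisal_bits(1 / 1024))  # 10.0
```

A perfectly predictable outcome (probability 1) carries zero bits, which matches the idea that removing predictability leaves only the information.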
A natural unit of information is the bit, along with multiples of bits. In this case, the kilobyte (2^13 = 8192 bits) is a convenient unit for measuring information. To find out whether a picture is worth 1000 words, we need a file that encodes 1000 words in an efficient way, and another that encodes a picture in an efficient way, and then we compare them to see which is bigger. So, if the question is "Is a picture worth 1000 words?", the clear answer is: it depends which picture, and which words. But that's a boring answer, so I'll answer the more interesting question. Which pictures are worth your typical 1000 written English words?
I'll start with a naive way of measuring the information in 1000 typical words. I sampled 1000 word tokens from the Brown corpus, without replacement. Sampling tokens means that, of the 1000 words, about 50 will be "the", about 30 will be "of", about 20 will be "and", etc. I did this 20 times and found the sample average length of 1000 words to be 4,538 characters, counting the whitespace marking the end of each word. Assuming we don't need to capitalize, we only need to encode 26 letters, plus the space to separate words, plus some punctuation (comma, period, quote, apostrophe, question mark), and we'll have perfectly readable, if not standard, English with an alphabet of only 32 characters. To encode words in a 32-character alphabet we can use 5 bits per character, and this way we can encode 1000 words in about 2.77 kilobytes.
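The arithmetic behind that 2.77 figure can be checked with a short sketch. The function name and constant below are chosen for illustration; the inputs (4,538 characters, a 32-character alphabet, 2^13 bits per kilobyte) come from the text above.

```python
import math

BITS_PER_KILOBYTE = 2 ** 13  # 1024 bytes of 8 bits each

def fixed_width_kilobytes(n_chars, alphabet_size):
    """Size of a text when every character gets the same number of bits."""
    bits_per_char = math.ceil(math.log2(alphabet_size))  # 5 bits for 32 symbols
    return n_chars * bits_per_char / BITS_PER_KILOBYTE

naive_kb = fixed_width_kilobytes(4538, 32)
print(round(naive_kb, 2))  # 2.77
```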
An equally naive way to measure the amount of information in a picture is to assume we specify the color of each pixel in the image using 8 bits, with the 8 bits referring to some color in a color palette. Under these assumptions, our 1000 words are equal to a picture with about 2800 pixels, or a 70x40 picture.
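The same kind of check works for the picture. This is a minimal sketch assuming one palette index of 8 bits per pixel and no file header, as in the paragraph above; the helper name is hypothetical.

```python
BITS_PER_KILOBYTE = 2 ** 13  # 1024 bytes of 8 bits each

def pixels_for_kilobytes(kilobytes, bits_per_pixel=8):
    """Number of whole pixels that fit in the given number of kilobytes."""
    return int(kilobytes * BITS_PER_KILOBYTE) // bits_per_pixel

budget = pixels_for_kilobytes(2.77)
width, height = 70, 40
print(budget, width * height)  # 2836 2800
```

A 70x40 image uses 2800 of the roughly 2836 available pixels, so it fits within the budget with a little room to spare.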

This image is worth about 1000 English words.
But there are a lot of patterns in both the words and that picture, so both should contain far less than 2.77 kilobytes of information. For instance, written English has a lot of 'e's and very few 'q's. If we assigned 'e' a 2-bit codeword, for instance, and 'q' a 7-bit codeword, it would take fewer bits to represent our 1000 words. And that image uses fewer than 256 colors, with black being the most common color. These facts can be exploited to compress our words and picture and get a more accurate estimate of the actual amount of information they contain.
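One well-known way to assign short codewords to common letters and long ones to rare letters is Huffman coding. The sketch below is not the method used in this proof, only an illustration of the idea; the sample sentence and function names are made up for the example.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {symbol: codeword length in bits} for a Huffman code built from text."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (total count, tie-breaker, {symbol: depth so far}).
    heap = [(n, i, {sym: 0}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

# Hypothetical sample text, lowercase letters and spaces only.
sample = "the quick brown fox jumps over the lazy dog and then sees the other three geese"
lengths = huffman_code_lengths(sample)
fixed_bits = 5 * len(sample)                        # 5 bits per character
variable_bits = sum(lengths[ch] for ch in sample)   # Huffman codeword per character
print(lengths["e"], lengths["q"], fixed_bits, variable_bits)
```

On this sample the common 'e' gets a codeword no longer than the rare 'q', and the variable-length total comes in under the fixed 5-bit total.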
To measure this I'll compare lossless compression of text and pictures. The text will be a file containing 1000 words taken from the Brown corpus and compressed with Windows 7's compression utility for making zip files. The image will be a PNG. PNG uses both general compression algorithms and compression tailored for images, while the zip algorithm is strictly general purpose. This will slightly overestimate the information in the text compared to the image, but finding or creating a compression algorithm specific to English text is beyond the scope of this proof.
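A rough way to reproduce this kind of measurement without Windows is Python's zlib module, which implements DEFLATE, the algorithm commonly used inside zip files. This is a sketch under that assumption: a real zip file adds its own headers, so sizes will differ a little, and the stand-in passage below is made up and far more repetitive than real English.

```python
import zlib

def compressed_kilobytes(text, level=9):
    """Size of the DEFLATE-compressed UTF-8 text, in kilobytes of 1024 bytes."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, level)) / 1024

# Hypothetical stand-in for a 1000-word passage; substitute real text here.
passage = " ".join(["it depends which picture and which words"] * 150)
raw_kb = len(passage.encode("utf-8")) / 1024
print(round(raw_kb, 2), round(compressed_kilobytes(passage), 2))
```

Running the function on an actual 1000-word excerpt, rather than this repeated sentence, gives a figure that can be compared with the size of a PNG file on disk.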
English also has important patterns between words. For instance, 'the' is extremely common, and 'of', while extremely common, is extremely rare when the word right before is 'the'. These sorts of patterns are important for comparing the information in text and images, and they're absent if I just sample 1000 random words. Instead, I sampled a string of 1000 consecutive words from the Brown corpus, starting at a random location.
The result is about 2.12 kilobytes for 1000 words. That's about the size of this image:

It's slightly bigger than this one

and a little smaller than this

I conclude that a small, simple browser icon is typically worth more than 1000 words.
22 vote(s)

Lincøln
3
relet 裁判長
5
gh◌st ᵰⱥ₥ing
4
Juliette
4
Ombwah
4
Wolf
5
rongo rongo
3
teucer
3
anna one
3
Rin Brooker
5
Optical Dave
5
Electra Fairford
3
Spidere
5
Dan |ØwO|
5
Pixie
5
Picø ҉ ØwO
5
Joe
5
Luna Lovegood
5
done
5
Togashi Ni
5
[øwo] lady minirex
5
Likes Music 0w0
Terms
math, statistics, randomness
7 comment(s)
science renders all other perspectives obsolete!
What we really need is a transhumanitarian perspective.
But here's something to consider for your aesthetic perspective:
If a man have a stubborn and rebellious son, which will not obey the voice of his father, or the voice of his mother, and that, when they have chastened him, will not hearken unto them:
Then shall his father and his mother lay hold on him, and bring him out unto the elders of his city, and unto the gate of his place;
And they shall say unto the elders of his city, This our son is stubborn and rebellious, he will not obey our voice; he is a glutton, and a drunkard.
And all the men of his city shall stone him with stones, that he die: so shalt thou put evil away from among you; and all Israel shall hear, and fear.
Draw your own conclusions.
Actually, let me help you draw a conclusion. A carefully chosen picture from The Brick Testament is worth the entirety of the Bible.
You get an extra point from me for that comment.
Nice. I particularly like how for you, the phrase "a picture" immediately brings to mind the question "which picture?"
It's everywhere these days. I'm all for it.
I really like the work put into this. And the SCIENCE! is sound. I would like to see this also taken from a different angle. Take a look at the proverb from an aesthetic angle, or from a humanitarian one. I'd love some contrast.
I think this proof alone is awesome and well done, I'd just like to see other points of view. You handled it in a very Eq way, and I appreciate it. Really I do. Excellent praxis. But in my heart I want more. Maybe I just want too much.