Markov Walker
Time Lord
Level 7: 2384 points
Last Logged In: June 24th, 2021
BADGE: Senator
TEAM: The Ørder of the Wild Onion
BART Psychogeographical Association Rank 4: Land Surveyor
EquivalenZ Rank 3: Protocologist
The University of Aesthematics Rank 1: Expert
Humanitarian Crisis Rank 2: Justice
Biome Rank 1: Hiker
Chrononautic Exxon Rank 6: Flux Capacitor
Society For Nihilistic Intent And Disruptive Efforts Rank 2: Trickster




25 + 95 points

Proverb Proof by Markov Walker

January 31st, 2010 11:05 AM

INSTRUCTIONS: Perform an experiment to verify or disprove a common proverb, when interpreted literally. Can you really catch more flies with honey than vinegar? How many more? Does a stitch in time really save nine? Or does it only save seven? Please note that aphorisms or non-metaphorical proverb-like sayings are not acceptable for this task.

A picture is worth a thousand words.

I believe other players have addressed this proverb, but not to my satisfaction. Addressing this proverb raises several questions. How do we compare images to words? Which pictures are worth which thousand words?

I'll motivate my answer to these preliminary questions with the following picture:

sine.gif

This picture can be summarized as "One full period of the sine function," which is far fewer than 1000 words. In fact, I could specify this exact image by adding words describing the image resolution, the size of the margins, the location of the axes, and the fact that the background is white while the curve and the axes are black, and the description would still be fewer than 1000 words.

The best way to quantify this is as information. Discovering patterns in something is equivalent to reducing the amount of information in that thing. Another way to say this is that the information in a message (a picture or a sequence of words) is, roughly, what you have left once you've removed all the redundancy, predictability, or patterns. In other words, the more unexpected the message, the more information.

A natural unit of information is the bit, along with multiples of bits. In this case, the kilobyte (2^13 bits) is a convenient unit. To find out whether a picture is worth 1000 words, we need a file that encodes 1000 words in an efficient way and another that encodes a picture in an efficient way, and then we compare them to see which is bigger. So, if the question is "Is a picture worth 1000 words?", the clear answer is that it depends on which picture and which words. But that's a boring answer, so I'll answer a more interesting question: which pictures are worth 1000 typical written English words?

I'll start with a naive way of measuring the information in 1000 typical words. I sampled 1000 word tokens from the Brown corpus, without replacement. Sampling tokens means that, of the 1000 words, about 50 will be "the", about 30 will be "of", about 20 will be "and", etc. I did this 20 times and found the average length of a 1000-word sample to be 4,538 characters, counting the whitespace marking the end of each word. Assuming we don't need to capitalize, we only need to encode 26 letters, plus the space to separate words, plus some punctuation (comma, period, quote, apostrophe, question mark), and we'll have perfectly readable, if not standard, English with an alphabet of only 32 characters. To encode words in a 32-character alphabet we can use 5 bits per character, and this way we can encode 1000 words in about 2.77 kilobytes.
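As a quick check of that arithmetic, here is a minimal sketch. The function name and constant are my own, chosen only for illustration; the inputs (4,538 characters, a 32-character alphabet) are the figures given above.

```python
import math

BITS_PER_KILOBYTE = 2 ** 13  # 1 kilobyte = 1024 bytes = 8192 bits


def fixed_width_kilobytes(n_chars: int, alphabet_size: int) -> float:
    """Size of n_chars symbols under a fixed-width code, in kilobytes."""
    # Every symbol gets the same number of bits: enough to index the alphabet.
    bits_per_char = math.ceil(math.log2(alphabet_size))
    return n_chars * bits_per_char / BITS_PER_KILOBYTE


# 26 letters + space + 5 punctuation marks
alphabet_size = 26 + 1 + 5
print(round(fixed_width_kilobytes(4538, alphabet_size), 2))  # 2.77
```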

An equally naive way to measure the amount of information in a picture is to assume we specify the color of each pixel in the image using 8 bits, with the 8 bits referring to some color in a color palette. Under these assumptions, our 1000 words are equal to a picture with about 2800 pixels, or a 70x40 picture.
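The same conversion can be run in the other direction. This is only a sketch with a hypothetical helper name; it assumes the 8 bits per pixel and the 2.77 kilobyte figure stated above.

```python
BITS_PER_KILOBYTE = 2 ** 13  # 1 kilobyte = 8192 bits


def pixels_for_kilobytes(kilobytes: float, bits_per_pixel: int = 8) -> int:
    """Number of whole palette-indexed pixels that fit in the given size."""
    return int(kilobytes * BITS_PER_KILOBYTE // bits_per_pixel)


print(pixels_for_kilobytes(2.77))  # 2836
print(70 * 40)                     # 2800, so a 70x40 image just fits
```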

bully.gif

This image is worth about 1000 English words.

But there are a lot of patterns in both the words and that picture, so both should contain far less than 2.77 kilobytes of information. For instance, written English has a lot of 'e's and very few 'q's. If we assigned 'e' a 2-bit codeword and 'q' a 7-bit codeword, it would take fewer bits to represent our 1000 words. And that image uses fewer than 256 colors, with black being the most common color. These facts can be exploited to compress our words and picture and get a more accurate estimate of the actual amount of information they contain.
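One way to see how much a variable-length code could save is to compute the entropy of the character frequencies. This is the lower bound on the average codeword length for any code that assigns each character its own codeword, ignoring context. The sketch below is illustrative only: the sample sentence is a made-up stand-in, not text from the Brown corpus, and the function name is my own.

```python
import math
from collections import Counter


def entropy_bits_per_char(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical sample; any lowercase English text could be substituted.
sample_text = "the quick brown fox jumps over the lazy dog and then the dog sleeps"
print(entropy_bits_per_char(sample_text))
```

On skewed text like English the result comes out below the 5 bits per character of the fixed-width code, and the gap is the saving that frequency-based codewords can capture.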

To measure this I'll compare lossless compression of text and pictures. The text will be a file containing 1000 words taken from the Brown corpus and compressed with Windows 7's compression utility for making zip files. The image will be a PNG. PNG uses both general compression algorithms and compression tailored for images, while the zip algorithm is strictly general purpose. This will slightly overestimate the information in the text compared to the image, but finding or creating a compression algorithm specific to English text is beyond the scope of this proof.

English also has important patterns between words. For instance, 'the' is extremely common, and 'of', while extremely common, is extremely rare when the word right before it is 'the'. These sorts of patterns are important for comparing the information in text and images, and they're absent if I just sample 1000 random words. Instead, I sampled a string of 1000 consecutive words from the Brown corpus, starting at a random location.
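A rough way to reproduce this kind of measurement is sketched below. It is not the procedure used here: it swaps in Python's zlib (DEFLATE, the same family of algorithm zip typically uses) for the Windows zip utility, so the sizes will differ somewhat because a zip file also carries headers. The function names and the toy word list are hypothetical; with NLTK installed, the list returned by `nltk.corpus.brown.words()` could be passed in instead.

```python
import random
import zlib


def sample_consecutive(words, n, rng):
    """Return n consecutive words starting at a random position."""
    start = rng.randrange(len(words) - n + 1)
    return words[start:start + n]


def compressed_kilobytes(words):
    """Size in kilobytes of the space-joined words after DEFLATE compression."""
    raw = " ".join(words).encode("ascii", errors="replace")
    return len(zlib.compress(raw, 9)) / 1024


# Toy stand-in corpus, for illustration only.
toy_corpus = ("the picture of the sine function is worth far fewer "
              "than a thousand words").split() * 200
window = sample_consecutive(toy_corpus, 1000, random.Random(0))
print(len(window), compressed_kilobytes(window))
```

The toy corpus is highly repetitive, so it compresses far better than real prose would; the point is only the shape of the measurement.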

The result is about 2.12 kilobytes for 1000 words. That's about the size of this image:
Home-prelight.png

It's slightly bigger than this one:
userpic-7-50x50.png

and a little smaller than this one:
server.png

I conclude that a small, simple browser icon is typically worth more than 1000 words.



22 vote(s)



Terms

math, statistics, randomness

7 comment(s)

(no subject) -1
posted by Lincøln on January 31st, 2010 12:40 PM

I really like the work put into this. And the SCIENCE! is sound. I would like to see this also taken from a different angle. Take a look at the proverb from an aesthetic angle, or from a humanitarian one. I'd love some contrast.

I think this proof alone is awesome and well done, I'd just like to see other points of view. You handled it in a very Eq way, and I appreciate it. Really I do. Excellent praxis. But in my heart I want more. Maybe I just want too much.

(no subject) +2
posted by Markov Walker on January 31st, 2010 1:29 PM

science renders all other perspectives obsolete!

What we really need is a transhumanitarian perspective.

But here's something to consider for your aesthetic perspective:

If a man have a stubborn and rebellious son, which will not obey the voice of his father, or the voice of his mother, and that, when they have chastened him, will not hearken unto them:
Then shall his father and his mother lay hold on him, and bring him out unto the elders of his city, and unto the gate of his place;
And they shall say unto the elders of his city, This our son is stubborn and rebellious, he will not obey our voice; he is a glutton, and a drunkard.
And all the men of his city shall stone him with stones, that he die: so shalt thou put evil away from among you; and all Israel shall hear, and fear.

dt21_21.jpg

Draw your own conclusions.

(no subject)
posted by Markov Walker on January 31st, 2010 3:44 PM

Actually, let me help you draw a conclusion. A carefully chosen picture from The Brick Testament is worth the entirety of the Bible.

(no subject)
posted by Lincøln on February 1st, 2010 7:04 PM

You get an extra point from me for that comment.

(no subject)
posted by relet 裁判長 on January 31st, 2010 3:10 PM

I like the conclusion!

(no subject)
posted by rongo rongo on February 1st, 2010 10:29 AM

Nice. I particularly like how for you, the phrase "a picture" immediately brings to mind the question "which picture?"

SCIENCE!
posted by Spidere on February 8th, 2010 11:12 PM

It's everywhere these days. I'm all for it.