Distributed Human Sorting of Internet Objects

I learn a lot from reading Ed Felten. In A Spoonful of Sugar he describes an absolutely brilliant method being used at Carnegie Mellon to “label all the images on the web”.

… a pair of strangers, shown a photographic image, are each asked to guess the single word that the other will use to characterize the image. Get it right and you score valuable points. For an extra challenge, sometimes there are “taboo words” that you aren’t allowed to use. Players report that the game is semi-addictive.

The brilliant part is that the game “tricks” its players into doing an important and incredibly time-consuming job. By playing the game, you’re helping to build a giant index that associates each image on the internet with a set of words that describe it. It’s well known that indexing and searching a set of images requires the time-consuming manual step of assigning descriptive words to each image. Labeling all of the images on the internet is an enormous amount of work. When you play the ESP Game, you’re shown images randomly chosen from the internet. You’re doing the time-consuming manual work to index the whole internet’s images – and enjoying it! So far the group has collected over two million labels.
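The matching rule in the quoted description is simple enough to sketch in code. What follows is only a rough illustration, not the ESP Game's actual implementation: the function name agreed_label, the turn-by-turn guessing model, and the lowercase normalization are all assumptions made for the example.

```python
from itertools import zip_longest


def agreed_label(guesses_a, guesses_b, taboo=()):
    """Return the first non-taboo word both players have typed, or None.

    Hypothetical model: players type guesses in alternating turns and the
    round ends as soon as one player's guess matches any earlier guess
    made by the other player.
    """
    banned = {w.strip().lower() for w in taboo}
    seen_a, seen_b = set(), set()
    for a, b in zip_longest(guesses_a, guesses_b):
        # Process player A's guess, then player B's, for this turn.
        for guess, mine, theirs in ((a, seen_a, seen_b), (b, seen_b, seen_a)):
            if guess is None:  # this player has run out of guesses
                continue
            word = guess.strip().lower()
            if word in banned:  # taboo words never count
                continue
            mine.add(word)
            if word in theirs:
                return word
    return None


label = agreed_label(["dog", "animal"], ["puppy", "dog"])
```

In this sketch a match becomes a label for the image, and passing that label back in as a taboo word for later rounds would push players toward new descriptions.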

Oddly, according to their FAQ, the designers are trying to pre-filter the content to remove porn. I would have thought that this kind of filtering and labeling would be most desired by people trying to blacklist porn images. Of course, if the system were used that way, people might try to game it by introducing bad data…


One Response to Distributed Human Sorting of Internet Objects

  1. Heidi says:

    Having now spent far more time playing that game than any human should be allowed, let me assure you that the incentive makes for bad labeling. There’s a motive to skip good words (like “german shepherd”) and use generic ones (like “dog”). You end up with a very complex picture being described by “blue”: one of the background colors. A better way would be a somewhat Boggle-like setup. You can list as many descriptions as you like. You have 5 or so people playing. If nobody else lists your description, you get no points. Two people with the same description get the most points, three get fewer, and so on. So generic descriptions don’t get you much, and specific (but true) ones do.
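The scoring scheme proposed in this comment can be sketched roughly as follows. The comment gives no actual point values, so BASE_POINTS and the formula in label_points are assumptions chosen only to reproduce the stated ordering: nothing for a description nobody else lists, the most for a pair, and progressively less as more players agree. The function names and the normalization step are hypothetical too.

```python
from collections import Counter

BASE_POINTS = 12  # hypothetical value; the comment specifies no numbers


def normalize(label):
    """Lowercase and collapse whitespace so 'German  Shepherd' matches."""
    return " ".join(label.lower().split())


def label_points(num_players):
    """Points for a description listed by num_players players.

    Assumed formula: 0 for a unique description, the maximum for exactly
    two players, and less for each additional player who also listed it.
    """
    if num_players < 2:
        return 0
    return BASE_POINTS // (num_players - 1)


def score_round(submissions):
    """Score one image. submissions maps player name -> list of labels."""
    cleaned = {
        player: {normalize(label) for label in labels}
        for player, labels in submissions.items()
    }
    # Count how many distinct players listed each description.
    counts = Counter(label for labels in cleaned.values() for label in labels)
    return {
        player: sum(label_points(counts[label]) for label in labels)
        for player, labels in cleaned.items()
    }


scores = score_round({
    "a": ["dog", "German Shepherd"],
    "b": ["dog", "german shepherd"],
    "c": ["dog", "grass"],
    "d": ["dog"],
    "e": ["dog", "blue"],
})
```

With these assumed values, the two players who both wrote “german shepherd” come out well ahead of those who only wrote “dog”, which is the behavior the comment is after.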
