Wednesday, December 6, 2000
Polysemizer
By Paul Ford
The mighty Polysemizer, making sure all the words are widely known.
1. polysemy -- (the ambiguity of an individual word or phrase that can be used (in different contexts) to express two or more different meanings) (from WordNet).
A polysemy count indicates how common a word is. Specifically, polysemy refers to how many distinct senses a word has in the language; "love" has a polysemy count of 6, because you can love cooking and children, make love, score nine-love in a game of squash, and so forth. Polysemy values are analogous to WordNet's "familiarity" values, and, for most words, the two are practically the same. Familiarity, however, is uncovered by analyzing large corpora of text and counting how often words occur. Polysemy comes out of what I suppose you could call "intralinguistic" analysis: you count meanings within the lexicon itself rather than occurrences in text.
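To make the numbers concrete, WordNet's sense counts are easy to pull programmatically. Here is a minimal sketch in Python using NLTK's WordNet interface; that's my substitution for the Perl route mentioned below, not part of the original plan, and the counts it prints span all parts of speech.

    # Polysemy count = number of WordNet senses for a word.
    # Requires NLTK with the WordNet corpus (run nltk.download('wordnet') once).
    from nltk.corpus import wordnet as wn

    for word in ["love", "the", "sesquipedalian"]:
        senses = wn.synsets(word)   # synsets across all parts of speech
        print(word, "polysemy count:", len(senses))
        for s in senses[:3]:        # peek at a few glosses
            print("   ", s.name(), "-", s.definition())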
So what I want is a Polysemizer: something to take a given stretch of text, tag each word with its polysemy value, and show the result in a coherent way. For instance, the more common words could be shown in black, the less common in shades of gray. That way, when I'm writing advertising or some other text that needs to be understood by the widest possible range of readers, I can make sure that my vocabulary stays simple, concrete, and easy to fathom, avoiding words that might be perfectly normal to me but odd to others. It would be helpful with other writing tasks as well.
The overall tool would be fairly simple to build, since polysemy isn't dependent on knowing the correct part of speech (in most cases; I guess homonyms are a problem): just tokenize the text, look up each word's polysemy count via the Lingua::Wordnet Perl module, transform the count into a color value, and display the result as HTML. I think it could be built in a day or two.
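Here's what that pipeline might look like, again as a rough sketch in Python with NLTK standing in for Lingua::Wordnet; the names (polysemize, gray_for, MAX_COUNT) are mine for illustration, not anything prescribed above.

    # Polysemizer sketch: gray-scale a text by WordNet polysemy count.
    # High-polysemy (common) words render near black, rare words in pale gray.
    import re
    from html import escape
    from nltk.corpus import wordnet as wn

    MAX_COUNT = 20  # counts at or above this render solid black (a tuning guess)

    def gray_for(count):
        """Map a sense count to a hex gray: 0 senses -> light gray, many -> black."""
        level = int(200 * (1 - min(count, MAX_COUNT) / MAX_COUNT))
        return "#{0:02x}{0:02x}{0:02x}".format(level)

    def polysemize(text):
        """Wrap each word in a <span> colored by its WordNet sense count."""
        out = []
        for token in re.findall(r"\w+|\W+", text):  # words and the gaps between them
            if token.isalpha():
                count = len(wn.synsets(token.lower()))
                out.append('<span style="color:%s">%s</span>'
                           % (gray_for(count), escape(token)))
            else:
                out.append(escape(token))
        return "<p>" + "".join(out) + "</p>"

    print(polysemize("Love makes the world go round."))

Run against a paragraph of copy, the resulting HTML makes the low-polysemy words fade visibly toward gray, which is exactly the at-a-glance check the tool is meant to give.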