Wednesday, December 6, 2000
Poetry Scanner
By Paul Ford
A possible tool to figure out where the accents are.
I want a computer program to scan poetry and show you the accents and the syllabic inflections. Such a program should be able to guess the meter of a poem, except that poetry often uses archaic forms of language that would trip it up. I wonder if, ultimately, the problem is intractable for all but the most formally constructed and properly punctuated poems.
What's the point of doing this? Well, once you've broken everything up and gotten the syllables and accents, accurate by part of speech, broken out into a big array of data in the computer's memory, you can put it back together and display it in unique ways. For instance, you could show the number of stresses, counted to the right of each line, or you could analyze the pattern (heptameter! iambic pentameter!). The enterprising poetry analyst, undergraduate, graduate, or even professor-level, has a nifty tool to quickly map the linguistic patterns of a poem, especially if he or she uses the tool in conjunction with the Etymologizer.
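To make the "count the stresses, name the pattern" idea concrete, here is a minimal sketch. It assumes the hard part is already done and each line of the poem has been reduced to a string of 0s (unstressed) and 1s (stressed), one digit per syllable. The names `count_stresses` and `guess_meter` are made up for illustration, and the matching is strict: a single substituted foot, which real poems have all the time, defeats it.

```python
# Hypothetical helpers; assumes each line is already a string of
# 0 (unstressed) and 1 (stressed), one digit per syllable.
FEET = {"01": "iambic", "10": "trochaic", "001": "anapestic", "100": "dactylic"}
LENGTHS = {1: "monometer", 2: "dimeter", 3: "trimeter", 4: "tetrameter",
           5: "pentameter", 6: "hexameter", 7: "heptameter", 8: "octameter"}


def count_stresses(pattern):
    """Number of stressed syllables in the line."""
    return pattern.count("1")


def guess_meter(pattern):
    """Name the meter if the line is one foot repeated exactly, else None."""
    for foot, name in FEET.items():
        feet, leftover = divmod(len(pattern), len(foot))
        if feet in LENGTHS and leftover == 0 and pattern == foot * feet:
            return f"{name} {LENGTHS[feet]}"
    return None
```

With this, `guess_meter("0101010101")` returns `"iambic pentameter"` and `count_stresses("0101010101")` returns 5. Anything irregular comes back as `None`, which is probably the honest answer for most poems.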
One major function would be a rhythmical-semantic-suggestotron. You could point to a word you wanted to replace, and the tool would go out and find all the cognitively related words it could that had the same rhythm. Voilà!
You could also create some amusing, random, rhythmic poetry via these methods.
You could also play around with such functions as “phoneme sort,” where you split out poems and lyrics by phonemes and sort the results to find out which phonemes have prevalence, or a “stress sort,” to find out which syllables get stress. I mean, it's not thrill-a-minute, but it would have helped me during that sophomore poetry 305 I took with Dr. Howard.
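A rough sketch of both sorts, assuming the words have already been transcribed into space-separated phoneme symbols with a stress digit on each vowel (the notation used in the dictionary entries quoted further down, where 1 appears to mark primary stress). The function names are invented for illustration.

```python
from collections import Counter


def phoneme_sort(pronunciations):
    """Count phonemes, ignoring stress digits, most frequent first."""
    counts = Counter()
    for pron in pronunciations:
        for symbol in pron.split():
            counts[symbol.rstrip("012")] += 1
    return counts.most_common()


def stress_sort(pronunciations):
    """Count which vowels carry primary stress (symbols ending in 1)."""
    counts = Counter(
        symbol[:-1]
        for pron in pronunciations
        for symbol in pron.split()
        if symbol.endswith("1")
    )
    return counts.most_common()
```

Feed either function a list such as `["AE1 B R AH0 G EY2 T IH0 D", "AH0 B R AA1 N"]` and you get back (phoneme, count) pairs, highest count first.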
Given a block of poetry:
Parse it into sentences, not lines, but remember where the linebreaks are. We need to have full sentences in order to--
Apply a link grammar to each sentence, so that we know the part of speech of each word.
Break each word down into syllables, making sure you're doing it with the right part of speech. This is a problem. While there are web tools to tell you the phonetic breakdown of words, they use data intended for machine reading:
ABROGATED AE1 B R AH0 G EY2 T IH0 D
ABROGATING AE1 B R AH0 G EY2 T IH0 NG
ABROGATION AE2 B R AH0 G EY1 SH AH0 N
ABRON AH0 B R AA1 N
ABRUZZO AA0 B R UW1 Z OW0
See the problem? It doesn't show you where the words break: abrogated should be a*bro*gat*ed; abrupt should be ab*rupt. Trying to deduce where the sounds split by reverse engineering the pronunciation codes is very difficult. One way around this is to parse the syllable information out of the Merriam Webster dictionary, originally published via Project Gutenberg, but now part of The GNU Dictionary.
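To be fair to the machine-readable data, it does hand over some of what the scanner needs. The entries appear to follow the CMU Pronouncing Dictionary convention, in which every vowel carries a digit (0 for no stress, 1 for primary, 2 for secondary), so the number of syllables and the stress of each one fall out of a few lines of code. What stays missing is where those syllables begin and end in the spelled word. A small sketch, with `parse_entry` as a made-up name:

```python
def parse_entry(line):
    """Split a 'WORD PH1 PH2 ...' line into word, phonemes, stress string.

    Assumes one stress digit per vowel, hence one digit per syllable.
    """
    word, *phones = line.split()
    stresses = "".join(p[-1] for p in phones if p[-1].isdigit())
    return word, phones, stresses


word, phones, stresses = parse_entry("ABROGATED AE1 B R AH0 G EY2 T IH0 D")
print(word, stresses, len(stresses))  # word, stress per syllable, syllable count
```

For ABROGATED the stress string is `1020`: four syllables, primary stress on the first, secondary on the third. Nothing in it says whether the written word splits as a*bro*gat*ed or some other way.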
Actually, this idea isn't going to go anywhere. But it's nice to think about a little poetry machine, I find.