With algorithms subtle and discrete, I seek iambic writings to retweet
March 21, 2012 10:12 PM

Pentametron 2013 (pronounce the year "two thousand and thirteen") scans seven million tweets or so each day, in search of those that happen to be in pentameter - and then it retweets them. It digs up five to ten of these per hour, making a sort of endless sonnet from the vast collective chatter of the Net.

Gah! It would take me all night to write this if I tried to do the whole thing in iambs, so I'll cheat and put the free verse here in "more inside". And yet, after reading Pentametron's feed for a while, it's hard to stop looking for pentameter in everything you read. Kind of disorienting, really.

Possible future plans for Pentametron:
* A blog that presents Pentametron's findings in nicely formatted 14-line chunks (i.e. sonnets of a sort).
* Once it's been running for a month or two, I may have enough data to start mining for rhyming couplets.
* Print-on-demand chapbooks.
* Vocal performances. In fact, I'm looking for orators to intone some of these algorithmically discovered texts. Contact me if you'd like to record some for me.
Role: concept, programming
posted by moonmilk (17 comments total) 6 users marked this as a favorite
This project was posted to MetaFilter by kenko on March 23, 2012: With algorithms subtle and discrete / I seek iambic writings to retweet.

Here are the last 14 tweets it found:
This feelings only temporary though....
Cry Me A River, Justin Timberlake
Ben Gordon is a crazy fuck. #Detroit
Done with the giving of the effort shit.

I haven't been in twitter jail in months
I'm missing Rip the Runway. Homework first !
not really.. its the other way around..
so fucking tired gonna hit the sack

Hoes want attention, Women want respect.
I'm sick and tired of the disrespect
You never answer my Goodmorning texts...
Discussing parking mitigation now

Especially when the conversation GOOD
She always got a goddamn attitude

posted by moonmilk at 10:53 PM on March 21, 2012 [1 favorite]
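
The 4-4-4-2 grouping of the fourteen lines above can be sketched in a few lines of Python. This is only an illustration: the thread does not show how Pentametron formats its output, and the function names `sonnet_chunks` and `format_sonnet` are hypothetical.

```python
def sonnet_chunks(lines, stanza_sizes=(4, 4, 4, 2)):
    """Yield one list of stanzas per complete 14-line chunk.

    Assumption: leftover lines that do not fill a whole chunk are
    simply dropped here; a real feed might hold them for later.
    """
    chunk_len = sum(stanza_sizes)
    for start in range(0, len(lines) - chunk_len + 1, chunk_len):
        chunk = lines[start:start + chunk_len]
        stanzas, pos = [], 0
        for size in stanza_sizes:
            stanzas.append(chunk[pos:pos + size])
            pos += size
        yield stanzas


def format_sonnet(stanzas):
    """Join stanzas with a blank line between them, one tweet per line."""
    return "\n\n".join("\n".join(stanza) for stanza in stanzas)
```

Feeding it a running list of found tweets and printing `format_sonnet(...)` for each chunk would give the three quatrains and closing couplet layout used in this comment.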

posted by cortex at 1:09 PM on March 22, 2012

The website is ready! pentametron.com, of course. I'll be fiddling with it over the next few days.

It seems like there's a theme going on in this evening's tweets:
wow gonna do the #lin tonight again..
I really wanna skate a empty pool.
I wanna see the hunger games tonight

Wait, does The Hunger Games premiere tonight?
Not Even Gonna Entertain The Thought
Last practice of the season. #bittersweet
I'm looking at the mirror on the wall.

4 inch a cocky ano cocky dat
Shit really irritating me again
Who masturbates in public!? What the fuck!
Friends play a major part in human lives

I really wanna see the Hunger Games.
Who's going to the hunger games tonight?

posted by moonmilk at 8:35 PM on March 22, 2012 [1 favorite]

You should index the final words of each individual candidate tweet against a rhyming dictionary and ABAB CDCD EFEF GG this stuff for maximal found-poetry joy.
posted by cortex at 9:00 PM on March 22, 2012
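
One way the rhyme-indexing idea above might look, as a rough Python sketch rather than anything Pentametron is known to do. It assumes a common definition of perfect rhyme (two words rhyme when their phonemes match from the last primary-stressed vowel onward) and uses a tiny hand-written table of CMU-style pronunciations. The table `FINAL_WORD_PHONES` and every function name here are hypothetical.

```python
import re
from collections import defaultdict

# Hand-written, CMU-style pronunciations for a few final words (illustrative
# only). Vowels end in a stress digit: 0 none, 1 primary, 2 secondary.
FINAL_WORD_PHONES = {
    "respect": "R IH0 S P EH1 K T",
    "disrespect": "D IH2 S R IH0 S P EH1 K T",
    "tonight": "T AH0 N AY1 T",
    "light": "L AY1 T",
    "wall": "W AO1 L",
    "pool": "P UW1 L",
}


def rhyme_key(phones):
    """Return the phonemes from the last primary-stressed vowel to the end."""
    parts = phones.split()
    for i in range(len(parts) - 1, -1, -1):
        if parts[i].endswith("1"):
            return " ".join(parts[i:])
    # No primary stress: fall back to the last vowel of any stress.
    for i in range(len(parts) - 1, -1, -1):
        if parts[i][-1].isdigit():
            return " ".join(parts[i:])
    return phones


def index_by_rhyme(lines):
    """Map rhyme key -> [(final word, line)], skipping unknown final words."""
    index = defaultdict(list)
    for line in lines:
        words = re.findall(r"[a-z']+", line.lower())
        if not words or words[-1] not in FINAL_WORD_PHONES:
            continue
        index[rhyme_key(FINAL_WORD_PHONES[words[-1]])].append((words[-1], line))
    return index


def rhyming_pairs(index):
    """Pair off lines that share a rhyme key but end in different words."""
    pairs = []
    for bucket in index.values():
        seen, distinct = set(), []
        for word, line in bucket:
            if word not in seen:
                seen.add(word)
                distinct.append(line)
        for i in range(0, len(distinct) - 1, 2):
            pairs.append((distinct[i], distinct[i + 1]))
    return pairs


def arrange_sonnet(pairs):
    """Lay out seven rhyming pairs as ABAB CDCD EFEF GG; None if too few."""
    if len(pairs) < 7:
        return None
    lines = []
    for q in range(3):
        (a1, a2), (b1, b2) = pairs[2 * q], pairs[2 * q + 1]
        lines += [a1, b1, a2, b2]
    lines += list(pairs[6])
    return lines
```

Skipping pairs that end in the same word keeps two "tonight" lines from counting as a rhyme. A full sonnet needs seven such pairs, so a small pool of candidate lines will rarely be enough to fill one.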

Yes! I'm already caching the pronunciation of every tweet, but with only a few dozen iambic tweets per day, there's not enough data to do rhymed verses yet -- it'll have to build up for a month or three first.

If I can get the attention of some high-powered geeks at Twitter, I'd love to run this on the full Twitter firehose, with 100 times as much data. Then the rhyme database would fill up much faster!
posted by moonmilk at 9:37 PM on March 22, 2012

What are you using under the hood for this? NLTK?
posted by demiurge at 8:44 AM on March 23, 2012

Ooh, I didn't know about NLTK - I'll have to take a look! This is built in PHP, using the CMU Pronouncing Dictionary.
posted by moonmilk at 9:36 AM on March 23, 2012
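
For readers curious how a stress-based check could work, here is a minimal Python sketch. It is not the PHP code described above, which has not been published. It assumes pronunciations in the CMU Pronouncing Dictionary's format, where each vowel phoneme ends in a stress digit, and it uses a deliberately loose scansion rule of my own choosing: one-syllable words may fall anywhere, while multi-syllable words must not put a primary stress on a weak beat or an unstressed syllable on a strong beat. The entries in `MINI_DICT` are hand-written for illustration, and the function names are hypothetical.

```python
import re

# Hand-written entries in the style of the CMU Pronouncing Dictionary.
# A real run would load the full dictionary instead of this excerpt.
MINI_DICT = {
    "i": "AY1",
    "haven't": "HH AE1 V AH0 N T",
    "been": "B IH1 N",
    "in": "IH0 N",
    "twitter": "T W IH1 T ER0",
    "jail": "JH EY1 L",
    "months": "M AH1 N TH S",
    "discussing": "D IH0 S K AH1 S IH0 NG",
    "parking": "P AA1 R K IH0 NG",
    "mitigation": "M IH2 T AH0 G EY1 SH AH0 N",
    "now": "N AW1",
}


def stresses(word):
    """Return the stress digit of each syllable, or None if the word is unknown."""
    phones = MINI_DICT.get(word)
    if phones is None:
        return None
    return [int(p[-1]) for p in phones.split() if p[-1].isdigit()]


def is_iambic_pentameter(line):
    """True if the line scans as ten syllables alternating weak/strong.

    Assumed rule: monosyllables are wildcards; in longer words a weak beat
    may not carry primary stress (1) and a strong beat may not be
    unstressed (0). Lines containing unknown words are rejected.
    """
    position = 0  # index of the next syllable within the line
    for word in re.findall(r"[a-z']+", line.lower()):
        pattern = stresses(word)
        if pattern is None:
            return False
        if len(pattern) > 1:
            for offset, stress in enumerate(pattern):
                strong = (position + offset) % 2 == 1
                if strong and stress == 0:
                    return False
                if not strong and stress == 1:
                    return False
        position += len(pattern)
    return position == 10
```

Under these assumptions, "Discussing parking mitigation now" passes, while the same words as "I haven't been in twitter jail in months" reordered to start with "Twitter" fail, because "twitter" would then put its stressed syllable on a weak beat.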

Why does this not seem to have found any of Willy Shakes's lines? E.g.
posted by grobstein at 1:47 PM on March 24, 2012

I guess it sees some sliver of all tweets.
posted by grobstein at 1:47 PM on March 24, 2012

Yeah, it's sampling about 1% of all tweets. Though I did have vague thoughts of putting in a blacklist containing all the works of Shakespeare, so it can NEVER retweet them!
posted by moonmilk at 3:57 PM on March 24, 2012
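
A blacklist like the one floated above could be as simple as a set of normalized lines. The Python sketch below is a guess at one approach, not a description of anything Pentametron does; `normalize`, `build_blocklist`, and `allowed` are hypothetical names, and the two sample lines stand in for a full corpus.

```python
import re


def normalize(text):
    """Lower-case, drop everything except letters and spaces, collapse spaces."""
    letters_only = re.sub(r"[^a-z\s]", "", text.lower())
    return " ".join(letters_only.split())


def build_blocklist(known_lines):
    """Build a set of normalized lines that should never be retweeted."""
    return {normalize(line) for line in known_lines}


def allowed(tweet, blocklist):
    """True unless the whole tweet matches a blocked line after normalizing."""
    return normalize(tweet) not in blocklist


# Two well-known lines standing in for the complete works.
BLOCKLIST = build_blocklist([
    "Shall I compare thee to a summer's day?",
    "But, soft! what light through yonder window breaks?",
])
```

Because only whole-tweet matches are blocked, a quotation with an added hashtag or attribution would still get through; catching those would need substring or fuzzy matching.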

moonmilk - Oh, cool! Someone pointed this out to me because I've been working on a limerick detector recently, and obsessing over analyzing rhyme and syllable counts and meter. NLTK is awesome! Is your code up somewhere? I see you're also a Brooklynite - want to geek out over NLP over coffee sometime?
posted by 168 at 8:52 PM on March 30, 2012

Hi 168! I haven't published the code but I'm happy to share it privately. Let's do geek out - email me! (address is in profile)
posted by moonmilk at 10:14 AM on March 31, 2012

You, uh, probably noticed this, but several of the recent Pentametron RTs are spambots quoting Shakespeare.
posted by grobstein at 7:58 AM on April 26, 2012

I did notice! I'm sure it's a coincidence, but I can't help wondering if the shadowy spambot cabal is deliberately trying to bait Pentametron.
posted by moonmilk at 9:57 AM on April 26, 2012

Just revisited this, so glad to see that you got the rhyming lookups working.
posted by jonbro at 6:48 AM on May 2, 2012

posted by moonmilk at 5:25 AM on December 28, 2012

what in the—

i don't—
posted by cortex at 7:46 AM on December 28, 2012 [1 favorite]
