Neural Networks Translate Images to Poetry
January 13, 2016 10:07 AM   Subscribe

Neuralsnap generates an image caption using a model I trained (convolutional and recurrent neural networks), then uses another character-level recurrent neural net that I trained on ~40 MB of poetry to expand the caption into a poem. (In this example, generated from a Rothko painting, the red text is the direct image caption, and the rest is the poetic expansion.)
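The "expand the caption into a poem" step amounts to priming a character-level model with the caption and then sampling one character at a time. The sketch below is only an illustration of that idea, not the Neuralsnap code: it swaps the recurrent net for a simple character n-gram counter, and the names `train_char_model`, `expand_caption`, and `TOY_CORPUS` are all hypothetical. The toy corpus is the opening of Shakespeare's Sonnet 18, standing in for the ~40 MB poetry corpus.

```python
import math
import random
from collections import Counter, defaultdict

# Placeholder corpus (opening of Sonnet 18); the real model used ~40 MB of poetry.
TOY_CORPUS = (
    "Shall I compare thee to a summer's day?\n"
    "Thou art more lovely and more temperate:\n"
    "Rough winds do shake the darling buds of May,\n"
    "And summer's lease hath all too short a date\n"
)


def train_char_model(corpus, order=3):
    """Count which character follows each `order`-character context."""
    counts = defaultdict(Counter)
    for i in range(len(corpus) - order):
        counts[corpus[i:i + order]][corpus[i + order]] += 1
    return counts


def sample_next(counter, temperature, rng):
    """Sample one character; low temperature favors the most frequent choices."""
    chars = sorted(counter)
    weights = [math.exp(math.log(counter[c]) / temperature) for c in chars]
    return rng.choices(chars, weights=weights, k=1)[0]


def expand_caption(caption, counts, order=3, length=80, temperature=0.8, seed=0):
    """Prime the model with the caption, then append `length` sampled characters."""
    rng = random.Random(seed)
    text = caption
    for _ in range(length):
        context = text[-order:]
        # If the caption ends in a context the model never saw,
        # fall back to a randomly chosen known context.
        counter = counts.get(context) or counts[rng.choice(sorted(counts))]
        text += sample_next(counter, temperature, rng)
    return text


if __name__ == "__main__":
    model = train_char_model(TOY_CORPUS)
    print(expand_caption("a red wall", model, length=120))
```

The output always begins with the caption itself, which matches the layout described above: caption first, poetic expansion after. A recurrent net carries far more context than a three-character window, so this toy only shows the priming and sampling loop, not the quality of the results.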

I created Neuralsnap as a follow-up to my prior project, word.camera, building on two spectacular open-source contributions by Andrej Karpathy: NeuralTalk2 and Char-RNN, both of which run in Torch. The code I've provided is a modest Python wrapper around a few of Karpathy's scripts, and a means to experiment with a few models that I've trained on Nvidia K80 GPUs using the High Performance Computing facilities at NYU.
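For readers wondering what a Python wrapper around Torch scripts might look like, here is a minimal sketch. It is not the actual Neuralsnap source. The function names are hypothetical, and the command-line flags (`-model`, `-image_folder`, `-num_images`, `-primetext`, `-length`, `-temperature`, `-gpuid`) are taken from the NeuralTalk2 and Char-RNN READMEs as I recall them, so check them against your own checkout before relying on them.

```python
import subprocess


def caption_command(model_path, image_folder, num_images=1, gpuid=0):
    """Build the argv for NeuralTalk2's eval.lua (flags assumed from its README)."""
    return [
        "th", "eval.lua",
        "-model", model_path,
        "-image_folder", image_folder,
        "-num_images", str(num_images),
        "-gpuid", str(gpuid),
    ]


def poem_command(checkpoint_path, caption, length=500, temperature=0.8, gpuid=0):
    """Build the argv for Char-RNN's sample.lua, primed with the caption."""
    return [
        "th", "sample.lua", checkpoint_path,
        "-primetext", caption,
        "-length", str(length),
        "-temperature", str(temperature),
        "-gpuid", str(gpuid),
    ]


def run(command, cwd):
    """Run a Torch script inside its repository directory and return its stdout."""
    result = subprocess.run(
        command, cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Passing the arguments as a list rather than a shell string means a caption containing quotes or spaces reaches `sample.lua` intact as a single `-primetext` value. Setting `gpuid=-1` should select CPU mode in both scripts, again assuming the READMEs' conventions.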

I am also providing the CNN/RNN models I trained on the MS COCO dataset (for captioning images), and the RNN model I trained on a corpus of ~40 MB of poetry (for expanding the captions into poems), each under a Creative Commons license.

In my research, I am developing tools that I hope will serve to augment human creativity, and these are the first neural network models to emerge from my explorations.
Role: programmer
posted by TheMadStork (5 comments total) 5 users marked this as a favorite

I have no idea how this works and maybe I've had a little too much coffee, but this is beautiful. Do you have plans to create a user front-end?
posted by a halcyon day at 3:22 PM on January 13, 2016 [1 favorite]


Thanks, I do. But it's complicated by the fact that there's still some heavy processing involved for a single image's output.

I'm planning to make a portable camera that prints or displays the text when the user takes a photo. (I've been making them with an algorithm from the prior iteration of this project, word.camera.) Not sure what'll be next after that.
posted by TheMadStork at 6:24 PM on January 13, 2016 [1 favorite]


This is scarily convincing.
posted by grobstein at 10:00 AM on January 14, 2016


Even if the portable camera takes several seconds to 'develop' the output, it works. You could build a nice show around a series of pre-made pairs, plus a video of you walking around photographing things, plus an installation of a working camera where visitors can show objects and receive a poem.
posted by a halcyon day at 2:16 PM on January 14, 2016


I'm totally stealing "the black potato chips of sound and mouth" (from the Ikeda poem).
posted by moonmilk at 4:07 PM on January 14, 2016 [1 favorite]

