Neural Networks Translate Images to Poetry
January 13, 2016 10:07 AM
Neuralsnap generates an image caption using a model I trained (convolutional and recurrent neural networks), then uses another character-level recurrent neural net that I trained on ~40 MB of poetry to expand the caption into a poem. (In this example, generated from a Rothko painting, the red text is the direct image caption, and the rest is the poetic expansion.)
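In sketch form, the expansion step works like this: the caption is fed to the character-level network first, and the network then keeps sampling one character at a time from its predicted distribution. The stub below stands in for the trained char-RNN so the loop actually runs; all names here are illustrative, not Neuralsnap's actual code.

```python
import random

# The caption "primes" the model: it is consumed first, then characters
# are emitted one at a time. A real char-RNN conditions its next-character
# distribution on everything seen so far; this uniform stub stands in
# for it so the loop is runnable.
VOCAB = list("abcdefghijklmnopqrstuvwxyz ,.\n")

def next_char_probs(context):
    # Stand-in for the trained network's softmax over its vocabulary.
    return [1.0 / len(VOCAB)] * len(VOCAB)

def expand(caption, length=200, temperature=0.8):
    text = caption  # priming: the caption seeds the model's state
    for _ in range(length):
        probs = next_char_probs(text)
        # Temperature < 1 sharpens the distribution; > 1 flattens it.
        weights = [p ** (1.0 / temperature) for p in probs]
        text += random.choices(VOCAB, weights=weights)[0]
    return text

print(expand("a red and orange painting hanging on a wall"))
```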
I created Neuralsnap as a follow-up to my prior project, word.camera, on the shoulders of two spectacular open-source contributions by Andrej Karpathy: NeuralTalk2 and Char-RNN, both of which run in Torch. The code I've provided is a modest Python wrapper for a few of Karpathy's scripts, and a means to experiment with a few models that I've trained on Nvidia K80 GPUs using the High Performance Computing facilities at NYU.
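As a rough illustration of the wrapper idea (not Neuralsnap's actual code), the two Torch scripts can be driven from Python roughly like this. The command-line flags come from the NeuralTalk2 and char-rnn READMEs, while the directory layout, model filenames, and the vis/vis.json output location are assumptions:

```python
import json
import subprocess

def caption_image(image_dir, model):
    """Stage 1: run NeuralTalk2's eval.lua over a folder with one image."""
    subprocess.run(
        ["th", "eval.lua", "-model", model,
         "-image_folder", image_dir, "-num_images", "1"],
        cwd="neuraltalk2", check=True)
    # eval.lua writes its predictions to vis/vis.json for the demo viewer
    with open("neuraltalk2/vis/vis.json") as f:
        return json.load(f)[0]["caption"]

def expand_caption(caption, model, length=1500, temperature=0.8):
    """Stage 2: prime char-rnn's sampler with the caption and keep going."""
    out = subprocess.run(
        ["th", "sample.lua", model,
         "-primetext", caption,
         "-length", str(length),
         "-temperature", str(temperature),
         "-verbose", "0"],  # print only the sampled text, no diagnostics
        cwd="char-rnn", check=True, capture_output=True, text=True)
    return out.stdout

if __name__ == "__main__":
    # Hypothetical model filenames, for illustration only.
    caption = caption_image("input_images", "coco_cnn_rnn.t7")
    print(expand_caption(caption, "poetry_rnn.t7"))
```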
I am also providing the CNN/RNN models I trained on the MS COCO dataset (for captioning images), and the RNN model I trained on a corpus of ~40 MB of poetry (for expanding the captions into poems), each under a Creative Commons license.
In my research, I am developing tools that I hope will serve to augment human creativity, and these are the first neural network models to emerge from my explorations.
Role: programmer
Thanks, I do. But it's complicated by the fact that there's still some heavy processing involved for a single image's output.
I'm planning to make a portable camera that prints or displays the text when the user takes a photo. (I've been making them with an algorithm from the prior iteration of this project, word.camera.) Not sure what'll be next after that.
posted by TheMadStork at 6:24 PM on January 13, 2016 [1 favorite]
Even if the portable camera takes several seconds to 'develop' the output, it works. You could build a nice show around a series of pre-made pairs, plus a video of you walking around photographing things, plus an installation of a working camera where visitors can show it objects and receive a poem.
posted by a halcyon day at 2:16 PM on January 14, 2016
I'm totally stealing the black potato chips of sound and mouth (from the Ikeda poem)
posted by moonmilk at 4:07 PM on January 14, 2016 [1 favorite]