Genomedata
December 22, 2009 8:18 PM
Genomedata is a format for efficient storage of multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data while keeping a small footprint on disk. We have also developed utilities to load data into this format. A reference implementation, written in Python with C components, is available here under the GNU General Public License.
You can think of the human genome sequence like a base map, and this system allows you to easily add layers on top of that map, much as you would add terrain, temperature, air pressure, or radar layers on top of a base map of the shape of the U.S.
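To make the base-map analogy concrete, here is a minimal toy sketch in Python. It is not the Genomedata API or its on-disk format: the class name ToyGenomeLayers, its methods, and the track names are all hypothetical, and using NaN to mark positions with no data is an assumption of this sketch. It only shows the idea of stacking named numeric layers on shared genomic coordinates and reading a window from all of them at once.

```python
import math
from array import array


class ToyGenomeLayers:
    """Toy in-memory stand-in for layered genome tracks (not the Genomedata API)."""

    def __init__(self, chrom_sizes):
        # chromosome name -> length in bases (the "base map")
        self.chrom_sizes = dict(chrom_sizes)
        # track name -> {chromosome name -> array of doubles, one per base}
        self.tracks = {}

    def add_track(self, name):
        # A new layer starts as NaN everywhere, meaning "no data here"
        # (an assumption of this sketch).
        self.tracks[name] = {
            chrom: array("d", [math.nan]) * size
            for chrom, size in self.chrom_sizes.items()
        }

    def load(self, name, chrom, start, values):
        # Write a run of values into one layer, starting at a 0-based position.
        end = start + len(values)
        if start < 0 or end > self.chrom_sizes[chrom]:
            raise IndexError("values fall outside the chromosome")
        self.tracks[name][chrom][start:end] = array("d", values)

    def query(self, chrom, start, end):
        # Random access: read the same window from every layer.
        return {
            name: list(layers[chrom][start:end])
            for name, layers in self.tracks.items()
        }


genome = ToyGenomeLayers({"chr1": 1000, "chr2": 500})
genome.add_track("protein_A")
genome.add_track("protein_B")
genome.load("protein_A", "chr1", 100, [0.2, 0.9, 0.4])

window = genome.query("chr1", 99, 103)
for track_name, values in window.items():
    print(track_name, values)
```

A real implementation would keep the layers on disk and read only the requested window, which is what makes random access over hundreds of gigabytes practical; this toy version holds everything in memory.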
There are experiments that will tell you every genomic location where a particular protein is likely to be found—the protein itself, not the DNA encoding its gene. I use this software to add a layer for each protein that tells how likely it is that a protein is at a particular location. I've been developing software that finds patterns in this sort of data, which will probably be a future submission to MeFi Projects.
posted by grouse at 8:48 AM on December 23, 2009
Cool!
posted by ocherdraco at 8:54 AM on December 23, 2009
An application note on Genomedata has now been accepted to the peer-reviewed journal Bioinformatics.
posted by grouse at 10:10 AM on May 5, 2010
posted by ocherdraco at 8:22 AM on December 23, 2009