Our new verification system tests for correct behavior in bioinformatics software packages. We crafted tests to specify correct behavior when tools encounter various edge cases: potentially unexpected inputs that exemplify the limits of a file format. The approach was inspired by the Acid tests for web standards.
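To make the idea concrete, here is a minimal sketch of Acid-test-style checks. It assumes the BED format as the example and uses a hypothetical `parse_bed_line` function. It is not our actual test suite. Each case pairs an edge-case input with the behavior we'd expect from a conforming tool. The choice to reject space-separated fields is itself an assumption of this sketch.

```python
# Hypothetical sketch: Acid-test-style edge cases for a BED parser.
# The parser and the cases are illustrative assumptions, not the real test suite.
from typing import Optional, Tuple

Interval = Tuple[str, int, int]


def parse_bed_line(line: str) -> Optional[Interval]:
    """Parse one BED line into (chrom, start, end); return None for non-data lines."""
    line = line.rstrip("\r\n")
    if not line or line.startswith(("#", "track", "browser")):
        return None  # blank, comment, or header line
    fields = line.split("\t")  # this sketch requires tab-separated fields
    if len(fields) < 3:
        raise ValueError(f"too few fields: {line!r}")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError(f"invalid coordinates: {line!r}")
    return chrom, start, end


# Each case pairs an edge-case input with the expected result (or exception type).
EDGE_CASES = [
    ("chr1\t0\t1\n", ("chr1", 0, 1)),        # first base, 0-based half-open
    ("chr1\t5\t5\n", ("chr1", 5, 5)),        # zero-length interval
    ("chr1\t10\t20\r\n", ("chr1", 10, 20)),  # Windows line ending
    ("track name=test\n", None),             # track line is not data
    ("#comment\n", None),                    # comment line
    ("chr1 10 20\n", ValueError),            # spaces instead of tabs
    ("chr1\t20\t10\n", ValueError),          # end before start
]


def run_edge_cases():
    """Return a list of (input, expected, actual) for every failing case."""
    failures = []
    for text, expected in EDGE_CASES:
        try:
            result = parse_bed_line(text)
        except ValueError:
            result = ValueError
        if result != expected:
            failures.append((text, expected, result))
    return failures


if __name__ == "__main__":
    failures = run_edge_cases()
    print(f"{len(EDGE_CASES) - len(failures)}/{len(EDGE_CASES)} edge cases passed")
```

Keeping the cases as data, separate from the parser, means the same list can be run against any tool that reads the format.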
The free Polyidus software identifies the exact genomic locations where a known virus has integrated into the host genome. We developed Polyidus to identify viral integration sites using chimeric reads from any paired-end sequencing data. First, Polyidus aligns reads to the viral genome. It allows partial mapping through local alignment, and it removes any sequencing fragment in which neither read maps to the virus. Second, Polyidus aligns the selected reads to the host genome, again permitting partial mapping. Third, Polyidus identifies chimeric reads: reads that map partially to the host genome and partially to the viral genome. Fourth, for each chimeric read, Polyidus reports the start and strand of the integration site in both the host and viral genomes. Polyidus also reports the number of chimeric reads supporting each integration site.
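The chimeric-read and reporting steps can be sketched in a few lines of Python. This is a simplified illustration, not Polyidus's code. It assumes alignments have already been parsed into a hypothetical `Alignment` record. It treats an alignment as partial when at least `min_clip` read bases are left unaligned, and it uses each alignment's reference start and strand as the site key. The thresholds `min_clip=20` and `max_overlap=10` are arbitrary placeholders.

```python
# Hypothetical sketch of chimeric-read detection and per-site support counting.
# Record layout and thresholds are assumptions, not Polyidus internals.
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Alignment:
    ref: str          # reference name (host chromosome or viral genome)
    ref_start: int    # 0-based start of the alignment on the reference
    strand: str       # "+" or "-"
    query_start: int  # first aligned base within the read
    query_end: int    # one past the last aligned base within the read


def is_partial(aln: Alignment, read_length: int, min_clip: int) -> bool:
    """True if at least min_clip bases of the read are left unaligned (clipped)."""
    clipped = aln.query_start + (read_length - aln.query_end)
    return clipped >= min_clip


def is_chimeric(host: Optional[Alignment], virus: Optional[Alignment],
                read_length: int, min_clip: int = 20, max_overlap: int = 10) -> bool:
    """A read is chimeric if it maps partially to both genomes, at different read positions."""
    if host is None or virus is None:
        return False
    if not (is_partial(host, read_length, min_clip)
            and is_partial(virus, read_length, min_clip)):
        return False
    overlap = (min(host.query_end, virus.query_end)
               - max(host.query_start, virus.query_start))
    return overlap <= max_overlap


SiteKey = Tuple[str, int, str, str, int, str]


def count_integration_sites(
        reads: Iterable[Tuple[int, Optional[Alignment], Optional[Alignment]]]
) -> Dict[SiteKey, int]:
    """reads yields (read_length, host_alignment, virus_alignment).
    Returns supporting-read counts keyed by host and viral start and strand."""
    support: Counter = Counter()
    for read_length, host, virus in reads:
        if is_chimeric(host, virus, read_length):
            key = (host.ref, host.ref_start, host.strand,
                   virus.ref, virus.ref_start, virus.strand)
            support[key] += 1
    return dict(support)
```

Keying on both genomes' coordinates means two reads count toward the same site only if they agree on the host and the viral side of the junction.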
Our new method predicts the binding locations of transcription factors and other chromatin factors in a given cell type. It uses new kinds of data, such as chromatin factor binding in other cell types, and it learns the association between gene expression patterns and chromatin factor binding patterns. We've made free software available, along with a track hub that loads our predictions for 36 chromatin factors in 33 human tissue types into the UCSC Genome Browser.
The second major version of Segway, our semi-automated genome annotation software, is now available! It now runs on any Linux system and no longer needs a fancy compute cluster with Grid Engine or LSF. You can also install it and all its dependencies with a single Bioconda command:
conda install -c bioconda segway
This version also includes fancy new modeling methods like mixtures of Gaussians. It turns out a single Gaussian isn't the best distribution for genomic signal data. Who knew?
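Why a mixture can beat a single Gaussian is easy to see on bimodal data. The sketch below fits a one-dimensional Gaussian mixture by expectation-maximization (EM) in plain Python and compares its log-likelihood with that of a single Gaussian. It is a generic illustration, not Segway's implementation. The synthetic data (background near 0, enriched signal near 8) and the quantile-based initialization are assumptions made for the demonstration.

```python
# Generic sketch: 1-D Gaussian mixture fitted by EM, compared with one Gaussian.
# Not Segway code; data and initialization are illustrative assumptions.
import math
import random
from typing import List, Sequence, Tuple

Params = Tuple[List[float], List[float], List[float]]  # (weights, means, variances)


def normal_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def component_logs(x: float, params: Params) -> List[float]:
    """log(weight_j) + log N(x | mean_j, var_j) for each component j."""
    weights, means, variances = params
    return [math.log(w) + normal_logpdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def log_likelihood(data: Sequence[float], params: Params) -> float:
    return sum(logsumexp(component_logs(x, params)) for x in data)


def fit_gmm(data: Sequence[float], k: int, iters: int = 100,
            min_var: float = 1e-6) -> Params:
    """Fit a k-component mixture by EM; k=1 reduces to the maximum-likelihood Gaussian."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic initialization: means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(data) / n
    variances = [max(sum((x - mu) ** 2 for x in data) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            logs = component_logs(x, (weights, means, variances))
            total = logsumexp(logs)
            resp.append([math.exp(l - total) for l in logs])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-10)  # guard against an empty component
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                min_var)
    return weights, means, variances


if __name__ == "__main__":
    rng = random.Random(0)
    data = ([rng.gauss(0, 1) for _ in range(300)]
            + [rng.gauss(8, 1) for _ in range(100)])
    for k in (1, 2):
        print(k, round(log_likelihood(data, fit_gmm(data, k)), 1))
```

On data like this the two-component fit has a much higher log-likelihood, because one Gaussian must stretch its variance to cover both modes.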
Many biological experiments use DNA sequencing as a readout. We can often map the sequenced DNA back to a specific region of the genome. Sometimes, however, we can't, for example when the same sequence occurs in more than one place in the genome. Genomic data are less reliable in those regions. My lab has developed software that makes it easy to identify these regions. We also developed a new method that finds these regions in the context of bisulfite sequencing, a technique used to determine where DNA is chemically modified.
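One simple way to think about this: a position is uniquely mappable for reads of length k if the k-mer starting there occurs only once in the genome, counting both strands. In bisulfite sequencing, unmethylated cytosines read as thymines, so the same check can be run after converting C to T in silico. The sketch below is a simplified illustration under those assumptions (exact matches only, complete conversion, k-mers containing N excluded). It is not our software's algorithm, and `unique_kmer_starts` is a hypothetical name.

```python
# Simplified sketch of k-mer uniqueness, optionally in bisulfite-converted space.
# Assumptions: exact matching, complete C->T conversion, N-containing k-mers excluded.
from collections import Counter
from typing import List

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def bisulfite_convert(seq: str) -> str:
    """Simulate complete bisulfite conversion: every C reads as T."""
    return seq.replace("C", "T")


def unique_kmer_starts(genome: str, k: int, bisulfite: bool = False) -> List[bool]:
    """For each start position, True if the k-mer starting there occurs exactly once
    among k-mers from both strands (after C->T conversion if bisulfite=True)."""
    genome = genome.upper()
    n = len(genome) - k + 1
    if n <= 0:
        return []
    convert = bisulfite_convert if bisulfite else (lambda s: s)
    raw = [genome[i:i + k] for i in range(n)]
    forward = [convert(s) for s in raw]
    # Reads from the reverse strand are converted on that strand, after complementing.
    reverse = [convert(reverse_complement(s)) for s in raw]
    counts = Counter(forward) + Counter(reverse)
    # k-mers containing N (unknown bases) are never treated as uniquely mappable.
    return [counts[f] == 1 and "N" not in s for f, s in zip(forward, raw)]
```

Conversion collapses C and T into one letter, so k-mers that were distinct in the ordinary genome can become identical, which is why bisulfite data have their own, smaller set of uniquely mappable regions.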
A database of chemical modifications of DNA found in nature, with detailed information on each one. It includes links to the scientific literature describing these modifications and how to sequence them.
BEDOPS is a suite of tools to address common questions raised in genomic studies, mostly with regard to overlap and proximity relationships between data sets. BEDOPS aims to be scalable, flexible and performant, facilitating the efficient and accurate analysis and management of large-scale genomic data.
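Interval tools such as BEDOPS expect input sorted by chromosome and start position, and sorting is what allows an overlap query to run as a single pass over both inputs. Below is a minimal sweep-line sketch of one such query: report the elements of A that overlap any element of B. It assumes 0-based, half-open BED coordinates and inputs sorted by (chrom, start, end). It illustrates the idea and is not BEDOPS code; `element_of` is a hypothetical name.

```python
# Sweep-line sketch: elements of A overlapping any element of B.
# Not BEDOPS code; assumes both lists are sorted by (chrom, start, end).
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def element_of(a_sorted: List[Interval], b_sorted: List[Interval]) -> List[Interval]:
    """Return intervals of A that overlap at least one interval of B."""
    result: List[Interval] = []
    active: List[Interval] = []  # B intervals that may still overlap upcoming A intervals
    j = 0
    for a in a_sorted:
        chrom, start, end = a
        # Later A intervals start no earlier, so B intervals ending here can be dropped.
        active = [b for b in active if b[0] == chrom and b[2] > start]
        # Bring in B intervals that start before this A interval ends.
        while j < len(b_sorted) and (b_sorted[j][0], b_sorted[j][1]) < (chrom, end):
            b = b_sorted[j]
            if b[0] == chrom and b[2] > start:
                active.append(b)
            j += 1
        if any(overlaps(a, b) for b in active):
            result.append(a)
    return result
```

Each B interval enters the active list at most once, so with few simultaneously active intervals the pass is close to linear in the input size.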
The free Segway software package contains a novel method for analyzing multiple tracks of functional genomics data. Our method uses a dynamic Bayesian network (DBN) model, which enables it to analyze the entire genome at 1-bp resolution even in the face of heterogeneous patterns of missing data. This method is the first application of DBN techniques to genome-scale data and the first genomic segmentation method designed for use with the maximum-resolution data available from ChIP-seq experiments without downsampling. Our software has extensive documentation and was designed from the outset with external users in mind. Researchers at other universities and institutes have already installed and used Segway for their own projects.
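A hidden Markov model is one of the simplest dynamic Bayesian networks, so a small HMM segmenter gives a feel for the approach. The sketch below labels each position with its most probable segment label using the Viterbi algorithm, with one Gaussian emission per track. It handles missing data by leaving out the emission term for any missing track, which marginalizes that track out under this model. This is a toy illustration, not Segway's model or code; the labels, means, shared variance, and transition probabilities are all assumptions.

```python
# Toy HMM segmenter (Viterbi) with per-track Gaussian emissions and missing data.
# Not Segway's model; all parameters are supplied by the caller as assumptions.
import math
from typing import List, Optional, Sequence

Obs = Sequence[Optional[float]]  # one value per track; None marks missing data


def emission_logprob(obs: Obs, means: Sequence[float], var: float) -> float:
    """Sum of per-track Gaussian log densities, skipping missing tracks.
    Skipping a track marginalizes it out, since its density integrates to 1."""
    total = 0.0
    for x, mu in zip(obs, means):
        if x is not None:
            total += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total


def viterbi(observations: List[Obs], log_init: List[float],
            log_trans: List[List[float]], label_means: List[List[float]],
            var: float = 1.0) -> List[int]:
    """Most probable label path; log_trans[p][s] is log P(label s | previous label p)."""
    if not observations:
        return []
    k = len(log_init)
    scores = [log_init[s] + emission_logprob(observations[0], label_means[s], var)
              for s in range(k)]
    backptrs: List[List[int]] = []
    for obs in observations[1:]:
        new_scores, ptrs = [], []
        for s in range(k):
            best_prev = max(range(k), key=lambda p: scores[p] + log_trans[p][s])
            ptrs.append(best_prev)
            new_scores.append(scores[best_prev] + log_trans[best_prev][s]
                              + emission_logprob(obs, label_means[s], var))
        scores = new_scores
        backptrs.append(ptrs)
    # Trace back from the best final label.
    path = [max(range(k), key=lambda s: scores[s])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]
```

At a position where every track is missing, the label is decided by the transition probabilities alone, so it follows the surrounding segment rather than forcing a guess from absent data.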