Wednesday, March 26, 2014

Predicting Full-Length Ribosomal Gene Sequences

Introduction

The 16S ribosomal gene has been used extensively in biology to assess relatedness between species.  This gene has regions of DNA that are highly conserved among almost all living organisms and other regions that have high DNA sequence variability.  The conserved regions are ideal for building PCR primers that can amplify DNA from many different organisms.  The variable regions that are amplified using these conserved primers can be used to determine the relatedness between two or more organisms.  Closely related species typically have much more similar DNA sequences than distantly related species.

Typically PCR is used to amplify a portion of the 16S ribosomal gene for sequencing.  However, whole genome sequences or whole metagenome sequences also contain short DNA reads originating from the 16S gene.  These reads can be separated from the pool of other genomic reads and assembled into the entire 16S gene.  EMIRGE (Miller, et al. 2011) is an algorithm for reconstructing full-length ribosomal genes from short read DNA sequences.


EMIRGE

EMIRGE reconstructs full-length ribosomal genes from short read DNA sequences.  It first maps reads to a database of known 16S genes such as the SILVA or Greengenes database.  After the initial mapping, EMIRGE estimates the probability that a given read was generated from the reference to which it mapped.  Based on these probability estimates, reference sequences are changed to reflect the 16S sequences that are likely to be represented by the set of reads.  Reads are then remapped to the adjusted 16S sequence database and the process is repeated until an equilibrium is achieved.  The resulting database of 16S sequences reflects the likely 16S genes represented by the input set of short reads.
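
The map / estimate / adjust / remap loop described above can be sketched in a few lines of Python.  This is a toy illustration only and not EMIRGE's actual code: exhaustive ungapped matching stands in for bowtie, and the error rate, the support threshold, and every function name are assumptions made up for this example.  Reads are assumed to be no longer than the references.

```python
from collections import defaultdict

ERROR_RATE = 0.01   # assumed per-base sequencing error rate (illustrative)
MIN_SUPPORT = 0.5   # assumed minimum summed read weight needed to change a base


def best_hit(read, ref):
    """Best ungapped placement of read on ref, returned as (offset, mismatches)."""
    best = None
    for off in range(len(ref) - len(read) + 1):
        mm = sum(a != b for a, b in zip(read, ref[off:off + len(read)]))
        if best is None or mm < best[1]:
            best = (off, mm)
    return best


def toy_em(reads, refs, max_iter=20, tol=1e-6):
    """Iterate mapping, probability estimation, and reference adjustment."""
    refs = list(refs)
    priors = [1.0 / len(refs)] * len(refs)  # start with equal abundances
    for _ in range(max_iter):
        # Mapping step: place every read on every current reference.
        hits = [[best_hit(r, ref) for ref in refs] for r in reads]

        # E-step: probability that each read came from each reference.
        post = []
        for r, row in zip(reads, hits):
            w = [p * (1 - ERROR_RATE) ** (len(r) - mm) * (ERROR_RATE / 3) ** mm
                 for p, (_, mm) in zip(priors, row)]
            total = sum(w)
            post.append([x / total for x in w])

        # M-step, part 1: re-estimate the relative abundance of each reference.
        new_priors = [sum(p[j] for p in post) / len(reads)
                      for j in range(len(refs))]

        # M-step, part 2: adjust each reference by a probability-weighted vote.
        new_refs = []
        for j, ref in enumerate(refs):
            votes = [defaultdict(float) for _ in ref]
            for r, row, p in zip(reads, hits, post):
                off = row[j][0]
                for k, base in enumerate(r):
                    votes[off + k][base] += p[j]
            new_refs.append("".join(
                max(v, key=v.get) if v and max(v.values()) >= MIN_SUPPORT else old
                for v, old in zip(votes, ref)))

        converged = (new_refs == refs and
                     max(abs(a - b) for a, b in zip(priors, new_priors)) < tol)
        refs, priors = new_refs, new_priors
        if converged:
            break
    return refs, priors


if __name__ == "__main__":
    sample_gene = "AAAACCTCGGGG"  # made-up "true" sequence for the demo
    demo_reads = [sample_gene[i:i + 6] for i in range(7)]
    print(toy_em(demo_reads, ["AAAACCCCGGGG", "GTGTGTGTGTGT"]))
```

In the demo, the first reference differs from the made-up sample sequence at one base, so the weighted vote is what pulls it toward the sequence the reads actually support.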

This software was primarily built to infer the set of 16S genes from whole metagenome reads.  However, it can also be used to infer the single 16S gene from genomic sequences from a single isolate.  Full-length 16S genes are difficult to assemble even when only reads from a single genome are considered.

In the Dangl lab, we use EMIRGE to predict full-length 16S genes from reads generated from a single bacterial genome.  An example of the EMIRGE command we use is:

emirge.py my_output_dir -1 fwd_reads.fastq -2 rev_reads.fastq -b SSURef_NR99_115_tax_silva_formated -f SSURef_NR99_115_tax_silva_formated.fasta -i 600 -s 1000 -l 250

The descriptions of each parameter are below:

my_output_dir: the output and working directory for EMIRGE.

-1: the forward or single-end genomic sequencing reads

-2: the reverse end genomic sequencing reads

-b: the bowtie index of the 16S sequence database

-f: the fasta file of the 16S sequence database

-i: insert size of paired-end reads

-s: standard deviation of insert size for paired-end reads

-l: max length of reads
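
If the command needs to be run for many isolates, it can be assembled in a script.  The sketch below is only one way to do that: the helper name build_emirge_command is made up for this example, it uses only the flags described above, and it returns the argument list without running anything.

```python
import shlex


def build_emirge_command(out_dir, fwd_reads, rev_reads, bowtie_index,
                         fasta_db, insert_size, insert_sd, max_read_len):
    """Assemble the emirge.py call as an argument list (hypothetical helper)."""
    return [
        "emirge.py", out_dir,
        "-1", fwd_reads,           # forward or single-end reads
        "-2", rev_reads,           # reverse reads
        "-b", bowtie_index,        # bowtie index of the 16S database
        "-f", fasta_db,            # fasta file of the 16S database
        "-i", str(insert_size),    # insert size of paired-end reads
        "-s", str(insert_sd),      # standard deviation of insert size
        "-l", str(max_read_len),   # max length of reads
    ]


if __name__ == "__main__":
    cmd = build_emirge_command(
        "my_output_dir", "fwd_reads.fastq", "rev_reads.fastq",
        "SSURef_NR99_115_tax_silva_formated",
        "SSURef_NR99_115_tax_silva_formated.fasta",
        600, 1000, 250)
    # Print the command for inspection; shlex.join needs Python 3.8 or newer.
    print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) once EMIRGE and bowtie are installed and on the PATH.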


Other Details

EMIRGE uses bowtie to map reads to the reference database.  To build the bowtie index of the reference database, the following command was used:

bowtie-build SSURef_NR99_115_tax_silva_formated.fasta SSURef_NR99_115_tax_silva_formated

Also, the database downloaded from SILVA had to be reformatted using this Perl script.  This script requires BioUtils.

Monday, March 24, 2014

2014 JGI Users Meeting Notes

Here are some notes from a few of the speakers at the JGI Users meeting in California.  In general the speakers were fantastic.  Some general themes of the conference include:  single-cell genomics, synthetic biology, fungal metagenomics, and metabolomics.  A personal take-home message for me was the need for creative biological solutions to common issues that the human race currently faces or will face in the near future.

Martin Ackermann (opening keynote) – A Single Cell Perspective on Bacterial Interactions
- Focused on phenotypic heterogeneity, when identical cells have different functional profiles.
- Most genes don’t show clonal variation, but for the ones that do, how is that heterogeneity important for the community?
- Salmonella is an example of phenotypic heterogeneity.  One cell type causes inflammation and one uses the inflammation response to reproduce and cause full infection.
- Different cell types survive better in different environmental conditions.
- Another example of phenotypic heterogeneity is in alpine lakes where there are generally large amounts of ammonium that bacteria can use as a nitrogen source.  However, there are some cells that fix their own nitrogen in the event that ammonium runs out.
- Preliminary data show that neighboring cells are more likely to be of the same cell type.

Mary Berbee – Pectinases link Early Fungal Evolution to the Land Plant Lineage
- Sequenced early divergent fungal groups.
- The relationship between the early branching groups is still poorly resolved.
- Showed some cool trees where she had overlaid two trees to highlight the differences between them.  I would like to know what software she used to do this.
- Her trees were based on whole genomes but I’m not sure how she built them.

Rytas Vilgalys – Understanding the Forest Microbiome:  A Fungal Perspective
- Oak and pine share many fungi, while Populus hosts a more distinct set of fungi.
- Soils from the same region are likely to share the same fungi.
- Populus trees of different genotypes do not assemble different fungal communities, at least not nearly as different as those from different regions.
- They have cultured ~1,800 fungal isolates.  These isolates represent only ~15% of the isolates that are likely Populus endophytes.
- Many fungal isolates stimulate plant growth.
- They are re-inoculating these isolates to confirm they are endophytic.
- Mortierella elongata is an isolate that stimulates plant growth in Populus and Arabidopsis thaliana.
- M. elongata also harbors bacterial symbionts (Glomeribacter), which are known to affect lipid fermentation and are sister to Burkholderia.  These bacteria cannot be cultured, possibly because they rely so heavily on the host for nutrients.
- M. elongata migrates to the roots.
- Different genes are expressed in M. elongata grown in culture than those sampled from the rhizosphere.
- Different genes are expressed in M. elongata inoculated on different hosts.

Eddy Rubin
- Bacterial genes are typically ~900bp.
- In a couple of sequenced genomes they saw average bacterial gene lengths as low as 200bp.  However, when they adjusted the codon table by reassigning one of the stop codons to code for glycine, the predicted genes had an average length of 900bp!  Some bacteria use different codon translations!
- Natalia Ivanova is a gene annotation specialist they consulted for help in this analysis.
- They found evidence of recoding in lots of other bacteria by looking at sequenced isolates.
- Didn’t find evidence of recoding in archaea.
- They showed that phages that use a different codon profile can circumvent the host cell machinery so that it matches their own codon profile!
- CRISPR regions in bacterial cells often contain phage elements that correspond to different codon profiles.  This is further evidence that phages with different codon profiles can infect cells with canonical codon profiles. 
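
The gene-length effect in the recoding notes above can be illustrated with a small Python sketch.  It is not the analysis from the talk: it only counts stop-free codon runs in a single reading frame of a made-up sequence, and TGA is picked as the reassigned stop codon purely as an assumption for illustration (the notes do not record which stop codon was involved).

```python
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})


def stop_free_runs(seq, stops):
    """Lengths (in codons) of stop-free stretches in reading frame 0 of seq."""
    lengths, current = [], 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            if current:
                lengths.append(current)
            current = 0
        else:
            current += 1
    if current:
        lengths.append(current)
    return lengths


if __name__ == "__main__":
    toy_seq = "ATGGCATGAGCTTGAAAATAA"  # made-up sequence: ATG GCA TGA GCT TGA AAA TAA
    recoded_stops = STANDARD_STOPS - {"TGA"}  # assumption: TGA read as a sense codon
    print("standard table:", stop_free_runs(toy_seq, STANDARD_STOPS))
    print("recoded table: ", stop_free_runs(toy_seq, recoded_stops))
```

With the standard table the toy sequence breaks into several short stretches; once the assumed codon is no longer treated as a stop, the same sequence reads as one longer stretch, which is the pattern described in the talk.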

Nicole Dubilier – Metagenomic and Metaproteomic Analyses of Symbioses between Bacteria and Gutless Marine Worms
- Bacteria can obtain more energy from hydrogen than from methane (Nature 2011).
- They discovered key genes able to metabolize hydrogen.
- The second half of the talk was about gutless worms living in shallow water.  They are completely dependent on bacterial symbionts for feeding and waste excretion.
- There are species specific symbionts.
- Her proteomics data yield more obvious features than comparative genomics.  As an example she showed how one isolate contains a protein that performs the function of 3 different proteins in the canonical Calvin Cycle.  DNA sequencing confirmed this observation, but it would have been a “needle-in-a-haystack” for a comparative genomics project.  This work was published in PNAS.

Erin Nuccio – Mapping Soil Carbon from Cradle to Grave:  Using Omics and Isotope Analyses to Identify the Microbial Blueprint for Root-enhanced Decomposition of Organic Matter.
- The general question is how microbes transform and stabilize root carbon in soil.
- Carbon can affect nitrogen rates.
- Plants fix carbon for microbes in the soil.
- Over time the rhizosphere gradually deviates from bulk soil in carbon levels (time points at 3, 6, 9, and 12 weeks).
- Some preliminary data show that bacteria prefer carbon excreted by the plant as an energy source over nitrogen litter material (i.e., material artificially added to the system).

Michael Fischbach – A Gene-to-Molecule Approach to the Discovery and Characterization of Natural Products
- Discovers natural gene products.  By gene products I think he means functional protein units.
- Undiscovered gene products are often coded by clusters of genes.
- Has some type of algorithm to computationally discover these clusters that may produce unknown gene products.
- Lots of his most interesting clusters were found in human-associated microbes.
- Discovered several oligosaccharide clusters.  These bacteria were very difficult to work with but these clusters and the functions they provide to the human host are of high interest.
- The general observation of this study was that microbes in our gut are making products for which we have no idea what they are or how they function.  It’s like taking several prescription drugs for your entire life!  We need to figure out what is going on in there. 

Kelly Matzen – Genetic Control of Mosquitoes
- In the 1950s DDT was used to control mosquito populations and, in turn, mosquito-borne diseases such as dengue.  However, DDT is known to be detrimental to the environment in several ways and is therefore being used much less.  We are starting to see diseases like dengue make a comeback in places like Florida and of course in places like Central and South America.
- Right now the most effective control is pesticides. 
- They are releasing massive numbers of sterile male mosquitoes to control (i.e., reduce) mosquito populations.  This technique was used successfully many years ago in the United States to control populations of other insects.
- This technique seems to be working in the small field studies they have been conducting. 
- There is some pushback from legislators, but in general it seems like a good solution.

Cameron Coates – Characterization of Cyanobacterial Hydrocarbon Composition and Distribution of Biosynthetic Pathways
- Cyanobacteria produce over 30% of the earth’s oxygen.
- They are very diverse and live in all sorts of habitats on earth.
- They can produce hydrocarbons, which are relevant for use as biofuels.  However, they don’t produce large amounts of hydrocarbons.
- They looked at the evolution of cyanobacteria hydrocarbon pathways.  There are two main pathways.  Several clades have both pathways suggesting a large amount of horizontal gene transfer. 
- This work was published in PLOS ONE.

June Medford – Making Better Plants:  Synthetic Approaches in Plant Engineering
- They created a biological input/output system.  This allows for some external factor to cause a reaction that can be observed in the plant.
- They use a periplasmic binding protein as the input signal because it can quickly diffuse through the cell wall and is then translocated to the nucleus to transcriptionally regulate some response.
- They can theoretically use this system as a flag for pollutants or other dangers that we currently use very expensive technology to detect.
- They are currently developing a system to detect TNT where the response signal of the plant is to turn white.  This system can detect traces of TNT 10x smaller than a dog can!  There are still some kinks to work through, like response time, but it looks like a very promising system.  This idea has countless unexplored applications!

Kankshita Swaminathan – Genome Biology of Miscanthus
- Miscanthus is in the same clade as sugar cane, corn, and sorghum.  These plants have been amenable to breeding.
- The genomic sequence of sorghum is very close to Miscanthus except that Miscanthus has had a whole genome duplication event.
- In the winter all the nutrients migrate to the rhizome leaving only the stalk above ground.  The stalk is the most important element for biofuels and can be harvested without significantly depleting soil nutrients.

Annalee Newitz (closing keynote) – How Humans Will Survive a Mass Extinction
- Humans have a very good chance of surviving a mass extinction because we are very adaptable.  However, our focus should be how we can preserve the diversity of the earth as it is now.
- A mass extinction is when greater than 70% of the earth's species are killed.
- Five mass extinctions have occurred in the history of the earth.  Perhaps the largest was caused by cyanobacteria because they released large amounts of oxygen into the atmosphere.  Close to 90% of species became extinct as a result.
- Climate change is inevitable regardless of whether or not humans are the cause.
- The questions we should be asking are:  how can we respond to these changing climates, and what can we do to preserve the world as we know it?
- Space travel seems like an important step in human survival.