Saturday, November 2, 2013

Difference Between Paired-End and Mate-Pair Reads

In DNA sequencing lingo the terms "paired-end" (PE) and "mate-pair" (MP) are frequently used interchangeably.  While the principles behind PE and MP reads are similar, there are inherent differences that are crucial to understand.

The similarities between PE and MP reads include:
  • Reads come in pairs
  • Pairs come from the ends of the same DNA strand

The differences between PE and MP reads include:
  • Library preparation protocols -- In short, PE protocols attach an adapter, SP1, to the forward end and another adapter, SP2, to the reverse end.  The first sequencing step targets SP1 to generate the forward read.  The second sequencing step targets SP2 to generate the reverse read.  For MP protocols, longer DNA fragments are circularized using biotinylated adapters.  During circularization, the two ends of the DNA strand are joined with the biotinylated adapter between them.  The circularized DNA is then sheared, and the fragments containing the biotinylated adapter, which connects the original strand ends, are pulled down.  These fragments can then be sequenced using the same SP1-SP2 adapter protocol used in PE sequencing.
  • Insert size -- The insert size refers to the distance between the pairs.  PE reads generally have a smaller insert size (< 1 kb) than MP reads (2-5 kb).  The difference in insert size stems from the difference in protocols.  If the reads are long relative to the insert, it is possible for the two PE reads to overlap at their ends.
  • Read orientation -- PE reads come in forward-reverse (FR) orientation, where read 1 is the forward read and read 2 is the reverse read.  Because of the circularization step, MP reads come in reverse-forward (RF) orientation, where read 1 is the reverse read and read 2 is the forward read.  These differences are especially important to understand for assembly algorithms and projects.
  • Read trimming -- Theoretically, PE reads require no trimming before sequence analysis.  However, in practice it is recommended that low-quality portions of the read be trimmed using tools like Sickle.  In contrast, MP reads require trimming because biotinylated adapters are often present in the middle of one or both MP reads.  Adapter trimming software generally removes the adapter and any sequence beyond it.  Software options for adapter trimming include cutadapt, Trimmomatic, and FastqMcf.  For more reading on adapter trimming see this post and this post.
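To make the last three differences concrete, here is a minimal Python sketch.  It is only an illustration under stated assumptions, not how any particular tool works: all function names are hypothetical, the adapter in the usage example is a made-up placeholder rather than a real Illumina sequence, the overlap check assumes "insert size" means the full fragment length from outer end to outer end, and the trimming uses exact string matching (real trimmers such as cutadapt tolerate mismatches).

```python
# Illustrative helpers only; names and the adapter below are hypothetical.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mate_pair_to_fr(read1, read2):
    """Convert an RF (mate-pair) read pair to FR orientation.

    Reverse-complementing both reads is one common way to do this
    for software that expects FR input.
    """
    return reverse_complement(read1), reverse_complement(read2)


def reads_overlap(fragment_length, read_length):
    """True if two reads of read_length, read inward from both ends of a
    fragment, would overlap.

    Assumes fragment_length is measured outer end to outer end.
    """
    return 2 * read_length > fragment_length


def trim_at_adapter(read, adapter):
    """Drop the adapter and everything after it (exact match only)."""
    idx = read.find(adapter)
    return read if idx == -1 else read[:idx]


if __name__ == "__main__":
    # 250 bp reads from a 300 bp fragment overlap; 100 bp reads from 500 bp do not.
    print(reads_overlap(300, 250), reads_overlap(500, 100))
    # Placeholder adapter "CCCGGG" sitting in the middle of a read.
    print(trim_at_adapter("AAAACCCGGGTTTT", "CCCGGG"))
    print(mate_pair_to_fr("AACG", "TTGA"))
```

For real data you would run one of the trimming tools named above rather than exact matching, and check your assembler's documentation for the orientation it expects before flipping any reads.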

For more details on the sequencing protocols see the Illumina documentation for PE and MP sequencing.


10 comments:

  1. Dear Sir,
    Thank you very much for your valuable notes. I understand clearly.
    Thanks once again.
    Sincerely,
    Raman

  2. Respected Scott Yourstone,
    Thanks for your clear and understandable notes.

  3. Thank you, this was very helpful!

  4. Thank you - very clear and very helpful. Thanks for taking the time to put this together.

  5. Thank you very much, I also clearly understand the concept.

  6. I have one question: how can I decide whether I have sequenced the complete genome of my organism? I mean, what are the criteria for deciding whether the assembled genome is complete or incomplete?

    Replies
    1. Pandit,

      That is a great question. The short answer is that the genome of your organism will likely never be fully complete. However, even an incomplete genome can still be usable. There are some metrics and techniques for measuring the completeness of a genome. Because I get asked this question a lot, I decided to write a short post about it.

      http://scottmyourstone.blogspot.com/2014/09/when-is-my-genome-finished.html

      Hopefully that helps!

  7. Sir! Thank you very much :)

  8. Thanks for a clear explication. Gracias!
