
Principle of removing duplicate reads in Picard

blueskys123 (Taiwan), Member, Posts: 2

Picard's MarkDuplicates is a useful tool for marking and removing duplicate reads. However, after reading the Picard documentation (https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates), I still have some questions about how duplicate reads are filtered out:

  1. The documented duplicate types are "library/PCR-generated duplicates (LB)" and "sequencing-platform artifact duplicates (SQ)". How does Picard tell LB and SQ duplicates apart from the reads?

  2. With otherwise default settings and REMOVE_DUPLICATES=true, which type of duplicate reads is removed: SQ, LB, or both?

  3. Suppose reads A, B, and C are considered duplicates and their quality scores are equal. If these reads map to the same position in the genome, which of them will Picard remove?
    And if they map to different positions in the genome, which of them will Picard remove?

  4. Following on from the previous question: if the quality scores of reads A, B, and C are not equal, which of them will Picard remove?
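
To make questions 3 and 4 concrete, here is a minimal Python sketch of the behavior as I understand it from the documentation: reads that share the same reference, 5' position, and strand form one duplicate set, and the read with the highest sum of base qualities is kept while the rest are flagged. The `Read` type, the `mark_duplicates` function, the quality cutoff of 15, and the "first read wins on a tie" rule are my own assumptions for illustration, not Picard's actual code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Read:
    name: str
    chrom: str
    pos: int               # 5' alignment position (simplified, single-end)
    strand: str
    base_quals: List[int]


def score(read: Read, min_qual: int = 15) -> int:
    # Assumed scoring: sum of base qualities at or above a cutoff.
    return sum(q for q in read.base_quals if q >= min_qual)


def mark_duplicates(reads: List[Read]) -> Set[str]:
    """Return the names of reads that would be flagged as duplicates."""
    groups = defaultdict(list)
    for r in reads:
        groups[(r.chrom, r.pos, r.strand)].append(r)

    duplicates: Set[str] = set()
    for group in groups.values():
        # max() returns the first maximal element, so ties go to the
        # read that appears first in the input (an assumption).
        best = max(group, key=score)
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


# Question 3: same position, equal qualities.
equal = [Read(n, "chr1", 100, "+", [30, 30]) for n in "ABC"]
print(sorted(mark_duplicates(equal)))      # ['B', 'C']

# Question 4: same position, unequal qualities.
unequal = [
    Read("A", "chr1", 100, "+", [20, 20]),
    Read("B", "chr1", 100, "+", [40, 40]),
    Read("C", "chr1", 100, "+", [30, 30]),
]
print(sorted(mark_duplicates(unequal)))    # ['A', 'C']
```

Under this model, reads that map to different positions never land in the same set, so none of them would be flagged. Whether Picard really behaves this way in the tie case is exactly what I would like to confirm.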

Thanks
