
Output of the diagnostics and quality control tools is not clear

I looked for GATK tools that describe sequence quality, such as CollectVariantCallingMetrics, EstimateLibraryComplexity (Picard), and CollectAlignmentSummaryMetrics (Picard). Are these tools still useful with GATK now? It seems nobody pays much attention to them.

Q1: Output of CollectAlignmentSummaryMetrics (Picard). The link gives some explanation, but the result still confuses me. For example, how do you define the row names FIRST_OF_PAIR and SECOND_OF_PAIR? Doesn't FIRST_OF_PAIR mean the reads that map to the positive strand? If so, why is the strand balance not 100%? The documentation defines it as:
Strand balance - reads mapped to positive strand / total mapped reads
I also do not understand what PF_INDEL_RATE means.
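
To show how I am reading the strand balance definition, here is a minimal sketch of that ratio. The function name and the read counts are made up for illustration; they are not taken from Picard's code or from my data.

```python
def strand_balance(positive_strand_reads, total_mapped_reads):
    """Fraction of mapped reads that align to the positive strand.

    Follows the documented definition:
    reads mapped to positive strand / total mapped reads.
    """
    if total_mapped_reads == 0:
        # No mapped reads: return 0.0 rather than dividing by zero.
        return 0.0
    return positive_strand_reads / total_mapped_reads


# Hypothetical counts for a single category row.
example = strand_balance(5020, 10000)
print(f"{example:.3f}")
```

With this reading, a value of 1.0 would require every mapped read in the row to be on the positive strand, which is what I expected for FIRST_OF_PAIR.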

Q2: This command also uses an --ADAPTER_SEQUENCE argument. Where does its default value come from, and is it suitable for all Illumina platforms?

Q3: The EstimateLibraryComplexity (Picard) documentation says:

> The algorithm attempts to detect optical duplicates separately from PCR duplicates and excludes these in the calculation of library size.

So does this tool not account for PCR duplicates? Also, can you explain the two columns?

Q4: CollectVariantCallingMetrics. My input VCF has 219 variants, but the tool reports only three, and I do not know why. I also do not know how it calculates the values in the other columns.

