Output of the diagnostic and quality-control tools is not clear?

I looked for GATK tools that assess sequence quality, such as CollectVariantCallingMetrics, EstimateLibraryComplexity (Picard) and CollectAlignmentSummaryMetrics (Picard). Are these tools still useful with current GATK? It seems nobody pays attention to them.

Q1: The output of CollectAlignmentSummaryMetrics (Picard) confuses me, even with the explanation in the linked documentation. For example, how are the row names FIRST_OF_PAIR and SECOND_OF_PAIR defined? Doesn't FIRST_OF_PAIR mean reads mapped to the positive strand? If so, why isn't the strand balance 100% for that row, given that strand balance is defined as:
Strand balance = reads mapped to the positive strand / total mapped reads
I also don't know what PF_INDEL_RATE means.
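For reference, here is a minimal sketch of the strand-balance formula above computed from SAM flags. It assumes paired-end reads and uses the SAM specification's flag bits: 0x40 marks read 1 of a pair and 0x10 marks a read aligned to the reverse strand, so the two are independent bits. The category names mirror the CollectAlignmentSummaryMetrics rows, but the filtering (dropping unmapped, non-PF, secondary and supplementary records) is our simplification, not necessarily Picard's exact logic, and the flag values in the usage example are hypothetical.

```python
# SAM flag bits (from the SAM format specification)
PAIRED = 0x1
UNMAPPED = 0x4
REVERSE = 0x10
FIRST_OF_PAIR = 0x40
SECONDARY = 0x100
QC_FAIL = 0x200  # "not passing filters", i.e. not PF
SUPPLEMENTARY = 0x800
SKIP = UNMAPPED | QC_FAIL | SECONDARY | SUPPLEMENTARY


def strand_balance(flags):
    """Fraction of aligned PF reads on the positive strand, per read category.

    Simplified sketch: only paired reads are counted; Picard's filtering may differ.
    """
    # [positive-strand count, total count] per category; PAIR aggregates both reads
    counts = {"FIRST_OF_PAIR": [0, 0], "SECOND_OF_PAIR": [0, 0], "PAIR": [0, 0]}
    for flag in flags:
        if not flag & PAIRED or flag & SKIP:
            continue
        # The 0x40 bit marks read 1 of the pair; it says nothing about the strand
        category = "FIRST_OF_PAIR" if flag & FIRST_OF_PAIR else "SECOND_OF_PAIR"
        # The 0x10 bit is what marks a read aligned to the reverse (negative) strand
        positive = not flag & REVERSE
        for cat in (category, "PAIR"):
            counts[cat][0] += positive
            counts[cat][1] += 1
    return {cat: pos / total for cat, (pos, total) in counts.items() if total}
```

With the common proper-pair flags `strand_balance([99, 147, 83, 163])`, every category comes out at 0.5: flags 99 and 83 are both first-of-pair reads, one aligned forward and one reverse.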

Q2: This command also takes --ADAPTER_SEQUENCE. Where does the default value come from, and is it suitable for all Illumina platforms?

Q3: The EstimateLibraryComplexity (Picard) documentation says:
> The algorithm attempts to detect optical duplicates separately from PCR duplicates and excludes these in the calculation of library size.
Does this mean the tool does not account for PCR duplicates? Also, could you explain those two columns?
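For context, Picard's documentation for its library-size estimate cites the Lander-Waterman equation C/X = 1 - exp(-N/X), where X is the number of distinct molecules in the library, N the number of read pairs and C the number of distinct fragments observed. A minimal sketch solving it for X by bisection follows. Our assumption (not stated in the post) is that N is read pairs minus optical duplicates and C is read pairs minus all duplicates, which would mean PCR duplicates are exactly what drive the estimate while optical duplicates are removed from it; the numbers in the usage note are hypothetical.

```python
import math


def estimate_library_size(read_pairs, unique_read_pairs, iterations=40):
    """Solve C/X = 1 - exp(-N/X) for X by bisection (Lander-Waterman)."""
    n, c = read_pairs, unique_read_pairs
    if c <= 0 or n <= c:
        return None  # no duplication observed: the equation has no finite solution

    def f(x):
        # Zero when x is the library size consistent with n pairs and c distinct fragments
        return c / x - 1 + math.exp(-n / x)

    lo, hi = 1.0, 100.0  # search X as a multiple of c; f(c) > 0 always
    while f(hi * c) > 0:
        hi *= 10  # widen the bracket until f changes sign
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid * c) > 0:
            lo = mid  # root lies above mid
        else:
            hi = mid
    return c * (lo + hi) / 2
```

For example, `estimate_library_size(1000, 900)` returns a value of roughly 4,700 molecules: 10% duplication among 1,000 pairs implies a library several times larger than the 900 distinct fragments seen.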

Q4: My input VCF for CollectVariantCallingMetrics has 219 variants, but the output reports only three, and I don't know how the values in the other columns are calculated.
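To cross-check the counts, a small independent tally of the VCF can help. The sketch below counts records, filtered records, biallelic SNPs and indels, and the transition/transversion (Ti/Tv) ratio. It is not Picard's implementation: the dictionary keys are our own names, and CollectVariantCallingMetrics may apply its own filters and split counts (for example, known versus novel relative to the dbSNP file it takes as input), so its numbers need not match.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def tally_vcf(lines):
    """Count records, SNPs, indels and Ti/Tv in VCF text (simplified cross-check)."""
    stats = {"records": 0, "filtered": 0, "snps": 0, "indels": 0, "other": 0,
             "transitions": 0, "transversions": 0}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts, filt = fields[3].upper(), fields[4].upper().split(","), fields[6]
        stats["records"] += 1
        if filt not in ("PASS", "."):
            stats["filtered"] += 1  # filtered records are still classified below
        if len(alts) != 1 or alts[0].startswith("<") or alts[0] in ("*", "."):
            stats["other"] += 1  # multiallelic, symbolic or missing ALT: not classified
            continue
        alt = alts[0]
        if len(ref) == 1 and len(alt) == 1:
            stats["snps"] += 1
            kind = "transitions" if (ref, alt) in TRANSITIONS else "transversions"
            stats[kind] += 1
        elif len(ref) != len(alt):
            stats["indels"] += 1
        else:
            stats["other"] += 1  # multi-nucleotide substitutions
    tv = stats["transversions"]
    stats["titv"] = stats["transitions"] / tv if tv else None
    return stats
```

Usage: `with open("input.vcf") as fh: print(tally_vcf(fh))` (the file name is a placeholder). Comparing `records` with 219 and `filtered` with the reported totals may show where the difference comes from.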