Does GenotypeGVCFs calculate ReadPosRankSum and MQRankSum correctly?
On 2000 samples I have run HaplotypeCaller, CombineGVCFs, GenotypeGVCFs and VariantRecalibrator (HC, CGVCFs, GGVCFs and VR, all version 3.2).
For the GenotypeGVCFs step I used the current default annotations:
InbreedingCoeff FisherStrand QualByDepth ChromosomeCounts GenotypeSummaries
And these non-default annotations:
When running VariantRecalibrator and plotting each of the dimensions, I noticed that all of the non-default annotations take on discrete values; see the bottom of this post. Is it no longer recommended to use ReadPosRankSum and MQRankSum for VR? Should I calculate these annotations with VariantAnnotator instead of GenotypeGVCFs? If I have to run VariantAnnotator, should I then run it separately for SNPs and INDELs? Cf. my previous question about annotations being different when applied to BOTH and to SNPs:
zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v ^# | cut -f8 | tr ";" "\n" | grep ReadPosRankSum | sort | uniq -c | awk '$1>20000' | sort -k1n,1
  41649 ReadPosRankSum=0.731
  41760 ReadPosRankSum=0.550
  46305 ReadPosRankSum=0.720
  47060 ReadPosRankSum=0.00
  87348 ReadPosRankSum=0.406
 105254 ReadPosRankSum=0.736
 116426 ReadPosRankSum=0.727
 164855 ReadPosRankSum=0.358

zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v ^# | cut -f8 | tr ";" "\n" | grep "MQ=" | sort | uniq -c | awk '$1>5000' | sort -k1n,1
   5802 MQ=57.05
   8382 MQ=29.00
   8525 MQ=56.62
  10069 MQ=51.77
  10574 MQ=53.95
  10682 MQ=47.12
  10818 MQ=56.04
  11553 MQ=55.21
 802603 MQ=60.00

zcat out_GenotypeGVCFs/chrom20.vcf.gz | grep -v ^# | cut -f8 | tr ";" "\n" | grep MQRankSum | sort | uniq -c | awk '$1>20000' | sort -k1n,1
  21511 MQRankSum=-7.360e-01
  27222 MQRankSum=0.322
  33699 MQRankSum=0.550
  34481 MQRankSum=0.731
  37603 MQRankSum=0.720
  60729 MQRankSum=0.00
  76031 MQRankSum=0.406
  85812 MQRankSum=0.736
  98519 MQRankSum=0.727
 186092 MQRankSum=0.358
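For anyone who wants to repeat this check on their own callset, the three pipelines above could be wrapped in one small helper. This is only a sketch: the function name `count_info_values` and its argument order are my own invention, not part of GATK. It assumes a gzip-compressed VCF with the INFO field in column 8, and it anchors the key at the start of each INFO entry so that a key such as `MQ` cannot match another key that merely contains the same text.

```shell
# Hypothetical helper (not a GATK tool): tally how often each value of one
# INFO annotation occurs in a gzip-compressed VCF.
# Usage: count_info_values FILE.vcf.gz KEY MIN_COUNT
count_info_values() {
    gzip -dc "$1" |                # decompress to stdout (same as zcat)
        grep -v '^#' |             # drop header lines
        cut -f8 |                  # keep the INFO column
        tr ';' '\n' |              # one KEY=VALUE entry per line
        grep -- "^$2=" |           # keep only the requested key
        sort | uniq -c |           # count identical KEY=VALUE entries
        awk -v min="$3" '$1 > min' |   # keep values seen more than MIN_COUNT times
        sort -k1n,1                # order by count, ascending
}
```

With that defined, the MQRankSum tally in this post would correspond to `count_info_values out_GenotypeGVCFs/chrom20.vcf.gz MQRankSum 20000`.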