
SelectVariants modifies VCF entries, keeping only the base calls intact.

I am using GATK v3.5. I used SelectVariants as shown below to remove 11 samples from a VCF file:

java -jar GenomeAnalysisTK.jar -T SelectVariants -R reference.fasta -V all_samples.vcf -xl_sn sample90 -xl_sn sample91 -xl_sn sample92 -xl_sn samples93 -xl_sn sample94 -xl_sn sample95 -xl_sn sample96 -xl_sn sample97 -xl_sn sample98 -xl_sn sample99 -xl_sn sample100 -o subset_samples.vcf

However, when I compare the SNPs between the original VCF and the subset VCF, the genotype calls (0/0, 0/1, 1/1) remain the same, but the AD, DP, GQ, and PL values change to the point of nonsense. For example, a sample with an AD of 0,45 is called 0/1 (heterozygous). That is the correct call in the original file, where the AD is 53,24, but based on an AD of 0,45 it should be 1/1. As long as the genotype calls themselves are correct, this shouldn’t cause any downstream errors, but I can’t be sure this is the case. Has anyone else had this error? The first record below is from the original VCF and the second is from the subset VCF.

KB222897.1 10810 . C T 113425.71 . AC=102;AF=0.359;AN=284;BaseQRankSum=0.698;ClippingRankSum=0.029;DP=8522;ExcessHet=87.0598;FS=0.000;InbreedingCoeff=-0.4686;MLEAC=102;MLEAF=0.359;MQ=41.97;MQRankSum=-1.540e-01;QD=18.08;ReadPosRankSum=0.132;SOR=0.682 GT:AD:DP:GQ:PL 0/1:53,24:77:99:705,0,1819 0/1:39,16:55:99:470,0,1231 0/1:29,21:50:99:589,0,973

KB222897.1 10810 SKB222897.1_10810 C T . PASS AC=96;AF=0.366;AN=262;BaseQRankSum=0.698;ClippingRankSum=0.029;DP=8027;ExcessHet=87.0598;FS=0.000;InbreedingCoeff=-0.4686;MQ=41.97;MQRankSum=-1.540e-01;QD=18.08;ReadPosRankSum=0.132;SOR=0.682;DP=6516 GT:AD:DP:GQ:PL 0/1:0,45:45:99:255,135,0 0/1:0,48:48:99:255,144,0 0/1:0,44:44:99:255,132,0
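The mismatch described above can be checked mechanically. The following is a minimal Python sketch, not part of GATK, that compares a sample's GT against the genotype with the lowest PL. The function names `pl_genotypes` and `check_sample` are hypothetical, and the sketch assumes called, diploid genotypes with AD and PL present (no `.` values).

```python
def pl_genotypes(n_alleles):
    """Diploid genotypes in VCF PL order: 0/0, 0/1, 1/1, 0/2, ..."""
    # The VCF spec places genotype j/k (j <= k) at index k*(k+1)/2 + j.
    return [f"{j}/{k}" for k in range(n_alleles) for j in range(k + 1)]


def check_sample(format_keys, sample_field):
    """Compare the GT of one sample with the genotype favoured by its PL.

    Assumes a called, diploid sample with AD and PL present.
    """
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    # Normalise phased/unphased and allele order, e.g. "1|0" -> "0/1".
    alleles = sorted(int(a) for a in values["GT"].replace("|", "/").split("/"))
    gt = "/".join(str(a) for a in alleles)
    ad = [int(x) for x in values["AD"].split(",")]
    pl = [int(x) for x in values["PL"].split(",")]
    # The most likely genotype is the one with the smallest PL (normally 0).
    best = pl_genotypes(len(ad))[pl.index(min(pl))]
    return {"GT": gt, "AD": ad, "PL_best": best, "consistent": gt == best}


if __name__ == "__main__":
    fmt = "GT:AD:DP:GQ:PL"
    # First sample from the original record, then from the subset record.
    print(check_sample(fmt, "0/1:53,24:77:99:705,0,1819"))
    # {'GT': '0/1', 'AD': [53, 24], 'PL_best': '0/1', 'consistent': True}
    print(check_sample(fmt, "0/1:0,45:45:99:255,135,0"))
    # {'GT': '0/1', 'AD': [0, 45], 'PL_best': '1/1', 'consistent': False}
```

For the two records shown, the original sample's PL agrees with its 0/1 call, while the subset sample's PL of 255,135,0 points to 1/1, matching the 0,45 AD rather than the GT.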

