SelectVariants modifies VCF entries, keeping only the genotype calls intact.
I am using GATK v3.5. I ran SelectVariants as shown below to remove 11 samples from a VCF file:
java -jar GenomeAnalysisTK.jar -T SelectVariants -R reference.fasta -V all_samples.vcf -xl_sn sample90 -xl_sn sample91 -xl_sn sample92 -xl_sn samples93 -xl_sn sample94 -xl_sn sample95 -xl_sn sample96 -xl_sn sample97 -xl_sn sample98 -xl_sn sample99 -xl_sn sample100 -o subset_samples.vcf
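If the excluded samples follow a regular naming pattern, the eleven -xl_sn arguments can be generated instead of typed by hand. That avoids slips such as "samples93" in the posted command, assuming the sample is really named sample93. Below is a minimal Python sketch that only assembles the argument list. The helper name build_select_variants_cmd and the sample90 to sample100 naming scheme are assumptions for illustration. The flags are exactly the ones used in the command above.

```python
import shlex


def build_select_variants_cmd(reference, variants, output, exclude_samples,
                              jar="GenomeAnalysisTK.jar"):
    """Hypothetical helper: build the argv list for a GATK 3 SelectVariants
    run that drops the given samples (same flags as the posted command)."""
    cmd = ["java", "-jar", jar, "-T", "SelectVariants",
           "-R", reference, "-V", variants]
    for name in exclude_samples:
        # one -xl_sn flag per excluded sample
        cmd += ["-xl_sn", name]
    cmd += ["-o", output]
    return cmd


# Assumed naming scheme: sample90 ... sample100 (11 samples in total).
excluded = [f"sample{i}" for i in range(90, 101)]
cmd = build_select_variants_cmd("reference.fasta", "all_samples.vcf",
                                "subset_samples.vcf", excluded)
print(shlex.join(cmd))
# To actually launch it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list (rather than one long string) to subprocess.run keeps each sample name as a separate argument, so no shell quoting is involved.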
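However, when I compare the SNPs between the original VCF and the subset VCF, the 0/0, 0/1, and 1/1 genotype calls stay the same, but the AD, DP, GQ, and PL values change to the point of nonsense. For example, the first sample below has an AD of 0,45 in the subset file yet is still called 0/1 (heterozygous). That is the correct call according to the original file, where its AD is 53,24, but with 0 reference and 45 alternate reads it should be 1/1. As long as the genotype calls themselves are correct, this shouldn't cause any downstream errors, but I can't be sure that is the case. Has anyone else run into this?

Here is the same site (KB222897.1, position 10810) in the original VCF (first record) and in the subset VCF (second record); only three samples are shown: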
KB222897.1 10810 . C T 113425.71 . AC=102;AF=0.359;AN=284;BaseQRankSum=0.698;ClippingRankSum=0.029;DP=8522;ExcessHet=87.0598;FS=0.000;InbreedingCoeff=-0.4686;MLEAC=102;MLEAF=0.359;MQ=41.97;MQRankSum=-1.540e-01;QD=18.08;ReadPosRankSum=0.132;SOR=0.682 GT:AD:DP:GQ:PL 0/1:53,24:77:99:705,0,1819 0/1:39,16:55:99:470,0,1231 0/1:29,21:50:99:589,0,973
KB222897.1 10810 SKB222897.1_10810 C T . PASS AC=96;AF=0.366;AN=262;BaseQRankSum=0.698;ClippingRankSum=0.029;DP=8027;ExcessHet=87.0598;FS=0.000;InbreedingCoeff=-0.4686;MQ=41.97;MQRankSum=-1.540e-01;QD=18.08;ReadPosRankSum=0.132;SOR=0.682;DP=6516 GT:AD:DP:GQ:PL 0/1:0,45:45:99:255,135,0 0/1:0,48:48:99:255,144,0 0/1:0,44:44:99:255,132,0
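One way to check whether the genotype calls can still be trusted is to test each sample's GT against its own AD and PL. The Python sketch below does that. It assumes diploid, biallelic sites with no missing (".") values. It relies on the VCF specification's PL ordering for such sites (0/0, 0/1, 1/1). The function names are hypothetical. In the two records above, the INFO column is replaced with "." for brevity, while the FORMAT and sample columns are copied unchanged.

```python
def parse_samples(vcf_line):
    """Split one VCF data line into a list of {FORMAT key: value} dicts."""
    # Real VCFs are tab-delimited; split() also accepts the spaces in the post.
    cols = vcf_line.split()
    keys = cols[8].split(":")
    return [dict(zip(keys, s.split(":"))) for s in cols[9:]]


# Diploid, biallelic PL order per the VCF spec: 0/0, 0/1, 1/1
PL_GENOTYPES = ["0/0", "0/1", "1/1"]


def check_sample(sample):
    """Return the problems found for one sample (empty list if consistent).
    Assumes biallelic AD/PL and no missing values."""
    problems = []
    # Normalize phased/unordered genotypes, e.g. "1|0" -> "0/1".
    gt = "/".join(sorted(sample["GT"].replace("|", "/").split("/")))
    ref, alt = (int(x) for x in sample["AD"].split(","))
    if gt == "0/1" and (ref == 0 or alt == 0):
        problems.append(f"het GT with AD {ref},{alt}")
    pl = [int(x) for x in sample["PL"].split(",")]
    best = PL_GENOTYPES[pl.index(min(pl))]  # lowest PL = most likely genotype
    if best != gt:
        problems.append(f"PL favors {best}, GT is {gt}")
    return problems


# The two records from the post, with INFO shortened to "."
original = ("KB222897.1 10810 . C T 113425.71 . . GT:AD:DP:GQ:PL "
            "0/1:53,24:77:99:705,0,1819 0/1:39,16:55:99:470,0,1231 "
            "0/1:29,21:50:99:589,0,973")
subset = ("KB222897.1 10810 SKB222897.1_10810 C T . PASS . GT:AD:DP:GQ:PL "
          "0/1:0,45:45:99:255,135,0 0/1:0,48:48:99:255,144,0 "
          "0/1:0,44:44:99:255,132,0")

for label, line in [("original", original), ("subset", subset)]:
    for i, s in enumerate(parse_samples(line), start=1):
        for p in check_sample(s):
            print(f"{label} sample {i}: {p}")
```

On these records the check flags all three subset samples twice (heterozygous GT with no reference reads, and a PL whose minimum points to 1/1), and reports nothing for the original record.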