No 0/0 calls in GATK SNP output, only 0/1, 1/1 and 1/2

Hi All,

I don't see any "0/0" calls; the final "*_final_snp.vcf" file has only "0/1", "1/1" and "1/2". I also loaded the two samples I ran into IGV and saw a few odd cases: one sample is called 0/1 and the second is called 1/1, when I expected both to be called 0/1, since both samples have C and the ref has T at that locus. Any idea?

java -jar $HOME/bin/exome/GenomeAnalysisTK.jar --version

Line executed for single sample:

java$TEMP -jar -Xmx100g $HOME/bin/exome/GenomeAnalysisTK.jar -T PrintReads -R ref -I CJM1_realigned_reads_R.bam -BQSR CJM1_recal_data.table -o CJM1_recal_reads.bam -nct 27

java$TEMP -jar -Xmx100g $HOME/bin/exome/GenomeAnalysisTK.jar -T HaplotypeCaller -R $GENOME -I CJM1_recal_reads.bam -o CJM1_raw_variants_recal.vcf -nct 27

java$TEMP -jar -Xmx100g $HOME/bin/exome/GenomeAnalysisTK.jar -T SelectVariants -R $GENOME -V CJM1_raw_variants_recal.vcf -selectType SNP -o CJM1_raw_snps_recal.vcf

java$TEMP -jar -Xmx100g $HOME/bin/exome/GenomeAnalysisTK.jar -T VariantFiltration -R $GENOME -V CJM1_raw_snps_recal.vcf --filterExpression 'QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0 || SOR > 4.0' --filterName "basic_snp_filter" -o CJM1_filtered_snps_final.vcf

Example lines:

chr1 265086 . A G 159.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.870;ClippingRankSum=0.000;DP=28;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=5.71;ReadPosRankSum=0.000;SOR=0.582 GT:AD:DP:GQ:PL 0/1:23,5:28:99:188,0,939

chr1 7620221 . A G 63.28 . AC=2;AF=1.00;AN=2;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=48.71;QD=21.09;SOR=1.179 GT:AD:DP:GQ:PL 1/1:0,3:3:9:91,9,0

chr10 18622015 . G A,T 570.77 . AC=1,1;AF=0.500,0.500;AN=2;BaseQRankSum=-1.146;ClippingRankSum=0.000;DP=17;ExcessHet=3.0103;FS=3.274;MLEAC=1,1;MLEAF=0.500,0.500;MQ=61.88;MQRankSum=0.617;QD=33.57;ReadPosRankSum=-1.436;SOR=0.595 GT:AD:DP:GQ:PL 1/2:1,12,4:17:99:599,107,156,441,0,517




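    There are no "0/0" calls because "0/0" means the genotype is the same as the reference (homozygous reference), and such sites are not written to the VCF by default. Otherwise, there would be tons of "0/0"s in your VCF file.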

    Even if both samples have Cs in their reads and the ref has T at the locus, the genotype is diploid, so reads like "CCCCTTTT" would be called 0/1 and "CCCCCCCC" would be called 1/1. I guess "CCCCCCCT" would also be called 1/1, because the single "T" is probably a sequencing error and "CCCCCCCT" should basically be "CCCCCCCC". It depends on GATK's cutoff.
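To check this on your own files, a quick tally of the genotype column plus the per-allele read depths (AD) can help. This is only a rough sketch: the function names `gt_counts` and `gt_depths` are made up for illustration, and it assumes a single-sample VCF where GT and AD are the first two FORMAT keys, as in the example lines in the question.

```shell
# Tally genotype calls in the sample column (column 10) of a single-sample VCF.
# Assumes GT is the first FORMAT key, as in the example lines in the question.
gt_counts() {
    grep -v '^#' "$1" |
        awk '{ split($10, f, ":"); n[f[1]]++ } END { for (g in n) print g, n[g] }' |
        sort
}

# Print chromosome, position, genotype and AD (ref,alt read depths) per record.
# Assumes AD is the second FORMAT key.
gt_depths() {
    grep -v '^#' "$1" | awk '{ split($10, f, ":"); print $1, $2, f[1], f[2] }'
}
```

For example, `gt_counts CJM1_filtered_snps_final.vcf` should list only 0/1, 1/1 and 1/2 if no hom-ref sites were emitted, and `gt_depths` on the same file shows how many reads supported each allele at the sites you looked at in IGV.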

    Hi Dereje,

    If you want the sites that are hom-ref as well, you will need to run HaplotypeCaller with -ERC GVCF, then GenotypeGVCFs with -allSites.
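The two steps suggested above might look roughly like this. Treat it as a sketch, not a tested pipeline: it assumes GATK 3.x (the `-T` style used in the question), `$GENOME` set as in the original commands, and a recalibrated BAM named `<sample>_recal_reads.bam`. The helper name `call_all_sites`, the `GATK_JAR` variable and the output file names are hypothetical.

```shell
# Hypothetical helper: genotype every site, including hom-ref (0/0) ones.
# Assumes GATK 3.x and that $GENOME points at the reference FASTA.
GATK_JAR="${GATK_JAR:-$HOME/bin/exome/GenomeAnalysisTK.jar}"

call_all_sites() {
    sample="$1"
    # Step 1: per-sample GVCF that also records reference confidence.
    java -jar "$GATK_JAR" -T HaplotypeCaller -R "$GENOME" \
        -I "${sample}_recal_reads.bam" -ERC GVCF -o "${sample}.g.vcf" &&
    # Step 2: genotype the GVCF and keep the non-variant sites as well.
    java -jar "$GATK_JAR" -T GenotypeGVCFs -R "$GENOME" \
        -V "${sample}.g.vcf" -allSites -o "${sample}_all_sites.vcf"
}
```

Calling `call_all_sites CJM1` would then write `CJM1_all_sites.vcf`, which should contain 0/0 records too. Expect that file to be much larger than the variant-only VCF.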


