If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
AD in VCF doesn't match BAM
I'm using GATK to call variants in my RNA-Seq data. I'm noticing something strange, perhaps someone can help? For a number of sites the VCF is reporting things I cannot replicate from BAMs. How can I recover the reads that contribute to a variant call? Here is an example for 1 site in 1 sample, but I've observed this at many sites/samples:
$ grep 235068463 file.vcf chr1 235068463 . T C 1795.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-3.530;ClippingRankSum=-0.535;DP=60;FS=7.844;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.401;QD=29.93;ReadPosRankSum=3.557 GT:AD:DP:GQ:PL 0/1:5,55:60:44:1824,0,44
60 reads, 5 T, 55 C.
But loading the bam in IGV, I do not see any T reads. Similarly:
$ samtools view -uh file.md.realn.bam chr1:235068463-235068463 |samtools mpileup - |grep 235068463 [mpileup] 1 samples in 1 input files <mpileup> Set max per-file depth to 8000 chr1 235068463 N 60 cCCccccCCCcccccCcccccccccCCCccCCCCCcCcccccCCCcCcCCccCCCCccCC >[email protected]@>A>[email protected]:@@[email protected]?>A?CBBAAAABA
There are just 60 C's at that location. How do I decide what the genotype here is? C/C or C/T ?
For methodology I'm using gatk/3.2.0. I tried using HC from gatk/3.3.1 and got the same result. The bam and vcf files come from the final two lines:
-2 pass STAR