Heads up:
We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Attention:
We will be out of the office for a Broad Institute event from Dec 10th to Dec 11th 2019. We will be back to monitor the GATK forum on Dec 12th 2019. In the meantime we encourage you to help out other community members with their queries.
Thank you for your patience!

AD in VCF doesn't match BAM

kippkipp New York CityMember

Hello,

I'm using GATK to call variants in my RNA-Seq data. I'm noticing something strange, perhaps someone can help? For a number of sites the VCF is reporting things I cannot replicate from BAMs. How can I recover the reads that contribute to a variant call? Here is an example for 1 site in 1 sample, but I've observed this at many sites/samples:

$ grep 235068463 file.vcf 
chr1    235068463   .   T   C   1795.77 .   AC=1;AF=0.500;AN=2;BaseQRankSum=-3.530;ClippingRankSum=-0.535;DP=60;FS=7.844;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.401;QD=29.93;ReadPosRankSum=3.557   GT:AD:DP:GQ:PL  0/1:5,55:60:44:1824,0,44

60 reads, 5 T, 55 C.
But loading the bam in IGV, I do not see any T reads. Similarly:

$ samtools view -uh file.md.realn.bam chr1:235068463-235068463 |samtools mpileup - |grep 235068463
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
chr1    235068463   N   60  cCCccccCCCcccccCcccccccccCCCccCCCCCcCcccccCCCcCcCCccCCCCccCC    >[email protected]@>A>[email protected]:@@[email protected]?>A?CBBAAAABA

There are just 60 C's at that location. How do I decide what the genotype here is? C/C or C/T ?

For methodology I'm using gatk/3.2.0. I tried using HC from gatk/3.3.1 and got the same result. The bam and vcf files come from the final two lines:
-2 pass STAR
-Mark Dups
-SplitNCigarReads
-RealignerTargetCreator
-IndelRealigner
-BaseRecalibrator
-PrintReads
-MergeSamFiles.jar
-Mark Dups
-RealignerTargetCreator
-IndelRealigner
-HaplotyeCaller

Thanks,
Kipp

Comments

Sign In or Register to comment.