Heads up:
We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our forum members to be more involved: jump in and help out your fellow researchers with their questions. The GATK forum is a community forum, and helping each other with GATK tools and research is the cornerstone of our success as a genomics research community. We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Attention:
We will be out of the office for a Broad Institute event from Dec 10th to Dec 11th 2019. We will be back to monitor the GATK forum on Dec 12th 2019. In the meantime we encourage you to help out other community members with their queries.
Thank you for your patience!
more reads in bamout file

Hi GATK team,
I'm using GATK v3.4 to call SNPs from RNA-Seq data. I generated a bamout file and viewed it in IGV. I expected to see the same number of reads or fewer in the bamout file, but, strangely, I saw many positions with more reads in the bamout than in the original BAM. Do you know why this is happening? Thank you in advance!
Best Answers
-
tommycarstensen United Kingdom ✭✭✭
@maquezaihou Perhaps you have more reads in the bamout file at selected positions because of the local realignment carried out by HC.
-
Sheila Broad Institute admin
@maquezaihou
Hi,
I think these two articles will help you.
https://www.broadinstitute.org/gatk/guide/article?id=4146
https://www.broadinstitute.org/gatk/guide/article?id=6005
-Sheila
Answers
Oh, I think I know why now. Those extra reads in the bamout file are actually artificial haplotypes! But still, the number of reads in the bamout file is not consistent with the read counts in the VCF file.
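For anyone who wants to check this on their own data, here is a minimal Python sketch that counts reads in SAM text (for example, the output of samtools view on a region of the bamout) while skipping the artificial haplotypes. It assumes the artificial haplotypes carry an RG tag whose value contains "ArtificialHaplotype". That marker is an assumption, so check the @RG lines in your own bamout header and adjust rg_marker if needed. The function names are illustrative only.

```python
def is_artificial_haplotype(sam_line, rg_marker="ArtificialHaplotype"):
    """Return True if the record's RG tag value contains rg_marker.

    rg_marker is an assumption about how the bamout labels artificial
    haplotypes; verify it against the @RG header lines of your file.
    """
    fields = sam_line.rstrip("\n").split("\t")
    # In the SAM format, optional tags start at the 12th column (index 11).
    for tag in fields[11:]:
        if tag.startswith("RG:Z:") and rg_marker in tag[5:]:
            return True
    return False


def count_real_reads(sam_lines, rg_marker="ArtificialHaplotype"):
    """Count alignment records that are not flagged as artificial haplotypes."""
    total = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        if not is_artificial_haplotype(line, rg_marker):
            total += 1
    return total
```

Running count_real_reads on the same region of the original BAM and of the bamout gives two numbers that can be compared directly, without the artificial haplotypes inflating the bamout count.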
@maquezaihou Perhaps you have more reads in the bamout file at selected positions because of the local realignment carried out by HC.
@maquezaihou
Hi,
I think these two articles will help you.
https://www.broadinstitute.org/gatk/guide/article?id=4146
https://www.broadinstitute.org/gatk/guide/article?id=6005
-Sheila
@Sheila @tommycarstensen Thank you very much for the replies! I appreciate it! After reading the articles, I have another question. If I understand correctly, HC does realignment and reassembly, so the bamout file should represent the best alignment, and HC reports only informative reads in the AD field. My question is: if some reads are not informative for calling a haplotype, are they still useful for evaluating allelic imbalance? Or, put more simply, can I use the read counts from the AD field in a statistical test for allele-specific expression? Thank you!
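To make the question concrete, here is a rough Python sketch of the kind of test meant here: pull the ref and alt counts out of the AD field of one VCF sample column, then run an exact two-sided binomial test against an expected ref fraction of 0.5. The choice of a binomial test and of p = 0.5 is an assumption for illustration, not something GATK prescribes, and the function names are made up. It assumes a biallelic site with AD present.

```python
from math import exp, lgamma, log


def parse_ad(format_keys, sample_values):
    """Return (ref_count, alt_count) from a VCF FORMAT string and sample column.

    Assumes a biallelic site where AD is present, e.g. AD = "12,7".
    """
    keys = format_keys.split(":")
    values = sample_values.split(":")
    counts = [int(x) for x in values[keys.index("AD")].split(",")]
    return counts[0], counts[1]


def binomial_two_sided(ref, alt, p=0.5):
    """Exact two-sided binomial p-value for seeing `ref` ref reads out of ref+alt.

    p is the expected ref fraction under no allelic imbalance (0 < p < 1).
    Sums the probabilities of all outcomes no more likely than the observed one.
    """
    n = ref + alt
    # Work in log space so deep coverage does not underflow.
    pmf = [
        exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))
        for k in range(n + 1)
    ]
    observed = pmf[ref]
    # Small relative tolerance so symmetric outcomes count as ties.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-7)))
```

For example, parse_ad("GT:AD:DP", "0/1:12,7:19") returns (12, 7), and those two numbers can be passed straight to binomial_two_sided. The same function works on ref/alt counts from any other source.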
@maquezaihou
Hi,
You can use ASEReadCounter to get allele counts. https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_rnaseq_ASEReadCounter.php
Adding -drf DuplicateRead allows duplicate reads to be counted.
-Sheila
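As a rough illustration of the suggestion above, this Python sketch assembles an ASEReadCounter command line with -drf DuplicateRead added. The flags follow the GATK 3 usage example for the tool as best I recall it (-T, -R, -I, -sites, -o, -U ALLOW_N_CIGAR_READS), so please check them against the tool documentation for your version. The helper name and all file names are hypothetical.

```python
def ase_read_counter_cmd(jar, reference, bam, sites_vcf, out_table,
                         count_duplicates=True):
    """Build the argument list for a GATK 3 ASEReadCounter run.

    All paths are supplied by the caller; nothing is executed here.
    """
    cmd = [
        "java", "-jar", jar,
        "-T", "ASEReadCounter",
        "-R", reference,
        "-I", bam,
        "-sites", sites_vcf,          # VCF of sites to count alleles at
        "-o", out_table,
        "-U", "ALLOW_N_CIGAR_READS",  # RNA-Seq reads often contain N in the CIGAR
    ]
    if count_duplicates:
        # Disable the duplicate read filter so duplicate reads are counted.
        cmd += ["-drf", "DuplicateRead"]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, printed for inspection rather than executed.
    print(" ".join(ase_read_counter_cmd(
        "GenomeAnalysisTK.jar", "reference.fasta", "input.bam",
        "sites.vcf", "ase_counts.csv")))
```

The list can be handed to subprocess.run once the paths point at real files. Building it as a list keeps each flag and value separate, which avoids shell quoting problems.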
@Sheila Thanks for the reply. I tried ASEReadCounter a while ago and found that the read counts it reports are very different from those reported by HC. I think HC may apply more filters. So for allele count analysis, do you recommend using ASEReadCounter rather than HC?
@maquezaihou
Ah, yes. You are correct that the reassembly step may change the read counts and that HaplotypeCaller applies more filters than ASEReadCounter. However, we still recommend using ASEReadCounter with the DuplicateReadFilter disabled to get allele counts. You can always add the extra read filters that HaplotypeCaller applies by passing each of them with -rf. https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_engine_CommandLineGATK.php#--read_filter
-Sheila