Differentiating between uncalled reference alleles and sites with insufficient read depth

ChrisPatterson, Epilepsy Genomics Center, Member

We are working with both WES and genome scan data on large families. Our genome scan has helped us narrow our search to a 2 Mb region that is shared IBD among all the Affecteds in one of our families. However, we've used both the UnifiedGenotyper and HaplotypeCaller to call VCFs in this region, and the search fails to find any variants that segregate with the Affecteds we sequenced. Unfortunately, I can't tell whether a locus is missing from the VCF because it was called as homozygous reference or because the samples didn't have enough reads aligned to that site. Is there any way to call a VCF file on a base-by-base level, so that we can check Genotype, Allele Depth, Genotype Quality, etc. for each of our samples within this range, whether the allele called is reference or alternate?

The DepthOfCoverage tool seems to have some of this functionality, but only provides a summary of the read depth at a base-by-base resolution. DiagnoseTargets only allows for analysis over aggregate intervals.

I apologize if this is a stupid question. I've been working with GATK for about a month now, so I am still very new.
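For readers with the same problem, here is a minimal Python sketch of the check being asked for. It assumes you already have a VCF with one record per base (for example the BP_RESOLUTION output mentioned later in this thread) and it labels each position in a range, per sample, as variant, homozygous reference, insufficient depth, or not emitted at all. The function name `classify_sites` and the `min_depth=8` cutoff are illustrative choices, not GATK defaults.

```python
def classify_sites(vcf_lines, chrom, start, end, min_depth=8):
    """Label every position in chrom:start-end (1-based, inclusive) per sample.

    min_depth is a hypothetical cutoff; choose one that suits your data.
    A sample with no DP value at a site is treated as having depth 0.
    """
    samples = []
    seen = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        pos = int(fields[1])
        if fields[0] != chrom or not (start <= pos <= end):
            continue
        keys = fields[8].split(":")  # FORMAT keys, e.g. GT:AD:DP:GQ
        calls = {}
        for sample, raw in zip(samples, fields[9:]):
            values = dict(zip(keys, raw.split(":")))
            alleles = values.get("GT", "./.").replace("|", "/").split("/")
            dp_text = values.get("DP", ".")
            depth = int(dp_text) if dp_text.isdigit() else 0
            if "." in alleles or depth < min_depth:
                calls[sample] = "insufficient_depth"
            elif all(allele == "0" for allele in alleles):
                calls[sample] = "hom_ref"
            else:
                calls[sample] = "variant"
        seen[pos] = calls
    # Positions with no record at all are reported explicitly.
    return {
        pos: seen.get(pos, {sample: "no_record" for sample in samples})
        for pos in range(start, end + 1)
    }
```

To use it on a file, pass an open text handle, e.g. `classify_sites(open("family.vcf"), "chr1", 100, 200)`, where the file name and coordinates are placeholders.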

Best Answers


  • ChrisPatterson, Epilepsy Genomics Center, Member

    I ended up downloading a BED file of all the Ensembl exonic regions from UCSC and used it to call VCFs with BP_RESOLUTION for 5 samples. I used the same BED file for each sample; however, each VCF ended up with a different number of loci. If BP_RESOLUTION attempts to call every base within a given interval, whether or not the specific sample has reads aligned to the locus, shouldn't all of the VCFs have the same number of records?
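One way to see where the record counts diverge is to compare the positions in each VCF against the positions the BED file covers. The Python sketch below does that; it does not explain why the counts differ, it only lists which target bases have no record in a given VCF and which records fall outside the targets. The function names are illustrative. It relies on BED intervals being 0-based and half-open while VCF positions are 1-based, and it takes the union of overlapping intervals, which matters for exon lists built from several transcripts.

```python
def bed_positions(bed_lines):
    """Return the set of (chrom, 1-based position) covered by BED intervals."""
    covered = set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        # BED start is 0-based and end is exclusive, so the interval
        # covers 1-based positions start+1 through end.
        for pos in range(int(start) + 1, int(end) + 1):
            covered.add((chrom, pos))
    return covered


def vcf_positions(vcf_lines):
    """Return the set of (chrom, position) that have a record in the VCF."""
    covered = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t")[:2]
        covered.add((chrom, int(pos)))
    return covered


def compare_to_targets(bed_lines, vcfs):
    """vcfs maps a sample label to an iterable of that sample's VCF lines."""
    targets = bed_positions(bed_lines)
    report = {}
    for name, lines in vcfs.items():
        present = vcf_positions(lines)
        report[name] = {
            "missing": sorted(targets - present),  # target bases with no record
            "extra": sorted(present - targets),    # records outside the targets
        }
    return report
```

Because every position is held in memory as a tuple, this is best suited to a limited region such as the 2 Mb interval above; a whole exome would call for an interval-based comparison instead.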
