Differentiating between uncalled reference alleles and sites with insufficient read depth

ChrisPatterson (Epilepsy Genomics Center), Member

We are working with both WES and genome scan data on large families. Our genome scan has helped us narrow our search to a 2 Mb region that is shared IBD by all of the affected individuals in one of our families. However, we've used both UnifiedGenotyper and HaplotypeCaller to call VCFs in this region, and we find no variants that segregate with the sequenced affected individuals. Unfortunately, I can't tell whether a locus is absent from the VCF because it was called as homozygous reference or because the samples didn't have enough reads aligned to that site. Is there any way to generate a VCF at base-by-base resolution, so that we can check genotype, allele depth, genotype quality, etc. for each of our samples across this range, regardless of whether the call is a reference or an alternate allele?
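Once every position has a record (for HaplotypeCaller, by running it with `--emitRefConfidence BP_RESOLUTION`), the distinction being asked about can be made per record from the GT and DP fields. The sketch below is a hypothetical helper, not part of GATK: it parses one single-sample VCF data line as plain text, and the `min_depth` cutoff of 10 and the category names are illustrative assumptions rather than GATK defaults.

```python
def classify_site(vcf_line, min_depth=10):
    """Classify one single-sample VCF data line.

    Returns 'no_call', 'no_coverage', 'low_depth', 'hom_ref' or 'variant'.
    min_depth is a hypothetical cutoff, not a GATK default.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    fmt_keys = fields[8].split(":")
    sample_vals = fields[9].split(":")
    # VCF allows trailing FORMAT values to be dropped, so pad with "."
    sample_vals += ["."] * (len(fmt_keys) - len(sample_vals))
    sample = dict(zip(fmt_keys, sample_vals))

    gt = sample.get("GT", ".")
    dp_raw = sample.get("DP", ".")
    depth = int(dp_raw) if dp_raw not in (".", "") else 0

    # Treat phased and unphased genotypes the same way
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return "no_call"        # genotype missing, e.g. ./.
    if depth == 0:
        return "no_coverage"    # a 0/0 call backed by zero reads
    if depth < min_depth:
        return "low_depth"      # any genotype on too few reads
    if all(a == "0" for a in alleles):
        return "hom_ref"        # confident reference call
    return "variant"
```

Note the ordering: a low-depth record is reported as `low_depth` even if its genotype is non-reference, so under-covered sites in the IBD region are never mistaken for confident reference calls.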

The DepthOfCoverage tool seems to have some of this functionality, but it only reports read depth at base-by-base resolution. DiagnoseTargets only allows analysis over aggregate intervals.
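Per-base depth alone can still separate "no reads here" from "reads here but no variant" if contiguous under-covered bases are collapsed into intervals, which sits between DepthOfCoverage's per-base view and DiagnoseTargets' aggregate intervals. This is a minimal sketch assuming the per-base depths for one sample and one contig have already been loaded into a dict of 1-based position to depth (how to extract that from DepthOfCoverage output is left open); the function name and the cutoff of 10 are hypothetical.

```python
def low_coverage_runs(depth_by_pos, start, end, min_depth=10):
    """Return (run_start, run_end) pairs, 1-based and inclusive, covering
    positions in [start, end] whose depth is below min_depth.
    Positions missing from depth_by_pos are treated as depth 0."""
    runs = []
    run_start = None
    for pos in range(start, end + 1):
        low = depth_by_pos.get(pos, 0) < min_depth
        if low and run_start is None:
            run_start = pos                     # a low run begins
        elif not low and run_start is not None:
            runs.append((run_start, pos - 1))   # the run ended on the previous base
            run_start = None
    if run_start is not None:
        runs.append((run_start, end))           # run continues to the end of the range
    return runs
```

Running this per affected sample over the 2 Mb region gives a list of stretches where a missing variant call says nothing about the genotype.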

I apologize if this is a stupid question. I've been working with GATK for about a month now, so I am still very new.

Best Answers


  • ChrisPatterson (Epilepsy Genomics Center), Member

    I ended up downloading a BED file of all Ensembl exonic regions from UCSC and using it to call VCFs with BP_RESOLUTION for 5 samples. I used the same BED file for every sample; however, each VCF ended up with a different number of records. If BP_RESOLUTION attempts to call every base within a given interval, whether or not a specific sample has reads aligned to the locus, shouldn't all of the VCFs have the same number of records?
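Comparing raw record counts can mislead for two reasons that follow from the file formats themselves: an exon BED from UCSC can list the same bases under several overlapping transcripts, and a single VCF record whose REF allele is longer than one base spans several reference positions. A more direct check is to ask which BED positions each VCF fails to cover. The sketch below assumes standard BED coordinates (0-based start, exclusive end) and VCF coordinates (1-based POS); all function names are hypothetical, and it does not model how GATK itself treats overlapping intervals.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) pairs."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def bed_to_intervals(bed_lines):
    """chrom -> merged 1-based inclusive intervals from BED lines.
    BED (start, end) is 0-based half-open, i.e. bases start+1 .. end."""
    by_chrom = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        by_chrom[chrom].append((int(start) + 1, int(end)))
    return {c: merge_intervals(ivs) for c, ivs in by_chrom.items()}


def vcf_to_intervals(vcf_lines):
    """chrom -> merged intervals of reference bases spanned by VCF records,
    taking each record to cover POS .. POS + len(REF) - 1."""
    by_chrom = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref = line.split("\t")[:4]
        by_chrom[chrom].append((int(pos), int(pos) + len(ref) - 1))
    return {c: merge_intervals(ivs) for c, ivs in by_chrom.items()}


def missing_intervals(expected, observed):
    """(chrom, start, end) stretches of expected not covered by observed.
    Both arguments must be merged, sorted interval dicts as returned above."""
    gaps = []
    for chrom, exp_ivs in expected.items():
        obs_ivs = observed.get(chrom, [])
        j = 0
        for start, end in exp_ivs:
            # Skip observed intervals lying entirely before this one
            while j < len(obs_ivs) and obs_ivs[j][1] < start:
                j += 1
            cursor = start
            k = j
            while k < len(obs_ivs) and obs_ivs[k][0] <= end:
                o_start, o_end = obs_ivs[k]
                if o_start > cursor:
                    gaps.append((chrom, cursor, o_start - 1))
                cursor = max(cursor, o_end + 1)
                k += 1
            if cursor <= end:
                gaps.append((chrom, cursor, end))
    return gaps


def total_bases(intervals):
    """Number of bases in a merged interval dict."""
    return sum(e - s + 1 for ivs in intervals.values() for s, e in ivs)
```

Running `missing_intervals(bed_to_intervals(bed), vcf_to_intervals(vcf))` for each of the 5 samples shows whether the differing counts come from genuinely skipped positions or only from multi-base records and overlapping exons.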
