Missing data in HaplotypeCaller

Los Angeles

I have an issue using HaplotypeCaller to call SNPs in RAD data.

I recently used HaplotypeCaller to call SNPs in 600 samples + 50 subspecies samples, and it worked fine.
After expanding the dataset to 1200 samples + 120 subspecies samples, ~100 of the subspecies samples get 0 calls, with only missing genotypes ("./.") at these loci. Note that some of these samples and sites were analyzed and called successfully in the first analysis. Any ideas why?

My setup:
All samples are merged into 1 BAM file with proper RG headers.
Should I not have merged? Should I not have used ReduceReads?





Merging and reducing should not have any effect on the calling, so I wonder if there's something else going on. Try running HC on the unmerged files over one of the intervals where you're seeing missing calls. See if you get your calls then.

Also, can you confirm that all of your samples were processed with the same version of GATK at every step?
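
If it helps when comparing the merged and unmerged runs over the same interval, here is a minimal Python sketch that tallies how many records have a missing genotype for each sample in a VCF. It is only an illustration, not part of GATK: the function name, the sample names, and the example records are made up, and it assumes an uncompressed, tab-delimited VCF in which GT is the first FORMAT field.

```python
from collections import Counter


def count_missing_genotypes(vcf_lines):
    """Return {sample: number of records whose genotype is missing}.

    Assumes tab-delimited VCF text with GT as the first FORMAT field.
    """
    samples = []
    missing = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            for sample in samples:
                missing[sample] = 0
            continue
        for sample, call in zip(samples, fields[9:]):
            genotype = call.split(":")[0]
            if genotype in ("./.", ".|.", "."):
                missing[sample] += 1
    return dict(missing)


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t./.",
    "chr1\t200\t.\tC\tT\t60\tPASS\t.\tGT:DP\t./.:0\t./.",
]
print(count_missing_genotypes(example))
```

For a real file you could pass an open file handle instead of the list (for example `count_missing_genotypes(open(path))`, where `path` is your own VCF) and compare the counts from the two runs sample by sample.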

Los Angeles

Yes, I can confirm that the same version (2.7-2) was used at every step.

I have been able to call SNPs in all samples using a pre-BQSR, non-Read-Reduced merged BAM. I am now trying this on a pre-BQSR, Read-Reduced merged BAM (since the earlier run was taking too long).

I have a feeling it has something to do with the BQSR. I will try calling SNPs post-BQSR to confirm this and will repost when I know more.