Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

Emitting variants with low mapping quality

pinnarocpinnaroc University of IowaMember
Hello,

I am validating a new exome sequencing analysis pipeline using in-house-generated data from sample NA12878 and the Genome in a Bottle high confidence calls for this sample. I am using GATK 4.1.0.0. I have controlled for the different target regions (ie. only comparing calls within the exome) and further limited the comparison to just those regions in which the in-house data has 10x fold coverage or more. I am using the Real Time Genomics vcfeval tool to compare the calls from my pipeline to the high confidence calls from the GIB consortium. My precision is over 99%, however, my sensitivity is not as high as I would have thought at only 96%. I looked into the false negatives more, and the vast majority of them look like this in IGV (attached picture below). I realize that all the reads in this region have a mapping quality of zero. I’m sure this is in part due to the fact that the in house generated data was 150bp paired end Illumina reads and GIB had access to data from several long read platforms, thus, they were probably able to better map these regions and make these calls. I also realize that under normal circumstances I wouldn’t want to put much confidence into a variant call when it is based entirely on reads with mapping quality of zero. That said, there are approximately 700 of these variants that GIB says are real, and I can’t get HaplotypeCaller to call. These variants don’t show up in my VCF file, and I nothing I’ve tried has been able to get HaplotypeCaller to emit these variants. I’ve tried
-RF MappingQualityReadFilter \
--minimum-mapping-quality 0 \. I have also tried making this a negative number.

I have also disabled some of the filters:
--disable-read-filter NotSecondaryAlignmentReadFilter \
--disable-read-filter MappingQualityReadFilter \
--disable-read-filter MappingQualityNotZeroReadFilter \

to no avail.

Thus, is there any way I can get HaplotypeCaller to emit these variants? I know it will likely blow up my false positives but I’d like to be able to examine these variants in more detail to see if there are features that would allow us to retain these MQ=0 variants that are real but discard those that are not. It may be a wild goose chase and in the end I’ll probably end up using an older version of GATK and CallableLoci to discard these regions from consideration. But now I am committed to this question.

Thanks,
Cat

Best Answer

Answers

  • Tiffany_at_BroadTiffany_at_Broad Cambridge, MAMember, Administrator, Broadie, Moderator admin

    Hi @pinnaroc ! Looking at your question and I will get back to you shortly.

  • SkyWarriorSkyWarrior TurkeyMember ✭✭✭

    One thing that I noticed that ARHGEF locus should not have too many mapq0 even under hg19 b37. So one thing that might be the issue is the lack of alt file for hg38 during bwa mem step. Can you check if hg38 alt file is present and alt aware mapping was enabled back when you aligned your reads?

Sign In or Register to comment.