If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
We will be out of the office on November 11th and 13th 2019, due to the U.S. holiday(Veteran's day) and due to a team event(Nov 13th). We will return to monitoring the GATK forum on November 12th and 14th respectively. Thank you for your patience.
Workflow for fungal WGS
I'm in a bit of a daze going through all the documentation and I wanted to do a sanity check on my workflow with the experts. I have ~120 WGS of a ~24Mb fungal pathogen. The end-product of my GATK workflow would be a high quality call set of SNPs, restricted to the sites for which we have confidence in the call across all samples (so sites which are not covered by sufficient high quality reads in one or more samples will be eliminated).
Therefore my workflow (starting from a sorted indexed BAM file of reads from a single sample, mapped to reference with
bwa mem) is this:
- 01- BAM: Local INDEL realignment (
- 02- BAM:
- 03- BAM: Local INDEL realignment second pass (
- 04- BAM: Calling variants using
- 05- VCF: Hard filter variants for truth set for BQSR (there is no known variant site databases so we can use our best variants from each VCF file for this). The filter settings are:
"QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"and we also filter out heterozygous positions using "isHet == 1".
- 06- VCF: Calculate BQSR table using the high quality hard-filtered variants from step 05.
- 07- BAM: Apply BQSR recalibration from previous step to BAM file from step 04.
- 08- BAM: Calling variants on recalibrated BAM file from previous step using HaplotypeCaller, also emitting reference sites using
--output_mode EMIT_ALL_SITES \and
--emitRefConfidence GVCF \
Does this sound like a reasonable thing to do? What options should I use in step 8 in order for
HC to tell me how confident it is, site-by-site about it's calls, including those that are homozygous reference? I notice that when using
--output_mode EMIT_ALL_CONFIDENT_SITES \ and
--emitRefConfidence GVCF \ I am missing a lot of the annotation I get when just outputting variant sites (e.g. QD).