VQSR in dog

Hi team! I am working on running a bunch of dogs through GATK variant calling in Terra, as many of you know. I'm grappling with VQSR at this point. I have compiled a bunch of dog variant resources and was hoping you folks would be willing to offer opinions on my current plan for using them for VQSR.

I'm working with the joint-discovery-gatk4 WDL, v13, as acquired from the Terra library. I have removed the human-specific variant resources (hapmap, omni, etc.) and compiled the following dog resources to replace them, in order from highest to lowest confidence:

  • axiom_klab: the intersection of the klab variants (see next entry) with Axiom 1.2 million array variants. These are our highest confidence variants.
  • klab: variants called by the Karlsson lab on 20-30X dogs, filtered with hard filters
  • ostrander435: variants called by the Ostrander lab on 20-30X dogs, filtered with VQSR (method details unknown; it contains many more variants than the klab compilation, so I assume it is more sensitive / less specific)
  • broad: variants from the Broad track on UCSC
  • axelsson: variants called for Axelsson et al., 2014
  • dogsd: variants downloaded from http://bigd.big.ac.cn/dogsdv2/
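
To make "intersection" in the axiom_klab entry concrete, here is a minimal Python sketch that keeps the records of one VCF whose site (chromosome, position, REF, ALT) also appears in another. It is illustrative only: the function names and the toy records are hypothetical, matching on exact REF/ALT is an assumption, and the real set may have been built differently (for example with a dedicated VCF tool).

```python
def parse_site(line):
    """Split a VCF data line into (chrom, pos, ref, list_of_alt_alleles)."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return chrom, int(pos), ref, alt.split(",")


def site_keys(vcf_lines):
    """Collect one (chrom, pos, ref, alt) key per ALT allele, skipping headers."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, ref, alts = parse_site(line)
        for allele in alts:
            keys.add((chrom, pos, ref, allele))
    return keys


def intersect_sites(vcf_a_lines, vcf_b_lines):
    """Return data lines from A that share at least one ALT allele with B."""
    b_keys = site_keys(vcf_b_lines)
    kept = []
    for line in vcf_a_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, ref, alts = parse_site(line)
        if any((chrom, pos, ref, allele) in b_keys for allele in alts):
            kept.append(line.rstrip("\n"))
    return kept


# Toy records (made up), standing in for the klab calls and the array sites.
klab_toy = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.",
    "chr2\t300\t.\tG\tA,C\t.\tPASS\t.",
]
array_toy = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\t.\t.",
    "chr2\t300\t.\tG\tC\t.\t.\t.",
]

shared = intersect_sites(klab_toy, array_toy)
for record in shared:
    print(record)
```

Matching per ALT allele means a multiallelic record is kept if any one of its alleles is on the array; a stricter rule would require all of them.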

I'm putting these into the VariantRecalibrator calls in the WDL as below:

      --resource:axiom_klab,known=false,training=true,truth=true,prior=10 ${axiom_klab_vcf} \
      --resource:klab,known=false,training=true,truth=false,prior=8 ${klab_vcf} \
      --resource:ostrander435,known=false,training=true,truth=false,prior=7 ${ostrander435_vcf} \
      --resource:broad,known=false,training=true,truth=false,prior=5 ${broad_vcf} \
      --resource:axelsson,known=false,training=true,truth=false,prior=5 ${axelsson_vcf} \
      --resource:dogsd,known=true,training=false,truth=false,prior=2 ${dogsd_vcf}
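
As I understand it, the prior values are Phred-scaled, so each one implies a probability that a variant in that resource is real. The short Python sketch below is my own illustration, not part of the WDL; it converts the priors above into probabilities so the relative weighting is easier to judge. The function name is made up for this example.

```python
def prior_to_probability(q):
    """Convert a Phred-scaled prior Q into the implied probability of being true."""
    return 1.0 - 10.0 ** (-q / 10.0)


# Priors copied from the --resource arguments above.
priors = {
    "axiom_klab": 10,
    "klab": 8,
    "ostrander435": 7,
    "broad": 5,
    "axelsson": 5,
    "dogsd": 2,
}

for name, q in priors.items():
    print(f"{name}: Q{q} -> {prior_to_probability(q):.3f}")
```

Under this reading, Q10 corresponds to 90% and Q2 to about 37%, so the gap between prior=8 and prior=7 (roughly 84% versus 80%) is fairly small.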

I would love feedback on whether my decisions are generally sensible here or if I am completely missing the boat on how VQSR is supposed to work.



