VQSR error - no MQ annotation detected

Kelly135 (Korea, Member)
edited February 2016 in Ask the GATK team

Hi, I am trying to run VQSR and I get an error.

Here is my command:
java -Xmx240g -jar GenomeAnalysisTK.jar \
-T VariantRecalibrator \
-R /ref/ucsc.hg19.fasta \
-input input_raw.vcf \
-recalFile snp.recal.vcf \
-tranchesFile snp.tranches \
-rscriptFile recalibrate_SNP_plots.R \
-nt 8 \
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 /ref/hapmap_3.3.hg19.sites.vcf \
-resource:omni,known=false,training=true,truth=true,prior=12.0 /ref/1000G_omni2.5.hg19.sites.vcf \
-resource:1000G,known=false,training=true,truth=false,prior=10.0 /ref/1000G_phase1.snps.high_confidence.hg19.sites.vcf \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /ref/dbsnp_138.hg19.vcf \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an InbreedingCoeff \
-mode SNP \
-tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0

And here is the error message:

ERROR MESSAGE: Bad input: Values for MQ annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.

However, MQ is annotated in my input files. The sample size is large enough for VQSR (around 150 individuals), and the combined VCF file of these samples has around 280,000 variants (it's exome data). I checked that the genome build of my input data matches the reference, and the GATK version is 3.5.

I know similar errors have been reported before, but I couldn't find the right solution.

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Can you check the size of the overlap between your dataset and your training set?

  • Kelly135 (Korea, Member)

    Thanks as always, @Geraldine_VdAuwera

    I used three training sets, as recommended, and my callset has overlapping sites with all three files, as shown below.

    For hapmap,
    Comparing sites in VCF files...
    Found 60138 sites common to both files.
    Found 226734 sites only in main file.
    Found 4105183 sites only in second file.
    Found 238 non-matching overlapping sites.
    After filtering, kept 287110 out of a possible 287110 Sites

    For omni,
    Found 121091 sites common to both files.
    Found 165520 sites only in main file.
    Found 28474851 sites only in second file.
    Found 499 non-matching overlapping sites.
    After filtering, kept 287110 out of a possible 287110 Sites

    For indels,
    Found 6528 sites common to both files.
    Found 279620 sites only in main file.
    Found 1436024 sites only in second file.
    Found 962 non-matching overlapping sites.
    After filtering, kept 287110 out of a possible 287110 Sites
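
For readers who want to repeat this overlap check without a separate comparison tool, here is a minimal Python sketch. It is not the tool that produced the numbers above; it matches sites by chromosome and position only, so it does not report the "non-matching overlapping sites" category. The function names are made up for illustration.

```python
def read_sites(lines):
    """Collect (chrom, pos) pairs from VCF text lines, skipping header lines."""
    sites = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # CHROM and POS are the first two tab-separated columns of a VCF record.
        chrom, pos = line.split("\t", 2)[:2]
        sites.add((chrom, int(pos)))
    return sites


def site_overlap(callset_lines, resource_lines):
    """Count sites shared by, and unique to, a callset and a resource file."""
    callset = read_sites(callset_lines)
    resource = read_sites(resource_lines)
    return {
        "common": len(callset & resource),
        "only_callset": len(callset - resource),
        "only_resource": len(resource - callset),
    }
```

Both arguments can be any iterable of text lines, for example an open file handle (`open(path)`) or `gzip.open(path, "rt")` for a compressed VCF. Large resource files such as dbSNP will need a fair amount of memory with this approach.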

  • Kelly135 (Korea, Member)

    The hapmap file I used (from the resource bundle) has no MQ annotation. Could that be the cause of this error?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    No, the program doesn't use any annotations from the resource files, only the positions. The annotations are taken from your own files. Could you please post a few records from your dataset? Also, try running without multithreading arguments.
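
Since the annotations come from the callset itself, one quick sanity check is to count which INFO keys actually appear in its records. The sketch below is a rough illustration in Python, not part of GATK, and its function names are hypothetical. It matches keys exactly, so `MQRankSum` is not mistaken for `MQ`. Note that it scans every record, whereas the error message refers only to the variants that overlap the training resources.

```python
from collections import Counter


def info_key_counts(vcf_lines):
    """Return (number of records, Counter of INFO keys) for VCF text lines."""
    counts = Counter()
    n_records = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a full VCF record; INFO is the 8th column
        n_records += 1
        for entry in fields[7].split(";"):
            key = entry.split("=", 1)[0]  # flags have no "=value" part
            if key and key != ".":
                counts[key] += 1
    return n_records, counts


def missing_annotations(vcf_lines, wanted):
    """List the requested annotation keys that never occur in the INFO column."""
    _, counts = info_key_counts(vcf_lines)
    return [key for key in wanted if counts[key] == 0]
```

Calling `missing_annotations(open("input_raw.vcf"), ["QD", "MQ", "MQRankSum", "ReadPosRankSum", "FS", "SOR", "InbreedingCoeff"])` would list any requested `-an` annotation that is absent from the file.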

  • Kelly135 (Korea, Member)

    Sorry, when I checked my VCF again, there is no "MQ" in the INFO field. (I confused it with "MQRankSum".)

    So I tried to add "MQ" with VariantAnnotator, but there is no "MQ" among the available annotations (listed with the --list option).
    The only available annotations related to "MQ" are:
    AS_MappingQualityRankSumTest
    AS_RMSMappingQuality
    *MappingQualityRankSumTest
    MappingQualityZero
    *RMSMappingQuality
    MappingQualityZeroBySample
    LowMQ

    Anyway, I removed "MQ" from the -an arguments and then it worked.

  • Kelly135 (Korea, Member)
    edited February 2016

    Thanks!


  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @Kelly135
    Hi,

    Just a heads up: in that list of annotations, MQ is called RMSMappingQuality :smile:

    -Sheila
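
The confusion here comes from two naming schemes: `-an` in VariantRecalibrator takes the INFO key written in the VCF, while the VariantAnnotator list shows annotation module names. The lookup below is a small sketch of that correspondence for the seven keys used in the command above. The MQ and MQRankSum entries come from this thread; the other module names are given from memory of GATK 3.x and should be checked against the `--list` output of your own version. The helper name is hypothetical.

```python
# INFO key (as used with -an) -> annotation module name (as shown by --list).
# Entries other than MQ and MQRankSum are assumptions to verify for your GATK version.
INFO_KEY_TO_MODULE = {
    "QD": "QualByDepth",
    "MQ": "RMSMappingQuality",
    "MQRankSum": "MappingQualityRankSumTest",
    "ReadPosRankSum": "ReadPosRankSumTest",
    "FS": "FisherStrand",
    "SOR": "StrandOddsRatio",
    "InbreedingCoeff": "InbreedingCoeff",
}


def modules_for(info_keys):
    """Translate INFO keys into module names; unknown keys raise KeyError."""
    return [INFO_KEY_TO_MODULE[key] for key in info_keys]
```

For example, `modules_for(["MQ"])` gives the module name to look for when re-annotating a callset that lacks MQ.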
