Error in running VQSR

blueskypy Member ✭✭
edited October 2013 in Ask the GATK team

I ran HC and VQSR on several cohorts, each containing 50 reduced BAM files. The HC step ran fine, but VQSR failed for two of the cohorts:

VariantRecalibrator gives the following error for cohort 1:

"Bad input: Values for MQRankSum annotation not detected for ANY training variant in the input callset."

In a separate run using cohort 2, VariantRecalibrator gives the following error:

"Bad input: Found annotations with zero variance. They must be excluded before proceeding."

Could anyone help?

Thanks a lot!

Answers

  • blueskypy Member ✭✭

    btw, I'm using GATK version 2.7-2-g6bda569

  • Lavinia Member

    If you rerun VariantRecalibrator without -an MQRankSum then it should be ok.

  • blueskypy Member ✭✭

    Thanks for the help, Lavinia! What about the 2nd error?

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    "If you rerun the VariantRecalibration without -A MQRankSum then it should be ok."

    Unfortunately, if you just do this, you may be underpowering the recalibration model. MQRankSum is one of the core annotations that we recommend using, so I would recommend going to the trouble of finding out why it's not working. Have you checked that your variants have the annotation?

  • blueskypy Member ✭✭
    edited October 2013

    Hi, Geraldine,
    Thanks for the help! Here is the command I use:

    java -Xmx4g -jar $gatkDir/GenomeAnalysisTK.jar -nt $nThreads -T VariantRecalibrator \
     -R $refGenome \
     -input $cohortID.raw.snp.vcf \
     -resource:hapmap,known=false,training=true,truth=true,prior=15.0 $snp_hapMap \
     -resource:omni,known=false,training=true,truth=false,prior=12.0 $snp_omni1kG \
     -resource:1000G,known=false,training=true,truth=false,prior=10.0 $snp_g1k \
     -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 $dbSNP \
     -an QD -an MQRankSum -an ReadPosRankSum -an FS -an DP \
     -mode SNP \
     -recalFile $cohortID.snp.recal \
     -tranchesFile $cohortID.snp.tranches \
     -rscriptFile $cohortID.snp.plots.R
    

    Here, $refGenome and $cohortID.raw.snp.vcf contain only chr22.
    Some variants in $cohortID.raw.snp.vcf have MQRankSum, some don't. The error msg says "training variant", so I wonder whether it's complaining about the -input file or the -resource files?

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Ah, the issue may just be that you're only running on chr22. You may not have any variants with the annotation that overlap with the training sets. That can happen when running on subsets of the genome. We generally don't recommend running VQSR per chromosome, even just for testing.

  • blueskypy Member ✭✭

    The cohort consists of 50 Caucasian exome-seq samples from 1KG. Is it likely that the whole of chr22 doesn't have any variants with the annotation that overlap with the training sets? Or is it because I used samtools index to index the reduced BAM files, instead of using the index file produced by ReduceReads? I didn't know ReduceReads also creates an index file.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin
    edited October 2013

    I don't think the separate indexing has anything to do with your problem, no. You should really be running VQSR on the whole exome, not per chromosome.
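
One way to follow up on the question of whether the variants carry the annotation is to look only at input records that fall on training sites and count how many of them have each annotation passed with -an. The sketch below is a rough diagnostic, not a reproduction of GATK's own logic. It assumes plain-text VCF lines, matches sites by chromosome and position only (alleles are ignored, and contig names such as "22" and "chr22" must already agree), and skips multi-valued annotations. The function names `sites` and `annotation_report` are hypothetical helpers, not part of GATK.

```python
import statistics
from collections import defaultdict


def sites(lines):
    """Yield (chrom, pos, info_dict) for each data line of a VCF."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        info = {}
        for entry in fields[7].split(";"):
            key, _, value = entry.partition("=")
            info[key] = value
        yield fields[0], int(fields[1]), info


def annotation_report(input_lines, training_lines, annotations):
    """Summarise annotations on input records that sit on training sites.

    Returns (n_overlap, report), where report maps each annotation name to
    (number of overlapping records carrying it, population variance or None).
    Assumption: a site matches when chromosome and position are equal.
    """
    training = {(chrom, pos) for chrom, pos, _ in sites(training_lines)}
    values = defaultdict(list)
    n_overlap = 0
    for chrom, pos, info in sites(input_lines):
        if (chrom, pos) not in training:
            continue
        n_overlap += 1
        for name in annotations:
            try:
                values[name].append(float(info[name]))
            except (KeyError, ValueError):
                pass  # annotation absent, or not a single number
    report = {}
    for name in annotations:
        vals = values[name]
        variance = statistics.pvariance(vals) if vals else None
        report[name] = (len(vals), variance)
    return n_overlap, report


if __name__ == "__main__":
    import sys

    # Usage (paths are placeholders): python check.py input.vcf training.vcf
    with open(sys.argv[1]) as inp, open(sys.argv[2]) as train:
        n, rep = annotation_report(
            inp, train, ["QD", "MQRankSum", "ReadPosRankSum", "FS", "DP"]
        )
    print("input records on training sites:", n)
    for name, (count, variance) in rep.items():
        print(name, "count:", count, "variance:", variance)
```

A count of 0 for MQRankSum would be consistent with the first error message, and a variance of 0.0 for some annotation would be consistent with the second, although the tool may apply further filters that this sketch does not model.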
