bishwo
I annotated the raw indel file (produced by UnifiedGenotyper), 1000G_omni2.5.b37.sites.vcf and hapmap_3.3.b37.sites.vcf with all possible annotations, including QD (QualByDepth), using VariantAnnotator. However, I got an error when I tried to run VariantRecalibrator: it complained that QD was not found in the training variants. Is QD an important annotation for indel filtering? Can it be ignored?
P.S. I did not use a sample BAM file while annotating the training data set.
...
INFO 15:11:55,999 RMDTrackBuilder - Loading Tribble index from disk for file NCBI_dbsnp_for_GATK.vcf
INFO 15:12:21,650 TraversalEngine - chr1:128346793 1.98e+07 30.0 s 1.5 s 4.1% 12.1 m 11.6 m
INFO 15:12:51,650 TraversalEngine - chr9:130658800 5.26e+07 60.0 s 1.1 s 53.9% 111.2 s 51.2 s
INFO 15:13:13,618 VariantDataManager - QD: mean = NaN standard deviation = NaN
INFO 15:13:16,417 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.1-13-g1706365):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations. See http://www.broadinstitute.org/gsa/wiki/index.php/VariantAnnotator
##### ERROR ------------------------------------------------------------------------------------------
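For reference, QD (QualByDepth) normalizes the variant confidence in the QUAL field by the read depth of the samples that carry a non-reference allele. The Python sketch below is a simplified illustration of that arithmetic, not GATK's implementation; the function names `qual_by_depth` and `is_non_ref` and the (genotype, depth) input format are hypothetical. It also makes visible that a sites-only VCF, which has no sample columns, offers no per-sample depth to divide by.

```python
import re
from typing import Optional, Sequence, Tuple


def is_non_ref(gt: str) -> bool:
    """True if a genotype string such as '0/1' or '1|1' carries a non-reference allele."""
    alleles = re.split(r"[/|]", gt)
    return any(a not in ("0", ".") for a in alleles)


def qual_by_depth(qual: float, samples: Sequence[Tuple[str, int]]) -> Optional[float]:
    """Simplified QD: QUAL divided by the summed depth of non-reference samples.

    samples: hypothetical (genotype, depth) pairs, e.g. ("0/1", 30).
    Returns None when no such depth exists, as for a sites-only record.
    """
    depth = sum(dp for gt, dp in samples if is_non_ref(gt))
    if depth == 0:
        return None  # nothing to normalize by, so QD cannot be computed
    return qual / depth


print(qual_by_depth(600.0, [("0/1", 30), ("1/1", 20), ("0/0", 50)]))  # 12.0
print(qual_by_depth(600.0, []))  # None: no sample columns
```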
Answers
A good rule of thumb is that if the program refuses to run without a certain input, then yes, that input is important... ;)
According to the error message, the file that does not have the annotations is this one: NCBI_dbsnp_for_GATK.vcf. You don't mention it among the files you annotated. Have you tried annotating it?
Geraldine Van der Auwera, PhD
NCBI_dbsnp_for_GATK.vcf is the dbSNP file, which is not used as a training set (training=false). Is it important to annotate non-training data sets as well?
All the variants that are going to be used in the model must have the necessary annotations. Please see the documentation for full details on how this tool works and what inputs must be given.
Geraldine Van der Auwera, PhD
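Before rerunning VariantRecalibrator, it can be worth checking how many records in each input VCF actually carry the annotation the model needs. The following is a minimal Python sketch under stated assumptions: the helper names `open_vcf` and `count_annotated` are hypothetical (not part of GATK), and the file is assumed to be a standard tab-delimited VCF with INFO in the eighth column, optionally gzip-compressed.

```python
import gzip
from typing import IO, Iterable, Tuple


def open_vcf(path: str) -> IO[str]:
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def count_annotated(lines: Iterable[str], key: str = "QD") -> Tuple[int, int]:
    """Return (records_carrying_key, total_records) for the data lines of a VCF."""
    with_key = total = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip ## meta lines, the #CHROM header and blank lines
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
        keys = {entry.split("=", 1)[0] for entry in info.split(";")}
        total += 1
        if key in keys:
            with_key += 1
    return with_key, total


# Tiny in-memory example instead of a real file.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tAT\t50.0\t.\tDP=20;QD=2.5\n",
    "chr1\t200\t.\tG\tC\t.\t.\tAF=0.1\n",
]
print(count_annotated(example))  # (1, 2)
```

For a real file, something like `with open_vcf(path) as fh: count_annotated(fh, "QD")` applies the same check; zero annotated records would be consistent with the `QD: mean = NaN` line in the logs.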
I annotated NCBI_dbsnp_for_GATK.vcf, but I still got an error.
...
INFO 13:15:42,304 RMDTrackBuilder - Loading Tribble index from disk for file NCBI_dbsnp_for_GATK-annotated.vcf
INFO 13:15:42,599 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
INFO 13:15:42,600 TraversalEngine - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 13:16:07,855 TraversalEngine - chr1:85835656 1.50e+07 30.0 s 2.0 s 2.8% 18.0 m 17.5 m
INFO 13:16:37,855 TraversalEngine - chr3:128202943 3.21e+07 60.0 s 1.9 s 20.0% 5.0 m 4.0 m
INFO 13:17:07,856 TraversalEngine - chr8:12438529 4.77e+07 90.0 s 1.9 s 45.4% 3.3 m 108.3 s
INFO 13:17:37,857 TraversalEngine - chr14:37574960 6.33e+07 2.0 m 1.9 s 72.3% 2.8 m 46.0 s
INFO 13:18:05,298 VariantDataManager - QD: mean = NaN standard deviation = NaN
INFO 13:18:08,441 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.1-13-g1706365):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations. See http://www.broadinstitute.org/gsa/wiki/index.php/VariantAnnotator
Can you post your command line?
Geraldine Van der Auwera, PhD
Command Line
Output
Oh, I see what's going on: you are not following our Best Practices recommendations. Please go back and read them, especially the parts concerning the statistical filtering of indels.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
According to the Best Practices recommendations for statistical filtering, I need to choose one of the following options because I have a small whole-exome sample:
1. adding more samples
2. running VQSR with the arguments --maxGaussians 4 --percentBad 0.12
3. using hard filters
I chose the second option and added --maxGaussians 4 and --percentBad 0.12.
I need to use the training set Mills_and_1000G_gold_standard.indels.b37.sites.vcf.
Am I still doing it the wrong way? Please correct me if I misunderstood.
You need to read this page: http://gatkforums.broadinstitute.org/discussion/1259/what-vqsr-training-sets-arguments-should-i-use-for-my-specific-project
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
Thanks!
I followed the linked page. However, I ran into another problem. I have only one exome sample from which to call and filter indels. Am I getting this error because of the small sample size?
##### ERROR MESSAGE: Bad input: Error during negative model training. Minimum number of variants to use in training is larger than the whole call set. One can attempt to lower the --minNumBadVariants arugment but this is unsafe.
I tried decreasing the value of the --minNumBadVariants argument. When I ran VariantRecalibrator with --minNumBadVariants 382, I got the following error.
If you are running with only one sample, then you should not be using VQSR. You need to use hard filters.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
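Hard filtering amounts to applying fixed thresholds to per-record annotations and marking the records that fail. In GATK this is done with VariantFiltration; the Python sketch below only illustrates the logic. The threshold values on QD, FS and ReadPosRankSum are assumptions chosen for illustration and should be checked against the current GATK hard-filtering recommendations, and the helper names `parse_info` and `failed_filters` are hypothetical.

```python
from typing import Dict, List, Tuple

# Illustrative indel thresholds (assumption: verify against the GATK docs
# for your version before using them on real data).
INDEL_FILTERS: Dict[str, Tuple[str, float]] = {
    "QD": ("<", 2.0),
    "FS": (">", 200.0),
    "ReadPosRankSum": ("<", -20.0),
}


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string into a dict; flag entries map to an empty string."""
    out: Dict[str, str] = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        out[key] = value
    return out


def failed_filters(info: str, filters: Dict[str, Tuple[str, float]] = INDEL_FILTERS) -> List[str]:
    """Return the names of the filters a record fails; an empty list means PASS.

    A missing annotation never triggers a filter here (a simplifying
    assumption of this sketch).
    """
    ann = parse_info(info)
    failed = []
    for name, (op, threshold) in filters.items():
        raw = ann.get(name)
        if raw in (None, "", "."):
            continue
        value = float(raw)
        if (op == "<" and value < threshold) or (op == ">" and value > threshold):
            failed.append(name)
    return failed


print(failed_filters("DP=40;QD=1.5;FS=3.0;ReadPosRankSum=0.4"))  # ['QD']
print(failed_filters("DP=40;QD=12.0;FS=250.0"))  # ['FS']
```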
So I need to use VariantFiltration. I don't understand what the mask file is. Is it mandatory to provide a mask file?
I used VQSR for the SNP calls of a single sample and did not get any error. Should I also stop using VQSR for SNP calls?
@bishwo, your previous runs were probably okay because you had enough SNPs, but you have fewer indels, so it doesn't work.
The mask file is not required for using VariantFiltration. To understand how to use this tool, please read the following documentation carefully:
http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_filters_VariantFiltration.html
http://www.broadinstitute.org/gatk/guide/article?id=51
Geraldine Van der Auwera, PhD
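The point about having fewer indels than SNPs can be checked directly by counting each variant type in the callset and comparing the indel count with, for instance, the --minNumBadVariants value mentioned in the error quoted earlier. The sketch below is a hypothetical helper, not a GATK tool: it classifies each record by its first ALT allele, treating single-base REF/ALT pairs as SNPs and length changes as indels, and puts everything else (MNPs, symbolic or missing alleles) under "other".

```python
from collections import Counter
from typing import Iterable


def classify(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair as 'snp', 'indel' or 'other' (simplified)."""
    if alt.startswith("<") or alt in (".", "*"):
        return "other"  # symbolic, missing or spanning-deletion alleles
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # equal-length multi-base change (MNP)


def count_types(lines: Iterable[str]) -> Counter:
    """Count VCF records by type, using only the first ALT allele of each record."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3], fields[4].split(",")[0]
        counts[classify(ref, alt)] += 1
    return counts


example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\t.\t.\n",
    "chr1\t150\t.\tC\tT\t60\t.\t.\n",
    "chr1\t200\t.\tG\tGTT\t40\t.\t.\n",
]
print(dict(count_types(example)))  # {'snp': 2, 'indel': 1}
```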
Thanks for the information.
Now I have used hard filters for indels, but even after hard filtering, 0% of the indels were filtered. For SNPs I used VQSR, and only 10% of the SNPs were filtered. Is it usual to get almost the same number of SNPs/indels even after the filtering step? In my opinion, filtering is not worth doing.
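When comparing counts before and after filtering, keep in mind that VariantFiltration and VQSR's ApplyRecalibration do not delete records; they write filter names into the FILTER column, so the total number of records stays the same. The minimal Python sketch below (the helper name `filter_summary` and the filter name `QD_filter` in the example are hypothetical) measures what fraction of records are not PASS and which filter names account for them.

```python
from collections import Counter
from typing import Iterable, Tuple


def filter_summary(lines: Iterable[str]) -> Tuple[int, int, Counter]:
    """Return (pass_count, total, counts_per_filter_name) for VCF data lines.

    FILTER == 'PASS' counts as passing; '.' (no filters applied) is tallied
    under the '.' key and is not counted as passing in this sketch.
    """
    total = passed = 0
    reasons: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        flt = line.rstrip("\n").split("\t")[6]  # FILTER is the 7th column
        total += 1
        if flt == "PASS":
            passed += 1
        else:
            for name in flt.split(";"):  # a record may fail several filters
                reasons[name] += 1
    return passed, total, reasons


example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chr1\t200\t.\tG\tGT\t40\tQD_filter\t.\n",
    "chr1\t300\t.\tC\tT\t30\tPASS\t.\n",
    "chr1\t400\t.\tT\tA\t20\tLowQual;QD_filter\t.\n",
]
passed, total, reasons = filter_summary(example)
print(f"{total - passed}/{total} filtered")  # 2/4 filtered
print(dict(reasons))  # {'QD_filter': 2, 'LowQual': 1}
```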