GATK 4.0.11.0 Variant Recalibrator ERROR

Could someone please help me run VariantRecalibrator in GATK 4.0.11.0?
I am running the tool with the following command line:

time ~/gatk-4.0.11.0/gatk VariantRecalibrator \
-R ~/reference/hg19.fa -V ~/MT-1/outname.HC.vcf.gz \
--resource hapmap,known=false,training=true,truth=true,prior=15.0:~/reference/hg19/hapmap_3.3.hg19.sites.vcf \
--resource omni,known=false,training=true,truth=false,prior=12.0:~/reference/hg19/1000G_omni2.5.hg19.sites.vcf \
--resource 1000G,known=false,training=true,truth=false,prior=10.0:~/reference/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf \
--resource dbsnp,known=true,training=false,truth=false,prior=6.0:~/reference/hg19/dbsnp_138.hg19.vcf \
--use-annotation DP --use-annotation QD --use-annotation FS --use-annotation SOR --use-annotation ReadPosRankSum --use-annotation MQRankSum \
--mode SNP \
--truth-sensitivity-tranche 100.0 --truth-sensitivity-tranche 99.9 --truth-sensitivity-tranche 99.0 --truth-sensitivity-tranche 95.0 --truth-sensitivity-tranche 90.0 \
--rscript-file ~/MT-1/outname.HC.snps.plots.R \
--tranches-file ~/MT-1/outname.HC.snps.tranches \
--output ~/MT-1/outname.HC.snps.recal

I get this error: A USER ERROR has occurred: Couldn't read file file:///home/chenjie1/~/reference/hg19/hapmap_3.3.hg19.sites.vcf. Error was: It doesn't exist

The command syntax follows the same pattern as version 4.0.9.0.
Has the syntax been changed for GATK version 4.0.11.0?
Thanks.
Best regards.

Answers

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @JieChen

    The error appears to be related to the hapmap VCF file rather than the syntax.
    Would you please post the output of ls -l ~/reference/hg19?
    Thank you.

    Regards
    Bhanu

  • JieCheng (Member)

    Hi @Bhanu

    Here is the output of ls -l ~/reference/hg19:

    total 21673492
    208737488 Nov 2 13:36 1000G_omni2.5.hg19.sites.vcf
    1529200 Nov 2 13:27 1000G_omni2.5.hg19.sites.vcf.idx
    242018150 Nov 2 13:36 1000G_phase1.indels.hg19.sites.vcf
    1238920 Nov 2 13:29 1000G_phase1.indels.hg19.sites.vcf.idx
    7398865818 Nov 5 09:49 1000G_phase1.snps.high_confidence.hg19.sites.vcf
    10796220779 Nov 3 13:14 dbsnp_138.hg19.vcf
    12381528 Nov 2 13:38 dbsnp_138.hg19.vcf.idx
    238391040 Nov 6 11:49 hapmap_3.3.hg19.sites.vcf
    2618906 Nov 2 13:40 hapmap_3.3.hg19.sites.vcf.idx
    90196895 Nov 2 13:45 Mills_and_1000G_gold_standard.indels.hg19.sites.vcf
    1484596 Nov 2 13:46 Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.idx
    12689 Nov 3 15:07 ucsc.hg19.dict
    3199905909 Nov 3 16:35 ucsc.hg19.fasta
    3534 Nov 2 13:24 ucsc.hg19.fasta.fai
    1430 Nov 2 17:12 wget-log
    3812 Nov 2 17:32 wget-log.1

    Thank you.

    Best regards.
    JieCheng

  • JieCheng (Member)

    Hi @Bhanu

    I think I have found the cause of this problem. The shell does not expand the "~" because it is embedded in the middle of the --resource argument rather than at the start of a word. When I expanded it manually on the command line, it worked.
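
The behaviour described above can be reproduced without GATK. The following is a minimal sketch, assuming a POSIX-style shell such as bash. The resource path is shortened for illustration, and writing "$HOME" in place of "~" is just one way of doing the manual expansion.

```shell
# A tilde is expanded only when it starts a word.
leading=$(echo ~/reference/hg19)

# Embedded after the ":" of a resource tag string it is passed through literally,
# so the program receives a path that still contains "~".
embedded=$(echo hapmap,known=false,training=true,truth=true,prior=15.0:~/reference/hg19)

# Using $HOME (or a full absolute path) avoids relying on tilde expansion.
fixed=$(echo hapmap,known=false,training=true,truth=true,prior=15.0:"$HOME"/reference/hg19)

printf '%s\n' "$leading" "$embedded" "$fixed"
```

The second line of output keeps the literal "~/", which matches the "/home/.../~/reference/..." path in the error message.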

    But now I have run into a new problem:

    A USER ERROR has occurred: Input /home/chengjie1/reference/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf must support random access to enable queries by interval. If it's a file, please index it using the bundled tool IndexFeatureFile

    I searched online for a solution but did not find anything useful. Could you please give me some guidance on this issue? Many thanks.
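
The directory listing earlier in the thread shows no .idx file next to 1000G_phase1.snps.high_confidence.hg19.sites.vcf, which is consistent with this error. Below is a small sketch, assuming a POSIX shell, for spotting VCFs that lack an index. The function name missing_index is made up for illustration. The IndexFeatureFile call in the comment is an assumption about the 4.0.x command line (newer GATK releases take the input via -I instead of -F), so it is worth checking the tool's --help output for the installed version.

```shell
# Print every *.vcf in a directory that has no companion .idx file.
missing_index() {
  dir=$1
  for vcf in "$dir"/*.vcf; do
    [ -e "$vcf" ] || continue                   # glob matched nothing
    [ -e "$vcf.idx" ] || printf '%s\n' "$vcf"   # report VCFs without an index
  done
}

# Example (directory taken from the thread):
#   missing_index ~/reference/hg19
# Each reported file could then be indexed, e.g. (assumed syntax for GATK 4.0.x):
#   ~/gatk-4.0.11.0/gatk IndexFeatureFile -F /path/to/file.vcf
```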

    Best regards.
    Jie

  • JieCheng (Member)

    Hi @Bhanu
    It works.
    Thank you for your help.

    Best regards.

    Jie

  • JieCheng (Member)

    Hi @Bhanu
    Sorry to bother you again. Do you know how to run GATK on a GPU? My lab would like to try this to see whether GATK can complete the genetic analysis faster. Could you please point us in the right direction? Thank you again.

    Best regards.
    Jie

  • JieCheng (Member)

    Hi @bhanuGandham

    Thank you for your help.
    Best regards.

    Jie

  • Vipul (Member)
    Hello @bhanuGandham, I am trying to run HaplotypeCaller on exome data with the UCSC hg19 reference genome. I also want to run it against a dbSNP VCF, which I downloaded from the NCBI site, and I created an index for that file (an .idx file was generated). When I ran HaplotypeCaller using the command:

    ./gatk HaplotypeCaller -R /home/vipul/vipul/wes/hg19.fa -I /home/vipul/vipul/wes/IITK-P4-TD/IITK-P4-TD.recal.bam --dbsnp /home/vipul/vipul/wes/dbSNP_ALL_hg19_151_contig_modified.vcf -O variantsP4TDSNP.vcf

    I am getting an error:

    ***********************************************************************

    A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.
    reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21, chr6_ssto_hap7, chr6_mcf_hap5, chr6_cox_hap2, chr6_mann_hap4, chr6_apd_hap1, chr6_qbl_hap6, chr6_dbb_hap3, chr17_ctg5_hap1, chr4_ctg9_hap1, chr1_gl000192_random, chrUn_gl000225, chr4_gl000194_random, chr4_gl000193_random, chr9_gl000200_random, chrUn_gl000222, chrUn_gl000212, chr7_gl000195_random, chrUn_gl000223, chrUn_gl000224, chrUn_gl000219, chr17_gl000205_random, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chr9_gl000199_random, chrUn_gl000211, chrUn_gl000213, chrUn_gl000220, chrUn_gl000218, chr19_gl000209_random, chrUn_gl000221, chrUn_gl000214, chrUn_gl000228, chrUn_gl000227, chr1_gl000191_random, chr19_gl000208_random, chr9_gl000198_random, chr17_gl000204_random, chrUn_gl000233, chrUn_gl000237, chrUn_gl000230, chrUn_gl000242, chrUn_gl000243, chrUn_gl000241, chrUn_gl000236, chrUn_gl000240, chr17_gl000206_random, chrUn_gl000232, chrUn_gl000234, chr11_gl000202_random, chrUn_gl000238, chrUn_gl000244, chrUn_gl000248, chr8_gl000196_random, chrUn_gl000249, chrUn_gl000246, chr17_gl000203_random, chr8_gl000197_random, chrUn_gl000245, chrUn_gl000247, chr9_gl000201_random, chrUn_gl000235, chrUn_gl000239, chr21_gl000210_random, chrUn_gl000231, chrUn_gl000229, chrM, chrUn_gl000226, chr18_gl000207_random]
    features contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT]

    ***********************************************************************

    I am unable to troubleshoot this. How do I create a correct index file for the dbSNP VCF?
    I know that the UCSC hg19.fa file has "chr" prefixed to its contig names, which does not seem to be the case for the dbSNP VCF file (I guess).

    Your help would be greatly appreciated. I am new to WES analysis and look forward to your response.
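
The error message above lists "chr"-prefixed contigs for the reference and bare names (1, 2, ..., MT) for the dbSNP file, so re-indexing alone would not change the outcome. One possible approach, sketched here under assumptions, is to rewrite the contig names of an uncompressed VCF with awk. The function name add_chr_prefix and the file names in the usage note are hypothetical. Mapping MT to chrM is also an assumption: the mitochondrial sequence in UCSC hg19 differs from the one in GRCh37, so those records may need to be dropped instead.

```shell
# Read a VCF on stdin, write it to stdout with "chr"-style contig names.
add_chr_prefix() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^##contig=<ID=/ {
         sub(/<ID=MT/, "<ID=M")      # MT becomes M, then gets the prefix below
         sub(/<ID=/, "<ID=chr")
         print; next
       }
       /^#/ { print; next }          # all other header lines pass through
       {
         if ($1 == "MT") $1 = "M"
         $1 = "chr" $1               # prefix the CHROM column
         print
       }'
}
```

Usage would look like `add_chr_prefix < dbsnp.vcf > dbsnp.chr.vcf`, after which the new file needs a fresh index because the old .idx refers to the old contig names.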
  • bshifaw (moon; Member, Broadie, Moderator)

    ^ Vipul's question is also posted here

  • Linda (Member)
    Please help me!
    I am running VariantRecalibrator in GATK 4.1.1.0. The command line is as follows:

    time $gatk --java-options "-Xmx$mem -Djava.io.tmpdir=./" VariantRecalibrator \
    -V $sample.raw.germline.vcf \
    -O $sample.snp.recal \
    -R $gatk_bundle/Homo_sapiens_assembly38.fasta \
    -mode SNP \
    --resource hapmap,known=false,training=true,truth=true,prior=15.0:$gatk_bundle/hapmap_3.3.hg38.vcf \
    --resource omni,known=false,training=true,truth=true,prior=12.0:$gatk_bundle/1000G_omni2.5.hg38.vcf \
    --resource 1000G,known=false,training=true,truth=false,prior=10.0:$gatk_bundle/1000G_phase1.snps.high_confidence.hg38.vcf \
    --resource dbsnp,known=true,training=false,truth=false,prior=2.0:$gatk_bundle/dbsnp_146.hg38.vcf \
    -an DP -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum \
    -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
    --max-gaussians 8 \
    --tranches-file $sample.snprecal.tranches \
    --rscript-file $sample.snprecal.plots.R && echo "SNP VariantRecalibrator done"


    But it always fails with this error:
    "A USER ERROR has occurred: Couldn't read file file:///data2/gminix/project_new/fangling/NA12878/data/hapmap,known=false,training=true,truth=true,prior=15.0:/data2/gminix/project_new/fangling/database/gatk_bundle/hapmap_3.3.hg38.vcf. Error was: It doesn't exist."

    This is my list in $gatk_bundle:
  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    @Linda

    The syntax for specifying argument tags has changed (and the documentation was out of sync for a while, though it is now fixed). The tags must now be specified with the argument name, not with the argument value, like this:

    --resource:hapmap,known=false,training=true,truth=true,prior=15.0 /trainee/ref/hapmap_3.3.hg38.vcf

    Note that the ":" and the tags are attached to the argument name ("--resource"), not to the file name.
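
To make the change concrete, here is a small sketch, assuming a POSIX shell, that rewrites an old-style resource value into the new form described above. The helper name new_resource_arg is hypothetical and not part of GATK. It only performs string manipulation and does not check that the file exists.

```shell
# Convert "name,tags:path" (old style) into "--resource:name,tags path" (new style).
new_resource_arg() {
  old=$1
  tags=${old%%:*}     # everything before the first ":" (name and tags)
  path=${old#*:}      # everything after the first ":" (the file path)
  printf '%s:%s %s\n' '--resource' "$tags" "$path"
}

new_resource_arg 'hapmap,known=false,training=true,truth=true,prior=15.0:/trainee/ref/hapmap_3.3.hg38.vcf'
```

Applied to each of the four --resource lines in the failing command, this moves the tags onto the argument name and leaves only the file path as the value.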
