
VariantAnnotator

Hello,
I am running VariantAnnotator and noticed that, of the annotations listed in the input VCF header, only DP and FS (passed as DepthOfCoverage and FisherStrand) are accepted on the command line:
java -Xmx100g -jar GenomeAnalysisTK.jar -R /home/gp53/tophat/genome.fa -T VariantAnnotator -I ./input.bam -o ./output-gatk-varannot.vcf --variant ./input-gatk.vcf -A DepthOfCoverage -A FisherStrand --dbsnp ./dbsnp_137.hg19.vcf

Other annotations, such as AC and AF, are not recognized, and VariantAnnotator crashes.
I tried to list the available annotations with java -Xmx100g -jar GenomeAnalysisTK.jar -T VariantAnnotator --list, but I get the usage message instead of the list.
Also, which annotations are most important to use with VariantRecalibrator? Are DP and FS good enough?

Thanks,
G.
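In the command above, -A takes annotation class names (DepthOfCoverage, FisherStrand), while AC, AF, DP and FS are the INFO keys those annotations write into the VCF. As far as I know, in GATK 2.x AC, AF and AN are written by the ChromosomeCounts annotation, so AC on its own is not a valid -A value. The helper below is a hypothetical lookup (not part of GATK) from an INFO key back to the class name to pass to -A; the table is written from memory of GATK 2.x and should be checked against the annotation documentation for your version.

```shell
# Hypothetical helper (not part of GATK): given an INFO key seen in a VCF,
# print the GATK 2.x annotation class that writes it, i.e. the value for -A.
# Table written from memory of GATK 2.x; verify it for your version.
annotation_for_key() {
  case "$1" in
    DP)             echo "DepthOfCoverage" ;;
    FS)             echo "FisherStrand" ;;
    HRun)           echo "HomopolymerRun" ;;
    QD)             echo "QualByDepth" ;;
    HaplotypeScore) echo "HaplotypeScore" ;;
    MQ)             echo "RMSMappingQuality" ;;
    MQ0)            echo "MappingQualityZero" ;;
    MQRankSum)      echo "MappingQualityRankSumTest" ;;
    ReadPosRankSum) echo "ReadPosRankSumTest" ;;
    BaseQRankSum)   echo "BaseQualityRankSumTest" ;;
    Dels)           echo "SpanningDeletions" ;;
    AC|AF|AN)       echo "ChromosomeCounts" ;;
    *)              echo "no entry for '$1' in this table" >&2; return 1 ;;
  esac
}
```

For example, annotation_for_key AF prints ChromosomeCounts, which suggests -A ChromosomeCounts rather than -A AF or -A AC.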

Answers

  • Hello Geraldine,
    I followed your recommendations and ran VariantAnnotator with the following command line:
    java -Xmx15g -jar GenomeAnalysisTK.jar -R ./genome.fa -T VariantAnnotator -I ./.input.bam -o ./-gatk-varannot.vcf --variant ./input.vcf -A DepthOfCoverage -A FisherStrand -A DepthOfCoverage -A DepthPerAlleleBySample -A HomopolymerRun --dbsnp ./dbsnp_137.hg19.vcf

    I inspected the header of the annotated VCF file and found the following:
    'VariantContextWriterStub annotation=[DepthOfCoverage, FisherStrand, DepthOfCoverage, DepthPerAlleleBySample, HomopolymerRun]'

    Now, when I run VariantRecalibrator, it crashes as soon as it reaches the first annotation argument, with an error such as:
    "Values for FisherS annotation not detected for ANY training variant in the input callset"

    My command line is:
    java -Xmx15g -jar GenomeAnalysisTK.jar -T VariantRecalibrator -R ./genome.fa -input gatk-varannot.vcf
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /home/gp53/gatk-bundle/hapmap_3.3.hg19.vcf
    -resource:omni,known=false,training=true,truth=false,prior=12.0 /home/gp53/gatk-bundle/1000G_omni2.5.hg19.vcf
    -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 /home/gp53/gatk-bundle/dbsnp_137.hg19.vcf
    -an DepthOfCoverage -an FisherStrand -an DepthOfCoverage -an DepthPerAlleleBySample -an HomopolymerRun
    -mode SNP -recalFile ./file.recal -tranchesFile ./file.tranches

    Any help would be appreciated.
    Genaro
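As far as I know, VariantRecalibrator looks up each -an value as a key in the INFO column. The VCF records posted later in this thread carry keys such as DP and FS, but nothing named DepthOfCoverage or FisherStrand, which is consistent with the class names failing here. A rough pre-flight check is sketched below; missing_info_keys is a hypothetical helper, it assumes an uncompressed, tab-delimited VCF, and it scans every record rather than only the training sites that VariantRecalibrator considers.

```shell
# Hypothetical helper (not part of GATK): print each requested key that never
# appears in the INFO column (column 8) of any data record. Assumes an
# uncompressed, tab-delimited VCF. VariantRecalibrator only looks at training
# sites, so this whole-file scan is an approximation of its check.
missing_info_keys() {
  local vcf=$1 key
  shift
  for key in "$@"; do
    if ! awk -F'\t' -v k="$key" '
        /^#/ { next }                       # skip header lines
        {
          n = split($8, fields, ";")
          for (i = 1; i <= n; i++) {
            split(fields[i], kv, "=")       # KEY=VALUE, or a bare flag KEY
            if (kv[1] == k) { found = 1; exit }
          }
        }
        END { exit !found }' "$vcf"; then
      echo "$key"
    fi
  done
}
```

Something like missing_info_keys gatk-varannot.vcf DepthOfCoverage FisherStrand DP FS would list the class names as missing while DP and FS pass, if the file looks like the sample posted later.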

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Hi Genaro,

    I believe this means that your callset hasn't been annotated with those annotations (that would normally be done during variant calling). You'll need to run VariantAnnotator on your callset to add the missing annotations before you can run VariantRecalibrator.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Oops, I have now reread your message and see that you already did that. Sorry, I was going too fast.

    Can you post a couple of lines from your VCF? Are the annotations present at all?
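Checking whether the annotations are present at all can start with the header: each INFO key written by an annotation should be declared in a ##INFO line. A minimal sketch, assuming an uncompressed VCF (pipe a .vcf.gz through zcat first); list_info_ids is a hypothetical helper name.

```shell
# Hypothetical helper (not part of GATK): list the INFO IDs declared in the
# header of an uncompressed VCF given as an argument or on stdin.
list_info_ids() {
  awk '
    /^#CHROM/ { exit }                      # header ends at the column line
    /^##INFO=<ID=/ {
      sub(/^##INFO=<ID=/, "")               # drop the prefix
      sub(/,.*/, "")                        # keep only the ID
      print
    }' "$@"
}
```

Usage would be list_info_ids variant-annotated.vcf, or zcat calls.vcf.gz | list_info_ids for a compressed file.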

  • genaro_pimienta (Member), edited February 2013

    Yes, I did run VariantAnnotator. However, after reading your response I looked at the UnifiedGenotyper usage and noticed that it also has an --annotation (-A) option, so I was about to rerun UnifiedGenotyper with the desired annotations.
    I'm not sure this will help.
    Here are a few records from my annotated VCF file:

    <<<
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  tophat2-merge-ctl-1st-2nd
    chrM    2485    .       C       T       4275.77 .       AC=2;AF=1.00;AN=2;BaseQRankSum=1.697;DP=132;Dels=0.00;FS=0.000;HRun=0;HaplotypeScore=14.5589;MLEAC=2;MLEAF=1.00;MQ=50.00;MQ0=0;MQRankSum=-0.247;QD=32.39;ReadPosRankSum=-0.884    GT:AD:DP:GQ:PL  1/1:1,130:130:99:4304,367,0
    chrM    2619    .       A       G,T     3208.29 .       AC=1,1;AF=0.500,0.500;AN=2;BaseQRankSum=-0.070;DP=132;Dels=0.00;FS=6.267;HaplotypeScore=20.5326;MLEAC=1,1;MLEAF=0.500,0.500;MQ=50.00;MQ0=0;MQRankSum=-0.512;QD=24.31;ReadPosRankSum=-0.466        GT:AD:DP:GQ:PL  1/2:13,26,91:126:99:3546,2773,2934,562,0,520
    chrM    4587    .       T       C       4310.77 .       AC=2;AF=1.00;AN=2;DP=127;Dels=0.00;FS=0.000;HRun=1;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=50.00;MQ0=0;QD=33.94       GT:AD:DP:GQ:PL  1/1:0,127:127:99:4339,379,0
    >>>
    
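The records above already carry DP and FS in the INFO column, while nothing is stored under the names DepthOfCoverage or FisherStrand. A small awk-based sketch (info_values is a hypothetical helper; it assumes a tab-delimited VCF) prints the value of each requested key per record, with "." where the key is absent:

```shell
# Hypothetical helper (not part of GATK): for each record of a tab-delimited VCF,
# print POS followed by KEY=value for each requested INFO key ("." if absent).
info_values() {
  local vcf=$1
  shift
  awk -F'\t' -v keys="$*" '
    BEGIN { nk = split(keys, want, " ") }
    /^#/ { next }
    {
      split("", info)                       # reset the per-record map
      n = split($8, fields, ";")
      for (i = 1; i <= n; i++) {
        eq = index(fields[i], "=")
        if (eq) info[substr(fields[i], 1, eq - 1)] = substr(fields[i], eq + 1)
        else    info[fields[i]] = "flag"    # INFO flags carry no value
      }
      line = $2
      for (j = 1; j <= nk; j++)
        line = line "\t" want[j] "=" ((want[j] in info) ? info[want[j]] : ".")
      print line
    }' "$vcf"
}
```

On the three records above, info_values with DP FS HRun would show DP and FS for every record and HRun=. for position 2619, which has no HRun entry; asking for DepthOfCoverage would print "." on every line.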
  • Hello Geraldine,
    I have now passed the same annotations at every step: with -A in UnifiedGenotyper and VariantAnnotator, and with -an in VariantRecalibrator. It again crashes at the last step with the same error as before:
    "Values for DepthOfCoverage annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations".
    G.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Can you post the command lines you used at each step in this latest attempt? Also, you are running the latest version of GATK, right?

  • genaro_pimienta (Member), edited February 2013

    I am using GATK-2.3-9.
    I used the annotations DepthOfCoverage, FisherStrand, DepthPerAlleleBySample and HomopolymerRun in both UnifiedGenotyper and VariantAnnotator. I got stuck at VariantRecalibrator, where only DP and FS were accepted; all the other annotations made the tool crash. Note that only the abbreviated forms (DP and FS) worked.

    My command lines are as follows:

    UnifiedGenotyper

    java -Xmx15g -jar GenomeAnalysisTK.jar -R ./genome.fa -T UnifiedGenotyper
    -I ./input.bam --dbsnp /home/gp53/gatk-bundle/dbsnp_137.hg19.vcf
    -o ./output.vcf --min_base_quality_score 25 -stand_call_conf 50 -stand_emit_conf 10 -dcov 200 -A DepthOfCoverage
    -A FisherStrand -A DepthPerAlleleBySample -A HomopolymerRun -L ./intervals.intervals
    

    VariantAnnotator

    java -Xmx15g -jar GenomeAnalysisTK.jar -R ./genome.fa -T VariantAnnotator -I ./input.bam -o ./variant-annotated.vcf --variant ./output.vcf -A DepthOfCoverage -A FisherStrand -A DepthPerAlleleBySample -A HomopolymerRun --dbsnp /home/gp53/gatk-bundle/dbsnp_137.hg19.vcf
    

    VariantRecalibrator

    java -Xmx15g -jar GenomeAnalysisTK.jar -T VariantRecalibrator -R ./genome.fa -input ./variant-annotated.vcf
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /home/gp53/gatk-bundle/hapmap_3.3.hg19.vcf
    -resource:omni,known=false,training=true,truth=false,prior=12.0 /home/gp53/gatk-bundle/1000G_omni2.5.hg19.vcf
    -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 /home/gp53/gatk-bundle/dbsnp_137.hg19.vcf 
    -an DP -an FS -mode SNP --percentBadVariants 0.05 --maxGaussians 4 -recalFile ./file.recal -tranchesFile ./file.tranches
    

    ApplyRecalibration

    java -Xmx15g -jar GenomeAnalysisTK.jar -T ApplyRecalibration -R ./genome.fa -input ./-variant-annotated.vcf
    --ts_filter_level 99.0 -tranchesFile ./file.tranches -recalFile ./file.recal -mode SNP
    -o ./apply-recalibration.vcf
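The VariantRecalibrator command in this post uses -an DP -an FS, i.e. INFO keys, whereas UnifiedGenotyper and VariantAnnotator take class names with -A. A hypothetical helper covering only the four annotations used in this thread could translate one list into the other; DepthPerAlleleBySample is skipped because, as far as I know, it writes the per-sample AD field in the FORMAT column rather than an INFO key.

```shell
# Hypothetical helper (not part of GATK): turn the -A class names used in this
# thread into -an flags for VariantRecalibrator, which reads INFO keys.
an_flags() {
  local cls out=""
  for cls in "$@"; do
    case "$cls" in
      DepthOfCoverage) out="$out -an DP" ;;
      FisherStrand)    out="$out -an FS" ;;
      HomopolymerRun)  out="$out -an HRun" ;;
      DepthPerAlleleBySample)
        # AD is a per-sample FORMAT field, not an INFO key, so it is left out
        echo "skipping $cls: no INFO key" >&2 ;;
      *)
        echo "skipping $cls: not in this sketch's table" >&2 ;;
    esac
  done
  printf '%s\n' "${out# }"
}
```

With the four annotations above, an_flags prints -an DP -an FS -an HRun, which could be spliced into the VariantRecalibrator command line as $(an_flags DepthOfCoverage FisherStrand DepthPerAlleleBySample HomopolymerRun).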
    
  • Thanks for the suggestion. I will rerun UnifiedGenotyper without the -L option and see what happens.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Hi Genaro,

    If that doesn't work out, have a look at this other thread:
    http://gatkforums.broadinstitute.org/discussion/comment/3879#Comment_3879

    The poster had a problem that sounded very similar to yours, and solved it by fixing an issue with their training file.

  • OK, I just had a look at the thread. That poster used their own training files, whereas I am using the ones from gatk-bundle.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Also, we released the new version (2.4) today, so be sure to update and run again.
