VariantAnnotator regression: --comp command overrides itself

freeseek, Posts: 16, Member

It used to be possible to annotate a VCF file using multiple VCF track files: the --comp argument could be given multiple times. Now only the last track gets annotated. That means that if I use "-T VariantAnnotator --comp:TR1 tr1.vcf --comp:TR2 tr2.vcf", only the TR2 flag is used for annotation and the TR1 flag is not. This is inconsistent with the earlier behavior and must be a recent regression.

It works fine with GenomeAnalysisTK-2.5-2 but it doesn't work anymore with GenomeAnalysisTK-2.6-2.

Answers

  • ebanks, Posts: 684, GATK Developer (mod)

    Hi there, the change that went in between 2.5 and 2.6 is that we now check for consistency of the alternate alleles before annotating with the comps. So if your variant file has an A-C SNP and your comp file has an A-T SNP at the same position, then the comp will not get transferred/annotated because the alternate alleles do not match. Is that what you are seeing in your data?
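
The allele-consistency rule described above can be sketched roughly as follows. This is only an illustration, not the GATK implementation: the function name `alleles_consistent`, the dictionary layout, and the "at least one shared ALT allele" criterion are all assumptions made for the example.

```python
def alleles_consistent(variant, comp):
    """Hypothetical check: same site, same REF, and at least one shared ALT.

    Both arguments are plain dicts with the keys 'chrom', 'pos', 'ref'
    and 'alts' (a list of alternate allele strings). This layout is an
    assumption for illustration only.
    """
    same_site = (variant["chrom"], variant["pos"], variant["ref"]) == (
        comp["chrom"],
        comp["pos"],
        comp["ref"],
    )
    # A comp record only "matches" if it shares an alternate allele.
    shared_alts = set(variant["alts"]) & set(comp["alts"])
    return same_site and bool(shared_alts)


# The A-C versus A-T case from the post: same position, different ALT.
call = {"chrom": "1", "pos": 1000, "ref": "A", "alts": ["C"]}
comp_mismatch = {"chrom": "1", "pos": 1000, "ref": "A", "alts": ["T"]}
comp_match = {"chrom": "1", "pos": 1000, "ref": "A", "alts": ["C", "G"]}
```

Under these assumptions, `alleles_consistent(call, comp_mismatch)` is false, so the comp flag would not be transferred, while `alleles_consistent(call, comp_match)` is true.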

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • freeseek, Posts: 16, Member

    On the Broad Servers, this command will only annotate KGP variants:

    /broad/software/free/Linux/redhat_5_x86_64/pkgs/oracle-java-jdk_1.7.0-17_x86_64/bin/java -Xmx3g -jar /humgen/gsa-hpprojects/GATK/bin/current/GenomeAnalysisTK.jar -T VariantAnnotator -R /humgen/1kg/reference/human_g1k_v37_decoy.fasta -V in.vcf.gz -L in.vcf.gz --comp:ESP /psych/genetics_data/working/giulio/b37/ESP/ESP6500SI-V2-SSA137.snps_indels/ESP6500SI-V2-SSA137.snps_indels.vcf.gz --comp:KGP /humgen/1kg/DCC/ftp/release/20110521/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.vcf.gz -o out.vcf.gz

    While this command (swapping the order of the --comp commands) will only annotate ESP variants:

    /broad/software/free/Linux/redhat_5_x86_64/pkgs/oracle-java-jdk_1.7.0-17_x86_64/bin/java -Xmx3g -jar /humgen/gsa-hpprojects/GATK/bin/current/GenomeAnalysisTK.jar -T VariantAnnotator -R /humgen/1kg/reference/human_g1k_v37_decoy.fasta -V in.vcf.gz -L in.vcf.gz --comp:KGP /humgen/1kg/DCC/ftp/release/20110521/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.vcf.gz --comp:ESP /psych/genetics_data/working/giulio/b37/ESP/ESP6500SI-V2-SSA137.snps_indels/ESP6500SI-V2-SSA137.snps_indels.vcf.gz -o out.vcf.gz

    So there must be some overriding of the command going on. Nevertheless, both output files will have these tags included in the header:

    INFO=<ID=KGP,Number=0,Type=Flag,Description="KGP Membership">
    INFO=<ID=ESP,Number=0,Type=Flag,Description="ESP Membership">

    The main issue is also that the GATK outputs no error or warning, so the missing annotations will not be noticed until the end of a pipeline.
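
Since the tool reports no error, one way to catch this earlier is to count how many records in the output actually carry each comp flag. The following is a minimal sketch under stated assumptions: the helper name `count_info_flags` is made up for this example, and it only relies on the standard VCF layout in which INFO is the eighth tab-separated column and entries are separated by semicolons.

```python
def count_info_flags(vcf_lines, flags):
    """Count data records whose INFO column contains each of the given keys.

    vcf_lines: any iterable of VCF text lines (header lines are skipped).
    flags: INFO keys to look for, e.g. ["ESP", "KGP"].
    """
    counts = {flag: 0 for flag in flags}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        # INFO entries are 'KEY' (flags) or 'KEY=value', separated by ';'
        keys = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
        for flag in flags:
            if flag in keys:
                counts[flag] += 1
    return counts


# Tiny made-up example: two records, only the second carries a comp flag.
example_lines = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t1000\t.\tA\tC\t50\tPASS\tDP=10\n",
    "1\t2000\t.\tG\tT\t50\tPASS\tDP=12;KGP\n",
]
flag_counts = count_info_flags(example_lines, ["ESP", "KGP"])
```

For a compressed output such as out.vcf.gz, the lines could be read with `gzip.open("out.vcf.gz", "rt")` from the Python standard library. A flag that is declared in the header but has a count of zero would be a hint that the corresponding --comp track was not applied.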

  • ebanks, Posts: 684, GATK Developer (mod)

    Is there a specific position you can point us to in order to make this go faster?

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • freeseek, Posts: 16, Member

    Sorry, I am not familiar with the code, but I would imagine it is something minor. The two command lines above should let you reproduce the error with no problems. Just replace in.vcf.gz with your favorite small VCF file.

  • Geraldine_VdAuwera, Posts: 6,822, Administrator, GATK Developer (admin)

    @freeseek, we have a fix for this. The fix should be available in the next nightly build (i.e. tomorrow).

    Geraldine Van der Auwera, PhD
