I do not get the annotations I specified with -A

SheilaSheila Broad InstituteMember, Broadie, Moderator admin
edited April 2016 in Solutions to Problems

The problem

You specified -A <some annotation> in a command line invoking one of the annotation-capable tools (HaplotypeCaller, MuTect2, UnifiedGenotyper and VariantAnnotator), but that annotation did not show up in your output VCF.

Keep in mind that all annotations that are necessary to run our Best Practices are annotated by default, so you should generally not need to request annotations unless you're doing something a bit special.

Why this happens & solutions

There can be several reasons why this happens, depending on the tool, the annotation, and you data. These are the four we see most often; if you encounter another that is not listed here, let us know in the comments.

  1. You requested an annotation that cannot be calculated by the tool

    For example, you're running MuTect2 but requested an annotation that is specific to HaplotypeCaller. There should be an error message to that effect in the output log. It's not possible to override this; but if you believe the annotation should be available to the tool, let us know in the forum and we'll consider putting in a feature request.

  2. You requested an annotation that can only be calculated if an optional input is provided

    For example, you're running HaplotypeCaller and you want InbreedingCoefficient, but you didn't specify a pedigree file. There should be an error message to that effect in the output log. The solution is simply to provide the missing input file. Another example: you're running VariantAnnotator and you want to annotate Coverage, but you didn't specify a BAM file. The tool needs to see the read data in order to calculate the annotation, so again, you simply need to provide the BAM file.

  3. You requested an annotation that has requirements which are not met by some or all sites

    For example, you're looking at RankSumTest annotations, which require heterozygous sites in order to perform the necessary calculations, but you're running on haploid data so you don't have any het sites. There is no workaround; the annotation is not applicable to your data. Another example: you requested InbreedingCoefficient, but your population includes fewer than 10 founder samples, which are required for the annotation calculation. There is no workaround; the annotation is not applicable to your data.

  4. You requested an annotation that is already applied by default by the tool you are running

    For example, you requested Coverage from HaplotypeCaller, which already annotates this by default. There is currently a bug that causes some default annotations to be dropped from the list if specified on the command line. This will be addressed in an upcoming version. For now the workaround is to check what annotations are applied by default and NOT request them with -A.

Post edited by Geraldine_VdAuwera on

Comments

  • johnmajohnma Member

    I'm not sure whether the behavior mentioned in point (4) is still a MuTect2 behavior in 3.6-0. My experience is that if I supply a list of -A's to the command line, then the default annotations won't be written. This means, if I want to output some annotations on top of the default, I'd have to request for all the annotations, including those marked as default.

    Issue · Github
    by Sheila

    Issue Number
    1474
    State
    closed
    Last Updated
    Closed By
    vdauwera
  • shleeshlee CambridgeMember, Broadie, Moderator admin

    Hi @johnma,

    Can you give us the list of -As that you are supplying where you see the default annotations being dropped? Thanks.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @johnma
    Hi,

    Yes, I think that must still be an issue in MuTect2. For now, you will need to use the workaround. I will put in a request to have this changed in the future.

    -Sheila

  • johnmajohnma Member

    @shlee
    Hi,

    My command used is as follows:
    java -Xmx64G -jar $GATK_JAR -T MuTect2 -R $REFGENOME -I:tumor $TUMOR_INPUT -I:normal $NORMAL_INPUT --dbsnp "All_20160527.vcf.gz" --cosmic "Cosmic_v78_GRCh38_all_variants.vcf.gz" -A DepthPerAlleleBySample -A BaseQualitySumPerAlleleBySample -A TandemRepeatAnnotator -A OxoGReadCounts -o $MUTECT2_OUTPUT_FILE

    The log does not seem remarkable with the exception of the "Null qss" error that I'd expect from FisherStrand.

    INFO 16:37:40,286 HelpFormatter - ---------------------------------------------------------------------------------- INFO 16:37:40,291 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.6-0-g89b7209, Compiled 2016/06/01 22:27:29 INFO 16:37:40,291 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute INFO 16:37:40,291 HelpFormatter - For support and documentation go to https://www.broadinstitute.org/gatk INFO 16:37:40,292 HelpFormatter - [Fri Sep 09 16:37:40 CDT 2016] Executing on Linux 2.6.32-431.23.3.el6.x86_64 amd64 INFO 16:37:40,292 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14 JdkDeflater INFO 16:37:40,297 HelpFormatter - Program Args: -T MuTect2 -R GRCh38.primary_assembly.genome.fa -I:tumor [redacted]_aligned.sorted.markduplicates.realigned.recal.bam -I:normal [redacted]_aligned.sorted.markduplicates.realigned.recal.bam --dbsnp All_20160527.vcf.gz --cosmic Cosmic_v78_GRCh38_all_variants.vcf.gz -A QualByDepth -A MappingQualityRankSumTest -A ReadPosRankSumTest -A FisherStrand -A StrandOddsRatio -o [redacted] INFO 16:37:40,311 HelpFormatter - Executing as [redacted] on Linux 2.6.32-431.23.3.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14. INFO 16:37:40,311 HelpFormatter - Date/Time: 2016/09/09 16:37:40 INFO 16:37:40,311 HelpFormatter - ---------------------------------------------------------------------------------- INFO 16:37:40,312 HelpFormatter - ---------------------------------------------------------------------------------- INFO 16:37:40,385 GenomeAnalysisEngine - Strictness is SILENT INFO 16:37:40,558 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 INFO 16:37:40,569 SAMDataSource$SAMReaders - Initializing SAMRecords in serial INFO 16:37:40,672 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.10 WARN 16:37:41,119 IndexDictionaryUtils - Track cosmic doesn't have a sequence dictionary built in, skipping dictionary validation WARN 16:37:41,120 IndexDictionaryUtils - Track dbsnp doesn't have a sequence dictionary built in, skipping dictionary validation INFO 16:37:41,230 GenomeAnalysisEngine - Preparing for traversal over 2 BAM files INFO 16:37:41,676 GenomeAnalysisEngine - Done preparing for traversal INFO 16:37:41,676 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] INFO 16:37:41,677 ProgressMeter - | processed | time | per 1M | | total | remaining INFO 16:37:41,678 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime INFO 16:37:41,796 MuTect2 - Using global mismapping rate of 45 => -4.5 in log10 likelihood units Using un-vectorized C++ implementation of PairHMM INFO 16:37:44,703 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file INFO 16:37:44,704 VectorLoglessPairHMM - Using vectorized implementation of PairHMM ERROR 16:37:50,339 MuTect2 - Null qss at 867116 ... INFO 11:18:09,312 VectorLoglessPairHMM - Time spent in setup for JNI call : 64.688102505 INFO 11:18:09,312 PairHMM - Total compute time in PairHMM computeLikelihoods() : 186546.49047094802 INFO 11:18:09,313 MuTect2 - Ran local assembly on 44824 active regions INFO 11:18:09,576 ProgressMeter - done 3.099750718E9 66.7 h 77.0 s 100.0% 66.7 h 0.0 s INFO 11:18:09,576 ProgressMeter - Total runtime 240027.90 secs, 4000.46 min, 66.67 hours INFO 11:18:09,577 MicroScheduler - 25818756 reads were filtered out during the traversal out of approximately 143208294 total reads (18.03%) INFO 11:18:09,577 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter INFO 11:18:09,577 MicroScheduler - -> 25481557 reads (17.79% of total) failing DuplicateReadFilter INFO 11:18:09,578 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter INFO 11:18:09,578 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter INFO 11:18:09,578 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter INFO 11:18:09,579 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter INFO 11:18:09,579 MicroScheduler - -> 337199 reads (0.24% of total) failing UnmappedReadFilter

  • shleeshlee CambridgeMember, Broadie, Moderator admin
Sign In or Register to comment.