
How to use UnifiedGenotyper --annotation option

ericminikel Posts: 26 Member
edited January 2013 in Ask the GATK team

I am doing human exome sequencing with hg19 as the reference. I want UnifiedGenotyper to give me every annotation that is available; I will worry later about which ones are useful and which are not.

I am confused about the behavior of the --annotation option in UnifiedGenotyper. Its default value is listed as [], which seems to imply that we get no annotations at all unless we explicitly list the ones we want. Is that correct? To get a list of available annotations, we are directed to the VariantAnnotator --list option, but it appears that it is not possible to just run:

java -Xmx2g -jar GenomeAnalysisTK.jar \
-R ref.fasta \
-T VariantAnnotator \
--list

Instead, one not only has to include the --variant flag, but the VCF file it points to actually has to be well-formatted; otherwise you get errors like this:

##### ERROR MESSAGE: Argument with name '--variant' (-V) is missing.

or this:

##### ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file could not be
determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:

So, that having failed, is anyone able to just provide me with a list of possible arguments to the UnifiedGenotyper --annotation option?


Best Answers

  • Geraldine_VdAuwera Posts: 6,191 admin
    Answer ✓

    That's right - it's because downsampling is done by the engine, and so the -dcov argument is documented with the other core engine command-line arguments.

    Re: your annotation question, the UG applies a set of annotations by default even when --annotation is left empty. This core set is called the "standard annotations" and is used by default by all tools (unless specific annotations are turned off with the 'exclude' argument).

    Currently, the standard annotations are the following:

    • BaseQualityRankSumTest
    • ChromosomeCounts
    • DepthOfCoverage
    • DepthPerAlleleBySample
    • FisherStrand
    • HaplotypeScore
    • InbreedingCoeff
    • MappingQualityRankSumTest
    • MappingQualityZero
    • QualByDepth
    • ReadPosRankSumTest
    • RMSMappingQuality
    • SpanningDeletions
    • TandemRepeatAnnotator

    For the record, the HaplotypeCaller uses the same standard set, except it excludes the last two in the list for technical reasons.

    We will add this information to the documentation in the near future.
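
To request annotations beyond the defaults, the names can be passed to --annotation one at a time. The following is only a minimal sketch: it assumes the names are given exactly as they appear in the list above and that --annotation may be repeated (its default is the empty list []). The -I and -o arguments and the file names sample.bam and raw.vcf are placeholders that do not come from this thread. The script prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: assemble a UnifiedGenotyper command that names annotations explicitly.
# Annotation names are taken from the standard list above; file names are placeholders.
annotations="QualByDepth FisherStrand MappingQualityRankSumTest"

cmd="java -Xmx2g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R ref.fasta -I sample.bam -o raw.vcf"

# Append one --annotation argument per requested annotation.
for a in $annotations; do
    cmd="$cmd --annotation $a"
done

# Dry run: show the command instead of executing it.
echo "$cmd"
```

Printing the command first makes it easy to check the argument list before launching a long genotyping run.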

  • Geraldine_VdAuwera Posts: 6,191 admin
    Answer ✓

    And the full list of annotations is available in the technical documentation at this link: http://www.broadinstitute.org/gatk/gatkdocs/#VariantAnnotatorannotations

    It is silly that VariantAnnotator refuses to list options without a fully valid command line -- we'll try to fix that in the next release.
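
Until that is fixed, one possible workaround is to hand VariantAnnotator a small, well-formed VCF so that the command line validates. The sketch below writes a header-only VCF to a temporary directory and prints the corresponding --list command. Whether GATK accepts a VCF with no records for this purpose is an assumption and has not been verified here; the file name header_only.vcf is hypothetical, and the .vcf extension is used on the assumption that it lets the file type be detected automatically.

```shell
#!/bin/sh
# Sketch: create a minimal, well-formed VCF (header only) to satisfy --variant.
# Assumption: a VCF without records is enough for VariantAnnotator --list.
tmpdir=$(mktemp -d)
vcf="$tmpdir/header_only.vcf"

# Two lines: the file-format declaration and the tab-separated column header.
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$vcf"

# Dry run: print the command that would ask for the annotation list.
list_cmd="java -Xmx2g -jar GenomeAnalysisTK.jar -R ref.fasta -T VariantAnnotator --variant $vcf --list"
echo "$list_cmd"
```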
