DiagnoseTargets: annotating output VCF file with gene names

Hi GATK !

I am using DiagnoseTargets on exome data with the 1000 Genomes human v37 ref genome and Illumina exome interval .bed file.
It seems that DiagnoseTargets (unlike DepthOfCoverage) doesn't accept the -genelist argument.

1) What would be a possible alternative for adding gene names to the output interval stat .vcf file (and optionally to the --missing intervals output)?
Would VariantAnnotator (with the --comp argument) work?

2) What annotation file should I use?
Since the sortByRef.pl script (mentioned here) is no longer available: did it only (i) discard records on contigs other than Chr1-22/X/Y/M, and (ii) sort by Chr?
Does VariantAnnotator automatically adjust from zero-based, half-open intervals (UCSC standard) to one-based, closed intervals, or should I modify the file myself in the previous steps as well?

Issue · Github
by Sheila

Issue Number
812
State
closed
Closed By
vdauwera

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)
    edited April 2016

    Hi @user31888, sorry for the late reply.

    VariantAnnotator can import annotations from a variety of external resources. You could look into using --resource:foo resource.vcf; I'm pretty sure this could be used to tag intervals with a gene name annotation. But you would still have to provide the gene names in VCF format, which is going to be awkward. Depending on what you're looking to do with the results, you could make things easier on yourself by converting your DiagnoseTargets results to a table (using VariantsToTable) and adding gene names to that. To be frank, I'm not sure why the developer who wrote DT made the output a VCF; it's a pain to work with. He usually makes excellent decisions (he's the guy who hired me), but on that one I question his judgment ;)

    I can't remember the exact details of what the sorting script did, but FYI there is a Picard tool that replaces it (and works better).

    And yes, GATK tools adjust the interpretation of intervals when you give them a BED file.
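
For the table route suggested above, here is a rough sketch of how the gene names could be added outside of GATK. It is only an illustration under several assumptions: the DiagnoseTargets results were already exported to a tab-delimited table with CHROM, POS and END columns (the FILTER column and all sample values below are made up), the gene annotation is a BED file whose fourth column holds the gene name, and the contig names match between the two files (e.g. "1" rather than "chr1" for the 1000 Genomes v37 reference). The function names are hypothetical and not part of GATK. Because the script reads the BED file itself, it has to do the zero-based, half-open to one-based, closed conversion explicitly.

```python
import csv
import io
from collections import defaultdict


def load_gene_bed(bed_text):
    """Parse BED text into {chrom: [(start, end, gene)]} using 1-based closed coordinates."""
    genes = defaultdict(list)
    for line in bed_text.splitlines():
        # Skip blank lines, comments and UCSC header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name = line.split("\t")[:4]
        # BED start is 0-based, so add 1. BED end is exclusive, which is
        # numerically the same as the 1-based inclusive end.
        genes[chrom].append((int(start) + 1, int(end), name))
    return genes


def genes_for_interval(genes, chrom, start, end):
    """Return the sorted gene names overlapping [start, end] (1-based, closed)."""
    hits = {name for g_start, g_end, name in genes.get(chrom, [])
            if g_start <= end and start <= g_end}
    return sorted(hits)


def annotate_table(table_text, genes):
    """Add a GENE column to a tab-delimited table that has CHROM, POS and END columns."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    rows = []
    for row in reader:
        hits = genes_for_interval(genes, row["CHROM"], int(row["POS"]), int(row["END"]))
        row["GENE"] = ",".join(hits) if hits else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Made-up example data, not real DiagnoseTargets output.
    bed = "1\t99\t200\tGENE_A\n1\t300\t400\tGENE_B\n"
    table = "CHROM\tPOS\tEND\tFILTER\n1\t150\t250\tPASS\n1\t201\t299\tLOW_COVERAGE\n"
    for row in annotate_table(table, load_gene_bed(bed)):
        print(row["CHROM"], row["POS"], row["END"], row["FILTER"], row["GENE"])
```

The overlap test is a plain linear scan, which should be fine for an exome-sized target list; for much larger annotation sets a sorted or indexed lookup would be a better choice. Intervals that hit no gene get "." in the GENE column.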
