HaplotypeCaller rs annotation

flescai (Member)

Hi there,
I'm now running the new GATK 2.2-2 release, and I've noticed an issue with HaplotypeCaller that I also had in the previous version I was using.
Despite passing the dbSNP ROD to the walker, the emitted VCF doesn't contain rsIDs in the ID field.
By contrast, UnifiedGenotyper annotates the variants with the appropriate rsIDs.

In my .scala code I wrote:

 class HaplotypeCallerArguments (t: Target) extends HaplotypeCaller with UNIVERSAL_GATK_ARGS {
   this.reference_sequence = qscript.referenceFile
   this.intervals = if (qscript.intervals == null) Nil else List(qscript.intervals)
   // Set the memory limit to 6 gigabytes on each command.
   this.memoryLimit = 6
   this.input_file :+= qscript.bamFile
   this.D = qscript.dbSNP_b37
 }

and this is correctly reflected when Queue launches the job:

 INFO  16:07:30,655 FunctionEdge - Starting:  'java'  '-Xmx6144m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/SAN/biomed/analysis/tmp'  '-cp' '/share/apps/genomics/Queue-2.2-2-gf44cc4e/Queue.jar'  
 'org.broadinstitute.sting.gatk.CommandLineGATK'  '-T' 'HaplotypeCaller'  '-I' '/SAN/biomed/analysis/recal.list'  '-L' '/SAN/biomed/analysis/.queue/scatterGather/HaplotypeCaller-sg/temp_016_of_300/scatter.intervals'  '-R' '/share/apps/genomics/reference/human_g1k_v37.fasta'  
 '-l' 'INFO'  '-o' '/SAN/biomed/analysis/.queue/scatterGather/HaplotypeCaller-sg/temp_016_of_300/comparisonHC.raw.vcf' '-D' '/share/apps/genomics/reference/gatkresources_hg19_1.5/ftp.broadinstitute.org/bundle/1.5/b37/dbsnp_135.b37.vcf'  

However, my VCF still looks like this:

grep -v \# HC.raw.vcf | cut -f 1,2,3,4,5 | more
1   762273  .   G   A
1   865738  .   A   G
1   866319  .   G   A
1   866511  .   C   CCCCT
1   871042  .   C   CA
1   874734  .   C   T
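The "." in the third column is the VCF ID field, which is where an rsID would appear. A quick way to count how many calls are missing one is sketched below. It assumes an uncompressed, tab-separated VCF, and count_missing_ids is a hypothetical helper name, not a GATK tool.

```shell
# Hypothetical helper (not part of GATK): count VCF data records whose
# ID column (column 3) is "." out of all data records.
# Assumes an uncompressed, tab-separated VCF; header lines start with '#'.
count_missing_ids() {
  awk -F'\t' '!/^#/ { total++; if ($3 == ".") missing++ }
              END { printf "%d of %d records have no ID\n", missing, total }' "$1"
}

# usage: count_missing_ids HC.raw.vcf
```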

Am I doing something wrong?
It would be quite time-consuming to run VariantAnnotator if it isn't necessary, since as I understand it the annotations used by VQSR are now already emitted by the caller.

Thanks,
Francesco


Best Answer

  • rpoplin (Dev)

    Unfortunately, HaplotypeCaller can't annotate rsIDs yet. We'll work on adding this for the next release. Thanks for letting us know that you need this functionality.

    Cheers,
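Until HaplotypeCaller can fill in rsIDs itself, the output has to be annotated afterwards, for example with VariantAnnotator and the same dbSNP file, as the question mentions. As a lighter-weight alternative, the sketch below copies IDs from a dbSNP VCF into the call set by exact CHROM/POS/REF/ALT match. It is not GATK's matching logic: add_rsids is a hypothetical helper, both files are assumed to be uncompressed, tab-separated VCFs on the same build, and alleles must be written identically in both, so multi-allelic dbSNP records (an ALT such as A,T) or differently represented indels will not match.

```shell
# Hypothetical helper (not a GATK tool): copy rsIDs from a dbSNP VCF into the
# ID column of a call set, matching on exact CHROM, POS, REF and ALT.
# Assumptions: uncompressed, tab-separated VCFs on the same reference build,
# with alleles represented identically (no normalisation or allele splitting).
# If dbSNP lists several records for one key, the last one read wins.
add_rsids() {
  local calls=$1 dbsnp=$2
  awk -F'\t' -v OFS='\t' '
    FNR == 1 { pass++ }                  # 1 = calls, 2 = dbSNP, 3 = calls again
    /^#/ && pass < 3 { next }            # skip headers while indexing
    pass == 1 { want[$1 SUBSEP $2 SUBSEP $4 SUBSEP $5] = 1; next }
    pass == 2 {
      k = $1 SUBSEP $2 SUBSEP $4 SUBSEP $5
      if (k in want) id[k] = $3          # keep only IDs the call set needs
      next
    }
    !/^#/ && $3 == "." {                 # pass 3: fill missing IDs where matched
      k = $1 SUBSEP $2 SUBSEP $4 SUBSEP $5
      if (k in id) $3 = id[k]
    }
    { print }
  ' "$calls" "$dbsnp" "$calls"
}
```

Usage: add_rsids comparisonHC.raw.vcf dbsnp_135.b37.vcf > comparisonHC.ids.vcf (the output name is only an example). The call set is read twice so that only the dbSNP IDs it actually needs are held in memory, rather than an index of the whole dbSNP file.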
