
HaplotypeCaller

Kath Posts: 36 Member

Hi,

I am adapting a two-year-old pipeline that uses UnifiedGenotyper to call variants, and I would like to update it to use HaplotypeCaller. The UnifiedGenotyper command includes the option --metrics_file. I cannot find any reference to it in the GATK documentation. Do you know what it is, and whether I can use it with HaplotypeCaller as well?

Also, do I understand correctly that using HaplotypeCaller to call variants means you don't have to carry out local realignment around indels beforehand?

Thanks very much,

Kath

Answers

  • Geraldine_VdAuwera Posts: 6,192 Administrator, GATK Developer

    Hi Kath,

    I think the metrics file argument was deprecated quite some time ago. It is no longer needed with recent versions of UG or with HC.

    It is still useful to do the local realignment step, even if you are going to use HaplotypeCaller to call variants. Best not to skip it.

    Geraldine Van der Auwera, PhD

  • Kath Posts: 36 Member

    Thanks Geraldine. So, there is no option I need to use instead of the --metrics_file argument? It is fine to use the "typical" command line architecture as outlined in the HaplotypeCaller documentation?

  • Geraldine_VdAuwera Posts: 6,192 Administrator, GATK Developer

    That's correct. We have changed a great many things over the past two years, and there is rarely a one-to-one correspondence between old and new arguments. So I would encourage you to start from the basic commands given as examples in the documentation and then build on them, adding arguments based on what you want to achieve, rather than trying to emulate old commands.

    Geraldine Van der Auwera, PhD

  • SharonCox Posts: 4 Member
    edited May 2013

    Dear Geraldine, I also switched from UG to HC (GATK version 2.3-9-ge5ebf34). I used the following command:

    java -Xmx4g -jar PATH/GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R PATH/hg19_2.fa -T HaplotypeCaller -I PATH/recalibrated.bam --dbsnp PATH/dbsnp_137.hg19.vcf -o HC.vcf --minPruning 5 -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 200 -A DepthOfCoverage -A AlleleBalance -A FisherStrand -L PATH/converted_TruSeq_exome_targeted_regions_GRCh37.bed

    The output was OK, except that the ID column of the VCF was not populated:

    CHROM:chr3 POS:191860262 ID:. REF:AT ALT:A QUAL:171.77 FILTER:PASS INFO:etc FORMAT:GT:GQ:DP:PL:AD

    Is there a way to add the ID afterwards? I have called variants for 24 samples and would rather not launch the command again, since it takes quite a long time to process. And why is the ID missing in the first place? I also tried substituting -D for --dbsnp, but I obtained the same result. Thanks for your help, Sharon

  • Carneiro Posts: 275 Administrator, GATK Developer
    edited May 2013

    The HaplotypeCaller does not yet cross-populate the ID field with the rsID from dbSNP. This is on the shortlist of features to be implemented, though.
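
For the question of adding the ID afterwards without re-running the caller, one possible approach is to fill the ID column in a post-processing step. The following is only a minimal sketch in Python, not a GATK feature: the function names `load_rsids` and `fill_ids` are hypothetical, and it assumes that both files are plain tab-delimited VCF text, that they use the same chromosome naming (e.g. `chr3`), and that indels are represented identically in both (same position, REF and ALT), which will not always hold for real dbSNP releases.

```python
def load_rsids(dbsnp_lines):
    """Build a lookup of (chrom, pos, ref, alt) -> rsID from dbSNP VCF lines."""
    rsids = {}
    for line in dbsnp_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, rsid, ref, alts = line.rstrip("\n").split("\t")[:5]
        # A dbSNP record may list several ALT alleles separated by commas.
        for alt in alts.split(","):
            rsids[(chrom, pos, ref, alt)] = rsid
    return rsids


def fill_ids(vcf_lines, rsids):
    """Yield VCF lines, replacing an empty ID ('.') when a dbSNP match exists."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            yield line  # pass header and blank lines through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] == ".":
            found = []
            for alt in fields[4].split(","):
                rsid = rsids.get((fields[0], fields[1], fields[3], alt))
                if rsid and rsid not in found:
                    found.append(rsid)
            if found:
                # The VCF format separates multiple IDs with semicolons.
                fields[2] = ";".join(found)
        yield "\t".join(fields) + "\n"


if __name__ == "__main__":
    # File names are placeholders; adjust to your own paths.
    with open("dbsnp_137.hg19.vcf") as dbsnp:
        lookup = load_rsids(dbsnp)
    with open("HC.vcf") as src, open("HC.with_ids.vcf", "w") as dst:
        dst.writelines(fill_ids(src, lookup))
```

Loading a full dbSNP file into a dictionary needs a lot of memory, so for whole-genome resources it may be preferable to restrict the lookup to the targeted regions first. Records with no exact match keep `.` in the ID column.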
