In haplotype variant caller for the Influenza virus, do I need to mention or remove any parameters?

Nanda (Canada; Member)
edited November 2017 in Ask the GATK team

I have sequenced influenza virus and am interested in finding variants (SNVs and INDELs). I am planning to use HaplotypeCaller.


  • shlee (Cambridge; Member, Broadie ✭✭✭✭✭)

    Hi @Nanda,

    Please check out our Best Practices page at https://software.broadinstitute.org/gatk/best-practices/ for a high-level overview from which you can access more detailed documentation. If you are new to genomics, consider also attending a GATK workshop (calendar; more info).

  • Nanda (Canada; Member)
    edited November 2017

    Thanks Shlee. I am working on the best practices.

    1) What is the impact of the warning "StrandBiasTest" on the downstream steps?
    2) What is the impact of the warning "HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper"?

    I used the following commands to generate the raw gVCF and raw VCF files.

    Step1: HaplotypeCaller to generate raw gVCF files.

    java -Xmx16g -Djava.io.tmpdir=tmp -jar /software/gatk/GenomeAnalysisTK-3.7/GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R /reference/Influenza/Influenza_virus.fa \
    -I sample1.bam \
    --emitRefConfidence GVCF \
    --genotyping_mode DISCOVERY \
    -ploidy 1 \
    -nct 12 \
    -stand_call_conf 30 \
    -o sample1.raw.g.vcf

    Note: Step1 was also carried out for the other samples in the project.
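One way to repeat Step1 for every sample is a small loop over the BAM files. The following is only a sketch under assumptions not stated in the post: that all BAMs sit in one directory and are named `<sample>.bam`, and that each gVCF should be named `<sample>.raw.g.vcf`. The function names `gvcf_name` and `run_step1_all` are hypothetical; the java command itself is copied from Step1.

```shell
# Derive the gVCF file name for a BAM, e.g. sample1.bam -> sample1.raw.g.vcf
# (naming convention assumed from the Step1 command).
gvcf_name() {
    printf '%s.raw.g.vcf\n' "$(basename "$1" .bam)"
}

# Run the Step1 HaplotypeCaller command once per BAM in the given directory.
# Stops at the first sample that fails.
run_step1_all() {
    for bam in "$1"/*.bam; do
        [ -e "$bam" ] || continue   # nothing matched the pattern
        java -Xmx16g -Djava.io.tmpdir=tmp -jar /software/gatk/GenomeAnalysisTK-3.7/GenomeAnalysisTK.jar \
            -T HaplotypeCaller \
            -R /reference/Influenza/Influenza_virus.fa \
            -I "$bam" \
            --emitRefConfidence GVCF \
            --genotyping_mode DISCOVERY \
            -ploidy 1 \
            -nct 12 \
            -stand_call_conf 30 \
            -o "$(gvcf_name "$bam")" || return 1
    done
}
```

Calling `run_step1_all .` would then process every BAM in the current directory and write the gVCFs next to them.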

    Step2: Joint genotyping for all the samples

    java -Xmx6g -Djava.io.tmpdir=tmp -jar /software/gatk/GenomeAnalysisTK-3.7/GenomeAnalysisTK.jar \
    -T GenotypeGVCFs \
    -R /reference/Influenza/Influenza_virus.fa \
    --variant sample1.raw.g.vcf  \
    --variant sample2.raw.g.vcf  \
    --variant sample30.raw.g.vcf \
    -o Jointgenotyping.projectname.raw.vcf

    Contents from the log file:

    INFO  08:47:55,916 HelpFormatter - --------------------------------------------------------------------------------
    INFO  08:47:55,919 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
    INFO  08:47:55,919 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
    INFO  08:47:55,919 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
    INFO  08:47:55,919 HelpFormatter - [Tue Nov 14 08:47:55 EST 2017] Executing on Linux 2.6.32-220.el6.x86_64 amd64
    INFO  08:47:55,919 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_121-b13
    INFO  08:47:55,923 HelpFormatter - Program Args: -T GenotypeGVCFs -R /reference/Influenza/Influenza_virus.fa  --variant sample1.raw.g.vcf  --variant sample2.raw.g.vcf .... --variant sample30.raw.g.vcf -o Jointgenotyping.projectname.raw.vcf
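With 30 samples, typing one `--variant` per gVCF in Step2 is error-prone (a single missing line-continuation backslash splits the command). A possible alternative, sketched here under the assumption that all gVCFs live in one directory and end in `.raw.g.vcf`, is to let the shell append the `--variant` arguments. The helper name `run_with_gvcfs` is hypothetical and not part of GATK.

```shell
# Append "--variant <file>" for every *.raw.g.vcf in a directory to a command,
# then run that command. Usage: run_with_gvcfs <dir> <command> [args...]
# Plain POSIX sh, so the helper variables below are global, not local.
run_with_gvcfs() {
    gvcf_dir=$1
    shift
    n_gvcfs=0
    for gvcf in "$gvcf_dir"/*.raw.g.vcf; do
        [ -e "$gvcf" ] || continue          # nothing matched the pattern
        set -- "$@" --variant "$gvcf"       # extend the command line
        n_gvcfs=$((n_gvcfs + 1))
    done
    if [ "$n_gvcfs" -eq 0 ]; then
        echo "no *.raw.g.vcf files found in $gvcf_dir" >&2
        return 1
    fi
    "$@"
}
```

For example, `run_with_gvcfs . java -Xmx6g -Djava.io.tmpdir=tmp -jar /software/gatk/GenomeAnalysisTK-3.7/GenomeAnalysisTK.jar -T GenotypeGVCFs -R /reference/Influenza/Influenza_virus.fa -o Jointgenotyping.projectname.raw.vcf` would run the Step2 command with every gVCF in the current directory. Putting `echo` in front of `java` prints the assembled command instead of running it, which is a cheap way to check the file list first.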

  • shlee (Cambridge; Member, Broadie ✭✭✭✭✭)

    Hi @Nanda,

    You can ignore the warning for HaplotypeScore as it and UnifiedGenotyper (a deprecated pileup caller) are no longer part of our Best Practices. You can read more about the annotation here.

    Here is the corresponding documentation for StrandBiasBySample: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_annotator_StrandBiasBySample.php. Do you see an SB annotation in your output VCF? That is what this annotation refers to. I think the warning you are showing refers to the possibility that, if a site's samples do not provide the SB metric, GenotypeGVCFs will not perform StrandBiasTest at the cohort level for that site. As a result, some rows may carry the related INFO-level annotation while others do not. The warning merely points out this possibility in case your downstream tools expect the annotation on every row of the VCF.
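To see whether this actually affects a given callset, one could count how many records lack a particular INFO key. The sketch below is a rough check, not a GATK feature: the function name `count_missing_info` is hypothetical, it assumes an uncompressed, tab-delimited VCF, and it assumes the INFO-level strand-bias annotations of interest are the keys `FS` and `SOR` (check the header of your own VCF to confirm which keys are present).

```shell
# Report how many variant records in a VCF do not carry a given INFO key.
# Usage: count_missing_info <INFO key> <file.vcf>
count_missing_info() {
    key=$1
    vcf=$2
    awk -F '\t' -v key="$key" '
        /^#/ { next }                        # skip header lines
        {
            total++
            found = 0
            n = split($8, fields, ";")       # column 8 is INFO
            for (i = 1; i <= n; i++) {
                split(fields[i], kv, "=")    # kv[1] is the key name
                if (kv[1] == key) { found = 1; break }
            }
            if (!found) missing++
        }
        END { printf "%d of %d records lack %s\n", missing, total, key }
    ' "$vcf"
}
```

Running, for example, `count_missing_info FS Jointgenotyping.projectname.raw.vcf` prints a one-line summary. A non-zero count is not an error in itself; it only matters if a downstream filter or tool requires the annotation on every row.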
