StrandBiasBySample, FisherStrand Annotation

medua, Innsbruck (Member)

Hi, I am using GATK version 3.2-2 to analyze MiSeq data from a human SNP panel, aligned to its "own" reference. I use UnifiedGenotyper to call all desired SNPs (ref or non-ref variants) from the panel and it works very well. I would like to know the number of forward and reverse reads for each allele. I have looked at the FisherStrand values, but they are all 0.00; does that mean there is no strand bias? I assume the strand bias annotations SB or StrandBiasBySample are not used anymore? Is there any other way I can get the forward and reverse read counts without having to walk through the BAM file?

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    The tools still use the StrandBiasBySample annotation internally in order to calculate a better end-result metric called SOR, for StrandOddsRatio, which improves on the FS annotation. It is described (albeit rather briefly) here: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_StrandOddsRatio.html
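
To give a feel for what SOR measures, here is a minimal shell sketch that computes a StrandOddsRatio-style value from the four strand counts. The formula (a pseudocount of 1 per cell, a symmetric odds ratio, and scaling by the ref and alt strand ratios) is an assumption based on my reading of the GATK documentation, and `sor_from_counts` is a hypothetical helper name, so check the result against the tool's own output before relying on it.

```shell
# Sketch: compute a StrandOddsRatio-style value from four strand counts.
# Assumption: pseudocount of 1, symmetric ratio, ref/alt scaling, as described
# in the GATK documentation; this is not the GATK implementation itself.
sor_from_counts() {
  # $1..$4 = ref-forward, ref-reverse, alt-forward, alt-reverse read counts
  awk -v rf="$1" -v rr="$2" -v af="$3" -v ar="$4" 'BEGIN {
    rf += 1; rr += 1; af += 1; ar += 1        # pseudocount avoids division by zero
    r = (rf * ar) / (rr * af)                 # odds ratio of the 2x2 table
    sym = r + 1 / r                           # large when biased in either direction
    ref_ratio = (rf < rr ? rf : rr) / (rf > rr ? rf : rr)
    alt_ratio = (af < ar ? af : ar) / (af > ar ? af : ar)
    printf "%.3f\n", log(sym * ref_ratio / alt_ratio)
  }'
}

sor_from_counts 10 10 10 10   # balanced strands give a low value
sor_from_counts 20 0 0 20     # ref and alt reads on opposite strands give a high value
```

Under these assumptions, higher values indicate stronger evidence of strand bias.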

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @medua

    Hi,

    If your Fisher Strand values are all 0, then yes, this could mean there is no bias. However, please note that if you have all homozygous variant calls (1/1) and you do not see any reference reads, then the Fisher Strand value is not valid. This is because the Fisher Strand value is relative to the reference-forward and reference-reverse reads. Please refer to this thread for more information: http://gatkforums.broadinstitute.org/discussion/4342/sb-flag-in-vcf-file#latest

    As for StrandBiasBySample, it still shows up if you request it. It gives you the ref-forward, ref-reverse, alt-forward, and alt-reverse read counts. You can find SB in the FORMAT field, for example GT:AD:GQ:PL:SB

    -Sheila
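
The SB layout described above can be pulled out of a VCF without touching the BAM file. The following is a minimal sketch, assuming a single-sample VCF where FORMAT is column 9 and the sample is column 10; `sb_counts` is a hypothetical helper name and the input line is made-up example data. It also flags sites with no reference reads, where the FisherStrand value is not meaningful.

```shell
# Sketch: print per-site strand counts from the SB FORMAT field of a VCF.
# Assumption: single-sample VCF; SB holds ref-fwd,ref-rev,alt-fwd,alt-rev.
sb_counts() {
  # Reads VCF text on stdin; writes CHROM, POS, the four counts, and a note.
  awk -F'\t' '
    /^#/ { next }                              # skip header lines
    {
      n = split($9, keys, ":")                 # FORMAT column, e.g. GT:AD:GQ:PL:SB
      split($10, vals, ":")                    # first sample column
      for (i = 1; i <= n; i++) {
        if (keys[i] != "SB" || vals[i] == "" || vals[i] == ".") continue
        split(vals[i], c, ",")
        note = (c[1] + c[2] == 0) ? "no_ref_reads" : "ok"
        print $1, $2, c[1], c[2], c[3], c[4], note
      }
    }'
}

# Made-up example record, not taken from the attached output file.
printf '20\t100\t.\tA\tG\t50\t.\t.\tGT:AD:GQ:PL:SB\t0/1:9,11:99:300,0,280:5,4,6,5\n' | sb_counts
```

With a real file the call would look like `sb_counts < sample.vcf`, where `sample.vcf` is a placeholder name.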

  • medua, Innsbruck (Member)

    hi,
    thanks so much for your quick answers. I am not having much success with SB and SOR. I use the following command to run UnifiedGenotyper:

    java -jar GATK/GenomeAnalysisTK.jar \
    -T UnifiedGenotyper \
    -R $1 \
    -I $f.bsqr.bam \
    -glm SNP \
    --alleles $2 \
    --genotyping_mode GENOTYPE_GIVEN_ALLELES \
    -stand_emit_conf 10 \
    -stand_call_conf 30 \
    -o $f.vcf \
    --output_mode EMIT_ALL_SITES \
    --downsampling_type none \
    --dbsnp $2 \
    -A FisherStrand \
    -A AlleleBalance \
    -A BaseCounts \
    -A StrandOddsRatio \
    -A StrandBiasBySample

    I have attached one of the output files (slightly modified for names). The SB tag is in the header under FORMAT, but is not displayed in the output. I can't find StrandOddsRatio. I am outputting the genotype of all SNPs in my panel, not only the ones that are variants. Also, for multiallelic SNPs, I don't get AlleleBalance on heterozygotes; is that how it should be?
    What am I doing wrong?
    best regards
    Mayra

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @medua

    Hi Mayra,

    Sorry for my delayed response. My earlier response assumed that you were using HaplotypeCaller. We recommend HaplotypeCaller over UnifiedGenotyper. Can you use HaplotypeCaller instead of UnifiedGenotyper?

    Sheila

  • medua, Innsbruck (Member)

    Hi Sheila,
    thank you, for some reason I thought I read it was the other way around. I ran HaplotypeCaller now:

    java -jar GATK/GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R $1 \
    -I $f.bsqr.bam \
    --dbsnp $2 \
    --alleles $2 \
    --genotyping_mode GENOTYPE_GIVEN_ALLELES \
    -stand_emit_conf 10 \
    -stand_call_conf 30 \
    -o $f.vcf \
    --downsampling_type none \
    --output_mode EMIT_ALL_SITES \
    -A FisherStrand \
    -A AlleleBalance \
    -A BaseCounts \
    -A StrandOddsRatio \
    -A StrandBiasBySample

    I now get SB in the FORMAT section, but I lose BaseCounts and there is still no StrandOddsRatio. Am I doing something wrong now?

    thanks for your help again.

    Mayra

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @medua

    Hi Mayra,

    I am happy you got the SB annotation to work now.

    Unfortunately, BaseCounts is disabled in HaplotypeCaller.

    To get the SOR annotation, you will need to use VariantAnnotator and specify that you want the SOR annotation. Please read about VariantAnnotator here: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_VariantAnnotator.html

    I hope this helps.

    -Sheila
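
The VariantAnnotator step suggested above might look like the sketch below. It assumes the same GATK 3 jar path as the earlier commands; `annotate_sor_cmd` is a hypothetical helper, and `ref.fasta`, `sample.bam`, `sample.vcf` and `sample.sor.vcf` are placeholder file names. The function only prints the command so it can be reviewed before running it.

```shell
# Sketch: build a GATK 3 VariantAnnotator command that adds the SOR annotation.
# Assumptions: jar path as in the commands above; file names are placeholders.
annotate_sor_cmd() {
  # $1 = reference FASTA, $2 = BAM used for calling, $3 = input VCF, $4 = output VCF
  printf '%s ' java -jar GATK/GenomeAnalysisTK.jar \
    -T VariantAnnotator \
    -R "$1" \
    -I "$2" \
    -V "$3" \
    -A StrandOddsRatio \
    -o "$4"
  printf '\n'
}

# Print the command for review before running it.
annotate_sor_cmd ref.fasta sample.bam sample.vcf sample.sor.vcf
```

The BAM file is passed with `-I` because strand annotations are computed from the reads, not from the VCF alone.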

  • medua, Innsbruck (Member)

    thanks sheila, you have been a great help !

  • medua, Innsbruck (Member)

    just tested it, works great thanks !

  • cpsosa0006, Rochester (Member)

    Hello, perhaps this information is already in these postings, but I have not been able to get StrandBiasBySample to work. I am currently using this version: The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12. I am passing these parameters to HaplotypeCaller: HaplotypeCaller_params="-stand_call_conf 30 -stand_emit_conf 10 -A StrandBiasBySample", but I do not seem to get the output advertised in the documentation here: https://software.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_StrandBiasBySample.php Is there anything that I am overlooking? Thanks

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @cpsosa0006
    Hi,

    I am not sure which version the annotation was introduced in, but can you try with the latest version?

    Thanks,
    Sheila

  • cpsosa0006, Rochester (Member)

    Thank you Sheila. This is what we'll be trying. I'll report on our findings.

  • cpsosa0006, Rochester (Member)

    Hi Sheila, FYI. This option "StrandAlleleCountsBySample" provides the information that we need and works fine with 3.4. I can see the output as advertised in the documentation. I'll continue testing 3.6 to see if it works when invoking StrandBiasBySample.

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @cpsosa0006
    Hi,

    I can confirm StrandBiasBySample works in 3.6 :smile:

    Yes, StrandAlleleCountsBySample gives you the same information as StrandBiasBySample, except that it gives the allele counts on the forward and reverse strands for all alternate alleles. StrandBiasBySample only gives the counts for the first alternate allele.

    -Sheila
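
For multiallelic sites, the per-allele counts from StrandAlleleCountsBySample can be split into forward/reverse pairs. This is a minimal sketch, assuming the SAC value is ordered ref-forward, ref-reverse, then a forward/reverse pair for each alternate allele; `sac_pairs` is a hypothetical helper name and the numbers are made up.

```shell
# Sketch: split a SAC value into per-allele forward/reverse counts.
# Assumption: SAC is ordered ref-fwd,ref-rev,alt1-fwd,alt1-rev,alt2-fwd,alt2-rev,...
sac_pairs() {
  # $1 = SAC value such as 23,30,33,18
  printf '%s\n' "$1" | awk -F',' '{
    for (i = 1; i + 1 <= NF; i += 2) {
      allele = (i == 1) ? "ref" : ("alt" ((i - 1) / 2))
      print allele, "fwd=" $i, "rev=" $(i + 1)
    }
  }'
}

# Made-up counts for a site with two alternate alleles.
sac_pairs 23,30,33,18,4,6
```

Each output line names the allele and its forward and reverse read counts, so the second alternate allele is no longer hidden as it would be with SB alone.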
