gvcf fields

hello,
I have a couple of basic questions about the gVCF genotype fields:

A T, GT:AD:DP:GQ:PL:SB 0/1:42,19,0:61:99:527,0,1958,729,2015,2744:33,9,18,1

  1. for the PL field, are the likelihoods in order of the following genotypes? AA AT TT A/nonref T/nonref nonref/nonref

  2. for the SB field, the header says 'Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.' Could you explain how to interpret the four numbers?

thank you!

Young Wha
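
For reference, the VCF specification fixes the order of diploid genotype likelihoods: genotype j/k (with j <= k, alleles numbered from 0 for REF) sits at index k(k+1)/2 + j. The sketch below applies that rule to the example record. It assumes the third, unnamed allele is the gVCF symbolic `<NON_REF>` allele, which appears to have been dropped from the pasted line. The SB labels (ref forward, ref reverse, alt forward, alt reverse) are also an assumption, taken from the GATK annotation documentation rather than from this thread, although they agree with the AD values here: 33 + 9 = 42 and 18 + 1 = 19. The helper names are made up for illustration.

```python
def pl_genotype_order(alleles):
    """Diploid genotypes in VCF PL order: j/k (j <= k) sits at index k*(k+1)//2 + j."""
    n = len(alleles)
    order = [None] * (n * (n + 1) // 2)
    for k in range(n):
        for j in range(k + 1):
            order[k * (k + 1) // 2 + j] = f"{alleles[j]}/{alleles[k]}"
    return order


def parse_sample(format_str, sample_str):
    """Pair the colon-separated FORMAT keys with the sample's values."""
    return dict(zip(format_str.split(":"), sample_str.split(":")))


# Assumption: the third allele is the symbolic <NON_REF> allele of a gVCF record.
alleles = ["A", "T", "<NON_REF>"]
sample = parse_sample(
    "GT:AD:DP:GQ:PL:SB",
    "0/1:42,19,0:61:99:527,0,1958,729,2015,2744:33,9,18,1",
)

pls = [int(x) for x in sample["PL"].split(",")]
labelled_pl = dict(zip(pl_genotype_order(alleles), pls))

# Assumption: SB is ordered ref-forward, ref-reverse, alt-forward, alt-reverse.
sb = [int(x) for x in sample["SB"].split(",")]
labelled_sb = dict(zip(["ref_fwd", "ref_rev", "alt_fwd", "alt_rev"], sb))

for genotype, pl in labelled_pl.items():
    print(genotype, pl)
print(labelled_sb)
```

Under these assumptions the PL order is A/A, A/T, T/T, A/<NON_REF>, T/<NON_REF>, <NON_REF>/<NON_REF>, which matches the order guessed in question 1.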

Answers

  • leeyoungwha Member

    thank you for the quick response! I wonder if I might follow up:

    1. I sometimes see variants like this:
      T C, 1710.77 . DP=49;MLEAC=2,0;MLEAF=1.00,0.00;MQ=99.17;MQ0=0 GT:AD:DP:GQ:PL:SB 1/1:0,41,0:41:99:1744,123,0,1744,123,1744:0,0,0,0
      in the SB field, is it a red flag that all the numbers are 0 even though QUAL and GQ are high?

    2. and a perhaps related question: how is QUAL=0 assigned to variants? example here:
      C T, 0 . BaseQRankSum=1.143;ClippingRankSum=0.095;DP=90;MLEAC=0,0;MLEAF=0.00,0.00;MQ=83.95;MQ0=0;MQRankSum=-1.969;ReadPosRankSum=1.461 GT:AD:DP:GQ:PL:SB 0/0:61,3,0:64:81:0,81,4933,207,4942,5068:0,0,0,0

    best,
    Young Wha

  • ebanks Broad Institute Member, Broadie, Dev

    Are you using version 3.0? If so, what is your command-line?

  • leeyoungwha Member

    this is from BP_RESOLUTION:

    java -Xmx32g -jar /GenomeAnalysisTK-3.0-0/GenomeAnalysisTK.jar -T HaplotypeCaller -R ~/XX.fasta -ERC BP_RESOLUTION -variant_index_type LINEAR -variant_index_parameter 128000 -I 119.bam -nct 20 --max_alternate_alleles 50 -L scaffold_1:1-100000 -o 119.gvcf 2> err &

  • leeyoungwha Member

    hello,

    I was wondering if I might ask for an update on this thread? (sorry, I know the forum is officially on hiatus!) I want to make sure I understand the output before committing the server time to re-running my dataset (for indels).

    thanks as always for your time,

    YW

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Hi @leeyoungwha,

    I'm going to update the docs for the gVCF format within the next two days; hopefully that will answer your question.

  • miked Member

    Hello,

    How can I include the information in the SB tag in the multi-sample VCF generated by GenotypeGVCFs? I'm running version v3.2-2-gec30cee and it's not included in each sample column. This would be essential for our variant QC processes. Any help is appreciated.

    Thanks.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Hmm, it looks like the SB annotation is not enabled. Let me check with the devs if this is intentional.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Ah, here's the trick: the SB annotation requires a BAM file to look up the stranded counts, so it can't be added with GenotypeGVCFs. But you should be able to add it with VariantAnnotator, if you pass in the BAM files.

  • miked Member

    Thanks. I looked at VariantAnnotator; however, I don't believe it fits well with the HC gVCF multi-sample pipeline. It's great to be able to generate a gVCF and then remove or archive the BAM files as a way to overcome their significant storage requirements.

    I noticed that after CombineGVCFs the SB tag is lost. Is there any way it can be retained there so that it can then be included by GenotypeGVCFs?

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    As far as I know, the SB tag should be maintained through CombineGVCFs (but if I'm wrong let me know and I'll take it up with the devs, because it should be). However, in the latest version at least, it is intentionally stripped out by GenotypeGVCFs, because in the gVCF-based multi-sample pipeline it is superseded by FS and the new SOR annotation. So the current thinking is that once you have the final VCF, you shouldn't need SB. If you want it anyway, you need to re-annotate SB in the final VCF before archiving the BAMs.
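
To make the link between SB and FS concrete: the SB header describes its four counts as the components of a Fisher's Exact Test, and FS is documented as a Phred-scaled p-value from that test. The following is a minimal pure-Python sketch of the calculation, assuming the SB order is ref-forward, ref-reverse, alt-forward, alt-reverse. It is not GATK's implementation (which, among other things, may rescale large tables), so its numbers need not match a real FS value exactly. The function names are hypothetical.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def phred(p):
    """Phred-scale a p-value; the floor avoids log10(0)."""
    return -10 * math.log10(max(p, 1e-320))


# SB from the first example in this thread, assumed order:
# ref_fwd, ref_rev, alt_fwd, alt_rev
ref_fwd, ref_rev, alt_fwd, alt_rev = 33, 9, 18, 1
p_value = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
print(f"p = {p_value:.4f}, Phred-scaled = {phred(p_value):.2f}")
```

A larger Phred-scaled value means stronger evidence that the reference and alternate reads are distributed differently across the two strands.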

  • miked Member

    @Geraldine_VdAuwera,

    I double checked the gVCFs from CombineGVCFs and the SB tag is present (my mistake earlier).

    I'm not able to see the SOR annotation in my final GenotypeGVCFs multi-sample pVCF. I looked at the documentation for GenotypeGVCFs at http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_GenotypeGVCFs.html

    and found nothing about SOR. Can you provide more info on this new annotation?

    Thanks!

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    The new annotation is here (may not be annotated by default, so you'll need to request its annotation in your command): http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_StrandOddsRatio.html

    We'll try to document it a little more fully in the near future; basically it adds onto the FS annotation, but covers additional forms of bias.
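
As a rough illustration of what a strand odds ratio can look like, the sketch below computes a symmetric odds ratio over the four strand counts, with a pseudocount of 1 added to each cell so that empty cells do not cause a division by zero. The formula follows my understanding of how later GATK documentation describes SOR; nothing in this thread confirms it, so treat both the formula and the function name as assumptions.

```python
import math


def strand_odds_ratio(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Symmetric strand odds ratio with a pseudocount of 1 per cell (assumed formula)."""
    rf, rr, af, ar = (x + 1.0 for x in (ref_fwd, ref_rev, alt_fwd, alt_rev))
    # Odds ratio plus its inverse, so bias in either direction raises the score.
    symmetric = (rf * ar) / (rr * af) + (rr * af) / (rf * ar)
    # Scale by how balanced each allele's reads are across the two strands.
    ref_ratio = min(rf, rr) / max(rf, rr)
    alt_ratio = min(af, ar) / max(af, ar)
    return math.log(symmetric) + math.log(ref_ratio) - math.log(alt_ratio)


balanced = strand_odds_ratio(20, 20, 10, 10)
one_sided_alt = strand_odds_ratio(20, 20, 20, 0)
print(balanced, one_sided_alt)
```

With this formula a perfectly balanced table scores ln 2, and the score grows as the alternate-allele reads concentrate on one strand.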
