Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

Several Annotations not working in GATK Haplotype Caller

sicottesicotte Member
edited May 2018 in Ask the GATK team

I am using Genotype Given Allele with Haplotype Caller
I am trying to explicitely request all annotations that the documentation says are compatible with the Haplotype caller (and that make sense for a single sample .. e.g. no hardy weinberg ..)

the following annotations all have "NA"
GCContent(GC) HomopolymerRun(Hrun) TandemRepeatAnnotator (STR RU RPA)
.. but are valid requests because I get no errors from GATK.

This is the command I ran (all on one line)

java -Xmx40g -jar /data5/bsi/bictools/alignment/gatk/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller --input_file /data2/external_data/[...]/s115343.beauty/Paired_analysis/secondary/Paired_10192014/IGV_BAM/pair_EX167687/s_EX167687_DNA_Blood.igv-sorted.bam --alleles:vcf /data2/external_data/[...]m026645/s109575.ez/Sequencing_2016/OMNI.vcf --phone_home NO_ET --gatk_key /projects/bsi/bictools/apps/alignment/GenomeAnalysisTK/3.1-1/Hossain.Asif_mayo.edu.key --reference_sequence /data2/bsi/reference/sequence/human/ncbi/hg19/allchr.fa --minReadsPerAlignmentStart 1 --disableOptimizations --dontTrimActiveRegions --forceActive --out /data2/external_data/[...]m026645/s109575.ez/Sequencing_2016/EX167687.0.0375.chr22.vcf --logging_level INFO -L chr22 --downsample_to_fraction 0.0375 --downsampling_type BY_SAMPLE --genotyping_mode GENOTYPE_GIVEN_ALLELES --standard_min_confidence_threshold_for_calling 20.0 --standard_min_confidence_threshold_for_emitting 0.0 --annotateNDA --annotation BaseQualityRankSumTest --annotation ClippingRankSumTest --annotation Coverage --annotation FisherStrand --annotation GCContent --annotation HomopolymerRun --annotation LikelihoodRankSumTest --annotation MappingQualityRankSumTest --annotation NBaseCount --annotation QualByDepth --annotation RMSMappingQuality --annotation ReadPosRankSumTest --annotation StrandOddsRatio --annotation TandemRepeatAnnotator --annotation DepthPerAlleleBySample --annotation DepthPerSampleHC --annotation StrandAlleleCountsBySample --annotation StrandBiasBySample --excludeAnnotation HaplotypeScore --excludeAnnotation InbreedingCoeff

Log file is below( Notice "weird" WARNings about) "StrandBiasBySample annotation exists in input VCF header"..
which make no sense because the header is empty other than the barebone fields.

This is the barebone VCF
head /data2/external_data/[...]_m026645/s109575.ez/Sequencing_2016/OMNI.vcf

fileformat=VCFv4.2

CHROM POS ID REF ALT QUAL FILTER INFO

chr1 723918 rs144434834 G A 30 PASS .
chr1 729632 rs116720794 C T 30 PASS .
chr1 752566 rs3094315 G A 30 PASS .
chr1 752721 rs3131972 A G 30 PASS .
chr1 754063 rs12184312 G T 30 PASS .
chr1 757691 rs74045212 T C 30 PASS .
chr1 759036 rs114525117 G A 30 PASS .
chr1 761764 rs144708130 G A 30 PASS .

This is the output

INFO 10:03:06,080 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:03:06,082 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
INFO 10:03:06,083 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 10:03:06,083 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 10:03:06,086 HelpFormatter - Program Args: -T HaplotypeCaller --input_file /data2/external_data/[...]/s115343.beauty/Paired_analysis/secondary/Paired_10192014/IGV_BAM/pair_EX167687/s_EX167687_DNA_Blood.igv-sorted.bam --alleles:vcf /data2/external_data/[...]m026645/s109575.ez/Sequencing_2016/OMNI.vcf --phone_home NO_ET --gatk_key /projects/bsi/bictools/apps/alignment/GenomeAnalysisTK/3.1-1/Hossain.Asif_mayo.edu.key --reference_sequence /data2/bsi/reference/sequence/human/ncbi/hg19/allchr.fa --minReadsPerAlignmentStart 1 --disableOptimizations --dontTrimActiveRegions --forceActive --out /data2/external_data/[...]m026645/s109575.ez/Sequencing_2016/EX167687.0.0375.chr22.vcf --logging_level INFO -L chr22 --downsample_to_fraction 0.0375 --downsampling_type BY_SAMPLE --genotyping_mode GENOTYPE_GIVEN_ALLELES --standard_min_confidence_threshold_for_calling 20.0 --standard_min_confidence_threshold_for_emitting 0.0 --annotateNDA --annotation BaseQualityRankSumTest --annotation ClippingRankSumTest --annotation Coverage --annotation FisherStrand --annotation GCContent --annotation HomopolymerRun --annotation LikelihoodRankSumTest --annotation MappingQualityRankSumTest --annotation NBaseCount --annotation QualByDepth --annotation RMSMappingQuality --annotation ReadPosRankSumTest --annotation StrandOddsRatio --annotation TandemRepeatAnnotator --annotation DepthPerAlleleBySample --annotation DepthPerSampleHC --annotation StrandAlleleCountsBySample --annotation StrandBiasBySample --excludeAnnotation HaplotypeScore --excludeAnnotation InbreedingCoeff
INFO 10:03:06,093 HelpFormatter - Executing as [email protected] on Linux 2.6.32-573.8.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26.
INFO 10:03:06,094 HelpFormatter - Date/Time: 2016/01/19 10:03:06
INFO 10:03:06,094 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:03:06,094 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:03:06,545 GenomeAnalysisEngine - Strictness is SILENT
INFO 10:03:06,657 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Fraction: 0.04
INFO 10:03:06,666 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 10:03:07,012 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.35
INFO 10:03:07,031 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 10:03:07,170 IntervalUtils - Processing 51304566 bp from intervals
INFO 10:03:07,256 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 10:03:07,595 GenomeAnalysisEngine - Done preparing for traversal
INFO 10:03:07,595 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 10:03:07,595 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 10:03:07,596 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
INFO 10:03:07,596 HaplotypeCaller - Disabling physical phasing, which is supported only for reference-model confidence output
WARN 10:03:07,709 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
WARN 10:03:07,709 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
INFO 10:03:07,719 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO 10:03:37,599 ProgressMeter - chr22:5344011 0.0 30.0 s 49.6 w 10.4% 4.8 m 4.3 m
INFO 10:04:07,600 ProgressMeter - chr22:11875880 0.0 60.0 s 99.2 w 23.1% 4.3 m 3.3 m
Using AVX accelerated implementation of PairHMM
INFO 10:04:29,924 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file
INFO 10:04:29,925 VectorLoglessPairHMM - Using vectorized implementation of PairHMM
WARN 10:04:29,938 AnnotationUtils - Annotation will not be calculated, genotype is not called
WARN 10:04:29,938 AnnotationUtils - Annotation will not be calculated, genotype is not called
WARN 10:04:29,939 AnnotationUtils - Annotation will not be calculated, genotype is not called
INFO 10:04:37,601 ProgressMeter - chr22:17412465 0.0 90.0 s 148.8 w 33.9% 4.4 m 2.9 m
INFO 10:05:07,602 ProgressMeter - chr22:18643131 0.0 120.0 s 198.4 w 36.3% 5.5 m 3.5 m
INFO 10:05:37,603 ProgressMeter - chr22:20133744 0.0 2.5 m 248.0 w 39.2% 6.4 m 3.9 m
INFO 10:06:07,604 ProgressMeter - chr22:22062452 0.0 3.0 m 297.6 w 43.0% 7.0 m 4.0 m
INFO 10:06:37,605 ProgressMeter - chr22:23818297 0.0 3.5 m 347.2 w 46.4% 7.5 m 4.0 m
INFO 10:07:07,606 ProgressMeter - chr22:25491290 0.0 4.0 m 396.8 w 49.7% 8.1 m 4.1 m
INFO 10:07:37,607 ProgressMeter - chr22:27044271 0.0 4.5 m 446.4 w 52.7% 8.5 m 4.0 m
INFO 10:08:07,608 ProgressMeter - chr22:28494980 0.0 5.0 m 496.1 w 55.5% 9.0 m 4.0 m
INFO 10:08:47,609 ProgressMeter - chr22:30866786 0.0 5.7 m 562.2 w 60.2% 9.4 m 3.8 m
INFO 10:09:27,610 ProgressMeter - chr22:32908950 0.0 6.3 m 628.3 w 64.1% 9.9 m 3.5 m
INFO 10:09:57,610 ProgressMeter - chr22:34451306 0.0 6.8 m 677.9 w 67.2% 10.2 m 3.3 m
INFO 10:10:27,611 ProgressMeter - chr22:36013343 0.0 7.3 m 727.5 w 70.2% 10.4 m 3.1 m
INFO 10:10:57,613 ProgressMeter - chr22:37387478 0.0 7.8 m 777.1 w 72.9% 10.7 m 2.9 m
INFO 10:11:27,614 ProgressMeter - chr22:38534891 0.0 8.3 m 826.8 w 75.1% 11.1 m 2.8 m
INFO 10:11:57,615 ProgressMeter - chr22:39910054 0.0 8.8 m 876.4 w 77.8% 11.4 m 2.5 m
INFO 10:12:27,616 ProgressMeter - chr22:41738463 0.0 9.3 m 926.0 w 81.4% 11.5 m 2.1 m
INFO 10:12:57,617 ProgressMeter - chr22:43113306 0.0 9.8 m 975.6 w 84.0% 11.7 m 112.0 s
INFO 10:13:27,618 ProgressMeter - chr22:44456937 0.0 10.3 m 1025.2 w 86.7% 11.9 m 95.0 s
INFO 10:13:57,619 ProgressMeter - chr22:45448656 0.0 10.8 m 1074.8 w 88.6% 12.2 m 83.0 s
INFO 10:14:27,620 ProgressMeter - chr22:46689073 0.0 11.3 m 1124.4 w 91.0% 12.5 m 67.0 s
INFO 10:14:57,621 ProgressMeter - chr22:48062438 0.0 11.8 m 1174.0 w 93.7% 12.6 m 47.0 s
INFO 10:15:27,622 ProgressMeter - chr22:49363910 0.0 12.3 m 1223.6 w 96.2% 12.8 m 29.0 s
INFO 10:15:57,623 ProgressMeter - chr22:50688233 0.0 12.8 m 1273.2 w 98.8% 13.0 m 9.0 s
INFO 10:16:12,379 VectorLoglessPairHMM - Time spent in setup for JNI call : 0.061128124000000006
INFO 10:16:12,379 PairHMM - Total compute time in PairHMM computeLikelihoods() : 22.846350295
INFO 10:16:12,380 HaplotypeCaller - Ran local assembly on 25679 active regions
INFO 10:16:12,434 ProgressMeter - done 5.1304566E7 13.1 m 15.0 s 100.0% 13.1 m 0.0 s
INFO 10:16:12,435 ProgressMeter - Total runtime 784.84 secs, 13.08 min, 0.22 hours
INFO 10:16:12,435 MicroScheduler - 727347 reads were filtered out during the traversal out of approximately 4410423 total reads (16.49%)
INFO 10:16:12,435 MicroScheduler - -> 2 reads (0.00% of total) failing BadCigarFilter
INFO 10:16:12,436 MicroScheduler - -> 669763 reads (15.19% of total) failing DuplicateReadFilter
INFO 10:16:12,436 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 10:16:12,436 MicroScheduler - -> 57582 reads (1.31% of total) failing HCMappingQualityFilter
INFO 10:16:12,437 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 10:16:12,437 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 10:16:12,437 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 10:16:12,438 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter

Post edited by Geraldine_VdAuwera on

Best Answer

Answers

  • sicottesicotte Member

    I am still interested in the answer to this post..

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @sicotte
    Hi,

    Can you try adding -G none to your command line? I think specifying all the annotations might be screwing with the default annotations HaplotypeCaller outputs. https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php#--group

    If that does not work, can you try using VariantAnnotator to add the annotations that are missing?

    Thanks,
    Sheila

  • sicottesicotte Member

    Adding the -GC none option did not work.

    I'll try running the VariantAnnotator. That should work because I am doing genotype given allele ... and these features do not require the haplotype sequence (only the reference sequence or the alt-allele).

  • sicottesicotte Member

    The VariantAnnotator seems to be able to generate most of the annotation not output when running the HaplotypeCaller.
    -G none --annotation GCContent --annotation HomopolymerRun --annotation NBaseCount --annotation TandemRepeatAnnotator

    I still didn't see any annotation for TandemRepeatAnnotator though.(RPA,RU, STR)

    I'll try with 3.5 and let you know.

  • sicottesicotte Member

    I get the exact same behavior with 3.5, so this would either be a bug .. or an intentional behavior that needs to be documented.

    I don't mind that GC or HRun are not run by HC because they do not need the HC alignment (reference only annotation).
    It's a little more problematic for NBaseCount because it may need the "corrected" HC alignment.

    Note that those 3 works just fine if I specify the UnifiedGenotyper.

    I also tried TandemRepeatAnnotator with UG, HC or VariantAnnnotator .. and it does not work in any case? Does it work at all?

    Thanks for the help and have a great day.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    @sicotte, can you check whether the annotations you requested are listed in the VCF header, or not at all? That will help us differentiate between two possible reasons for why this might be happening.

  • sicottesicotte Member
    edited January 2016

    Of course.

    In all cases above, when I said I "ran it", the annotation appeared in the header (and the original VCF had no INFO fields).
    .. and there were no errors/warning/info in the log files.

    So for all cases that I ran the TandemRepeatAnnotator, the INFO tags STR RU RPA appeared in the header.

    INFO=<ID=RPA,Number=.,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)">

    INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)">

    INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">

    .. and whenever I ran GC, Hrun, PercentNBase (did not work for HC, but OK for UG and VariantAnnotator) I got those entries

    INFO=<ID=GC,Number=1,Type=Float,Description="GC content around the variant (see docs for window size details)">

    INFO=<ID=HRun,Number=1,Type=Integer,Description="Largest Contiguous Homopolymer Run of Variant Allele In Either Direction">

    INFO=<ID=PercentNBase,Number=1,Type=Float,Description="Percentage of N bases in the pileup">

  • sicottesicotte Member

    For the TandemRepeatAnnotator, I must admit I don't know how this annotation works.. so it may be OK.

    There are probably no tandem repeats called in my variants... so if the tandem repeat annotation only works when the called alleles are part of a tandem repeat (as opposed to overlapping a tandem repeat) ... then I am not worried I see no annotation (all variants in my vcf are SNPs )

    , but the GC and Hrun and PercentNBase .. do work with UG/VA but not HC.

    Issue · Github
    by Sheila

    Issue Number
    499
    State
    closed
    Last Updated
    Assignee
    Array
    Closed By
    vdauwera
  • DaveCurtisDaveCurtis London, UKMember

    Is there any news on this? I want to use --annotation StrandAlleleCountsBySample because I am hoping it will help me identify some of the bad calls HaplotypeCaller is making. But I get this error message:
    ERROR MESSAGE: Invalid command line: Argument annotation has a bad value: Annotation StrandAlleleCountsBySample was not found; please check that you have specified the annotation name correctly

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @DaveCurtis
    Hi,

    Can you please confirm you are using the latest version (3.6) and please post the exact command you ran?

    I just tried, and SAC is annotated for me.

    -Sheila

  • DaveCurtisDaveCurtis London, UKMember

    Hi.

    Thanks. Sorry, you're right. I was using 3.5. It works with 3.6.

    • Dave
  • JaqueWangJaqueWang BrazilMember

    Hello!
    I'm using GATK4 and I have this erros message:
    USER ERROR has occurred: Argument annotation has a bad value: Annotation [StrandAlleleCountsBySample] not found; please check that you have specified the name correctly

    And my command line is:
    gatk --java-options "-Xmx64g" HaplotypeCaller -R hs38DH.fa -I Sample1.BQSR_recalibrated.bam -O Sample1.AS.g.vcf.gz -ERC GVCF -G Standard -G AS_Standard --annotation StrandAlleleCountsBySample --interval-padding 40 -TMP_DIR tmp/

    When I checked the tool documentation, it says that I can check the --list argument of VariantAnnotator. And when I checked this second one, it still doesn't give the information.
    I found in this post: https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_annotator_StrandAlleleCountsBySample.php

    Is the option StrandAlleleCountsBySample deprecated?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @JaqueWang
    Hi,

    It looks like that annotation has indeed been deprecated. You can find a complete list of annotations available in GATK4 here.

    -Sheila

  • JaqueWangJaqueWang BrazilMember
    edited October 2018

    @Sheila
    Thank you for your help.

    I discovered the annotation StrandBiasBySample and started using it in GATK 4.0.9.0
    However, at the GenotypeGVCF, I got this WARN:

    WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    My command was this one:

    gatk-4.0.9.0/gatk --java-options "-Xmx32g" GenotypeGVCFs -R hs38DH.fa -V CombinedGVCF.AS.g.vcf -O Genotype.AS.vcf -G StandardAnnotation -G AS_StandardAnnotation -A StrandBiasBySample

    And the output doesn't have the SB information.
    What do I have to put in my command to keep this flag?

  • manolismanolis Member ✭✭

    gatk-4.0.9.0 / gatk-4.0.10.0

    Hi, I was also looking to add to my final VCFs the StrandBiasBySample information.

    If I'm correct, I can have this information only in the HaplotypeCaller step, in my case in the g.VCF file.

    After GenomicsDBImport, with GenotypeGVCFs step I'm loosing this information.

    Is there any GATK way / script to pass the SB information from the g.VCF to the final VCF (post VQSR)?

    Many thanks

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @manolis

    You can use this argument to allow StrandBiasBySample information in the GenotypeGVCFs step.

    I hope this helps.

    Regards
    Bhanu

  • manolismanolis Member ✭✭
    edited October 2018

    Hi!

    I added to the original command the " -A StrandBiasBySample " option.

    ${GATK} "-Xmx5g" GenotypeGVCFs \
    -R ${hg38} \
    -O chrY.vcf \
    -V gendb://chr/Y \
    -G StandardAnnotation \
    -A StrandBiasBySample \
    --only-output-calls-starting-in-intervals \
    --use-new-qual-calculator \
    -L "chrY"
    

    I can not find the SB information in the output vcf file.

    chrY 2841972 . A G 2000.28 . AC=4;AF=0.039;AN=102;DP=1378;ExcessHet=0.0007;FS=0.000;InbreedingCoeff=0.8041;MLEAC=8;MLEAF=0.078;MQ=60.00;QD=28.99;SOR=0.844 GT:AD:DP:GQ:PL ./.:0,0:0:.:0,0,0 0/0:33,0:33:99:0,99,966 ./.:0,0:0:.:0,0,0 0/0:39,0:39:99:0,105,1575

    Here are the logs of the code with the " -A StrandBiasBySample " option:

    Using GATK jar /home/manolis/bin/gatk-4.0.10.0/gatk-package-4.0.10.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx5g -jar /home/manolis/bin/gatk-4.0.1
    0.0/gatk-package-4.0.10.0-local.jar GenotypeGVCFs -R /home/shared/resources/hgRef/hg38/Homo_sapiens_assembly38.fasta -O standard_Y.vcf -V gendb://20181020_105WEShg38pipe05_GATK4090_BQSRexHCex_GATK40100_GD
    BIchrGGVCFchr/Y -G StandardAnnotation -A StrandBiasBySample --only-output-calls-starting-in-intervals --use-new-qual-calculator -L chrY
    21:28:22.717 WARN  GATKAnnotationPluginDescriptor - Redundant enabled annotation group (StandardAnnotation) is enabled for this tool by default
    21:28:22.782 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/manolis/bin/gatk-4.0.10.0/gatk-package-4.0.10.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    21:28:24.628 INFO  GenotypeGVCFs - ------------------------------------------------------------
    21:28:24.629 INFO  GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.0.10.0
    21:28:24.629 INFO  GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
    21:28:24.629 INFO  GenotypeGVCFs - Executing as [email protected] on Linux v4.4.0-128-generic amd64
    21:28:24.630 INFO  GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_121-b15
    21:28:24.630 INFO  GenotypeGVCFs - Start Date/Time: October 20, 2018 9:28:22 PM CEST
    21:28:24.630 INFO  GenotypeGVCFs - ------------------------------------------------------------
    21:28:24.630 INFO  GenotypeGVCFs - ------------------------------------------------------------
    21:28:24.631 INFO  GenotypeGVCFs - HTSJDK Version: 2.16.1
    21:28:24.632 INFO  GenotypeGVCFs - Picard Version: 2.18.13
    21:28:24.632 INFO  GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    21:28:24.632 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    21:28:24.632 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    21:28:24.632 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    21:28:24.632 INFO  GenotypeGVCFs - Deflater: IntelDeflater
    21:28:24.633 INFO  GenotypeGVCFs - Inflater: IntelInflater
    21:28:24.633 INFO  GenotypeGVCFs - GCS max retries/reopens: 20
    21:28:24.633 INFO  GenotypeGVCFs - Requester pays: disabled
    21:28:24.633 INFO  GenotypeGVCFs - Initializing engine
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    21:28:26.366 INFO  IntervalArgumentCollection - Processing 57227415 bp from intervals
    21:28:26.394 INFO  GenotypeGVCFs - Done initializing engine
    21:28:26.516 INFO  ProgressMeter - Starting traversal
    21:28:26.517 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    21:28:36.772 INFO  ProgressMeter -         chrY:2787644              0.2                  1000           5851.9
    21:28:37.368 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    21:28:37.369 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    21:28:37.369 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    ...
    ...
    ...
    

    Here are the logs of the code without the " -A StrandBiasBySample " option:

    Using GATK jar /home/manolis/bin/gatk-4.0.10.0/gatk-package-4.0.10.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx5g -jar /home/manolis/bin/gatk-4.0.1
    0.0/gatk-package-4.0.10.0-local.jar GenotypeGVCFs -R /home/shared/resources/hgRef/hg38/Homo_sapiens_assembly38.fasta -O standard_senzaSB_Y.vcf -V gendb://20181020_105WEShg38pipe05_GATK4090_BQSRexHCex_GATK
    40100_GDBIchrGGVCFchr/Y -G StandardAnnotation --only-output-calls-starting-in-intervals --use-new-qual-calculator -L chrY
    21:33:52.792 WARN  GATKAnnotationPluginDescriptor - Redundant enabled annotation group (StandardAnnotation) is enabled for this tool by default
    21:33:52.854 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/manolis/bin/gatk-4.0.10.0/gatk-package-4.0.10.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    21:33:54.747 INFO  GenotypeGVCFs - ------------------------------------------------------------
    21:33:54.747 INFO  GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.0.10.0
    21:33:54.747 INFO  GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
    21:33:54.748 INFO  GenotypeGVCFs - Executing as [email protected] on Linux v4.4.0-128-generic amd64
    21:33:54.748 INFO  GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_121-b15
    21:33:54.749 INFO  GenotypeGVCFs - Start Date/Time: October 20, 2018 9:33:52 PM CEST
    21:33:54.749 INFO  GenotypeGVCFs - ------------------------------------------------------------
    21:33:54.749 INFO  GenotypeGVCFs - ------------------------------------------------------------
    21:33:54.750 INFO  GenotypeGVCFs - HTSJDK Version: 2.16.1
    21:33:54.750 INFO  GenotypeGVCFs - Picard Version: 2.18.13
    21:33:54.750 INFO  GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    21:33:54.750 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    21:33:54.751 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    21:33:54.751 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    21:33:54.751 INFO  GenotypeGVCFs - Deflater: IntelDeflater
    21:33:54.751 INFO  GenotypeGVCFs - Inflater: IntelInflater
    21:33:54.751 INFO  GenotypeGVCFs - GCS max retries/reopens: 20
    21:33:54.751 INFO  GenotypeGVCFs - Requester pays: disabled
    21:33:54.752 INFO  GenotypeGVCFs - Initializing engine
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    21:33:56.505 INFO  IntervalArgumentCollection - Processing 57227415 bp from intervals
    21:33:56.537 INFO  GenotypeGVCFs - Done initializing engine
    21:33:56.680 INFO  ProgressMeter - Starting traversal
    21:33:56.681 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    21:34:07.262 INFO  ProgressMeter -         chrY:2787644              0.2                  1000           5671.6
    21:34:09.324 WARN  InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
    21:34:11.267 WARN  InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
    21:34:12.621 WARN  InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
    

    ...
    ...
    ...

    Moreover, in both vcf outputs, with and without "-A SB" option, in the headers is present:

    ##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">

    link SB

    Post edited by manolis on
  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hello there again @manolis,

    Please see Article#10836 for general information about annotations. Basically, if an annotation requires read data but you are asking a tool to calculate it based on VCF annotations alone, the tool will not calculate it. The StrandBiasBySample annotation document states:

    This annotation can only be generated by HaplotypeCaller (it will not work when called from VariantAnnotator).

    From this, we can deduce that read data is required. GenotypeGVCFs will not be able to produce this. If you are interested in this annotation, please add it to your HaplotypeCaller command.

  • JaqueWangJaqueWang BrazilMember

    Hi @shlee

    I added this annotation on HaplotypeCaller:
    /usr/local/gatk-4.0.9.0/gatk --java-options "-Xmx32g" HaplotypeCaller --emit-ref-confidence GVCF -R $reference -I $bam_file --interval-padding 50 -G StandardAnnotation -G AS_StandardAnnotation -G StandardHCAnnotation -A StrandBiasBySample -L $bed_file -O $haplotype

    Then, I kept it on CombineGVCF of samples:
    /usr/local/gatk-4.0.9.0/gatk --java-options "-Xmx32g" CombineGVCFs -R $reference --variant sample1.g.vcf --variant sample2.g.vcf -O $combine -G StandardAnnotation -G AS_StandardAnnotation -A StrandBiasBySample

    And CombineGVCF of combined GVCFs:
    /usr/local/gatk-4.0.9.0/gatk --java-options "-Xmx32g" CombineGVCFs -R $reference --variant combined1.g.vcf --variant combined2.g.vcf -O $combine -G StandardAnnotation -G AS_StandardAnnotation -A StrandBiasBySample

    After all, I did the GenotypeGVCF:
    gatk-4.0.9.0/gatk --java-options "-Xmx32g" GenotypeGVCFs -R hs38DH.fa -V CombinedGVCF.AS.g.vcf -O Genotype.AS.vcf -G StandardAnnotation -G AS_StandardAnnotation -A StrandBiasBySample

    However, I have the following WARN:
    WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is nul

    I kept the annotation in all the steps, since HaplotypeCaller, and it still didn't give me the SB flag.

    What do I have to do to keep the SB flag?

  • JaqueWangJaqueWang BrazilMember

    Hi @shlee @Geraldine_VdAuwera @Sheila @bhanuGandham

    I was wondering if you have any kind of answer to this.

    At version 3.7, I was able to annotate the StrandAlleleCountsBySample from HaplotypeCaller until GenotypeGVCF, keeping this information in the gvcf file.

    With the version 4.0.9, the problem with some annotation started, and you deprecated some (including the one that I was using). So, I started to try to use StrandBiasBySample and I kept this annotation from HaplotypeCaller until GenotypeGVCFs, but for my surprise it wasn't working (you can read about it above).

    So, I tested today again (and still waiting for some response from you). And I realize that the annotation ONLY works if I do a VCF with HaplotypeCaller.

    Actually, I really don't understand why you are taking off some things that were working before. The improved version (in my point of view) is worst than the previous. Unless, you think that the annotation that you were doing before were wrong.

    Please, I really need a answer for this.
    Thank you.

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @JaqueWang

    Can you please try this with the latest GATK4.1.1.0 and check if the issue persists?

  • JaqueWangJaqueWang BrazilMember

    Hi @bhanuGandham.

    Sorry for taking so long to give you an answer.
    This issue persists with GATK4.1.1.0:
    17:10:14.214 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

  • AdelaideRAdelaideR Unconfirmed, Member, Broadie, Moderator admin

    Hi @JaqueWang

    This is quite a bit of information about this warning in this post on the forum here

    1) Not all variants in a vcf are annotated and that results in the WARN messages you see here.The messages state why that particular variant was not annotated.
    
    2)in GATK4, there are 4 logging levels (ERROR, WARNING, INFO, and DEBUG). If you want to silence warnings you can set --verbosity ERROR.
    
    
  • JaqueWangJaqueWang BrazilMember

    Hi @AdelaideR .

    I know that, my point is that I'm trying to annotate de StrandBias and it is not working. I put the flag on HaplotypeCaller and hept it until GenotypeGVCF. However, somehow this annotation vanished.

    I looked into the single GVCF and the combined GVCF and this annotation was there.
    When I looked into the vcf created by GenotypeGVCF, it wasn't there anymore.

    I said this previously:
    "Actually, I really don't understand why you are taking off some things that were working before. The improved version (in my point of view) is worst than the previous. Unless, you think that the annotation that you were doing before were wrong."
    Do you have any answer to this?

  • AdelaideRAdelaideR Unconfirmed, Member, Broadie, Moderator admin

    HI JaqueWang -

    The only thing I can think of is that the settings that are removing these annotations are linked to a value that indicates that they should be filtered out. What do your PL and GQ scores say for this particular allele? If it has low coverage or low mapping quality, it may not pass the filters.

  • JaqueWangJaqueWang BrazilMember

    Hi @AdelaideR .

    The removing is happening in ALL sites. Not only some of them.

    Here is a line from Combine GVCF file:
    chr1 17033181 . C A, . . AS_RAW_BaseQRankSum=|1.8,2|;AS_RAW_MQ=7200.00|2772000.00|0.00;AS_RAW_MQRankSum=|0.0,2|;AS_RAW_ReadPosRankSum=|0.5,1,1.5,1|;AS_SB_TABLE=0,2|256,514|0,0;BaseQRankSum=1.84;DP=772;ExcessHet=3.01;MQRankSum=0.00;RAW_MQandDP=2779200,772;ReadPosRankSum=1.59 GT:AD:DP:GQ:PL:SB ./.:0,65,0:65:99:2136,195,0,2136,195,2136:0,0,20,45 ./.:0,53,0:53:99:1559,157,0,1559,157,1559:0,0,19,34 ./.:0,75,0:75:99:2398,225,0,2398,225,2398:0,0,25,50 ./.:0,39,0:39:99:1310,117,0,1310,117,1310:0,0,13,26 ./.:0,52,0:52:99:1734,156,0,1734,156,1734:0,0,17,35 ./.:0,69,0:69:99:2105,206,0,2105,206,2105:0,0,25,44 ./.:1,77,0:78:99:2487,224,0,2490,231,2497:0,1,24,53 ./.:0,66,0:66:99:2138,197,0,2138,197,2138:0,0,25,41 ./.:0,70,0:70:99:2158,209,0,2158,209,2158:0,0,21,49 ./.:0,76,0:76:99:2304,226,0,2304,226,2304:0,0,23,53 ./.:0,69,0:69:99:2164,206,0,2164,206,2164:0,0,24,45 ./.:1,59,0:60:99:1763,169,0,1765,176,1772:0,1,20,39

    And here is the same line from vcf file after Genotype GVCF:
    chr1 17033181 . C A 24347.26 . AC=24;AF=1.00;AN=24;AS_BaseQRankSum=1.800;AS_FS=0.000;AS_InbreedingCoeff=-0.0000;AS_MQ=60.00;AS_MQRankSum=0.000;AS_QD=31.54;AS_ReadPosRankSum=1.000;AS_SOR=0.369;BaseQRankSum=1.84;DP=772;ExcessHet=3.0103;FS=0.000;InbreedingCoeff=-0.0000;MLEAC=24;MLEAF=1.00;MQ=60.00;MQRankSum=0.00;QD=31.54;ReadPosRankSum=1.59;SOR=0.369 GT:AD:DP:GQ:PL 1/1:0,65:65:99:2136,195,0 1/1:0,53:53:99:1559,157,0 1/1:0,75:75:99:2398,225,0 1/1:0,39:39:99:1310,117,0 1/1:0,52:52:99:1734,156,0 1/1:0,69:69:99:2105,206,0 1/1:1,77:78:99:2487,224,0 1/1:0,66:66:99:2138,197,0 1/1:0,70:70:99:2158,209,0 1/1:0,76:76:99:2304,226,0 1/1:0,69:69:99:2164,206,0 1/1:1,59:60:99:1763,169,0

    As I said, the SB annotation vanishes and the variant continues.

    Isn't it odd?

Sign In or Register to comment.