StrandBias annotation generated by HaplotypeCaller is absent from the output of GenotypeGVCFs

luciolucio (Inverness, Member)

Dear GATK staff and forum community,

My question follows on from a comment in a broader thread about annotations not working with HaplotypeCaller:
https://gatkforums.broadinstitute.org/gatk/discussion/comment/58658#Comment_58658
In my case I want to include the StrandBiasBySample (SB) annotation (https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_annotator_StrandBiasBySample.php). HaplotypeCaller generated the annotation successfully in all my files (e.g. GT:AD:DP:GQ:PL:SB 1/1:0,2,0:2:6:73,6,0,73), but the output of GenotypeGVCFs contains no SB annotation at all. My commands and the log below show the Java version and GATK package I used. I have not been able to find an answer to this posted anywhere, so it would be great to know whether there is a way to carry the annotation through GenotypeGVCFs.
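As background for the example above: a VCF record pairs the colon-separated keys in its FORMAT column (here GT:AD:DP:GQ:PL:SB) positionally with the colon-separated values in each sample column, so the SB value is whatever sits in the same position as the SB key. A minimal sketch for pulling out one key this way, assuming uncompressed, tab-delimited records with the first sample in column 10 and POSIX awk; format_value is a hypothetical helper, not a GATK tool.

```shell
# Hypothetical helper (not part of GATK): print the value of one FORMAT key
# for the first sample (column 10) of each VCF data line read from stdin.
# Usage: format_value KEY < file.vcf
format_value() {
  awk -v key="$1" -F '\t' '
    /^#/ { next }                        # skip header lines
    {
      nk = split($9, keys, ":")          # FORMAT keys, e.g. GT:AD:DP:GQ:PL:SB
      split($10, vals, ":")              # values for the first sample
      found = "."                        # VCF missing-value marker if key absent
      for (j = 1; j <= nk; j++)
        if (keys[j] == key) {
          # trailing sample fields may be dropped, so guard the lookup
          found = (j in vals) ? vals[j] : "."
          break
        }
      print found
    }'
}
```

For instance, format_value SB < sample.SB.g.vcf prints one SB value (or ".") per data record.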

Many thanks for your help!

All the best,

Lucio

/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -jar ~/Mito_reads/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar HaplotypeCaller \
-R ~/Mito_reads/data/ref_seqs/NC_001960.1_Salmo.fa \
-I $i \
-O "$i".SB.g.vcf \
--emit-ref-confidence GVCF \
-A StrandBiasBySample \
-A AS_StrandOddsRatio \
-A QualByDepth

/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -jar ~/Mito_reads/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar GenomicsDBImport \
--genomicsdb-workspace-path ~/Mito_reads/data/Demultiplexed_Salmon/db_SB/ \
--sample-name-map ~/Mito_reads/data/Demultiplexed_Salmon/vcf_SB.map \
--intervals NC_001960.1:1-16665

/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -jar ~/Mito_reads/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar GenotypeGVCFs \
-R ~/Mito_reads/data/ref_seqs/NC_001960.1_Salmo.fa \
-V gendb:///home/lmarcello/Mito_reads/data/Demultiplexed_Salmon/db_SB/ \
-O ~/Mito_reads/data/Demultiplexed_Salmon/GATKsalmon_SB.vcf
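The vcf_SB.map file passed to --sample-name-map is not shown above. GenomicsDBImport expects it to be tab-delimited with one sample per line: the sample name, then the path to that sample's GVCF. A minimal sketch of generating it, assuming every *.SB.g.vcf is an uncompressed single-sample GVCF whose sample name is the 10th column of its #CHROM header line; make_sample_map is a hypothetical helper, not a GATK tool.

```shell
# Hypothetical helper: write a GenomicsDBImport sample map (sample<TAB>path)
# to stdout for every uncompressed, single-sample GVCF given as an argument.
make_sample_map() {
  for vcf in "$@"; do
    # Sample name = 10th column of the #CHROM header line
    name=$(awk -F '\t' '/^#CHROM/ { print $10; exit }' "$vcf")
    printf '%s\t%s\n' "$name" "$vcf"
  done
}
```

For example, make_sample_map "$PWD"/*.SB.g.vcf > vcf_SB.map, run from the directory holding the GVCFs, writes the map with absolute paths.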

If I run grep -c ":SB" on any of my per-sample GVCFs, I get a count equal to the number of variants (i.e. SB has been added to every variant), whereas running grep -c ":SB" on GATKsalmon_SB.vcf returns 0.
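grep -c ":SB" counts every line that contains that substring anywhere. A slightly stricter version of the same check counts only data records whose FORMAT column (column 9) lists SB as a key, and separately checks whether the VCF header declares an SB FORMAT field at all. A minimal sketch, assuming uncompressed VCFs and POSIX awk and grep; count_sb_records and has_sb_header are hypothetical helpers.

```shell
# Hypothetical helper: count data records whose FORMAT column (column 9)
# contains SB as an exact key, rather than grepping for the ":SB" substring.
count_sb_records() {
  awk -F '\t' '
    /^#/ { next }                        # skip header lines
    {
      n = split($9, keys, ":")
      for (j = 1; j <= n; j++)
        if (keys[j] == "SB") { c++; break }
    }
    END { print c + 0 }' "$@"
}

# Hypothetical helper: succeed if the header declares an SB FORMAT field.
has_sb_header() {
  grep -q '^##FORMAT=<ID=SB,' "$@"
}
```

For example, count_sb_records GATKsalmon_SB.vcf, or has_sb_header GATKsalmon_SB.vcf && echo declared.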

Here is the log from running GenotypeGVCFs. There does not seem to be any mention of SB:

(base) [email protected]:~/Mito_reads/data/Demultiplexed_Salmon$ sh GATK_GenotypeGVCFs.sh
23:28:32.694 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/lmarcello/Mito_reads/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
23:28:34.323 INFO GenotypeGVCFs - ------------------------------------------------------------
23:28:34.324 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.0.11.0
23:28:34.324 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
23:28:34.324 INFO GenotypeGVCFs - Executing as [email protected] on Linux v4.15.0-54-generic amd64
23:28:34.324 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10
23:28:34.324 INFO GenotypeGVCFs - Start Date/Time: 17 October 2019 23:28:32 BST
23:28:34.324 INFO GenotypeGVCFs - ------------------------------------------------------------
23:28:34.324 INFO GenotypeGVCFs - ------------------------------------------------------------
23:28:34.324 INFO GenotypeGVCFs - HTSJDK Version: 2.16.1
23:28:34.324 INFO GenotypeGVCFs - Picard Version: 2.18.13
23:28:34.324 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
23:28:34.324 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
23:28:34.324 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
23:28:34.324 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
23:28:34.324 INFO GenotypeGVCFs - Deflater: IntelDeflater
23:28:34.324 INFO GenotypeGVCFs - Inflater: IntelInflater
23:28:34.324 INFO GenotypeGVCFs - GCS max retries/reopens: 20
23:28:34.324 INFO GenotypeGVCFs - Requester pays: disabled
23:28:34.325 INFO GenotypeGVCFs - Initializing engine
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
23:28:34.752 INFO GenotypeGVCFs - Done initializing engine
23:28:34.783 INFO ProgressMeter - Starting traversal
23:28:34.784 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
23:28:36.213 WARN ReferenceConfidenceVariantContextMerger - Detected invalid annotations: When trying to merge variant contexts at location NC_001960.1:639 the annotation AS_SB_TABLE=0,0|0,0|0,0 was not a numerical value and was ignored
23:28:41.267 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
23:28:43.364 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
23:28:43.366 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
23:28:45.448 INFO ProgressMeter - NC_001960.1:7595 0.2 3000 16879.2
23:28:45.896 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
23:28:47.749 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
23:28:49.051 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
23:28:53.357 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
23:28:53.392 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
23:28:55.493 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
23:28:55.549 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
23:28:56.594 INFO ProgressMeter - NC_001960.1:15684 0.4 7000 19257.2
23:28:58.321 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),3.0794550240000023,Cpu time(s),3.064977544000002
23:28:58.333 INFO GenotypeGVCFs - No variants filtered by: AllowAllVariantsVariantFilter
23:28:58.333 INFO ProgressMeter - NC_001960.1:15684 0.4 7853 20008.5
23:28:58.333 INFO ProgressMeter - Traversal complete. Processed 7853 total variants in 0.4 minutes.
23:28:58.336 INFO GenotypeGVCFs - Shutting down engine
[17 October 2019 23:28:58 BST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.43 minutes.
Runtime.totalMemory()=1613758464
