GATK ERROR MESSAGE: 38 HaplotypeCaller

Hello everyone,

I'm running HaplotypeCaller on whole-genome sheep data.

The same command is used for all 158 samples. We use nodes with 16 cores (-nct 16) and 28 GB of RAM.

Could you tell me what might be causing the error below?
Thank you in advance.
-------------------------------------------------------------------------------------------------------------------------------------------------------------

ERROR --
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: 38
at org.broadinstitute.gatk.tools.walkers.annotator.BaseQualityRankSumTest.getElementForRead(BaseQualityRankSumTest.java:96)
at org.broadinstitute.gatk.tools.walkers.annotator.RankSumTest.getElementForRead(RankSumTest.java:209)
at org.broadinstitute.gatk.tools.walkers.annotator.RankSumTest.fillQualsFromLikelihoodMap(RankSumTest.java:187)
at org.broadinstitute.gatk.tools.walkers.annotator.RankSumTest.annotate(RankSumTest.java:104)
at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContextForActiveRegion(VariantAnnotatorEngine.java:315)
at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContextForActiveRegion(VariantAnnotatorEngine.java:260)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.annotateCall(HaplotypeCallerGenotypingEngine.java:328)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:290)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:970)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:252)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.8-0-ge9d806836):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR

##### ERROR MESSAGE: 38

##### ERROR ------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------------------------------------------------------------

Answers

  • SkyWarrior, Turkey, Member ✭✭✭
    Accepted Answer

    16 threads might be exhausting the available resources. Better to try -nct no higher than 4, or just run single-threaded.
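
A minimal sketch of the suggestion above, assuming GATK 3.x syntax as used in this thread. The jar name and the file names (`ref.fa`, `sample.bam`, `sample.vcf`) are placeholders, not the original poster's paths. The helper only prints the command, so it can be inspected before anything is run.

```shell
# Print (not run) a GATK 3.x HaplotypeCaller command with -nct capped at 4.
# The jar name and all file names are placeholders, not taken from the thread.
GATK_JAR="${GATK_JAR:-GenomeAnalysisTK.jar}"

hc_cmd() (
    threads="$1"; ref="$2"; bam="$3"; out="$4"
    # Clamp the requested thread count to the 1..4 range suggested above.
    if [ "$threads" -gt 4 ]; then threads=4; fi
    if [ "$threads" -lt 1 ]; then threads=1; fi
    echo "java -jar $GATK_JAR -T HaplotypeCaller -R $ref -I $bam -o $out -nct $threads"
)

# A request for 16 threads is reduced to 4.
hc_cmd 16 ref.fa sample.bam sample.vcf
```

With 158 samples, running several low-thread jobs side by side may use a 16-core node better than one 16-thread job, but that is a judgment call to test on your own hardware.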

  • Hi all,
    I am trying to run the realignment stage of SNP calling on whole-genome sequencing data and I am getting the following error message. Any idea what this might be?
    Thanks!
    Mostafa

    Command I ran:
    java -jar /home/m.rafiepour222/GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar -R /home/m.rafiepour222/GCF_000298355.1_BosGru_v2.0_genomic.fa -T RealignerTargetCreator -o /home/m.rafiepour222/SRR3112430/SRR3112430indels_Realigner.intervals -I /home/m.rafiepour222/SRR3112430/SRR3112430.sort.rmdup.bam

    My error message:
    INFO 20:15:55,513 HelpFormatter - ----------------------------------------------------------------------------------
    INFO 20:15:55,518 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50
    INFO 20:15:55,518 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
    INFO 20:15:55,518 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
    INFO 20:15:55,519 HelpFormatter - [Sun Feb 04 20:15:55 IRST 2018] Executing on Linux 2.6.32-431.el6.x86_64 amd64
    INFO 20:15:55,521 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_131-b11
    INFO 20:15:55,527 HelpFormatter - Program Args: -R /home/m.rafiepour222/GCF_000298355.1_BosGru_v2.0_genomic.fa -T RealignerTargetCreator -o /home/m.rafiepour222/SRR3112430/SRR3112430indels_Realigner.intervals -I /home/m.rafiepour222/SRR3112430/SRR3112430.sort.rmdup.bam
    INFO 20:15:55,536 HelpFormatter - Executing as [email protected] on Linux 2.6.32-431.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_131-b11.
    INFO 20:15:55,537 HelpFormatter - Date/Time: 2018/02/04 20:15:55
    INFO 20:15:55,537 HelpFormatter - ----------------------------------------------------------------------------------
    INFO 20:15:55,537 HelpFormatter - ----------------------------------------------------------------------------------
    ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/home/m.rafiepour222/GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
    ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
    INFO 20:15:55,780 GenomeAnalysisEngine - Deflater: IntelDeflater
    INFO 20:15:55,783 GenomeAnalysisEngine - Inflater: IntelInflater
    INFO 20:15:55,784 GenomeAnalysisEngine - Strictness is SILENT
    INFO 20:33:00,339 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 20:33:00,378 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 20:33:03,778 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 3.33
    INFO 20:50:24,247 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
    INFO 21:07:31,116 GenomeAnalysisEngine - Done preparing for traversal
    INFO 21:07:31,117 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 21:07:31,119 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 21:07:31,119 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 3.8-0-ge9d806836):
    ERROR
    ERROR This means that one or more arguments or inputs in your command are incorrect.
    ERROR The error message below tells you what is the problem.
    ERROR
    ERROR If the problem is an invalid argument, please check the online documentation guide
    ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ERROR
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ERROR
    ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ERROR
    ERROR MESSAGE: SAM/BAM/CRAM file [email protected]19bb60d appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 62. Please see https://software.broadinstitute.org/gatk/documentation/article?id=6470 for more details and options related to this error.
    ERROR ------------------------------------------------------------------------------------------
  • SkyWarrior, Turkey, Member ✭✭✭

    The error message tells you exactly what is wrong:

    [email protected]19bb60d appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 62. Please see

    How did you generate the BAM file? What is the sequencing technology?
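
One way to check the encoding before re-running GATK is to look at the ASCII range of the quality strings. The sketch below uses a common heuristic, not a GATK feature: characters below `;` (ASCII 59) can only come from Phred+33 data, while characters above `J` (ASCII 74) usually point to Phred+64. The function name and the thresholds are illustrative assumptions.

```shell
# Guess the Phred offset of one quality string from its ASCII range.
# Heuristic only: short or uniformly high-quality strings are reported as ambiguous.
guess_phred_offset() (
    qual="$1"
    # Delete ';'..'~'; anything left is below ASCII 59, which only Phred+33 uses.
    low=$(printf '%s' "$qual" | LC_ALL=C tr -d ';-~')
    # Delete '!'..'J'; anything left is above ASCII 74, unusual for Phred+33 Illumina data.
    high=$(printf '%s' "$qual" | LC_ALL=C tr -d '!-J')
    if [ -n "$low" ]; then
        echo 33
    elif [ -n "$high" ]; then
        echo 64
    else
        echo ambiguous
    fi
)

guess_phred_offset '###AAFF<'

# On a real BAM (requires samtools; column 11 of SAM is the quality string):
#   samtools view sample.bam | head -n 10000 | cut -f 11 |
#     while IFS= read -r q; do guess_phred_offset "$q"; done | sort | uniq -c
```

If most reads come back as 64, the article linked in the error message describes the GATK-side options. To my knowledge, GATK 3.x has an engine argument `--fix_misencoded_quality_scores` for this case, but check the documentation for your version before relying on it.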

  • To generate the BAM file I used:
    samtools view -b -S -o SRR3112430.bam SRR3112430.sam
    The data are from whole genome sequencing.

  • I have another problem: I do not have a known sites VCF file for my reference genome for the realignment stage.

  • Thank you very much SkyWarrior.
    I'll try it, even though the run time will increase a lot. I hope it finishes soon so I can report the result in a comment.

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)
    edited February 8

    @m_rafiepour222
    Hi,

    Did the article linked in the error message help you?

    If you do not have a known sites VCF file, you can still run the Indel Realignment step without it, but it will take longer. If you are using HaplotypeCaller or Mutect2, you do not need to run that step. Have a look at this blog post.

    -Sheila

  • Hi Sheila, many thanks for your reply.

    Yes, I am using HaplotypeCaller and skipped the indel realignment step. But I have a problem in the HaplotypeCaller step with my whole-genome sequencing data. After starting this step I get the following error message (in the attached image). After the two errors shown in the image, the run starts and then stays in this state for more than two days without producing any output. Any idea what this might be?

    Command I ran:

    java -Xmx64g -jar /home/m.rafiepour222/GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar -R /home/m.rafiepour222/GCF_000298355.1_BosGru_v2.0_genomic.fa -T HaplotypeCaller -I /home/m.rafiepour222/SRR3112417/SRR3112417.sort.rmdup.bam -o /home/m.rafiepour222/SRR3112417/SRR3112417.raw.snps.indels.g.vcf

    Best regards

    Mostafa

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @m_rafiepour222
    Hi,

    Yep, that was a bug in 3.8 that is fixed in later versions. Have a look at this thread. Is it possible to upgrade to version 4? If not, 3.8-1 is not available in the downloads right now, but I have asked the team to add it.

    -Sheila

  • Hi Sheila, many thanks for your guidance. OK, I will wait for the download link for 3.8-1.
    I should mention that I also tried the latest versions (gatk-4.0.1.1 and gatk-4.0.1.2), but I got an error again, although not the same one as the error I shared.

    Best regards
    Mostafa

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @m_rafiepour222
    Hi Mostafa,

    Yes, hopefully the download is available soon. But, are you saying you got the same error in GATK4? Can you post the exact command you ran and log output from GATK4?

    Thanks,
    Sheila

  • edited February 15

    Hi Sheila,

    Yes, sure.

    Command I used:

    java -jar /home/m.rafiepour222/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar -R /home/m.rafiepour222/GCF_000298355.1_BosGru_v2.0_genomic.fa -T HaplotypeCaller -I /home/m.rafiepour222/SRR3112430/SRR3112430.sort.rmdup.bam -o /home/m.rafiepour222/SRR3112430/SRR3112430.raw.snps.indels.gvcf

    and my error:

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @m_rafiepour222
    Hi Mostafa,

    Ah, yes, the commands have changed slightly in GATK4. Have a look at the tool doc. The -T is gone :smile:

    Also, best to upgrade to GATK4, but 3.8-1 is now available on the downloads page.

    -Sheila
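
To make the syntax change concrete, here is a sketch that prints the same HaplotypeCaller call in GATK3 and GATK4 form. The GATK4 form follows the `./gatk HaplotypeCaller ... -O ... -ERC GVCF` command that appears later in this thread. The file names are placeholders, and nothing is executed.

```shell
# Print GATK3-style and GATK4-style HaplotypeCaller commands for comparison.
# File names are placeholders; the functions only echo the command text.
gatk3_hc() (
    # GATK3: the tool is selected with -T and the output flag is lowercase -o.
    echo "java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R $1 -I $2 -o $3 --emitRefConfidence GVCF"
)

gatk4_hc() (
    # GATK4: the tool name comes first (no -T) and the output flag is uppercase -O.
    echo "gatk HaplotypeCaller -R $1 -I $2 -O $3 -ERC GVCF"
)

gatk3_hc ref.fa sample.bam sample.g.vcf
gatk4_hc ref.fa sample.bam sample.g.vcf
```

The `gatk` wrapper script ships with GATK4; calling the `gatk-package-...-local.jar` directly with `java -jar` takes the same tool-first argument order.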

  • Hi Sheila,

    Many thanks for your reply. But the download page only seems to offer GATK-4.0.1.2.

    I cannot find the download link for 3.8-1.

    Best regards

    Mostafa

  • Hi Sheila,

    I found 3.8-1 on the download page.

    I am running my command now and I think the error has been resolved. Is my guess right?
    See the photo in the attachment.


  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @m_rafiepour222
    Hi Mostafa,

    Yep, looks like it is running fine now :smile:

    -Sheila

  • Hi Sheila,

    Many thanks for your support and guidance. I have created a g.vcf file and I have a question about it: how can I tell whether it was created properly?
    My sort.rmdup.bam file is 25,746,082 KB and the resulting g.vcf file is 1,711,132 KB. Is that right?

    Please see the photo in the attachment:

  • Hi Sheila,

    I also have another question, related to the command I use to create the g.vcf file. First, let me explain that my goal is SNP calling for samples that were whole-genome sequenced (DNA-Seq).

    Command I use now:

    You suggest the following two commands in the Best Practices for DNA-Seq. Which one do you recommend for my goal? Or do you think the command I am using now is fine?
    This is a challenge for me, please help me.


  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @m_rafiepour222
    Hi Mostafa,

    1) You can use ValidateVariants with --validate-GVCF to find out if there are issues with your GVCF.

    2) Have a look at this thread.

    -Sheila
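
As a complement to ValidateVariants, a quick structural check is possible with standard tools. The sketch below assumes an uncompressed GVCF written by HaplotypeCaller in GVCF mode, where every data record is expected to carry the `<NON_REF>` symbolic allele in the ALT column. The function name is illustrative, and this is a sanity check, not a replacement for the validator.

```shell
# Count data records whose ALT column (field 5) lacks the <NON_REF> allele.
# For a HaplotypeCaller GVCF the expected result is 0.
count_missing_nonref() (
    awk -F '\t' '!/^#/ && $5 !~ /<NON_REF>/ { n++ } END { print n + 0 }' "$1"
)

# Usage on a real file (placeholder name):
#   count_missing_nonref sample.g.vcf
# The validator mentioned above, in GATK4 syntax with placeholder names:
#   gatk ValidateVariants -R ref.fa -V sample.g.vcf --validate-GVCF
```

File size alone says little: a GVCF stores reference blocks rather than reads, so it is normally far smaller than the BAM it came from.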

  • Hi Sheila,

    You told me earlier that with many samples (I have 50 whole-genome sequencing samples, 1 TB in total) I should use the GVCF workflow for SNP calling. Is that right?
    How can I access the GVCF workflow? I could not find it. Please send me the GVCF workflow or a download link.

    Best regards

    Mostafa

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @m_rafiepour222
    Hi Mostafa,

    Here is a link to running the GVCF workflow.

    -Sheila
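
The link target is not preserved here, so the following is only a rough outline of what a GVCF workflow typically looks like in GATK4 syntax: one HaplotypeCaller run per sample in GVCF mode, a merge of the per-sample GVCFs, then joint genotyping. Sample names, file names and the choice of CombineGVCFs for the merge step are assumptions; the function prints the commands without running them.

```shell
# Print the stages of a GVCF workflow for a reference and a list of sample names.
# Assumes each sample has a BAM called <sample>.bam; all names are placeholders.
gvcf_workflow() (
    ref="$1"; shift
    vargs=""
    for s in "$@"; do
        # Stage 1: per-sample calling in GVCF mode.
        echo "gatk HaplotypeCaller -R $ref -I $s.bam -O $s.g.vcf -ERC GVCF"
        vargs="$vargs -V $s.g.vcf"
    done
    # Stage 2: merge the per-sample GVCFs into one cohort GVCF.
    echo "gatk CombineGVCFs -R $ref$vargs -O cohort.g.vcf"
    # Stage 3: joint genotyping across the cohort.
    echo "gatk GenotypeGVCFs -R $ref -V cohort.g.vcf -O cohort.vcf"
)

gvcf_workflow ref.fa sampleA sampleB
```

For larger cohorts GATK4 also offers GenomicsDBImport as an alternative to CombineGVCFs for the merge step; which one suits 50 whole genomes is worth checking against the current documentation.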

  • Hi Sheila,

    I am running the recalibration step.

    You previously told me that the Indel Realignment step is not needed when using HaplotypeCaller, right? But what about the recalibration step? Is it also unnecessary when using HaplotypeCaller?

    As you said, I skipped the Indel Realignment step. But I did run the recalibration, and I encountered the following error (picture in the attachment).

    Note that I do not have a known sites VCF file for my reference genome, so I removed the -knownSites option from my command line.

    The run starts and then stays in this state for a long time without producing any output. Any idea what this might be?

    Command I ran:

    java -Xmx64g -jar /home/m.rafiepour222/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef/GenomeAnalysisTK.jar -R /home/m.rafiepour222/GCF_000471725.1_UMD_CASPUR_WB_2.0_genomic.fa -T BaseRecalibrator -I /home/m.rafiepour222/2_BBKHU02_M/2_BBKHU02_M.sort.rmdup.bam -o /home/m.rafiepour222/2_BBKHU02_M/2_BBKHU02_M.grp

    My Error:

  • Hi Sheila,

    Regarding the question above, I encountered the following error after a day.

    As seen in the image, the error is again related to the missing known sites VCF file.

    What should I do? What do you think about it?

    My Error:

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @m_rafiepour222
    Hi Mostafa,

    Yes, we still recommend the Base Recalibration step. Geraldine calls it "fire insurance", as it can help if there are any issues in your data.

    In your case, you need to input a known variant sites file, as it is very important for BaseRecalibrator to build its models. You can read more about it here. Particularly, have a look under "No Excuses".

    Also, this thread may help.

    -Sheila
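
When no known sites resource exists, the approach usually described for this situation is bootstrapping: call variants on the uncalibrated data, keep only the most confident calls, use them as the known sites for BaseRecalibrator, and repeat until the results stabilise. The sketch below prints one such round in GATK 3.x syntax. The `QUAL` cut-off, the output names and the function name are illustrative assumptions, not recommended values.

```shell
# Print one bootstrap round of base recalibration for a species without known sites.
# MIN_QUAL is an arbitrary placeholder threshold; choose your own filter criteria.
MIN_QUAL="${MIN_QUAL:-100}"

bqsr_bootstrap_round() (
    ref="$1"; bam="$2"; round="$3"
    jar="GenomeAnalysisTK.jar"
    # 1. Call variants on the current (not yet recalibrated) BAM.
    echo "java -jar $jar -T HaplotypeCaller -R $ref -I $bam -o raw.$round.vcf"
    # 2. Keep only high-confidence calls to serve as a provisional known sites set.
    echo "java -jar $jar -T SelectVariants -R $ref -V raw.$round.vcf -select 'QUAL > $MIN_QUAL' -o known.$round.vcf"
    # 3. Build the recalibration table against those sites.
    echo "java -jar $jar -T BaseRecalibrator -R $ref -I $bam -knownSites known.$round.vcf -o recal.$round.grp"
    # 4. Write the recalibrated BAM, which is the input for the next round.
    echo "java -jar $jar -T PrintReads -R $ref -I $bam -BQSR recal.$round.grp -o recal.$round.bam"
)

bqsr_bootstrap_round ref.fa sample.bam 1
```

Each round is expensive on whole-genome data, so it may be worth trying the procedure on a single chromosome first.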

  • Hi Sheila,

    Many thanks for your reply.

    I followed the links you shared and read about this problem. But other people who had the same problem concluded that the suggested solutions do not work.

    For example, this address (http://evodify.com/gatk-the-best-practice-for-genotype-calling-in-a-non-model-organism/)

    Is the content at this address approved by you?

    Best regards

    Mostafa

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @m_rafiepour222
    Hi Mostafa,

    No, we don't formally approve of anything that is not on our forum :smile: But, it is true our Best Practices cater to human data. In your case, it would be best to test out other ways and see if that works better for your non-model organism. If you do find that other "Best Practices" work better for your particular organism, please do let us know on the forum. Hopefully it will help other users in your position.

    -Sheila

  • Hi Sheila,

    I am studying the buffalo genome (whole-genome sequencing data) and running the HaplotypeCaller stage.
    But I have a problem at this stage.

    After I start the command, it stays in the running state for a long time (please see the photo in the attachment). This is with GATK-3.8-1-0.

    Command I ran:

    java -jar /home/m.rafiepour222/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef/GenomeAnalysisTK.jar -R /home/m.rafiepour222/GCF_000471725.1_UMD_CASPUR_WB_2.0_genomic.fa -T HaplotypeCaller -nct 30 -I /home/m.rafiepour222/BBKHU01_F/BBKHU01_F.sort.rmdup.bam --emitRefConfidence GVCF -o /home/m.rafiepour222/BBKHU01_F/BBKHU01_F.raw.snps.indels.g.vcf

    But when I switch to the latest GATK version (GATK-4.0.4.0), the run proceeds (please see the photo in the attachment):

    Command I ran:

    ./gatk HaplotypeCaller -R /home/m.rafiepour222/GCF_000471725.1_UMD_CASPUR_WB_2.0_genomic.fa --native-pair-hmm-threads 4 -I /home/m.rafiepour222/2_BBKHU02_M/2_BBKHU02_M.sort.rmdup.bam -O /home/m.rafiepour222/2_BBKHU02_M/2_BBKHU02_M_variants.g.vcf -ERC GVCF

    But as seen in the photo, there seems to be a problem with these messages, which appear many times at runtime: "Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null".

    Is the run correct?

    Are these messages normal, or is there a way to get rid of them?

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @m_rafiepour222
    Hi Mostafa,

    Those are just WARN statements letting you know the Depth annotation is not calculated (probably there are no reads present at the site). Can you let the tool run to completion and see if there is an output with variants produced?

    -Sheila

    P.S. So cool you are working with buffalo!

  • edited May 4

    Hi Sheila,

    I am using GATK 4.0.4.0 for HaplotypeCaller, and I want to run 50 samples in parallel to generate g.vcf files. But I cannot find the -nct option from the old GATK version in GATK 4.0.4.0. Did something replace -nct in GATK 4.0.4.0?

    What is the best approach?

    This question was asked before by prasundutta, but I could not find an answer there (https://gatkforums.broadinstitute.org/gatk/discussion/11304/nct-not-present-in-gatk-4-0-0-0).

    Best regards

    Mostafa

  • SkyWarrior, Turkey, Member ✭✭✭

    -nt and -nct have been removed for good in GATK4. You need to use the Spark versions of the tools for multithreading.

  • Hi SkyWarrior,

    Many thanks for your reply.

    What do you mean by "Spark versions of the tools"? Is there a link or an example?

    Mostafa

  • SkyWarrior, Turkey, Member ✭✭✭

    Check the list of tools with the gatk --list command.

    There you will see some tool names ending in Spark. Those are the Spark versions, but they are still in beta and not ready for production use (AFAIK).

  • Not ready to use? When will they be ready?

    Do you mean HaplotypeCallerSpark in the attached photo?

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @m_rafiepour222
    Hi Mostafa,

    Yes, that is the Spark version of HaplotypeCaller which replaces -nt/-nct. The Spark versions are still in beta, but the team is working on taking them out of beta. I am not sure of the exact time frame for taking them out of beta.

    -Sheila
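
While the Spark tools are in beta, a different and simpler technique than the one discussed above is to run several ordinary single-sample HaplotypeCaller jobs at the same time. The sketch below generates one GATK4 command per sample name and shows, in a comment, how `xargs -P` could run a few of them concurrently. This is plain process-level parallelism and not a GATK feature; the sample and file names are placeholders.

```shell
# Read sample names from stdin and print one GATK4 HaplotypeCaller command per sample.
# Assumes each sample has a BAM called <sample>.bam; blank lines are skipped.
per_sample_cmds() (
    ref="$1"
    while IFS= read -r s; do
        [ -n "$s" ] || continue
        echo "gatk HaplotypeCaller -R $ref -I $s.bam -O $s.g.vcf -ERC GVCF"
    done
)

printf 'sampleA\nsampleB\n' | per_sample_cmds ref.fa

# To actually run up to 4 jobs at a time (samples.txt holds one name per line):
#   per_sample_cmds ref.fa < samples.txt | xargs -P 4 -I {} sh -c '{}'
```

The number of concurrent jobs has to fit the memory of the machine, since every job starts its own JVM.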

  • Hi Sheila,

    What is the difference between HaplotypeCaller and UnifiedGenotyper for SNP calling on whole-genome sequencing data? In other words, which one is better for SNP calling?

    Best regards

    Mostafa

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @m_rafiepour222
    Hi Mostafa,

    Have a look at this thread.

    -Sheila

  • edited August 30

    Hi Sheila,

    I hope you are well. Thank you for the good solutions you always send me.

    I want to use VariantsToTable to convert a VCF to a table, but I cannot find the fields I want. I would like columns for SNPs, indels, bi-allelic and multi-allelic sites in my output table.

    I found the TRANSITION field for SNPs and the MULTI-ALLELIC field for multi-allelic sites, but I could not find a field for indels or bi-allelic sites.

    Best regards

    Mostafa

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @m_rafiepour222
    Hi Mostafa,

    Thank you for your kind words :smiley: I am happy I have helped.

    "It is worth mentioning that I found TRANSITION option for SNP and MULTI-ALLELIC option for Multi-allelic. But for InDels and Bi-allelic, i could not find an option??"

    What do you mean by this?

    I am not sure if this is possible in GATK4, but in GATK3, there is an annotation called VariantType that may help.

    -Sheila
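
One possible way to get the four categories, sketched under assumptions: to my knowledge VariantsToTable also accepts a `TYPE` field (SNP, INDEL and so on) next to `MULTI-ALLELIC`, and writes the latter as `true` or `false`. Bi-allelic sites are then simply the rows where `MULTI-ALLELIC` is `false`. The file names and the helper name are placeholders, and the field behaviour should be confirmed against the tool documentation for your GATK version.

```shell
# Summarise a tab-separated table with the columns CHROM, POS, TYPE, MULTI-ALLELIC.
# Such a table could be produced with (GATK4 syntax, placeholder file names):
#   gatk VariantsToTable -V cohort.vcf -F CHROM -F POS -F TYPE -F MULTI-ALLELIC -O cohort.table
summarise_table() (
    awk -F '\t' '
        NR > 1 {
            type[$3]++                       # count per variant type (SNP, INDEL, ...)
            if ($4 == "true") multi++; else bi++
        }
        END {
            for (t in type) print t, type[t]
            print "BIALLELIC", bi + 0
            print "MULTIALLELIC", multi + 0
        }' "$1"
)

# Usage:
#   summarise_table cohort.table
```

The per-type lines come out in no particular order, because awk does not define the iteration order of `for (t in type)`.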
