Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

NaN error when running FilterMutectCalls with gatk 4.0.12.0 (phredScaleLog10ErrorRate)

pwaltmanpwaltman New York, NYMember

I'm getting a NaN error when I'm trying to run the FilterMutectCalls walker, following the instructions in step 4 of the How To guide (https://gatkforums.broadinstitute.org/gatk/discussion/11136/how-to-call-somatic-mutations-using-gatk4-mutect2#4)

This is actually the 2nd sample vcf that I've gotten this error with, and I can provide both vcfs to you for your own testing/debugging, if you would like.

Thanks.

The complete error message and stack-trace is:
Using GATK jar /ifs/scratch/c2b2/ac_lab/pw2470/devel/mskilab/flows/modules/FilterMutectCalls/gatk-4.0.12.0/gatk-package-4.0.12.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx8g -jar /ifs/scratch/c2b2/ac_lab/pw2470/devel/mskilab/flows/modules/FilterMutectCalls/gatk-4.0.12.0/gatk-package-4.0.12.0-local.jar FilterMutectCalls -V /ifs/scratch/c2b2/ac_lab/pw2470/PROJECTS/patient0/Flow/Mutect2_rsync_b37/CUAC1857/hand_combined_partial_result_with_contig_specific_results/manually_combined_mutect2_mutations.vcf --contamination-table 02_small_exac_calcualatecontamination.table -O 03_gatk4_m2_snvs_indels_1st_pass_filter_with_cont_table.vcf.gz
13:27:55.287 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/ifs/scratch/c2b2/ac_lab/pw2470/devel/mskilab/flows/modules/FilterMutectCalls/gatk-4.0.12.0/gatk-package-4.0.12.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
13:27:58.696 INFO FilterMutectCalls - ------------------------------------------------------------
13:27:58.696 INFO FilterMutectCalls - The Genome Analysis Toolkit (GATK) v4.0.12.0
13:27:58.696 INFO FilterMutectCalls - For support and documentation go to https://software.broadinstitute.org/gatk/
13:27:58.697 INFO FilterMutectCalls - Executing as [email protected] on Linux v4.15.0-43-generic amd64
13:27:58.697 INFO FilterMutectCalls - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_191-b12
13:27:58.697 INFO FilterMutectCalls - Start Date/Time: January 3, 2019 1:27:55 PM EST
13:27:58.697 INFO FilterMutectCalls - ------------------------------------------------------------
13:27:58.697 INFO FilterMutectCalls - ------------------------------------------------------------
13:27:58.698 INFO FilterMutectCalls - HTSJDK Version: 2.18.1
13:27:58.698 INFO FilterMutectCalls - Picard Version: 2.18.16
13:27:58.698 INFO FilterMutectCalls - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:27:58.698 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:27:58.699 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:27:58.699 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:27:58.699 INFO FilterMutectCalls - Deflater: IntelDeflater
13:27:58.699 INFO FilterMutectCalls - Inflater: IntelInflater
13:27:58.699 INFO FilterMutectCalls - GCS max retries/reopens: 20
13:27:58.699 INFO FilterMutectCalls - Requester pays: disabled
13:27:58.699 INFO FilterMutectCalls - Initializing engine
13:28:00.140 INFO FeatureManager - Using codec VCFCodec to read file file:///ifs/scratch/c2b2/ac_lab/pw2470/PROJECTS/patient0/Flow/Mutect2_rsync_b37/CUAC1857/hand_combined_partial_result_with_contig_specific_results/manually_combined_mutect2_mutations.vcf
13:28:00.623 INFO FilterMutectCalls - Done initializing engine
13:28:00.979 INFO ProgressMeter - Starting traversal
13:28:00.979 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
13:28:00.980 INFO FilterMutectCalls - Starting first pass through the variants
13:28:03.482 INFO FilterMutectCalls - Shutting down engine
[January 3, 2019 1:28:03 PM EST] org.broadinstitute.hellbender.tools.walkers.mutect.FilterMutectCalls done. Elapsed time: 0.14 minutes.
Runtime.totalMemory()=1011351552
java.lang.IllegalArgumentException: errorRateLog10 must be good probability but got NaN
at org.broadinstitute.hellbender.utils.QualityUtils.phredScaleLog10ErrorRate(QualityUtils.java:321)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2FilteringEngine.lambda$applyGermlineVariantFilter$10(Mutect2FilteringEngine.java:207)
at java.util.stream.DoublePipeline$3$1.accept(DoublePipeline.java:231)
at java.util.Spliterators$DoubleArraySpliterator.forEachRemaining(Spliterators.java:1198)
at java.util.Spliterator$OfDouble.forEachRemaining(Spliterator.java:822)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:545)
at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
at java.util.stream.IntPipeline.toArray(IntPipeline.java:502)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2FilteringEngine.applyGermlineVariantFilter(Mutect2FilteringEngine.java:207)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2FilteringEngine.calculateFilters(Mutect2FilteringEngine.java:436)
at org.broadinstitute.hellbender.tools.walkers.mutect.FilterMutectCalls.firstPassApply(FilterMutectCalls.java:120)
at org.broadinstitute.hellbender.engine.TwoPassVariantWalker.lambda$traverseVariants$0(TwoPassVariantWalker.java:76)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.TwoPassVariantWalker.traverseVariants(TwoPassVariantWalker.java:74)
at org.broadinstitute.hellbender.engine.TwoPassVariantWalker.traverse(TwoPassVariantWalker.java:27)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)

Answers

  • pwaltmanpwaltman New York, NYMember

    I should add that these bams were aligned with bwa-aln, and recalibrated, using GATK4. Not sure if that's relevant, but I've now run into this issue witha a 3rd vcf, so I'd love it if we could figure out this issue.

    My exact workflow for generating the bams was, as follows:
    1. align lane/library-specific fastq's with bwa-aln
    2. recalibrate the bams generated in step 1
    3. merge the bams from step 2 into a single bam
    4. recalibrate the bam from step 3
    5. run mutect2 from gatk4 on the bam from step 4
    6. run filterMutectCalls process on the raw calls from step 5 <-------- THIS PRODUCES THE NaN ERROR

  • AdelaideRAdelaideR Unconfirmed, Member, Broadie, Moderator admin

    @pwaltman - This is an interesting error, it may be introduced during alignment, but some more information is needed for trouble shooting. What version of bwa-aln are you using and are you using a reference genome from the GATK Resource Bundle? It would be helpful also to see the exact commands with parameters used for steps 1-6. Are you using the latest version of GATK4?

  • pwaltmanpwaltman New York, NYMember
    edited January 4

    In order:
    1. I'm using bwa version 0.7.17-r1188
    2. I'm using both b37 and hg38 from the broad (this is occurring with samples that have been aligned to both, i.e. with some samples aligned to hg38, others to b37).
    3. I'm using gatk-4.0.12.0
    4. it will be difficult to produce the exact command in a way that is meaningful to you, but I can provide the scripts that I've used. That said the relevant commands are listed in the comments below, including the variable names

    Currently, I'm re-aligning one of the tumor-normal (exome) pairs with bwa-mem to see if that changes the outcome, however, this will take a little time as I have to put them through the whole pipeline.

  • pwaltmanpwaltman New York, NYMember
    edited January 4

    for alignment, this is the code section:
    ...
    elif [ "$ALIGN_ALG" == "aln" ]; then

                    r1sai=${LIBRARY}_${REF_VERSION}_R1.sai
                    r2sai=${LIBRARY}_${REF_VERSION}_R2.sai
    
                    ## generate the read1 .sai file
                    if [ ! -f $r1sai ]; then
    
                        echo "$r1sai not found, starting bwa-aln for $FASTQ1"
                      bwa aln \
                            -q 5 -o 2 -n 0.04 -t ${NUM_BWA_THREADS} \
                            $REFERENCE \
                            <($CATEXE $FASTQ1) \
                            > $r1sai
                   else
                        echo "found $r1sai, skipping to next step..."
                    fi
    
                    ## generate the read2 .sai file IF NECESSARY
                    if [ ! -z $FASTQ2 ]; then
    
                        ## now, generate $r2sai, if necessary
                        if [ ! -f $r2sai ]; then
                            echo "$r2sai not found, starting bwa-aln for $FASTQ2"
                            bwa aln \
                                -q 5 -o 2 -n 0.04 -t ${NUM_BWA_THREADS} \
                                $REFERENCE \
                                <($CATEXE $FASTQ2) \
                                > $r2sai
    
                        else
                            echo "found $r2sai, skipping to next step..."
                        fi
                    fi
    
                    ## now, generate the bam file
                    if [ ! -f ${OUTFILE} ]; then
                        echo "${OUTFILE} not found, starting bwa-sampe for $LIBRARY"
    
                        if [ -z $FASTQ2 ]; then
                            ## this is SINGLE-END MODE
                            bwa samse \
                                -P \
                                -r "${RGSTR}" \
                                $REFERENCE \
                                $r1sai \
                                <($CATEXE $FASTQ1) \
                                | samblaster --addMateTags \
                                | samtools view -b -S -u - \
                                | sambamba sort -t ${NUM_BWA_THREADS} -m 4G --tmpdir "${PWD}/tmp" -o $OUTFILE /dev/stdin
                        else
                            ## this is PAIRED-END MODE
                            bwa sampe \
                                -P \
                                -r "${RGSTR}" \
                                $REFERENCE \
                                $r1sai $r2sai \
                                <($CATEXE $FASTQ1) <($CATEXE $FASTQ2) \
                                | samblaster --addMateTags \
                                | samtools view -b -S -u - \
                                | sambamba sort -t ${NUM_BWA_THREADS} -m 4G --tmpdir "${PWD}/tmp" -o $OUTFILE /dev/stdin
                ...
    
  • pwaltmanpwaltman New York, NYMember
    edited January 4

    Base-recalibration is performed using the following script code segment (in another script). This is reused after merging (which is done with sambamba sort):

    INFILE="${BAM}"
    ##OUTFILE=$( echo $BAM | sed "s:bam$:recal_table:" )
    OUTFILE=${INFILE}.recal_table
    
    if [ ! -f $OUTFILE ]; then
        echo "generate the BQSR recal table"
        $GATK_EXE \
           BaseRecalibrator \
            --reference $REFERENCE \
            --input $INFILE \
            --known-sites $DBSNP \
            --known-sites $ONEKG_INDELS \
            --known-sites $MILLS_INDELS \
            --output $OUTFILE
       retval=$?
        if [ $retval -ne 0 ]; then
            echo "cmd to generate $OUTFILE was not zero but $retval"
            if [ -f $OUTFILE ]; then
                rm $OUTFILE
            fi
            exit 1
        fi
    else
        echo "found $OUTFILE"
    fi
    
    
    ## generate the recal bam file
    RECAL_TABLE="$OUTFILE"
    INFILE="${INFILE}"
    
    OUTFILE=$( echo ${INFILE} | sed "s:bam$:recal.bam:" )
    if [ ! -f $OUTFILE ]; then
        echo "generate the recal bam file"
        $GATK_EXE \
            ApplyBQSR \
            --reference $REFERENCE \
            --input $INFILE \
            --bqsr-recal-file $RECAL_TABLE \
            --output $OUTFILE
    
        retval=$?
        if [ $retval -ne 0 ]; then
            echo "cmd to generate $OUTFILE was not zero but $retval"
            if [ -f $OUTFILE ]; then
                rm $OUTFILE
            fi
            exit 6
        fi
    else
        echo "found $OUTFILE"
    fi
    
  • pwaltmanpwaltman New York, NYMember

    the lane/library-specific bams are re-aligned with the following code segment:
    if [ ! -f $OUTFILE ]; then
    sambamba merge -t ${NUM_BWA_THREADS} $OUTFILE $bams
    fi

  • AdelaideRAdelaideR Unconfirmed, Member, Broadie, Moderator admin

    @pwaltman May I ask why you are using samples aligned to two different references? What are the advantages of that?

  • AdelaideRAdelaideR Unconfirmed, Member, Broadie, Moderator admin

    @pwaltman

    Currently, I'm re-aligning one of the tumor-normal (exome) pairs with bwa-mem to see if that changes the outcome, however, this will take a little time as I have to put them through the whole pipeline.

    How did this work out for you? Maybe using bwa mem agains the same reference will solve this issue, please send a response when you get a chance.

  • pwaltmanpwaltman New York, NYMember

    As a follow-up, I do not believe that this error is related to the underlying alignment method that was used. I put one of the WXS samples that is getting this error during the FilterMutectCalls step, after I re-aligned it with bwa-mem, and reprocessed the bwa-mem-aligned sample using the steps outlined above.> @AdelaideR said:

    @pwaltman May I ask why you are using samples aligned to two different references? What are the advantages of that?

    It's a long and uninteresting story, but I did it for purely CYA purposes. I got dropped into an existing project that a former postdoc in the lab bailed out on when he quit the lab. He had aligned the samples to GRCh37, and I wanted to migrate the project to hg38. Since some of these samples are being used in another project, they also had to be aligned to GRCh37 for consistency with prior work. Hence the reason I needed to be able to align them to both references.

  • AdelaideRAdelaideR Unconfirmed, Member, Broadie, Moderator admin

    Hi @pwaltman - I just realized this is an error that has been identified for a bug fix (5 days ago), so the latest docker build should correct the error. Here is the thread about the issue

    Let me know if this does not correct the problem and I will follow up with the development team.

  • pwaltmanpwaltman New York, NYMember

    Ugh. Ok, well, I can try running it on my workstation, but our IT staff doesn't let use docker on our HPC. I'll pull the latest one, and will give it a try in a day or so.

  • AdelaideRAdelaideR Unconfirmed, Member, Broadie, Moderator admin
    edited January 11

    Have you considered signing up for Broad's Firecloud and using the free credits to get past this one piece? I can ask the development team about a workaround.

  • AdelaideRAdelaideR Unconfirmed, Member, Broadie, Moderator admin

    @pwaltman it is also possible to download the docker locally and run just the one tool from the docker, using the following:

    sudo docker run broadinstitute/gatk ./gatk FilterMutectCalls
    
  • pwaltmanpwaltman New York, NYMember
    edited January 16

    Sorry I tuned out, but had a deadline. With some effort, I'm now getting a new error, which is the following:

    [January 16, 2019 4:11:23 AM UTC] org.broadinstitute.hellbender.tools.walkers.mutect.FilterMutectCalls done. Elapsed time: 0.03 minutes.
    Runtime.totalMemory()=1237319680
    java.lang.IllegalStateException: Key P_CONTAM found in VariantContext field INFO at chr1:13418 but this key isn't defined in the VCFHeader.  We require all VCFs to have complete VCF headers by default.
    

    However, the vcf I'm passing is one that gatk v4.0.12 generated. Do I now need to re-generate the vcf with the dockerized version of gatk in order to get this to work? That might work with the exomes, but the WGS samples I have take days to run and re-generate.

    Also, is there an ETA on the latest release? Using docker for this is a bit of a drag. More than a bit, actually.

  • AdelaideRAdelaideR Unconfirmed, Member, Broadie, Moderator admin

    @pwaltman

    I think you will need to rerun the files before this command using the new dockerized version of gatk. Did you do a docker pull? So each docker pull should be treated as a version and workflows should be run on the same version of gatk when possible.

    The other option is to define P_CONTAM in the VCF headers by adding a line. The headers are at the beginning of the files. I found a possible definition in an older version of GATK.

    ##INFO=<ID=P_CONTAM,Number=1,Type=Float,Description="Posterior probability for alt allele to be due to contamination">
    

    Whether this is the right information for this version of GATK is not quite clear, so this is just a quick and easy way to do a workaround to stop that error from cropping up. But, true biological information may be lost or confused.

    Just for general information, an advantage of using the Firecloud is that when workflows are rerun, the files that have been created from commands that have not changed are cached and remembered, which cuts down the future runtimes when only one item is altered.

Sign In or Register to comment.