Heads up:
We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra

Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
We will be out of the office for a Broad Institute event from Dec 10th to Dec 11th 2019. We will be back to monitor the GATK forum on Dec 12th 2019. In the meantime we encourage you to help out other community members with their queries.
Thank you for your patience!

Mutect2 - java.lang.IllegalArgumentException: Cannot construct fragment from more than two reads

trying the latest version of Mutect2

java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms4G -Xmx9G -jar /home/tools/gatk/ Mutect2 --annotation ClippingRankSumTest --annotation DepthPerSampleHC --annotation MappingQualityRankSumTest --annotation MappingQualityZero --annotation QualByDepth --annotation ReadPosRankSumTest --annotation RMSMappingQuality --annotation FisherStrand --annotation MappingQuality --annotation DepthPerAlleleBySample --annotation Coverage -R hs38DH.fa -I SynSet3_N.dedup.sorted.indel.dedup.bam -I SynSet3_T.dedup.sorted.indel.dedup.bam -normal SynSet3_N --panel-of-normals 1000g_pon.hg38.vcf.gz --germline-resource af-only-gnomad.hg38.vcf.gz --f1r2-tar-gz SynSet3.sg00.dedup.sorted.indel.dedup.f1f2.tar.gz -O SynSet3.sg00.dedup.sorted.indel.dedup.vcf --bam-output SynSet3.sg00.dedup.sorted.indel.dedup.mt2.bam -L 00.interval_list --native-pair-hmm-threads 1

I got the following error.

java.lang.IllegalArgumentException: Cannot construct fragment from more than two reads
        at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:725)
        at org.broadinstitute.hellbender.utils.read.Fragment.create(Fragment.java:36)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
        at org.broadinstitute.hellbender.utils.genotyper.AlleleLikelihoods.groupEvidence(AlleleLikelihoods.java:595)
        at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.callMutations(SomaticGenotypingEngine.java:93)
        at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.callRegion(Mutect2Engine.java:251)
        at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.apply(Mutect2.java:320)
        at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:308)
        at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:281)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
        at org.broadinstitute.hellbender.Main.main(Main.java:292)

The same instruction works with Mutect2 May you explain what happened?
Thank you


  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @andresguarahino

    I am looking into this and will get back to you shortly.

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @andresguarahino and @micknudsen

    In 4.1.4 the default is to combine paired reads into fragments in order to force evidence from mates to travel together. (If a read has no mate it makes a fragment out of the single read). The error occurs when more than two reads share the same name. This could happen for a malformed VCF, as we have seen from some faulty adapter-trimming pipelines, but it could also happen for certain types of uMI sequencing or bams in which duplicates have not been removed. Regardless of the cause, the argument --independent-mates reverts to the old behavior.

  • micknudsenmicknudsen DenmarkMember ✭✭

    Hi @bhanuGandham,

    Thanks for the suggestion. I am testing it right now. The only thing that differs in my implementation of the Best Practices pipeline is that I start from FASTQ instead of uBAM. Could it be that MergeBamAlignment somehow takes care of problematic reads? I run Mutect2 separately on each chromosome, and only very few of those fail.

  • Hi @bhanuGandham and @micknudsen,

    thanks for your support. Actually, I am following the Best Practices with the only difference in the adapter trimming step where I use BBduk: with the trimmed FASTQ files I create the uBAM file to use as input for MergeBamAlignment (to merge the aligned reads with the unmapped ones). The ValidateSamFile command returns 'No errors found' for the BAM files used with Mutect2.

    As asked by @micknudsen, I'd like to know if MergeBamAlignment takes care of those problematic reads and (if yes) why it may not have worked in my case.

  • micknudsenmicknudsen DenmarkMember ✭✭

    Just a quick update. Adding the --independent-mates option solved the problem for me. I am still puzzled, though, why it is needed. I will try to see if I can increase the verbosity and figure out exactly what is causing it.

  • micknudsenmicknudsen DenmarkMember ✭✭

    @bhanuGandham Do you know a way to identify the reads causing problems? I have tried increasing verbosity to DEBUG, but I still only get the short error message. I would like to understand independent mates are and where they come from. Thanks.

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    HI @andresguarahino and @micknudsen

    As mentioned above, one of the reasons this might happen is if you are using bams in which duplicates have not been removed. Have you preprocessed you data as shown here: https://software.broadinstitute.org/gatk/best-practices/workflow?id=11165

    If not, can you please try that and see if that resolves the issue. If you have, then please share the steps you followed to preprocess your data.

  • micknudsenmicknudsen DenmarkMember ✭✭

    Hi @bhanuGandham ,

    I follow a usual Best Practices workflow. First adapter trimming and mapping:

    seqtk mergepe <(zcat {r1_file}) <(zcat {r2_file}) \
        | \
        cutadapt --cores={options['cores']} \
                 --interleaved \
                 --minimum-length=20 \
                 --error-rate=0.1 \
                 --quality-cutoff=20 \
                 --overlap=1 \
                 -a ${{adapter}} \
                 -A ${{adapter}} \
                 - \
        | \
        bwa mem -p -Y -K 100000000 -t {options['cores']} -R "{read_group}" {genome_fasta} - \
        | \
        gatk --java-options '-Xmx{int(options['memory'][:-1]) - 1}g -Djava.io.tmpdir='"${{tmp_dir}}"'' \
            SortSam \
            -I /dev/stdin \
            -O ${{scratch_output_bam_file}} \
            -SO coordinate

    This is then followed by MarkDuplicates:

        gatk --java-options '-Xmx{int(options['memory'][:-1]) - 1}g -Djava.io.tmpdir='"${{tmp_dir}}"'' \
            MarkDuplicates \
            {input_string} \
            --OUTPUT ${{scratch_markdup_bam_file}} \
            --METRICS_FILE ${{scratch_markdup_metrics_file}} \
            --OPTICAL_DUPLICATE_PIXEL_DISTANCE {optical_duplicate_pixel_distance} \
            --ASSUME_SORTED true

    Then comes BQSR and finally Mutect2.

  • Hi @bhanuGandham,
    for data preprocessing I I took inspiration (and the instructions) from the five-dollar-genome -analysis pipeline. From (bbduk-)adapter-trimmed FASTQ files converted in an unmapped BAM file, I follow the following steps:

    • SamToFastq | BwaMem | MergeBamAlignment
    • MarkDuplicates | SortSam | SetNmMdAndUqTags
    • IndelRealignment and/or BQSR are optional
  • cboursnellcboursnell Member
    I took my bam files that were giving this error and removed the secondary alignments (samtools view -F 256) and ran them again through Mutect2 4.1.4. I didn't get this error now. Does that mean that Mutect2 uses secondary alignments in its calling algorithm normally? BWA generates secondary alignments by default and doesn't have an option to turn them off.

    Issue · Github
    by bhanuGandham

    Issue Number
    Last Updated
  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin
    edited October 21

    Hi all,

    I am going to test this and get back to you. This might take a few days. In the mean time please follow @cboursnell suggestion if that works for you and I will reach out to the developers and get back to you.

    Thank you @cboursnell for your input! Greatly appreciated!

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi All,

    So looks like this is an edge case we have not accounted for. If the secondary alignment is on the same assembly region as the primary then we see this issue. I am creating a bug report for it and you can track its progress here: https://github.com/broadinstitute/gatk/issues/6230

Sign In or Register to comment.