
Does GATK 4 support multiple BAM files as input?

In the command line help message, it says

--input,-I:String BAM/SAM/CRAM file containing reads This argument must be specified at least once.

However, if we actually give multiple input files, it says

org.broadinstitute.hellbender.exceptions.UserException: Sorry, we only support a single reads input for spark tools for now.

On the other hand, if we pass the folder containing all the partial BAM files as the input parameter, it actually works. Could you explain how this feature currently works? We are using the GATK 4 master branch, commit b82b5b6c5cbef8973b373edfb314cf42bea5eb1a, with Spark 2.0.2.
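For reference, here is a minimal sketch of the directory form that worked for us (the directory is simply the parent of the part-r-*.bam files; other arguments are omitted):

```shell
# Sketch: passing the directory that contains the part-r-*.bam files as a
# single -I input. This worked in our tests, presumably because the Spark
# input layer treats the directory of part files as one logical BAM.
# Remaining arguments omitted for brevity.
gatk-launch \
    ReadsPipelineSpark \
    -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam \
    -R hdfs://ip-172-31-2-45:9000/genome/ref/human_g1k_v37.2bit
```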


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi @blaok, different tools have different requirements. Some tools allow multiple -I inputs, but some do not. Which tool are you trying to run and what is your command line?
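    To illustrate the difference: a non-Spark, walker-based GATK4 tool such as PrintReads does accept repeated -I arguments (the file names below are illustrative, not from the original command):

```shell
# Illustrative only: walker-based (non-Spark) GATK4 tools generally allow
# multiple -I inputs, which are read as a single merged stream of reads.
gatk-launch PrintReads \
    -I sample1.bam \
    -I sample2.bam \
    -R reference.fasta \
    -O combined.bam
```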

  • blaok (Member)

    Hi Geraldine,

    Thanks for taking a look at my question. We are trying to run ReadsPipelineSpark and HaplotypeCallerSpark. Our command line looks like this:

    gatk-launch \
        ReadsPipelineSpark \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00000.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00001.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00002.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00003.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00004.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00005.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00006.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00007.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00008.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00009.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00010.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00011.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00012.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00013.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00014.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00015.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00016.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00017.bam \
        -I hdfs://ip-172-31-2-45:9000/user/blaok/ERR000097.sorted.bam/part-r-00018.bam \
        -R hdfs://ip-172-31-2-45:9000/genome/ref/human_g1k_v37.2bit \
        -O ~/get/ERR000097.after-bqsr.bam \
        --knownSites ~/get/dbsnp_138.b37.excluding_sites_after_129.vcf \
        --shardedOutput false \
        --emit_original_quals \
        --duplicates_scoring_strategy SUM_OF_BASE_QUALITIES \
        -- \
        --sparkRunner SPARK \
        --driver-memory 60G \
        --executor-memory 60G \
        --executor-cores 16 \
        --num-executors 2 \
        --sparkMaster spark://ip-172-31-78-182:7077

    and we get an error like this:

    A USER ERROR has occurred: Sorry, we only support a single reads input for spark tools for now.
    org.broadinstitute.hellbender.exceptions.UserException: Sorry, we only support a single reads input for spark tools for now.
            at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.initializeReads(
            at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.initializeToolInputs(
            at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(
            at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(
            at org.broadinstitute.hellbender.Main.mainEntry(
            at org.broadinstitute.hellbender.Main.main(
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(
            at java.lang.reflect.Method.invoke(
            at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
            at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
            at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
            at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

    HaplotypeCallerSpark fails in the same way.

  • Sheila (Broad Institute; Member, Broadie, Moderator)


    Let me confirm with the developers whether the Spark tools accept more than one input BAM.


    (Issue filed on GitHub by Sheila.)
  • Sheila (Broad Institute; Member, Broadie, Moderator)

    Hi again,

    I have confirmation that the Spark tools and pipelines (which are all still experimental at this point) are restricted to a single reads input, at least for now.
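    In the meantime, one possible workaround (a sketch, not an officially documented recommendation) is to combine the part files into a single BAM before running the Spark tool, for example with samtools:

```shell
# Sketch of a workaround: concatenate the part files into one BAM up front,
# then pass that single file with -I. This assumes the parts share a header
# and sort order, as Spark-produced part files normally do; paths are
# illustrative.
samtools cat -o ERR000097.merged.bam part-r-*.bam
```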


  • Hi Sheila,

    We are using HaplotypeCallerSpark in GATK4, and we too would be very interested in processing multiple BAM files. This functionality is crucial for our pipeline.

    Do you think HaplotypeCallerSpark will support multiple BAM files anytime soon? Is it on the roadmap?

    What is the plan for moving HaplotypeCallerSpark from beta to an official production version?

    Many thanks for your help; much appreciated!


  • Sheila (Broad Institute; Member, Broadie, Moderator)

    Hi Ivo,

    It seems there are plans for this to be done in the second quarter of this year. You can keep track of the issue here.

