FireCloud Error: processing-for-variant-discovery-gatk4

I tried to run the method "processing-for-variant-discovery-gatk4" cloned from the featured workspace "Germline-SNPs-Indels-GATK4-b37" on some of my samples. I only made 2 changes to the configuration: ref_name changed from "hg38" to "b37", and sample_name changed from "NA12878" to this.sample_id. I got the following error:

"Failed to evaluate 'PreProcessingForVariantDiscovery_GATK4.flowcell_unmapped_bams' (reason 1 of 1): Evaluating read_lines(flowcell_unmapped_bams_list) failed: java.io.IOException: Could not read from gs://fc-d2239f7f-d4b9-45de-a3bb-49a002f0bc1e/uBAM/N0.unmapped.bam: File gs://fc-d2239f7f-d4b9-45de-a3bb-49a002f0bc1e/uBAM/N0.unmapped.bam is larger than 10000000 Bytes. Maximum read limits can be adjusted in the configuration under system.input-read-limits."

I don’t see any variable named system.input-read-limits in the method configuration. Can you please help me out?

Also, my samples are from whole exome sequencing rather than whole genome sequencing. Do I need to make any other changes to the configuration?

Answers

  • bshifaw (moon): Member, Broadie, Moderator, admin

    Hi @anshuman

    We don't currently offer a workflow that specifically handles exomes, though one is in development due to user interest. If you would like to modify the workflow, we suggest reading the following document on restricting your analysis to specific intervals.

    The error above suggests the workflow is having trouble handling your input file. What are you providing as input? The workflow expects a text file containing a list of uBAMs, like so:
    gs://gatk-test-data/wgs_ubam/NA12878_24RG/NA12878_24RG_small.txt

    gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.bam
    gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HJYFJ.5.NA12878.downsampled.query.sorted.unmapped.bam
    gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HJYFJ.6.NA12878.downsampled.query.sorted.unmapped.bam
    

    The workflow does not expect an unmapped BAM itself as this input.
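To illustrate the distinction, here is a minimal Python sketch that builds the contents of such a list file and checks that a candidate input looks like a list of uBAM paths rather than a BAM. The function names `make_ubam_list` and `looks_like_ubam_list` are hypothetical, and uploading the resulting text file to the workspace bucket is not shown.

```python
def make_ubam_list(ubam_paths):
    """Return the text of a list file: one uBAM path per line."""
    return "".join(path.strip() + "\n" for path in ubam_paths)


def looks_like_ubam_list(text):
    """Return True if every non-empty line is a gs:// path ending in .bam."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return bool(lines) and all(
        line.startswith("gs://") and line.endswith(".bam") for line in lines
    )


# Paths taken from the example list shown in this thread.
example_paths = [
    "gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HJYFJ.4.NA12878.downsampled.query.sorted.unmapped.bam",
    "gs://gatk-test-data/wgs_ubam/NA12878_24RG/small/HJYFJ.5.NA12878.downsampled.query.sorted.unmapped.bam",
]
list_text = make_ubam_list(example_paths)
```

The value of flowcell_unmapped_bams_list would then point at the text file holding `list_text`, not at any of the .bam files themselves.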

  • anshuman (NJ): Member

    Hi @bshifaw

    Thanks for your reply. I used a list of ubams as input like you instructed, and the workflow ran for a few hours, but then failed. There were 17 error messages similar to this:

    message: Task PreProcessingForVariantDiscovery_GATK4.BaseRecalibrator:0:1 failed. Job exit code 2. Check gs://fc-d2239f7f-d4b9-45de-a3bb-49a002f0bc1e/9fc7e241-ca26-4495-8487-e8553c85f92f/PreProcessingForVariantDiscovery_GATK4/3ee29b16-c0df-458d-af92-707200213452/call-BaseRecalibrator/shard-0/stderr for more information. PAPI error code 5. 10: Failed to delocalize files: failed to copy the following files: "/mnt/local-disk/N0.b37.recal_data.csv -> gs://fc-d2239f7f-d4b9-45de-a3bb-49a002f0bc1e/9fc7e241-ca26-4495-8487-e8553c85f92f/PreProcessingForVariantDiscovery_GATK4/3ee29b16-c0df-458d-af92-707200213452/call-BaseRecalibrator/shard-0/N0.b37.recal_data.csv (cp failed: gsutil -q -m cp -L /var/log/google-genomics/out.log /mnt/local-disk/N0.b37.recal_data.csv gs://fc-d2239f7f-d4b9-45de-a3bb-49a002f0bc1e/9fc7e241-ca26-4495-8487-e8553c85f92f/PreProcessingForVariantDiscovery_GATK4/3ee29b16-c0df-458d-af92-707200213452/call-BaseRecalibrator/shard-0/N0.b37.recal_data.csv, command failed: CommandException: No URLs matched: /mnt/local-disk/N0.b37.recal_data.csv\nCommandException: 1 file/object could not be transferred.\n)"

    Attached is a screenshot of the stderr file mentioned in the above error message. Can you please advise how to proceed?

  • KateN (Cambridge, MA): Member, Broadie, Moderator, admin

    Hi @anshuman

    Your first error message, which says "Maximum read limits can be adjusted in the configuration under system.input-read-limits", refers to a Cromwell configuration file that you cannot adjust as a FireCloud user. To me, this error message indicates that something else likely went wrong and that this is simply where the workflow stopped.

    Your second error message tells you to look at the stderr file for more information, which is the correct thing to do. There is an error in your use of a GATK tool, so it is reported in that stderr file.

    The third error message is in the stderr file itself, and the important line is: A USER ERROR has occurred: Read <numbers> is malformed: The input .bam file contains reads with no platform information. First observed at read with name: <name>

    The file mentioned in that error is the one that has at least one malformed read. You should check your BAM file and fix the missing platform information. If you don't know how to do that, I would recommend asking on the GATK forum, as my colleagues there may have more tips for you.
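As a rough illustration of that check, the sketch below scans SAM header text (for example, the output of `samtools view -H file.bam` saved to a string) and reports read groups whose @RG line has no PL (platform) tag. The function name `read_groups_missing_platform` is hypothetical, and the sketch assumes the problem shows up in the header; it does not detect reads that carry no read group at all.

```python
def read_groups_missing_platform(header_text):
    """Return the IDs of @RG header lines that have no PL (platform) tag."""
    missing = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # @RG fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        if not fields.get("PL"):
            missing.append(fields.get("ID", "<no ID>"))
    return missing


# Made-up header for illustration only.
example_header = (
    "@HD\tVN:1.6\tSO:queryname\n"
    "@RG\tID:rg1\tSM:N0\tPL:illumina\n"
    "@RG\tID:rg2\tSM:N0\n"
)
missing_ids = read_groups_missing_platform(example_header)
```

Any read group ID returned here would need a platform value added before the file is passed to BaseRecalibrator.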

  • anshuman (NJ): Member

    Hi @KateN

    Thanks for your reply. Based on your reply, I assumed that something is wrong with the ubam files, and started the analysis from scratch. I ran the method gatk/paired-fastq-to-unmapped-bam, followed by the method gatk/processing-for-variant-discovery-gatk4, followed by the method gatk/haplotypecaller-gvcf-gatk4, and all the steps succeeded without any error message.

    However, when I checked the analysis-ready BAM files using the method gatk/validate-bam, all of them failed with error messages like this: "message: Job ValidateBamsWf.ValidateBAM:0:1 exited with return code 4 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details." Can you please explain what went wrong?

    I am attaching the screenshot of a stderr file just in case it helps.

  • KateN (Cambridge, MA): Member, Broadie, Moderator, admin

    Your instincts were correct that you should open the stderr file any time you get that message. I'm going to loop in a colleague of mine to take a look, as I'm not familiar with Picard's error messaging.

  • KateN (Cambridge, MA): Member, Broadie, Moderator, admin

    The error message you have in your stderr doesn't have any information on the error that could have occurred, as my colleague has confirmed. Could you share your workspace using the SHARE button in FireCloud with [email protected]?

    I will also need the following information:
    1. Your workspace name
    2. The submission ID for the analysis you launched where you encountered the above error message

  • anshuman (NJ): Member
    Accepted Answer

    Repeating the analysis after changing platform_name from Illumina_NextSeq_500_550 to illumina in the readgroup.list file resolved the issue.
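For anyone hitting the same problem, a small Python sketch of a pre-flight check on the platform value follows. The set of names is my understanding of the PL values listed in the SAM specification and should be verified against the current spec; the case-insensitive comparison is an assumption, consistent with lowercase "illumina" having worked here. The function name `is_recognized_platform` is hypothetical.

```python
# Assumed list of PL values from the SAM specification; verify before relying on it.
KNOWN_PLATFORMS = {
    "CAPILLARY", "DNBSEQ", "ELEMENT", "HELICOS", "ILLUMINA", "IONTORRENT",
    "LS454", "ONT", "PACBIO", "SINGULAR", "SOLID", "ULTIMA",
}


def is_recognized_platform(name):
    """Return True if name matches a known platform, ignoring case."""
    return name.strip().upper() in KNOWN_PLATFORMS


original_ok = is_recognized_platform("Illumina_NextSeq_500_550")
fixed_ok = is_recognized_platform("illumina")
```

With this check, the instrument-model string Illumina_NextSeq_500_550 is rejected while illumina is accepted, which matches the fix described above.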
