Technical support request: SortAndFixTags Failure

Hello,

My FireCloud analysis using the method PreProcessingForVariantDiscovery_GATK4 (copied from the Method Repository, Snapshot 8) keeps failing during the SortAndFixTags step. The failure seems to be due to copying outputs to Google buckets. All steps prior to SortAndFixTags work without error (see attached screenshot). I would greatly appreciate any help on why I am experiencing this failure.

I have shared the workspace I am running in, 'MC-preprocessGATK4', with [email protected]recloud.org, and am copying the content of SortAndFixTags.log below. I am happy to provide any additional information required.

Thank you in advance,

2019/05/24 16:12:03 Starting container setup.
2019/05/24 16:12:10 Done container setup.
2019/05/24 16:12:15 Starting localization.
2019/05/24 16:12:21 Localizing input gs://fc-c9d6903a-b0e5-4e53-a36c-99cfd31f9a53/c9ef0454-8b5a-4647-b527-89c260be19e3/PreProcessingForVariantDiscovery_GATK4/4eae049a-317b-41a6-8845-9a5b9c645ce0/call-MarkDuplicates/hd753rep1.hg38.aligned.unsorted.duplicates_marked.bam -> /cromwell_root/fc-c9d6903a-b0e5-4e53-a36c-99cfd31f9a53/c9ef0454-8b5a-4647-b527-89c260be19e3/PreProcessingForVariantDiscovery_GATK4/4eae049a-317b-41a6-8845-9a5b9c645ce0/call-MarkDuplicates/hd753rep1.hg38.aligned.unsorted.duplicates_marked.bam
2019/05/24 16:13:01 Localizing input gs://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta -> /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta
2019/05/24 16:14:03 Localizing input gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dict -> /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.dict
2019/05/24 16:14:16 Localizing input gs://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai -> /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai
2019/05/24 16:14:23 Localizing input gs://fc-c9d6903a-b0e5-4e53-a36c-99cfd31f9a53/c9ef0454-8b5a-4647-b527-89c260be19e3/PreProcessingForVariantDiscovery_GATK4/4eae049a-317b-41a6-8845-9a5b9c645ce0/call-SortAndFixTags/script -> /cromwell_root/script
2019/05/24 16:14:30 Done localization.
2019/05/24 16:14:36 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint= broadinstitute/[email protected]:039564efc566e62526627b023847d92fd0c98676988f99cbb1f4135509ecb1f8 /bin/bash /cromwell_root/script
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.5f968c15
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.5f968c15
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.

[Fri May 24 16:14:45 UTC 2019] picard.sam.SortSam INPUT=/cromwell_root/fc-c9d6903a-b0e5-4e53-a36c-99cfd31f9a53/c9ef0454-8b5a-4647-b527-89c260be19e3/PreProcessingForVariantDiscovery_GATK4/4eae049a-317b-41a6-8845-9a5b9c645ce0/call-MarkDuplicates/hd753rep1.hg38.aligned.unsorted.duplicates_marked.bam OUTPUT=/dev/stdout SORT_ORDER=coordinate CREATE_INDEX=false CREATE_MD5_FILE=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Fri May 24 16:14:45 UTC 2019] Executing as [email protected] on Linux 4.14.119+ amd64; OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2~bpo8+1-b14; Deflater: Intel; Inflater: Intel; Picard version: 2.10.5-SNAPSHOT
[Fri May 24 16:14:45 UTC 2019] picard.sam.SetNmAndUqTags INPUT=/dev/stdin OUTPUT=hd753rep1.hg38.aligned.duplicate_marked.sorted.bam CREATE_INDEX=true CREATE_MD5_FILE=true REFERENCE_SEQUENCE=/cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta IS_BISULFITE_SEQUENCE=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Fri May 24 16:14:45 UTC 2019] Executing as [email protected] on Linux 4.14.119+ amd64; OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2~bpo8+1-b14; Deflater: Intel; Inflater: Intel; Picard version: 2.10.5-SNAPSHOT
INFO 2019-05-24 16:16:29 SortSam Read 10,000,000 records. Elapsed time: 00:01:42s. Time for last 10,000,000: 102s. Last read position: chr4:54,280,302
INFO 2019-05-24 16:18:10 SortSam Read 20,000,000 records. Elapsed time: 00:03:23s. Time for last 10,000,000: 100s. Last read position: chr8:92,017,216
INFO 2019-05-24 16:19:31 SortSam Finished reading inputs, merging and writing to output now.
INFO 2019-05-24 16:23:56 SortSam Wrote 10,000,000 records from a sorting collection. Elapsed time: 00:09:09s. Time for last 10,000,000: 262s. Last read position: chr6:106,107,103
[Fri May 24 16:39:26 UTC 2019] picard.sam.SetNmAndUqTags done. Elapsed time: 24.72 minutes.
Runtime.totalMemory()=506986496
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.NullPointerException
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:517)
at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:455)
at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:445)
at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:194)
at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:326)
at java.io.DataInputStream.read(DataInputStream.java:149)
at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:404)
at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:380)
at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:366)
at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:209)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:829)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:803)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:797)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:765)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:576)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:548)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at picard.sam.SetNmMdAndUqTags.doWork(SetNmMdAndUqTags.java:107)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:228)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
2019/05/24 16:39:36 Starting delocalization.
2019/05/24 16:39:41 Delocalizing output /cromwell_root/hd753rep1.hg38.aligned.duplicate_marked.sorted.bam -> gs://fc-c9d6903a-b0e5-4e53-a36c-99cfd31f9a53/c9ef0454-8b5a-4647-b527-89c260be19e3/PreProcessingForVariantDiscovery_GATK4/4eae049a-317b-41a6-8845-9a5b9c645ce0/call-SortAndFixTags/hd753rep1.hg38.aligned.duplicate_marked.sorted.bam
2019/05/24 16:39:59 Delocalizing output /cromwell_root/hd753rep1.hg38.aligned.duplicate_marked.sorted.bai -> gs://fc-c9d6903a-b0e5-4e53-a36c-99cfd31f9a53/c9ef0454-8b5a-4647-b527-89c260be19e3/PreProcessingForVariantDiscovery_GATK4/4eae049a-317b-41a6-8845-9a5b9c645ce0/call-SortAndFixTags/hd753rep1.hg38.aligned.duplicate_marked.sorted.bai
2019/05/24 16:40:06 Delocalizing output /cromwell_root/hd753rep1.hg38.aligned.duplicate_marked.sorted.bam.md5 -> gs://fc-c9d6903a-b0e5-4e53-a36c-99cfd31f9a53/c9ef0454-8b5a-4647-b527-89c260be19e3/PreProcessingForVariantDiscovery_GATK4/4eae049a-317b-41a6-8845-9a5b9c645ce0/call-SortAndFixTags/hd753rep1.hg38.aligned.duplicate_marked.sorted.bam.md5
2019/05/24 16:40:08 rm -f $HOME/.config/gcloud/gce && gsutil cp /cromwell_root/hd753rep1.hg38.aligned.duplicate_marked.sorted.bam.md5 gs://fc-c9d6903a-b0e5-4e53-a36c-99cfd31f9a53/c9ef0454-8b5a-4647-b527-89c260be19e3/PreProcessingForVariantDiscovery_GATK4/4eae049a-317b-41a6-8845-9a5b9c645ce0/call-SortAndFixTags/ failed
CommandException: No URLs matched: /cromwell_root/hd753rep1.hg38.aligned.duplicate_marked.sorted.bam.md5
2019/05/24 16:40:08 Waiting 5 seconds and retrying
2019/05/24 16:40:14 rm -f $HOME/.config/gcloud/gce && gsutil cp /cromwell_root/hd753rep1.hg38.aligned.duplicate_marked.sorted.bam.md5 gs://fc-c9d6903a-b0e5-4e53-a36c-99cfd31f9a53/c9ef0454-8b5a-4647-b527-89c260be19e3/PreProcessingForVariantDiscovery_GATK4/4eae049a-317b-41a6-8845-9a5b9c645ce0/call-SortAndFixTags/ failed
CommandException: No URLs matched: /cromwell_root/hd753rep1.hg38.aligned.duplicate_marked.sorted.bam.md5
2019/05/24 16:40:14 Waiting 5 seconds and retrying
2019/05/24 16:40:20 rm -f $HOME/.config/gcloud/gce && gsutil cp /cromwell_root/hd753rep1.hg38.aligned.duplicate_marked.sorted.bam.md5 gs://fc-c9d6903a-b0e5-4e53-a36c-99cfd31f9a53/c9ef0454-8b5a-4647-b527-89c260be19e3/PreProcessingForVariantDiscovery_GATK4/4eae049a-317b-41a6-8845-9a5b9c645ce0/call-SortAndFixTags/ failed
CommandException: No URLs matched: /cromwell_root/hd753rep1.hg38.aligned.duplicate_marked.sorted.bam.md5
2019/05/24 16:40:25 Delocalizing output /cromwell_root/stdout -> gs://fc-c9d6903a-b0e5-4e53-a36c-99cfd31f9a53/c9ef0454-8b5a-4647-b527-89c260be19e3/PreProcessingForVariantDiscovery_GATK4/4eae049a-317b-41a6-8845-9a5b9c645ce0/call-SortAndFixTags/stdout
2019/05/24 16:40:32 Delocalizing output /cromwell_root/stderr -> gs://fc-c9d6903a-b0e5-4e53-a36c-99cfd31f9a53/c9ef0454-8b5a-4647-b527-89c260be19e3/PreProcessingForVariantDiscovery_GATK4/4eae049a-317b-41a6-8845-9a5b9c645ce0/call-SortAndFixTags/stderr
2019/05/24 16:40:40 Delocalizing output /cromwell_root/rc -> gs://fc-c9d6903a-b0e5-4e53-a36c-99cfd31f9a53/c9ef0454-8b5a-4647-b527-89c260be19e3/PreProcessingForVariantDiscovery_GATK4/4eae049a-317b-41a6-8845-9a5b9c645ce0/call-SortAndFixTags/rc
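
The log above contains two different kinds of failure: a Java exception from Picard, and later `gsutil` retries that cannot find the `.md5` file. A minimal sketch for finding which one came first in a log like this is shown below. The function name and the marker list are illustrative assumptions based only on the strings visible in this log; they are not part of Cromwell, FireCloud, or Picard.

```python
# Hypothetical helper: report the first line of a task log that contains a
# known failure marker. The markers are copied from the log in this post.
FAILURE_MARKERS = (
    "Exception in thread",  # uncaught Java exception (Picard / htsjdk)
    "CommandException:",    # gsutil failure during delocalization
)


def first_failure(log_text):
    """Return (line_number, line) for the first line with a marker, else None."""
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(marker in line for marker in FAILURE_MARKERS):
            return number, line.strip()
    return None


if __name__ == "__main__":
    sample = "\n".join([
        "picard.sam.SetNmAndUqTags done. Elapsed time: 24.72 minutes.",
        'Exception in thread "main" java.lang.NullPointerException',
        "2019/05/24 16:39:36 Starting delocalization.",
        "CommandException: No URLs matched: /cromwell_root/example.bam.md5",
    ])
    print(first_failure(sample))
```

In the log above the Java exception appears before the `gsutil` errors, which suggests the missing `.md5` file may be a consequence of the Picard crash rather than a problem with the bucket itself.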

Answers

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin
    edited May 24

    Hi @omw90

    From the error it looks like there could be an issue with the BAM file. Let's start by validating your BAM file, following the steps described in this doc: https://software.broadinstitute.org/gatk/documentation/article?id=11095. The doc also describes how to fix an erroneous BAM file.

    Let me know if this fixes the issue.

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @bhanuGandham, thanks so much for your quick response to my question.

    I have performed ValidateSamFile and no errors are reported for my BAM file. As some additional background, I am generating this BAM file from FASTQ files, following the instructions in the how to doc (https://software.broadinstitute.org/gatk/documentation/article?id=6484). Here is the output when I print read group tags.

    $ samtools view -H 10ng-HD753-rep1-High-FFPE-P1_clean.bam

    @HD VN:1.6 SO:queryname

    @RG ID:HD753rep1 SM:10ng-HD753-rep1-High-FFPE-P1 LB:NextSeq_CCP_Horizon PL:Illumina CN:DM DT:2019-05-20T20:00:00-0400

    I also tried generating a SAM file first by applying FastqToSam to my FASTQ files, then converting it to a BAM file using: $ samtools view -b -S ${OUT}.sam > ${OUT}_sorted.bam. This BAM also passes ValidateSamFile without errors, but it also fails in the PreProcessingForVariantDiscovery_GATK4 SortAndFixTags step, this time with no clear error as to why. Below is the output from SortAndFixTags.log when I run the BAM file that was created as a SAM file first.

    Any advice on how to proceed would be greatly appreciated.

    Thanks again for your help.

    2019/05/25 03:34:37 Starting container setup.

    2019/05/25 03:34:44 Done container setup.

    2019/05/25 03:34:49 Starting localization.

    2019/05/25 03:34:55 Localizing input gs://fc-db2ec1e5-5b5e-4fc8-b493-09ab42a1c150/797d2ec8-b1f1-4c47-8436-610e0c6854ef/PreProcessingForVariantDiscovery_GATK4/2521ec7b-11bf-4be2-847f-a855a6a350da/call-MarkDuplicates/hd753rep2.hg38.aligned.unsorted.duplicates_marked.bam -> /cromwell_root/fc-db2ec1e5-5b5e-4fc8-b493-09ab42a1c150/797d2ec8-b1f1-4c47-8436-610e0c6854ef/PreProcessingForVariantDiscovery_GATK4/2521ec7b-11bf-4be2-847f-a855a6a350da/call-MarkDuplicates/hd753rep2.hg38.aligned.unsorted.duplicates_marked.bam

    2019/05/25 03:35:41 Localizing input gs://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta -> /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta

    2019/05/25 03:36:40 Localizing input gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dict -> /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.dict

    2019/05/25 03:36:53 Localizing input gs://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai -> /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai

    2019/05/25 03:37:01 Localizing input gs://fc-db2ec1e5-5b5e-4fc8-b493-09ab42a1c150/797d2ec8-b1f1-4c47-8436-610e0c6854ef/PreProcessingForVariantDiscovery_GATK4/2521ec7b-11bf-4be2-847f-a855a6a350da/call-SortAndFixTags/script -> /cromwell_root/script

    2019/05/25 03:37:08 Done localization.

    2019/05/25 03:37:14 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint= broadinstitute/[email protected]:039564efc566e62526627b023847d92fd0c98676988f99cbb1f4135509ecb1f8 /bin/bash /cromwell_root/script

    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.bdfe9fac

    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.bdfe9fac

    ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.

    ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.

    [Sat May 25 03:37:24 UTC 2019] picard.sam.SortSam INPUT=/cromwell_root/fc-db2ec1e5-5b5e-4fc8-b493-09ab42a1c150/797d2ec8-b1f1-4c47-8436-610e0c6854ef/PreProcessingForVariantDiscovery_GATK4/2521ec7b-11bf-4be2-847f-a855a6a350da/call-MarkDuplicates/hd753rep2.hg38.aligned.unsorted.duplicates_marked.bam OUTPUT=/dev/stdout SORT_ORDER=coordinate CREATE_INDEX=false CREATE_MD5_FILE=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false

    [Sat May 25 03:37:24 UTC 2019] Executing as [email protected] on Linux 4.14.119+ amd64; OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2~bpo8+1-b14; Deflater: Intel; Inflater: Intel; Picard version: 2.10.5-SNAPSHOT

    [Sat May 25 03:37:24 UTC 2019] picard.sam.SetNmAndUqTags INPUT=/dev/stdin OUTPUT=hd753rep2.hg38.aligned.duplicate_marked.sorted.bam CREATE_INDEX=true CREATE_MD5_FILE=true REFERENCE_SEQUENCE=/cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta IS_BISULFITE_SEQUENCE=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false

    [Sat May 25 03:37:24 UTC 2019] Executing as [email protected] on Linux 4.14.119+ amd64; OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2~bpo8+1-b14; Deflater: Intel; Inflater: Intel; Picard version: 2.10.5-SNAPSHOT

    INFO 2019-05-25 03:39:04 SortSam Read 10,000,000 records. Elapsed time: 00:01:39s. Time for last 10,000,000: 99s. Last read position: chr11:118,480,235

    INFO 2019-05-25 03:40:46 SortSam Read 20,000,000 records. Elapsed time: 00:03:21s. Time for last 10,000,000: 102s. Last read position: chr9:5,054,676

    INFO 2019-05-25 03:42:17 SortSam Finished reading inputs, merging and writing to output now.

    INFO 2019-05-25 03:46:52 SortSam Wrote 10,000,000 records from a sorting collection. Elapsed time: 00:09:27s. Time for last 10,000,000: 273s. Last read position: chr6:69,324,807

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    @omw90

    Looks like you are using an older version of Picard that does not explain the cause of this error. Could you please download the latest version of Picard, run SetNmMdAndUqTags on the BAM file locally, and maybe set VERBOSITY to DEBUG? This should at least avoid the NullPointerException bug and any potential problems with the environment, so we can find the origin of the problem with the BAM.

  • omw90 Member
    edited May 30
    @bhanuGandham OK, thanks for the suggestion. I have downloaded the sorted and aligned BAM that is failing in the SortAndFixTags step from my FireCloud workspace and run SetNmMdAndUqTags on it locally using the updated version of Picard.

    This seems to output a 'fixed' file, although I am not sure how to interpret the output. Any thoughts you have would be great. Since I'm not able to post links, I can't include my output here, so I will send it to the support email.

    Thanks again
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi Bhanu,

    Apologies for responding here again, but I wasn’t able to include my Picard output in my forum response. I am including it below.

    Thanks again,

    Owen

    $ picard SetNmMdAndUqTags R=Homo_sapiens_assembly38.fasta I=SortAndFixTags_hd753rep1.hg38.aligned.duplicate_marked.sorted.bam O=fixed.bam VERBOSITY=DEBUG

    bash: picard: command not found...

    [[email protected] reference_data]$ conda activate GATK_BP

    (GATK_BP) [[email protected] reference_data]$ picard SetNmMdAndUqTags R=Homo_sapiens_assembly38.fasta I=SortAndFixTags_hd753rep1.hg38.aligned.duplicate_marked.sorted.bam O=fixed.bam VERBOSITY=DEBUG

    INFO 2019-05-30 10:30:05 SetNmMdAndUqTags

    ********** NOTE: Picard's command line syntax is changing.


    ********** For more information, please see:

    ********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


    ********** The command line looks like this in the new syntax:


    ********** SetNmMdAndUqTags -R Homo_sapiens_assembly38.fasta -I SortAndFixTags_hd753rep1.hg38.aligned.duplicate_marked.sorted.bam -O fixed.bam -VERBOSITY DEBUG


    10:30:06.159 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/dartfs-hpc/rc/home/d/d41294d/.conda/envs/GATK_BP/share/picard-2.20.2-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so

    [Thu May 30 10:30:06 EDT 2019] SetNmMdAndUqTags INPUT=SortAndFixTags_hd753rep1.hg38.aligned.duplicate_marked.sorted.bam OUTPUT=fixed.bam VERBOSITY=DEBUG REFERENCE_SEQUENCE=Homo_sapiens_assembly38.fasta IS_BISULFITE_SEQUENCE=false SET_ONLY_UQ=false QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false

    [Thu May 30 10:30:06 EDT 2019] Executing as [email protected] on Linux 3.10.0-957.5.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_192-b01; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.20.2-SNAPSHOT

    WARNING: BAM index file /dartfs-hpc/rc/lab/G/GSR_Active/Core_files_and_folders/gatk_workflows/reference_data/SortAndFixTags_hd753rep1.hg38.aligned.duplicate_marked.sorted.bai is older than BAM /dartfs-hpc/rc/lab/G/GSR_Active/Core_files_and_folders/gatk_workflows/reference_data/SortAndFixTags_hd753rep1.hg38.aligned.duplicate_marked.sorted.bam

    DEBUG 2019-05-30 10:30:06 BlockCompressedOutputStream Using deflater: IntelDeflater

    [Thu May 30 10:32:28 EDT 2019] picard.sam.SetNmMdAndUqTags done. Elapsed time: 2.37 minutes.

    Runtime.totalMemory()=966262784

    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp

    Exception in thread "main" htsjdk.samtools.FileTruncatedException: Premature end of file: /dartfs-hpc/rc/lab/G/GSR_Active/Core_files_and_folders/gatk_workflows/reference_data/SortAndFixTags_hd753rep1.hg38.aligned.duplicate_marked.sorted.bam

       at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:530)
    
       at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
    
       at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
    
       at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
    
       at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:331)
    
       at java.io.DataInputStream.read(DataInputStream.java:149)
    
       at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:421)
    
       at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:394)
    
       at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:380)
    
       at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:282)
    
       at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:866)
    
       at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:840)
    
       at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:834)
    
       at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:802)
    
       at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:574)
    
       at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:553)
    
       at java.util.Iterator.forEachRemaining(Iterator.java:116)
    
       at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    
       at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    
       at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    
       at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    
       at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    
       at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    
       at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    
       at picard.sam.SetNmMdAndUqTags.doWork(SetNmMdAndUqTags.java:141)
    
       at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
    
       at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    
       at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
    
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @omw90

    I checked with the dev team and this is what they said. The error log shows that the input bam is truncated. The most plausible scenario is that the process generating this bam ran out of disk space (or crashed for some other reason) while generating it. SetNmMdAndUqTags is processing the bam and producing output, then encounters the error and leaves another truncated bam as output. Our suggestions at this point are:
    1) Double-check that the downloaded local BAM passes ValidateSamFile (and try setting --MAX_OUTPUT=2147483647). If it passes, maybe also try samtools quickcheck and samtools count -c, just to see if they are okay.
    2) Delete the BAM index file and recreate it with samtools index.
    3) If all those checks pass and this truncation error still occurs after recreating the index, then we are potentially in a scenario where there is a problem with the tools and not the data. It would be helpful at this point if you could provide us with the BAM file, and here is a link to how.
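
As a complement to the checks listed above, one quick sign of truncation is a missing BGZF end-of-file block, which is one of the things samtools quickcheck looks for. The sketch below only tests the last 28 bytes of a file against the EOF marker defined in the SAM/BAM specification. The function name is made up for illustration, and a file that passes this test can still be damaged elsewhere, so treat it as a first screen rather than a replacement for ValidateSamFile.

```python
import os

# The 28-byte empty BGZF block that the SAM/BAM specification defines as the
# end-of-file marker for BAM files.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 4243 0200 1b00 0300 00000000 00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Read only the final 28 bytes; the rest of the file is not inspected.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

A call such as `has_bgzf_eof("fixed.bam")` returning False would be consistent with the "Premature end of file" message in the Picard output earlier in the thread.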

  • omw90 Member
    edited June 7
    Hello @bhanuGandham

    Thanks for the suggestions. Could you advise at which step I should test the BAM file with ValidateSamFile? For my input unmapped BAM file, ValidateSamFile still reports "No errors found" (including when setting --MAX_OUTPUT=2147483647). samtools quickcheck for this file produces:

    10ng-HD753-rep2-High-FFPE-P1_picard_202.unmapped.bam had no targets in header.

    However when I print the header, with:
    (GATK_BP) [[email protected] reference_data]$ samtools view -H 10ng-HD753-rep2-High-FFPE-P1_picard_202.unmapped.bam

    OUTPUT:
    @HD VN:1.6 SO:queryname
    @RG ID:HD753rep2 SM:10ng-HD753-rep2-High-FFPE-P1 LB:Nextseq550_CCP_Horizon_only PL:Illumina PU:001 CN:dm DT:1969-12-31T19:00:00-0500

    Is there something wrong with my header perhaps?

    I was also not able to find the 'count' function within the samtools package. Am I missing something here?

    For the BAM file output by the MergeBamAlignment step of the GATK Best Practices workflow (FireCloud workspace: PreProcessingForVariantDiscovery_GATK4), which I believe is the step before SortAndFixTags, where the analysis fails, ValidateSamFile detects no errors when run with IGNORE=MISSING_TAG_NM, but produces many warnings that the NM tag is missing when run without it. samtools quickcheck does not appear to produce any output for this file. I also wanted to note that ValidateSamFile would not run on this file until I removed the reference file (from the GATK resource bundle) from the command; with the reference specified, it produced the error 'ValidateSamFile Requesting earlier reference sequence: 0 < 15'. This did not occur when I used the reference file in ValidateSamFile on the unaligned BAM.

    I also tried running FastqToSam with MAX_RECORDS_IN_RAM='null' in an attempt to prevent any memory issues from truncating the BAM. Assuming I used this correctly, it made no difference, and the analysis still failed during SortAndFixTags.

    I will also make the .bam file available following the instructions you sent for a bug report submission.
  • omw90 Member
    I have now submitted the unmapped BAM to ftp.broadinstitute.org as 'BUG_REPORT_06-07-19.zip'.
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @omw90

    I checked with our dev team and this is what they have to say:
    1) "samtools count -c" was a typo, and should have been "samtools view -c"
    2) The step where you should test ValidateSamFile is the one that is failing, i.e. test the bam that is being passed to SetNmAndUqTags (the unmapped bam is not helpful).
    3) Previously you indicated that you had checked that bam and it didn't have problems, which indicates either a problem with ValidateSamFile or SetNmAndUqTags. It would be good to verify this and test that bam locally.

    • If the BAM passes ValidateSamFile and SetNmAndUqTags runs successfully locally, then this indicates some error in FireCloud (possibly transient and unlikely to repeat, possibly due to insufficient memory or disk space).
    • If the BAM passes ValidateSamFile and crashes with an error in SetNmAndUqTags, this indicates a bug: please share that BAM file with us so that we can find the bug and fix it.
    • If the BAM does not pass ValidateSamFile, this indicates a problem earlier in the pipeline, probably with MergeBamAlignment. You should look at the error log of the process that generated it.
    • If your WDL does not include the line "set -e -o pipefail" near the top, and incomplete output files are generated before crashing, Cromwell/FireCloud will pass these incomplete files to the next stage without warning, and the next stage will then crash on the incomplete files. However, the log files for the job should still retain the error messages.
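
To illustrate the last point, the sketch below runs two tiny bash pipelines from Python and compares their exit codes. It assumes `bash` is available on the PATH; `false` is only a stand-in for a first stage that crashes (as SortSam might) and `cat` for a second stage that still exits normally. It is not the workflow's actual command block.

```python
import subprocess


def exit_code(script):
    """Run a shell snippet with bash and return its exit code."""
    return subprocess.run(["bash", "-c", script]).returncode


# Without pipefail, the pipeline's status is that of the LAST command only,
# so the failure of the first stage is silently ignored.
without_pipefail = exit_code("false | cat > /dev/null")

# With 'set -e -o pipefail', a failure in any stage fails the whole pipeline
# and stops the script.
with_pipefail = exit_code("set -e -o pipefail; false | cat > /dev/null")

print("without pipefail:", without_pipefail)  # prints 0
print("with pipefail:", with_pipefail)        # prints 1
```

This matters for SortAndFixTags because, as the logs in this thread show, SortSam writes to /dev/stdout and the tag-fixing tool reads from /dev/stdin, so the two tools run as a pipeline.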
  • omw90 Member
    Thanks again for your response @bhanuGandham

    Following your recommendations, I have tested the BAM file that is failing in the SortAndFixTags step of the FireCloud workspace PreProcessingForVariantDiscovery_GATK4. samtools view -c for this file returns: 28631089

    When I run ValidateSamFile on this BAM (with MO=2147483647) I obtain warnings indicating "WARNING: Record 25752, Read name NB551129:36:HWC3WBGX3:1:11101:2784:16559, NM tag (nucleotide differences) is missing" for each read. If I run ValidateSamFile with IGNORE_WARNINGS=TRUE or IGNORE=MISSING_TAG_NM, no errors are found.

    I did also realize that the FireCloud workspace I have been using (PreProcessingForVariantDiscovery_GATK4, Snapshot ID: 8) appears to use the deprecated SetNmAndUqTags instead of SetNmMdAndUqTags. Could this perhaps be related to the issue?

    When I use a FireCloud workspace that seems to be more up to date and uses SetNmMdAndUqTags (processing-for-variant-discovery-gatk4, Snapshot ID: 8, created by [email protected]), SetNmMdAndUqTags seems to run successfully up to a point, but the SortAndFixTags step fails because the md5 file cannot be located or produced (as indicated in the stderr). The failure message I get from the SortAndFixTags step in the FireCloud workspace is:

    message: Task PreProcessingForVariantDiscovery_GATK4.SortAndFixTags:NA:1 failed. Job exit code 3. Check gs://fc-db2ec1e5-5b5e-4fc8-b493-09ab42a1c150/a74babc3-39b7-4461-af0a-0a1570f19842/PreProcessingForVariantDiscovery_GATK4/31000887-0d4f-4561-8b16-ca19e6b020b1/call-SortAndFixTags/stderr for more information. PAPI error code 9. Please check the log file for more details: gs://fc-db2ec1e5-5b5e-4fc8-b493-09ab42a1c150/a74babc3-39b7-4461-af0a-0a1570f19842/PreProcessingForVariantDiscovery_GATK4/31000887-0d4f-4561-8b16-ca19e6b020b1/call-SortAndFixTags/SortAndFixTags.log.

    Thanks again for your ongoing help with this issue. Apologies that my responses are so convoluted.
  • omw90 Member
    As an update, if I run SetNmMdAndUqTags locally on the BAM that is failing in the SortAndFixTags step, SetNmMdAndUqTags fails; however, if I change the --CREATE_INDEX flag to false, SetNmMdAndUqTags runs successfully.

    From looking around at other posts, this seems like it might be related to the BAM index format limit of 512 Mbp for contigs.

    Hopefully this is helpful additional info.
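
One way to test the index-limit idea is to compare contig lengths against the BAI limit, which covers coordinates up to 2^29 (536,870,912) bases. The sketch below reads text in the .fai layout (tab-separated, contig name then length in the first two columns). The function name is illustrative, and the second contig in the usage example is hypothetical.

```python
# BAI indexes can only address positions below 2**29 on each contig.
BAI_MAX_CONTIG_LENGTH = 2 ** 29  # 536,870,912 bases


def contigs_over_bai_limit(fai_text):
    """Return names of contigs in .fai-formatted text that are too long for BAI."""
    too_long = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, length = line.split("\t")[:2]
        if int(length) > BAI_MAX_CONTIG_LENGTH:
            too_long.append(name)
    return too_long


if __name__ == "__main__":
    # chr1 length is the hg38 value; 'hypothetical_big' is made up for the demo.
    example = "chr1\t248956422\nhypothetical_big\t600000000\n"
    print(contigs_over_bai_limit(example))  # prints ['hypothetical_big']
```

The longest hg38 contig, chr1, is 248,956,422 bases, so this limit would not be expected to apply to the reference used in this thread; the CREATE_INDEX behaviour may have another cause.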
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @omw90

    I apologize I didn't check this earlier. You are using an outdated data preprocessing workflow.
    Try this workflow, it was updated more recently: https://portal.firecloud.org/#methods/gatk/processing-for-variant-discovery-gatk4/8

    The workflow you are using is an old GATK 4.beta.3 version.

    Let me know if the error persists.

  • omw90 Member
    OK, thanks for the suggestion. I have run this new workflow several times; however, it still fails each time during the SortAndFixTags step.

    The BAM file going into this step passes ValidateSamFile with no errors found, provided I set IGNORE=MISSING_TAG_NM. Also, I am running ValidateSamFile without a reference sequence, because when I try to include the reference sequence (Homo_sapiens_assembly38.fasta, from the GATK resource bundle) I get an error: "ValidateSamFile Requesting earlier reference sequence: 0 < 15".

    The failure produced by the FireCloud workspace is:

    "message: Task PreProcessingForVariantDiscovery_GATK4.SortAndFixTags:NA:1 failed. The job was stopped before the command finished. PAPI error code 10. The assigned worker has failed to complete the operation".

    In case it's helpful, below is the log from the SortAndFixTags step. Any thoughts you have would be much appreciated.

    2019/06/13 21:29:24 Starting container setup.
    2019/06/13 21:29:32 Done container setup.
    2019/06/13 21:29:38 Starting localization.
    2019/06/13 21:29:45 Localizing input gs://fc-db2ec1e5-5b5e-4fc8-b493-09ab42a1c150/82384e60-3b09-4c18-bb9b-7d418008bb8e/PreProcessingForVariantDiscovery_GATK4/8f0187fe-1fe7-40f6-a66b-676d3c20c455/call-MarkDuplicates/hd753rep2.hg38.aligned.unsorted.duplicates_marked.bam -> /cromwell_root/fc-db2ec1e5-5b5e-4fc8-b493-09ab42a1c150/82384e60-3b09-4c18-bb9b-7d418008bb8e/PreProcessingForVariantDiscovery_GATK4/8f0187fe-1fe7-40f6-a66b-676d3c20c455/call-MarkDuplicates/hd753rep2.hg38.aligned.unsorted.duplicates_marked.bam
    2019/06/13 21:30:21 Localizing input gs://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta -> /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta
    2019/06/13 21:31:16 Localizing input gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dict -> /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.dict
    2019/06/13 21:31:29 Localizing input gs://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai -> /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai
    2019/06/13 21:31:37 Localizing input gs://fc-db2ec1e5-5b5e-4fc8-b493-09ab42a1c150/82384e60-3b09-4c18-bb9b-7d418008bb8e/PreProcessingForVariantDiscovery_GATK4/8f0187fe-1fe7-40f6-a66b-676d3c20c455/call-SortAndFixTags/script -> /cromwell_root/script
    2019/06/13 21:31:45 Done localization.
    2019/06/13 21:31:51 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint= broadinstitute/[email protected]:c0f45677e9af6bba65e7234c33a7447f615febd1217e80ea2231fab69bb350a0 /bin/bash /cromwell_root/script
    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.c5afbcae
    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.c5afbcae
    21:32:10.624 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    21:32:10.677 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    [Thu Jun 13 21:32:10 UTC 2019] SetNmMdAndUqTags --INPUT /dev/stdin --OUTPUT hd753rep2.hg38.aligned.duplicate_marked.sorted.bam --CREATE_INDEX true --CREATE_MD5_FILE true --REFERENCE_SEQUENCE /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta --IS_BISULFITE_SEQUENCE false --SET_ONLY_UQ false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Thu Jun 13 21:32:10 UTC 2019] SortSam --INPUT /cromwell_root/fc-db2ec1e5-5b5e-4fc8-b493-09ab42a1c150/82384e60-3b09-4c18-bb9b-7d418008bb8e/PreProcessingForVariantDiscovery_GATK4/8f0187fe-1fe7-40f6-a66b-676d3c20c455/call-MarkDuplicates/hd753rep2.hg38.aligned.unsorted.duplicates_marked.bam --OUTPUT /dev/stdout --SORT_ORDER coordinate --CREATE_INDEX false --CREATE_MD5_FILE false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false

    [Thu Jun 13 21:32:11 UTC 2019] Executing as [email protected] on Linux 4.14.104+ amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.2.0
    [Thu Jun 13 21:32:12 UTC 2019] Executing as [email protected] on Linux 4.14.104+ amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.2.0
    INFO 2019-06-13 21:32:12 SAMFileWriterFactory Unknown file extension, assuming BAM format when writing file: file:///dev/stdout
    INFO 2019-06-13 21:35:25 SortSam Read 10,000,000 records. Elapsed time: 00:03:12s. Time for last 10,000,000: 192s. Last read position: chr11:118,480,235
    INFO 2019-06-13 21:38:30 SortSam Read 20,000,000 records. Elapsed time: 00:06:17s. Time for last 10,000,000: 184s. Last read position: chr9:5,054,676
    INFO 2019-06-13 21:41:13 SortSam Finished reading inputs, merging and writing to output now.
    INFO 2019-06-13 21:41:14 SortingCollection Creating merging iterator from 58 files
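The last log line reports a merging iterator over 58 files, and the command line shows --MAX_RECORDS_IN_RAM 500000. Assuming the sorter writes one temporary file each time its in-memory buffer fills, plus one more for any remainder (an assumption about the sorting behaviour, not something the log states), the file count gives rough bounds on how many records were being sorted when the job stopped. The helper below is a hypothetical sketch of that arithmetic.

```python
def record_count_bounds(spill_files, max_records_in_ram):
    """Bounds on the number of sorted records implied by the spill-file count.

    Assumes one spill file per full in-memory buffer, plus one file for a
    partially filled final buffer.
    """
    if spill_files < 1 or max_records_in_ram < 1:
        raise ValueError("both arguments must be positive")
    # All but the last file are full; the last holds at least one record.
    lower = (spill_files - 1) * max_records_in_ram + 1
    upper = spill_files * max_records_in_ram
    return lower, upper


# Values taken from the log above: 58 files, 500,000 records per buffer.
lower, upper = record_count_bounds(58, 500_000)
print(f"Between {lower:,} and {upper:,} records")
```

Under that assumption the BAM holds roughly 28.5 to 29 million records, which is consistent with the log showing more than 20 million records read before the merge began.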
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin
    edited June 14

    Hi @omw90

    This looks like a new error compared to the last one: PAPI error code 10. @Tiffany_at_Broad, who is our FireCloud/Terra lead, can look into this for you. I will reach out to her and explain the issue we are facing. We will get back to you shortly.

  • omw90 Member
    OK, thanks @bhanuGandham.
  • AdelaideR Member admin

    Hi @omw90

    I see you have a PAPI error code 10.

    Have you taken a look at this recent article by @SChaluvadi?

    It may explain what is happening and how to work around it.

    Here is the link

  • Tiffany_at_Broad Cambridge, MA Member, Administrator, Broadie, Moderator admin

    Hi Owen, I just sent an email over to you last night. Let me know if you didn't get it. We can work on it there and come back and post the resolution so that the knowledge doesn't get lost.

  • omw90 Member
    Thanks @Tiffany_at_Broad, I have just emailed back.

    Also thanks @AdelaideR, I will give that a try.