Problems with BAMs decompressed from CRAMs

I have a number of NGS samples with 30x coverage. The samples were processed according to GATK Best Practices. For some samples the processed BAMs were kept; for others the BAMs were compressed to CRAMs. My task now is to use Picard's DownsampleSam to generate various low-coverage scenarios.
For the samples where the original BAMs were kept, the downsampling works as expected. However, the BAMs obtained from CRAMs do not work at all.
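
For context, the downsampling step I mean is roughly the following (file names and the retention probability are placeholders):

    # keep roughly a third of the reads, i.e. ~30x down to ~10x
    java -jar picard.jar DownsampleSam \
        I=sample.bam \
        O=sample.10x.bam \
        P=0.33 \
        CREATE_INDEX=true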

First, I decompressed the CRAMs to BAMs using samtools. This seemed to work fine: I was able to index the resulting BAM and get stats from it. However, when I tried to use the file for downsampling, I got errors saying that the file is truncated.
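
The conversion and checks were along these lines (the reference FASTA name is a placeholder; it must be the same reference the CRAM was written against):

    # decode the CRAM to BAM against a local copy of the matching reference
    samtools view -b -T reference.fa -o sample.bam sample.cram
    samtools index sample.bam
    samtools flagstat sample.bam
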
Thinking it might be an issue with the decompression, I tried cramtools instead. This time the decompression itself failed, with the following error:
ERROR 2018-02-02 09:23:13 ReferenceSource Downloaded sequence is corrupt: requested md5=971cb1c7a7f62a402dab61cfe84a93b1, received md5=d41d8cd98f00b204e9800998ecf8427e
ERROR 2018-02-02 09:23:13 ReferenceSource Downloaded sequence is corrupt: requested md5=971cb1c7a7f62a402dab61cfe84a93b1, received md5=d41d8cd98f00b204e9800998ecf8427e
ERROR 2018-02-02 09:23:13 Cram2Bam Can't find reference to validate slice md5: 0 CM000093.4
ERROR 2018-02-02 09:23:13 ReferenceSource Downloaded sequence is corrupt: requested md5=971cb1c7a7f62a402dab61cfe84a93b1, received md5=d41d8cd98f00b204e9800998ecf8427e
ERROR 2018-02-02 09:23:13 ReferenceSource Downloaded sequence is corrupt: requested md5=971cb1c7a7f62a402dab61cfe84a93b1, received md5=d41d8cd98f00b204e9800998ecf8427e
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at net.sf.cram.CramTools.invoke(CramTools.java:91)
at net.sf.cram.CramTools.main(CramTools.java:121)
Caused by: java.lang.RuntimeException: Reference sequence required but not found: CM000093.4, md5=971cb1c7a7f62a402dab61cfe84a93b1
at htsjdk.samtools.cram.build.CramNormalizer.restoreBases(CramNormalizer.java:228)
at htsjdk.samtools.cram.build.CramNormalizer.normalizeRecordsForReferenceSource(CramNormalizer.java:201)
at net.sf.cram.Cram2Bam.main(Cram2Bam.java:237)
... 6 more

According to this issue, https://github.com/enasequence/cramtools/issues/74, this is because cramtools has not been patched to use https when fetching reference sequences from ENA: the received md5 d41d8cd98f00b204e9800998ecf8427e is the MD5 of an empty file, i.e. the download returned nothing.
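
A possible workaround, assuming I have the cramtools flag names right (treat this as a sketch and check the tool's help), is to point cramtools at a local copy of the reference instead of letting it download sequences by MD5:

    java -jar cramtools-3.0.jar bam \
        --input-cram-file sample.cram \
        --reference-fasta-file reference.fa \
        --output-bam-file sample.bam

Passing the same local reference to samtools with -T, as above, likewise avoids the MD5 download path entirely.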

I finally ran Picard's ValidateSamFile, which reported multiple errors (it hit the maximum output of 100):

ERROR: Record x, Read name y, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
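
The validation was run roughly as follows; if I read the Picard documentation correctly, this particular bin-field complaint has its own IGNORE code (INVALID_INDEXING_BIN), so it can be filtered out to check whether anything more serious is reported:

    java -jar picard.jar ValidateSamFile \
        I=sample.bam \
        MODE=SUMMARY \
        MAX_OUTPUT=100

    # re-run, ignoring only the indexing-bin error
    java -jar picard.jar ValidateSamFile \
        I=sample.bam \
        IGNORE=INVALID_INDEXING_BIN \
        MODE=VERBOSE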

Is this something that can somehow be fixed? Could it be that the original BAM was created with a previous version of the reference genome? I did not create the BAMs, so I am trying to find any possible causes of the problem and, ideally, a solution that avoids re-processing the samples!

Answers

  • Thank you! I will check it, though it is not an optimal solution, as we already have some samples downsampled using Picard... Either way, I would have to re-do something that has already been done in order to keep the method consistent. I was hoping there was a way to decompress the CRAMs into workable BAMs, rather than another way to downsample them.

  • TechnicalVault (Cambridge, UK) Member ✭✭✭

    Well, I know samtools converts CRAMs to BAMs correctly; the only time I've really run into problems is when I have run out of disk space/quota and forgotten to check exit codes. An alternative is to use scramble from the Staden io_lib package (James prototyped CRAM support for samtools in there); see the sketch after these replies.

  • shlee (Cambridge) Member, Broadie, Moderator, admin

    @jilska and @TechnicalVault, GATK4 supports CRAMs as input anywhere a BAM is accepted.
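
For future readers, the two suggestions above translate to commands roughly like these (flag spellings are from memory and the file and reference names are placeholders, so check each tool's help output):

    # scramble from Staden io_lib: convert CRAM to BAM with a local reference
    scramble -I cram -O bam -r reference.fa sample.cram sample.bam

    # GATK4 accepts the CRAM directly; e.g. write it back out as a BAM
    gatk PrintReads -I sample.cram -R reference.fa -O sample.bam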
