We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

Sequences at index 0 don't match, but using the same reference genome

NiekdeKleinNiekdeKlein GroningenMember

I am trying to run CollectMultipleMetrics on a CRAM file but I get an "Sequences at index 0 don't match, but using the same reference genome". What can I do to resolve this?

Below what I have tried so far:

I am trying to run CollectMultipleMetrics on a CRAM file with the following code:

java -jar picard.jar CollectMultipleMetrics I=SRR1378155_SRR1378155.cram O=SRR1378155_SRR1378155.multiplemetrics R=/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa

but I get the error

htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequences at index 0 don't match: 0/248956422/chr1/M5=2648ae1bacce4ec4b6cf337dcae37816/UR=/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa 0/248956422/chr1/M5=1f9b4bfb2d6193a45e52901b8aa4339e/UR=file:/apps/data/ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa.gz

I am not sure where the "UR=file:/apps/data/ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa.gz" comes from, as when I look in the CRAM header it shows "UR:/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa" for the contigs.

I tried resorting using the reference from last post here: http://seqanswers.com/forums/archive/index.php/t-14635.html with

 java -jar picard.jar ReorderSam INPUT=SRR1378155_SRR1378155.cram  OUTPUT=SRR1378155_SRR1378155.sorted.cram REFERENCE=/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa`

And tried it suing the .gzipped reference file as well, but for both I got the same sequence don't match error.

When I use ValidateSAM on the CRAM file, I get "Exception in thread "main" htsjdk.samtools.cram.CRAMException: Contig chr1 not found in the reference file.", from answer here https://gatkforums.broadinstitute.org/gatk/discussion/7944/can-validatesamfile-check-on-cram-files I converted my CRAM to BAM file and then ran validateSAM, whic gave me:

HISTOGRAM java.lang.String

Error Type Count
ERROR:MISSING_READ_GROUP 1
WARNING:RECORD_MISSING_READ_GROUP 124248788

I then added the read group to the BAM file and after that there are no errors found in the BAM file, but when I try to use CollectMultipleMetrics I again get the "Sequences at index 0 don't match" error.

First couple of lines with CRAM header, shows UR:/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa:

@HD VN:1.5 SO:coordinate
@PG ID:STAR PN:STAR VN:STAR_2.5.1b CL:STAR --runThreadN 8 --genomeDir /apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/STAR/2.5.1b-foss-2015b/ --genomeLoad NoSharedMemory --readFilesIn /groups/umcg-biogen/tmp04/biogen/input/GTEx/sra/fastq/cart_prj8709_201809140901/SRR1378155_1.fastq.gz /groups/umcg-biogen/tmp04/biogen/input/GTEx/sra/fastq/cart_prj8709_201809140901/SRR1378155_2.fastq.gz --readFilesCommand zcat --outFileNamePrefix /local/5210636/SRR1378155_SRR1378155. --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMismatchNmax 6 --quantMode GeneCounts --twopassMode Basic
@PG ID:scramble PN:scramble PP:STAR VN:1.14.6 CL:scramble -I bam -O cram -r /apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa -t 2 /local/5211312/SRR1378155_SRR1378155.fixmates.bam /groups//umcg-biogen/tmp04/biogen/input/GTEx/pipelines/results///cramFiles/SRR1378155_SRR1378155.cram
@CO user command line: STAR --outFileNamePrefix /local/5210636/SRR1378155_SRR1378155. --readFilesIn /groups/umcg-biogen/tmp04/biogen/input/GTEx/sra/fastq/cart_prj8709_201809140901/SRR1378155_1.fastq.gz /groups/umcg-biogen/tmp04/biogen/input/GTEx/sra/fastq/cart_prj8709_201809140901/SRR1378155_2.fastq.gz --readFilesCommand zcat --genomeDir /apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/STAR/2.5.1b-foss-2015b/ --genomeLoad NoSharedMemory --runThreadN 8 --outFilterMultimapNmax 1 --outFilterMismatchNmax 6 --twopassMode Basic --quantMode GeneCounts --outSAMunmapped Within
@SQ SN:chr1 LN:248956422 M5:2648ae1bacce4ec4b6cf337dcae37816 UR:/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa

Best Answers

  • NiekdeKleinNiekdeKlein Groningen
    Accepted Answer

    I hadn't looked at the .dict file, apparently this was made using the gzipped reference genome fasta. I remade the .dict file using the unzipped reference genome fasta and now it works.

    Thanks for the help @shlee!

Answers

Sign In or Register to comment.