Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Attention:
We will be out of the office on November 11th and 13th 2019, due to the U.S. holiday(Veteran's day) and due to a team event(Nov 13th). We will return to monitoring the GATK forum on November 12th and 14th respectively. Thank you for your patience.

Sequences at index 0 don't match, but using the same reference genome

NiekdeKleinNiekdeKlein GroningenMember

I am trying to run CollectMultipleMetrics on a CRAM file but I get an "Sequences at index 0 don't match, but using the same reference genome". What can I do to resolve this?

Below what I have tried so far:

I am trying to run CollectMultipleMetrics on a CRAM file with the following code:

java -jar picard.jar CollectMultipleMetrics I=SRR1378155_SRR1378155.cram O=SRR1378155_SRR1378155.multiplemetrics R=/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa

but I get the error

htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequences at index 0 don't match: 0/248956422/chr1/M5=2648ae1bacce4ec4b6cf337dcae37816/UR=/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa 0/248956422/chr1/M5=1f9b4bfb2d6193a45e52901b8aa4339e/UR=file:/apps/data/ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa.gz

I am not sure where the "UR=file:/apps/data/ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa.gz" comes from, as when I look in the CRAM header it shows "UR:/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa" for the contigs.

I tried resorting using the reference from last post here: http://seqanswers.com/forums/archive/index.php/t-14635.html with

 java -jar picard.jar ReorderSam INPUT=SRR1378155_SRR1378155.cram  OUTPUT=SRR1378155_SRR1378155.sorted.cram REFERENCE=/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa`

And tried it suing the .gzipped reference file as well, but for both I got the same sequence don't match error.

When I use ValidateSAM on the CRAM file, I get "Exception in thread "main" htsjdk.samtools.cram.CRAMException: Contig chr1 not found in the reference file.", from answer here https://gatkforums.broadinstitute.org/gatk/discussion/7944/can-validatesamfile-check-on-cram-files I converted my CRAM to BAM file and then ran validateSAM, whic gave me:

HISTOGRAM java.lang.String

Error Type Count
ERROR:MISSING_READ_GROUP 1
WARNING:RECORD_MISSING_READ_GROUP 124248788

I then added the read group to the BAM file and after that there are no errors found in the BAM file, but when I try to use CollectMultipleMetrics I again get the "Sequences at index 0 don't match" error.

First couple of lines with CRAM header, shows UR:/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa:

@HD VN:1.5 SO:coordinate
@PG ID:STAR PN:STAR VN:STAR_2.5.1b CL:STAR --runThreadN 8 --genomeDir /apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/STAR/2.5.1b-foss-2015b/ --genomeLoad NoSharedMemory --readFilesIn /groups/umcg-biogen/tmp04/biogen/input/GTEx/sra/fastq/cart_prj8709_201809140901/SRR1378155_1.fastq.gz /groups/umcg-biogen/tmp04/biogen/input/GTEx/sra/fastq/cart_prj8709_201809140901/SRR1378155_2.fastq.gz --readFilesCommand zcat --outFileNamePrefix /local/5210636/SRR1378155_SRR1378155. --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMismatchNmax 6 --quantMode GeneCounts --twopassMode Basic
@PG ID:scramble PN:scramble PP:STAR VN:1.14.6 CL:scramble -I bam -O cram -r /apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa -t 2 /local/5211312/SRR1378155_SRR1378155.fixmates.bam /groups//umcg-biogen/tmp04/biogen/input/GTEx/pipelines/results///cramFiles/SRR1378155_SRR1378155.cram
@CO user command line: STAR --outFileNamePrefix /local/5210636/SRR1378155_SRR1378155. --readFilesIn /groups/umcg-biogen/tmp04/biogen/input/GTEx/sra/fastq/cart_prj8709_201809140901/SRR1378155_1.fastq.gz /groups/umcg-biogen/tmp04/biogen/input/GTEx/sra/fastq/cart_prj8709_201809140901/SRR1378155_2.fastq.gz --readFilesCommand zcat --genomeDir /apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/STAR/2.5.1b-foss-2015b/ --genomeLoad NoSharedMemory --runThreadN 8 --outFilterMultimapNmax 1 --outFilterMismatchNmax 6 --twopassMode Basic --quantMode GeneCounts --outSAMunmapped Within
@SQ SN:chr1 LN:248956422 M5:2648ae1bacce4ec4b6cf337dcae37816 UR:/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa

Best Answers

  • NiekdeKleinNiekdeKlein Groningen
    Accepted Answer

    I hadn't looked at the .dict file, apparently this was made using the gzipped reference genome fasta. I remade the .dict file using the unzipped reference genome fasta and now it works.

    Thanks for the help @shlee!

Answers

Sign In or Register to comment.