MESSAGE: BUG: requested unknown contig=ERCC-00002 index=-1

Hi,

I'm currently running variant calling on RNA-seq data from the ENCODE Project. To streamline the process, I downloaded ENCODE's previously aligned RNA-seq data (aligned with STAR). My plan was to add read groups, sort and mark duplicates, reassign mapping qualities, and run base recalibration before variant calling. However, at the Split'N'Trim step, where I also reassign mapping qualities, I hit the following error:

MESSAGE: BUG: requested unknown contig=ERCC-00002 index=-1

I saw a previous thread in which someone had the same issue, and the recommendation there was to use -fixNDN. Could anybody explain what causes this error, and whether ENCODE's previously aligned data is suitable for use with the Best Practices workflow?

By the way, for adding read groups and marking duplicates I used the basic parameters outlined in this article: https://software.broadinstitute.org/gatk/guide/article?id=3891, editing the read group information to match my data.

For the Split'N'Trim step that caused the error, I used:

java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R hg38.fasta -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS
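For readers unfamiliar with what this command does, here is a simplified Python sketch of the two transformations it combines: `-rf ReassignOneMappingQuality -RMQF 255 -RMQT 60` rewrites a mapping quality of 255 (the value STAR gives uniquely mapped reads) to 60, and SplitNCigarReads breaks each spliced read into one piece per exon segment at every `N` operator in its CIGAR. The function names `parse_cigar`, `reassign_mapq` and `split_n_cigar` are hypothetical, and this is not GATK's implementation: it ignores overhang clipping, the clipping of the other segments in each piece, and the `-fixNDN` handling.

```python
import re
from typing import List, Tuple

# CIGAR operations that consume reference bases (per the SAM spec): M, D, N, =, X
REF_CONSUMING = set("MDN=X")

def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '10M200N15M' into (length, op) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

def reassign_mapq(mapq: int, from_q: int = 255, to_q: int = 60) -> int:
    """Illustrates the idea behind -RMQF 255 -RMQT 60: 255 becomes 60, others unchanged."""
    return to_q if mapq == from_q else mapq

def split_n_cigar(pos: int, cigar: str) -> List[Tuple[int, str]]:
    """Break one spliced alignment into (start, cigar) pieces at each N operator.

    Simplified sketch: soft/hard clips stay attached to the neighbouring piece,
    and no overhang clipping is done.
    """
    pieces: List[Tuple[int, str]] = []
    start = pos      # 1-based reference start of the piece being built
    ref_pos = pos    # reference position after the ops consumed so far
    current: List[Tuple[int, str]] = []
    for length, op in parse_cigar(cigar):
        if op == "N":
            # An intron gap closes the current piece; the next piece starts after it.
            if current:
                pieces.append((start, "".join(f"{n}{o}" for n, o in current)))
            current = []
            ref_pos += length
            start = ref_pos
        else:
            current.append((length, op))
            if op in REF_CONSUMING:
                ref_pos += length
    if current:
        pieces.append((start, "".join(f"{n}{o}" for n, o in current)))
    return pieces
```

For example, `split_n_cigar(1000, "10M200N15M")` yields `(1000, "10M")` and `(1210, "15M")`: the 10 aligned bases and the 200-base `N` gap are both skipped on the reference before the second segment begins.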

Answers

  • toledo32325 (Toledo, Member)
    edited January 2017

    Sorry to double-post, but an update: adding -fixNDN as recommended did not help, and the same error resulted: ERROR MESSAGE: BUG: requested unknown contig=ERCC-00002 index=-1. The error comes up very close to the end of the run (only about 2 minutes remained when it appeared).

    I should also mention that I'm using the same reference genome that ENCODE used for alignment (GRCh38), and that I created its sequence dictionary with CreateSequenceDictionary and its index with Samtools.

    Is there something I'm missing here?
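The `index=-1` in the message reads like a failed name-to-index lookup: a contig named in the reads (`ERCC-00002`, which looks like an ERCC spike-in control) apparently has no entry in the reference's sequence dictionary. The Python sketch below illustrates that assumption only; it is not GATK code, and `read_dict_contigs` and `contig_index` are hypothetical names. It maps each `@SQ` name in a `.dict` file to its position and returns -1 for an unknown name. If the spike-in contigs are listed after the human chromosomes in the BAM header, a coordinate-sorted file would reach those reads last, which would be consistent with the failure appearing near the end of the run.

```python
from typing import Dict

def read_dict_contigs(dict_text: str) -> Dict[str, int]:
    """Map each @SQ SN: name in .dict text to its 0-based position in the dictionary."""
    index: Dict[str, int] = {}
    for line in dict_text.splitlines():
        if not line.startswith("@SQ\t"):
            continue  # skip @HD and any other header lines
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                # setdefault keeps the first position if a name were ever repeated
                index.setdefault(field[3:], len(index))
                break
    return index

def contig_index(index: Dict[str, int], name: str) -> int:
    """Return the contig's dictionary position, or -1 if the reference lacks it."""
    return index.get(name, -1)

if __name__ == "__main__":
    # Toy dictionary text; lengths are placeholders, not real GRCh38 values.
    demo = "@HD\tVN:1.5\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:2000\n"
    contigs = read_dict_contigs(demo)
    name = "ERCC-00002"
    print(f"contig={name} index={contig_index(contigs, name)}")  # contig=ERCC-00002 index=-1
```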

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @toledo32325
    Hi,

    Can you please post the BAM header (specifically the @SQ lines) and the FASTA's .dict file?

    Thanks,
    Sheila
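One way to do the comparison requested above is sketched below in Python. It assumes the BAM header has been dumped to text (for example with `samtools view -H dedupped.bam > bam_header.sam`) and that the dictionary is `hg38.dict`; both file names, the script name `compare_sq.py`, and the helpers `sq_lines` and `compare_headers` are hypothetical. The sketch collects `SN`/`LN` pairs from `@SQ` lines and reports contigs present in the BAM but absent from the reference, plus any contigs whose lengths disagree.

```python
import sys
from typing import Dict, List, Tuple

def sq_lines(header_text: str) -> Dict[str, int]:
    """Collect {contig name: length} from the @SQ lines of a SAM header or .dict file."""
    contigs: Dict[str, int] = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ\t"):
            continue
        # Split each TAG:value field at the first colon only (names may contain colons).
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs[tags["SN"]] = int(tags["LN"])
    return contigs

def compare_headers(bam_header: str, ref_dict: str) -> Tuple[List[str], List[str]]:
    """Return (contigs missing from the reference, contigs whose lengths differ)."""
    bam, ref = sq_lines(bam_header), sq_lines(ref_dict)
    missing = [name for name in bam if name not in ref]
    length_mismatch = [name for name in bam if name in ref and bam[name] != ref[name]]
    return missing, length_mismatch

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder names): python compare_sq.py bam_header.sam hg38.dict
    with open(sys.argv[1]) as bam_file, open(sys.argv[2]) as dict_file:
        missing, mismatched = compare_headers(bam_file.read(), dict_file.read())
    print("In BAM but not in reference:", ", ".join(missing) or "none")
    print("Length mismatches:", ", ".join(mismatched) or "none")
```

A non-empty first list (for instance one containing `ERCC-00002`) would indicate that the reads were aligned against a reference containing contigs that `hg38.fasta` lacks.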
