Picard ReorderSam error: Invalid reference index -1

nstransky, Cambridge, MA (Member)

Hi there,
I get the following error when running Picard ReorderSam.

java -jar /mnt/informatics/usr/local/picard-tools/picard.jar ReorderSam INPUT=sample1.bam OUTPUT=sample1_dedup.bam REFERENCE=hg38.fa VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true

When it gets to chrM, which has no reads in my file, picard crashes:
...
INFO 2016-05-19 19:40:28 ReorderSam Processing chr8
INFO 2016-05-19 19:40:37 ReorderSam Wrote 2561012 reads
INFO 2016-05-19 19:40:37 ReorderSam Processing chr9
INFO 2016-05-19 19:40:50 ReorderSam Wrote 3504004 reads
INFO 2016-05-19 19:40:50 ReorderSam Processing chr9_KI270717v1_random
INFO 2016-05-19 19:40:50 ReorderSam Wrote 0 reads
INFO 2016-05-19 19:40:50 ReorderSam Processing chr9_KI270718v1_random
INFO 2016-05-19 19:40:50 ReorderSam Wrote 0 reads
INFO 2016-05-19 19:40:50 ReorderSam Processing chr9_KI270719v1_random
INFO 2016-05-19 19:40:50 ReorderSam Wrote 0 reads
INFO 2016-05-19 19:40:50 ReorderSam Processing chr9_KI270720v1_random
INFO 2016-05-19 19:40:50 ReorderSam Wrote 0 reads
[Thu May 19 19:40:50 EDT 2016] picard.sam.ReorderSam done. Elapsed time: 6.03 minutes.
Runtime.totalMemory()=1489502208
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: Invalid reference index -1
    at htsjdk.samtools.QueryInterval.<init>(QueryInterval.java:24)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:504)
    at picard.sam.ReorderSam.doWork(ReorderSam.java:124)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

The BAM and the reference file have the same sequence dictionary, and I can run Picard with ALLOW_INCOMPLETE_DICT_CONCORDANCE=false. I'm not sure why it would crash right before chrM, because there are a bunch of chromosomes in the BAM that have zero reads.

Any pointers? Any other info that could help?

Thanks!

Nico

Answers

  • nstransky, Cambridge, MA (Member)

    I should add that I'm using the latest Picard version:
    [Thu May 19 19:34:48 EDT 2016] Executing as nstransky@clust-node-03 on Linux 2.6.32-358.2.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_65-b17; Picard version: 2.3.0(9a00c87b7ffdb01cfb5a0d6e76556146196babb8_1463071327) IntelDeflater

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @nstransky
    Hi Nico,

    It seems like the reference index file you are using might be corrupt. Can you try deleting it and re-generating it?

    Thanks,
    Sheila

  • nstransky, Cambridge, MA (Member)

    Great suggestion, thanks Sheila! I assume you are talking about hg38.fa.fai, right?
    Thanks,
    Nico

  • nstransky, Cambridge, MA (Member)

    Hi Sheila,
    It would appear that the .fai index file was not corrupted because the error persists.
    However I think I pinpointed the problem:
    The bam file does not contain any of the alternate locus scaffolds (chr*_alt), whereas hg38.fa does. Therefore ReorderSam complains about the absence of those contigs in the bam file (perhaps an informative error message could help?).
    Is there a way to make ReorderSam not complain about the "missing" _alt contigs?
    Thanks,
    Nico
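
A mismatch like the one described above can be checked before running ReorderSam by comparing contig names in the BAM header against the reference index. The following is only a minimal sketch, not part of Picard: the function names are hypothetical, the header text is assumed to come from something like `samtools view -H sample1.bam`, the index text from `hg38.fa.fai`, and the sample inputs (including the contig lengths) are made up for illustration.

```python
def contigs_from_sam_header(header_text):
    """Return contig names from the @SQ lines of a SAM/BAM header, in order."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def contigs_from_fai(fai_text):
    """Return contig names from a FASTA index (.fai); the name is column 1."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def missing_from_bam(header_text, fai_text):
    """Return reference contigs that have no @SQ entry in the BAM header."""
    in_bam = set(contigs_from_sam_header(header_text))
    return [name for name in contigs_from_fai(fai_text) if name not in in_bam]


# Tiny made-up inputs for illustration only (not real hg38 lengths or offsets).
header = "@HD\tVN:1.5\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrM\tLN:200\n"
fai = (
    "chr1\t1000\t6\t60\t61\n"
    "chr1_KI270762v1_alt\t500\t1030\t60\t61\n"
    "chrM\t200\t1550\t60\t61\n"
)
print(missing_from_bam(header, fai))  # ['chr1_KI270762v1_alt']
```

A non-empty result here would correspond to the situation in this thread: the reference lists contigs that the BAM header does not.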

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @nstransky
    Hi Nico,

    I see. So, the reference you are using is not the same reference you mapped the reads in the BAM file to? Is it possible to use the same reference used in the mapping step?

    I'm also confused why the reference.dict file and the BAM index file have the same contigs if the BAM file is missing some of the contigs? Can you tell us what tools you used before running ReorderSam?

    Thanks,
    Sheila

  • nstransky, Cambridge, MA (Member)

    It was indeed the case:
    hg38.fa contains _alt contigs that were not present in the BAM file, causing ReorderSam to fail.
    If ALLOW_INCOMPLETE_DICT_CONCORDANCE=true allows BAM contigs that are missing from the reference, why not also allow reference contigs that are missing from the BAM? This would be especially useful downstream of tools that are not 'hg38 alternate locus reference'-aware.

  • nstransky, Cambridge, MA (Member)

    Hi Sheila,
    I used STAR to align the RNA-seq data. The STAR indexes were constructed using all individual chromosome and contig FASTA sequences from the reference genome, except the '_alt' contigs. I then wanted to process the STAR-aligned files further, and one step involves ReorderSam, which I ran using the whole-genome .fa reference; I didn't think the additional '_alt' contigs would be a problem. I have now circumvented the problem by creating a whole-genome .fa reference that excludes the '_alt' contigs, and ReorderSam worked.
    Nico
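
The workaround described above (a copy of the reference without the '_alt' contigs) could be produced along these lines. This is a sketch under assumptions, not the method actually used in this thread: `drop_alt_contigs` is a hypothetical helper, it treats any record whose name ends in `_alt` as an alternate contig, and the demo sequences are made up.

```python
import io


def drop_alt_contigs(fasta_in, fasta_out, suffix="_alt"):
    """Copy FASTA records whose name does not end in `suffix`.

    `fasta_in` and `fasta_out` are text file objects. Returns the kept names.
    """
    kept = []
    keep = False  # lines before the first header are skipped
    for line in fasta_in:
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            keep = not name.endswith(suffix)
            if keep:
                kept.append(name)
        if keep:
            fasta_out.write(line)
    return kept


# Made-up miniature FASTA for illustration only.
demo_in = io.StringIO(">chr1\nACGT\nACGT\n>chr1_KI270762v1_alt\nTTTT\n>chrM\nGGCC\n")
demo_out = io.StringIO()
print(drop_alt_contigs(demo_in, demo_out))  # ['chr1', 'chrM']
print(demo_out.getvalue(), end="")
```

With real files, the same function can be called on handles from `open("hg38.fa")` and `open("hg38.no_alt.fa", "w")` (the output name is an assumption). The filtered FASTA would then need its own .fai index and .dict sequence dictionary before being passed as REFERENCE.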

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @nstransky
    Hi again Nico,

    I have a few other questions/suggestions.

    1) Why are you using ReorderSam?
    2) Can you try running ValidateSamFile on your BAM file?

    Thanks,
    Sheila

  • nstransky, Cambridge, MA (Member)

    1) Why are you using ReorderSam?

    Long story short, I want to remove some (mouse) contigs from the BAM header and make sure the contigs are in the same order as in the whole-genome FASTA file, for compatibility with downstream GATK steps.

    2) Can you try running ValidateSamFile on your BAM file?

    Error Type Count
    WARNING:MISSING_TAG_NM 99952562

    This is a warning that I'm dealing with further in the pipeline.

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @nstransky
    Hi Nico,

    I see. The major problem is that you should use the entire reference genome for mapping. Mapping to individual chromosomes is never a good idea. The best thing to do is go back and re-map your reads against the full reference, so you won't have to do any extra pre-processing steps (such as ReorderSam) that can cause errors.

    -Sheila

  • nstransky, Cambridge, MA (Member)

    I am mapping to all chromosomes, not individual chromosomes. I'm just excluding alternative haplotypes as per STAR aligner recommendations:
    "Generally, patches and alternative haplotypes should not be included in the genome."
    Effectively, this is what I am doing now, by removing the _alt contigs from the whole fasta reference.
    It is strange that ReorderSam's behavior goes one way and not the other (it allows additional contigs in the BAM but not in the reference) and that it does not give a decipherable error message.

    Issue · Github
    by Sheila

    Issue Number: 925
    State: closed
    Closed By: vdauwera