Indelrealigner doesn't seem to create the list/intervals file.

mmats010mmats010 Riverside CAMember

Hello,
I am trying to realign indels, but the first step on indel realignment seems to skip over creating the interval file.

My bam files are created using:

bwa mem -M -t 16 -c 3 -R "@RG\tID:group1\tSM:sample1\tPL:illumina\tLB:lib1\tPU:unit1" $RefSeq M1_pair1.fastq.gz M1_pair2.fastq.gz | samtools view -Sbh - > M1test7.bam;

samtools sort M1test7.bam M1test7.sorted;
samtools index M1test7.sorted.bam;

Then, I try to run the realignment:
java -Xmx4g -jar $GATK -T RealignerTargetCreator -R $RefSeq -o realignment_targetsM1test7.interval -I M1test7.sorted.bam;
java -Xmx4g -Djava.io.tmpdir=./tmp -jar $GATK -T IndelRealigner -R $RefSeq -I M1test7.sorted.bam -targetIntervals realignment_targetsM1test7.interval -o M1test7realigned.bam;

However, I get several errors on step one (creating the interval files)
INFO 12:49:08,323 HelpFormatter - Program Args: -T RealignerTargetCreator -R /bigdata/judelsonlab/mmatson/projects/refSeqs/ensemblEditedCNV/PinfestansEnsemblSorted.fa -o realignment_targetsM1test7.interval -I M1test7.sorted.bam
INFO 12:49:08,328 HelpFormatter - Executing as mmatson@c22 on Linux 3.10.0-229.1.2.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_17-b02.
INFO 12:49:08,329 HelpFormatter - Date/Time: 2016/02/16 12:49:08
INFO 12:49:08,330 HelpFormatter - ---------------------------------------------------------------------------------
INFO 12:49:08,331 HelpFormatter - ---------------------------------------------------------------------------------
INFO 12:49:09,103 GenomeAnalysisEngine - Strictness is SILENT
INFO 12:49:10,130 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 12:49:10,145 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 12:49:10,782 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.64
INFO 12:49:11,125 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 12:49:11,489 GenomeAnalysisEngine - Done preparing for traversal
INFO 12:49:11,492 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 12:49:11,494 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 12:49:11,496 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 12:49:13,199 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:330)
at htsjdk.samtools.reference.IndexedFastaSequenceFile.getSubsequenceAt(IndexedFastaSequenceFile.java:195)
at org.broadinstitute.gatk.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:329)
at org.broadinstitute.gatk.engine.datasources.providers.LocusReferenceView.initializeReferenceSequence(LocusReferenceView.java:150)
at org.broadinstitute.gatk.engine.datasources.providers.LocusReferenceView.(LocusReferenceView.java:126)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:90)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.4-46-gbc02625):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Code exception (see stack trace for error itself)

The last error before the file exits is

ERROR MESSAGE: Couldn't read file /bigdata/judelsonlab/mmatson/projects/fastqFiles/bsaProgeny/realignment_targetsM1test7.interval because The interval file does not exist.
ERROR ------------------------------------------------------------------------------------------

Error: Unable to access jarfile FixMateInformation

So, I can't seem to determine why my interval file is not being created. I have tried deleting my reference sequence and re indexing and creating a new dict file as well. Any help would be appreciated.

Mike

Tagged:

Best Answer

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    How are you running this? In a script? It's not possible to get the three separate error messages you posted from running a single RealignerTargetCreator command.

  • mmats010mmats010 Riverside CAMember

    Yes, it is in a shell script over a cluster.
    I would also just ignore the realignment step since if i wish to call SNPs, I understand HaplotypeCaller will do a local realignment anyway, but I worry that the same error affecting RealignerTargetCreator will also affect other tools in GaTK.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Even if you're just interested in SNPs, it is useful to do realignment.

    I think your script is not working properly. After the first step (RTC) fails, it should exit with an error. Instead it tries to run subsequent steps but the files it needs are not available since they weren't produced due to the earlier failure. I would recommend fixing that otherwise you will have a hard time troubleshooting errors.

    Meanwhile the problem that is making the first step fail is that there's something wrong with your reference file or its index and sequence dictionary files. You should check those (possibly regenerate the index and sequence dictionary as described in the documentation).

  • mmats010mmats010 Riverside CAMember

    Hi Geraldine,
    I tried re-uploading the genome.fa file to its location on our cluster, as well as re-indexing and creating a new dictionary as you suggested. However, I still get the same error. This particular BAM with this particular reference sequence is able to be viewed on IGV, so I'm fairly sure nothing is very wrong with them.

    As for the script, it's a pretty basic script, so no quality control lines which are looking for if a step was successful, and killing the script if not.

    Thanks, Mike

  • mmats010mmats010 Riverside CAMember

    Unfortunately, we dont have 3.5 on our cluster just yet. I'll get back to you once I get access and re-try the script!

  • mmats010mmats010 Riverside CAMember

    Yup, it turns out that was the issue. We upgraded to 3.5 and the problem was immediately solved with no tweaks to my script.
    Thanks!

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Great, that's good to hear. I would still encourage you to add some gate-checking to your script for future convenience, but if you don't, keep in mind that only the first error will be relevant if anything goes wrong.

Sign In or Register to comment.