Reduce running time for tool RealignerTargetCreator

ywy25 Member
edited June 2014 in Ask the GATK team

Dear GATK team,

I'm running GATK RealignerTargetCreator and IndelRealigner on a very small region (~100 bp) that I trimmed from the original whole-exome BAM file.
For example, the region I need is chr1:1-150 (it contains only one realignment target).
I first used samtools to extract the BAM for this region.
Then I ran into a problem when running RealignerTargetCreator with only chr1 as the reference file (I do have chr01.dict). This is the command I used:

java -Xmx1g -jar ~/programs/GenomeAnalysisTK.jar -T RealignerTargetCreator -R chr01.fa -I Trim_test_sort.bam -o realigner.intervals

Here is the error message:
ERROR MESSAGE: Badly formed genome loc: Contig chr2 given as location, but this contig isn't present in the Fasta sequence dictionary

I found that this problem can be solved by using the whole genome as the reference (i.e. hg19.fa). However, the tool then takes a very long time to go through every chromosome (the ProgressMeter step), even though the BAM file only contains reads located in chr1:1-150.
I also tried to delete some @SQ lines from the trimmed BAM header, but it didn't work.
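
For illustration, a header edit of this kind might look like the sketch below. It assumes a tab-separated SAM header on standard input; the helper name keep_contig_header and the output file names are hypothetical, and this is not necessarily the exact procedure tried above.

```shell
# Sketch only: keep every non-@SQ header line, plus the @SQ line
# whose SN: tag matches the requested contig.
# keep_contig_header is a hypothetical helper name.
keep_contig_header() {
    contig="$1"
    awk -v c="$contig" -F'\t' '$1 != "@SQ" || $2 == ("SN:" c)'
}

# Possible usage with samtools (output file names are placeholders):
#   samtools view -H Trim_test_sort.bam | keep_contig_header chr1 > header.sam
#   samtools reheader header.sam Trim_test_sort.bam > Trim_test_reheader.bam
```

The helper only rewrites header text; the reads in the BAM file are left unchanged.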

Is there any way to make RealignerTargetCreator go through only chr1:1-150 (or just chr1) to save time?

Many thanks!!


