Config issue with RealignerTargetCreator

I'm running RealignerTargetCreator (using Queue for submission) and the following error comes up:

ERROR contig known = chr9 / 141213430
ERROR contig reference = chr9 / 141213431.
ERROR known contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrMT, GU071091]
ERROR reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrMT, GU071091]

Alignment was done with the same reference file that is being used by RealignerTargetCreator; absolutely no changes were made to it.
The following is the sequence dictionary that Picard produced from the reference file for GATK.

@HD VN:1.4 SO:unsorted
@SQ SN:chr1 LN:249250621 M5:1b22b98cdeb4a9304cb5d48026a85128 UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr2 LN:243199373 M5:a0d9851da00400dec1098a9255ac712e UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr3 LN:198022430 M5:fdfd811849cc2fadebc929bb925902e5 UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr4 LN:191154276 M5:23dccd106897542ad87d2765d28a19a1 UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr5 LN:180915260 M5:0740173db9ffd264d728f32784845cd7 UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr6 LN:171115067 M5:1d3a93a248d92a729ee764823acbbc6b UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr7 LN:159138663 M5:618366e953d6aaad97dbe4777c29375e UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr8 LN:146364022 M5:96f514a9929e410c6651697bded59aec UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr9 LN:141213431 M5:3e273117f15e0a400f01055d9f393768 UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr10 LN:135534747 M5:988c28e000e84c26d552359af1ea2e1d UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr11 LN:135006516 M5:98c59049a2df285c76ffb1c6db8f8b96 UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr12 LN:133851895 M5:51851ac0e1a115847ad36449b0015864 UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr13 LN:115169878 M5:283f8d7892baa81b510a015719ca7b0b UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr14 LN:107349540 M5:98f3cae32b2a2e9524bc19813927542e UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr15 LN:102531392 M5:e5645a794a8238215b2cd77acb95a078 UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr16 LN:90354753 M5:fc9b1a7b42b97a864f56b348b06095e6 UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr17 LN:81195210 M5:351f64d4f4f9ddd45b35336ad97aa6de UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr18 LN:78077248 M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr19 LN:59128983 M5:1aacd71f30db8e561810913e0b72636d UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr20 LN:63025520 M5:0dec9660ec1efaaf33281c0d5ea2560f UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr21 LN:48129895 M5:2979a6085bfe28e3ad6f552f361ed74d UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chr22 LN:51304566 M5:a718acaa6135fdca8357d5bfe94211dd UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chrX LN:155270560 M5:7e0e2e580297b7764e31dbc80c2540dd UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chrY LN:59373566 M5:1e86411d73e6f00a10590f976be01623 UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:chrMT LN:16569 M5:c68f52674c9fb33aef52dcf399755519 UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa
@SQ SN:GU071091 LN:39778 M5:111ef84ed674ff287723313d143fca86 UR:file:/home/tniranj1/the_gemstone_project/reference_genome/hg19_T7.fa

Below is the header of the BAM file that RealignerTargetCreator is supposed to work on. The header has not been modified in any way.

@HD VN:1.4 SO:coordinate
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@SQ SN:chrMT LN:16569
@SQ SN:GU071091 LN:39778
@RG ID:1 PU:none LB:shotgun SM:tumor PL:illumina
@PG ID:bowtie2 VN:2.2.5 CL:"/cm/shared/jhmi/apps/bowtie2/gcc/64/2.2.5/bowtie2-align-s --wrapper basic-0 --very-sensitive-local -p 4 -x /home/tniranj1/the_gemstone_project/reference_genome/bowtie2_index/hg19_T7 -1 /tmp/19802.inpipe1 -2 /tmp/19802.inpipe2" PN:bowtie2
@PG ID:MarkDuplicates VN:1.119(d44cdb51745f5e8075c826430a39d8a61f1dd832_1408991805) CL:picard.sam.MarkDuplicates INPUT=[/home/tniranj1/the_gemstone_project/results/tumor_normal_pair_analysis/tumor_sample/shotgun_analysis/bowtie2/bowtie2_sorted.bam] OUTPUT=/home/tniranj1/the_gemstone_project/results/tumor_normal_pair_analysis/tumor_sample/shotgun_analysis/bowtie2/bowtie2_markdups.bam METRICS_FILE=/home/tniranj1/the_gemstone_project/results/tumor_normal_pair_analysis/tumor_sample/shotgun_analysis/bowtie2/stdouterr/bowtie2_markduplicatemetrics.txt REMOVE_DUPLICATES=false ASSUME_SORTED=true PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false PN:MarkDuplicates

The chromosome lengths of chr9 are identical in the reference file, the dictionary, and in the bam file (based on the header). Why is RealignerTargetCreator saying they have different lengths?
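The comparison described above can be scripted rather than done by eye. This is a minimal Python sketch, assuming the BAM header has been dumped to text with `samtools view -H` and the dictionary is the Picard `.dict` file; `parse_sq_lines` and `compare_contigs` are hypothetical helper names, not part of GATK or Picard.

```python
from typing import Dict, List, Optional, Tuple


def parse_sq_lines(header_text: str) -> Dict[str, int]:
    """Map contig name -> length from the @SQ lines of a .dict file or SAM header."""
    contigs: Dict[str, int] = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Fields are TAG:value; split on the first colon only so that
        # values such as UR:file:/path/ref.fa stay intact.
        fields = dict(f.split(":", 1) for f in line.split()[1:] if ":" in f)
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def compare_contigs(
    a: Dict[str, int], b: Dict[str, int]
) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """List (name, length_in_a, length_in_b) for contigs missing from one
    side or whose lengths differ; None marks a missing contig."""
    names = list(a) + [n for n in b if n not in a]
    return [(n, a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)]


if __name__ == "__main__":
    dict_text = "@HD\tVN:1.4\tSO:unsorted\n@SQ\tSN:chr9\tLN:141213431\tUR:file:/ref/hg19_T7.fa\n"
    bam_text = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr9\tLN:141213431\n"
    print(compare_contigs(parse_sq_lines(dict_text), parse_sq_lines(bam_text)))  # prints []
```

An empty list means every contig has the same name and length on both sides, which is what the headers posted here show.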

Thanks,
Tejas

Answers

  • tniranj1 Member
    edited August 2015

    I should also mention that I completely restarted my pipeline, thinking it might be an alignment issue. However, redoing the alignment, sorting, duplicate marking, etc. has not fixed the problem.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    That's bizarre. Try deleting and regenerating all index and dict files, in case something got corrupted.
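For the reference files, that suggestion could be scripted roughly as follows. This is a sketch under stated assumptions: `samtools` and a Picard jar are available, `regenerate_reference_files` is a hypothetical helper, and the Picard arguments use the legacy `KEY=value` style of Picard 1.x (adjust to your Picard version). The stale `.dict` is removed first because CreateSequenceDictionary will not overwrite an existing dictionary. BAM indexes are not covered here.

```python
from pathlib import Path
from typing import List


def regenerate_reference_files(
    fasta: str, picard_jar: str = "picard.jar", delete: bool = True
) -> List[List[str]]:
    """Delete the .fai and .dict next to a FASTA and return the commands that rebuild them."""
    fa = Path(fasta)
    fai = Path(f"{fa}.fai")             # samtools faidx writes ref.fa.fai
    seq_dict = fa.with_suffix(".dict")  # GATK looks for ref.dict next to ref.fa
    if delete:
        for stale in (fai, seq_dict):
            if stale.exists():
                stale.unlink()
    return [
        ["samtools", "faidx", str(fa)],
        # Legacy KEY=value syntax (assumed Picard 1.x style)
        ["java", "-jar", picard_jar, "CreateSequenceDictionary", f"R={fa}", f"O={seq_dict}"],
    ]
```

Each returned command can then be passed to `subprocess.run(cmd, check=True)`.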

  • tniranj1 Member

    Thanks. I've just done that, and the index and dict files are identical to the ones that were generated previously.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Oh wait, the error is about the known variants resource. Have you checked the dictionary of the VCF file you're using as known variants?
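A rough way to run that check is to read the `##contig` header lines of the known-variants VCF, plus the highest record position per contig, and compare them with the reference lengths (for example, a name-to-length mapping built from the `@SQ` lines of the `.dict`). The sketch below assumes an uncompressed VCF; `check_vcf_contigs` is a hypothetical helper, and its comma-splitting of `##contig` attributes is deliberately naive.

```python
import re
from typing import Dict, List, Optional

CONTIG_RE = re.compile(r"^##contig=<(.*)>$")


def check_vcf_contigs(vcf_lines: List[str], ref_lengths: Dict[str, int]) -> List[str]:
    """Report contig problems in a VCF relative to reference contig lengths."""
    header: Dict[str, Optional[int]] = {}
    max_pos: Dict[str, int] = {}
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if line.startswith("##"):
            m = CONTIG_RE.match(line)
            if m:
                # Naive split: fine for ID/length, but breaks on quoted
                # attribute values that themselves contain commas.
                attrs = dict(kv.split("=", 1) for kv in m.group(1).split(",") if "=" in kv)
                length = attrs.get("length")
                header[attrs["ID"]] = int(length) if length else None
        elif line and not line.startswith("#"):
            # Data record: CHROM and POS are the first two tab-separated columns.
            chrom, pos = line.split("\t")[:2]
            max_pos[chrom] = max(max_pos.get(chrom, 0), int(pos))

    problems: List[str] = []
    if not header:
        problems.append("no ##contig lines in header")
    for name, length in header.items():
        if name not in ref_lengths:
            problems.append(f"{name}: declared in VCF but absent from reference")
        elif length is not None and length != ref_lengths[name]:
            problems.append(f"{name}: VCF length {length} != reference {ref_lengths[name]}")
    for name, pos in max_pos.items():
        if name not in ref_lengths:
            problems.append(f"{name}: has records but is absent from reference")
        elif pos > ref_lengths[name]:
            problems.append(f"{name}: record at {pos} beyond reference length {ref_lengths[name]}")
    return problems
```

For a gzipped dbSNP file, read the lines with `gzip.open(path, "rt")` instead of `open(path)`.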

  • tniranj1 Member
    edited August 2015

    It looks like the problem was indeed with the VCF index.

    The original index was generated by GATK, because one did not exist.

    I deleted it and re-ran RealignerTargetCreator. GATK produced a new index that was identical to the one it had generated previously.
    The same error was produced:

    ERROR contig known = chr9 / 141213430
    ERROR contig reference = chr9 / 141213431.

    I then deleted the vcf index, and built a new one using igvtools index.

    I then ran RealignerTargetCreator, and it seems to work well.
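The delete-and-reindex step can be sketched in Python as follows, assuming `igvtools` is on the PATH and the VCF is uncompressed. `rebuild_vcf_index` is a hypothetical helper: it always removes the stale `<vcf>.idx`, and by default only returns the command rather than running it.

```python
import subprocess
from pathlib import Path
from typing import List


def rebuild_vcf_index(vcf: str, run: bool = False) -> List[str]:
    """Remove a stale <vcf>.idx and return (optionally run) the igvtools index command."""
    stale = Path(f"{vcf}.idx")
    if stale.exists():
        stale.unlink()  # otherwise GATK may keep reusing the old index
    cmd = ["igvtools", "index", vcf]
    if run:
        subprocess.run(cmd, check=True)  # requires igvtools on PATH
    return cmd
```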

    There may be an issue with VCF index building in GATK, though it is most likely a problem with the dbSNP file (build 144, downloaded from NCBI).

    Thanks for your help!
