UnifiedGenotyper reads v. reference incompatibility

We are running UnifiedGenotyper to call variants in 5.5 Mb of targeted capture sequence, using the following command:

java \
-Xmx8g \
-jar /my.dir/GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar \
-T UnifiedGenotyper \
-R /my.dir/hg19/ucsc.hg19.fasta \
-I /my.dir/bam/solid5500_FC1_20120227_01_08NA35454_F3.csfasta.ma.bam \
-I /my.dir/bam/solid5500_FC1_20120227_02_08NA35454_F3.csfasta.ma.bam \
-I /my.dir/bam/solid5500_FC1_20120227_03_08NA35454_F3.csfasta.ma.bam \
-o /my.dir/test3bam.vcf \
--dbsnp /my.dir/hg19/dbsnp_137.hg19.vcf \
-glm BOTH \
-L /my.dir/039087_D_BED_20120215_mod1.bed \
-stand_call_conf 30.0 \
-stand_emit_conf 30.0 \
-dcov 400

We get the following error:

ERROR MESSAGE: Input files reads and reference have incompatible contigs: The following contigs included in the intervals to process have different indices in the sequence dictionaries for the reads vs. the reference: [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr14, chr15, chr17, chr19, chr20, chr22]. As a result, the GATK engine will not correctly process reads from these contigs. You should either fix the sequence dictionaries for your reads so that these contigs have the same indices as in the sequence dictionary for your reference, or exclude these contigs from your intervals. This error can be disabled via -U ALLOW_SEQ_DICT_INCOMPATIBILITY, however this is not recommended as the GATK engine will not behave correctly..
ERROR reads contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chr23, chr24, chr25]
ERROR reference contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]
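
To make sure we are reading the message correctly, here is a minimal sketch of the index comparison it appears to describe. It is not GATK's actual implementation; the function name `compare_contigs` is hypothetical, and the reference list is shortened to the primary contigs for brevity (an assumption, since the real dictionary continues with the random, hap and chrUn entries shown above).

```python
def compare_contigs(reads_contigs, reference_contigs):
    """Return (shifted, missing) for two ordered contig lists.

    shifted: contigs present in both lists but at different positions.
    missing: contigs in the reads list that the reference does not have.
    """
    ref_index = {name: i for i, name in enumerate(reference_contigs)}
    shifted = [
        name
        for i, name in enumerate(reads_contigs)
        if name in ref_index and ref_index[name] != i
    ]
    missing = [name for name in reads_contigs if name not in ref_index]
    return shifted, missing


# Contig order reported for our BAMs: chr1 .. chr25.
reads = [f"chr{i}" for i in range(1, 26)]

# Start of the bundle reference order (truncated for illustration):
# chrM comes first, so every later contig is offset by one.
reference = ["chrM"] + [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]

shifted, missing = compare_contigs(reads, reference)
print("different index:", shifted)
print("not in reference:", missing)
```

In this sketch every shared contig lands at a different index because chrM leads the reference list, and chr23, chr24 and chr25 have no counterpart in the reference at all. The error message itself only lists the shifted contigs that also appear in our intervals.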

Are we correct in concluding that the problem is that the contig names (for example, chr25 vs. chrM) and their order differ between our .bam files and the reference .fasta provided in GATK's hg19 bundle? If so, would a workable solution be to pass the .fasta reference we used to generate the .bam files (which is also based on hg19) as UnifiedGenotyper's "-R" argument, and to follow the workflow described here (http://gatkforums.broadinstitute.org/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference) to generate the .fai and .dict files that accompany it? And if we use our own reference .fasta, can we still pass the GATK bundle's unaltered "dbsnp_137.hg19.vcf" to the "--dbsnp" argument, or will that file need to be modified?
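
As a sanity check before swapping references, one option would be to compare the @SQ lines of a BAM header (as printed by `samtools view -H`) with those of the candidate reference's .dict file, since both use the SAM header layout. The sketch below assumes the header text has already been read into strings; `parse_sq_lines` is a hypothetical helper, and the two short headers are illustrative stand-ins rather than our real files.

```python
def parse_sq_lines(header_text):
    """Return [(name, length), ...] from the @SQ lines of a SAM-style header."""
    contigs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:value.
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        contigs.append((fields["SN"], int(fields["LN"])))
    return contigs


# Illustrative headers only (assumption): two contigs from a BAM,
# and a .dict that lists chrM ahead of them.
bam_header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@SQ\tSN:chr2\tLN:243199373\n"
)
dict_header = (
    "@HD\tVN:1.0\tSO:unsorted\n"
    "@SQ\tSN:chrM\tLN:16571\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@SQ\tSN:chr2\tLN:243199373\n"
)

bam_contigs = parse_sq_lines(bam_header)
ref_contigs = parse_sq_lines(dict_header)

# The dictionaries agree only if names, lengths and order are all identical.
dictionaries_match = bam_contigs == ref_contigs
print("dictionaries match:", dictionaries_match)
```

If the lists come out identical for our own .fasta, that would suggest the reads and the reference at least share a sequence dictionary; it says nothing about whether the dbSNP file is ordered the same way.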

Any advice would be much appreciated.

Best Answer


  • daniel_adkins, Member
    edited May 2013

    Ok. Many thanks for the very prompt answer. Just to be completely clear, the error is generated by mismatched chr names and ordering between the bams and the reference fasta, correct?
