Where to obtain HumanNCBI37_UCSC reference sequence?
I have a set of BAM files which have been aligned using this file:
I have a reference file called hg19_UCSC.fa, and HaplotypeCaller runs fine with it for most chromosomes. However, for chromosomes 19, 21 and 22, HaplotypeCaller gives this error message:
WARN 08:38:22,963 SequenceDictionaryUtils - Input files reads and reference have incompatible contigs: The following contigs included in the intervals to process have different indices in the sequence dictionaries for the reads vs. the reference: [chr22]. As a result, the GATK engine will not correctly process reads from these contigs. You should either fix the sequence dictionaries for your reads so that these contigs have the same indices as in the sequence dictionary for your reference, or exclude these contigs from your intervals. This error can be disabled via -U ALLOW_SEQ_DICT_INCOMPATIBILITY, however this is not recommended as the GATK engine will not behave correctly..
In fact, even if I set ALLOW_SEQ_DICT_INCOMPATIBILITY, I still get the error and I don't get any calls for these chromosomes.
It seems that the sequence dictionaries of the BAM files and the reference are incompatible in some way that I have not been able to fix.
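To see exactly which contigs sit at different indices, one option is to compare the @SQ lines of the two dictionaries directly. This is a minimal sketch, assuming the BAM header has been dumped to text (for example with `samtools view -H`) and that the reference has a Picard-style .dict file next to it. Both formats store one tab-separated @SQ line per contig with an SN: (name) field. The function names and the toy headers are hypothetical illustrations, not part of GATK.

```python
from typing import Dict, List


def parse_sq_lines(header_text: str) -> List[str]:
    """Return contig names in the order their @SQ lines appear.

    Works on a SAM header or a Picard-style .dict file, since both use
    tab-separated @SQ lines with an SN:<name> field.
    """
    names: List[str] = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
                break
    return names


def contigs_with_different_indices(reads_header: str, ref_dict: str) -> List[str]:
    """List contigs present in both dictionaries but at different positions."""
    reads_idx: Dict[str, int] = {n: i for i, n in enumerate(parse_sq_lines(reads_header))}
    ref_idx: Dict[str, int] = {n: i for i, n in enumerate(parse_sq_lines(ref_dict))}
    # Iterate in reference order so the report follows the reference dictionary.
    return [n for n in ref_idx if n in reads_idx and reads_idx[n] != ref_idx[n]]


if __name__ == "__main__":
    # Toy headers (hypothetical): chr21 and chr22 are swapped in the reads.
    reads = ("@HD\tVN:1.4\n@SQ\tSN:chr20\tLN:63025520\n"
             "@SQ\tSN:chr22\tLN:51304566\n@SQ\tSN:chr21\tLN:48129895\n")
    ref = ("@HD\tVN:1.4\n@SQ\tSN:chr20\tLN:63025520\n"
           "@SQ\tSN:chr21\tLN:48129895\n@SQ\tSN:chr22\tLN:51304566\n")
    print(contigs_with_different_indices(reads, ref))  # ['chr21', 'chr22']
```

If this only reports a few contigs with shifted positions, the difference may be one of ordering rather than content.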
Searching on Google, I have seen other people refer to the HumanNCBI37_UCSC reference sequence, so I assume it is a standard hg19 reference, but presumably with a slightly different sequence dictionary from hg19_UCSC.fa.
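Besides ordering, two dictionaries can differ in which contigs they contain or in the lengths of shared contigs. That kind of difference could indicate the BAMs were aligned to a different reference build than hg19_UCSC.fa rather than just a reordered copy of it. This is a minimal sketch of that check, assuming the BAM header has been exported as text (for example with `samtools view -H`) and the reference has a Picard-style .dict file. Both use @SQ lines with SN: (name) and LN: (length) fields. The function names and toy values are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_sq_lengths(header_text: str) -> Dict[str, int]:
    """Map contig name -> length using the SN: and LN: fields of @SQ lines."""
    lengths: Dict[str, int] = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Split each TAG:value field on the first colon only, so values such
        # as UR:file:/path/to/ref.fa stay intact.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def dictionary_differences(
    reads_header: str, ref_dict: str
) -> Tuple[List[str], List[str], List[str]]:
    """Return (contigs only in reads, contigs only in reference, length mismatches)."""
    reads = parse_sq_lengths(reads_header)
    ref = parse_sq_lengths(ref_dict)
    only_reads = sorted(set(reads) - set(ref))
    only_ref = sorted(set(ref) - set(reads))
    length_mismatch = sorted(n for n in set(reads) & set(ref) if reads[n] != ref[n])
    return only_reads, only_ref, length_mismatch


if __name__ == "__main__":
    # Toy headers (hypothetical values chosen only for illustration).
    reads = "@SQ\tSN:chr1\tLN:100\n@SQ\tSN:chrM\tLN:16571\n"
    ref = "@SQ\tSN:chr1\tLN:100\n@SQ\tSN:chrM\tLN:16569\n@SQ\tSN:chrX\tLN:50\n"
    print(dictionary_differences(reads, ref))  # ([], ['chrX'], ['chrM'])
```

Empty results from all three lists would suggest the two dictionaries describe the same sequences and differ only in order.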
Please tell me where I can download this reference file from. I've looked everywhere (including both NCBI and UCSC) and can't find it. Thanks for your help.
- Dave Curtis