Where to obtain HumanNCBI37_UCSC reference sequence?

DaveCurtis, London, UK (Member)

Hi.

I have a set of BAM files which have been aligned using this file:
samtoolsRefFile=/illumina/scratch/services/Genomes/FASTA_UCSC/HumanNCBI37_UCSC/HumanNCBI37_UCSC_XX.fa

I have a reference file called hg19_UCSC.fa, and for most chromosomes HaplotypeCaller runs fine with this reference sequence. However, for chromosomes 19, 21 and 22, HaplotypeCaller gives this error message:
WARN 08:38:22,963 SequenceDictionaryUtils - Input files reads and reference have incompatible contigs: The following contigs included in the intervals to process have different indices in the sequence dictionaries for the reads vs. the reference: [chr22]. As a result, the GATK engine will not correctly process reads from these contigs. You should either fix the sequence dictionaries for your reads so that these contigs have the same indices as in the sequence dictionary for your reference, or exclude these contigs from your intervals. This error can be disabled via -U ALLOW_SEQ_DICT_INCOMPATIBILITY, however this is not recommended as the GATK engine will not behave correctly..

In fact, even if I set -U ALLOW_SEQ_DICT_INCOMPATIBILITY, I still get the error and I don't get any calls for these chromosomes.

It seems that there is some incompatibility between the sequence dictionaries of the BAM and reference files, which I have not been able to fix.

Using Google, I have seen other people refer to the HumanNCBI37_UCSC reference sequence, so I assume it is a standard hg19 reference, presumably with a slightly different sequence dictionary from my hg19_UCSC.fa.

Please tell me where I can download this reference file from. I've looked everywhere (including both NCBI and UCSC) and can't find it. Thanks for your help.

Regards

  • Dave Curtis


Answers

  • DaveCurtis, London, UK (Member)

    Thanks. I think I've worked it out. The BAM files I have were prepared with two different references, one with the Y chromosome and one without, and this shifted the indices of the chromosomes listed after it.
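    To illustrate why one extra contig causes this: the GATK warning compares each contig's index in the reads' sequence dictionary with its index in the reference's. A minimal Python sketch, using hypothetical toy contig lists (not the real hg19 dictionaries), shows that inserting a single contig shifts the index of every contig listed after it while leaving earlier ones untouched:

```python
from typing import Dict, List, Tuple


def shifted_contigs(reads: List[str], reference: List[str]) -> Dict[str, Tuple[int, int]]:
    """Return {contig: (index_in_reads, index_in_reference)} for contigs that
    appear in both dictionaries but at different 0-based positions."""
    ref_index = {name: i for i, name in enumerate(reference)}
    return {
        name: (i, ref_index[name])
        for i, name in enumerate(reads)
        if name in ref_index and ref_index[name] != i
    }


# Hypothetical toy dictionaries: the reads were aligned to a reference that
# contains chrY, while the reference used for calling omits it.
reads_dict = ["chr1", "chr2", "chrY", "chr3", "chr4"]
ref_dict = ["chr1", "chr2", "chr3", "chr4"]

print(shifted_contigs(reads_dict, ref_dict))
# {'chr3': (3, 2), 'chr4': (4, 3)}
```

    Only the contigs after the inserted chrY are reported, which matches the pattern of some chromosomes working and others failing.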

  • antonino, King's College London (Member)

    Hi Dave Curtis, nice to write to you. I had what looks like the same problem using GATK ASEReadCounter. This tool requires a reference (I am using b37/hg19 from UCSC), a BAM file and a VCF file. All of these files cover a single chromosome, since I am studying the X chromosome. I did not encounter any error, but my result is an empty file without any read counts reported per allele. I have been working hard to understand what happened, but I have not found any solution.

    Although I did not get an error, could this result (I mean no results, an empty output file) be caused by using the wrong reference?

  • DaveCurtis, London, UK (Member)

    Yes, I think you might just get empty output. Try checking the order of the chromosomes in your BAM file and in your reference. (I'm afraid I don't remember how to do this.)
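    One way to do this check, sketched under assumptions: `samtools view -H reads.bam` prints the BAM header, whose `@SQ` lines list contigs in dictionary order (a `.dict` file from Picard CreateSequenceDictionary or `samtools dict` uses the same `@SQ` format), and `samtools faidx ref.fa` writes a `ref.fa.fai` index whose first tab-separated column is the contig name in FASTA order. The function names below are hypothetical helpers, and the header and index text are toy values, not real data.

```python
from typing import List, Optional


def contigs_from_sam_header(text: str) -> List[str]:
    """Ordered contig names from @SQ lines (samtools view -H output or a .dict file)."""
    names = []
    for line in text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
                    break
    return names


def contigs_from_fai(text: str) -> List[str]:
    """Ordered contig names from a .fai index (first tab-separated column)."""
    return [line.split("\t")[0] for line in text.splitlines() if line.strip()]


def first_divergence(a: List[str], b: List[str]) -> Optional[int]:
    """0-based position where two contig orders first differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


# Toy inputs (hypothetical): a BAM header that includes chrY, and a .fai that does not.
bam_header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chrY\tLN:500\n"
    "@SQ\tSN:chr2\tLN:800\n"
)
fai = "chr1\t1000\t6\t60\t61\nchr2\t800\t1030\t60\t61\n"

reads = contigs_from_sam_header(bam_header)
ref = contigs_from_fai(fai)
print(first_divergence(reads, ref))    # 1
print(sorted(set(reads) - set(ref)))   # ['chrY']
```

    Running the same comparison for each BAM against the reference would also reveal the situation described earlier in this thread, where files were aligned against two different references.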

  • samlj (Member)

    @DaveCurtis I've run into the same problem. Can you tell me in more detail how you solved it?

  • bhanuGandham (Member, Administrator, Broadie, Moderator)

    Hi @samlj

    Would you please post the exact issue you are facing and what you are trying to achieve? Then I can try to help you out. Thank you.

  • bhanuGandham (Member, Administrator, Broadie, Moderator)

    Hi @samlj

    We are now closing this issue as we have not heard from the user in more than 2 business days. Please post another question if you need more help with this issue.
