Questions about using 1000 Genomes BAM files in HaplotypeCaller

Dear GATK team,

I'm using GATK HaplotypeCaller in my analysis. Since there are only 10 samples in the project, I downloaded BAM files from 1000 Genomes, as suggested in the GATK Best Practices. However, the 1000 Genomes BAMs were aligned to NCBI37, while my BAMs were aligned to the UCSC hg19 reference from the GATK resource bundle. Both the contig names and the contig lengths differ between the two references. I tried samtools reheader, which did not solve the problem. How should I handle this?


Yu Liu
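
One way to see how far apart the two sets of BAMs are is to compare the @SQ lines of their headers. The following is a minimal Python sketch, not an official GATK or samtools procedure. The function names, the renaming rule in `to_ucsc_name` (add a `chr` prefix, map `MT` to `chrM`), and the two short header excerpts are assumptions made for illustration. It reports which contigs agree in length after renaming and which do not. Renaming contigs in the header can only help for the first group, since it does not change read coordinates.

```python
def parse_sq(header_text):
    """Return {contig_name: length} from the @SQ lines of a SAM/BAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:VALUE
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def to_ucsc_name(name):
    """Hypothetical naming rule: '1' -> 'chr1', 'MT' -> 'chrM'."""
    if name.startswith("chr"):
        return name
    return "chrM" if name == "MT" else "chr" + name


def compare(b37_contigs, hg19_contigs):
    """Sort b37 contigs into same-length, different-length, and missing lists."""
    same, different, missing = [], [], []
    for name, length in b37_contigs.items():
        ucsc = to_ucsc_name(name)
        if ucsc not in hg19_contigs:
            missing.append(name)
        elif hg19_contigs[ucsc] == length:
            same.append(name)
        else:
            different.append(name)
    return same, different, missing


# Illustrative two-contig header excerpts, not complete headers
B37 = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:MT\tLN:16569\n"
HG19 = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chrM\tLN:16571\n"

same, different, missing = compare(parse_sq(B37), parse_sq(HG19))
print("same length:", same)
print("different length:", different)
print("missing in hg19:", missing)
```

In practice the header text would come from `samtools view -H` run on each BAM. Contigs that land in the "different length" or "missing" lists cannot be reconciled by renaming alone.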

  • liuliu Member

    Thank you, Geraldine. I have another question about joint calling. When doing this, does it matter which cohort of publicly available BAMs is used? If it does, what would you recommend?


    Yu Liu

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    It does matter, in the sense that the more divergent they are from your samples, the more likely it is that rare variants in your samples will be discarded. Joint calling is great for rescuing low-confidence variants that are present in many samples in a population, but the downside is that low-frequency variants risk being discarded. So you should choose a cohort that is as similar to yours as possible.

  • liuliu Member

    Ok, I think I know what to do with these data now. Thanks a lot for your help!

    Yu Liu
