Layman's instructions to run HaplotypeCaller?


I am a complete outsider tasked with developing an application compatible with GATK's HaplotypeCaller. I've been diving into the docs, but my knowledge of genomics is quite limited, so I'm a bit lost.

In particular, I'm having trouble finding a set of input data I can use to just run HaplotypeCaller and get started. I've looked into the "genomics-public-data" and "gatk-test-data" Google Cloud buckets, trying to find a correct combination of Reference and Input samples, but so far without success.

I've tried two approaches. First, I downloaded hg38's fasta, fai and dict files for Homo sapiens and assumed it would be relatively easy to find compatible reads in .bam format, but without understanding the acronyms I didn't get far. Then I found on this forum a reference to some sample .bam files under gatk-test-data/wgs_bam, but the ones I've tried report "incompatible contigs", at least with the hg38 data I have.
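For anyone hitting the same "incompatible contigs" message: it usually means the contig names or lengths in the BAM header do not match those in the reference's .dict file, for example because the reads were aligned to a different reference build. Both files describe their contigs in SAM-style `@SQ` lines, so a rough check is to compare them directly. Below is a minimal sketch, assuming you have the header text of each file as a string (the .dict file is plain text, and `samtools view -H` prints a BAM header). The function names and the two example headers are made up for illustration.

```python
def parse_sq_lines(header_text):
    """Return {contig_name: length} from the @SQ lines of a SAM-style header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD, @RG, @PG and any other header records
        # Each field after the record type looks like TAG:value, e.g. SN:chr20
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def compare_contigs(reference_contigs, bam_contigs):
    """List human-readable mismatches between the BAM and reference contigs."""
    problems = []
    for name, length in bam_contigs.items():
        if name not in reference_contigs:
            problems.append(f"{name}: in BAM header but not in reference")
        elif reference_contigs[name] != length:
            problems.append(
                f"{name}: length {length} in BAM vs "
                f"{reference_contigs[name]} in reference"
            )
    return problems


# Illustrative headers only: a reference that names the contig "chr20"
# and a BAM aligned to a build that names it "20".
reference_header = "@HD\tVN:1.6\n@SQ\tSN:chr20\tLN:64444167\n"
bam_header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:20\tLN:63025520\n"

for problem in compare_contigs(
    parse_sq_lines(reference_header), parse_sq_lines(bam_header)
):
    print(problem)
```

If this reports contigs that are missing from the reference or that differ in length, the BAM was most likely produced against a different reference, and the fix is to use the matching reference rather than to edit either file.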

So obviously I am doing something very wrong, but I'm too clueless to know what. Any help at all would be much appreciated.

I basically need to be able to pipe HaplotypeCaller's results to another app, so I don't need meaningful data to get started; mocked or sample data will do, whatever is publicly available.
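Since the goal is to feed the results to another app, it may help to know that HaplotypeCaller writes its calls as VCF, a tab-separated text format where header lines start with `#` and each remaining line is one variant. Here is a minimal sketch of a reader for the eight fixed columns, assuming uncompressed VCF text. The function name and the sample record are made up for illustration, and a real application would more likely use an existing VCF library.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def iter_vcf_records(lines):
    """Yield one dict per variant line, skipping header and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # "##" meta lines and the "#CHROM" column header
        fields = line.split("\t")
        record = dict(zip(VCF_COLUMNS, fields[:8]))
        record["POS"] = int(record["POS"])        # 1-based position
        record["ALT"] = record["ALT"].split(",")  # may list several alleles
        yield record


# Mocked VCF text, not real HaplotypeCaller output.
sample_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr20\t10000\t.\tA\tG,T\t50.0\t.\tDP=12\n"
)

for record in iter_vcf_records(sample_vcf.splitlines()):
    print(record["CHROM"], record["POS"], record["REF"], record["ALT"])
```

The same generator accepts any iterable of lines, so passing `sys.stdin` or an open file handle works for piping.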

P.S.: I looked in the docs for some kind of quick start or tutorial, but none seem to point to actual input files one can use. I'm happy to help add that to the docs once I learn it, if I can.


  • Sheila (Broad Institute; Member, Broadie)


    If you don't need "meaningful data" or data that is complete, you can use the data from our hands-on tutorials. You can find them in the Presentations section. The variant discovery tutorial has some data from chromosome 20, and the worksheet will help you run the HaplotypeCaller steps :)
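Once a matching reference and BAM are in hand, the call itself is short. The following is a rough sketch of assembling it from an application, assuming a GATK4-style `gatk` wrapper on the PATH (older releases are invoked differently, so check the tutorial worksheet for the exact command). The helper name and all file names are hypothetical placeholders, and the interval must use the contig naming of your reference ("20" or "chr20").

```python
import shlex


def haplotypecaller_command(reference, bam, output_vcf, interval=None):
    """Build the argument list for a basic HaplotypeCaller run."""
    command = [
        "gatk", "HaplotypeCaller",
        "-R", reference,   # reference FASTA (needs its .fai and .dict alongside)
        "-I", bam,         # indexed BAM aligned to that same reference
        "-O", output_vcf,  # where the VCF of calls is written
    ]
    if interval is not None:
        command += ["-L", interval]  # restrict calling to one region
    return command


# Placeholder file names, not files shipped with the tutorial.
command = haplotypecaller_command(
    "ref.fasta", "sample.bam", "out.vcf.gz", interval="20"
)
print(shlex.join(command))
```

The list can be handed to `subprocess.run(command, check=True)` as is, which avoids shell quoting problems with paths that contain spaces.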


