Genomics Basics

As a general question, I would appreciate any reference or book along the lines of “Genomics for Dummies” that explains more about the algorithms used in the GATK pipeline. I have already watched many videos from the YouTube channel "Variant analysis with GATK course (2017)" but still cannot understand how each algorithm works. Thanks very much!


  • shlee Cambridge Member, Broadie ✭✭✭✭✭

    Hi @ZLak,

    The Methods and Algorithms articles on the GATK website offer some insight into how the algorithms work. You might change the version setting to access the older documents, as the internal algorithms haven't changed for existing tools.

  • ZLak Member

    Hi @shlee,

    Thanks for your response. That is a great collection of information. Are you aware of any references that explain the Sort and MarkDuplicates stages in simple terms? I have already gone through this article, but I need a more basic explanation of how both pipeline stages work. Also, given that both steps are I/O-heavy, are you aware of any investigations into combining the two steps into one?


  • Sheila Broad Institute Member, Broadie admin


    Unfortunately, we don't have anything more basic. Perhaps googling for genomics books or terms will help. There are also some great videos on YouTube that talk about sequencing.


