HaplotypeCaller - treatment of scaffolds

Hi Team,

1 BAM = 1 individual

My question is about HaplotypeCaller and scaffolds in a BAM file.
When I run the per-individual SNP-calling step (--emitRefConfidence GVCF) before joint genotyping,
I find that with my number of scaffolds the process is computationally quite costly.
I have now run HaplotypeCaller on every BAM for just a single scaffold (using -L).

The question is: do you see any downside to this approach in terms of result quality?
Or are the scaffolds treated independently anyway, so that my approach is fine?

The next step would be to combine the per-scaffold gVCFs into a single one again (corresponding to the original BAM)
and then do joint genotyping on a cohort of gVCFs (i.e. a cohort of individuals).

Thanks a lot!
Alexander
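
The per-scaffold approach described in the question could be sketched roughly as below. This is only an illustration, not a tested pipeline: it assumes GATK 3-style command-line syntax (matching the --emitRefConfidence GVCF and -L arguments mentioned above), and the file names (indiv01.bam, ref.fasta), the jar location and the output naming scheme are all hypothetical. Scaffold names are taken from the first column of a samtools-style .fai index, shown here as an inline example string.

```python
from pathlib import Path


def read_scaffolds(fai_text):
    """Return scaffold names from the text of a .fai index (first tab-separated column)."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def haplotypecaller_cmd(bam, reference, scaffold, jar="GenomeAnalysisTK.jar"):
    """Build one HaplotypeCaller command restricted to a single scaffold via -L.

    The jar path and the output file name pattern are assumptions for this sketch.
    """
    sample = Path(bam).stem
    return [
        "java", "-jar", jar,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "--emitRefConfidence", "GVCF",
        "-L", scaffold,
        "-o", f"{sample}.{scaffold}.g.vcf",
    ]


# Hypothetical two-scaffold index; a real one would be read from ref.fasta.fai.
fai = "scaffold_1\t120000\t12\t60\t61\nscaffold_2\t80000\t122025\t60\t61\n"

cmds = [haplotypecaller_cmd("indiv01.bam", "ref.fasta", s) for s in read_scaffolds(fai)]
for cmd in cmds:
    # Each printed line is an independent job that could be submitted in parallel.
    print(" ".join(cmd))
```

Each generated command is independent of the others, so the jobs can be handed to a cluster scheduler or run side by side on one machine.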

Best Answer

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    Accepted Answer

    Hi Alexander,

    Do you mean that your genome reference is an unfinished set of scaffolds? If so, the scaffolds are indeed processed independently, so parallelizing over the scaffolds is fine. You can do joint genotyping per scaffold as well; otherwise you will still pay a performance penalty at that step, because GATK is not designed to handle large numbers of contigs/scaffolds and does not process such genomes efficiently. You can combine the final per-scaffold VCFs after joint genotyping, and before filtering if you're running VQSR.
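
As a rough sketch of the workflow suggested in this answer, the snippet below builds one joint-genotyping command per scaffold and then a single command that concatenates the per-scaffold cohort VCFs. It assumes GATK 3-style syntax (GenotypeGVCFs via -T, and the CatVariants utility for concatenation); the sample names, scaffold names, file naming scheme and jar path are hypothetical, and the -V inputs to CatVariants are assumed to be listed in reference order.

```python
def genotype_gvcfs_cmd(samples, reference, scaffold, jar="GenomeAnalysisTK.jar"):
    """Build a GenotypeGVCFs command for one scaffold across all individuals.

    Assumes each individual already has a per-scaffold gVCF named
    <sample>.<scaffold>.g.vcf (a naming convention made up for this sketch).
    """
    cmd = ["java", "-jar", jar, "-T", "GenotypeGVCFs", "-R", reference]
    for sample in samples:
        cmd += ["-V", f"{sample}.{scaffold}.g.vcf"]
    cmd += ["-L", scaffold, "-o", f"cohort.{scaffold}.vcf"]
    return cmd


def cat_variants_cmd(scaffolds, reference, out="cohort.vcf", jar="GenomeAnalysisTK.jar"):
    """Build a CatVariants command that concatenates the per-scaffold cohort VCFs.

    Scaffolds must be passed in the same order as they appear in the reference.
    """
    cmd = ["java", "-cp", jar, "org.broadinstitute.gatk.tools.CatVariants", "-R", reference]
    for scaffold in scaffolds:
        cmd += ["-V", f"cohort.{scaffold}.vcf"]
    cmd += ["-out", out, "-assumeSorted"]
    return cmd


samples = ["indiv01", "indiv02"]          # hypothetical individuals (1 BAM each)
scaffolds = ["scaffold_1", "scaffold_2"]  # hypothetical scaffold names, in reference order

joint_cmds = [genotype_gvcfs_cmd(samples, "ref.fasta", s) for s in scaffolds]
merge_cmd = cat_variants_cmd(scaffolds, "ref.fasta")

for cmd in joint_cmds + [merge_cmd]:
    print(" ".join(cmd))
```

With this layout the per-sample gVCFs never need to be merged back into one file per BAM: joint genotyping runs scaffold by scaffold, and only the final cohort VCFs are concatenated, before VQSR if that is used.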

Answers

  • AlexanderV, Berlin (Member)

    Awesome!
    Thank you for your answer.

    And yes, my genome in question is an unfinished set of scaffolds.
