@RG for SNP calling on Plasmodium knowlesi Illumina reads

Jon (Harvard T.H. Chan School of Public Health)

Hi all,

I am calling SNPs for the first time, and I am not sure how to frame my questions. I am not even sure that GATK is the best tool to use. The species is Plasmodium knowlesi. The data are 101 nt paired-end reads sequenced at the Broad, which I aligned to the reference genome using bwa-mem. I am looking over the Best Practices, but they seem tailored to human data. I have three questions:

  1. Is GATK the right tool for me?
  2. My alignments don't have @RG (read group) tags. I am aware of the Picard tool AddOrReplaceReadGroups (https://broadinstitute.github.io/picard/command-line-overview.html), but it requires two inputs, RGLB and RGPU, and I don't know what to put for them. Could you please let me know how to handle this information?
  3. What information can I provide to help you better address my questions?

Thank you,

Jon Goldberg

Best Answers


  • Sheila (Broad Institute, Moderator)

    Hi Jon,

    1) Yes, you can use GATK for non-human data, although it may require a little extra effort on your end. Have a look at the Best Practices, which will help you get started: https://www.broadinstitute.org/gatk/guide/best-practices The tools and steps are the same, but you may need to do some extra work to obtain some of the required inputs (e.g. known variant sites files).

    2) This article should answer your question: http://gatkforums.broadinstitute.org/discussion/1317/collected-faqs-about-bam-files

    3) I hope this helps you get started. Please do post other questions if you get stuck along the way.
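    For question 2, here is a minimal sketch of one common convention for filling in the read-group fields: RGLB is a label for the DNA library prep (any unique name works if there is one library per sample), RGPU is the platform unit, conventionally flowcell.lane (optionally followed by the sample index barcode), and RGID is unique per lane. The sketch assumes your read names follow the Casava 1.8+ Illumina format (instrument:run:flowcell:lane:tile:x:y); older or reformatted names may not, in which case fill the values in by hand. The sample and library names below (PK_sample1, PK_lib1) and the helper functions are hypothetical, and the command uses Picard's legacy KEY=VALUE argument syntax.

```python
import shlex


def parse_illumina_read_name(read_name):
    """Return (flowcell, lane, barcode) from a read name.

    ASSUMPTION: Casava 1.8+ format, e.g.
    @instrument:run:flowcell:lane:tile:x:y [read:filtered:control:index]
    barcode is None when the optional comment field is absent.
    """
    parts = read_name.lstrip("@").split()
    fields = parts[0].split(":")
    if len(fields) != 7:
        raise ValueError(f"not a Casava 1.8+ read name: {read_name!r}")
    flowcell, lane = fields[2], fields[3]
    barcode = None
    if len(parts) > 1:
        comment = parts[1].split(":")
        if len(comment) == 4 and comment[3]:
            barcode = comment[3]
    return flowcell, lane, barcode


def read_group_fields(read_name, sample, library):
    """Assemble RGID/RGLB/RGPL/RGPU/RGSM values (hypothetical helper)."""
    flowcell, lane, barcode = parse_illumina_read_name(read_name)
    unit = f"{flowcell}.{lane}" if barcode is None else f"{flowcell}.{lane}.{barcode}"
    return {
        "RGID": f"{flowcell}.{lane}",  # unique per flowcell lane
        "RGLB": library,               # label for the DNA library prep
        "RGPL": "ILLUMINA",            # sequencing platform
        "RGPU": unit,                  # platform unit: flowcell.lane[.barcode]
        "RGSM": sample,                # biological sample name
    }


def picard_command(bam_in, bam_out, fields, picard_jar="picard.jar"):
    """Render an AddOrReplaceReadGroups command line as a shell-quoted string."""
    args = ["java", "-jar", picard_jar, "AddOrReplaceReadGroups",
            f"I={bam_in}", f"O={bam_out}"]
    args += [f"{key}={value}" for key, value in fields.items()]
    return shlex.join(args)


if __name__ == "__main__":
    # Example read name in Casava 1.8+ format (taken from a FASTQ/BAM record).
    name = "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"
    rg = read_group_fields(name, sample="PK_sample1", library="PK_lib1")
    print(picard_command("aligned.bam", "aligned.rg.bam", rg))
```

    If the reads come from several lanes, each lane should get its own read group, so you would run this once per lane-specific BAM. If you realign, bwa mem can also attach a read group at alignment time through its -R option, which avoids the extra Picard step.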

