How to call SNPs in low-coverage data using GATK?

shinken · Irapuato · Member

Hi

I would like to know the best way to call SNPs from low-coverage data (4x) using GATK. I want to call SNPs for maize, and there is a hapmap for this species. Is it possible to use the hapmap data in the SNP calling itself? Or can it only be used for variant recalibration, which would increase the chance of calling SNPs in low-coverage data for those SNPs that are also present in the hapmap?

Answers

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    Hapmap will be helpful for variant filtering / recalibration but does not contribute to making calls in the first place.

    Note that for such low-coverage data you may benefit from calling your samples together directly, as the GVCF workflow does not perform very well with very low coverage. How many samples are you working with?
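
To make the difference between calling samples together and the GVCF workflow concrete, here is a minimal sketch that only builds the command line and runs nothing. It assumes GATK 3-style syntax (`-T HaplotypeCaller`, one `-I` per BAM file, `--emitRefConfidence GVCF` for GVCF mode). The helper name, the jar path, and all file names are hypothetical placeholders.

```python
def haplotypecaller_cmd(reference, bams, output, gvcf=False,
                        gatk_jar="GenomeAnalysisTK.jar"):
    """Build a GATK 3-style HaplotypeCaller argument list (nothing is executed).

    Assumption: GATK 3 command-line conventions. The jar path and file
    names passed in are placeholders, not values from this thread.
    """
    cmd = ["java", "-jar", gatk_jar, "-T", "HaplotypeCaller", "-R", reference]
    # Calling samples together: pass every BAM to the same run.
    for bam in bams:
        cmd += ["-I", bam]
    # GVCF mode is switched on explicitly; leaving the flag out gives
    # the regular (direct calling) mode.
    if gvcf:
        cmd += ["--emitRefConfidence", "GVCF"]
    cmd += ["-o", output]
    return cmd


# Regular mode on two hypothetical low-coverage BAMs, called together.
joint = haplotypecaller_cmd("maize_ref.fasta", ["s1.bam", "s2.bam"], "joint.vcf")
print(" ".join(joint))

# GVCF mode on a single hypothetical sample, for comparison.
per_sample = haplotypecaller_cmd("maize_ref.fasta", ["s1.bam"], "s1.g.vcf", gvcf=True)
print(" ".join(per_sample))
```

The only difference between the two commands is the number of `-I` inputs and the presence of `--emitRefConfidence GVCF`.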

  • shinken · Irapuato · Member

    Thank you very much for your response, Geraldine.

    At the moment I have only one low-coverage genome, and I am waiting for 8 more low-coverage genomes. However, in the meantime I would like to get the best possible results from the genome that I have now. Is there any method that you recommend? Should I play with the filtering steps? Or use other published genomes of the same species, even though they are not closely related, to do the calling together?

    [GitHub issue #472, filed by Sheila. State: closed. Closed by: vdauwera.]
  • shinken · Irapuato · Member

    Thank you very much, the advice has been very useful.

  • yzqheart · China · Member

    Dear Geraldine VdAuwera,

    My situation is similar to shinken's. I have a very low coverage sample (1X) and I want to call SNPs on it. After reading the answers posted above:

    "So you'll probably want to lower at least the emit confidence threshold, and probably also the call confidence threshold, to help achieve good sensitivity"

    "But in your case I would recommend running in HC's regular mode on everything you have in hand right now, even though that means you will have to run them again when you get the rest of your samples"

    I have two questions:
    (1) If I want to lower the emit confidence threshold and the call confidence threshold, can you recommend reasonable values for these parameters?
    (2) Does "HC's regular mode" mean running without setting the --emitRefConfidence parameter?

    Thank you

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    Do I understand correctly that you want to call variants on a single sample with mean coverage of 1x?

    To be honest I don't see how you could possibly get any usable information from that dataset. You will be completely unable to distinguish real variants from technical artifacts. Such low coverage data can only be usable if you have multiple samples.
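
A rough back-of-the-envelope calculation illustrates why 1x is so limiting. The sketch below assumes that per-site read depth follows a Poisson distribution around the mean coverage, which is an idealization (real data are usually more uneven), and the function names are made up for illustration.

```python
import math


def poisson_pmf(k, mean_depth):
    """Probability of exactly k reads at a site under a Poisson depth model."""
    return math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)


def depth_fractions(mean_depth):
    """Expected fraction of sites with 0, exactly 1, or at least 2 reads."""
    p0 = poisson_pmf(0, mean_depth)
    p1 = poisson_pmf(1, mean_depth)
    return {"uncovered": p0, "single_read": p1, "two_or_more": 1.0 - p0 - p1}


fractions = depth_fractions(1.0)
for label, value in fractions.items():
    print(f"{label}: {value:.3f}")
```

Under this simplified model, at 1x mean coverage roughly 37% of sites get no reads at all and another 37% get exactly one read. At a site covered by a single read, a sequencing error and a real variant look identical, so only about a quarter of sites have any second read to compare against.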

  • yzqheart · China · Member

    Dear Geraldine,

    Thanks for your reply.

    I now have 4 samples, each sequenced at 1X depth. How can I get SNPs for each sample? Does the GVCF mode of GATK work for this?

    Thank you

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    No, 1x is far too low. You can try running in normal multi-sample mode, but I don't think the results will be any good.

  • yzqheart · China · Member

    Dear Geraldine,

    If I want to call SNPs on a single sample, can 5x (or 10x) sequencing depth be used?

    Thank you

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @yzqheart
    Hi,

    The GATK tools are tested and validated on 30X data. However, 5X or 10X data is better than 1X data!

    -Sheila
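
To get a feel for how 1X, 5X, 10X and 30X compare, here is an idealized sketch of the chance that a heterozygous site is even sampled well enough to be seen. It assumes Poisson-distributed depth, each read drawn from either allele with probability 0.5 (so the number of alternate-allele reads is Poisson with mean depth/2), no sequencing or mapping errors, and a hypothetical requirement of at least 2 alternate reads. None of these assumptions comes from GATK itself, and real power will be lower than this optimistic bound.

```python
import math


def het_detection_probability(mean_depth, min_alt_reads=2):
    """Chance of seeing at least `min_alt_reads` alternate-allele reads at a het site.

    Assumptions (illustrative only): Poisson depth, unbiased 50/50 allele
    sampling, no errors. `min_alt_reads` is a made-up threshold, not a
    GATK parameter.
    """
    lam = mean_depth / 2.0  # expected alternate-allele reads at a het site
    p_too_few = sum(
        math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


for depth in (1, 5, 10, 30):
    print(f"{depth}x: {het_detection_probability(depth):.3f}")
```

The value for 1x comes out far below the others, and raising `min_alt_reads` widens the gap between 5x or 10x and 30x further.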

  • Sheila · Broad Institute · Member, Broadie, Moderator

    If you do use 5X or 10X data, there are some arguments in HaplotypeCaller that can help. Have a look at -minPruning and -minDanglingBranchLength.

    Setting both of those to 1 should help.

    -Sheila
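
As a minimal sketch of that suggestion, the snippet below appends the two arguments, both set to 1, to a HaplotypeCaller command. It only builds the argument list and does not run GATK. The surrounding command uses GATK 3-style syntax as an assumption, and the jar path, file names, and helper name are hypothetical placeholders.

```python
def low_coverage_args(min_pruning=1, min_dangling_branch_length=1):
    """Return the two HaplotypeCaller arguments mentioned above as a list."""
    return [
        "-minPruning", str(min_pruning),
        "-minDanglingBranchLength", str(min_dangling_branch_length),
    ]


# Hypothetical base command (GATK 3-style syntax assumed; paths are placeholders).
base_cmd = [
    "java", "-jar", "GenomeAnalysisTK.jar",
    "-T", "HaplotypeCaller",
    "-R", "reference.fasta",
    "-I", "sample.bam",
    "-o", "sample.vcf",
]

cmd = base_cmd + low_coverage_args()
print(" ".join(cmd))
```

Check the argument names against the documentation for the GATK version you are running before using them, since spellings differ between major versions.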

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    But be aware that your power to discover variants will be very low.
