SNP distribution across chromosome

Hi,
I would like to ask: is there any tool in GATK that calculates the distribution of SNPs/indels along each chromosome using a 100 kb or 1 Mb window size? Thanks.

Answers

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @shis,

    What do you mean by distribution?

    Take a look at the Picard metrics page to see whether any of the metrics would help with your calculations. Also, this thread highlights the importance of factoring in coverage.

  • shisshis USAMember

    Hi Shlee, thanks for the reply.
    By "SNP distribution" I mean how many SNPs are present in each 100 kb or 1 Mb region of a chromosome (e.g., rice chromosome 1). In other words, I want to count the SNPs in fixed windows of 100 kb or 1 Mb.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @shis
    Hi,

    I am not aware of any GATK tools that do that. You will need to find some other tools to do what you want.

    -Sheila

  • shisshis USAMember

    I found a solution for calculating the SNP distribution in 1 Mb regions of a chromosome: the vcftools --SNPdensity option. I used the following command to calculate SNP density with a 1 Mb window size:
    vcftools --vcf SNP.vcf --SNPdensity 1000000 --out SNP_snpdensity
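
For readers who want to see what a windowed SNP count involves, here is a minimal Python sketch of the same idea. It is not how vcftools is implemented. The function name `snp_counts_per_window` is hypothetical. The window boundaries (1-based positions 1 to 1,000,000 form the first window) and the SNP definition (single-base REF and ALT alleles) are assumptions that may differ from what vcftools does.

```python
from collections import Counter


def snp_counts_per_window(vcf_lines, window_size=1_000_000):
    """Count SNP records per (chromosome, window start) from VCF text lines.

    Hypothetical helper for illustration; window starts are 0-based offsets.
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        alt_alleles = alt.split(",")
        # Assumption: a SNP has a single-base REF and only single-base ALT alleles.
        is_snp = len(ref) == 1 and all(
            len(a) == 1 and a not in {".", "*"} for a in alt_alleles
        )
        if not is_snp:
            continue
        # Assumption: POS is 1-based, so positions 1..window_size share a window.
        bin_start = (pos - 1) // window_size * window_size
        counts[(chrom, bin_start)] += 1
    return counts


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t.\t.\t.",
        "chr1\t2000\t.\tAT\tA\t.\t.\t.",  # deletion, not counted
        "chr1\t1000001\t.\tG\tA\t.\t.\t.",
    ]
    for (chrom, start), n in sorted(snp_counts_per_window(example).items()):
        print(chrom, start, n)
```

With a real file, the lines could come from `open("SNP.vcf")`. The vcftools command above writes its own per-window table to a file with the `.snpden` suffix, which is the simpler route when vcftools is available.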

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @shis
    Hi,

    Thanks for sharing!

    -Sheila

  • BegaliBegali GermanyMember

    @shlee
    @Sheila

    I would appreciate some hints on plotting. I want to plot the distributions that will help me determine thresholds, so that variants scoring below them can be removed by hard filtering. Can I obtain these distribution plots with GATK tools, or do I need to make them in R? I have limited experience with programming languages. Is there any GATK method I can run to get results like those shown here: https://gatkforums.broadinstitute.org/gatk/discussion/6925/understanding-and-adapting-the-generic-hard-filtering-recommendations ? Also, could you suggest which statistical analyses after the filtering step would help support my final results? I am new to sequencing analysis and bioinformatics tools (my project is RADseq in a plant), but I am trying my best, and I am grateful to people like you who are kind enough to help people like me. Thanks in advance.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Begali
    Hi,

    Have a look at the presentations section, where we have some hands-on tutorials for hard filtering. Those include R commands for plotting that should help you get started.

    -Sheila
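
The tutorials mentioned above use R for plotting. For readers more comfortable with Python, the sketch below shows one possible first step: pulling one INFO annotation out of a VCF and tabulating it into bins, which can then be plotted with any tool. The function names `annotation_values` and `histogram` are hypothetical, QD is only an example annotation key, and the bin width is an arbitrary choice. It is not a GATK feature.

```python
import math
from collections import Counter


def annotation_values(vcf_lines, key="QD"):
    """Collect numeric values of one INFO annotation (hypothetical helper)."""
    values = []
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
        for entry in info.split(";"):
            name, _, value = entry.partition("=")
            if name == key and value:
                try:
                    values.append(float(value))
                except ValueError:
                    pass  # ignore non-numeric values
    return values


def histogram(values, bin_width=1.0):
    """Return sorted (bin_start, count) pairs for fixed-width bins."""
    bins = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return sorted(bins.items())


if __name__ == "__main__":
    example = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t50\t.\tDP=10;QD=2.5",
        "chr1\t200\t.\tC\tT\t900\t.\tDP=30;QD=30.1",
    ]
    for bin_start, count in histogram(annotation_values(example)):
        print(bin_start, count)
```

Records that lack the chosen annotation are simply skipped, so the totals may be smaller than the number of variants in the file.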
