Why does GATK 3.3 report more INDELs (almost double) than SNPs?

mahesh, Bangalore (Member)
edited July 2015 in Ask the GATK team

I am working on rice and calling variants with GATK 3.3. I mapped the reads to the reference genome with BWA (bwa mem -M), then called variants with HaplotypeCaller, following the GATK 3.3 Best Practices. I have compared the results for 20 genotypes, and in every genotype INDELs outnumber SNPs by about 2x. Please advise on what could cause this.

Thanks
Mahesh HB

Answers

  • Sheila, Broad Institute (Member, Broadie admin)

    @mahesh
    Hi Mahesh HB,

    Can you tell me exactly how you processed your samples? Did you follow the Best Practices workflow? Can you also post the exact command you ran for HaplotypeCaller?

    Thanks,
    Sheila

  • mahesh, Bangalore (Member)
    edited July 2015

    Hi Sheila
    Thanks for your response.

    I followed the Best Practices. In brief, the steps were:
    1. INDEL realignment
      a. Created a target list of intervals to be realigned
      b. Performed realignment of the target intervals
    2. Base quality recalibration
      a. Analyzed the patterns of covariation in the sequence dataset
      b. Performed a second pass to analyze the covariation remaining after recalibration
      c. Applied the recalibration to the sequence data
    3. SNP calling
      The command was:

      java -Xmx50g -Djava.io.tmpdir=gatk_tmp \
        -jar /data1/mahesh/softwares/GATK3.3/GenomeAnalysisTK.jar \
        -T HaplotypeCaller \
        -R Nipponbare_v7.0.fa \
        -I recalb_realigned_dedup_sorted_GP014_final.bam \
        --dbsnp /data1/mahesh/02_snp_calling/01_japonica/new/2_3000_genomes_SNPs/3k_filtered.vcf \
        --genotyping_mode DISCOVERY \
        -stand_emit_conf 10 \
        -stand_call_conf 30 \
        -o GP014_snps_indels.vcf \
        -L target_intervals_GP014.list \
        -nct 40

    Thanks
    Mahesh HB

  • Sheila, Broad Institute (Member, Broadie admin)

    @mahesh
    Hi Mahesh HB,

    HaplotypeCaller is designed to be very sensitive to indels, so this may not be a real issue. Do you know whether rice is expected to have that many indels? What happens after you filter the variants?

    -Sheila

  • mahesh, Bangalore (Member)

    @Sheila
    Thanks Sheila.
    I have filtered out low-quality InDels and retained only InDels up to 30 bp long. Interestingly, 1-base insertions are more frequent than any other type of InDel, which is why the InDel count outnumbers the SNPs. I am happy to see such variation between indica and japonica.

    Mahesh
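
    For anyone who wants to check the same thing on their own calls, here is a minimal Python sketch (not part of GATK) that counts SNPs and InDels in an uncompressed VCF and tabulates InDel lengths, dropping InDels longer than 30 bp as described above. The function names, the `max_indel_len` parameter and the toy records are assumptions made purely for illustration; they are not Mahesh's data or commands.

```python
from collections import Counter


def classify(ref, alt):
    """Classify one REF/ALT pair as SNP, INDEL or OTHER.

    Returns (kind, signed_length): signed_length > 0 is an insertion,
    < 0 is a deletion, and 0 for anything that is not an InDel.
    """
    # Symbolic alleles (<NON_REF>), spanning deletions (*) and missing ALT (.)
    if alt.startswith("<") or alt in ("*", "."):
        return "OTHER", 0
    if len(ref) == 1 and len(alt) == 1:
        return "SNP", 0
    if len(ref) != len(alt):
        return "INDEL", len(alt) - len(ref)
    return "OTHER", 0  # same-length multi-base substitutions (MNPs)


def summarize(vcf_lines, max_indel_len=30):
    """Count variant types and InDel lengths, one count per ALT allele."""
    counts = Counter()
    indel_lengths = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        for alt in alts:
            kind, length = classify(ref, alt)
            if kind == "INDEL" and abs(length) > max_indel_len:
                continue  # length filter, as described in the post
            counts[kind] += 1
            if kind == "INDEL":
                indel_lengths[length] += 1
    return counts, indel_lengths


# Toy records, made up for illustration only.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr01\t100\t.\tA\tG\t50\tPASS\t.",
    "chr01\t200\t.\tA\tAT\t60\tPASS\t.",
    "chr01\t300\t.\tC\tCG\t45\tPASS\t.",
    "chr01\t400\t.\tGTT\tG\t70\tPASS\t.",
    "chr01\t500\t.\tA\tA" + "C" * 40 + "\t80\tPASS\t.",
]
counts, indel_lengths = summarize(example)
print(dict(counts))
print(sorted(indel_lengths.items()))
```

    On a real file you could pass an open file handle instead of the list, for example `summarize(open("GP014_snps_indels.vcf"))`. A large peak at length +1 in `indel_lengths` would match the pattern reported here, but the count alone cannot say whether those insertions are real.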

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Hi Mahesh,

    This could be real variation, or it could be an artifact due to sequencing technology. We cannot tell you which it is. Be sure to evaluate the distribution of annotations in your variants and apply filtering accordingly. Look at our hard filtering recommendations as a starting point. Filtering on just QUAL values is not sufficient.
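
    As a rough illustration of what "evaluating the distribution of annotations" can look like in practice, here is a small Python sketch (not a GATK tool) that collects numeric INFO annotations separately for SNPs and InDels and prints simple summary statistics. QD, FS and MQ are standard GATK annotations; the function names, the choice of summary statistics and the toy records are assumptions for illustration. No filtering thresholds are implied; those should come from the hard filtering recommendations and from your own distributions.

```python
import statistics
from collections import defaultdict


def parse_info(info):
    """Parse an INFO string such as 'QD=12.5;FS=1.2;DB' into {key: float}."""
    values = {}
    for item in info.split(";"):
        if "=" not in item:
            continue  # flag annotations such as DB carry no value
        key, raw = item.split("=", 1)
        try:
            values[key] = float(raw)
        except ValueError:
            pass  # skip non-numeric or multi-valued annotations
    return values


def annotation_summary(vcf_lines, keys=("QD", "FS", "MQ")):
    """Summarize selected annotations per variant type (biallelic sites only)."""
    collected = defaultdict(list)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt, info = fields[3], fields[4], fields[7]
        # Keep the sketch simple: skip multiallelic and symbolic records.
        if "," in alt or alt.startswith("<") or alt in ("*", "."):
            continue
        if len(ref) == 1 and len(alt) == 1:
            kind = "SNP"
        elif len(ref) != len(alt):
            kind = "INDEL"
        else:
            continue
        annotations = parse_info(info)
        for key in keys:
            if key in annotations:
                collected[(kind, key)].append(annotations[key])
    return {
        group: {
            "n": len(vals),
            "min": min(vals),
            "median": statistics.median(vals),
            "max": max(vals),
        }
        for group, vals in collected.items()
    }


# Toy records, made up for illustration only.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr01\t100\t.\tA\tG\t50\t.\tQD=20.0;FS=1.0;MQ=60.0",
    "chr01\t150\t.\tT\tC\t40\t.\tQD=10.0;FS=3.0;MQ=58.0",
    "chr01\t200\t.\tA\tAT\t60\t.\tQD=2.5;FS=0.0;MQ=60.0;DB",
    "chr01\t400\t.\tGTT\tG\t70\t.\tQD=30.0;FS=5.0;MQ=40.0",
]
summary = annotation_summary(example)
for group in sorted(summary):
    print(group, summary[group])
```

    Comparing the SNP and InDel rows for each annotation side by side (or plotting the underlying values) is one way to see whether the 1-base insertions cluster at suspicious annotation values before deciding on filters.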
