VCF files for Indels and SNPs

Hi,
I am following the Best Practices for DNAseq analysis and have 2 quick questions:
1. I wanted to confirm if the VCF files produced after the VariantRecalibrator and ApplyRecalibration steps for SNP and for Indels are completely independent of each other. In other words, the VCF file produced after these two steps for SNPs (mode SNP) is just for SNPs and for Indels (mode INDEL) is just for Indels.
2. What is the source of the ALT alleles in the VCF file - is it the various annotation files that were used during the analysis?
Thanks,

- Pankaj

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @pagarwal14

    Hi Pankaj,

    1) When you run VariantRecalibrator in SNP mode and then run ApplyRecalibration, you end up with a VCF file containing recalibrated SNPs and unrecalibrated indels. You can then run VariantRecalibrator in INDEL mode and run ApplyRecalibration on that file (recalibrated SNPs, unrecalibrated indels) to end up with a VCF file in which both the SNPs and the indels are recalibrated.

    The output vcf file will have both SNPs and indels, but one may not be recalibrated, depending on which stage you are on. I hope this clarifies things.
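As a rough illustration of the chaining described above, here is a minimal Python sketch that only builds the command lines and does not run them. It assumes GATK 3-style invocations (`-T VariantRecalibrator`, `-T ApplyRecalibration`). All file names are hypothetical, and the `-resource` and `-an` arguments that VariantRecalibrator needs are left for the caller to pass in as `snp_args` and `indel_args`.

```python
def vqsr_commands(in_vcf, reference, mode, out_vcf, training_args=()):
    """Build (not run) the two GATK 3-style commands for one VQSR pass."""
    base = ["java", "-jar", "GenomeAnalysisTK.jar"]
    # Hypothetical names for the intermediate recalibration files.
    recal = "recalibrate_%s.recal" % mode
    tranches = "recalibrate_%s.tranches" % mode
    build_model = base + [
        "-T", "VariantRecalibrator", "-R", reference, "-input", in_vcf,
        "-mode", mode, "-recalFile", recal, "-tranchesFile", tranches,
    ] + list(training_args)  # -resource / -an arguments go here
    apply_model = base + [
        "-T", "ApplyRecalibration", "-R", reference, "-input", in_vcf,
        "-mode", mode, "-recalFile", recal, "-tranchesFile", tranches,
        "-o", out_vcf,
    ]
    return build_model, apply_model


def sequential_vqsr(raw_vcf, reference, snp_args=(), indel_args=()):
    """SNP pass on the raw VCF, then INDEL pass on the SNP-pass output."""
    snp_out = "recal_snps_raw_indels.vcf"        # hypothetical file name
    final_out = "recal_snps_recal_indels.vcf"    # hypothetical file name
    steps = []
    steps += vqsr_commands(raw_vcf, reference, "SNP", snp_out, snp_args)
    # The INDEL pass reads the output of the SNP pass, not the raw VCF.
    steps += vqsr_commands(snp_out, reference, "INDEL", final_out, indel_args)
    return steps


if __name__ == "__main__":
    for cmd in sequential_vqsr("raw_variants.vcf", "reference.fasta"):
        print(" ".join(cmd))
```

The point of the sketch is only the file flow: the second ApplyRecalibration writes a single VCF in which both variant types have been through recalibration.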

    2) I am not sure what you mean by "source of the ALT alleles". Please clarify further. Thanks.

    -Sheila

  • pagarwal14 (Durham, NC; Member)

    Hi Sheila,
    Quick followup on these two:
    1) I just wanted to verify whether I can do the following: run VariantRecalibrator in SNP mode and then ApplyRecalibration on the raw VCF generated by the variant caller (UnifiedGenotyper or HaplotypeCaller) to get a VCF file with recalibrated SNPs and unrecalibrated indels. Then, instead of using that file as input, go back to the raw VCF from the variant calling step (UG or HC), run VariantRecalibrator in INDEL mode, and run ApplyRecalibration to get a VCF file with unrecalibrated SNPs but recalibrated indels. I would like to do this to keep the recalibrated SNPs and recalibrated indels in separate files, since I am comparing the results to what I already have from the Atlas2-SNP variant caller, which gave me SNPs and indels in separate VCF files.
    1.a) New related question: is there a way to separate out the recalibrated SNPs and indels if I follow the path you mentioned?

    2) I think I know the answer to this question, but just wanted to verify if the ALT alleles are the list of alleles gathered from the various samples that have alleles different from the reference allele. For the case of SNPs there should always be just one alternate allele, right?

    Thanks,

    - Pankaj

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @pagarwal14

    Hi Pankaj,

    Sure, you can keep the recalibrated SNPs and recalibrated indels in separate files just like you suggested.

    To get only the recalibrated SNPs and only the recalibrated indels, you can use SelectVariants. Please read about it here: https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php
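To make the idea of selecting by variant type concrete, here is a small plain-Python sketch that splits VCF records into SNPs and indels by comparing REF and ALT allele lengths. It is not how SelectVariants is implemented and is no substitute for it (in GATK 3 the tool takes `-selectType SNP` or `-selectType INDEL`). The function names and the simplified classification rules are assumptions made for illustration.

```python
BASES = set("ACGTN")


def variant_type(ref, alt_field):
    """Classify a record as SNP, INDEL or OTHER from its REF and ALT columns."""
    alts = alt_field.split(",")
    # Symbolic (<DEL>), spanning (*) or missing (.) alleles are not handled here.
    if any(not a or set(a.upper()) - BASES for a in alts):
        return "OTHER"
    if len(ref) == 1 and all(len(a) == 1 for a in alts):
        return "SNP"
    if all(len(a) != len(ref) for a in alts):
        return "INDEL"
    return "OTHER"  # MNPs and sites mixing SNP and indel alleles


def split_by_type(vcf_lines):
    """Return (header, snps, indels) lists of lines from an iterable of VCF lines."""
    header, snps, indels = [], [], []
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        kind = variant_type(fields[3], fields[4])  # REF and ALT columns
        if kind == "SNP":
            snps.append(line)
        elif kind == "INDEL":
            indels.append(line)
    return header, snps, indels


if __name__ == "__main__":
    demo = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t100\t.\tC\tA,T\t50\tPASS\t.",
        "1\t200\t.\tCA\tC\t50\tPASS\t.",
    ]
    header, snps, indels = split_by_type(demo)
    print(len(snps), "SNP record(s),", len(indels), "indel record(s)")
```

Writing the header plus each list to its own file would give one SNP-only and one indel-only VCF, which is the layout needed for a side-by-side comparison with per-type call sets.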

    You are correct that the ALT alleles are the list of alleles from the samples that do not match the reference.
    However, you are wrong that there can only be 1 alternate allele for the case of a SNP.
    Example:
    You are heterozygous A/T at a site, but the reference is C.
    You will see a C in the REF column and both A and T in the ALT column. The genotype for you should show 1/2 meaning heterozygous A/T.
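The way a genotype such as 1/2 maps back to bases can be sketched in a few lines of Python. This is only an illustration of the VCF indexing convention (0 is the REF allele, 1 and 2 are the first and second ALT alleles). The function names are made up for this example.

```python
def decode_genotype(ref, alt_field, gt):
    """Translate a GT string such as '1/2' into the alleles it refers to."""
    alleles = [ref] + alt_field.split(",")  # index 0 = REF, 1.. = ALT alleles
    sep = "|" if "|" in gt else "/"         # phased or unphased separator
    return [None if idx == "." else alleles[int(idx)] for idx in gt.split(sep)]


def zygosity(calls):
    """Homozygous if all called alleles are identical, otherwise heterozygous."""
    if None in calls:
        return "no-call"
    return "homozygous" if len(set(calls)) == 1 else "heterozygous"


if __name__ == "__main__":
    calls = decode_genotype("C", "A,T", "1/2")
    print(calls, zygosity(calls))  # ['A', 'T'] heterozygous
```

Zygosity depends only on whether the two allele indices are the same, so 1/2 is heterozygous even though neither allele matches the reference, while 1/1 or 2/2 at the same site would be homozygous.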

    I hope this clarifies things.

    -Sheila

  • pagarwal14 (Durham, NC; Member)

    Thanks for the clarifications. Just one more quick follow up. My understanding is probably not correct on this, but I thought if reference is C, then a genotype of A/T would be considered homozygous (since A and T pair, it does not matter whether it is an A or a T). Since there are just 3 possible genotypes, AA, AG, and GG, would A/T still be considered heterozygous?
    Thanks,

    - Pankaj