VariantAnnotator issue with trios!

Hi,

First, I want to express my appreciation for the great tools and for the support we get from the Broad Institute on this forum.

Following the Best Practices, I ran the VariantAnnotator walker (GATK 3.8) on the recalibrated VCF with the PossibleDeNovo annotation and a .ped file describing the families. However, I ran into the following two puzzling results:

  • One hiConfDeNovo variant was called in the proband even though the alternate allelic depth (AD) was 0 in the father and only 2 in the proband. When I examined the BAM files for both, the father actually had four reads carrying the alternate allele in that region, but all four had low mapping quality. As a result, his genotype changed from 1/1 before recalibration to 0/0, with an AD of 0, after recalibration.
  • A variant that had previously been found in the WES data with different tools was missing from the de novo call set. Going back to the recalibrated VCF, I found that the variant is present in the proband and absent in both parents, with good coverage in all three. Yet VariantAnnotator did not annotate it as de novo, not even as loConfDeNovo.
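
For reference, the second case (alternate allele in the child, absent from both parents, all three well covered) is the trio pattern a de novo annotation is meant to flag. The Python sketch below is a simplified check of that pattern, not GATK's PossibleDeNovo implementation: it assumes VCF-style GT strings and per-sample GQ values, the helper names are hypothetical, and the `min_gq=20` cutoff is an illustrative assumption rather than GATK's documented threshold.

```python
from typing import Optional, Sequence, Tuple


def parse_gt(gt: str) -> Optional[Tuple[int, ...]]:
    """Parse a VCF GT string such as '0/1' or '0|1'; return None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    return tuple(int(a) for a in alleles)


def is_de_novo_candidate(child_gt: str, father_gt: str, mother_gt: str,
                         gqs: Sequence[int], min_gq: int = 20) -> bool:
    """Return True if the child carries an allele seen in neither parent.

    gqs holds the (child, father, mother) genotype qualities. min_gq is a
    hypothetical, illustrative cutoff, not GATK's documented threshold.
    """
    parsed = [parse_gt(gt) for gt in (child_gt, father_gt, mother_gt)]
    if any(p is None for p in parsed):
        return False  # a no-call in any trio member: cannot judge the pattern
    if min(gqs) < min_gq:
        return False  # at least one genotype is too uncertain to trust
    child, father, mother = parsed
    parental_alleles = set(father) | set(mother)
    # De novo pattern: some allele in the child is absent from both parents.
    return any(allele not in parental_alleles for allele in child)


if __name__ == "__main__":
    print(is_de_novo_candidate("0/1", "0/0", "0/0", (60, 45, 50)))  # True
    print(is_de_novo_candidate("0/1", "0/0", "0/0", (60, 12, 50)))  # False: father's GQ below cutoff
```

A check like this only sees the final genotypes and GQs, so it cannot recover evidence the caller discarded, such as the father's low-mapping-quality reads in the first case.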

I believe the recalibration itself went well, so it looks as if VariantAnnotator is behaving very strangely.
This is the command line I used:

$ java -jar /path/to/GenomeAnalysisTK.jar -T VariantAnnotator -R /path/to/ref.fasta -V recalibratedVariants.postCGP.Gfiltered.vcf -A PossibleDeNovo -ped trios.ped -o recalibratedVariants.postCGP.Gfiltered.deNovos.vcf
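
Because a trio annotation can only apply to samples that the pedigree defines as a trio, one quick sanity check is that every child in trios.ped has father and mother IDs matching sample names in the VCF (the columns after FORMAT in the #CHROM header line). The Python sketch below assumes the standard six-column, whitespace-separated PED layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype, with 0 for an unknown parent); `parse_ped` and `find_trios` are hypothetical helper names.

```python
from typing import Iterable, List, NamedTuple, Tuple


class PedRecord(NamedTuple):
    family: str
    individual: str
    father: str     # "0" when unknown
    mother: str     # "0" when unknown
    sex: str        # 1 = male, 2 = female, other = unknown
    phenotype: str


def parse_ped(lines: Iterable[str]) -> List[PedRecord]:
    """Parse whitespace-separated PED lines, skipping blank lines and '#' comments."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"expected 6 PED columns, got {len(fields)}: {line!r}")
        records.append(PedRecord(*fields[:6]))
    return records


def find_trios(records: Iterable[PedRecord],
               vcf_samples: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (child, father, mother) IDs for trios whose three members all appear in the VCF."""
    samples = set(vcf_samples)
    return [
        (r.individual, r.father, r.mother)
        for r in records
        if {r.individual, r.father, r.mother} <= samples
    ]


if __name__ == "__main__":
    ped = ["FAM1 kid dad mom 1 2", "FAM1 dad 0 0 1 1", "FAM1 mom 0 0 2 1"]
    print(find_trios(parse_ped(ped), ["kid", "dad", "mom"]))  # [('kid', 'dad', 'mom')]
```

A family that appears in the .ped file but not in this output would point to an ID mismatch (for example a case difference) rather than to the annotation logic.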

Thank you!

Answers

  • Sheila (Broad Institute, Member, Broadie admin)

    @alphahmed
    Hi,

    Thanks for the compliments! I just let the team know.

    As for your actual question, can you post the before and after VCF records? Please include the mother, father and child genotypes.

    Thanks,
    Sheila

  • alphahmed (Japan, Member)

    @Sheila
    I've sent you the VCF records in a private message to preserve the confidentiality of the genomic data.
    Thanks again for your valuable support.

  • Sheila (Broad Institute, Member, Broadie admin)
    Accepted Answer

    @alphahmed
    Hi,

    Okay. I will message you there then.

    Sheila

  • alphahmed (Japan, Member)

    Thank you so much Sheila!

    Your answer made it very clear that the algorithm was doing its best to predict the de novo mutations. As you suggested, providing a population-specific database for the CGP step should help the algorithm classify the variants better.

    Since I've already found two databases specific to the samples' population, can you please point me to any resources on building the VCF database that CGP needs?

    Best regards,
    Ahmed

  • Sheila (Broad Institute, Member, Broadie admin)

    @alphahmed
    Hi Ahmed,

    What format are the files in now? You may find VariantsToVCF useful.

    -Sheila

  • alphahmed (Japan, Member)

    Thanks again Sheila!

    The files are in .tab and .tsv formats. No individual genotypes are provided, only frequencies.

    VariantsToVCF looks like a very powerful tool; however, can I use it with a .tsv file?

    I learned that raw HapMap is a table format. If the columns in my .tab file are in a different order, do I need to rearrange them with a script so the file matches the raw HapMap format exactly before I can run VariantsToVCF? And if not all fields are provided, would frequencies alone be enough? In short, which pieces of information do recalibration and CGP actually use?

    -Ahmed

  • alphahmed (Japan, Member)

    Any news for me Sheila?

    I've looked at the different files used in the recalibration and CGP steps, and after revisiting Doc #1259 it seems that the GATK walkers only care about the positions and alternate alleles of the SNPs. Is that right?

    If you can confirm that, I'll just use my Perl skills to complete the task.

    Thanks!
    Ahmed
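
The conversion discussed in this thread could also be scripted directly. The Python sketch below turns a tab-separated frequency table into a sites-only VCF, looking columns up by header name so their order in the .tab/.tsv file does not matter. It assumes a header row with columns named chrom, pos, ref, alt and af (matched case-insensitively), one alternate allele per row, and 1-based positions. The function name and the optional `n_chromosomes` parameter (used to derive AC/AN from AF, assuming every site was typed in the same number of chromosomes) are hypothetical, and whether CGP reads AF or AC/AN from a supporting callset should be confirmed in its documentation.

```python
import csv
from typing import Iterable, List, Optional

# Columns this sketch expects in the source table (matched case-insensitively).
REQUIRED = ("chrom", "pos", "ref", "alt", "af")


def tsv_to_sites_vcf(lines: Iterable[str],
                     n_chromosomes: Optional[int] = None) -> List[str]:
    """Convert a tab-separated allele-frequency table into sites-only VCF lines."""
    reader = csv.DictReader(lines, delimiter="\t")
    # Map lower-cased column names to the names used in the file, so the
    # column order (and capitalisation) of the input does not matter.
    header = {name.strip().lower(): name for name in (reader.fieldnames or [])}
    missing = [col for col in REQUIRED if col not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    out = [
        "##fileformat=VCFv4.2",
        '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency from the source table">',
    ]
    if n_chromosomes is not None:
        # Hypothetical option: AC/AN derived from AF with a constant AN.
        out += [
            '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count, round(AF * AN)">',
            '##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles">',
        ]
    out.append("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO")

    for row in reader:
        vals = {col: row[header[col]].strip() for col in REQUIRED}
        af = float(vals["af"])
        info = f"AF={af:g}"
        if n_chromosomes is not None:
            info += f";AC={round(af * n_chromosomes)};AN={n_chromosomes}"
        # ID, QUAL and FILTER are left as "." in a sites-only file.
        out.append("\t".join([
            vals["chrom"], str(int(vals["pos"])), ".",
            vals["ref"].upper(), vals["alt"].upper(), ".", ".", info,
        ]))
    return out


if __name__ == "__main__":
    table = ["POS\tCHROM\tREF\tALT\tAF", "12345\tchr1\ta\tg\t0.05"]
    for line in tsv_to_sites_vcf(table, n_chromosomes=200):
        print(line)
```

The output is not sorted and has no ##contig lines; GATK generally expects a coordinate-sorted, indexed VCF whose contigs match the reference's sequence dictionary, so those steps would still be needed.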
