How to identify de novo mutations in the child compared with the parents?

deepue (Helsinki, Member)
edited May 2015 in Ask the GATK team


I am new to NGS analysis and have been following the pipeline recommended in many posts on the online forums. I have three samples (one child and two parents) and have completed the analysis up to generating VCF files with HaplotypeCaller. I would like to find de novo mutations in the child. Is it better to identify de novo mutations before or after annotation? Please advise me on how to proceed.

Note: This question has been asked in another forum but I didn't get suggestions on the GATK functions.



Best Answer


  • deepue (Helsinki, Member)

    Hi @Sheila

    Thank you for your quick suggestion.

    I have a few questions at this point, after going through the pipeline you mentioned.
    1. It was mentioned that this workflow works well for cohort analysis. Can we use it for a single family (trio data)?
    2. How do I create a .ped file? I have gone through the posts and understand its format. As only the first 6 columns are required, can we create the file by typing it manually? Could you please attach a sample ped file for one family?
    3. I get a runtime error when using GenotypeGVCFs:

    The list of input alleles must contain <NON_REF> as an allele but that is not the case at position 13273; please use the Haplotype Caller with gVCF output to generate appropriate records

    I used the HaplotypeCaller parameters mentioned in other posts in this forum:
    --emitRefConfidence GVCF \
    --variant_index_type LINEAR \
    --variant_index_parameter 128000 \
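
GenotypeGVCFs expects every input record to carry the `<NON_REF>` symbolic allele that HaplotypeCaller writes in GVCF mode, so one quick sanity check is to count the records that contain it in each input file. The following is a minimal sketch, not an official GATK utility: the helper name `count_non_ref` and the two tiny demo files are made up for illustration and should be replaced with your own per-sample files.

```shell
#!/bin/sh
# Demo inputs (made-up records, NOT real data): one GVCF-style file and one plain VCF.
tmp=$(mktemp -d)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild\nchr1\t13273\t.\tG\tC,<NON_REF>\t50\t.\t.\tGT\t0/1\n' > "$tmp/child.g.vcf"
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tdad\nchr1\t13273\t.\tG\tC\t50\t.\t.\tGT\t0/1\n' > "$tmp/dad.vcf"

# Hypothetical helper: count data records (non-header lines) whose
# ALT column (column 5) mentions the <NON_REF> symbolic allele.
count_non_ref() {
    awk -F'\t' '!/^#/ && $5 ~ /<NON_REF>/ { n++ } END { print n + 0 }' "$1"
}

for f in "$tmp/child.g.vcf" "$tmp/dad.vcf"; do
    echo "$(basename "$f"): $(count_non_ref "$f") records with <NON_REF>"
done
```

A count of 0 for a file suggests it is a regular VCF rather than a GVCF. In that case, rerunning HaplotypeCaller with the GVCF options on that sample would be the likely next step.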


  • Sheila (Broad Institute; Member, Broadie, admin)


    1) Yes, you can use the genotype refinement workflow on a trio.
    2) This should help with formatting/examples: You can create your own ped file manually or by using vcftools.
    3) Can you post the exact command you are running with GenotypeGVCFs? Also, please post an example record at that position for one of the samples.
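
For a single trio, the PED file is small enough to type by hand. The following is a minimal sketch, assuming hypothetical IDs (`FAM1`, `child`, `dad`, `mom`) that would need to be replaced with the sample names in your own VCF header. The sex and phenotype values are placeholders as well.

```shell
#!/bin/sh
# Write a 3-line, tab-separated PED file for one trio.
# All IDs below are hypothetical; individual IDs should match the VCF sample names.
tmp=$(mktemp -d)
ped="$tmp/trio.ped"

# Columns: family ID, individual ID, father ID, mother ID, sex, phenotype
#   father/mother ID: 0 = parent not in the file (founder)
#   sex:              1 = male, 2 = female, other = unknown
#   phenotype:        -9 = missing, 1 = unaffected, 2 = affected
{
    printf 'FAM1\tchild\tdad\tmom\t0\t-9\n'
    printf 'FAM1\tdad\t0\t0\t1\t-9\n'
    printf 'FAM1\tmom\t0\t0\t2\t-9\n'
} > "$ped"

cat "$ped"
```

The relationships are carried entirely by the father ID and mother ID columns of the child's row, so no separate relationship file is needed.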


  • deepue (Helsinki, Member)



    The vcftools option takes only one input file. Could you please help me pass the trio data and the relationships between the samples to create a ped file?

    This is the command I ran:

    java -Xmx2g -jar $GTKPATH/GenomeAnalysisTK.jar \
    -T GenotypeGVCFs \
    -R $REFPATH/hg19.fa \
    --variant $TRIPATH/child.output.raw.snps.indels.vcf \
    --variant $TRIPATH/dad.output.raw.snps.indels.vcf \
    --variant $TRIPATH/mom.output.raw.snps.indels.vcf \
    -o $TRIPATH/tri.genotyped.vcf

    Sorry, I couldn't get the record at that position. I am new to the Unix environment; could you please suggest an easier way to get it?
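
One way to pull out the record at a given position is to filter on the POS column (column 2) of the VCF with `awk`. The following is a minimal sketch: the helper name `records_at` and the demo file are made up for illustration. Because the error message gives no chromosome, it matches the position on any contig.

```shell
#!/bin/sh
# Demo input (made-up records, NOT real data); replace "$vcf" with a real per-sample file.
tmp=$(mktemp -d)
vcf="$tmp/child.demo.vcf"
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild\nchr1\t13116\t.\tT\tG\t80\t.\t.\tGT\t0/1\nchr1\t13273\t.\tG\tC\t50\t.\t.\tGT\t0/1\n' > "$vcf"

# Hypothetical helper: print every data record (non-header line)
# whose POS column equals the requested position.
# Usage: records_at FILE POSITION
records_at() {
    awk -F'\t' -v pos="$2" '!/^#/ && $2 == pos' "$1"
}

records_at "$vcf" 13273
```

Running the same call on each of the three input files shows how the site is represented in each sample, which is the information requested above.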


  • Sheila (Broad Institute; Member, Broadie, admin)


    The 1000 Genomes Project has an online VCF to PED converter. I think this may be easier to use.

