trio pipeline

Dear friends,
I am analyzing a trio.
I have followed the pipeline described in Van der Auwera et al. 2013
on each person individually, up to HaplotypeCaller and VariantRecalibrator.

Is there a pipeline I can follow to put the data together and identify disease variants in the affected child (de novo or inherited)?
Thank you, Vittoria

Answers

  • Sheila, Broad Institute, Member, Broadie admin

    @vittoria
    Hi Vittoria,

    You may find the Genotype Refinement Workflow useful. You can find more information in the presentations section and Methods and Algorithms section.

    -Sheila
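
For orientation, the Genotype Refinement Workflow mentioned above can be sketched as three GATK3 commands run on a joint trio VCF. This is a minimal sketch, not the official pipeline: the function name `refine_trio_genotypes`, the output prefix and the file names are placeholders, and the `GQ < 20.0` threshold is the value commonly shown in the GATK3 documentation rather than a tuned setting.

```shell
#!/bin/sh
# Sketch of the GATK3 Genotype Refinement steps for a trio.
# All paths below are placeholders (assumptions); adjust them to your data.
GATK_JAR=GenomeAnalysisToolkit.jar
REF=human_g1k_v37_decoy.fasta
SUPPORTING=1000G_phase3_v4_20130502.sites.vcf
PED=trio.ped

refine_trio_genotypes() {
    in_vcf=$1    # joint VCF holding father, mother and child
    prefix=$2    # hypothetical output prefix

    # 1. Recompute genotype posteriors from population and pedigree priors.
    java -jar "$GATK_JAR" -T CalculateGenotypePosteriors -R "$REF" \
        --supporting "$SUPPORTING" -ped "$PED" \
        -V "$in_vcf" -o "$prefix.postCGP.vcf" || return 1

    # 2. Flag low-confidence genotypes (they are tagged, not removed).
    java -jar "$GATK_JAR" -T VariantFiltration -R "$REF" \
        -V "$prefix.postCGP.vcf" \
        -G_filter "GQ < 20.0" -G_filterName lowGQ \
        -o "$prefix.postCGP.Gfiltered.vcf" || return 1

    # 3. Annotate sites that look like possible de novo mutations.
    java -jar "$GATK_JAR" -T VariantAnnotator -R "$REF" \
        -V "$prefix.postCGP.Gfiltered.vcf" \
        -A PossibleDeNovo -ped "$PED" \
        -o "$prefix.postCGP.Gfiltered.deNovos.vcf"
}
```

Calling `refine_trio_genotypes triorecalibratedVariants.vcf trio` would run the three steps in order and stop at the first one that fails.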

  • Thank you, Sheila.

    I understand that the first step is running

    java -jar GenomeAnalysisToolkit.jar -R human_g1k_v37_decoy.fasta -T CalculateGenotypePosteriors --supporting 1000G_phase3_v4_20130502.sites.vcf -ped trio.ped -V recalibratedVariants.vcf -o recalibratedVariants.postCGP.vcf

    I understand that recalibratedVariants.vcf combines the variants of the father, the mother and the child,
    i.e. triorecalibratedVariants.vcf,
    but
    I have produced recalibratedVariants.vcf for the father, mother and child separately.

    How do I put them together?
    fatherrecalibratedVariants.vcf, motherrecalibratedVariants.vcf, childrecalibratedVariants.vcf -->
    triorecalibratedVariants.vcf

    Also, where can I find a .ped file that I can modify with my data?

    Thank you,

    Vittoria

  • Dear Sheila, thank you.

    I understand that I have to run something like

    java -jar GenomeAnalysisToolkit.jar -R human_g1k_v37_decoy.fasta -T CalculateGenotypePosteriors --supporting 1000G_phase3_v4_20130502.sites.vcf -ped trio.ped -V recalibratedVariants.vcf -o recalibratedVariants.postCGP.vcf

    I assume that recalibratedVariants.vcf contains the variants of the father, the mother and the child.
    I have 3 recalibrated VCF files: one for the father, one for the mother and one for the child.
    How do I put the files together?
    Second:
    How do I get an example of trio.ped? I would like to modify one.

    Thank you very much,

    Vittoria

    Issue · GitHub by shlee: Issue Number 2305, State closed, Closed By chandrans
  • shlee, Cambridge, Member, Broadie ✭✭✭✭✭
    edited July 2017

    Hi @vittoria,

    Sheila is away at a workshop and can follow up on your questions when she is back in Boston next week.

    In the meantime, this thread and this thread may be of interest.

  • Sheila, Broad Institute, Member, Broadie admin

    @vittoria
    Hi Vittoria,

    For the trio VCF, we recommend running the GVCF workflow. However, if you must, you can simply use CombineVariants to combine the three VCFs into one VCF.

    For creating the ped file, have a look here.

    -Sheila
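
To make the two suggestions above concrete, here is a minimal sketch. The PED layout follows the standard six-column pedigree format (family ID, individual ID, father ID, mother ID, sex, phenotype); the IDs `FAM1`, `father`, `mother` and `child`, the child's sex, and all file names are placeholders, and the individual IDs have to match the sample names in the VCF header. The function names `joint_call_trio` and `combine_trio_vcfs` are hypothetical wrappers around GATK3 commands.

```shell
#!/bin/sh
# Example trio.ped (tab-separated). Columns:
#   family  individual  father  mother  sex(1=male,2=female)  phenotype(1=unaffected,2=affected)
# A parent ID of 0 means "not in this pedigree". All IDs are placeholders.
printf 'FAM1\tfather\t0\t0\t1\t1\n'            > trio.ped
printf 'FAM1\tmother\t0\t0\t2\t1\n'           >> trio.ped
printf 'FAM1\tchild\tfather\tmother\t1\t2\n'  >> trio.ped

GATK_JAR=GenomeAnalysisToolkit.jar
REF=human_g1k_v37_decoy.fasta

# Recommended route: per-sample GVCFs, then joint genotyping (GATK3 syntax).
joint_call_trio() {
    for s in father mother child; do
        java -jar "$GATK_JAR" -T HaplotypeCaller -R "$REF" \
            -I "$s.bam" --emitRefConfidence GVCF -o "$s.g.vcf" || return 1
    done
    java -jar "$GATK_JAR" -T GenotypeGVCFs -R "$REF" \
        --variant father.g.vcf --variant mother.g.vcf --variant child.g.vcf \
        -o trio.vcf
}

# Fallback: merge three already-called single-sample VCFs.
# Sites missing from one sample come out as no-calls (./.), not hom-ref.
combine_trio_vcfs() {
    java -jar "$GATK_JAR" -T CombineVariants -R "$REF" \
        --variant fatherrecalibratedVariants.vcf \
        --variant motherrecalibratedVariants.vcf \
        --variant childrecalibratedVariants.vcf \
        -o triorecalibratedVariants.vcf
}
```

The no-call behaviour in the fallback is the main reason the GVCF route is preferred for trios: without reference confidence, a variant missing from a parent's VCF cannot be distinguished from a site that simply was not covered.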

  • Dear friends, I generated a joint VCF file for a trio, but I have problems visualizing it with IGV. Where can I find a sample file to see how a correct file should appear in IGV? Where in each variant record is the information about identity (son, father or mother)? I do not understand how to see whether a mutation is present in the son but not in the parents, and so on.
    Thank you, Maria Vittoria

  • Sheila, Broad Institute, Member, Broadie admin

    @vittoria
    Hi Maria,

    I think you will find the Variant Discovery tutorial here helpful.

    -Sheila
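
On the question of where sample identity lives in a joint VCF: each sample is a column after FORMAT on the `#CHROM` header line, and the `GT` subfield in that column is the sample's genotype at the site. The sketch below builds a toy three-sample VCF (made-up records, with assumed sample names `father`, `mother` and `child`) and lists sites where both parents are homozygous reference and the child carries a non-reference allele. It is a naive genotype-only screen for illustration, not a replacement for the Genotype Refinement Workflow.

```shell
#!/bin/sh
# Toy joint VCF with made-up records; real files come from joint genotyping.
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tfather\tmother\tchild\n'
    printf '1\t1000\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\t0/1\n'
    printf '1\t2000\t.\tC\tT\t50\tPASS\t.\tGT\t0/0\t0/0\t0/1\n'
    printf '1\t3000\t.\tG\tA\t50\tPASS\t.\tGT\t0/1\t0/1\t1/1\n'
} > toy_trio.vcf

awk -F'\t' -v dad=father -v mom=mother -v kid=child '
    # Return the GT subfield of a sample column, treating phased "|" like "/".
    function gt(field,   parts) {
        split(field, parts, ":")
        gsub(/\|/, "/", parts[1])
        return parts[1]
    }
    /^##/     { next }                                      # skip meta lines
    /^#CHROM/ { for (i = 10; i <= NF; i++) col[$i] = i; next }  # map sample name to column
    {
        d = gt($(col[dad])); m = gt($(col[mom])); k = gt($(col[kid]))
        # Both parents hom-ref, child called and not hom-ref.
        if (d == "0/0" && m == "0/0" && k != "0/0" && k != "./." && k != ".")
            print $1, $2, $4, $5, k
    }
' toy_trio.vcf > denovo_candidates.txt

cat denovo_candidates.txt
```

In the toy data only the site at position 2000 passes: the variant at 1000 is shared with the father, and the one at 3000 is inherited from both parents.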

  • artitandon, Member ✭✭

    I have a trio whole-exome data set. Based on the VQSR documentation, it seems that 3 samples are too few to run VQSR. Is that correct? What is the best option in this case?

  • Sheila, Broad Institute, Member, Broadie admin

    @artitandon
    Hi,

    Yes, 3 exome samples are not enough for VQSR. We recommend using at least 30 exome samples. You can try hard filtering.

    -Sheila
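
As a rough illustration of the hard-filtering route, the sketch below uses GATK3 syntax to split SNPs and indels and then apply fixed thresholds. The expressions are the generic starting points given in the GATK hard-filtering documentation and would normally be tuned per data set; the function name `hard_filter_trio`, the filter names and the file names are placeholders.

```shell
#!/bin/sh
# Sketch of GATK3 hard filtering for a small cohort where VQSR is not an option.
# Paths are placeholders; thresholds are generic starting points, not tuned values.
GATK_JAR=GenomeAnalysisToolkit.jar
REF=human_g1k_v37_decoy.fasta
SNP_FILTER='QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0'
INDEL_FILTER='QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0'

hard_filter_trio() {
    in_vcf=$1    # joint trio VCF straight from GenotypeGVCFs

    # SNPs: select, then mark records that fail the expression.
    java -jar "$GATK_JAR" -T SelectVariants -R "$REF" -V "$in_vcf" \
        -selectType SNP -o raw_snps.vcf || return 1
    java -jar "$GATK_JAR" -T VariantFiltration -R "$REF" -V raw_snps.vcf \
        --filterExpression "$SNP_FILTER" --filterName snp_hard_filter \
        -o filtered_snps.vcf || return 1

    # Indels: same pattern with indel-specific thresholds.
    java -jar "$GATK_JAR" -T SelectVariants -R "$REF" -V "$in_vcf" \
        -selectType INDEL -o raw_indels.vcf || return 1
    java -jar "$GATK_JAR" -T VariantFiltration -R "$REF" -V raw_indels.vcf \
        --filterExpression "$INDEL_FILTER" --filterName indel_hard_filter \
        -o filtered_indels.vcf
}
```

Failing records stay in the output with the filter name in the FILTER column, so they can still be inspected later.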

  • Hi,

    I am struggling to understand the best practice for analyzing a WES trio, as I want to do it with the new GATK4 tools. Could you correct my method and suggest the best options? Let's see:

    First, data pre-processing starting from FASTQ: bwa --> samtools sort --> Picard (mark duplicates and fix mates). Once these steps were complete, I wanted to realign and recalibrate, using RealignerTargetCreator and IndelRealigner in GATK with the dbSNP and gold-standard indel resources, and then BaseRecalibrator with dbSNP, but I found

    https://gatkforums.broadinstitute.org/gatk/discussion/11455/realignertargetcreator-and-indelrealigner?

    where it says to skip those and go straight to BQSR with the new ApplyBQSR tool (in place of PrintReads) and then call HaplotypeCaller...

    Here I am getting confused: should I go straight from the Picard step to HaplotypeCaller in ERC GVCF mode and then use GenomicsDBImport to pass the data to GenotypeGVCFs for joint calling? If not, what would be the best practice?

    I also read https://gatkforums.broadinstitute.org/gatk/discussion/3893/calling-variants-on-cohorts-of-samples-using-the-haplotypecaller-in-gvcf-mode but I can't work out the best way to put it all together.
    Thank you in advance!

  • Sheila, Broad Institute, Member, Broadie admin

    @MasMarius
    Hi,

    Perhaps the Best Practices will help you.

    "Here I am getting confused, should I go straight after the Picard step to HaplotypeCaller in ERC GVCF mode and then use GenomicsDBImport to pass to GenotypeGVCFs for joint calling?"

    You should run BQSR before variant calling. The slide decks in the Presentations section will help with the workflow.

    -Sheila
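
Putting the advice above together, a GATK4 version of the trio workflow might look roughly like the sketch below: BQSR on each sample, HaplotypeCaller in GVCF mode, then GenomicsDBImport and GenotypeGVCFs for joint calling. File names, the sample list, the interval file and the function names `call_sample_gvcf` and `joint_genotype_trio` are assumptions for illustration. GenomicsDBImport needs intervals via `-L`, and how many intervals it accepts per run has varied between GATK4 releases, so check the tool documentation for your version.

```shell
#!/bin/sh
# Sketch of a GATK4 trio workflow after duplicate marking.
# All paths and sample names are placeholders (assumptions).
REF=reference.fasta
KNOWN_SITES=dbsnp.vcf.gz
INTERVALS=exome_targets.interval_list
SAMPLES="father mother child"

# Per sample: BQSR, then HaplotypeCaller in GVCF mode.
call_sample_gvcf() {
    s=$1
    gatk BaseRecalibrator -R "$REF" -I "$s.dedup.bam" \
        --known-sites "$KNOWN_SITES" -O "$s.recal.table" || return 1
    gatk ApplyBQSR -R "$REF" -I "$s.dedup.bam" \
        --bqsr-recal-file "$s.recal.table" -O "$s.recal.bam" || return 1
    gatk HaplotypeCaller -R "$REF" -I "$s.recal.bam" \
        -L "$INTERVALS" -ERC GVCF -O "$s.g.vcf.gz"
}

# Across the trio: consolidate GVCFs, then joint-genotype.
joint_genotype_trio() {
    gatk GenomicsDBImport \
        -V father.g.vcf.gz -V mother.g.vcf.gz -V child.g.vcf.gz \
        --genomicsdb-workspace-path trio_db -L "$INTERVALS" || return 1
    gatk GenotypeGVCFs -R "$REF" -V gendb://trio_db -O trio.vcf.gz
}
```

A typical driver would be `for s in $SAMPLES; do call_sample_gvcf "$s" || exit 1; done; joint_genotype_trio`, followed by filtering of the resulting `trio.vcf.gz`.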

  • @Sheila
    Going to try it, really appreciate it!
