Combining variants from different WES capture types

jpflorido (Seville, Member)

Hi there!
I've searched the GATK forum without success on the following topic. I have a set of WES samples (around 110 in total), all from a specific population. The aim of the project is to study population genetic variation. All samples have been processed with GATK 4.1.2. The issue is that I have two subsets of samples, each generated with a different capture technology.

I'm not sure how to proceed to study variants across the whole set, since we want to reduce the batch effect as much as possible. So far I've done the following: gVCF files were generated for each sample, and then a joint analysis was run on all gVCF files (GenomicsDBImport followed by GenotypeGVCFs). I'm not sure this approach is the best one, since it is the same as assuming a single capture technology. For GenomicsDBImport, the intervals used were all chromosomes, although another option would be to build the database using only the regions given by the intersection of the two capture BEDs.
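To make the "intersection of the two capture BEDs" idea concrete, here is a minimal Python sketch of how the shared target regions could be computed. It is an illustration only, not part of the original post: the function names and the toy intervals for `kit_a` and `kit_b` are hypothetical, and it assumes standard 0-based, half-open BED coordinates with matching chromosome names in both files.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    by_chrom = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))
    return by_chrom


def merge(intervals):
    """Sort intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a, b):
    """Intersect two sorted, merged interval lists with a two-pointer sweep."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def intersect_beds(bed_a, bed_b):
    """Return {chrom: [(start, end), ...]} covered by both capture designs."""
    a, b = read_bed(bed_a), read_bed(bed_b)
    result = {}
    for chrom in sorted(set(a) & set(b)):
        shared = intersect(merge(a[chrom]), merge(b[chrom]))
        if shared:
            result[chrom] = shared
    return result


# Hypothetical toy targets standing in for the two capture kits.
kit_a = ["chr1\t100\t200", "chr1\t300\t400", "chr2\t50\t150"]
kit_b = ["chr1\t150\t350", "chr2\t200\t300"]

shared = intersect_beds(kit_a, kit_b)
for chrom, intervals in shared.items():
    for start, end in intervals:
        print(f"{chrom}\t{start}\t{end}")
```

In practice `read_bed` would be given open file handles for the two capture BED files, and the printed lines could be saved as a new BED file to supply as the interval list (`-L`) when building the database. Whether restricting to the intersection actually reduces the batch effect is the open question of this thread; the sketch only shows how the shared regions could be derived.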

Another approach would be to perform joint variant calling separately for each subset and then combine the results somehow (I'm not sure how), again using the intersection of the capture BEDs, but this might introduce an even worse batch effect.

Any suggestions?
Thanks,
Javier

Answers

  • Tiffany_at_Broad (Cambridge, MA; Member, Administrator, Broadie, Moderator)

    Hi @jpflorido ,

    The GATK support team is focusing on resolving questions about GATK tool-specific errors and abnormal results from GATK tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.

    Please continue to post your questions because we will be mining them for improvements to documentation, resources, and the tools.

    We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.

    For context, see this announcement and check out our support policy.
