SelectVariants: exclude non-variant sites after GenotypeGVCFs

Hello GATK team,

I merged 40 individual GVCF files with CombineGVCFs and then genotyped them with GenotypeGVCFs.
After that, I want to filter out sites that have no SNP or indel in any of the samples.
Is SelectVariants the appropriate tool for this?
Will it keep sites where some samples have no data (no-call) while other samples are homozygous?
What does SelectVariants --exclude-non-variants base its decision on? Does it depend on whether an ALT allele is present at the site?

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)
    Accepted Answer

    @tytolin
    Hi,

    You should be able to use SelectVariants with --max-nocall-number.

    -Sheila
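As I understand the SelectVariants documentation, --max-nocall-number sets the maximum number of samples that may be filtered (set to no-call) at a site before the site is dropped. The Python sketch below illustrates that idea only. The helpers `count_nocalls` and `filter_by_max_nocall` are hypothetical, and it assumes that only fully missing GT values count as no-calls. Genotype-level filters (FT) are not modeled, and GATK may treat them differently.

```python
# Hypothetical helper, NOT a GATK API: count no-call genotypes at a site and
# drop sites with more than max_nocall of them. Assumption: only fully missing
# GT values ('./.', '.|.', '.') count; FT-filtered genotypes are not considered.

def count_nocalls(vcf_line):
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
    nocalls = 0
    for sample in fields[9:]:
        values = sample.split(":")
        gt = values[gt_index] if gt_index < len(values) else "."
        if all(a == "." for a in gt.replace("|", "/").split("/")):
            nocalls += 1
    return nocalls


def filter_by_max_nocall(lines, max_nocall):
    """Keep header lines and sites with at most max_nocall no-call samples."""
    return [line for line in lines
            if line.startswith("#") or count_nocalls(line) <= max_nocall]


site = "chr1\t300\t.\tG\tA\t40\tPASS\t.\tGT:DP\t./.:0\t0/1:12\t./.:0"
print(count_nocalls(site))                   # 2
print(len(filter_by_max_nocall([site], 1)))  # 0 (dropped)
print(len(filter_by_max_nocall([site], 2)))  # 1 (kept)
```

Setting the threshold to 0 in this sketch keeps only sites where every sample has a call.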

  • tytolin (Member)

    @Sheila Thanks,

    So does this mean that GenotypeGVCFs removes all the non-variant sites present in the CombineGVCFs output?
    I ask because I noticed that the file size shrinks dramatically after running GenotypeGVCFs.

    I have 4 groups of birds, and each group contains 10 birds. After calling a GVCF for every bird, I merged each group separately with CombineGVCFs.
    Can I merge two of these groups with CombineGVCFs, or do I need to merge the 20 samples together in one step for my analysis?

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)
    edited August 2018

    @tytolin
    Hi,

    > So does this mean that GenotypeGVCFs removes all the non-variant sites present in the CombineGVCFs output?

    Yes. If you need the non-variant sites, you will need to use GATK3, as that functionality is not yet available in GATK4. However, there are plans to port it over very soon.

    > I have 4 groups of birds, and each group contains 10 birds. After calling a GVCF for every bird, I merged each group separately with CombineGVCFs. Can I merge two of these groups with CombineGVCFs, or do I need to merge the 20 samples together in one step for my analysis?

    I am a bit confused. Can you tell me more about your end goal? Do you have BAM files for every single bird, or do you have one BAM file for each group?

    Thanks,
    Sheila
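One way to see why the GenotypeGVCFs output is so much smaller than the combined GVCF is to count how many GVCF records are reference-confidence blocks. In GATK GVCFs these blocks carry only the symbolic `<NON_REF>` allele in ALT. The Python sketch below is a rough illustration, not what GenotypeGVCFs does internally, and its helper names and example records are hypothetical. A single block line can also span many bases via END, and a record with a real ALT allele may still be dropped if no sample is genotyped as non-reference, so these counts are only an approximation.

```python
# Rough sketch, NOT GenotypeGVCFs itself: classify GVCF data lines into
# reference-confidence blocks (ALT holds only the symbolic <NON_REF> allele)
# and records that carry at least one real ALT allele.

def is_reference_block(gvcf_line):
    alt = gvcf_line.split("\t")[4]  # ALT is the 5th VCF column
    return all(a == "<NON_REF>" for a in alt.split(","))


def summarize(lines):
    counts = {"reference_blocks": 0, "variant_records": 0}
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        key = "reference_blocks" if is_reference_block(line) else "variant_records"
        counts[key] += 1
    return counts


ref_block = "chr1\t1000\t.\tT\t<NON_REF>\t.\t.\tEND=1500\tGT:DP\t0/0:20"
snp = "chr1\t1501\t.\tC\tA,<NON_REF>\t60\t.\t.\tGT:DP\t0/1:18"
print(summarize(["##fileformat=VCFv4.2", ref_block, snp]))
# {'reference_blocks': 1, 'variant_records': 1}
```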
