Missing Genotypes and QC

Maybe this is an obvious question, but if standard practice is to filter out individuals with more than a certain percentage of missing genotypes, isn't that something of a conundrum, since their genotype data was already used for joint genotyping? Would you have to go back and redo the joint genotyping without the individuals with excess missing genotypes?

I assume this is really just an issue at the threshold level (i.e., a few SNPs pushing a sample over the threshold).
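
For readers who want to see what the per-sample filter in the question amounts to, here is a minimal sketch in plain Python. It is an illustration only, not a GATK or bcftools feature: the function names, the tiny in-memory VCF, and the 0.3 threshold are all made up for the example. It assumes GT is the first FORMAT field and counts a genotype as missing if any of its alleles is ".".

```python
from collections import Counter

# Hypothetical toy VCF (tab-separated), used only for illustration.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t./.\t0/0",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/0\t./.\t0/0",
    "1\t300\t.\tG\tA\t50\tPASS\t.\tGT\t1/1\t0/1\t0/0",
    "1\t400\t.\tT\tC\t50\tPASS\t.\tGT\t0/0\t0/1\t./.",
]


def per_sample_missing_rate(vcf_lines):
    """Return {sample: fraction of sites where the genotype is missing}."""
    samples = []
    missing = Counter()
    n_sites = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names start at column 10
            continue
        n_sites += 1
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[0]  # assumption: GT is the first FORMAT field
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:  # "./.", ".|." or a partially missing call
                missing[sample] += 1
    return {s: (missing[s] / n_sites if n_sites else 0.0) for s in samples}


def samples_over_threshold(rates, threshold):
    """Return the sorted names of samples whose missing rate exceeds threshold."""
    return sorted(s for s, rate in rates.items() if rate > threshold)


rates = per_sample_missing_rate(EXAMPLE_VCF)
flagged = samples_over_threshold(rates, 0.3)
```

On real call sets you would compute this with an established tool rather than by hand; the sketch only makes the threshold logic concrete.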

Answers

  • tommycarstensen, United Kingdom, Member

    @nchuang I thought about this myself. I would just remove the samples and then remove any sites at which no non-reference alleles remain. Your discarded samples might have contributed false positives (assuming they are of low quality), but this will probably not have skewed the VQSR model too much (assuming you ran it). If you have yet to run VQSR and you are removing a large fraction of samples, then you probably need to re-genotype, because the removal will affect your annotations. If you choose to remove the samples and move on without re-calling, you can drop the now-uninformative sites with either bcftools view -x or bcftools view -c 1, both of which are documented in the bcftools manual. If your samples are of good quality and simply have a lot of missing calls, then I would definitely go for this option (removing them without re-calling), because they will only have contributed positively to the joint calling. Best of luck, and please share your solution (and more details about your data in terms of coverage, sample count, sequencing platform, ethnicity, etc.) if you have time. Thanks.
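
To make the "remove the samples, then remove sites with no non-reference alleles" step concrete, here is a rough plain-Python sketch. It is not how bcftools implements it, and the function names and toy VCF are hypothetical. It assumes GT is the first FORMAT field, and it deliberately does not recompute INFO annotations such as AC or AN, which a real tool would normally update.

```python
# Hypothetical toy VCF (tab-separated), used only for illustration.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t./.\t0/0",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/0\t0/1\t0/0",
    "1\t300\t.\tG\tA\t50\tPASS\t.\tGT\t1/1\t0/1\t0/0",
    "1\t400\t.\tT\tC\t50\tPASS\t.\tGT\t0/0\t0/1\t./.",
]


def has_non_ref(call):
    """True if the genotype carries at least one non-reference allele."""
    gt = call.split(":")[0]  # assumption: GT is the first FORMAT field
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def drop_samples_and_empty_sites(vcf_lines, samples_to_drop):
    """Remove the given samples, then drop sites left without non-ref alleles.

    INFO fields are passed through unchanged (no AC/AN recalculation).
    """
    drop = set(samples_to_drop)
    keep_idx = []
    out = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            out.append(line)  # keep meta-information lines as they are
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Remember which sample columns survive the removal.
            keep_idx = [i for i, name in enumerate(fields[9:], start=9)
                        if name not in drop]
            out.append("\t".join(fields[:9] + [fields[i] for i in keep_idx]))
            continue
        calls = [fields[i] for i in keep_idx]
        if any(has_non_ref(c) for c in calls):
            out.append("\t".join(fields[:9] + calls))
    return out


filtered = drop_samples_and_empty_sites(EXAMPLE_VCF, ["S2"])
kept_positions = [l.split("\t")[1] for l in filtered if not l.startswith("#")]
```

In this toy input, the sites at positions 200 and 400 are variant only because of S2, so they disappear once S2 is dropped. For real data, a maintained tool such as bcftools or GATK SelectVariants is the safer route; the sketch only shows the logic.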

  • nchuang, Member
    edited August 2015

    I am working with ~6x data from about 100 samples sequenced on Illumina. I am doing a case/control analysis, but I only have 26 cases. If I removed the ones with 5% missing genotypes, I would lose a good number of cases, so we are going to hold on to them for now. However, I do agree that I would rather remove samples than re-genotype. It is already a nightmare juggling re-annotation with ANNOVAR for each new VCF I make, and now that I am using PLINK for association analyses, I cannot fathom how I am going to annotate those results in ANNOVAR. That is a really neat trick with bcftools. I have been using GATK SelectVariants and CombineVariants to achieve similar results.
