VQSR for multi-sample VCF

Hi,

I've been going through the VQSR documentation/guide and haven't been able to pin down how it behaves on a multi-sample VCF (generated by multi-sample calling with UnifiedGenotyper, UG).
Should VQSR be run on this file, or on each sample separately? Coverage and the other statistics used to determine the variant confidence score aren't the same for each sample, so they could lead to conflicting determinations for different samples.

What is the best way to go about this?

Many thanks.

Answers

  • mike (Member)

    Hi, Geraldine:

    I have a similar question. My VQSR step runs on a multi-sample VCF file that was generated by UnifiedGenotyper, calling variants from the pooled BAM files of many samples. After the VQSR step, the FILTER column contains either "PASS" or, for some variants, a value such as "VQSRTrancheINDEL99.00to99.90" or "VQSRTrancheINDEL90.00to99.00". Because this VCF file has multiple samples, I am trying to understand how each individual sample in the group affects the FILTER value that is finally assigned.

    For example, say the VCF file has 50 samples. At a given variant site, 25 of the samples do well on quality or whatever metrics VQSR assesses, but the other 25 do not (e.g. because of read quality or alignment issues at that site). If the variant is assigned PASS, that is reasonable for the 25 good samples but not for the other 25. Conversely, if it is assigned VQSRTrancheINDEL90.00to99.00 (not PASS), that seems unfair to the good half. Imagine we instead called variants separately on two groups, the 25 good samples and the 25 bad ones: the good group would have the variant at this site (PASS) and the bad group would be flagged as not PASS.

    So my question is: when samples are pooled together, how does VQSR decide whether to call a site PASS or non-PASS? Maybe my question is off track here, but I am just trying to understand how VQSR deals with this situation.

    Also, I noticed that many variants where one or more samples have a "./." genotype from UnifiedGenotyper tend to be flagged as non-PASS. Is that true? In other words, the variants flagged as PASS do not seem to have a ./. genotype in any of the samples in the VCF file.

    Thanks a lot!

    Mike

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Hi Mike,

    I think what's confusing you here is that you expect the VariantRecalibrator to judge whether individual sample calls are good or bad, but that's not really the case. Its main purpose is to determine, for a given site, whether there is evidence that the site is really variant in one or more samples. If the site passes the filter, it is then up to you to evaluate whether some of the samples are not really variant at that site.

    A genotype of "./." means that the caller could not decide either way whether the sample was variant or not, and so it marked it as a "no-call". This really only happens when there is no useable data for that sample. In general a high degree of missingness is a sign that the variant isn't real (in which case the variant fails the filter, and is not marked as PASS) so the correlation you've noticed is probably real.
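
The correlation between no-calls and the FILTER value can be checked directly on the recalibrated VCF. The following is a minimal sketch in plain Python, not part of GATK: the function name `no_call_fraction` and the example record are made up for illustration, and it assumes an uncompressed, tab-delimited VCF data line whose FORMAT column includes GT.

```python
def no_call_fraction(vcf_line):
    """Return (FILTER value, fraction of samples whose GT is a no-call).

    Hypothetical helper for illustration; expects one tab-delimited
    VCF data line (not a header line) with a FORMAT column containing GT.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    filter_value = fields[6]                  # column 7 is FILTER
    gt_index = fields[8].split(":").index("GT")  # column 9 is FORMAT
    samples = fields[9:]                      # per-sample columns

    missing = 0
    for sample in samples:
        gt = sample.split(":")[gt_index]
        # Treat "./.", ".|." and a bare "." as no-calls.
        alleles = gt.replace("|", "/").split("/")
        if all(allele == "." for allele in alleles):
            missing += 1
    return filter_value, missing / len(samples)


# Made-up record: four samples, two of them no-calls.
example = (
    "1\t1000\t.\tA\tAT\t50.0\tVQSRTrancheINDEL99.00to99.90\t.\t"
    "GT:GQ\t0/1:30\t./.\t./.\t0/0:12"
)
result = no_call_fraction(example)
print(result)
```

Tabulating this fraction separately for PASS and non-PASS records would show whether high missingness really does track the filtered sites in a given call set.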

  • Could I ask a follow-up question, Geraldine? You said “It is then up to you to evaluate whether some of the samples are not really variant at that site”. Could you give some suggestions on how to evaluate this, e.g. by checking GQ? Thanks.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    To be honest, evaluation is the hardest part, because there is no one-size-fits-all formula. GQ should give you a strong indication. But sometimes you might need to actually look at the pileup of bases to see what the data looks like. Hopefully, the higher your data quality is, the less you will need to go in and check things manually.
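
As a first pass before looking at pileups, the per-sample GQ values at a site can be screened programmatically. This is a rough sketch under stated assumptions, not a recommended procedure: the function name `low_gq_samples`, the example records, and the cutoff of 20 are all hypothetical, and an appropriate threshold depends on the data.

```python
def low_gq_samples(header_line, vcf_line, min_gq=20):
    """List samples whose genotype at this site has a missing or low GQ.

    Hypothetical helper for illustration. `header_line` is the "#CHROM"
    line of the VCF and `vcf_line` is one data line; `min_gq` is an
    arbitrary example cutoff, not a GATK recommendation.
    """
    names = header_line.rstrip("\n").split("\t")[9:]
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT keys, e.g. ["GT", "GQ"]

    flagged = []
    for name, sample in zip(names, fields[9:]):
        # Trailing FORMAT fields may be dropped, e.g. a bare "./.".
        values = dict(zip(keys, sample.split(":")))
        gq = values.get("GQ", ".")
        if gq == "." or int(gq) < min_gq:
            flagged.append(name)
    return flagged


# Made-up header and record with three samples.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3"
record = "1\t1000\t.\tA\tG\t80.0\tPASS\t.\tGT:GQ\t0/1:45\t0/1:7\t./."
flagged = low_gq_samples(header, record)
print(flagged)
```

Samples flagged this way are candidates for manual inspection; a low GQ only indicates an uncertain genotype, not that the site-level call is wrong.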

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    I should also say that people in the community are most welcome to share their favorite tips and tricks for evaluating variants and genotype calls. This is the part where we typically hand the data over to analysts who do the actual interpretation of the call set data, so we are not in the best position to give advice. But it's an important topic and we are more than happy to help facilitate the conversation.
