The current GATK version is 3.7-0

VQSR for multi-sample VCF

eranmick Member Posts: 1

Hi,

I've been going through the VQSR documentation/guide and haven't been able to pin down how it behaves on a multi-sample VCF (generated by multi-sample calling with UG).
Should VQSR be run on this, or on each sample separately? The coverage and other statistics used to determine the variant confidence score aren't the same for each sample, so this could lead to conflicting determinations for different samples.

What is the best way to go about this?

Many thanks.

Answers

  • mike Member Posts: 103

    Hi, Geraldine:

    I have a very similar question. My VQSR step runs on a multi-sample VCF file, which was generated by UnifiedGenotyper calling variants from the pooled BAM files of many samples. After the VQSR step, the FILTER column contains either "PASS" or, for some variants, something like "VQSRTrancheINDEL99.00to99.90" or "VQSRTrancheINDEL90.00to99.00". Since this VCF file has multiple samples, I am trying to understand how each individual sample in the group affects the final FILTER assignment.

    For example, say the VCF file has 50 samples. At a given variant site, 25 of the samples do well on quality or whatever metrics VQSR assesses, but the other 25 do not (e.g. read quality or alignment issues at that site). If the variant is assigned PASS, that seems reasonable for the 25 good samples but not for the other 25. Conversely, if the site is assigned VQSRTrancheINDEL90.00to99.00 (not PASS), that seems unfair to the good half. If we called variants on the 25 good samples and the 25 bad samples separately as two groups, the good group would have the variant at this site (PASS) and the bad group would be flagged as not PASS. So my question is: when samples are pooled together, how does VQSR decide whether to call the site PASS or non-PASS? Maybe my question is off track here, but I am just trying to understand how VQSR deals with this situation.

    Also, I noticed that many variants where one or more samples have a "./." genotype from UnifiedGenotyper tend to be flagged as non-PASS. Is that true? In other words, the variants flagged as PASS seem not to have a ./. genotype in any of the samples in the VCF file.

    Thanks a lot!

    Mike

  • Geraldine_VdAuwera Administrator, Dev Posts: 11,163

    Hi Mike,

    I think what's confusing you here is that you expect the VariantRecalibrator to judge whether individual sample calls are good or bad, but that's not really the case. Its main purpose is to determine, for a given site, whether there is evidence that the site is really variant in one or more samples. If the site passes the filter, it is then up to you to evaluate whether some of the samples are not really variant at that site.

    A genotype of "./." means that the caller could not decide either way whether the sample was variant or not, and so it marked it as a "no-call". This really only happens when there is no useable data for that sample. In general a high degree of missingness is a sign that the variant isn't real (in which case the variant fails the filter, and is not marked as PASS) so the correlation you've noticed is probably real.

    Geraldine Van der Auwera, PhD
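
The correlation between no-call genotypes and the FILTER value described above can be checked directly on a call set. Here is a minimal sketch in plain Python, assuming tab-delimited VCF records that have a GT key in the FORMAT column. The function name `missing_fraction` and the two example records are made up for illustration; they are not GATK code or GATK output.

```python
def missing_fraction(vcf_line):
    """Return (FILTER value, fraction of samples with a no-call genotype)."""
    fields = vcf_line.rstrip("\n").split("\t")
    filter_value = fields[6]                   # 7th VCF column is FILTER
    gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
    samples = fields[9:]                       # one column per sample

    missing = 0
    for entry in samples:
        genotype = entry.split(":")[gt_index]
        # Treat phased ("|") and unphased ("/") genotypes the same way.
        alleles = genotype.replace("|", "/").split("/")
        if all(allele == "." for allele in alleles):
            missing += 1
    return filter_value, missing / len(samples)


# Hypothetical records with four samples each (illustration only).
passing = "1\t1000\t.\tA\tG\t50.0\tPASS\t.\tGT:GQ\t0/1:99\t0/0:45\t1/1:60\t0/1:80"
failing = ("1\t2000\t.\tC\tCT\t12.0\tVQSRTrancheINDEL99.00to99.90\t.\tGT:GQ"
           "\t./.\t0/1:12\t./.\t./.")

for record in (passing, failing):
    print(missing_fraction(record))
```

Running this over every record of a recalibrated VCF and grouping the fractions by FILTER value would show whether highly missing sites are concentrated among the non-PASS calls in a given data set.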

  • ying_sheng_1 Member Posts: 65

    Could I ask a follow-up question, Geraldine? You said “It is then up to you to evaluate whether some of the samples are not really variant at that site”. Could you give some suggestions on how to evaluate this, e.g. by checking GQ? Thanks.

  • Geraldine_VdAuwera Administrator, Dev Posts: 11,163

    To be honest, evaluation is the hardest part, because there is no one-size-fits-all formula. GQ should give you a strong indication. But sometimes you might need to actually look at the pileup of bases to see what the data looks like. Hopefully, the higher your data quality is, the less you will need to go in and check things manually.

    Geraldine Van der Auwera, PhD
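
As a first pass before looking at pileups, the per-sample GQ values at a PASS site can be screened programmatically. The sketch below is one possible approach in plain Python, not a GATK recommendation: the function name `confident_carriers`, the sample names, the example record and the GQ cutoff of 20 are all assumptions chosen for illustration, and a suitable cutoff depends on the data.

```python
def confident_carriers(header_line, vcf_line, min_gq=20):
    """List samples with a non-reference genotype and GQ >= min_gq."""
    sample_names = header_line.rstrip("\n").split("\t")[9:]
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    gt_index = keys.index("GT")
    gq_index = keys.index("GQ")

    carriers = []
    for name, entry in zip(sample_names, fields[9:]):
        values = entry.split(":")
        alleles = values[gt_index].replace("|", "/").split("/")
        # Trailing FORMAT fields may be dropped, e.g. a bare "./." entry.
        gq = values[gq_index] if gq_index < len(values) else "."
        if gq == "." or "." in alleles:
            continue  # no-call or missing GQ: nothing to evaluate
        if any(allele != "0" for allele in alleles) and int(gq) >= min_gq:
            carriers.append(name)
    return carriers


# Hypothetical header and record (illustration only).
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4"
record = "1\t1000\t.\tA\tG\t50.0\tPASS\t.\tGT:GQ\t0/1:99\t0/0:45\t0/1:8\t./."

print(confident_carriers(header, record))            # default cutoff of 20
print(confident_carriers(header, record, min_gq=5))  # more permissive cutoff
```

Samples that carry a variant genotype but fall below the cutoff (S3 in this made-up record) are the ones worth inspecting by eye, for example in IGV.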

  • Geraldine_VdAuwera Administrator, Dev Posts: 11,163

    I should also say that people in the community are most welcome to share their favorite tips and tricks for evaluating variants and genotype calls. This is the part where we typically hand the data over to analysts who do the actual interpretation of the call set data, so we are not in the best position to give advice. But it's an important topic and we are more than happy to help facilitate the conversation.

    Geraldine Van der Auwera, PhD
