The current GATK version is 3.7-0

Stratify comp rods by sample in VariantEval?

Hi GATK Team,

I am a heavy user of the VariantEval evaluators (particularly GenotypeConcordance) and have traced some unexpected results to the Sample stratification. What is the motivation for not stratifying the comp RODs by sample? This seems to be a very conscious choice, so I was hoping to understand the background.

The relevant section of the code is:

for ( final RodBinding<VariantContext> compRod : comps ) {
    // no sample stratification for comps
    final HashMap<String, Collection<VariantContext>> compSetHash = compVCs.get(compRod);
    final Collection<VariantContext> compSet = (compSetHash == null || compSetHash.size() == 0) ? Collections.<VariantContext>emptyList() : compVCs.get(compRod).values().iterator().next();

The effect for me is that many spurious genotypes get included in cases where there is a comp variant but no eval variant.




  • ebanks (Broad Institute; Member, Broadie, Dev)

    Just so we can understand and improve the code, could you please give us a concrete example of how this hurts your analysis and why you get spurious sites included? Keep in mind that GenotypeConcordance is no longer an evaluator module - it was pulled out and released as a standalone tool a while back - so please use an example with a different evaluator. Thanks!

  • Hi Eric,

    I am using a slightly customized copy of GenotypeConcordance that explicitly tracks genotypes from non-PASSing variants. I recognize that the evaluator has been deprecated (as an aside, what motivated the transition to a stand-alone walker?), and I understand if this falls outside "Ask the Team", but I thought it was worth asking to better understand the core VariantEval walker.

    I have been running the evaluator against many multi-sample "comp" and "eval" RODs that are all technical replicates (doing an all-pairs analysis), typically stratifying by some combination of filter, type and novelty. Just recently I needed to stratify by sample as well. I look at all sites in the comp and eval RODs. Because the VariantContext extracted from the comp ROD is not split by sample when sample stratification is used, it retains all of its genotypes, which inflates the counts of comp genotypes not called in the eval ROD.

    An example with two samples:

    eval ROD:
    variant1 0/0 0/1

    comp ROD:
    variant2 0/0 0/1

    In this case, n_comp_HOM_REF_called_NO_CALL and n_comp_HET_called_NO_CALL would both be 1 for sample1 and for sample2, as opposed to 1 and 0 for sample1 and 0 and 1 for sample2.

    Does that example make sense? Basically, when there is no eval VariantContext and sample stratification is used, there are excess genotypes in the comp VariantContext.
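
The counting in the two-sample example above can be sketched in plain Java. This is only an illustration, not GATK code: the class `CompStratificationSketch`, the method `countUncalled`, and the `subsetBySample` flag are hypothetical names, and the comp record is modelled as a simple sample-to-genotype map. It assumes a site where the comp ROD has a record (variant2) and the eval ROD has none, so every comp genotype counted is paired with an eval NO_CALL.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class CompStratificationSketch {
    // Classify a diploid genotype string such as "0/0" or "0/1".
    static String type(String gt) {
        if (gt.equals("0/0")) return "HOM_REF";
        String[] alleles = gt.split("/");
        return alleles[0].equals(alleles[1]) ? "HOM_VAR" : "HET";
    }

    // Count comp genotypes at a site with no eval record (eval is NO_CALL).
    // subsetBySample == false: every genotype in the comp record is counted
    // for the stratum (the behaviour described in this thread).
    // subsetBySample == true: only the stratum's own sample is counted
    // (hypothetical per-sample subsetting of the comp record).
    static Map<String, Integer> countUncalled(Map<String, String> compGenotypes,
                                              String stratumSample,
                                              boolean subsetBySample) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, String> e : compGenotypes.entrySet()) {
            if (subsetBySample && !e.getKey().equals(stratumSample)) continue;
            String key = "n_comp_" + type(e.getValue()) + "_called_NO_CALL";
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // variant2 from the comp ROD in the example above.
        Map<String, String> variant2 = new LinkedHashMap<>();
        variant2.put("sample1", "0/0");
        variant2.put("sample2", "0/1");
        for (String sample : variant2.keySet()) {
            System.out.println(sample + " unsplit: " + countUncalled(variant2, sample, false));
            System.out.println(sample + " split:   " + countUncalled(variant2, sample, true));
        }
    }
}
```

With the unsplit comp record, both counters are incremented in every sample stratum. With per-sample subsetting, each stratum only sees its own genotype, which matches the counts the poster expected.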


  • ebanks (Broad Institute; Member, Broadie, Dev)

    Hey Michael,

    The motivation for moving the GenotypeConcordance evaluator into its own standalone tool was precisely the one you bring up in this thread: it had adverse interactions with some of the stratifications in certain contexts/situations. So instead of managing that headache (poorly), we decided to be safe and move it out.

    I'm 99% sure that Chris (the author of the new and improved GenotypeConcordance tool) took cases like yours into account when implementing it (and a cursory review of the code seems to confirm it). Could you try rerunning your analysis with it instead of VariantEval? And please continue to give such helpful feedback.
