
VariantEval Structural Variation

GrantMarshall Member Posts: 6

Hi,

I just finished running a fairly large number of WGS samples through HaplotypeCaller, and I've been using VariantEval to look at summary stats for them. Under '#:GATKTable:VariantSummary:1000 Genomes Phase I summary of variants table' there is a set of structural variation columns, and one of my samples apparently has about 3,300 SVs. Here is the relevant section of the table:

#:GATKTable:20:3:%s:%s:%s:%s:%s:%d:%d:%d:%.2f:%s:%d:%.2f:%.1f:%d:%s:%d:%.1f:%d:%s:%d:;
#:GATKTable:VariantSummary:1000 Genomes Phase I summary of variants table
VariantSummary  CompRod  EvalRod  JexlExpression  Novelty  nSamples  nProcessedLoci  nSNPs    TiTvRatio  SNPNoveltyRate  nSNPsPerSample  TiTvRatioPerSample  SNPDPPerSample  nIndels  IndelNoveltyRate  nIndelsPerSample  IndelDPPerSample  nSVs  SVNoveltyRate  nSVsPerSample
VariantSummary  dbsnp    vcf1     none            all             1      3095693981  3446166       2.08            1.34         3446166                2.08             0.0   962028             15.33            962028               0.0  3282          73.58           3282
VariantSummary  dbsnp    vcf1     none            known           1      3095693981  3399907       2.08            0.00         3399907                2.08             0.0   814506              0.00            814506               0.0   867           0.00            867
VariantSummary  dbsnp    vcf1     none            novel           1      3095693981    46259       1.71          100.00           46259                1.71             0.0   147522            100.00            147522               0.0  2415         100.00           2415
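
For reference, the SV columns can be pulled out of this table with a few lines of Python. This is only a rough sketch: `parse_gatk_table` is a helper name made up for illustration, and it assumes every cell is free of whitespace, which holds for the rows above but is not guaranteed for GATKReport tables in general.

```python
# The VariantSummary table from the post, with runs of spaces collapsed.
REPORT = """\
#:GATKTable:20:3:%s:%s:%s:%s:%s:%d:%d:%d:%.2f:%s:%d:%.2f:%.1f:%d:%s:%d:%.1f:%d:%s:%d:;
#:GATKTable:VariantSummary:1000 Genomes Phase I summary of variants table
VariantSummary CompRod EvalRod JexlExpression Novelty nSamples nProcessedLoci nSNPs TiTvRatio SNPNoveltyRate nSNPsPerSample TiTvRatioPerSample SNPDPPerSample nIndels IndelNoveltyRate nIndelsPerSample IndelDPPerSample nSVs SVNoveltyRate nSVsPerSample
VariantSummary dbsnp vcf1 none all 1 3095693981 3446166 2.08 1.34 3446166 2.08 0.0 962028 15.33 962028 0.0 3282 73.58 3282
VariantSummary dbsnp vcf1 none known 1 3095693981 3399907 2.08 0.00 3399907 2.08 0.0 814506 0.00 814506 0.0 867 0.00 867
VariantSummary dbsnp vcf1 none novel 1 3095693981 46259 1.71 100.00 46259 1.71 0.0 147522 100.00 147522 0.0 2415 100.00 2415
"""


def parse_gatk_table(text):
    """Parse one whitespace-delimited table into a list of dicts.

    Hypothetical helper: assumes no cell contains whitespace.
    """
    rows = [
        line.split()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#:")  # skip table metadata
    ]
    header, records = rows[0], rows[1:]
    return [dict(zip(header, record)) for record in records]


by_novelty = {row["Novelty"]: row for row in parse_gatk_table(REPORT)}
n_all = int(by_novelty["all"]["nSVs"])
n_known = int(by_novelty["known"]["nSVs"])
n_novel = int(by_novelty["novel"]["nSVs"])

# Known and novel SVs should add up to the "all" row.
print(n_all, n_known + n_novel)
# Share of novel SVs, for comparison with the SVNoveltyRate column.
print(round(100 * n_novel / n_all, 2))
```

The known and novel counts (867 and 2415) sum to the 3282 in the "all" row, and the novel share works out to the reported SVNoveltyRate of 73.58, so the SV columns are at least internally consistent.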

I didn't think HaplotypeCaller even looked for structural variations, so I searched the VCF for them, hoping they were encoded as described here, but I couldn't find any. Could someone tell me why VariantEval reports structural variations when the VCF itself doesn't appear to contain any? Does VariantEval simply count a sufficiently large indel as an SV? If so, that would explain the numbers, since this sample has indels longer than 1 kb.
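
One way to test the large-indel hypothesis would be to count the records in the VCF whose REF and ALT alleles differ in length by at least some cutoff, and see whether any cutoff reproduces the nSVs figure. The sketch below is only that: the function names are made up for illustration, and the cutoffs passed in are placeholders, not a value confirmed from VariantEval.

```python
def max_indel_length(ref, alts):
    """Largest absolute length difference between REF and any ALT allele.

    Symbolic alleles (<DEL>), breakends, '*' and '.' are skipped, since their
    length cannot be read from the allele string itself.
    """
    lengths = [
        abs(len(alt) - len(ref))
        for alt in alts
        if alt not in (".", "*")
        and not alt.startswith("<")
        and "[" not in alt
        and "]" not in alt
    ]
    return max(lengths, default=0)


def count_long_indels(vcf_lines, min_length):
    """Count records with an allele length difference >= min_length.

    min_length is a placeholder cutoff, not VariantEval's documented value.
    """
    count = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        if max_indel_length(ref, alts) >= min_length:
            count += 1
    return count


def record(pos, ref, alt):
    """Build a minimal tab-separated VCF record (toy data for the example)."""
    return "\t".join(["chr1", str(pos), ".", ref, alt, ".", ".", "."])


EXAMPLE = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    record(100, "A", "G"),                          # SNP, length difference 0
    record(200, "A", "ATTT"),                       # 3 bp insertion
    record(300, "A" + "C" * 60, "A"),               # 60 bp deletion
    record(400, "G", "G" + "T" * 1200 + ",<DEL>"),  # 1200 bp insertion
]

for cutoff in (1, 50, 1000):
    print(cutoff, count_long_indels(EXAMPLE, cutoff))
```

For a real file, the same function can be fed an open file handle (or `gzip.open(path, "rt")` for a compressed VCF). This counts sites, which should be comparable to the per-sample numbers above only because the table covers a single sample.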

Thanks,

Grant
