VariantEval Structural Variation

GrantMarshall (Posts: 6; Member)

Hi,

I just finished running a fairly large number of WGS samples through HaplotypeCaller, and I've been using VariantEval to look at summary stats for them. Under '#:GATKTable:VariantSummary:1000 Genomes Phase I summary of variants table' there are columns for structural variations (SVs), and apparently I'm getting roughly 3,300 in one of my samples (nSVs = 3282 in the "all" row). Here's the section of the table in question:

#:GATKTable:20:3:%s:%s:%s:%s:%s:%d:%d:%d:%.2f:%s:%d:%.2f:%.1f:%d:%s:%d:%.1f:%d:%s:%d:;
#:GATKTable:VariantSummary:1000 Genomes Phase I summary of variants table
VariantSummary  CompRod  EvalRod  JexlExpression  Novelty  nSamples  nProcessedLoci  nSNPs    TiTvRatio  SNPNoveltyRate  nSNPsPerSample  TiTvRatioPerSample  SNPDPPerSample  nIndels  IndelNoveltyRate  nIndelsPerSample  IndelDPPerSample  nSVs  SVNoveltyRate  nSVsPerSample
VariantSummary  dbsnp    vcf1     none            all             1      3095693981  3446166       2.08            1.34         3446166                2.08             0.0   962028             15.33            962028               0.0  3282          73.58           3282
VariantSummary  dbsnp    vcf1     none            known           1      3095693981  3399907       2.08            0.00         3399907                2.08             0.0   814506              0.00            814506               0.0   867           0.00            867
VariantSummary  dbsnp    vcf1     none            novel           1      3095693981    46259       1.71          100.00           46259                1.71             0.0   147522            100.00            147522               0.0  2415         100.00           2415

I didn't think HaplotypeCaller even looked for structural variations, so I tried to find them in the VCF, hoping they were encoded as described here, but I couldn't find anything. Could someone tell me why VariantEval reports structural variations when the VCF itself doesn't appear to contain any? Does VariantEval simply treat a sufficiently large indel as an SV? If so, that would explain the count, since this sample has indels longer than 1 kb.

Thanks,

Grant

Best Answer

Answers

  • GrantMarshall (Posts: 6; Member)

    Thanks, that's exactly what I needed to know!

  • modi2020 (Posts: 15; Member)

    Hi Geraldine,

    Since anything 50 bp or larger is counted as a structural variation, I wonder if there is a tool that can sort such events into SV classes, e.g. inversions, translocations, etc.

    Do you know of such a tool?

    Thank you

    @Geraldine_VdAuwera said:
    Hi Grant,

    The convention we use is that events that are 50bp or larger are called SVs.

  • Geraldine_VdAuwera (Posts: 8,171; Administrator, GATK Dev admin)

    The only tool I'm aware of for working with SVs is GenomeSTRiP, but I don't know whether it has a function for classifying them as you ask. There may be others, but to be honest I'm not up to date on what's going on in that space.

    Geraldine Van der Auwera, PhD

  • modi2020 (Posts: 15; Member)

    Great!
    Thank you so much Geraldine! :-)
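
For anyone who wants to see which records in their own VCF would fall on the SV side of the 50 bp convention quoted in this thread, here is a minimal Python sketch. It is not VariantEval's code: the function names `classify_allele` and `count_classes` are hypothetical, the size of an event is taken as the length difference between REF and ALT, and the boundary is treated as "50 bp or larger" as stated above. How VariantEval handles the exact boundary and multi-allelic sites internally is an assumption here, so the counts may not match its nSVs column exactly.

```python
# Threshold taken from the thread ("50bp or larger"); the exact boundary
# handling inside VariantEval is an assumption.
SV_MIN_LENGTH = 50


def classify_allele(ref, alt, sv_min_length=SV_MIN_LENGTH):
    """Label one REF/ALT pair as 'SNP', 'MNP', 'INDEL', 'SV' or 'OTHER'."""
    # Symbolic alleles (<DEL>), breakends and spanning deletions (*) carry
    # no explicit sequence, so their size cannot be read from REF/ALT.
    if alt.startswith("<") or "[" in alt or "]" in alt or alt == "*":
        return "OTHER"
    size_change = abs(len(ref) - len(alt))
    if size_change == 0:
        return "SNP" if len(ref) == 1 else "MNP"
    return "SV" if size_change >= sv_min_length else "INDEL"


def count_classes(vcf_lines, sv_min_length=SV_MIN_LENGTH):
    """Count ALT alleles per class over an iterable of VCF text lines."""
    counts = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]
        for alt in fields[4].split(","):
            if alt == ".":
                continue  # no alternate allele at this site
            label = classify_allele(ref, alt, sv_min_length)
            counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    # Tiny made-up records, for illustration only.
    example = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t100\t.\tA\tG\t50\tPASS\t.",
        "1\t200\t.\tA\tATT\t50\tPASS\t.",
        "1\t300\t.\tA\tA" + "T" * 60 + "\t50\tPASS\t.",
    ]
    print(count_classes(example))
```

To run it on a real (uncompressed) VCF, pass the open file handle instead of the example list, e.g. `count_classes(open("sample.vcf"))`, where `sample.vcf` is a placeholder path. Note that this counts alleles, not sites, so a multi-allelic record contributes once per ALT allele.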
