Ti/Tv Variant Evaluator results from VariantEval

avidLearner (Member, Posts: 9)

Hi,

I have processed 10 whole-exome samples using the GATK best practices workflow (GATK v2.4-3-g2a7af43). I am currently evaluating my variant call set (generated with HaplotypeCaller) against the OMNI 2.5 SNP array (comparison set) and dbSNP 137.

I have included 2 rows from the Ti/Tv Variant Evaluator table:

CompRod  EvalRod  Novelty  Sample  nTi    nTv    tiTvRatio  nTiInComp  nTvInComp  TiTvRatioStandard
OMNI     MyCalls  all      all     79945  30322  2.64       993588     274219     3.62
dbsnp    MyCalls  all      all     79945  30322  2.64       30214009   15253850   1.98

According to the literature, the Ti/Tv ratio should be approximately 2.1 for whole-genome sequencing and 2.8 for whole-exome sequencing. Since I am getting a Ti/Tv of 2.64 for exome data, does this indicate false positives in my calls? Also, what could explain such a high TiTvRatioStandard for the OMNI whole-genome data?
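For anyone checking the table by hand: the ratio columns appear to be the transition count divided by the transversion count. The sketch below recomputes them from the counts in the two rows above; the function name `titv_ratio` and the row labels are only illustrative and are not part of VariantEval.

```python
def titv_ratio(n_ti: int, n_tv: int) -> float:
    """Return transitions divided by transversions."""
    if n_tv == 0:
        raise ValueError("Ti/Tv is undefined when there are no transversions")
    return n_ti / n_tv


# Counts copied from the "all" rows posted above (labels are illustrative).
counts = {
    "MyCalls (nTi, nTv)": (79945, 30322),
    "OMNI (nTiInComp, nTvInComp)": (993588, 274219),
    "dbsnp (nTiInComp, nTvInComp)": (30214009, 15253850),
}

# Round to two decimals, matching the precision shown in the table.
ratios = {name: round(titv_ratio(ti, tv), 2) for name, (ti, tv) in counts.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

The recomputed values agree with the tiTvRatio and TiTvRatioStandard columns, so the reported ratios follow directly from the counts in each row.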

Thanks!

Answers

  • Geraldine_VdAuwera (Administrator, GATK Developer, Posts: 6,672)

    Hi there,

    Here you're posting the "all" lines, but for this evaluation you should be looking at the "known" lines. Can you post those to show the difference?

    Geraldine Van der Auwera, PhD

  • avidLearner (Member, Posts: 9)

    I have posted the "known" lines below for all 10 samples.

    CompRod  EvalRod  Novelty  Sample    nTi    nTv    tiTvRatio  nTiInComp  nTvInComp  TiTvRatioStandard
    OMNI     MyCalls  known    sample1   34234  12999  2.63       35305      8370       4.22
    OMNI     MyCalls  known    sample2   34462  13111  2.63       35307      8372       4.22
    OMNI     MyCalls  known    sample3   34497  13015  2.65       35295      8367       4.22
    OMNI     MyCalls  known    sample4   34655  13206  2.62       35299      8377       4.21
    OMNI     MyCalls  known    sample5   34811  13369  2.60       35310      8368       4.22
    OMNI     MyCalls  known    sample6   34315  13186  2.60       35304      8368       4.22
    OMNI     MyCalls  known    sample7   35558  13645  2.61       35303      8361       4.22
    OMNI     MyCalls  known    sample8   35497  13708  2.59       35299      8368       4.22
    OMNI     MyCalls  known    sample9   35408  13702  2.58       35304      8363       4.22
    OMNI     MyCalls  known    sample10  35440  13678  2.59       35291      8360       4.22
    OMNI     MyCalls  known    all       77489  29137  2.66       35161      8313       4.23
    dbsnp    MyCalls  known    sample1   34234  12999  2.63       77239      29551      2.61
    dbsnp    MyCalls  known    sample2   34462  13111  2.63       77277      29539      2.62
    dbsnp    MyCalls  known    sample3   34497  13015  2.65       77248      29547      2.61
    dbsnp    MyCalls  known    sample4   34655  13206  2.62       77251      29545      2.61
    dbsnp    MyCalls  known    sample5   34811  13369  2.60       77261      29536      2.62
    dbsnp    MyCalls  known    sample6   34315  13186  2.60       77296      29565      2.61
    dbsnp    MyCalls  known    sample7   35558  13645  2.61       77254      29545      2.61
    dbsnp    MyCalls  known    sample8   35497  13708  2.59       77277      29532      2.62
    dbsnp    MyCalls  known    sample9   35408  13702  2.58       77256      29540      2.62
    dbsnp    MyCalls  known    sample10  35440  13678  2.59       77261      29520      2.62
    dbsnp    MyCalls  known    all       77489  29137  2.66       76687      29237      2.62
  • Kurt (Member, Posts: 166)

    When you make your calls, do you restrict them to intervals in a BED file? If so, what is the total amount of unique, non-overlapping genomic space covered by your BED file? Is it around 30 Mb, or more like 50-70 Mb?

  • avidLearner (Member, Posts: 9)
    edited April 2013

    Yes, I restricted my calls to intervals in a BED file. I ran HaplotypeCaller without interval padding, but I modified the BED file so that each interval is padded by 50 bp and overlapping intervals are merged. The total length of the intervals in my modified BED file is approximately 65 Mb. Are you suggesting that the non-exonic regions are contributing to the lower Ti/Tv ratio for my samples?
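For reference, the "unique non-overlapping genomic space" Kurt asks about can be computed by padding each interval, merging overlaps per chromosome, and summing the merged lengths. The following is a minimal sketch of that idea, not the procedure actually used in this thread; `merged_length` and `read_bed` are hypothetical helper names, and the input is assumed to be a standard BED file with 0-based, half-open coordinates in the first three columns.

```python
from collections import defaultdict


def merged_length(intervals, pad=0):
    """Total bases covered after padding each interval and merging overlaps.

    `intervals` is an iterable of (chrom, start, end) tuples with
    0-based, half-open coordinates, as in BED.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        # Pad both sides; clamp the start so it never drops below zero.
        by_chrom[chrom].append((max(0, start - pad), end + pad))

    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or abutting: extend the current merged span.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def read_bed(path):
    """Yield (chrom, start, end) from a BED file, skipping header lines."""
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            yield chrom, int(start), int(end)


# Small in-memory example; for a real file, pass read_bed("targets.bed").
example = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
print(merged_length(example))          # unpadded footprint
print(merged_length(example, pad=50))  # footprint with 50 bp padding
```

Dividing the result by 1e6 gives the footprint in Mb, which can be compared with the 30 Mb and 50-70 Mb figures mentioned above. Note that this sketch does not clip padded intervals at chromosome ends.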

  • avidLearner (Member, Posts: 9)
    edited April 2013

    Thanks @Kurt. Any ideas on why the ratios are so high for OMNI?
