
Ti/Tv Variant Evaluator results from VariantEval

avidLearner Posts: 9 Member

Hi,

I have processed 10 whole-exome samples using the GATK best practices workflow (GATK v2.4-3-g2a7af43). I am currently evaluating my variant call set (generated with HaplotypeCaller) against the OMNI 2.5 SNP array (comparison set) and dbSNP 137.

I have included 2 rows from the Ti/Tv Variant Evaluator table:

CompRod EvalRod Novelty Sample nTi nTv tiTvRatio nTiInComp nTvInComp TiTvRatioStandard
OMNI MyCalls all all 79945 30322 2.64 993588 274219 3.62
dbsnp MyCalls all all 79945 30322 2.64 30214009 15253850 1.98

According to the literature, the Ti/Tv ratio should be approximately 2.1 for whole-genome sequencing and 2.8 for whole-exome sequencing. Since I am getting a Ti/Tv of 2.64 for exome data, does this indicate false positives in my call set? Also, what could explain such a high TiTvRatioStandard for the OMNI whole-genome data?
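
For reference, the ratio columns in the rows above are consistent with a plain division of the transition count by the transversion count. A minimal sketch that reproduces them from the posted counts (the helper name `titv_ratio` is just for illustration and is not part of GATK):

```python
def titv_ratio(n_ti: int, n_tv: int) -> float:
    """Return the transition/transversion ratio for the given counts."""
    if n_tv == 0:
        raise ValueError("nTv is zero, so the ratio is undefined")
    return n_ti / n_tv


# Counts copied from the two "all" rows posted above.
rows = {
    "eval (MyCalls)": (79945, 30322),       # nTi, nTv
    "comp OMNI": (993588, 274219),          # nTiInComp, nTvInComp
    "comp dbsnp": (30214009, 15253850),     # nTiInComp, nTvInComp
}

for label, (n_ti, n_tv) in rows.items():
    print(f"{label}: Ti/Tv = {titv_ratio(n_ti, n_tv):.2f}")
```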

Thanks!

Answers

  • Geraldine_VdAuwera Posts: 5,815 Administrator, GATK Developer

    Hi there,

    Here you're posting the "all" lines, but for this evaluation you should be looking at the "known" lines. Can you post those to show the difference?

    Geraldine Van der Auwera, PhD

  • avidLearner Posts: 9 Member

    I have posted the "known" lines below for all 10 samples.

    CompRod EvalRod Novelty Sample nTi nTv tiTvRatio nTiInComp nTvInComp TiTvRatioStandard
    OMNI MyCalls known sample1 34234 12999 2.63 35305 8370 4.22
    OMNI MyCalls known sample2 34462 13111 2.63 35307 8372 4.22
    OMNI MyCalls known sample3 34497 13015 2.65 35295 8367 4.22
    OMNI MyCalls known sample4 34655 13206 2.62 35299 8377 4.21
    OMNI MyCalls known sample5 34811 13369 2.60 35310 8368 4.22
    OMNI MyCalls known sample6 34315 13186 2.60 35304 8368 4.22
    OMNI MyCalls known sample7 35558 13645 2.61 35303 8361 4.22
    OMNI MyCalls known sample8 35497 13708 2.59 35299 8368 4.22
    OMNI MyCalls known sample9 35408 13702 2.58 35304 8363 4.22
    OMNI MyCalls known sample10 35440 13678 2.59 35291 8360 4.22
    OMNI MyCalls known all 77489 29137 2.66 35161 8313 4.23
    dbsnp MyCalls known sample1 34234 12999 2.63 77239 29551 2.61
    dbsnp MyCalls known sample2 34462 13111 2.63 77277 29539 2.62
    dbsnp MyCalls known sample3 34497 13015 2.65 77248 29547 2.61
    dbsnp MyCalls known sample4 34655 13206 2.62 77251 29545 2.61
    dbsnp MyCalls known sample5 34811 13369 2.6 77261 29536 2.62
    dbsnp MyCalls known sample6 34315 13186 2.6 77296 29565 2.61
    dbsnp MyCalls known sample7 35558 13645 2.61 77254 29545 2.61
    dbsnp MyCalls known sample8 35497 13708 2.59 77277 29532 2.62
    dbsnp MyCalls known sample9 35408 13702 2.58 77256 29540 2.62
    dbsnp MyCalls known sample10 35440 13678 2.59 77261 29520 2.62
    dbsnp MyCalls known all 77489 29137 2.66 76687 29237 2.62
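
    As a rough cross-check, the "known" and "all" rows can be combined to estimate the Ti/Tv of the remaining (novel) calls. This is only a sketch: it assumes the "all" stratum is exactly the sum of the "known" and "novel" strata, and it uses the dbsnp rows with Sample = all.

```python
def titv_ratio(n_ti: int, n_tv: int) -> float:
    """Return the transition/transversion ratio for the given counts."""
    return n_ti / n_tv


# dbsnp / MyCalls / Novelty=all / Sample=all (from the first post)
all_ti, all_tv = 79945, 30322
# dbsnp / MyCalls / Novelty=known / Sample=all (from the table above)
known_ti, known_tv = 77489, 29137

# Assumption: all = known + novel, so the novel counts are the difference.
novel_ti = all_ti - known_ti
novel_tv = all_tv - known_tv

print(f"known: nTi={known_ti} nTv={known_tv} Ti/Tv={titv_ratio(known_ti, known_tv):.2f}")
print(f"novel: nTi={novel_ti} nTv={novel_tv} Ti/Tv={titv_ratio(novel_ti, novel_tv):.2f}")
```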

  • Kurt Posts: 126 Member

    When you make your calls, do you restrict them to intervals in a BED file? If so, what is the total amount of unique, non-overlapping genomic space contained in your BED file? Is it around 30 MB, or more like 50-70 MB?

  • avidLearner Posts: 9 Member

    Yes, I restricted my calls to intervals in a BED file. I ran HaplotypeCaller without interval padding, but I modified the BED file so that each interval is padded by 50 bp, with the overlaps merged. The total length of the intervals in my modified BED file is approximately 65 MB. Are you suggesting that the non-exonic regions are contributing to the lower Ti/Tv ratio for my samples?
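
    For anyone wanting to check the same number on their own targets, the total non-overlapping length of a padded, merged interval set can be computed along these lines. This is a minimal sketch and not the script used here; the function names and the example intervals are hypothetical, and end coordinates are not clipped to chromosome lengths.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end) with BED-style 0-based, half-open coordinates
Interval = Tuple[str, int, int]


def pad_and_merge(intervals: Iterable[Interval], pad: int = 50) -> List[Interval]:
    """Pad each interval by `pad` bp on both sides, then merge overlaps."""
    padded = sorted(
        (chrom, max(0, start - pad), end + pad) for chrom, start, end in intervals
    )
    merged: List[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlapping or touching the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_length(intervals: Iterable[Interval]) -> int:
    """Sum of interval lengths in bp (assumes the intervals do not overlap)."""
    return sum(end - start for _, start, end in intervals)


# Made-up targets, only to show the calls.
example = [("chr1", 100, 200), ("chr1", 250, 300), ("chr1", 1000, 1100), ("chr2", 500, 600)]
merged = pad_and_merge(example, pad=50)
print(merged)
print(total_length(merged), "bp")
```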

  • avidLearner Posts: 9 Member

    Thanks @Kurt. Any ideas on why the ratios are so high for OMNI?
