The current GATK version is 3.6-0

Ti/Tv Variant Evaluator results from VariantEval

avidLearner Member Posts: 9

Hi,

I have processed 10 whole-exome samples using the GATK best practices workflow (GATK v2.4-3-g2a7af43). I am currently evaluating my variant call set (generated with HaplotypeCaller) against the OMNI 2.5 SNP array and dbSNP 137 as comparison sets.

I have included 2 rows from the Ti/Tv Variant Evaluator table:

CompRod  EvalRod  Novelty  Sample      nTi    nTv  tiTvRatio  nTiInComp  nTvInComp  TiTvRatioStandard
OMNI     MyCalls  all      all       79945  30322       2.64     993588     274219               3.62
dbsnp    MyCalls  all      all       79945  30322       2.64   30214009   15253850               1.98

According to the literature, the Ti/Tv ratio should be approximately 2.1 for whole-genome sequencing and 2.8 for whole-exome sequencing. Since I am getting a Ti/Tv of 2.64 for exome data, does this indicate false positives in my calls? Also, what could explain such a high TiTvRatioStandard for the OMNI whole-genome data?
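
For reference, the ratios in the table can be reproduced from the raw counts, assuming tiTvRatio is simply nTi divided by nTv (and TiTvRatioStandard is nTiInComp divided by nTvInComp). A minimal Python sketch; the function name `titv_ratio` is only illustrative and is not part of GATK:

```python
def titv_ratio(n_ti, n_tv):
    """Transition/transversion ratio from raw counts (illustrative helper)."""
    if n_tv == 0:
        raise ValueError("no transversions; Ti/Tv is undefined")
    return n_ti / n_tv

# Counts copied from the two "all" rows above.
eval_ratio = titv_ratio(79945, 30322)          # MyCalls
omni_ratio = titv_ratio(993588, 274219)        # OMNI comparison set
dbsnp_ratio = titv_ratio(30214009, 15253850)   # dbSNP comparison set

print(round(eval_ratio, 2), round(omni_ratio, 2), round(dbsnp_ratio, 2))
```

The rounded values agree with the tiTvRatio and TiTvRatioStandard columns as posted.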

Thanks!

Answers

  • Geraldine_VdAuwera Administrator, Dev Posts: 10,704

    Hi there,

    Here you're posting the "all" lines, but for this evaluation you should be looking at the "known" lines. Can you post those to show the difference?

    Geraldine Van der Auwera, PhD

  • avidLearner Member Posts: 9

    I have posted the "known" lines below for all the 10 samples.

    CompRod  EvalRod  Novelty  Sample      nTi    nTv  tiTvRatio  nTiInComp  nTvInComp  TiTvRatioStandard
    OMNI     MyCalls  known    sample1   34234  12999       2.63      35305       8370               4.22
    OMNI     MyCalls  known    sample2   34462  13111       2.63      35307       8372               4.22
    OMNI     MyCalls  known    sample3   34497  13015       2.65      35295       8367               4.22
    OMNI     MyCalls  known    sample4   34655  13206       2.62      35299       8377               4.21
    OMNI     MyCalls  known    sample5   34811  13369       2.60      35310       8368               4.22
    OMNI     MyCalls  known    sample6   34315  13186       2.60      35304       8368               4.22
    OMNI     MyCalls  known    sample7   35558  13645       2.61      35303       8361               4.22
    OMNI     MyCalls  known    sample8   35497  13708       2.59      35299       8368               4.22
    OMNI     MyCalls  known    sample9   35408  13702       2.58      35304       8363               4.22
    OMNI     MyCalls  known    sample10  35440  13678       2.59      35291       8360               4.22
    OMNI     MyCalls  known    all       77489  29137       2.66      35161       8313               4.23
    dbsnp    MyCalls  known    sample1   34234  12999       2.63      77239      29551               2.61
    dbsnp    MyCalls  known    sample2   34462  13111       2.63      77277      29539               2.62
    dbsnp    MyCalls  known    sample3   34497  13015       2.65      77248      29547               2.61
    dbsnp    MyCalls  known    sample4   34655  13206       2.62      77251      29545               2.61
    dbsnp    MyCalls  known    sample5   34811  13369       2.60      77261      29536               2.62
    dbsnp    MyCalls  known    sample6   34315  13186       2.60      77296      29565               2.61
    dbsnp    MyCalls  known    sample7   35558  13645       2.61      77254      29545               2.61
    dbsnp    MyCalls  known    sample8   35497  13708       2.59      77277      29532               2.62
    dbsnp    MyCalls  known    sample9   35408  13702       2.58      77256      29540               2.62
    dbsnp    MyCalls  known    sample10  35440  13678       2.59      77261      29520               2.62
    dbsnp    MyCalls  known    all       77489  29137       2.66      76687      29237               2.62
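
    A quick way to sanity-check rows like these is to split each line on whitespace and recompute both ratios from the counts. The sketch below is only a rough illustration: it assumes the ten-column layout shown above, uses four rows copied from the table, and the helper name `check_row` is made up for this example.

```python
# Four rows copied from the "known" table above (whitespace-delimited).
ROWS = """\
OMNI   MyCalls  known  sample1  34234  12999  2.63  35305  8370   4.22
OMNI   MyCalls  known  all      77489  29137  2.66  35161  8313   4.23
dbsnp  MyCalls  known  sample1  34234  12999  2.63  77239  29551  2.61
dbsnp  MyCalls  known  all      77489  29137  2.66  76687  29237  2.62
"""

def check_row(line, tol=0.01):
    """Recompute both ratios for one row and compare with the reported values."""
    f = line.split()
    comp, sample = f[0], f[3]
    n_ti, n_tv, n_ti_comp, n_tv_comp = (int(f[i]) for i in (4, 5, 7, 8))
    reported_eval, reported_comp = float(f[6]), float(f[9])
    eval_ratio = n_ti / n_tv
    comp_ratio = n_ti_comp / n_tv_comp
    ok = (abs(eval_ratio - reported_eval) < tol
          and abs(comp_ratio - reported_comp) < tol)
    return comp, sample, round(eval_ratio, 2), round(comp_ratio, 2), ok

results = [check_row(line) for line in ROWS.splitlines()]
for r in results:
    print(*r)
```

    Note that nTiInComp and nTvInComp in these "known" rows are much smaller than in the "all" rows posted earlier (for example 35305 versus 993588 transitions for OMNI), so the two sets of comparison ratios are computed over different numbers of sites.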
  • Kurt Member Posts: 255

    When you make your calls, do you restrict them to intervals in a BED file? If so, what is the total amount of unique, non-overlapping genomic space contained in your BED file? Is it around 30 Mb, or more like 50-70 Mb?

  • avidLearner Member Posts: 9
    edited April 2013

    Yes, I restricted my calls to intervals in a BED file. I ran HaplotypeCaller without interval padding, but I modified the BED file so that each interval is padded by 50 bp, with overlaps merged. The total length of the intervals in my modified BED file is approximately 65 Mb. Are you suggesting that the non-exonic regions are contributing to the lower Ti/Tv ratio for my samples?
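
    For anyone wanting to reproduce the "pad by 50 bp, merge overlaps, sum the lengths" step, here is a rough Python sketch. It assumes BED-style 0-based, half-open intervals, works on a small made-up list rather than a real target file, and the function names `pad_and_merge` and `total_length` are hypothetical. It clips at position 0 but does not clip at chromosome ends.

```python
from collections import defaultdict

def pad_and_merge(intervals, pad=50):
    """Pad each (chrom, start, end) interval by `pad` bp, then merge overlaps.

    Intervals are BED-style: 0-based start, exclusive end.
    Overlapping or directly adjacent intervals are merged.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((max(0, start - pad), end + pad))

    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1]:
                # Overlaps (or touches) the previous span: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [(s, e) for s, e in out]
    return merged

def total_length(merged):
    """Total unique bases covered by the merged intervals."""
    return sum(e - s for spans in merged.values() for s, e in spans)

# Made-up toy targets, not real exome intervals.
toy = [("chr1", 100, 200), ("chr1", 250, 300),
       ("chr1", 1000, 1100), ("chr2", 10, 60)]
merged = pad_and_merge(toy, pad=50)
print(merged, total_length(merged))
```

    Run on a real target list, the total from `total_length` is the "unique non-overlapping genomic space" figure Kurt asked about.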

  • avidLearner Member Posts: 9
    edited April 2013

    Thanks @Kurt. Any ideas on why the ratios are so high for OMNI?

  • mglclinical USA Member Posts: 78

    I will also try restricting my VCF file with a UCSC RefSeq transcript BED file, then recompute the Ti/Tv ratio to see if it improves.

  • mglclinical USA Member Posts: 78

    @Kurt

    Thank you for your replies. My target BED file covered 45 million bases, my VCF file had ~35,000 variants, and my Ti/Tv ratio was 2.67.

    I downloaded the UCSC RefSeq exons BED file and restricted my VCF to the UCSC genes; the new VCF file ended up with ~20,000 variants. My Ti/Tv ratio on the new VCF file is 2.99.
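
    The restriction step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SNPs are given as plain (chrom, pos, ref, alt) tuples with 1-based VCF positions, regions are 0-based half-open BED intervals, the data are made up, and the names `is_transition` and `titv_in_regions` are hypothetical. A real analysis would more likely use bedtools or GATK itself for the intersection.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)

def titv_in_regions(snps, regions):
    """Count transitions and transversions among SNPs inside BED regions.

    snps:    iterable of (chrom, pos, ref, alt), pos 1-based as in VCF.
    regions: dict of chrom -> list of (start, end), 0-based half-open as in BED.
    """
    n_ti = n_tv = 0
    for chrom, pos, ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        # Keep only simple biallelic single-base substitutions.
        if ref not in "ACGT" or alt not in "ACGT" or len(ref) != 1 or len(alt) != 1:
            continue
        if ref == alt:
            continue
        # A 1-based position p lies in a BED interval when start < p <= end.
        if not any(start < pos <= end for start, end in regions.get(chrom, [])):
            continue
        if is_transition(ref, alt):
            n_ti += 1
        else:
            n_tv += 1
    return n_ti, n_tv

# Made-up example data.
regions = {"chr1": [(100, 200)]}
snps = [
    ("chr1", 101, "A", "G"),   # inside, transition
    ("chr1", 150, "C", "T"),   # inside, transition
    ("chr1", 160, "A", "C"),   # inside, transversion
    ("chr1", 500, "G", "A"),   # outside the region
    ("chr1", 100, "G", "T"),   # just before the region start
    ("chr1", 170, "AT", "A"),  # indel, skipped
]
n_ti, n_tv = titv_in_regions(snps, regions)
print(n_ti, n_tv, n_ti / n_tv)
```

    The linear scan over regions is fine for a toy example; for a full exome target list a sorted or indexed interval lookup would be a better choice.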
