Ti/Tv Variant Evaluator results from VariantEval

avidLearner Member Posts: 9

Hi,

I have processed 10 whole-exome samples using the GATK best practices workflow (GATK v2.4-3-g2a7af43). I am currently evaluating my variant call set (generated from HaplotypeCaller) with OMNI 2.5 SNP array (comparison set) and dbSNP 137.

I have included 2 rows from the Ti/Tv Variant Evaluator table:

CompRod  EvalRod  Novelty  Sample  nTi    nTv    tiTvRatio  nTiInComp  nTvInComp  TiTvRatioStandard
OMNI     MyCalls  all      all     79945  30322  2.64       993588     274219     3.62
dbsnp    MyCalls  all      all     79945  30322  2.64       30214009   15253850   1.98

According to the literature, the Ti/Tv ratio should be approximately 2.1 for whole-genome sequencing and 2.8 for whole-exome sequencing. Since I am getting a Ti/Tv of 2.64 for exome data, does this indicate false positives in my calls? Also, what could explain such a high TiTvRatioStandard for the OMNI whole-genome data?

Thanks!
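For readers following along, the ratio columns in the table above are simply the transition count divided by the transversion count. The following is a minimal sketch that recomputes them from the posted counts; the function name and the row labels are illustrative and are not part of VariantEval.

```python
def titv_ratio(n_ti: int, n_tv: int) -> float:
    """Return the transition/transversion ratio for the given counts."""
    if n_tv == 0:
        raise ValueError("transversion count must be non-zero")
    return n_ti / n_tv


# Counts copied from the "all" rows posted above.
# The dictionary keys are illustrative labels only.
counts = {
    "MyCalls (eval)": (79945, 30322),
    "OMNI (comp)": (993588, 274219),
    "dbsnp (comp)": (30214009, 15253850),
}

# Round to two decimals, as in the posted table.
ratios = {name: round(titv_ratio(ti, tv), 2) for name, (ti, tv) in counts.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

The recomputed values can be compared directly against the tiTvRatio and TiTvRatioStandard columns.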

Answers

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie Posts: 11,732 admin

    Hi there,

    Here you're posting the "all" lines, but for this evaluation you should be looking at the "known" lines. Can you post those to show the difference?

    Geraldine Van der Auwera, PhD

  • avidLearner Member Posts: 9

    I have posted the "known" lines below for all 10 samples.

    CompRod  EvalRod  Novelty  Sample    nTi    nTv    tiTvRatio  nTiInComp  nTvInComp  TiTvRatioStandard
    OMNI     MyCalls  known    sample1   34234  12999  2.63       35305      8370       4.22
    OMNI     MyCalls  known    sample2   34462  13111  2.63       35307      8372       4.22
    OMNI     MyCalls  known    sample3   34497  13015  2.65       35295      8367       4.22
    OMNI     MyCalls  known    sample4   34655  13206  2.62       35299      8377       4.21
    OMNI     MyCalls  known    sample5   34811  13369  2.60       35310      8368       4.22
    OMNI     MyCalls  known    sample6   34315  13186  2.60       35304      8368       4.22
    OMNI     MyCalls  known    sample7   35558  13645  2.61       35303      8361       4.22
    OMNI     MyCalls  known    sample8   35497  13708  2.59       35299      8368       4.22
    OMNI     MyCalls  known    sample9   35408  13702  2.58       35304      8363       4.22
    OMNI     MyCalls  known    sample10  35440  13678  2.59       35291      8360       4.22
    OMNI     MyCalls  known    all       77489  29137  2.66       35161      8313       4.23
    dbsnp    MyCalls  known    sample1   34234  12999  2.63       77239      29551      2.61
    dbsnp    MyCalls  known    sample2   34462  13111  2.63       77277      29539      2.62
    dbsnp    MyCalls  known    sample3   34497  13015  2.65       77248      29547      2.61
    dbsnp    MyCalls  known    sample4   34655  13206  2.62       77251      29545      2.61
    dbsnp    MyCalls  known    sample5   34811  13369  2.6        77261      29536      2.62
    dbsnp    MyCalls  known    sample6   34315  13186  2.6        77296      29565      2.61
    dbsnp    MyCalls  known    sample7   35558  13645  2.61       77254      29545      2.61
    dbsnp    MyCalls  known    sample8   35497  13708  2.59       77277      29532      2.62
    dbsnp    MyCalls  known    sample9   35408  13702  2.58       77256      29540      2.62
    dbsnp    MyCalls  known    sample10  35440  13678  2.59       77261      29520      2.62
    dbsnp    MyCalls  known    all       77489  29137  2.66       76687      29237      2.62
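A table like this one can be sanity-checked by parsing the rows and recomputing both ratio columns from the raw counts. The sketch below assumes the rows have already been extracted as whitespace-delimited text with a single header line, as posted here; a raw VariantEval report contains additional header lines that this does not handle. Only four of the posted rows are included to keep it short, and the helper names are illustrative.

```python
# A few rows copied from the post above (header plus four data rows).
TABLE = """\
CompRod EvalRod Novelty Sample nTi nTv tiTvRatio nTiInComp nTvInComp TiTvRatioStandard
OMNI MyCalls known sample1 34234 12999 2.63 35305 8370 4.22
OMNI MyCalls known all 77489 29137 2.66 35161 8313 4.23
dbsnp MyCalls known sample1 34234 12999 2.63 77239 29551 2.61
dbsnp MyCalls known all 77489 29137 2.66 76687 29237 2.62
"""


def parse_rows(text):
    """Split whitespace-delimited text into a list of dicts keyed by the header."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def recompute(row):
    """Return (eval Ti/Tv, comp Ti/Tv) recomputed from the count columns."""
    eval_ratio = int(row["nTi"]) / int(row["nTv"])
    comp_ratio = int(row["nTiInComp"]) / int(row["nTvInComp"])
    return round(eval_ratio, 2), round(comp_ratio, 2)


rows = parse_rows(TABLE)
for row in rows:
    eval_ratio, comp_ratio = recompute(row)
    print(row["CompRod"], row["Sample"], eval_ratio, comp_ratio)
```

Note that the nTi and nTv columns are identical between the OMNI and dbsnp rows for the same sample; only the comp-side counts differ.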

  • Kurt Member Posts: 255 ✭✭✭

    When you make your calls, do you restrict them to intervals in a bed file? If so, what is the total amount of unique, non-overlapping genomic space contained in your bed file? Is it around 30 MB, or is it more like 50-70 MB?

  • avidLearner Member Posts: 9
    edited April 2013

    Yes, I restricted my calls to intervals in a bed file. I ran HaplotypeCaller without interval padding, but I modified the bed file so that each interval is padded by 50 bp, with overlapping intervals merged. The total length of the intervals in my modified bed file is approximately 65MB. Are you suggesting that the non-exonic regions are contributing to the lower Ti/Tv ratio for my samples?
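The padding, merging and total-length steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the poster's actual procedure: intervals are (chrom, start, end) tuples in 0-based, half-open BED coordinates; padded starts are clipped at 0 but ends are not clipped at chromosome ends; touching intervals are merged as well as overlapping ones. The toy intervals are made up.

```python
def pad_and_merge(intervals, padding=50):
    """Pad each (chrom, start, end) interval on both sides, then merge overlaps."""
    padded = sorted(
        (chrom, max(0, start - padding), end + padding)
        for chrom, start, end in intervals
    )
    merged = []
    for chrom, start, end in padded:
        # Extend the previous interval if this one overlaps or touches it.
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_length(intervals):
    """Sum of interval lengths; meaningful only for non-overlapping intervals."""
    return sum(end - start for _, start, end in intervals)


# Toy targets (made up for illustration).
targets = [
    ("chr1", 100, 200),
    ("chr1", 250, 300),
    ("chr1", 1000, 1100),
    ("chr2", 100, 200),
]

merged = pad_and_merge(targets, padding=50)
print(merged)
print(total_length(merged))
```

Merging before summing matters: summing the padded intervals without merging would count overlapping bases twice and overstate the targeted space.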

  • avidLearner Member Posts: 9
    edited April 2013

    Thanks @Kurt. Any ideas on why the ratios are so high for OMNI?

  • mglclinical USA Member Posts: 95

    I will also try restricting my vcf file with the ucsc refseq transcript bed file, then compute the Ti/Tv ratio and see whether it improves.

  • mglclinical USA Member Posts: 95

    @Kurt

    Thank you for your replies. My target bed file covered 45 million bases, my vcf file had ~35,000 variants, and my Ti/Tv ratio was 2.67.

    I downloaded the ucsc refseq exons bed file and restricted my vcf to ucsc genes. My new vcf file ended up with ~20,000 variants, and its Ti/Tv ratio is 2.99.
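The effect of restricting calls to exon intervals before computing Ti/Tv can be illustrated with a short sketch. It makes several assumptions that are not from the thread: SNPs are biallelic, single-base (chrom, pos, ref, alt) tuples with 1-based positions as in VCF; exon intervals are non-overlapping and in 0-based, half-open BED coordinates; and the toy data are made up. All function names are hypothetical.

```python
from bisect import bisect_right

# A<->G and C<->T are transitions; every other single-base change is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_transition(ref, alt):
    return (ref.upper(), alt.upper()) in TRANSITIONS


def build_index(intervals):
    """Group non-overlapping (chrom, start, end) intervals by chromosome, sorted."""
    index = {}
    for chrom, start, end in sorted(intervals):
        index.setdefault(chrom, []).append((start, end))
    return index


def in_intervals(index, chrom, pos):
    """True if 1-based position pos falls inside a 0-based half-open interval."""
    ivs = index.get(chrom, [])
    # Find the last interval whose start is <= the 0-based position.
    i = bisect_right(ivs, (pos - 1, float("inf"))) - 1
    return i >= 0 and ivs[i][0] <= pos - 1 < ivs[i][1]


def titv_counts(snps, index=None):
    """Count (transitions, transversions), optionally restricted to the index."""
    n_ti = n_tv = 0
    for chrom, pos, ref, alt in snps:
        if index is not None and not in_intervals(index, chrom, pos):
            continue
        if is_transition(ref, alt):
            n_ti += 1
        else:
            n_tv += 1
    return n_ti, n_tv


# Toy data (made up): one exon and six SNPs, three of them inside the exon.
exons = build_index([("chr1", 100, 200)])
snps = [
    ("chr1", 150, "A", "G"),
    ("chr1", 160, "C", "T"),
    ("chr1", 170, "A", "C"),
    ("chr1", 201, "C", "T"),
    ("chr1", 500, "G", "T"),
    ("chr1", 600, "A", "T"),
]

all_counts = titv_counts(snps)
exon_counts = titv_counts(snps, exons)
print("all:", all_counts, "exons only:", exon_counts)
```

In this toy set the ratio rises after restriction only because the data were built that way; it mirrors the direction of the change reported above but says nothing about its expected size on real data.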
