Could I use VariantEval to evaluate somatic mutation calling by Mutect2?

Hello,
I have a basic question about VariantEval: can I use this tool to evaluate somatic mutations from a tumor? I tried it with default parameters. Even though more than 40K mutations passed the filter, VariantEval only reports about 1K variants in the callset. I am trying to understand why VariantEval gets so few. What filter does VariantEval use? How can I tell how these mutations are filtered out by VariantEval?
Thanks a lot for the help,
Willey

Answers

  • Sheila | Broad Institute | Member, Broadie, Moderator admin

    @weiguofeng
    Hi Willey,

    Can you post the exact command you ran? What are you using as your comparison file?

    Thanks,
    Sheila

  • weiguofeng | Member
    edited October 2018

    Sheila,

    Sorry for the late response. Here is the command I used. Thanks a lot for the help.

    java -jar GenomeAnalysisTK.jar \
        -T VariantEval -nt 8 \
        -R Human.B37.3.karyo.fa \
        -eval germline.vcf \
        -D dbsnp_138.b37.excluding_sites_after_129.vcf \
        -noEV -EV CompOverlap -EV IndelSummary -EV TiTvVariantEvaluator -EV CountVariants -EV MultiallelicSummary \
        -comp:SNVcalling SNVcalling.vcf \
        -L WES_region.bed \
        -o eval.grp.txt
    

    Best,
    Willey

  • shlee | Cambridge | Member, Broadie ✭✭✭✭✭

    Hi @weiguofeng,

    Sheila has left the Communications team for greener pastures. While our new front-line support specialist is ramping up, I am helping out on the forum.

    First, VariantEval is an evaluation tool and should not filter any variants. That being said, I believe it will only analyze variants that are PASS or . in your callset. You may find Tutorial#6211 helpful.

    Second, looking at your command, it appears you are evaluating a germline callset against dbSNP and a comparison file that is somatic (based on the SNV label). Given you say:

    can I use this tool to evaluate somatic mutations from a tumor?

    I think you want to switch the -eval and -comp arguments around. The -eval parameter should define the somatic callset that is of interest to you, and the -comp parameter defines one of the datasets that, like dbSNP, you would like to compare your data against.

  • Geraldine_VdAuwera | Cambridge, MA | Member, Administrator, Broadie admin

    Hi @weiguofeng, sorry for the late response. You can certainly use VariantEval to collect basic statistics about mutation calls made by Mutect2, though keep in mind that some of the evaluation modules may be geared toward germline calls specifically, and there are many different ways to use this tool depending on what you're trying to find out. Here you're setting "germline.vcf" as the main file to evaluate, and "SNVcalling.vcf" as the comparison. Is that what you're trying to do? What is the ultimate purpose of your evaluation?

    If you want us to help interpret the results of your evaluation we'll need you to post the summary tables.

  • weiguofeng | Member
    edited October 2018

    Hi, @shlee and @Geraldine_VdAuwera ,

    We are evaluating whether we can use germline mutation calling methods on tumor-only samples. After calling "germline mutations", we want to compare them with the SNV calls made by Mutect2 on the same sample to see how many of the "germline mutations" are SNVs.
    As shlee suggested, I ran it both ways; the summary results are attached. Whenever the SNV file is given as -eval, the total number of mutations drops to ~1000.
    I also attached the last two tables to show the Picard CollectVariantCallingMetrics results for both the germline VCF and SNV.vcf. As you can see, the SNV.vcf file contains about 40K passing mutations.

    That is where I am confused. I also used Picard GenotypeConcordance to compare the two VCFs directly, and the result seems much more understandable.

    My question is why the GATK tool seems to filter out some of the SNV calls.
    Sorry if my question is too naive.
    Thanks for the help,
    Weiguo

  • shlee | Cambridge | Member, Broadie ✭✭✭✭✭

    Hi @weiguofeng,

    Geraldine is occupied with preparing for the ASHG meeting, so I will follow up on her request to see your data.

    Did you get a chance to go through Tutorial#6211, which I linked to above and which covers VariantEval and touches briefly on GenotypeConcordance? At the top of this document are two links that go into detail on the differences between VariantEval and CollectVariantCallingMetrics.

    Basically, the differences you see relate to how concordance is defined by the different tools. Concordance could be at the site level (a genomic position is called variant), the allele level (the variant alt allele matches) or the genotype level (the genotype/zygosity of the call matches). VariantEval defines concordance at the site level, whereas GenotypeConcordance defines concordance at the genotype level. I hope this clarifies things.
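
The three concordance levels described above can be illustrated with a small Python sketch. This is a toy illustration only: the two call sets are made up, the function name `concordance_counts` is hypothetical, and the code is not how VariantEval or GenotypeConcordance are implemented. It also compares genotype strings literally, which a real tool would not do.

```python
# Toy illustration only: these calls are invented, not taken from the thread.
# Each call set maps (chrom, pos) -> (ref, alt, genotype).
eval_calls = {
    ("1", 100): ("A", "G", "0/1"),
    ("1", 200): ("C", "T", "0/1"),
    ("1", 300): ("G", "A", "1/1"),
    ("1", 400): ("T", "C", "0/1"),
}
comp_calls = {
    ("1", 100): ("A", "G", "0/1"),  # same site, same allele, same genotype
    ("1", 200): ("C", "T", "1/1"),  # same site and allele, different genotype
    ("1", 300): ("G", "C", "1/1"),  # same site, different alt allele
    ("1", 500): ("A", "T", "0/1"),  # site not present in eval_calls
}


def concordance_counts(eval_calls, comp_calls):
    """Count eval calls that match the comparison set at each level.

    Hypothetical helper: the levels are nested, so a genotype match
    implies an allele match, which implies a site match.
    """
    counts = {"site": 0, "allele": 0, "genotype": 0}
    for site, (ref, alt, gt) in eval_calls.items():
        if site not in comp_calls:
            continue
        counts["site"] += 1  # the position is called in both sets
        comp_ref, comp_alt, comp_gt = comp_calls[site]
        if (ref, alt) != (comp_ref, comp_alt):
            continue
        counts["allele"] += 1  # REF and ALT also agree
        if gt == comp_gt:  # simplification: literal string comparison
            counts["genotype"] += 1
    return counts


print(concordance_counts(eval_calls, comp_calls))
```

On this toy data the same four eval calls give three matches at the site level, two at the allele level and one at the genotype level, which is why tools that define concordance differently can report different numbers for the same pair of files.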

  • Shlee,
    I went through the tutorial and understand that VariantEval evaluates site-level variants. My understanding is that this is based on fields 4 and 5 (REF and ALT) of the VCF file.
    What I do not understand is why VariantEval only picks up 1323 site-level variants, as shown in the 2nd table of my attached file above. My SNV calling VCF file has 40K records, which can be seen in the 4th table of the same attached file.
    Thank you very much for checking this for me,
    Weiguo

  • shlee | Cambridge | Member, Broadie ✭✭✭✭✭

    Again @weiguofeng, the tool only considers passing variants. Are you saying you have 40K passing somatic mutation sites?
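
One way to check how many records in a callset are passing is to tally the FILTER column (the 7th field) of the VCF directly. The following is a minimal Python sketch, not part of GATK or Picard: the function name `count_filter_values` is hypothetical, the toy records and the `toy_filter` label are invented, and treating both `PASS` and `.` as unfiltered follows the statement made earlier in this thread.

```python
from collections import Counter


def count_filter_values(lines):
    """Tally the FILTER column (7th tab-separated field) of VCF records.

    `lines` is any iterable of VCF text lines, for example an open file
    handle. Header lines (starting with '#') and blank lines are skipped.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[6]] += 1
    return counts


# Invented toy records, used only so the sketch runs on its own.
toy_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t.\tPASS\t.\n",
    "1\t200\t.\tC\tT\t.\t.\t.\n",
    "1\t300\t.\tG\tA\t.\ttoy_filter\t.\n",  # 'toy_filter' is a made-up label
    "1\t400\t.\tT\tC\t.\tPASS\t.\n",
]

counts = count_filter_values(toy_vcf)
# Records with FILTER set to PASS or '.' are counted as unfiltered here.
unfiltered = counts["PASS"] + counts["."]
print(dict(counts), unfiltered)
```

For a real callset, an open file handle can be passed instead of the toy list, for example `with open("SNVcalling.vcf") as fh: count_filter_values(fh)`, or `gzip.open(path, "rt")` for a compressed file.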

  • Yes, all 40K are passing variants (they have the "PASS" label). That is why I do not understand what filter VariantEval uses that Picard does not.
    Thank you very much for helping with my confusion.
