Are dbSNP GRCh38.p2.vcf and GRCh38.p7.fa compatible in MuTect?

yoteamo · Seoul · Member
edited June 2016 in Ask the GATK team

Is a BAM file generated by STAR using GRCh38.p7.fa and GRCh38.p7.gtf compatible with dbSNP_GRCh38.p2? I haven't run it yet, but I want to make sure that the patch version doesn't matter.

And one more thing: when running MuTect on RNA-seq data, is it OK to use only the COSMIC coding regions (excluding non-coding regions)?
Thanks for your help :)


Issue · GitHub
by Sheila

Issue Number: 980
State: closed
Closed By: sooheelee

Answers

  • shlee · Cambridge · Member, Broadie, Moderator
    edited June 2016

    Hi @yoteamo,

    Your p2 dbSNP file is compatible with the p7-patched reference. Patches add information to the assembly without disrupting the chromosome coordinates.
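If you want to confirm this for your own files, one quick sanity check is to compare the contig names and lengths declared in the dbSNP VCF header (`##contig` lines) with those in the reference's FASTA index (`.fai`, whose first two tab-separated columns are name and length). Naming conventions (e.g. `1` vs `chr1`) can differ between sources even when coordinates agree. The Python below is a minimal sketch under those assumptions; the function names and toy inputs are hypothetical, not part of GATK.

```python
# Minimal sketch: check that contig names/lengths declared in a VCF header
# match those in a FASTA index (.fai). Inputs are lists of text lines.

def vcf_contigs(header_lines):
    """Map contig ID -> length (or None) from '##contig=<ID=...,length=...>' lines."""
    contigs = {}
    for line in header_lines:
        line = line.rstrip("\n")
        if not (line.startswith("##contig=<") and line.endswith(">")):
            continue  # skip ##INFO, ##FORMAT, etc.
        inner = line[len("##contig=<"):-1]
        # Assumes no commas inside field values (typical for ID/length fields).
        fields = dict(kv.split("=", 1) for kv in inner.split(",") if "=" in kv)
        length = fields.get("length")
        contigs[fields["ID"]] = int(length) if length is not None else None
    return contigs


def fai_contigs(fai_lines):
    """Map contig name -> length from .fai lines (tab-separated: name, length, ...)."""
    contigs = {}
    for line in fai_lines:
        if line.strip():
            name, length = line.split("\t")[:2]
            contigs[name] = int(length)
    return contigs


def compare_contigs(vcf, ref):
    """Return (VCF contigs missing from the reference, contigs whose lengths differ)."""
    missing = sorted(c for c in vcf if c not in ref)
    mismatched = sorted(
        c for c in vcf if c in ref and vcf[c] is not None and vcf[c] != ref[c]
    )
    return missing, mismatched


if __name__ == "__main__":
    # Hypothetical toy inputs; offsets in the .fai lines are made up.
    header = [
        "##fileformat=VCFv4.2",
        "##contig=<ID=chr1,length=248956422>",
        "##contig=<ID=chr2,length=242193529>",
    ]
    fai = [
        "chr1\t248956422\t6\t60\t61",
        "chr2\t242193529\t253105752\t60\t61",
        "chr1_example_patch\t100000\t499335326\t60\t61",  # hypothetical extra contig
    ]
    print(compare_contigs(vcf_contigs(header), fai_contigs(fai)))  # ([], [])
```

Extra contigs that exist only in the reference (such as added patch sequences) are deliberately not flagged, since only the VCF's contigs need a match.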

    We have not validated using MuTect on RNA-Seq data. So you are on your own with this particular aspect of your question.

    As for your coding/non-coding question: RNA can be coding or non-coding (e.g. lncRNAs), and coding transcripts contain non-coding regions (e.g. UTRs) in addition to coding regions. So I am curious why you would use only COSMIC's coding regions together with your dbSNP file (presumably genome-wide). I would think you would want to match the whitelist (COSMIC) and redlist (dbSNP or ExAC) so that their genomic coverage is similar, either both restricted to coding regions or both covering the same genomic intervals. This way you avoid introducing bias at sites that are present in the redlist but fall outside the whitelist's coverage (as I've diagrammed and explained below). Furthermore, if you have a choice between covering more of the genome or less, why not cover more? Variants in non-coding regions can influence expression levels (take eQTLs, for example), and I assume you are interested in these given your RNA-Seq data.
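To see how many redlist sites fall outside the whitelist's coverage, you could split the redlist by whether each site lies inside the whitelist's intervals (for example, the coding regions your COSMIC file covers). The Python below is a minimal sketch of that idea, assuming sites are (contig, 0-based position) pairs and intervals are half-open, non-overlapping (contig, start, end) tuples; the function names and coordinates are hypothetical. In practice an established interval tool would do this; the sketch only illustrates the logic.

```python
from bisect import bisect_right


def build_index(intervals):
    """Group half-open (contig, start, end) intervals by contig, sorted by start.
    Assumes intervals on the same contig do not overlap (merge them first)."""
    by_contig = {}
    for contig, start, end in intervals:
        by_contig.setdefault(contig, []).append((start, end))
    index = {}
    for contig, spans in by_contig.items():
        spans.sort()
        index[contig] = ([s for s, _ in spans], [e for _, e in spans])
    return index


def covered(index, contig, pos):
    """True if 0-based pos lies inside some [start, end) interval on contig."""
    if contig not in index:
        return False
    starts, ends = index[contig]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


def split_redlist(redlist_sites, whitelist_intervals):
    """Split redlist sites into those inside vs. outside whitelist coverage."""
    index = build_index(whitelist_intervals)
    inside = [s for s in redlist_sites if covered(index, *s)]
    outside = [s for s in redlist_sites if not covered(index, *s)]
    return inside, outside


if __name__ == "__main__":
    # Hypothetical toy data: whitelist coverage as two intervals on chr1.
    whitelist = [("chr1", 100, 200), ("chr1", 500, 600)]
    redlist = [("chr1", 150), ("chr1", 300), ("chr2", 50), ("chr1", 599), ("chr1", 600)]
    inside, outside = split_redlist(redlist, whitelist)
    print(inside)   # [('chr1', 150), ('chr1', 599)]
    print(outside)  # [('chr1', 300), ('chr2', 50), ('chr1', 600)]
```

The `outside` sites are where the redlist can raise the bar for a call but the whitelist can never lower it again; keeping only `inside` is one way to match the two lists' coverage.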

    [Image: diagram of illustration sites 1–4]

    If you use a redlist/whitelist pair whose coverage is disjoint, you may fail to rescue true tumor mutations, and these mutations then get filtered out. That is, you increase your false negatives.

    • For variants found in Tumor and dbSNP but not COSMIC, MuTect requires more evidence in Normal to refute that the site is germline (Illustration site 2).
    • However, in the same situation, if the site is also in COSMIC (site 1), then we negate that prior and revert to the normal amount of evidence required in Normal. That is, site 1 essentially behaves the same as site 4.
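The two points above describe how dbSNP and COSMIC membership change how much evidence MuTect asks of the Normal before treating a site as not germline. The Python below models only that branching, as a minimal sketch; the threshold values and the names `required_normal_evidence` and `passes_normal_check` are hypothetical placeholders, not MuTect's actual parameters, defaults, or code.

```python
# Illustrative placeholder thresholds: NOT MuTect's actual parameter names or defaults.
BASE_NORMAL_EVIDENCE = 2.0   # evidence required at a site with no germline prior
DBSNP_NORMAL_EVIDENCE = 5.0  # stricter requirement at dbSNP-only sites


def required_normal_evidence(in_dbsnp: bool, in_cosmic: bool) -> float:
    """Evidence the Normal must provide to refute that a site is germline."""
    if in_dbsnp and not in_cosmic:
        # Site 2: dbSNP-only sites carry a germline prior, so demand more evidence.
        return DBSNP_NORMAL_EVIDENCE
    # Site 1 (dbSNP + COSMIC): COSMIC negates the dbSNP prior, so the site
    # is treated like a site with no prior (e.g. site 4).
    return BASE_NORMAL_EVIDENCE


def passes_normal_check(normal_evidence: float, in_dbsnp: bool, in_cosmic: bool) -> bool:
    """True if the Normal's evidence (e.g. a LOD score) meets the requirement."""
    return normal_evidence >= required_normal_evidence(in_dbsnp, in_cosmic)


if __name__ == "__main__":
    print(passes_normal_check(3.0, in_dbsnp=True, in_cosmic=True))    # True  (site 1)
    print(passes_normal_check(3.0, in_dbsnp=True, in_cosmic=False))   # False (site 2)
    print(passes_normal_check(3.0, in_dbsnp=False, in_cosmic=False))  # True
```

With the same Normal evidence, the dbSNP site that is also in COSMIC passes while the dbSNP-only site is filtered. A true somatic mutation that lies outside your COSMIC file's coverage is treated like site 2, which is the false-negative risk described above.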