short read preprocessing in MuTect

jingmeng (Australia, Member)

Hi, I am using MuTect on paired tumor/normal exome data. MuTect preprocesses low-quality reads before somatic SNV discovery. How can I get the processed BAM files? I know the MuTect source code is on GitHub, but I am not sure which Java files are used in the preprocessing steps. Can you please tell me which source files handle the preprocessing of low-quality reads? Thanks very much for your time!

Best
Jing

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    MuTect doesn't really do any pre-processing; perhaps what you refer to is the quality-based read filtering that is done by the underlying GATK engine.

    What are you trying to do with the source code?

  • jingmeng (Australia, Member)

    @Geraldine_VdAuwera said:
    MuTect doesn't really do any pre-processing; perhaps what you refer to is the quality-based read filtering that is done by the underlying GATK engine.

    What are you trying to do with the source code?

    Thanks very much for your reply. I am collecting a list of candidate sites predicted by MuTect and want to generate a pileup file containing the depth, base calls, base qualities, mapping qualities and within-read positions of the bases mapped to each candidate site. I know MuTect applies short-read preprocessing before somatic SNV detection; please see http://www.nature.com/nbt/journal/v31/n3/extref/nbt.2514-S1.pdf. I checked https://www.broadinstitute.org/gatk/guide/tooldocs/#ReadFilters, but it does not list all the filters used by MuTect. One example is the filter for overlapping read pairs: if both reads agree, the read with the highest quality score is retained; otherwise both are discarded. So I am asking how I can get the preprocessed BAM files. Thanks very much for your time.

    Best
    Jing

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Jing,

    Those operations are done on the fly by the GATK engine and by some deep parts of the genotyping code. It is not possible to emit a BAM file corresponding to that processing.

    The preprocessing that is done separately is the duplicate marking with Picard, GATK realignment/co-cleaning and base recalibration. Any further filtering or read operations are done internally.

  • jingmeng (Australia, Member)

    Thanks for your reply. If possible, could you please show me the deep parts of the genotyping code corresponding to pre-processing reads? I have downloaded the source code of MuTect, but I know nothing about Java. Thanks very much for your precious time.

    Best
    Jing

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Sorry, we're not able to provide that level of support at this time.

  • jingmeng (Australia, Member)

    @Geraldine_VdAuwera said:
    Sorry, we're not able to provide that level of support at this time.

    It does not matter. Thanks very much for your time.

    Best
    Jing
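
For anyone who lands on this thread with the same goal: since no preprocessed BAM can be emitted, one option is to approximate the overlapping-read-pair rule quoted above on your own pileup records. The following is only a rough sketch in plain Python, not MuTect's or GATK's actual code. The `PileupBase` record and the `filter_overlapping_pairs` function are hypothetical names made up for illustration, and reading "highest quality score" as the base quality at the site is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PileupBase:
    """One read's observation at a candidate site (hypothetical record type)."""
    read_name: str        # both mates of a pair share the same read name
    base: str             # base call at the site
    base_quality: int     # Phred-scaled base quality at the site
    mapping_quality: int  # mapping quality of the read
    read_position: int    # 0-based offset of the base within its read


def filter_overlapping_pairs(bases: List[PileupBase]) -> List[PileupBase]:
    """Approximate the overlapping-pair rule described in the thread.

    If both mates of a pair cover the site and agree on the base, keep only
    the observation with the higher base quality (an assumed interpretation
    of "highest quality score"). If they disagree, discard both.
    """
    by_fragment = defaultdict(list)
    for obs in bases:
        by_fragment[obs.read_name].append(obs)

    kept = []
    for observations in by_fragment.values():
        if len(observations) == 1:
            # Only one mate covers the site, so there is no overlap to resolve.
            kept.append(observations[0])
        elif len({obs.base for obs in observations}) == 1:
            # Mates agree: retain the higher-quality observation.
            kept.append(max(observations, key=lambda obs: obs.base_quality))
        # Mates disagree: neither observation is kept.
    return kept


# Made-up observations at a single candidate site.
site = [
    PileupBase("frag1", "A", 30, 60, 10),
    PileupBase("frag1", "A", 35, 60, 80),  # agrees with its mate
    PileupBase("frag2", "A", 20, 60, 5),
    PileupBase("frag2", "C", 40, 60, 70),  # disagrees with its mate
    PileupBase("frag3", "C", 25, 50, 42),  # no overlapping mate
]
for obs in filter_overlapping_pairs(site):
    print(obs.read_name, obs.base, obs.base_quality,
          obs.mapping_quality, obs.read_position)
```

The filtered list gives the depth (its length) together with the per-base fields asked about above. Populating the records from a real BAM file would need a BAM-reading library, and the result should not be expected to match MuTect's internal filtering exactly.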
