Short-read preprocessing in MuTect

Hi, I am using MuTect on paired tumor/normal exome data. MuTect preprocesses low-quality reads before somatic SNV discovery. How can I get the processed BAM files? I know the MuTect source code is on GitHub, but I am not sure which Java files are used in the preprocessing steps. Could you please tell me which program files handle the low-quality read preprocessing? Thanks very much for your time!

Best
Jing

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    MuTect doesn't really do any pre-processing; perhaps what you refer to is the quality-based read filtering that is done by the underlying GATK engine.

    What are you trying to do with the source code?
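As a rough illustration of the kind of read-level filtering the GATK engine applies before a walker such as MuTect ever sees a read, here is a minimal Python sketch. The `Read` class, its field names and the `MIN_MAPPING_QUALITY` threshold are hypothetical. The comments point to standard GATK 3 read filters with a similar intent, but this thread does not say which filters MuTect actually enables or with what settings.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Minimal stand-in for an aligned read; field names are hypothetical."""
    name: str
    is_unmapped: bool = False
    is_duplicate: bool = False
    fails_vendor_qc: bool = False
    is_secondary: bool = False
    mapping_quality: int = 60


# Hypothetical threshold: drop reads with MAPQ 0, similar in spirit to
# GATK's MappingQualityZeroFilter.
MIN_MAPPING_QUALITY = 1


def passes_read_filters(read: Read) -> bool:
    """Return True only if the read survives every read-level check below."""
    if read.is_unmapped:        # cf. UnmappedReadFilter
        return False
    if read.is_duplicate:       # cf. DuplicateReadFilter (needs Picard marking first)
        return False
    if read.fails_vendor_qc:    # cf. FailsVendorQualityCheckFilter
        return False
    if read.is_secondary:       # cf. NotPrimaryAlignmentFilter
        return False
    if read.mapping_quality < MIN_MAPPING_QUALITY:  # cf. MappingQualityZeroFilter
        return False
    return True


reads = [
    Read("r1"),
    Read("r2", is_duplicate=True),
    Read("r3", mapping_quality=0),
    Read("r4", mapping_quality=37),
]
kept = [r.name for r in reads if passes_read_filters(r)]
print(kept)  # ['r1', 'r4']
```

Note that the duplicate check only works if duplicates were already flagged in a separate preprocessing step; the filter itself just reads the flag.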

  • jingmeng (Australia; Member)

    @Geraldine_VdAuwera said:
    MuTect doesn't really do any pre-processing; perhaps what you refer to is the quality-based read filtering that is done by the underlying GATK engine.

    What are you trying to do with the source code?

    Thanks very much for your reply. I am collecting a list of candidate sites predicted by MuTect, and I want to generate a pileup file containing the depth, base calls, base qualities, mapping qualities and within-read positions of the bases mapped to each candidate site. As described in the supplementary methods (http://www.nature.com/nbt/journal/v31/n3/extref/nbt.2514-S1.pdf), MuTect applies short-read preprocessing before somatic SNV detection. I checked https://www.broadinstitute.org/gatk/guide/tooldocs/#ReadFilters, but it does not list all the filters used by MuTect, for example the overlapping read pair filter: if the two reads of a pair overlap and agree, the read with the higher quality score is retained; otherwise both are discarded. So I am asking how I can get the preprocessed BAM files. Thanks very much for your time.

    Best
    Jing
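The per-site pileup described above (depth, base calls, base qualities, mapping qualities and within-read positions) can be sketched in plain Python as below. This is a simplified illustration, not MuTect or GATK code: `AlignedRead`, `PileupColumn` and `pileup_at` are hypothetical names, and the sketch assumes ungapped alignments (no indels or soft clipping) with 0-based reference coordinates. A real implementation would read alignments from the BAM with a library such as pysam or htsjdk and walk each read's CIGAR string.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignedRead:
    """Hypothetical ungapped alignment: no indels, no clipping."""
    name: str
    ref_start: int              # 0-based reference position of the first base
    bases: str
    base_qualities: List[int]
    mapping_quality: int


@dataclass
class PileupColumn:
    position: int
    depth: int
    bases: List[str]
    base_qualities: List[int]
    mapping_qualities: List[int]
    read_positions: List[int]   # 0-based offset of the base within its read


def pileup_at(reads: List[AlignedRead], position: int) -> PileupColumn:
    """Collect per-base information from every read covering `position`."""
    column = PileupColumn(position, 0, [], [], [], [])
    for read in reads:
        offset = position - read.ref_start
        if 0 <= offset < len(read.bases):   # read overlaps the site
            column.bases.append(read.bases[offset])
            column.base_qualities.append(read.base_qualities[offset])
            column.mapping_qualities.append(read.mapping_quality)
            column.read_positions.append(offset)
    column.depth = len(column.bases)
    return column


reads = [
    AlignedRead("a", 100, "ACGTA", [30, 31, 32, 33, 34], 60),
    AlignedRead("b", 102, "GTTAC", [20, 21, 22, 23, 24], 50),
    AlignedRead("c", 110, "AAAAA", [30] * 5, 60),
]
column = pileup_at(reads, 103)
print(column.depth, column.bases, column.read_positions)  # 2 ['T', 'T'] [3, 1]
```

The offset here is in reference orientation. For reverse-strand reads, the position in sequencing-cycle order would be `len(read.bases) - 1 - offset`.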

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Jing,

    Those operations are done on the fly by the GATK engine and some deep parts of the genotyping code. It is not possible to emit a bam file corresponding to that processing.

    The preprocessing steps that are run separately are duplicate marking with Picard, GATK realignment/co-cleaning, and base recalibration. Any further filtering or read operations are done internally.
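Because these operations happen on the fly rather than producing a new BAM, one option is to reapply comparable rules while building the pileup yourself. The sketch below implements the overlapping-pair rule as paraphrased in the question: mates that cover the same site and agree are collapsed to the one with the higher base quality (assumed to be the "quality score" meant there), and mates that disagree are both discarded. It is an illustration under those assumptions, not MuTect's actual code; `PileupEntry` and `resolve_overlapping_pairs` are hypothetical names, and mates are recognized by a shared read name, as with the QNAME field in a BAM.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PileupEntry:
    read_name: str      # both mates of a pair share the same read name
    base: str
    base_quality: int


def resolve_overlapping_pairs(entries: List[PileupEntry]) -> List[PileupEntry]:
    """Collapse entries from the same fragment at one site.

    - A single entry (no overlapping mate) passes through unchanged.
    - Mates that agree on the base: keep the one with the higher base quality.
    - Mates that disagree: discard both.
    """
    by_name: Dict[str, List[PileupEntry]] = {}
    for entry in entries:
        by_name.setdefault(entry.read_name, []).append(entry)

    kept: List[PileupEntry] = []
    for group in by_name.values():
        if len({e.base for e in group}) == 1:   # all entries agree
            kept.append(max(group, key=lambda e: e.base_quality))
        # otherwise the mates conflict and the whole group is dropped
    return kept


entries = [
    PileupEntry("pair1", "A", 30),
    PileupEntry("pair1", "A", 35),
    PileupEntry("pair2", "A", 30),
    PileupEntry("pair2", "G", 25),
    PileupEntry("single", "G", 20),
]
result = resolve_overlapping_pairs(entries)
print([(e.read_name, e.base, e.base_quality) for e in result])
# [('pair1', 'A', 35), ('single', 'G', 20)]
```

Grouping by read name keeps the rule local to one site, which matches the idea that the decision is made per pileup column rather than by rewriting the reads themselves.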

  • jingmeng (Australia; Member)

    Thanks for your reply. If possible, could you please point me to the parts of the genotyping code that handle the read preprocessing? I have downloaded the MuTect source code, but I know nothing about Java. Thanks very much for your precious time.

    Best
    Jing

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Sorry, we're not able to provide that level of support at this time.

  • jingmeng (Australia; Member)

    @Geraldine_VdAuwera said:
    Sorry, we're not able to provide that level of support at this time.

    No problem. Thanks very much for your time.

    Best
    Jing
