removing low-quality reads by MuTect

jingmeng (Australia, Member)

Before variant calling, MuTect first removes low-quality reads; please see the short-read preprocessing section at nature.com/nbt/journal/v31/n3/extref/nbt.2514-S1.pdf. I want to apply this preprocessing method to my BAM files and have tried to implement it in Perl, but I have no idea how to program these two rules:

(c) if there is an overlapping read pair and both reads agree, the read with the highest quality score is retained; otherwise both are discarded.
(b) if there is an overlapping read pair and both reads agree, the read with the highest quality score is retained; otherwise the read that disagrees with the reference is retained.

Can anybody help me understand them? Thanks very much for your help!

Best,
Jing

Answers

  • jingmeng (Australia, Member)

    @Sheila said:
    @jingmeng
    Hi Jing,

    I hope this example will help you.

    Reference is ATGCATGCA
    ForwardRead is ATGCAT
    ReverseRead is CTTGCA

    With the forward read aligned to reference positions 1-6 and the reverse read aligned to positions 4-9, the two reads overlap at positions 4, 5, and 6. At position 4, both reads agree that the base is C. At position 5, however, the forward read shows A while the reverse read shows T.

    For your first case (c), at position 4, the read with the higher-quality C base will be used. At position 5, neither read will be used.

    For your second case (b), at position 4, the read with the higher-quality C base will be used. At position 5, the reverse read will be used because its T mismatches the reference base A.
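
To make the two cases concrete, here is a minimal sketch of the per-position logic described above. It is written in Python rather than Perl, but the logic should translate directly. It is only an illustration of the rules as explained in this thread, not MuTect's actual implementation. The function names, the 1-based start positions, and the base qualities are all hypothetical. Two behaviours are assumptions that the rules do not spell out: on a quality tie the forward read is kept, and in mode (b), if the reads disagree with each other and both (or neither) mismatch the reference, neither is kept.

```python
from typing import Dict, List, Optional


def resolve_base(fwd_base: str, fwd_q: int, rev_base: str, rev_q: int,
                 ref_base: str, mode: str) -> Optional[str]:
    """Decide which read to use at one overlapping position.

    Returns "forward", "reverse", or None (neither read is used).
    mode "c": on disagreement, discard both.
    mode "b": on disagreement, keep the read that mismatches the reference.
    """
    if fwd_base == rev_base:
        # Reads agree: keep the higher-quality one (tie -> forward, an assumption).
        return "forward" if fwd_q >= rev_q else "reverse"
    if mode == "c":
        return None
    if mode == "b":
        fwd_mismatch = fwd_base != ref_base
        rev_mismatch = rev_base != ref_base
        if fwd_mismatch and not rev_mismatch:
            return "forward"
        if rev_mismatch and not fwd_mismatch:
            return "reverse"
        return None  # ambiguous case, not covered by the rules (assumption)
    raise ValueError("mode must be 'b' or 'c'")


def resolve_pair(ref: str,
                 fwd_seq: str, fwd_qual: List[int], fwd_start: int,
                 rev_seq: str, rev_qual: List[int], rev_start: int,
                 mode: str) -> Dict[int, Optional[str]]:
    """Apply resolve_base to every overlapping position (1-based coordinates)."""
    overlap_start = max(fwd_start, rev_start)
    overlap_end = min(fwd_start + len(fwd_seq), rev_start + len(rev_seq))  # exclusive
    decisions = {}
    for pos in range(overlap_start, overlap_end):
        i, j = pos - fwd_start, pos - rev_start
        decisions[pos] = resolve_base(fwd_seq[i], fwd_qual[i],
                                      rev_seq[j], rev_qual[j],
                                      ref[pos - 1], mode)
    return decisions


# The example from this thread, with made-up base qualities.
ref = "ATGCATGCA"
fwd_seq, fwd_qual, fwd_start = "ATGCAT", [30, 30, 30, 30, 30, 30], 1
rev_seq, rev_qual, rev_start = "CTTGCA", [35, 20, 20, 30, 30, 30], 4

for mode in ("c", "b"):
    print(mode, resolve_pair(ref, fwd_seq, fwd_qual, fwd_start,
                             rev_seq, rev_qual, rev_start, mode))
```

With these made-up qualities, position 4 goes to the reverse read in both modes because its C has the higher quality. Position 5 is dropped in mode (c) and goes to the reverse read in mode (b). Position 6, where both reads show T, goes to the forward read.
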

    I hope this helps.

    Sheila

    Hi Sheila,

    Thanks very much. Your reply is very clear.

    Jing
