Can Base Quality Score Recalibration (BQSR) result in mixed phred scores?

KarinaB Member
edited June 12 in Ask the GATK team
I have .bam files with Phred scores ranging from 1-66. It seems to be a mix of Phred+33 and Phred+64. I was wondering whether this could have happened in the BQSR step of the Picard pipeline?

Thank you

Answers

  • bshifaw Member, Broadie, Moderator admin

    Hi @KarinaB

    Could you elaborate on your question? For example, what version of GATK are you using, and what is the link to the Picard pipeline you are referring to?

    Here is a document that describes Base Quality Score Recalibration (BQSR) and what it does. You can use IGV to check the effect of the tool on the quality scores by comparing the BAM files before and after BQSR.

  • KarinaB Member
    Hi @bshifaw and thank you for your reply. I will try to clarify my question.

    The mix of Phred+33 and Phred+64 is observed using samtools mpileup. As far as I understand, the difference between the two scoring systems is which ASCII characters are used to represent a base quality.
    For instance, the mpileup output for one position in my .bam file could look like this:
    "b>^[email protected]=??]@A^AAbBBcc"
    The characters "^", "]", "b" and "c" belong to Phred+64, while the rest belong to Phred+33. The Phred scores should range from 0-41 in both systems, but since I initially analysed the base quality scores with a Python script that only converted from Phred+33, I saw scores ranging from 0-66.

    Checking the positions in IGV shows base qualities ranging from 0-41, so I assume IGV converts from both Phred+33 and Phred+64. I do not know whether there is any way to see in IGV whether a base quality was derived from Phred+33 or from Phred+64.

    The .bam files were sequenced, processed and aligned by the Broad Institute using the Picard pipeline. Unfortunately, I do not know which version Broad used. I can read "GATK IndelRealigner VN:nightly-2015-07-31-g3c92960" in the header of the .bam file.

    The question is really whether this mix of Phred+33 and Phred+64 ASCII characters could possibly be a side effect of the Picard pipeline. The data was sequenced in 2018, so I am ruling out the possibility that it was sequenced on an Illumina platform using the Phred+64 system.
  • KarinaB Member
    Accepted Answer
    Hi again, just letting you know that I have found the cause of the problem. It was a consequence of samtools mpileup's overlap detection mode, so the Picard pipeline has nothing to do with it.
  • bshifaw Member, Broadie, Moderator admin

    Thanks for posting the solution to the problem!
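
For readers who run into the same symptom, here is a minimal Python sketch of the decoding issue discussed in this thread. It is not the script used by the original poster; the function names `decode_quals` and `flag_high_quals` and the cutoff of 41 are assumptions chosen for illustration. It shows that the four characters singled out above decode to 60-66 under Phred+33, while a character such as ">" would decode to a negative score under Phred+64, which suggests the string is Phred+33 throughout with some unusually high values rather than a true mix of encodings. As far as I understand the samtools documentation, mpileup's read-pair overlap detection can combine the qualities of overlapping mates, which would explain values above 41; the manual lists a `-x` option to turn this off.

```python
def decode_quals(qual_string, offset=33):
    """Convert an ASCII-encoded quality string to a list of Phred scores."""
    return [ord(ch) - offset for ch in qual_string]


def flag_high_quals(qual_string, offset=33, max_expected=41):
    """Return (character, score) pairs whose score exceeds max_expected.

    max_expected=41 is an assumption taken from the range quoted in the thread.
    """
    return [
        (ch, ord(ch) - offset)
        for ch in qual_string
        if ord(ch) - offset > max_expected
    ]


# The four characters the thread identifies as looking like Phred+64.
suspect = "^]bc"
print(decode_quals(suspect, offset=33))  # [61, 60, 65, 66]
print(decode_quals(suspect, offset=64))  # [30, 29, 34, 35]

# A character from the "Phred+33" group cannot be valid under Phred+64.
print(decode_quals(">", offset=64))      # [-2]

# Flag only the values above the expected maximum in a hypothetical string.
print(flag_high_quals("b>^=??]@A"))      # [('b', 65), ('^', 61), (']', 60)]
```

If the high values disappear when the pileup is regenerated with overlap detection disabled, that points to mpileup rather than to the upstream pipeline.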
