How does ApplyBQSR handle unmapped reads?

Dear GATK team,

I am working with WGS BAM files provided by a major consortium. They applied a BWA + GATK pipeline. The quality scores were recalibrated, but the resulting BAM files do not contain an OQ tag giving the original base qualities.

What does GATK do with the base quality scores of unmapped reads in the ApplyBQSR (GATK4) or PrintReads (GATK3) tool?

My analysis involves a subset of reads drawn from mapped reads and unmapped reads. Therefore, I am concerned that this subset will contain a mixture of quality scores - recalibrated for some reads, but not for others.

Thank you,
Matthew

Answers

  • map2085 (ny, Member)
    edited July 2018

    Looking around, I found this workflow on the GATK GitHub page, which contains the following comment:

    # add the unmapped sequences as a separate line to ensure that they are recalibrated as well

    How are unmapped reads recalibrated if there is no reference sequence against which to evaluate covariates such as nucleotide context?
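The idea behind that workflow comment can be sketched as follows. This is a simplified illustration, not the workflow's actual code: the function names, the greedy packing rule, and the toy contig lengths are all assumptions made for the example. The only detail taken from the thread is that `unmapped` goes on its own line so that some shard processes those reads.

```python
def group_contigs(contigs, max_group_len):
    """Greedily pack (name, length) contigs into groups, then append 'unmapped'.

    Hypothetical helper: the packing rule is an assumption for illustration.
    """
    groups = []
    current, current_len = [], 0
    for name, length in contigs:
        # Start a new group when adding this contig would exceed the limit.
        if current and current_len + length > max_group_len:
            groups.append(current)
            current, current_len = [], 0
        current.append(name)
        current_len += length
    if current:
        groups.append(current)
    # Unmapped reads belong to no contig, so they get a separate group;
    # otherwise no shard would ever visit them.
    groups.append(["unmapped"])
    return groups


def to_tsv(groups):
    """Render one group per line, with names separated by tabs."""
    return "\n".join("\t".join(group) for group in groups)


# Toy contig lengths, made up for the example.
contigs = [("chr1", 1000), ("chr2", 600), ("chr3", 300), ("chrM", 16)]
groups = group_contigs(contigs, max_group_len=1000)
print(to_tsv(groups))
```

Each line of the resulting text would then be handed to one recalibration job as its interval list, with the final line covering the unmapped reads.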

  • shlee (Cambridge; Member, Broadie) ✭✭✭✭✭

    Hi @map2085,

    Unmapped reads should have undergone BQSR the same as the mapped reads if Best Practices were followed. You can see this implemented in this production pipeline. Remember that BQSR uses a recalibration table, built from the mapped reads, to correct base qualities. The same corrections are applied to unmapped reads by running the recalibration step with -L unmapped.

    OQ tags take up a lot of space, so saving file storage is likely the reason they were removed.
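As a rough sketch of what applying the table with `-L unmapped` looks like, the snippet below assembles a GATK4 ApplyBQSR command line. The file names and the helper function are hypothetical, and the exact argument spellings should be checked against the tool documentation for your GATK version.

```python
import shlex


def apply_bqsr_command(input_bam, reference, recal_table, output_bam, intervals):
    """Build a GATK4 ApplyBQSR command as an argument list (hypothetical helper)."""
    cmd = [
        "gatk", "ApplyBQSR",
        "-R", reference,
        "-I", input_bam,
        "--bqsr-recal-file", recal_table,  # table produced earlier by BaseRecalibrator
        "-O", output_bam,
    ]
    # One -L per interval; the special value "unmapped" selects unmapped reads.
    for interval in intervals:
        cmd += ["-L", interval]
    return cmd


# File names below are placeholders, not taken from any real pipeline.
cmd = apply_bqsr_command(
    input_bam="sample.bam",
    reference="ref.fasta",
    recal_table="sample.recal.table",
    output_bam="sample.unmapped.recal.bam",
    intervals=["unmapped"],
)
print(shlex.join(cmd))
```

The table itself is not recomputed here; the same table is reused for every shard, including the one that holds the unmapped reads.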

  • map2085 (ny, Member)

    Hi @shlee,

    Thank you for the confirmation.

    I am surprised that BQSR can be applied to unmapped reads because I thought that some covariates would be undefined for unmapped reads. For instance, I thought that the "context" covariate of the BQSR tables meant nucleotide context in the reference genome to which the read was mapped.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)
    Accepted Answer

    Hi @map2085, iirc you're correct that nucleotide context is based on the reference. I believe there's some logic built-in to discount covariates that don't apply, as if those covariates had a neutral value for those reads. The reads will still be recalibrated based on applicable covariates.
