GATK4-ApplyBQSR Missing ReadGroups in PublicPairedSingleSampleWf

EADG (Kiel, Member)

Hi, it's me again.

I have a problem with ApplyBQSR, which replaces PrintReads.
I am running a modified local version of PublicPairedSingleSampleWf, tested with the demo files from the cloud bucket.
The WDL workflow is: 2 FASTQ files => fastqToBam => BWA => Merge => SortFix => BaseRecalibrator => ApplyBQSR => HaplotypeCaller.
Because of the missing read groups, I added the AddOrReplaceReadGroups command to the SortFix task, so it now looks like this:

    java -Xmx4000m -jar /usr/gitc/picard.jar \
    SortSam \
    INPUT=${input_bam} \
    OUTPUT=/dev/stdout \
    SORT_ORDER="coordinate" \
    CREATE_INDEX=false \
    CREATE_MD5_FILE=false | \
    java -Xmx4g -jar /usr/gitc/picard.jar \
    AddOrReplaceReadGroups \
    INPUT=/dev/stdin \
    OUTPUT=/dev/stdout \
    RGID=1 \
    RGLB=delta9 \
    RGPL=ILLUMINA \
    RGPU=UNIT1 \
    RGSM=${sample_name} | \
    java -Xmx500m -jar /usr/gitc/picard.jar \
    SetNmAndUqTags \
    INPUT=/dev/stdin \
    OUTPUT=${output_bam_basename}.bam  \
    CREATE_INDEX=true \
    CREATE_MD5_FILE=true \
    REFERENCE_SEQUENCE=${ref_fasta}
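For reference, AddOrReplaceReadGroups assigns all reads in the file to a single new read group. The following is a simplified, hypothetical Python model of that effect on SAM text (the function name `add_or_replace_read_groups` is invented for illustration and this is not Picard's implementation): existing `@RG` header lines are dropped, one new `@RG` line is written, and every record gets an `RG:Z:<ID>` tag.

```python
def add_or_replace_read_groups(sam_lines, rg_fields):
    """Hypothetical model: put every record into one new read group.

    rg_fields is an ordered dict of @RG tags, e.g. {"ID": "1", "SM": "x"}.
    Returns the rewritten SAM lines (without trailing newlines).
    """
    rg_id = rg_fields["ID"]
    # The single @RG header line that replaces any existing ones.
    rg_line = "\t".join(["@RG"] + [f"{k}:{v}" for k, v in rg_fields.items()])
    out = []
    rg_written = False
    for raw in sam_lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith("@"):
            if not line.startswith("@RG"):
                out.append(line)  # keep @HD, @SQ, @PG, ... unchanged
            continue  # existing @RG lines are dropped
        if not rg_written:
            out.append(rg_line)  # the header ends before the first record
            rg_written = True
        fields = line.split("\t")
        # Optional TAG:TYPE:VALUE fields start at the 12th column;
        # remove any old RG tag, then append the new one.
        tags = [t for t in fields[11:] if not t.startswith("RG:Z:")]
        out.append("\t".join(fields[:11] + tags + [f"RG:Z:{rg_id}"]))
    if not rg_written:
        out.append(rg_line)  # header-only input
    return out
```

Under this model, every record in the output carries the same RG ID, so any record later reported as lacking a read group would have to come from a step after this one.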

ApplyBQSR throws the following error:

java.lang.IllegalStateException: missing readgroup UNIT1

plus a long Java stack trace (I can post it if needed).
When I convert the BAM file to SAM, I can see the read group in the header:

@RG     ID:1    LB:delta9  PL:ILLUMINA     SM:pipeTest        PU:UNIT1
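The read group above has `ID:1` and `PU:UNIT1`, while the error message names `UNIT1`, i.e. the PU value rather than the ID. A small Python sketch (the helper `parse_rg_line` is hypothetical) that splits an `@RG` header line into its fields makes this easy to check. It assumes real tab-separated SAM, since the forum rendering above shows spaces.

```python
def parse_rg_line(line):
    """Split one SAM @RG header line into a {tag: value} dict."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "@RG":
        raise ValueError("not an @RG header line")
    # Split each TAG:VALUE on the first ':' only; values may contain ':'.
    return dict(f.split(":", 1) for f in fields[1:])


rg = parse_rg_line("@RG\tID:1\tLB:delta9\tPL:ILLUMINA\tSM:pipeTest\tPU:UNIT1")
print(rg["ID"], rg["PU"])  # prints: 1 UNIT1
```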

Any clues on how to fix this issue?

Thanks!

Greetings EADG!

GitHub issue 1505 (filed by shlee; state: closed; closed by vdauwera)

Answers

  • shlee (Cambridge, Member, Broadie)
    edited December 2016

    Hi @EADG,

    I suggest you double-check whether every record has read group information: count the number of records, then count the number of records with RG tags. I suspect you'll find the numbers do not match.
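    A minimal way to run that check, assuming the BAM has already been converted to SAM text (for example with `samtools view input.bam > input.sam`, which omits the header by default). The helper name `count_rg_tags` is hypothetical; it counts alignment records and how many of them carry an `RG:Z:` tag.

```python
def count_rg_tags(sam_lines):
    """Return (total_records, records_with_RG_tag) for SAM text lines."""
    total = with_rg = 0
    for raw in sam_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        total += 1
        # Optional TAG:TYPE:VALUE fields start at the 12th column.
        if any(t.startswith("RG:Z:") for t in line.split("\t")[11:]):
            with_rg += 1
    return total, with_rg
```

    Usage: `count_rg_tags(open("input.sam"))`. If the two numbers differ, some records lack read group information.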

    My suggestion is that you apply AddOrReplaceReadGroups to the output of FastqToSam. The output of this becomes your uBAM (unaligned BAM). After aligning the FASTQs with BWA to produce the aligned BAM, merge the uBAM with the aligned BAM in the MergeBamAlignment step; note that MergeBamAlignment takes two BAMs as input. The reasons for this reorganization of your workflow are explained in Tutorial#6484 and Tutorial#6483.

    I believe this reorganization will solve the error, which indicates that some of the reads are missing read group information. In the case of BQSR, if RGPU is present, it takes precedence over RGID (as explained in the comment thread of this document). So I think the missing readgroup UNIT1 error implies that some records are missing read group information of any kind.
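    As a rough illustration of the precedence rule described above, here is a hypothetical Python sketch (`bqsr_keys` and `unresolved_records` are made-up names, not GATK code). It models only what this post states: a read group is keyed by its PU when present, otherwise by its ID, and a record with no RG tag, or with an RG the header does not declare, cannot be resolved. Whether a particular GATK4 build behaves exactly like this is an assumption.

```python
def bqsr_keys(header_lines):
    """Map each @RG ID to the key used under the stated rule:
    PU when present, otherwise ID. Simplified model, not GATK's code."""
    keys = {}
    for line in header_lines:
        if line.startswith("@RG"):
            tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
            keys[tags["ID"]] = tags.get("PU", tags["ID"])
    return keys


def unresolved_records(sam_lines):
    """Return names of records whose RG tag is absent or not in the header."""
    lines = [ln.rstrip("\n") for ln in sam_lines if ln.strip()]
    keys = bqsr_keys(ln for ln in lines if ln.startswith("@"))
    bad = []
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split("\t")
        # Take the value of the first RG:Z: tag among the optional fields.
        rg = next((t[5:] for t in fields[11:] if t.startswith("RG:Z:")), None)
        if rg is None or rg not in keys:
            bad.append(fields[0])  # QNAME of the offending record
    return bad
```

    Run on the SAM text (with header) of the ApplyBQSR input, an empty list means every record resolves to a declared read group under this simplified model.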

  • EADG (Kiel, Member)

    Hi @shlee,

    thank you for your answer. I reorganized the workflow, but the error still persists. I have a straight linear workflow without scatter & gather, so I have only one file to check for RGs. I'm surprised that BaseRecalibrator doesn't complain about that "missing" RG.

    Hm, after looking at the BaseRecalibrator log, I see that all my reads are fine and not a single read is touched by BaseRecalibrator. Is that possible?

    Also, I ran the workflow with the GATK 3.6 combination, BaseRecalibrator => PrintReads, which runs smoothly to the end.

    Maybe I am using the GATK 4.0 version of base recalibration wrong?

    Greetings, EADG

  • shlee (Cambridge, Member, Broadie)

    Hi @EADG,

    Thanks for trying that out. Let me make sure I understand your observations correctly.

    1. When you use GATK4's BaseRecalibrator + ApplyBQSR you get java.lang.IllegalStateException: missing readgroup UNIT1 plus a bunch of other stacktrace errors. Base qualities remain unchanged.
    2. When you switch back to GATK3.6, BaseRecalibrator + PrintReads works fine. Base qualities are recalibrated.

    Please confirm that the above is the case.

    Here's what I need from you now. Can you post the entirety of your error stacktrace? We need to confirm the earliest line of code at which there is an error. Also, can you please let us know which build of GATK4 you are using. Our production workflow uses BaseRecalibrator + ApplyBQSR from GATK 4.alpha-249-g7df4044. We use it in the context of a scatter&gather workflow (described here) while you have a linear workflow. If this is the only difference between getting the error and not, then we'll be asking you for a piece of test data that can recapitulate the error so that we can fill out a bug report. Thanks.

  • Geraldine_VdAuwera (Cambridge, MA, Member, Administrator, Broadie admin)

    @EADG We recently found a regression in the GATK4 ApplyBQSR. It's probably not related to what you're seeing (very different error message), but as a result I'm inclined to think that your issue may also be due to a bug rather than anything you might be doing wrong. It would be great if you could test both the version used in our production pipeline (in the published docker image) and the very latest version of GATK as compiled from the source code repository at https://github.com/broadinstitute/gatk. If the issue persists in the latest version, then we'll need some test data as @shlee stated. Instructions for preparing a bug report are at https://software.broadinstitute.org/gatk/documentation/article?id=1894

  • Geraldine_VdAuwera (Cambridge, MA, Member, Administrator, Broadie admin)

    Oh, and to be clear, the test data should be the BAM that is input to ApplyBQSR. So if you're running this only as a pipeline, you'll need to modify your WDL to output the intermediate BAM before base recalibration, then run BaseRecalibrator and ApplyBQSR separately and verify that the same error occurs.

  • EADG (Kiel, Member)

    Hi @shlee and @Geraldine_VdAuwera,

    unfortunately the new version (gatk-package-4.alpha.2-116-gdd7e19a-SNAPSHOT-local.jar) doesn't fix the problem, but it was worth a try. I packed everything together for you and tried to upload it to your FTP server, but I get the "max 20 users" error. Is there another place where I can upload the file?

    @Geraldine_VdAuwera I copied the files from the cromwell-execution/.../Call-ApplyBQSR/inputs/../ directory, so I don't have to output the files separately. I hope that doesn't have an effect.

    Greetings

    EADG

  • EADG (Kiel, Member)

    OK, I finally uploaded the file ApplyBQSR_Error_GATK4_EADG...

    GitHub issue 1517 (filed by Sheila; state: closed; closed by ronlevine)
  • Geraldine_VdAuwera (Cambridge, MA, Member, Administrator, Broadie admin)

    Thanks for the files, @EADG. We'll try to get to this next week, but to be honest I can't guarantee we will be able to do so; it's the end of the year, so we are trying to finish up some high-priority items.
  • EADG (Kiel, Member)

    Thank you for your response, @Geraldine_VdAuwera. It's no problem, I can wait :). Also, I can use GATK 3.6 in the meantime.

  • Geraldine_VdAuwera (Cambridge, MA, Member, Administrator, Broadie admin)

    Soon you will be able to use 3.7, if we can finally get this release out >_<

  • EADG (Kiel, Member)

    You mean we get 3.7 as a Christmas present? Would it be the last 3.x version before GATK4?

  • Geraldine_VdAuwera (Cambridge, MA, Member, Administrator, Broadie admin)

    Hah, yes, exactly. I wanted to release it last week but we ran into an htsjdk bug that blocked us for a few days. But as of this morning it's all fixed and ready to go! I plan to push the big red button Monday morning :)
  • Geraldine_VdAuwera (Cambridge, MA, Member, Administrator, Broadie admin)

    It's out! GATK 3.7 is out!

  • EADG (Kiel, Member)

    Yeah, I see... I will try it tomorrow :)

  • Geraldine_VdAuwera (Cambridge, MA, Member, Administrator, Broadie admin)

    Ah, I'm glad you figured it out. Will see what we can do to make the error message more informative.
  • Geraldine_VdAuwera (Cambridge, MA, Member, Administrator, Broadie admin)

    And happy holiday to you too!