
GatherBamFiles / FixMateInformation / ValidateSamFile

manolismanolis Member ✭✭
edited February 2018 in Ask the GATK team

Hi, here is the pipeline...

1) ApplyBQSR

while read -r f1 f2; do ....
${ph6} --java-options ${java_opt1} ApplyBQSR -R ${gnm} -I ${fBAM} -O ${fol5}/${c_applybqsr} -L ${f1} -bqsr ${fol5}/${bqsrrd} --static-quantized-quals 10 --static-quantized-quals 20 --static-quantized-quals 30 --add-output-sam-program-record --create-output-bam-md5 --use-original-qualities
... done

2) GatherBamFiles
java -Dsamjdk.compression_level=${cl} ${java_opt1} -jar ${ph3} GatherBamFiles ${BQSRs} O=${fol4}/${tofixapplybqsr} CREATE_INDEX=true CREATE_MD5_FILE=true

The ${BQSRs} variable holds all the per-chromosome input .bam files.
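
As a rough illustration of how such a variable could be assembled: GatherBamFiles concatenates its inputs in the order given, so the per-chromosome files should be listed in reference order. The helper name `build_gather_inputs` and the `<chr>.bqsr.bam` naming scheme below are assumptions made for this sketch, not part of the original pipeline.

```shell
# Hypothetical helper: print one I=<file> argument per per-chromosome BAM,
# in the same order as the chromosome names passed on the command line.
# The <dir>/<chr>.bqsr.bam naming scheme is an assumption for illustration.
build_gather_inputs() {
  dir=$1
  shift
  for chr in "$@"; do
    printf 'I=%s/%s.bqsr.bam\n' "$dir" "$chr"
  done
}
```

Something like `BQSRs=$(build_gather_inputs "${fol5}" chr1 chr2 chr3)` followed by an unquoted `${BQSRs}` on the GatherBamFiles command line would then expand to one `I=` argument per file, provided the paths contain no whitespace.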

3) ValidateSamFile
java -jar ${ph3} ValidateSamFile I=${tofixapplybqsr} MODE=SUMMARY

4) FixMateInformation
java -jar ${ph3} FixMateInformation I=${tofixapplybqsr} O=${applybqsr} CREATE_INDEX=true CREATE_MD5_FILE=true

5) ValidateSamFile

java -jar ${ph3} ValidateSamFile I=${applybqsr} MODE=SUMMARY

After step 2, validating the BAM file (step 3) gives me an error:

HISTOGRAM java.lang.String

Error Type Count
ERROR:MATE_NOT_FOUND 11647

I then tried to fix this error with FixMateInformation (step 4) and validated the BAM again (step 5), but the error is still there!

HISTOGRAM java.lang.String

Error Type Count

ERROR:MATE_NOT_FOUND 11647

Is this kind of error important? Do I really have to fix it, or can I go on to the next steps (BAM -> gVCF)?
Any suggestions on how to fix it?

My intervals are: chr1... chr22, chrX, chrY, chrM.

Many thanks


Error details

ERROR: Read name A00125:27:H3JT2DMXX:2:2168:22227:12054, Mate not found for paired read
ERROR: Read name A00125:27:H3JT2DMXX:1:2229:28085:35008, Mate not found for paired read
ERROR: Read name A00125:27:H3JT2DMXX:2:2111:1045:33646, Mate not found for paired read
....

Answers

  • manolismanolis Member ✭✭

    Hi sheila,

    I only just saw your answer.
    A few minutes ago I opened a new thread: https://gatkforums.broadinstitute.org/gatk/discussion/11473/error-in-gatherbqsrreports#latest

    It is about an error in the GatherBQSRReports step, which comes before the GatherBamFiles step... I don't know whether it is related to the present error.

    I'm analyzing 40 WES samples with about 60X of coverage...

  • manolismanolis Member ✭✭

    Sorry, I'm trying and I will let you know,

    many thanks

  • manolismanolis Member ✭✭
    edited March 2018

    GATK v4.0.2.1

    Hi, I'm not sure what details you need... here is what I did (sample ZMV)...

    SetNmAndUqTags + "ValidateSamFile (MODE=SUMMARY)"
    "No errors found"

    BaseRecalibrator with 22Chr and X+Y
    everything seems OK!

    GatherBQSRReports
    Tool returned: 0

    ApplyBQSR with 22Chr and X+Y
    everything seems OK!

    GatherBamFiles step
    everything seems OK, but after "ValidateSamFile (MODE=SUMMARY)" I get this:

    HISTOGRAM java.lang.String
    Error Type Count
    ERROR:MATE_NOT_FOUND 10972
    ERROR: Read name A00125:27:H3JT2DMXX:1:1465:7997:2002, Mate not found for paired read
    ERROR: Read name A00125:27:H3JT2DMXX:2:1302:28926:26475, Mate not found for paired read
    ...

    Any suggestion?
    Many thanks

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭
    edited March 2018

    Hi @manolis,

    Are you working with reads subset from a larger alignment file? It appears that for a number of your paired end reads, the mate is missing from the file.

    Your options are to see if downstream tools complain about this, to remove the mate-missing reads, or to convert these mate-missing reads to single end reads by removing the 0x1 SAM flag. I show how to do this in Section 6.1 of Tutorial#8017.
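
    Before choosing between these options, it can help to list exactly which reads are affected. The following is a minimal sketch, not taken from the tutorial mentioned above: it reads SAM text on stdin and prints the names of paired reads (0x1 set) that appear only once among primary records. The function name `list_orphan_reads` is hypothetical, and the check assumes each complete pair contributes exactly two primary records.

    ```shell
    # Hypothetical helper: list QNAMEs of paired reads whose mate is absent
    # from a SAM stream on stdin. Only primary alignment records are counted.
    list_orphan_reads() {
      awk -F '\t' '
        /^@/ { next }                       # skip header lines
        {
          flag = $2
          if (flag % 2 == 0) next           # 0x1 not set: not a paired read
          if (int(flag / 256) % 2) next     # 0x100 set: secondary alignment
          if (int(flag / 2048) % 2) next    # 0x800 set: supplementary alignment
          seen[$1]++
        }
        END { for (name in seen) if (seen[name] < 2) print name }
      ' | sort
    }
    ```

    A possible use is `samtools view -h sample.bam | list_orphan_reads > orphans.txt` (file names are placeholders). The resulting list could then be passed to a read-name filter such as Picard FilterSamReads with FILTER=excludeReadList and READ_LIST_FILE, if dropping the reads is the chosen option.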

  • manolismanolis Member ✭✭

    What I don't understand is why everything is OK before BQSR (ValidateSamFile passes), but after BQSR I have the problem of mate-missing reads... Could this affect the BQSR steps and their results, or, vice versa, is this "error" generated by the BQSR steps?

    What do you think?

    For now I will exclude the missing-mate reads.
    Best

  • SkyWarriorSkyWarrior TurkeyMember ✭✭✭
    Accepted Answer

    You are using -L in the ApplyBQSR step, which you should avoid. This is what causes the missing mates after ApplyBQSR. Use intervals only during the sampling and model-building stage of BQSR (BaseRecalibrator), not when writing the recalibrated reads back to a BAM.
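
    A plausible mechanism, stated here as an assumption rather than something confirmed in this thread: with -L, reads outside the listed intervals are not written, so a read inside chr1...chrM whose mate sits on another contig loses its partner. A minimal sketch of ApplyBQSR on the whole BAM without -L might look like the following. The wrapper name `apply_bqsr_whole_bam` and the `DRY_RUN` switch are hypothetical; the file names in the usage note are placeholders.

    ```shell
    # Hypothetical wrapper: run ApplyBQSR on a whole BAM, with no -L argument.
    # If DRY_RUN is set to a non-empty value, the command is printed instead of run.
    apply_bqsr_whole_bam() {
      ref=$1
      in_bam=$2
      recal_table=$3
      out_bam=$4
      set -- gatk ApplyBQSR -R "$ref" -I "$in_bam" \
        --bqsr-recal-file "$recal_table" -O "$out_bam"
      if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
      else
        "$@"
      fi
    }
    ```

    For example, `DRY_RUN=1 apply_bqsr_whole_bam ref.fasta sample.bam sample.recal.table sample.bqsr.bam` prints the command for inspection. Other options from the original pipeline, such as --static-quantized-quals, could be appended in the same way.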

  • manolismanolis Member ✭✭
    Accepted Answer

    Great news SkyWarrior !!! No more errors.

    Thanks a lot all of you!!!
