GatherBamFiles / FixMateInformation / ValidateSamFile

manolis Member ✭✭✭
edited February 2018 in Ask the GATK team

Hi, here is my pipeline:

1) ApplyBQSR

while read -r f1 f2; do ....
${ph6} --java-options ${java_opt1} ApplyBQSR -R ${gnm} -I ${fBAM} -O ${fol5}/${c_applybqsr} -L ${f1} -bqsr ${fol5}/${bqsrrd} --static-quantized-quals 10 --static-quantized-quals 20 --static-quantized-quals 30 --add-output-sam-program-record --create-output-bam-md5 --use-original-qualities
... done

2) GatherBamFiles
java -Dsamjdk.compression_level=${cl} ${java_opt1} -jar ${ph3} GatherBamFiles ${BQSRs} O=${fol4}/${tofixapplybqsr} CREATE_INDEX=true CREATE_MD5_FILE=true

The ${BQSRs} variable holds all of the per-chromosome input .bam files.

3) ValidateSamFile
java -jar ${ph3} ValidateSamFile I=${tofixapplybqsr} MODE=SUMMARY

4) FixMateInformation
java -jar ${ph3} FixMateInformation I=${tofixapplybqsr} O=${applybqsr} CREATE_INDEX=true CREATE_MD5_FILE=true


java -jar ${ph3} ValidateSamFile I=${applybqsr} MODE=SUMMARY

After step 2, validating the BAM file (step 3) gives me an error:

HISTOGRAM java.lang.String

Error Type Count

I then tried to fix this error with FixMateInformation (step 4) and validated the BAM again, but the error is still there!

HISTOGRAM java.lang.String

Error Type Count


Is this kind of error important? Do I really have to fix it, or can I go on to the next steps (BAM -> gVCF)?
Any suggestions on how to fix it?

My intervals are: chr1... chr22, chrX, chrY, chrM.

Many thanks

Error details

ERROR: Read name A00125:27:H3JT2DMXX:2:2168:22227:12054, Mate not found for paired read
ERROR: Read name A00125:27:H3JT2DMXX:1:2229:28085:35008, Mate not found for paired read
ERROR: Read name A00125:27:H3JT2DMXX:2:2111:1045:33646, Mate not found for paired read
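
A note on step 2 above: GatherBamFiles concatenates its inputs in the order they are given, so the per-chromosome files need to be listed in the same order as the reference contigs. The sketch below shows one way to build that argument list from an ordered list of interval names. The helper name `gather_inputs` and the `<dir>/<name>.bam` naming scheme are assumptions for illustration, not taken from the original pipeline.

```shell
# Hypothetical helper: print one "I=<dir>/<name>.bam" argument per interval
# name read from stdin, preserving the input order.
# The "<dir>/<name>.bam" file naming is an assumption, not from the post.
gather_inputs() {
    gi_dir=$1
    while read -r gi_name; do
        # Skip empty lines so a trailing newline does not add a bogus input.
        if [ -n "$gi_name" ]; then
            printf 'I=%s/%s.bam\n' "$gi_dir" "$gi_name"
        fi
    done
}

# Demo with three interval names; prints the arguments one per line.
printf '%s\n' chr1 chr2 chrX | gather_inputs /path/to/applybqsr
```

The printed arguments could then be passed to GatherBamFiles in place of a hand-written list, provided the interval names are fed in reference order.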

Best Answers


  • manolis Member ✭✭✭

    Hi Sheila,

    I only just saw your answer.
    A few minutes ago I opened a new thread: https://gatkforums.broadinstitute.org/gatk/discussion/11473/error-in-gatherbqsrreports#latest

    It concerns an error in the GatherBQSRReports step, which comes before the GatherBamFiles step. I don't know whether it is related to the present error.

    I'm analyzing 40 WES samples with about 60X of coverage...

  • manolis Member ✭✭✭

    Sorry, I'm trying it now and will let you know.

    many thanks

  • manolis Member ✭✭✭
    edited March 2018

    GATK v4.0.2.1

    Hi, I'm not sure what details you need. Here is what I did (sample ZMV):

    SetNmAndUqTags + "ValidateSamFile (MODE=SUMMARY)"
    "No errors found"

    BaseRecalibrator with the 22 autosomes plus X and Y
    Everything seems OK!

    Tool returned: 0

    ApplyBQSR with the 22 autosomes plus X and Y
    Everything seems OK!

    GatherBamFiles step
    Everything seems OK, but after "ValidateSamFile (MODE=SUMMARY)" I get this:

    HISTOGRAM java.lang.String
    Error Type Count
    ERROR: Read name A00125:27:H3JT2DMXX:1:1465:7997:2002, Mate not found for paired read
    ERROR: Read name A00125:27:H3JT2DMXX:2:1302:28926:26475, Mate not found for paired read

    Any suggestion?
    Many thanks

  • shlee Cambridge Member, Broadie ✭✭✭✭✭
    edited March 2018

    Hi @manolis,

    Are you working with reads subset from a larger alignment file? It appears that for a number of your paired end reads, the mate is missing from the file.

    Your options are to see if downstream tools complain about this, to remove the mate-missing reads, or to convert these mate-missing reads to single end reads by removing the 0x1 SAM flag. I show how to do this in Section 6.1 of Tutorial#8017.
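
Before choosing among these options, it can help to see which reads are affected. The sketch below is a rough approximation of the check, not ValidateSamFile's own logic: it reads SAM text and lists paired reads (flag 0x1 set) that have only one primary record. The function name `list_mateless` is hypothetical, and the use of `samtools view` to produce the SAM text is an assumption.

```shell
# List names of paired reads (flag 0x1 set) that have only one primary
# record in the SAM text on stdin. Secondary (0x100) and supplementary
# (0x800) records are skipped so they are not counted as mates.
list_mateless() {
    awk -F '\t' '
        /^@/ { next }                              # skip header lines
        {
            flag = $2
            if (flag % 2 == 0) next                # 0x1 unset: not a paired read
            if (int(flag / 256) % 2 == 1) next     # 0x100: secondary alignment
            if (int(flag / 2048) % 2 == 1) next    # 0x800: supplementary alignment
            seen[$1]++
        }
        END { for (name in seen) if (seen[name] == 1) print name }
    ' | sort
}

# Usage sketch (assumes samtools is installed; file name is a placeholder):
#   samtools view -h gathered.bam | list_mateless > mateless_names.txt
```

The counts are held in memory per read name, so for a large BAM it may be more practical to run this on one region or on a subsample first.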

  • manolis Member ✭✭✭

    What I don't understand is why everything is OK before BQSR (ValidateSamFile reports no errors), but after BQSR I have mate-missing reads. Could this affect the BQSR steps and their results, or, vice versa, is this "error" generated by the BQSR steps?

    What do you think?

    For now I will exclude the missing-mate reads.

  • SkyWarrior Turkey Member ✭✭✭
    Accepted Answer

    You are using -L in the ApplyBQSR step, which you should avoid. This is what causes the missing mates after ApplyBQSR. Use intervals only during the sampling and model-building stage of BQSR (BaseRecalibrator), not when writing the reads back to the BAM.
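
A minimal sketch of what this advice amounts to on the command line: the ApplyBQSR call simply carries no -L argument. A plausible reason for the missing mates is that, with -L, reads outside the listed intervals (for example unmapped reads or reads on contigs other than chr1 to chr22, chrX, chrY and chrM) are not written to the output, so their mapped mates end up alone. The function below only prints the command rather than running it. The `gatk` launcher name, the function name `applybqsr_cmd`, and all file names are placeholders, not taken from the original pipeline.

```shell
# Print (do not run) an ApplyBQSR command line that has no -L argument.
# Arguments: $1 reference, $2 input BAM, $3 recalibration table, $4 output BAM.
# "gatk" as the launcher name is an assumption; the post uses ${ph6}.
applybqsr_cmd() {
    printf 'gatk ApplyBQSR -R %s -I %s --bqsr-recal-file %s -O %s\n' \
        "$1" "$2" "$3" "$4"
}

# Demo with placeholder file names.
applybqsr_cmd ref.fasta sample.bam sample.recal.table sample.bqsr.bam
```

Other options from the original command (such as the --static-quantized-quals values) can be appended unchanged; only the interval argument is dropped.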

  • manolis Member ✭✭✭
    Accepted Answer

    Great news SkyWarrior !!! No more errors.

    Thanks a lot all of you!!!
