The current GATK version is 3.8-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Got a problem?

1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?

Then follow instructions in Article#1894.

Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Download the latest Picard release at
GATK version 4.beta.3 (i.e. the third beta release) is out. See the GATK4 beta page for download and details.

reduceReads removes reads that should not be removed

I am having issues making sense of the behaviour of reducereads in an example where I know that the call is real (Sanger validated...).
In the non-reduced pileup the variant is pretty solid:

11 47364249 G 12 aA..A.A,.A.^~.

The compressed pileup looks like:
11 47364249 G 2 .A

which I guess is OK but the depth associated with the A call is 1 instead of the expected 5 (as evidenced when I run a samtools view).
Accordingly, after running the UG, I get 7 reads supporting the reference and 1 supporting the alternative.
But really it should be 7/5. I am losing 4 precious reads that turn this variant into missing.

I really can't make sense of it. I tweaked all the options I could find to keep as many reads as possible when reducing:
-minqual 0 -minmap 0 --dont_hardclip_low_qual_tails -noclip_ad

I am using GenomeAnalysisTK-2.7-4, and I can upload the slide of the BAM file to illustrate the problem.
I also attach my debugging code (as a txt file) if someone wants to see if I am missing a key option.

Is that a bug? Or am I missing something obvious?


Best Answer


  • ebanksebanks Broad InstituteMember, Broadie, Dev

    Hi there,
    How do you know that the reduced depth is only 1 at that position? (FYI If you look at it in IGV it will tell you the representative depth)

  • Good question. I guessed that when running samtools view, there is a long set of values at the end which looks like this:
    RR:B:c,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1, ...

    and for the other synthetic read

    so I presumed that this must be the depth at every position, it must be stored somewhere and the numbers look right (also consistent with the pileup)

  • ebanksebanks Broad InstituteMember, Broadie, Dev

    Good. Those numbers are the depths relative to the first position in the read. So a 1 in the middle of the read means "1 more than the number in the first position". It looks then like you have a depth of 3 at the position in question, right?

  • So the reads with the 1s and 2s is the read with the reference call, pretty sure of that. It eventually goes up to 6-7 and that matches the reference read count of 7. So that ones makes sense.

    The problem is the second read. I will paste below the full sequence but it never goes above 1, so the depth varies between 1 and 2 I presume. Nothing like the 5 I expect.


  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hi Vincent,

    Try writing the reads that end up missing to a separate file and validate that file, or run them through RR and look for the filter summary. This is to check if there is something wrong with those reads that leads to them getting filtered out somehow.

  • vplagnolvplagnol Member
    edited November 2013

    Thanks Geraldine, on it now. Just sorry to be dumb but what validation, do you mean the Picard validation tools or something part of GATK?
    By filter summary I presume you mean extract my 12 reads (7 reference and 5 alternate) and run reduce reads? I get the following very clean output:

    INFO 18:35:23,731 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 12 total reads (0.00%)
    INFO 18:35:23,731 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
    INFO 18:35:23,731 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
    INFO 18:35:23,731 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
    INFO 18:35:23,732 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
    INFO 18:35:23,732 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
    INFO 18:35:23,732 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
    INFO 18:35:24,964 GATKRunReport - Uploaded run statistics report to AWS S3

    So no filtering is happening.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Yep, that's what I meant - and that is indeed a very clean output. So filtering's not the issue, ok. By validation I meant Picard ValidateSAMFile, yes. If that also runs cleanly, I'll ask you to upload your snippet with the 12 reads so we can have a closer look at it.

  • vplagnolvplagnol Member
    edited November 2013

    Yes, picard validation is OK too.
    Every single read gives me
    ERROR: Read name HWUSI-EAS483_0025_FC:1:4:2500:9893#TGACCA, Mate not found for paired read

    but that is for all the reads, simply because I only sliced one read of each pair. I presume this is normal.

    I uploaded the BAM file + the test scripts to your public ftp just now. The file name is

    Hopefully it is all straightforward, I put the arguments on top. Just run the reduceReads command line, you will see what I mean.
    I am guessing this is a bug, but I give 50% chance that it's me being stupid.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Good news, the devs have a proposed fix for this bug. It might take a little longer than usual to go through code review and into the codebase due to the impending holiday, but it should make it into the nightlies early next week.

  • Thanks Geraldine. Obviously curious to hear the reason. Let me know as soon as the fix has made its way to the nightly builds. That will make some clinicians happy.

  • ebanksebanks Broad InstituteMember, Broadie, Dev

    Mapping Quality was being treated as a byte instead of an int (which is fine for most aligners, but not Novoalign).

  • Thanks Eric. I suspected indeed it was quite specific to my setup (otherwise it would have been found before).

Sign In or Register to comment.