If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra

Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

IndelRealignment step: Lost of reads?

I'm using IndelRealigner tool on my BAM file and then counting the number of reads in the BAM file using samtools stats and it turns out that in my input BAM I have 167170574 reads mapped while in the output BAM of IndelRealignment step 121608609. Is it an expected behaviour?

Thank you in advance,


Best Answer


  • SheilaSheila Broad InstituteMember, Broadie admin


    I would not expect so many reads to not be mapped after Indel Realignment step. Can you post the log output of IndelRealigner where it shows how many reads were filtered out?


  • IrantzuIrantzu Member
    edited February 2018
    INFO  08:15:40,232 ProgressMeter -            done   2.06608363E8    98.4 m      28.0 s       99.9%    98.5 m       4.0 s
    INFO  08:15:40,232 ProgressMeter - Total runtime 5906.04 secs, 98.43 min, 1.64 hours
    INFO  08:15:40,254 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 206608363 total reads (0.00%)
    INFO  08:15:40,254 MicroScheduler -   -> 0 reads (0.00% of total) failing BadCigarFilter
    INFO  08:15:40,255 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter
    Done. There were 1 WARN messages, the first 1 are repeated below.
    WARN  06:37:13,653 IndexDictionaryUtils - Track knownAlleles doesn't have a sequence dictionary built in, skipping dictionary validation

    I have just realized that I use an interval file in the command:

    Program Args: -T IndelRealigner -R /storage/irantzu/mm10/Mus_musculus.GRCm38.dna.toplevel.fa -I F72_Liver_R1_sorted_dedup_recab.bam -I F72_tumor_R1_sorted_dedup_recab.bam --interval_set_rule INTERSECTION -L Murin_covered_nochr.bed -known mgp.v5.merged.indels.dbSNP142.normed.vcf.gz -targetIntervals F72_Liver_R1T_N.intervals -nWayOut

    Does IndelRealigner report only those alignments of the regions specified in the interval file? However.. before IndelRealignment, I ran BaseRecalibrator, specifying also the same interval file and having as result the same number of mapped reads, so.. it did not report only those reads falling within my interest region so I expect IndelRealigner to behaviour in the same way.

    Thank you again :)

Sign In or Register to comment.