RealignerTargetCreator / IndelRealigner

Hi,

I am using GATK 3.1-1 on Ion Torrent data, with the human reference provided by Ion Torrent. I used Picard to remove the duplicates. The RealignerTargetCreator / IndelRealigner tools (with the indel .vcf files from 1000G_phase1 and Mills) run without any errors on the TMAP-produced .bam files, but I am not sure whether the tools are doing what they are supposed to do. The informative lines from stdout are below.

Can anyone help me interpret these two lines?
219901 reads (97.07% of total) failing DuplicateReadFilter (Does this mean that none of the reads were duplicates?)
1539 reads (0.68% of total) failing MappingQualityZeroFilter (Does this mean that 1539 reads map to multiple locations or have a low quality score?)

INFO 17:57:10,596 ProgressMeter - chr2:108035321 3.29e+08 30.0 s 0.0 s 11.5% 4.3 m 3.8 m
INFO 17:57:40,597 ProgressMeter - chr4:55195129 7.26e+08 60.0 s 0.0 s 24.1% 4.2 m 3.2 m
INFO 17:58:10,599 ProgressMeter - chr6:75886121 1.12e+09 90.0 s 0.0 s 36.8% 4.1 m 2.6 m
INFO 17:58:40,600 ProgressMeter - chr8:136959289 1.52e+09 120.0 s 0.0 s 49.4% 4.0 m 2.0 m
INFO 17:59:10,601 ProgressMeter - chr11:127358689 1.92e+09 2.5 m 0.0 s 62.8% 4.0 m 88.0 s
INFO 17:59:40,602 ProgressMeter - chr15:60257869 2.31e+09 3.0 m 0.0 s 76.5% 3.9 m 55.0 s
INFO 18:00:10,604 ProgressMeter - chr20:41377801 2.74e+09 3.5 m 0.0 s 89.2% 3.9 m 25.0 s
INFO 18:00:40,605 ProgressMeter - chrM:16485 3.06e+09 4.0 m 0.0 s 100.0% 4.0 m 0.0 s
INFO 18:01:00,391 ProgressMeter - done 3.10e+09 4.3 m 0.0 s 100.0% 4.3 m 0.0 s
INFO 18:01:00,392 ProgressMeter - Total runtime 259.80 secs, 4.33 min, 0.07 hours
INFO 18:01:00,451 MicroScheduler - 221440 reads were filtered out during the traversal out of approximately 226527 total reads (97.75%)
INFO 18:01:00,451 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
INFO 18:01:00,452 MicroScheduler - -> 0 reads (0.00% of total) failing BadMateFilter
INFO 18:01:00,452 MicroScheduler - -> 219901 reads (97.07% of total) failing DuplicateReadFilter
INFO 18:01:00,452 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 18:01:00,452 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 18:01:00,452 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 18:01:00,453 MicroScheduler - -> 1539 reads (0.68% of total) failing MappingQualityZeroFilter
INFO 18:01:00,453 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 18:01:00,453 MicroScheduler - -> 0 reads (0.00% of total) failing Platform454Filter
INFO 18:01:00,466 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter

Thank you for the help,
SK
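
For anyone reading the same kind of summary: in this log, a read "failing" a filter is a read that the filter removed from the traversal. So the DuplicateReadFilter line counts reads flagged as duplicates, and the MappingQualityZeroFilter line counts reads with a mapping quality of 0. A minimal Python sketch for cross-checking the numbers is below. It assumes the three non-zero lines are copied from the log above with timestamps stripped. The function name `parse_filter_summary` and the regular expressions are illustrative only and are not part of GATK.

```python
import re

# Non-zero filter summary lines copied from the log above (timestamps removed).
LOG = """\
MicroScheduler - 221440 reads were filtered out during the traversal out of approximately 226527 total reads (97.75%)
MicroScheduler - -> 219901 reads (97.07% of total) failing DuplicateReadFilter
MicroScheduler - -> 1539 reads (0.68% of total) failing MappingQualityZeroFilter
"""

# Hypothetical patterns written for this log layout; other versions may differ.
TOTAL_RE = re.compile(r"(\d+) reads were filtered out .* approximately (\d+) total reads")
FILTER_RE = re.compile(r"-> (\d+) reads \(([\d.]+)% of total\) failing (\w+)")


def parse_filter_summary(text):
    """Return (total_reads, filtered_reads, {filter_name: removed_count})."""
    total = None
    filtered = None
    per_filter = {}
    for line in text.splitlines():
        match = TOTAL_RE.search(line)
        if match:
            filtered, total = int(match.group(1)), int(match.group(2))
            continue
        match = FILTER_RE.search(line)
        if match:
            per_filter[match.group(3)] = int(match.group(1))
    return total, filtered, per_filter


total, filtered, per_filter = parse_filter_summary(LOG)
for name, count in per_filter.items():
    # Each count is the number of reads removed by that filter.
    print(f"{name}: {count} reads removed ({100 * count / total:.2f}% of total)")
print(f"reads passing all filters: {total - filtered}")
```

The per-filter counts add up to the overall filtered count, so the reads actually seen by the tool are the total minus that sum.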

Answers

  • surendrk Member

    Thanks Sheila. Yes, it's not a regular whole-exome sequencing analysis. The data are from the cancer panel chip version 2, and I can see why there are so many duplicates.
    SK
