
RealignerTargetCreator / IndelRealigner

Hi,

I am using GATK 3.1-1 on Ion Torrent data, with the human reference provided by Ion Torrent. I used Picard to remove duplicates. The RealignerTargetCreator / IndelRealigner tools (with the indel VCF files from 1000G_phase1 and Mills) run without any error on the TMAP-produced .bam files, but I am not sure whether the tools are doing what they are supposed to do. The informative lines from stdout are below.
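For context, the commands were along these lines (the reference and known-indels file names here are placeholders rather than my exact paths):

    java -jar GenomeAnalysisTK.jar \
        -T RealignerTargetCreator \
        -R hg19_ion_reference.fasta \
        -I sample.dedup.bam \
        -known Mills_and_1000G_gold_standard.indels.hg19.vcf \
        -known 1000G_phase1.indels.hg19.vcf \
        -o sample.realigner.intervals

    java -jar GenomeAnalysisTK.jar \
        -T IndelRealigner \
        -R hg19_ion_reference.fasta \
        -I sample.dedup.bam \
        -targetIntervals sample.realigner.intervals \
        -known Mills_and_1000G_gold_standard.indels.hg19.vcf \
        -known 1000G_phase1.indels.hg19.vcf \
        -o sample.realigned.bam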

Can anyone help me interpret the following?
219901 reads (97.07% of total) failing DuplicateReadFilter (does this mean none of the reads were duplicates?)
1539 reads (0.68% of total) failing MappingQualityZeroFilter (does this mean the 1539 reads map to multiple locations or have a low mapping quality score?)

INFO 17:57:10,596 ProgressMeter - chr2:108035321 3.29e+08 30.0 s 0.0 s 11.5% 4.3 m 3.8 m
INFO 17:57:40,597 ProgressMeter - chr4:55195129 7.26e+08 60.0 s 0.0 s 24.1% 4.2 m 3.2 m
INFO 17:58:10,599 ProgressMeter - chr6:75886121 1.12e+09 90.0 s 0.0 s 36.8% 4.1 m 2.6 m
INFO 17:58:40,600 ProgressMeter - chr8:136959289 1.52e+09 120.0 s 0.0 s 49.4% 4.0 m 2.0 m
INFO 17:59:10,601 ProgressMeter - chr11:127358689 1.92e+09 2.5 m 0.0 s 62.8% 4.0 m 88.0 s
INFO 17:59:40,602 ProgressMeter - chr15:60257869 2.31e+09 3.0 m 0.0 s 76.5% 3.9 m 55.0 s
INFO 18:00:10,604 ProgressMeter - chr20:41377801 2.74e+09 3.5 m 0.0 s 89.2% 3.9 m 25.0 s
INFO 18:00:40,605 ProgressMeter - chrM:16485 3.06e+09 4.0 m 0.0 s 100.0% 4.0 m 0.0 s
INFO 18:01:00,391 ProgressMeter - done 3.10e+09 4.3 m 0.0 s 100.0% 4.3 m 0.0 s
INFO 18:01:00,392 ProgressMeter - Total runtime 259.80 secs, 4.33 min, 0.07 hours
INFO 18:01:00,451 MicroScheduler - 221440 reads were filtered out during the traversal out of approximately 226527 total reads (97.75%)
INFO 18:01:00,451 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
INFO 18:01:00,452 MicroScheduler - -> 0 reads (0.00% of total) failing BadMateFilter
INFO 18:01:00,452 MicroScheduler - -> 219901 reads (97.07% of total) failing DuplicateReadFilter
INFO 18:01:00,452 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 18:01:00,452 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 18:01:00,452 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 18:01:00,453 MicroScheduler - -> 1539 reads (0.68% of total) failing MappingQualityZeroFilter
INFO 18:01:00,453 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 18:01:00,453 MicroScheduler - -> 0 reads (0.00% of total) failing Platform454Filter
INFO 18:01:00,466 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
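Checking the totals myself: 219901 + 1539 = 221440 filtered reads, and 221440 / 226527 ≈ 97.75%, which matches the MicroScheduler summary line, so the DuplicateReadFilter and MappingQualityZeroFilter counts account for everything that was removed.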

Thank you for the help,
SK


Answers

  • Thanks Sheila. Yes, it's not a regular whole-exome sequencing analysis. The data are from the cancer panel chip version 2, and I can see why there are so many duplicates.
    SK
