This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!
Apparent loss of reads during realigment
Hi, I've 32 exomes in a merged BAM file, I have read groups to identify each exome, with id, sample, library and platform all set (I hope) correctly, what I can't understand why I have a discrepancy in the logs generated about the number of reads traversed by RealignmentTargetcreator Vs IndelRealigner:
From my RealignmentTargetcreator run I got:
INFO 18:24:17,286 ProgressMeter - Total runtime 5394.91 secs, 89.92 min, 1.50 hours INFO 18:24:17,286 MicroScheduler - 389,565,880 reads were filtered out during traversal out of 2,752,553,629 total (14.15%) INFO 18:24:17,287 MicroScheduler - -> 24573133 reads (0.89% of total) failing BadMateFilter INFO 18:24:17,287 MicroScheduler - -> 229871090 reads (8.35% of total) failing DuplicateReadFilter INFO 18:24:17,287 MicroScheduler - -> 135121273 reads (4.91% of total) failing MappingQualityZeroFilter INFO 18:24:17,298 MicroScheduler - -> 384 reads (0.00% of total) failing UnmappedReadFilter
Vs this from IndelRealigner
INFO 11:47:21,707 MicroScheduler - 0 reads were filtered out during traversal out of 200,100 total (0.00%)
Why have so few of the reads RealignmentTarget creator reported been traversed by IndelRealigner?
My command for IndelRealigner was:
GenomeAnalysisTK-2.4-9-g532efad/GenomeAnalysisTK.jar -T IndelRealigner --maxReadsInMemory 1000000 --maxReadsForRealignment 1000000 -known /data/GATK_bundle/hg19/Mills_and_1000G_gold_standard.indels.hg19.vcf -known /data/GATK_bundle/hg19/1000G_phase1.indels.hg19.vcf -I Merged_dedup.bam -R /data/GATK_bundle/hg19/ucsc.hg19.fasta -targetIntervals forIndelRealigner.intervals -o Merged_dedup_realigned.bam