Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Apparent loss of reads during realigment
Hi, I've 32 exomes in a merged BAM file, I have read groups to identify each exome, with id, sample, library and platform all set (I hope) correctly, what I can't understand why I have a discrepancy in the logs generated about the number of reads traversed by RealignmentTargetcreator Vs IndelRealigner:
From my RealignmentTargetcreator run I got:
INFO 18:24:17,286 ProgressMeter - Total runtime 5394.91 secs, 89.92 min, 1.50 hours INFO 18:24:17,286 MicroScheduler - 389,565,880 reads were filtered out during traversal out of 2,752,553,629 total (14.15%) INFO 18:24:17,287 MicroScheduler - -> 24573133 reads (0.89% of total) failing BadMateFilter INFO 18:24:17,287 MicroScheduler - -> 229871090 reads (8.35% of total) failing DuplicateReadFilter INFO 18:24:17,287 MicroScheduler - -> 135121273 reads (4.91% of total) failing MappingQualityZeroFilter INFO 18:24:17,298 MicroScheduler - -> 384 reads (0.00% of total) failing UnmappedReadFilter
Vs this from IndelRealigner
INFO 11:47:21,707 MicroScheduler - 0 reads were filtered out during traversal out of 200,100 total (0.00%)
Why have so few of the reads RealignmentTarget creator reported been traversed by IndelRealigner?
My command for IndelRealigner was:
GenomeAnalysisTK-2.4-9-g532efad/GenomeAnalysisTK.jar -T IndelRealigner --maxReadsInMemory 1000000 --maxReadsForRealignment 1000000 -known /data/GATK_bundle/hg19/Mills_and_1000G_gold_standard.indels.hg19.vcf -known /data/GATK_bundle/hg19/1000G_phase1.indels.hg19.vcf -I Merged_dedup.bam -R /data/GATK_bundle/hg19/ucsc.hg19.fasta -targetIntervals forIndelRealigner.intervals -o Merged_dedup_realigned.bam