GATK IndelRealigner removing too many reads
I'm quite new to SNP calling. I am trying to set up a pipeline which includes GATK IndelRealigner as a final step. My BAM file (before realignment) is a little over 1GB. After running the indel realigner, however, it's reduced to 18MB! I'm assuming it's throwing out far too many reads or something has gone wrong.
I'm calling the indel realigner with the default options as follows:
java -Xmx16g -jar $GATK_DIR/GenomeAnalysisTK.jar \
    -T IndelRealigner \
    -R /path/to/my/ref \
    -I input.bam.intervals \
    -targetIntervals input.bam.intervals \
    -o realn.bam
I am generating the read groups using AddOrReplaceReadGroups.jar (from Picard tools) and the interval file using GATK RealignerTargetCreator with default options.
My BAM file was generated from the raw reads of experiment SRA181417, fetched from SRA (after cleaning adapters using cutadapt, mapping to the reference using bwa-mem, and removing duplicate reads using Picard tools).
I have tried this on other reads and do not have the same issue.
Can anyone comment on why the indel realigner could be throwing out so many reads?
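File size alone does not show how many reads were dropped, so a quick check is to compare record counts before and after realignment. The following is a minimal sketch, assuming samtools is installed and on the PATH; `input.bam` is a hypothetical name for the pre-realignment BAM, and `realn.bam` is the output name from the command above.

```shell
#!/usr/bin/env bash
# Sketch only: assumes samtools is on PATH.
# input.bam is a hypothetical name for the pre-realignment BAM file.

# Print the number of alignment records in a BAM file.
count_reads() {
    samtools view -c "$1"
}

# Print both record counts and the integer percentage of records retained.
compare_read_counts() {
    local before after
    before=$(count_reads "$1") || return 1
    after=$(count_reads "$2") || return 1
    if [ "$before" -eq 0 ]; then
        # Avoid dividing by zero when the first file has no records.
        echo "before=0 after=$after retained=NA"
        return 0
    fi
    echo "before=$before after=$after retained=$(( after * 100 / before ))%"
}

# Example call (commented out so the functions can be sourced without running):
# compare_read_counts input.bam realn.bam
```

If the retained percentage is close to 100, the size difference is probably not due to lost reads; if it is very low, the counts confirm that records are missing from the output.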