GATK IndelRealigner removing too many reads
I'm quite new to SNP calling. I am trying to set up a pipeline that includes GATK IndelRealigner as a final step. My BAM file (before realignment) is a little over 1 GB. After running the indel realigner, however, it is reduced to 18 MB! I assume it's throwing out far too many reads, or something else has gone wrong.
I'm calling the indel realigner with the default options as follows:
java -Xmx16g -jar $GATK_DIR/GenomeAnalysisTK.jar \
    -T IndelRealigner \
    -R /path/to/my/ref \
    -I input.bam.intervals \
    -targetIntervals input.bam.intervals \
    -o realn.bam
I am generating the read groups using AddOrReplaceReadGroups.jar (from Picard tools) and the interval file using GATK RealignerTargetCreator with default options.
My BAM file was generated from the raw reads of experiment SRA181417, fetched from SRA (after cleaning adapters using cutadapt, mapping to the reference using bwa-mem, and removing duplicate reads using Picard tools).
I have tried this on other read sets and do not see the same issue.
Can anyone comment on why the indel realigner could be throwing out so many reads?
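File size is only a rough proxy for read loss, so a direct record count on both files would confirm whether reads are really being dropped. Below is a minimal sketch of such a check, assuming samtools is installed and on the PATH. The file name dedup.rg.bam and the helper retained_pct are placeholders made up for illustration, not part of the pipeline described above.

```shell
#!/usr/bin/env bash
# Sketch: compare record counts before and after realignment.
# Assumptions: samtools is on PATH; file names below are placeholders.

# Hypothetical helper: print the percentage of reads retained,
# given the count before and the count after (prints NA if before is 0).
retained_pct() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { if (before > 0) printf "%.1f\n", 100 * after / before; else print "NA" }'
}

before_bam="dedup.rg.bam"   # placeholder: the BAM passed to IndelRealigner
after_bam="realn.bam"       # the IndelRealigner output

# Only run the comparison when samtools and both files are available.
if command -v samtools >/dev/null 2>&1 && [ -f "$before_bam" ] && [ -f "$after_bam" ]; then
    before=$(samtools view -c "$before_bam")   # -c counts alignment records
    after=$(samtools view -c "$after_bam")
    echo "reads before: $before"
    echo "reads after:  $after"
    echo "retained:     $(retained_pct "$before" "$after")%"
fi
```

If the two counts are close, the size difference comes from something other than dropped reads. If they differ sharply, the command line and its inputs are the first things to recheck.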