SplitNCigarReads generates badCigar reads
For our RNA variant calling pipeline, we follow the GATK Best Practices workflow (STAR 2-pass -> mark duplicates & sort -> SplitNTrim -> indel realignment -> base recalibration -> variant calling).
After SplitNCigarReads runs successfully, IndelRealigner filters out reads because they fail the BadCigarFilter. That seems strange to me, because I would not expect SplitNCigarReads to produce reads with bad CIGAR strings.
Can someone explain to me what is happening in SplitNCigarReads to the reads that are filtered out?
For example, here are the numbers of reads after each step for one sample:
So for this sample, 621 reads are filtered out, which is also reported in the logfile of indelRealigner:
INFO 11:10:40,726 MicroScheduler - 621 reads were filtered out during the traversal out of approximately 22177873 total reads (0.00%)
INFO 11:10:40,726 MicroScheduler - -> 621 reads (0.00% of total) failing BadCigarFilter
INFO 11:10:40,727 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
These are the commands (GATK v3.4) I used (up to realignment):
STAR --genomeDir Homo_sapiens.GRCh37 --runThreadN 4 --outFileNamePrefix sample_ --outReadsUnmapped Fastx --outSAMtype BAM SortedByCoordinate --readFilesCommand zcat --outSJfilterIntronMaxVsReadN 10000000 --chimJunctionOverhangMin 15 --chimSegmentMin 15 --twopassMode Basic --readFilesIn Sample_L001_R1_001.fastq.gz,Sample_L002_R1_001.fastq.gz,Sample_L003_R1_001.fastq.gz,Sample_L004_R1_001.fastq.gz
java -jar AddOrReplaceReadGroups.jar INPUT=Sample_Aligned.sortedByCoord.out.bam OUTPUT=Sample_sorted.bam RGID=Sample_L004_R1 RGLB=R1 RGPL=ILLUMINA RGPU=L004 RGSM=Sample
java -jar MarkDuplicates.jar INPUT=Sample_sorted.bam OUTPUT=Sample_sorted_dedupped.bam CREATE_INDEX=true VALIDATION_STRINGENCY=SILENT M=Sample_markDup_metrics.txt
java -jar GenomeAnalysisTK-3.4-46/GenomeAnalysisTK.jar -T SplitNCigarReads -R Homo_sapiens.GRCh37.GATK.illumina.fa -I Sample_sorted_dedupped.bam -o Sample_sorted_dedupped_splitN.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS
java -jar GenomeAnalysisTK-3.4-46/GenomeAnalysisTK.jar -T RealignerTargetCreator -R Homo_sapiens.GRCh37.GATK.illumina.fa -I Sample_sorted_dedupped_splitN.bam -o Sample_target.intervals
java -jar GenomeAnalysisTK-3.4-46/GenomeAnalysisTK.jar -T IndelRealigner -R Homo_sapiens.GRCh37.GATK.illumina.fa -I Sample_sorted_dedupped_splitN.bam -targetIntervals Sample_target.intervals -o Sample_sorted_dedupped_splitN_realigned.bam
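One way to narrow this down is to list the CIGAR strings in the SplitNCigarReads output that look like candidates for the BadCigarFilter. The sketch below is only an approximation: it assumes the conditions listed in the GATK 3 BadCigarFilter documentation (clips in the middle of the CIGAR, a deletion at the start or end, fully clipped reads, adjacent indels) and does not reproduce the filter's actual implementation. The function name flag_bad_cigars is hypothetical, and the commented usage line assumes samtools is installed, which is not part of the commands above.

```shell
# Hypothetical helper: reads "name<TAB>CIGAR" lines on stdin and prints the
# lines whose CIGAR matches one of the conditions assumed from the GATK 3
# BadCigarFilter documentation, with the matching reason as a third column.
flag_bad_cigars() {
  awk -F'\t' '
    {
      core = $2
      # Remove clips at both ends (hard clips outermost, soft clips inside).
      sub(/^[0-9]+H/, "", core)
      sub(/^[0-9]+S/, "", core)
      sub(/[0-9]+H$/, "", core)
      sub(/[0-9]+S$/, "", core)

      reason = ""
      if (core == "")                    reason = "fully clipped"
      else if (core ~ /[SH]/)            reason = "clip inside the CIGAR"
      else if (core ~ /^[0-9]+D/)        reason = "starts with a deletion"
      else if (core ~ /[0-9]+D$/)        reason = "ends with a deletion"
      else if (core ~ /[ID][0-9]+[ID]/)  reason = "adjacent indels"

      if (reason != "") print $1 "\t" $2 "\t" reason
    }'
}

# Assumed usage (requires samtools; column 6 of SAM output is the CIGAR):
# samtools view Sample_sorted_dedupped_splitN.bam | cut -f1,6 | flag_bad_cigars
```

If the flagged reads mostly end or start with a deletion next to a hard clip, that would suggest the deletion sat directly beside an N operator before the split. Comparing the same read names in Sample_sorted_dedupped.bam would show what the CIGAR looked like before SplitNCigarReads.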