Insert length filtering problem

Hello, I'm following the HaplotypeCaller pipeline for SNP and indel calling, but I've run into a problem. The sequence I'm analyzing contains two copies of the same gene in opposite orientations, separated by a sequence of approximately 2 kb. As a result, when I map the reads from the FASTQ files, some of them align to the wrong copy of the gene. This is quite obvious because their insert length is much greater than the expected insert length.

I tried to filter the SAM files on the 9th column (the insert length column) using custom bash scripts, but since the alignment was done with bwa mem, all of the values in the 9th column are set to zero. I also tried the GATK MaxInsertSizeFilter read filter, but it didn't seem to influence the output of HaplotypeCaller.

I am aware that HaplotypeCaller realigns the reads when necessary and also determines the likelihoods of the haplotypes, but in my case some of the SNPs are missing from the final VCF file, and I'm fairly convinced this is due to the incorrect mapping of the reads. Does anybody have any idea how I can resolve this? I would really appreciate any help.
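
For reference, the kind of column-9 filter described above could be sketched roughly as follows, in Python rather than bash. This is only an illustration under stated assumptions: the function name `filter_by_insert_size`, the helper `make_record`, the 1000 bp cutoff, and the example records are all hypothetical and not taken from the actual pipeline. It relies only on the SAM convention that column 9 (TLEN) is signed, with 0 meaning the template length is unavailable.

```python
def filter_by_insert_size(lines, max_insert=1000, keep_unknown=False):
    """Yield SAM header lines plus alignments whose |TLEN| <= max_insert.

    TLEN (column 9) is signed, so the absolute value is compared.
    A TLEN of 0 means "unavailable"; such records are dropped unless
    keep_unknown is True. The cutoff is an assumed, illustrative value.
    """
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip lines lacking the 11 mandatory SAM fields
        tlen = int(fields[8])  # column 9, zero-based index 8
        if tlen == 0:
            if keep_unknown:
                yield line
            continue
        if abs(tlen) <= max_insert:
            yield line


def make_record(name, tlen):
    """Build a made-up SAM line for the demo (hypothetical helper)."""
    fields = [name, "99", "ref", "100", "60", "50M", "=", "300",
              str(tlen), "A" * 50, "I" * 50]
    return "\t".join(fields) + "\n"


# Demo on invented records: one plausible pair and one over-long pair.
demo = [
    "@SQ\tSN:ref\tLN:10000\n",
    make_record("r1", 250),
    make_record("r1", -250),
    make_record("r2", 2950),
    make_record("r2", -2950),
]
for kept in filter_by_insert_size(demo, max_insert=1000):
    print(kept.split("\t")[0])
```

Because both mates of a pair carry the same absolute TLEN, a filter like this drops or keeps them together. If every record really does have 0 in column 9, it would remove all alignments unless `keep_unknown=True`, so it is worth confirming first that the insert sizes are actually present in the file.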

Issue · GitHub
by Sheila

Issue Number: 2753
State: closed
Closed By: vdauwera
