The current GATK version is 3.7-0

Member Posts: 42
edited August 2012

Hello,

I have a BAM file in which a few reads have CIGAR strings that start with a deletion (after the leading hard clip). For example: 440H1D33M1I1D33M.
I am trying to run BaseRecalibrator (2.0 beta) on this file, but I get the error below:

"##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/datarig/CGP/GatkAnalysis/NG_2012_05_10_v2.6/WithGATK2.0/with_SamV2/NG_R1/test.ordered.sorted.realigned.bam} is malformed: Read starting with deletion. Cigar: 440H1D33M1I1D33M. This is an indication of a malformed file, but the SAM spec allows reads starting in deletion. If you are sure you want to use this read, re-run your analysis with

However, when I add the -rf BadCigar filter, I still get the same error. The command I used is pasted below.

"java -Xmx4g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -I test.bam -R ucsc.hg19.fasta -knownSites dbsnp_135.hg19.vcf -o recal_data.grp -rf BadCigar"

Could you please let me know what I am doing wrong?

Thanks

Hi there,

You're not doing anything wrong; this is a case that the tool is not handling properly. We'll fix it asap and post a notice in this thread when it's done.

Geraldine Van der Auwera, PhD

Thanks for the feedback. I'll update the error message now.

Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

• Member Posts: 42

Thanks !
Also, if I may make a suggestion, the error message I get says:
"If you are sure you want to use this read, re-run your analysis with the extra option: -rf BadCigar", which suggests that the filter will somehow "use" the bad reads and INCLUDE them in the analysis. However, the documentation for -rf BadCigar says that this option will DISCARD bad reads. This is conflicting information. Could this please be modified as well?

• Member Posts: 10

Hi there,

What version of GATK are you using?

Geraldine Van der Auwera, PhD

• Member Posts: 10

Hi Geraldine,
I am using the latest version. I am working with RNA-seq data. I followed the best practices, and everything went well except UnifiedGenotyper. I always get the "bad cigar" error message, even when I add '-rf BadCigar'. I don't know what I should do. Do you have any suggestions?

You should validate your bam file first. You can also check a set of reads and see if they all have bad cigar problems, or if it's just a subset.
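To check whether all reads or only a subset are affected, something like the following Python sketch may help. It only tests for the condition named in the error message earlier in this thread (a deletion as the first operator after clipping), plus the mirror case at the end of the read. It is not the BadCigar filter's actual implementation, and the helper names and example reads are made up for illustration.

```python
import re
from collections import Counter

# CIGAR operators as defined in the SAM specification.
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")
CLIPS = {"H", "S"}  # hard and soft clips are skipped when looking at read edges


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operator) pairs."""
    tokens = CIGAR_TOKEN.findall(cigar)
    # Reject strings containing anything the pattern did not consume.
    if "".join(n + op for n, op in tokens) != cigar:
        raise ValueError(f"unparseable CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in tokens]


def edge_deletion(cigar):
    """Return 'start' or 'end' if the first or last non-clip operator is D, else None."""
    ops = [op for _, op in parse_cigar(cigar) if op not in CLIPS]
    if not ops:
        return None
    if ops[0] == "D":
        return "start"
    if ops[-1] == "D":
        return "end"
    return None


def tally(sam_lines):
    """Count reads by edge-deletion status from SAM-formatted text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == "*":  # CIGAR is column 6; '*' means unavailable
            continue
        counts[edge_deletion(fields[5]) or "ok"] += 1
    return counts


# Hypothetical reads for illustration; the first CIGAR is the one from this thread.
example = [
    "@HD\tVN:1.4\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t440H1D33M1I1D33M\t*\t0\t0\t*\t*",
    "read2\t0\tchr1\t200\t60\t76M\t*\t0\t0\t*\t*",
]
print(dict(tally(example)))
```

For a real file, the same tally function could be given sys.stdin and fed the text output of samtools view.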

Geraldine Van der Auwera, PhD

• Member Posts: 266 ✭✭

I'm working on exome seq and using GATK version 2.7-2-g6bda569 and still get the same error in HC!

@blueskypy, have you validated your files?

Geraldine Van der Auwera, PhD

• spain Member Posts: 52