-rf BadCigar

newbie16 Posts: 37 Member
edited August 2012 in Ask the GATK team

Hello,

I have a BAM file in which a few reads have CIGAR strings that start with a deletion. For example: 440H1D33M1I1D33M.
I am trying to run BaseRecalibrator (2.0 beta) on this file, but I get the error below:

"##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/datarig/CGP/GatkAnalysis/NG_2012_05_10_v2.6/WithGATK2.0/with_SamV2/NG_R1/test.ordered.sorted.realigned.bam} is malformed: Read starting with deletion. Cigar: 440H1D33M1I1D33M. This is an indication of a malformed file, but the SAM spec allows reads starting in deletion. If you are sure you want to use this read, re-run your analysis with the extra option: -rf BadCigar"

However, if I add the -rf BadCigar filter, I still get the same error. The command I used is pasted below.

"java -Xmx4g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -I test.bam -R ucsc.hg19.fasta -knownSites dbsnp_135.hg19.vcf -o recal_data.grp -rf BadCigar"

Could you please let me know what I am doing wrong?

Thanks

Answers

  • Geraldine_VdAuwera Posts: 8,271 Administrator, GATK Dev admin
    Answer ✓

    Hi there,

    You're not doing anything wrong; this is an issue that the tool is not handling properly. We'll fix it ASAP and post a notice in this thread when it's done.

    Geraldine Van der Auwera, PhD

  • newbie16 Posts: 37 Member

    Thanks!
    Also, if I may make a suggestion, the error message says:
    "If you are sure you want to use this read, re-run your analysis with the extra option: -rf BadCigar", which implies that the filter will somehow "use" the bad reads and INCLUDE them in the analysis. However, the documentation for -rf BadCigar says that this option will DISCARD bad reads. This is conflicting information. Could this please be corrected as well?

  • ebanks Broad Institute Posts: 689 Member, Administrator, GATK Dev, Broadie, Moderator, DSDE Dev, GP Member admin
    Answer ✓

    Thanks for the feedback. I'll update the error message now.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • maquezaihou Posts: 9 Member

    I had the same error! Adding "-rf BadCigar" doesn't help. Any suggestions?

  • Geraldine_VdAuwera Posts: 8,271 Administrator, GATK Dev admin

    Hi there,

    What version of GATK are you using?

    Geraldine Van der Auwera, PhD

  • maquezaihou Posts: 9 Member

    Hi Geraldine,
    I am using the latest version. In fact, I am working with RNA-seq data. I followed the Best Practices, and everything went well except UnifiedGenotyper. I always get the "bad cigar" error message, even when I add '-rf BadCigar'. I don't know what I should do. Do you have any suggestions?
    Thank you in advance!

  • Geraldine_VdAuwera Posts: 8,271 Administrator, GATK Dev admin

    You should validate your BAM file first. You can also check a set of reads to see whether they all have bad CIGAR problems or whether it's just a subset.

    Geraldine Van der Auwera, PhD
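
    For anyone who wants to try the "check a set of reads" step, here is a minimal sketch, not an official GATK tool. It assumes the reads have been exported as SAM text (for example with samtools view) and it flags any read whose first CIGAR operation after leading clips is a deletion, like the 440H1D33M1I1D33M example at the top of this thread. The function names are made up for illustration, and skipping soft clips as well as hard clips is an assumption on my part, not confirmed GATK behaviour.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs; raise ValueError if malformed."""
    ops = CIGAR_OP.findall(cigar)
    # Re-assemble the matches to make sure nothing in the string was skipped.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"unparseable CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in ops]


def starts_with_deletion(cigar):
    """True if the first operation after leading clips (H/S) is a deletion."""
    if cigar == "*":  # CIGAR unavailable (e.g. unmapped read)
        return False
    for _, op in parse_cigar(cigar):
        if op in "HS":  # assumption: ignore leading hard and soft clips
            continue
        return op == "D"
    return False


def flag_reads(sam_lines):
    """Yield (read_name, cigar) for alignment lines whose CIGAR starts with D."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:  # not a complete alignment record
            continue
        name, cigar = fields[0], fields[5]
        if starts_with_deletion(cigar):
            yield name, cigar
```

    Passing an open SAM text file (or any iterable of lines) to flag_reads lets you count how many reads are affected and compare that with the total, which tells you whether the problem is limited to a subset.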

  • blueskypy Posts: 253 Member ✭✭

    I'm working on exome seq and using GATK version 2.7-2-g6bda569 and still get the same error in HC!

  • Geraldine_VdAuwera Posts: 8,271 Administrator, GATK Dev admin

    @blueskypy, have you validated your files?

    Geraldine Van der Auwera, PhD
