SplitNCigarReads complains about "Badly formed genome loc"!

Dear GATK developers,

Thanks for this nice and practical tool and for the great support! I searched the forum for information about this error, which is reported by the SplitNCigarReads walker, but couldn't find anything relevant, so I hope I am not cluttering the forum. Here is the error:

MESSAGE: Badly formed genome loc: Parameters to GenomeLocParser are incorrect:The stop position 69 is less than start 70 in contig JH375787.1

I used the same reference as in the previous read-group (RG) and MarkDuplicates (MarkDup) steps. I get this error for some of my samples, but not all of them.

Here is the command, in case it helps:
java -jar /sw/apps/bioinfo/GATK/3.1.1//GenomeAnalysisTK.jar -T SplitNCigarReads -R /home/nimar/b2013097/private/nobackup/data/Reference/Reference.fa -I Sample_50T.merged.Reordered.sort.RG.MarkDup-filtered-unmapped.bam -U ALLOW_N_CIGAR_READS -o Sample_50T.merged.Reordered.sort.RG.MarkDup-filtered-unmapped.SplitN.bam
Looking forward to your comments.
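A minimal sketch of one way a stop position can end up exactly one less than its start. Assume an alignment is cut at every `N` in its CIGAR, and each piece covers start..start+span-1 on the reference. A piece with no reference-consuming operations (for example an insertion sitting between two `N` gaps, or a trailing `N`) then has span 0, so stop = start - 1. This is a guess about what produces this particular message, not a description of GATK's internals; `split_on_n` is a hypothetical helper, and the example read is invented so that its numbers mirror the 70/69 in the error.

```python
import re

# One CIGAR element: a length followed by an operator (SAM spec operator set).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operators that consume reference bases (N is handled separately as a cut point).
REF_CONSUMING = set("MD=X")


def split_on_n(pos, cigar):
    """Hypothetical helper: cut an alignment at every N operator.

    pos is the 1-based leftmost mapping position (SAM POS). Returns a list of
    (start, stop, sub_cigar) tuples, one per piece between N gaps.
    """
    pieces = []
    ops = []
    piece_start = pos
    ref = pos  # next reference position not yet covered
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            # Close the current piece; the N gap itself is skipped over.
            pieces.append((piece_start, ref - 1, "".join(ops)))
            ref += length
            piece_start = ref
            ops = []
        else:
            ops.append(f"{length}{op}")
            if op in REF_CONSUMING:
                ref += length
    pieces.append((piece_start, ref - 1, "".join(ops)))
    return pieces


if __name__ == "__main__":
    # Invented read: the middle piece is an insertion only, so it spans
    # zero reference bases and its stop lands one before its start.
    for start, stop, sub in split_on_n(10, "10M50N5I10N10M"):
        flag = "  <-- stop < start" if stop < start else ""
        print(f"{sub:>4} {start}-{stop}{flag}")
```

For this invented read the middle piece comes out as 70-69, the same shape as the "stop position 69 is less than start 70" complaint.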

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Can you please retry with version 3.2? There were a few bug fixes in that version that may apply to your issue. You should also validate your BAM file with ValidateSamFile from Picard Tools.

  • nimarafatiUU (Sweden; Member)

    Dear Geraldine,

    Thanks for your reply. I tried versions 3.2.0 and 3.2.2 and got the same error, which appears at the end of the run, when GATK is about to print its report.

    This is very confusing: if there is a problem with the BAM files, why does GATK run through all the steps and only report an error at the end? Also, why does it work fine for some BAM files, generated with the same pipeline and commands, but not for the rest?

    Following your suggestion, I ran ValidateSamFile and got these errors, which also look odd:
    ERROR: Record 56, Read name HISEQ:112:D2D4AACXX:5:1111:11215:15877, NM tag (nucleotide differences) in file [1] does not match reality [13]
    ERROR: Record 57, Read name HISEQ:112:D2D4AACXX:5:1309:12383:56073, NM tag (nucleotide differences) in file [1] does not match reality [13]
    ERROR: Record 58, Read name HISEQ:112:D2D4AACXX:5:2107:14617:33091, NM tag (nucleotide differences) in file [1] does not match reality [13]
    ERROR: Record 59, Read name HISEQ:112:D2D4AACXX:5:2107:18950:79528, NM tag (nucleotide differences) in file [1] does not match reality [13]
    ERROR: Record 60, Read name HISEQ:112:D2D4AACXX:5:2115:13238:59098, NM tag (nucleotide differences) in file [1] does not match reality [13]
    ERROR: Record 61, Read name HISEQ:112:D2D4AACXX:5:2202:1757:86972, NM tag (nucleotide differences) in file [1] does not match reality [13]
    ERROR: Record 62, Read name HISEQ:112:D2D4AACXX:5:2204:13488:98626, NM tag (nucleotide differences) in file [0] does not match reality [12]
    ...
    Could this be related to the problem?
    I have been using GSNAP to align both RNA-seq and DNA-seq reads. When I ran the GATK SNP-calling pipeline on DNA-seq data aligned with GSNAP, I did not get any errors.
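ValidateSamFile's NM message compares the NM value stored in the record with a value it recomputes itself. As a rough illustration of what that recomputation involves, here is a minimal sketch of the SAM specification's definition of NM (edit distance: mismatched aligned bases plus inserted and deleted bases, with clipping and skipped `N` regions excluded). `edit_distance_nm` is a hypothetical helper; Picard's exact rules (for instance around ambiguous bases) may differ.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def edit_distance_nm(ref, ref_offset, read, cigar):
    """Hypothetical helper: recompute an NM-style edit distance for one alignment.

    ref        -- reference sequence string (e.g. one contig)
    ref_offset -- 0-based index into ref where the alignment starts (POS - 1)
    read       -- read sequence (SEQ), including soft-clipped bases
    Counts mismatched aligned bases plus inserted and deleted bases;
    skipped regions (N) and clips (S, H) add nothing.
    """
    nm = 0
    r = ref_offset  # current position in the reference
    q = 0           # current position in the read
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in ("M", "=", "X"):
            # Aligned bases: count positions where read and reference differ.
            for i in range(length):
                if ref[r + i].upper() != read[q + i].upper():
                    nm += 1
            r += length
            q += length
        elif op == "I":
            nm += length  # inserted bases count toward NM
            q += length
        elif op == "D":
            nm += length  # deleted bases count toward NM
            r += length
        elif op == "N":
            r += length   # spliced-out reference does not count
        elif op == "S":
            q += length   # soft-clipped bases do not count
        # H and P consume neither reference nor read
    return nm


if __name__ == "__main__":
    # Toy spliced read: both exons match, so the 4N gap contributes nothing.
    print(edit_distance_nm("AAAACCCCGGGG", 0, "AAAAGGGG", "4M4N4M"))
```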

    Thanks for your help.
    Regards,
    Nima

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi Nima,

    What do you mean by "print the report"? It's quite possible that the problematic record is near the end of your file.
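To check whether such a record exists and where it sits, one rough approach is to scan the alignments as SAM text and list every read whose CIGAR, once cut at each `N`, leaves a piece covering no reference bases. This is a minimal sketch under the assumption that such pieces are what trigger the error; the script name `scan_n_cigar.py` and both helpers are hypothetical. Record numbers count alignment lines only, so they may not line up exactly with ValidateSamFile's "Record N" numbering.

```python
import re
import sys

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MD=X")  # N is treated as a cut point, not as coverage


def has_zero_span_piece(cigar):
    """True if cutting the CIGAR at every N leaves a piece with no reference bases."""
    if "N" not in cigar:
        return False
    span = 0
    for length, op in CIGAR_RE.findall(cigar):
        if op == "N":
            if span == 0:
                return True  # empty piece before this N
            span = 0
        elif op in REF_CONSUMING:
            span += int(length)
    return span == 0  # empty piece after the last N


def scan_sam(lines):
    """Yield (record_number, qname, rname, pos, cigar) for suspicious alignments.

    Header lines (starting with '@') are skipped; record numbers count
    alignment lines only, starting at 1.
    """
    record = 0
    for line in lines:
        if line.startswith("@"):
            continue
        record += 1
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        qname, rname, pos, cigar = fields[0], fields[2], int(fields[3]), fields[5]
        if has_zero_span_piece(cigar):
            yield record, qname, rname, pos, cigar


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage:  samtools view input.bam | python scan_n_cigar.py -
    path = sys.argv[1]
    fh = sys.stdin if path == "-" else open(path)
    try:
        for hit in scan_sam(fh):
            print(*hit, sep="\t")
    finally:
        if fh is not sys.stdin:
            fh.close()
```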

    You can ignore the NM tag errors; they are not important. There is an option in ValidateSamFile to ignore unimportant errors, though I forget exactly what it is. It's described in the Picard documentation.
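In Picard releases that provide it, ValidateSamFile's `IGNORE` argument should accept an error type such as `INVALID_TAG_NM`, and `MODE=SUMMARY` condenses the report. Confirm both against the documentation for the version you actually run, since the invocation has changed over time. If a verbose report has already been produced, a minimal sketch like the following can set the NM-tag lines aside so that any other errors stand out. The regular expression is written against the message text quoted earlier in this thread and is an assumption about its exact wording; `split_validation_output` and `filter_validation.py` are hypothetical names.

```python
import re
import sys

# Matches the NM-tag messages quoted in this thread, e.g.
# "... NM tag (nucleotide differences) in file [1] does not match reality [13]"
NM_LINE = re.compile(
    r"NM tag \(nucleotide differences\) in file \[\d+\] does not match reality \[\d+\]"
)


def split_validation_output(lines):
    """Separate NM-tag complaints from every other ERROR/WARNING line.

    Returns (nm_count, other_lines); other_lines keeps the remaining
    ERROR/WARNING lines in their original order, without trailing newlines.
    """
    nm_count = 0
    other = []
    for line in lines:
        line = line.rstrip("\n")
        if NM_LINE.search(line):
            nm_count += 1
        elif line.startswith(("ERROR", "WARNING")):
            other.append(line)
    return nm_count, other


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage:  python filter_validation.py validate_output.txt
    with open(sys.argv[1]) as fh:
        nm_count, other = split_validation_output(fh)
    print(f"NM-tag mismatches set aside: {nm_count}")
    for line in other:
        print(line)
```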
