Difference in output file from Picard's MarkDuplicates
Hi, I'm working on a pipeline which takes a RNA-seq bam file and looks for overlaps using R. When I run the R script on my original BAM file it works fine, however when I run it on the bam file produced by MarkDuplicates it throws this error:
*tmp*, "queryHits", value = integer(0)) :
replacement has 0 rows, data has 2
Calls: $<- -> $<-.data.frame
From what I can find online this means that the R script has been asked to find a variable in the file which it cannot find. The R script uses these packages: library(GenomicRanges) and library(Biostrings)
This is the section of R script which falls over:
overlaps <- findOverlaps(GRbam,GR)
overs <- data.frame(NA,rownames=c(1:length(overlaps)))
I included both the following options when calling MarkDuplicates to try and reduce the formatting changes in the new file:
Has anyone come across a similar issue and know what might be different in the new bam file? I've compared both the headers and the only difference is that the original file has @HD:VN 1.0, while the MarkDuplicates output has @HD:VN 1.5. Could this be the issue? I can't find much online about the differences.
Please feel free to ask for more information if I haven't been clear.