The frontline support team will be unavailable to answer questions on April 15th and 17th 2019. We will be back soon after. Thank you for your patience and we apologize for any inconvenience!
Difference in output file from Picard's MarkDuplicates
Hi, I'm working on a pipeline which takes a RNA-seq bam file and looks for overlaps using R. When I run the R script on my original BAM file it works fine, however when I run it on the bam file produced by MarkDuplicates it throws this error:
*tmp*, "queryHits", value = integer(0)) :
replacement has 0 rows, data has 2
Calls: $<- -> $<-.data.frame
From what I can find online this means that the R script has been asked to find a variable in the file which it cannot find. The R script uses these packages: library(GenomicRanges) and library(Biostrings)
This is the section of R script which falls over:
overlaps <- findOverlaps(GRbam,GR)
overs <- data.frame(NA,rownames=c(1:length(overlaps)))
I included both the following options when calling MarkDuplicates to try and reduce the formatting changes in the new file:
Has anyone come across a similar issue and know what might be different in the new bam file? I've compared both the headers and the only difference is that the original file has @HD:VN 1.0, while the MarkDuplicates output has @HD:VN 1.5. Could this be the issue? I can't find much online about the differences.
Please feel free to ask for more information if I haven't been clear.