Difference in output file from Picard's MarkDuplicates

Hi, I'm working on a pipeline which takes an RNA-seq BAM file and looks for overlaps using R. When I run the R script on my original BAM file it works fine; however, when I run it on the BAM file produced by MarkDuplicates, it throws this error:

Error in `$<-.data.frame`(`*tmp*`, "queryHits", value = integer(0)) :
replacement has 0 rows, data has 2
Calls: $<- -> $<-.data.frame
Execution halted

From what I can find online, this means that the R script has been asked for a variable which it cannot find in the file. The R script loads these packages: library(GenomicRanges) and library(Biostrings).
This is the section of the R script which falls over:

overlaps <- findOverlaps(GRbam,GR)
printWithTimeStamp("Collating data:\n")
overs <- data.frame(NA,rownames=c(1:length(overlaps)))
printWithTimeStamp(" queryHits\n")
overs$queryHits<-queryHits(overlaps)

I included both of the following options when calling MarkDuplicates to try to reduce the formatting changes in the new file:
REMOVE_DUPLICATES=TRUE
PROGRAM_RECORD_ID=null

Has anyone come across a similar issue, and does anyone know what might be different in the new BAM file? I've compared both headers and the only difference is that the original file has @HD VN:1.0, while the MarkDuplicates output has @HD VN:1.5. Could this be the issue? I can't find much online about the differences.
Please feel free to ask for more information if I haven't been clear.

Best Answer

  • Accepted Answer

    Problem solved! I had given MarkDuplicates the same file name for input and output (to try to limit the effect on the rest of the pipeline). I changed the output name and altered the pipeline to match, and it seems to be working fine now!
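
For anyone who lands here with the same R error: the message in the question is what base R produces when a zero-length column is assigned to a data frame that was built with 1:length(x) on an empty x. The sketch below is only an illustration in base R, and it assumes (this is not confirmed in the thread) that findOverlaps() returned no hits on the overwritten BAM file. The variable `hits` is a hypothetical stand-in for queryHits(overlaps), and `safe` is a hypothetical name for an alternative way of building the frame.

```r
# Hypothetical stand-in for queryHits(overlaps) when there are no overlaps.
hits <- integer(0)

# 1:length(hits) evaluates to 1:0, i.e. c(1, 0), so the frame gets 2 rows.
overs <- data.frame(NA, rownames = c(1:length(hits)))
print(nrow(overs))

# Assigning a zero-length column to the 2-row frame raises an error;
# capture its message instead of halting the script.
msg <- tryCatch({
  overs$queryHits <- hits
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# seq_along() returns integer(0) for empty input, so this frame has 0 rows
# and the zero-length assignment succeeds.
safe <- data.frame(rownames = seq_along(hits))
safe$queryHits <- hits
print(nrow(safe))
```

If that assumption holds, the "data has 2" in the error comes from 1:0 counting down rather than producing an empty sequence, and checking length(overlaps) before building the data frame would surface an empty or unreadable BAM file earlier.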