Picard MarkDuplicates vs samtools rmdup for variant calling with GATK
Dear GATK team,
I'm going to do variant calling with GATK for several dozen samples, using the hg38 reference. I have several questions about this process. They are partially covered on the forums and in the FAQs, but I'd like to clarify some points:
1) Am I right that MarkDuplicates can process a BAM file that contains both paired-end and single-end reads? (Picard FAQ hints it can, but just to be sure.)
2) Am I right that MarkDuplicates is significantly slower than samtools rmdup (because its algorithm marks not only duplicates whose mates map to the same chromosome, but also duplicates whose mates map to different chromosomes)?
3) Is there any evidence that using MarkDuplicates gives significantly better results in downstream analysis with GATK than using samtools rmdup? (Of course, MarkDuplicates is used in the Best Practices, but Picard tools are used everywhere in that guide.)
Some details about my setup:
1) I use bowtie2 --very-sensitive for read mapping.
2) I'd like to get a gVCF file for each sample.
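A minimal sketch of the per-sample workflow described above, to make the question concrete. It only builds the command lines and does not run anything. The file names, the `build_commands` helper, the `PL:ILLUMINA` read-group value, and the GATK4-style `gatk` launcher syntax are assumptions for illustration, not details of the actual setup:

```python
import shlex


def build_commands(sample, fq1, fq2, bt2_index, reference):
    """Return per-sample command lines as argument lists (nothing is executed).

    All output file names are hypothetical and derived from the sample name.
    """
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    gvcf = f"{sample}.g.vcf.gz"
    return [
        # Map paired-end reads; GATK needs a read group with at least SM set.
        ["bowtie2", "--very-sensitive", "-x", bt2_index, "-1", fq1, "-2", fq2,
         "--rg-id", sample, "--rg", f"SM:{sample}", "--rg", "PL:ILLUMINA",
         "-S", sam],
        # Coordinate-sort the alignments.
        ["samtools", "sort", "-o", sorted_bam, sam],
        # Mark (not remove) duplicates; assumes the Picard tool bundled in GATK4.
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam,
         "-M", metrics],
        # Index the duplicate-marked BAM.
        ["samtools", "index", dedup_bam],
        # Call variants in GVCF mode to get one gVCF per sample.
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", dedup_bam,
         "-O", gvcf, "-ERC", "GVCF"],
    ]


if __name__ == "__main__":
    # Hypothetical inputs; print the commands for inspection.
    for cmd in build_commands("S1", "S1_R1.fq.gz", "S1_R2.fq.gz",
                              "hg38_bt2", "hg38.fa"):
        print(shlex.join(cmd))
```

Swapping the MarkDuplicates step for samtools rmdup would change only the third command, which is the step the questions above are about.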