I am using the Featured Workspace for PreProcessingForVariantDiscovery_GATK4 with my data to produce a clean BAM for GATK4 variant discovery analyses following the Best Practices. PreProcessingForVariantDiscovery_GATK4.MarkDuplicates has been running for 13 hours. Is this normal? My unmapped BAM is 7.97 GB. I am using 8 CPUs because I could not request more than that (quota exceeded error). Please help.
I have been working for some time on a single-end mRNAseq data set. It is a high-quality set with many biological replicates (200+).
My question is: how could I best contribute to the methodology used for SNP calling in mRNAseq? What needs to be tested to improve this method?