This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!
Is it possible to disable DuplicateReadFilter in UnifiedGenotyper?
My understanding of DuplicateReadFilter is that is removes PCR duplicates (ie: identical reads that map to the same location in the genome). Is there a way to prevent UnifiedGenotyper from using this filter?
Many of my 'duplicate' reads are not really duplicates. I have 50bp single ended yeast RNA-seq data, and 10 samples/lane results in highly over-sampled data. Removing duplicates results in an undercounting of reads at the most highly expressed genes, and therefore an increased number of sub-clonal variants at these genes (because the reads from the major clonal population are discarded, but reads of sub-clonal populations are kept because they have a different sequence). At least I think that is what is happening.