Are there issues with using reads coming from different technologies and having different depths?
We are analyzing WGS data from 60 samples (6 groups, 10 samples/group) sequenced on a HiSeq4000. The mean coverage per sample is 25x (the lowest sample is 15x).
We have now realized that we need to sequence more samples in order to better estimate the allele frequencies. Due to budget and technical constraints, we settled on sequencing 90 more samples (6 groups, 15 samples/group) at a target coverage of 5x, this time on a NovaSeq platform.
Our aim is to do population analysis using SNP allele frequencies after combining the HiSeq4000 (25x coverage) data and the NovaSeq (5x coverage) data.
My plan for the new batch (NovaSeq, 5x) is to run it through the steps of GATK's Best Practices up to HaplotypeCaller, then combine it with the original batch (HiSeq4000, 25x) using CombineGVCFs and do joint calling with GenotypeGVCFs.
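To make the plan concrete, here is a minimal sketch of the three steps as command lines. It assumes GATK4 command-line syntax (I have not stated a version above), and every file name is a hypothetical placeholder. It only builds and prints the commands; it does not run GATK.

```python
import shlex

def haplotypecaller_cmd(reference, bam, out_gvcf):
    # Per-sample calling in GVCF mode (-ERC GVCF), as needed for joint calling.
    return ["gatk", "HaplotypeCaller", "-R", reference,
            "-I", bam, "-O", out_gvcf, "-ERC", "GVCF"]

def combine_gvcfs_cmd(reference, gvcfs, out_gvcf):
    # Merge per-sample GVCFs from both batches into one multi-sample GVCF.
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd + ["-O", out_gvcf]

def genotype_gvcfs_cmd(reference, combined_gvcf, out_vcf):
    # Joint genotyping across all samples in the combined GVCF.
    return ["gatk", "GenotypeGVCFs", "-R", reference,
            "-V", combined_gvcf, "-O", out_vcf]

# Hypothetical placeholder paths, one sample per batch for brevity.
REF = "reference.fasta"
new_gvcf = "novaseq_sample01.g.vcf.gz"
old_gvcf = "hiseq_sample01.g.vcf.gz"

commands = [
    haplotypecaller_cmd(REF, "novaseq_sample01.bam", new_gvcf),
    combine_gvcfs_cmd(REF, [old_gvcf, new_gvcf], "cohort.g.vcf.gz"),
    genotype_gvcfs_cmd(REF, "cohort.g.vcf.gz", "cohort.vcf.gz"),
]
for cmd in commands:
    print(shlex.join(cmd))
```

In practice the HaplotypeCaller step would be repeated for each of the 90 new samples, and all 150 GVCFs would be passed to CombineGVCFs.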
I am working with mouse samples, so I will do VQSR.
I have the following questions:
Is there an issue with joint calling variants and genotypes using information from different technologies?
Is 5x too low to confidently determine genotypes? In other words, would such results be publishable?
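As a rough illustration of why the 5x depth worries me, here is a back-of-the-envelope sketch of how often both alleles of a true heterozygote would be seen at a given depth. It assumes each read samples either allele with probability 0.5 and ignores sequencing error, mapping bias, and depth variation around the mean, so it is not how GATK computes genotype likelihoods. The function name and the `min_reads_per_allele` parameter are my own, for illustration only.

```python
from math import comb

def prob_both_alleles_seen(depth, min_reads_per_allele=1):
    """Probability that a true heterozygote shows at least
    `min_reads_per_allele` reads for each allele at the given depth,
    under a simple binomial model with p = 0.5 per read."""
    if depth < 2 * min_reads_per_allele:
        return 0.0
    # Count read configurations where both alleles reach the minimum.
    favourable = sum(
        comb(depth, k)
        for k in range(min_reads_per_allele, depth - min_reads_per_allele + 1)
    )
    return favourable / 2 ** depth

for depth in (5, 15, 25):
    print(depth,
          prob_both_alleles_seen(depth, 1),
          prob_both_alleles_seen(depth, 2))
```

Under this simplified model, at exactly 5 reads both alleles are seen at least once with probability 0.9375, and at least twice each with probability 0.625.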
There is a similar thread here, but in that case the data was produced with the same technology.