In my case, CombineGVCFs runs faster than GenomicsDBImport regardless of sample size or chromosome size, so I'm using CombineGVCFs in my pipeline now. I still don't know why GenomicsDBImport is slow in my hands, even though it's recommended in the Best …
@Sheila I read this thread before I posted. To quote: "The existing tool does scale to tens of thousands of samples successfully, but you need to use smaller intervals (not a whole chromosome at once), and use the --batchSize argument t…
Couldn't agree more. I'm testing -L smaller.intervals right now. ~20 Mb takes one week, which I can live with. However, I need to verify the results before moving on, because I don't know how much the output differs. I compared CombineGVCFs with GenomicsDBIm…
@SkyWarrior Thank you for sharing the code with me. I did the same as you, which is to run GenomicsDBImport on each chromosome separately. However, each chromosome would take 42 days (based on 5 hours/1 Mb). So my situation is that GenomicsDBImport take…
@SkyWarrior Hi, how did you parallelize GVCF imports into a DB using a Perl script? Are you saying that I can split 140 samples into 14 ten-sample batches and run GenomicsDBImport into the same DB in order to parallelize this step? Something like, ga…
I agree with you that I don't need --batch-size for my 140 samples. The real problem is how to run GenomicsDBImport within a reasonable time, say a few days. It took 5 hours for 1 Mb, which means at least 6 weeks for a 200-Mb chromosome.
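To make that extrapolation concrete, here is a rough sketch of the arithmetic. It assumes run time scales linearly with interval length and that intervals can be imported independently in parallel; both are assumptions, not something measured in this thread. The function names and the 3-day target are hypothetical and only for illustration.

```python
import math

# Observed rate from this thread: about 5 hours per 1 Mb.
HOURS_PER_MB = 5.0


def estimate_import_days(region_mb, hours_per_mb=HOURS_PER_MB):
    """Linearly extrapolate wall-clock import time (in days) for a region."""
    return region_mb * hours_per_mb / 24.0


def intervals_needed(region_mb, target_days, hours_per_mb=HOURS_PER_MB):
    """Number of equal-sized intervals to run concurrently so that each one
    finishes within target_days (assumes perfectly linear scaling)."""
    total_hours = region_mb * hours_per_mb
    return math.ceil(total_hours / (target_days * 24.0))


days = estimate_import_days(200)
print(f"{days:.1f} days, about {days / 7:.1f} weeks")  # 41.7 days, about 6.0 weeks
print(intervals_needed(200, target_days=3))            # 14
```

Under those assumptions, a 200-Mb chromosome would have to be split into roughly 14 intervals running at the same time to finish in about 3 days.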
@SkyWarrior said: You may skip the consolidate step in GenomicsDBImport unless you have gazillions of batches to import. I am working with ~1500 samples in 50-sample batches, and 30 batches do not really disturb the system during genotyping. You may s…