
GenomicsDBImport - correct syntax for several intervals/chromosomes

I read Geraldine's blog post "Using GenomicsDBImport in practice",
which shows how to import several chromosomes:

gatk GenomicsDBImport \
-V data/gvcfs/mother.g.vcf \
-V data/gvcfs/father.g.vcf \
-V data/gvcfs/son.g.vcf \
--genomicsdb-workspace-path my_database \
--intervals chr20,chr21

So I used something similar, with
-L 1,2 (as above)

When I do this with my gVCFs (the chromosome names have no "chr" prefix) I get an error message:
"A USER ERROR has occurred: Badly formed genome unclippedLoc: Query interval "1,2" is not valid for this input."
When I use -L 1 the program runs, and I can continue successfully with GenotypeGVCFs.
When I use -L 1 -L 2 the program runs, but there seems to be no data for chromosome 2 (and maybe there are more errors too)?
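
To make the repeated-flag variant concrete, here is a minimal sketch of how I build it, with one -L per contig instead of a comma-separated list. The helper name build_interval_args is just something I made up for illustration, the file paths are the ones from the blog example rather than my own, and the script only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: prints the command instead of running it, so gatk is not needed.

# Turn a list of contig names into repeated -L arguments ("1 2" -> "-L 1 -L 2").
# build_interval_args is a hypothetical helper, not part of GATK.
build_interval_args() {
    args=""
    for contig in "$@"; do
        args="$args -L $contig"
    done
    # Strip the leading space added by the first iteration.
    printf '%s\n' "${args# }"
}

interval_args=$(build_interval_args 1 2)

# $interval_args is left unquoted on purpose so it splits into separate words.
echo gatk GenomicsDBImport \
    -V data/gvcfs/mother.g.vcf \
    -V data/gvcfs/father.g.vcf \
    -V data/gvcfs/son.g.vcf \
    --genomicsdb-workspace-path my_database \
    $interval_args
```

This is the form that runs for me; whether it is the intended way to batch several chromosomes is exactly what I am unsure about.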

What is the correct syntax if I want to make batches of several chromosomes?
In this forum thread ( https://gatkforums.broadinstitute.org/gatk/discussion/24371/a-problem-about-genomicsdbimport-and-combinegvcfs#latest ) bhanuGandham says: "GenomicsDBImport is used for samples in the order of thousands. For <1000 samples it is better to use CombineGVCFs". This was new information to me.
Is 1000 an "official/recommended" threshold on the number of WGS samples for choosing between GenomicsDBImport and CombineGVCFs?
Does "sample" here mean one individual with all chromosomes, or one interval/chromosome from each of 1000 individuals?

