GenomicsDBImport--do multiple samples need individual databases?
I am trying to do variant calling on a reference transcriptome that I've produced, but I have some questions about the functionality of GenomicsDBImport and, downstream, SelectVariants. I know that you can only import one genomic interval per go, but does each interval need its own database?
I've begun running this command from bash, where $contigs is the list of contigs to go through, $path is the absolute path, and files.txt lists all my samples.
    for i in $contigs
    do
        gatk GenomicsDBImport \
            $(cat files.txt) \
            --genomicsdb-workspace-path $path/my_database \
            --intervals $i
    done
I get this error after running it:

    A USER ERROR has occurred: The workspace you're trying to create already exists. ( /gatk/my_data/my_database ) Writing into an existing workspace can cause data corruption. Please choose an output path that doesn't already exist.

Does this mean that I need to create an individual result for each contig? And how does this influence the downstream SelectVariants command?
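For reference, here is a minimal sketch of what one workspace per contig might look like, assuming the error simply means every run of the loop needs a fresh output path. The contig names, the sample flags standing in for files.txt, and the my_database_<contig> naming are all hypothetical placeholders. The loop only prints the commands instead of running gatk.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one GenomicsDBImport command per contig,
# each writing to its own workspace directory.
# All values below are hypothetical placeholders, not my real data.
contigs="contig_1 contig_2"
path="/gatk/my_data"
samples="-V sample1.g.vcf.gz -V sample2.g.vcf.gz"   # stands in for $(cat files.txt)

build_import_cmd() {
    # $1 = contig name. The workspace path embeds the contig name,
    # so no two iterations point at the same directory.
    local contig="$1"
    # $samples is left unquoted on purpose so it splits into separate arguments.
    # Remove the leading "echo" to run gatk for real.
    echo gatk GenomicsDBImport $samples \
        --genomicsdb-workspace-path "$path/my_database_$contig" \
        --intervals "$contig"
}

for i in $contigs
do
    build_import_cmd "$i"
done
```

If this is the right approach, I assume each workspace would then have to be passed to the downstream tool separately (something like -V gendb://$path/my_database_contig_1), which is the part I'm unsure about.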