GATK4 GenomicsDBImport: multiple lanes per sample
I've been trying to genotype a set of samples using GATK4. Some samples were sequenced on multiple lanes, so they have multiple FASTQ files. I've been keeping each lane separate and tracking sample identity through the read group. In GATK3 this worked fine: I could run each lane individually, and the final VCF would merge samples based on read group ID.
I just tried GenomicsDBImport and it gave me this error:
A USER ERROR has occurred: Duplicate sample: HK24_16_01_R_003. Sample was found in both file...
What is the best way to work around this? In the example code provided for GATK4, the BAM files for a sample are merged after alignment but before duplicate marking. Is that the recommended method? It makes sense, but perhaps it should be noted more clearly in the documentation.
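To make the merge-first idea concrete, here is a minimal sketch of how the per-lane BAMs could be grouped by sample and turned into one merge command per sample. It is not taken from the GATK example code. The file paths, the `merged` output directory, and the helper names `group_lanes_by_sample` and `merge_command` are all hypothetical. It assumes the Picard tool MergeSamFiles is invoked through the `gatk` wrapper with `-I`/`-O`.

```python
from collections import defaultdict


def group_lanes_by_sample(lane_bams):
    """Group per-lane BAM paths by sample name.

    lane_bams is an iterable of (sample_name, bam_path) pairs; the sample
    name is assumed to match the SM field of the BAM's read group.
    """
    grouped = defaultdict(list)
    for sample, bam in lane_bams:
        grouped[sample].append(bam)
    return dict(grouped)


def merge_command(sample, bams, out_dir="merged"):
    """Build (but do not run) a MergeSamFiles command for one sample."""
    cmd = ["gatk", "MergeSamFiles"]
    for bam in bams:
        cmd += ["-I", bam]  # one -I per lane-level BAM
    cmd += ["-O", f"{out_dir}/{sample}.merged.bam"]  # hypothetical output path
    return cmd


if __name__ == "__main__":
    # Hypothetical lane-level files for the sample named in the error above.
    lanes = [
        ("HK24_16_01_R_003", "aligned/HK24_16_01_R_003.L001.bam"),
        ("HK24_16_01_R_003", "aligned/HK24_16_01_R_003.L002.bam"),
    ]
    for sample, bams in group_lanes_by_sample(lanes).items():
        print(" ".join(merge_command(sample, bams)))
```

With one merged BAM per sample, each sample would reach GenomicsDBImport through a single GVCF, which is the condition the "Duplicate sample" error is checking for.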