
GATK4 importGenomicsDB multiple lanes per sample

I've been trying to genotype a set of samples with GATK4. Some samples were sequenced on multiple lanes, so they have multiple FASTQ files. I've kept each lane separate and tracked sample identity through the read group. In GATK3 this worked fine: I could run each lane individually, and the final VCF would merge samples based on the read group ID.

I just tried GenomicsDBImport, and it gave me this error:

A USER ERROR has occurred: Duplicate sample: HK24_16_01_R_003. Sample was found in both file...
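The error suggests that GenomicsDBImport expects each sample name to come from only one input file. That expectation breaks when every lane has its own GVCF. Below is a minimal pre-check that can flag this before the import runs. It assumes the inputs are described as `sample<TAB>path` lines, the layout of a `--sample-name-map` file. The function `find_duplicate_samples` and the file names are hypothetical.

```python
from collections import defaultdict


def find_duplicate_samples(sample_map_lines):
    """Return {sample: [paths]} for every sample listed more than once.

    Each non-empty line is assumed to be 'sample<TAB>path', the layout
    of a GenomicsDBImport --sample-name-map file.
    """
    paths_by_sample = defaultdict(list)
    for line in sample_map_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        sample, path = line.split("\t", 1)
        paths_by_sample[sample].append(path)
    # Keep only samples that would come from more than one input file.
    return {s: p for s, p in paths_by_sample.items() if len(p) > 1}


if __name__ == "__main__":
    # Hypothetical per-lane GVCFs: one sample split across two files.
    lines = [
        "HK24_16_01_R_003\tHK24_16_01_R_003.L001.g.vcf.gz",
        "HK24_16_01_R_003\tHK24_16_01_R_003.L002.g.vcf.gz",
        "OTHER_SAMPLE\tOTHER_SAMPLE.g.vcf.gz",
    ]
    print(find_duplicate_samples(lines))
```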

What is the best way to work around this? In the example code provided for GATK4, the BAM files are merged after alignment but before duplicate marking. Is that the recommended method? It makes sense, but perhaps it should be stated more clearly in the documentation.
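The merge-then-deduplicate approach mentioned above can be sketched in Python. This is a minimal sketch under several assumptions. The per-lane BAMs are already aligned and coordinate-sorted. The (sample, path) pairs and output file names are hypothetical. The hypothetical function `per_sample_commands` only builds command strings and never runs them. It groups lanes by sample and passes every lane BAM of a sample to a single `gatk MarkDuplicates` call, which accepts repeated `-I` inputs. It then calls HaplotypeCaller once per sample in GVCF mode, so each sample name ends up in exactly one GVCF.

```python
import shlex
from collections import defaultdict


def per_sample_commands(lane_bams, reference):
    """Build one MarkDuplicates and one HaplotypeCaller command per sample.

    lane_bams: list of (sample_name, lane_bam_path) pairs (hypothetical layout;
    lane BAMs assumed aligned and coordinate-sorted).
    Returns (commands, sample_map_lines).
    """
    bams_by_sample = defaultdict(list)
    for sample, bam in lane_bams:
        bams_by_sample[sample].append(bam)

    commands, sample_map = [], []
    for sample in sorted(bams_by_sample):
        dedup_bam = f"{sample}.markdup.bam"
        gvcf = f"{sample}.g.vcf.gz"
        # All lanes of one sample go into a single MarkDuplicates run,
        # producing one deduplicated BAM (plus index) for that sample.
        md = ["gatk", "MarkDuplicates"]
        for bam in bams_by_sample[sample]:
            md += ["-I", bam]
        md += ["-O", dedup_bam, "-M", f"{sample}.dup_metrics.txt",
               "--CREATE_INDEX", "true"]
        # One GVCF per sample, so each sample name appears once downstream.
        hc = ["gatk", "HaplotypeCaller", "-R", reference, "-I", dedup_bam,
              "-O", gvcf, "-ERC", "GVCF"]
        commands += [shlex.join(md), shlex.join(hc)]
        sample_map.append(f"{sample}\t{gvcf}")
    return commands, sample_map


if __name__ == "__main__":
    lanes = [
        ("HK24_16_01_R_003", "HK24_16_01_R_003.L001.bam"),
        ("HK24_16_01_R_003", "HK24_16_01_R_003.L002.bam"),
        ("OTHER_SAMPLE", "OTHER_SAMPLE.L001.bam"),
    ]
    cmds, smap = per_sample_commands(lanes, "reference.fasta")
    for cmd in cmds:
        print(cmd)
    for line in smap:
        print(line)
```

The returned sample-map lines follow the tab-separated `sample<TAB>path` layout that GenomicsDBImport reads through `--sample-name-map`. Writing them to a file is left out here.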


Answers

  • shuMSR (Malaysia, Member)
    Hi, I also get the same error. I use the same sample name (SM) for all lanes of a sample, with a unique ID and PU for each lane. Do I need to rerun AddOrReplaceReadGroups? There is no clear explanation of how to add read groups.
  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @shuMSR

    Please post the version of GATK you are using, the exact command and the entire error log.
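For the read-group question above, it can help to check what the BAM headers actually contain before rerunning anything. In SAM headers, each `@RG` line is a set of TAB-separated `TAG:VALUE` fields. `ID` should be unique per read group (for example, per lane), and `SM` names the sample. Below is a minimal sketch that takes header text, such as the output of `samtools view -H`, and reports duplicated `ID` values and the set of `SM` values seen. The function name `check_read_groups` and the example header are hypothetical.

```python
def check_read_groups(header_text):
    """Return (duplicate_ids, sample_names) found in SAM header text."""
    ids, samples = [], set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue  # only read-group lines matter here
        # @RG lines are TAB-separated TAG:VALUE fields after the record type.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        ids.append(fields["ID"])
        samples.add(fields.get("SM"))
    duplicate_ids = sorted({i for i in ids if ids.count(i) > 1})
    return duplicate_ids, samples


if __name__ == "__main__":
    # Hypothetical header: two lanes of one sample, unique IDs, shared SM.
    header = (
        "@HD\tVN:1.6\tSO:coordinate\n"
        "@RG\tID:L001\tSM:HK24_16_01_R_003\tPU:FC1.1\n"
        "@RG\tID:L002\tSM:HK24_16_01_R_003\tPU:FC1.2\n"
    )
    print(check_read_groups(header))
```

If the IDs are unique and each lane carries the intended sample name, the read groups themselves may be fine. In that case the duplicate-sample error more likely comes from passing one GVCF per lane to GenomicsDBImport.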
