Error in BQSR: more than one read group with the same ID
I used samtools merge to combine single-sample, sorted .bam files into a single .bam file per read group, keeping the associated @RG line for every sample. When I run BQSR on the merged file, I get an error saying that there is more than one read group with the same ID. I'm confused, because sharing the ID is exactly what I intended, and I'm not sure where the tool is pulling the information that causes the problem. I found a previous thread where someone ran BQSR with the same read group information and didn't seem to have a problem: http://gatkforums.broadinstitute.org/gatk/discussion/5986/bqsr-readgroups#latest
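One way to see what the tool is complaining about is to list the `ID` field of every @RG line in the merged header. The SAM specification requires each @RG line to have a unique ID, so several @RG lines that share an ID but differ in `SM` would produce exactly this kind of duplicate. The sketch below assumes you already have the header as text (for example, the output of `samtools view -H merged.bam`); the function names are hypothetical helpers, not part of GATK or samtools.

```python
from collections import Counter


def read_group_ids(header_text: str) -> list[str]:
    """Return the ID value of every @RG line in a SAM header, in file order."""
    ids = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header records are tab-separated TAG:VALUE pairs after the record type.
        for field in line.split("\t")[1:]:
            if field.startswith("ID:"):
                ids.append(field[3:])
                break
    return ids


def duplicate_read_group_ids(header_text: str) -> list[str]:
    """Return (sorted) the read group IDs that appear on more than one @RG line."""
    counts = Counter(read_group_ids(header_text))
    return sorted(rg for rg, n in counts.items() if n > 1)


# Hypothetical merged header: two samples were given the same lane-level ID.
example_header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:lane1\tSM:sampleA\tPL:ILLUMINA",
    "@RG\tID:lane1\tSM:sampleB\tPL:ILLUMINA",
    "@RG\tID:lane2\tSM:sampleC\tPL:ILLUMINA",
])
print(duplicate_read_group_ids(example_header))  # → ['lane1']
```

Any ID this reports appears on more than one @RG line, which the SAM specification does not allow.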
I'm running BQSR on the entire read group at once so that I have 1B bases as input, since some of my samples are small enough that they wouldn't have enough reads on their own. I've also run a test on individual samples (with a unique value appended to each real RG ID so there are no duplicates), and it completes without errors.
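The per-sample workaround described above (appending a unique value to the real RG ID) has to touch both the @RG header line and the `RG:Z:` tag on every read, or the two fall out of sync. Here is a minimal sketch of that idea, assuming the data is available as SAM text lines (for example, from `samtools view -h`) and that a per-sample suffix such as `.sampleA` is acceptable. `suffix_read_groups` is a hypothetical helper for illustration, not a GATK or samtools feature.

```python
from typing import Iterable, Iterator


def suffix_read_groups(sam_lines: Iterable[str], suffix: str) -> Iterator[str]:
    """Yield SAM lines with `suffix` appended to every read group ID.

    Rewrites the ID field of @RG header lines and the RG:Z tag of alignment
    records so that header and reads keep referring to the same IDs.
    """
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@RG"):
            # Only the ID:... field changes; SM, PL, etc. are kept as-is.
            fields = [f + suffix if f.startswith("ID:") else f
                      for f in line.split("\t")]
            yield "\t".join(fields)
        elif line.startswith("@"):
            # Other header records (@HD, @SQ, @PG, @CO) pass through unchanged.
            yield line
        else:
            fields = line.split("\t")
            # Optional tags start after the 11 mandatory columns (index 11).
            for i in range(11, len(fields)):
                if fields[i].startswith("RG:Z:"):
                    fields[i] += suffix
            yield "\t".join(fields)


# Hypothetical single-sample SAM input.
sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:lane1\tSM:sampleA\tPL:ILLUMINA",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:lane1",
]
for out in suffix_read_groups(sam, ".sampleA"):
    print(out)
```

Streaming line by line keeps memory use flat for large files. The rewritten text would still need to be converted back to BAM before running BQSR.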
I suspect this may be an issue with the file merge, but I'm not sure. Is the sample ID used for anything during BQSR? For example, if I run all the data from a single lane without sample information (i.e., keeping only the first @RG header line), could I then apply that recalibration report to the individual .bam files? Or does PrintReads also look for the individual samples when it writes the recalibrated reads?