Errors about missing read group (RG) information

delangel (GATK Dev), edited May 15 in Common Problems

The GATK expects read group (RG) information in the header of BAM files (as detailed in the input requirements FAQs), and will fail with an error if it does not find it.
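
If you are not sure whether a BAM already carries read group information, a quick look at the header can tell you. The following is only a rough sketch: `has_read_group` is a hypothetical helper name, and it assumes that a usable header has at least one `@RG` line with both an `ID` and an `SM` field. It checks the header only, not the tags on individual reads.

```shell
# Hypothetical helper: reads SAM header text on stdin and succeeds only if
# at least one @RG line carries both a non-empty ID and a non-empty SM field.
has_read_group() {
    awk -F'\t' '
        $1 == "@RG" {
            id = 0; sm = 0
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ID:./) id = 1
                if ($i ~ /^SM:./) sm = 1
            }
            if (id && sm) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Typical use (assumes samtools is on the PATH):
#   samtools view -H reads_without_RG.bam | has_read_group || echo "no usable @RG line"
```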

So what do you do? You use a Picard tool called AddOrReplaceReadGroups to add the missing information to your BAM file.

Here's an example:

# throws an error
java -jar GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R reference.fasta \
    -I reads_without_RG.bam \
    -o output.vcf

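# fix the read groups (the RG* values are placeholders; use your real metadata)
# RGPU is a required argument, and CREATE_INDEX writes the index GATK needs
java -jar picard.jar AddOrReplaceReadGroups \
    I=reads_without_RG.bam \
    O=reads_with_RG.bam \
    SORT_ORDER=coordinate \
    RGID=foo \
    RGLB=bar \
    RGPL=illumina \
    RGPU=unit1 \
    RGSM=Sample1 \
    CREATE_INDEX=true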

# runs without error
java -jar GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R reference.fasta \
    -I reads_with_RG.bam \
    -o output.vcf
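
To confirm that the fix took, it can help to check that the reads themselves now carry a read group tag. This is a minimal sketch, not an official GATK or Picard check: `count_reads_without_rg` is a hypothetical helper that reads SAM records on stdin and counts those with no `RG:Z:` tag among the optional fields.

```shell
# Hypothetical helper: reads SAM alignment records on stdin and prints how many
# lack an RG:Z: tag among the optional fields (columns 12 and beyond).
count_reads_without_rg() {
    awk -F'\t' '
        /^@/ { next }                      # skip header lines if present
        {
            tagged = 0
            for (i = 12; i <= NF; i++)
                if ($i ~ /^RG:Z:/) tagged = 1
            if (!tagged) missing++
        }
        END { print missing + 0 }          # prints 0 when nothing is missing
    '
}

# Typical use (assumes samtools is on the PATH); inspects only the first 1000 reads:
#   samtools view reads_with_RG.bam | head -n 1000 | count_reads_without_rg
```

A count of 0 on the fixed file suggests every sampled read has a read group; anything else means the file still needs attention.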

Note that if you don't know what information to put in the read groups, you should ask whoever performed the sequencing or provided the BAM to give you the metadata you need.
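
While you wait for that metadata, the read names can sometimes give you a hint to cross-check against. The sketch below assumes reads named in the seven-field Illumina style (`instrument:run:flowcell:lane:tile:x:y`); many files use other naming schemes, in which case it simply fails. `flowcell_and_lane` is a hypothetical helper name, and its output is a clue, not a substitute for what the sequencing provider tells you.

```shell
# Hypothetical helper: given one read name, prints the flowcell and lane if the
# name has exactly seven colon-separated fields; fails otherwise.
flowcell_and_lane() {
    printf '%s\n' "$1" | awk -F':' '
        NF == 7 { printf "flowcell=%s lane=%s\n", $3, $4; ok = 1 }
        END { exit(ok ? 0 : 1) }
    '
}

# Typical use (assumes samtools is on the PATH); looks at the first read only:
#   flowcell_and_lane "$(samtools view reads_without_RG.bam | head -n 1 | cut -f 1)"
```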

AddOrReplaceReadGroups is part of the Picard package.

Post edited by Geraldine_VdAuwera