
How to fix the error: The GATK no longer supports SAM files without read groups?

Hi,

I am new to GATK. I want to use GATK to call SNPs and indels from simulated reads and a reference. I have one read file, reads_dip1.fastq, and a reference file, ref.fa. I first build a BAM file with BWA and then use GATK HaplotypeCaller to call variants. However, it gives this error message: SAM/BAM/CRAM file dedup.bam is malformed: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups. Since I have only one read group, do I still need to provide read group information in the SAM file? How do I define read groups in the header? Thanks for your help.

Here is my code:

./bwa index ref.fa
./samtools faidx ref.fa
java -jar picard.jar CreateSequenceDictionary REFERENCE=ref.fa OUTPUT=ref.dict

./bwa mem ref.fa reads_dip1.fastq > aln.sam

java -jar picard.jar SortSam I=aln.sam O=sorted.bam SORT_ORDER=coordinate

java -jar picard.jar MarkDuplicates I=sorted.bam O=dedup.bam METRICS_FILE=metrics.txt

java -jar picard.jar BuildBamIndex INPUT=dedup.bam

java -jar GenomeAnalysisTK.jar -R ref.fa -T HaplotypeCaller -I dedup.bam -o GATKvar.vcf

ERROR MESSAGE: SAM/BAM/CRAM file /Users/mantang/Desktop/sims/dedup.bam is malformed: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups

Best,

Man

Answers

  • jrandall, Member

    Man,

    The usual time to set read group information would be during the alignment step. In your example, if you change the bwa mem command line to:

    bwa mem -R '@RG\tID:dip1\tSM:dip1' ref.fa reads_dip1.fastq > aln.sam

    Then I think GATK should be happy with that output (when run through your subsequent steps). Note that I've arbitrarily named your read group 'dip1' but you can call it something else if you prefer.

    If this is a large file, you might want to avoid running the bwa mem alignment again. Nothing about the alignment will change when adding a read group; the only difference is that there will be an @RG header line and a corresponding RG:Z: tag on all of the alignment lines. Therefore, instead of re-running bwa mem, you could edit the SAM file to add the read group header line and a corresponding RG:Z:dip1 tag at the end of every non-header line. If you don't know how to do that, it will probably be easier just to run bwa mem again.
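For anyone who does want to try the manual edit, here is a rough sketch using awk. It assumes the input is an uncompressed SAM file that has no @RG header line and no RG:Z: tags yet. The function name add_rg and the output file name in the usage line are made up for illustration, and 'dip1' is the same arbitrary read group name as above.

```shell
# Sketch only: add_rg is a hypothetical helper, not part of bwa, samtools or GATK.
# Usage: add_rg RG_NAME < in.sam > out.sam
add_rg() {
  awk -v rg="$1" '
    BEGIN { OFS = "\t" }
    # Header lines start with "@"; pass them through unchanged.
    /^@/ { print; next }
    # Before the first alignment line, emit the new @RG header line once.
    !seen_alignment { print "@RG", "ID:" rg, "SM:" rg; seen_alignment = 1 }
    # Append the matching RG:Z: tag to every alignment line.
    { print $0, "RG:Z:" rg }
    # A header-only file still gets its @RG line.
    END { if (!seen_alignment) print "@RG", "ID:" rg, "SM:" rg }
  '
}
```

It could then be called as `add_rg dip1 < aln.sam > aln_rg.sam`, with the rest of the pipeline run on aln_rg.sam. It is worth checking the result with `grep '^@RG' aln_rg.sam` before moving on.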

    Cheers,

    Josh.

  • Geraldine_VdAuwera, Cambridge, MA; Member, Administrator, Broadie admin

    Note that you can add read groups easily using Picard AddOrReplaceReadGroups.
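As a rough illustration, the call could look something like the command below, written in the same KEY=VALUE style as the Picard commands in the question. The input sorted.bam comes from the original pipeline; the output name sorted_rg.bam and the RGLB, RGPL and RGPU values are placeholders chosen for this example, so substitute whatever describes your data.

```shell
# Placeholder values: lib1, illumina and unit1 are assumptions for this example,
# and dip1 is the arbitrary read group / sample name used earlier in the thread.
java -jar picard.jar AddOrReplaceReadGroups \
    I=sorted.bam \
    O=sorted_rg.bam \
    RGID=dip1 \
    RGLB=lib1 \
    RGPL=illumina \
    RGPU=unit1 \
    RGSM=dip1
```

MarkDuplicates and the later steps would then take sorted_rg.bam as input instead of sorted.bam.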

    A more general comment is that you're not following our recommended best practices for pre-processing. There are several steps required before running HaplotypeCaller. See the Guide for more information.
