How to fix the error: The GATK no longer supports SAM files without read groups?
I am new to GATK. I want to use GATK to call SNPs and indels from simulated reads against a reference. I have one read file, reads_dip1.fastq, and one reference file, ref.fa. I first built a BAM file with BWA and then ran GATK HaplotypeCaller to call variants. However, it gave this error message: "SAM/BAM/CRAM file dedup.bam is malformed: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups." Since all my reads belong to a single read group, do I still need to provide read group information in the SAM file? How do I define read groups in the header? Thanks for your help.
Here is my code:
./bwa index ref.fa
./samtools faidx ref.fa
java -jar picard.jar CreateSequenceDictionary REFERENCE=ref.fa OUTPUT=ref.dict
./bwa mem ref.fa reads_dip1.fastq > aln.sam
java -jar picard.jar SortSam I=aln.sam O=sorted.bam SORT_ORDER=coordinate
java -jar picard.jar MarkDuplicates I=sorted.bam O=dedup.bam METRICS_FILE=metrics.txt
java -jar picard.jar BuildBamIndex INPUT=dedup.bam
java -jar GenomeAnalysisTK.jar -R ref.fa -T HaplotypeCaller -I dedup.bam -o GATKvar.vcf
ERROR MESSAGE: SAM/BAM/CRAM file /Users/mantang/Desktop/sims/dedup.bam is malformed: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups
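For reference, GATK requires a read group even when all reads come from a single sample and run, so the header needs at least one @RG line. Below is a minimal sketch of two common ways to add one, reusing the file names from the commands above. The read group values (dip1, lib1, unit1, ILLUMINA) and the output name dedup_rg.bam are placeholders chosen for illustration, not values taken from the data, and the two function names are hypothetical wrappers so that only one route is run.

```shell
# Placeholder read-group values (assumptions); replace with ones describing your data.
RG_ID="dip1"       # read group identifier
RG_SM="dip1"       # sample name
RG_PL="ILLUMINA"   # sequencing platform
RG_LB="lib1"       # library name
RG_PU="unit1"      # platform unit

# bwa mem expects the @RG line with literal "\t" separators and converts them to tabs.
RG_LINE="@RG\tID:${RG_ID}\tSM:${RG_SM}\tPL:${RG_PL}\tLB:${RG_LB}\tPU:${RG_PU}"

# Option 1: tag the reads at alignment time, then rerun SortSam, MarkDuplicates
# and BuildBamIndex exactly as before.
align_with_read_group() {
  ./bwa mem -R "$RG_LINE" ref.fa reads_dip1.fastq > aln.sam
}

# Option 2: keep the existing BAM and add the read group afterwards with Picard,
# then index the new file and check that the header now has an @RG line.
add_read_group_to_bam() {
  java -jar picard.jar AddOrReplaceReadGroups I=dedup.bam O=dedup_rg.bam \
    RGID="$RG_ID" RGSM="$RG_SM" RGPL="$RG_PL" RGLB="$RG_LB" RGPU="$RG_PU"
  java -jar picard.jar BuildBamIndex INPUT=dedup_rg.bam
  ./samtools view -H dedup_rg.bam | grep '@RG'
}
```

Call either align_with_read_group or add_read_group_to_bam, not both. With the second route, HaplotypeCaller would then be pointed at dedup_rg.bam instead of dedup.bam.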