AddOrReplaceReadGroups

I am processing single-cell RNA-seq data that I downloaded using its GEO accession number (it was in .sra format, which I converted to .bam).

Now I'm trying to run the scRNA-seq pipeline and got stuck because it seems I don't have read groups in the header.

I'm trying to use Picard's AddOrReplaceReadGroups with the following command:
java -Xmx15g -jar picard.jar AddOrReplaceReadGroups \
I=SRR5164436.bam \
O=SRR5164436_RG.bam \
RGID=bam1 \
RGLB=lib1 \
RGPL=illumina \
RGPU=ad_lib_Chow1 \
RGSM=sra36
But I get this error:
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 1, Read name 1, RG ID on SAMRecord not found in header: 1

I don't understand why this is happening. Can you please help? The complete error message is below.

Also, can you please explain where I can get the RGPU value if I don't have the .fastq file? If I can't get it, should I just put in an arbitrary value?
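On the RGPU question: as I understand GATK's read group documentation, PU is conventionally built as {FLOWCELL_BARCODE}.{LANE}.{SAMPLE_BARCODE}, and Illumina (CASAVA 1.8+) read names carry the flowcell and lane as the 3rd and 4th colon-separated fields, with the index sequence at the end of the comment. The sketch below assumes that naming format; the function name and the `fallback` parameter are hypothetical. The error message reports "Read name 1", which suggests the SRA conversion replaced the original Illumina names with spot numbers, in which case the flowcell and lane cannot be recovered from the names and the sketch just returns a placeholder of your choosing.

```python
from typing import Optional


def platform_unit_from_read_name(read_name: str,
                                 sample_barcode: Optional[str] = None,
                                 fallback: str = "unknown_pu") -> str:
    """Derive FLOWCELL.LANE[.BARCODE] from a CASAVA 1.8+ style read name.

    Assumed name format (hypothetical helper, not part of Picard):
      instrument:run:flowcell:lane:tile:x:y [read:filtered:control:index]
    Names that don't match (e.g. SRA spot numbers like '1') yield `fallback`.
    """
    name = read_name.lstrip("@")
    main, _, comment = name.partition(" ")
    fields = main.split(":")
    if len(fields) != 7:
        # Not an Illumina-style name, so flowcell/lane are unrecoverable.
        return fallback
    flowcell, lane = fields[2], fields[3]

    barcode = sample_barcode
    if barcode is None and comment:
        # The comment's 4th field holds the index (sample barcode) sequence.
        comment_fields = comment.split(":")
        if len(comment_fields) == 4 and comment_fields[3]:
            barcode = comment_fields[3]

    parts = [flowcell, lane] + ([barcode] if barcode else [])
    return ".".join(parts)
```

For example, `platform_unit_from_read_name("@M00123:45:000000000-ABCDE:1:1101:15589:1332 1:N:0:ATCACG")` gives `000000000-ABCDE.1.ATCACG`, while a bare SRA name such as `"1"` falls back to the placeholder.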

11:51:55.904 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/cvar/jhlab/Kathy/Drop-seq/picard-2.12.2/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Oct 23 11:51:55 EDT 2017] AddOrReplaceReadGroups INPUT=SRR5164436.bam OUTPUT=SRR5164436_RG.bam RGID=bam1 RGLB=lib1 RGPL=illumina RGPU=ad_lib_Chow1 RGSM=sra36    VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Mon Oct 23 11:51:55 EDT 2017] Executing as kushakov@uger-c065.broadinstitute.org on Linux 2.6.32-696.6.3.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Deflater: Intel; Inflater: Intel; Picard version: 2.12.2-SNAPSHOT
INFO 2017-10-23 11:51:55 AddOrReplaceReadGroups Created read group ID=bam1 PL=illumina LB=lib1 SM=sra36
 
[Mon Oct 23 11:51:56 EDT 2017] picard.sam.AddOrReplaceReadGroups done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 1, Read name 1, RG ID on SAMRecord not found in header: 1
at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:454)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:812)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.<init>(BAMFileReader.java:783)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.<init>(BAMFileReader.java:771)
at htsjdk.samtools.BAMFileReader.getIterator(BAMFileReader.java:474)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.iterator(SamReader.java:478)
at picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:141)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
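The error says that record 1 carries an RG tag with ID `1` that has no matching `@RG` line in the header. Because the log shows VALIDATION_STRINGENCY=STRICT, htsjdk treats this mismatch as fatal while iterating the input (the stack trace fails inside `doWork` on the very first record), before any read groups are rewritten. Below is a minimal sketch of that consistency check, assuming the BAM has first been dumped to SAM text (for example with `samtools view -h`); the function name is hypothetical.

```python
from typing import Iterable, Set


def undeclared_read_groups(sam_lines: Iterable[str]) -> Set[str]:
    """Return RG IDs used by records but missing from the @RG header lines."""
    declared: Set[str] = set()
    used: Set[str] = set()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header line: collect the ID of every @RG entry.
            if fields[0] == "@RG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        declared.add(field[3:])
            continue
        # Alignment line: optional TAG:TYPE:VALUE fields start at column 12.
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                used.add(tag[5:])
    return used - declared
```

A non-empty result (here it would be `{"1"}`) reproduces the condition Picard is rejecting.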


  • kushakov (Member, Broadie)

    @Sheila said:
    @kushakov
    Hi,

    It looks like your reads already carry a read group tag (ID 1, according to the error message) that is not declared in the header and conflicts with the read groups you are trying to add. In your case, you can simply add the RG fields to the BAM header manually instead of using AddOrReplaceReadGroups. If you do a Google search, you should find more tips on how to do that.

    -Sheila
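Sheila's manual fix amounts to declaring an `@RG` line whose ID matches the tag already on the records (`1`, per the error message). One common route is to dump the header with `samtools view -H`, add the line, and apply the edited header with `samtools reheader`. The sketch below shows only the header edit on SAM header text; the function name is hypothetical, and the LB/PL/PU/SM values are simply reused from the original command as an illustration.

```python
from typing import Dict, List


def add_read_group_line(header_lines: List[str], rg_id: str,
                        fields: Dict[str, str]) -> List[str]:
    """Return a copy of header_lines with an @RG line for rg_id appended.

    If an @RG line with the same ID already exists, the header is returned
    unchanged so that read group IDs stay unique.
    """
    for line in header_lines:
        cols = line.rstrip("\n").split("\t")
        if cols[0] == "@RG" and f"ID:{rg_id}" in cols[1:]:
            return list(header_lines)
    # Build the tab-separated @RG line: ID first, then the other fields.
    rg_line = "\t".join(["@RG", f"ID:{rg_id}"] +
                        [f"{key}:{value}" for key, value in fields.items()])
    return list(header_lines) + [rg_line]
```

Usage, matching the ID the records already use:
`add_read_group_line(["@HD\tVN:1.5"], "1", {"LB": "lib1", "PL": "illumina", "PU": "ad_lib_Chow1", "SM": "sra36"})`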

    So instead of using .bam files converted from .sra, I converted the .fastq files to .bam (using FastqToSam with its default settings), and that solved my problems in the downstream pipeline.

    Thanks!
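A likely reason the FastqToSam route worked is that, as I understand it, it builds an unmapped BAM in which the header's `@RG` line and every record's RG tag come from the same settings, so they cannot disagree. The sketch below illustrates only that idea for single-end reads; it is not Picard's implementation, and the function names, the `SO:unsorted` header, and the read group values are assumptions for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from well-formed 4-line FASTQ text."""
    lines = [line for line in text.splitlines() if line]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        # Read name: drop the leading '@' and anything after whitespace.
        yield header[1:].split()[0], seq, qual


def fastq_to_unmapped_sam(records: Iterable[Tuple[str, str, str]],
                          rg_id: str, sample: str) -> List[str]:
    """Build SAM lines for unmapped single-end reads in one read group."""
    lines = ["@HD\tVN:1.6\tSO:unsorted", f"@RG\tID:{rg_id}\tSM:{sample}"]
    for name, seq, qual in records:
        # FLAG 4 = unmapped; RNAME/CIGAR/RNEXT are '*', positions are 0.
        # The RG tag reuses the header's ID, so header and records agree.
        lines.append("\t".join([name, "4", "*", "0", "0", "*", "*", "0", "0",
                                seq, qual, f"RG:Z:{rg_id}"]))
    return lines
```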

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @kushakov
    Hi,

    Thank you for reporting your solution :smile:

    -Sheila
