What would be the correct approach to simulate reads from two parents?

I simulated paired-end reads from two parental lines and combined them to simulate an F1 cross. I then aligned the combined reads to one of the parents and generated the BAM files, which were then read by Genome STRiP.
I'm getting these two different errors.

java.lang.RuntimeException: Mismatched read pair records found:

Not sure how to interpret and fix this.

java.lang.IllegalArgumentException: Read pair records have different read groups: scf7180000037249-id.whte-28-21004: ID_3_4,ID_12_4

When I generated the RG tags, I ran this command:

java -jar /share/apps/picard-tools/AddOrReplaceReadGroups.jar I=F1_indiv${i}4x_bwa_mem_sorted.bam O=F1_indiv${i}_4x_bwa_mem_sorted_rg.bam SORT_ORDER=coordinate CREATE_INDEX=true RGPL=illumina RGID=ID${i}4 RGSM=indiv${i} RGPU=ART${i}_4 RGLB=ART_popv3_whte

But this does not take into account that half of the reads in the FASTQ file came from one parent and half from the other. Is there a way to correctly generate the RG tags in this scenario?

  • bhandsaker (Member, Broadie, Moderator)

    Regarding the first problem, you would have to provide more information about the error (i.e. the rest of the output).

    I suspect the problem in both cases, however, is that the BAM files you generated don't follow the SAM format conventions.
    The second error is pretty self-explanatory: If you have a read pair, we expect the RG tag to be the same on both mates.
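
    To find the read pairs that trigger the second error, something like the following sketch may help. It is not part of Genome STRiP or Picard. It assumes the alignments are supplied as SAM text on standard input, and the function name check_rg_consistency is made up for this example. It prints each read name whose primary records carry more than one RG value, in the same "name: RG1,RG2" form as the exception message. Every read name is held in memory, so for large files it may be worth restricting the input to one contig.

```shell
#!/bin/sh
# Sketch only: report read names whose primary records carry different RG tags.
# Input: SAM text on stdin (header lines starting with "@" are ignored).
# check_rg_consistency is a hypothetical helper name, not an existing tool.
check_rg_consistency() {
    awk -F '\t' '
        /^@/ { next }                       # skip SAM header lines
        {
            flag = $2
            # Skip secondary (0x100) and supplementary (0x800) alignments,
            # so that only the two primary mate records are compared.
            if (int(flag / 256) % 2 == 1 || int(flag / 2048) % 2 == 1) next

            # Optional tags start at column 12; look for the RG:Z: tag.
            rg = "(none)"
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^RG:Z:/) { rg = substr($i, 6); break }
            }

            # Remember the first RG seen for each read name.
            if (!($1 in seen)) { seen[$1] = rg; next }

            # Report a read name once if a later record disagrees.
            if (seen[$1] != rg && !($1 in reported)) {
                print $1 ": " seen[$1] "," rg
                reported[$1] = 1
            }
        }
    '
}
```

    Assuming samtools is installed, a possible invocation is `samtools view merged.bam | check_rg_consistency`, where merged.bam is a placeholder file name. No output means every read name seen had a single RG value across its primary records.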

