
What would be the correct approach to simulate reads from two parents?

I have simulated paired-end reads from two parental lines and combined them to simulate an F1 cross. I then aligned the reads to the reference of one of the parents and generated BAM files, which were then read by Genome STRiP.
I'm getting two different errors:

1.
java.lang.RuntimeException: Mismatched read pair records found:

I'm not sure how to interpret or fix this.

2.
java.lang.IllegalArgumentException: Read pair records have different read groups: scf7180000037249-id.whte-28-21004: ID_3_4,ID_12_4

To generate the RG tags, I ran this command:

java -jar /share/apps/picard-tools/AddOrReplaceReadGroups.jar I=F1_indiv${i}4x_bwa_mem_sorted.bam O=F1_indiv${i}_4x_bwa_mem_sorted_rg.bam SORT_ORDER=coordinate CREATE_INDEX=true RGPL=illumina RGID=ID${i}4 RGSM=indiv${i} RGPU=ART${i}_4 RGLB=ART_popv3_whte

But this does not account for the fact that half of the reads in the FASTQ file came from one parent and the other half from the second parent. Is there a way to generate the RG tags correctly in this scenario?
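An F1 individual is a single sample. One read group per simulated library, which is what the Picard command above assigns to every read, is therefore already consistent with the SAM conventions. If you also want to record the parent of origin, the following is one possible approach. It is a minimal sketch and has not been confirmed as something Genome STRiP needs. The approach prefixes each read name with a parent tag when the two simulations are pooled, then derives a per-parent RG from that prefix. Both mates of a pair share the same read name, so they always receive the same RG.

Everything here is hypothetical, including the function names pool_f1, prefix_fastq_names and add_parent_rg, the P1_/P2_ prefixes, and the <parent>_R1.fq / <parent>_R2.fq file layout. The sketch also assumes 4-line FASTQ records and a SAM stream that carries no RG tags yet, for example BWA output without -R.

```shell
#!/usr/bin/env bash
# Sketch only. Hypothetical conventions (not from the original post):
#   - 4-line FASTQ records in files named <parent>_R1.fq / <parent>_R2.fq
#   - reads from the two parents get the name prefixes P1_ and P2_

# Prefix the name line (line 1 of every 4-line record) with a parent tag,
# so read names cannot collide between the two simulations.
prefix_fastq_names() {
  local tag=$1 infile=$2
  awk -v tag="$tag" 'NR % 4 == 1 { sub(/^@/, "@" tag "_") } { print }' "$infile"
}

# Pool two parents into one F1 library. R1 and R2 are written in the same
# parent order, so record N of the R1 file is still the mate of record N of R2.
pool_f1() {
  local a=$1 b=$2 out=$3
  { prefix_fastq_names P1 "${a}_R1.fq"; prefix_fastq_names P2 "${b}_R1.fq"; } > "${out}_R1.fq"
  { prefix_fastq_names P1 "${a}_R2.fq"; prefix_fastq_names P2 "${b}_R2.fq"; } > "${out}_R2.fq"
}

# Read SAM text on stdin and tag each alignment with RG:Z:<sample>_P1 or
# RG:Z:<sample>_P2, taken from the read-name prefix. Both @RG lines share the
# same SM, because the F1 is still one individual. Assumes no existing RG tags.
add_parent_rg() {
  local sample=$1
  awk -v sm="$sample" '
    BEGIN { FS = OFS = "\t" }
    /^@/ { if ($1 != "@RG") print; next }   # keep header, drop old @RG lines
    !done {                                  # emit new @RG lines before the first alignment
      print "@RG", "ID:" sm "_P1", "SM:" sm
      print "@RG", "ID:" sm "_P2", "SM:" sm
      done = 1
    }
    {
      split($1, parts, "_")                  # parts[1] should be P1 or P2
      if (parts[1] != "P1" && parts[1] != "P2") {
        print "unexpected read name: " $1 > "/dev/stderr"
        exit 1
      }
      print $0, "RG:Z:" sm "_" parts[1]      # same QNAME, so both mates get the same RG
    }'
}
```

Usage, again with hypothetical names: run pool_f1 parentA parentB F1_indiv1, align F1_indiv1_R1.fq and F1_indiv1_R2.fq as before, then replace the Picard step with samtools view -h F1_indiv1.bam | add_parent_rg indiv1 | samtools view -b -o F1_indiv1_rg.bam -. Other @RG fields such as PL or LB can be appended to the two print "@RG" lines in the same way as in the Picard command.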

Any help or comments would be appreciated!

Best,

ARW

Comments

  • bhandsaker (Member, Broadie, Moderator, admin)

    Regarding the first problem, you would have to provide more information about the error (i.e. the rest of the output).

    I suspect, however, that the problem in both cases is that the BAM files you generated don't follow the SAM format conventions.
    The second error is pretty self-explanatory: If you have a read pair, we expect the RG tag to be the same on both mates.
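Below is a rough sketch of how a BAM could be checked against these expectations before it goes to Genome STRiP. It assumes samtools is available to turn the BAM into SAM text, and the function name check_pairs is hypothetical. The script looks only at primary, paired alignments. It reports any read name that does not have exactly two primary records, and any read whose two mates carry different RG tags. This has not been confirmed to pinpoint the "Mismatched read pair records" error; it only flags records that break those conventions. Every read name is kept in memory, so it suits small simulated test files rather than full-size BAMs.

```shell
#!/usr/bin/env bash
# Sketch: read SAM text on stdin and report read names that break two
# expectations: exactly two primary records per paired read, and the same
# RG tag on both mates. Prints "<qname><TAB><problem>" and exits 1 if any.
check_pairs() {
  awk '
    BEGIN { FS = "\t" }
    /^@/ { next }                                   # skip header lines
    {
      flag = $2 + 0
      if (int(flag / 256) % 2 == 1) next            # 0x100: secondary alignment
      if (int(flag / 2048) % 2 == 1) next           # 0x800: supplementary alignment
      if (flag % 2 == 0) next                       # 0x1 unset: read is not paired
      rg = "(none)"
      for (i = 12; i <= NF; i++)                    # optional tags start at column 12
        if ($i ~ /^RG:Z:/) rg = substr($i, 6)
      n[$1]++
      if (($1 in seen) && seen[$1] != rg)
        bad[$1] = "mates have different read groups: " seen[$1] "," rg
      seen[$1] = rg
    }
    END {
      for (q in n)
        if (n[q] != 2 && !(q in bad)) bad[q] = n[q] " primary records (expected 2)"
      status = 0
      for (q in bad) { print q "\t" bad[q]; status = 1 }
      exit status
    }'
}
```

For example, samtools view your_file.bam | check_pairs prints one line per problematic read name. With the read groups from the second error, it would report a line such as "mates have different read groups: ID_3_4,ID_12_4".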
