
Lane information in Read Groups

ElisabetT (Faroe Islands, Member)
Dear GATK team,

I am a student participating in a project where we have barcoded data (10x Genomics exome data). We want to call variants from this data for use in downstream analysis.

We are using EMA ( https://github.com/arshajii/ema/ ) to align the samples, and the plan is to proceed with the GATK Best Practices pipeline (data pre-processing, germline SNPs + indels) for variant calling.

We have a Read Group (RG) problem that we hope you can help us with. The problem starts with the sequencing method:
After barcoding in the Chromium Controller instrument, our data was sequenced on an Illumina NextSeq. We cannot assume that all the DNA fragments from one GEM (which carry identical barcodes) end up on the same lane. Therefore, we want to align per sample (all lanes in one go) to ensure that the DNA sequences carrying the same barcode can be aligned in the same "cloud".
However, if we align in this manner, it is not possible to add the lane information to the RGs during alignment.

We have thought about methods to, e.g., split the BAM file per lane and add the RG info after alignment, but before we spend time on that, we just want to ask: given how our data was generated, is it vital that the lane information is in the read groups?

And which other information is vital?

We will be using the GATK tools mentioned in the Best Practices pipeline, and we are unsure which tools use, e.g., the lane information and what it is used for.
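To make the post-alignment idea concrete, here is a minimal sketch of how we imagine recovering the lane from each read name and building a per-lane RG from it. It assumes the read names in the BAM still follow the usual Illumina layout (instrument:run:flowcell:lane:tile:x:y), which we have not verified survives EMA. The function names, the example read name and the choice of flowcell.lane as RG ID are our own illustration, not something taken from GATK or EMA.

```python
def parse_illumina_read_name(qname):
    """Split an Illumina-style read name into its leading colon-separated fields.

    Assumed layout: instrument:run:flowcell:lane:tile:x:y
    """
    # Drop a leading '@' (FASTQ style) and anything after the first whitespace.
    parts = qname.lstrip("@").split()
    core = parts[0] if parts else ""
    fields = core.split(":")
    if len(fields) < 7:
        raise ValueError(f"not an Illumina-style read name: {qname!r}")
    instrument, run, flowcell, lane = fields[:4]
    return {
        "instrument": instrument,
        "run": run,
        "flowcell": flowcell,
        "lane": int(lane),
    }


def read_group_for(qname, sample):
    """Build RG fields for one read (hypothetical convention: ID = flowcell.lane)."""
    info = parse_illumina_read_name(qname)
    unit = f"{info['flowcell']}.{info['lane']}"
    return {"ID": unit, "PU": unit, "SM": sample, "PL": "ILLUMINA"}


def format_rg_header(rg):
    """Render the RG fields as a tab-separated SAM @RG header line."""
    return "@RG\t" + "\t".join(f"{key}:{value}" for key, value in rg.items())


if __name__ == "__main__":
    # Invented read name, for illustration only.
    example = "NB501234:57:HXXXXXBGX2:3:11401:12345:1043"
    print(format_rg_header(read_group_for(example, "sampleA")))
```

With something like this, each read could be assigned to an RG by lane after the per-sample alignment, without splitting the input before EMA.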

Thanks in advance!


