sample re-align and recalibration

Dear GATK team

I have around 20 exome-seq samples; every 2 samples are barcoded and sequenced together in one lane. After generating the BAM files for each sample individually with BWA, I was wondering which is the better approach for GATK realignment and recalibration. Should I run realignment and recalibration on each exome-seq sample individually, merge the samples from each lane and process them at the lane level, or merge all the samples and process everything together?

Thank you for your kind advice!

Answers

  • pdexheimer Member ✭✭✭✭

    This is covered in detail in the Best Practices document: annotate your BAMs correctly with sample and library information in the read groups, and GATK will basically handle everything. In general, you want to deduplicate at the library level and do everything else at the sample level. You could make an argument for doing BQSR at the lane level, but you should have enough data that splitting the lane in two won't hurt BQSR.
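    For illustration, here is a minimal sketch of what "annotating read groups with sample and library information" might look like for two barcoded samples sharing a lane. It only builds SAM `@RG` header lines using the standard tags (`ID`, `SM`, `LB`, `PU`, `PL`). The helper `read_group_line`, the sample, library, and flowcell names, and the ID naming scheme are all hypothetical and not taken from the Best Practices document.

```python
def read_group_line(sample, library, flowcell, lane, platform="ILLUMINA"):
    """Build one SAM @RG header line for a sample sequenced on a given lane.

    The ID scheme (flowcell.lane.sample) is an assumption for this sketch;
    any scheme works as long as each read group ID is unique.
    """
    fields = {
        "ID": f"{flowcell}.{lane}.{sample}",  # unique read group identifier
        "SM": sample,                          # sample name
        "LB": library,                         # library name
        "PU": f"{flowcell}.{lane}",            # platform unit (flowcell + lane)
        "PL": platform,                        # sequencing platform
    }
    return "@RG\t" + "\t".join(f"{tag}:{value}" for tag, value in fields.items())


# Hypothetical layout: two barcoded samples sequenced in the same lane.
lane_samples = [
    ("sampleA", "libA", "FC1", 1),
    ("sampleB", "libB", "FC1", 1),
]

rg_lines = [read_group_line(*entry) for entry in lane_samples]
for line in rg_lines:
    print(line)
```

    The two samples get distinct `ID`, `SM`, and `LB` values while sharing the same `PU`, so the lane they came from remains recorded even though each sample is processed separately.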

  • caswater Member

    Thank you! I was a bit unsure about the 'sample level' in the Best Practices, as the document recommends merging lanes for a single sample. However, in my case, each sample/library is just half a lane (~80 million reads), so can I still go with sample-level processing, with no need to merge information from other samples? I just wish to confirm this with you.

    Thank you again!

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    You should be fine with sample-level processing.

  • rxy712 Member

    This information is helpful. Thank you! But what if I have 6 samples per lane? Would it be OK to do sample-level BQSR processing? For lane 1, I have 1 normal and 4 different tumor samples from patient 1, and 1 normal sample from patient 2. In total I have 42 samples from 8 patients in 7 lanes. BTW, I have realigned the tumor/normal samples together for each patient, and used --nWayOut to generate separate BAM files (42 BAM files). If I do BQSR at the lane level, how can I get 6 separate BAM files?
