GVCF generated from lane-wise bam and merged bam

fazulur (hyderabad), Member

Dear GATK Team,

I ran GATK4 variant calling, following the Best Practices, on one WGS sample sequenced across lanes.

Steps followed to get the merged BAM: aligned the lane-wise FASTQ files separately, removed duplicates, merged the lane BAMs, and ran MarkDuplicates again. Variant calling was then run on the merged BAM. I followed the reference below.

https://gatkforums.broadinstitute.org/gatk/discussion/6057/at-what-point-should-i-merge-read-group-bam-files-belonging-to-the-same-sample-into-a-single-file
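For readers following along, the merged-BAM route described above can be sketched as a small Python helper that only builds the command lines, without running them. This is a sketch under assumptions: the file names, the `build_commands` function, and the choice of `-ERC GVCF` are illustrative and not taken from the post, and the per-lane step is shown as plain MarkDuplicates because the post does not say which options were used.

```python
from typing import Dict, List


def build_commands(lane_bams: List[str], sample: str, reference: str) -> Dict[str, List[List[str]]]:
    """Build GATK4 command lines (as argument lists) for the merged-BAM route.

    Hypothetical helper: nothing is executed, the lists are only returned.
    """
    per_lane: List[List[str]] = []
    dedup_bams: List[str] = []
    for bam in lane_bams:
        out = bam.replace(".bam", ".dedup.bam")
        dedup_bams.append(out)
        # Per-lane duplicate marking (assumed options; the post gives none).
        per_lane.append([
            "gatk", "MarkDuplicates",
            "-I", bam, "-O", out,
            "-M", out.replace(".bam", ".metrics.txt"),
        ])

    merged = f"{sample}.merged.bam"
    merge = ["gatk", "MergeSamFiles"]
    for bam in dedup_bams:
        merge += ["-I", bam]
    merge += ["-O", merged]

    # Second MarkDuplicates pass on the merged, sample-level BAM.
    final_bam = f"{sample}.merged.dedup.bam"
    mark_again = [
        "gatk", "MarkDuplicates",
        "-I", merged, "-O", final_bam,
        "-M", f"{sample}.merged.metrics.txt",
    ]

    # Variant calling in GVCF mode on the merged BAM.
    call = [
        "gatk", "HaplotypeCaller",
        "-R", reference, "-I", final_bam,
        "-O", f"{sample}.g.vcf.gz",
        "-ERC", "GVCF",
    ]
    return {"per_lane": per_lane, "merge": [merge], "mark_again": [mark_again], "call": [call]}


if __name__ == "__main__":
    cmds = build_commands(["lane1.bam", "lane2.bam"], "sampleA", "ref.fasta")
    for step, lines in cmds.items():
        for line in lines:
            print(step, ":", " ".join(line))
```

Returning argument lists rather than shell strings keeps the sketch easy to inspect and to pass to `subprocess.run` later if desired.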

I also ran variant calling on each lane-wise BAM separately, in order to compare the two lane g.vcf files with the merged-BAM g.vcf.
When I compare the GVCFs generated from the individual lane BAMs with the one from the merged BAM, there is a huge difference in size.

Sample    # of lines    GVCF size
lane-1    658655987     7.6 GB
lane-2    442845977     5.6 GB
Merged     83563153     1.3 GB

I see a smaller difference when I convert the GVCFs to VCF using gvcftools extract_variants. But at the g.vcf level I am not sure why I am getting such a large difference in file sizes.
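One way to look into where the extra lines come from is to split the GVCF records into non-variant records and variant sites and compare the counts between files. The following is a minimal sketch, assuming GATK-style GVCFs in which non-variant records carry only `<NON_REF>` in the ALT column and, for banded blocks, an `END=` key in INFO. The function names `summarize_gvcf_lines` and `open_text` are hypothetical, not part of GATK or gvcftools.

```python
import gzip
from typing import Dict, Iterable


def summarize_gvcf_lines(lines: Iterable[str]) -> Dict[str, int]:
    """Count header lines, non-variant records, and variant sites in a GVCF."""
    counts = {"header": 0, "non_variant": 0, "variant": 0, "non_variant_bases": 0}
    for line in lines:
        if not line.strip():
            continue
        if line.startswith("#"):
            counts["header"] += 1
            continue
        fields = line.rstrip("\n").split("\t")
        pos, alt, info = int(fields[1]), fields[4], fields[7]
        if alt == "<NON_REF>":
            # Non-variant record: a banded block (END=...) or a single site.
            end = pos
            for item in info.split(";"):
                if item.startswith("END="):
                    end = int(item[4:])
            counts["non_variant"] += 1
            counts["non_variant_bases"] += end - pos + 1
        else:
            # A real ALT allele is present (usually alongside <NON_REF>).
            counts["variant"] += 1
    return counts


def open_text(path: str):
    """Open a plain or gzip-compressed GVCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open_text(path) as handle:
            print(path, summarize_gvcf_lines(handle))
```

Run on the lane-1, lane-2, and merged g.vcf files, this shows whether the difference in line counts sits mostly in the non-variant records (many short blocks or single sites) or in the variant sites.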

Could you please help me?

Thanks in advance,
Fazulur Rehaman
