GVCF generated from lane-wise bam and merged bam

fazulur (Hyderabad), Member

Dear GATK Team,

I ran GATK4 variant calling, following the Best Practices, on one WGS sample that was sequenced across two lanes.

Steps followed to get the merged BAM: align the lane-wise FASTQ files separately, remove duplicates, merge the lane BAMs, and run MarkDuplicates again. Variant calling was then run on the merged BAM. I followed the reference below.

https://gatkforums.broadinstitute.org/gatk/discussion/6057/at-what-point-should-i-merge-read-group-bam-files-belonging-to-the-same-sample-into-a-single-file
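
For readers who want to reproduce the workflow, the steps above can be sketched roughly as the command lines below. This is only an illustration, not the exact commands that were run: it assumes the lane-level duplicate step used MarkDuplicates, that the lane BAMs are already aligned and coordinate-sorted, and that HaplotypeCaller was run in GVCF mode. The function name `merged_bam_commands` and all file names are hypothetical.

```python
from typing import List


def merged_bam_commands(lane_bams: List[str], reference: str, sample: str) -> List[List[str]]:
    """Build (but do not run) command lines for: per-lane duplicate marking,
    merging, duplicate marking on the merged BAM, and gVCF calling."""
    commands = []
    dedup_bams = []

    # Per-lane duplicate marking (assumed; the post only says "remove duplicates").
    for bam in lane_bams:
        out = bam.replace(".bam", ".dedup.bam")
        metrics = out.replace(".bam", ".metrics.txt")
        commands.append(["gatk", "MarkDuplicates", "-I", bam, "-O", out, "-M", metrics])
        dedup_bams.append(out)

    # Merge the lane-level BAMs into one sample-level BAM.
    merged = f"{sample}.merged.bam"
    merge_cmd = ["gatk", "MergeSamFiles"]
    for bam in dedup_bams:
        merge_cmd += ["-I", bam]
    merge_cmd += ["-O", merged]
    commands.append(merge_cmd)

    # Mark duplicates again on the merged BAM.
    final = f"{sample}.merged.dedup.bam"
    commands.append(["gatk", "MarkDuplicates", "-I", merged, "-O", final,
                     "-M", f"{sample}.merged.dedup.metrics.txt"])

    # Call variants in GVCF mode (the BAM must be indexed first; not shown here).
    commands.append(["gatk", "HaplotypeCaller", "-R", reference, "-I", final,
                     "-O", f"{sample}.g.vcf.gz", "-ERC", "GVCF"])
    return commands


if __name__ == "__main__":
    for cmd in merged_bam_commands(["lane1.bam", "lane2.bam"], "ref.fasta", "sample1"):
        print(" ".join(cmd))
```

The function only prints the commands, so they can be reviewed against the actual pipeline before anything is executed.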

I also ran variant calling on each lane-wise BAM separately, in order to compare the two lane g.vcf files with the merged-BAM g.vcf.
When I compare the gVCFs generated from the individual lane BAMs with the one from the merged BAM, there is a huge difference in size.

Sample    # of lines    GVCF size (GB)
lane-1    658655987     7.6
lane-2    442845977     5.6
Merged    83563153      1.3

I see much less difference when I convert the gVCFs to VCFs using gvcftools extract_variants. But at the g.vcf level I am not sure why the file sizes differ this much.
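
One way to narrow this down might be to count how many records in each g.vcf are reference blocks versus variant sites, and how many bases the reference blocks span. The sketch below assumes GATK-style gVCFs, where a reference block has `<NON_REF>` as its only ALT allele and an `END=` key in INFO; the function name `summarize_gvcf` is hypothetical.

```python
import gzip
from collections import Counter


def summarize_gvcf(path):
    """Count reference-block records, the bases they span, and variant records."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header lines
            fields = line.rstrip("\n").split("\t")
            pos, alt, info = int(fields[1]), fields[4], fields[7]
            if alt == "<NON_REF>":
                # Reference block: spans POS..END (END defaults to POS if absent).
                end = pos
                for item in info.split(";"):
                    if item.startswith("END="):
                        end = int(item[4:])
                counts["ref_block_records"] += 1
                counts["ref_block_bases"] += end - pos + 1
            else:
                # Any other ALT is treated here as a candidate variant site.
                counts["variant_records"] += 1
    return counts
```

If the lane-level files show far more reference-block records covering a similar number of bases, the extra lines would come from how the non-variant regions are split into blocks rather than from the variant calls themselves, which would be consistent with the smaller difference seen after extract_variants.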

Could you please help?

Thanks in advance,
Fazulur Rehaman
