Dear GATK Team,

I ran GATK4 variant calling, following the Best Practices, on one WGS sample that was sequenced across two lanes.

Steps followed to get the merged BAM: I aligned the lane-wise FASTQ files separately, removed duplicates per lane, merged the lane BAMs, and ran MarkDuplicates again on the merged BAM. Variant calling was then run on the merged BAM. I followed the reference below.


I also ran variant calling on each lane-level BAM separately, so that I could compare the two per-lane g.vcf files with the g.vcf from the merged BAM.
When I compare the g.vcf files generated from the individual lane BAMs with the one from the merged BAM, there is a huge difference in size.

Sample    Number of lines    GVCF size (GB)
lane-1    658655987          7.6
lane-2    442845977          5.6
Merged    83563153           1.3

I see a smaller difference when I convert the g.vcf files to VCF using gvcftools extract_variants. At the g.vcf level, however, I am not sure why the file sizes differ this much.
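
To show where the extra lines sit, the g.vcf records can be split into reference blocks and variant sites. The following is only a minimal Python sketch, not part of GATK or gvcftools. It assumes GATK-style g.vcf output in which a reference block has <NON_REF> as its only ALT allele, and the function names (classify_gvcf_records, count_gvcf_file) are hypothetical helpers made up for this illustration.

```python
import gzip
from collections import Counter


def classify_gvcf_records(lines):
    """Count header lines, reference blocks, and variant sites in gVCF text.

    Assumption: a record whose ALT column is exactly "<NON_REF>" is a
    reference block; any other ALT value is treated as a variant site.
    """
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            counts["header"] += 1
            continue
        fields = line.rstrip("\n").split("\t")
        alt = fields[4]  # VCF columns: CHROM POS ID REF ALT ...
        if alt == "<NON_REF>":
            counts["reference_block"] += 1
        else:
            counts["variant_site"] += 1
    return counts


def count_gvcf_file(path):
    """Classify the records of a plain or gzip/bgzip-compressed g.vcf file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return classify_gvcf_records(handle)
```

Running count_gvcf_file on each of the three g.vcf files would show whether the line counts differ mainly in reference blocks or in variant sites.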

Could you please help me understand this?

Thanks in advance,
Fazulur Rehaman

