Raw VCF has significantly more records than the raw SNP and raw INDEL VCFs combined
I successfully ran a variant calling pipeline and got a raw.vcf file from HaplotypeCaller that contains exactly 4,484,688 records. I then used SelectVariants to extract SNPs and INDELs into separate VCF files, which yielded a raw SNP VCF with 85,239 records and a raw INDEL VCF with 14,501 records. As you can see, the total drops sharply: the two files together hold 99,740 records, so about 4.38 million records from the raw VCF appear in neither. Could you please explain what happens during this step and why so many records are dropped?
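One way to see where the remaining records fall is to tally the raw VCF by record type before selecting anything. The sketch below is only a rough diagnostic, not GATK's own classification logic: the helper names (`classify`, `count_types`, `count_file`) are hypothetical, and it assumes that a record whose only ALT is `.`, `*`, or `<NON_REF>` should be counted as carrying no variation.

```python
import gzip
from collections import Counter


def classify(ref, alt_field):
    """Roughly classify one VCF record from its REF and ALT columns.

    Assumption: ALT values '.', '*' and '<NON_REF>' are treated as
    placeholders, not as real alternate alleles.
    """
    alts = [a for a in alt_field.split(",") if a not in (".", "*", "<NON_REF>")]
    if not alts:
        return "NO_VARIATION"
    kinds = set()
    for alt in alts:
        if alt.startswith("<") or "[" in alt or "]" in alt:
            kinds.add("SYMBOLIC")      # symbolic allele or breakend
        elif len(ref) == 1 and len(alt) == 1:
            kinds.add("SNP")
        elif len(ref) == len(alt):
            kinds.add("MNP")           # same-length, multi-base substitution
        else:
            kinds.add("INDEL")
    # More than one kind of allele at the same site
    return kinds.pop() if len(kinds) == 1 else "MIXED"


def count_types(lines):
    """Count record types over an iterable of VCF text lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue                   # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue                   # skip malformed lines
        counts[classify(fields[3], fields[4])] += 1
    return counts


def count_file(path):
    """Count record types in a plain or gzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_types(handle)
```

Calling `print(count_file("raw.vcf"))` prints one count per category. If the SNP and INDEL counts are close to the SelectVariants outputs, the other categories would show what kind of records make up the difference.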