Raw VCF has significantly more records than the raw SNP and raw INDEL VCFs combined
I successfully ran a variant calling pipeline and got a raw.vcf file from HaplotypeCaller containing exactly 4,484,688 records. I then used SelectVariants to extract the SNPs and INDELs into separate VCF files. This yielded 85,239 records in the raw SNP VCF and 14,501 records in the raw INDEL VCF, only 99,740 in total, so 4,384,948 records (more than 4.3 million) were left out. Could you please explain what happens during this step and why so many records are dropped?
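One way to see where the missing records went is to tally the records in raw.vcf by variant type. The sketch below is a hypothetical diagnostic (Python, standard library only), not GATK's own classification logic. It approximates SNP/MNP/INDEL/MIXED from the REF and ALT lengths. Records with no concrete ALT allele go into a separate bucket. Examples are `.` or only symbolic alleles such as `<NON_REF>`. Such records would dominate if HaplotypeCaller had been run in GVCF mode, but that is an assumption the post does not confirm. Under this approximation, MNP, MIXED and no-ALT records are neither SNPs nor INDELs, so they would not appear in either split file.

```python
import gzip
import sys
from collections import Counter


def classify_record(ref: str, alt_field: str) -> str:
    """Roughly classify one VCF record from its REF and ALT columns.

    This is an approximation for diagnostics only, not GATK/htsjdk logic.
    """
    # Keep only concrete base alleles: drop the missing value (.), spanning
    # deletions (*), symbolic alleles like <NON_REF>, and breakend notation.
    alts = [
        a for a in alt_field.split(",")
        if a not in (".", "*") and not a.startswith("<")
        and "[" not in a and "]" not in a
    ]
    if not alts:
        # No concrete ALT allele, e.g. a GVCF reference block.
        return "NO_CONCRETE_ALT"
    types = set()
    for alt in alts:
        if len(ref) == 1 and len(alt) == 1:
            types.add("SNP")
        elif len(ref) == len(alt):
            types.add("MNP")
        else:
            types.add("INDEL")
    # Alleles of different types at one site are reported as MIXED.
    return types.pop() if len(types) == 1 else "MIXED"


def count_types(lines) -> Counter:
    """Count data lines (non-header) of a VCF by rough variant type."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns: CHROM POS ID REF ALT ...
        counts[classify_record(fields[3], fields[4])] += 1
    return counts


def open_vcf(path: str):
    """Open a plain or gzip/bgzip-compressed VCF in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


if __name__ == "__main__":
    with open_vcf(sys.argv[1]) as handle:
        for vtype, n in count_types(handle).most_common():
            print(f"{vtype}\t{n}")
```

Running it as `python count_vcf_types.py raw.vcf` (the script name is hypothetical) prints one count per category. If the counts add up to the 4,484,688 total and most records fall outside SNP and INDEL, that would suggest where the difference comes from.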