
Raw VCF has significantly more records than the total of (raw SNPs + raw INDELs)

Nilaksha, Colombo, Sri Lanka, Member

Hi,
I successfully ran a variant calling pipeline and obtained a raw.vcf file from HaplotypeCaller that contains exactly 4,484,688 records. I then extracted SNPs and INDELs into separate VCF files using SelectVariants, which yielded a VCF of raw SNPs with 85,239 records and a VCF of raw INDELs with 14,501 records. As you can see, the total number of records drops sharply: the two files together hold 99,740 records, so almost 4.4 million records from the raw VCF appear in neither. Could you please explain what happens during this process and why so many records are dropped?
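To see where the unaccounted records fall, one option is to tally the raw VCF by record type. The following is a minimal sketch, not GATK's own classification: it assumes an uncompressed, tab-delimited VCF, the function names `classify` and `count_types` are hypothetical, and the type rules are a rough approximation based only on the REF and ALT columns. If the file had been produced in GVCF mode (an assumption, nothing in the post confirms it), records whose only ALT allele is `<NON_REF>` would be counted under `NO_VARIATION`.

```python
from collections import Counter

# Numbers quoted in the post above.
raw_total, snps, indels = 4_484_688, 85_239, 14_501
unaccounted = raw_total - (snps + indels)  # records in neither output file


def classify(ref, alt):
    """Roughly classify one record from its REF and ALT columns.

    Approximation only; GATK's SelectVariants may classify some
    records differently.
    """
    # Ignore placeholders that do not describe a concrete alternate allele.
    alleles = [a for a in alt.split(",") if a not in (".", "*", "<NON_REF>")]
    if not alleles:
        return "NO_VARIATION"
    if any(a.startswith("<") for a in alleles):
        return "SYMBOLIC"
    kinds = set()
    for a in alleles:
        if len(a) == len(ref) == 1:
            kinds.add("SNP")
        elif len(a) == len(ref):
            kinds.add("MNP")
        else:
            kinds.add("INDEL")
    return kinds.pop() if len(kinds) == 1 else "MIXED"


def count_types(lines):
    """Count records per type from an iterable of VCF text lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        counts[classify(fields[3], fields[4])] += 1
    return counts
```

It could be run as `with open("raw.vcf") as fh: print(count_types(fh))`. Any category other than `SNP` and `INDEL` with a large count would be a candidate explanation for the gap.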

Answers
