SelectVariants in GATK 3.5 and 3.8 drops multiallelic variants containing both SNPs and indels

We noticed that GATK 3.5 and 3.8 drop multiallelic variants containing both SNP and indel alleles when we select SNPs and indels separately for filtering. We followed the DNA-seq Best Practices. Our workaround is to decompose the VCF file with vt before the SelectVariants steps. If a multiallelic variant contains only SNP alleles or only indel alleles, it is kept after the SelectVariants step.
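
To make the vt workaround concrete, here is a minimal sketch of what "decomposing" a multiallelic record means, written as an awk-based shell function. It is not vt's algorithm: it assumes a sites-only VCF (no FORMAT or sample columns), copies INFO unchanged (so per-allele INFO values are not split), and does not rewrite genotypes. The name split_multiallelic is a hypothetical helper. After splitting, each record carries a single ALT allele, so it is either a pure SNP or a pure indel.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split each multiallelic VCF record into one record per ALT allele.
# ASSUMPTIONS: sites-only VCF; INFO is copied as-is (Number=A/R fields are NOT split);
# genotypes are not handled. Real tools such as vt decompose take care of those details.
split_multiallelic() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }                 # pass header lines through unchanged
       {
         n = split($5, alts, ",")           # column 5 is ALT; split on commas
         for (i = 1; i <= n; i++) {
           $5 = alts[i]                     # keep one ALT allele per output record
           print
         }
       }'
}
```

For example, `split_multiallelic < calls.sites.vcf > calls.split.vcf` turns a record with ALT `G,AT` into one record with ALT `G` and another with ALT `AT`.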

Is there an argument to keep these mixed SNP/indel variants? Or has this behavior changed in GATK4?

We would be happy to provide our VCF file for you to test.

We used the following commands:

Extracts all SNPs

GATK -m 20g SelectVariants \
-R human_g1k_v37_decoy.fasta \
-V $2 \
-L $bed \
--interval_padding 100 \
-selectType SNP \
-o ${2%.vcf}.rawSNP.vcf

Extracts all INDELS

GATK -m 20g SelectVariants \
-R human_g1k_v37_decoy.fasta \
-V $2 \
-L $bed \
--interval_padding 100 \
-selectType INDEL \
-o ${2%.vcf}.rawINDEL.vcf

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Hi @BGuan, I saw your tweet. Your interpretation is correct: the -selectType query is exclusive of other types, so -selectType SNP will ONLY select variants whose alleles are all SNPs. If you want to pull out mixed records, you can use -selectType MIXED, either on its own to write them to a separate file or in combination with one of the other types. There is currently no way in GATK to decompose a mixed variant record into separate records by allele type.
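
The exclusive behavior described above can be illustrated with a simplified shell/awk sketch that assigns each record a type and keeps only the requested types. This approximates, and does not reproduce, GATK's own variant typing: it treats 1-base REF/ALT pairs as SNP alleles, length changes as indel alleles, and everything else (MNPs, symbolic and `*` alleles) as OTHER; a record whose alleles have more than one type is labeled MIXED. The function select_types is hypothetical, not a GATK command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep VCF records whose (simplified) type is in the given list.
# ASSUMPTION: approximate typing rules, not GATK's exact logic:
#   1-base REF and 1-base ALT          -> SNP allele
#   REF and ALT of different lengths   -> INDEL allele
#   symbolic, "*", or same-length MNP  -> OTHER allele
# A record whose ALT alleles share one type gets that type; otherwise it is MIXED.
select_types() {
  # $1: space-separated record types to keep, e.g. "SNP MIXED"
  awk -v keep="$1" 'BEGIN {
      FS = OFS = "\t"
      n = split(keep, k, " ")
      for (i = 1; i <= n; i++) wanted[k[i]] = 1
    }
    /^#/ { print; next }                    # always keep header lines
    {
      m = split($5, alts, ",")              # column 4 is REF, column 5 is ALT
      type = ""
      for (i = 1; i <= m; i++) {
        if (alts[i] ~ /^</ || alts[i] == "*") t = "OTHER"
        else if (length($4) == 1 && length(alts[i]) == 1) t = "SNP"
        else if (length($4) != length(alts[i])) t = "INDEL"
        else t = "OTHER"
        if (type == "") type = t            # first allele sets the record type
        else if (type != t) type = "MIXED"  # any disagreement makes it MIXED
      }
      if (type in wanted) print
    }'
}
```

With a record whose ALT is `G,AT`, `select_types "SNP"` drops it, while `select_types "SNP MIXED"` keeps it. This mirrors the suggestion of combining SNP with MIXED (in GATK 3, by passing `-selectType` once per type).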

    That being said, are you sure you want to split the variants into separate files? Some of our tutorials demonstrate how to do that for learning purposes, but it is not what we actually recommend for filtering. If you're following the Best Practices and using VQSR, you can keep all variant types in a single file: the VQSR tools have built-in logic to operate on each type independently while ignoring the other types in the same file.

    Sorry for the delayed response, by the way. We had accumulated a backlog that was put on hold while we transitioned forum responsibilities to our new team member, @bhanuGandham, who is now actively working through that backlog and taking care of all new postings. Bhanu will follow up if you have any further questions on this topic. And don't hesitate to tweet @gatk_dev to get our attention ;)
