
SelectVariants -env doesn't remove all non-variant sites

I am using GATK v3.7 to remove a few samples from a big VCF file and then remove all invariant sites. Here is the command I ran:

java -jar GenomeAnalysisTK-3.7-0/GenomeAnalysisTK.jar -T SelectVariants -R reference.fasta -V 142_samples.vcf -xl_sn TC050 -xl_sn TC052 -xl_sn TC054 -xl_sn TC056 -xl_sn TC059 -xl_sn TC060 -xl_sn TC060_09.05 -xl_sn TC062 -xl_sn TCDH29 -xl_sn TCViera -xl_sn TCy -env -o 131_samples.vcf

I used the -env flag, and the header of the resulting VCF does indeed say "Exclude non-variant Sites = true". It did remove a lot of the sites where the excluded samples were variant, but I still have many invariant sites in the file (about 20K invariant and 10K variant). Is there a way to remove these invariant sites directly from the VCF using your tools? I can remove them easily from the FASTA down the line, but I'm not a competent enough coder to modify VCF files like this.
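For illustration, here is a minimal sketch of the kind of filtering being asked for, done outside GATK in plain Python. It is not a GATK tool, and it rests on an assumption about what "invariant" means here: a site is kept only if the called genotypes of the remaining samples contain more than one distinct allele. Under that definition, sites where every sample is 1/1 are dropped as well as sites where every sample is 0/0. The function names and the script name are hypothetical, and the sketch assumes an uncompressed, tab-delimited VCF with a GT field.

```python
import sys


def alleles_in_genotype(gt):
    """Return the set of called allele indices in a GT string such as '0/1' or '1|1'."""
    # Treat phased (|) and unphased (/) separators alike; skip missing calls (.).
    return {a for a in gt.replace("|", "/").split("/") if a not in (".", "")}


def is_variant_among_samples(vcf_line):
    """True if the called genotypes on this record carry more than one distinct allele.

    Assumption: this is the definition of a 'variant' site wanted here.
    Records without sample columns or without a GT field return False.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return False
    fmt = fields[8].split(":")
    if "GT" not in fmt:
        return False
    gt_index = fmt.index("GT")
    seen = set()
    for sample in fields[9:]:
        parts = sample.split(":")
        if gt_index < len(parts):
            seen |= alleles_in_genotype(parts[gt_index])
    return len(seen) > 1


def filter_invariant(lines):
    """Yield header lines unchanged and only those records that vary among samples."""
    for line in lines:
        if line.startswith("#") or is_variant_among_samples(line):
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python drop_invariant.py 131_samples.vcf 131_samples.variant_only.vcf
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(filter_invariant(src))
```

Only whole record lines are dropped; the header and the INFO annotations (AC, AF, AN and so on) are left untouched, so any annotations that depend on the sample set may still reflect whatever the upstream tool wrote.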


