The current GATK version is 3.7-0

how to select a private SNP with GATK from a multisample VCF file

WimS (Member, Posts: 27)

I have now run into the situation a couple of times where I need to extract the private SNPs from a multisample VCF file, for example in a forward-genetics knockout screen of a large set of samples.

It is possible with vcf-contrast from VCFtools:

```
vcf-contrast +sample1 -sample2 -sample3 -n input.vcf > private_sample1.vcf
vcf-contrast -sample1 +sample2 -sample3 -n input.vcf > private_sample2.vcf
vcf-contrast -sample1 -sample2 +sample3 -n input.vcf > private_sample3.vcf
```

After this I would still have to filter out the private 0/0 calls (sites where only this sample is homozygous reference), and for a large multisample VCF I would have to enter this command for every sample, which is not really nice.
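To avoid typing one vcf-contrast command per sample, the commands can be generated from the VCF header. This is a minimal sketch, not part of VCFtools: it assumes an uncompressed VCF, sample names without whitespace, and that vcf-contrast accepts a comma-separated sample list after `-` (as its usage text suggests). `print_contrast_commands` is a hypothetical helper that only prints the commands, so you can check them before running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one vcf-contrast command per sample,
# contrasting that sample (+) against all other samples (-).
# Assumes an uncompressed VCF and sample names without whitespace.
print_contrast_commands() {
  local vcf="$1" samples s others
  # Sample names are the tab-separated columns 10+ of the #CHROM header line.
  samples=$(awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$vcf")
  for s in $samples; do
    # Every sample except the current one, joined with commas.
    others=$(printf '%s\n' $samples | grep -vxF -e "$s" | paste -sd, -)
    printf 'vcf-contrast +%s -%s -n %s > private_%s.vcf\n' "$s" "$others" "$vcf" "$s"
  done
}
```

Piping the output to `sh` (for example `print_contrast_commands input.vcf | sh`) would run all of them; the 0/0 filtering step is still left to do afterwards.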

Surely this must be possible with GATK. Does anyone know how to do this?

Maybe it is somewhere in SelectVariants? The --discordance option looked promising, but there is a note that the samples should be the same? Or is it possible to write another variant walker or a JEXL expression?

P.S. I accidentally also posted this question in the XHMM forum; an admin could remove it there.

Best Answer


  • ebanks (Broad Institute; Member, Broadie, Dev; Posts: 692)

    See the --excludeNonVariants option in SelectVariants (to be used in conjunction with -sn sample1).

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • WimS (Member, Posts: 27)
    edited January 2013

    That would give the variants for each sample, but I still need the private variants per sample, in other words the variants that are found in only one sample, with all other samples 0/0 at that site. It would be nice if this could be done in a single command, since there are 70 samples in my multisample VCF.
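The definition above (a variant carried by exactly one sample, with every other sample 0/0) can be checked directly on the genotypes in a single pass over the file. This is a minimal awk sketch, not a GATK feature: `private_variants` is a hypothetical helper; it assumes an uncompressed VCF with GT as the first FORMAT field (the VCF spec puts GT first when present) and treats any site with a no-call (`.`) as not private. It prints the private sample, CHROM, POS, REF and ALT.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report sites where exactly one sample carries a
# non-reference allele and every other sample is called homozygous reference.
# Assumption: a no-call (.) in any sample disqualifies the site.
private_variants() {
  awk -F'\t' '
    /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; next }
    /^#/ { next }
    {
      carrier = ""; ok = 1
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                  # GT is the first FORMAT field
        n = split(f[1], a, "[/|]")         # handles unphased and phased GTs
        homref = 1; missing = 0
        for (j = 1; j <= n; j++) {
          if (a[j] == ".") missing = 1
          else if (a[j] != "0") homref = 0
        }
        if (missing) { ok = 0; break }
        if (!homref) {
          if (carrier != "") { ok = 0; break }   # second carrier: not private
          carrier = name[i]
        }
      }
      if (ok && carrier != "") print carrier "\t" $1 "\t" $2 "\t" $4 "\t" $5
    }' "$1"
}
```

Because the first output column is the sample name, one file per sample can be written in the same pass, e.g. `private_variants my70Samples.vcf | awk -F'\t' '{ print > ("private_" $1 ".txt") }'`.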

  • WimS (Member, Posts: 27)

    Yes, that should do it, I think. How do I add the JEXL expression to SelectVariants? When I try

    ```
    java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T SelectVariants -R /referencel/reference_GATK_sorted.fa -V my70Samples.vcf -o selectVariantsPrivate.vcf -select "AC==1"
    ```

    I get

    ```
    Invalid JEXL expression detected for select-0 with message ![0,7]: 'AC == 1;' == error
    ```

  • ebanks (Broad Institute; Member, Broadie, Dev; Posts: 692)

    This question has been covered in other threads on this forum already.

  • WimS (Member, Posts: 27)

    OK, so I had to restrict the selection to biallelic sites. In case anybody else wants to find all the information in one place:

    ```
    java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T SelectVariants -R reference.fa -V input.vcf -o biAllelicPrivate.vcf --restrictAllelesTo BIALLELIC -select "AC==1"
    ```
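Presumably the restriction helps because AC holds one value per ALT allele, so at multiallelic sites it is a list rather than a single number. Note what `AC==1` actually keeps: sites where exactly one alternate allele copy was called across all samples, i.e. a single heterozygous carrier. A private homozygous variant (1/1, AC=2) is dropped, and since AC only counts called alleles, no-calls in the other samples do not exclude a site. The hypothetical helper below is a rough awk approximation of the two conditions, useful for inspecting a VCF without GATK; it is not how SelectVariants evaluates JEXL and assumes an uncompressed VCF with AC in the INFO column.

```shell
#!/usr/bin/env bash
# Hypothetical helper approximating the two conditions of
#   --restrictAllelesTo BIALLELIC -select "AC==1"
# on an uncompressed VCF: a single ALT allele and INFO AC exactly 1.
# Prints CHROM and POS of each kept site.
biallelic_singletons() {
  awk -F'\t' '
    /^#/ { next }                        # skip header lines
    $5 !~ /,/ {                          # one ALT allele only (biallelic)
      n = split($8, kv, ";")             # INFO is a semicolon-separated list
      for (i = 1; i <= n; i++)
        if (kv[i] == "AC=1") { print $1 "\t" $2; break }
    }' "$1"
}
```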

  • JIAGEHAO (Member, Posts: 13)

    Hi, may I ask a question? I have used GATK to call SNPs in a pair of cancer and normal samples and obtained the final filtered snp.vcf file. Do you know how to use the SelectVariants tool to select the cancer-specific SNPs, and what the command line would be? Any suggestions would be appreciated.
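For a tumor/normal pair called together, one simple reading of "cancer-specific" is: the tumor genotype is non-reference and the normal genotype is homozygous reference. Below is a minimal awk sketch of that rule, assuming an uncompressed VCF with GT first in FORMAT; `tumor_only_sites` and the sample names are hypothetical, and sites where either sample is uncalled are skipped. A genotype filter like this is only a rough heuristic (it ignores, for example, low coverage in the normal), and a dedicated somatic caller is generally better suited to this question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print sites where the tumor genotype is non-reference
# and the normal genotype is homozygous reference (both must be called).
# Usage: tumor_only_sites calls.vcf TUMOR_NAME NORMAL_NAME
tumor_only_sites() {
  awk -F'\t' -v tumor="$2" -v normal="$3" '
    # Genotype string: the first colon-separated FORMAT field (GT).
    function gt(field,   f) { split(field, f, ":"); return f[1] }
    # True if every allele in the genotype is the reference allele 0.
    function homref(g,   a, n, j) {
      n = split(g, a, "[/|]")
      for (j = 1; j <= n; j++) if (a[j] != "0") return 0
      return 1
    }
    /^#CHROM/ {
      for (i = 10; i <= NF; i++) { if ($i == tumor) tc = i; if ($i == normal) nc = i }
      if (!tc || !nc) { print "sample name not found in header" > "/dev/stderr"; exit 1 }
      next
    }
    /^#/ { next }
    {
      tg = gt($tc); ng = gt($nc)
      if (tg !~ /\./ && ng !~ /\./ && !homref(tg) && homref(ng))
        print $1 "\t" $2 "\t" tg "\t" ng
    }' "$1"
}
```

Each output line is CHROM, POS, tumor GT and normal GT, so the result can be checked against the original calls before anything is filtered.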
