Include no-calls in vcf with only variant sites

Is there a way to include only variant sites and no-calls in the final VCF? I know that when calling SNPs you can emit only variants, only confident sites, or all sites. However, is there a way to reduce the VCF at the end to only variant sites (those that passed VQSR) and sites where no call could be made? The final VCFs would then contain only variant sites and missing data, and every position not listed in the VCF file would be reference. I need such a file for merging with other VCF files, so that every position absent from the VCFs during merging can be called reference.

So far I have called SNPs with emit-all and run VQSR. I now want to reduce the VCFs in size by excluding NO_VARIATION sites, while keeping the information on "missing" sites.

Answers

  • oth (Oslo; Member)

    I am also interested in obtaining a VCF that includes only confident variants and sites with missing data. I could not find the recommended workflow for this; could you please direct me to it? I'm working with a haploid genome, and have therefore been using the UnifiedGenotyper followed by VariantFiltration. To obtain data for the missing regions, I have so far relied on running grep on a VCF that emits all sites.
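
The grep step described above could also be sketched in Python. This is only an illustration under stated assumptions, not a GATK workflow: the function names (`is_no_call`, `keep_record`, `reduce_vcf`) are made up for this example, the input is assumed to be an uncompressed all-sites VCF read line by line, non-variant records are assumed to carry `.` in the ALT column, and "passed" is assumed to mean a FILTER value of exactly `PASS`.

```python
from typing import Iterable, Iterator


def is_no_call(sample_field: str) -> bool:
    """True if every allele in the GT sub-field is missing ('.', './.', '.|.')."""
    gt = sample_field.split(":", 1)[0]
    alleles = gt.replace("|", "/").split("/")
    return all(allele == "." for allele in alleles)


def keep_record(line: str) -> bool:
    """Keep passing variant records and records where any sample is a no-call."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 10:
        return False  # no FORMAT/sample columns, so nothing to inspect
    alt, filt, fmt, samples = cols[4], cols[6], cols[8], cols[9:]
    # The VCF spec requires GT to be the first FORMAT key when it is present.
    has_gt = fmt.split(":", 1)[0] == "GT"
    if has_gt and any(is_no_call(sample) for sample in samples):
        return True
    # Assumption: ALT is '.' at non-variant sites and passing records say PASS.
    return alt != "." and filt == "PASS"


def reduce_vcf(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged, plus the data records selected by keep_record."""
    for line in lines:
        if line.startswith("#") or keep_record(line):
            yield line
```

One caveat with this sketch: variant records that fail filtering are dropped, so a downstream merge that treats absent positions as reference would read them as reference. If that is not what you want, such records would need to be kept and marked as missing instead.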

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    I would recommend using DiagnoseTargets to identify sites that cannot be called.

  • oth (Oslo; Member)

    Thank you for your recommendation. Since my goal is to create a consensus FASTA file from the VCF, I am concerned about whether the missing data reported by DiagnoseTargets will correspond to the sites with missing data in the UnifiedGenotyper VCF. (Am I correct to assume that sites with missing data are marked by ./. in the last (bar) column of the VCF?)
    I also tried running the DiagnoseTargets tool on my dataset, but got the following error:

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.NullPointerException
    at org.broadinstitute.sting.gatk.io.stubs.OutputStreamArgumentTypeDescriptor.parse(OutputStreamArgumentTypeDescriptor.java:89)
    at org.broadinstitute.sting.commandline.ArgumentTypeDescriptor.parse(ArgumentTypeDescriptor.java:129)
    at org.broadinstitute.sting.commandline.ArgumentSource.parse(ArgumentSource.java:119)
    at org.broadinstitute.sting.commandline.ParsingEngine.loadValueIntoObject(ParsingEngine.java:488)
    at org.broadinstitute.sting.commandline.ParsingEngine.loadArgumentsIntoObject(ParsingEngine.java:408)
    at org.broadinstitute.sting.commandline.ParsingEngine.loadArgumentsIntoObject(ParsingEngine.java:382)
    at org.broadinstitute.sting.commandline.CommandLineProgram.loadArgumentsIntoObject(CommandLineProgram.java:265)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:110)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:107)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ------------------------------------------------------------------------------------------
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    "Am I correct to assume that sites with missing data are marked by ./. in the last (bar) column of the VCF?"

    Yes that's correct.

    "I also tried running the DiagnoseTargets tool on my dataset, but got the following error"

    Can you please post the full log output including the starting command line?
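
Since the thread confirms that ./. marks missing data, the consensus FASTA goal mentioned above could be sketched roughly as follows. This is a minimal illustration, not a GATK tool: the function name `consensus_from_reduced_vcf` is hypothetical, and it assumes a single-sample VCF already reduced to variant and no-call records, a reference sequence held in memory as a string, and SNPs only. Positions absent from the VCF are left as reference, no-calls become N, and anything else (indels, heterozygous or multi-allelic calls) is left untouched.

```python
from typing import Iterable


def consensus_from_reduced_vcf(reference: str, vcf_lines: Iterable[str], contig: str) -> str:
    """Apply SNP calls and no-calls for one contig to a reference sequence."""
    seq = list(reference)
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10 or cols[0] != contig:
            continue
        pos, ref, alt = int(cols[1]), cols[3], cols[4]
        # Assumption: GT is the first FORMAT key, as the VCF spec requires.
        gt = cols[9].split(":", 1)[0]
        alleles = gt.replace("|", "/").split("/")
        if all(allele == "." for allele in alleles):
            seq[pos - 1] = "N"  # no-call: mask the site as missing
        elif len(ref) == 1 and len(alt) == 1 and alt != "." and set(alleles) == {"1"}:
            seq[pos - 1] = alt  # homozygous or haploid SNP: substitute ALT
        # All other records are left as reference in this sketch.
    return "".join(seq)
```

For example, with the reference `ACGTAC`, a C>T call at position 2 and a no-call at position 4 would give `ATGNAC`.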
