The current GATK version is 3.8-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Got a problem?

1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?

Then follow instructions in Article#1894.

Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Download the latest Picard release at
GATK version 4.beta.3 (i.e. the third beta release) is out. See the GATK4 beta page for download and details.

VariantsToTable - all fields


Can the VariantsToTable walker produce output containing all the fields in the VCF without you having to specify them in the command?





  • CarneiroCarneiro Charlestown, MAMember

    good feature suggestion.

    sorry, not yet :-(

  • KurtKurt Member

    Hello @Kath, this may or may not help you in the meantime. One of my colleagues, asked me to install PyVCF which I think did what you are asking for (I only glanced at the output after I installed and tested it). I think the program in there is called vcf-melt that does the vcf to table conversion. I think the version of Python that I used to install it is 2.7.0.

    I think the command was something like vcf-melt in.vcf > out.txt

  • KathKath Member

    Thanks Kurt. Actually, I have noticed that SnpSift can also convert vcf to table using the extractFields command. This has the advantage of being able to handle snpEFF annotations as well.

  • KathKath Member

    Having said that, it still won't include all fields without me having to specify each one in the command.

  • CarneiroCarneiro Charlestown, MAMember

    You can load the VCF in R directly, if you want all the information in. Takes some manipulation of the data.

  • mmterpstrammterpstra NetherlandsMember
    edited September 2013

    written some perl code to do this should be self explanatory:

    use warnings;
    use strict;
    use Vcf;
    #use Data::Dumper;
    my $use =<<"END";
    use: perl $0 in.vcf
    uses the vcftools library so please add it to PERL5LIB enviroment variable
    practical use example:
    This dumps some basic fields:
    this dumps all info/format fields + some basic fields (CHROM POS REF ALT ID QUAL FILTER)
    java -jar GenomeAnalysisTK.jar \\
         -R reference.fasta \\
         -T VariantsToTable \\
         -V file.vcf \\
         -AMD \\
         -raw \\
         -F CHROM -F POS -REF -ALT -F ID -F QUAL -F FILTER \$(perl $0 file.vcf) \\
         -o results.table
    for vcftools merge compatibility, i have added functionality to skip any field matching with /_[0123456789]$/
    note: don't worry anymore about fixing typoos, missing any fields, exporting unpopulated fields in your VcfTableExport.
    die "Invalid vcf file specified!! '$ARGV[0]'\n $use\n$!" if(not(-e $ARGV[0]));
    sub AccumulateFieldsVCF{
        my $vcfin = $_[0];#'/home/terpstramm/Downloads/s12-193.annotated.vcf';
        my $out;
        #if($_[1] && not(-e $_[1])){
        #   my $vcfout = $_[1];
        #   open($out,'>',$vcfout) or die 'Cannot write vcf file';
        #   warn 'warning: no out.vcf file specified printing to STDOUT'."\n";
            $out =*STDOUT;
        my $vcf = Vcf->new(file=>$vcfin) or die "cannot open vcf file $vcfin\n";
        #print {$out} $vcf->format_header();
        my %FieldInfo;
        my $q;
        while (my $x=$vcf->next_data_hash()){
            #print Dumper($x);
            for my $F (keys(%{$x->{'INFO'}})){
                #vcftools vcf-merge compatibility 
                next if($F =~ /_[0123456789]$/);
            for my $GF (@{$x->{'FORMAT'}}){
            #print {$out} $vcf->format_line($x);
    #        if ($. > 2);
        ##results here
        $q = \%FieldInfo;
        #print Dumper($q);
        print " -F ".join(" -F ",sort(keys(%{$q->{'F'}})));
        print " -GF ".join(" -GF ",sort(keys(%{$q->{'GF'}})));

    do not forget to cite "The Variant Call Format and VCFtools, Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert Handsaker, Gerton Lunter, Gabor Marth, Stephen T. Sherry, Gilean McVean, Richard Durbin and 1000 Genomes Project Analysis Group, Bioinformatics, 2011" and mention the GATK forums :)

  • ericminikelericminikel Member
    edited January 2014

    In case this is useful to you... I had the same problem so I wrote this Unix one-liner to extract all the unique fields that occur in the INFO field and format them like -F FIELD \ so you can copy and paste in to a GATK command:

    cat myvcf.vcf | awk '{print $8}' | perl -pe 's/=.*?(;|\n)/\n/g' | sed 's/;/\n/g' | sort | uniq | awk '{print "-F "$1" \\"}'

    So if I feed it a VCF with an INFO field like this:


    I get back a list like this:


  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Thanks for posting this, Eric! I'm sure it'll be useful to others.

  • dobenshaindobenshain Rockville, MDMember
    edited August 2014

    I had a little trouble with Eric's command, I had some additional annotation fields from dbNSFP and SnpEff that were getting split apart incorrectly. I wrote two one-liners, one for INFO fields and one for the FORMAT fields, printing out the available fields in the file from the header (also avoids parsing more of the file than necessary). My implementation uses these one-liners inside a shell script, with the $INPUT variable equaling the path to the VCF file, and the output of each command is saved to a variable, which can be included in the command to run GATK.

    This effectively emits all available INFO and Genotype (FORMAT) fields.

    Please note that the markdown formatting will remove the back-ticks that surround each command after the = sign.

    info_fields=head -5000 $INPUT | perl -ne '/^.*INFO.*ID=([a-zA-Z0-9_]+),/ && print "-F $1 \\ "' | uniq

    Creates a string like:
    -F AC \
    -F AF \
    -F AN \
    -F BaseQRankSum \
    -F ClippingRankSum \
    -F DB \
    -F DP \
    -F DS \
    -F END \
    -F FS \
    -F HaplotypeScore \
    -F InbreedingCoeff \
    -F MLEAC \
    -F MLEAF \
    -F MQ \
    -F MQ0 \
    -F MQRankSum \
    -F QD \
    -F ReadPosRankSum \
    -F VQSLOD \
    -F culprit \
    -F set \
    -F EFF \
    -F LOF \
    -F NMD \
    -F dbNSFP_FATHMM_score_converted \
    -F dbNSFP_MutationTaster_score_converted \
    -F dbNSFP_Ensembl_geneid \
    -F dbNSFP_SIFT_score_converted \
    -F dbNSFP_1000Gp1_ASN_AC \
    -F dbNSFP_Polyphen2_HDIV_score \
    -F dbNSFP_29way_logOdds \
    -F dbNSFP_1000Gp1_AMR_AC \
    -F dbNSFP_ESP6500_EA_AF \
    -F dbNSFP_29way_pi \
    -F dbNSFP_1000Gp1_AFR_AC \
    -F dbNSFP_Uniprot_aapos \
    -F dbNSFP_LRT_Omega \
    -F dbNSFP_1000Gp1_AC \
    -F dbNSFP_Interpro_domain \
    -F dbNSFP_Uniprot_id \
    -F dbNSFP_LRT_score_converted \
    -F dbNSFP_Ensembl_transcriptid \
    -F dbNSFP_ESP6500_AA_AF \
    -F dbNSFP_Polyphen2_HVAR_score \
    -F dbNSFP_Ancestral_allele \
    -F dbNSFP_phyloP \
    -F dbNSFP_MutationAssessor_score_converted \
    -F dbNSFP_1000Gp1_EUR_AC \
    -F TRF_score \
    -F SegDup \

    genotype_fields=head -5000 $INPUT | perl -ne '/^.*FORMAT.*ID=([a-zA-Z0-9_]+),/ && print "-GF $1 \\ "' | uniq

    Creates a string like:
    -GF AD \
    -GF DP \
    -GF GQ \
    -GF GT \
    -GF PL \
    -GF GPP \
    -GF FPP \
    -GF FGT \

Sign In or Register to comment.