
VariantsPerSample

Geraldine_VdAuwera (Administrator, GSA Member)
edited September 2012 in GenomeSTRiP Documentation

1. Introduction

The VariantsPerSample annotator is invoked through the SVVariantAnnotator walker, which defines arguments common to all annotators.

VariantsPerSample is a simple annotator that counts the number of variants carried by each sample. These per-sample counts can then be rolled up into population-level statistics.
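To illustrate the roll-up step, here is a minimal Python sketch that groups per-sample counts by population and reports the number of samples and the mean variant count for each group. It is not part of GenomeSTRiP: the function name rollup_by_population, the "unknown" bucket for samples without a population, and the sample and population IDs are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rollup_by_population(counts, population_of):
    """Summarize per-sample variant counts by population.

    counts: dict mapping sample ID -> number of variants for that sample
    population_of: dict mapping sample ID -> population ID
    Returns a dict: population ID -> (number of samples, mean variants per sample).
    """
    groups = defaultdict(list)
    for sample, n in counts.items():
        # Hypothetical choice: samples missing from the map go into "unknown".
        groups[population_of.get(sample, "unknown")].append(n)
    return {pop: (len(ns), mean(ns)) for pop, ns in groups.items()}


if __name__ == "__main__":
    counts = {"S1": 120, "S2": 100, "S3": 150}
    pops = {"S1": "POP_A", "S2": "POP_A", "S3": "POP_B"}
    print(rollup_by_population(counts, pops))
```

Other summaries (median, standard deviation) slot in the same way by replacing mean.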

In the current implementation, VariantsPerSample uses the GSSAMPLES INFO tag in the input VCF file to determine which samples carry the variant. The input VCF file should not contain genotypes.

Filtered variants are not counted in the totals.
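A minimal Python sketch of the counting logic described above. It assumes that GSSAMPLES holds a comma-separated list of sample IDs (an assumption; check the format in your own VCF) and treats a record as unfiltered when its FILTER column is PASS or ".", as in the VCF specification. The function name and the toy records are hypothetical; this is not the tool's actual implementation.

```python
from collections import Counter


def count_variants_per_sample(vcf_lines):
    """Count, per sample, the unfiltered records whose GSSAMPLES INFO value lists it.

    Assumes GSSAMPLES is a comma-separated list of sample IDs.
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        filter_value, info = fields[6], fields[7]
        # Per the VCF spec, FILTER "PASS" or "." means the record is not filtered.
        if filter_value not in ("PASS", "."):
            continue
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            if key == "GSSAMPLES" and value:
                # set() guards against a sample being listed twice in one record.
                counts.update(set(value.split(",")))
    return counts


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t1000\tDEL1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;GSSAMPLES=S1,S2",
        "1\t5000\tDEL2\tN\t<DEL>\t.\tLowQual\tSVTYPE=DEL;GSSAMPLES=S1",
        "2\t800\tDEL3\tN\t<DEL>\t.\t.\tSVTYPE=DEL;GSSAMPLES=S2",
    ]
    print(count_variants_per_sample(vcf))
```

In this toy input the LowQual record is skipped, so S1 is counted once and S2 twice. Samples that carry no unfiltered variant are simply absent from the returned Counter.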

2. Inputs / Arguments

  • -populationMap <map-file> : A tab-delimited input file containing two columns: the sample ID and a population ID for that sample. If supplied, the population information will be carried over into the output report.
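A file in this format can be read with a few lines of Python. This sketch assumes the file has no header line and skips blank lines; the function name and the example IDs are hypothetical.

```python
import csv
import io


def read_population_map(handle):
    """Read a tab-delimited map with two columns: sample ID, population ID."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected 2 tab-separated columns, got: {row!r}")
        mapping[row[0].strip()] = row[1].strip()
    return mapping


if __name__ == "__main__":
    text = "S1\tPOP_A\nS2\tPOP_A\n\nS3\tPOP_B\n"
    print(read_population_map(io.StringIO(text)))
```

For a real file, pass open("sample_to_population.map", newline="") instead of the StringIO.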

3. Annotations

This annotator adds no annotations to the VCF file; instead, it is used to produce a report file containing one line per sample. Each line gives the number of variants for that sample and, if -populationMap is supplied, the sample's population.
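The exact column layout of the report is not described here, so the following Python sketch uses a hypothetical tab-separated layout (sample, optional population, count) purely to show the one-line-per-sample structure. The function name, the layout, and the "NA" placeholder for unmapped samples are assumptions, not the tool's actual output format.

```python
import io


def write_report(counts, out, population_of=None):
    """Write one tab-separated line per sample: sample, [population,] count.

    counts: dict mapping sample ID -> number of variants
    population_of: optional dict mapping sample ID -> population (as from -populationMap)
    The layout is hypothetical and is not the tool's actual report format.
    """
    for sample in sorted(counts):
        fields = [sample]
        if population_of is not None:
            fields.append(population_of.get(sample, "NA"))
        fields.append(str(counts[sample]))
        out.write("\t".join(fields) + "\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_report({"S2": 2, "S1": 1}, buf, {"S1": "POP_A", "S2": "POP_B"})
    print(buf.getvalue(), end="")
```

Omitting population_of drops the population column, mirroring how the population appears in the report only when -populationMap is supplied.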

4. Example

java -Xmx2g -cp SVToolkit.jar:GenomeAnalysisTK.jar \
    org.broadinstitute.sting.gatk.CommandLineGATK \
    -T SVVariantAnnotator \
    -A VariantsPerSample \
    -R /humgen/1kg/reference/human_g1k_v37.fasta \
    -BTI variant \
    -B:variant,VCF input.vcf \
    -populationMap sample_to_population.map \
    -writeReport \
    -reportFile variants_per_sample.dat

Geraldine Van der Auwera, PhD
