Frequently Asked Questions

Geraldine_VdAuwera (Administrator, GATK Developer)
edited September 2012 in GenomeSTRiP Documentation

1. What does error message X mean?

See the FAQ section on frequently encountered errors.

2. Can I use Genome STRiP to do discovery or genotyping in a single high-coverage individual?

Genome STRiP is designed to discover and genotype variants in populations and uses the information from multiple individuals simultaneously. Typically you will need data from at least 20 or 30 individuals to get good results.

That being said, it may be possible to use a "background population" along with a single high-coverage individual to run Genome STRiP. The background population does not need to have the same depth of coverage as the target genome you want to process, but reads will need to be aligned to the same reference sequence. A good background population might be 50 or so individuals from the 1000 Genomes Project chosen from diverse population groups. This approach has not been widely tested, although I have looked at targeted resequencing loci using this strategy with some success. If you try this strategy, please share your experiences.

3. Does Genome STRiP only work with deletions?

In the current version, only deletions (relative to the reference) are supported in discovery and genotyping. We are actively working on discovery and genotyping of other kinds of structural variants.

4. Is the source code available?

Not at this time, but we are planning to release the source code shortly.

5. Can I run discovery on a small genomic region?

If you have whole-genome sequence data, you can run on just a small region using the standard -L argument to the GATK. For example: -L chr1:1000000-2000000

If you have targeted resequencing data, where you have only sequenced a small subset of the genome, then you additionally need to set the effective genome size to be smaller. To do this, you currently need to modify the configuration parameters in conf/genstrip_parameters.txt (the file location is specified with the -configFile command line argument).

You will need to change these three parameters:

input.genomeSize = A + X + Y 
input.genomeSizeMale = 2*A + X + Y 
input.genomeSizeFemale = 2*A + 2*X

where A is the total size of the autosomal reference and X and Y are the lengths of the X and Y chromosomes. Note that genomeSize is in haploid bases while genomeSizeMale and genomeSizeFemale are in diploid bases.

If your target region doesn't include X or Y, then A equals genomeSize, so just set genomeSizeMale and genomeSizeFemale to 2*genomeSize. See the installtest configuration file for an example, where the effective genome size is set to 200 kb.
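
To make the arithmetic concrete, here is a minimal Python sketch that derives the three values from a list of target intervals. It is not part of Genome STRiP: the function names are made up for illustration, and it assumes the intervals are non-overlapping, written as chrom:start-end with both ends inclusive, and that the sex chromosomes are named X/Y or chrX/chrY.

```python
def interval_length(interval):
    """Split 'chrom:start-end' and return (chrom, length), ends inclusive."""
    chrom, span = interval.rsplit(":", 1)
    start, end = (int(v.replace(",", "")) for v in span.split("-"))
    return chrom, end - start + 1


def effective_genome_sizes(intervals):
    """Compute the three genome size parameters for a set of target intervals.

    Assumes the intervals do not overlap (hypothetical helper, not a
    Genome STRiP API).
    """
    a = x = y = 0
    for interval in intervals:
        chrom, length = interval_length(interval)
        name = chrom[3:] if chrom.lower().startswith("chr") else chrom
        if name.upper() == "X":
            x += length
        elif name.upper() == "Y":
            y += length
        else:
            a += length  # everything else is counted as autosomal
    return {
        "input.genomeSize": a + x + y,            # haploid bases
        "input.genomeSizeMale": 2 * a + x + y,    # diploid bases, one X and one Y
        "input.genomeSizeFemale": 2 * a + 2 * x,  # diploid bases, two X
    }


def format_parameters(sizes):
    """Render the values as 'name = value' lines for the parameters file."""
    return "\n".join(f"{name} = {value}" for name, value in sizes.items())


if __name__ == "__main__":
    targets = ["chr1:1-100000", "chr2:1-100000"]  # 200 kb, autosomes only
    print(format_parameters(effective_genome_sizes(targets)))
```

For the two autosomal targets above (200 kb in total), this prints input.genomeSize = 200000 and 400000 for both genomeSizeMale and genomeSizeFemale, which matches the 2*genomeSize rule for regions without X or Y. If your targets overlap, merge them first, otherwise the sizes will be overestimated.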

Geraldine Van der Auwera, PhD