See the FAQ section on frequently encountered errors.
Genome STRiP is designed to discover and genotype variants in populations and uses the information from multiple individuals simultaneously. Typically you will need data from at least 20 or 30 individuals to get good results.
That being said, it may be possible to use a "background population" along with a single high-coverage individual to run Genome STRiP. The background population does not need to have the same depth of coverage as the target genome you want to process, but reads will need to be aligned to the same reference sequence. A good background population might be 50 or so individuals from the 1000 Genomes Project chosen from diverse population groups. This approach has not been widely tested, although I have looked at targeted resequencing loci using this strategy with some success. If you try this strategy, please share your experiences.
In the current version, only deletions (relative to the reference) are supported in discovery and genotyping. We are actively working on discovery and genotyping of other kinds of structural variants.
Not at this time, but we are planning to release the source code shortly.
If you have whole-genome sequence data, you can run on just a small region using the standard -L argument to the GATK. For example
If you have targeted resequencing data, where you have sequenced only a small subset of the genome, then you additionally need to set the effective genome size to be smaller. To do this, you currently need to modify the configuration file conf/genstrip_parameters.txt (the file location is specified by the -configFile command-line argument).
You will need to change these three parameters:
input.genomeSize = A + X + Y
input.genomeSizeMale = 2*A + X + Y
input.genomeSizeFemale = 2*A + 2*X
where A is the total size of the autosomal reference and X and Y are the lengths of the X and Y chromosomes. Note that genomeSize is in haploid bases while genomeSizeMale and genomeSizeFemale are in diploid bases.
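As a rough illustration of the arithmetic, the three values can be computed with a small helper. This is only a sketch: the function name `effective_genome_sizes` and the 5 Mb target size are hypothetical, chosen for the example, and are not part of Genome STRiP. Only the three parameter names and formulas come from the description above.

```python
def effective_genome_sizes(autosomal_bases, x_bases=0, y_bases=0):
    """Compute the three genome-size settings from the targeted bases.

    autosomal_bases: total targeted autosomal bases (A)
    x_bases, y_bases: targeted bases on chrX and chrY (X and Y)
    """
    a, x, y = autosomal_bases, x_bases, y_bases
    return {
        "input.genomeSize": a + x + y,          # haploid bases
        "input.genomeSizeMale": 2 * a + x + y,  # diploid bases, one X and one Y
        "input.genomeSizeFemale": 2 * a + 2 * x,  # diploid bases, two X
    }


# Hypothetical example: a 5 Mb autosomal-only target region.
sizes = effective_genome_sizes(autosomal_bases=5_000_000)
for key, value in sizes.items():
    print(f"{key} = {value}")
```

The printed lines use the same `name = value` form as the parameters listed above, so they can be compared directly against the entries in your configuration file.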
Of course, if your target region doesn't include X or Y, then just set genomeSizeMale and genomeSizeFemale to 2*genomeSize. See the configuration file for an example, where the effective genome size is set to
Geraldine Van der Auwera, PhD