I want to genotype known CNVs (from 1000G Phase3, GoNL, etc.) in our samples using GenomeStrip without performing any discovery step at first.

1) Do I only need to run SVPreprocess followed by SVGenotyper? Do the CNVDiscovery and/or LCNVDiscovery pipelines produce metadata that is useful for genotyping duplications and CNVs, or does SVPreprocess produce all the metadata needed?

2) As far as I understand, imprecise variants are genotyped according to the SVTYPE info field. The documentation explains SVTYPE=DEL and SVTYPE=CNV, but how does the software handle SVTYPE=DUP, SVTYPE=INS, SVTYPE=DEL_ALU and the other values present in the 1000G Phase3 VCF file? Does SVGenotyper treat every SVTYPE except DEL as CNV, so that it tries to genotype different copy-number alleles? If so, does that mean a SVTYPE=DUP site can be genotyped as a deletion, for example, if GenomeStrip finds it is not a pure duplication (so that we can't force the SVTYPE)?




    Hi @bhandsaker,

    Any news concerning my questions?


    Sorry I didn't respond right away. I apparently don't receive notifications any more when people post here and I can't figure out how to turn them on without turning on some really general category.

    Question 1: You need to run SVPreprocess before any of the other pipelines, all of which require the metadata it produces. The other pipelines are independent of one another, so you can run just SVPreprocess followed by SVGenotyper.

    Question 2: As a practical matter, I would suggest recoding the list of input sites and setting SVTYPE to either CNV or DEL. Or you could even genotype each site both ways if you want by creating two input records with different SVTYPE values.

    If SVTYPE is set to DEL, then GS will treat this as a bi-allelic variant and it will also encode the genotypes in the GT field and generate GQ/GL fields. If SVTYPE is set to CNV, then CN/CNQ/CNL are emitted (diploid copy number), but not GT/GQ/GL. A corollary of this is that a DEL site can never be called copy number 3. The restriction does not apply in the other direction: a CNV site can be called at any copy number.

    The SVTYPE values DUP and (for historical reasons) DUP:TANDEM are accepted as synonyms for CNV. Other unrecognized SVTYPE values should default to using the DEL model. There is also some historical code that recognizes INS and does something strange, so you should try to avoid using INS.
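The SVTYPE handling described above can be summarized as a small lookup. This is only a sketch of the behaviour as stated in this reply, not GenomeSTRiP's actual code; the function name `expected_model` and the label `AVOID` are made up for illustration.

```python
# Illustrative summary of the SVTYPE handling described in this thread.
# Not taken from the GenomeSTRiP source; names here are hypothetical.
CNV_SYNONYMS = {"CNV", "DUP", "DUP:TANDEM"}


def expected_model(svtype):
    """Return the genotyping model an input SVTYPE is expected to select."""
    if svtype in CNV_SYNONYMS:
        # Copy-number model: CN/CNQ/CNL are emitted, GT/GQ/GL are not.
        return "CNV"
    if svtype == "INS":
        # Reported to hit historical code with odd behaviour; best avoided.
        return "AVOID"
    # DEL itself, and (per the reply) any unrecognized value such as DEL_ALU,
    # should fall back to the bi-allelic deletion model with GT/GQ/GL.
    return "DEL"
```

For example, `expected_model("DUP")` gives `"CNV"`, while `expected_model("DEL_ALU")` gives `"DEL"`, which is why recoding the input sites explicitly is the safer route.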

    Hope this helps.

    Hi @bhandsaker,

    Thank you for your clarifications. I will run SVPreprocess to produce metadata and then SVGenotyper to genotype the known sites.
    I will keep SVTYPE=DEL for deletions and recode all other sites as SVTYPE=CNV.
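A minimal sketch of that recoding step, assuming a tab-delimited VCF whose INFO column carries an `SVTYPE=` key. The function name `recode_svtype` and the `del_types` parameter are hypothetical. Which 1000G subtypes (DEL_ALU and so on) should count as deletions is left to the caller, and only the INFO field is rewritten; the ALT column is not touched.

```python
import re

# Matches the SVTYPE key at the start of INFO or after a ';' separator.
SVTYPE_RE = re.compile(r"(^|;)SVTYPE=([^;]*)")


def recode_svtype(vcf_line, del_types=frozenset({"DEL"})):
    """Rewrite SVTYPE to DEL for values in del_types and to CNV otherwise."""
    if vcf_line.startswith("#"):
        return vcf_line  # header lines pass through unchanged

    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return vcf_line  # not a complete VCF record; leave it alone

    def _replace(match):
        new_type = "DEL" if match.group(2) in del_types else "CNV"
        return f"{match.group(1)}SVTYPE={new_type}"

    fields[7] = SVTYPE_RE.sub(_replace, fields[7])  # column 8 is INFO
    return "\t".join(fields) + newline
```

Applied line by line to the site list (for instance over `sys.stdin`), this leaves SVTYPE=DEL records as they are and turns DUP, INS and other values into CNV. Passing `del_types={"DEL", "DEL_ALU"}` would instead recode the ALU deletions to DEL.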


