Using GenomeSTRiP to genotype known sites from a VCF

bgrenier, France, Member
edited October 2017 in GenomeSTRiP


I want to genotype known CNVs (from 1000G Phase3, GoNL, etc.) in our samples using GenomeStrip without performing any discovery step at first.

1) Do I only have to run the SVPreprocess steps followed by the SVGenotyper step? Do the CNVDiscovery and/or LCNVDiscovery pipelines produce metadata that is useful for genotyping duplications and CNVs, or does SVPreprocess produce all the metadata needed?

2) As far as I understand, imprecise variants are genotyped according to the SVTYPE info field. The documentation explains SVTYPE=DEL and SVTYPE=CNV, but how does the software handle SVTYPE=DUP, SVTYPE=INS, SVTYPE=DEL_ALU and the other values present in the 1000G Phase3 VCF file? Does SVGenotyper treat every SVTYPE except DEL as CNV, so that it will try to genotype different copy number alleles? If so, does that mean that an SVTYPE=DUP site can be genotyped as a deletion, for example if GenomeStrip finds it is not a pure duplication (so that we can't force the SVTYPE)?




  • bgrenier, France, Member

    Hi @bhandsaker,

    Any news concerning my questions?


  • bhandsaker, admin, Member, Broadie, Moderator
    edited November 2017

    Sorry I didn't respond right away. I apparently don't receive notifications any more when people post here and I can't figure out how to turn them on without turning on some really general category.

    Question 1: You need to run SVPreprocess before any of the other pipelines, which all require the metadata produced by SVPreprocess. The other pipelines are all independent. You can just run SVPreprocess followed by SVGenotyper.

    Question 2: As a practical matter, I would suggest recoding the list of input sites and setting SVTYPE to either CNV or DEL. You could even genotype each site both ways, if you want, by creating two input records with different SVTYPE values.

    If SVTYPE is set to DEL, then GS will treat the site as a bi-allelic variant, encode the genotypes in the GT field, and generate the GQ/GL fields. If SVTYPE is set to CNV, then CN/CNQ/CNL are emitted (diploid copy number), but not GT/GQ/GL. A corollary of this is that a DEL site can never be called copy number 3. However, the reverse is not true: a CNV site can be called any copy number.

    The SVTYPE values DUP and (for historical reasons) DUP:TANDEM are accepted as synonyms for CNV. Other unrecognized SVTYPE values should default to using the DEL model. There is also some historical code that recognizes INS and does something strange, so you should try to avoid using INS.

    Hope this helps.

  • bgrenier, France, Member

    Hi @bhandsaker,

    Thank you for your clarifications. I will run SVPreprocess to produce the metadata and then SVGenotyper to genotype the known sites.
    I will leave SVTYPE=DEL for deletions and recode all other sites as SVTYPE=CNV.
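
The recoding plan above (keep SVTYPE=DEL, rewrite every other SVTYPE value to CNV) could be sketched in Python roughly as follows. This is only an illustration, not part of Genome STRiP: the helper names `recode_svtype` and `recode_vcf_line` are hypothetical, and the sketch assumes a plain-text, tab-delimited VCF whose eighth column is the INFO field. It changes only the SVTYPE key and leaves the ALT column untouched; whether symbolic ALT alleles also need editing is not covered in this thread.

```python
import re

# Match an SVTYPE key at the start of INFO or right after a ';' separator,
# so that a key merely ending in "SVTYPE" is not rewritten by accident.
SVTYPE_RE = re.compile(r"(?<![^;])SVTYPE=([^;]+)")


def recode_svtype(info: str) -> str:
    """Keep SVTYPE=DEL as is; rewrite any other SVTYPE value to CNV."""
    def repl(match):
        return match.group(0) if match.group(1) == "DEL" else "SVTYPE=CNV"
    return SVTYPE_RE.sub(repl, info)


def recode_vcf_line(line: str) -> str:
    """Recode the INFO column of one VCF line; headers pass through unchanged."""
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        # Not a complete VCF data line (e.g. a blank line): leave it alone.
        return line
    fields[7] = recode_svtype(fields[7])
    return "\t".join(fields) + "\n"
```

Applying `recode_vcf_line` to every line of the known-sites file and writing the results to a new file would give the recoded input for SVGenotyper. To genotype a site both ways, as suggested earlier in the thread, one could write each record twice with different SVTYPE values; the two records would probably need distinct IDs, which is an assumption rather than something stated here.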

