Information about the result of CNVDiscoveryPipeline

Hi Bob, I got the results of CNVDiscoveryPipeline.
In the VCF file, the records have SVTYPE=CNV. How can I tell whether a given CNV is a DEL or a DUP?
If it is a DUP, how many times is the CNV repeated?
And what do the CN, CNQ, CNL, and CNP fields mean, respectively?
Thank you very much!



  • bhandsaker (Member, Broadie, Moderator)

    Those four fields are defined in the VCF specification (and are listed in the VCF header).
    CN is the integer copy number call from Genome STRiP.
    CNQ/CNL/CNP are analogous to GQ/GL/GP. They represent:
    CNQ: Phred-scaled quality of the CN call
    CNL: Vector of log10 likelihoods for each CN state, starting from zero and going up to some maximum derived from the data (copy number states above the maximum have negligible likelihood)
    CNP: Like CNL, but posterior likelihoods that also take into account the copy number frequency distribution in the population, as estimated from the genotyped cohort
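
    To make the relationship between these fields concrete, here is a minimal Python sketch that takes a CNL-style vector (index i holds the log10 likelihood of copy number i), picks the most likely copy number, and derives a phred-scaled quality from it. The function name summarize_cnl and the flat-prior normalization are assumptions made for illustration; this is not necessarily how Genome STRiP computes CN or CNQ.

```python
import math

def summarize_cnl(cnl):
    """Summarize a CNL-style vector of log10 likelihoods.

    cnl[i] is the log10 likelihood of copy number i (starting from zero).
    Returns (best_cn, probabilities, phred_quality).
    Assumption: a flat prior over copy number states; illustration only.
    """
    # Most likely copy number is the index with the largest log10 likelihood.
    best_cn = max(range(len(cnl)), key=lambda i: cnl[i])

    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(cnl)
    weights = [10 ** (x - top) for x in cnl]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Probability that the call is wrong = mass on all other states.
    p_err = sum(p for i, p in enumerate(probs) if i != best_cn)
    quality = float("inf") if p_err == 0 else -10 * math.log10(p_err)
    return best_cn, probs, quality

best_cn, probs, quality = summarize_cnl([-10.0, -3.0, 0.0, -2.0])
print(best_cn, round(quality, 1))
```

    Swapping the CNL values for CNP values would give the same kind of summary, but with the population frequencies already folded in.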

    You didn't mention CNF, but this is the "fractional" copy number, which is a point estimate of the most likely copy number based on read depth alone. This isn't currently in the VCF spec.

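    To determine if a particular sample carries a deletion or duplication, compare CN to the expected ploidy for that sample at that site (i.e. taking into account sex on the sex chromosomes). A CN below the expected ploidy indicates a deletion, a CN above it indicates a duplication, and the difference is the number of copies lost or gained.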
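
    As a rough illustration of that comparison, the Python sketch below classifies a single sample's call. The helper names (expected_ploidy, classify_call), the "M"/"F" sex encoding, and the simplified human ploidy rules (pseudoautosomal regions are ignored) are all assumptions for the example and are not part of Genome STRiP.

```python
def expected_ploidy(chrom, sex):
    """Expected copy number for a human sample (simplified assumption:
    pseudoautosomal regions are ignored; sex is "M" or "F")."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    name = name.upper()
    if name == "X":
        return 2 if sex == "F" else 1
    if name == "Y":
        return 0 if sex == "F" else 1
    return 2  # autosomes

def classify_call(cn, ploidy):
    """Return (label, copy_change) for one sample at one site."""
    change = cn - ploidy  # negative = copies lost, positive = copies gained
    if change < 0:
        return "DEL", change
    if change > 0:
        return "DUP", change
    return "REF", 0

# A male sample with CN=2 on chrX carries one extra copy.
print(classify_call(2, expected_ploidy("chrX", "M")))
# A sample with CN=1 on an autosome has lost one copy.
print(classify_call(1, expected_ploidy("5", "F")))
```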

    You may also be interested in the CopyNumberClass annotator, which emits the distribution of observed copy numbers and also classifies the sites as DEL/DUP/MIXED (MIXED meaning that there is evidence for both deletion and duplication alleles compared to the reference).
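
    The kind of site-level summary described here can be sketched in a few lines of Python. This is only an illustration of the idea, not the CopyNumberClass annotator's actual implementation; the function name classify_site, its input format, and the "NONE" label for sites with no variant samples are hypothetical.

```python
from collections import Counter

def classify_site(calls):
    """Classify a site from per-sample (cn, expected_ploidy) pairs.

    Returns (label, distribution), where label is DEL, DUP, MIXED, or
    NONE (hypothetical label for sites where every sample matches its
    expected ploidy) and distribution maps copy number to sample count.
    """
    calls = list(calls)
    distribution = Counter(cn for cn, _ in calls)
    has_del = any(cn < ploidy for cn, ploidy in calls)
    has_dup = any(cn > ploidy for cn, ploidy in calls)
    if has_del and has_dup:
        label = "MIXED"
    elif has_del:
        label = "DEL"
    elif has_dup:
        label = "DUP"
    else:
        label = "NONE"
    return label, dict(sorted(distribution.items()))

# Three diploid samples: one reference, one deletion, one duplication.
print(classify_site([(2, 2), (1, 2), (3, 2)]))
```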

