About CopyNumberClassAnnotator and AlleleFrequencyAnnotator

Sunhye (Korea, Member)

Hi.
I have 3 questions.

I would like to know what CNCATEGORY means in the output of CopyNumberClassAnnotator.
I ran CopyNumberClassAnnotator to annotate the output of SVGenotyper.
First, what does "NA" in CNCATEGORY mean? Does it mean the site is not a deletion, or that it is a false-positive variant? Can I filter out these sites?
Second, for CNV output, does "MIXED" in CNCATEGORY mean a multi-allelic CNV?

Finally, AlleleFrequencyAnnotator doesn't annotate my CNV output:

ID      CALLRATE        CNMIN   CNMAX   CNALLELES       NNONREF NVARIANT        CNCATEGORY      CNDIST
CNV_1_766706_768670     2       NA      NA      124     124     0       0       0       0       0       0       
CNV_1_767278_768670     2       NA      NA      124     124     0       0       0       0       0       0       
CNV_1_810031_812465     2       NA      NA      124     124     0       0       0       0       0       0       
CNV_1_821039_823733     2       NA      NA      124     124     0       0       0       0       0       0       
CNV_1_825927_828492     2       NA      NA      124     124     0       0       0       0       0       0    

Does AlleleFrequencyAnnotator only calculate allele frequencies for SVGenotyper output?


Answers

  • bhandsaker (Member, Broadie, Moderator, admin)

    What does "NA" in CNCATEGORY mean?

    Best if you post an example.

    Second, for CNV output, does "MIXED" in CNCATEGORY mean a multi-allelic CNV?

    No, it means there are observed copy numbers both above and below the expected reference copy number.
    MIXED sites are therefore multi-allelic, but multi-allelic duplications may also be categorized as DUP.
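A minimal bash sketch of the rule described above, under stated assumptions: the helper name `classify_cn_category` is hypothetical, CNDIST is assumed to list sample counts per copy number starting at 0 (so "0,12,27" means 0 samples at CN 0, 12 at CN 1 and 27 at CN 2), and a site with no observed copy number other than the reference one, or with no called samples at all, is assumed to come out as NA. That last assumption is inferred from the DEL_P0001_* rows posted later in this thread, not taken from the Genome STRiP documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify_cn_category REF_CN CNDIST
# Assumption: CNDIST = comma-separated sample counts for copy number 0,1,2,...
# and the literal "NA" means no sample was called.
classify_cn_category() {
  local ref_cn=$1 cndist=$2
  local below=0 above=0 cn
  local -a counts=()
  if [[ $cndist != NA ]]; then
    IFS=, read -r -a counts <<< "$cndist"
  fi
  for cn in "${!counts[@]}"; do
    (( counts[cn] > 0 )) || continue         # skip copy numbers no sample carries
    if (( cn < ref_cn )); then below=1; fi   # observed loss relative to reference
    if (( cn > ref_cn )); then above=1; fi   # observed gain relative to reference
  done
  if (( below && above )); then echo MIXED
  elif (( below )); then echo DEL
  elif (( above )); then echo DUP
  else echo NA    # every called sample at the reference CN, or no calls at all
  fi
}

# Usage (autosomal reference copy number assumed to be 2):
classify_cn_category 2 "0,12,27"    # DEL   (DEL_P0001_1 later in the thread)
classify_cn_category 2 "0,0,62"     # NA    (DEL_P0001_3)
classify_cn_category 2 "0,3,50,4"   # MIXED (made-up counts for illustration)
```

Passing the reference copy number explicitly leaves room for non-diploid regions (for example chrX in males, as described by a ploidy map); whether the real annotator handles those regions this way is not confirmed here.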

    Does AlleleFrequencyAnnotator only calculate allele frequencies for SVGenotyper output?

    That doesn't look like AlleleFrequencyAnnotator output.
    But to answer your question, quoting from the documentation:

    • The AlleleFrequency annotator uses the genotype calls from the VCF file to compute allele frequency statistics.
    • Only bi-allelic variants are currently supported and the GT/FT/GQ tags are used to determine the genotype values.
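To illustrate the documented behaviour, here is a bash/awk sketch of a GT-based bi-allelic allele frequency. It is not the AlleleFrequencyAnnotator implementation: the helper name `gt_allele_frequency` is hypothetical, GQ is ignored, and only samples whose FT is PASS (or all samples, if the record has no FT) are counted. It reads VCF data lines on stdin and prints the record ID with the fraction of called alleles that are 1, or NA when no sample has a 0/1 allele call, which is what a record whose GT values are all "." produces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gt_allele_frequency < vcf_data_lines
# Sketch only: counts "0" and "1" alleles in GT for FT=PASS samples; GQ is not used.
gt_allele_frequency() {
  awk -F'\t' '
    /^#/ { next }                               # skip VCF header lines
    {
      n = split($9, fmt, ":")                   # FORMAT column, e.g. GT:CN:FT
      gt_i = 0; ft_i = 0
      for (i = 1; i <= n; i++) {
        if (fmt[i] == "GT") gt_i = i
        if (fmt[i] == "FT") ft_i = i
      }
      alt = 0; total = 0
      for (s = 10; s <= NF; s++) {              # sample columns start at field 10
        split($s, val, ":")
        if (ft_i && val[ft_i] != "PASS") continue   # filtered genotype
        g = (gt_i ? val[gt_i] : ".")
        m = split(g, allele, "[/|]")
        for (a = 1; a <= m; a++) {
          if (allele[a] == "0") total++
          else if (allele[a] == "1") { total++; alt++ }
          # "." (no-call) contributes nothing
        }
      }
      if (total > 0) printf "%s\t%.4f\n", $3, alt / total
      else printf "%s\tNA\n", $3
    }'
}

# Example call (file name is hypothetical):
#   grep -v '^#' my_calls.vcf | gt_allele_frequency
```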
  • Sunhye (Korea, Member)

    Hi @bhandsaker
    Thanks for your reply, and sorry for my late response.

    First, here is my example of "NA" in CNCATEGORY:

    ID      CALLRATE        CNMIN   CNMAX   CNALLELES       NNONREF NVARIANT        CNCATEGORY      CNDIST
    DEL_P0001_1     0.629   1       2       2       12      12      DEL     0,12,27
    DEL_P0001_2     0.000   NA      NA      0       0       0       NA      NA
    DEL_P0001_3     1.000   2       2       1       0       0       NA      0,0,62
    DEL_P0001_4     1.000   2       2       1       0       0       NA      0,0,62
    DEL_P0001_5     1.000   2       2       1       0       0       NA      0,0,62
    DEL_P0001_6     1.000   0       2       2       23      23      DEL     5,18,39
    DEL_P0001_7     0.000   NA      NA      0       0       0       NA      NA
    DEL_P0001_8     0.000   NA      NA      0       0       0       NA      NA
    DEL_P0001_9     0.000   NA      NA      0       0       0       NA      NA
    DEL_P0001_10    0.613   1       2       2       34      4       DEL     0,34,4
    

    What does 'NA' mean?
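Looking at these rows, the summary columns appear to be derivable from CNDIST. Assuming CNDIST holds per-copy-number sample counts starting at CN 0, and that 62 samples were genotyped (inferred from the rows with CALLRATE 1.000 and CNDIST 0,0,62), the hypothetical helper `summarize_cndist` below reproduces CALLRATE, CNMIN, CNMAX and NNONREF for DEL_P0001_1, _2, _6 and _10; it makes no attempt to reproduce CNALLELES or NVARIANT. Under this reading, NA in CNMIN and CNMAX comes from sites where no sample was called (CALLRATE 0.000), while DEL_P0001_3 to _5 have every sample at copy number 2, so there is no loss or gain to categorize.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize_cndist REF_CN N_SAMPLES CNDIST
# Prints CALLRATE, CNMIN, CNMAX, NNONREF (tab-separated), assuming CNDIST is a
# comma-separated count of samples at CN 0,1,2,... and "NA" means no calls.
summarize_cndist() {
  local ref_cn=$1 n_samples=$2 cndist=$3
  local called=0 nonref=0 cnmin=NA cnmax=NA cn callrate
  local -a counts=()
  if [[ $cndist != NA ]]; then
    IFS=, read -r -a counts <<< "$cndist"
  fi
  for cn in "${!counts[@]}"; do               # indices ascend: CN 0, 1, 2, ...
    (( counts[cn] > 0 )) || continue
    called=$(( called + counts[cn] ))
    if (( cn != ref_cn )); then nonref=$(( nonref + counts[cn] )); fi
    if [[ $cnmin == NA ]]; then cnmin=$cn; fi # first observed CN is the minimum
    cnmax=$cn                                 # last observed CN is the maximum
  done
  callrate=$(awk -v c="$called" -v n="$n_samples" 'BEGIN { printf "%.3f", c / n }')
  printf '%s\t%s\t%s\t%s\n' "$callrate" "$cnmin" "$cnmax" "$nonref"
}

summarize_cndist 2 62 "0,34,4"   # 0.613  1  2  34, as in the DEL_P0001_10 row
```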

    Second, does a CNALLELES value greater than 2 indicate a multi-allelic CNV?

    Finally, I used AlleleFrequencyAnnotator to get allele frequencies for my CNV output.
    My script:

    classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar"
    java -Xmx4g -cp "${classpath}" \
        org.broadinstitute.sv.main.SVAnnotator \
        -A AlleleFrequency \
        -R "${bundle}/human_g1k_v37.fasta" \
        --ploidyMapFile "${bundle}/human_g1k_v37.ploidymap.txt" \
        -vcf "${CNVfilename}" \
        -populationMapFile "${bundle}/sample_Population.map" \
        -writeReport true \
        -reportDirectory CNVreportdir
    

    However, as in the result above, the allele frequencies of my CNVs were not calculated.
    For example, here is a record from my CNV output:

        1       766706  CNV_1_766706_768670     A       <CNV>   .       PASS    END=768670;GCFRACTION=0.46;GCLENGTH=1965;GLALTFREQ=NA;GLALTSUM=0.000;GLHETSUM=0.000;GLINBREEDINGCOEFF=NA;GLREFFREQ=NA;GLREFSUM=0.000;GSCALLRATE=0.984;GSCLUSTERSEP=5.62;GSCLUSTERSEPWEIGHTEDMEAN=6.17;GSCLUSTERSEPWEIGHTEDMEDIAN=5.36;GSCNALLELES=2;GSCNCATEGORY=DEL;GSCNDIST=5,14,42;GSCNMAX=2;GSCNMIN=0;GSCNQUAL=127.2247;GSDUPLICATEOVERLAP=NA;GSDUPLICATES=NA;GSDUPLICATESCORE=NA;GSELENGTH=1199;GSM1=0.903;GSM2=1.77;GSNNONREF=19;GSNONVARSCORE=NA;GSNVARIANT=19;GSVDJFRACTION=0.000;SVTYPE=CNV    GT:CN:CNF:CNL:CNP:CNQ:FT        .:1:1.0047:-45.77,-0.00,-4.68,-12.37:-46.24,-0.00,-4.21,-25.87:42.1:PASS        .:1:1.1277:-62.49,-0.00,-3.76,-11.65:-62.95,-0.00,-3.30,-25.16:33.0:PASS        .:1:0.9465:-41.50,-0.00,-5.31,-13.36:-41.97,-0.00,-4.85,-26.86:48.5:PASS        .:1:0.8702:-34.63,-0.00,-5.92,-14.11:-35.10,-0.00,-5.45,-27.62:54.5:PASS        .:2:1.9533:-165.87,-7.77,-0.00,-3.26:-166.80,-8.23,-0.00,-17.24:82.3:PASS       .:0:0.0000:-0.00,-8.47,-16.75,-24.96:-0.00,-8.01,-15.82,-38.01:80.1:PASS        .:2:2.0526:-158.43,-8.20,-0.00,-2.34:-159.36,-8.66,-0.00,-16.31:86.6:PASS       .:2:1.7345:-155.51,-5.09,-0.00,-5.27:-156.44,-5.55,-0.00,-19.24:55.5:PASS       .:2:1.7496:-159.81,-5.42,-0.00,-5.23:-160.74,-5.88,-0.00,-19.20:58.8:PASS       .:1:1.0837:-55.93,-0.00,-4.11,-11.92:-56.39,-0.00,-3.65,-25.42:36.5:PASS        
    .........
    

    The GT field is always '.', never an allele call such as 0/1 (or a base).
    So how is the allele frequency of a CNV calculated?
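Since every GT in this record is ".", a GT-based bi-allelic frequency has nothing to count; for CNV sites the per-sample information is carried by the CN field. One way to summarize it, sketched below under assumptions, is to tally CN over samples whose FT is PASS, in the same layout GSCNDIST appears to use (counts for CN 0, 1, 2, ...). The helper name `cn_distribution` is hypothetical, and this is not what AlleleFrequencyAnnotator does. On only the ten samples shown in the record above it would print 1,5,4; the record's own GSCNDIST=5,14,42 covers all called samples.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cn_distribution < vcf_data_lines
# Prints ID and comma-separated sample counts for CN 0..max, using only samples
# whose FT is PASS; "NA" if the record has no CN field or no usable CN values.
cn_distribution() {
  awk -F'\t' '
    /^#/ { next }
    {
      n = split($9, fmt, ":")
      cn_i = 0; ft_i = 0
      for (i = 1; i <= n; i++) {
        if (fmt[i] == "CN") cn_i = i
        if (fmt[i] == "FT") ft_i = i
      }
      if (!cn_i) { print $3 "\tNA"; next }
      split("", count)                          # reset per-record tallies
      maxcn = -1
      for (s = 10; s <= NF; s++) {
        split($s, val, ":")
        if (ft_i && val[ft_i] != "PASS") continue   # filtered sample
        if (val[cn_i] !~ /^[0-9]+$/) continue       # missing or malformed CN
        c = val[cn_i] + 0
        count[c]++
        if (c > maxcn) maxcn = c
      }
      if (maxcn < 0) { print $3 "\tNA"; next }
      out = ""
      for (c = 0; c <= maxcn; c++) out = out (c ? "," : "") (count[c] + 0)
      print $3 "\t" out
    }'
}

# Example call (file name is hypothetical):
#   grep -v '^#' cnv_calls.vcf | cn_distribution
```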

  • Sunhye (Korea, Member)

    Hi @bhandsaker.
    I understand now, thanks to your detailed explanation.
    Thanks, Bob!
