GATK somatic CNV pipeline NaNs

kmegq Broad Member, Broadie

Dear GATK Team,

I ran the GATK somatic CNV calling pipeline on Terra (v1.3.1 for the PoN workflow and v1.3.0 for the somatic pair workflow) on unpaired canine tumor WGS data. I am seeing runs of NaNs in parts of the ModelSegmentsTumor output (in the MINOR_ALLELE_FRACTION columns), at locations where NUM_POINTS_ALLELE_FRACTION is zero. Is this expected, or does it indicate a problem?

chr26   29292001    29467000    168 0   1.494626    1.511347    1.528258    NaN NaN NaN
chr26   29467001    29829000    325 0   0.802825    0.814096    0.828611    NaN NaN NaN
chr26   29829001    30150000    231 0   1.690182    1.700493    1.721356    NaN NaN NaN
chr26   30152001    30391000    188 0   0.830539    0.850131    0.868545    NaN NaN NaN

In this run, the list of chromosomes passed in for the intervals argument was not sorted numerically, so the tool plotted the chromosomes out of order. Could this cause problems if the intervals in the PoN were not in the same order?

Thank you for your help!



  • slee Member, Broadie, Dev ✭✭✭

    @kmegq That's expected when NUM_POINTS_ALLELE_FRACTION is zero, as you may have guessed. Those segments contain no allele counts from SNP sites, so we have nothing to use in our estimate of the minor allele fraction.

    At what point in the workflow (i.e., to which tool) did you pass intervals that were sorted in a different order from the PoN? I would expect the PoN to throw an error at the DenoiseReadCounts step if the order of the intervals in the PoN did not match that of the read counts. In contrast, if you simply passed a sequence dictionary with a different order to the plotting tools, there shouldn't be any issues.
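
On the NaN question, one quick sanity check is to confirm that every segment with NaN in the MINOR_ALLELE_FRACTION columns also has NUM_POINTS_ALLELE_FRACTION equal to zero. Below is a minimal Python sketch of that check. It assumes the column layout of the rows quoted in the question (contig, start, end, a copy-ratio point count, NUM_POINTS_ALLELE_FRACTION, three log2 copy-ratio values, then three MINOR_ALLELE_FRACTION values) and that header lines start with `@` or `CONTIG`. The function names are hypothetical and not part of GATK.

```python
import math

def parse_segment(line):
    """Parse one whitespace-delimited segment row (column layout is assumed)."""
    fields = line.split()
    return {
        "contig": fields[0],
        "start": int(fields[1]),
        "end": int(fields[2]),
        # Fifth column: number of allele-fraction points in the segment.
        "num_points_allele_fraction": int(fields[4]),
        # Last three columns: minor-allele-fraction values; "NaN" parses to nan.
        "minor_allele_fraction": [float(x) for x in fields[8:11]],
    }

def unexplained_nan_segments(lines):
    """Return segments with NaN allele fractions despite having allele-fraction points."""
    flagged = []
    for line in lines:
        # Skip blank lines and (assumed) header lines.
        if not line.strip() or line.startswith(("@", "CONTIG")):
            continue
        seg = parse_segment(line)
        has_nan = any(math.isnan(v) for v in seg["minor_allele_fraction"])
        if has_nan and seg["num_points_allele_fraction"] > 0:
            flagged.append(seg)
    return flagged

# The four rows quoted in the question.
rows = [
    "chr26   29292001    29467000    168 0   1.494626    1.511347    1.528258    NaN NaN NaN",
    "chr26   29467001    29829000    325 0   0.802825    0.814096    0.828611    NaN NaN NaN",
    "chr26   29829001    30150000    231 0   1.690182    1.700493    1.721356    NaN NaN NaN",
    "chr26   30152001    30391000    188 0   0.830539    0.850131    0.868545    NaN NaN NaN",
]

print(unexplained_nan_segments(rows))  # prints []
```

An empty list means every NaN row is explained by a zero allele-fraction point count. Any segment that does show up in the list would be worth a closer look.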

