How is the power calculated?

ignasiof Member
edited November 2012 in Ask the GATK team

When I run the following:

java -Xmx2g -jar muTect-1.0.27783.jar --analysis_type MuTect

I see a list of parameters that are not fully documented, and I noticed that the results are very sensitive to some of them. Specifically, I am most interested in the last three, which deal with the power calculation:


Could you please explain how the constants are used to calculate the power? Is the "absolute copy number data" necessary? Thank you.

  • ignasiof Member

    I have noticed that the reported power increases if I increase --power_constant_qscore. I have two questions: 1) What does --power_constant_qscore mean? 2) Is the power reported in the --power_file the product tumor_power * normal_power, or is it just one of them?

  • ignasiof Member

    I'm sorry to be so insistent, but I'd really like to figure this out. I was hoping that the power estimates would help me measure the searchable space of the genome, and estimate the proportion of mutant sites in a less biased way than just counting the detected mutants and dividing by the number of sites analysed. The fact is that muTect is giving me very low estimates of the power, mainly driven by the power in the "tumor" sample. For example, some mutations, labelled as "KEEP", are discovered at sites where the power to detect the variant in the "tumor" sample was supposed to be 0. Similarly, the average power reported along the genome is about 0.005, which suggests that the mutations observed are just a tiny fraction of what there is. I doubt that's the case, and I am inclined to think that the power estimation is downwardly biased. I have played with the parameters with no luck.

    Could you please give me some hints on how the power is calculated, how reliable it is, and which parameters I should be tuning? Thank you.
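
    As a rough illustration of the estimate described above, the sketch below compares the naive rate (detected mutants divided by sites analysed) with a power-weighted rate, where the denominator is the summed per-site power, i.e. the "searchable space". The function names and the toy power values are hypothetical and are not taken from MuTect or its output format.

```python
def naive_rate(n_detected, n_sites):
    """Detected mutants divided by the number of sites analysed."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return n_detected / n_sites


def power_adjusted_rate(n_detected, site_powers):
    """Detected mutants divided by the summed per-site detection power.

    Assumption: each entry of site_powers is the probability (0..1) of
    detecting a mutation at that site if one were present.
    """
    searchable = sum(site_powers)
    if searchable <= 0:
        raise ValueError("total power is zero; the rate is undefined")
    return n_detected / searchable


# Toy example: four sites, one detected mutant (made-up numbers).
powers = [1.0, 0.5, 0.5, 0.0]
print(naive_rate(1, len(powers)))
print(power_adjusted_rate(1, powers))
```

    If the per-site power is biased downward, the power-weighted rate is inflated by the same factor, which is why an average power near 0.005 would imply far more mutations than were observed.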

  • kcibul Cambridge, MA Member, Broadie, Dev ✭✭✭

    First -- thanks for your patience. We have recently published MuTect in Nature Biotechnology (see the post in this forum), and the paper describes exactly how power is calculated. To answer your specific questions:

    --power_constant_qscore is the constant base quality score assumed for every base in the power calculation (e.g. Q30 bases)
    --power_file reports, as you described, tumor_power * normal_power (if you have a normal)
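
    To make the two answers above concrete, here is a minimal sketch. It uses the standard Phred definition to turn a constant quality score into a per-base error probability, and multiplies the tumor and normal power as described. The function names are hypothetical, and treating a missing normal as "tumor power only" is an assumption; none of this is MuTect's actual code.

```python
def phred_to_error_prob(qscore):
    """Standard Phred conversion: Q = -10 * log10(p_error)."""
    return 10 ** (-qscore / 10)


def combined_power(tumor_power, normal_power=None):
    """Product of tumor and normal power.

    Assumption: without a matched normal, only the tumor power is used.
    """
    for p in (tumor_power, normal_power):
        if p is not None and not 0.0 <= p <= 1.0:
            raise ValueError("power must be between 0 and 1")
    if normal_power is None:
        return tumor_power
    return tumor_power * normal_power


# A constant Q30 means each base is assumed wrong about 1 time in 1000.
print(phred_to_error_prob(30))
# Made-up per-site powers for illustration.
print(combined_power(0.8, 0.5))
```

    Under the Phred definition a higher constant quality score implies a lower assumed per-base error rate.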

    I'm surprised that you're finding so many mutations at sites that have zero power. If you can provide an example line of output (with headers) that would be helpful.

