Power
In this tutorial, we explore the concept of power in hypothesis testing, power computation in mgcpy, and a comparison of methods.
Theory
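Consider random variables \(X\) and \(Y\) with joint distribution \(F_{XY}\) and marginal distributions \(F_X\) and \(F_Y\), from which we observe \(n\) paired samples \((x_i, y_i)\).
We wish to test:
\[H_0: F_{XY} = F_X F_Y \quad \text{versus} \quad H_A: F_{XY} \neq F_X F_Y.\]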
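For a testing procedure \(T\), we define \(\alpha_n\) to be the probability of Type I error. That is,
\[\alpha_n(T) = P(T \text{ rejects } H_0 \mid H_0 \text{ is true}).\]
Similarly, we define \(\beta_n\) to be the probability of Type II error:
\[\beta_n(T) = P(T \text{ fails to reject } H_0 \mid H_A \text{ is true}).\]
Finally, the power is defined as:
\[1 - \beta_n(T) = P(T \text{ rejects } H_0 \mid H_A \text{ is true}),\]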
or the probability of correctly rejecting the null when the alternative is true. A common desideratum for a testing procedure is to have as high a power as possible, subject to \(\alpha_n(T) \leq \alpha\), where \(\alpha\) is a specified “significance level”. When many alternatives are possible, power is a property not only of the test but also of the particular alternative distribution. Implicitly, it depends on the sample size as well.
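To make the definition concrete, power can be approximated by simulation: draw many datasets from a fixed alternative, calibrate a rejection cutoff on data where the pairing has been broken, and count how often the test rejects. The following is a minimal sketch using only the Python standard library. It is not mgcpy's implementation; `simulate_linear`, `pearson_abs`, and `estimate_power` are hypothetical names, and the linear model and default parameters are assumptions chosen for illustration.

```python
import math
import random


def pearson_abs(x, y):
    """Absolute sample Pearson correlation, used here as the test statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def simulate_linear(n, noise, rng):
    """Assumed alternative: y = x + noise * eps, with x and eps standard normal."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + noise * rng.gauss(0, 1) for xi in x]
    return x, y


def estimate_power(n=50, noise=1.0, repeats=500, alpha=0.05, seed=0):
    """Monte Carlo estimate of power at significance level alpha."""
    rng = random.Random(seed)
    alt_stats, null_stats = [], []
    for _ in range(repeats):
        x, y = simulate_linear(n, noise, rng)
        alt_stats.append(pearson_abs(x, y))
        # Shuffling y breaks the pairing, which yields a draw from the null.
        y_perm = y[:]
        rng.shuffle(y_perm)
        null_stats.append(pearson_abs(x, y_perm))
    null_stats.sort()
    # Empirical (1 - alpha) quantile of the null statistics is the cutoff.
    cutoff = null_stats[math.ceil(repeats * (1 - alpha)) - 1]
    # Power is the fraction of alternative statistics exceeding the cutoff.
    return sum(s > cutoff for s in alt_stats) / repeats


power_low_noise = estimate_power(noise=0.5)
power_high_noise = estimate_power(noise=10.0)
```

Here `power_low_noise` should be close to 1, while `power_high_noise` should be much smaller, reflecting that power depends on the particular alternative as well as on the test.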
Power in mgcpy
[1]:
from mgcpy.independence_tests.dcorr import DCorr
from mgcpy.independence_tests.rv_corr import RVCorr
from mgcpy.benchmarks.power import power
from mgcpy.benchmarks.simulations import linear_sim
mgcpy comes with 20 built-in simulation functions that model various types of dependence between random variables (linear, spiral, sinusoidal, etc.). The power function takes an Independence_Test object and a simulation function (one that accepts the arguments num_samples, num_dimensions, and noise) to generate data. Using these, it estimates the power of the test under the alternative posed by the simulation.
We first estimate the power of DCorr and Pearson's correlation on linearly related data. Without any noise, we expect this relationship to be perfectly discernible, i.e. a power of 1. For the following simulations we have sample size n = 100 and number of dimensions d = 1.
[2]:
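# Instantiate the two independence tests to compare
dcorr = DCorr()
pearson = RVCorr(which_test='pearson')
# Estimate the power of each test on the linear simulation
p = power(pearson, linear_sim)
q = power(dcorr, linear_sim)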
print("The power of Pearson's correlation against a linear alternative is: %f" % p)
print("The power of DCorr against a linear alternative is: %f" % q)
The power of Pearson's correlation against a linear alternative is: 1.000000
The power of DCorr against a linear alternative is: 1.000000
By adding noise, we see a decrease in power of both tests.
[3]:
p = power(pearson, linear_sim, noise = 3.0)
q = power(dcorr, linear_sim, noise = 3.0)
print("The power of Pearson's correlation against a linear alternative is: %f" % p)
print("The power of DCorr against a linear alternative is: %f" % q)
The power of Pearson's correlation against a linear alternative is: 0.507000
The power of DCorr against a linear alternative is: 0.439000
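The loss of power under noise can also be seen analytically in a simplified setting. The sketch below uses the Fisher z approximation for a two-sided Pearson correlation test and assumes the model y = x + noise * eps with unit-variance x and eps. This model is an assumption for illustration and need not match the parametrization of linear_sim, so the numbers are not expected to reproduce the output above. The names `approx_pearson_power` and `rho_linear` are hypothetical.

```python
import math
from statistics import NormalDist


def approx_pearson_power(rho, n, alpha=0.05):
    """Fisher z approximation to the power of a two-sided Pearson test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the alternative, atanh(r) is roughly normal with mean atanh(rho)
    # and standard deviation 1 / sqrt(n - 3).
    shift = math.atanh(rho) * math.sqrt(n - 3)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def rho_linear(noise):
    """Population correlation under the assumed model y = x + noise * eps."""
    return 1.0 / math.sqrt(1.0 + noise ** 2)


noise_levels = [0.5, 1.0, 3.0, 10.0, 30.0]
powers = [approx_pearson_power(rho_linear(s), n=100) for s in noise_levels]
for s, p in zip(noise_levels, powers):
    print("noise = %5.1f  approx. power = %.3f" % (s, p))
```

As the noise grows, the population correlation shrinks toward 0 and the approximate power falls toward the significance level.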
When we change the simulation to a highly nonlinear relationship, such as a spiral, Pearson’s correlation has noticeably lower power than DCorr. MGC, in turn, has higher power in this nonlinear setting than even DCorr.
[4]:
from mgcpy.independence_tests.mgc import MGC
from mgcpy.benchmarks.simulations import spiral_sim
mgc = MGC()
p = power(pearson, spiral_sim)
q = power(dcorr, spiral_sim)
r = power(mgc, spiral_sim)
print("The power of Pearson's correlation against a spiral alternative is: %f" % p)
print("The power of DCorr against a spiral alternative is: %f" % q)
print("The power of MGC against a spiral alternative is: %f" % r)
The power of Pearson's correlation against a spiral alternative is: 0.130000
The power of DCorr against a spiral alternative is: 0.304000
The power of MGC against a spiral alternative is: 1.000000
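Why a linear measure struggles on nonlinear data can be illustrated on a small deterministic example. The sketch below computes Pearson's correlation and the (biased) sample distance correlation for y = x^2 on a symmetric grid, using only the standard library. It is a textbook-style illustration for one-dimensional data, not mgcpy's DCorr implementation, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def centered_distances(v):
    """Pairwise |v_i - v_j| with row, column, and grand means removed."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # equals the column means by symmetry
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Biased sample distance correlation for one-dimensional samples."""
    a, b = centered_distances(x), centered_distances(y)
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / n ** 2
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


# Symmetric grid on [-1, 1]; y depends on x deterministically but not linearly.
x = [(i - 10) / 10 for i in range(21)]
y_quadratic = [xi ** 2 for xi in x]

r = pearson(x, y_quadratic)
dcor = distance_correlation(x, y_quadratic)
print("Pearson: %.3f, distance correlation: %.3f" % (r, dcor))
```

Pearson's correlation is essentially zero here because the positive and negative halves of the grid cancel, whereas the distance correlation is clearly positive. A test built on the former has little signal to detect in such settings.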
Finally, we present a high-dimensional square-shaped dependency at a low sample size to compare MGC with the other tests in such a setting.
[5]:
from mgcpy.benchmarks.simulations import square_sim
d = 20
n = 30
p = power(pearson, square_sim, num_samples = n, noise = 1, num_dimensions = d)
q = power(dcorr, square_sim, num_samples = n, noise = 1, num_dimensions = d)
r = power(mgc, square_sim, num_samples = n, noise = 1, num_dimensions = d)
print("The power of Pearson's correlation against a square alternative at n = %d and d = %d is: %f" % (n, d, p))
print("The power of DCorr correlation against a square alternative at n = %d and d = %d is: %f" % (n, d, q))
print("The power of MGC correlation against a square alternative at n = %d and d = %d is: %f" % (n, d, r))
The power of Pearson's correlation against a square alternative at n = 30 and d = 20 is: 0.056000
The power of DCorr correlation against a square alternative at n = 30 and d = 20 is: 0.040000
The power of MGC correlation against a square alternative at n = 30 and d = 20 is: 0.059000
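Because each power value is a proportion over a finite number of simulated trials, it carries Monte Carlo sampling error, which is worth keeping in mind when comparing estimates that are close together. The sketch below computes the binomial standard error of the three estimates printed above. The number of repetitions is an assumption (it is not stated in this tutorial), as is the independence of the repetitions; `mc_standard_error` is a hypothetical helper.

```python
import math


def mc_standard_error(power_estimate, repeats):
    """Binomial standard error of a power estimate from `repeats` trials."""
    return math.sqrt(power_estimate * (1.0 - power_estimate) / repeats)


assumed_repeats = 1000  # assumption: not stated in the tutorial text
estimates = [("Pearson", 0.056), ("DCorr", 0.040), ("MGC", 0.059)]
for name, est in estimates:
    se = mc_standard_error(est, assumed_repeats)
    print("%-8s power = %.3f +/- %.3f (one standard error)" % (name, est, se))
```

Differences between estimates that are small relative to these standard errors should be read with caution; increasing the number of repetitions narrows them.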