Benchmarks

Simulations

Linear simulation

mgcpy.benchmarks.simulations.linear_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)[source]

Function for generating a linear simulation.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

  • noise -- noise level of the simulation, defaults to 1

  • indep -- whether to sample x and y independently, defaults to false

  • low -- the lower limit of the data matrix, defaults to -1

  • high -- the upper limit of the data matrix, defaults to 1

Returns

the data matrix and a response array
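
As a rough illustration of what these parameters control, the following standard-library sketch mimics a linear simulation. It is not mgcpy's implementation: the name toy_linear_sim, the seed argument, the uniform sampling of x on [low, high], the equal weighting of the coordinates, and the Gaussian noise term are all assumptions made for the example.

```python
import random


def toy_linear_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1, seed=0):
    """Hypothetical stand-in for a linear simulation (not mgcpy code)."""
    rng = random.Random(seed)

    def draw_x():
        # num_samp rows, each with num_dim values drawn uniformly from [low, high]
        return [[rng.uniform(low, high) for _ in range(num_dim)]
                for _ in range(num_samp)]

    x = draw_x()
    # Assumed response: mean of the coordinates plus scaled Gaussian noise.
    y = [sum(row) / num_dim + noise * rng.gauss(0, 1) for row in x]
    if indep:
        # Redraw x so that it no longer carries any information about y.
        x = draw_x()
    return x, y


x, y = toy_linear_sim(num_samp=5, num_dim=2, noise=0)
```

With noise=0 the response is an exact linear function of the data matrix; passing indep=True keeps the same marginal ranges but removes the relationship, which is the kind of sample a null distribution would be built from.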

Exponential simulation

mgcpy.benchmarks.simulations.exp_sim(num_samp, num_dim, noise=10, indep=False, low=0, high=3)[source]

Function for generating an exponential simulation.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

  • noise -- noise level of the simulation, defaults to 10

  • indep -- whether to sample x and y independently, defaults to false

  • low -- the lower limit of the data matrix, defaults to 0

  • high -- the upper limit of the data matrix, defaults to 3

Returns

the data matrix and a response array

Cubic simulation

mgcpy.benchmarks.simulations.cub_sim(num_samp, num_dim, noise=80, indep=False, low=-1, high=1, cub_coeff=array([-12, 48, 128]), scale=0.3333333333333333)[source]

Function for generating a cubic simulation.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

  • noise -- noise level of the simulation, defaults to 80

  • indep -- whether to sample x and y independently, defaults to False

  • low -- the lower limit of the data matrix, defaults to -1

  • high -- the upper limit of the data matrix, defaults to 1

  • cub_coeff -- coefficients of the cubic function, where each value is the coefficient of the corresponding order, defaults to [-12, 48, 128]

  • scale -- scaling center of the cubic, defaults to 1/3

Returns

the data matrix and a response array

Joint Normal simulation

mgcpy.benchmarks.simulations.joint_sim(num_samp, num_dim, noise=0.5)[source]

Function for generating a joint-normal simulation.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

  • noise -- noise level of the simulation, defaults to 0.5

Returns

the data matrix and a response array

Step simulation

mgcpy.benchmarks.simulations.step_sim(num_samp, num_dim, noise=0.1, indep=False, low=-1, high=1)[source]

Function for generating a step simulation.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

  • noise -- noise level of the simulation, defaults to 0.1

  • indep -- whether to sample x and y independently, defaults to false

  • low -- the lower limit of the data matrix, defaults to -1

  • high -- the upper limit of the data matrix, defaults to 1

Returns

the data matrix and a response array

Quadratic simulation

mgcpy.benchmarks.simulations.quad_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1, amp=5)[source]

Function for generating a quadratic simulation.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

  • noise -- noise level of the simulation, defaults to 1

  • indep -- whether to sample x and y independently, defaults to false

  • low -- the lower limit of the data matrix, defaults to -1

  • high -- the upper limit of the data matrix, defaults to 1

  • amp -- amplitude of the quadratic simulation, defaults to 5

Returns

the data matrix and a response array

W-Shaped simulation

mgcpy.benchmarks.simulations.w_sim(num_samp, num_dim, noise=0.5, indep=False, low=-1, high=1)[source]

Function for generating a w-shaped simulation.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

  • noise -- noise level of the simulation, defaults to 0.5

  • indep -- whether to sample x and y independently, defaults to false

  • low -- the lower limit of the data matrix, defaults to -1

  • high -- the upper limit of the data matrix, defaults to 1

Returns

the data matrix and a response array

Spiral simulation

mgcpy.benchmarks.simulations.spiral_sim(num_samp, num_dim, noise=0.4, low=0, high=5)[source]

Function for generating a spiral simulation.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

  • noise -- noise level of the simulation, defaults to 0.4

  • low -- the lower limit of the data matrix, defaults to 0

  • high -- the upper limit of the data matrix, defaults to 5

Returns

the data matrix and a response array

Uncorrelated Bernoulli simulation

mgcpy.benchmarks.simulations.ubern_sim(num_samp, num_dim, noise=0.5, bern_prob=0.5)[source]

Function for generating an uncorrelated Bernoulli simulation.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

  • noise -- noise level of the simulation, defaults to 0.5

  • bern_prob -- the Bernoulli probability, defaults to 0.5

Returns

the data matrix and a response array
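
To see why this setting is a useful benchmark, the sketch below builds a one-dimensional toy version in which a Bernoulli draw flips the sign of the response. It is an illustration only, not mgcpy's code: toy_ubern_sim, pearson, the uniform distribution of x, and the Gaussian noise are assumptions. With bern_prob=0.5, x and y are nearly uncorrelated even though y depends strongly on x.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def toy_ubern_sim(num_samp, noise, bern_prob=0.5, seed=0):
    """Hypothetical 1-D stand-in: y = (+1 or -1) * x + noise * eps."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(num_samp):
        xi = rng.uniform(-1, 1)
        # The Bernoulli draw decides the sign of the slope for this sample.
        sign = 1 if rng.random() < bern_prob else -1
        x.append(xi)
        y.append(sign * xi + noise * rng.gauss(0, 1))
    return x, y


x, y = toy_ubern_sim(5000, noise=0.05)
linear_corr = pearson(x, y)  # close to zero: no linear relationship
magnitude_corr = pearson([abs(v) for v in x], [abs(v) for v in y])  # strong
```

A test that only measures linear correlation would miss this dependence, which is what makes the setting informative when comparing independence tests.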

Logarithmic simulation

mgcpy.benchmarks.simulations.log_sim(num_samp, num_dim, noise=3, indep=False, base=2)[source]

Function for generating a logarithmic simulation.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

  • noise -- noise level of the simulation, defaults to 3

  • indep -- whether to sample x and y independently, defaults to false

  • base -- the base of the log, defaults to 2

Returns

the data matrix and a response array

Nth Root simulation

mgcpy.benchmarks.simulations.root_sim(num_samp, num_dim, noise=0.25, indep=False, low=-1, high=1, n_root=4)[source]

Function for generating an nth root simulation.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

  • noise -- noise level of the simulation, defaults to 0.25

  • indep -- whether to sample x and y independently, defaults to false

  • low -- the lower limit of the data matrix, defaults to -1

  • high -- the upper limit of the data matrix, defaults to 1

  • n_root -- the root of the simulation, defaults to 4

Returns

the data matrix and a response array

Sinusoidal simulation

mgcpy.benchmarks.simulations.sin_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1, period=12.566370614359172)[source]

Function for generating a sinusoid simulation.

Note: For producing 4*pi and 16*pi simulations, change the period to the respective value.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

  • noise -- noise level of the simulation, defaults to 1

  • indep -- whether to sample x and y independently, defaults to false

  • low -- the lower limit of the data matrix, defaults to -1

  • high -- the upper limit of the data matrix, defaults to 1

  • period -- the period of the sine wave, defaults to 4*pi

Returns

the data matrix and a response array

Square/Diamond simulation

mgcpy.benchmarks.simulations.square_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1, period=-1.5707963267948966)[source]

Function for generating a square or diamond simulation.

Note: For producing square or diamond simulations, change the period to -pi/2 or -pi/4.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

  • noise -- noise level of the simulation, defaults to 1

  • indep -- whether to sample x and y independently, defaults to false

  • low -- the lower limit of the data matrix, defaults to -1

  • high -- the upper limit of the data matrix, defaults to 1

  • period -- the period of the sine and cosine square equation, defaults to -pi/2

Returns

the data matrix and a response array

Two Parabolas simulation

mgcpy.benchmarks.simulations.two_parab_sim(num_samp, num_dim, noise=2, low=-1, high=1, prob=0.5)[source]

Function for generating a two parabolas simulation.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

  • noise -- noise level of the simulation, defaults to 2

  • low -- the lower limit of the data matrix, defaults to -1

  • high -- the upper limit of the data matrix, defaults to 1

  • prob -- the binomial probability, defaults to 0.5

Returns

the data matrix and a response array

Circle/Ellipse simulation

mgcpy.benchmarks.simulations.circle_sim(num_samp, num_dim, noise=0.4, low=-1, high=1, radius=1)[source]

Function for generating a circle or ellipse simulation.

Note: For producing circle or ellipse simulations, change the radius to 1 or 5, respectively.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

  • noise -- noise level of the simulation, defaults to 0.4

  • low -- the lower limit of the data matrix, defaults to -1

  • high -- the upper limit of the data matrix, defaults to 1

  • radius -- the radius of the circle or ellipse, defaults to 1

Returns

the data matrix and a response array

Multiplicative Noise simulation

mgcpy.benchmarks.simulations.multi_noise_sim(num_samp, num_dim)[source]

Function for generating a multiplicative noise simulation.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

Returns

the data matrix and a response array

Multimodal Independence simulation

mgcpy.benchmarks.simulations.multi_indep_sim(num_samp, num_dim, prob=0.5, sep1=3, sep2=2)[source]

Function for generating a multimodal independence simulation.

Parameters
  • num_samp -- number of samples for the simulation

  • num_dim -- number of dimensions for the simulation

  • prob -- the binomial probability, defaults to 0.5

  • sep1 -- determines the size and separation of clusters, defaults to 3

  • sep2 -- determines the size and separation of clusters, defaults to 2

Returns

the data matrix and a response array

Power Computations

mgcpy.benchmarks.power.power(independence_test, sample_generator, num_samples=100, num_dimensions=1, noise=0.0, repeats=1000, alpha=0.05, simulation_type='')[source]

Estimate the power of an independence test, given a simulation function to sample from.

Parameters
  • independence_test (Object(Independence_Test)) -- an object whose class inherits from the Independence_Test abstract class

  • sample_generator (FunctionType or callable()) -- a simulation function (from simulations) used to generate the samples, with its parameters given by the following arguments: num_samples (defaults to 100), num_dimensions (defaults to 1), and noise (defaults to 0)

  • num_samples (int) -- the number of samples generated by the simulation (default to 100)

  • num_dimensions (int) -- the number of dimensions of the samples generated by the simulation (default to 1)

  • noise (float) -- the noise used in simulation (default to 0)

  • repeats (int) -- the number of times we generate new samples to estimate the null/alternative distribution (default to 1000)

  • alpha (float) -- the type I error level (default to 0.05)

  • simulation_type (string) -- specify simulation when necessary (default to empty string)

Return empirical_power

the estimated power

Return type

numpy.float

Example

>>> from mgcpy.benchmarks.power import power
>>> from mgcpy.independence_tests.mgc.mgc import MGC
>>> from mgcpy.benchmarks.simulations import circle_sim
>>> mgc = MGC()
>>> mgc_power = power(mgc, circle_sim, num_samples=100, num_dimensions=2, simulation_type='ellipse')
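
The idea behind an empirical power estimate can be sketched without mgcpy as follows. This is a minimal standard-library illustration under stated assumptions, not the library's implementation: toy_power, abs_pearson, linear_sampler and indep_sampler are hypothetical names, the test statistic is the absolute Pearson correlation rather than MGC, and the null distribution is approximated by shuffling y, which may differ from how power builds its null samples.

```python
import math
import random


def abs_pearson(x, y):
    """Absolute sample Pearson correlation, used here as a toy test statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def linear_sampler(num_samples, noise, rng):
    """Hypothetical dependent sample: y = x + noise * eps."""
    x = [rng.uniform(-1, 1) for _ in range(num_samples)]
    y = [xi + noise * rng.gauss(0, 1) for xi in x]
    return x, y


def indep_sampler(num_samples, noise, rng):
    """Hypothetical independent sample: y ignores x entirely."""
    x = [rng.uniform(-1, 1) for _ in range(num_samples)]
    y = [rng.gauss(0, 1) for _ in range(num_samples)]
    return x, y


def toy_power(statistic, sampler, num_samples=100, noise=0.0,
              repeats=1000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    alt_stats, null_stats = [], []
    for _ in range(repeats):
        x, y = sampler(num_samples, noise, rng)
        alt_stats.append(statistic(x, y))
        # Assumed null construction: shuffling y breaks any dependence on x.
        y_shuffled = y[:]
        rng.shuffle(y_shuffled)
        null_stats.append(statistic(x, y_shuffled))
    # Critical value: the (1 - alpha) empirical quantile of the null statistics.
    null_stats.sort()
    cutoff_index = min(math.ceil(repeats * (1 - alpha)), repeats) - 1
    cutoff = null_stats[cutoff_index]
    # Power: the fraction of alternative statistics that exceed the cutoff.
    return sum(s > cutoff for s in alt_stats) / repeats


dep_power = toy_power(abs_pearson, linear_sampler, num_samples=50, noise=0.1, repeats=200)
indep_power = toy_power(abs_pearson, indep_sampler, num_samples=50, repeats=200)
```

For a strongly dependent sampler the estimate should be close to 1, while for an independent sampler it should stay near alpha, the type I error level. Larger values of repeats reduce the Monte Carlo error of both the cutoff and the power estimate.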