Independence Tests

Multiscale Graph Correlation (MGC)

Main MGC Independence Test Module

class mgcpy.independence_tests.mgc.MGC(compute_distance_matrix=None, base_global_correlation='mgc')[source]
Parameters
  • compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix

  • base_global_correlation (string) -- specifies which global correlation to build upon: 'mgc', 'dcor', 'mantel', or 'rank'. Defaults to 'mgc'.
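As an illustration of the kind of callable that compute_distance_matrix describes, the sketch below builds a pairwise Euclidean distance matrix with NumPy. The name euclidean_distance_matrix is hypothetical and not part of mgcpy; the contract it follows (one data-matrix argument in, one [n*n] matrix out) is assumed from the parameter description above.

```python
import numpy as np

def euclidean_distance_matrix(data_matrix):
    """Return the [n*n] pairwise Euclidean distance matrix of an [n*p] data matrix."""
    data = np.asarray(data_matrix, dtype=float)
    if data.ndim == 1:
        # Treat a flat vector as n samples in 1 dimension.
        data = data.reshape(-1, 1)
    # Broadcast to all pairwise differences: shape (n, n, p).
    diff = data[:, None, :] - data[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Three samples in one dimension.
D = euclidean_distance_matrix(np.array([[0.0], [3.0], [7.0]]))
```

Under that assumption, such a function could be supplied as MGC(compute_distance_matrix=euclidean_distance_matrix).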

Methods

get_name(self)

Returns the name of the independence test.

p_value(self, matrix_X, matrix_Y[, ...])

Tests independence between two datasets using MGC and permutation test.

p_value_block(self, matrix_X, matrix_Y[, ...])

Tests independence between two datasets using block permutation test.

test_statistic(self, matrix_X, matrix_Y[, ...])

Computes the MGC measure between two datasets.

test_statistic(self, matrix_X, matrix_Y, is_fast=False, fast_mgc_data={})[source]

Computes the MGC measure between two datasets.

  • It first computes all the local correlations

  • Then, it returns the maximal statistic among all local correlations based on thresholding.
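The two steps above can be pictured with a deliberately simplified sketch. Here the threshold is passed in by hand and the fallback to the last matrix entry (taken to be the global correlation) is an assumption; the library derives its threshold from the data and may apply further processing, so thresholded_max is a hypothetical helper, not the MGC implementation.

```python
import numpy as np

def thresholded_max(local_corr, threshold):
    """Return the largest local correlation that reaches `threshold`, with its scale.

    Falls back to the last entry (largest scale in both dimensions, assumed to
    be the global correlation) when no entry reaches the threshold.
    """
    local_corr = np.asarray(local_corr, dtype=float)
    mask = local_corr >= threshold
    if not mask.any():
        k, l = local_corr.shape
        return float(local_corr[-1, -1]), [k, l]
    # Ignore entries below the threshold when searching for the maximum.
    masked = np.where(mask, local_corr, -np.inf)
    row, col = np.unravel_index(np.argmax(masked), masked.shape)
    # Scales are reported 1-based here (an assumption for illustration).
    return float(masked[row, col]), [int(row) + 1, int(col) + 1]

local_corr = np.array([[0.1, 0.2],
                       [0.6, 0.3]])
statistic, scale = thresholded_max(local_corr, threshold=0.5)
```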

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] data matrix, a matrix with n samples in q dimensions

  • is_fast (boolean) -- specifies whether the test statistic should be approximated using the fast version of MGC. Defaults to False.

  • fast_mgc_data (dictionary) --

    a dict of fast MGC parameters (see self._fast_mgc_test_statistic):

    • sub_samples

      specifies the number of subsamples.

Returns

returns a list of two items, that contains:

  • test_statistic

    the sample MGC statistic within [-1, 1]

  • independence_test_metadata

    a dict of metadata with the following keys:

    • local_correlation_matrix

      a 2D matrix of all local correlations within [-1, 1]

    • optimal_scale

      the estimated optimal scale as an [x, y] pair.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgc import MGC
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc = MGC()
>>> mgc_statistic, test_statistic_metadata = mgc.test_statistic(X, Y)
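Because matrix_X and matrix_Y may each be either a distance matrix or a data matrix, it can help to check which one an input resembles before calling the test. The helper below is a hypothetical sketch based only on the description above (square, zeros on the diagonal); it is not the check mgcpy performs, and a square data matrix that happens to have a zero diagonal would be misclassified.

```python
import numpy as np

def looks_like_distance_matrix(matrix):
    """Heuristic: True if `matrix` is square with zeros on its diagonal."""
    m = np.asarray(matrix, dtype=float)
    return bool(
        m.ndim == 2
        and m.shape[0] == m.shape[1]
        and np.allclose(np.diag(m), 0.0)
    )

data = np.arange(10.0).reshape(-1, 1)   # [n*p] data matrix with p = 1
dist = np.abs(data - data.T)            # [n*n] pairwise distances
```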
p_value(self, matrix_X, matrix_Y, replication_factor=1000, is_fast=False, fast_mgc_data={})[source]

Tests independence between two datasets using MGC and permutation test.

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] data matrix, a matrix with n samples in q dimensions

  • replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

  • is_fast (boolean) -- specifies whether the p-value should be approximated using the fast version of MGC. Defaults to False.

  • fast_mgc_data (dictionary) --

    a dict of fast MGC parameters (see self._fast_mgc_p_value):

    • sub_samples

      specifies the number of subsamples.

Returns

returns a list of two items, that contains:

  • p_value

    P-value of MGC

  • metadata

    a dict of metadata with the following keys:

    • test_statistic

      the sample MGC statistic within [-1, 1]

    • p_local_correlation_matrix

      a 2D matrix of the P-values of the local correlations

    • local_correlation_matrix

      a 2D matrix of all local correlations within [-1,1]

    • optimal_scale

      the estimated optimal scale as an [x, y] pair.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgc import MGC
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc = MGC()
>>> p_value, metadata = mgc.p_value(X, Y, replication_factor = 100)
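For readers unfamiliar with permutation tests, the following generic sketch shows how replication_factor enters the computation: the rows of Y are shuffled that many times and the statistic is recomputed each time. The function names, the absolute Pearson correlation used as a stand-in statistic, and the (1 + count) / (1 + replications) convention are all assumptions for illustration; the library's own procedure may differ.

```python
import numpy as np

def abs_pearson(x, y):
    """Stand-in test statistic: absolute Pearson correlation of two columns."""
    return abs(np.corrcoef(np.ravel(x), np.ravel(y))[0, 1])

def permutation_p_value(statistic, X, Y, replication_factor=1000, seed=0):
    """Estimate a p-value by recomputing `statistic` on row-permuted Y."""
    rng = np.random.default_rng(seed)
    observed = statistic(X, Y)
    null = np.empty(replication_factor)
    for r in range(replication_factor):
        # Shuffling the rows of Y breaks any pairing with X.
        null[r] = statistic(X, Y[rng.permutation(len(Y))])
    # Add-one convention keeps the estimate strictly positive.
    p = (1 + np.sum(null >= observed)) / (1 + replication_factor)
    return p, {"null_distribution": null}

x = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * x + 1.0
p_value, metadata = permutation_p_value(abs_pearson, x, y, replication_factor=200)
```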
get_name(self)
Returns

the name of the independence test

Return type

string

p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using block permutation test.
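A block permutation shuffles whole runs of consecutive samples rather than individual samples, so that short-range temporal structure survives inside each block. The sketch below shows one generic way to build such a permutation; block_permutation and its block_size argument are hypothetical, and how the library chooses its blocks is not described here.

```python
import numpy as np

def block_permutation(n, block_size, rng):
    """Permute range(n) by shuffling contiguous blocks, keeping order within each block."""
    blocks = [np.arange(start, min(start + block_size, n))
              for start in range(0, n, block_size)]
    order = rng.permutation(len(blocks))
    return np.concatenate([blocks[i] for i in order])

rng = np.random.default_rng(0)
perm = block_permutation(10, block_size=3, rng=rng)
```

Indexing one series with such a permutation (for example Y[perm]) gives one replicate of a block permutation test.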

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] data matrix, a matrix with n samples in q dimensions

  • replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

  • p_value

    P-value of MGC

  • metadata

    a dict of metadata with the following keys:

    • null_distribution

      a numpy array representing the distribution of the test statistic under the null.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgc import MGC
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc = MGC()
>>> p_value, metadata = mgc.p_value_block(X, Y, replication_factor = 100)

MGC Time Series

class mgcpy.independence_tests.mgcx.MGCX(compute_distance_matrix=None, max_lag=0)[source]
Parameters
  • compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix

  • base_global_correlation (string) -- specifies which global correlation to build upon: 'mgc', 'dcor', 'mantel', or 'rank'. Defaults to 'mgc'.

  • max_lag (int) -- Furthest lag to check for dependence.

Methods

get_name(self)

Returns the name of the independence test.

p_value(self, matrix_X, matrix_Y[, ...])

Tests independence between two datasets using MGCX and a block permutation test.

p_value_block(self, matrix_X, matrix_Y[, ...])

Tests independence between two datasets using block permutation test.

test_statistic(self, matrix_X, matrix_Y[, p])

Computes the MGCX measure between two time series datasets.

test_statistic(self, matrix_X, matrix_Y, p=None)[source]

Computes the MGCX measure between two time series datasets.

  • It first computes all the local correlations

  • Then, it returns the maximal statistic among all local correlations based on thresholding.

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] data matrix, a matrix with n samples in q dimensions

  • p (float) -- bandwidth parameter for Bartlett Kernel.
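For orientation, one common form of the Bartlett (triangular) kernel gives lag j the weight 1 - |j|/p, clipped at zero, so that more distant lags count less. The sketch below assumes that form; bartlett_weight is a hypothetical helper, and neither the exact kernel form used by the library nor the value it substitutes when p is None is specified here.

```python
def bartlett_weight(j, p):
    """Bartlett (triangular) kernel weight for lag j with bandwidth p > 0."""
    if p <= 0:
        raise ValueError("bandwidth p must be positive")
    return max(0.0, 1.0 - abs(j) / p)

# Weights for lags 0..5 with bandwidth 4: they decay linearly, then stay at zero.
weights = [bartlett_weight(j, p=4.0) for j in range(6)]
```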

Returns

returns a list of two items, that contains:

  • test_statistic

    the sample MGCX statistic (not necessarily within [-1, 1])

  • test_statistic_metadata

    a dict of metadata with the following keys:

    • dist_mtx_X

      the distance matrix of sample X

    • dist_mtx_Y

      the distance matrix of sample Y

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgcx import MGCX
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgcx = MGCX()
>>> mgcx_statistic, test_statistic_metadata = mgcx.test_statistic(X, Y)
p_value(self, matrix_X, matrix_Y, replication_factor=1000)[source]

Tests independence between two datasets using MGCX and a block permutation test.

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] data matrix, a matrix with n samples in q dimensions

  • replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

  • p_value

    P-value of MGC

  • metadata

    a dict of metadata with the following keys:

    • null_distribution

      a numpy array representing the distribution of the test statistic under the null.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgcx import MGCX
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgcx = MGCX()
>>> p_value, metadata = mgcx.p_value(X, Y, replication_factor = 100)
get_name(self)
Returns

the name of the independence test

Return type

string

p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using block permutation test.

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] data matrix, a matrix with n samples in q dimensions

  • replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

  • p_value

    P-value of MGC

  • metadata

    a dict of metadata with the following keys:

    • null_distribution

      a numpy array representing the distribution of the test statistic under the null.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgcx import MGCX
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgcx = MGCX()
>>> p_value, metadata = mgcx.p_value_block(X, Y, replication_factor = 100)

Biased and Unbiased Distance Correlation (Dcorr) and Mantel

class mgcpy.independence_tests.dcorr.DCorr(compute_distance_matrix=None, which_test='unbiased', is_paired=False)[source]
Parameters
  • compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix

  • which_test (string) -- the type of global correlation to use: 'unbiased', 'biased', or 'mantel'

Methods

compute_global_covariance(self, dist_mtx_X, ...)

Helper function: computes the global covariance from the distance matrices of X and Y

get_name(self)

Returns the name of the independence test.

p_value(self, matrix_X, matrix_Y[, ...])

Computes the p-value. If the correlation test is unbiased, the p-value can be computed using a t-test; otherwise it is computed using a permutation test.

p_value_block(self, matrix_X, matrix_Y[, ...])

Tests independence between two datasets using block permutation test.

test_statistic(self, matrix_X, matrix_Y[, ...])

Computes the distance correlation between two datasets.

unbiased_T(self, matrix_X, matrix_Y)

Helper function: Compute the t-test statistic for unbiased dcorr

test_statistic(self, matrix_X, matrix_Y, is_fast=False, fast_dcorr_data={})[source]

Computes the distance correlation between two datasets.

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] data matrix, a matrix with n samples in q dimensions

  • is_fast (boolean) -- specifies whether the test statistic should be approximated using the fast version of dcorr. Defaults to False.

  • fast_dcorr_data (dictionary) --

    a dict of fast dcorr parameters (see self._fast_dcorr_test_statistic):

    • sub_samples

      specifies the number of subsamples.

Returns

returns a list of two items, that contains:

  • test_statistic

    the sample dcorr statistic within [-1, 1]

  • independence_test_metadata

    a dict of metadata with the following keys:

    • variance_X

      the variance of the data matrix X

    • variance_Y

      the variance of the data matrix Y

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.dcorr import DCorr
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorr = DCorr(which_test = 'unbiased')
>>> dcorr_statistic, test_statistic_metadata = dcorr.test_statistic(X, Y)
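To make the quantity concrete, here is a minimal NumPy sketch of the biased estimate computed from two distance matrices by double-centering. It follows the textbook definition (the result is the squared sample distance correlation in Székely's notation) and is not taken from the mgcpy source; double_center and biased_dcorr are hypothetical names.

```python
import numpy as np

def double_center(D):
    """Subtract row and column means and add back the grand mean."""
    return (D - D.mean(axis=0, keepdims=True)
              - D.mean(axis=1, keepdims=True) + D.mean())

def biased_dcorr(dist_X, dist_Y):
    """Biased distance correlation estimate from two [n*n] distance matrices."""
    A = double_center(np.asarray(dist_X, dtype=float))
    B = double_center(np.asarray(dist_Y, dtype=float))
    covariance = (A * B).mean()
    variance_X = (A * A).mean()
    variance_Y = (B * B).mean()
    denominator = np.sqrt(variance_X * variance_Y)
    # A constant sample has zero distance variance; report 0 in that case.
    return 0.0 if denominator == 0 else covariance / denominator

x = np.array([0.0, 1.0, 3.0, 6.0, 10.0])
y = 2.0 * x + 1.0                       # exact linear dependence
DX = np.abs(x[:, None] - x[None, :])
DY = np.abs(y[:, None] - y[None, :])
r = biased_dcorr(DX, DY)
```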
compute_global_covariance(self, dist_mtx_X, dist_mtx_Y)[source]

Helper function: computes the global covariance from the distance matrices dist_mtx_X and dist_mtx_Y

Parameters
  • dist_mtx_X (2D numpy.array) -- a [n*n] distance matrix

  • dist_mtx_Y (2D numpy.array) -- a [n*n] distance matrix

Returns

the data covariance or variance based on the distance matrices

Return type

numpy.float

unbiased_T(self, matrix_X, matrix_Y)[source]

Helper function: Compute the t-test statistic for unbiased dcorr

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] matrix, a matrix with n samples in q dimensions

Returns

test statistic of t-test for unbiased dcorr

Return type

numpy.float
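As a point of reference, the t-test proposed by Székely and Rizzo for the bias-corrected distance correlation R with n samples uses T = sqrt(v - 1) * R / sqrt(1 - R^2), where v = n(n - 3)/2. The sketch below implements that expression; whether unbiased_T uses exactly this form is an assumption, and unbiased_dcorr_t is a hypothetical name.

```python
import math

def unbiased_dcorr_t(r, n):
    """t statistic for an unbiased distance correlation r computed from n samples."""
    if n < 4:
        raise ValueError("at least 4 samples are needed")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    v = n * (n - 3) / 2.0
    return math.sqrt(v - 1.0) * r / math.sqrt(1.0 - r ** 2)

t_stat = unbiased_dcorr_t(0.5, n=10)
```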

p_value(self, matrix_X, matrix_Y, replication_factor=1000, is_fast=False, fast_dcorr_data={})[source]

Computes the p-value. If the correlation test is unbiased, the p-value can be computed using a t-test; otherwise it is computed using a permutation test.

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] data matrix, a matrix with n samples in q dimensions

  • replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

  • is_fast (boolean) -- specifies whether the p-value should be approximated using the fast version of dcorr. Defaults to False.

  • fast_dcorr_data (dictionary) --

    a dict of fast dcorr parameters (see self._fast_dcorr_test_statistic):

    • sub_samples

      specifies the number of subsamples.

Returns

p-value of distance correlation

Return type

numpy.float

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.dcorr import DCorr
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorr = DCorr()
>>> p_value, metadata = dcorr.p_value(X, Y, replication_factor = 100)
get_name(self)
Returns

the name of the independence test

Return type

string

p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using block permutation test.

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] data matrix, a matrix with n samples in q dimensions

  • replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

  • p_value

    P-value of MGC

  • metadata

    a dict of metadata with the following keys:

    • null_distribution

      a numpy array representing the distribution of the test statistic under the null.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.dcorr import DCorr
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorr = DCorr()
>>> p_value, metadata = dcorr.p_value_block(X, Y, replication_factor = 100)

Dcorr Time Series

class mgcpy.independence_tests.dcorrx.DCorrX(compute_distance_matrix=None, which_test='unbiased', max_lag=0)[source]
Parameters
  • compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix

  • which_test (string) -- the type of distance covariance estimate to use: 'unbiased', 'biased', or 'mantel'

  • max_lag (int) -- Maximum lead/lag to check for dependence between X_t and Y_t+j (M parameter)
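The pairing of X_t with Y_t+j that max_lag refers to can be pictured with simple slicing: for lag j, the first n - j samples of X are aligned with the last n - j samples of Y. The helper below is a hypothetical sketch of that alignment for non-negative lags only; it does not reproduce how the library combines the per-lag statistics.

```python
import numpy as np

def lagged_pairs(X, Y, j):
    """Align X_t with Y_{t+j} for a lag 0 <= j < n; returns the two aligned arrays."""
    n = len(X)
    if len(Y) != n:
        raise ValueError("X and Y must have the same number of samples")
    if not 0 <= j < n:
        raise ValueError("lag j must satisfy 0 <= j < n")
    return X[: n - j], Y[j:]

X = np.arange(5).reshape(-1, 1)
Y = 10 * np.arange(5).reshape(-1, 1)
max_lag = 2
# One aligned pair of series per lag j = 0, ..., max_lag.
pairs = [lagged_pairs(X, Y, j) for j in range(max_lag + 1)]
```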

Methods

get_name(self)

Returns the name of the independence test.

p_value(self, matrix_X, matrix_Y[, ...])

Computes the p-value. If the correlation test is unbiased, the p-value can be computed using a t-test; otherwise it is computed using a permutation test.

p_value_block(self, matrix_X, matrix_Y[, ...])

Tests independence between two datasets using block permutation test.

test_statistic(self, matrix_X, matrix_Y[, p])

Computes the (summed across lags) cross distance covariance estimate between two time series.

test_statistic(self, matrix_X, matrix_Y, p=None)[source]

Computes the (summed across lags) cross distance covariance estimate between two time series.

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] data matrix, a matrix with n samples in q dimensions

  • p (float) -- bandwidth parameter for Bartlett Kernel.

Returns

returns a list of two items, that contains:

  • test_statistic

    the sample cross-distance covariance (cdcv) statistic (not necessarily within [-1, 1])

  • test_statistic_metadata

    a dict of metadata with the following keys:

    • dist_mtx_X

      the distance matrix of sample X

    • dist_mtx_Y

      the distance matrix of sample Y

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.dcorrx import DCorrX
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorrx = DCorrX(which_test = 'unbiased')
>>> dcorrx_statistic, test_statistic_metadata = dcorrx.test_statistic(X, Y)
p_value(self, matrix_X, matrix_Y, replication_factor=1000)[source]

Computes the p-value. If the correlation test is unbiased, the p-value can be computed using a t-test; otherwise it is computed using a permutation test.

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] data matrix, a matrix with n samples in q dimensions

  • replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

  • p_value

    a numpy.float containing the p-value of the observed test statistic.

  • p_value_metadata

    a dict of metadata with the following keys:

    • null_distribution

      the estimated (discrete) distribution of the test statistic

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.dcorrx import DCorrX
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorrx = DCorrX()
>>> p_value, metadata = dcorrx.p_value(X, Y, replication_factor = 100)
get_name(self)
Returns

the name of the independence test

Return type

string

p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using block permutation test.

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] data matrix, a matrix with n samples in q dimensions

  • replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

  • p_value

    P-value of MGC

  • metadata

    a dict of metadata with the following keys:

    • null_distribution

      a numpy array representing the distribution of the test statistic under the null.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.dcorrx import DCorrX
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorrx = DCorrX()
>>> p_value, metadata = dcorrx.p_value_block(X, Y, replication_factor = 100)

Heller Heller Gorfine (HHG)

class mgcpy.independence_tests.hhg.HHG(compute_distance_matrix=None)[source]
Parameters

compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix

Methods

get_name(self)

Returns the name of the independence test.

p_value(self[, matrix_X, matrix_Y, ...])

Tests independence between two datasets using HHG and permutation test.

p_value_block(self, matrix_X, matrix_Y[, ...])

Tests independence between two datasets using block permutation test.

test_statistic(self, matrix_X, matrix_Y)

Computes the HHG correlation measure between two datasets.

test_statistic(self, matrix_X, matrix_Y)[source]

Computes the HHG correlation measure between two datasets.

Parameters
  • matrix_X (2D numpy.array) -- a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) -- a [n*q] data matrix, a matrix with n samples in q dimensions


Returns

returns a list of two items: the HHG test statistic (a float) and a dict of test statistic metadata.

Return type

float, dict

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.hhg import HHG
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> hhg = HHG()
>>> hhg_test_stat = hhg.test_statistic(X, Y)
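For intuition about what the HHG measure aggregates, the sketch below gives a naive O(n^3) version of the statistic as it is usually defined: for every ordered pair (i, j), the remaining points are cross-classified by whether they lie within distance d(i, j) of point i in X and in Y, and a Pearson chi-square term is accumulated from the resulting 2x2 table. It takes distance matrices as input, is written from the published definition rather than from the mgcpy source, and hhg_statistic is a hypothetical name.

```python
import numpy as np

def hhg_statistic(dist_X, dist_Y):
    """Naive HHG statistic from two [n*n] distance matrices."""
    dx = np.asarray(dist_X, dtype=float)
    dy = np.asarray(dist_Y, dtype=float)
    n = dx.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            others = [k for k in range(n) if k != i and k != j]
            near_x = dx[i, others] <= dx[i, j]
            near_y = dy[i, others] <= dy[i, j]
            # Cells of the 2x2 table for the pair (i, j).
            a11 = np.sum(near_x & near_y)
            a12 = np.sum(near_x & ~near_y)
            a21 = np.sum(~near_x & near_y)
            a22 = np.sum(~near_x & ~near_y)
            denom = (a11 + a12) * (a21 + a22) * (a11 + a21) * (a12 + a22)
            if denom > 0:  # skip degenerate tables with an empty margin
                total += (n - 2) * (a12 * a21 - a11 * a22) ** 2 / denom
    return float(total)

x = np.arange(5.0)
D = np.abs(x[:, None] - x[None, :])
T = hhg_statistic(D, D)   # identical inputs: strongest possible dependence
```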
p_value(self, matrix_X=None, matrix_Y=None, replication_factor=1000)[source]

Tests independence between two datasets using HHG and permutation test.

Parameters
  • matrix_X (2D numpy.array) -- a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) -- a [n*q] data matrix, a matrix with n samples in q dimensions

  • replication_factor (int) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

  • p_value

    P-value

  • p_value_metadata

    (optional) a dict of metadata other than the p_value, which the independence test computes in the process

Return type

float, dict

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.hhg import HHG
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> hhg = HHG()
>>> hhg_p_value = hhg.p_value(X, Y)
get_name(self)
Returns

the name of the independence test

Return type

string

p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using block permutation test.

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] data matrix, a matrix with n samples in q dimensions

  • replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

  • p_value

    P-value of MGC

  • metadata

    a dict of metadata with the following keys:

    • null_distribution

      a numpy array representing the distribution of the test statistic under the null.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.hhg import HHG
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> hhg = HHG()
>>> p_value, metadata = hhg.p_value_block(X, Y, replication_factor = 100)

Kendall and Spearman

class mgcpy.independence_tests.kendall_spearman.KendallSpearman(compute_distance_matrix=None, which_test='kendall')[source]
Parameters
  • compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix

  • which_test (str) -- specifies which test to use, including 'kendall' or 'spearman'

Methods

get_name(self)

Returns the name of the independence test.

p_value(self, matrix_X, matrix_Y[, ...])

Tests independence between two datasets using the independence test.

p_value_block(self, matrix_X, matrix_Y[, ...])

Tests independence between two datasets using block permutation test.

test_statistic(self, matrix_X, matrix_Y)

Computes the Spearman's rho or Kendall's tau measure between two datasets.

test_statistic(self, matrix_X, matrix_Y)[source]

Computes the Spearman's rho or Kendall's tau measure between two datasets, using the scipy.stats implementation for both.

Parameters
  • matrix_X (1D numpy.array) -- a [n*1] data matrix, a matrix with n samples in 1 dimension

  • matrix_Y (1D numpy.array) -- a [n*1] data matrix, a matrix with n samples in 1 dimension

Returns

returns a list of two items: the Spearman's rho or Kendall's tau statistic (a float) and a dict of test statistic metadata.

Return type

float, dict

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.kendall_spearman import KendallSpearman
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> kendall_spearman = KendallSpearman()
>>> kendall_spearman_stat = kendall_spearman.test_statistic(X, Y)
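To show what the two rank measures compute, here is a small pure-Python sketch that assumes no tied values. For tie-free data the results should agree with scipy.stats.kendalltau and scipy.stats.spearmanr; the function names below are hypothetical and this is not the code path the class uses.

```python
def kendall_tau(x, y):
    """Kendall's tau for tie-free data: (concordant - discordant) / number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def ranks(values):
    """1-based ranks of tie-free values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rho for tie-free data via the squared rank differences."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

tau = kendall_tau([1, 2, 3], [1, 3, 2])
rho = spearman_rho([1, 2, 3], [1, 3, 2])
```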
p_value(self, matrix_X, matrix_Y, replication_factor=1000)[source]

Tests independence between two datasets using the independence test.

Parameters
  • matrix_X (2D numpy.array) -- a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) -- a [n*q] data matrix, a matrix with n samples in q dimensions

  • replication_factor (int) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

  • p_value_

    P-value

  • p_value_metadata_

    (optional) a dict of metadata other than the p_value, that the independence tests computes in the process

Return type

float, dict
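
The role of replication_factor can be illustrated with a generic permutation test. The sketch below is an assumption about the general technique, not the library's code: it shuffles one sample replication_factor times, recomputes a statistic each time, and reports the share of shuffled statistics at least as extreme as the observed one. The names pearson and permutation_p_value, the seed argument, the add-one correction, and the "null_distribution" metadata key are illustrative choices.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x, y, statistic, replication_factor=1000, seed=0):
    """Two-sided permutation p-value; shuffling y breaks any pairing with x."""
    rng = random.Random(seed)
    observed = abs(statistic(x, y))
    shuffled = list(y)
    null_distribution = []
    for _ in range(replication_factor):
        rng.shuffle(shuffled)
        null_distribution.append(abs(statistic(x, shuffled)))
    exceed = sum(1 for value in null_distribution if value >= observed)
    # Add-one correction keeps the estimate strictly positive.
    p_value = (exceed + 1) / (replication_factor + 1)
    return p_value, {"null_distribution": null_distribution}


# Illustrative data with an exact linear relationship.
x = [float(i) for i in range(10)]
y = [2.0 * v + 1.0 for v in x]
p_value, metadata = permutation_p_value(x, y, pearson, replication_factor=200)
```

A larger replication_factor gives a finer-grained p-value at a proportionally higher cost, since the statistic is recomputed once per replication.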

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.kendall_spearman import KendallSpearman
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> kendall_spearman = KendallSpearman()
>>> kendall_spearman_p_value = kendall_spearman.p_value(X, Y)

get_name(self)
Returns

the name of the independence test

Return type

string

p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using block permutation test.

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] data matrix, a matrix with n samples in q dimensions

  • replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

  • p_value

    P-value of MGC

  • metadata

    a dict of metadata with the following keys:

    • null_distribution: numpy array representing the distribution of the test statistic under the null.

Return type

list
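
A block permutation test differs from an ordinary permutation test in that contiguous runs of samples are moved together, which preserves short-range ordering within each block. The sketch below shows one common scheme and is an assumption, not the library's procedure: the block_size and seed parameters, the helper names, and the simple statistic are all hypothetical. The metadata key null_distribution follows the return description above.

```python
import random


def block_permutation(n, block_size, rng):
    """Return indices 0..n-1 with contiguous blocks kept intact but reordered."""
    blocks = [list(range(start, min(start + block_size, n)))
              for start in range(0, n, block_size)]
    rng.shuffle(blocks)
    return [index for block in blocks for index in block]


def abs_covariance(x, y):
    """Absolute sample covariance, used here only as a simple test statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return abs(sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n)


def block_permutation_p_value(x, y, statistic, replication_factor=1000,
                              block_size=3, seed=0):
    """Permutation p-value in which y is reordered block by block."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    null_distribution = []
    for _ in range(replication_factor):
        order = block_permutation(len(y), block_size, rng)
        null_distribution.append(statistic(x, [y[i] for i in order]))
    exceed = sum(1 for value in null_distribution if value >= observed)
    p_value = (exceed + 1) / (replication_factor + 1)
    return p_value, {"null_distribution": null_distribution}


# Illustrative data: y is a scaled copy of x.
x = [float(i) for i in range(12)]
y = [3.0 * v for v in x]
p_value, metadata = block_permutation_p_value(x, y, abs_covariance,
                                              replication_factor=100)
```

Keeping blocks intact is the usual motivation for this variant when neighbouring samples are not exchangeable, for example in time series.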

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc_ts import MGC_TS
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc_ts = MGC_TS()
>>> p_value, metadata = mgc_ts.p_value(X, Y, replication_factor = 100)

Multivariate Distance Matrix Regression (MDMR)

Main MDMR Independence Test Module

class mgcpy.independence_tests.mdmr.MDMR(compute_distance_matrix=None)[source]
Parameters

compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix

Methods

get_name(self)

Returns the name of the independence test.

ind_p_value(self, matrix_X, matrix_Y[, ...])

Individual predictor variable p-values calculation

p_value(self, matrix_X, matrix_Y[, ...])

Tests independence between two datasets using MDMR and permutation test.

p_value_block(self, matrix_X, matrix_Y[, ...])

Tests independence between two datasets using block permutation test.

test_statistic(self, matrix_X, matrix_Y[, ...])

Computes MDMR Pseudo-F statistic between two datasets.

get_name(self)[source]
Returns

the name of the independence test

Return type

string

test_statistic(self, matrix_X, matrix_Y, permutations=0, individual=0, disttype='cityblock')[source]

Computes MDMR Pseudo-F statistic between two datasets.

  • It first takes the distance matrix of Y (by )

  • Next it regresses X into a portion due to Y and a portion due to residual

  • The p-value is for the null hypothesis that the variable of X is not correlated with Y's distance matrix
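
The steps above can be sketched in plain Python for the special case of a single predictor column. This follows the textbook pseudo-F construction (Gower-centre the distance matrix, then compare the variation explained by the predictor with the residual variation) and is an illustration under those assumptions, not the library's implementation. All helper names are hypothetical, and cityblock distance is used only because it is the default disttype in the signature.

```python
def cityblock_distances(rows):
    """Pairwise cityblock (L1) distances between the rows of a data matrix."""
    return [[sum(abs(a - b) for a, b in zip(r, s)) for s in rows] for r in rows]


def gower_center(distances):
    """Double-centre -0.5 * D**2 so that every row and column sums to zero."""
    n = len(distances)
    a = [[-0.5 * d * d for d in row] for row in distances]
    row_means = [sum(row) / n for row in a]
    col_means = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[a[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def pseudo_f_single_predictor(x, y_rows):
    """Pseudo-F for one predictor x (with intercept) against the distances of y_rows."""
    n = len(x)
    g = gower_center(cityblock_distances(y_rows))
    mean_x = sum(x) / n
    xc = [v - mean_x for v in x]
    # Variation in G along the centred predictor, i.e. trace(H G).
    explained = sum(xc[i] * g[i][j] * xc[j]
                    for i in range(n) for j in range(n)) / sum(v * v for v in xc)
    # Remaining variation, i.e. trace((I - H) G).
    residual = sum(g[i][i] for i in range(n)) - explained
    # Degrees of freedom: 1 for the predictor, n - 2 for the residual.
    return explained / (residual / (n - 2))


# Illustrative data: five samples, one predictor, one response dimension.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [[2.0], [1.0], [4.0], [3.0], [6.0]]
f_stat = pseudo_f_single_predictor(x, y)
```

When Y has a single column, the centred matrix is the outer product of the centred responses, so the pseudo-F coincides with the ordinary regression F statistic. With several response dimensions the same formula applies to whatever distance is chosen.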

Parameters
  • matrix_X (2D numpy.array) --

    (optional, default picked from class attr) is interpreted as:

    • a [n*d] data matrix, a matrix with n samples in d dimensions

  • matrix_Y (2D numpy.array) --

    (optional, default picked from class attr) is interpreted as:

    • a [n*d] data matrix, a matrix with n samples in d dimensions

  • individual (integer) -- 0 or 1. With value 0, tests the entire X matrix (default); with value 1, tests the entire X matrix and then each predictor variable individually

Returns

with individual = 0, returns 1 value; with individual = 1, returns 2 values, containing:

  • the test statistic of the entire X matrix

  • for individual = 1, an array with the variable of X in the first column, the test statistic in the second, and the permutation p-value in the third (which here will always be 1)

Return type

list

p_value(self, matrix_X, matrix_Y, replication_factor=1000)[source]

Tests independence between two datasets using MDMR and permutation test.

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as:

    • a [n*d] data matrix, a matrix with n samples in d dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as:

    • a [n*d] data matrix, a matrix with n samples in d dimensions

  • replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

  • p_value

    P-value of MDMR

  • p_value_metadata

Return type

list

ind_p_value(self, matrix_X, matrix_Y, permutations=1000, individual=1, disttype='cityblock')[source]

Individual predictor variable p-values calculation

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as:

    • a [n*d] data matrix, a matrix with n samples in d dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as:

    • a [n*d] data matrix, a matrix with n samples in d dimensions

p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using block permutation test.

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] data matrix, a matrix with n samples in q dimensions

  • replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

  • p_value

    P-value of MGC

  • metadata

    a dict of metadata with the following keys:

    • null_distribution: numpy array representing the distribution of the test statistic under the null.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc_ts import MGC_TS
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc_ts = MGC_TS()
>>> p_value, metadata = mgc_ts.p_value(X, Y, replication_factor = 100)

Pearson's Correlation, RV, Canonical Correlation Analysis (CCA)

class mgcpy.independence_tests.rv_corr.RVCorr(compute_distance_matrix=None, which_test='rv')[source]
Parameters
  • compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix

  • which_test (str) -- specifies which test to use, including 'rv', 'pearson', and 'cca'.

Methods

get_name(self)

Returns the name of the independence test.

p_value(self, matrix_X, matrix_Y[, ...])

Tests independence between two datasets using the independence test.

p_value_block(self, matrix_X, matrix_Y[, ...])

Tests independence between two datasets using block permutation test.

test_statistic(self[, matrix_X, matrix_Y])

Computes the Pearson/RV/CCA correlation measure between two datasets.

test_statistic(self, matrix_X=None, matrix_Y=None)[source]

Computes the Pearson/RV/CCA correlation measure between two datasets.

  • Default computes linear correlation for RV

  • Computes Pearson's correlation

  • Calculates local linear correlations for CCA

Parameters
  • matrix_X (2D numpy.array) -- a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) -- a [n*q] data matrix, a matrix with n samples in q dimensions

Returns

returns a list of two items, that contains:

Return type

float, dict
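
As a rough illustration of the default 'rv' measure, the sketch below computes the RV coefficient from centred data matrices using only the standard library. It follows the usual textbook definition, trace(Sxy Syx) / sqrt(trace(Sxx²) trace(Syy²)), and is an assumption about the measure rather than the library's code. The helper names are hypothetical.

```python
import math


def center_columns(rows):
    """Subtract each column's mean from a row-major data matrix."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def cross_product(a, b):
    """Return the matrix product a' b for row-major matrices with equal row counts."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a, b))
             for j in range(len(b[0]))] for i in range(len(a[0]))]


def frobenius_squared(m):
    """Sum of squared entries, equal to trace(m m') for any matrix m."""
    return sum(v * v for row in m for v in row)


def rv_coefficient(x_rows, y_rows):
    """RV coefficient between an [n*p] and an [n*q] data matrix."""
    xc, yc = center_columns(x_rows), center_columns(y_rows)
    s_xy = cross_product(xc, yc)
    s_xx = cross_product(xc, xc)
    s_yy = cross_product(yc, yc)
    return frobenius_squared(s_xy) / math.sqrt(
        frobenius_squared(s_xx) * frobenius_squared(s_yy))


# Illustrative data: five samples, one dimension each.
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
Y = [[2.0], [1.0], [4.0], [3.0], [6.0]]
rv = rv_coefficient(X, Y)
```

When both inputs have a single column, this RV coefficient equals the squared Pearson correlation, which is one way to see how the 'rv' and 'pearson' options relate.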

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.rv_corr import RVCorr
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> rvcorr = RVCorr()
>>> rvcorr_test_stat = rvcorr.test_statistic(X, Y)

p_value(self, matrix_X, matrix_Y, replication_factor=1000)[source]

Tests independence between two datasets using the independence test.

Parameters
  • matrix_X (2D numpy.array) -- a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) -- a [n*q] data matrix, a matrix with n samples in q dimensions

  • replication_factor (int) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

  • p_value_

    P-value

  • p_value_metadata_

    (optional) a dict of metadata other than the p_value, that the independence tests computes in the process

Return type

float, dict

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.rv_corr import RVCorr
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> rvcorr = RVCorr()
>>> rvcorr_p_value = rvcorr.p_value(X, Y)

get_name(self)
Returns

the name of the independence test

Return type

string

p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using block permutation test.

Parameters
  • matrix_X (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) --

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*q] data matrix, a matrix with n samples in q dimensions

  • replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

  • p_value

    P-value of MGC

  • metadata

    a dict of metadata with the following keys:

    • null_distribution: numpy array representing the distribution of the test statistic under the null.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc_ts import MGC_TS
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc_ts = MGC_TS()
>>> p_value, metadata = mgc_ts.p_value(X, Y, replication_factor = 100)