Independence Tests

Multiscale Graph Correlation (MGC)

Main MGC Independence Test Module

class mgcpy.independence_tests.mgc.MGC(compute_distance_matrix=None, base_global_correlation='mgc')

Parameters

compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix
base_global_correlation (string) -- specifies which global correlation to build upon: one of 'mgc', 'dcor', 'mantel', or 'rank'. Defaults to 'mgc'.
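
As a rough sketch of what a compute_distance_matrix callable might look like, the function below turns an [n*p] data matrix into an [n*n] Euclidean distance matrix with SciPy. The name euclidean_distance_matrix is hypothetical, and the calling convention (one data matrix in, one square distance matrix out) is an assumption based on the parameter description above.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def euclidean_distance_matrix(data_matrix):
    """Return the [n*n] pairwise Euclidean distance matrix of an [n*p] data matrix.

    Hypothetical helper: the signature mgcpy expects is assumed, not confirmed.
    """
    data_matrix = np.asarray(data_matrix, dtype=float)
    if data_matrix.ndim == 1:
        # treat a 1D array as n samples in 1 dimension
        data_matrix = data_matrix.reshape(-1, 1)
    # pdist returns the condensed distances; squareform expands them to [n*n]
    return squareform(pdist(data_matrix, metric="euclidean"))


X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = euclidean_distance_matrix(X)
```

The result is square and symmetric with zeros on the diagonal, which matches the distance-matrix form that the methods below accept.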

Methods

`get_name`(self)	return the name of the independence test
`p_value`(self, matrix_X, matrix_Y[, ...])	Tests independence between two datasets using MGC and permutation test.
`p_value_block`(self, matrix_X, matrix_Y[, ...])	Tests independence between two datasets using block permutation test.
`test_statistic`(self, matrix_X, matrix_Y[, ...])	Computes the MGC measure between two datasets.

test_statistic(self, matrix_X, matrix_Y, is_fast=False, fast_mgc_data={})

Computes the MGC measure between two datasets.

It first computes all the local correlations.

Then, it returns the maximal statistic among all local correlations based on thresholding.

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions
is_fast (boolean) -- a boolean flag that specifies whether the test statistic should be approximated using the fast version of MGC. Defaults to False.
fast_mgc_data (dictionary) --
a dict of fast MGC parameters; refer to self._fast_mgc_test_statistic
- sub_samples
  
  specifies the number of subsamples.

Returns

returns a list of two items, that contains:

test_statistic

the sample MGC statistic within [-1, 1]
independence_test_metadata

a dict of metadata with the following keys:
- local_correlation_matrix
  
  a 2D matrix of all local correlations within [-1,1]
- optimal_scale
  
  the estimated optimal scale as an [x, y] pair.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgc import MGC
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc = MGC()
>>> mgc_statistic, test_statistic_metadata = mgc.test_statistic(X, Y)

p_value(self, matrix_X, matrix_Y, replication_factor=1000, is_fast=False, fast_mgc_data={})

Tests independence between two datasets using MGC and permutation test.

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions
replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.
is_fast (boolean) -- a boolean flag that specifies whether the p-value should be approximated using the fast version of MGC. Defaults to False.
fast_mgc_data (dictionary) --
a dict of fast MGC parameters; refer to self._fast_mgc_p_value
- sub_samples
  
  specifies the number of subsamples.

Returns

returns a list of two items, that contains:

p_value

P-value of MGC
metadata
a dict of metadata with the following keys:
- test_statistic
  
  the sample MGC statistic within [-1, 1]
- p_local_correlation_matrix
  
  a 2D matrix of the P-values of the local correlations
- local_correlation_matrix
  
  a 2D matrix of all local correlations within [-1,1]
- optimal_scale
  
  the estimated optimal scale as an [x, y] pair.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgc import MGC
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc = MGC()
>>> p_value, metadata = mgc.p_value(X, Y, replication_factor = 100)
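
The general idea of a permutation test, and the role of replication_factor, can be sketched independently of MGC. The code below is a generic illustration, not mgcpy's implementation: permutation_p_value and abs_pearson are hypothetical names, and the absolute Pearson correlation is only a stand-in for the MGC statistic.

```python
import numpy as np


def permutation_p_value(x, y, statistic, replication_factor=1000, seed=0):
    """Estimate a p-value by repeatedly permuting the samples of y."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    null_statistics = np.empty(replication_factor)
    for i in range(replication_factor):
        # shuffling y breaks any pairing between x and y
        null_statistics[i] = statistic(x, rng.permutation(y))
    # fraction of permuted statistics at least as large as the observed one
    p_value = np.mean(null_statistics >= observed)
    return p_value, null_statistics


def abs_pearson(x, y):
    """Stand-in statistic: absolute Pearson correlation (not MGC)."""
    return abs(np.corrcoef(x.ravel(), y.ravel())[0, 1])


x = np.linspace(-1, 1, 30)
y = 2 * x + 0.01 * np.sin(37 * x)   # almost perfectly linear in x
p_value, null_statistics = permutation_p_value(
    x, y, abs_pearson, replication_factor=200
)
```

A larger replication_factor gives a finer-grained p-value estimate at a proportionally higher computational cost.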

get_name(self)

Returns: the name of the independence test
Return type: string

p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using block permutation test.

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions
replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

p_value

P-value of MGC
metadata

a dict of metadata with the following keys:
- null_distribution
  
  numpy array representing the distribution of the test statistic under the null.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgc import MGC
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc = MGC()
>>> p_value, metadata = mgc.p_value_block(X, Y, replication_factor=100)
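
One common form of block permutation keeps contiguous runs of samples together and shuffles only the order of the runs, which preserves short-range dependence within each run. The sketch below illustrates that idea only; block_permutation and its block_size argument are hypothetical, and the blocking scheme mgcpy actually uses is not described in this document.

```python
import numpy as np


def block_permutation(n, block_size, rng):
    """Return the indices 0..n-1 with contiguous blocks shuffled as whole units."""
    # split the index range into consecutive blocks (the last one may be shorter)
    blocks = [np.arange(start, min(start + block_size, n))
              for start in range(0, n, block_size)]
    # permute the order of the blocks, not the samples inside them
    order = rng.permutation(len(blocks))
    return np.concatenate([blocks[i] for i in order])


rng = np.random.default_rng(0)
permuted_index = block_permutation(n=10, block_size=3, rng=rng)
```

Indexing one of the two datasets with permuted_index and recomputing the test statistic would give one draw from a block-permutation null distribution.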

MGC Time Series

class mgcpy.independence_tests.mgcx.MGCX(compute_distance_matrix=None, max_lag=0)

Parameters

compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix
base_global_correlation (string) -- specifies which global correlation to build upon: one of 'mgc', 'dcor', 'mantel', or 'rank'. Defaults to 'mgc'.
max_lag (int) -- Furthest lag to check for dependence.

Methods

`get_name`(self)	return the name of the independence test
`p_value`(self, matrix_X, matrix_Y[, ...])	Tests independence between two datasets using MGCX and block permutation test.
`p_value_block`(self, matrix_X, matrix_Y[, ...])	Tests independence between two datasets using block permutation test.
`test_statistic`(self, matrix_X, matrix_Y[, p])	Computes the MGCX measure between two time series datasets.

test_statistic(self, matrix_X, matrix_Y, p=None)

Computes the MGCX measure between two time series datasets.

It first computes all the local correlations.

Then, it returns the maximal statistic among all local correlations based on thresholding.

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions
p (float) -- bandwidth parameter for Bartlett Kernel.
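
For orientation, the standard Bartlett (triangular) kernel weights a lag less as it grows relative to the bandwidth and ignores lags beyond it. The sketch below shows the textbook kernel; the helper name bartlett_kernel is hypothetical, and scaling each lag as lag / p is an assumption about how the bandwidth enters, not something stated in this document.

```python
import numpy as np


def bartlett_kernel(z):
    """Bartlett (triangular) kernel: 1 - |z| for |z| <= 1, and 0 otherwise."""
    z = np.abs(np.asarray(z, dtype=float))
    return np.where(z <= 1.0, 1.0 - z, 0.0)


p = 4.0                        # bandwidth (assumed to scale the lag as lag / p)
lags = np.arange(0, 7)         # lags 0..6
weights = bartlett_kernel(lags / p)
# weights -> [1.0, 0.75, 0.5, 0.25, 0.0, 0.0, 0.0]
```

A larger bandwidth lets more distant lags contribute to a statistic that is summed across lags.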

Returns

returns a list of two items, that contains:

test_statistic

the sample MGCX statistic (not necessarily within [-1,1])
test_statistic_metadata

a dict of metadata with the following keys:
- dist_mtx_X
  
  the distance matrix of sample X
- dist_mtx_Y
  
  the distance matrix of sample Y

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgcx import MGCX
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgcx = MGCX()
>>> mgcx_statistic, test_statistic_metadata = mgcx.test_statistic(X, Y)

p_value(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using MGCX and block permutation test.

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions
replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

p_value

P-value of MGCX
metadata

a dict of metadata with the following keys:
- null_distribution
  
  numpy array representing the distribution of the test statistic under the null.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgcx import MGCX
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgcx = MGCX()
>>> p_value, metadata = mgcx.p_value(X, Y, replication_factor=100)

get_name(self)

Returns: the name of the independence test
Return type: string

p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using block permutation test.

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions
replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

p_value

P-value of the test
metadata

a dict of metadata with the following keys:
- null_distribution
  
  numpy array representing the distribution of the test statistic under the null.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgcx import MGCX
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgcx = MGCX()
>>> p_value, metadata = mgcx.p_value_block(X, Y, replication_factor=100)

Biased and Unbiased Distance Correlation (Dcorr) and Mantel

class mgcpy.independence_tests.dcorr.DCorr(compute_distance_matrix=None, which_test='unbiased', is_paired=False)

Parameters

compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix
which_test (string) -- the type of global correlation to use: one of 'unbiased', 'biased', or 'mantel'

Methods

`compute_global_covariance`(self, dist_mtx_X, ...)	Helper function: computes the global covariance from two distance matrices
`get_name`(self)	return the name of the independence test
`p_value`(self, matrix_X, matrix_Y[, ...])	Computes the p-value: if the correlation test is unbiased, a t-test can be used; otherwise a permutation test is used
`p_value_block`(self, matrix_X, matrix_Y[, ...])	Tests independence between two datasets using block permutation test.
`test_statistic`(self, matrix_X, matrix_Y[, ...])	Computes the distance correlation between two datasets.
`unbiased_T`(self, matrix_X, matrix_Y)	Helper function: Compute the t-test statistic for unbiased dcorr

test_statistic(self, matrix_X, matrix_Y, is_fast=False, fast_dcorr_data={})

Computes the distance correlation between two datasets.

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions
is_fast (boolean) -- a boolean flag that specifies whether the test statistic should be approximated using the fast version of dcorr. Defaults to False.
fast_dcorr_data (dictionary) --
a dict of fast dcorr parameters; refer to self._fast_dcorr_test_statistic
- sub_samples
  
  specifies the number of subsamples.

Returns

returns a list of two items, that contains:

test_statistic

the sample dcorr statistic within [-1, 1]
independence_test_metadata

a dict of metadata with the following keys:
- variance_X
  
  the variance of the data matrix X
- variance_Y
  
  the variance of the data matrix Y

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.dcorr import DCorr
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorr = DCorr(which_test = 'unbiased')
>>> dcorr_statistic, test_statistic_metadata = dcorr.test_statistic(X, Y)
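
To make the quantity concrete, the textbook biased sample distance correlation can be written in a few lines of NumPy: double-center each pairwise distance matrix, then normalise the mean elementwise product by the two distance variances. This is a sketch of the standard formula and not mgcpy's code; the helper names are hypothetical, and reporting the squared form (without a final square root) is a choice made here for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def double_center(dist):
    """Subtract row and column means and add back the grand mean."""
    row_mean = dist.mean(axis=1, keepdims=True)
    col_mean = dist.mean(axis=0, keepdims=True)
    return dist - row_mean - col_mean + dist.mean()


def biased_dcorr_squared(x, y):
    """Biased squared distance correlation of [n*p] and [n*q] data matrices."""
    a = double_center(squareform(pdist(x)))
    b = double_center(squareform(pdist(y)))
    dcov_xy = (a * b).mean()      # squared sample distance covariance
    dvar_x = (a * a).mean()       # squared sample distance variance of x
    dvar_y = (b * b).mean()       # squared sample distance variance of y
    return dcov_xy / np.sqrt(dvar_x * dvar_y)


X = np.linspace(0, 1, 10).reshape(-1, 1)
Y = X ** 2
stat_xy = biased_dcorr_squared(X, Y)
stat_xx = biased_dcorr_squared(X, X)   # a dataset against itself
```

Comparing a dataset with itself gives a value of 1, which is a quick sanity check on the normalisation.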

compute_global_covariance(self, dist_mtx_X, dist_mtx_Y)

Helper function: computes the global covariance from the two distance matrices dist_mtx_X and dist_mtx_Y

Parameters

dist_mtx_X (2D numpy.array) -- a [n*n] distance matrix
dist_mtx_Y (2D numpy.array) -- a [n*n] distance matrix

Returns

the data covariance or variance based on the distance matrices

Return type

numpy.float

unbiased_T(self, matrix_X, matrix_Y)

Helper function: Compute the t-test statistic for unbiased dcorr

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions

Returns

test statistic of t-test for unbiased dcorr

Return type

numpy.float
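
As background, the distance correlation t-test of Székely and Rizzo uses U-centered distance matrices to form a bias-corrected correlation r and then the statistic T = sqrt(v - 1) * r / sqrt(1 - r^2) with v = n(n - 3) / 2, compared against a Student t distribution with v - 1 degrees of freedom. The sketch below follows that published recipe under the assumption that unbiased_T is similar in spirit; it is not taken from mgcpy, and u_center and unbiased_dcorr_t are hypothetical names.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import t as student_t


def u_center(dist):
    """U-center a distance matrix (bias-corrected analogue of double centering)."""
    n = dist.shape[0]
    row_sum = dist.sum(axis=1, keepdims=True)
    col_sum = dist.sum(axis=0, keepdims=True)
    centered = (dist - row_sum / (n - 2) - col_sum / (n - 2)
                + dist.sum() / ((n - 1) * (n - 2)))
    np.fill_diagonal(centered, 0.0)   # the diagonal is defined to be zero
    return centered


def unbiased_dcorr_t(x, y):
    """Return the bias-corrected correlation r, the t statistic, and a p-value."""
    n = x.shape[0]
    a = u_center(squareform(pdist(x)))
    b = u_center(squareform(pdist(y)))
    # the common 1 / (n (n - 3)) factors cancel in the ratio
    r = (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())
    v = n * (n - 3) / 2.0
    t_stat = np.sqrt(v - 1) * r / np.sqrt(1 - r ** 2)
    p_value = student_t.sf(t_stat, df=v - 1)   # upper-tail probability
    return r, t_stat, p_value


X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
              0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
              1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
r, t_stat, p_value = unbiased_dcorr_t(X, Y)
```

Because the p-value comes from a reference distribution, this route needs no permutations, which is why it is attractive when the unbiased statistic is used.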

p_value(self, matrix_X, matrix_Y, replication_factor=1000, is_fast=False, fast_dcorr_data={})

Computes the p-value. If the correlation test is unbiased, the p-value can be computed using a t-test; otherwise, it is computed using a permutation test.

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions
replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.
is_fast (boolean) -- a boolean flag that specifies whether the test statistic should be approximated using the fast version of dcorr. Defaults to False.
fast_dcorr_data (dictionary) --
a dict of fast dcorr parameters; refer to self._fast_dcorr_test_statistic
- sub_samples
  
  specifies the number of subsamples.

Returns

p-value of distance correlation

Return type

numpy.float

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.dcorr import DCorr
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorr = DCorr()
>>> p_value, metadata = dcorr.p_value(X, Y, replication_factor = 100)

get_name(self)

Returns: the name of the independence test
Return type: string

p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using block permutation test.

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions
replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

p_value

P-value of the test
metadata

a dict of metadata with the following keys:
- null_distribution
  
  numpy array representing the distribution of the test statistic under the null.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.dcorr import DCorr
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorr = DCorr()
>>> p_value, metadata = dcorr.p_value_block(X, Y, replication_factor=100)

Dcorr Time Series

class mgcpy.independence_tests.dcorrx.DCorrX(compute_distance_matrix=None, which_test='unbiased', max_lag=0)

Parameters

compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix
which_test (string) -- the type of distance covariance estimate to use: one of 'unbiased', 'biased', or 'mantel'
max_lag (int) -- Maximum lead/lag to check for dependence between X_t and Y_t+j (M parameter)

Methods

`get_name`(self)	return the name of the independence test
`p_value`(self, matrix_X, matrix_Y[, ...])	Computes the p-value: if the correlation test is unbiased, a t-test can be used; otherwise a permutation test is used
`p_value_block`(self, matrix_X, matrix_Y[, ...])	Tests independence between two datasets using block permutation test.
`test_statistic`(self, matrix_X, matrix_Y[, p])	Computes the (summed across lags) cross distance covariance estimate between two time series.

test_statistic(self, matrix_X, matrix_Y, p=None)

Computes the (summed across lags) cross distance covariance estimate between two time series.

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions
p (float) -- bandwidth parameter for Bartlett Kernel.

Returns

returns a list of two items, that contains:

test_statistic

the sample cross distance covariance statistic (not necessarily within [-1,1])
test_statistic_metadata

a dict of metadata with the following keys:
- dist_mtx_X
  
  the distance matrix of sample X
- dist_mtx_Y
  
  the distance matrix of sample Y

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.dcorrx import DCorrX
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorrx = DCorrX(which_test='unbiased')
>>> dcorrx_statistic, test_statistic_metadata = dcorrx.test_statistic(X, Y)

p_value(self, matrix_X, matrix_Y, replication_factor=1000)

Computes the p-value. If the correlation test is unbiased, the p-value can be computed using a t-test; otherwise, it is computed using a permutation test.

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions
replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

p_value

a numpy.float containing the p-value of the observed test statistic.
p_value_metadata

a dict of metadata with the following keys:
- null_distribution
  
  the estimated (discrete) distribution of the test statistic

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.dcorrx import DCorrX
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorrx = DCorrX()
>>> p_value, metadata = dcorrx.p_value(X, Y, replication_factor=100)

get_name(self)

Returns: the name of the independence test
Return type: string

p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using block permutation test.

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions
replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

p_value

P-value of the test
metadata

a dict of metadata with the following keys:
- null_distribution
  
  numpy array representing the distribution of the test statistic under the null.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.dcorrx import DCorrX
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorrx = DCorrX()
>>> p_value, metadata = dcorrx.p_value_block(X, Y, replication_factor=100)

Heller Heller Gorfine (HHG)

class mgcpy.independence_tests.hhg.HHG(compute_distance_matrix=None)

Parameters: compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix

Methods

`get_name`(self)	return the name of the independence test
`p_value`(self[, matrix_X, matrix_Y, ...])	Tests independence between two datasets using HHG and permutation test.
`p_value_block`(self, matrix_X, matrix_Y[, ...])	Tests independence between two datasets using block permutation test.
`test_statistic`(self, matrix_X, matrix_Y)	Computes the HHG correlation measure between two datasets.

test_statistic(self, matrix_X, matrix_Y)

Computes the HHG correlation measure between two datasets.

Parameters

matrix_X (2D numpy.array) -- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) -- a [n*q] data matrix, a matrix with n samples in q dimensions

Returns

returns a list of two items, that contains:

test_statistic

test statistic
test_statistic_metadata

(optional) a dict of metadata, other than the p-value, that the independence test computes in the process

Return type

float, dict

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.hhg import HHG

>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> hhg = HHG()
>>> hhg_test_stat, test_statistic_metadata = hhg.test_statistic(X, Y)

p_value(self, matrix_X=None, matrix_Y=None, replication_factor=1000)

Tests independence between two datasets using HHG and permutation test.

Parameters

matrix_X (2D numpy.array) -- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) -- a [n*q] data matrix, a matrix with n samples in q dimensions
replication_factor (int) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

p_value

P-value
p_value_metadata

(optional) a dict of metadata, other than the p-value, that the independence test computes in the process

Return type

float, dict

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.hhg import HHG

>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> hhg = HHG()
>>> hhg_p_value, p_value_metadata = hhg.p_value(X, Y)

get_name(self)

Returns: the name of the independence test
Return type: string

p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using block permutation test.

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions
replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

p_value

P-value of the test
metadata

a dict of metadata with the following keys:
- null_distribution
  
  numpy array representing the distribution of the test statistic under the null.

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.hhg import HHG
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> hhg = HHG()
>>> p_value, metadata = hhg.p_value_block(X, Y, replication_factor=100)

Kendall and Spearman

class mgcpy.independence_tests.kendall_spearman.KendallSpearman(compute_distance_matrix=None, which_test='kendall')

Parameters

compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix
which_test (str) -- specifies which test to use, either 'kendall' or 'spearman'

Methods

`get_name`(self)	return the name of the independence test
`p_value`(self, matrix_X, matrix_Y[, ...])	Tests independence between two datasets using the independence test.
`p_value_block`(self, matrix_X, matrix_Y[, ...])	Tests independence between two datasets using block permutation test.
`test_statistic`(self, matrix_X, matrix_Y)	Computes the Spearman's rho or Kendall's tau measure between two datasets.

test_statistic(self, matrix_X, matrix_Y)

Computes the Spearman's rho or Kendall's tau measure between two datasets. Uses the scipy.stats implementation for both.

Parameters

matrix_X (1D numpy.array) -- a [n*1] data matrix, a matrix with n samples in 1 dimension
matrix_Y (1D numpy.array) -- a [n*1] data matrix, a matrix with n samples in 1 dimension

Returns

returns a list of two items, that contains:

test_stat

test statistic
test_statistic_metadata

(optional) a dict of metadata, other than the p-value, that the independence test computes in the process

Return type

float, dict

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.kendall_spearman import KendallSpearman

>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> kendall_spearman = KendallSpearman()
>>> kendall_spearman_stat, test_statistic_metadata = kendall_spearman.test_statistic(X, Y)
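
Since this class is described as relying on scipy.stats, the two underlying measures can also be tried directly. The snippet below calls scipy.stats.kendalltau and scipy.stats.spearmanr on a small made-up dataset; it illustrates the SciPy functions only and makes no claim about how KendallSpearman wraps them.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3          # strictly increasing in x, but not linear

# both functions return the statistic together with a p-value
tau, tau_p_value = kendalltau(x, y)
rho, rho_p_value = spearmanr(x, y)
```

Both measures depend only on the ranks of the data, so any strictly increasing relationship, linear or not, gives tau and rho equal to 1.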

p_value(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using the independence test.

Parameters

matrix_X (2D numpy.array) -- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) -- a [n*q] data matrix, a matrix with n samples in q dimensions
replication_factor (int) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, that contains:

p_value

P-value
p_value_metadata

(optional) a dict of metadata, other than the p-value, that the independence test computes in the process

Return type

float, dict

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.kendall_spearman import KendallSpearman

>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> kendall_spearman = KendallSpearman()
>>> kendall_spearman_p_value, p_value_metadata = kendall_spearman.p_value(X, Y)

get_name(self)

Returns: the name of the independence test
Return type: string

p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using block permutation test.

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions
replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, containing:

p_value

P-value of the block permutation test

metadata

a dict of metadata with the following key:
- null_distribution: a numpy array representing the distribution of the test statistic under the null hypothesis

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc_ts import MGC_TS
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc_ts = MGC_TS()
>>> p_value, metadata = mgc_ts.p_value(X, Y, replication_factor = 100)
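
The idea behind a block permutation test can be sketched as follows: instead of shuffling individual samples, contiguous blocks of samples are shuffled, so that dependence within a block is preserved in the null distribution. This is a minimal plain-Python sketch, not mgcpy's implementation; the `block_size` and `seed` parameters, the helper names, and the absolute-covariance statistic are assumptions made for illustration.

```python
import random


def abs_covariance(x, y):
    """Absolute sample covariance, used only as a stand-in statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return abs(sum((a - mx) * (b - my) for a, b in zip(x, y)) / n)


def block_permute(values, block_size, rng):
    """Shuffle contiguous blocks, keeping the order inside each block."""
    blocks = [values[i:i + block_size]
              for i in range(0, len(values), block_size)]
    rng.shuffle(blocks)
    return [v for block in blocks for v in block]


def block_permutation_p_value(x, y, statistic, replication_factor=1000,
                              block_size=2, seed=0):
    """One-sided p-value from a null built by block-permuting y."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    null_distribution = [
        statistic(x, block_permute(y, block_size, rng))
        for _ in range(replication_factor)
    ]
    exceed = sum(1 for s in null_distribution if s >= observed)
    p_value = (exceed + 1) / (replication_factor + 1)
    return p_value, {"null_distribution": null_distribution}
```

The returned pair mirrors the documented shape: a p-value plus a metadata dict holding the null distribution.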

Multivariate Distance Matrix Regression (MDMR)

Main MDMR Independence Test Module

class mgcpy.independence_tests.mdmr.MDMR(compute_distance_matrix=None)

Parameters: compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix

Methods

`get_name`(self)	return the name of the independence test
`ind_p_value`(self, matrix_X, matrix_Y[, ...])	Individual predictor variable p-values calculation
`p_value`(self, matrix_X, matrix_Y[, ...])	Tests independence between two datasets using MDMR and permutation test.
`p_value_block`(self, matrix_X, matrix_Y[, ...])	Tests independence between two datasets using block permutation test.
`test_statistic`(self, matrix_X, matrix_Y[, ...])	Computes MDMR Pseudo-F statistic between two datasets.

get_name(self)

Returns: the name of the independence test
Return type: string

test_statistic(self, matrix_X, matrix_Y, permutations=0, individual=0, disttype='cityblock')

Computes MDMR Pseudo-F statistic between two datasets.

It first computes the distance matrix of Y.
Next, it regresses X into a portion due to Y and a portion due to the residual.
The p-value is for the null hypothesis that the variable of X is not correlated with Y's distance matrix.

Parameters

matrix_X (2D numpy.array) --
(optional, default picked from class attr) is interpreted as:
- a [n*d] data matrix, a matrix with n samples in d dimensions
matrix_Y (2D numpy.array) --
(optional, default picked from class attr) is interpreted as:
- a [n*d] data matrix, a matrix with n samples in d dimensions
individual (integer) -- 0 or 1. With value 0 (default), tests the entire X matrix; with value 1, tests the entire X matrix and then each predictor variable individually.

Returns

with individual = 0, returns 1 value; with individual = 1, returns 2 values, containing:

- the test statistic of the entire X matrix
- for individual = 1, an array with the variable of X in the first column, the test statistic in the second, and the permutation p-value in the third (which here will always be 1)

Return type

list
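
For orientation, one common textbook formulation of the MDMR pseudo-F statistic is sketched below in plain Python: cityblock distances on Y are Gower-centred into a matrix G, and the variation in G is split into the part explained by the column space of X (plus an intercept) and the residual. This is a sketch under assumptions, not mgcpy's implementation: the helper names are hypothetical, and the intercept column and the degrees-of-freedom scaling are choices made here that the library may not share.

```python
def cityblock_distances(rows):
    """Pairwise Manhattan (cityblock) distances between rows."""
    return [[sum(abs(a - b) for a, b in zip(r, s)) for s in rows]
            for r in rows]


def gower_center(dist):
    """Double-centre -0.5 * D**2 so every row and column sums to zero."""
    n = len(dist)
    a = [[-0.5 * d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in a]
    col_mean = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - col_mean[j] + grand for j in range(n)]
            for i in range(n)]


def orthonormal_basis(columns):
    """Gram-Schmidt: orthonormal vectors spanning the given columns."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-12:  # skip columns that add no new direction
            basis.append([a / norm for a in v])
    return basis


def pseudo_f(x_rows, y_rows):
    """Pseudo-F of Y's distance matrix regressed on X (with intercept)."""
    n = len(x_rows)
    g = gower_center(cityblock_distances(y_rows))
    # Design columns: intercept first, then each predictor of X.
    columns = [[1.0] * n] + [list(c) for c in zip(*x_rows)]
    basis = orthonormal_basis(columns)
    m = len(basis)
    # With hat matrix H = Q Q^T, tr(HGH) is the sum of q^T G q over the basis.
    explained = sum(
        sum(q[i] * g[i][j] * q[j] for i in range(n) for j in range(n))
        for q in basis
    )
    total = sum(g[i][i] for i in range(n))
    return (explained / (m - 1)) / ((total - explained) / (n - m))
```

With one-dimensional Y, cityblock distance coincides with Euclidean distance and this quantity reduces to the classical regression F statistic, which gives a convenient sanity check.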

p_value(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using MDMR and permutation test.

Parameters

matrix_X (2D numpy.array) --
is interpreted as:
- a [n*d] data matrix, a matrix with n samples in d dimensions
matrix_Y (2D numpy.array) --
is interpreted as:
- a [n*d] data matrix, a matrix with n samples in d dimensions
replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, containing:

p_value

P-value of MDMR

p_value_metadata

Return type

list

ind_p_value(self, matrix_X, matrix_Y, permutations=1000, individual=1, disttype='cityblock')

Individual predictor variable p-values calculation

Parameters

matrix_X (2D numpy.array) --
is interpreted as:
- a [n*d] data matrix, a matrix with n samples in d dimensions
matrix_Y (2D numpy.array) --
is interpreted as:
- a [n*d] data matrix, a matrix with n samples in d dimensions

p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using block permutation test.

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions
replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, containing:

p_value

P-value of the block permutation test

metadata

a dict of metadata with the following key:
- null_distribution: a numpy array representing the distribution of the test statistic under the null hypothesis

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc_ts import MGC_TS
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc_ts = MGC_TS()
>>> p_value, metadata = mgc_ts.p_value(X, Y, replication_factor = 100)

Pearson's Correlation, RV, Canonical Correlation Analysis (CCA)

class mgcpy.independence_tests.rv_corr.RVCorr(compute_distance_matrix=None, which_test='rv')

Parameters

compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix
which_test (str) -- specifies which test to use: 'rv', 'pearson', or 'cca'. Defaults to 'rv'.

Methods

`get_name`(self)	return the name of the independence test
`p_value`(self, matrix_X, matrix_Y[, ...])	Tests independence between two datasets using the independence test.
`p_value_block`(self, matrix_X, matrix_Y[, ...])	Tests independence between two datasets using block permutation test.
`test_statistic`(self[, matrix_X, matrix_Y])	Computes the Pearson/RV/CCA correlation measure between two datasets.

test_statistic(self, matrix_X=None, matrix_Y=None)

Computes the Pearson/RV/CCA correlation measure between two datasets.

By default, computes the linear correlation for RV
Computes Pearson's correlation
Calculates local linear correlations for CCA

Parameters

matrix_X (2D numpy.array) -- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) -- a [n*q] data matrix, a matrix with n samples in q dimensions

Returns

returns a list of two items, containing:

test_statistic_

the test statistic

test_statistic_metadata_

(optional) a dict of metadata, other than the p_value, that the independence test computes in the process

Return type

float, dict

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.rv_corr import RVCorr

>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> rvcorr = RVCorr()
>>> rvcorr_test_stat = rvcorr.test_statistic(X, Y)
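
For readers unfamiliar with the RV coefficient, the sketch below computes its usual textbook definition in plain Python: the squared Frobenius norm of the cross-product matrix of the centred data, normalised by the Frobenius norms of the two within-set cross-product matrices. The helper names are hypothetical and this is not mgcpy's implementation, which may differ in scaling or other details.

```python
def centered_columns(rows):
    """Return the columns of a row-major matrix with their means removed."""
    cols = [list(c) for c in zip(*rows)]
    return [[v - sum(c) / len(c) for v in c] for c in cols]


def cross_products(cols_a, cols_b):
    """Matrix of inner products between every column pair (A^T B)."""
    return [[sum(p * q for p, q in zip(a, b)) for b in cols_b]
            for a in cols_a]


def frobenius_sq(matrix):
    """Sum of squared entries of a matrix."""
    return sum(v * v for row in matrix for v in row)


def rv_coefficient(x_rows, y_rows):
    """RV coefficient between an [n*p] and an [n*q] data matrix."""
    xc, yc = centered_columns(x_rows), centered_columns(y_rows)
    numerator = frobenius_sq(cross_products(xc, yc))
    denominator = (frobenius_sq(cross_products(xc, xc))
                   * frobenius_sq(cross_products(yc, yc))) ** 0.5
    return numerator / denominator
```

When both inputs have a single column, this quantity equals the squared Pearson correlation, which is one way to see how the 'rv' and 'pearson' options relate.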

p_value(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using the independence test.

Parameters

matrix_X (2D numpy.array) -- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) -- a [n*q] data matrix, a matrix with n samples in q dimensions
replication_factor (int) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, containing:

p_value_

the P-value

p_value_metadata_

(optional) a dict of metadata, other than the p_value, that the independence test computes in the process

Return type

float, dict

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.rv_corr import RVCorr

>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> rvcorr = RVCorr()
>>> rvcorr_p_value = rvcorr.p_value(X, Y)

get_name(self)

Returns: the name of the independence test
Return type: string

p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)

Tests independence between two datasets using block permutation test.

Parameters

matrix_X (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) --
is interpreted as either:
- a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
- a [n*q] data matrix, a matrix with n samples in q dimensions
replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

Returns

returns a list of two items, containing:

p_value

P-value of the block permutation test

metadata

a dict of metadata with the following key:
- null_distribution: a numpy array representing the distribution of the test statistic under the null hypothesis

Return type

list

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc_ts import MGC_TS
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...           0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...           1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc_ts = MGC_TS()
>>> p_value, metadata = mgc_ts.p_value(X, Y, replication_factor = 100)