Independence Tests
Multiscale Graph Correlation (MGC)
Main MGC Independence Test Module
- 
class mgcpy.independence_tests.mgc.MGC(compute_distance_matrix=None, base_global_correlation='mgc')
- Parameters
- compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix
- base_global_correlation (string) -- specifies which global correlation to build upon, including 'mgc', 'dcor', 'mantel', and 'rank'. Defaults to 'mgc'.

 - Methods
- get_name(self) -- Returns the name of the independence test.
- p_value(self, matrix_X, matrix_Y[, ...]) -- Tests independence between two datasets using MGC and permutation test.
- p_value_block(self, matrix_X, matrix_Y[, ...]) -- Tests independence between two datasets using block permutation test.
- test_statistic(self, matrix_X, matrix_Y[, ...]) -- Computes the MGC measure between two datasets.
 - 
test_statistic(self, matrix_X, matrix_Y, is_fast=False, fast_mgc_data={})
- Computes the MGC measure between two datasets.
- It first computes all the local correlations.
- Then, it returns the maximal statistic among all local correlations based on thresholding.

 - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions

- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions

- is_fast (boolean) -- a flag that specifies whether the test statistic should be approximated using the fast version of MGC. Defaults to False.
- fast_mgc_data (dictionary) -- a dict of fast MGC parameters; refer to self._fast_mgc_test_statistic
  - sub_samples: specifies the number of subsamples.

- Returns
- a list of two items, containing:
  - test_statistic: the sample MGC statistic within [-1, 1]
  - independence_test_metadata: a dict of metadata with the following keys:
    - local_correlation_matrix: a 2D matrix of all local correlations within [-1, 1]
    - optimal_scale: the estimated optimal scale as an [x, y] pair

- Return type
- list
 - Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc import MGC
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc = MGC()
>>> mgc_statistic, test_statistic_metadata = mgc.test_statistic(X, Y)
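The matrix_X and matrix_Y arguments accept either raw data or a precomputed distance matrix. As a rough illustration of the second form, the sketch below builds a Euclidean pairwise distance matrix with NumPy. The helper name euclidean_distance_matrix is hypothetical, and this reference does not state which metric the library uses by default, so treat the Euclidean choice as an assumption.

```python
import numpy as np


def euclidean_distance_matrix(data):
    """Return the [n*n] Euclidean distance matrix of an [n*p] data matrix.

    Hypothetical helper for illustration; not part of mgcpy.
    """
    data = np.asarray(data, dtype=float)
    # Pairwise differences via broadcasting: shape (n, n, p).
    diff = data[:, np.newaxis, :] - data[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847]).reshape(-1, 1)
D = euclidean_distance_matrix(X)  # square, symmetric, zeros on the diagonal
```

A callable of this shape (data matrix in, square distance matrix out) is the kind of function the compute_distance_matrix parameter describes.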
 - 
p_value(self, matrix_X, matrix_Y, replication_factor=1000, is_fast=False, fast_mgc_data={})
- Tests independence between two datasets using MGC and permutation test.
 - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions

- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions

- replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.
- is_fast (boolean) -- a flag that specifies whether the p-value should be approximated using the fast version of MGC. Defaults to False.
- fast_mgc_data (dictionary) -- a dict of fast MGC parameters; refer to self._fast_mgc_p_value
  - sub_samples: specifies the number of subsamples.

- Returns
- a list of two items, containing:
  - p_value: P-value of MGC
  - metadata: a dict of metadata with the following keys:
    - test_statistic: the sample MGC statistic within [-1, 1]
    - p_local_correlation_matrix: a 2D matrix of the P-values of the local correlations
    - local_correlation_matrix: a 2D matrix of all local correlations within [-1, 1]
    - optimal_scale: the estimated optimal scale as an [x, y] pair

- Return type
- list
 - Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc import MGC
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc = MGC()
>>> p_value, metadata = mgc.p_value(X, Y, replication_factor = 100)
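To make the role of replication_factor concrete, here is a minimal sketch of a generic permutation test. It is not mgcpy's implementation: the function permutation_p_value and the stand-in statistic abs_pearson are hypothetical names, and the p-value is taken as the plain fraction of permuted statistics at least as large as the observed one, which is one common convention among several.

```python
import numpy as np


def permutation_p_value(statistic, x, y, replication_factor=1000, seed=0):
    """Estimate a p-value by permuting the rows of y.

    `statistic` is any callable (x, y) -> float where larger means
    stronger dependence. Generic sketch; not mgcpy's implementation.
    """
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    null = np.empty(replication_factor)
    for r in range(replication_factor):
        # Shuffling y breaks any pairing between x and y.
        null[r] = statistic(x, y[rng.permutation(len(y))])
    # Fraction of permuted statistics at least as large as the observed one.
    p_value = float(np.mean(null >= observed))
    return p_value, {"test_statistic": observed, "null_distribution": null}


def abs_pearson(x, y):
    """Stand-in statistic: absolute Pearson correlation of two columns."""
    return abs(np.corrcoef(x.ravel(), y.ravel())[0, 1])


x = np.linspace(-1.0, 1.0, 30).reshape(-1, 1)
y = 2.0 * x + 0.5
p_value, metadata = permutation_p_value(abs_pearson, x, y, replication_factor=200)
```

A larger replication_factor gives a finer-grained null distribution at proportionally higher cost, which is why the example above passes a smaller value than the default.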
 - 
get_name(self)
- Returns
- the name of the independence test 
- Return type
- string 
 
 - 
p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)
- Tests independence between two datasets using block permutation test.
 - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions

- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions

- replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

- Returns
- a list of two items, containing:
  - p_value: P-value of MGC
  - metadata: a dict of metadata with the following keys:
    - null_distribution: numpy array representing distribution of test statistic under null.

- Return type
- list
 - Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc_ts import MGC_TS
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc_ts = MGC_TS()
>>> p_value, metadata = mgc_ts.p_value(X, Y, replication_factor = 100)
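This reference does not describe how the block permutation is constructed. One common scheme, sketched below under that caveat, shuffles contiguous blocks of sample indices while keeping the order inside each block, so that short-range temporal structure is preserved. The function block_permutation_indices and its block_size parameter are hypothetical and are not part of the documented API.

```python
import numpy as np


def block_permutation_indices(n, block_size, rng):
    """Return a permutation of range(n) that shuffles contiguous blocks.

    Order inside each block is preserved, so short-range temporal
    structure survives. `block_size` is an illustrative parameter.
    """
    starts = np.arange(0, n, block_size)
    blocks = [np.arange(s, min(s + block_size, n)) for s in starts]
    order = rng.permutation(len(blocks))
    return np.concatenate([blocks[i] for i in order])


rng = np.random.default_rng(0)
idx = block_permutation_indices(10, block_size=3, rng=rng)
```

Indexing one sample with idx (for example Y[idx]) and recomputing the statistic many times would then yield a null distribution in the spirit of the null_distribution entry described above.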
 
MGC Time Series
- 
class mgcpy.independence_tests.mgcx.MGCX(compute_distance_matrix=None, max_lag=0)
- Parameters
- compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix
- base_global_correlation (string) -- specifies which global correlation to build upon, including 'mgc', 'dcor', 'mantel', and 'rank'. Defaults to 'mgc'.
- max_lag (int) -- Furthest lag to check for dependence.

 - Methods
- get_name(self) -- Returns the name of the independence test.
- p_value(self, matrix_X, matrix_Y[, ...]) -- Tests independence between two datasets using MGC_TS and block permutation test.
- p_value_block(self, matrix_X, matrix_Y[, ...]) -- Tests independence between two datasets using block permutation test.
- test_statistic(self, matrix_X, matrix_Y[, p]) -- Computes the MGCX measure between two time series datasets.
 - 
test_statistic(self, matrix_X, matrix_Y, p=None)
- Computes the MGCX measure between two time series datasets.
- It first computes all the local correlations.
- Then, it returns the maximal statistic among all local correlations based on thresholding.

 - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions

- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions

- p (float) -- bandwidth parameter for Bartlett Kernel.

- Returns
- a list of two items, containing:
  - test_statistic: the sample mgc_ts statistic (not necessarily within [-1, 1])
  - test_statistic_metadata: a dict of metadata with the following keys:
    - dist_mtx_X: the distance matrix of sample X
    - dist_mtx_Y: the distance matrix of sample Y

- Return type
- list
 - Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.mgcx import MGCX
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgcx = MGCX()
>>> mgcx_statistic, test_statistic_metadata = mgcx.test_statistic(X, Y)
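The p argument is described only as the bandwidth of a Bartlett kernel. For orientation, the sketch below evaluates the standard Bartlett (triangular) kernel at scaled lags. How these weights enter the MGCX statistic is not stated in this reference, so the use of lags / p here is an assumption, and bartlett_kernel is a hypothetical helper name.

```python
import numpy as np


def bartlett_kernel(z):
    """Standard Bartlett (triangular) kernel: 1 - |z| on [-1, 1], else 0."""
    z = np.abs(np.asarray(z, dtype=float))
    return np.where(z <= 1.0, 1.0 - z, 0.0)


max_lag = 3
p = 4.0  # bandwidth; value chosen only for illustration
lags = np.arange(max_lag + 1)
weights = bartlett_kernel(lags / p)  # one weight per lag j = 0..max_lag
```

With this kernel, lags further from zero receive linearly smaller weights, and any lag at or beyond the bandwidth receives weight zero.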
 - 
p_value(self, matrix_X, matrix_Y, replication_factor=1000)
- Tests independence between two datasets using MGC_TS and block permutation test.
 - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions

- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions

- replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

- Returns
- a list of two items, containing:
  - p_value: P-value of MGC
  - metadata: a dict of metadata with the following keys:
    - null_distribution: numpy array representing distribution of test statistic under null.

- Return type
- list
 - Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc_ts import MGC_TS
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc_ts = MGC_TS()
>>> p_value, metadata = mgc_ts.p_value(X, Y, replication_factor = 100)
 - 
get_name(self)
- Returns
- the name of the independence test 
- Return type
- string 
 
 - 
p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)
- Tests independence between two datasets using block permutation test.
 - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions

- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions

- replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

- Returns
- a list of two items, containing:
  - p_value: P-value of MGC
  - metadata: a dict of metadata with the following keys:
    - null_distribution: numpy array representing distribution of test statistic under null.

- Return type
- list
 - Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc_ts import MGC_TS
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc_ts = MGC_TS()
>>> p_value, metadata = mgc_ts.p_value(X, Y, replication_factor = 100)
 
Biased and Unbiased Distance Correlation (Dcorr) and Mantel
- 
class mgcpy.independence_tests.dcorr.DCorr(compute_distance_matrix=None, which_test='unbiased', is_paired=False)
- Parameters
- compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix
- which_test (string) -- the type of global correlation to use; can be 'unbiased', 'biased', or 'mantel'

 - Methods
- compute_global_covariance(self, dist_mtx_X, ...) -- Helper function: computes the global covariance from two distance matrices.
- get_name(self) -- Returns the name of the independence test.
- p_value(self, matrix_X, matrix_Y[, ...]) -- Computes the p-value. If the correlation test is unbiased, the p-value can be computed using a t-test; otherwise it is computed using a permutation test.
- p_value_block(self, matrix_X, matrix_Y[, ...]) -- Tests independence between two datasets using block permutation test.
- test_statistic(self, matrix_X, matrix_Y[, ...]) -- Computes the distance correlation between two datasets.
- unbiased_T(self, matrix_X, matrix_Y) -- Helper function: computes the t-test statistic for unbiased dcorr.
 - 
test_statistic(self, matrix_X, matrix_Y, is_fast=False, fast_dcorr_data={})
- Computes the distance correlation between two datasets.
 - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions

- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions

- is_fast (boolean) -- a flag that specifies whether the test statistic should be approximated using the fast version of dcorr. Defaults to False.
- fast_dcorr_data (dictionary) -- a dict of fast dcorr parameters; refer to self._fast_dcorr_test_statistic
  - sub_samples: specifies the number of subsamples.

- Returns
- a list of two items, containing:
  - test_statistic: the sample dcorr statistic within [-1, 1]
  - independence_test_metadata: a dict of metadata with the following keys:
    - variance_X: the variance of the data matrix X
    - variance_Y: the variance of the data matrix Y

- Return type
- list
 - Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.dcorr import DCorr
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorr = DCorr(which_test = 'unbiased')
>>> dcorr_statistic, test_statistic_metadata = dcorr.test_statistic(X, Y)
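For readers unfamiliar with distance correlation, the following is a minimal sketch of the classical (biased) sample estimator built from double-centered distance matrices. It is written from the textbook definition, not from the library's source; the helper names are hypothetical, Euclidean distances are assumed, and the library's exact normalization (for instance whether it reports the squared quantity) is not stated in this reference.

```python
import numpy as np


def double_center(dist):
    """Subtract row and column means and add back the grand mean."""
    row = dist.mean(axis=1, keepdims=True)
    col = dist.mean(axis=0, keepdims=True)
    return dist - row - col + dist.mean()


def biased_dcorr(x, y):
    """Classical (biased) sample distance correlation of two data matrices."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Euclidean pairwise distances within each sample.
    dx = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    dy = np.sqrt(((y[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1))
    a, b = double_center(dx), double_center(dy)
    cov = (a * b).mean()      # squared sample distance covariance
    var_x = (a * a).mean()    # squared sample distance variance of x
    var_y = (b * b).mean()    # squared sample distance variance of y
    # Guard against tiny negative values caused by rounding.
    return float(np.sqrt(max(cov, 0.0) / np.sqrt(var_x * var_y)))


x = np.array([0.1, -0.2, 0.4, 0.0, 0.8]).reshape(-1, 1)
r_linear = biased_dcorr(x, 3.0 * x - 1.0)  # exact linear relation
```

The variance_X and variance_Y metadata entries listed above correspond, in this sketch's terms, to quantities like var_x and var_y, although the library may scale them differently.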
 - 
compute_global_covariance(self, dist_mtx_X, dist_mtx_Y)
- Helper function: computes the global covariance from the distance matrices dist_mtx_X and dist_mtx_Y.
 - Parameters
- dist_mtx_X (2D numpy.array) -- a [n*n] distance matrix 
- dist_mtx_Y (2D numpy.array) -- a [n*n] distance matrix 
 
- Returns
- the data covariance or variance based on the distance matrices 
- Return type
- numpy.float 
 
 - 
unbiased_T(self, matrix_X, matrix_Y)
- Helper function: computes the t-test statistic for unbiased dcorr.
 - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] matrix, a matrix with n samples in p dimensions

- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] matrix, a matrix with n samples in q dimensions
 
 
- Returns
- test statistic of t-test for unbiased dcorr 
- Return type
- numpy.float 
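As background for this helper, the sketch below follows the published construction by Szekely and Rizzo: U-center each distance matrix, form the bias-corrected distance correlation, and map it to a t statistic with v - 1 degrees of freedom, where v = n(n - 3)/2. It is an illustration written from that definition, with hypothetical function names, and is not claimed to match the library's code line for line.

```python
import math

import numpy as np


def u_center(dist):
    """U-center a distance matrix; the diagonal is set to zero."""
    n = dist.shape[0]
    row = dist.sum(axis=1, keepdims=True) / (n - 2)
    col = dist.sum(axis=0, keepdims=True) / (n - 2)
    total = dist.sum() / ((n - 1) * (n - 2))
    centered = dist - row - col + total
    np.fill_diagonal(centered, 0.0)
    return centered


def unbiased_dcorr(dist_x, dist_y):
    """Bias-corrected distance correlation from two [n*n] distance matrices."""
    n = dist_x.shape[0]
    a, b = u_center(dist_x), u_center(dist_y)
    scale = n * (n - 3)
    cov = (a * b).sum() / scale
    var_x = (a * a).sum() / scale
    var_y = (b * b).sum() / scale
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """Map a bias-corrected dcorr r to a t statistic (v - 1 degrees of freedom)."""
    v = n * (n - 3) / 2.0
    return math.sqrt(v - 1.0) * r / math.sqrt(1.0 - r * r)


x = np.array([0.1, -0.2, 0.4, 0.0, 0.8, 0.5])
dist = np.abs(x[:, None] - x[None, :])
r_self = unbiased_dcorr(dist, dist)  # a sample compared with itself
t_half = t_statistic(0.5, n=10)
```

Unlike the biased estimator, the bias-corrected correlation can be negative, which is consistent with the [-1, 1] range quoted for the dcorr statistic.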
 
 - 
p_value(self, matrix_X, matrix_Y, replication_factor=1000, is_fast=False, fast_dcorr_data={})
- Computes the p-value. If the correlation test is unbiased, the p-value can be computed using a t-test; otherwise it is computed using a permutation test.
 - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions

- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions

- replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.
- is_fast (boolean) -- a flag that specifies whether the test statistic should be approximated using the fast version of dcorr. Defaults to False.
- fast_dcorr_data (dictionary) -- a dict of fast dcorr parameters; refer to self._fast_dcorr_test_statistic
  - sub_samples: specifies the number of subsamples.
 
 
 
- Returns
- p-value of distance correlation 
- Return type
- numpy.float 
 - Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.dcorr import DCorr
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorr = DCorr()
>>> p_value, metadata = dcorr.p_value(X, Y, replication_factor = 100)
 - 
get_name(self)
- Returns
- the name of the independence test 
- Return type
- string 
 
 - 
p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)
- Tests independence between two datasets using block permutation test.
 - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions

- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions

- replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

- Returns
- a list of two items, containing:
  - p_value: P-value of MGC
  - metadata: a dict of metadata with the following keys:
    - null_distribution: numpy array representing distribution of test statistic under null.

- Return type
- list
 - Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc_ts import MGC_TS
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc_ts = MGC_TS()
>>> p_value, metadata = mgc_ts.p_value(X, Y, replication_factor = 100)
 
Dcorr Time Series
- 
class mgcpy.independence_tests.dcorrx.DCorrX(compute_distance_matrix=None, which_test='unbiased', max_lag=0)
- Parameters
- compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix
- which_test (string) -- the type of distance covariance estimate to use; can be 'unbiased', 'biased', or 'mantel'
- max_lag (int) -- Maximum lead/lag to check for dependence between X_t and Y_t+j (M parameter)

 - Methods
- get_name(self) -- Returns the name of the independence test.
- p_value(self, matrix_X, matrix_Y[, ...]) -- Computes the p-value. If the correlation test is unbiased, the p-value can be computed using a t-test; otherwise it is computed using a permutation test.
- p_value_block(self, matrix_X, matrix_Y[, ...]) -- Tests independence between two datasets using block permutation test.
- test_statistic(self, matrix_X, matrix_Y[, p]) -- Computes the (summed across lags) cross distance covariance estimate between two time series.
 - 
test_statistic(self, matrix_X, matrix_Y, p=None)
- Computes the (summed across lags) cross distance covariance estimate between two time series.
 - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions

- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions

- p (float) -- bandwidth parameter for Bartlett Kernel.

- Returns
- a list of two items, containing:
  - test_statistic: the sample cdcv statistic (not necessarily within [-1, 1])
  - test_statistic_metadata: a dict of metadata with the following keys:
    - dist_mtx_X: the distance matrix of sample X
    - dist_mtx_Y: the distance matrix of sample Y

- Return type
- list
 - Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.dcorrx import DCorrX
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorrx = DCorrX(which_test = 'unbiased')
>>> dcorrx_statistic, test_statistic_metadata = dcorrx.test_statistic(X, Y)
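The phrase "summed across lags" can be pictured with a small sketch: for each lag j from 0 to max_lag, pair X at time t with Y at time t + j, compute a dependence statistic on the aligned samples, and add the results. The code below is an unweighted illustration with a stand-in statistic. The function names are hypothetical, and any kernel or sample-size weighting the library applies is deliberately left out because this reference does not specify it.

```python
import numpy as np


def summed_lagged_statistic(statistic, x, y, max_lag=0):
    """Sum statistic(X_t, Y_{t+j}) over lags j = 0..max_lag.

    `statistic` is any callable (x, y) -> float. Unweighted sum for
    illustration; any kernel or sample-size weighting is omitted.
    """
    n = len(x)
    total = 0.0
    for j in range(max_lag + 1):
        # Pair x at time t with y at time t + j.
        total += statistic(x[: n - j], y[j:])
    return total


def abs_pearson(x, y):
    """Stand-in statistic: absolute Pearson correlation of two columns."""
    return abs(np.corrcoef(x.ravel(), y.ravel())[0, 1])


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = np.roll(x, 1, axis=0)  # y_t = x_{t-1}, so dependence appears at lag 1
lag0 = summed_lagged_statistic(abs_pearson, x, y, max_lag=0)
lag1 = summed_lagged_statistic(abs_pearson, x, y, max_lag=1)
```

In this toy series the dependence is invisible at lag 0 and only shows up once max_lag is at least 1, which is the situation the max_lag parameter is meant to cover.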
 - 
p_value(self, matrix_X, matrix_Y, replication_factor=1000)
- Computes the p-value. If the correlation test is unbiased, the p-value can be computed using a t-test; otherwise it is computed using a permutation test.
 - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions

- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions

- replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

- Returns
- a list of two items, containing:
  - p_value: a numpy.float containing the p-value of the observed test statistic.
  - p_value_metadata: a dict of metadata with the following keys:
    - null_distribution: the estimated (discrete) distribution of the test statistic

- Return type
- list
 - Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.dcorrx import DCorrX
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorrx = DCorrX()
>>> p_value, metadata = dcorrx.p_value(X, Y, replication_factor = 100)
 - 
get_name(self)
- Returns
- the name of the independence test 
- Return type
- string 
 
 - 
p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)
- Tests independence between two datasets using block permutation test.
 - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions

- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions

- replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

- Returns
- a list of two items, containing:
  - p_value: P-value of MGC
  - metadata: a dict of metadata with the following keys:
    - null_distribution: numpy array representing distribution of test statistic under null.

- Return type
- list
 - Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc_ts import MGC_TS
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc_ts = MGC_TS()
>>> p_value, metadata = mgc_ts.p_value(X, Y, replication_factor = 100)
 
Heller Heller Gorfine (HHG)
- 
class mgcpy.independence_tests.hhg.HHG(compute_distance_matrix=None)
- Parameters
- compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix
 - Methods
- get_name(self) -- Returns the name of the independence test.
- p_value(self[, matrix_X, matrix_Y, ...]) -- Tests independence between two datasets using HHG and permutation test.
- p_value_block(self, matrix_X, matrix_Y[, ...]) -- Tests independence between two datasets using block permutation test.
- test_statistic(self, matrix_X, matrix_Y) -- Computes the HHG correlation measure between two datasets.
 - 
test_statistic(self, matrix_X, matrix_Y)
- Computes the HHG correlation measure between two datasets.
 - Parameters
- matrix_X (2D numpy.array) -- a [n*p] data matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) -- a [n*q] data matrix, a matrix with n samples in q dimensions
- replication_factor (int) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

- Returns
- a list of two items, containing:
  - test_statistic_: test statistic
  - test_statistic_metadata_: (optional) a dict of metadata other than the p_value, that the independence test computes in the process

- Return type
- list
 - Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.hhg import HHG
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045, 0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312, 1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> hhg = HHG()
>>> hhg_test_stat = hhg.test_statistic(X, Y)
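This reference does not spell out how the HHG measure is computed. As it is commonly described, for every ordered pair (i, j) the remaining points are sorted into a 2x2 table according to whether they fall within distance d(i, j) of point i in X and in Y, and the Pearson chi-square scores of those tables are summed. The sketch below follows that description in its plain O(n^3) form. The function name is hypothetical and the library's implementation may differ in tie handling or normalization.

```python
import numpy as np


def hhg_statistic(dist_x, dist_y):
    """Sum of Pearson chi-square scores over all ordered pairs (i, j).

    Textbook O(n^3) form of the Heller-Heller-Gorfine statistic; a
    sketch for illustration, not mgcpy's implementation.
    """
    n = dist_x.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Classify every other point k by whether it lies within the
            # ball around i of radius d(i, j), in X and in Y.
            mask = np.ones(n, dtype=bool)
            mask[[i, j]] = False
            in_x = dist_x[i, mask] <= dist_x[i, j]
            in_y = dist_y[i, mask] <= dist_y[i, j]
            a11 = int(np.sum(in_x & in_y))
            a12 = int(np.sum(in_x & ~in_y))
            a21 = int(np.sum(~in_x & in_y))
            a22 = int(np.sum(~in_x & ~in_y))
            denom = (a11 + a12) * (a21 + a22) * (a11 + a21) * (a12 + a22)
            if denom > 0:  # skip degenerate tables
                total += (n - 2) * (a12 * a21 - a11 * a22) ** 2 / denom
    return float(total)


points = np.arange(6.0)
dist = np.abs(points[:, None] - points[None, :])
t_same = hhg_statistic(dist, dist)  # a sample compared with itself
```

Each table contributes at most n - 2, so the statistic is bounded above by (n - 2) * n * (n - 1), and larger values indicate stronger dependence.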
 - 
p_value(self, matrix_X=None, matrix_Y=None, replication_factor=1000)
- Tests independence between two datasets using HHG and permutation test.
 - Parameters
- matrix_X (2D numpy.array) -- a [n*p] data matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) -- a [n*q] data matrix, a matrix with n samples in q dimensions
- replication_factor (int) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

- Returns
- a list of two items, containing:
  - p_value_: P-value
  - p_value_metadata_: (optional) a dict of metadata other than the p_value, that the independence test computes in the process

- Return type
- list
 - Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.hhg import HHG
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045, 0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312, 1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> hhg = HHG()
>>> hhg_p_value = hhg.p_value(X, Y)
 - 
get_name(self)
- Returns
- the name of the independence test 
- Return type
- string 
 
 - 
p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)
- Tests independence between two datasets using block permutation test.
 - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions

- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions

- replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

- Returns
- a list of two items, containing:
  - p_value: P-value of MGC
  - metadata: a dict of metadata with the following keys:
    - null_distribution: numpy array representing distribution of test statistic under null.

- Return type
- list
 - Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc_ts import MGC_TS
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc_ts = MGC_TS()
>>> p_value, metadata = mgc_ts.p_value(X, Y, replication_factor = 100)
 
Kendall and Spearman
- 
class mgcpy.independence_tests.kendall_spearman.KendallSpearman(compute_distance_matrix=None, which_test='kendall')
- Parameters
- compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix
- which_test (str) -- specifies which test to use, including 'kendall' or 'spearman'

 - Methods
- get_name(self) -- Returns the name of the independence test.
- p_value(self, matrix_X, matrix_Y[, ...]) -- Tests independence between two datasets using the independence test.
- p_value_block(self, matrix_X, matrix_Y[, ...]) -- Tests independence between two datasets using block permutation test.
- test_statistic(self, matrix_X, matrix_Y) -- Computes the Spearman's rho or Kendall's tau measure between two datasets.
 - 
test_statistic(self, matrix_X, matrix_Y)
- Computes the Spearman's rho or Kendall's tau measure between two datasets.
- Uses the scipy.stats implementation for both.
 - Parameters
- matrix_X (1D numpy.array) -- a [n*1] data matrix, a matrix with n samples in 1 dimension
- matrix_Y (1D numpy.array) -- a [n*1] data matrix, a matrix with n samples in 1 dimension

- Returns
- a list of two items, containing:
  - test_stat_: test statistic
  - test_statistic_metadata_: (optional) a dict of metadata other than the p_value, that the independence test computes in the process

- Return type
- list
 - Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.kendall_spearman import KendallSpearman
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045, 0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312, 1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> kendall_spearman = KendallSpearman()
>>> kendall_spearman_stat = kendall_spearman.test_statistic(X, Y)
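Since this class is documented as relying on scipy.stats for both measures, it can help to see the underlying SciPy calls on their own. The snippet below uses scipy.stats.kendalltau and scipy.stats.spearmanr directly on made-up data; it does not go through KendallSpearman, and how the wrapper reshapes its inputs before calling SciPy is not covered here.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

x = np.array([0.1, -0.2, 0.4, 0.0, 0.8])
y = np.exp(x)  # strictly increasing in x, so both rank measures are maximal

# Each call returns the statistic together with its p-value.
tau, tau_p = kendalltau(x, y)
rho, rho_p = spearmanr(x, y)
```

Both measures depend only on the ranks of the observations, so any strictly monotone transformation of either variable leaves them unchanged.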
 - 
p_value(self, matrix_X, matrix_Y, replication_factor=1000)
- Tests independence between two datasets using the independence test.
 - Parameters
- matrix_X (2D numpy.array) -- a [n*p] data matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) -- a [n*q] data matrix, a matrix with n samples in q dimensions
- replication_factor (int) -- specifies the number of replications to use for the permutation test. Defaults to 1000.

- Returns
- a list of two items, containing:
  - p_value_: P-value
  - p_value_metadata_: (optional) a dict of metadata other than the p_value, that the independence test computes in the process

- Return type
- list
- Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.kendall_spearman import KendallSpearman
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> kendall_spearman = KendallSpearman()
>>> kendall_spearman_p_value = kendall_spearman.p_value(X, Y)
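The role of replication_factor in a permutation test can be illustrated with a generic sketch. The names permutation_p_value and abs_pearson, the seed argument, the one-sided comparison, and the add-one correction are all assumptions made for this illustration; the library's own procedure may differ in each of these details.

```python
import numpy as np


def abs_pearson(x, y):
    """Stand-in statistic for illustration: absolute Pearson correlation."""
    return abs(np.corrcoef(x.ravel(), y.ravel())[0, 1])


def permutation_p_value(x, y, statistic, replication_factor=1000, seed=0):
    """Hypothetical helper: permutation p-value for a 'larger is more dependent' statistic."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    null_distribution = np.empty(replication_factor)
    for i in range(replication_factor):
        # Shuffling the rows of y breaks its pairing with x, which simulates the null.
        null_distribution[i] = statistic(x, y[rng.permutation(len(y))])
    # Add-one correction (an assumption here) keeps the estimate strictly positive.
    p_value = (1 + np.sum(null_distribution >= observed)) / (1 + replication_factor)
    return p_value, {"null_distribution": null_distribution}


X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
              0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
              1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)

p_value, metadata = permutation_p_value(X, Y, abs_pearson, replication_factor=1000)
```

A larger replication_factor gives a finer-grained p-value estimate at a proportionally higher cost, since the statistic is recomputed once per replication.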
 - 
get_name(self)¶
- Returns
- the name of the independence test 
- Return type
- string 
 
 - 
p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)¶
- Tests independence between two datasets using block permutation test. - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions
 
- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions
 
- replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.
 
- Returns
- returns a list of two items, that contains: - p_value
- P-value of MGC 
 
- metadata
- a dict of metadata with the following keys:
  - null_distribution: numpy array representing the distribution of the test statistic under the null.
 
 
- Return type
- Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc_ts import MGC_TS
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc_ts = MGC_TS()
>>> p_value, metadata = mgc_ts.p_value(X, Y, replication_factor=100)
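A block permutation test differs from an ordinary permutation test in that contiguous runs of samples are moved together, which preserves short-range ordering within each run. The sketch below is one simple way to do this and is not taken from the library: the function names, the default block size of ceil(sqrt(n)), the seed argument, and the add-one correction are all assumptions.

```python
import numpy as np


def abs_pearson(x, y):
    """Stand-in statistic for illustration: absolute Pearson correlation."""
    return abs(np.corrcoef(x.ravel(), y.ravel())[0, 1])


def block_permutation_indices(n, block_size, rng):
    """Hypothetical helper: shuffle the order of contiguous index blocks."""
    blocks = [np.arange(start, min(start + block_size, n))
              for start in range(0, n, block_size)]
    order = rng.permutation(len(blocks))
    return np.concatenate([blocks[i] for i in order])


def block_permutation_p_value(x, y, statistic, replication_factor=1000,
                              block_size=None, seed=0):
    """Hypothetical helper: p-value from a null built by block-permuting y."""
    n = len(y)
    if block_size is None:
        block_size = int(np.ceil(np.sqrt(n)))  # assumed default, not the library's
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    null_distribution = np.array([
        statistic(x, y[block_permutation_indices(n, block_size, rng)])
        for _ in range(replication_factor)
    ])
    p_value = (1 + np.sum(null_distribution >= observed)) / (1 + replication_factor)
    return p_value, {"null_distribution": null_distribution}


X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
              0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
              1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)

p_value, metadata = block_permutation_p_value(X, Y, abs_pearson, replication_factor=100)
```

The returned metadata mirrors the documented null_distribution key, so the null sample can be inspected or plotted after the test.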
 
Multivariate Distance Matrix Regression (MDMR)¶
Main MDMR Independence Test Module
- 
class mgcpy.independence_tests.mdmr.MDMR(compute_distance_matrix=None)[source]¶
- Parameters
- compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix
- Methods
- get_name(self) -- returns the name of the independence test
- ind_p_value(self, matrix_X, matrix_Y[, ...]) -- Individual predictor variable p-values calculation
- p_value(self, matrix_X, matrix_Y[, ...]) -- Tests independence between two datasets using MGC and permutation test.
- p_value_block(self, matrix_X, matrix_Y[, ...]) -- Tests independence between two datasets using block permutation test.
- test_statistic(self, matrix_X, matrix_Y[, ...]) -- Computes MDMR Pseudo-F statistic between two datasets.
test_statistic(self, matrix_X, matrix_Y, permutations=0, individual=0, disttype='cityblock')[source]¶
- Computes MDMR Pseudo-F statistic between two datasets. - It first takes the distance matrix of Y (by ) 
- Next it regresses X into a portion due to Y and a portion due to residual 
- The p-value is for the null hypothesis that the variable of X is not correlated with Y's distance matrix 
 - Parameters
- data_matrix_X (2D numpy.array) -- (optional, default picked from class attr) is interpreted as a [n*d] data matrix, a matrix with n samples in d dimensions
 
- data_matrix_Y (2D numpy.array) -- (optional, default picked from class attr) is interpreted as a [n*d] data matrix, a matrix with n samples in d dimensions
 
- individual (integer) -- 0 or 1. With value 0 (default), tests the entire X matrix. With value 1, tests the entire X matrix and then each predictor variable individually.
 
- Returns
- with individual = 0, returns 1 value; with individual = 1, returns 2 values, containing:
  - the test statistic of the entire X matrix
  - for individual = 1, an array with the variable of X in the first column, the test statistic in the second, and the permutation p-value in the third (which here will always be 1)
- Return type
 
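The steps listed for the pseudo-F statistic (distance matrix of Y, then a split into a part explained by X and a residual part) can be sketched with the commonly used MDMR formulation based on a Gower-centred matrix G and the hat matrix H of X. This is an assumption about the formulation, not a copy of the library code: mdmr_pseudo_f and cityblock_distance_matrix are hypothetical names, an intercept column is added to X, and the degrees-of-freedom scaling shown may differ from what MDMR.test_statistic returns.

```python
import numpy as np


def cityblock_distance_matrix(data):
    """Pairwise Manhattan (cityblock) distances between the rows of an [n*d] matrix."""
    return np.abs(data[:, None, :] - data[None, :, :]).sum(axis=2)


def mdmr_pseudo_f(matrix_X, matrix_Y):
    """Hypothetical helper: pseudo-F = [tr(HGH)/(m-1)] / [tr((I-H)G(I-H))/(n-m)]."""
    n = matrix_X.shape[0]
    D = cityblock_distance_matrix(matrix_Y)

    # Gower-centre the squared distances: G = C A C, A = -D^2 / 2, C = I - 11^T / n.
    A = -0.5 * D ** 2
    C = np.eye(n) - np.ones((n, n)) / n
    G = C @ A @ C

    # Hat matrix of the design (X plus an intercept column); assumes full column rank.
    design = np.column_stack([np.ones(n), matrix_X])
    H = design @ np.linalg.pinv(design)
    m = design.shape[1]

    residual = np.eye(n) - H
    explained = np.trace(H @ G @ H) / (m - 1)
    unexplained = np.trace(residual @ G @ residual) / (n - m)
    return explained / unexplained


X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
              0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
              1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)

pseudo_f = mdmr_pseudo_f(X, Y)
```

When Y has a single column, the cityblock distance coincides with the Euclidean distance, and this pseudo-F reduces to the classical F statistic of a linear regression of Y on X.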
 - 
p_value(self, matrix_X, matrix_Y, replication_factor=1000)[source]¶
- Tests independence between two datasets using MGC and permutation test. - Parameters
- matrix_X (2D numpy.array) -- is interpreted as a [n*d] data matrix, a matrix with n samples in d dimensions
 
- matrix_Y (2D numpy.array) -- is interpreted as a [n*d] data matrix, a matrix with n samples in d dimensions
 
- replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.
 
- Returns
- returns a list of two items, that contains: - p_value
- P-value of MGC 
 
- p_value_metadata
 
 
- Return type
 
 - 
ind_p_value(self, matrix_X, matrix_Y, permutations=1000, individual=1, disttype='cityblock')[source]¶
- Individual predictor variable p-values calculation - Parameters
- matrix_X (2D numpy.array) -- is interpreted as a [n*d] data matrix, a matrix with n samples in d dimensions
 
- matrix_Y (2D numpy.array) -- is interpreted as a [n*d] data matrix, a matrix with n samples in d dimensions
 
 
 
 - 
p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)¶
- Tests independence between two datasets using block permutation test. - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions
 
- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions
 
- replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.
 
- Returns
- returns a list of two items, that contains: - p_value
- P-value of MGC 
 
- metadata
- a dict of metadata with the following keys:
  - null_distribution: numpy array representing the distribution of the test statistic under the null.
 
 
- Return type
- Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc_ts import MGC_TS
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc_ts = MGC_TS()
>>> p_value, metadata = mgc_ts.p_value(X, Y, replication_factor=100)
 
Pearson's Correlation, RV, Canonical Analysis (CCA)¶
- 
class mgcpy.independence_tests.rv_corr.RVCorr(compute_distance_matrix=None, which_test='rv')[source]¶
- Parameters
- compute_distance_matrix (FunctionType or callable()) -- a function to compute the pairwise distance matrix, given a data matrix 
- which_test (str) -- specifies which test to use, including 'rv', 'pearson', and 'cca'. 
 
- Methods
- get_name(self) -- returns the name of the independence test
- p_value(self, matrix_X, matrix_Y[, ...]) -- Tests independence between two datasets using the independence test.
- p_value_block(self, matrix_X, matrix_Y[, ...]) -- Tests independence between two datasets using block permutation test.
- test_statistic(self[, matrix_X, matrix_Y]) -- Computes the Pearson/RV/CCA correlation measure between two datasets.
test_statistic(self, matrix_X=None, matrix_Y=None)[source]¶
- Computes the Pearson/RV/CCA correlation measure between two datasets.
- Default computes linear correlation for RV
- Computes Pearson's correlation
- Calculates local linear correlations for CCA
 - Parameters
- matrix_X (2D numpy.array) -- a [n*p] data matrix, a matrix with n samples in p dimensions 
- matrix_Y (2D numpy.array) -- a [n*q] data matrix, a matrix with n samples in q dimensions 
- replication_factor (int) -- specifies the number of replications to use for the permutation test. Defaults to 1000. 
 
- Returns
- returns a list of two items, that contains: - test_statistic_
- test statistic 
 
- test_statistic_metadata_
- (optional) a dict of metadata other than the p_value, that the independence test computes in the process
 
 
- Return type
- Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.rv_corr import RVCorr
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> rvcorr = RVCorr()
>>> rvcorr_test_stat = rvcorr.test_statistic(X, Y)
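For intuition about the default 'rv' option, the RV coefficient can be written in terms of centred cross-product matrices as tr(Sxy Syx) / sqrt(tr(Sxx^2) tr(Syy^2)). The sketch below implements that textbook definition with NumPy; rv_coefficient is a hypothetical name, and the library's own computation may be normalised or organised differently.

```python
import numpy as np


def rv_coefficient(matrix_X, matrix_Y):
    """Hypothetical helper: RV coefficient between an [n*p] and an [n*q] data matrix."""
    Xc = matrix_X - matrix_X.mean(axis=0)  # centre each column
    Yc = matrix_Y - matrix_Y.mean(axis=0)
    cov_xy = Xc.T @ Yc  # [p*q] cross-product matrix
    cov_xx = Xc.T @ Xc  # [p*p]
    cov_yy = Yc.T @ Yc  # [q*q]
    numerator = np.trace(cov_xy @ cov_xy.T)
    denominator = np.sqrt(np.trace(cov_xx @ cov_xx) * np.trace(cov_yy @ cov_yy))
    return numerator / denominator


X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
              0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
              1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)

rv = rv_coefficient(X, Y)
pearson_r = np.corrcoef(X.ravel(), Y.ravel())[0, 1]
```

With p = q = 1, as in this example, the RV coefficient equals the square of Pearson's correlation, which makes it a multivariate generalisation of the squared correlation.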
 - 
p_value(self, matrix_X, matrix_Y, replication_factor=1000)[source]¶
- Tests independence between two datasets using the independence test. - Parameters
- matrix_X (2D numpy.array) -- a [n*p] data matrix, a matrix with n samples in p dimensions 
- matrix_Y (2D numpy.array) -- a [n*q] data matrix, a matrix with n samples in q dimensions 
- replication_factor (int) -- specifies the number of replications to use for the permutation test. Defaults to 1000. 
 
- Returns
- returns a list of two items, that contains: - p_value_
- P-value 
 
- p_value_metadata_
- (optional) a dict of metadata other than the p_value, that the independence test computes in the process
 
 
- Return type
- Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.rv_corr import RVCorr
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> rvcorr = RVCorr()
>>> rvcorr_p_value = rvcorr.p_value(X, Y)
 - 
get_name(self)¶
- Returns
- the name of the independence test 
- Return type
- string 
 
 - 
p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)¶
- Tests independence between two datasets using block permutation test. - Parameters
- matrix_X (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions
 
- matrix_Y (2D numpy.array) -- is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions
 
- replication_factor (integer) -- specifies the number of replications to use for the permutation test. Defaults to 1000.
 
- Returns
- returns a list of two items, that contains: - p_value
- P-value of MGC 
 
- metadata
- a dict of metadata with the following keys:
  - null_distribution: numpy array representing the distribution of the test statistic under the null.
 
 
- Return type
- Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc_ts import MGC_TS
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc_ts = MGC_TS()
>>> p_value, metadata = mgc_ts.p_value(X, Y, replication_factor=100)