MGC-X and DCorr-X: Independence Testing for Time Series¶
In this tutorial, we explore
- The theory behind the Cross Distance Correlation (DCorr-X) and Cross Multiscale Graph Correlation (MGC-X) tests 
- The unique methodological features such as optimal scale and optimal lag 
- The features of the implementation 
Theory¶
Notation¶
Let \(\mathbb{N}\) be the non-negative integers \(\{0, 1, 2, ...\}\), and \(\mathbb{R}\) be the real line \((-\infty, \infty)\). Let \(F_X\), \(F_Y\), and \(F_{X,Y}\) represent the marginal and joint distributions of random variables \(X\) and \(Y\), whose realizations exist in \(\mathcal{X}\) and \(\mathcal{Y}\), respectively. Similarly, Let \(F_{X_t}\), \(F_{Y_s}\), and \(F_{(X_t,Y_s)}\) represent the marginal and joint distributions of the time-indexed random variables \(X_t\) and \(Y_s\) at timesteps \(t\) and \(s\). For this work, assume \(\mathcal{X} = \mathbb{R}^p\) and \(\mathcal{Y} = \mathbb{R}^q\) for \(p, q > 0\). Finally, let \(\{(X_t,Y_t)\}_{t=-\infty}^{\infty}\) represent the full, jointly-sampled time series, structured as a countably long list of observations \((X_t, Y_t)\).
Problem Statement¶
The test addresses the problem of independence testing for time series. To formalize the problem, consider a strictly stationary time series \(\{(X_t,Y_t)\}_{t=-\infty}^{\infty}\), with the observed sample \(\{(X_1,Y_1),...,(X_n, Y_n)\}\). Choose some \(M \in \mathbb{N}\), the maximum_lag hyperparameter. We test the independence of two series via the following hypothesis.
The null hypothesis implies that for any \((M+1)\)-length stretch in the time series, \(X_t\) is pairwise independent of present and past values \(Y_{t-j}\) spaced \(j\) timesteps away (including \(j=0\)). A corresponding test for whether \(Y_t\) is dependent on past values of \(X_t\) is available by swapping the labels of each time series. Finally, the hyperparameter \(M\) governs the maximum number of timesteps in the past for which we check the influence of \(Y_{t-j}\) on \(X_t\). This \(M\) can be chosen for computation considerations, as well as for specific subject matter purposes, e.g. a signal from one region of the brain might only influence be able to influence another within 20 time steps implies \(M = 20\).
The Test Statistic¶
Define the cross-distance correlation at lag \(j\) as
Where \(\text{DCorr}(\cdot, \cdot)\) is the distance correlation function. Assuming strict stationarity of \(\{(X_t,Y_t)\}\) is important in even defining \(\text{DCorr}(j)\), as the parameter depends only on the spacing \(j\), and not the timestep \(t\) of \(X_t\) and \(Y_{t-j}\). Similarly, let \(\text{DCorr}n(j)\) be its estimator, with \(\text{MGC}_n(j)\) being the \(\text{MGC}\) test statistic evaluated for \(\{X_t\}\) and \(\{Y_{t-j}\}\). The \(\text{DCorr-X}^M\) test statistic is
Similarly, the \(\text{MGC-X}\) test statistic is
While \(\text{MGC-X}\) is more computationally intensive than \(\text{DCorr-X}\), \(\text{MGC-X}\) employs multiscale analysis to achieve better finite-sample power in high-dimensional, nonlinear, and structured data settings [1].
The P-Value¶
Let \(T_n\) represent either of the test statistics above. To compute the p-value, one need to estimate the null distribution of \(T_n\), namely its distribution under indepdendence pair of data. A typical permutation test would permute the indices \(\{1,2,3,...,n\}\), reorder the series \(\{Y_t\}\) according to this permutation, and \(T_n\) would be computed on \(\{X_t\}\) and the reordered \(\{Y_t\}\). This procedure would be repeated \(K\) times, generating \(K\) samples of the test statistic under the null. This permutation test requires exchangeability of the sequence \(\{Y_t\}\), which would be true in the i.i.d. case, but is generally violated in the time series case. Instead, a block permutation captures the dependence between elements of the series, as described in \cite{politis2003}. Letting \(\lceil \cdot \rceil\) be the ceiling function, this procedure partitions the list of indices into size \(b\) “blocks”, and permutes the \(\lceil \frac{n}{b} \rceil\) blocks in order to generate samples of the test statistic under the null. Specifically,
- Choose a random permutation of the indices \(\{0, 1, 2, ..., \lceil \frac{n}{b} \rceil\}\). 
- From index \(i\) in the permutation, produce block \(B_{i} = (Y_{bi+1},Y_{bi+2},...,Y_{bi + b})\), which is a section of the series \(\{Y_t\}\). 
- Let the series \(\{Y_{\pi(1)}, ..., Y_{\pi(n)}\} = (B_1, B_2, ..., B_{\frac{n}{b}})\), where \(\pi\) maps indices \(\{1,2,...,n\}\) to the new, block permuted indices. 
- Compute \(T^{(r)}_n\) on the series \(\{(X_t, Y_{\pi(t)})\}_{t=1}^n\) for replicate \(r\). 
Repeat this procedure \(K\) times (typically \(K = 100\) or \(1000\)), and let \(T^{(0)}_n = T_n\), with:
where \(\mathbb{I}\{\cdot\}\) is the indicator function.
Using DCorr-X and MGC-X¶
[1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import random
from scipy.stats import pearsonr
from mgcpy.independence_tests.dcorrx import DCorrX
from mgcpy.independence_tests.mgcx import MGCX
Simulate time series¶
Let \(\epsilon_t\) and \(\eta_t\) be i.i.d. standard normally distributed.
Independent AR(1):
[2]:
def indep_ar1(n, phi = 0.5, sigma2 = 1.0):
    # X_t and Y_t are univarite AR(1) with phi = 0.5 for both series.
    # Noise follows N(0, sigma2).
    # Innovations.
    epsilons = np.random.normal(0.0, sigma2, n)
    etas = np.random.normal(0.0, sigma2, n)
    X = np.zeros(n)
    Y = np.zeros(n)
    X[0] = epsilons[0]
    Y[0] = etas[0]
    # AR(1) process.
    for t in range(1,n):
        X[t] = phi*X[t-1] + epsilons[t]
        Y[t] = phi*Y[t-1] + etas[t]
    return X, Y
Crosscorrelated AR(1):
[3]:
def cross_corr_ar1(n, phi = 0.5, sigma2 = 1.0):
    # X_t and Y_t are together a bivarite AR(1) with Phi = [0 0.5; 0.5 0].
    # Innovations follow N(0, sigma2).
    # Innovations.
    epsilons = np.random.normal(0.0, sigma2, n)
    etas = np.random.normal(0.0, sigma2, n)
    X = np.zeros(n)
    Y = np.zeros(n)
    X[0] = epsilons[0]
    Y[0] = etas[0]
    for t in range(1,n):
        X[t] = phi*Y[t-1] + epsilons[t]
        Y[t] = phi*X[t-1] + etas[t]
    return X, Y
Nonlinearly related at lag 1:
[4]:
def nonlinear_lag1(n, phi = 1, sigma2 = 1):
    # X_t and Y_t are together a bivarite nonlinear process.
    # Innovations follow N(0, sigma2).
    # Innovations.
    epsilons = np.random.normal(0.0, sigma2, n)
    etas = np.random.normal(0.0, sigma2, n)
    X = np.zeros(n)
    Y = np.zeros(n)
    Y[0] = etas[0]
    for t in range(1,n):
        X[t] = phi*epsilons[t]*Y[t-1]
        Y[t] = etas[t]
    return X, Y
Plot time series¶
[5]:
def plot_ts(X, Y, title, xlab = "X_t", ylab = "Y_t"):
    n = X.shape[0]
    t = range(1, n + 1)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15,7.5))
    fig.suptitle(title)
    plt.rcParams.update({'font.size': 15})
    ax1.plot(t, X)
    ax1.plot(t, Y)
    ax1.legend(['X_t', 'Y_t'], loc = 'upper left', prop={'size': 12})
    ax1.set_xlabel("t")
    ax2.scatter(X,Y, color="black")
    ax2.set_ylabel(ylab)
    ax2.set_xlabel(xlab)
Explore with DCorr-X and MGC-X.¶
[6]:
def compute_dcorrx(X, Y, max_lag):
    dcorrx = DCorrX(max_lag = max_lag, which_test = 'unbiased')
    dcorrx_statistic, metadata = dcorrx.test_statistic(X, Y)
    p_value, _ = dcorrx.p_value(X, Y)
    optimal_lag = metadata['optimal_lag']
    print("DCorrX test statistic:", dcorrx_statistic)
    print("P Value:", p_value)
    print("Optimal Lag:", optimal_lag)
def compute_mgcx(X, Y, max_lag):
    mgcx = MGCX(max_lag = max_lag)
    mgcx_statistic, metadata = mgcx.test_statistic(X, Y)
    p_value, _ = mgcx.p_value(X, Y)
    optimal_lag = metadata['optimal_lag']
    optimal_scale = metadata['optimal_scale']
    print("MGCX test statistic:", mgcx_statistic)
    print("P Value:", p_value)
    print("Optimal Lag:", optimal_lag)
    print("Optimal Scale:", optimal_scale)
[7]:
n = 40
max_lag = 0
X, Y = indep_ar1(n)
plot_ts(X, Y, "Independent AR(1)")
compute_dcorrx(X, Y, max_lag)
compute_mgcx(X, Y, max_lag)
DCorrX test statistic: 0.0
P Value: 0.418
Optimal Lag: 0
MGCX test statistic: 0.0
P Value: 0.424
Optimal Lag: 0
Optimal Scale: [40, 40]
 
In the crosscorrelated time series, the linear dependence will not be apparent at lag 0, but will be at lag 1.
[8]:
n = 40
max_lag = 0
X, Y = cross_corr_ar1(n)
plot_ts(X, Y, "Crosscorrelated AR(1)")
compute_dcorrx(X, Y, max_lag)
compute_mgcx(X, Y, max_lag)
DCorrX test statistic: 0.14380139143970183
P Value: 0.028
Optimal Lag: 0
MGCX test statistic: 0.14550875825340037
P Value: 0.069
Optimal Lag: 0
Optimal Scale: [40, 40]
 
[9]:
max_lag = 1
X, Y = cross_corr_ar1(n)
plot_ts(X[1:n], Y[0:(n-1)], "Crosscorrelated AR(1) - Y_t lagged by 1", ylab = "Y_{t-1}")
compute_dcorrx(X, Y, max_lag)
compute_mgcx(X, Y, max_lag)
DCorrX test statistic: 0.35649077356316006
P Value: 0.003
Optimal Lag: 1
MGCX test statistic: 0.35511264701472667
P Value: 0.012
Optimal Lag: 1
Optimal Scale: [39, 39]
 
The final example is a nonlinearly related series, for which the Pearson’s correlation may be insufficient.
[10]:
X, Y = nonlinear_lag1(n)
print("Pearson's Correlation at lag 0: " + str(pearsonr(X,Y)[0]))
print("Pearson's Correlation at lag 1: " + str(pearsonr(X[1:n],Y[0:(n-1)])[0]))
Pearson's Correlation at lag 0: -0.16740183702829184
Pearson's Correlation at lag 1: 0.2840269513248397
[11]:
plot_ts(X, Y, "Nonlinearly related at lag 1")
compute_dcorrx(X, Y, max_lag)
compute_mgcx(X, Y, max_lag)
DCorrX test statistic: 0.07967130243446058
P Value: 0.159
Optimal Lag: 1
MGCX test statistic: 0.35820807204909766
P Value: 0.008
Optimal Lag: 0
Optimal Scale: [35, 13]
 
Understanding the Optimal Lag¶
The optimal lag allows the user to understand better the temporal nature of the relationship between \(X_t\) and \(Y_t\). The polt below shows the empirical distribution of the optimal lag estimate for \(\text{MGC-X}\) as \(n\) increases.
[12]:
# Plot the distribution of optimal lag estimates.
def opt_lag_dist(optimal_lags_dcorrx, optimal_lags_mgcx, n, M = 10):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15,7.5), sharey = True)
    plt.rcParams.update({'font.size': 15})
    ax1.set_xlabel('Lag j')
    ax1.set_title("DCorr-X, n = %d" % n)
    ax2.set_xlabel('Lag j')
    ax2.set_title("MGC-X, n = %d" % n)
    ax1.set_ylabel("Freq. of Optimal Lag Estimates")
    # Optimal lag predictions.
    weights = np.ones_like(optimal_lags_dcorrx)/float(len(optimal_lags_dcorrx))
    ax1.hist(optimal_lags_dcorrx,
             bins = np.arange(M)-0.5,
             weights = weights,
             align = 'mid',
             edgecolor ='black',
             color = 'blue')
    weights = np.ones_like(optimal_lags_mgcx)/float(len(optimal_lags_mgcx))
    ax2.hist(optimal_lags_mgcx,
             bins = np.arange(M)-0.5,
             weights = weights,
             align = 'mid',
             edgecolor ='black',
             color = 'red')
    plt.show()
We simulate a nonlinear process that has clear dependence at lag 3.
[13]:
def nonlinear_lag3(n, phi = 1, sigma2 = 1):
    # X_t and Y_t are together a bivarite nonlinear process.
    # Innovations follow N(0, sigma2).
    # Innovations.
    epsilons = np.random.normal(0.0, sigma2, n)
    etas = np.random.normal(0.0, sigma2, n)
    X = np.zeros(n)
    Y = np.zeros(n)
    for t in range(3):
        Y[t] = etas[t]
    for t in range(3,n):
        X[t] = phi*epsilons[t]*Y[t-3]
        Y[t] = etas[t]
    return X, Y
[14]:
M = 10
num_sims = 100
dcorrx = DCorrX(max_lag = M)
mgcx = MGCX(max_lag = M)
optimal_lags_dcorrx = np.zeros(num_sims)
optimal_lags_mgcx = np.zeros(num_sims)
# Run experiments.
for n in [15, 30, 60]:
    for t in range(num_sims):
        X, Y = nonlinear_lag3(n)
        test_statistic, metadata = dcorrx.test_statistic(X, Y)
        optimal_lags_dcorrx[t] = metadata['optimal_lag']
        test_statistic, metadata = mgcx.test_statistic(X, Y)
        optimal_lags_mgcx[t] = metadata['optimal_lag']
    opt_lag_dist(optimal_lags_dcorrx, optimal_lags_mgcx, n)
 
 
 
DCorrX and MGCX both close in on the correct lag as n increases, with MGCX having higher accuracy due to advantages in nonlinear settings.