A population version of Spearman’s rank correlation has been defined in the case of continuous variables.
Consider a population distributed according to two variates \(X_1\) and \(X_2.\) Two members \((X_1, X_2)\) and \((X_1', X_2')\) of the population will be called concordant if:
\[X_1 < X_1', \; X_2 < X_2' \text{ OR } X_1 > X_1',\; X_2 > X_2'.\]
They will be called discordant if:
\[X_1 < X_1', \; X_2 > X_2' \text{ OR } X_1 > X_1',\; X_2 < X_2'.\] The probabilities of concordance and discordance are denoted by \(P_c\) and \(P_d,\) respectively. The population version of Spearman’s \(\rho\) is defined as proportional to the difference between the probability of concordance and the probability of discordance for the two vectors \((X_1, X_2)\) and \((X_1', X_2'),\) where \((X_1, X_2)\) has joint distribution \(F_{X_1X_2}\) with marginal distribution functions \(F_{X_1}\) and \(F_{X_2},\) while \(X_1' \sim F_{X_1}\) and \(X_2' \sim F_{X_2}\) are independent of each other and of \((X_1, X_2)\):
\[\rho = 3 \left(\Pr[(X_1 − X_1')(X_2 − X_2') > 0] − \Pr[(X_1 − X_1')(X_2 − X_2') < 0]\right)\] The above definition is valid only for populations for which the probabilities of \(X_1 = X_1'\) or \(X_2 = X_2'\) are zero. The main types of such populations are an infinite population with both \(X_1\) and \(X_2\) distributed continuously, or a finite population where \(X_1\) and \(X_2\) have disjoint ranges.
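As a quick numerical sanity check (my own addition), here is a minimal R sketch: it assumes a bivariate normal population with Pearson correlation 0.8, builds the second vector with independent components drawn from the marginals, and compares \(3(P_c - P_d)\) with the sample Spearman correlation.

```r
# Minimal sketch (assumed setup): Monte Carlo check of the population
# definition of Spearman's rho for a bivariate normal with Pearson rho = 0.8.
set.seed(1)
n   <- 1e5
rho <- 0.8
x1  <- rnorm(n)
x2  <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)   # cor(X1, X2) = 0.8
x1p <- rnorm(n)                                # X1' ~ F_X1, independent of everything else
x2p <- rnorm(n)                                # X2' ~ F_X2, independent of X1'
s   <- (x1 - x1p) * (x2 - x2p)
3 * (mean(s > 0) - mean(s < 0))                # ~ (6/pi) * asin(rho/2) = 0.786 for the normal case
cor(x1, x2, method = "spearman")               # sample Spearman correlation
```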
This looks virtually identical to the population version of Kendall’s tau, except that for Kendall’s tau \((X_1', X_2')\) is an independent copy of \((X_1, X_2)\) (rather than a pair with independent components) and the factor of 3 is absent:
In trying to illustrate pictorially the intuition behind the definition of the population Kendall’s tau as
\[\tau (X_1, X_2) =\Pr\left[(X_1-X_1')\,(X_2 - X_2') >0 \right]- \Pr\left[(X_1-X_1')\,(X_2 - X_2') <0 \right]\tag 1\]
and the relationship (1) between the iid random vector \((X_1,X_2)\) and its independent copy \((X_1', X_2'),\) and (2) between these random vectors (2-tuples of dependent random variables) and their joint and marginal densities and distribution functions (pdf’s and cdf’s), we can look at an illustrative value of \(X_1\) in a bivariate normal distribution with covariance \(\mathrm{cov}(X_1,X_2)=0.2.\)
In the example, \(X_1\sim N(0,0.25)\) and \(X_2 \sim N(0,0.25).\) The Pearson correlation is, therefore,
\[\rho(X_1,X_2)=\frac{\mathrm{cov}(X_1,X_2)}{\sigma_{X_1}\,\sigma_{X_2}}=\frac{0.2}{\sqrt{0.25\times0.25}}=0.8.\]
If we consider an illustrative draw from \(X_1\) towards the left of the distribution, e.g. \(x_1=-1,\) the conditional mean of \(X_2\) given this value of \(X_1\) will be given by
\[\mathbb E(X_2\mid X_1=x)=\mu_{X_2}+\rho\, \dfrac{\sigma_{X_2}}{\sigma_{X_1}}(x-\mu_{X_1})= 0 + 0.8\times\frac{0.5}{0.5}\times(-1-0)=-0.8\]
and these conditional mean values will increase linearly from left to right. Assuming constant conditional variance, its value will be \(\mathrm{var}(X_2\mid X_1=x)=\sigma^2_{X_2}\,(1-\rho^2)=0.25\times 0.36=0.09.\)
Now, looking at the first factor in equation (1): \((X_1-X_1')\) will be negative whenever the independent draw from the identically distributed rv \(X_1'\) is greater than \(x_1=-1,\) which is highly probable: \(\Pr(X_1' > -1) = \Pr(Z > -2) = 0.977\) (since \(\sigma_{X_1}=0.5\)).
Looking at the second factor, \((X_2-X_2'),\) we know that \(\mathbb E[X_2\mid X_1=-1]=-0.8\) is below the mean of the marginal distribution of \(X_2,\) which was designed to be \(\mu_{X_2}=0.\)
Therefore, precisely because it is highly likely that \(x_1' > x_1,\) rendering \((X_1-X_1')<0,\) it is also more likely that \(x_2'>x_2,\) since \(x_2'\) will more probably come from a conditional normal distribution with \(\mathbb E[X_2'\mid X_1'=x_1' ] > \mathbb E[X_2\mid X_1=-1 ],\) in which case \((X_2-X_2')\) will also be negative, rendering \((X_1-X_1')\,(X_2 - X_2') >0.\)
The same argument (inverted) holds if we were to look at a value of \(x_1 =+1.\)
Therefore, the positive correlation imposed on this bivariate normal would indeed result in a positive Kendall \(\tau\) if we swept (integrated) over \(x_1\) from \(-\infty\) to \(+\infty,\) in the sense that \(\Pr\left[(X_1-X_1')\,(X_2 - X_2') >0 \right]> \Pr\left[(X_1-X_1')\,(X_2 - X_2') <0 \right].\)
The reverse would be just as easy to show had the chosen correlation been negative.
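A small R sketch (illustrative only, reusing the assumed parameters above: variances 0.25 and covariance 0.2) confirms this sign numerically:

```r
# Sketch: Monte Carlo Kendall's tau for the bivariate normal example above,
# var(X1) = var(X2) = 0.25, cov(X1, X2) = 0.2, i.e. Pearson rho = 0.8.
set.seed(1)
n   <- 2e3
x1  <- rnorm(n, 0, 0.5)
x2  <- 0.8 * x1 + sqrt(1 - 0.8^2) * rnorm(n, 0, 0.5)   # cov(x1, x2) = 0.2
x1p <- rnorm(n, 0, 0.5)                                # independent copy of the pair
x2p <- 0.8 * x1p + sqrt(1 - 0.8^2) * rnorm(n, 0, 0.5)
s   <- (x1 - x1p) * (x2 - x2p)
mean(s > 0) - mean(s < 0)         # Monte Carlo tau, ~ (2/pi) * asin(0.8) = 0.59
cor(x1, x2, method = "kendall")   # sample Kendall's tau, same ballpark
```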
The objective is to look at the dependency structure in a vector of random variables, knowing their marginal distributions.
We want to transform each random variable so that it is distributed as \(U(0,1).\)
It is well known that a real-valued, continuous, and strictly monotone function of a single variable possesses an inverse on its range. It is also known that one can drop the assumptions of continuity and strict monotonicity (even the assumption of considering points in the range) to obtain the notion of a generalized inverse. One can often work with generalized inverses as one does with ordinary inverses.
For an increasing function \(T : \mathbb R \to \mathbb R\) with \(T(−\infty) = \lim_{x \downarrow −\infty} T(x)\) and \(T(\infty) =\lim_{x \uparrow \infty} T(x),\) the generalized inverse
\(T^- : \mathbb R \to \bar{\mathbb R} = [−\infty, \infty]\)
of \(T\) is defined by
\[T^-(y)=\inf\{x\in \mathbb R: T(x) \geq y\}, y\in \mathbb R,\]
with the convention that \(\inf \emptyset =\infty.\) If \(T:\mathbb R\to [0,1]\) is a distribution function, \(T^-:[0,1]\to \bar {\mathbb R}\) is also called a quantile function of \(T.\)
In this context \(T\) is the cdf \(F.\) For continuous \(F\) we have \(F(F^-(y))=y,\) and if \(U\sim U(0,1),\) then \(F^-(U) \sim F.\)
[From here on, instead of \(y\) we will use \(u\)]
So if we have a vector of random variables \((X_1, \dots,X_d),\) each with continuous cdf \(F_i,\) the probability integral transform gives \(F_i(X_i) \sim U(0,1),\) and conversely \(F^-_i(U_i) \sim F_i.\) So each variable is transformed through its own cdf, \(\{F_1,\dots,F_d\}.\)
We can write \(F_i(X_i) \sim U_i\) and \(F_i^- (U_i) \sim X_i.\)
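A one-liner R sketch of the probability integral transform (the exponential margin is an arbitrary choice):

```r
# Sketch: probability integral transform with an (arbitrary) exponential margin.
set.seed(1)
x <- rexp(1e5, rate = 2)         # X with cdf F(x) = 1 - exp(-2x)
u <- pexp(x, rate = 2)           # U = F(X) ~ U(0, 1)
hist(u, breaks = 50)             # flat histogram: standard uniform
max(abs(qexp(u, rate = 2) - x))  # F^-(U) recovers X (up to floating point)
```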
We can look at the vector \(\vec u =(F_1(X_1), \dots, F_d(X_d)),\) and call the joint distribution function of the vector \(\vec u\) the copula:
\[\Pr(U_1 \leq u_1, \dots, U_d \leq u_d).\]
So the copula \(\color{red}C\) is a joint distribution function on \([0,1]^d\) with standard uniform marginal distributions.
For example, if we assume independence between \(X_i\) variables,
\[C(\vec u) = \Pr(U_1 \leq u_1, \dots, U_d \leq u_d) = \Pr(F_1(X_1)\leq u_1, \dots, F_d(X_d) \leq u_d)=\prod_{i=1}^d \Pr(F_i(X_i)\leq u_i)= \prod_{i=1}^d u_i.\]
This is the independence copula.
In the case of the bivariate independence copula (here), \(C(u,v)=uv,\) the surface over the unit square can be plotted as below.
Looking at the surface plot from the side, it is easy to see that the boundary conditions of a copula hold: \(C(u,0)=C(0,v)=0,\) \(C(u,1)=u,\) and \(C(1,v)=v.\)
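A minimal R sketch of that surface (plotting choices are arbitrary):

```r
# Sketch: surface of the bivariate independence copula C(u, v) = u * v.
u <- v <- seq(0, 1, length.out = 40)
C <- outer(u, v)               # grid of values C(u, v) = u * v
persp(u, v, C, theta = 30, phi = 25,
      xlab = "u", ylab = "v", zlab = "C(u,v)")
# Along the edges: C(u, 0) = 0, C(0, v) = 0, C(u, 1) = u, C(1, v) = v.
```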
Sklar’s theorem:
Given a joint distribution function \(F\) with marginal distribution functions \(F_1,\dots, F_d,\) there exists a function \(C\) (a copula) such that \(F(x_1,\dots, x_d)=C(F_1(x_1),\dots,F_d(x_d)),\) and if the marginals are continuous, \(C\) is unique and given by \(C(u_1,\dots,u_d)= F(F_1^-(u_1),\dots,F_d^-(u_d)).\)
Linear correlation does not translate well into the copula space, but rank correlation does (Spearman or Kendall):
\[\begin{align} \tau(X_1, X_2) &= 4 \,\mathbb E\left[ C(U_1, U_2) \right]-1\\[2ex] &=4\int C(u,v)dC(u,v)-1 \end{align}\]
\[\begin{align} \rho (X_1, X_2) &=12 \, \mathbb E [U_1 U_2] -3 \\[2ex] &=12\int\!\!\int uv\,dC(u,v)-3\\[2ex] &=r(U_1, U_2) \end{align}\]
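The last line says that Spearman’s rho is simply the Pearson correlation \(r\) computed on the uniform scores. A short R sketch (Gaussian pair with known margins; both are assumptions made only for illustration):

```r
# Sketch: Spearman's rho as the linear correlation of the uniform scores.
set.seed(1)
n  <- 1e5
x1 <- rnorm(n)
x2 <- 0.8 * x1 + sqrt(1 - 0.8^2) * rnorm(n)
u1 <- pnorm(x1)                    # U1 = F1(X1)
u2 <- pnorm(x2)                    # U2 = F2(X2)
12 * mean(u1 * u2) - 3             # 12 E[U1 U2] - 3
cor(u1, u2)                        # Pearson correlation of the uniforms
cor(x1, x2, method = "spearman")   # sample Spearman's rho: all three agree
```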
The coefficient of upper tail dependence tells us how high values of one variable relate to high values of the other:
\[\lambda_U(X_1, X_2) = \lim_{u\to 1^-}\frac{1 - 2u + C(u, u)}{1-u}\]
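For example, for the independence copula \(C(u,u)=u^2\) the limit is zero, so independent variables have no upper tail dependence:
\[\lambda_U = \lim_{u\to 1^-}\frac{1 - 2u + u^2}{1-u} = \lim_{u\to 1^-}\frac{(1-u)^2}{1-u} = \lim_{u\to 1^-}(1-u) = 0.\]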
Random variables \(X_1, \dots, X_d\) are comonotone if their joint distribution can be written as
\[(X_1,\dots,X_d)\overset{D}{=}(\alpha_1(Z), \dots, \alpha_d(Z))\]
for the same stochastic variable \(Z\) and strictly increasing functions \(\alpha_i.\)
Countermonotonicity, in contrast, is only defined for two variables, whose joint distribution can be expressed as
\[(X_1, X_2) \overset{D}{=}(\alpha(Z), \beta(Z))\] with \(\alpha\) increasing and \(\beta\) decreasing.
The Gaussian copula was blamed for the financial meltdown of 2008, because it was used to calculate the risk of default of financial instruments.
If \(\vec X = (X_1, \dots, X_d)\) is multivariate normal \(\mathcal N(\vec 0,\Sigma)\) with \(\Sigma\) being a correlation matrix (\(1\)’s in the diagonal), the Gaussian copula is
\[C_\Sigma^\mathrm{Ga}(\vec u)=\Phi_\Sigma \left(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_d) \right)\] with \(\Phi_\Sigma\) being the joint distribution function of \(\vec X,\) and \(\Phi^{-1}\) the quantile function (inverse cdf) of the standard normal.
We can take \(\vec Y\) as the linear transformation \(\vec Y = \vec \mu + \beta \vec X\) with \(\beta\) corresponding to a diagonal matrix (0’s off diagonal) with \(\sigma_1,\dots,\sigma_d\) in the diagonal. Then,
\[Y\sim \mathcal N(\vec \mu, \beta^\top \Sigma \beta)\]
which has the same copula as \(\vec X,\) since the copula is invariant under strictly increasing transformations of the individual components.
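A brief R sketch of drawing from a bivariate Gaussian copula, using a Cholesky factor so no extra packages are needed (the correlation value 0.7 is arbitrary):

```r
# Sketch: sampling from a bivariate Gaussian copula with correlation 0.7.
set.seed(1)
n     <- 2e3
Sigma <- matrix(c(1, 0.7, 0.7, 1), 2, 2)
Z     <- matrix(rnorm(2 * n), n, 2) %*% chol(Sigma)  # rows ~ N(0, Sigma)
U     <- pnorm(Z)                                    # Gaussian copula, uniform margins
colMeans(U); apply(U, 2, var)        # ~ 0.5 and ~ 1/12: standard uniform margins
cor(U, method = "kendall")           # dependence kept: tau ~ (2/pi) * asin(0.7)
```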
In the bivariate case, suppose \(X\) and \(Y\) are marginally distributed as \(\mathcal N(0,1)\) and have a joint distribution determined by the covariance matrix \(\begin{bmatrix}1 &\rho\\\rho&1 \end{bmatrix}\) with \(\rho\) being the correlation. To build up such a joint distribution we can resort to \(Z\sim \mathcal N(0,1)\) with \(\mathrm{cor}(X,Z)=0,\) and define \[Y= \rho X+\sqrt{1-\rho^2} Z.\] Because \(X\) and \(Z\) have expected value zero, so will \(Y.\) As for the variance of \(Y,\)
\[\begin{align} \sigma^2(Y)&=\sigma^2\left( \rho X+\sqrt{1-\rho^2} Z \right)\\[2ex] &= \rho^2\mathrm{var}(X)+\left(\sqrt{1-\rho^2} \right)^2\mathrm{var}(Z)+2\rho\sqrt{1-\rho^2}\mathrm{cov}(X,Z)\\[2ex] &= \rho^2 + 1 -\rho^2=1 \end{align}\]
because \(\mathrm{cov}(X,Z)=0.\)
So, since \(Y\) is a linear combination of two mutually independent normal random variables, it will be Gaussian with mean 0, variance 1, and correlated with \(X\) through \(\rho.\)
This is reflected in this answer on CV.SE in the function
correlatedValue = function(x, r){
  r2 = r**2
  ve = 1 - r2          # residual variance 1 - rho^2
  SD = sqrt(ve)        # sd of the noise term, sqrt(1 - rho^2)
  e  = rnorm(length(x), mean = 0, sd = SD)  # e plays the role of sqrt(1 - rho^2) * Z
  y  = r * x + e       # y = rho * x + sqrt(1 - rho^2) * z
  return(y)
}
where \(r=\rho,\) \(x\) plays the role of \(X,\) and \(e\) plays the role of \(\sqrt{1-\rho^2}\,Z,\) so that \(y= \rho x+ \sqrt{1-\rho^2}\,z.\)
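A possible usage (values are illustrative):

```r
# Illustrative use of correlatedValue(): x plays the role of X, r of rho.
set.seed(1)
x <- rnorm(1e5)                     # X ~ N(0, 1)
y <- correlatedValue(x, r = 0.8)
c(mean(y), var(y), cor(x, y))       # ~ 0, ~ 1, ~ 0.8, as derived above
```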
An Archimedean copula is a copula that can be expressed as
\[C(u,v) = \varphi^{[-1]}\left( \varphi(u) + \varphi(v) \right)\]
where \(\varphi:[0,1]\to[0,\infty]\) is a continuous, strictly decreasing, convex function with \(\varphi(1)=0,\) called the generator,
and \(\varphi^{[-1]}\) is the pseudo-inverse of it.
\(\varphi^{[-1]}(t)\) is \(\varphi^{-1}\) (i.e. the actual inverse function) if \(0\leq t \leq \varphi(0),\) and \(\varphi^{[-1]}(t)=0\) if \(\varphi(0)\leq t < \infty.\)
Example
Let \(\varphi_\theta(t) = (1-t)^\theta\) for \(1\leq \theta < \infty\)
Now \(\varphi_\theta(0) =1\) and
\(\varphi_\theta^{[-1]}(t)=1-t^{1/\theta}\) for \(0\leq t \leq 1,\) and zero otherwise.
The copula through this generator is
\[\begin{align} C(u,v) &= \varphi^{[-1]}(\varphi(u)+\varphi(v))\\[2ex] &= \varphi^{[-1]}\left( (1-u)^\theta + (1-v)^\theta \right)\\[2ex] &=\max\left(1 - \left [(1-u)^\theta + (1-v)^\theta \right]^{1/\theta},\; 0\right) \end{align}\] For \(\theta=1\) this reduces to the lower Fréchet–Hoeffding bound \(W(u,v)=\max(u+v-1,0).\)
Following the table of Archimedean copula generators on Wikipedia: for instance, in the case of Clayton’s copula the generator is
\[\varphi(u) =\frac{u^{-\theta}-1}{\theta},\] which can be shown to be strictly decreasing and convex (by checking the first and second derivatives).
The copula will be given by \(C(u_1, u_2) = \varphi^{-1}\left(\varphi(u_1) + \varphi(u_2) \right).\) Calculating the inverse of the generator,
\[\varphi^{-1}(t) = \left(t\,\theta + 1\right)^{-1/\theta}\] with \(\theta > 0.\) Now we can go back to the general Archimedean copula (AC) formula and substitute:
\[C(u_1, u_2) = \left( u_1^{-\theta} + u_2^{-\theta} -1 \right)^{-1/\theta}\]
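A quick numerical check in R that composing the generator and its inverse gives this closed form (the value \(\theta=2\) and the function names are illustrative):

```r
# Sketch: check that phi^{-1}(phi(u1) + phi(u2)) matches the closed-form Clayton copula.
theta   <- 2
phi     <- function(u) (u^(-theta) - 1) / theta        # Clayton generator
phi_inv <- function(t) (t * theta + 1)^(-1 / theta)    # its inverse (theta > 0)
clayton <- function(u1, u2) (u1^(-theta) + u2^(-theta) - 1)^(-1 / theta)
u1 <- runif(5); u2 <- runif(5)
max(abs(phi_inv(phi(u1) + phi(u2)) - clayton(u1, u2))) # ~ 0 up to floating point
```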
In Archimedean copulas Kendall’s tau can be reduced to
\[\tau = 1 + 4 \int_0^1 \frac{\varphi(u)}{\varphi'(u)}du\]
and the sample estimate of Kendall’s tau is
\[\hat \tau = \widehat{\Pr}\left[(x_i- x_j)(y_i - y_j) >0 \right ]- \widehat{\Pr}\left[ (x_i - x_j)(y_i -y_j) <0\right]= \frac{c}{\binom{n}{2}}-\frac{d}{\binom{n}{2}},\] where \(c\) and \(d\) are the numbers of concordant and discordant pairs among the \(\binom{n}{2}\) pairs of observations.
Using \(\tau = 1 + 4 \int_0^1 \frac{\varphi(u)}{\varphi'(u)}du\) we can express \(\tau\) in terms of the generator’s parameter \(\theta,\) and hence estimate \(\theta\) by matching the sample \(\hat\tau\) (a numerical sketch follows the list of formulas below). For instance, tau is
\[\tau = \frac{\theta}{\theta+2}\]
for Clayton’s copula.
\[\tau = 1 - \frac{4}{\theta}\left[1 - \frac{1}{\theta}\int_0^\theta \frac{t}{e^{t}-1}\,dt \right]\]
for Frank’s copula.
\[\tau = \frac{3\theta-2}{3\theta}-\frac{2}{3} \left(1 -\frac{1}{\theta} \right)^2 \ln(1-\theta)\]
for Ali-Mikhail-Haq copula. And
\[\tau = \frac{\theta -1}{\theta}\]
for the Gumbel–Hougaard copula.
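For Clayton’s copula, inverting \(\tau = \theta/(\theta+2)\) gives \(\theta = 2\tau/(1-\tau),\) so a moment-type estimate of \(\theta\) just plugs in the sample \(\hat\tau.\) A minimal R sketch (illustrative values; the Clayton draws use the closed-form conditional sampler derived just below):

```r
# Sketch: estimate Clayton's theta by matching Kendall's tau, theta = 2*tau/(1-tau).
set.seed(1)
theta <- 2                          # true parameter; implies tau = theta/(theta+2) = 0.5
n     <- 2e3
u     <- runif(n); t <- runif(n)
v     <- (1 + u^(-theta) * (t^(-theta / (1 + theta)) - 1))^(-1 / theta)  # Clayton draw
tau_hat   <- cor(u, v, method = "kendall")
theta_hat <- 2 * tau_hat / (1 - tau_hat)
c(tau_hat, theta_hat)               # should be close to 0.5 and 2
```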
Simulation with a particular copula to generate a dependent pair \(U\) and \(V:\)
1. Draw \(U \sim U(0,1)\) and \(T\sim U(0,1)\) independently.
2. Set \(S= \varphi'(U)/T.\)
3. Set \(W=(\varphi')^{-1}(S).\)
4. Set \(V=\varphi^{[-1]}\left(\varphi(W) - \varphi(U)\right).\)

The pair \((U, V)\) then has copula \(C\); this is the conditional distribution method, where \(W = C(U,V)\) is obtained by inverting the conditional cdf \(\partial C/\partial u = \varphi'(u)/\varphi'(C(u,v)).\)
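As an illustration, assuming the Clayton generator \(\varphi(t)=(t^{-\theta}-1)/\theta\) from above (so \(\varphi'(t)=-t^{-(\theta+1)}\)), every step has a closed form; the function name rclayton is just illustrative:

```r
# Sketch: the algorithm above specialized to the Clayton generator.
rclayton <- function(n, theta) {
  u <- runif(n)                                    # U ~ U(0, 1)
  t <- runif(n)                                    # T ~ U(0, 1), independent of U
  s <- -u^(-(theta + 1)) / t                       # S = phi'(U) / T
  w <- (-s)^(-1 / (theta + 1))                     # W = (phi')^{-1}(S) = U * T^(1/(theta+1))
  v <- (w^(-theta) - u^(-theta) + 1)^(-1 / theta)  # V = phi^{[-1]}(phi(W) - phi(U))
  cbind(u = u, v = v)
}
set.seed(1)
uv <- rclayton(2e3, theta = 2)
cor(uv[, 1], uv[, 2], method = "kendall")          # ~ theta / (theta + 2) = 0.5
plot(uv, pch = ".")                                # lower-tail clustering typical of Clayton
```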
NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.