When introducing copulas, it is commonly admitted that copulas are interesting because they allow to model the marginals and the dependence structure separately. The motivation is probably Sklar’s theorem, which says that given some marginal cumulative distribution functions (say and , in dimension 2), and a copula (denoted ), then we can generate a multivariate cumulative distribution function with marginals the one specified previously, using

But this separability might be misleading. Consider the case of a fully parametric model,

Assume that those distributions are continuous, so that we can write the likelihood using densities,

and the log-likelihood is

The first part is the log-likelihood if we consider the first marginal (only). The second part is the log-likelihood if we consider the second marginal (only). If the two components are not independent (i.e. the copula density is not equal to 1 everywhere) the third part cannot be considered as null, and so, in a general context,

where

while

In order to illustrate this point, consider a bivariate lognormal distribution (obtained by taking the exponential of a Gaussian vector)

> mu1=1 > mu2=2 > MU=c(mu1,mu2) > s1=1 > s2=sqrt(2) > r=.8 > SIGMA=matrix(c(s1^2,r*s1*s2,r*s1*s2,s2^2),2,2) > library(mnormt) > set.seed(1) > Z=exp(rmnorm(25,MU,SIGMA))

If we believe that marginals and correlations can be treated separately, we can start with marginal distributions.

> library(MASS) > (p1=fitdistr(Z[,1],"lognormal")) meanlog sdlog 1.1686652 0.9309119 (0.1861824) (0.1316508) > (p2=fitdistr(Z[,2],"lognormal")) meanlog sdlog 2.2181721 1.1684049 (0.2336810) (0.1652374)

Based on those marginal distributions, define and , and consider the maximum likelihood estimator of the copula parameter, obtained from this pseudo sample,

Numerically, we get (since we consider a Gaussian copula, which is the true copula generated here)

> library(copula) > Gcop=normalCopula(.3,dim=2) > U=cbind(plnorm(Z[,1],p1$estimate[1],p1$estimate[2]), + plnorm(Z[,2],p2$estimate[1],p2$estimate[2])) > fitCopula(Gcop,data=U,method="ml") fitCopula() estimation based on 'maximum likelihood' and a sample of size 25. Estimate Std. Error z value Pr(>|z|) rho.1 0.86530 0.03799 22.77

But clearly, we did not treat the dependence structure separately, since it was a function of marginal distributions,

If we consider a global optimization problem, then results are different. The joint density can be derived (see e.g. Mostafa & Mahmoud (1964))

> dbivlognorm=function(x,theta){ + mu1=theta[1] + mu2=theta[2] + s1=theta[3] + s2=theta[4] + r=theta[5] + a1=(log(x[,1])-mu1)/s1 + a2=(log(x[,2])-mu2)/s2 + d=1/(2*pi*s1*s2*sqrt(1-r^2))*1/(x[,1]*x[,2])* + exp(-(a1^2-2*r*a1*a2+a2^2)/(2*(1-r^2))) + return(d) + } > LogLik=function(theta){ + return(-sum(log(dbivlognorm(Z,theta))))} > optim(par=c(0,0,1,1,0),fn=LogLik)$par [1] 1.1655359 2.2159767 0.9237853 1.1610132 0.8645052

The difference is not huge, but still. The estimators are not identical. From a statistical point of view, we can hardly treat the marginals and the dependence structure separately.

Another point we should keep in mind is that the estimation of the copula parameter depends on the margins, not only through the parameters, but more deeply, through the choice of the marginal distributions (that might be misspecified). For instance, if we assume that margins are exponentially distributed,

> (p1=fitdistr(Z[,1],"exponential")) rate 0.22288362 (0.04457672) > (p2=fitdistr(Z[,2],"exponential")) rate 0.06543665 (0.01308733)

the estimation of the parameter of the Gaussian copula yields

> U=cbind(pexp(Z[,1],p1$estimate[1]), + pexp(Z[,2],p2$estimate[1])) > fitCopula(Gcop,data=U,method="ml") fitCopula() estimation based on 'maximum likelihood' and a sample of size 25. Estimate Std. Error z value Pr(>|z|) rho.1 0.87421 0.03617 24.17 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 The maximized loglikelihood is 15.4 Optimization converged

The problem is that since we misspecify marginal distribution, our pseudo sample is defined on the unit-interval, but there is no chance that we get uniform margins. If we generate a sample of size 500 with the code above,

> x <- U[,1]; y <- U[,2] > xhist <- hist(x, plot=FALSE) ; yhist <- hist(y, plot=FALSE) > top <- max(c(xhist$counts, yhist$counts)) > nf <- layout(matrix(c(2,0,1,3),2,2,byrow=TRUE), c(3,1), c(1,3), TRUE) > par(mar=c(3,3,1,1)) > plot(x, y, xlab="", ylab="",col="red",xlim=0:1,ylim=0:1) > par(mar=c(0,3,1,1)) > barplot(xhist$counts, axes=FALSE, ylim=c(0, top), + space=0,col="light green") > par(mar=c(3,0,1,1)) > barplot(yhist$counts, axes=FALSE, xlim=c(0, top), + space=0, horiz=TRUE,col="light blue")

If we compare with the previous case, when marginal distribution were well-specified, we can clearly see that the dependence structure depends on marginal distributions,