1. Introduction

This report investigates the dependence between the stock returns of NVIDIA and AMD, two leading companies in the GPU market. Using copula techniques, we model the joint dependence of their daily log returns separately from their individual marginal distributions. This approach can capture tail dependence and non-linear relationships that linear (Pearson) correlation might overlook.

2. Data Import and Preparation

2.1 Load Packages

library(fitdistrplus)
library(copula)
library(VineCopula)
library(ggplot2)
library(summarytools)

2.2 Import Data

amd_data <- read.csv("/home/dell/R/midterm/MacroTrends_Data_Download_AMD.csv", header = TRUE)
nvda_data <- read.csv("/home/dell/R/midterm/MacroTrends_Data_Download_NVDA.csv", header = TRUE)


amd_data$date <- as.Date(amd_data$date, format = "%Y-%m-%d")
nvda_data$date <- as.Date(nvda_data$date, format = "%Y-%m-%d")

start_date <- as.Date("2020-01-01")
end_date <- as.Date("2023-03-16")

amd <- amd_data[amd_data$date >= start_date & amd_data$date <= end_date, ]
nvda <- nvda_data[nvda_data$date >= start_date & nvda_data$date <= end_date, ]

2.3 Calculate Daily Log Returns

Daily log returns are used instead of raw prices because returns are much closer to stationary than price levels, which allows for meaningful statistical analysis. Log returns help stabilize the variance and are scale-independent, so assets trading at different price levels can be compared directly. They are also time-additive: the log return over several days is the sum of the daily log returns. These properties make them more appropriate than prices for modeling dependence and joint distributions.
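As a small illustration of the time-additivity property, the sketch below uses a made-up price vector (the values are hypothetical and not taken from the AMD or NVIDIA files) and checks that the daily log returns sum to the log return over the whole period.

```r
# Hypothetical closing prices, for illustration only
prices <- c(100, 102, 99, 105, 107)

# Daily log returns: r_t = log(P_t) - log(P_{t-1})
log_returns <- diff(log(prices))

# Log return over the whole period
total_log_return <- log(prices[length(prices)] / prices[1])

# Time-additivity: the daily log returns sum to the period log return
all.equal(sum(log_returns), total_log_return)
```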

amd_adj <- amd$close
nvda_adj <- nvda$close

amd_returns <- diff(log(amd_adj))
nvda_returns <- diff(log(nvda_adj))

returns_data <- data.frame(
  Date = amd$date[-1],
  AMD_Returns = amd_returns,
  NVDA_Returns = nvda_returns
)
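
The data frame above pairs the two return series by position, which implicitly assumes that both files contain exactly the same trading dates. A minimal sketch of a more defensive alternative, using two tiny hypothetical price tables in place of the real files, is to merge on the date column first and compute returns on the aligned rows. The object names (`a`, `b`, `aligned`, `aligned_returns`) are placeholders introduced for this example.

```r
# Hypothetical prices; asset B is missing one trading day
a <- data.frame(
  date  = as.Date(c("2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07")),
  close = c(49.1, 48.6, 48.4, 48.3)
)
b <- data.frame(
  date  = as.Date(c("2020-01-02", "2020-01-03", "2020-01-07")),
  close = c(59.9, 59.0, 60.0)
)

# Inner join on date keeps only the days present in both series
aligned <- merge(a, b, by = "date", suffixes = c("_a", "_b"))
aligned <- aligned[order(aligned$date), ]

# Log returns computed on the aligned rows
aligned_returns <- data.frame(
  Date      = aligned$date[-1],
  A_Returns = diff(log(aligned$close_a)),
  B_Returns = diff(log(aligned$close_b))
)
aligned_returns
```

If the two files already share identical dates, this produces the same rows as the positional approach.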

3. Marginal Distributions

3.1 Histograms

hist(amd_returns, breaks = 30, main = "AMD Daily Returns", xlab = "Log Returns")

hist(nvda_returns, breaks = 30, main = "NVIDIA Daily Returns", xlab = "Log Returns")

The histograms of daily log returns for AMD and NVIDIA both show a roughly symmetric, bell-shaped distribution centered around zero. Compared to a normal distribution, however, there are noticeably more extreme values in both tails.

3.2 Q-Q Plots

qqnorm(amd_returns, main = "Q-Q Plot for AMD Returns")
qqline(amd_returns)

qqnorm(nvda_returns, main = "Q-Q Plot for NVIDIA Returns")
qqline(nvda_returns)

In both Q-Q plots, the points in the lower-left and upper-right corners deviate from the reference line, showing that the empirical returns have fatter tails than a normal distribution. This confirms the heavy tails seen in the histograms.

Heavy tails in the data led us to choose the Student’s t-distribution, as it better captures extreme values than the Normal distribution.

3.3 Fit Distributions

ddt_ls <- function(x, m, s, df) { dt((x - m) / s, df) / s }

pdt_ls <- function(q, m, s, df) { pt((q - m) / s, df) }

qdt_ls <- function(p, m, s, df) { qt(p, df) * s + m }

# fitdist() looks up ddt_ls, pdt_ls and qdt_ls from the name "dt_ls"
fit_amd_t <- fitdist(
  data = amd_returns, distr = "dt_ls",
  start = list(m = mean(amd_returns), s = sd(amd_returns), df = 5)
)

fit_nvda_t <- fitdist(
  data = nvda_returns, distr = "dt_ls",
  start = list(m = mean(nvda_returns), s = sd(nvda_returns), df = 5)
)
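
For reference, the three helper functions correspond to the location-scale Student's t family. Writing $f_\nu$ and $F_\nu$ for the standard t density and CDF with $\nu$ = df degrees of freedom (notation introduced here, not used in the original code), they compute:

```latex
\begin{aligned}
f(x \mid m, s, \nu)      &= \frac{1}{s}\, f_\nu\!\left(\frac{x - m}{s}\right), \\
F(x \mid m, s, \nu)      &= F_\nu\!\left(\frac{x - m}{s}\right), \\
F^{-1}(p \mid m, s, \nu) &= m + s\, F_\nu^{-1}(p), \\
\operatorname{Var}(X)    &= s^2\, \frac{\nu}{\nu - 2} \quad \text{for } \nu > 2 .
\end{aligned}
```

The last line is a reminder that s is a scale parameter rather than a standard deviation, so a fitted s will be smaller than the sample standard deviation of the returns when df is low.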

Summary of Fitted Distributions

Parameter Estimates (Student’s t Distribution)

Parameter   AMD Estimate   AMD Std. Error   NVIDIA Estimate   NVIDIA Std. Error
m           0.00056906     0.00081523       0.00253001        0.00085504
s           0.02541353     0.00089315       0.02684952        0.00089060
df          4.77887335     0.66512249       5.23219381        0.73697972

Log-Likelihood, AIC, and BIC

Statistic        AMD         NVIDIA
Log-Likelihood   2655.730    2609.796
AIC              -5305.460   -5213.592
BIC              -5289.938   -5198.070

Correlation Matrices of Estimated Parameters

AMD Correlation Matrix

     m             s             df
m    1.000000000   0.008037739   0.008940574
s    0.008037739   1.000000000   0.703536243
df   0.008940574   0.703536243   1.000000000

NVIDIA Correlation Matrix

     m             s             df
m     1.00000000   -0.03249407   -0.03744379
s    -0.03249407    1.00000000    0.67552588
df   -0.03744379    0.67552588    1.00000000

The fitted Student’s t-distributions capture the heavy tails of both assets: the estimated degrees of freedom are low (about 4.8 for AMD and 5.2 for NVIDIA), and the AIC and BIC are better than those of a normal fit. This supports their use as marginal distributions before applying the copula model.
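
A comparison against the normal distribution can be sketched as follows. This is an illustration in base R on simulated heavy-tailed data, not the fit reported above: the sample `x`, the seed, and the helper `negll_t` are assumptions made for the example, and the t-distribution is fitted with `optim()` rather than `fitdist()` so that the block has no package dependencies.

```r
set.seed(1)
# Simulated heavy-tailed "returns" (hypothetical data, not the AMD/NVIDIA series)
x <- 0.001 + 0.02 * rt(2000, df = 4)

# Normal distribution: the MLE is available in closed form
mu_hat    <- mean(x)
sigma_hat <- sqrt(mean((x - mu_hat)^2))
ll_norm   <- sum(dnorm(x, mu_hat, sigma_hat, log = TRUE))
aic_norm  <- -2 * ll_norm + 2 * 2        # two parameters

# Location-scale t: numerical MLE; s and df are optimized on the log scale
# so that they stay positive
negll_t <- function(par) {
  m  <- par[1]
  s  <- exp(par[2])
  df <- exp(par[3])
  -sum(dt((x - m) / s, df, log = TRUE) - log(s))
}
opt   <- optim(c(mean(x), log(sd(x)), log(5)), negll_t)
ll_t  <- -opt$value
aic_t <- -2 * ll_t + 2 * 3               # three parameters

c(AIC_normal = aic_norm, AIC_t = aic_t)  # lower is better
```

With the real data, the same comparison would use `amd_returns` or `nvda_returns` in place of `x`.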

4. Dependence Structure

4.1 Rank Correlations

kendall_tau <- cor(amd_returns, nvda_returns, method = "kendall")
spearman_rho <- cor(amd_returns, nvda_returns, method = "spearman")

kendall_tau
## [1] 0.6434669
spearman_rho
## [1] 0.8282584

Kendall’s tau (0.64) and Spearman’s rho (0.83) indicate strong positive dependence between AMD and NVIDIA returns. Because both measures are rank-based, they do not depend on the marginal distributions, which supports the use of a copula to model the joint behavior.
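
For elliptical copulas such as the Gaussian and t-copula, Kendall's tau and the copula correlation parameter are linked by tau = (2 / pi) * asin(rho). As a rough cross-check, the sketch below inverts this relation for the Kendall's tau reported above. The name `rho_implied` is introduced here for illustration, and the result is a moment-style estimate, not the maximum-likelihood parameter of any copula fitted in this report.

```r
kendall_tau <- 0.6434669                  # value reported above

# Invert tau = (2 / pi) * asin(rho) for an elliptical copula
rho_implied <- sin(pi * kendall_tau / 2)
round(rho_implied, 3)                     # approximately 0.847
```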

4.2 Pseudo-Observations Plot

u_amd <- pobs(as.matrix(amd_returns))[, 1]
u_nvda <- pobs(as.matrix(nvda_returns))[, 1]

plot(u_amd, u_nvda, main = "Pseudo-Observations: AMD vs NVIDIA", xlab = "u_amd", ylab = "u_nvda", pch = 16, cex = 0.5)

The pseudo-observations show strong positive dependence, with points clustering in both the lower-left and upper-right corners. This pattern suggests roughly symmetric tail dependence, which motivates the t-copula as a candidate for modeling the joint behavior.
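
Pseudo-observations are simply ranks rescaled to the unit interval, which by default is what `pobs()` computes. A minimal base-R sketch on a hypothetical return vector (the values of `x` are made up for illustration):

```r
# Hypothetical return vector, for illustration only
x <- c(0.012, -0.034, 0.005, 0.021, -0.008)
n <- length(x)

# Ranks divided by (n + 1), so every value lies strictly inside (0, 1)
u <- rank(x) / (n + 1)
u
```

Dividing by n + 1 rather than n keeps the largest observation away from 1, where many copula densities are unbounded.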

5. Copula Fitting and Selection

5.1 Fit Copulas

copula_gaussian <- BiCopEst(u1 = u_amd, u2 = u_nvda, family = 1)
copula_t <- BiCopEst(u1 = u_amd, u2 = u_nvda, family = 2)
copula_clayton <- BiCopEst(u1 = u_amd, u2 = u_nvda, family = 3)
copula_gumbel <- BiCopEst(u1 = u_amd, u2 = u_nvda, family = 4)
copula_frank <- BiCopEst(u1 = u_amd, u2 = u_nvda, family = 5)

5.2 Compare Copulas

copula_comparison <- data.frame(
  Copula = c("Gaussian", "t-Copula", "Clayton", "Gumbel", "Frank"),
  LogLikelihood = c(copula_gaussian$logLik, copula_t$logLik, copula_clayton$logLik, copula_gumbel$logLik, copula_frank$logLik),
  AIC = c(copula_gaussian$AIC, copula_t$AIC, copula_clayton$AIC, copula_gumbel$AIC, copula_frank$AIC),
  BIC = c(copula_gaussian$BIC, copula_t$BIC, copula_clayton$BIC, copula_gumbel$BIC, copula_frank$BIC)
)

copula_comparison
##     Copula LogLikelihood       AIC       BIC
## 1 Gaussian      468.3981 -934.7962 -930.1041
## 2 t-Copula      490.6681 -977.3361 -967.9520
## 3  Clayton      419.4093 -836.8185 -832.1264
## 4   Gumbel      433.6611 -865.3222 -860.6301
## 5    Frank      461.1189 -920.2377 -915.5457

Based on the log-likelihood, AIC, and BIC values, the Student’s t-copula provides the best fit for the joint distribution of AMD and NVIDIA returns. It has the highest log-likelihood (490.67) and the lowest AIC (-977.34) and BIC (-967.95) among the copulas tested, even after the penalty for its additional degrees-of-freedom parameter. This indicates that the t-copula captures the dependence structure most accurately, particularly the symmetric tail dependence observed in the pseudo-observations. The Gaussian copula performed second best but has no tail dependence. The Archimedean copulas fit worse, likely because none of them can model upper and lower tail dependence simultaneously: Clayton has only lower tail dependence, Gumbel only upper tail dependence, and Frank none.
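
The information criteria in the table follow directly from the log-likelihoods: AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n), where k is the number of copula parameters. The sketch below reproduces the AIC values for the Gaussian and t-copula from the reported log-likelihoods and backs out the sample size implied by the Gaussian BIC. The object names are introduced for this example.

```r
loglik <- c(Gaussian = 468.3981, t = 490.6681)   # from the table above
k      <- c(Gaussian = 1, t = 2)                 # number of copula parameters

# AIC = -2 logLik + 2k
aic <- -2 * loglik + 2 * k
aic

# BIC = -2 logLik + k log(n), so n can be recovered from a reported BIC
bic_gaussian <- -930.1041
n_implied <- exp((bic_gaussian + 2 * loglik[["Gaussian"]]) / k[["Gaussian"]])
round(n_implied)
```

The implied n of about 806 is in line with the number of daily returns in the sample window.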

5.3 Visualize Best-Fit Copula

u <- v <- seq(0.01, 0.99, length.out = 50)
grid <- expand.grid(u = u, v = v)

z <- matrix(
  BiCopPDF(u1 = grid$u, u2 = grid$v,
           family = copula_t$family,
           par = copula_t$par,
           par2 = copula_t$par2),
  nrow = length(u), ncol = length(v)
)

persp(u, v, z,
      theta = 30, phi = 30, expand = 0.6,
      col = "lightblue",
      xlab = "u (AMD)",
      ylab = "v (NVIDIA)",
      zlab = "Density",
      main = "t-Copula Density: 3D Surface Plot")


copula_df <- data.frame(
  u = grid$u,
  v = grid$v,
  density = as.vector(z)
)

ggplot(copula_df, aes(x = u, y = v, z = density)) +
  geom_contour_filled(breaks = seq(0, 26, by = 2)) +   # fixed density bands of width 2
  scale_fill_viridis_d(option = "plasma") +            # discrete "plasma" color scale
  labs(
    title = "Contour Plot of t-Copula Density",
    x = "u (AMD)",
    y = "v (NVIDIA)"
  ) +
  theme_minimal()

The contour plot of the fitted Student’s t-copula shows high joint density along the diagonal from the lower-left to the upper-right, consistent with the strong positive dependence between AMD and NVIDIA returns. The density also peaks in both the lower-left (joint extreme losses) and upper-right (joint extreme gains) corners, which is the symmetric tail dependence built into the t-copula. Together with the pseudo-observation plot and the AIC/BIC comparison, this illustrates how the selected model assigns appreciable probability to simultaneous extreme moves in both assets.
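
The strength of this tail dependence can be summarized by the tail dependence coefficient of the t-copula, lambda = 2 * T_{df+1}(-sqrt((df + 1) * (1 - rho) / (1 + rho))), which is the same in the upper and lower tail. The sketch below implements this formula in base R. The helper name `t_tail_dependence` and the parameter values are illustrative assumptions, since the fitted `par` and `par2` are not reported above.

```r
# Tail dependence coefficient of a bivariate t-copula
# rho: correlation parameter, df: degrees of freedom
t_tail_dependence <- function(rho, df) {
  2 * pt(-sqrt((df + 1) * (1 - rho) / (1 + rho)), df = df + 1)
}

# Illustrative values only, not the fitted parameters
lambda_low_df  <- t_tail_dependence(rho = 0.85, df = 5)
lambda_high_df <- t_tail_dependence(rho = 0.85, df = 50)

# Tail dependence weakens as df grows and the copula approaches the Gaussian
c(df_5 = lambda_low_df, df_50 = lambda_high_df)
```

With the fitted object from Section 5.1, the same function could be called with `copula_t$par` and `copula_t$par2`.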