We aim to implement an online recursive least squares algorithm to build an autoregressive time series model that predicts daily Citibike trips in NYC. The model uses:
The high temperature of the day (TMAX).
The number of daily trips on each of the previous 7 days (seven lagged values).
The goal is to minimize the mean squared error of predictions over time and observe how the model coefficients and \(R^2\) evolve.
For each day \(t > 7\):
\[ N_{\text{trips}, t} = \sum_{i=1}^{7} \theta_i \, N_{\text{trips}, t-i} + \theta_8 \, \mathrm{TMAX}_t \]
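Rather than refitting from scratch each day, the code below maintains the normal-equation quantities recursively: with \(x_t\) denoting the feature vector (the seven lags plus TMAX) and \(y_t = N_{\text{trips},t}\), each new observation triggers
\[ G_t = G_{t-1} + x_t x_t^\top, \qquad h_t = h_{t-1} + y_t x_t, \qquad \hat{\theta}_t = G_t^{-1} h_t, \]
which is exactly the update G <- G + x_new %*% t(x_new); h <- h + y_new * x_new; theta <- solve(G, h) performed at every step of the loop.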
citibike_data <- read_csv("Daily_citibike_trips.csv")
## Rows: 3949 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): daily_trips, TMAX
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(citibike_data)
## # A tibble: 6 × 3
## date daily_trips TMAX
## <date> <dbl> <dbl>
## 1 2013-05-31 362 90
## 2 2013-06-01 8959 90
## 3 2013-06-02 15467 88
## 4 2013-06-03 7700 78
## 5 2013-06-04 15849 75
## 6 2013-06-05 15724 74
citibike_data <- citibike_data %>%
mutate(across(everything(), as.numeric)) %>%
arrange(date)
# Create lag features: trips on each of the previous 7 days
for (i in 1:7) {
  citibike_data <- citibike_data %>%
    mutate(!!paste0("lag_", i) := lag(daily_trips, i))
}
citibike_data <- na.omit(citibike_data)
X <- as.matrix(citibike_data %>% select(starts_with("lag_"), TMAX))
y <- citibike_data$daily_trips
n <- ncol(X)
t <- nrow(X)
# Initialize the normal equations G = X'X and h = X'y on the first n observations
G <- t(X[1:n, ]) %*% X[1:n, ]
h <- t(X[1:n, ]) %*% y[1:n]
theta <- solve(G, h)
theta_history <- matrix(NA, nrow = t - n, ncol = n)
R2_history <- numeric(t - n)
for (i in (n + 1):t) {
  x_new <- matrix(X[i, ], nrow = n, ncol = 1)
  y_new <- y[i]
  # Rank-one updates of the normal equations, then re-solve for theta
  G <- G + x_new %*% t(x_new)
  h <- h + y_new * x_new
  theta <- solve(G, h)
  theta_history[i - n, ] <- theta
  # In-sample R^2 over all observations seen so far, using the current theta
  R2_history[i - n] <- 1 - sum((y[1:i] - X[1:i, ] %*% theta)^2) / sum((y[1:i] - mean(y[1:i]))^2)
}
theta_history_df <- as_tibble(theta_history)
## Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if
## `.name_repair` is omitted as of tibble 2.0.0.
## ℹ Using compatibility `.name_repair`.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
theta_history_df$time <- (n + 1):t
theta_history_df %>%
pivot_longer(-time, names_to = "Coefficient", values_to = "Value") %>%
ggplot(aes(x = time, y = Value, color = Coefficient)) +
geom_line() +
labs(title = "Evolution of Coefficients over Time",
x = "Time", y = "Coefficient Value")
tibble(time = (n + 1):t, R2 = R2_history) %>%
ggplot(aes(x = time, y = R2)) +
geom_line() +
labs(title = "R² over Time",
x = "Time", y = "R²")
Stability and trends in coefficients: Initially, the coefficients exhibit large fluctuations due to limited data and instability in the recursive updates. However, as more data accumulates over time, the coefficients stabilize and converge towards consistent values, indicating a stable autoregressive model.
R² over time: The \(R^2\) value starts relatively low with noticeable volatility during the early periods. As the model receives more observations, \(R^2\) improves and stabilizes around values between 0.75 and 0.9, demonstrating that the model’s explanatory power increases as the data grows.
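As an aside, the same coefficient path can be produced without re-solving an \(n \times n\) linear system at every step: tracking \(P_t = G_t^{-1}\) directly with the Sherman-Morrison rank-one update gives the classical recursive least squares form. A minimal sketch, reusing the X and y built above (rls_sherman_morrison is a hypothetical helper for illustration, not part of the analysis):
# Sketch: recursive least squares via the Sherman-Morrison update of P = G^{-1}.
# Hypothetical helper, for illustration only.
rls_sherman_morrison <- function(X, y, n_init = ncol(X)) {
  # Initialize P = (X'X)^{-1} and theta on the first n_init observations
  P     <- solve(t(X[1:n_init, , drop = FALSE]) %*% X[1:n_init, , drop = FALSE])
  theta <- P %*% (t(X[1:n_init, , drop = FALSE]) %*% y[1:n_init])
  for (i in (n_init + 1):nrow(X)) {
    x  <- matrix(X[i, ], ncol = 1)
    Px <- P %*% x
    # Rank-one inverse update: (G + x x')^{-1} = P - (P x)(P x)' / (1 + x' P x)
    P <- P - (Px %*% t(Px)) / as.numeric(1 + t(x) %*% Px)
    # Gain-form coefficient update: theta <- theta + P x (y_i - x' theta)
    theta <- theta + (P %*% x) * as.numeric(y[i] - t(x) %*% theta)
  }
  drop(theta)
}
# rls_sherman_morrison(X, y) should reproduce the final theta from the loop above.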
To account for a potential nonlinear effect of temperature on Citibike trips, we define a new feature that penalizes departures from a comfortable temperature: a concave transform peaking at 75°F, so that both unusually cold and unusually hot days lower the feature's value. For example:
citibike_data <- citibike_data %>%
mutate(TMAX_nonlin = -(TMAX - 75)^2)
X_nonlin <- as.matrix(citibike_data %>% select(starts_with("lag_"), TMAX_nonlin))
y <- citibike_data$daily_trips
n <- ncol(X_nonlin)
t <- nrow(X_nonlin)
G <- t(X_nonlin[1:n, ]) %*% X_nonlin[1:n, ]
h <- t(X_nonlin[1:n, ]) %*% y[1:n]
theta <- solve(G, h)
theta_history_nonlin <- matrix(NA, nrow = t - n, ncol = n)
R2_history_nonlin <- numeric(t - n)
# Same recursive updates as before, with TMAX_nonlin as the temperature feature
for (i in (n + 1):t) {
  x_new <- matrix(X_nonlin[i, ], nrow = n, ncol = 1)
  y_new <- y[i]
  G <- G + x_new %*% t(x_new)
  h <- h + y_new * x_new
  theta <- solve(G, h)
  theta_history_nonlin[i - n, ] <- theta
  R2_history_nonlin[i - n] <- 1 - sum((y[1:i] - X_nonlin[1:i, ] %*% theta)^2) / sum((y[1:i] - mean(y[1:i]))^2)
}
theta_history_nonlin_df <- as_tibble(theta_history_nonlin)
theta_history_nonlin_df$time <- (n + 1):t
theta_history_nonlin_df %>%
pivot_longer(-time, names_to = "Coefficient", values_to = "Value") %>%
ggplot(aes(x = time, y = Value, color = Coefficient)) +
geom_line() +
labs(title = "Evolution of Coefficients over Time (Nonlinear Temp)",
x = "Time", y = "Coefficient Value")
tibble(time = (n + 1):t, R2 = R2_history_nonlin) %>%
ggplot(aes(x = time, y = R2)) +
geom_line() +
labs(title = "R² over Time (Nonlinear Temp)",
x = "Time", y = "R²")
Stability of coefficients: Similar to the linear model, the coefficients in the nonlinear model show initial instability but quickly stabilize and converge near zero. This indicates that the model remains stable over time, even with the nonlinear temperature transformation.
\(R^2\) performance: The \(R^2\) values of the nonlinear model follow a similar trajectory to the linear model, with early fluctuations and gradual improvement. However, there is no significant improvement in \(R^2\) compared to the linear model, indicating that the nonlinear transformation does not greatly enhance model performance.
The model remains robust and stable, but the nonlinear feature provides minimal added value over the original linear model.
We aim to analyze the relationship between social mobility and city characteristics using Weighted Least Squares to account for variance differences across cities with varying populations.
The dataset social_mobility.csv includes:
Fraction of individuals moving from the bottom 20% to the top 20% income percentiles.
City population.
Socio-economic factors (commute time, student-teacher ratio).
social_mobility <- read_csv("social_mobility.csv")
head(social_mobility)
## # A tibble: 6 × 7
## ID Name Mobility State Population Student_teacher_ratio Commute
## <dbl> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 301 Middlesborough 0.0726 TN 66708 15.1 0.359
## 2 401 Winston-Salem 0.0448 NC 493180 15.4 0.292
## 3 500 Greensboro 0.0474 NC 1055133 16.7 0.305
## 4 601 North Wilkesboro 0.0517 NC 90016 16.2 0.289
## 5 602 Galax 0.0796 VA 64676 12.3 0.325
## 6 700 Spartanburg 0.0431 SC 354533 15.9 0.299
# Scatter plot
social_mobility %>%
ggplot(aes(x = Population, y = Mobility)) +
geom_point() +
scale_x_log10() +
labs(title = "Social Mobility vs. Population",
x = "Population (log scale)",
y = "Social Mobility")
We model each city's mobility rate as a binomial proportion: the observed mobility is the share of "successes" (individuals who move from the bottom 20% to the top 20%) among roughly population-many "trials". Under this assumption the variance of the observed proportion is
\[ \operatorname{Var}(\text{mobility}) = \frac{\text{mobility}\,(1 - \text{mobility})}{\text{population}}, \]
so larger cities yield more precise mobility estimates, and we weight each observation by the inverse of this variance.
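To make the weighting concrete, here is an illustration with made-up numbers (mobility fixed at 5% for two hypothetical city sizes; these are not rows from the data):
# Hypothetical cities with identical mobility but very different populations
var_small <- 0.05 * (1 - 0.05) / 5e4   # city of 50,000
var_large <- 0.05 * (1 - 0.05) / 5e6   # city of 5,000,000
c(var_small = var_small, var_large = var_large, weight_ratio = var_small / var_large)
# The larger city's estimate has 1/100th the variance, so it receives 100 times the weight.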
social_mobility <- social_mobility %>%
mutate(variance = (Mobility * (1 - Mobility)) / Population,
weight = 1 / variance)
# Weighted Least Squares Model
wls_model <- lm(Mobility ~ Commute + Student_teacher_ratio,
data = social_mobility,
weights = weight)
summary(wls_model)
##
## Call:
## lm(formula = Mobility ~ Commute + Student_teacher_ratio, data = social_mobility,
## weights = weight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -289.28 -13.16 12.95 35.02 443.53
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0025339 0.0088872 -0.285 0.776
## Commute 0.0623962 0.0109843 5.680 1.97e-08 ***
## Student_teacher_ratio 0.0031117 0.0003937 7.904 1.06e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 57.62 on 696 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.09526, Adjusted R-squared: 0.09266
## F-statistic: 36.64 on 2 and 696 DF, p-value: 7.406e-16
# Ordinary Least Squares Model for Comparison
ols_model <- lm(Mobility ~ Commute + Student_teacher_ratio,
data = social_mobility)
summary(ols_model)
##
## Call:
## lm(formula = Mobility ~ Commute + Student_teacher_ratio, data = social_mobility)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.15904 -0.02260 -0.00537 0.01868 0.31943
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0359572 0.0162414 2.214 0.0272 *
## Commute 0.2118423 0.0129576 16.349 <2e-16 ***
## Student_teacher_ratio -0.0018822 0.0007617 -2.471 0.0137 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04273 on 696 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.3562, Adjusted R-squared: 0.3544
## F-statistic: 192.6 on 2 and 696 DF, p-value: < 2.2e-16
Weighted Least Squares vs. Ordinary Least Squares coefficients:
In the Weighted Least Squares model, the coefficient for Commute is 0.0624 and for Student_teacher_ratio is 0.0031.
In the Ordinary Least Squares model, the coefficient for Commute is 0.2118 and for Student_teacher_ratio is -0.0019; the estimates differ substantially in magnitude, and the Student_teacher_ratio coefficient even flips sign.
Interpretation:
Weighted Least Squares downweights small cities, whose mobility estimates are noisy, which yields smaller and more stable coefficient estimates.
Ordinary Least Squares treats every city equally, so the noisier small cities exert relatively more influence on the fit, inflating the estimated effect sizes and their variance.
Model Fit:
Weighted Least Squares reports a lower R-squared (0.0953) than Ordinary Least Squares (0.3562). The two values are not directly comparable, since the weighted R-squared is computed from the weighted residuals; still, the gap suggests that the OLS fit leans heavily on the noisier small cities, whereas Weighted Least Squares favors stability by downweighting them.
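As a quick sanity check on the weighting (a sketch, not part of the assignment output), weighted least squares is equivalent to ordinary least squares after scaling the response and every regressor, including the intercept column, by \(\sqrt{\text{weight}}\); the coefficients below should agree with summary(wls_model). The names wls_check and manual_wls are illustrative:
# Equivalent OLS on sqrt(weight)-scaled variables (intercept handled by the `sw` column)
wls_check <- social_mobility %>%
  mutate(sw = sqrt(weight))
manual_wls <- lm(I(sw * Mobility) ~ 0 + sw + I(sw * Commute) + I(sw * Student_teacher_ratio),
                 data = wls_check)
coef(manual_wls)  # should match coef(wls_model), up to term names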
We aim to construct optimal portfolios using Markowitz Portfolio Optimization to minimize risk while targeting specific expected returns, based on asset returns data from 2020-2024.
The dataset stock_returns.csv includes:
Company: Ticker symbol.
date: Date of the observation.
adjusted: Adjusted closing price.
return: Daily return.
log_return: Logarithm of the gross daily return, log(1 + return).
We will divide the data into:
Training period: 2020-2022 (years up to and including 2022).
Testing period: 2023-2024.
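For reference, the quadratic program that the optimize_portfolio function below hands to solve.QP is, with \(\Gamma\) the covariance matrix of training log returns, \(\mu\) the vector of annualized training returns, and \(r\) the target return:
\[ \min_{w} \; w^\top \Gamma w \quad \text{subject to} \quad \mathbf{1}^\top w = 1, \quad \mu^\top w = r, \quad w \ge 0. \]
The non-negativity constraint rules out short positions, which is why the reported leverage equals 1 by construction.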
library(lubridate)
library(tidyr)
library(quadprog)
# Load the data
stock_data <- read_csv("stock_returns.csv")
# Split into training and testing sets
train_data <- stock_data %>% filter(year(date) <= 2022)
test_data <- stock_data %>% filter(year(date) > 2022)
# Calculate annualized returns
annualized_returns <- train_data %>%
group_by(Company) %>%
summarize(mu = exp(0.5 * sum(log_return)) - 1)
# Create wide format for covariance
returns_wide <- train_data %>%
select(date, Company, log_return) %>%
pivot_wider(names_from = Company, values_from = log_return)
# Covariance matrix
Gamma <- cov(returns_wide %>% select(-date), use = "pairwise.complete.obs")
# Function to solve Markowitz optimization
optimize_portfolio <- function(Gamma, mu, r_target) {
Dmat <- 2 * Gamma
dvec <- rep(0, length(mu))
Amat <- cbind(rep(1, length(mu)), mu, diag(1, length(mu)))
bvec <- c(1, r_target, rep(0, length(mu)))
result <- solve.QP(Dmat, dvec, Amat, bvec, meq = 2)
return(result$solution)
}
# Prepare inputs
mu_vector <- annualized_returns$mu
# Optimize for each target return
weights_r1 <- optimize_portfolio(Gamma, mu_vector, 1.05)
weights_r2 <- optimize_portfolio(Gamma, mu_vector, 1.10)
weights_r3 <- optimize_portfolio(Gamma, mu_vector, 1.20)
# Store the weights
portfolio_weights <- tibble(
Company = annualized_returns$Company,
r_1.05 = weights_r1,
r_1.10 = weights_r2,
r_1.20 = weights_r3
)
portfolio_weights
## # A tibble: 471 × 4
## Company r_1.05 r_1.10 r_1.20
## <chr> <dbl> <dbl> <dbl>
## 1 A -1.30e-17 6.20e-17 2.34e-17
## 2 AAPL 2.07e-16 2.13e-17 3.97e-16
## 3 ABBV 1.07e-16 -5.92e-17 2.11e-16
## 4 ABEV 2.25e-17 1.05e-17 -2.65e-16
## 5 ABT 4.41e-17 -2.64e-17 -9.69e-17
## 6 ACGL 2.47e-17 -3.52e-18 -5.10e-17
## 7 ACN 2.45e-17 -3.45e-18 7.27e-19
## 8 ADBE -7.46e-17 -1.19e-16 -1.15e-16
## 9 ADI -6.64e-16 3.86e-16 -3.66e-16
## 10 ADM 3.15e-17 -1.34e-17 -3.19e-17
## # ℹ 461 more rows
# Prepare test data in wide format
returns_test_wide <- test_data %>%
select(date, Company, log_return) %>%
pivot_wider(names_from = Company, values_from = log_return)
# Function to calculate cumulative returns
test_cum_return <- function(weights, returns_data) {
returns_matrix <- as.matrix(returns_data %>% select(-date))
daily_returns <- returns_matrix %*% weights
cum_returns <- exp(cumsum(daily_returns))
tibble(date = returns_data$date, cumulative_return = cum_returns)
}
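# Note: daily_returns above is the weighted average of asset log returns, which
# approximates the portfolio's true log return only for small daily moves; the
# approximation is close enough for these plots.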
# Cumulative returns for each portfolio
cum_r1 <- test_cum_return(weights_r1, returns_test_wide)
cum_r2 <- test_cum_return(weights_r2, returns_test_wide)
cum_r3 <- test_cum_return(weights_r3, returns_test_wide)
# Plot cumulative returns
ggplot() +
geom_line(data = cum_r1, aes(x = date, y = cumulative_return, color = "r = 1.05")) +
geom_line(data = cum_r2, aes(x = date, y = cumulative_return, color = "r = 1.10")) +
geom_line(data = cum_r3, aes(x = date, y = cumulative_return, color = "r = 1.20")) +
labs(title = "Cumulative Returns of Portfolios", x = "Date", y = "Cumulative Return") +
scale_color_manual(name = "Target Return", values = c("r = 1.05" = "blue", "r = 1.10" = "green", "r = 1.20" = "red"))
# Function to calculate portfolio statistics
portfolio_stats <- function(weights, returns_data) {
returns_matrix <- as.matrix(returns_data %>% select(-date))
daily_returns <- returns_matrix %*% weights
annual_return <- exp(mean(daily_returns) * 252) - 1
risk <- var(daily_returns) * 252
max_weight <- max(weights)
leverage <- sum(abs(weights))
list(annual_return = annual_return, risk = risk, max_weight = max_weight, leverage = leverage)
}
stats_r1 <- portfolio_stats(weights_r1, returns_test_wide)
stats_r2 <- portfolio_stats(weights_r2, returns_test_wide)
stats_r3 <- portfolio_stats(weights_r3, returns_test_wide)
stats_r1
## $annual_return
## [1] -0.01875454
##
## $risk
## [,1]
## [1,] 0.005464002
##
## $max_weight
## [1] 0.6622191
##
## $leverage
## [1] 1
stats_r2
## $annual_return
## [1] -0.02444645
##
## $risk
## [,1]
## [1,] 0.006526361
##
## $max_weight
## [1] 0.6282857
##
## $leverage
## [1] 1
stats_r3
## $annual_return
## [1] -0.03573142
##
## $risk
## [,1]
## [1,] 0.008961743
##
## $max_weight
## [1] 0.5604189
##
## $leverage
## [1] 1
The cumulative return plots show that portfolios with higher target returns (r = 1.20) experienced greater volatility and sharper drawdowns compared to lower target return portfolios (r = 1.05).
Despite aiming for higher returns, the actual realized annual returns on the test set were negative for all portfolios, with -1.88% for r = 1.05, -2.44% for r = 1.10, and -3.57% for r = 1.20.
Risk increased with higher target returns, with variances of 0.0055, 0.0065, and 0.0089 respectively, showing that aiming for higher returns introduced higher realized volatility.
The maximum allocation weight decreased slightly as target returns increased, suggesting more diversification in the higher target return portfolios.
All portfolios have a leverage of exactly 1; this holds by construction, since the non-negativity constraint in the optimization rules out short positions.
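The next step re-solves the r = 1.20 problem with a ridge penalty on the weights. Since the code sets Dmat = 2 * (Gamma + lambda * diag(ncol(Gamma))), the objective being minimized is
\[ w^\top \Gamma w + \lambda \lVert w \rVert_2^2 \quad \text{subject to} \quad \mathbf{1}^\top w = 1, \quad \mu^\top w = 1.20, \quad w \ge 0, \]
so larger \(\lambda\) shrinks the solution toward a more evenly spread allocation.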
lambda_values <- 10^seq(-1, 1, length.out = 10)
penalized_results <- list()
for (lambda in lambda_values) {
Dmat <- 2 * (Gamma + lambda * diag(ncol(Gamma)))
dvec <- rep(0, length(mu_vector))
Amat <- cbind(rep(1, length(mu_vector)), mu_vector, diag(1, length(mu_vector)))
bvec <- c(1, 1.20, rep(0, length(mu_vector)))
result <- solve.QP(Dmat, dvec, Amat, bvec, meq = 2)
penalized_results[[as.character(lambda)]] <- result$solution
}
# Collect statistics for each penalized portfolio
penalized_stats <- tibble(lambda = lambda_values, annual_return = NA, risk = NA, max_weight = NA, leverage = NA)
for (i in seq_along(lambda_values)) {
weights <- penalized_results[[as.character(lambda_values[i])]]
stats <- portfolio_stats(weights, returns_test_wide)
penalized_stats$annual_return[i] <- stats$annual_return
penalized_stats$risk[i] <- stats$risk
penalized_stats$max_weight[i] <- stats$max_weight
penalized_stats$leverage[i] <- stats$leverage
}
penalized_stats
## # A tibble: 10 × 5
## lambda annual_return risk max_weight leverage
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.1 0.116 0.0184 0.307 1.00
## 2 0.167 0.116 0.0186 0.307 1.00
## 3 0.278 0.116 0.0187 0.306 1.00
## 4 0.464 0.116 0.0187 0.306 1.00
## 5 0.774 0.116 0.0187 0.306 1.00
## 6 1.29 0.116 0.0188 0.306 1.00
## 7 2.15 0.116 0.0188 0.306 1.00
## 8 3.59 0.116 0.0188 0.306 1.00
## 9 5.99 0.116 0.0188 0.306 1.00
## 10 10 0.116 0.0188 0.306 1.00
As lambda increases, the out-of-sample annual return stays essentially flat at about 11.6%, a large improvement over the -3.6% of the unpenalized r = 1.20 portfolio, suggesting that even mild regularization avoids the extreme portfolios implied by the training-period estimates.
Risk barely changes across lambda values (from approximately 0.0184 to 0.0188), so within the penalized family the additional diversification costs almost nothing in variance.
The maximum portfolio weight is roughly constant at 0.31, well below the 0.56-0.66 of the unpenalized portfolios, showing that regularization spreads the investment more evenly across assets and reduces reliance on any single stock.
Leverage remains at 1 for every lambda, as required by the non-negativity constraint (no short positions).
Comparing out-of-sample performance with and without the penalty, the ridge term appears to reduce overfitting to the 2020-2022 estimates: the unpenalized r = 1.20 portfolio loses money on the test period, while every penalized portfolio earns a positive return.
Regularization therefore offers a practical way to trade a small amount of in-sample optimality for markedly better out-of-sample generalization.