Examining Costco’s Stock Price Using Forecasting Models

Introduction

Costco’s stock price goes through changes whenever the company announces new policies. This past June, Costco increased its membership price fees, and we would like to find the best method to forecast the Costco stock price for a short period. 

We are looking at moving averages, weighted moving averages, ETS, and linear forecasting methods to find the most accurate model. Once we find the model with the least error value, we can accurately forecast the stock price for the next 90 days.

Problem statement

We want to forecast Costco’s stock closing price after a period of change based on past prices, to better inform consumers and stockholders about the upcoming changes in stock closing prices.

Objective

Use the best forecasting method to predict 30 days of Costco's closing stock price. 

Project Variables

Dependant Variable: Costco's Closing Stock Price ($)
Independant Variable: Time (Days)

Load and Clean the Data

# Load necessary libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(TTR)
library(ggplot2)
library(dplyr)
library(lubridate)
library(zoo)
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(readr)
library(dplyr)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
# Load the datase
data <- read_csv("COST_stock_data.csv")
## New names:
## Rows: 9564 Columns: 8
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ticker dbl (6): open, high, low, close, adjclose, volume date (1): ...1
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
# upload xlsx file called "COST_stock_data.csv"

# Convert the 'date' column to Date type
data$...1 <- as.Date(data$...1, format="%Y-%m-%d")

# Change column named "...1" to "Date"
colnames(data)[colnames(data) == "...1"] <- "Date"

# Change column named "close" to "Close"
colnames(data)[colnames(data) == "close"] <- "Close"

# Keep only Date and Close columns
data <- data %>% select(Date, Close)

# Descriptive statistics
summary(data)
##       Date                Close        
##  Min.   :1986-07-09   Min.   :  6.031  
##  1st Qu.:1995-12-19   1st Qu.: 12.125  
##  Median :2005-06-20   Median : 46.515  
##  Mean   :2005-06-21   Mean   :110.432  
##  3rd Qu.:2014-12-17   3rd Qu.:138.780  
##  Max.   :2024-06-20   Max.   :870.750
# Add Periods for later linear regression
data$Period <- seq_len(nrow(data))

# Convert close column data to numeric
data <- data %>%
  mutate(Close = as.numeric(Close))

# Make sure the date column is in date format
data$Date <- as.Date(data$Date)

# Filter the data for the last 90 days
data_90_days <- tail(data, 90)

head(data_90_days)
## # A tibble: 6 × 3
##   Date       Close Period
##   <date>     <dbl>  <int>
## 1 2024-02-12  722.   9475
## 2 2024-02-13  714.   9476
## 3 2024-02-14  722.   9477
## 4 2024-02-15  725.   9478
## 5 2024-02-16  724.   9479
## 6 2024-02-20  726.   9480

Plot the Data

# Plot the adjusted closing prices over time
ggplot(data_90_days, aes(x = Date, y = Close)) +
  geom_line(color = "purple") +
  labs(title = "Costco Stock  Close Price Over Time",
       x = "Date", y = "Close Price") +
  theme_minimal()

## EXPONENTIAL SMOOTHING | “ETS” (Error, Trend, Seasonal) = “MAN”

# Fit the ETS model to the last 90 days of data
# Fit an ETS model with an additive trend
fit_ets <- ets(data_90_days$Close, model = "MAN")
# Multiplicative Error, Additive Trend, No Seasonality

# Display model summary
summary(fit_ets)
## ETS(M,A,N) 
## 
## Call:
## ets(y = data_90_days$Close, model = "MAN")
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 720.7236 
##     b = 1.5661 
## 
##   sigma:  0.0128
## 
##      AIC     AICc      BIC 
## 819.8073 820.5216 832.3063 
## 
## Training set error measures:
##                      ME     RMSE      MAE         MPE      MAPE      MASE
## Training set 0.01179543 9.609848 6.378856 -0.01536474 0.8451029 0.9656555
##                    ACF1
## Training set 0.03191804
# Forecast for the next 30 days
forecast_horizon <- 30
forecast_ets <- forecast(fit_ets, h = forecast_horizon)

# Print forecast values
print(forecast_ets)
##     Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
##  91       864.0072 849.8487 878.1657 842.3536  885.6608
##  92       865.5734 845.5312 885.6155 834.9216  896.2252
##  93       867.1395 842.5690 891.7101 829.5621  904.7169
##  94       868.7057 840.3061 897.1054 825.2722  912.1392
##  95       870.2719 838.4887 902.0551 821.6636  918.8801
##  96       871.8381 836.9868 906.6894 818.5376  925.1385
##  97       873.4042 835.7232 911.0853 815.7761  931.0324
##  98       874.9704 834.6477 915.2931 813.3022  936.6386
##  99       876.5366 833.7255 919.3477 811.0627  942.0105
## 100       878.1028 832.9312 923.2743 809.0188  947.1867
## 101       879.6689 832.2457 927.0922 807.1413  952.1966
## 102       881.2351 831.6541 930.8161 805.4075  957.0628
## 103       882.8013 831.1446 934.4579 803.7993  961.8033
## 104       884.3675 830.7078 938.0271 802.3021  966.4328
## 105       885.9336 830.3357 941.5316 800.9039  970.9633
## 106       887.4998 830.0218 944.9778 799.5947  975.4049
## 107       889.0660 829.7605 948.3715 798.3660  979.7659
## 108       890.6322 829.5471 951.7172 797.2106  984.0537
## 109       892.1983 829.3775 955.0192 796.1222  988.2745
## 110       893.7645 829.2482 958.2808 795.0953  992.4337
## 111       895.3307 829.1560 961.5054 794.1252  996.5361
## 112       896.8969 829.0982 964.6955 793.2078 1000.5859
## 113       898.4630 829.0725 967.8536 792.3393 1004.5867
## 114       900.0292 829.0765 970.9819 791.5165 1008.5419
## 115       901.5954 829.1085 974.0823 790.7362 1012.4545
## 116       903.1615 829.1665 977.1566 789.9960 1016.3271
## 117       904.7277 829.2492 980.2063 789.2933 1020.1622
## 118       906.2939 829.3550 983.2328 788.6259 1023.9619
## 119       907.8601 829.4826 986.2376 787.9920 1027.7281
## 120       909.4262 829.6308 989.2217 787.3896 1031.4629
# Plot the forecast
plot(forecast_ets)

# Get MAPE besides from summary
# Get fitted values from the ETS model
fitted_values <- fitted(fit_ets)

# Calculate the absolute percentage error for each observation
data_90_days <- data_90_days %>%
  mutate(absolute_percentage_error = abs((Close - fitted_values) / Close) * 100)

# Calculate the Mean Absolute Percentage Error (MAPE)
mape_ets <- mean(data_90_days$absolute_percentage_error, na.rm = TRUE) # MAPE: 0.85

# Print the MAPE
print(paste("Mean Absolute Percentage Error (MAPE):", round(mape_ets, 2), "%"))
## [1] "Mean Absolute Percentage Error (MAPE): 0.85 %"

Assumptions tests

# Extract residuals from the ETS model
residuals_ets <- residuals(fit_ets)

# Check for Normality of Residuals
# Histogram
hist(residuals_ets, main = "Histogram of Residuals", xlab = "Residuals", breaks = 20)

As observed, the residuals are centered around zero, indicating no substantial bias. The shape is slightly skewed to the left, the spread is relatively narrow, with most residuals concentrated near the center. While the residuals are centered and mostly symmetric, the slight skewness on the left suggests a mild deviation from normality. However, the deviation is not severe, so the assumption of normal residuals is still approximately valid.
# Q-Q Plot
qqnorm(residuals_ets)
qqline(residuals_ets, col = "red")

Around the center area of all the circles, we observe that they are lining up closely to the standard red diagonal line, only on the far right end that there are a few circles that stay a little further away from the red line, reflecting that there is some deviations, though, as the majority of the points align well with the red line. 

Moving Average and Weighted Moving Average Models

# Add Year and Month Columns
data_90_days <- data_90_days %>%
  mutate(Year = lubridate::year(Date), Month = lubridate::month(Date))

# Calculate the average monthly close price
monthly_avg <- data_90_days %>%
  group_by(Year, Month) %>%
  summarize(Average_Monthly_Close = mean(Close, na.rm = TRUE))
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
# View the Result
head(monthly_avg)
## # A tibble: 5 × 3
## # Groups:   Year [1]
##    Year Month Average_Monthly_Close
##   <dbl> <dbl>                 <dbl>
## 1  2024     2                  732.
## 2  2024     3                  740.
## 3  2024     4                  718.
## 4  2024     5                  783.
## 5  2024     6                  847.
# Create a Date column for the first day of each month
monthly_avg <- monthly_avg %>%
  mutate(Date = as.Date(paste(Year, Month, "01", sep = "-")))

# Create a continuous Date column
monthly_avg <- monthly_avg %>%
  mutate(Date = as.Date(paste(Year, Month, "01", sep = "-"))) %>%
  arrange(Date)

# Fill in missing months with NA (if gaps exist)
monthly_avg <- monthly_avg %>%
  complete(Date = seq(min(Date), max(Date), by = "month"))

# Fill missing Average_Monthly_Close values with interpolation
monthly_avg <- monthly_avg %>%
  arrange(Date) %>%
  mutate(Average_Monthly_Close = zoo::na.approx(Average_Monthly_Close, na.rm = FALSE))

# Ensure no NA values disrupt SMA calculation
library(zoo)

# Fill missing values in Average_Monthly_Close (linear interpolation)
monthly_avg <- monthly_avg %>%
  arrange(Date) %>%
  mutate(Average_Monthly_Close = na.approx(Average_Monthly_Close, na.rm = FALSE))

# Calculate the 3-month moving average (ignoring edge cases with fewer points)
monthly_avg <- monthly_avg %>%
  mutate(Moving_Avg_3 = ifelse(row_number() >= 3, 
                               SMA(Average_Monthly_Close, n = 3),
                               NA))
# Check result
head(monthly_avg)
## # A tibble: 5 × 5
## # Groups:   Year [1]
##    Year Date       Month Average_Monthly_Close Moving_Avg_3
##   <dbl> <date>     <dbl>                 <dbl>        <dbl>
## 1  2024 2024-02-01     2                  732.          NA 
## 2  2024 2024-03-01     3                  740.          NA 
## 3  2024 2024-04-01     4                  718.         730.
## 4  2024 2024-05-01     5                  783.         747.
## 5  2024 2024-06-01     6                  847.         783.
# Calculate the 3 Month Moving Average
monthly_avg <- monthly_avg %>%
  arrange(Year, Month) %>%
  mutate(Moving_Avg_3 = rollmean(Average_Monthly_Close, k = 3, fill = NA, align = "right"))

# Merge the 3-month moving average back with the daily data
data_90_days <- data_90_days %>%
  mutate(Year = year(Date), Month = month(Date)) %>%
  left_join(monthly_avg, by = c("Year", "Month"))

# Calculate the squared errors for dates with moving averages
data_90_days <- data_90_days %>%
  mutate(
    squared_error3 = ifelse(is.na(Moving_Avg_3), NA, (Close - Moving_Avg_3)^2)
  )


# Calculate the absolute percentage errors for dates with moving averages
data_90_days <- data_90_days %>%
  mutate(
    absolute_percentage_error_ma = ifelse(
      is.na(Moving_Avg_3) | Close == 0, 
      NA, 
      abs((Close - Moving_Avg_3) / Close) * 100
    )
  )

# Compute MAPE (excluding NA values)
mape_ma <- mean(data_90_days$absolute_percentage_error_ma, na.rm = TRUE)
mape_ma # MAPE (MA): 4.31%
## [1] 4.308358
# Plot the Average Monthly Close Price Over Time
ggplot(monthly_avg, aes(x = as.Date(paste(Year, Month, "01", sep = "-")), y = Average_Monthly_Close)) +
  geom_line(color = "blue") +
  labs(title = "Average Monthly Close Price Over Time",
       x = "Date", 
       y = "Average Monthly Close PRice") +
  theme_minimal()

# Plot with moving average
ggplot(monthly_avg, aes(x = Date)) +
  geom_line(aes(y = Average_Monthly_Close, color = "Average Monthly Close")) +
  geom_line(aes(y = Moving_Avg_3, color = "3-Month Moving Average")) +
  labs(title = "Average Monthly Close Price with 3-Month Moving Average",
       x = "Date", 
       y = "Price") +
  scale_color_manual(values = c("Average Monthly Close" = "blue", 
                                "3-Month Moving Average" = "red")) +
  theme_minimal()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

# Load forecasting library
suppressPackageStartupMessages(library(forecast))
library(forecast)

# Convert the Average_Monthly_Close to a time series object
ts_data <- ts(monthly_avg$Average_Monthly_Close, start = c(min(monthly_avg$Year), min(monthly_avg$Month)), frequency = 12)

# Define weights for the WMA (e.g., for a 3-month WMA)
weights <- c(0.5, 0.3, 0.2)  # Most recent month gets highest weight

# Calculate WMA for the existing data
monthly_avg <- monthly_avg %>%
  mutate(WMA = zoo::rollapply(Average_Monthly_Close, 
                              width = 3, 
                              FUN = function(x) sum(weights * x, na.rm = TRUE), 
                              align = "right", 
                              fill = NA))

# Calculate MAPE for model predictions
actual <- monthly_avg$Average_Monthly_Close
predicted <- monthly_avg$WMA  # Use WMA, not Moving_Avg_3

# Remove NA values (from the initial rows of Moving_Avg_3)
valid_data <- !is.na(predicted)
actual <- actual[valid_data]
predicted <- predicted[valid_data]

# Calculate MAPE for WMA
mape_wma <- mean(abs((actual - predicted) / actual)) * 100
cat("MAPE (WMA):", mape_wma, "%", "\n") # MAPE (WMA) : 5.64%
## MAPE (WMA): 5.637632 %
# Forecast for July to December 2024
# Start by adding placeholder rows for the forecast period
future_dates <- seq(as.Date("2024-07-01"), as.Date("2024-12-01"), by = "month")
forecast_data_90_days <- data.frame(Date = future_dates, Average_Monthly_Close = NA)

# Combine current and future data
monthly_avg <- bind_rows(monthly_avg, forecast_data_90_days)

# Generate forecasts using WMA
for (i in (nrow(monthly_avg) - 5):nrow(monthly_avg)) {
  # Use the last three months of WMA data for each forecast
  recent_data <- tail(monthly_avg$Average_Monthly_Close[1:(i-1)], 3)
  
  # Ensure there are enough data points
  if (length(recent_data) == 3) {
    monthly_avg$Average_Monthly_Close[i] <- sum(weights * recent_data)
  }
}

# Filter only forecasted data for July–December 2024
forecast_results <- monthly_avg %>%
  filter(Date >= as.Date("2024-07-01") & Date <= as.Date("2024-12-01"))

# Display the forecasted results
print(forecast_results)
## # A tibble: 6 × 6
## # Groups:   Year [1]
##    Year Date       Month Average_Monthly_Close Moving_Avg_3   WMA
##   <dbl> <date>     <dbl>                 <dbl>        <dbl> <dbl>
## 1    NA 2024-07-01    NA                  764.           NA    NA
## 2    NA 2024-08-01    NA                  799.           NA    NA
## 3    NA 2024-09-01    NA                  812.           NA    NA
## 4    NA 2024-10-01    NA                  784.           NA    NA
## 5    NA 2024-11-01    NA                  800.           NA    NA
## 6    NA 2024-12-01    NA                  801.           NA    NA
# Plot the data
ggplot(monthly_avg, aes(x = Date)) +
  geom_line(aes(y = Average_Monthly_Close, color = "Actual Data"), na.rm = TRUE) +
  geom_point(data = forecast_results, aes(y = Average_Monthly_Close, color = "Forecast"), size = 2) +
  labs(title = "Forecast for Average Monthly Closing Price (July to December 2024)",
       x = "Date", 
       y = "Closing Price") +
  scale_color_manual(values = c("Actual Data" = "blue", "Forecast" = "red")) +
  theme_minimal()

## Linear Trend Model

library(readxl)
library(ggplot2)
library(readr)
library(dplyr)

# Load the data
data <- read_csv("COST_stock_data.csv")
## New names:
## Rows: 9564 Columns: 8
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ticker dbl (6): open, high, low, close, adjclose, volume date (1): ...1
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
# Convert the 'date' column to Date type
data$...1 <- as.Date(data$...1, format="%Y-%m-%d")

# Change column named "...1" to "Date"
colnames(data)[colnames(data) == "...1"] <- "Date"

# Descriptive statistics
summary(data)
##       Date                 open             high              low         
##  Min.   :1986-07-09   Min.   :  6.00   Min.   :  6.125   Min.   :  5.875  
##  1st Qu.:1995-12-19   1st Qu.: 12.12   1st Qu.: 12.312   1st Qu.: 11.938  
##  Median :2005-06-20   Median : 46.56   Median : 47.214   Median : 45.877  
##  Mean   :2005-06-21   Mean   :110.36   Mean   :111.405   Mean   :109.340  
##  3rd Qu.:2014-12-17   3rd Qu.:138.81   3rd Qu.:139.740   3rd Qu.:137.915  
##  Max.   :2024-06-20   Max.   :871.31   Max.   :873.960   Max.   :867.700  
##      close            adjclose           volume            ticker         
##  Min.   :  6.031   Min.   :  4.061   Min.   :   46400   Length:9564       
##  1st Qu.: 12.125   1st Qu.:  8.165   1st Qu.: 1508475   Class :character  
##  Median : 46.515   Median : 32.097   Median : 2312050   Mode  :character  
##  Mean   :110.432   Mean   : 97.419   Mean   : 2890320                     
##  3rd Qu.:138.780   3rd Qu.:113.943   3rd Qu.: 3713000                     
##  Max.   :870.750   Max.   :870.750   Max.   :56773500
# Filter the last 90 days of data
df <- tail(data, 90)

# Add Period for later linear regression
df$Period <- seq_len(nrow(df))

# Construct a time series plot
ggplot(df, aes(x = Period, y = close)) +
  geom_line() +
  geom_point() +
  xlab("Period") +
  ylab("Jobs") +
  ggtitle("Time Series Plot of Costco data")

# Develop a linear trend equation
model <- lm(close ~ Period, data = df)
summary(model)
## 
## Call:
## lm(formula = close ~ Period, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -53.490 -24.398   3.335  22.864  63.181 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 698.0537     6.4694  107.90   <2e-16 ***
## Period        1.3531     0.1235   10.96   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.43 on 88 degrees of freedom
## Multiple R-squared:  0.5771, Adjusted R-squared:  0.5723 
## F-statistic: 120.1 on 1 and 88 DF,  p-value: < 2.2e-16
# Calculate predictions and error metrics
df$predicted_close <- predict(model)

# Calculate residuals
df$residuals <- df$close - df$predicted_close

# Mean Squared Error (MSE)
mse_lt <- mean(df$residuals^2)
cat("Mean Squared Error (MSE):", mse_lt, "\n")
## Mean Squared Error (MSE): 905.5139
# Mean Absolute Percentage Error (MAPE)
df$percentage_error <- abs(df$residuals / df$close) * 100
mape_lt <- mean(df$percentage_error)
cat("Mean Absolute Percentage Error (MAPE):", mape_lt, "%\n")
## Mean Absolute Percentage Error (MAPE): 3.31251 %

Compare Forecasting Models

## Compare Models
# Collect MAPEs from all models
model_mapes <- data.frame(
  Model = c("ETS", "3-Month Moving Average", "Weighted Moving Average", "Linear Trend"),
  MAPE = c(mape_ets, mape_ma, mape_wma, mape_lt)
)

# Print the comparison table
cat("MAPE Comparison Across Models:\n")
## MAPE Comparison Across Models:
print(model_mapes)
##                     Model      MAPE
## 1                     ETS 0.8451029
## 2  3-Month Moving Average 4.3083585
## 3 Weighted Moving Average 5.6376317
## 4            Linear Trend 3.3125096
# Highlight the best model (lowest MAPE)
best_model <- model_mapes[which.min(model_mapes$MAPE), ]
cat("\nThe model with the lowest MAPE is:\n")
## 
## The model with the lowest MAPE is:
print(best_model)
##   Model      MAPE
## 1   ETS 0.8451029

Interpretations

Using the ETS: MAN statistical analysis method, we set the smoothing parameter alpha to 0.9999 to give more weight to recent observations, making it ideal for forecasting stock prices in a fast-changing environment. Beta, set at 0.0001, reflects confidence in a steady trend with minimal fluctuations, aligning with the observed stability in stock price trends. The Normal Q-Q Test and Histogram of Residuals show that residuals are approximately normally distributed, with minor deviations in the tails and a slight left skew in the histogram. Despite these small deviations, the assumption of normality for residuals is reasonable and supports the model's validity.

To select the best forecasting method for our data, we compared the MAPE (Mean Absolute Percentage Error), a common metric for assessing forecasting accuracy. The MAPE for ETS, Exponential Smoothing, Weighted Moving Average, and Linear Trend is 0.85%, 4.32%, 4.60%, and 3.31% respectively. Among the methods analyzed, ETS (Error, Trend, Seasonality) had the lowest MAPE, indicating its suitability for our analysis.

The small standard deviation (0.0128), MAE (6.3789), and RMSE (9.6098) indicate low variability and minimal differences between actual and forecasted stock prices relative to their higher values.

Conclusion

Using the forecasting ETS(M,A,N) model, we predicted the stock price movement of Costco and found a strong positive trend. In the very near term, Costco could expect to be benefiting froman increase in its stock price. A positive trend indicates investor confidence in the stock or the overall market so it may reflect expectations of the company’s strong future performance or growth potential. 

Observing the difference between multiplicative and additive properties, we fitted our model with appropriate metrics for error, trend, and seasonality. The model holds stable, has good error metrics, and therefore gives a reliable short-term (30 day) forecast.

Recommendations

Costco should focus on its growth in markets to make Costco's reputation a good investment
The stability of the stock appeals to individuals seeking a safe investment choice.
Use short-term stock price forecasting during stable periods of the company and update the model for trends to ensure accurate forecasting.