Costco’s stock price goes through changes whenever the company announces new policies. This past June, Costco increased its membership price fees, and we would like to find the best method to forecast the Costco stock price for a short period.
We are looking at moving averages, weighted moving averages, ETS, and linear forecasting methods to find the most accurate model. Once we find the model with the least error value, we can accurately forecast the stock price for the next 90 days.
We want to forecast Costco’s stock closing price after a period of change based on past prices, to better inform consumers and stockholders about the upcoming changes in stock closing prices.
Use the best forecasting method to predict 30 days of Costco's closing stock price.
Dependant Variable: Costco's Closing Stock Price ($)
Independant Variable: Time (Days)
# Load necessary libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(TTR)
library(ggplot2)
library(dplyr)
library(lubridate)
library(zoo)
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(readr)
library(dplyr)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
# Load the datase
data <- read_csv("COST_stock_data.csv")
## New names:
## Rows: 9564 Columns: 8
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ticker dbl (6): open, high, low, close, adjclose, volume date (1): ...1
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
# upload xlsx file called "COST_stock_data.csv"
# Convert the 'date' column to Date type
data$...1 <- as.Date(data$...1, format="%Y-%m-%d")
# Change column named "...1" to "Date"
colnames(data)[colnames(data) == "...1"] <- "Date"
# Change column named "close" to "Close"
colnames(data)[colnames(data) == "close"] <- "Close"
# Keep only Date and Close columns
data <- data %>% select(Date, Close)
# Descriptive statistics
summary(data)
## Date Close
## Min. :1986-07-09 Min. : 6.031
## 1st Qu.:1995-12-19 1st Qu.: 12.125
## Median :2005-06-20 Median : 46.515
## Mean :2005-06-21 Mean :110.432
## 3rd Qu.:2014-12-17 3rd Qu.:138.780
## Max. :2024-06-20 Max. :870.750
# Add Periods for later linear regression
data$Period <- seq_len(nrow(data))
# Convert close column data to numeric
data <- data %>%
mutate(Close = as.numeric(Close))
# Make sure the date column is in date format
data$Date <- as.Date(data$Date)
# Filter the data for the last 90 days
data_90_days <- tail(data, 90)
head(data_90_days)
## # A tibble: 6 × 3
## Date Close Period
## <date> <dbl> <int>
## 1 2024-02-12 722. 9475
## 2 2024-02-13 714. 9476
## 3 2024-02-14 722. 9477
## 4 2024-02-15 725. 9478
## 5 2024-02-16 724. 9479
## 6 2024-02-20 726. 9480
# Plot the adjusted closing prices over time
ggplot(data_90_days, aes(x = Date, y = Close)) +
geom_line(color = "purple") +
labs(title = "Costco Stock Close Price Over Time",
x = "Date", y = "Close Price") +
theme_minimal()
## EXPONENTIAL SMOOTHING | “ETS” (Error, Trend, Seasonal) = “MAN”
# Fit the ETS model to the last 90 days of data
# Fit an ETS model with an additive trend
fit_ets <- ets(data_90_days$Close, model = "MAN")
# Multiplicative Error, Additive Trend, No Seasonality
# Display model summary
summary(fit_ets)
## ETS(M,A,N)
##
## Call:
## ets(y = data_90_days$Close, model = "MAN")
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 1e-04
##
## Initial states:
## l = 720.7236
## b = 1.5661
##
## sigma: 0.0128
##
## AIC AICc BIC
## 819.8073 820.5216 832.3063
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.01179543 9.609848 6.378856 -0.01536474 0.8451029 0.9656555
## ACF1
## Training set 0.03191804
# Forecast for the next 30 days
forecast_horizon <- 30
forecast_ets <- forecast(fit_ets, h = forecast_horizon)
# Print forecast values
print(forecast_ets)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 91 864.0072 849.8487 878.1657 842.3536 885.6608
## 92 865.5734 845.5312 885.6155 834.9216 896.2252
## 93 867.1395 842.5690 891.7101 829.5621 904.7169
## 94 868.7057 840.3061 897.1054 825.2722 912.1392
## 95 870.2719 838.4887 902.0551 821.6636 918.8801
## 96 871.8381 836.9868 906.6894 818.5376 925.1385
## 97 873.4042 835.7232 911.0853 815.7761 931.0324
## 98 874.9704 834.6477 915.2931 813.3022 936.6386
## 99 876.5366 833.7255 919.3477 811.0627 942.0105
## 100 878.1028 832.9312 923.2743 809.0188 947.1867
## 101 879.6689 832.2457 927.0922 807.1413 952.1966
## 102 881.2351 831.6541 930.8161 805.4075 957.0628
## 103 882.8013 831.1446 934.4579 803.7993 961.8033
## 104 884.3675 830.7078 938.0271 802.3021 966.4328
## 105 885.9336 830.3357 941.5316 800.9039 970.9633
## 106 887.4998 830.0218 944.9778 799.5947 975.4049
## 107 889.0660 829.7605 948.3715 798.3660 979.7659
## 108 890.6322 829.5471 951.7172 797.2106 984.0537
## 109 892.1983 829.3775 955.0192 796.1222 988.2745
## 110 893.7645 829.2482 958.2808 795.0953 992.4337
## 111 895.3307 829.1560 961.5054 794.1252 996.5361
## 112 896.8969 829.0982 964.6955 793.2078 1000.5859
## 113 898.4630 829.0725 967.8536 792.3393 1004.5867
## 114 900.0292 829.0765 970.9819 791.5165 1008.5419
## 115 901.5954 829.1085 974.0823 790.7362 1012.4545
## 116 903.1615 829.1665 977.1566 789.9960 1016.3271
## 117 904.7277 829.2492 980.2063 789.2933 1020.1622
## 118 906.2939 829.3550 983.2328 788.6259 1023.9619
## 119 907.8601 829.4826 986.2376 787.9920 1027.7281
## 120 909.4262 829.6308 989.2217 787.3896 1031.4629
# Plot the forecast
plot(forecast_ets)
# Get MAPE besides from summary
# Get fitted values from the ETS model
fitted_values <- fitted(fit_ets)
# Calculate the absolute percentage error for each observation
data_90_days <- data_90_days %>%
mutate(absolute_percentage_error = abs((Close - fitted_values) / Close) * 100)
# Calculate the Mean Absolute Percentage Error (MAPE)
mape_ets <- mean(data_90_days$absolute_percentage_error, na.rm = TRUE) # MAPE: 0.85
# Print the MAPE
print(paste("Mean Absolute Percentage Error (MAPE):", round(mape_ets, 2), "%"))
## [1] "Mean Absolute Percentage Error (MAPE): 0.85 %"
# Extract residuals from the ETS model
residuals_ets <- residuals(fit_ets)
# Check for Normality of Residuals
# Histogram
hist(residuals_ets, main = "Histogram of Residuals", xlab = "Residuals", breaks = 20)
As observed, the residuals are centered around zero, indicating no substantial bias. The shape is slightly skewed to the left, the spread is relatively narrow, with most residuals concentrated near the center. While the residuals are centered and mostly symmetric, the slight skewness on the left suggests a mild deviation from normality. However, the deviation is not severe, so the assumption of normal residuals is still approximately valid.
# Q-Q Plot
qqnorm(residuals_ets)
qqline(residuals_ets, col = "red")
Around the center area of all the circles, we observe that they are lining up closely to the standard red diagonal line, only on the far right end that there are a few circles that stay a little further away from the red line, reflecting that there is some deviations, though, as the majority of the points align well with the red line.
# Add Year and Month Columns
data_90_days <- data_90_days %>%
mutate(Year = lubridate::year(Date), Month = lubridate::month(Date))
# Calculate the average monthly close price
monthly_avg <- data_90_days %>%
group_by(Year, Month) %>%
summarize(Average_Monthly_Close = mean(Close, na.rm = TRUE))
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
# View the Result
head(monthly_avg)
## # A tibble: 5 × 3
## # Groups: Year [1]
## Year Month Average_Monthly_Close
## <dbl> <dbl> <dbl>
## 1 2024 2 732.
## 2 2024 3 740.
## 3 2024 4 718.
## 4 2024 5 783.
## 5 2024 6 847.
# Create a Date column for the first day of each month
monthly_avg <- monthly_avg %>%
mutate(Date = as.Date(paste(Year, Month, "01", sep = "-")))
# Create a continuous Date column
monthly_avg <- monthly_avg %>%
mutate(Date = as.Date(paste(Year, Month, "01", sep = "-"))) %>%
arrange(Date)
# Fill in missing months with NA (if gaps exist)
monthly_avg <- monthly_avg %>%
complete(Date = seq(min(Date), max(Date), by = "month"))
# Fill missing Average_Monthly_Close values with interpolation
monthly_avg <- monthly_avg %>%
arrange(Date) %>%
mutate(Average_Monthly_Close = zoo::na.approx(Average_Monthly_Close, na.rm = FALSE))
# Ensure no NA values disrupt SMA calculation
library(zoo)
# Fill missing values in Average_Monthly_Close (linear interpolation)
monthly_avg <- monthly_avg %>%
arrange(Date) %>%
mutate(Average_Monthly_Close = na.approx(Average_Monthly_Close, na.rm = FALSE))
# Calculate the 3-month moving average (ignoring edge cases with fewer points)
monthly_avg <- monthly_avg %>%
mutate(Moving_Avg_3 = ifelse(row_number() >= 3,
SMA(Average_Monthly_Close, n = 3),
NA))
# Check result
head(monthly_avg)
## # A tibble: 5 × 5
## # Groups: Year [1]
## Year Date Month Average_Monthly_Close Moving_Avg_3
## <dbl> <date> <dbl> <dbl> <dbl>
## 1 2024 2024-02-01 2 732. NA
## 2 2024 2024-03-01 3 740. NA
## 3 2024 2024-04-01 4 718. 730.
## 4 2024 2024-05-01 5 783. 747.
## 5 2024 2024-06-01 6 847. 783.
# Calculate the 3 Month Moving Average
monthly_avg <- monthly_avg %>%
arrange(Year, Month) %>%
mutate(Moving_Avg_3 = rollmean(Average_Monthly_Close, k = 3, fill = NA, align = "right"))
# Merge the 3-month moving average back with the daily data
data_90_days <- data_90_days %>%
mutate(Year = year(Date), Month = month(Date)) %>%
left_join(monthly_avg, by = c("Year", "Month"))
# Calculate the squared errors for dates with moving averages
data_90_days <- data_90_days %>%
mutate(
squared_error3 = ifelse(is.na(Moving_Avg_3), NA, (Close - Moving_Avg_3)^2)
)
# Calculate the absolute percentage errors for dates with moving averages
data_90_days <- data_90_days %>%
mutate(
absolute_percentage_error_ma = ifelse(
is.na(Moving_Avg_3) | Close == 0,
NA,
abs((Close - Moving_Avg_3) / Close) * 100
)
)
# Compute MAPE (excluding NA values)
mape_ma <- mean(data_90_days$absolute_percentage_error_ma, na.rm = TRUE)
mape_ma # MAPE (MA): 4.31%
## [1] 4.308358
# Plot the Average Monthly Close Price Over Time
ggplot(monthly_avg, aes(x = as.Date(paste(Year, Month, "01", sep = "-")), y = Average_Monthly_Close)) +
geom_line(color = "blue") +
labs(title = "Average Monthly Close Price Over Time",
x = "Date",
y = "Average Monthly Close PRice") +
theme_minimal()
# Plot with moving average
ggplot(monthly_avg, aes(x = Date)) +
geom_line(aes(y = Average_Monthly_Close, color = "Average Monthly Close")) +
geom_line(aes(y = Moving_Avg_3, color = "3-Month Moving Average")) +
labs(title = "Average Monthly Close Price with 3-Month Moving Average",
x = "Date",
y = "Price") +
scale_color_manual(values = c("Average Monthly Close" = "blue",
"3-Month Moving Average" = "red")) +
theme_minimal()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Load forecasting library
suppressPackageStartupMessages(library(forecast))
library(forecast)
# Convert the Average_Monthly_Close to a time series object
ts_data <- ts(monthly_avg$Average_Monthly_Close, start = c(min(monthly_avg$Year), min(monthly_avg$Month)), frequency = 12)
# Define weights for the WMA (e.g., for a 3-month WMA)
weights <- c(0.5, 0.3, 0.2) # Most recent month gets highest weight
# Calculate WMA for the existing data
monthly_avg <- monthly_avg %>%
mutate(WMA = zoo::rollapply(Average_Monthly_Close,
width = 3,
FUN = function(x) sum(weights * x, na.rm = TRUE),
align = "right",
fill = NA))
# Calculate MAPE for model predictions
actual <- monthly_avg$Average_Monthly_Close
predicted <- monthly_avg$WMA # Use WMA, not Moving_Avg_3
# Remove NA values (from the initial rows of Moving_Avg_3)
valid_data <- !is.na(predicted)
actual <- actual[valid_data]
predicted <- predicted[valid_data]
# Calculate MAPE for WMA
mape_wma <- mean(abs((actual - predicted) / actual)) * 100
cat("MAPE (WMA):", mape_wma, "%", "\n") # MAPE (WMA) : 5.64%
## MAPE (WMA): 5.637632 %
# Forecast for July to December 2024
# Start by adding placeholder rows for the forecast period
future_dates <- seq(as.Date("2024-07-01"), as.Date("2024-12-01"), by = "month")
forecast_data_90_days <- data.frame(Date = future_dates, Average_Monthly_Close = NA)
# Combine current and future data
monthly_avg <- bind_rows(monthly_avg, forecast_data_90_days)
# Generate forecasts using WMA
for (i in (nrow(monthly_avg) - 5):nrow(monthly_avg)) {
# Use the last three months of WMA data for each forecast
recent_data <- tail(monthly_avg$Average_Monthly_Close[1:(i-1)], 3)
# Ensure there are enough data points
if (length(recent_data) == 3) {
monthly_avg$Average_Monthly_Close[i] <- sum(weights * recent_data)
}
}
# Filter only forecasted data for July–December 2024
forecast_results <- monthly_avg %>%
filter(Date >= as.Date("2024-07-01") & Date <= as.Date("2024-12-01"))
# Display the forecasted results
print(forecast_results)
## # A tibble: 6 × 6
## # Groups: Year [1]
## Year Date Month Average_Monthly_Close Moving_Avg_3 WMA
## <dbl> <date> <dbl> <dbl> <dbl> <dbl>
## 1 NA 2024-07-01 NA 764. NA NA
## 2 NA 2024-08-01 NA 799. NA NA
## 3 NA 2024-09-01 NA 812. NA NA
## 4 NA 2024-10-01 NA 784. NA NA
## 5 NA 2024-11-01 NA 800. NA NA
## 6 NA 2024-12-01 NA 801. NA NA
# Plot the data
ggplot(monthly_avg, aes(x = Date)) +
geom_line(aes(y = Average_Monthly_Close, color = "Actual Data"), na.rm = TRUE) +
geom_point(data = forecast_results, aes(y = Average_Monthly_Close, color = "Forecast"), size = 2) +
labs(title = "Forecast for Average Monthly Closing Price (July to December 2024)",
x = "Date",
y = "Closing Price") +
scale_color_manual(values = c("Actual Data" = "blue", "Forecast" = "red")) +
theme_minimal()
## Linear Trend Model
library(readxl)
library(ggplot2)
library(readr)
library(dplyr)
# Load the data
data <- read_csv("COST_stock_data.csv")
## New names:
## Rows: 9564 Columns: 8
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ticker dbl (6): open, high, low, close, adjclose, volume date (1): ...1
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
# Convert the 'date' column to Date type
data$...1 <- as.Date(data$...1, format="%Y-%m-%d")
# Change column named "...1" to "Date"
colnames(data)[colnames(data) == "...1"] <- "Date"
# Descriptive statistics
summary(data)
## Date open high low
## Min. :1986-07-09 Min. : 6.00 Min. : 6.125 Min. : 5.875
## 1st Qu.:1995-12-19 1st Qu.: 12.12 1st Qu.: 12.312 1st Qu.: 11.938
## Median :2005-06-20 Median : 46.56 Median : 47.214 Median : 45.877
## Mean :2005-06-21 Mean :110.36 Mean :111.405 Mean :109.340
## 3rd Qu.:2014-12-17 3rd Qu.:138.81 3rd Qu.:139.740 3rd Qu.:137.915
## Max. :2024-06-20 Max. :871.31 Max. :873.960 Max. :867.700
## close adjclose volume ticker
## Min. : 6.031 Min. : 4.061 Min. : 46400 Length:9564
## 1st Qu.: 12.125 1st Qu.: 8.165 1st Qu.: 1508475 Class :character
## Median : 46.515 Median : 32.097 Median : 2312050 Mode :character
## Mean :110.432 Mean : 97.419 Mean : 2890320
## 3rd Qu.:138.780 3rd Qu.:113.943 3rd Qu.: 3713000
## Max. :870.750 Max. :870.750 Max. :56773500
# Filter the last 90 days of data
df <- tail(data, 90)
# Add Period for later linear regression
df$Period <- seq_len(nrow(df))
# Construct a time series plot
ggplot(df, aes(x = Period, y = close)) +
geom_line() +
geom_point() +
xlab("Period") +
ylab("Jobs") +
ggtitle("Time Series Plot of Costco data")
# Develop a linear trend equation
model <- lm(close ~ Period, data = df)
summary(model)
##
## Call:
## lm(formula = close ~ Period, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -53.490 -24.398 3.335 22.864 63.181
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 698.0537 6.4694 107.90 <2e-16 ***
## Period 1.3531 0.1235 10.96 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.43 on 88 degrees of freedom
## Multiple R-squared: 0.5771, Adjusted R-squared: 0.5723
## F-statistic: 120.1 on 1 and 88 DF, p-value: < 2.2e-16
# Calculate predictions and error metrics
df$predicted_close <- predict(model)
# Calculate residuals
df$residuals <- df$close - df$predicted_close
# Mean Squared Error (MSE)
mse_lt <- mean(df$residuals^2)
cat("Mean Squared Error (MSE):", mse_lt, "\n")
## Mean Squared Error (MSE): 905.5139
# Mean Absolute Percentage Error (MAPE)
df$percentage_error <- abs(df$residuals / df$close) * 100
mape_lt <- mean(df$percentage_error)
cat("Mean Absolute Percentage Error (MAPE):", mape_lt, "%\n")
## Mean Absolute Percentage Error (MAPE): 3.31251 %
## Compare Models
# Collect MAPEs from all models
model_mapes <- data.frame(
Model = c("ETS", "3-Month Moving Average", "Weighted Moving Average", "Linear Trend"),
MAPE = c(mape_ets, mape_ma, mape_wma, mape_lt)
)
# Print the comparison table
cat("MAPE Comparison Across Models:\n")
## MAPE Comparison Across Models:
print(model_mapes)
## Model MAPE
## 1 ETS 0.8451029
## 2 3-Month Moving Average 4.3083585
## 3 Weighted Moving Average 5.6376317
## 4 Linear Trend 3.3125096
# Highlight the best model (lowest MAPE)
best_model <- model_mapes[which.min(model_mapes$MAPE), ]
cat("\nThe model with the lowest MAPE is:\n")
##
## The model with the lowest MAPE is:
print(best_model)
## Model MAPE
## 1 ETS 0.8451029
Using the ETS: MAN statistical analysis method, we set the smoothing parameter alpha to 0.9999 to give more weight to recent observations, making it ideal for forecasting stock prices in a fast-changing environment. Beta, set at 0.0001, reflects confidence in a steady trend with minimal fluctuations, aligning with the observed stability in stock price trends. The Normal Q-Q Test and Histogram of Residuals show that residuals are approximately normally distributed, with minor deviations in the tails and a slight left skew in the histogram. Despite these small deviations, the assumption of normality for residuals is reasonable and supports the model's validity.
To select the best forecasting method for our data, we compared the MAPE (Mean Absolute Percentage Error), a common metric for assessing forecasting accuracy. The MAPE for ETS, Exponential Smoothing, Weighted Moving Average, and Linear Trend is 0.85%, 4.32%, 4.60%, and 3.31% respectively. Among the methods analyzed, ETS (Error, Trend, Seasonality) had the lowest MAPE, indicating its suitability for our analysis.
The small standard deviation (0.0128), MAE (6.3789), and RMSE (9.6098) indicate low variability and minimal differences between actual and forecasted stock prices relative to their higher values.
Using the forecasting ETS(M,A,N) model, we predicted the stock price movement of Costco and found a strong positive trend. In the very near term, Costco could expect to be benefiting froman increase in its stock price. A positive trend indicates investor confidence in the stock or the overall market so it may reflect expectations of the company’s strong future performance or growth potential.
Observing the difference between multiplicative and additive properties, we fitted our model with appropriate metrics for error, trend, and seasonality. The model holds stable, has good error metrics, and therefore gives a reliable short-term (30 day) forecast.
Costco should focus on its growth in markets to make Costco's reputation a good investment
The stability of the stock appeals to individuals seeking a safe investment choice.
Use short-term stock price forecasting during stable periods of the company and update the model for trends to ensure accurate forecasting.