library(tidyverse)
library(openintro)
library(reactable)
library(jsonlite)
library(lubridate)
library(tidyverse)
library(fpp2)
library(forecast)
library(lmtest)
Old School Runescape (OSRS) is a popular multiplayer game with an active (relatively) free market. Its only form of microtransactions is the OSRS bond which players can buy from Jagex, the makers of OSRS, directly. These bonds are digital goods that can be resold to other players. The price that bonds are sold at are based on current market prices allowing them to move in a way similar to real world stocks. Through this process, a player can indirectly buy in-game money and support the game.
Since real world scenarios can affect the sales and prices of OSRS bonds, I want to see if there is a correlation between the S&P 500 stock market price and the OSRS bond price.
Start by getting the saved S&P 500 daily stock data for the past 5 years. Mutate the date column to year-month-day format to allow it to be used as a Date type instead. We will also filter the dataframe to only contain dates between April 29, 2024 and December 20, 2024 because of the limitations of the other data we are using.
sp500_hist_init <- read_csv('https://raw.githubusercontent.com/Megabuster/Data607/refs/heads/main/data/project5/sp500_historical.csv', show_col_types = FALSE)
sp500_hist_init <- sp500_hist_init %>%
mutate(Date = as_datetime(mdy(Date)))
sp500_hist <- sp500_hist_init %>%
filter(Date > '2024-04-29' & Date < '2024-12-20')
head(sp500_hist)
## # A tibble: 6 × 5
## Date `Close/Last` Open High Low
## <dttm> <dbl> <dbl> <dbl> <dbl>
## 1 2024-12-20 00:00:00 5931. 5842 5982. 5832.
## 2 2024-12-19 00:00:00 5867. 5913. 5936. 5866.
## 3 2024-12-18 00:00:00 5872. 6048. 6071. 5868.
## 4 2024-12-17 00:00:00 6051. 6053. 6058. 6035.
## 5 2024-12-16 00:00:00 6074. 6064. 6085. 6059.
## 6 2024-12-13 00:00:00 6051. 6068. 6079. 6036.
Typically, we can obtain OSRS price data via their API as shown below. Note that we will not be using that copy of the data because the time range will change depending on when the API is called and cannot retrieve more time unless each day is called individually. Collecting those days manually would incur a severe amount of API calls and likely be rate limited.
test_bond_api <- read_csv('https://prices.runescape.wiki/api/v1/osrs/timeseries?timestep=24h&id=4151', show_col_types = FALSE)
Instead, we also saved a copy as a JSON which we will use to compare to the S&P 500 price data. The dates are again being converted into a datetime type because it is actually not the right format.
bond_prices_init <- fromJSON('https://raw.githubusercontent.com/Megabuster/Data607/refs/heads/main/data/project5/osrs_bond1yr.json')$data
bond_prices_init <- bond_prices_init %>% mutate(Date = as_datetime(timestamp))
bond_prices <- bond_prices_init %>% filter(Date > '2024-04-29' & Date < '2024-12-20')
head(bond_prices)
## timestamp avgHighPrice avgLowPrice highPriceVolume lowPriceVolume Date
## 1 1714435200 11360307 11337654 9573 4318 2024-04-30
## 2 1714521600 11499876 11583917 9642 4367 2024-05-01
## 3 1714608000 11711136 11672254 10003 4793 2024-05-02
## 4 1714694400 11432271 11413354 11661 4460 2024-05-03
## 5 1714780800 11360733 11365814 12727 4338 2024-05-04
## 6 1714867200 11447463 11443389 11718 5120 2024-05-05
The stock market does not typically update on weekends or specific federal holidays. This makes it harder to compare the data with OSRS bonds. One way to handle this is to keep only the days that exist in both dataframes. Note that the dates are in the same format as in the S&P 500 data, but in the opposite order. We will fix this later.
## timestamp avgHighPrice avgLowPrice highPriceVolume lowPriceVolume Date
## 1 1714435200 11360307 11337654 9573 4318 2024-04-30
## 2 1714521600 11499876 11583917 9642 4367 2024-05-01
## 3 1714608000 11711136 11672254 10003 4793 2024-05-02
## 4 1714694400 11432271 11413354 11661 4460 2024-05-03
## 5 1714953600 11775733 11767451 9140 4099 2024-05-06
## 6 1715040000 12160809 12055678 9096 4422 2024-05-07
The next operation is to take the average of the high and low prices to get a single price variable for each data set to make it easier to compare. We will need to reverse the S&P 500 prices because the original dates were in opposite orders.
avg_bond_df <- reduced_bond_prices %>%
mutate(Price = (avgHighPrice + avgLowPrice) / 2)
avg_sp500_df <- sp500_hist %>%
mutate(Price = (High + Low) / 2)
prices_df <- data.frame(Date = avg_bond_df$Date, avg_bond_price = avg_bond_df$Price, avg_sp500_price = rev(avg_sp500_df$Price))
head(prices_df)
## Date avg_bond_price avg_sp500_price
## 1 2024-04-30 11348981 5073.070
## 2 2024-05-01 11541897 5054.785
## 3 2024-05-02 11691695 5042.130
## 4 2024-05-03 11422813 5120.170
## 5 2024-05-06 11771592 5161.710
## 6 2024-05-07 12108244 5189.595
Let’s start with a cursory look at our cleaned up data for both prices.
prices_df %>%
ggplot(aes(Date, avg_sp500_price)) +
geom_point() +
geom_smooth() +
labs(title = 'S&P 500 price over time')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
prices_df %>%
ggplot(aes(Date, avg_bond_price)) +
geom_point() +
geom_smooth() +
labs(title = 'OSRS bond price over time')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
The price of OSRS bonds generally have trended up, but had a dip around September. The S&P 500 has much more regular highs and lows. The S&P 500 price has trended up as well, but has oscillated far more frequently.
Let’s plot to see if there is a trend when comparing the stock and bond prices.
prices_df %>%
ggplot(aes(avg_sp500_price, avg_bond_price)) +
geom_point() +
stat_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
This looks like a solid linear regression model candidate. We can perform a Pearson correlation test to make sure.
##
## Pearson's product-moment correlation
##
## data: prices_df$avg_bond_price and prices_df$avg_sp500_price
## t = 10.082, df = 162, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5168303 0.7068893
## sample estimates:
## cor
## 0.6209021
Checking the correlation between the prices of OSRS bonds and the S&P 500, there appears to be a strong correlation between the prices with a p-value of well under 0.05 and a correlation coefficient of 0.6209. We can fit a linear model and check the residuals.
##
## Call:
## lm(formula = avg_bond_price ~ avg_sp500_price, data = prices_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1876042 -468938 -73704 456458 2434778
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 115421.8 1299369.7 0.089 0.929
## avg_sp500_price 2332.6 231.4 10.082 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 782500 on 162 degrees of freedom
## Multiple R-squared: 0.3855, Adjusted R-squared: 0.3817
## F-statistic: 101.6 on 1 and 162 DF, p-value: < 2.2e-16
model %>%
ggplot(aes(x = .fitted, y = .resid)) +
geom_point() +
geom_hline(yintercept = 0, linetype = 'dashed') +
xlab('Fitted values') +
ylab('Residuals')
The linear model fitted here has a 0.386 r-squared value that implies a moderate positive correlation. The residuals vs fitted values plot seems fairly normal, but there are outlier values that the model is not accounting for in the upper right. The histogram seems fairly normal. There may be a double peak near the middle depending on how binning is performed. Finally, the values remain along a straight line on the QQ-plot for the first standard deviation before tailing off. The linear regression model appears to fit reasonably well.
“It’s not what you look at that matters, it’s what you see.” - Henry David Thoreau
Proving or disproving this correlation is going to be more difficult than fitting a model because there is a high likelihood that a model fits, yet does not tell us the whole story. The best we can do is to gather as much useful evidence we have and interpret what we see.
We will try to fit ARIMA models for both prices. This in itself does not prove or disprove their correlations on its own, but gives us clues about their patterns which can be compared. One advantage is that it allows us to use the entirety of our initial data instead of comparing matching dates.
bond_prices_ts <- ts((bond_prices_init$avgHighPrice + bond_prices_init$avgLowPrice) / 2, frequency = 1)
bond_arima <- auto.arima(bond_prices_ts)
summary(bond_arima)
## Series: bond_prices_ts
## ARIMA(1,1,3)
##
## Coefficients:
## ar1 ma1 ma2 ma3
## 0.5732 -0.3384 -0.2467 -0.1620
## s.e. 0.1452 0.1462 0.0585 0.0629
##
## sigma^2 = 7.451e+10: log likelihood = -5070.87
## AIC=10151.74 AICc=10151.91 BIC=10171.23
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 18058.91 271089 215505.8 0.1075781 1.612673 0.9474072
## ACF1
## Training set -7.197386e-05
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,3)
## Q* = 18.714, df = 6, p-value = 0.004674
##
## Model df: 4. Total lags used: 10
sp500_prices_ts <- ts((sp500_hist_init$High + sp500_hist_init$Low) / 2, frequency = 1)
sp500_arima <- auto.arima(sp500_prices_ts)
summary(sp500_arima)
## Series: sp500_prices_ts
## ARIMA(0,1,1) with drift
##
## Coefficients:
## ma1 drift
## 0.2548 -2.1732
## s.e. 0.0267 1.3489
##
## sigma^2 = 1453: log likelihood = -6348.91
## AIC=12703.83 AICc=12703.84 BIC=12719.23
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0009862523 38.07431 27.84375 -0.005123338 0.6435599 0.9630606
## ACF1
## Training set 0.002678455
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1) with drift
## Q* = 11.457, df = 9, p-value = 0.2457
##
## Model df: 1. Total lags used: 10
The ARIMA model for OSRS bonds was estimated to be ARIMA(1, 1, 3). The ACF residuals show some extreme lags meaning that there are trends that are not represented within this model.
The ARIMA model for S&P 500 data was estimated to be ARIMA(0, 1, 1) with drift. The ACF residuals show some lags that are not captured but the model, but not to the same extreme as the model for OSRS bonds.
What does this all mean? This tells us nothing about the potential
correlation between our two prices. Let’s redo the ARIMA model process,
but with prices_df
which we had been testing the linear
correlation with.
bond_prices_ts <- ts(prices_df$avg_bond_price, frequency = 1)
bond_arima <- auto.arima(bond_prices_ts)
summary(bond_arima)
## Series: bond_prices_ts
## ARIMA(0,1,0)
##
## sigma^2 = 1.011e+11: log likelihood = -2296.43
## AIC=4594.86 AICc=4594.89 BIC=4597.96
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 24521.33 316965 243211.1 0.1551508 1.862395 0.9941853 0.05556599
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,0)
## Q* = 20.696, df = 10, p-value = 0.02331
##
## Model df: 0. Total lags used: 10
sp500_prices_ts <- ts(prices_df$avg_sp500_price, frequency = 1)
sp500_arima <- auto.arima(sp500_prices_ts)
summary(sp500_arima)
## Series: sp500_prices_ts
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## 0.3558
## s.e. 0.0747
##
## sigma^2 = 1242: log likelihood = -811.49
## AIC=1626.98 AICc=1627.06 BIC=1633.17
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 3.820998 35.02726 24.55121 0.06844516 0.4399697 0.9586754
## ACF1
## Training set -0.01345265
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)
## Q* = 9.0348, df = 9, p-value = 0.4341
##
## Model df: 1. Total lags used: 10
We get two different models. OSRS bonds with the weekends and holidays removed were fitted to ARIMA(0, 1, 0) which is a random walk. Each value is sequentially connected to the next in a random walk.
S&P 500 prices were fitted to ARIMA(0, 1, 1) again. Notably, the drift is missing which means this new model, which has better ACF plot lags, is missing the upward trend of stocks over time.
This tells us that the trends and patterns we have been analyzing may be flawed.
This final exercise shows the dangers of the earlier linear regression results. Forcibly testing the correlation between the two prices over time with missing data makes it very difficult to get an accurate analysis. The initial plots for both prices were quite different. While both were increasing over the same time period, we could see that they were operating with different trends. It’s possible that bonds always dip around late summer/early fall when school starts again in many places. We need to make sure we can account for these trends in our analysis. We simply do not have enough information to accept or reject our assumption outright.
prices_df %>%
ggplot(aes(Date, avg_sp500_price)) +
geom_point() +
geom_smooth() +
labs(title = 'S&P 500 price over time')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
prices_df %>%
ggplot(aes(Date, avg_bond_price)) +
geom_point() +
geom_smooth() +
labs(title = 'OSRS bond price over time')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Returning to our original question, I cannot confirm a correlation between OSRS bonds and S&P 500 prices. I believe they do both exhibit trends of increasing prices over time. Further work would require me to obtain more time data for OSRS bonds. An interesting test would be to see if we can forecast future prices for OSRS bonds using past S&P 500 price trends (Granger causality test).
…