Exponential smoothing would be appropriate to use to predict stock prices in the future. I would need the historical price data for the selected stock. Since there are many uncertainties and fluctuations in stock prices, I will expect the value of α to be closer to 0. The actual data will weight less because the randomness of stock prices is large.
#Clear environment
rm(list = ls())
#Import library
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
# Import data
temps <- read.table('temps.txt', stringsAsFactors = FALSE, header = TRUE)
head(temps)
## DAY X1996 X1997 X1998 X1999 X2000 X2001 X2002 X2003 X2004 X2005 X2006 X2007
## 1 1-Jul 98 86 91 84 89 84 90 73 82 91 93 95
## 2 2-Jul 97 90 88 82 91 87 90 81 81 89 93 85
## 3 3-Jul 97 93 91 87 93 87 87 87 86 86 93 82
## 4 4-Jul 90 91 91 88 95 84 89 86 88 86 91 86
## 5 5-Jul 89 84 91 90 96 86 93 80 90 89 90 88
## 6 6-Jul 93 84 89 91 96 87 93 84 90 82 81 87
## X2008 X2009 X2010 X2011 X2012 X2013 X2014 X2015
## 1 85 95 87 92 105 82 90 85
## 2 87 90 84 94 93 85 93 87
## 3 91 89 83 95 99 76 87 79
## 4 90 91 85 92 98 77 84 85
## 5 88 80 88 90 100 83 86 84
## 6 82 87 89 90 98 83 87 84
From the time_series plot, I noticed a seasonal pattern, so I used the holt Winters function to create a seasonal time series plot. I set alpha, beta, and gamma to NULL to find the most optimal values. The most optimal values for alpha, beta, and gamma are 0.66, 0, and 0.62, respectively. The fitted time series shows no trend, so it means the unofficial end of summer has not gotten later over the 20 years.
## Holt-Winters exponential smoothing with trend and multiplicative seasonal component.
##
## Call:
## HoltWinters(x = temps_series, alpha = NULL, beta = NULL, gamma = NULL, seasonal = "multiplicative")
##
## Smoothing parameters:
## alpha: 0.615003
## beta : 0
## gamma: 0.5495256
##
## Coefficients:
## [,1]
## a 73.679517064
## b -0.004362918
## s1 1.239022317
## s2 1.234344062
## s3 1.159509551
## s4 1.175247483
## s5 1.171344196
## s6 1.151038408
## s7 1.139383104
## s8 1.130484528
## s9 1.110487514
## s10 1.076242879
## s11 1.041044609
## s12 1.058139281
## s13 1.032496529
## s14 1.036257448
## s15 1.019348815
## s16 1.026754142
## s17 1.071170378
## s18 1.054819556
## s19 1.084397734
## s20 1.064605879
## s21 1.109827336
## s22 1.112670130
## s23 1.103970506
## s24 1.102771209
## s25 1.091264692
## s26 1.084518342
## s27 1.077914660
## s28 1.077696145
## s29 1.053788854
## s30 1.079454300
## s31 1.053481186
## s32 1.054023885
## s33 1.078221405
## s34 1.070145761
## s35 1.054891375
## s36 1.044587771
## s37 1.023285461
## s38 1.025836722
## s39 1.031075732
## s40 1.031419152
## s41 1.021827552
## s42 0.998177248
## s43 0.996049257
## s44 0.981570825
## s45 0.976510542
## s46 0.967977608
## s47 0.985788411
## s48 1.004748195
## s49 1.050965934
## s50 1.072515008
## s51 1.086532279
## s52 1.098357400
## s53 1.097158461
## s54 1.054827180
## s55 1.022866587
## s56 0.987259326
## s57 1.016923524
## s58 1.016604903
## s59 1.004320951
## s60 1.019102781
## s61 0.983848662
## s62 1.055888360
## s63 1.056122844
## s64 1.043478958
## s65 1.039475693
## s66 0.991019224
## s67 1.001437488
## s68 1.002221759
## s69 1.003949213
## s70 0.999566344
## s71 1.018636837
## s72 1.026490773
## s73 1.042507768
## s74 1.022500795
## s75 1.002503740
## s76 1.004560984
## s77 1.025536556
## s78 1.015357769
## s79 0.992176558
## s80 0.979377825
## s81 0.998058079
## s82 1.002553395
## s83 0.955429116
## s84 0.970970220
## s85 0.975543504
## s86 0.931515830
## s87 0.926764603
## s88 0.958565273
## s89 0.963250387
## s90 0.951644060
## s91 0.937362688
## s92 0.954257999
## s93 0.892485444
## s94 0.879537700
## s95 0.879946892
## s96 0.890633648
## s97 0.917134959
## s98 0.925991769
## s99 0.884247686
## s100 0.846648167
## s101 0.833696369
## s102 0.800001437
## s103 0.807934782
## s104 0.819343668
## s105 0.828571029
## s106 0.795608740
## s107 0.796609993
## s108 0.815503509
## s109 0.830111282
## s110 0.829086181
## s111 0.818367239
## s112 0.863958784
## s113 0.912057203
## s114 0.898308248
## s115 0.878723779
## s116 0.848971946
## s117 0.813891909
## s118 0.846821392
## s119 0.819121827
## s120 0.851036184
## s121 0.820416491
## s122 0.851581233
## s123 0.874038407
## [1] 68904.57
I can use a linear regression model to predict the unemployment rate during the Coronavirus outbreak. Some predictors that I might use are the number of people that are infected by Coronavirus and the number of shops that are closed during the pandemic.
#Clear environment
rm(list = ls())
#import data
uscrime <- read.table('uscrime.txt', stringsAsFactors = FALSE, header = TRUE)
head(uscrime)
## M So Ed Po1 Po2 LF M.F Pop NW U1 U2 Wealth Ineq Prob
## 1 15.1 1 9.1 5.8 5.6 0.510 95.0 33 30.1 0.108 4.1 3940 26.1 0.084602
## 2 14.3 0 11.3 10.3 9.5 0.583 101.2 13 10.2 0.096 3.6 5570 19.4 0.029599
## 3 14.2 1 8.9 4.5 4.4 0.533 96.9 18 21.9 0.094 3.3 3180 25.0 0.083401
## 4 13.6 0 12.1 14.9 14.1 0.577 99.4 157 8.0 0.102 3.9 6730 16.7 0.015801
## 5 14.1 0 12.1 10.9 10.1 0.591 98.5 18 3.0 0.091 2.0 5780 17.4 0.041399
## 6 12.1 0 11.0 11.8 11.5 0.547 96.4 25 4.4 0.084 2.9 6890 12.6 0.034201
## Time Crime
## 1 26.2011 791
## 2 25.2999 1635
## 3 24.3006 578
## 4 29.9012 1969
## 5 21.2998 1234
## 6 20.9995 682
set.seed(1)
#Creating a model all the data in uscrime file
model_0 <- lm(formula = Crime~., data = uscrime)
summary(model_0)
##
## Call:
## lm(formula = Crime ~ ., data = uscrime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -395.74 -98.09 -6.69 112.99 512.67
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.984e+03 1.628e+03 -3.675 0.000893 ***
## M 8.783e+01 4.171e+01 2.106 0.043443 *
## So -3.803e+00 1.488e+02 -0.026 0.979765
## Ed 1.883e+02 6.209e+01 3.033 0.004861 **
## Po1 1.928e+02 1.061e+02 1.817 0.078892 .
## Po2 -1.094e+02 1.175e+02 -0.931 0.358830
## LF -6.638e+02 1.470e+03 -0.452 0.654654
## M.F 1.741e+01 2.035e+01 0.855 0.398995
## Pop -7.330e-01 1.290e+00 -0.568 0.573845
## NW 4.204e+00 6.481e+00 0.649 0.521279
## U1 -5.827e+03 4.210e+03 -1.384 0.176238
## U2 1.678e+02 8.234e+01 2.038 0.050161 .
## Wealth 9.617e-02 1.037e-01 0.928 0.360754
## Ineq 7.067e+01 2.272e+01 3.111 0.003983 **
## Prob -4.855e+03 2.272e+03 -2.137 0.040627 *
## Time -3.479e+00 7.165e+00 -0.486 0.630708
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 209.1 on 31 degrees of freedom
## Multiple R-squared: 0.8031, Adjusted R-squared: 0.7078
## F-statistic: 8.429 on 15 and 31 DF, p-value: 3.539e-07
#Testing the model_0 with the test data
test_df <- data.frame(M = 14.0,
So = 0,
Ed = 10.0,
Po1 = 12.0,
Po2 = 15.5,
LF = 0.640,
M.F = 94.0,
Pop = 150,
NW = 1.1,
U1 = 0.120,
U2 = 3.6,
Wealth = 3200,
Ineq = 20.1,
Prob = 0.04,
Time = 39.0)
predict(model_0, test_df)
## 1
## 155.4349
The predicted value from model_0 does not seem reasonable, so hypothesis testing will be used to create a better model. A hypothesis testing will check whether an independent variable has a relationship with the output. A bigger p-value means the variable does not have a strong relationship with the output. Therefore, variables with smaller than 0.01 will be chosen for the new model.
set.seed(1)
#Created a new model with filterd variables
model_1 <- lm(formula = Crime~M+Ed+Po1+U2+Ineq+Prob, data = uscrime)
summary(model_1)
##
## Call:
## lm(formula = Crime ~ M + Ed + Po1 + U2 + Ineq + Prob, data = uscrime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -470.68 -78.41 -19.68 133.12 556.23
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5040.50 899.84 -5.602 1.72e-06 ***
## M 105.02 33.30 3.154 0.00305 **
## Ed 196.47 44.75 4.390 8.07e-05 ***
## Po1 115.02 13.75 8.363 2.56e-10 ***
## U2 89.37 40.91 2.185 0.03483 *
## Ineq 67.65 13.94 4.855 1.88e-05 ***
## Prob -3801.84 1528.10 -2.488 0.01711 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 200.7 on 40 degrees of freedom
## Multiple R-squared: 0.7659, Adjusted R-squared: 0.7307
## F-statistic: 21.81 on 6 and 40 DF, p-value: 3.418e-11
#Creating a new set of test data
test_df2 <- data.frame(M = 14.0,Ed = 10.0, Po1 = 12.0, U2 = 3.6, Ineq = 20.1,Prob = 0.04)
predict(model_1, test_df2)
## 1
## 1304.245
The second model gives out a more reasonable prediction. One of the methods to check a model’s accuracy is checking the R-squared. A larger R-squared value is more desirable. Even though the adjusted R-squared for the second model is lower than the first model, the predicted value seems more accurate. To validate the model, I will now use cross-validation to train the dataset and to create another model. The new model gave out the same value and has a R-squared value of 0.74.
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
set.seed(1)
model_2<- train(Crime~M+Ed+Po1+U2+Ineq+Prob,
data = uscrime,
method = "lm",
trControl = trainControl(
method="repeatedcv",
number=10,
repeats=5))
summary(model_2)
##
## Call:
## lm(formula = .outcome ~ ., data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -470.68 -78.41 -19.68 133.12 556.23
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5040.50 899.84 -5.602 1.72e-06 ***
## M 105.02 33.30 3.154 0.00305 **
## Ed 196.47 44.75 4.390 8.07e-05 ***
## Po1 115.02 13.75 8.363 2.56e-10 ***
## U2 89.37 40.91 2.185 0.03483 *
## Ineq 67.65 13.94 4.855 1.88e-05 ***
## Prob -3801.84 1528.10 -2.488 0.01711 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 200.7 on 40 degrees of freedom
## Multiple R-squared: 0.7659, Adjusted R-squared: 0.7307
## F-statistic: 21.81 on 6 and 40 DF, p-value: 3.418e-11
predict(model_2, test_df2)
## 1
## 1304.245
model_2
## Linear Regression
##
## 47 samples
## 6 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 42, 42, 43, 41, 42, 44, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 205.4417 0.7385824 164.1083
##
## Tuning parameter 'intercept' was held constant at a value of TRUE