TPASTE<-read.table("/Users/shuaibismail/Documents copy/809/set1/TPASTE.txt", header=FALSE)
attach(TPASTE)
head(TPASTE)
Data Preparation
TPASTE$CRMS = TPASTE$V2
TPASTE$COMS = TPASTE$V3
TPASTE$CRP = TPASTE$V4
TPASTE$COP = TPASTE$V5
TPASTE = TPASTE[6:9]
head(TPASTE)
pairs(TPASTE)
cor(TPASTE)
## CRMS COMS CRP COP
## CRMS 1.0000000 -0.7659797 -0.8241672 -0.6534415
## COMS -0.7659797 1.0000000 0.6308614 0.5020123
## CRP -0.8241672 0.6308614 1.0000000 0.6910539
## COP -0.6534415 0.5020123 0.6910539 1.0000000
CRMS_ts<-ts(data = TPASTE$CRMS)
plot.ts(CRMS_ts, plot.type = "single",main="Time Series PLot of CRMS")
grid()
#Autocorrelation Plot
acf(CRMS_ts, lag.max=50)
It’s evident that the Times Series plot is Non-Stationary or unstable. From the ACL Plot, it can be seen that the observations are serial or auto-correlated. This simply means that the market share of Crest Toothpaste at week 1 of 1958, can tell us something about its market share 50 weeks later.
multilinear_regressor = lm(formula = CRMS~.,
data = TPASTE)
summary(multilinear_regressor)
##
## Call:
## lm(formula = CRMS ~ ., data = TPASTE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.164142 -0.037277 0.000109 0.041295 0.156192
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.68403 0.07084 23.772 < 2e-16 ***
## COMS -0.72982 0.06698 -10.896 < 2e-16 ***
## CRP -0.48467 0.04241 -11.427 < 2e-16 ***
## COP -0.12892 0.04560 -2.827 0.00505 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05904 on 272 degrees of freedom
## Multiple R-squared: 0.7861, Adjusted R-squared: 0.7837
## F-statistic: 333.2 on 3 and 272 DF, p-value: < 2.2e-16
For COMS, the B1 partial regression coefficient is -0.72982, tells us that for a unit change in the market share of CRMS, there’s a 0.72982 decrease in the market share of Colgate Toothpaste.
For CRP, the B1 partial regression coefficient is -0.48467, tells us that for a unit change in the market share of CRMS, there’s a 0.48467 decrease in its unit price.
For COP, the B1 partial regression coefficient is -0.12892, tells us that for a unit change in the market share of CRMS, there’s a 0.12892 decrease in the unit price of Colgate Tootpaste.
TPASTE_residuals<-rstandard(multilinear_regressor)
#Autocorrelation
acf(TPASTE_residuals)
acf(TPASTE_residuals, plot=FALSE)
##
## Autocorrelations of series 'TPASTE_residuals', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12
## 1.000 0.502 0.456 0.412 0.456 0.416 0.378 0.398 0.366 0.375 0.387 0.353 0.330
## 13 14 15 16 17 18 19 20 21 22 23 24
## 0.349 0.278 0.313 0.336 0.361 0.320 0.292 0.313 0.297 0.271 0.248 0.222
Box.test(TPASTE_residuals, lag=20)
##
## Box-Pierce test
##
## data: TPASTE_residuals
## X-squared = 771, df = 20, p-value < 2.2e-16
From the plot, we can see that the residuals are auto-correlated at lag 20, also the Box’s test shows a p-value less than the statistical significance, which means we can reject the null hypothesis (all auto-correlations are zero).
shapiro.test(TPASTE_residuals)
##
## Shapiro-Wilk normality test
##
## data: TPASTE_residuals
## W = 0.99547, p-value = 0.6016
hist(TPASTE_residuals,breaks=20, col="steelblue")
The Shapiro test returned a p-value greater than the statistical significance, which shows that we cannot reject the null hypothesis (residuals are normally distributed). It’s also evident from the plot that the residuals are normally distributed.
#Constant Variance (White's Test)
library(skedastic)
white_lm(multilinear_regressor)
white_lm(multilinear_regressor, interactions = TRUE)
#Plots for checking residuals visually
plot(TPASTE_residuals)
The result of the White test shows a p-value less than 0.05. Hence we can reject the null hypothesis (No heteroskedasticity, no constant variance). The plot also shows that the residuals are scattered around with no obvious patterns.
library(car)
## Loading required package: carData
vif(multilinear_regressor)
## COMS CRP COP
## 1.684458 2.411638 1.941004
Since the variance inflation factor test shows that non of the predictors have a VIF score greater than 10, then there’s no linear relationship between the co-variates or predictors.
predict(multilinear_regressor, data.frame(COMS=.3, CRP=1.7, COP=2.1), level=.99, interval="prediction")
## fit lwr upr
## 1 0.3704291 0.2119287 0.5289296
The forecast is 0.3704291, lower prediction interval is 0.2119287, and Upper interval is 0.5289296.
CRMS_ts<-ts(data = TPASTE$CRMS)
plot(CRMS_ts, col="darkgrey")
lines(predict(multilinear_regressor), col="blue")
plot(CRMS_ts, col="lightgray")
lines(predict(multilinear_regressor), col="red")
predint = predict(multilinear_regressor,interval="prediction")
## Warning in predict.lm(multilinear_regressor, interval = "prediction"): predictions on current data refer to _future_ responses
confint = predict(multilinear_regressor,interval="confidence")
predlower = predint[,2]
predupper = predint[,3]
conflower = confint[,2]
confupper = confint[,3]
plot(CRMS_ts, col="lightgray", ylim=c(0,1))
lines(predict(multilinear_regressor), col="red")
lines(predlower,col="orange")
lines(predupper,col="orange")
lines(conflower,col="blue")
lines(confupper,col="blue")