#Read the raw data; columns arrive unnamed as V1 to V5
TPASTE<-read.table("/Users/shuaibismail/Documents copy/809/set1/TPASTE.txt", header=FALSE)
head(TPASTE)

Data Preparation

#Give the raw columns descriptive names:
#CRMS = Crest market share, COMS = Colgate market share,
#CRP = Crest unit price, COP = Colgate unit price
TPASTE$CRMS = TPASTE$V2
TPASTE$COMS = TPASTE$V3
TPASTE$CRP = TPASTE$V4
TPASTE$COP = TPASTE$V5

#Keep only the four renamed columns (positions 6 to 9)
TPASTE = TPASTE[6:9]
head(TPASTE)
  1. Obtaining Correlation Estimates and Pairwise Scatter Plots
pairs(TPASTE)

cor(TPASTE)
##            CRMS       COMS        CRP        COP
## CRMS  1.0000000 -0.7659797 -0.8241672 -0.6534415
## COMS -0.7659797  1.0000000  0.6308614  0.5020123
## CRP  -0.8241672  0.6308614  1.0000000  0.6910539
## COP  -0.6534415  0.5020123  0.6910539  1.0000000
  2. Obtaining the Time Series and Autocorrelation Plot of CRMS
#Convert CRMS to a time series object
CRMS_ts<-ts(data = TPASTE$CRMS)

plot.ts(CRMS_ts, plot.type = "single", main="Time Series Plot of CRMS")
grid()

#Autocorrelation plot out to 50 lags
acf(CRMS_ts, lag.max=50)

The time series plot shows that the series is non-stationary (unstable): its level drifts over time rather than fluctuating around a fixed mean. The ACF plot shows that the observations are serially correlated, with the autocorrelations decaying only slowly. In practical terms, Crest's market share in week 1 of 1958 still carries information about its market share 50 weeks later.
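
A formal test can complement the visual impression of non-stationarity. A minimal sketch using the Augmented Dickey-Fuller test; the tseries package and this particular test are assumptions, not part of the original analysis:

#Augmented Dickey-Fuller test: H0 = the series has a unit root (is non-stationary)
library(tseries)
adf.test(CRMS_ts)

Failing to reject H0 here would agree with the unstable level seen in the time series plot.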

  3. Regression Model
multilinear_regressor = lm(formula = CRMS~.,
                           data = TPASTE)

summary(multilinear_regressor)
## 
## Call:
## lm(formula = CRMS ~ ., data = TPASTE)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.164142 -0.037277  0.000109  0.041295  0.156192 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.68403    0.07084  23.772  < 2e-16 ***
## COMS        -0.72982    0.06698 -10.896  < 2e-16 ***
## CRP         -0.48467    0.04241 -11.427  < 2e-16 ***
## COP         -0.12892    0.04560  -2.827  0.00505 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05904 on 272 degrees of freedom
## Multiple R-squared:  0.7861, Adjusted R-squared:  0.7837 
## F-statistic: 333.2 on 3 and 272 DF,  p-value: < 2.2e-16
  4. From the regression analysis, all predictor variables are significantly related to CRMS: each p-value is below the significance level, so we conclude there is a linear dependence between these variables and CRMS. The model is therefore useful for prediction.

For COMS, the partial regression coefficient of -0.72982 tells us that, holding CRP and COP fixed, a one-unit increase in Colgate's market share is associated with a 0.72982 decrease in CRMS.

For CRP, the partial regression coefficient of -0.48467 tells us that, holding the other predictors fixed, a one-unit increase in Crest's unit price is associated with a 0.48467 decrease in CRMS.

For COP, the partial regression coefficient of -0.12892 tells us that, holding the other predictors fixed, a one-unit increase in Colgate's unit price is associated with a 0.12892 decrease in CRMS.
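
To make the interpretation concrete, we can compare fitted values for two hypothetical predictor settings that differ by exactly one unit of COMS, with CRP and COP held fixed (the values below are illustrative, not observations from the data):

#Two hypothetical rows differing only by one unit of COMS
new_rows = data.frame(COMS = c(0.3, 1.3), CRP = c(1.7, 1.7), COP = c(2.1, 2.1))
diff(predict(multilinear_regressor, new_rows))
#The difference equals the COMS coefficient, -0.72982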

  5. Residual Analysis
#Standardized residuals from the regression
TPASTE_residuals<-rstandard(multilinear_regressor)

#Autocorrelation of the residuals
acf(TPASTE_residuals)

acf(TPASTE_residuals, plot=FALSE)
## 
## Autocorrelations of series 'TPASTE_residuals', by lag
## 
##     0     1     2     3     4     5     6     7     8     9    10    11    12 
## 1.000 0.502 0.456 0.412 0.456 0.416 0.378 0.398 0.366 0.375 0.387 0.353 0.330 
##    13    14    15    16    17    18    19    20    21    22    23    24 
## 0.349 0.278 0.313 0.336 0.361 0.320 0.292 0.313 0.297 0.271 0.248 0.222
Box.test(TPASTE_residuals, lag=20)
## 
##  Box-Pierce test
## 
## data:  TPASTE_residuals
## X-squared = 771, df = 20, p-value < 2.2e-16

From the ACF plot, the residuals are clearly autocorrelated well beyond the confidence bounds, and the Box-Pierce test over the first 20 lags gives a p-value below the significance level, so we reject the null hypothesis that all residual autocorrelations are zero. The regression errors are therefore not independent.
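
As an additional check (not part of the original analysis), the Durbin-Watson test from the lmtest package targets first-order autocorrelation in the residuals:

#Durbin-Watson test: H0 = no first-order autocorrelation in the residuals
library(lmtest)
dwtest(multilinear_regressor)

A statistic well below 2 indicates positive autocorrelation, consistent with the ACF and Box-Pierce results.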

  6. Residual Test for Normality
shapiro.test(TPASTE_residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  TPASTE_residuals
## W = 0.99547, p-value = 0.6016
hist(TPASTE_residuals,breaks=20, col="steelblue")

The Shapiro-Wilk test returned a p-value (0.6016) greater than the significance level, so we cannot reject the null hypothesis that the residuals are normally distributed. The histogram supports this: the residuals appear roughly symmetric and bell-shaped.
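
A normal Q-Q plot is a standard visual companion to the Shapiro-Wilk test; points lying close to the reference line support normality:

#Q-Q plot of the standardized residuals against the normal distribution
qqnorm(TPASTE_residuals)
qqline(TPASTE_residuals, col="red")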

  7. Residual Test for Constant Variance
#Constant Variance (White's Test)
library(skedastic)
white_lm(multilinear_regressor)
#Repeat with interaction terms included in the auxiliary regression
white_lm(multilinear_regressor, interactions = TRUE)
#Plots for checking residuals visually 
plot(TPASTE_residuals)

The White test gives a p-value below 0.05, so we reject the null hypothesis of homoskedasticity (constant error variance): there is evidence of heteroskedasticity in the residuals. The residual plot itself, however, shows the points scattered without an obvious pattern.
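
As a cross-check (an addition, not in the original analysis), the Breusch-Pagan test from the lmtest package tests the same null of constant error variance:

#Breusch-Pagan test: H0 = homoskedasticity (constant error variance)
library(lmtest)
bptest(multilinear_regressor)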

  8. Test for Multi-collinearity
library(car)
## Loading required package: carData
vif(multilinear_regressor)
##     COMS      CRP      COP 
## 1.684458 2.411638 1.941004

Since the variance inflation factors show that none of the predictors has a VIF above the common threshold of 10, there is no evidence of serious multicollinearity among the covariates.
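
Each VIF can also be reproduced from its definition, 1/(1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on the remaining predictors. For example, for CRP:

#Regress CRP on the other two predictors and apply the VIF formula
r2_crp = summary(lm(CRP ~ COMS + COP, data = TPASTE))$r.squared
1/(1 - r2_crp)
#Matches the vif() output for CRP, approximately 2.411638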

  9. Prediction
predict(multilinear_regressor, data.frame(COMS=.3, CRP=1.7, COP=2.1), level=.99, interval="prediction")
##         fit       lwr       upr
## 1 0.3704291 0.2119287 0.5289296

The point forecast is 0.3704291, with a 99% prediction interval from 0.2119287 to 0.5289296.
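
For comparison, the 99% confidence interval for the mean response at the same predictor values is narrower, since it does not include the variance of an individual new observation:

predict(multilinear_regressor, data.frame(COMS=.3, CRP=1.7, COP=2.1), level=.99, interval="confidence")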

#Plot the observed series with the regression's fitted values overlaid
plot(CRMS_ts, col="lightgray")
lines(predict(multilinear_regressor), col="red")

#Prediction and confidence intervals over the training data
predint = predict(multilinear_regressor, interval="prediction")
## Warning in predict.lm(multilinear_regressor, interval = "prediction"): predictions on current data refer to _future_ responses
#Named confint_fit to avoid masking base R's confint()
confint_fit = predict(multilinear_regressor, interval="confidence")
predlower = predint[,2]
predupper = predint[,3]
conflower = confint_fit[,2]
confupper = confint_fit[,3]
#Observed series in gray, fitted values in red, prediction band in orange, confidence band in blue
plot(CRMS_ts, col="lightgray", ylim=c(0,1))
lines(predict(multilinear_regressor), col="red")
lines(predlower,col="orange")
lines(predupper,col="orange")
lines(conflower,col="blue")
lines(confupper,col="blue")
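
A legend (an optional addition) makes the final plot easier to read; the bands here use predict()'s default 95% level:

legend("topright", legend=c("Observed CRMS", "Fitted", "95% prediction", "95% confidence"),
       col=c("lightgray", "red", "orange", "blue"), lty=1, cex=0.8)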