The data corresponding to this assignment is given in the text file named TPASE.txt. The data consists of weekly time-series (1958 through 1963, 276 weeks) with the following order of variables: Time index (1,… 276), Market share of CREST toothpaste (weekly), Market share of COLGATE toothpaste (weekly), Unit Price of CREST toothpaste (weekly), Unit Price of COLGATE toothpaste (weekly).
We are interested in modeling the variable “Market share of CREST” as the dependent variable. Assuming the following abbreviations: CRMS= Market share of CREST toothpaste (weekly), COMS= Market share of COLGATE toothpaste (weekly), CRP= Unit Price of CREST toothpaste (weekly), COP=Unit Price of COLGATE toothpaste (weekly). For statistical inference purposes you can use an 𝜶 (significance) level of 0.05. For each case, please clearly state your hypotheses, rejection criteria, and conclusion when needed.
data<-read.table("/Users/nishanaugain/Desktop/UNH/SEM2/TERM3/DS809/A1/TPASTE.txt")
attach(data)
head(data)
## V1 V2 V3 V4 V5
## 1 1 0.108 0.424 2.07 2.05
## 2 2 0.166 0.482 2.17 2.10
## 3 3 0.126 0.428 2.17 2.04
## 4 4 0.115 0.397 2.07 1.96
## 5 5 0.119 0.352 2.12 2.06
## 6 6 0.176 0.342 2.05 2.01
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
data <- rename(data,'TimeIndex' = 'V1','CRMS' = 'V2','COMS' = 'V3','CRP' = 'V4','COP' = 'V5')
head(data)
## TimeIndex CRMS COMS CRP COP
## 1 1 0.108 0.424 2.07 2.05
## 2 2 0.166 0.482 2.17 2.10
## 3 3 0.126 0.428 2.17 2.04
## 4 4 0.115 0.397 2.07 1.96
## 5 5 0.119 0.352 2.12 2.06
## 6 6 0.176 0.342 2.05 2.01
solution
x <- cbind(CRMS=data$CRMS, COMS=data$COMS, CRP=data$CRP,COP=data$COP)
cor(x)
## CRMS COMS CRP COP
## CRMS 1.0000000 -0.7659797 -0.8241672 -0.6534415
## COMS -0.7659797 1.0000000 0.6308614 0.5020123
## CRP -0.8241672 0.6308614 1.0000000 0.6910539
## COP -0.6534415 0.5020123 0.6910539 1.0000000
cor.test(data$CRMS,data$COMS)
##
## Pearson's product-moment correlation
##
## data: data$CRMS and data$COMS
## t = -19.723, df = 274, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8107278 -0.7123330
## sample estimates:
## cor
## -0.7659797
cor.test(data$CRMS,data$CRP)
##
## Pearson's product-moment correlation
##
## data: data$CRMS and data$CRP
## t = -24.088, df = 274, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8586793 -0.7822143
## sample estimates:
## cor
## -0.8241672
cor.test(data$CRMS,data$COP)
##
## Pearson's product-moment correlation
##
## data: data$CRMS and data$COP
## t = -14.289, df = 274, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.7162509 -0.5801301
## sample estimates:
## cor
## -0.6534415
cor.test(data$COMS,data$CRP)
##
## Pearson's product-moment correlation
##
## data: data$COMS and data$CRP
## t = 13.459, df = 274, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5540617 0.6970133
## sample estimates:
## cor
## 0.6308614
cor.test(data$COMS,data$COP)
##
## Pearson's product-moment correlation
##
## data: data$COMS and data$COP
## t = 9.6082, df = 274, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4081341 0.5853844
## sample estimates:
## cor
## 0.5020123
cor.test(data$CRP,data$COP)
##
## Pearson's product-moment correlation
##
## data: data$CRP and data$COP
## t = 15.826, df = 274, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.6238892 0.7480852
## sample estimates:
## cor
## 0.6910539
The p-values for all the pairs are very small (p-value < 2.2e-16 i.e < 0.05), hence we reject the null hypothesis.This implies that the relationships are significant at 𝛼 level of 5%.
pairs(~ CRMS + COMS + CRP + COP,data=data,lower.panel = NULL)
#pairs(~ CRMS + COMS + CRP + COP,data=data,lower.panel = NULL,
# col = c("red", "cornflowerblue", "purple","green"), # Change color by group
# pch = c(8, 18, 1,11), # Change points by group
# main = "Scatter Plots")
****We do suspect multicolinearity because CRMS appears to be negatively correlated with the other three variables the same way****
solution
plot.ts(data$CRMS, col="gray")
grid()
acf(data$CRMS,lag=30)
Box.test(data$CRMS, lag=30)
##
## Box-Pierce test
##
## data: data$CRMS
## X-squared = 4656.7, df = 30, p-value < 2.2e-16
The p-value is very small( p-value < 2.2e-16), hence we reject the null hypothesis.This proves that CRMS is autocorrelated Also the ACF seems to decay slowly, Therefore CRMS may be non-stationary. If CRMS were stationary, all lags other than t=0 would be within the dashed blue lines. Since none of them are within that zone, we can conclude the data series is not stationary
solution
mlrfit<-lm(CRMS~COMS+CRP+COP,data)
summary(mlrfit)
##
## Call:
## lm(formula = CRMS ~ COMS + CRP + COP, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.164142 -0.037277 0.000109 0.041295 0.156192
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.68403 0.07084 23.772 < 2e-16 ***
## COMS -0.72982 0.06698 -10.896 < 2e-16 ***
## CRP -0.48467 0.04241 -11.427 < 2e-16 ***
## COP -0.12892 0.04560 -2.827 0.00505 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05904 on 272 degrees of freedom
## Multiple R-squared: 0.7861, Adjusted R-squared: 0.7837
## F-statistic: 333.2 on 3 and 272 DF, p-value: < 2.2e-16
Regression Equation:
\[CRMS_t = 1.684 - 0.72*COMS_t - 0.48*CRP_t - 0.128*COP_t + \varepsilon_t\]
solution
Significance of coefficients
Since p-value is very small (p-value < 2e-16,less than 𝛼) for COMS, we reject null hypothesis.There is evidence that COMS is a good predictor of IRC given the effects of CRP and COP
Since p-value is very small (p-value < 2e-16,less than 𝛼) for CRP, we reject null hypothesis.There is evidence that CRP is a good predictor of IRC given the effects of COMS and COP
Since p-value is small (p-value < 0.00505,less than 𝛼) for COP, we reject null hypothesis.There is evidence that COP is a good predictor of IRC given the effects of COMS and CRP
Coefficient Interpretations:
-> 1.684 (intercept): expected value of CRMS when all other independent variables (COMS,CRP, and COP) are zero.
-> -0.72 (coefficient for COMS): expected decrease in CRMS for every unit increase in COMS given the same levels of (controlling for) CRP and COP.
-> -0.48 (coefficient for CRP): expected decrease in CRMS for every unit increase in CRP given the same levels of (controlling for) COMS and COP
-> -0.128 (coefficient for COP): expected decrease in CRMS for every unit increase in COP given the same levels of (controlling for) COMS and CRP
sres<-rstandard(mlrfit)
acf(sres, lag=30)
acf(sres, plot=FALSE)
##
## Autocorrelations of series 'sres', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12
## 1.000 0.502 0.456 0.412 0.456 0.416 0.378 0.398 0.366 0.375 0.387 0.353 0.330
## 13 14 15 16 17 18 19 20 21 22 23 24
## 0.349 0.278 0.313 0.336 0.361 0.320 0.292 0.313 0.297 0.271 0.248 0.222
Box.test(sres, lag=20)
##
## Box-Pierce test
##
## data: sres
## X-squared = 771, df = 20, p-value < 2.2e-16
#Plots for checking residuals visually
#plot(mlrfit)
The p-value is very small( p-value < 2.2e-16), hence we reject the null hypothesis.This implies that residuals are autocorrelated
shapiro.test(sres)
##
## Shapiro-Wilk normality test
##
## data: sres
## W = 0.99547, p-value = 0.6016
hist(sres,breaks=15, col="steelblue")
Visually, the histogram looks fairly symmetric. Also the p-value (0.6016) is not less than 𝛼 (0.05), we fail to reject the null.This implies that the residuals are normally distributed
#install.packages("skedastic")
library(skedastic)
white_lm(mlrfit, interactions = TRUE)
## # A tibble: 1 x 5
## statistic p.value parameter method alternative
## <dbl> <dbl> <dbl> <chr> <chr>
## 1 20.9 0.0132 9 White's Test greater
The formal test indicates that the residuals exhibit non-constant variance. Since the p-value (≈ 0) is less than 𝛼 (0.05), we reject the null hypothesis. The residuals are heteroscedastic.
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
vif(mlrfit)
## COMS CRP COP
## 1.684458 2.411638 1.941004
All VIF estimates are below 10. Therefore, there are no multi-collinearity issues in our model.
predict(mlrfit, data.frame(COMS=0.30,CRP=1.7,COP=2.1), interval="prediction",level=.99)
## fit lwr upr
## 1 0.3704291 0.2119287 0.5289296
#Plotting prediction intervals
predint = predict(mlrfit,interval="prediction")
## Warning in predict.lm(mlrfit, interval = "prediction"): predictions on current data refer to _future_ responses
confint = predict(mlrfit,interval="confidence")
predlower = predint[,2]
predupper = predint[,3]
conflower = confint[,2]
confupper = confint[,3]
plot(data$CRMS, col="lightgray")
lines(predict(mlrfit), col="red")
lines(predlower,col="orange")
lines(predupper,col="orange")
#lines(conflower,col="blue")
#lines(confupper,col="blue")