data()
library(datasets)
#From the second image, you will see a dataset called EuStockMarkets; view it and extract its four columns by running the code:
View(EuStockMarkets)
DAX = EuStockMarkets[ , 1]
SMI = EuStockMarkets[ , 2]
CAC = EuStockMarkets[ , 3]
FTSE = EuStockMarkets[ , 4]
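#The positional indexing above depends on the column order of EuStockMarkets. As a minimal sketch, the same series can also be extracted by column name, which does not depend on that order. The variable names ending in _by_name are illustrative only and are not used elsewhere in this script:

```r
library(datasets)

# Column names of the built-in multivariate time series
colnames(EuStockMarkets)

# Extract each index by name rather than by position
# (the *_by_name variables are hypothetical, for illustration only)
DAX_by_name  = EuStockMarkets[ , "DAX"]
SMI_by_name  = EuStockMarkets[ , "SMI"]
CAC_by_name  = EuStockMarkets[ , "CAC"]
FTSE_by_name = EuStockMarkets[ , "FTSE"]

# Number of observations in one series
length(DAX_by_name)
```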
#Here, we intend to regress DAX on SMI; look at the commands:
reg = lm(DAX ~ SMI, data=EuStockMarkets)
summary(reg)
anova(reg)
reg_2 = lm(DAX ~ SMI + CAC + FTSE, data=EuStockMarkets)
summary(reg_2)
##Now, in order to check the validity of our parameter estimates, we need to perform an ANOVA test on this multiple linear regression model too; see the command below:
anova(reg_2)
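#The printed summary and ANOVA table can also be read programmatically. The following is a minimal sketch, assuming the model is refitted on an explicit data frame; the names stocks, fit, fit_summary, coef_table, r_squared and anova_table are illustrative and not part of the original script:

```r
library(datasets)

# Refit the multiple regression on an explicit data frame
# (stocks and fit are hypothetical names used only in this sketch)
stocks = as.data.frame(EuStockMarkets)
fit = lm(DAX ~ SMI + CAC + FTSE, data=stocks)
fit_summary = summary(fit)

# Coefficient table: estimate, standard error, t value and p-value
coef_table = coef(fit_summary)
print(coef_table)

# Proportion of the variance in DAX explained by the model
r_squared = fit_summary$r.squared
print(r_squared)

# Sequential sums of squares, one row per predictor plus residuals
anova_table = anova(fit)
print(anova_table)
```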
#There are several ways to check the normality assumption, but we will only deal with 2 plots and 2 tests:
#First of all, we will find the residuals of the model using the command:
res=resid(reg_2)
print(res)
View(res)
#After that, we can plot a normal Quantile-Quantile (Q-Q) Plot as follows:
qqnorm(res, col=3, lwd=1, pch=19, col.main="blue", col.lab="purple")
#After the qqplot command, we then add a line as follows:
qqline(res,col=2,lwd=2)
#After the line is added, then we show the legend as follows:
legend(x="topleft", legend=c("Line of Scatter Points","Line of Best Fit"), col=c(3:2), lwd=2, bg="brown")
#Therefore, we have the following plot to detect whether or not the data follows a normal distribution:
dev.new() #opens a new graphics window for the next plot; windows() is available only on Windows
#After that, let's use a Histogram of the Standardized Residuals to check whether or not the data follows a normal distribution:
std_res=rstandard(reg_2)
hist(std_res, prob=TRUE, col=c(1:8), main="Histogram of the Standardized Residuals", col.main=6, col.lab="purple", xlab="Assigned Range of Values of the Residuals", sub="Figure XI")
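#Next, we use the Shapiro-Wilk test. Set the hypotheses as follows:
#Ho: The residuals are normally distributed Vs H1: The residuals are NOT normally distributed
#The command is:
res=resid(reg_2)
shapiro.test(res)
#If the p-value is below the chosen significance level (commonly 0.05), we reject the null hypothesis (Ho) and conclude that the data set is NOT normal; otherwise, we fail to reject Ho and treat the data set as normal.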
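#The decision rule for a test like this can be sketched in code. The helper decide() below is hypothetical (not part of any package), the 0.05 level is an assumption, and the two samples are simulated for illustration rather than taken from the regression:

```r
# Hypothetical helper: turns a p-value into a decision at level alpha
decide = function(p_value, alpha=0.05) {
  if (p_value < alpha) "reject Ho" else "do not reject Ho"
}

# Simulated illustration data (NOT the regression residuals)
set.seed(1)
normal_sample = rnorm(200)   # drawn from a normal distribution
skewed_sample = rexp(200)    # drawn from a strongly skewed distribution

# Shapiro-Wilk: Ho is that the sample comes from a normal distribution
decide(shapiro.test(normal_sample)$p.value)
decide(shapiro.test(skewed_sample)$p.value)
```

#The same helper can be applied to the p-value of any of the tests in this tutorial, for example decide(shapiro.test(res)$p.value).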
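#The second test is the Kolmogorov-Smirnov test. Set the hypotheses as follows:
#Ho: The residuals are normally distributed Vs H1: The residuals are NOT normally distributed
#The command is:
res=resid(reg_2)
ks.test(res, pnorm, mean(res), sd(res))
#If the p-value is below the chosen significance level (commonly 0.05), we reject the null hypothesis (Ho) and conclude that the data set is NOT normal; otherwise, we fail to reject Ho and treat the data set as normal.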
#Homoscedasticity (constant variance) is another crucial assumption that should be tested. First, check the length of each variable:
length(DAX)
length(SMI)
length(CAC)
length(FTSE)
##Since all the four variables are of the same length, it means we are good to go:
y1=gl(4,1860)
y2=c(DAX, SMI, CAC, FTSE)
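#Set the hypotheses for Bartlett's test as follows:
#Ho: The data set is homoscedastic (the variances are equal) Vs H1: The data set is NOT homoscedastic
bartlett.test(y2,y1)
bartlett.test(y2~y1)
#NB: Both commands produce the same results.
#If the p-value is below the chosen significance level (commonly 0.05), we reject the null hypothesis (Ho) and conclude that the data set is NOT homoscedastic; otherwise, we fail to reject Ho and treat the data set as homoscedastic.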
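#To make concrete what Bartlett's test compares, the sketch below prints the sample variance of each index and checks that the two calling styles agree. The names values, groups, test_default and test_formula are illustrative, and the group labels are added only for readability:

```r
library(datasets)

# Stack the four index series into one numeric vector
values = as.numeric(c(EuStockMarkets[ , "DAX"], EuStockMarkets[ , "SMI"],
                      EuStockMarkets[ , "CAC"], EuStockMarkets[ , "FTSE"]))

# Label each value with the index it came from (1860 values per index)
groups = gl(4, 1860, labels=c("DAX", "SMI", "CAC", "FTSE"))

# Sample variance of each index; Bartlett's test compares these
tapply(values, groups, var)

# The default and the formula interfaces give the same test statistic
test_default = bartlett.test(values, groups)
test_formula = bartlett.test(values ~ groups)
all.equal(unname(test_default$statistic), unname(test_formula$statistic))
```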
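##For the same homoscedasticity assumption, we can also use another statistical test, the Breusch-Pagan test (Ho: The residuals are homoscedastic Vs H1: The residuals are NOT homoscedastic):
#First of all, let's load the library as follows:
library(lmtest)
bptest(reg_2, studentize=FALSE)
#If the p-value is below the chosen significance level (commonly 0.05), we reject the null hypothesis (Ho) and conclude that the data set is NOT homoscedastic; otherwise, we fail to reject Ho and treat the data set as homoscedastic.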
##For the same homoscedasticity assumption, we can also use a statistical PLOT such as:
plot(res, col=c(3:9), main="Residual Plot for Heteroscedasticity Assumption", col.main="blue", sub="Figure III", col.sub="purple", pch=19, ylab="Residual Values", col.lab=2)
##NB: Since it gives a structureless shape, it is an indication that the homoscedasticity assumption is NOT violated.
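#A common companion to the plot above is to draw the residuals against the fitted values instead of the observation index. This is a minimal sketch of that variant, not the plot used in this tutorial; the model is refitted under the illustrative name fit so the block runs on its own:

```r
library(datasets)

# Refit the model (fit is a hypothetical name used only in this sketch)
fit = lm(DAX ~ SMI + CAC + FTSE, data=as.data.frame(EuStockMarkets))

# Residuals against fitted values; a funnel or curved pattern
# would suggest that the constant-variance assumption is in doubt
plot(fitted(fit), resid(fit), pch=19, col=4,
     xlab="Fitted Values", ylab="Residual Values",
     main="Residuals vs Fitted Values")

# Horizontal reference line at zero
abline(h=0, col=2, lwd=2)
```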
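#Finally, we test for autocorrelation in the residuals using the Durbin-Watson test (Ho: There is no autocorrelation Vs H1: There is autocorrelation):
library(car)
durbinWatsonTest(reg_2, max.lag=2)
#We reject the null hypothesis (Ho) and conclude that the test is significant, which leads us to believe that there is an autocorrelation problem in the data set.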
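#The lag-1 Durbin-Watson statistic can also be computed by hand in base R, which shows what the test measures: the sum of squared successive differences of the residuals divided by the sum of squared residuals. Values near 2 suggest no lag-1 autocorrelation, and values near 0 suggest strong positive autocorrelation. The names fit, e, dw and r1 below are illustrative:

```r
library(datasets)

# Refit the model (fit is a hypothetical name used only in this sketch)
fit = lm(DAX ~ SMI + CAC + FTSE, data=as.data.frame(EuStockMarkets))
e = resid(fit)

# Durbin-Watson statistic: sum of squared successive differences
# divided by the sum of squared residuals (always between 0 and 4)
dw = sum(diff(e)^2) / sum(e^2)
print(dw)

# Lag-1 autocorrelation of the residuals; dw is approximately 2*(1 - r1)
r1 = sum(e[-1] * e[-length(e)]) / sum(e^2)
print(2 * (1 - r1))
```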