TUAN NURSHAFIEKA WAHIDA BINTI TUAN NADIN SD22030 02G LAB REPORT 1
# Set the working directory
setwd("C:\\Users\\shafi\\Downloads")
# Read the data from the text file
gmp_data <- read.table("gmp (1).txt", header = TRUE)
# View the first few rows of the data
head(gmp_data)
# Develop the multiple linear regression model
model <- lm(y ~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11, data = gmp_data)
# Summary of the model
summary(model)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 +
## x10 + x11, data = gmp_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.3441 -1.6711 -0.4486 1.4906 5.2508
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.339838 30.355375 0.571 0.5749
## x1 -0.075588 0.056347 -1.341 0.1964
## x2 -0.069163 0.087791 -0.788 0.4411
## x3 0.115117 0.088113 1.306 0.2078
## x4 1.494737 3.101464 0.482 0.6357
## x5 5.843495 3.148438 1.856 0.0799 .
## x6 0.317583 1.288967 0.246 0.8082
## x7 -3.205390 3.109185 -1.031 0.3162
## x8 0.180811 0.130301 1.388 0.1822
## x9 -0.397945 0.323456 -1.230 0.2344
## x10 -0.005115 0.005896 -0.868 0.3971
## x11 0.638483 3.021680 0.211 0.8350
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.227 on 18 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.8355, Adjusted R-squared: 0.7349
## F-statistic: 8.31 on 11 and 18 DF, p-value: 5.231e-05
#PART A (i)
# Normal probability plot of residuals
qqnorm(residuals(model))
qqline(residuals(model))
Part A (i): Construct a normal probability plot of the residuals. Does
there seem to be any problem with the normality assumption? The points
fall close to the straight reference line, so the residuals appear
approximately normally distributed and there is no evident problem with
the normality assumption.
#(ii)
# Shapiro-Wilk normality test
shapiro_test <- shapiro.test(residuals(model))
shapiro_test
##
## Shapiro-Wilk normality test
##
## data: residuals(model)
## W = 0.964, p-value = 0.3904
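The Shapiro-Wilk output above can be read as a decision rule. The sketch below is only an illustration: `interpret_shapiro` is a hypothetical helper that does not appear in the original report, and the 0.05 significance level is an assumed convention, not something the report states.

```r
# Hypothetical helper (not part of the original report): turns a
# Shapiro-Wilk p-value into a decision at significance level alpha.
# alpha = 0.05 is an assumed, conventional choice.
interpret_shapiro <- function(p_value, alpha = 0.05) {
  if (p_value > alpha) {
    "fail to reject H0: no evidence against normality of the residuals"
  } else {
    "reject H0: the residuals depart from normality"
  }
}

# p-value taken from the Shapiro-Wilk output above
decision <- interpret_shapiro(0.3904)
print(decision)
```

Under that assumed level, the reported p-value of 0.3904 gives no evidence against normality, which agrees with the reading of the normal probability plot.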
#(iii)
# Plot of residuals vs predicted response
plot(predict(model), residuals(model))
abline(h = 0, col = "red")
#PART B (i)
# Plot of influential observations by Cook's Distance
cook_distance <- cooks.distance(model)
plot(cook_distance, pch = "*", cex = 2, main = "Influential Observations by Cook's Distance")
abline(h = 4 * mean(cook_distance, na.rm = TRUE), col = "red")
text(x = 1:length(cook_distance) + 1, y = cook_distance,
     labels = ifelse(cook_distance > 4 * mean(cook_distance, na.rm = TRUE), names(cook_distance), ""),
     col = "red")
#(ii)
# Examination of outliers
outliers <- as.numeric(names(cook_distance)[(cook_distance>4*mean(cook_distance,na.rm=TRUE))])
head(gmp_data[outliers,])
#PART C (i)
# Lack of fit: fit a reduced model without x4, x5, x6, x7 and x11
lack_of_fit_model <- lm(y ~ x1 + x2 + x3 + x8 + x9 + x10, data = gmp_data)
# Summary of the lack of fit model
summary(lack_of_fit_model)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x8 + x9 + x10, data = gmp_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7829 -1.6308 -0.2023 1.7894 6.2575
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 31.891090 19.188065 1.662 0.110
## x1 -0.051858 0.044919 -1.154 0.260
## x2 0.001803 0.056564 0.032 0.975
## x3 0.031761 0.070140 0.453 0.655
## x8 0.129341 0.116709 1.108 0.279
## x9 -0.206554 0.275463 -0.750 0.461
## x10 -0.003947 0.004986 -0.792 0.437
##
## Residual standard error: 3.206 on 23 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.7924, Adjusted R-squared: 0.7383
## F-statistic: 14.64 on 6 and 23 DF, p-value: 7.75e-07
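One possible way to compare the reduced model with the full model is a partial F-test for nested models. This is a swapped-in technique for illustration, not a classical lack-of-fit test (which needs replicated observations), and not necessarily the method the lab intended. The sketch rebuilds the residual sums of squares from the residual standard errors and degrees of freedom printed in the two `summary()` outputs above, so its result is only as precise as those rounded figures.

```r
# Values copied from the two summary() outputs in this report
rse_full <- 3.227      # residual standard error, full model
df_full <- 18          # residual degrees of freedom, full model
rse_reduced <- 3.206   # residual standard error, reduced model
df_reduced <- 23       # residual degrees of freedom, reduced model

# Residual sum of squares = (residual standard error)^2 * residual df
sse_full <- rse_full^2 * df_full
sse_reduced <- rse_reduced^2 * df_reduced

# Partial F statistic for the predictors dropped from the full model
df_diff <- df_reduced - df_full
f_stat <- ((sse_reduced - sse_full) / df_diff) / (sse_full / df_full)
p_value <- pf(f_stat, df_diff, df_full, lower.tail = FALSE)

cat("F =", round(f_stat, 3), "on", df_diff, "and", df_full,
    "DF; p-value =", round(p_value, 3), "\n")
```

The F statistic comes out below 1, so on these figures the dropped predictors (x4, x5, x6, x7, x11) do not significantly improve the fit. With both fitted objects still in the session, `anova(lack_of_fit_model, model)` should perform the same comparison directly, assuming both models were fitted to the same 30 complete rows.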