Sharma Shivankit M10625565
| Dataset | Covariate (x) | Response (y) | \(\beta_0\) (intercept) | \(\beta_1\) (slope) | \(R^2\) | t value | F value | p > 0.05 |
|---|---|---|---|---|---|---|---|---|
| Tombstone | Mean SO2 levels | Marble recession rates | 0.322996 | 0.008593 | 0.8116 | 9.046 | 81.835 | No |
| Bus | Car miles per year | Expenses per car mile | 18.78 | -0.0000445 | 0.1582 | -2.03383 | 4.1365 | Yes |
| Bus | Percent of double deckers | Expenses per car mile | 16.5477 | 0.03289 | 0.3246 | 3.2517 | 10.574 | No |
| Bus | Percent of fleet on oil fuel | Expenses per car mile | 17.8763 | 0.003375 | 0.001036 | 0.1510 | 0.0228 | Yes |
| Bus | Receipts per car mile | Expenses per car mile | 8.6584 | 0.479866 | 0.6191 | 5.9807 | 35.769 | No |
tombs <- read.csv("C:\\Users\\sharm_000\\OneDrive\\University\\BANA 7038\\Homework 2\\tombstone.csv",header = T,sep = ",")
names(tombs)
## [1] "City"
## [2] "Modelled.100.Year.Mean.SO2.Concentration..ug.m..3."
## [3] "Marble.Tombstone.Mean.Surface.Recession.Rate..mm.100years."
We rename the columns to shorter names (mso2c for the mean SO2 concentration, mtmsrr for the mean surface recession rate):
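The renaming code itself is not shown in this report. A minimal sketch of one way to do it, assuming the short names that appear in the str(tombs) output; the data frame tombs_demo and its city labels are hypothetical placeholders so the snippet runs by itself:

```r
# Stand-in with the original column names; in the assignment, tombs comes from read.csv().
# The city labels are placeholders, not values from the data set.
tombs_demo <- data.frame(
  City = c("City A", "City B"),
  Modelled.100.Year.Mean.SO2.Concentration..ug.m..3. = c(12L, 20L),
  Marble.Tombstone.Mean.Surface.Recession.Rate..mm.100years. = c(0.27, 0.14)
)

# Assign short names by position: covariate -> mso2c, response -> mtmsrr
names(tombs_demo) <- c("City", "mso2c", "mtmsrr")
names(tombs_demo)
```

Assigning by position relies on the column order reported by names(tombs), so it is worth re-checking that order before renaming.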
str(tombs)
## 'data.frame': 21 obs. of 3 variables:
## $ City : Factor w/ 21 levels "Albany,NY","Baltimore,MD",..: 21 8 16 19 10 11 9 1 20 13 ...
## $ mso2c : int 12 20 20 46 48 92 91 94 102 117 ...
## $ mtmsrr: num 0.27 0.14 0.33 0.81 0.84 1.08 1.78 1.21 1.09 1.72 ...
#Explore distribution of response/dependent variable mtmsrr
summary(tombs$mtmsrr)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.140 1.010 1.530 1.496 1.980 3.160
Mean (1.496) < Median (1.530). Hence the distribution of mtmsrr is slightly left skewed.
library(psych)
pairs.panels(tombs[c("mtmsrr","mso2c")])
The stretched ellipse shows a strong correlation. The red line (loess smooth) rises steeply, which shows that mtmsrr increases with mso2c.
mtombs <- lm (mtmsrr ~ mso2c, data = tombs)
#Beta coefficient estimates
mtombs$coefficients
## (Intercept) mso2c
## 0.322995899 0.008593333
summary(mtombs)$r.squared
## [1] 0.8115724
summary(mtombs)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.322995899 0.1521958377 2.122239 4.718525e-02
## mso2c 0.008593333 0.0009499341 9.046242 2.578534e-08
#ANOVA test
anova(mtombs)
## Analysis of Variance Table
##
## Response: mtmsrr
## Df Sum Sq Mean Sq F value Pr(>F)
## mso2c 1 10.9031 10.9031 81.835 2.579e-08 ***
## Residuals 19 2.5314 0.1332
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
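As a consistency check, the F value and R2 can be recomputed from the sums of squares, and in a simple regression with one covariate the F value equals the square of the slope's t value. A small sketch using numbers copied from the output above (the variable names are ours):

```r
# Values copied from the summary() and anova() output above
t_slope <- 9.046242
ss_reg  <- 10.9031   # Sum Sq for mso2c
ss_res  <- 2.5314    # Sum Sq for Residuals
df_res  <- 19

f_from_t  <- t_slope^2                          # F equals t squared with one covariate
f_from_ss <- (ss_reg / 1) / (ss_res / df_res)   # mean square regression / mean square residual
r2        <- ss_reg / (ss_reg + ss_res)         # share of total variation explained

c(F_from_t = f_from_t, F_from_SS = f_from_ss, R2 = r2)
```

Both routes agree with the reported F value of 81.835 up to rounding, and the ratio of sums of squares agrees with the reported R2 of 0.8116.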
Confidence intervals for the coefficients
confint(mtombs)
## 2.5 % 97.5 %
## (Intercept) 0.004446349 0.64154545
## mso2c 0.006605098 0.01058157
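The slope interval can be reproduced by hand as estimate ± t critical value × standard error, with 19 residual degrees of freedom. A short sketch using the estimate and standard error from the coefficient table (variable names are ours):

```r
# Slope estimate and standard error copied from summary(mtombs)$coefficients
est    <- 0.008593333
se     <- 0.0009499341
df_res <- 19

# Two-sided 95% interval: estimate +/- t quantile * standard error
t_crit <- qt(0.975, df_res)
ci     <- est + c(-1, 1) * t_crit * se
ci
```

The interval excludes 0, which is consistent with the small p-value for mso2c.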
For Fitted Values
fitval <- fitted(mtombs)
fitval
## 1 2 3 4 5 6 7
## 0.4261159 0.4948626 0.4948626 0.7182892 0.7354759 1.1135825 1.1049892
## 8 9 10 11 12 13 14
## 1.1307692 1.1995159 1.3284159 1.3713825 1.5432492 1.5432492 1.8526092
## 15 16 17 18 19 20 21
## 1.8697959 2.0158825 2.2479025 2.3338359 2.3768025 2.4197692 3.0986425
Confidence intervals for the fitted values
ptombs <- predict(mtombs,interval = "confidence")
ptombs[1:5,]
## fit lwr upr
## 1 0.4261159 0.1276356 0.7245962
## 2 0.4948626 0.2094375 0.7802876
## 3 0.4948626 0.2094375 0.7802876
## 4 0.7182892 0.4729586 0.9636199
## 5 0.7354759 0.4930475 0.9779043
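Note that interval = "confidence" gives an interval for the mean response at each x, not for a single new observation; predict() also accepts interval = "prediction" for the latter. A sketch of the difference, using R's built-in cars data as a stand-in because the tombstone file is not bundled here (fit_demo and new_x are illustrative names):

```r
# Stand-in model on the built-in cars data set (not the tombstone data)
fit_demo <- lm(dist ~ speed, data = cars)
new_x    <- data.frame(speed = c(10, 15, 20))

ci_mean <- predict(fit_demo, newdata = new_x, interval = "confidence")  # mean response
pi_new  <- predict(fit_demo, newdata = new_x, interval = "prediction")  # single new observation

# Same point estimates, but the prediction interval is wider at every x
cbind(fit      = ci_mean[, "fit"],
      ci_width = ci_mean[, "upr"] - ci_mean[, "lwr"],
      pi_width = pi_new[, "upr"] - pi_new[, "lwr"])
```

The prediction interval is wider because it adds the residual variance of one new observation to the uncertainty of the fitted line.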
For predicted values
predval <- c(30,110,210)
new.so2 <- data.frame(mso2c = predval)
pdtombs <- predict(mtombs, newdata = new.so2, interval="confidence")
pdtombs
## fit lwr upr
## 1 0.5807959 0.3112588 0.850333
## 2 1.2682625 1.0934071 1.443118
## 3 2.1275959 1.9059316 2.349260
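The fit column is simply the estimated line evaluated at each new SO2 value. A quick check by hand, using the coefficients reported earlier (variable names are ours):

```r
# Coefficients copied from mtombs$coefficients
b0 <- 0.322995899
b1 <- 0.008593333

predval     <- c(30, 110, 210)
fit_by_hand <- b0 + b1 * predval   # intercept + slope * x
fit_by_hand
```

These match the fit column of pdtombs up to rounding.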
#Final Plotting
plot(tombs$mso2c, tombs$mtmsrr,xlab="Mean SO2 Concentration", ylab="Mean Surface Recession Rate", main="Tombstones Regression plot")
abline(mtombs, lty=2)
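#Confidence intervals for fitted values, drawn against the covariate (sorted so the lines do not zigzag)
ord <- order(tombs$mso2c)
lines(tombs$mso2c[ord], ptombs[ord, "lwr"], col="blue")
lines(tombs$mso2c[ord], ptombs[ord, "upr"], col="blue")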
#Confidence intervals for predicted values
lines(predval,pdtombs[,2],col="red")
lines(predval,pdtombs[,3],col="red")
## [1] "Expenses.per.car.mile..pence."
## [2] "Car.miles.per.year..1000s."
## [3] "Percent.of.Double.Deckers.in.fleet"
## [4] "Percent.of.fleet.on.fuel.oil"
## [5] "Receipts.per.car.mile..pence."
The columns are renamed to the shorter names exp, car, dd, oil and rec.
Defining the relationship
Exploring the structure
Summary of the response variable, Expenses per car mile (pence):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 16.56 16.95 17.76 18.17 18.62 21.24
Mean (18.17) > Median (17.76). Hence the distribution of exp is right skewed.
## (Intercept) car
## 1.878180e+01 -4.449914e-05
R2 and its interpretation
## [1] 0.1582641
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.878180e+01 4.075464e-01 46.08506 2.223005e-23
## car -4.449914e-05 2.187948e-05 -2.03383 5.420264e-02
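With a single covariate, R2 is tied directly to the slope's t value through R2 = t^2 / (t^2 + residual df). A short check with the numbers above (variable names are ours):

```r
# t value for car and residual degrees of freedom (24 observations - 2 coefficients)
t_car  <- -2.03383
df_res <- 22

r2_from_t <- t_car^2 / (t_car^2 + df_res)
r2_from_t
```

This reproduces the reported R2 of about 0.158, meaning car miles per year explains roughly 16% of the variation in expenses per car mile.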
## Analysis of Variance Table
##
## Response: exp
## Df Sum Sq Mean Sq F value Pr(>F)
## car 1 7.506 7.5058 4.1365 0.0542 .
## Residuals 22 39.920 1.8145
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The F value is 4.1365 and the p-value (0.0542) is slightly more than 0.05.
Therefore, at the 5% level we fail to reject the null hypothesis that x does not affect y.
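The p-value in the ANOVA table can be reproduced from the F value with 1 and 22 degrees of freedom, and the same decision follows from comparing the F value with the 5% critical value. A small sketch (variable names are ours):

```r
# F value for car and residual degrees of freedom from the ANOVA table
f_car  <- 4.1365
df_res <- 22

p_from_f <- pf(f_car, df1 = 1, df2 = df_res, lower.tail = FALSE)  # upper-tail probability
f_crit   <- qf(0.95, df1 = 1, df2 = df_res)                       # 5% critical value

p_from_f
f_crit
f_car > f_crit   # FALSE means the null hypothesis is not rejected at the 5% level
```

The observed F falls just short of the critical value, which is why the p-value lands just above 0.05.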
Confidence intervals for the coefficients
## 2.5 % 97.5 %
## (Intercept) 1.793660e+01 1.962700e+01
## car -8.987441e-05 8.761294e-07
For Fitted Values
## 1 2 3 4 5 6 7 8
## 18.50435 16.72461 18.45429 17.50401 17.80576 18.72231 17.98611 18.67861
## 9 10 11 12 13 14 15 16
## 17.97904 18.73076 18.68497 18.19143 18.62245 18.10969 16.68994 18.33063
## 17 18 19 20 21 22 23 24
## 18.50827 17.75436 17.86734 18.36129 18.73606 18.61057 18.08512 18.43805
Confidence intervals for the fitted values
## fit lwr upr
## 1 18.50435 17.83996 19.16874
## 2 16.72461 15.14429 18.30493
## 3 18.45429 17.81459 19.09398
## 4 17.50401 16.61724 18.39078
## 5 17.80576 17.12523 18.48629
For predicted values
## fit lwr upr
## 1 18.78047 17.93627 19.62466
## 2 18.77691 17.93538 19.61843
## 3 18.77246 17.93427 19.61065
Response variable = Expenses per car mile (pence) (This is y)
Computing the values of \(\beta_0\) and \(\beta_1\)
## (Intercept) dd
## 16.54775261 0.03289006
R2 and its interpretation
## [1] 0.3246126
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.54775261 0.55637160 29.742267 2.922432e-19
## dd 0.03289006 0.01011456 3.251753 3.656954e-03
## Analysis of Variance Table
##
## Response: exp
## Df Sum Sq Mean Sq F value Pr(>F)
## dd 1 15.395 15.3949 10.574 0.003657 **
## Residuals 22 32.031 1.4559
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The F value is 10.574 and the p-value (0.003657) is less than 0.05.
Therefore, we reject the null hypothesis that x has no effect on y.
Confidence intervals for the coefficients
## 2.5 % 97.5 %
## (Intercept) 15.39390854 17.70159668
## dd 0.01191374 0.05386638
For Fitted Values
## 1 2 3 4 5 6 7 8
## 19.83676 17.98406 18.70238 18.03307 18.16594 19.00924 18.87176 18.65041
## 9 10 11 12 13 14 15 16
## 17.02301 18.80335 18.30178 17.37527 17.72390 18.11727 17.11379 17.96696
## 17 18 19 20 21 22 23 24
## 18.77540 17.64200 17.42296 18.56556 19.83676 16.72371 17.22299 18.21166
Confidence intervals for the fitted values
## fit lwr upr
## 1 19.83676 18.65739 21.01612
## 2 17.98406 17.45968 18.50844
## 3 18.70238 18.08903 19.31573
## 4 18.03307 17.51486 18.55128
## 5 18.16594 17.65514 18.67675
For predicted values
## fit lwr upr
## 1 17.53445 16.88237 18.18653
## 2 20.16566 18.79421 21.53711
## 3 23.45467 20.04577 26.86356
Response variable = Expenses per car mile (pence) (This is y)
Computing the values of \(\beta_0\) and \(\beta_1\)
## (Intercept) oil
## 17.876371507 0.003375493
R2 and its interpretation
## [1] 0.00103655
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.876371507 1.96636930 9.0910550 6.637217e-09
## oil 0.003375493 0.02234115 0.1510886 8.812827e-01
## Analysis of Variance Table
##
## Response: exp
## Df Sum Sq Mean Sq F value Pr(>F)
## oil 1 0.049 0.04916 0.0228 0.8813
## Residuals 22 47.376 2.15347
The F value is 0.0228, which is close to 0, and the p-value (0.8813) is more than 0.05.
Therefore, we fail to reject the null hypothesis that x does not affect y.
Confidence intervals for the coefficients
## 2.5 % 97.5 %
## (Intercept) 13.79837118 21.95437183
## oil -0.04295722 0.04970821
For Fitted Values
## 1 2 3 4 5 6 7 8
## 18.21392 18.16170 18.15171 18.19141 18.15677 18.19701 18.18806 18.19731
## 9 10 11 12 13 14 15 16
## 18.08309 18.20683 18.20548 18.06830 18.09099 18.19802 18.21392 18.17814
## 17 18 19 20 21 22 23 24
## 18.18874 18.10432 18.20825 18.16909 18.21392 18.09774 18.19272 18.20255
Confidence intervals for the fitted values
## fit lwr upr
## 1 18.21392 17.34826 19.07958
## 2 18.16170 17.53012 18.79328
## 3 18.15171 17.48168 18.82174
## 4 18.19141 17.50420 18.87861
## 5 18.15677 17.50957 18.80398
For predicted values
## fit lwr upr
## 1 17.97764 15.26512 20.69015
## 2 18.24768 17.01371 19.48165
## 3 18.58522 12.85200 24.31845
Response variable = Expenses per car mile (pence) (This is y)
Computing the values of \(\beta_0\) and \(\beta_1\)
## (Intercept) rec
## 8.658455 0.479866
R2 and its interpretation
## [1] 0.6191757
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.658455 1.60107683 5.407895 1.973762e-05
## rec 0.479866 0.08023504 5.980754 5.097023e-06
## Analysis of Variance Table
##
## Response: exp
## Df Sum Sq Mean Sq F value Pr(>F)
## rec 1 29.365 29.3648 35.769 5.097e-06 ***
## Residuals 22 18.061 0.8209
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The F value is 35.769 and the p-value (5.097e-06) is less than 0.05.
Therefore, we reject the null hypothesis that x does not affect y.
Confidence intervals for the coefficients
## 2.5 % 97.5 %
## (Intercept) 5.3380250 11.9788852
## rec 0.3134688 0.6462633
For Fitted Values
## 1 2 3 4 5 6 7 8
## 20.70309 17.88628 18.93719 17.34883 17.89108 17.92467 18.28937 20.34319
## 9 10 11 12 13 14 15 16
## 17.10410 18.31816 17.48799 17.75672 21.01501 17.96786 17.60316 17.82390
## 17 18 19 20 21 22 23 24
## 18.25578 17.92467 18.49091 16.84977 18.54849 16.20675 17.63195 17.77111
Confidence intervals for the fitted values
## fit lwr upr
## 1 20.70309 19.74463 21.66156
## 2 17.88628 17.49030 18.28226
## 3 18.93719 18.47040 19.40397
## 4 17.34883 16.87113 17.82653
## 5 17.89108 17.49551 18.28664
For predicted values
## fit lwr upr
## 1 23.05444 21.31783 24.79104
## 2 61.44372 46.43332 76.45412
## 3 109.43033 77.78277 141.07788
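The covariate values behind these three predictions are not shown. Back-solving the fitted line suggests they are 30, 110 and 210, the same values used as predval for the tombstone model. A sketch of that check (variable names are ours):

```r
# Coefficients for the receipts model and the three reported fits
b0   <- 8.658455
b1   <- 0.479866
fits <- c(23.05444, 61.44372, 109.43033)

# Invert fit = b0 + b1 * x to recover the x values that were used
x_used <- (fits - b0) / b1
round(x_used, 2)
```

If that reading is right, the second and third predictions are extrapolations: they sit far above the largest observed expense (21.24) and the largest fitted value (about 21.0), and their intervals are correspondingly very wide, so they should be interpreted with caution.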