The response variable y is the “Marble Tombstone Mean Surface Recession Rate” (mm/100 years), and the covariate x is the “Mean SO2 concentration over a 100-year period” (ug/m^3).
TomStone_DataSet <- read.csv("tombstone.csv", header = TRUE)
TomStone_DataSet
## City
## 1 Washington,DC (Rural)
## 2 Cincinnati,OH (Rural)
## 3 Philadelphia,PA (Rura
## 4 Richmond,VA
## 5 Fall River,MA
## 6 Hartford,CT
## 7 Evanston,IL
## 8 Albany,NY
## 9 Washington,DC
## 10 Louisville,KY
## 11 Providence,RI
## 12 Cambridge,MA
## 13 Baltimore,MD
## 14 Newark,NJ
## 15 Boston,MA
## 16 Pittsburgh,PA
## 17 Cincinnati,OH
## 18 Brooklyn,NY
## 19 Philadelphia,PA
## 20 Indianapolis,IN
## 21 Chicago,IL
## Modelled.100.Year.Mean.SO2.Concentration..ug.m..3.
## 1 12
## 2 20
## 3 20
## 4 46
## 5 48
## 6 92
## 7 91
## 8 94
## 9 102
## 10 117
## 11 122
## 12 142
## 13 142
## 14 178
## 15 180
## 16 197
## 17 224
## 18 234
## 19 239
## 20 244
## 21 323
## Marble.Tombstone.Mean.Surface.Recession.Rate..mm.100years.
## 1 0.27
## 2 0.14
## 3 0.33
## 4 0.81
## 5 0.84
## 6 1.08
## 7 1.78
## 8 1.21
## 9 1.09
## 10 1.72
## 11 1.18
## 12 1.01
## 13 1.90
## 14 1.98
## 15 1.53
## 16 2.71
## 17 2.41
## 18 1.61
## 19 2.51
## 20 2.15
## 21 3.16
The trend is positive: cities with higher SO2 concentrations tend to have higher marble tombstone recession rates. The dashed reference line, placed by eye, rises from left to right, and the points scatter around it. The graph shows how the surface recession rate varies with the 100-year mean SO2 concentration across the cities.
x <- TomStone_DataSet[,2]   # covariate: 100-year mean SO2 concentration (ug/m^3)
y <- TomStone_DataSet[,3]   # response: mean surface recession rate (mm/100 years)
plot(x, y, pch=20)
abline(a=0.2, b=0.010, lty=2)   # dashed reference line chosen by eye, not fitted
model2 <- lm(y ~ x, data=TomStone_DataSet) # obtain least square estimate
summary(model2)
##
## Call:
## lm(formula = y ~ x, data = TomStone_DataSet)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.72384 -0.19138 0.06136 0.13320 0.69412
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3229959 0.1521958 2.122 0.0472 *
## x 0.0085933 0.0009499 9.046 2.58e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.365 on 19 degrees of freedom
## Multiple R-squared: 0.8116, Adjusted R-squared: 0.8017
## F-statistic: 81.83 on 1 and 19 DF, p-value: 2.579e-08
Beta0 = 0.3229959 and Beta1 = 0.0085933. For every one-unit (1 ug/m^3) increase in the covariate x (mean SO2 concentration over a 100-year period), the estimated mean of the response y (marble tombstone mean surface recession rate) increases by 0.0085933 mm/100 years. When the mean SO2 concentration is zero, the estimated mean recession rate is 0.3229959 mm/100 years.
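The dashed line drawn earlier with `abline(a=0.2, b=0.010)` was placed by eye. The sketch below compares its residual sum of squares with that of the least-squares fit. It assumes the tombstone values are re-entered by hand from the printed table rather than read from `tombstone.csv`. Names such as `fit`, `sse_guess` and `sse_ls` are illustrative only. By construction, the `lm()` line cannot have a larger sum of squares than any other straight line.

```r
# Tombstone data re-entered from the printed table (assumption: no CSV is read)
x <- c(12, 20, 20, 46, 48, 92, 91, 94, 102, 117, 122,
       142, 142, 178, 180, 197, 224, 234, 239, 244, 323)          # mean SO2 (ug/m^3)
y <- c(0.27, 0.14, 0.33, 0.81, 0.84, 1.08, 1.78, 1.21, 1.09, 1.72, 1.18,
       1.01, 1.90, 1.98, 1.53, 2.71, 2.41, 1.61, 2.51, 2.15, 3.16)  # mm/100 years

fit <- lm(y ~ x)

# Residual sum of squares for the dashed line drawn by eye: y = 0.2 + 0.010 x
sse_guess <- sum((y - (0.2 + 0.010 * x))^2)

# Residual sum of squares for the least-squares line
sse_ls <- sum(residuals(fit)^2)

print(c(guess = sse_guess, least_squares = sse_ls))

# Residual standard error as reported by summary(): sqrt(SSE / (n - 2))
print(sqrt(sse_ls / df.residual(fit)))   # about 0.365
```

The last line reproduces the "Residual standard error: 0.365 on 19 degrees of freedom" reported in the summary.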
plot(x,y,pch=20)
abline(model2)
For the first observation, x1 = 12 and y1 = 0.27. The fitted value for this observation is 0.4261159. In other words, given x = 12, the fitted regression line estimates the mean of the response y to be 0.4261159. The observed value y1 = 0.27 lies a little below this estimated mean response, so its residual is negative (0.27 - 0.4261159 = -0.1561159). The fitted values are the points (xi, ŷi) that lie on the estimated line, and the observed points (xi, yi) scatter around it.
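The fitted value and residual for the first observation can be reproduced directly from the estimated coefficients. This is a minimal sketch that assumes the tombstone values are re-entered from the printed table. The names `fit`, `b`, `yhat1` and `e1` are illustrative.

```r
# Tombstone data re-entered from the printed table (assumption: no CSV is read)
x <- c(12, 20, 20, 46, 48, 92, 91, 94, 102, 117, 122,
       142, 142, 178, 180, 197, 224, 234, 239, 244, 323)
y <- c(0.27, 0.14, 0.33, 0.81, 0.84, 1.08, 1.78, 1.21, 1.09, 1.72, 1.18,
       1.01, 1.90, 1.98, 1.53, 2.71, 2.41, 1.61, 2.51, 2.15, 3.16)

fit <- lm(y ~ x)
b <- coef(fit)                          # b[1] = Beta0 (intercept), b[2] = Beta1 (slope)
yhat1 <- unname(b[1] + b[2] * x[1])     # fitted value at x1 = 12
e1 <- y[1] - yhat1                      # residual = observed - fitted
print(c(fitted = yhat1, residual = e1)) # about 0.4261159 and -0.1561159
```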
fitted_values <- model2$fitted.values
fitted_values
## 1 2 3 4 5 6 7
## 0.4261159 0.4948626 0.4948626 0.7182892 0.7354759 1.1135825 1.1049892
## 8 9 10 11 12 13 14
## 1.1307692 1.1995159 1.3284159 1.3713825 1.5432492 1.5432492 1.8526092
## 15 16 17 18 19 20 21
## 1.8697959 2.0158825 2.2479025 2.3338359 2.3768025 2.4197692 3.0986425
#Sum of Fitted Values
sum(fitted_values)
## [1] 31.42
# y is assigned as a response variable
sum(y)
## [1] 31.42
The sum of the fitted values equals the sum of the response values y (31.42), so the mean of the fitted values also equals the mean of y.
Each fitted value is ŷi = Beta0 + xi*Beta1, and each residual is ei = yi - ŷi = yi - (Beta0 + xi*Beta1), so yi = Beta0 + xi*Beta1 + ei. Because the model includes an intercept, least-squares estimation forces the residuals to sum to zero. Summing yi = ŷi + ei over all observations therefore gives sum(yi) = sum(ŷi). This is why the sum (and hence the mean) of y equals the sum (and mean) of the fitted values. It does not mean that every residual is zero; the positive and negative residuals cancel out.
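The argument can be written out from the least-squares criterion. Setting the derivative with respect to the intercept to zero at the estimates gives a residual sum of zero, which is exactly why the two sums agree. This step relies on the model containing an intercept term.

```latex
\begin{aligned}
S(\beta_0, \beta_1) &= \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2 \\
\left.\frac{\partial S}{\partial \beta_0}\right|_{\hat\beta_0,\,\hat\beta_1}
  &= -2 \sum_{i=1}^{n} \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)
   = -2 \sum_{i=1}^{n} e_i = 0 \\
\Rightarrow \sum_{i=1}^{n} y_i &= \sum_{i=1}^{n} \left(\hat y_i + e_i\right)
   = \sum_{i=1}^{n} \hat y_i
\end{aligned}
```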
#Sum and mean of fitted values and response variables
sum(y)
## [1] 31.42
sum(fitted_values)
## [1] 31.42
mean(y)
## [1] 1.49619
mean(fitted_values)
## [1] 1.49619
The sum of the residuals is zero apart from floating-point rounding error. Rounding to eight decimal places displays it as exactly 0.
residuals <- model2$residuals
residuals
## 1 2 3 4 5 6
## -0.15611590 -0.35486256 -0.16486256 0.09171078 0.10452411 -0.03358255
## 7 8 9 10 11 12
## 0.67501079 0.07923079 -0.10951588 0.39158412 -0.19138254 -0.53324921
## 13 14 15 16 17 18
## 0.35675079 0.12739080 -0.33979586 0.69411747 0.16209748 -0.72383585
## 19 20 21
## 0.13319748 -0.26976919 0.06135750
round(sum(residuals),8)
## [1] 0
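Instead of rounding, `all.equal()` can check that a value is zero within numerical tolerance. This is a sketch that assumes the tombstone values are re-entered from the printed table. The names `fit` and `e` are illustrative.

```r
# Tombstone data re-entered from the printed table (assumption: no CSV is read)
x <- c(12, 20, 20, 46, 48, 92, 91, 94, 102, 117, 122,
       142, 142, 178, 180, 197, 224, 234, 239, 244, 323)
y <- c(0.27, 0.14, 0.33, 0.81, 0.84, 1.08, 1.78, 1.21, 1.09, 1.72, 1.18,
       1.01, 1.90, 1.98, 1.53, 2.71, 2.41, 1.61, 2.51, 2.15, 3.16)

fit <- lm(y ~ x)
e <- residuals(fit)
print(sum(e))                                        # zero or a tiny floating-point value
print(isTRUE(all.equal(sum(e), 0)))                  # TRUE: zero within tolerance
print(isTRUE(all.equal(sum(fitted(fit)), sum(y))))   # TRUE: both sums are 31.42
```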
summary(model2)$coef[,2]
## (Intercept) x
## 0.1521958377 0.0009499341
Both standard errors are small relative to their estimates. Beta0 = 0.3229959 is about 2.1 times its standard error (t = 2.122, p = 0.0472), and Beta1 = 0.0085933 is about 9 times its standard error (t = 9.046, p = 2.58e-08). Both estimates are therefore significantly different from zero at the 5% level, and the slope is estimated especially precisely.
For every one-unit (1 ug/m^3) increase in SO2 concentration (x), the estimated mean marble tombstone surface recession rate (y) increases by Beta1 = 0.0085933 mm/100 years.
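The t values in the summary are each estimate divided by its standard error. `confint()` turns the same quantities into 95% confidence intervals. This sketch assumes the tombstone values are re-entered from the printed table. The names `est`, `se`, `tval` and `ci` are illustrative.

```r
# Tombstone data re-entered from the printed table (assumption: no CSV is read)
x <- c(12, 20, 20, 46, 48, 92, 91, 94, 102, 117, 122,
       142, 142, 178, 180, 197, 224, 234, 239, 244, 323)
y <- c(0.27, 0.14, 0.33, 0.81, 0.84, 1.08, 1.78, 1.21, 1.09, 1.72, 1.18,
       1.01, 1.90, 1.98, 1.53, 2.71, 2.41, 1.61, 2.51, 2.15, 3.16)

fit  <- lm(y ~ x)
est  <- coef(fit)
se   <- sqrt(diag(vcov(fit)))     # same values as summary(fit)$coef[, 2]
tval <- est / se                  # about 2.122 (intercept) and 9.046 (slope)
print(tval)
ci <- confint(fit, level = 0.95)  # 95% confidence intervals for Beta0 and Beta1
print(ci)
```

Neither interval contains zero, which matches both p-values being below 0.05.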
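The slope can be read as the difference between the estimated mean responses at two SO2 levels one unit apart. This is a sketch that assumes the tombstone values are re-entered from the printed table. The choice of x = 100 is arbitrary, and the variable names are illustrative.

```r
# Tombstone data re-entered from the printed table (assumption: no CSV is read)
x <- c(12, 20, 20, 46, 48, 92, 91, 94, 102, 117, 122,
       142, 142, 178, 180, 197, 224, 234, 239, 244, 323)
y <- c(0.27, 0.14, 0.33, 0.81, 0.84, 1.08, 1.78, 1.21, 1.09, 1.72, 1.18,
       1.01, 1.90, 1.98, 1.53, 2.71, 2.41, 1.61, 2.51, 2.15, 3.16)

fit <- lm(y ~ x)
# Estimated mean recession rate at x = 100 and x = 101 (arbitrary levels one unit apart)
p <- predict(fit, newdata = data.frame(x = c(100, 101)))
slope_from_predictions <- unname(diff(p))
print(c(diff = slope_from_predictions, Beta1 = coef(fit)[["x"]]))  # both about 0.0085933
```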
Yes, the intercept of this regression has a natural interpretation: it is the estimated mean recession rate when the SO2 concentration is zero. Here Beta0 = 0.3229959, so even with no SO2 a marble tombstone is estimated to recede by about 0.323 mm per 100 years. This is an extrapolation, however, since the lowest observed SO2 concentration is 12 ug/m^3.
Chicago has the highest Surface Recession Rate.
Brooklyn, NY has the largest residual in absolute value. It is observation 18 in the list of residuals, with residual -0.72383585 (absolute value 0.7238359), so its observed recession rate lies well below the fitted line.
abs(model2$residuals)
## 1 2 3 4 5 6
## 0.15611590 0.35486256 0.16486256 0.09171078 0.10452411 0.03358255
## 7 8 9 10 11 12
## 0.67501079 0.07923079 0.10951588 0.39158412 0.19138254 0.53324921
## 13 14 15 16 17 18
## 0.35675079 0.12739080 0.33979586 0.69411747 0.16209748 0.72383585
## 19 20 21
## 0.13319748 0.26976919 0.06135750
abs(model2$residuals[18])
## 18
## 0.7238359
TomStone_DataSet[18,1]
## [1] Brooklyn,NY
## 21 Levels: Albany,NY Baltimore,MD Boston,MA Brooklyn,NY ... Washington,DC (Rural)
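Rather than reading position 18 off the list, `which.max()` can locate both the largest absolute residual and the highest recession rate. This sketch assumes the city labels and values are re-entered from the printed table. The third label is copied exactly as printed above, where it appears truncated. The names `city`, `i_res` and `i_y` are illustrative.

```r
# City labels and data re-entered from the printed table (assumption: no CSV is read)
city <- c("Washington,DC (Rural)", "Cincinnati,OH (Rural)",
          "Philadelphia,PA (Rura",  # copied as printed in the output above
          "Richmond,VA", "Fall River,MA", "Hartford,CT", "Evanston,IL",
          "Albany,NY", "Washington,DC", "Louisville,KY", "Providence,RI",
          "Cambridge,MA", "Baltimore,MD", "Newark,NJ", "Boston,MA",
          "Pittsburgh,PA", "Cincinnati,OH", "Brooklyn,NY", "Philadelphia,PA",
          "Indianapolis,IN", "Chicago,IL")
x <- c(12, 20, 20, 46, 48, 92, 91, 94, 102, 117, 122,
       142, 142, 178, 180, 197, 224, 234, 239, 244, 323)
y <- c(0.27, 0.14, 0.33, 0.81, 0.84, 1.08, 1.78, 1.21, 1.09, 1.72, 1.18,
       1.01, 1.90, 1.98, 1.53, 2.71, 2.41, 1.61, 2.51, 2.15, 3.16)

fit <- lm(y ~ x)
i_res <- unname(which.max(abs(residuals(fit))))  # row with the largest |residual|
i_y   <- which.max(y)                            # row with the highest recession rate
print(city[c(i_res, i_y)])                       # "Brooklyn,NY" "Chicago,IL"
```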
Covariate mean: mean(x) = 136.5238. Response mean: mean(y) = 1.49619.
Note: the point (mean(x), mean(y)) is marked with a red dot on the graph.
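The code that produced the red dot is not shown. One way to draw it, and to confirm that the fitted line passes through (mean(x), mean(y)), is sketched below. The sketch assumes the tombstone values are re-entered from the printed table. The axis labels and the names `fit` and `yhat_at_mean` are illustrative.

```r
# Tombstone data re-entered from the printed table (assumption: no CSV is read)
x <- c(12, 20, 20, 46, 48, 92, 91, 94, 102, 117, 122,
       142, 142, 178, 180, 197, 224, 234, 239, 244, 323)
y <- c(0.27, 0.14, 0.33, 0.81, 0.84, 1.08, 1.78, 1.21, 1.09, 1.72, 1.18,
       1.01, 1.90, 1.98, 1.53, 2.71, 2.41, 1.61, 2.51, 2.15, 3.16)

fit <- lm(y ~ x)
plot(x, y, pch = 20,
     xlab = "Mean SO2 concentration (ug/m^3)",
     ylab = "Recession rate (mm/100 years)")
abline(fit)
points(mean(x), mean(y), col = "red", pch = 19)   # red dot at (mean(x), mean(y))

# The fitted value at x = mean(x) equals mean(y)
yhat_at_mean <- unname(predict(fit, newdata = data.frame(x = mean(x))))
print(c(mean_y = mean(y), fitted_at_mean_x = yhat_at_mean))
```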
Bus_DataSet <- read.csv("bus.csv", header = TRUE)
Bus_DataSet
## Expenses.per.car.mile..pence. Car.miles.per.year..1000s.
## 1 19.76 6235
## 2 17.85 46230
## 3 19.96 7360
## 4 16.80 28715
## 5 18.20 21934
## 6 16.71 1337
## 7 18.81 17881
## 8 20.74 2319
## 9 16.56 18040
## 10 18.55 1147
## 11 17.40 2176
## 12 17.62 13267
## 13 21.24 3581
## 14 18.23 15104
## 15 16.86 47009
## 16 17.45 10139
## 17 17.66 6147
## 18 18.30 23089
## 19 16.58 20550
## 20 17.51 9450
## 21 21.17 1028
## 22 16.92 3848
## 23 16.96 15656
## 24 18.24 7725
## Percent.of.Double.Deckers.in.fleet Percent.of.fleet.on.fuel.oil
## 1 100.00 100.00
## 2 43.67 84.53
## 3 65.51 81.57
## 4 45.16 93.33
## 5 49.20 83.07
## 6 74.84 94.99
## 7 70.66 92.34
## 8 63.93 95.08
## 9 14.45 61.24
## 10 68.58 97.90
## 11 53.33 97.50
## 12 25.16 56.86
## 13 35.76 63.58
## 14 47.72 95.29
## 15 17.21 100.00
## 16 43.15 89.40
## 17 67.73 92.54
## 18 33.27 67.53
## 19 26.61 98.32
## 20 61.35 86.72
## 21 100.00 100.00
## 22 5.35 65.58
## 23 20.53 93.72
## 24 50.59 96.63
## Receipts.per.car.mile..pence.
## 1 25.10
## 2 19.23
## 3 21.42
## 4 18.11
## 5 19.24
## 6 19.31
## 7 20.07
## 8 24.35
## 9 17.60
## 10 20.13
## 11 18.40
## 12 18.96
## 13 25.75
## 14 19.40
## 15 18.64
## 16 19.10
## 17 20.00
## 18 19.31
## 19 20.49
## 20 17.07
## 21 20.61
## 22 15.73
## 23 18.70
## 24 18.99
x <- Bus_DataSet[,2]   # covariate: car miles per year (1000s)
y <- Bus_DataSet[,1]   # response: expenses per car mile (pence)
The trend is negative: operators with more car miles per year tend to have lower expenses per car mile. The fitted regression line slopes downward from left to right, and many of the points lie toward the left (low car miles) end of the graph. The relationship is weak, however (R-squared = 0.1583).
model1 <- lm(y ~ x, data=Bus_DataSet) # obtain least square estimate
summary(model1)
##
## Call:
## lm(formula = y ~ x, data = Bus_DataSet)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0123 -0.9417 -0.1894 0.8993 2.6176
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.878e+01 4.075e-01 46.085 <2e-16 ***
## x -4.450e-05 2.188e-05 -2.034 0.0542 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.347 on 22 degrees of freedom
## Multiple R-squared: 0.1583, Adjusted R-squared: 0.12
## F-statistic: 4.136 on 1 and 22 DF, p-value: 0.0542
plot(x,y,pch=20)
abline(model1)
Beta0 = 1.878e+01 and Beta1 = -4.450e-05. For every one-unit increase in the covariate x (car miles per year, in 1000s), the estimated mean of the response y (expenses per car mile, in pence) decreases by 4.450e-05 pence (a negative trend).
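Because x is measured in thousands of car miles and spans a wide range, the tiny slope is easier to interpret over that whole range. This sketch assumes the bus values are re-entered from the printed table. It multiplies Beta1 by the distance between the smallest and largest x. The names `fit`, `b1` and `change` are illustrative.

```r
# Bus data re-entered from the printed table (assumption: no CSV is read)
y <- c(19.76, 17.85, 19.96, 16.80, 18.20, 16.71, 18.81, 20.74, 16.56, 18.55, 17.40, 17.62,
       21.24, 18.23, 16.86, 17.45, 17.66, 18.30, 16.58, 17.51, 21.17, 16.92, 16.96, 18.24)  # pence
x <- c(6235, 46230, 7360, 28715, 21934, 1337, 17881, 2319, 18040, 1147, 2176, 13267,
       3581, 15104, 47009, 10139, 6147, 23089, 20550, 9450, 1028, 3848, 15656, 7725)      # 1000s

fit <- lm(y ~ x)
b1 <- coef(fit)[["x"]]
# Estimated change in mean expenses per car mile from the smallest to the largest x
change <- b1 * diff(range(x))
print(change)   # roughly -2.05 pence
```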
For the first observation, x1 = 6235 and y1 = 19.76. The fitted value for this observation is 18.50435. In other words, given x = 6235, the fitted regression line estimates the mean of the response y to be 18.50435. The observed value y1 = 19.76 lies above this estimated mean response, so its residual is positive (19.76 - 18.50435 = 1.25565). The fitted values are the points (xi, ŷi) on the estimated line, and the observed points scatter around it.
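`predict()` with `interval = "confidence"` returns the estimated mean response at a given x together with a 95% confidence interval for that mean. This sketch assumes the bus values are re-entered from the printed table. The names `fit`, `p` and `e1` are illustrative.

```r
# Bus data re-entered from the printed table (assumption: no CSV is read)
y <- c(19.76, 17.85, 19.96, 16.80, 18.20, 16.71, 18.81, 20.74, 16.56, 18.55, 17.40, 17.62,
       21.24, 18.23, 16.86, 17.45, 17.66, 18.30, 16.58, 17.51, 21.17, 16.92, 16.96, 18.24)
x <- c(6235, 46230, 7360, 28715, 21934, 1337, 17881, 2319, 18040, 1147, 2176, 13267,
       3581, 15104, 47009, 10139, 6147, 23089, 20550, 9450, 1028, 3848, 15656, 7725)

fit <- lm(y ~ x)
# Estimated mean response at x1 = 6235, with a 95% confidence interval for that mean
p <- predict(fit, newdata = data.frame(x = 6235), interval = "confidence")
print(p)                       # "fit" column is about 18.50435
e1 <- y[1] - fitted(fit)[[1]]  # residual of observation 1, about 1.2556501
print(e1)
```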
fitted_values <- model1$fitted.values
fitted_values
## 1 2 3 4 5 6 7 8
## 18.50435 16.72461 18.45429 17.50401 17.80576 18.72231 17.98611 18.67861
## 9 10 11 12 13 14 15 16
## 17.97904 18.73076 18.68497 18.19143 18.62245 18.10969 16.68994 18.33063
## 17 18 19 20 21 22 23 24
## 18.50827 17.75436 17.86734 18.36129 18.73606 18.61057 18.08512 18.43805
#Sum of Fitted Values
sum(fitted_values)
## [1] 436.08
The sum of the responses is 436.08.
# y is assigned as a response variable
sum(y)
## [1] 436.08
The sum of the fitted values equals the sum of the response values y (436.08), so the mean of the fitted values also equals the mean of y.
#Sum and mean of fitted values and response variables
sum(y)
## [1] 436.08
sum(fitted_values)
## [1] 436.08
mean(y)
## [1] 18.17
mean(fitted_values)
## [1] 18.17
Each fitted value is ŷi = Beta0 + xi*Beta1, and each residual is ei = yi - ŷi, so yi = ŷi + ei. Because the model includes an intercept, least squares forces the residuals to sum to zero, so the sum of y equals the sum of the fitted values. The individual residuals are not zero; their positive and negative values cancel out.
The sum of the residuals is zero apart from floating-point rounding error. Rounding to eight decimal places displays it as exactly 0.
residuals <- model1$residuals
residuals
## 1 2 3 4 5 6
## 1.2556501 1.1253933 1.5057117 -0.7040092 0.3942422 -2.0123067
## 7 8 9 10 11 12
## 0.8238871 2.0613915 -1.4190375 -0.1807615 -1.2849719 -0.5714319
## 13 14 15 16 17 18
## 2.6175494 0.1203130 0.1700581 -0.8806252 -0.8482658 0.5456387
## 19 20 21 22 23 24
## -1.2873447 -0.8512851 2.4339431 -1.6905693 -1.1251235 -0.1980461
round(sum(residuals),8)
## [1] 0
The standard errors are 4.075464e-01 for Beta0 and 2.187948e-05 for Beta1.
summary(model1)$coef[,2]
## (Intercept) x
## 4.075464e-01 2.187948e-05
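Beta0 = 1.878e+01 and Beta1 = -4.450e-05. The intercept is about 46 times its standard error (t = 46.085), so it is estimated very precisely. The slope is only about twice its standard error in absolute value (t = -2.034, p = 0.0542), which is borderline: it is significant at the 10% level but not at the 5% level.
If car miles per year increase by one unit (1000 miles), the estimated expenses per car mile decrease by 4.450e-05 pence (Beta1 = -4.450e-05).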
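The borderline p-value for the slope can be recomputed from the t statistic and the t distribution with n - 2 = 22 degrees of freedom. This sketch assumes the bus values are re-entered from the printed table. The names `tab`, `t1` and `p1` are illustrative.

```r
# Bus data re-entered from the printed table (assumption: no CSV is read)
y <- c(19.76, 17.85, 19.96, 16.80, 18.20, 16.71, 18.81, 20.74, 16.56, 18.55, 17.40, 17.62,
       21.24, 18.23, 16.86, 17.45, 17.66, 18.30, 16.58, 17.51, 21.17, 16.92, 16.96, 18.24)
x <- c(6235, 46230, 7360, 28715, 21934, 1337, 17881, 2319, 18040, 1147, 2176, 13267,
       3581, 15104, 47009, 10139, 6147, 23089, 20550, 9450, 1028, 3848, 15656, 7725)

fit <- lm(y ~ x)
tab <- summary(fit)$coefficients
t1 <- tab["x", "Estimate"] / tab["x", "Std. Error"]   # about -2.034
p1 <- 2 * pt(-abs(t1), df = df.residual(fit))         # two-sided p-value, about 0.0542
print(c(t = t1, p = p1))
```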
The intercept of this regression has no natural interpretation. It would be the estimated expense per car mile for an operator that runs zero car miles per year, but expenses per car mile are not meaningful when no miles are driven. In addition, x = 0 is far outside the observed range (the smallest value is 1028 thousand car miles), so Beta0 = 18.78 pence is an extrapolation.
Covariate mean: mean(x) = 13748.62. Response mean: mean(y) = 18.17.
The point (mean(x), mean(y)) lies on the fitted regression line, so the line passes through the mean of car miles per year (1000s) and the mean of expenses per car mile (pence). This always holds for a least-squares line with an intercept: Beta0 = mean(y) - Beta1*mean(x), so the fitted value at x = mean(x) is exactly mean(y).
mean(x)
## [1] 13748.62
mean(y)
## [1] 18.17
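The closed-form least-squares estimates make this property explicit. The slope is Sxy/Sxx, and the intercept is defined as mean(y) - slope*mean(x). This sketch assumes the bus values are re-entered from the printed table. It computes the estimates by hand, compares them with `lm()`, and evaluates the line at mean(x). The names `Sxy`, `Sxx`, `b0` and `b1` are illustrative.

```r
# Bus data re-entered from the printed table (assumption: no CSV is read)
y <- c(19.76, 17.85, 19.96, 16.80, 18.20, 16.71, 18.81, 20.74, 16.56, 18.55, 17.40, 17.62,
       21.24, 18.23, 16.86, 17.45, 17.66, 18.30, 16.58, 17.51, 21.17, 16.92, 16.96, 18.24)
x <- c(6235, 46230, 7360, 28715, 21934, 1337, 17881, 2319, 18040, 1147, 2176, 13267,
       3581, 15104, 47009, 10139, 6147, 23089, 20550, 9450, 1028, 3848, 15656, 7725)

Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)
b1 <- Sxy / Sxx                 # least-squares slope
b0 <- mean(y) - b1 * mean(x)    # intercept chosen so the line passes through the means
print(c(b0 = b0, b1 = b1))      # matches coef(lm(y ~ x))
print(b0 + b1 * mean(x))        # equals mean(y) = 18.17
```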