Car Stopping Distance

The cars data set in R lists car speeds (in mph) and stopping distances (in ft).

plot(cars$speed, cars$dist, xlab="Speed (mph)", ylab="Distance (ft)", col='blue')

The correlation between speed and distance is

r = cor(cars$speed, cars$dist)
r
## [1] 0.8068949

Comment on the sign (+/-) and magnitude (close to 1 or close to 0) of the correlation.

Sign: Positive (+), because stopping distance increases as speed increases. On the plot, the points rise from left to right.

Magnitude: Close to 1 (r ≈ 0.81). The points follow a fairly tight upward linear pattern. If the correlation were close to 0, the points would be scattered with no clear linear trend.

So the correlation is strong, given how closely the points cluster around the trend.
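As a rough check on what the correlation measures, the sketch below recomputes r by hand from standard units and compares it with `cor()`. The variable names (`zx`, `zy`, `r_manual`) are illustrative choices, not part of the assignment.

```r
# Built-in data set; no packages needed.
x <- cars$speed
y <- cars$dist

# Convert each variable to standard units (mean 0, SD 1).
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)

# r is the sum of the products of standard units divided by n - 1
# (n - 1 because sd() uses the sample standard deviation).
r_manual <- sum(zx * zy) / (length(x) - 1)

r_manual            # should agree with cor(x, y)
cor(x, y)
```

A positive product `zx * zy` means a car is above (or below) average on both speed and distance, which is why mostly positive products give a positive r.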

We can use regression to predict stopping distances for speeds not in the data set.

model = lm(cars$dist ~ cars$speed)
plot(cars$speed, cars$dist, xlab="Speed (mph)", ylab="Distance (ft)", col='blue')
abline(model, col='green')

The following output gives information about the regression line.

summary(model)
## 
## Call:
## lm(formula = cars$dist ~ cars$speed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## cars$speed    3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

Use the output to write down the equation for the regression line.

First, find the average speed and the average distance: add up all the values and divide by the number of observations (50).

Speed (x): 4 + 4 + 7 + 7 + 8 + 9 + 10 + 10 + 10 + 11 + 11 + 12 + 12 + 12 + 12 + 13 + 13 + 13 + 13 + 14 + 14 + 14 + 14 + 15 + 15 + 15 + 16 + 16 + 17 + 17 + 17 + 18 + 18 + 18 + 18 + 19 + 19 + 19 + 20 + 20 + 20 + 20 + 20 + 22 + 23 + 24 + 24 + 24 + 24 + 25 = 770, so the mean speed is x = 770/50 = 15.4

Distance (y): 2 + 10 + 4 + 22 + 16 + 10 + 18 + 26 + 34 + 17 + 28 + 14 + 20 + 24 + 28 + 26 + 34 + 34 + 46 + 26 + 36 + 60 + 80 + 20 + 26 + 54 + 32 + 40 + 32 + 40 + 50 + 42 + 56 + 76 + 84 + 36 + 46 + 68 + 32 + 48 + 52 + 56 + 64 + 66 + 54 + 70 + 92 + 93 + 120 + 85 = 2149, so the mean distance is y = 2149/50 = 42.98

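Next, find the slope and intercept.

As given in the table above: Intercept = -17.5791, Slope = 3.9324, mean speed = 15.4, mean distance = 42.98

So the equation of the regression line is distance = -17.5791 + 3.9324 × speed. As a check, the line passes through the point of averages: 42.98 ≈ -17.5791 + 3.9324(15.4)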
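The hand arithmetic above can be reproduced in a few lines of R. This is a minimal sketch: it uses the standard facts that the slope equals r × SD(y) / SD(x) and that the line passes through the point of averages. The names `x_bar`, `y_bar`, `slope`, `intercept`, and `fit` are illustrative, not from the assignment.

```r
# Built-in data set; no packages needed.
x <- cars$speed
y <- cars$dist

# Means: the same as sum(x) / length(x) and sum(y) / length(y).
x_bar <- mean(x)
y_bar <- mean(y)

# Slope and intercept from summary statistics.
r <- cor(x, y)
slope <- r * sd(y) / sd(x)
intercept <- y_bar - slope * x_bar   # forces the line through (x_bar, y_bar)

# Compare with the coefficients that lm() reports.
fit <- lm(dist ~ speed, data = cars)
c(intercept = intercept, slope = slope)
coef(fit)
```

Both approaches should give the same intercept and slope as the Coefficients table in the summary output.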

The slope and y-intercept of the regression line are:

b = model$coefficients[1]
b
## (Intercept) 
##   -17.57909
m = model$coefficients[2]
m
## cars$speed 
##   3.932409

The predicted stopping distance for a car going 20 mph is

m * 20 + b
## cars$speed 
##   61.06908

The RMS error for the regression line is

rms = sd(cars$dist) * sqrt(1 - r**2)
rms
## [1] 15.22184

So, if a car is traveling at 20 mph, its stopping distance is about 61.07 ft, give or take about 15.22 ft.

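Working the same prediction by hand with the rounded coefficients (Intercept = -17.58, Slope = 3.93), the equation is as follows: y = -17.58 + 3.93(20) = -17.58 + 78.60 = 61.02 ft

So if the car is traveling at 20 mph, its stopping distance would be about 61.02 ft, give or take 15.38 ft (the residual standard error from the summary output). The small difference from 61.07 ft comes from rounding the coefficients.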

Repeat the calculations above to predict the stopping distance of a car traveling at 25 mph.

The equation is as follows: y = -17.58 + 3.93(25) = -17.58 + 98.25 = 80.67 ft

So if the car is traveling at 25 mph, its stopping distance would be about 80.7 ft, give or take 15.38 ft.
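The two predictions can also be obtained with `predict()`. This sketch assumes the model is refit with the formula `dist ~ speed` and `data = cars`, because `predict()` matches the `newdata` columns by variable name, and a formula written as `cars$dist ~ cars$speed` does not use them. It also shows how the two "give or take" figures (15.22 and 15.38) relate. The names `fit`, `new_speeds`, `rms`, and `rse` are illustrative.

```r
# Refit using column names so that predict() can use newdata.
fit <- lm(dist ~ speed, data = cars)

# Predicted stopping distances at 20 mph and 25 mph.
new_speeds <- data.frame(speed = c(20, 25))
pred <- predict(fit, newdata = new_speeds)
pred   # roughly 61.07 and 80.73 ft with unrounded coefficients

# Two versions of the typical prediction error.
r <- cor(cars$speed, cars$dist)
n <- nrow(cars)
rms <- sd(cars$dist) * sqrt(1 - r^2)   # about 15.22
rse <- summary(fit)$sigma              # about 15.38, "Residual standard error"

# They differ only in the divisor: n - 1 inside sd() versus n - 2 in lm().
rms * sqrt((n - 1) / (n - 2))          # matches rse
```

Either figure works as a rough "give or take" value here; with 50 observations the difference is small.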

Comment on the accuracy of using the regression line to predict the stopping distance of a car traveling at 75 mph.

The original data only cover speeds from 4 mph to 25 mph. 75 mph is far outside that range, so the prediction would be an extrapolation well beyond the data. Also, the linear relationship may not hold at higher speeds because the physics changes: braking systems, tire grip, and overall wear and tear on the car.

The prediction written as an equation: y = -17.58 + 3.93(75) = -17.58 + 294.75 = 277.17 ft

But this is unreliable, because no car in the data was tested at that speed. The negative intercept also makes predictions at very low speeds nonsensical (a negative stopping distance). So the regression line cannot be trusted to give an accurate prediction for 75 mph.
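One way to make the extrapolation concern concrete is to compare the requested speeds with the range of speeds in the data. In this sketch the 2 mph query is a hypothetical value chosen only to show the negative-prediction problem; `fit`, `query`, and `within_data` are illustrative names.

```r
fit <- lm(dist ~ speed, data = cars)

# Range of speeds actually observed in the data.
speed_range <- range(cars$speed)
speed_range

# A very low speed (hypothetical) and the 75 mph case from the question.
query <- data.frame(speed = c(2, 75))
pred <- predict(fit, newdata = query)

# Flag whether each query falls inside the observed range.
within_data <- query$speed >= speed_range[1] & query$speed <= speed_range[2]

data.frame(query, predicted_dist = pred, within_data = within_data)
```

R returns a number for both queries without complaint, so the check on `within_data` is what signals that neither prediction is supported by the data.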

Tree Volume

The trees data set in R lists tree heights (ft), diameters (in, stored in the Girth column) and volumes (cubic ft).

plot(trees$Girth, trees$Volume, xlab="Girth (in)", ylab="Volume (cubic ft)", col='green')

plot(trees$Height, trees$Volume, xlab="Height (ft)", ylab="Volume (cubic ft)", col='blue')

Based on the plots, which variable do you think would be better at predicting tree volume?

Girth would be better at predicting tree volume. The girth vs. volume graph shows a strong positive relationship, with volume increasing consistently as girth increases.

The correlations of volume with the other variables are:

cor(trees$Girth, trees$Volume)
## [1] 0.9671194
cor(trees$Height, trees$Volume)
## [1] 0.5982497

Comment on the sign and magnitude of the two correlations.

Volume vs. girth: The correlation is positive (+), because volume increases as girth increases.

The magnitude is strong, because the points lie close together and rise steadily.

As stated previously, the correlation is +0.9671194.

Volume vs. height: The correlation is again positive, because taller trees tend to have higher volume.

The magnitude is moderate: the points are more scattered, but there is still a visible upward trend.

The correlation coefficient is +0.5982497, or about +0.6.
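Both correlations can be pulled from a single correlation matrix, and squaring them gives a rough sense of how much of the variation in volume each predictor accounts for on its own. This is a small sketch; `cors` is an illustrative name.

```r
# Correlation of Volume with each of the other two variables.
cors <- cor(trees)["Volume", c("Girth", "Height")]
round(cors, 3)

# Squared correlations: share of the variation in Volume explained
# by a straight-line fit on each predictor alone.
round(cors^2, 3)
```

The much larger squared correlation for girth supports choosing it as the predictor.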

Fill in the blanks to find the regression line for predicting volume from one of the other variables. Also remove the # symbols.

To predict the volume, I built a regression line for volume from one of the other variables. I used girth because it has the stronger correlation with volume, which makes it the better predictor. I entered the following in R.

model = lm(trees$Volume ~ trees$Girth)
plot(trees$Girth, trees$Volume, xlab="Girth (inches)", ylab="Volume (cubic ft)", col='blue')
abline(model, col='green')

This produced a new graph with the regression line drawn on it.

The following output gives information about the regression line.

summary(model)

Use the output to write down the equation for the regression line.

First I entered a few commands into R.

To load the data, I entered data(trees)

Then, to fit the model, I entered model = lm(trees$Volume ~ trees$Girth)

Then, to view the results, I entered summary(model)

This gave me a lot of information, as follows:

## 
## Call:
## lm(formula = trees$Volume ~ trees$Girth)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.065 -3.107  0.152  3.495  9.587 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -36.9435     3.3651  -10.98 7.62e-12 ***
## trees$Girth   5.0659     0.2474   20.48  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.252 on 29 degrees of freedom
## Multiple R-squared:  0.9353, Adjusted R-squared:  0.9331 
## F-statistic: 419.4 on 1 and 29 DF,  p-value: < 2.2e-16

For this question, the numbers I needed were the following.

From the Coefficients table:

Intercept = −36.9435 Slope (Girth) = 5.0659

So the equation would look as follows

Volume = −36.94 + 5.07 × Girth
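As a usage sketch, the fitted equation can be applied to a new tree. The 12-inch girth below is a hypothetical value, chosen only because it lies inside the observed girths; `fit_trees` and `girth_example` are illustrative names, and the model is refit with `data = trees` so that `predict()` can use `newdata`.

```r
fit_trees <- lm(Volume ~ Girth, data = trees)
coef(fit_trees)        # intercept and slope, as in the Coefficients table

# Observed girths, to confirm the example is not an extrapolation.
range(trees$Girth)

# Hypothetical tree with a 12-inch girth.
girth_example <- 12
pred_full <- predict(fit_trees, newdata = data.frame(Girth = girth_example))

# Same prediction by hand with the rounded equation.
pred_rounded <- -36.94 + 5.07 * girth_example

c(full = unname(pred_full), rounded = pred_rounded)
```

The two results differ slightly because the hand calculation uses coefficients rounded to two decimal places.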