Answer:
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.5.3
## -- Attaching packages ---------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0 v purrr 0.2.5
## v tibble 2.0.1 v dplyr 0.7.8
## v tidyr 0.8.2 v stringr 1.3.1
## v readr 1.3.1 v forcats 0.3.0
## Warning: package 'ggplot2' was built under R version 3.5.3
## -- Conflicts ------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
glimpse(cars)
## Observations: 50
## Variables: 2
## $ speed <dbl> 4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13...
## $ dist <dbl> 2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28...
There are 50 observations and 2 variables. Both speed and dist are stored as doubles (numeric).
Summary Statistics
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
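Both variables lie within a plausible range. By the 1.5 × IQR rule, speed has no outliers, but the maximum dist of 120 lies above the upper fence of 56 + 1.5 × 30 = 101, so it is a potential outlier.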
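The five-number summary alone does not settle whether outliers are present. A minimal sketch of the 1.5 × IQR rule in base R follows; the helper name `iqr_outliers` is hypothetical and introduced only for illustration.

```r
# Hypothetical helper: return the values of x outside the 1.5 * IQR fences.
iqr_outliers <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75)))  # first and third quartiles
  iqr <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr                # lower fence
  upper <- q[2] + 1.5 * iqr                # upper fence
  x[x < lower | x > upper]
}

# Apply the rule to each column of the built-in cars data set.
speed_out <- iqr_outliers(cars$speed)
dist_out <- iqr_outliers(cars$dist)
print(speed_out)
print(dist_out)
```

A boxplot, for example `boxplot(cars)`, uses a closely related fence rule and gives the same check visually.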
cor(cars)
## speed dist
## speed 1.0000000 0.8068949
## dist 0.8068949 1.0000000
plot(cars$speed,cars$dist,type='p',main="Speed VS Distance")
The correlation between speed and distance is strong and positive (r ≈ 0.81).
We fitted a linear model of distance as a function of speed. Here dist is the dependent (response) variable and speed is the independent (predictor) variable.
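To complement the correlation matrix and the scatter plot, the correlation can also be tested formally. This is a minimal sketch using base R's `cor.test`, which computes the Pearson correlation by default; the object name `ct` is only illustrative.

```r
# Pearson correlation test between speed and stopping distance.
ct <- cor.test(cars$speed, cars$dist)

ct$estimate   # sample correlation coefficient
ct$conf.int   # 95% confidence interval for the correlation
ct$p.value    # p-value for H0: true correlation is 0
```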
fit <- lm(dist ~ speed, data = cars)
summary(fit)
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.069 -9.525 -2.272 9.215 43.201
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791 6.7584 -2.601 0.0123 *
## speed 3.9324 0.4155 9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
Fitted regression model: \(\widehat{distance}_{i} = -17.5791 + 3.9324 \times speed_{i}\)
For each 1-unit increase in speed, the predicted distance increases by 3.9324 units.
Model Summary
1. The model explains 65.11% of the variability in distance (Multiple R-squared = 0.6511).
2. Speed is a significant predictor of distance at the 5% level of significance, since its p-value (1.49e-12) is less than 0.05.
3. The overall model is statistically significant at the 5% level, since the p-value of the F-statistic (1.49e-12) is less than 0.05.
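The quantities quoted above can be pulled directly from the fitted object rather than read off the printed summary. The sketch below refits the same model so that it runs on its own. The speed value of 20 used for the prediction is an arbitrary illustration, not a value taken from the analysis.

```r
# Refit the model so this block is self-contained.
fit <- lm(dist ~ speed, data = cars)

coefs <- coef(fit)                # intercept and slope
ci <- confint(fit, level = 0.95)  # 95% confidence intervals for both
r2 <- summary(fit)$r.squared      # proportion of variance explained

# Predicted distance at an illustrative speed of 20 (assumed value),
# with a 95% confidence interval for the mean response.
new_speed <- data.frame(speed = 20)
pred <- predict(fit, newdata = new_speed, interval = "confidence")

print(coefs)
print(ci)
print(r2)
print(pred)
```

The point prediction is simply the fitted equation evaluated at the chosen speed, that is, intercept + slope × 20.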
par(mfrow=c(2,2))
plot(fit)
From the residuals vs. fitted values plot, we can see that there is no clear pattern, so the residuals appear random and the constant-variance (homoscedasticity) assumption appears to be satisfied.
From the normal Q-Q plot, we can see that the residuals are approximately normally distributed.
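Reading diagnostic plots is subjective, so it can help to pair them with formal tests. The sketch below is one possible check and was not part of the original analysis. It uses the Shapiro-Wilk test for normality of the residuals, and a Breusch-Pagan-style auxiliary regression (the studentized form, n × R² of squared residuals on the predictor) for constant variance. All object names are illustrative.

```r
# Refit the model so this block is self-contained.
fit <- lm(dist ~ speed, data = cars)
res <- residuals(fit)

# Normality: Shapiro-Wilk test on the residuals.
# A small p-value suggests departure from normality.
sw <- shapiro.test(res)

# Constant variance: regress squared residuals on the predictor.
# Under homoscedasticity, n * R^2 is approximately chi-squared with 1 df.
sq_res <- res^2
aux <- lm(sq_res ~ cars$speed)
bp_stat <- nrow(cars) * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

print(sw)
print(c(statistic = bp_stat, p.value = bp_p))
```

With only 50 observations, these tests are best read alongside the plots rather than in place of them.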
From the overall analysis, we can say that speed is a good predictor of distance and that the model fits well, since the assumptions of the linear regression model appear to be satisfied.