Take an existing data set in R and work through it with the functions described in the PowerPoint deck. All you need are two numeric variables: use one to predict the other. If you are unsure which two to use in a given data frame, create scatter plots first. Submit an Rmd script that describes the tests you are applying and why you are applying them.
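If a data frame has many numeric columns, one way to follow the "create scatter plots" advice is to draw a scatter-plot matrix and also rank column pairs by absolute correlation. The sketch below assumes `mtcars` purely as an example of a wider data frame, and `strongest_pair()` is a hypothetical helper written for illustration, not a base R function.

```r
# Hypothetical helper: return the names of the two numeric columns
# with the largest absolute Pearson correlation.
strongest_pair <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]   # keep numeric columns only
  r <- cor(num, use = "pairwise.complete.obs")    # correlation matrix
  r[upper.tri(r, diag = TRUE)] <- NA              # keep each pair once, drop the diagonal
  idx <- which(abs(r) == max(abs(r), na.rm = TRUE), arr.ind = TRUE)[1, ]
  c(rownames(r)[idx[1]], colnames(r)[idx[2]])
}

# Visual screen: scatter-plot matrix of a few mtcars columns (example data only)
pairs(mtcars[, c("mpg", "wt", "hp", "disp")])

strongest_pair(mtcars)
strongest_pair(cars)   # cars has only two numeric columns: speed and dist
```

The correlation ranking is only a shortcut; the scatter plots are still worth inspecting, because a strong non-linear pattern can have a modest Pearson correlation.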
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
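Before modelling, it can help to confirm the structure of the data. According to the `cars` help page, `speed` is recorded in mph and `dist` is the stopping distance in ft. The checks below use only base R and complement `summary()` with the standard deviation of each column.

```r
# Quick sanity checks on cars before modelling (base R only)
str(cars)          # data frame of 50 observations on 2 numeric variables
anyNA(cars)        # FALSE means there are no missing values to handle
sapply(cars, sd)   # spread of each variable, complementing summary()
```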
Calculate the Pearson correlation between speed and stopping distance:
cor(cars$speed, cars$dist)
## [1] 0.8068949
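Since the assignment asks for a description of the tests being applied, a natural follow-up to `cor()` is a significance test of the correlation. The sketch below first recomputes r from its definition, r = cov(x, y) / (sd(x) · sd(y)), and then runs `cor.test()`, which tests H0: ρ = 0 (Pearson, two-sided by default).

```r
x <- cars$speed
y <- cars$dist

# Pearson correlation from its definition
r_manual <- cov(x, y) / (sd(x) * sd(y))
all.equal(r_manual, cor(x, y))   # should be TRUE

# Test H0: rho = 0 against a two-sided alternative
ct <- cor.test(x, y)
ct$estimate    # same r as cor()
ct$statistic   # t = r * sqrt(n - 2) / sqrt(1 - r^2)
ct$p.value
ct$conf.int    # 95% confidence interval for rho
```

For a simple linear regression, this t statistic is the same as the t value of the slope in the regression summary (9.464 here), so both tests lead to the same conclusion.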
Make a scatter plot to visualize the relationship between the two variables:
scatter.smooth(x=cars$speed, y=cars$dist, main = "Speed vs Distance")
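`scatter.smooth()` puts speed on the x-axis, while the model below predicts speed from distance. A possible alternative view, assuming you want the plot to match the model's orientation, puts `dist` on the x-axis and overlays the least-squares line next to a `lowess()` smooth (similar in spirit to the loess curve drawn by `scatter.smooth()`).

```r
# Fit in the same direction as the model below: speed predicted from dist
fit <- lm(speed ~ dist, data = cars)

plot(speed ~ dist, data = cars,
     xlab = "Stopping distance (ft)", ylab = "Speed (mph)",
     main = "Speed vs Distance with least-squares line")
abline(fit, col = "red")                                     # straight-line fit
lines(lowess(cars$dist, cars$speed), col = "blue", lty = 2)  # smooth curve for comparison
```

If the dashed smooth bends well away from the straight line, a linear model may be a poor description of the relationship.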
Fit a linear model that predicts speed from stopping distance (speed ~ dist):
LModel <- lm(speed ~ dist, data = cars)
print(LModel)
##
## Call:
## lm(formula = speed ~ dist, data = cars)
##
## Coefficients:
## (Intercept) dist
## 8.2839 0.1656
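The fitted equation is speed ≈ 8.2839 + 0.1656 × dist, which is how the model "guesses" one variable from the other. The sketch below uses `predict()` on a few example distances (arbitrary inputs chosen for illustration, inside the observed range of 2 to 120 ft), checks the result against the equation by hand, and adds 95% prediction intervals. The model is re-fitted at the top so the chunk runs on its own.

```r
# Re-fit so this chunk is self-contained
LModel <- lm(speed ~ dist, data = cars)

# Example stopping distances in ft (illustrative values only)
new_dist <- data.frame(dist = c(20, 50, 80))
pred <- predict(LModel, newdata = new_dist)
pred   # roughly 11.6, 16.6 and 21.5 mph

# Same numbers by hand: intercept + slope * dist
hand <- coef(LModel)[["(Intercept)"]] + coef(LModel)[["dist"]] * new_dist$dist
hand

# 95% prediction intervals for an individual car at each distance
predict(LModel, newdata = new_dist, interval = "prediction", level = 0.95)
```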
summary(LModel)
##
## Call:
## lm(formula = speed ~ dist, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.5293 -2.1550 0.3615 2.4377 6.4179
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.28391 0.87438 9.474 1.44e-12 ***
## dist 0.16557 0.01749 9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.156 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
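The main numbers in this summary can be reproduced from each other, which helps explain what the tests are. The t value is the estimate divided by its standard error, the p-value comes from a t distribution with n − 2 = 48 degrees of freedom, and for a simple regression R² equals the squared correlation (0.8069² ≈ 0.6511) while F equals the squared slope t (9.464² ≈ 89.57). A sketch:

```r
LModel <- lm(speed ~ dist, data = cars)
s <- summary(LModel)
coefs <- coef(s)   # columns: Estimate, Std. Error, t value, Pr(>|t|)

# t value for the slope = Estimate / Std. Error
t_slope <- coefs["dist", "Estimate"] / coefs["dist", "Std. Error"]

# Two-sided p-value from the t distribution with residual df (48)
p_slope <- 2 * pt(abs(t_slope), df = df.residual(LModel), lower.tail = FALSE)

# Simple regression identities: R^2 = r^2 and F = t^2
c(r2 = s$r.squared, r_squared = cor(cars$speed, cars$dist)^2)
c(F = unname(s$fstatistic["value"]), t_squared = t_slope^2)

# 95% confidence intervals for the intercept and slope
confint(LModel, level = 0.95)
```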
Build the histogram of the residuals
Rsdhist <- residuals(LModel)
summary(Rsdhist)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -7.5293 -2.1550 0.3615 0.0000 2.4377 6.4179
hist(Rsdhist)
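A histogram gives a rough picture of the residuals; the usual follow-up checks are a normal Q-Q plot, a Shapiro-Wilk test of normality, and a residuals-versus-fitted plot for constant variance. The sketch below uses only base R (`stats` and `graphics`) and re-fits the model so it runs on its own.

```r
LModel <- lm(speed ~ dist, data = cars)
res <- residuals(LModel)

# Least squares with an intercept makes the residuals sum to (numerically) zero,
# which is why the residual summary reports a mean of 0.0000
mean(res)

# Visual normality check: points close to the line suggest roughly normal residuals
qqnorm(res)
qqline(res)

# Formal test, H0: the residuals come from a normal distribution
shapiro.test(res)

# Constant-variance check: look for a funnel shape or curvature around zero
plot(fitted(LModel), res, xlab = "Fitted speed (mph)", ylab = "Residual")
abline(h = 0, lty = 2)
```

A small Shapiro-Wilk p-value would suggest non-normal residuals, which mainly affects the reliability of the t and F tests rather than the fitted line itself.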