Take an existing data set in R and work through it with the functions described in the PowerPoint deck. All you need are two numeric variables. Use one to guess the other. If you are unsure which two to use in any given data frame, create scatter plots. Submit an Rmd script with a description of the tests that you are applying and why you are applying them.

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Calculate the correlation between speed and distance

cor(cars$speed, cars$dist)
## [1] 0.8068949
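
A correlation coefficient alone does not say whether the association could plausibly be due to chance. One way to check is a significance test on the correlation. The following is a minimal sketch, assuming Pearson's correlation is the appropriate measure here; the name `ct` is introduced only for illustration.

```r
# Test the null hypothesis that the true correlation between
# speed and dist is zero (Pearson's product-moment correlation).
ct <- cor.test(cars$speed, cars$dist, method = "pearson")

ct$estimate   # sample correlation, the same value cor() returns
ct$p.value    # two-sided p-value for H0: correlation = 0
ct$conf.int   # 95% confidence interval for the correlation
```

A small p-value together with a confidence interval that excludes zero would support treating the relationship as real rather than as noise.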

Make a scatter plot to visualize the relationship between the two variables

scatter.smooth(x = cars$speed, y = cars$dist, main = "Speed vs Distance")
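
`scatter.smooth()` draws a loess curve through the points. To judge whether a straight line is a reasonable description, it can help to draw a least-squares line and a smoother on the same plot and see how far apart they are. This is only a sketch; the names `lw` and `line_fit`, the colours, and the axis labels are illustrative choices.

```r
# Scatter plot of the raw data
plot(cars$speed, cars$dist,
     xlab = "speed", ylab = "dist", main = "Speed vs Distance")

# Locally weighted smoother (dashed red): follows the shape of the data
lw <- lowess(cars$speed, cars$dist)
lines(lw, col = "red", lty = 2)

# Least-squares straight line (solid blue), fitted only for this plot
line_fit <- lm(dist ~ speed, data = cars)
abline(line_fit, col = "blue")
```

If the two curves stay close over the range of the data, a linear model is a plausible starting point.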

Fit a linear model that predicts speed from distance

LModel <- lm(speed ~ dist, data = cars)
print(LModel)
## 
## Call:
## lm(formula = speed ~ dist, data = cars)
## 
## Coefficients:
## (Intercept)         dist  
##      8.2839       0.1656
summary(LModel)
## 
## Call:
## lm(formula = speed ~ dist, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.5293 -2.1550  0.3615  2.4377  6.4179 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.28391    0.87438   9.474 1.44e-12 ***
## dist         0.16557    0.01749   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.156 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
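
Two points in this summary can be checked directly. In a simple linear regression with one predictor, the multiple R-squared equals the squared correlation between the two variables, and the fitted model can be used to guess speed from new distances. The sketch below refits the same model under the illustrative name `fit`; the distances 20, 50 and 100 are arbitrary example values, not taken from the assignment.

```r
# Refit the same model so this block runs on its own
fit <- lm(speed ~ dist, data = cars)

# R-squared from the model versus the squared correlation
r <- cor(cars$speed, cars$dist)
r_squared <- summary(fit)$r.squared
all.equal(r^2, r_squared)   # the two agree for a one-predictor model

# Use distance to guess speed, with 95% prediction intervals
new_obs <- data.frame(dist = c(20, 50, 100))   # arbitrary example distances
pred <- predict(fit, newdata = new_obs, interval = "prediction")
pred
```

The `lwr` and `upr` columns give a range for an individual new observation, which is wider than a confidence interval for the mean.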

Build a histogram of the residuals

Rsdhist <- residuals(LModel)
summary(Rsdhist)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -7.5293 -2.1550  0.3615  0.0000  2.4377  6.4179
hist(Rsdhist)
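
A histogram gives only a rough impression of whether the residuals are normally distributed. A normal Q-Q plot and a Shapiro-Wilk test are common complements. The following is a sketch, assuming approximate normality of the residuals is the property of interest; `fit`, `res` and `sw` are illustrative names.

```r
# Refit the same model so this block runs on its own
fit <- lm(speed ~ dist, data = cars)
res <- residuals(fit)

# Shapiro-Wilk test: H0 is that the residuals come from a normal distribution
sw <- shapiro.test(res)
sw$p.value

# Normal Q-Q plot: points close to the reference line suggest normality
qqnorm(res)
qqline(res)
```

A large p-value means the test finds no evidence against normality; it does not prove the residuals are normal, so the plot is still worth reading.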