Exercise

  1. Take an existing data set in R and work through with the functions described in the powerpoint deck.
  2. All you need are two numeric variables. Use one to guess the other.
  3. If you are unsure which two to use in any given data frame, create scatter plots
  4. Submit an Rmd script, with description of the tests that you are applying and why are you applying them
summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Ploting regression lines of varibales in cars data set

plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     las = 1)
lines(lowess(cars$speed, cars$dist, f = 2/3, iter = 3), col = "red")
title(main = "Stopping distances in function of Speed - Cars Dataset")

Scalling data using log function

plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     las = 1, log = "xy")
title(main = "cars data (logarithmic scales)")
lines(lowess(cars$speed, cars$dist, f = 2/3, iter = 3), col = "red")

summary(fm1 <- lm(log(dist) ~ log(speed), data = cars))
## 
## Call:
## lm(formula = log(dist) ~ log(speed), data = cars)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.00215 -0.24578 -0.02898  0.20717  0.88289 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.7297     0.3758  -1.941   0.0581 .  
## log(speed)    1.6024     0.1395  11.484 2.26e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4053 on 48 degrees of freedom
## Multiple R-squared:  0.7331, Adjusted R-squared:  0.7276 
## F-statistic: 131.9 on 1 and 48 DF,  p-value: 2.259e-15

Polynomial regression

## An example of polynomial regression
plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
    las = 1, xlim = c(0, 25))
d <- seq(0, 25, length.out = 200)
for(degree in 1:4) {
  fm <- lm(dist ~ poly(speed, degree), data = cars)
  assign(paste("cars", degree, sep = "."), fm)
  lines(d, predict(fm, data.frame(speed = d)), col = degree)
}