Exercises

These exercises accompany the Regression and Data Transformations tutorial.

  1. Using the chicago_air dataset in the region5air package, convert the data in the ozone column from ppm to ppb (multiply by 1000). Create a scatterplot of solar radiation (x-axis) and ozone (y-axis), label the x and y axes, and give the graph an appropriate title.
  2. Create a linear model of this data called my.mod and print a summary of the model parameters to the console.
  3. Using abline() and your linear model (my.mod), add a regression line to the scatterplot from Question 1.
  4. Using ggplot2 this time, create a scatterplot of ozone and solar radiation, then add a regression line with its confidence interval using stat_smooth().
  5. Do a cubic transform (x^3) on the solar radiation data and save it as the variable solar.exp. Create a normal Q-Q plot of the data. Create a kernel density plot of the data. Conduct a Shapiro-Wilk normality test.


Solutions


Solution 1

library(region5air)
data(chicago_air)
# Convert ozone from ppm to ppb
chicago_air$ozone <- chicago_air$ozone * 1000
# Solar radiation on the x-axis, ozone on the y-axis
plot(chicago_air$solar, chicago_air$ozone,
     xlab = "Solar Radiation (W/m2)", ylab = "Ozone (ppb)",
     main = "Solar Radiation vs Ozone")
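
One caveat with the conversion above: it overwrites chicago_air$ozone in place, so running the line a second time multiplies the values by 1000 again. The sketch below shows one possible way to avoid that. It uses a small made-up data frame (air) and an illustrative column name (ozone_ppb), neither of which is part of the tutorial.

```r
# Hypothetical stand-in for chicago_air: three ozone readings in ppm.
air <- data.frame(ozone = c(0.032, 0.045, 0.061))

# Store the converted values in a new column (ozone_ppb is an illustrative
# name), so re-running the script cannot multiply the data by 1000 twice.
air$ozone_ppb <- air$ozone * 1000

air$ozone_ppb
```

The original ppm column stays untouched, so the conversion can be re-run safely.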

Solution 2

my.mod <- lm(chicago_air$ozone ~ chicago_air$solar)
my.mod
## 
## Call:
## lm(formula = chicago_air$ozone ~ chicago_air$solar)
## 
## Coefficients:
##       (Intercept)  chicago_air$solar  
##             16.74              23.37
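
Printing my.mod shows only the call and the coefficients. A fuller summary (standard errors, t-tests, R-squared) comes from summary(). The sketch below illustrates this on R's built-in airquality data, used here as an assumed stand-in because region5air may not be installed. The names fit and fit_summary are illustrative, and the numbers it prints will differ from the chicago_air results above.

```r
# Assumption: the built-in airquality data stands in for chicago_air.
# Ozone is already in ppb; Solar.R is solar radiation in Langleys.
fit <- lm(Ozone ~ Solar.R, data = airquality)

# summary() adds standard errors, t-tests and R-squared to the coefficients.
fit_summary <- summary(fit)
print(fit_summary)

# Individual pieces can be pulled out by name.
coef(fit)               # intercept and slope
fit_summary$r.squared   # proportion of variance explained
```

With the tutorial data, the equivalent call would be summary(my.mod).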

Solution 3

plot(chicago_air$solar, chicago_air$ozone,
     xlab = "Solar Radiation (W/m2)", ylab = "Ozone (ppb)")
abline(my.mod, col = "red")
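
When given a simple linear model, abline() takes the intercept and slope from the fitted coefficients and draws y = intercept + slope * x. The sketch below checks that relationship numerically. It again assumes the built-in airquality data as a stand-in for chicago_air, and x_new holds two arbitrary illustrative x values.

```r
# Assumption: built-in airquality data as a stand-in for chicago_air.
fit <- lm(Ozone ~ Solar.R, data = airquality)

# abline(fit) draws the line y = intercept + slope * x.
intercept <- coef(fit)[["(Intercept)"]]
slope     <- coef(fit)[["Solar.R"]]

# Heights of that line at two illustrative x values, computed by hand...
x_new   <- c(100, 250)
by_hand <- intercept + slope * x_new

# ...match what predict() returns for the same x values.
by_predict <- predict(fit, newdata = data.frame(Solar.R = x_new))
all.equal(by_hand, unname(by_predict))
```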

Solution 4

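library(ggplot2)
p <- ggplot(chicago_air, aes(x = solar, y = ozone)) + geom_point()
print(p)

# method = "lm" fits a straight line; the confidence band is drawn by default (se = TRUE).
# linewidth replaces size for lines in ggplot2 3.4.0 and later.
p + stat_smooth(method = "lm", formula = y ~ x, linewidth = 1)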

Solution 5

qqnorm(chicago_air$solar)

solar.exp <- chicago_air$solar^3
qqnorm(solar.exp)

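# na.rm = TRUE guards against missing values, which density() otherwise rejects
d <- density(solar.exp, na.rm = TRUE)
plot(d, main = "Density plot of solar radiation with cubic transform")
polygon(d, col = "red")  # fill the area under the density curve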

shapiro.test(solar.exp)
## 
##  Shapiro-Wilk normality test
## 
## data:  solar.exp
## W = 0.88323, p-value = 4.946e-16
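
The null hypothesis of the Shapiro-Wilk test is that the data are normally distributed. The p-value above is far below 0.05, so normality is rejected for the cubed solar radiation data. The sketch below shows one way to compare the raw and transformed data side by side. It assumes the built-in airquality data as a stand-in for chicago_air, and the names solar, alpha and results are illustrative.

```r
# Assumption: built-in airquality data as a stand-in for chicago_air.
solar <- airquality$Solar.R[!is.na(airquality$Solar.R)]

raw_test   <- shapiro.test(solar)
cubed_test <- shapiro.test(solar^3)

# W near 1 is consistent with normality; a p-value below the chosen
# significance level (0.05 here) rejects the hypothesis of normality.
alpha    <- 0.05
p_values <- c(raw_test$p.value, cubed_test$p.value)
results  <- data.frame(
  data             = c("raw", "cubed"),
  W                = unname(c(raw_test$statistic, cubed_test$statistic)),
  p_value          = p_values,
  reject_normality = p_values < alpha
)
print(results)
```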