Exercises

These exercises accompany the Quality Assurance and Common Pitfalls tutorial.

  1. Using the chicago_air dataset in the region5air library, replace the NAs in the ozone column with linearly interpolated values. Create a time-series plot of the new data.
  2. Using the ‘so2_data.csv’ file on your thumb drive, import the data and replace the missing data code (-999) with the most recent non-NA value before it (last observation carried forward).
  3. Create a boxplot of the so2 data and access any outlier data. Print the outlier data to the console.
  4. Type data(state) into the command line to load the state datasets, then type state.division to view the data. Using state.division, create abbreviations for each of the state divisions. Save the result to a variable called division_abb.
  5. What is wrong with the following code?
  x <- sum(rep(seq(1,5,1),25)


Solutions


Solution 1

library(region5air)
data(chicago_air)

library(zoo)
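# na.rm = FALSE keeps any leading/trailing NAs so the result matches the column length
chicago_air$ozone <- na.approx(chicago_air$ozone, na.rm = FALSE)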
plot(as.Date(chicago_air$date), chicago_air$ozone, type="l", xlab='Date', ylab='Ozone (ppm)')
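
To see what linear interpolation does to a gap, here is a minimal sketch that uses base R's approx() in place of zoo's na.approx(). The vector x and its values are made up for illustration and are not taken from chicago_air.

```r
# Hypothetical toy vector with interior gaps (not real ozone data)
x <- c(0.030, NA, 0.040, NA, NA, 0.070)
idx <- seq_along(x)

# Interpolate along the index, using only the observed points as knots
filled <- approx(x = idx[!is.na(x)], y = x[!is.na(x)], xout = idx)$y

print(filled)
```

Each NA is replaced by a point on the straight line between its nearest observed neighbours, so the single gap becomes 0.035 and the double gap becomes 0.050 and 0.060.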

Solution 2

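library(zoo)  # provides na.locf

# Option 1: treat -999 as NA while reading the file
so2 <- read.csv("E:/RIntro/datasets/so2_data.csv", na.strings = '-999')

# OR

# Option 2: read the file as-is, then recode -999 in the SO2 column only
so2 <- read.csv("E:/RIntro/datasets/so2_data.csv")
so2$SO2[which(so2$SO2 == -999)] <- NA

# Carry the last non-NA value forward; na.rm = FALSE keeps any leading NAs
so2_replace <- na.locf(so2$SO2, na.rm = FALSE)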
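
For readers who want to see the carry-forward logic spelled out, the following is a rough base R sketch of the same idea, not the zoo implementation. The vector x and its values are hypothetical.

```r
# Hypothetical toy readings that use -999 as the missing data code
x <- c(-999, 0.004, -999, -999, 0.006, -999)
x[x == -999] <- NA

# For each position, find the index of the most recent non-NA value
last_seen <- cummax(seq_along(x) * !is.na(x))
last_seen[last_seen == 0] <- NA  # nothing to carry forward yet

filled <- x[last_seen]
print(filled)
```

The leading NA stays NA because there is no earlier observation to carry forward. Every other gap takes the value just before it.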

Solution 3

# boxplot() ignores NAs on its own and invisibly returns the statistics it plots
box.stat <- boxplot(so2_replace)

box.stat$out
## [1] 0.009 0.013 0.010 0.012 0.011 0.009 0.009
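
The outliers can also be computed without drawing a plot. As a small sketch on made-up numbers, boxplot.stats() returns the same out component, flagging points that lie more than 1.5 times the hinge spread beyond the hinges.

```r
# Hypothetical toy data with one obviously extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

# coef = 1.5 is the default whisker multiplier
outliers <- boxplot.stats(x, coef = 1.5)$out
print(outliers)
## [1] 50
```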

Solution 4

data(state)
state.division
##  [1] East South Central Pacific            Mountain          
##  [4] West South Central Pacific            Mountain          
##  [7] New England        South Atlantic     South Atlantic    
## [10] South Atlantic     Pacific            Mountain          
## [13] East North Central East North Central West North Central
## [16] West North Central East South Central West South Central
## [19] New England        South Atlantic     New England       
## [22] East North Central West North Central East South Central
## [25] West North Central Mountain           West North Central
## [28] Mountain           New England        Middle Atlantic   
## [31] Mountain           Middle Atlantic    South Atlantic    
## [34] West North Central East North Central West South Central
## [37] Pacific            Middle Atlantic    New England       
## [40] South Atlantic     West North Central East South Central
## [43] West South Central Mountain           New England       
## [46] South Atlantic     Pacific            South Atlantic    
## [49] East North Central Mountain          
## 9 Levels: New England Middle Atlantic ... Pacific
division_abb <- abbreviate(state.division)
head(division_abb)
## East South Central            Pacific           Mountain 
##             "EsSC"             "Pcfc"             "Mntn" 
## West South Central            Pacific           Mountain 
##             "WsSC"             "Pcfc"             "Mntn"

Solution 5

# A closing parenthesis is missing. The corrected expression is:
  x <- sum(rep(seq(1,5,1),25))
  x
## [1] 375
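
One way to catch this kind of mistake without running the code is to ask R to parse it. The helper parses_ok below is a hypothetical name introduced for this sketch. It returns FALSE when the text is not a complete expression.

```r
# Hypothetical helper: TRUE if the code string parses, FALSE otherwise
parses_ok <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

bad  <- "x <- sum(rep(seq(1,5,1),25)"   # missing closing parenthesis
good <- "x <- sum(rep(seq(1,5,1),25))"

print(parses_ok(bad))
print(parses_ok(good))

# Sanity check on the result: 1 + 2 + 3 + 4 + 5 = 15, repeated 25 times
total <- sum(1:5) * 25
print(total)
```

The first call prints FALSE and the second prints TRUE. The total of 375 matches the value of x above.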