Use the built-in dataset state.x77.

  1. What is the class of the object (use the command class to find out)? Convert it into a data frame and call it ‘s77’ (find out the command to convert an object into a data frame).
class(state.x77)
## [1] "matrix"
s77 <- as.data.frame(state.x77)
class(s77)
## [1] "data.frame"
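
The printed class depends on the R version: from R 4.0.0 onward a matrix also inherits from "array", so class(state.x77) returns two elements there. As a hedged, version-independent alternative, the type can be checked with is.matrix() and is.data.frame():

```r
# Version-independent checks of the object type
is.matrix(state.x77)             # TRUE: state.x77 is stored as a numeric matrix

s77 <- as.data.frame(state.x77)  # same conversion as above
is.data.frame(s77)               # TRUE after the conversion

dim(s77)                         # rows are states, columns are variables
```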
  1. Create a subset of s77 containing the per capita income of the states that have fewer than 40 days with a minimum temperature below freezing point.
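s77[s77$Frost < 40, "Income", drop = FALSE] # drop = FALSE keeps a data frame, so the state names are preserved.
subset(x = s77, subset = Frost < 40, select = Income) # alternative coding using the 'subset' function.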
  1. Create another subset with the states that have less than 40 days with minimum temperature below freezing point and a life expectancy greater than 71 years.
s77[s77$Frost < 40 & s77$`Life Exp` > 71, ] # because of the space in 'Life Exp' we need to use backticks or quotes.
# Alternative coding
subset(s77, subset = Frost < 40 & `Life Exp` > 71) # inside subset() only backticks work: a quoted name would be treated as a character string.
  1. Order the data frame simultaneously by ‘Illiteracy’ (increasing) and ‘Income’ (decreasing).
s77[order(s77$Illiteracy, -s77$Income), ]
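
Negating a column only works for numeric variables. As a possible alternative, order() accepts one decreasing flag per sort key when method = "radix" is used. The sketch below (the index names are illustrative, not part of the exercise) checks that both approaches produce the same sorted columns for this dataset:

```r
s77 <- as.data.frame(state.x77)

# Approach used above: negate the numeric column to sort it in decreasing order
idx_neg <- order(s77$Illiteracy, -s77$Income)

# Alternative: one 'decreasing' flag per sort key (requires method = "radix")
idx_radix <- order(s77$Illiteracy, s77$Income,
                   decreasing = c(FALSE, TRUE), method = "radix")

# Both index vectors should yield the same sorted values
identical(s77$Illiteracy[idx_neg], s77$Illiteracy[idx_radix])
identical(s77$Income[idx_neg], s77$Income[idx_radix])

head(s77[idx_radix, c("Illiteracy", "Income")])
```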
  1. Read up on the cut function and add a new ordinal variable to the data frame that divides the ‘Frost’ variable into three categories: ‘low’, ‘intermediate’ and ‘high’ number of frost days.

The cut function converts a numeric variable into a categorical variable (a factor) with multiple levels. We can simply provide the number of intervals desired (breaks = 3) and R will divide the range of the variable into that many equal-width intervals, choosing the cut points (interval boundaries) for us.

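s77$frost_cat <- cut(x = s77$Frost, breaks = 3, labels = c("low", "intermediate", "high"), ordered_result = TRUE) # ordered_result = TRUE returns an ordered factor, i.e. an ordinal variable.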
s77
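
To see which cut points R actually chose, one option is to omit the labels so that the interval boundaries become the level names, and then tabulate the result. A minimal sketch (auto_bins is an illustrative name):

```r
s77 <- as.data.frame(state.x77)

# Without 'labels', the factor levels show the interval boundaries R computed
auto_bins <- cut(x = s77$Frost, breaks = 3)
levels(auto_bins)

# Number of states falling into each interval
table(auto_bins)
```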

If we want our very own custom intervals, we can simply supply a vector telling R the start and endpoints of the desired intervals.

range(s77$Frost) # 0, 188
## [1]   0 188
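s77$frost_cat <- cut(x = s77$Frost, breaks = c(0, 30, 90, 190), labels = c("low", "intermediate", "high"),
                     include.lowest = TRUE, ordered_result = TRUE) # here 0-30 is considered 'low', > 30 - 90 'intermediate' and > 90 - 190 'high'. include.lowest = TRUE keeps states with exactly 0 frost days, which would otherwise become NA.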
s77
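
One detail worth checking with custom breaks: by default the intervals are open on the left and closed on the right, so a value exactly equal to the smallest break falls outside every interval and becomes NA unless include.lowest = TRUE is set. Since the minimum of Frost is 0, the following sketch compares both settings (brks, labs, default_cut and closed_cut are illustrative names):

```r
s77 <- as.data.frame(state.x77)
brks <- c(0, 30, 90, 190)
labs <- c("low", "intermediate", "high")

# Default: intervals are (0,30], (30,90], (90,190]; a value of 0 is not covered
default_cut <- cut(s77$Frost, breaks = brks, labels = labs)
sum(is.na(default_cut))             # number of states left unclassified

# include.lowest = TRUE closes the first interval on the left: [0,30]
closed_cut <- cut(s77$Frost, breaks = brks, labels = labs, include.lowest = TRUE)
sum(is.na(closed_cut))              # no state is left unclassified

# Which states the default setting leaves out
rownames(s77)[is.na(default_cut)]
```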