Use the built-in dataset state.x77.

  1. What is the class of the object (use the command class to find out)? Convert it into a data frame and call it ‘s77’ (find out the command to convert an object into a data frame).
class(state.x77)
## [1] "matrix"
s77 <- data.frame(state.x77)
class(s77)
## [1] "data.frame"
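The class output above depends on the R version: from R 4.0.0 onwards, class() on a matrix returns both "matrix" and "array". The following is a small sketch of version-independent checks, plus a comparison with as.data.frame(), which (unlike data.frame()) does not rewrite column names. The name s77_alt is only an illustration and is not used elsewhere in the exercise.

```r
# state.x77 ships with base R (package 'datasets'), so no library() call is needed.
is.matrix(state.x77)            # TRUE on any R version
inherits(state.x77, "matrix")   # TRUE on any R version

# as.data.frame() keeps the original column names, spaces included,
# whereas data.frame() makes them syntactically valid ("Life.Exp", "HS.Grad").
s77_alt <- as.data.frame(state.x77)   # s77_alt is a hypothetical name
names(s77_alt)
names(data.frame(state.x77))
```

Using data.frame() as in the solution is convenient here because the cleaned names can be used with `$` without backticks.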
  1. Create a subset of s77 containing the per capita income of the states that have fewer than 40 days with a minimum temperature below freezing point.
summary(s77)
##    Population        Income       Illiteracy       Life.Exp    
##  Min.   :  365   Min.   :3098   Min.   :0.500   Min.   :67.96  
##  1st Qu.: 1080   1st Qu.:3993   1st Qu.:0.625   1st Qu.:70.12  
##  Median : 2838   Median :4519   Median :0.950   Median :70.67  
##  Mean   : 4246   Mean   :4436   Mean   :1.170   Mean   :70.88  
##  3rd Qu.: 4968   3rd Qu.:4814   3rd Qu.:1.575   3rd Qu.:71.89  
##  Max.   :21198   Max.   :6315   Max.   :2.800   Max.   :73.60  
##      Murder          HS.Grad          Frost             Area       
##  Min.   : 1.400   Min.   :37.80   Min.   :  0.00   Min.   :  1049  
##  1st Qu.: 4.350   1st Qu.:48.05   1st Qu.: 66.25   1st Qu.: 36985  
##  Median : 6.850   Median :53.25   Median :114.50   Median : 54277  
##  Mean   : 7.378   Mean   :53.11   Mean   :104.46   Mean   : 70736  
##  3rd Qu.:10.675   3rd Qu.:59.15   3rd Qu.:139.75   3rd Qu.: 81163  
##  Max.   :15.100   Max.   :67.30   Max.   :188.00   Max.   :566432
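# keep only the Income column; drop = FALSE keeps the result a data frame with the state names as row names
s77[s77$Frost < 40, "Income", drop = FALSE]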
  1. Create another subset with the states that have less than 40 days with minimum temperature below freezing point and a life expectancy greater than 71 years.
s77[s77$Frost < 40 & s77$Life.Exp > 71, ]
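The same two subsets can also be written with base R's subset(), which evaluates the condition inside the data frame so the `s77$` prefix is not needed. This is only an alternative sketch; the object names warm_income and warm_long_lived are made up for illustration.

```r
s77 <- data.frame(state.x77)

# Per capita income of states with fewer than 40 frost days
warm_income <- subset(s77, Frost < 40, select = Income)

# States with fewer than 40 frost days and life expectancy above 71 years
warm_long_lived <- subset(s77, Frost < 40 & Life.Exp > 71)

warm_income
warm_long_lived
```

Bracket indexing, as used in the solutions, is usually the safer choice inside functions; subset() is mainly a convenience for interactive work.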
  1. Order the data frame simultaneously by ‘Illiteracy’ (increasing) and ‘Income’ (decreasing). (This should have been covered in today’s extra R tutorial, but you can read up on how to achieve this in the R-ticulate booklet that I have made available on Blackboard.)
s77[order(s77$Illiteracy, -s77$Income), ]
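Negating a column to reverse its sort direction only works for numeric variables. As a sketch of an alternative, order() also accepts one decreasing flag per sort key when method = "radix" is used. The names ord_minus and ord_radix are illustrative only.

```r
s77 <- data.frame(state.x77)

# Approach from the solution: negate the numeric key that should be decreasing
ord_minus <- order(s77$Illiteracy, -s77$Income)

# Alternative: one decreasing flag per key (requires method = "radix")
ord_radix <- order(s77$Illiteracy, s77$Income,
                   decreasing = c(FALSE, TRUE), method = "radix")

# Show the two sort keys for the first few rows of the ordered data frame
head(s77[ord_radix, c("Illiteracy", "Income")])
```

The second form also works when a key is a character vector or factor, where a minus sign is not available.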
  1. Read up on the cut function and add a new ordinal variable to the data frame that divides the ‘Frost’ variable into three categories: ‘low’, ‘intermediate’ and ‘high’ number of frost days.
s77$Frost_categories <- cut(x = s77$Frost, breaks = 3, labels = c("low", "intermediate", "high"), ordered_result = TRUE)
s77
# x: the numeric vector to be converted to a factor by cutting
# breaks: either a numeric vector of two or more unique cut points, or a single number (>= 2) giving the number of intervals
# labels: names for the levels of the resulting categories
# ordered_result = TRUE: return an ordered factor, since the exercise asks for an ordinal variable

The cut function converts a numeric variable into a categorical variable (a factor) with multiple levels. We can simply provide the number of intervals desired (breaks = 3), and R will divide the range of the data into that many intervals of equal width and choose the cut points (interval boundaries) for us.
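To see which boundaries R actually chose, one option is to call cut() without labels and inspect the levels, which are then printed as intervals. This is a minimal sketch; frost and frost_default are illustrative names.

```r
frost <- state.x77[, "Frost"]

# Without 'labels', the factor levels are the intervals themselves
frost_default <- cut(frost, breaks = 3)

levels(frost_default)   # the three intervals R chose
table(frost_default)    # number of states in each interval
```

Because the intervals have equal width rather than equal counts, the three categories will generally not contain the same number of states.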

We can also create our own custom intervals by supplying a vector of break points that tells R where each interval starts and ends.

range(s77$Frost)
## [1]   0 188
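s77$Frost_cat <- cut(x = s77$Frost, breaks = c(0, 30, 90, 190), labels = c("low", "intermediate", "high"), include.lowest = TRUE)
s77

In this case, 0 to 30 frost days is considered “low”, more than 30 and up to 90 “intermediate”, and more than 90 and up to 190 “high”. Intervals are closed on the right by default, so include.lowest = TRUE is needed for a state with exactly 0 frost days to fall into “low” rather than become NA.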
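Because cut() builds right-closed intervals by default, a value equal to the lowest break point is only assigned to a category when include.lowest = TRUE. A quick check of this behaviour, written as a sketch with illustrative object names (brks, labs, open_low, closed_low):

```r
frost <- state.x77[, "Frost"]
brks  <- c(0, 30, 90, 190)
labs  <- c("low", "intermediate", "high")

# Default: intervals are (0,30], (30,90], (90,190], so Frost == 0 is excluded
open_low <- cut(frost, breaks = brks, labels = labs)

# include.lowest = TRUE turns the first interval into [0,30]
closed_low <- cut(frost, breaks = brks, labels = labs, include.lowest = TRUE)

names(frost)[is.na(open_low)]        # states that would be lost as NA
sum(is.na(closed_low))               # no missing categories here
table(closed_low, useNA = "ifany")   # states per category
```

Checking for NA values after a custom cut() is a cheap way to confirm that the break points cover the full range reported by range().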