Use the built-in dataset state.x77.
What kind of object is it (use class to find out)? Convert it into a data frame and call it ‘s77’ (find out the command to convert an object into a data frame).
class(state.x77)
## [1] "matrix"
s77 <- data.frame(state.x77)
class(s77)
## [1] "data.frame"
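As a small sketch of the conversion step, the two common conversion commands can be compared side by side. The name s77_alt is only an illustrative name, not part of the exercise. Note that from R 4.0.0 onwards class(state.x77) prints "matrix" "array" instead of just "matrix"; is.matrix() gives the same answer in either version.

```r
# Two ways to turn the matrix into a data frame.
s77     <- data.frame(state.x77)     # column names are made syntactic
s77_alt <- as.data.frame(state.x77)  # column names are kept as they are (s77_alt is a hypothetical name)

is.matrix(state.x77)  # TRUE, whatever class() prints in your R version
names(s77)[4]         # "Life.Exp"
names(s77_alt)[4]     # "Life Exp"
```

Using data.frame() is what makes the later s77$Life.Exp syntax work without backticks.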
Create a subset of s77 containing the per capita income of the states that have fewer than 40 days with minimum temperature below freezing point.
summary(s77)
##    Population        Income       Illiteracy       Life.Exp
##  Min.   :  365   Min.   :3098   Min.   :0.500   Min.   :67.96
##  1st Qu.: 1080   1st Qu.:3993   1st Qu.:0.625   1st Qu.:70.12
##  Median : 2838   Median :4519   Median :0.950   Median :70.67
##  Mean   : 4246   Mean   :4436   Mean   :1.170   Mean   :70.88
##  3rd Qu.: 4968   3rd Qu.:4814   3rd Qu.:1.575   3rd Qu.:71.89
##  Max.   :21198   Max.   :6315   Max.   :2.800   Max.   :73.60
##      Murder          HS.Grad          Frost             Area
##  Min.   : 1.400   Min.   :37.80   Min.   :  0.00   Min.   :  1049
##  1st Qu.: 4.350   1st Qu.:48.05   1st Qu.: 66.25   1st Qu.: 36985
##  Median : 6.850   Median :53.25   Median :114.50   Median : 54277
##  Mean   : 7.378   Mean   :53.11   Mean   :104.46   Mean   : 70736
##  3rd Qu.:10.675   3rd Qu.:59.15   3rd Qu.:139.75   3rd Qu.: 81163
##  Max.   :15.100   Max.   :67.30   Max.   :188.00   Max.   :566432
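# States with fewer than 40 frost days (all columns are returned)
s77[s77$Frost < 40, ]
# States with fewer than 40 frost days and a life expectancy above 71 years
s77[s77$Frost < 40 & s77$Life.Exp > 71, ]
# Sort by illiteracy (ascending); ties are broken by income (descending)
s77[order(s77$Illiteracy, -s77$Income), ]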
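The row filter s77[s77$Frost < 40, ] returns every column. If only the per capita income is wanted, a column can be selected in the same call. The following is a minimal sketch; the object names low_frost_income, low_frost_income2 and sorted are made up for illustration.

```r
s77 <- data.frame(state.x77)

# Only the Income column for states with fewer than 40 frost days.
# drop = FALSE keeps the result a data frame, so the state names
# survive as row names.
low_frost_income <- s77[s77$Frost < 40, "Income", drop = FALSE]

# The same selection written with subset()
low_frost_income2 <- subset(s77, Frost < 40, select = Income)

# Quick check of the sort: Illiteracy never decreases down the rows
sorted <- s77[order(s77$Illiteracy, -s77$Income), ]
head(sorted[, c("Illiteracy", "Income")])
```

Without drop = FALSE, selecting a single column returns a plain numeric vector and the state names are lost.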
Use the cut function and add a new ordinal variable to the data frame that divides the ‘Frost’ variable into three categories: ‘low’, ‘intermediate’ and ‘high’ number of frost days.
s77$Frost_categories <- cut(x = s77$Frost, breaks = 3, labels = c("low", "intermediate", "high"))
s77
# x is the numeric vector that is converted to a factor by cutting
# breaks is either a numeric vector of two or more unique cut points, or a single number (greater than or equal to 2) giving the number of intervals
# labels gives the names for the levels of the resulting factor
The cut function converts a numeric variable into a categorical variable (a factor) with multiple levels. We can simply provide the number of intervals desired (breaks = 3), and R will divide the range of the variable into that many equal-width intervals and find the cut points (interval boundaries) for us.
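To see which cut points R chooses, we can call cut without labels and look at the level names. With a range of 0 to 188 and three intervals, each interval is about 62.7 days wide. The sketch below also uses ordered_result = TRUE, which is one way to make the factor truly ordinal (by default cut returns an unordered factor); frost_ord is a hypothetical name used only here.

```r
s77 <- data.frame(state.x77)

# Without labels, the level names show the interval boundaries R picked
levels(cut(s77$Frost, breaks = 3))

# ordered_result = TRUE stores the ordering low < intermediate < high
frost_ord <- cut(s77$Frost, breaks = 3,
                 labels = c("low", "intermediate", "high"),
                 ordered_result = TRUE)

is.ordered(frost_ord)  # TRUE
table(frost_ord)       # number of states in each category
```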
We can also create our own custom intervals by supplying a vector that tells R the start and end points of the desired intervals:
range(s77$Frost)
## [1] 0 188
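s77$Frost_cat <- cut(x = s77$Frost, breaks = c(0, 30, 90, 190), labels = c("low", "intermediate", "high"), include.lowest = TRUE)
s77
In this case, 0 to 30 frost days is considered “low”, more than 30 up to 90 “intermediate”, and more than 90 up to 190 “high”. By default the intervals are open on the left, so include.lowest = TRUE is needed for a state with exactly 0 frost days to be classed as “low” rather than NA.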
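With custom breaks it is worth checking how values that sit exactly on a boundary are handled. By default cut uses intervals that are closed on the right, such as (0,30], so a value equal to the lowest break falls outside every interval. The following sketch compares the two behaviours; brks, labs, without_lowest and with_lowest are illustrative names.

```r
s77  <- data.frame(state.x77)
brks <- c(0, 30, 90, 190)
labs <- c("low", "intermediate", "high")

# Default: the first interval is (0,30], so a Frost value of 0 becomes NA
without_lowest <- cut(s77$Frost, breaks = brks, labels = labs)

# include.lowest = TRUE turns the first interval into [0,30]
with_lowest <- cut(s77$Frost, breaks = brks, labels = labs,
                   include.lowest = TRUE)

rownames(s77)[is.na(without_lowest)]  # states with exactly 0 frost days
sum(is.na(with_lowest))               # 0
table(with_lowest)
```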