Sameer Mathur
patientID <- c(1, 2, 3, 4, 5, 6)
age <- c(25, 34, 28, 52, 34, 64)
diabetes <- c("Type1", "Type2", "Type1", "Type1", "Type2", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor", "Improved", "Excellent")
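# stringsAsFactors = TRUE keeps diabetes and status as factors (the default became FALSE in R 4.0.0)
patientdata <- data.frame(patientID, age, diabetes, status, stringsAsFactors = TRUE)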
patientdata
  patientID age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type2  Improved
3         3  28    Type1 Excellent
4         4  52    Type1      Poor
5         5  34    Type2  Improved
6         6  64    Type1 Excellent
# structure of the patientdata
str(patientdata)
'data.frame': 6 obs. of 4 variables:
$ patientID: num 1 2 3 4 5 6
$ age : num 25 34 28 52 34 64
$ diabetes : Factor w/ 2 levels "Type1","Type2": 1 2 1 1 2 1
$ status : Factor w/ 3 levels "Excellent","Improved",..: 3 2 1 3 2 1
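The str() output shows that the levels of status are stored alphabetically ("Excellent", "Improved", "Poor"), because factor() builds its levels from the sorted unique values. Note also that since R 4.0.0, data.frame() no longer converts strings to factors by default, so on recent versions diabetes and status show up as chr unless they are converted explicitly. The sketch below, which assumes we want the natural Poor < Improved < Excellent order (the names status.f and status.ord are illustrative), shows the default ordering and how to set it explicitly:

```r
# Rebuild the status vector from the example above
status <- c("Poor", "Improved", "Excellent", "Poor", "Improved", "Excellent")

# Default: levels are the sorted unique values (alphabetical for strings)
status.f <- factor(status)
levels(status.f)   # returns "Excellent" "Improved" "Poor"

# Explicit level order, marked as ordered so that comparisons are meaningful
status.ord <- factor(status,
                     levels = c("Poor", "Improved", "Excellent"),
                     ordered = TRUE)
levels(status.ord)            # returns "Poor" "Improved" "Excellent"
status.ord[1] < status.ord[3] # Poor < Excellent, returns TRUE
```

Setting ordered = TRUE is optional; without it the level order still controls how tables and plots are arranged, but comparisons such as < are not defined.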
# summary
summary(patientdata)
   patientID         age         diabetes       status
 Min.   :1.00   Min.   :25.0   Type1:4   Excellent:2
 1st Qu.:2.25   1st Qu.:29.5   Type2:2   Improved :2
 Median :3.50   Median :34.0             Poor     :2
 Mean   :3.50   Mean   :39.5
 3rd Qu.:4.75   3rd Qu.:47.5
 Max.   :6.00   Max.   :64.0
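For factor columns, summary() reports how many rows fall into each level; for numeric columns it reports the quartiles and the mean. A minimal sketch reproducing those numbers with table(), mean() and quantile(), assuming the data frame is rebuilt with stringsAsFactors = TRUE so that the block runs on its own on any recent R version:

```r
# Rebuild the example data frame; stringsAsFactors = TRUE makes the
# character columns factors regardless of the R version
patientdata <- data.frame(
  patientID = c(1, 2, 3, 4, 5, 6),
  age       = c(25, 34, 28, 52, 34, 64),
  diabetes  = c("Type1", "Type2", "Type1", "Type1", "Type2", "Type1"),
  status    = c("Poor", "Improved", "Excellent", "Poor", "Improved", "Excellent"),
  stringsAsFactors = TRUE
)

# Counts per level: the same numbers summary() prints for the factor columns
table(patientdata$diabetes)  # Type1: 4, Type2: 2
table(patientdata$status)    # Excellent: 2, Improved: 2, Poor: 2

# Numeric summaries: summary() uses the same default quantile method
mean(patientdata$age)        # 39.5
quantile(patientdata$age)    # 25.0 29.5 34.0 47.5 64.0 (0%, 25%, 50%, 75%, 100%)
```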
# structure of mtcars
data(mtcars)
str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
# car cylinders in mtcars
cyls <- mtcars$cyl
cyls
[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
is.factor(cyls)
[1] FALSE
is.numeric(cyls)
[1] TRUE
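Although cyl is stored as a number, it takes only three distinct values, which makes it a natural candidate for a factor. A quick check using only base R (mtcars ships with R in the datasets package):

```r
cyls <- mtcars$cyl

# Distinct values, in increasing order
sort(unique(cyls))  # 4 6 8

# How many cars fall into each value
table(cyls)         # 11 cars with 4, 7 with 6, 14 with 8 cylinders
```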
Now let's create a factor variable called cyls.f based on cyls.
cyls.f <- factor(cyls, labels = c("Low", "Medium", "High"))
cyls.f
[1] Medium Medium Low Medium High Medium High Low Low Medium
[11] Medium High High High High High High Low Low Low
[21] Low High High High High Low Low Low High Medium
[31] High Low
Levels: Low Medium High
The first label, "Low", corresponds to cyl = 4, the second, "Medium", to cyl = 6, and the third, "High", to cyl = 8. This is because factor() builds its levels from the sorted unique values of the data (4, 6, 8) and assigns the labels to those levels in order.
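To avoid relying on the default sort, the level order can be spelled out with the levels argument of factor(). The sketch below, using only base R, does that and then cross-tabulates the original values against the new labels as a sanity check: each cylinder count should fall under exactly one label. Note that as.integer() on a factor returns the internal level codes (1, 2, 3), not the original cylinder counts.

```r
cyls <- mtcars$cyl

# Spell out which value each label belongs to instead of relying on the sort
cyls.f <- factor(cyls, levels = c(4, 6, 8), labels = c("Low", "Medium", "High"))

# Cross-tabulate original values against labels; off-diagonal cells should be 0
table(cyls, cyls.f)

levels(cyls.f)           # "Low" "Medium" "High"
as.integer(cyls.f)[1:5]  # level codes for the first five cars: 2 2 1 2 3
```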