load("/cloud/project/cdc.Rdata")
str(cdc)
## 'data.frame': 20000 obs. of 9 variables:
## $ genhlth : Factor w/ 5 levels "excellent","very good",..: 3 3 3 3 2 2 2 2 3 3 ...
## $ exerany : num 0 0 1 1 0 1 1 0 0 1 ...
## $ hlthplan: num 1 1 1 1 1 1 1 1 1 1 ...
## $ smoke100: num 0 1 1 0 0 0 0 0 1 0 ...
## $ height : num 70 64 60 66 61 64 71 67 65 70 ...
## $ weight : int 175 125 105 132 150 114 194 170 150 180 ...
## $ wtdesire: int 175 115 105 124 130 114 185 160 130 170 ...
## $ age : int 77 33 49 42 55 55 31 45 27 44 ...
## $ gender : Factor w/ 2 levels "m","f": 1 2 2 2 2 2 1 1 2 1 ...
Create a new data frame men containing only the rows of cdc for which gender is “m”.
men = cdc[cdc$gender=='m',]
str(men)
## 'data.frame': 9569 obs. of 9 variables:
## $ genhlth : Factor w/ 5 levels "excellent","very good",..: 3 2 2 3 1 4 1 1 4 3 ...
## $ exerany : num 0 1 0 1 1 1 1 1 1 1 ...
## $ hlthplan: num 1 1 1 1 1 1 0 1 0 1 ...
## $ smoke100: num 0 0 0 0 1 1 1 1 0 1 ...
## $ height : num 70 71 67 70 69 69 66 70 69 73 ...
## $ weight : int 175 194 170 180 186 168 185 170 170 185 ...
## $ wtdesire: int 175 185 160 170 175 148 220 170 170 175 ...
## $ age : int 77 31 45 44 46 62 21 69 23 79 ...
## $ gender : Factor w/ 2 levels "m","f": 1 1 1 1 1 1 1 1 1 1 ...
table(men$gender)
##
## m f
## 9569 0
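The 0 count for f in the table above shows that subsetting rows keeps every level of the gender factor, including levels that no longer occur. The minimal sketch below reproduces this on a small made-up data frame (the name toy and its values are hypothetical, not taken from cdc). It also shows base R's subset() as an alternative to bracket indexing and droplevels() for removing unused levels. One difference to be aware of: subset() drops rows where the condition is NA, while [ returns them as rows of NAs.

```r
# Hypothetical toy data standing in for cdc (values are made up)
toy <- data.frame(
  age    = c(77, 33, 49, 31),
  gender = factor(c("m", "f", "f", "m"), levels = c("m", "f"))
)

# Bracket indexing with a logical condition, as in men = cdc[cdc$gender=='m',]
men_toy <- toy[toy$gender == "m", ]

# subset() selects the same rows, referring to the column by name
men_toy2 <- subset(toy, gender == "m")

# The unused level "f" is still present after subsetting (count 0)
print(table(men_toy$gender))

# droplevels() removes factor levels that no longer occur
men_toy <- droplevels(men_toy)
print(levels(men_toy$gender))
```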
mrows = cdc$gender == 'm'
str(mrows)
## logi [1:20000] TRUE FALSE FALSE FALSE FALSE FALSE ...
men2 = cdc[mrows,]
str(men2)
## 'data.frame': 9569 obs. of 9 variables:
## $ genhlth : Factor w/ 5 levels "excellent","very good",..: 3 2 2 3 1 4 1 1 4 3 ...
## $ exerany : num 0 1 0 1 1 1 1 1 1 1 ...
## $ hlthplan: num 1 1 1 1 1 1 0 1 0 1 ...
## $ smoke100: num 0 0 0 0 1 1 1 1 0 1 ...
## $ height : num 70 71 67 70 69 69 66 70 69 73 ...
## $ weight : int 175 194 170 180 186 168 185 170 170 185 ...
## $ wtdesire: int 175 185 160 170 175 148 220 170 170 175 ...
## $ age : int 77 31 45 44 46 62 21 69 23 79 ...
## $ gender : Factor w/ 2 levels "m","f": 1 1 1 1 1 1 1 1 1 1 ...
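The logical vector mrows has one TRUE/FALSE entry per row of cdc, and indexing with it keeps the rows marked TRUE. A small sketch on hypothetical toy data shows two related base R helpers: sum() counts the TRUE entries (the number of rows kept), and which() turns the logical vector into the row numbers that are TRUE, which select exactly the same rows.

```r
# Hypothetical toy data (values are made up)
toy <- data.frame(
  age    = c(77, 33, 49, 31),
  gender = factor(c("m", "f", "f", "m"), levels = c("m", "f"))
)

# One TRUE/FALSE per row, like mrows = cdc$gender == 'm'
mrows_toy <- toy$gender == "m"
print(mrows_toy)

# sum() counts the TRUE values, i.e. the number of rows that will be kept
print(sum(mrows_toy))

# which() gives the row numbers where the vector is TRUE
print(which(mrows_toy))

# Indexing by the logical vector or by its row numbers selects the same rows
same <- identical(toy[mrows_toy, ], toy[which(mrows_toy), ])
print(same)
```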
What happens if the logical vector is shorter than the number of rows? Try v = c(TRUE, FALSE).
v = c(TRUE,FALSE)
vres = cdc[v,]
str(vres)
## 'data.frame': 10000 obs. of 9 variables:
## $ genhlth : Factor w/ 5 levels "excellent","very good",..: 3 3 2 2 3 1 1 4 3 3 ...
## $ exerany : num 0 1 0 1 0 1 1 1 0 1 ...
## $ hlthplan: num 1 1 1 1 1 1 0 0 0 1 ...
## $ smoke100: num 0 1 0 0 1 1 1 0 1 1 ...
## $ height : num 70 60 61 71 65 69 66 69 67 75 ...
## $ weight : int 175 105 150 194 150 186 185 170 156 200 ...
## $ wtdesire: int 175 105 130 185 130 175 220 170 150 190 ...
## $ age : int 77 49 55 31 27 46 21 23 47 43 ...
## $ gender : Factor w/ 2 levels "m","f": 1 2 2 1 2 1 1 1 1 1 ...
head(cdc)
## genhlth exerany hlthplan smoke100 height weight wtdesire age gender
## 1 good 0 1 0 70 175 175 77 m
## 2 good 0 1 1 64 125 115 33 f
## 3 good 1 1 1 60 105 105 49 f
## 4 good 1 1 0 66 132 124 42 f
## 5 very good 0 1 0 61 150 130 55 f
## 6 very good 1 1 0 64 114 114 55 f
head(vres)
## genhlth exerany hlthplan smoke100 height weight wtdesire age gender
## 1 good 0 1 0 70 175 175 77 m
## 3 good 1 1 1 60 105 105 49 f
## 5 very good 0 1 0 61 150 130 55 f
## 7 very good 1 1 0 71 194 185 31 m
## 9 good 0 1 1 65 150 130 27 f
## 11 excellent 1 1 1 69 186 175 46 m
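Comparing head(cdc) with head(vres) shows what happened: R recycled c(TRUE, FALSE) to the full number of rows (TRUE, FALSE, TRUE, FALSE, ...), so only the odd-numbered rows were kept, which is why vres has 10000 of the 20000 rows. A small sketch on a hypothetical six-row data frame makes the recycling explicit with rep_len().

```r
# Hypothetical six-row data frame (values are made up)
toy <- data.frame(id = 1:6, val = letters[1:6])

# A length-2 logical vector is recycled to TRUE, FALSE, TRUE, FALSE, ...
odd <- toy[c(TRUE, FALSE), ]
print(odd)

# Writing out the recycled vector in full gives the same rows
full <- rep_len(c(TRUE, FALSE), nrow(toy))
print(identical(odd, toy[full, ]))
```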
Select three specific rows of cdc by row number (rows 453, 2941 and 3235)
f3 = c(453,2941,3235)
f3
## [1] 453 2941 3235
f3cdc = cdc[f3,]
str(f3cdc)
## 'data.frame': 3 obs. of 9 variables:
## $ genhlth : Factor w/ 5 levels "excellent","very good",..: 3 1 2
## $ exerany : num 1 1 1
## $ hlthplan: num 1 1 0
## $ smoke100: num 1 0 0
## $ height : num 67 68 65
## $ weight : int 210 139 150
## $ wtdesire: int 200 139 135
## $ age : int 78 19 28
## $ gender : Factor w/ 2 levels "m","f": 1 1 2
head(f3cdc)
## genhlth exerany hlthplan smoke100 height weight wtdesire age gender
## 453 good 1 1 1 67 210 200 78 m
## 2941 excellent 1 1 0 68 139 139 19 m
## 3235 very good 1 0 0 65 150 135 28 f
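As the head(f3cdc) output shows, indexing with a vector of row numbers returns those rows in the order they are listed, and the row names keep each row's original position in cdc. The sketch below checks both points on a hypothetical toy data frame.

```r
# Hypothetical six-row data frame (values are made up)
toy <- data.frame(id = 1:6, val = letters[1:6])

# Rows come back in the order given in the index vector, not sorted
picked <- toy[c(5, 2, 4), ]
print(picked)

# The row names still record each row's original position
print(rownames(picked))
```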
mysamp = sample(1:nrow(cdc),size=10)
mysamp
## [1] 3419 13298 4088 13313 9326 17635 13262 11128 3986 8369
cdcsamp = cdc[mysamp,]
str(cdcsamp)
## 'data.frame': 10 obs. of 9 variables:
## $ genhlth : Factor w/ 5 levels "excellent","very good",..: 1 1 1 2 2 2 4 3 1 3
## $ exerany : num 1 1 1 0 1 0 1 1 1 1
## $ hlthplan: num 1 1 1 1 1 0 1 0 1 1
## $ smoke100: num 0 0 1 1 0 0 0 0 0 0
## $ height : num 60 62 70 72 70 71 64 75 72 65
## $ weight : int 105 150 200 195 170 155 140 280 165 280
## $ wtdesire: int 105 130 200 185 170 165 125 230 165 160
## $ age : int 31 37 55 40 30 21 50 21 24 66
## $ gender : Factor w/ 2 levels "m","f": 2 2 1 1 1 1 2 1 1 2
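The row numbers printed for mysamp come from one particular run of sample(), so rerunning the code gives a different set of rows. A minimal sketch on hypothetical toy data shows how set.seed() makes the draw reproducible; the seed value 42 is an arbitrary choice. sample() draws without replacement by default, so no row is picked twice.

```r
# Hypothetical six-row data frame (values are made up)
toy <- data.frame(id = 1:6, val = letters[1:6])

# Fix the random number generator so the sample can be reproduced
set.seed(42)
idx1 <- sample(seq_len(nrow(toy)), size = 3)

# Resetting the same seed reproduces the same row numbers
set.seed(42)
idx2 <- sample(seq_len(nrow(toy)), size = 3)

# Use the sampled row numbers to pull out a random subset of rows
samp <- toy[idx1, ]
print(samp)
```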
Estimate the probability that a randomly selected person from cdc is a man aged between 40 and 50 (strictly older than 40 and younger than 50).
want = cdc$gender == "m" & cdc$age > 40 & cdc$age < 50
str(want)
## logi [1:20000] FALSE FALSE FALSE FALSE FALSE FALSE ...
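The & operator combines the three conditions element by element, producing one TRUE/FALSE per row. A sketch on hypothetical toy data shows that the strict inequalities > 40 and < 50 exclude ages 40 and 50 themselves; using >= or <= would include them.

```r
# Hypothetical toy data (values are made up)
toy <- data.frame(
  age    = c(40, 41, 49, 50, 45, 45),
  gender = c("m", "m", "m", "m", "f", "m")
)

# & compares element by element, giving one result per row
# (&& is meant for single TRUE/FALSE values, not whole columns)
want_toy <- toy$gender == "m" & toy$age > 40 & toy$age < 50
print(want_toy)

# Ages 40 and 50 are excluded by the strict inequalities
print(toy[want_toy, ])
```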
table(want)
## want
## FALSE TRUE
## 18224 1776
mean(want)
## [1] 0.0888
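The mean of a logical vector is the proportion of cases for which the expression is TRUE, because R treats TRUE as 1 and FALSE as 0 in arithmetic. Here 1776 / 20000 = 0.0888, so about 8.9% of the people in cdc are men aged between 40 and 50.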
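A short sketch makes the arithmetic concrete: on a hypothetical logical vector, mean() equals the number of TRUE values divided by the number of cases, and the same division applied to the counts from table(want) reproduces the 0.0888 printed by mean(want).

```r
# Hypothetical logical vector (values are made up)
want_toy <- c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)

# In arithmetic TRUE counts as 1 and FALSE as 0
print(as.numeric(want_toy))

# So the mean equals (number of TRUE) / (number of cases)
p1 <- mean(want_toy)
p2 <- sum(want_toy) / length(want_toy)
print(c(p1, p2))

# Same arithmetic with the counts from table(want) for cdc
p_cdc <- 1776 / 20000
print(p_cdc)
```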