load("/cloud/project/cdc.Rdata")
str(cdc)
## 'data.frame':    20000 obs. of  9 variables:
##  $ genhlth : Factor w/ 5 levels "excellent","very good",..: 3 3 3 3 2 2 2 2 3 3 ...
##  $ exerany : num  0 0 1 1 0 1 1 0 0 1 ...
##  $ hlthplan: num  1 1 1 1 1 1 1 1 1 1 ...
##  $ smoke100: num  0 1 1 0 0 0 0 0 1 0 ...
##  $ height  : num  70 64 60 66 61 64 71 67 65 70 ...
##  $ weight  : int  175 125 105 132 150 114 194 170 150 180 ...
##  $ wtdesire: int  175 115 105 124 130 114 185 160 130 170 ...
##  $ age     : int  77 33 49 42 55 55 31 45 27 44 ...
##  $ gender  : Factor w/ 2 levels "m","f": 1 2 2 2 2 2 1 1 2 1 ...

Exercise

Create a new data frame, men, containing only the rows of cdc for which gender is “m”.

Solution 1

men = cdc[cdc$gender=='m',]
str(men)
## 'data.frame':    9569 obs. of  9 variables:
##  $ genhlth : Factor w/ 5 levels "excellent","very good",..: 3 2 2 3 1 4 1 1 4 3 ...
##  $ exerany : num  0 1 0 1 1 1 1 1 1 1 ...
##  $ hlthplan: num  1 1 1 1 1 1 0 1 0 1 ...
##  $ smoke100: num  0 0 0 0 1 1 1 1 0 1 ...
##  $ height  : num  70 71 67 70 69 69 66 70 69 73 ...
##  $ weight  : int  175 194 170 180 186 168 185 170 170 185 ...
##  $ wtdesire: int  175 185 160 170 175 148 220 170 170 175 ...
##  $ age     : int  77 31 45 44 46 62 21 69 23 79 ...
##  $ gender  : Factor w/ 2 levels "m","f": 1 1 1 1 1 1 1 1 1 1 ...
table(men$gender)
## 
##    m    f 
## 9569    0
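The table above still lists "f" with a count of 0. Subsetting rows keeps every level of a factor, even levels that no longer occur in the remaining rows. A minimal sketch on a small made-up data frame (the names `toy`, `toy_men` and `toy_men_dropped` are hypothetical, not part of cdc) reproduces the effect and shows how base R's `droplevels()` removes the unused level.

```r
# Small hypothetical data frame standing in for cdc
toy <- data.frame(
  age    = c(77, 33, 49, 31, 45),
  gender = factor(c("m", "f", "f", "m", "m"), levels = c("m", "f"))
)

# Keep only the rows where gender is "m" (same pattern as cdc[cdc$gender == 'm', ])
toy_men <- toy[toy$gender == "m", ]

# The unused level "f" is still there, so table() reports it with a count of 0
table(toy_men$gender)

# droplevels() removes factor levels that no longer occur in the data
toy_men_dropped <- droplevels(toy_men)
table(toy_men_dropped$gender)
```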

Solution 2

mrows = cdc$gender == 'm'
str(mrows)
##  logi [1:20000] TRUE FALSE FALSE FALSE FALSE FALSE ...
men2 = cdc[mrows,]
str(men2)
## 'data.frame':    9569 obs. of  9 variables:
##  $ genhlth : Factor w/ 5 levels "excellent","very good",..: 3 2 2 3 1 4 1 1 4 3 ...
##  $ exerany : num  0 1 0 1 1 1 1 1 1 1 ...
##  $ hlthplan: num  1 1 1 1 1 1 0 1 0 1 ...
##  $ smoke100: num  0 0 0 0 1 1 1 1 0 1 ...
##  $ height  : num  70 71 67 70 69 69 66 70 69 73 ...
##  $ weight  : int  175 194 170 180 186 168 185 170 170 185 ...
##  $ wtdesire: int  175 185 160 170 175 148 220 170 170 175 ...
##  $ age     : int  77 31 45 44 46 62 21 69 23 79 ...
##  $ gender  : Factor w/ 2 levels "m","f": 1 1 1 1 1 1 1 1 1 1 ...
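Solutions 1 and 2 perform the same operation: the comparison `gender == 'm'` produces a logical vector with one TRUE/FALSE per row, and `[` keeps the rows marked TRUE. Storing the vector first only makes it reusable and easy to inspect. A sketch on a hypothetical five-row data frame (`toy`, `a`, `b` and `mask` are illustrative names) checks that the two forms give identical results and shows `which()` turning the logical vector into row numbers.

```r
toy <- data.frame(
  age    = c(77, 33, 49, 31, 45),
  gender = factor(c("m", "f", "f", "m", "m"), levels = c("m", "f"))
)

# Solution 1 style: condition written inline
a <- toy[toy$gender == "m", ]

# Solution 2 style: condition stored first as a logical vector
mask <- toy$gender == "m"
b <- toy[mask, ]

identical(a, b)   # TRUE: both select exactly the same rows

# which() returns the positions where the logical vector is TRUE
which(mask)       # 1 4 5
```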

Something Else

What happens if the logical vector is not long enough? Try v = c(TRUE, FALSE).

Solution

v = c(TRUE,FALSE)
vres = cdc[v,]
str(vres)
## 'data.frame':    10000 obs. of  9 variables:
##  $ genhlth : Factor w/ 5 levels "excellent","very good",..: 3 3 2 2 3 1 1 4 3 3 ...
##  $ exerany : num  0 1 0 1 0 1 1 1 0 1 ...
##  $ hlthplan: num  1 1 1 1 1 1 0 0 0 1 ...
##  $ smoke100: num  0 1 0 0 1 1 1 0 1 1 ...
##  $ height  : num  70 60 61 71 65 69 66 69 67 75 ...
##  $ weight  : int  175 105 150 194 150 186 185 170 156 200 ...
##  $ wtdesire: int  175 105 130 185 130 175 220 170 150 190 ...
##  $ age     : int  77 49 55 31 27 46 21 23 47 43 ...
##  $ gender  : Factor w/ 2 levels "m","f": 1 2 2 1 2 1 1 1 1 1 ...
head(cdc)
##     genhlth exerany hlthplan smoke100 height weight wtdesire age gender
## 1      good       0        1        0     70    175      175  77      m
## 2      good       0        1        1     64    125      115  33      f
## 3      good       1        1        1     60    105      105  49      f
## 4      good       1        1        0     66    132      124  42      f
## 5 very good       0        1        0     61    150      130  55      f
## 6 very good       1        1        0     64    114      114  55      f
head(vres)
##      genhlth exerany hlthplan smoke100 height weight wtdesire age gender
## 1       good       0        1        0     70    175      175  77      m
## 3       good       1        1        1     60    105      105  49      f
## 5  very good       0        1        0     61    150      130  55      f
## 7  very good       1        1        0     71    194      185  31      m
## 9       good       0        1        1     65    150      130  27      f
## 11 excellent       1        1        1     69    186      175  46      m
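The row names of vres (1, 3, 5, 7, ...) show what happened: R recycled the two-element vector c(TRUE, FALSE) along all 20000 rows, so every odd-numbered row was kept and every even-numbered row was dropped, leaving 10000 rows. A small sketch on a hypothetical six-row data frame makes the recycled mask explicit with `rep_len()`.

```r
toy <- data.frame(id = 1:6, x = c(10, 20, 30, 40, 50, 60))

v <- c(TRUE, FALSE)

# R silently repeats v until it matches the number of rows
rep_len(v, nrow(toy))   # TRUE FALSE TRUE FALSE TRUE FALSE

# So indexing with v keeps rows 1, 3 and 5
toy[v, ]

# A length-3 vector such as c(TRUE, TRUE, FALSE) is recycled the same way,
# keeping rows 1, 2, 4 and 5
toy[c(TRUE, TRUE, FALSE), ]
```

No warning is given, so a logical index of the wrong length can quietly select the wrong rows.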

Extract with a numeric vector

Rows 453, 2941 and 3235 of cdc

f3 = c(453,2941,3235)
f3
## [1]  453 2941 3235
f3cdc = cdc[f3,]
str(f3cdc)
## 'data.frame':    3 obs. of  9 variables:
##  $ genhlth : Factor w/ 5 levels "excellent","very good",..: 3 1 2
##  $ exerany : num  1 1 1
##  $ hlthplan: num  1 1 0
##  $ smoke100: num  1 0 0
##  $ height  : num  67 68 65
##  $ weight  : int  210 139 150
##  $ wtdesire: int  200 139 135
##  $ age     : int  78 19 28
##  $ gender  : Factor w/ 2 levels "m","f": 1 1 2
head(f3cdc)
##        genhlth exerany hlthplan smoke100 height weight wtdesire age gender
## 453       good       1        1        1     67    210      200  78      m
## 2941 excellent       1        1        0     68    139      139  19      m
## 3235 very good       1        0        0     65    150      135  28      f
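A numeric vector selects rows by position, in the order the positions are listed. A short sketch on a hypothetical data frame shows how the same idea gives the actual first three rows (`1:3`, which matches `head(..., 3)`), rows in a custom order, and negative indices, which drop the listed rows instead of keeping them.

```r
toy <- data.frame(id = 1:6, x = c(10, 20, 30, 40, 50, 60))

# The actual first three rows: positions 1, 2 and 3
first3 <- toy[1:3, ]
identical(first3, head(toy, 3))   # TRUE

# Rows come back in the order the indices are listed
toy[c(5, 2), ]

# Negative indices drop rows: everything except rows 1 and 2
toy[-c(1, 2), ]
```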

Sample

mysamp = sample(1:nrow(cdc),size=10)
mysamp
##  [1]  3419 13298  4088 13313  9326 17635 13262 11128  3986  8369
cdcsamp = cdc[mysamp,]
str(cdcsamp)
## 'data.frame':    10 obs. of  9 variables:
##  $ genhlth : Factor w/ 5 levels "excellent","very good",..: 1 1 1 2 2 2 4 3 1 3
##  $ exerany : num  1 1 1 0 1 0 1 1 1 1
##  $ hlthplan: num  1 1 1 1 1 0 1 0 1 1
##  $ smoke100: num  0 0 1 1 0 0 0 0 0 0
##  $ height  : num  60 62 70 72 70 71 64 75 72 65
##  $ weight  : int  105 150 200 195 170 155 140 280 165 280
##  $ wtdesire: int  105 130 200 185 170 165 125 230 165 160
##  $ age     : int  31 37 55 40 30 21 50 21 24 66
##  $ gender  : Factor w/ 2 levels "m","f": 2 2 1 1 1 1 2 1 1 2
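Because `sample()` draws at random, rerunning the chunk above picks different rows, so a fresh run will not reproduce the numbers shown. Calling `set.seed()` before sampling makes the draw repeatable. A minimal sketch on a hypothetical 100-row data frame (the seed value 42 is arbitrary):

```r
toy <- data.frame(id = 1:100, sq = (1:100)^2)

set.seed(42)                            # fix the random number generator state
idx1 <- sample(1:nrow(toy), size = 10)

set.seed(42)                            # the same seed again ...
idx2 <- sample(1:nrow(toy), size = 10)

identical(idx1, idx2)                   # TRUE: ... gives the same 10 row numbers

# 10 distinct rows, since sample() draws without replacement by default
toysamp <- toy[idx1, ]
nrow(toysamp)                           # 10
```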

Calculate the probability

Find the probability that a randomly selected person from cdc is a man older than 40 and younger than 50.

Solution

want = cdc$gender == "m" & cdc$age > 40 & cdc$age < 50
str(want)
##  logi [1:20000] FALSE FALSE FALSE FALSE FALSE FALSE ...
table(want)
## want
## FALSE  TRUE 
## 18224  1776
mean(want)
## [1] 0.0888
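The condition `cdc$age > 40 & cdc$age < 50` uses strict inequalities, so 40- and 50-year-olds are excluded; with integer ages this means 41 to 49. If "between 40 and 50" is meant inclusively, `>=` and `<=` are needed and the probability changes. A small sketch on hypothetical ages shows the difference:

```r
toy <- data.frame(
  age    = c(40, 41, 45, 49, 50, 45, 60),
  gender = c("m", "m", "m", "m", "m", "f", "m")
)

# Strict bounds, as in the cdc solution: ages 41-49 only
strict <- toy$gender == "m" & toy$age > 40 & toy$age < 50

# Inclusive bounds: ages 40-50
inclusive <- toy$gender == "m" & toy$age >= 40 & toy$age <= 50

sum(strict)      # 3 (ages 41, 45, 49)
sum(inclusive)   # 5 (adds ages 40 and 50)
mean(strict)     # 3/7, about 0.4286
```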

Remember

The mean value of a logical expression is the proportion of cases for which the expression is true, because R treats TRUE as 1 and FALSE as 0 in arithmetic.
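A short check on a made-up vector makes the coercion visible. In the cdc example the same arithmetic gives 1776 TRUE values out of 20000, i.e. 1776/20000 = 0.0888.

```r
x <- c(TRUE, FALSE, TRUE, TRUE)

as.numeric(x)           # 1 0 1 1
sum(x)                  # 3, the number of TRUE values
sum(x) / length(x)      # 0.75
mean(x)                 # 0.75, the same proportion

# Caveat: a single NA makes the mean NA unless it is removed explicitly
mean(c(TRUE, NA, FALSE))                 # NA
mean(c(TRUE, NA, FALSE), na.rm = TRUE)   # 0.5
```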