Harold Nelson
9/10/2024
Create a vector with numerical values.
## [1] 1 2 3 4 5 6 7 8 9 10
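The code that produced this output is not shown. One way to build such a vector, offered as a sketch rather than the original code (the name x is taken from the later text), is:

```r
# The colon operator builds the integer sequence 1, 2, ..., 10
x = 1:10
x
```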
What proportion of the values are greater than 7? Do this by hand.
How could you do this with R code? How would you do this in your favorite language?
Create a logical vector based on the values of x.
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
## logi [1:10] FALSE FALSE FALSE FALSE FALSE FALSE ...
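A sketch of how such a logical vector could be created and inspected. The name gt7 is an assumption, since the original code is not shown:

```r
x = 1:10

# Comparison is vectorized: every element of x is tested against 7
gt7 = x > 7   # gt7 is a hypothetical name
gt7           # prints the TRUE/FALSE values
str(gt7)      # reports the type (logi) and the first few values
```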
The values in this vector are logical, not numerical.
What happens if we use the sum() and/or mean() functions?
## [1] 3
## [1] 0.3
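The two results above are consistent with calls like the following. This is a sketch using the hypothetical name gt7 for the logical vector:

```r
x = 1:10
gt7 = x > 7   # hypothetical name for the logical vector

sum(gt7)      # number of TRUE values
mean(gt7)     # fraction of values that are TRUE
```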
When logical values are used where numbers are expected, TRUE becomes 1 and FALSE becomes 0.
The sum of a logical expression is the count of cases for which the logical expression is true.
The mean of a logical expression is the fraction of cases for which the logical expression is true.
Note that the logical vector is not necessary. The expression is enough.
## [1] 0.3
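A sketch of the one-step version, where the comparison is passed straight to mean() without storing the logical vector first (the name p is only for illustration):

```r
x = 1:10

# x > 7 is evaluated first, then mean() averages the resulting TRUE/FALSE values
p = mean(x > 7)
p
```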
This can be applied to dataframes using logical expressions that involve multiple variables.
Download the file cdc2.Rdata and load it. This creates the dataframe cdc2 in your global environment. Run str(cdc2) to examine its contents.
## 'data.frame': 19997 obs. of 15 variables:
## $ genhlth : Factor w/ 5 levels "excellent","very good",..: 3 3 3 3 2 2 2 2 3 3 ...
## $ exerany : num 0 0 1 1 0 1 1 0 0 1 ...
## $ hlthplan : num 1 1 1 1 1 1 1 1 1 1 ...
## $ smoke100 : num 0 1 1 0 0 0 0 0 1 0 ...
## $ height : num 70 64 60 66 61 64 71 67 65 70 ...
## $ weight : int 175 125 105 132 150 114 194 170 150 180 ...
## $ wtdesire : int 175 115 105 124 130 114 185 160 130 170 ...
## $ age : int 77 33 49 42 55 55 31 45 27 44 ...
## $ gender : Factor w/ 2 levels "m","f": 1 2 2 2 2 2 1 1 2 1 ...
## $ BMI : num 25.1 21.5 20.5 21.3 28.3 ...
## $ BMIDes : num 25.1 19.7 20.5 20 24.6 ...
## $ DesActRatio: num 1 0.92 1 0.939 0.867 ...
## $ BMICat : Factor w/ 5 levels "Underweight",..: 3 2 2 2 3 2 3 3 3 3 ...
## $ BMIDesCat : Factor w/ 5 levels "Underweight",..: 3 2 2 2 2 2 3 3 2 2 ...
## $ ageCat : Factor w/ 4 levels "18-31","32-43",..: 4 2 3 2 3 3 1 3 1 3 ...
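The loading step itself is not shown. The sketch below illustrates how load() restores a saved object under its original name, using a small made-up dataframe (toy) and a temporary file as stand-ins for cdc2 and cdc2.Rdata. The toy values are assumptions for illustration only.

```r
# Toy stand-in for cdc2 with made-up values (not the real data)
toy = data.frame(age = c(77L, 33L, 49L),
                 gender = factor(c("m", "f", "f"), levels = c("m", "f")))

path = tempfile(fileext = ".Rdata")  # temporary file, so nothing is overwritten
save(toy, file = path)
rm(toy)                              # remove it to show that load() brings it back

restored = load(path)  # recreates the saved objects and returns their names
restored
str(toy)

# With the downloaded file in the working directory, the same pattern would be:
# load("cdc2.Rdata")
# str(cdc2)
```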
Get a count of females with ageCat = "32-43" and BMICat = "Underweight".
the_count = sum(cdc2$gender == 'f' &
                cdc2$ageCat == "32-43" &
                cdc2$BMICat == "Underweight")
the_count
## [1] 77
Repeat the previous exercise for the proportion. What fraction of all cases meet these three criteria?
the_proportion = mean(cdc2$gender == 'f' &
                      cdc2$ageCat == "32-43" &
                      cdc2$BMICat == "Underweight")
the_proportion
## [1] 0.003850578
Find the count and proportion of weird people. A person is weird if they have a normal BMI but desire to be underweight, obese, or morbidly obese.
This time start by creating the boolean variable weird.
cdc2$weird = cdc2$BMICat == "Normal" & (cdc2$BMIDesCat == "Underweight" |
                                        cdc2$BMIDesCat == "Obese" |
                                        cdc2$BMIDesCat == "Morbidly Obese")
mean(cdc2$weird)
## [1] 0.00760114
sum(cdc2$weird)
## [1] 152
Use the boolean variable weird to create a dataframe named weirdos containing the weird people.
## genhlth exerany hlthplan smoke100 height weight wtdesire age gender
## 236 excellent 1 0 1 71 155 120 33 f
## 286 very good 1 1 0 59 100 90 44 f
## 297 excellent 1 0 1 63 105 100 19 m
## 324 good 0 1 1 71 150 105 59 m
## 359 excellent 1 0 0 64 115 105 24 f
## 514 very good 1 1 0 68 150 115 24 m
## BMI BMIDes DesActRatio BMICat BMIDesCat ageCat weird
## 236 21.61575 16.73477 0.7741935 Normal Underweight 32-43 TRUE
## 286 20.19535 18.17581 0.9000000 Normal Underweight 44-57 TRUE
## 297 18.59788 17.71227 0.9523810 Normal Underweight 18-31 TRUE
## 324 20.91847 14.64293 0.7000000 Normal Underweight 58-99 TRUE
## 359 19.73755 18.02124 0.9130435 Normal Underweight 18-31 TRUE
## 514 22.80493 17.48378 0.7666667 Normal Underweight 18-31 TRUE
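The subsetting code is not shown. The sketch below applies the usual logical-indexing pattern to a small made-up dataframe (toy) standing in for cdc2. The rows are assumptions for illustration, and %in% is used as a shorthand for the chained | comparisons.

```r
# Toy stand-in for cdc2 with made-up rows (not the real data)
toy = data.frame(
  gender    = c("m", "f", "f", "m", "f"),
  BMICat    = c("Normal", "Normal", "Overweight", "Normal", "Normal"),
  BMIDesCat = c("Obese", "Underweight", "Normal", "Normal", "Underweight")
)

# Same definition of weird as in the text, written with %in%
toy$weird = toy$BMICat == "Normal" &
  toy$BMIDesCat %in% c("Underweight", "Obese", "Morbidly Obese")

# A logical vector in the row position keeps only the rows where it is TRUE
weirdos = toy[toy$weird, ]
head(weirdos)

# On the real data the same pattern would be:
# weirdos = cdc2[cdc2$weird, ]
# head(weirdos)
```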
Get a table of gender by BMIDesCat for the weirdos.
##
## Underweight Normal Overweight Obese Morbidly Obese
## m 17 0 0 13 0
## f 122 0 0 0 0
Get a table of gender by ageCat for the weirdos.
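A two-way table of this kind can be built by passing two factors to table(). The sketch below uses a small made-up dataframe (toy_weirdos) in place of weirdos, so its counts say nothing about the real data:

```r
# Toy stand-in for weirdos with made-up rows (not the real data)
toy_weirdos = data.frame(
  gender = factor(c("m", "f", "f", "f"), levels = c("m", "f")),
  ageCat = factor(c("18-31", "18-31", "32-43", "18-31"),
                  levels = c("18-31", "32-43"))
)

# The first argument supplies the rows, the second supplies the columns
tab = table(toy_weirdos$gender, toy_weirdos$ageCat)
tab

# On the real data the same pattern would be:
# table(weirdos$gender, weirdos$ageCat)
```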