Counts and Proportions

Harold Nelson

9/10/2024

Create a vector with numerical values.

x = 1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10

What proportion of the values are greater than 7? Do this by hand.

How could you do this with R code? How would you do this in your favorite language?

R is Different

Create a logical vector based on the values of x.

x_gt_7 = x > 7
x_gt_7
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
str(x_gt_7)
##  logi [1:10] FALSE FALSE FALSE FALSE FALSE FALSE ...

The values in this vector are logical, not numerical.

Arithmetic on logical values?

What happens if we use the sum() and/or mean() functions?

sum(x_gt_7)
## [1] 3
mean(x_gt_7)
## [1] 0.3

What happened?

When logical values are used where numbers are expected, TRUE becomes 1 and FALSE becomes 0.

The sum of a logical expression is the count of cases for which the logical expression is true.

The mean of a logical expression is the fraction of cases for which the logical expression is true.
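
The coercion can be made visible directly. The following is a minimal sketch that rebuilds x so it runs on its own; the name flags is introduced here only for illustration.

```r
x = 1:10
flags = x > 7              # logical vector; `flags` is an illustrative name

as.integer(flags)          # TRUE is shown as 1, FALSE as 0
TRUE + TRUE                # arithmetic treats each TRUE as 1

# The mean is the count of TRUE values divided by the number of values
sum(flags) / length(flags)
mean(flags)
```

Both of the last two lines should give the same proportion, which is why mean() works as a shortcut.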

Note that the logical vector is not necessary. The expression is enough.

mean(x > 7)
## [1] 0.3

Use Anywhere

The same technique applies to dataframes, where the logical expression can involve several variables.
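
As a small illustration before turning to real data, here is a sketch on a made-up dataframe. The dataframe toy and its columns are hypothetical and are not part of cdc2.

```r
# Hypothetical toy data, invented for this example
toy = data.frame(
  gender = c("f", "m", "f", "f", "m"),
  age    = c(25, 40, 35, 52, 38)
)

# Count of females older than 30 (two conditions joined with &)
toy_count = sum(toy$gender == "f" & toy$age > 30)
toy_count

# Proportion of all rows that meet both conditions
toy_prop = mean(toy$gender == "f" & toy$age > 30)
toy_prop
```

Rows 3 and 4 are the only ones that satisfy both conditions, so the count is 2 out of 5 rows.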

Download the file cdc2.Rdata and load it. This creates the dataframe cdc2 in your global environment. Run str(cdc2) to examine its contents.

Solution

load("cdc2.Rdata")
str(cdc2)
## 'data.frame':    19997 obs. of  15 variables:
##  $ genhlth    : Factor w/ 5 levels "excellent","very good",..: 3 3 3 3 2 2 2 2 3 3 ...
##  $ exerany    : num  0 0 1 1 0 1 1 0 0 1 ...
##  $ hlthplan   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ smoke100   : num  0 1 1 0 0 0 0 0 1 0 ...
##  $ height     : num  70 64 60 66 61 64 71 67 65 70 ...
##  $ weight     : int  175 125 105 132 150 114 194 170 150 180 ...
##  $ wtdesire   : int  175 115 105 124 130 114 185 160 130 170 ...
##  $ age        : int  77 33 49 42 55 55 31 45 27 44 ...
##  $ gender     : Factor w/ 2 levels "m","f": 1 2 2 2 2 2 1 1 2 1 ...
##  $ BMI        : num  25.1 21.5 20.5 21.3 28.3 ...
##  $ BMIDes     : num  25.1 19.7 20.5 20 24.6 ...
##  $ DesActRatio: num  1 0.92 1 0.939 0.867 ...
##  $ BMICat     : Factor w/ 5 levels "Underweight",..: 3 2 2 2 3 2 3 3 3 3 ...
##  $ BMIDesCat  : Factor w/ 5 levels "Underweight",..: 3 2 2 2 2 2 3 3 2 2 ...
##  $ ageCat     : Factor w/ 4 levels "18-31","32-43",..: 4 2 3 2 3 3 1 3 1 3 ...

Exercise

Get the count of females with ageCat equal to “32-43” and BMICat equal to “Underweight”.

Solution

the_count = sum(cdc2$gender == 'f' &
                cdc2$ageCat == "32-43" &
                cdc2$BMICat == "Underweight")
the_count
## [1] 77

Exercise

Repeat the previous exercise for the proportion. What fraction of all cases meet these three criteria?

Solution

the_proportion = mean(cdc2$gender == 'f' &
                      cdc2$ageCat == "32-43" &
                      cdc2$BMICat == "Underweight")
the_proportion
## [1] 0.003850578

Exercise

Find the count and proportion of weird people. A person is weird if they have a normal BMI and desire to be underweight, obese, or morbidly obese.

This time start by creating the boolean variable weird.

Solution

cdc2$weird = cdc2$BMICat == "Normal" &
  (cdc2$BMIDesCat == "Underweight" |
   cdc2$BMIDesCat == "Obese" |
   cdc2$BMIDesCat == "Morbidly Obese")

mean(cdc2$weird)
## [1] 0.00760114
sum(cdc2$weird)
## [1] 152
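
A chain of == comparisons joined with | can often be written more compactly with %in%. The sketch below uses a short made-up factor, des_cat, standing in for cdc2$BMIDesCat; the names des_cat, chained, and with_in are illustrative only.

```r
# Hypothetical values standing in for cdc2$BMIDesCat
des_cat = factor(c("Normal", "Obese", "Underweight",
                   "Overweight", "Morbidly Obese"))

# Three comparisons joined with |, as in the solution above
chained = des_cat == "Underweight" |
          des_cat == "Obese" |
          des_cat == "Morbidly Obese"

# The same test expressed as set membership
with_in = des_cat %in% c("Underweight", "Obese", "Morbidly Obese")

all(chained == with_in)
sum(with_in)
```

One difference to keep in mind: when a value is missing, == yields NA while %in% yields FALSE, so the two forms can disagree on data with missing values.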

Exercise

Use the boolean variable weird to create a dataframe named weirdos containing the weird people.

Solution

weirdos = cdc2[cdc2$weird,]
head(weirdos)
##       genhlth exerany hlthplan smoke100 height weight wtdesire age gender
## 236 excellent       1        0        1     71    155      120  33      f
## 286 very good       1        1        0     59    100       90  44      f
## 297 excellent       1        0        1     63    105      100  19      m
## 324      good       0        1        1     71    150      105  59      m
## 359 excellent       1        0        0     64    115      105  24      f
## 514 very good       1        1        0     68    150      115  24      m
##          BMI   BMIDes DesActRatio BMICat   BMIDesCat ageCat weird
## 236 21.61575 16.73477   0.7741935 Normal Underweight  32-43  TRUE
## 286 20.19535 18.17581   0.9000000 Normal Underweight  44-57  TRUE
## 297 18.59788 17.71227   0.9523810 Normal Underweight  18-31  TRUE
## 324 20.91847 14.64293   0.7000000 Normal Underweight  58-99  TRUE
## 359 19.73755 18.02124   0.9130435 Normal Underweight  18-31  TRUE
## 514 22.80493 17.48378   0.7666667 Normal Underweight  18-31  TRUE
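
Subsetting with a logical vector keeps exactly the rows where the vector is TRUE, so the number of rows kept equals the count from sum(). A minimal sketch on a made-up dataframe follows; toy, flag, and picked are hypothetical names.

```r
# Hypothetical toy data, invented for this example
toy = data.frame(id   = 1:5,
                 flag = c(TRUE, FALSE, TRUE, FALSE, FALSE))

# Keep only the rows where flag is TRUE
picked = toy[toy$flag, ]
picked

# The row count of the subset matches the count of TRUE values
nrow(picked) == sum(toy$flag)
```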

Exercise

Get a table of gender by BMIDesCat for the weirdos.

Solution

table(weirdos$gender,weirdos$BMIDesCat)
##    
##     Underweight Normal Overweight Obese Morbidly Obese
##   m          17      0          0    13              0
##   f         122      0          0     0              0

Exercise

Get a table of gender by ageCat for the weirdos.

Solution

table(weirdos$gender,weirdos$ageCat)
##    
##     18-31 32-43 44-57 58-99
##   m    21     4     2     3
##   f    55    37    19    11
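
A table of counts can be turned into proportions with prop.table(). The sketch below uses made-up vectors g and a in place of the weirdos columns; all names and values here are hypothetical.

```r
# Hypothetical data standing in for gender and ageCat
g = c("m", "f", "f", "m", "f", "f")
a = c("18-31", "18-31", "32-43", "32-43", "18-31", "18-31")

tab = table(g, a)
tab

# margin = 1 gives proportions within each row (each gender sums to 1)
row_props = prop.table(tab, margin = 1)
row_props
```

Using margin = 2 would instead give proportions within each column, and omitting margin gives proportions of the grand total.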