Using Logical Values

Harold Nelson

7/17/2019

You have learned about logical variables and logical expressions in the DataCamp course Introduction to R. These notes show you how to use this feature of R in common exploratory tasks.

Let’s create a simple numeric vector and a related vector of logical values and display both of them.

x = 1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10
midx = x > 3 & x < 7
midx
##  [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE

Note that the vector midx contains the value TRUE if and only if the corresponding value in x satisfies the condition spelled out by the logical expression that defined midx.
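Printing the two vectors side by side makes this element-by-element correspondence easy to check. This is a small sketch; wrapping them in data.frame() is only a display convenience and is not part of the original notes.

```r
x = 1:10
midx = x > 3 & x < 7

# Each row pairs a value of x with the result of the test for that value
data.frame(x, midx)
```

Rows 4, 5 and 6 are the only ones with TRUE, because 4, 5 and 6 are the only values strictly between 3 and 7.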

How useful is this?

Suppose you want to count the number of entries in x that satisfy the condition. Here’s a way you could do this.

num_sat = sum(midx)
num_sat
## [1] 3

What happened? You can see that when R was asked to use logical values in a numeric expression, it converted the value TRUE to 1 and the value FALSE to 0.
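You can see the same coercion directly by doing arithmetic on a few logical values. This is a minimal sketch using only base R.

```r
# Arithmetic on logical values coerces TRUE to 1 and FALSE to 0
as.numeric(c(TRUE, FALSE, TRUE))   # 1 0 1
TRUE + TRUE                        # 2
sum(c(TRUE, FALSE, TRUE, TRUE))    # 3: one for each TRUE
```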

Did we really need to create the vector midx? No. We could just as easily have used the logical expression directly.

num_sat = sum(x > 3 & x < 7)
num_sat
## [1] 3

You may not be interested in the count of satisfying values. You may just want to know what fraction of the values in x satisfy the condition. You could divide the value of num_sat by the length of x. Or, you could just use the mean() function on the logical expression.

num_sat/length(x)
## [1] 0.3
mean(x > 3 & x < 7)
## [1] 0.3

Take-aways:

  1. The sum of a logical expression is the count of items satisfying the logical expression.

  2. The mean of a logical expression is the fraction of items satisfying the logical expression.
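The two take-aways can be bundled into one small helper. The function name count_and_fraction is hypothetical, made up for illustration; it only wraps sum() and mean().

```r
# Hypothetical helper: report how many values satisfy a condition,
# and what fraction of all values that count represents
count_and_fraction = function(cond) {
  c(count = sum(cond), fraction = mean(cond))
}

x = 1:10
count_and_fraction(x > 3 & x < 7)   # count = 3, fraction = 0.3
```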

Exercise

  1. Create a numeric vector consisting of the integers starting at 10 and ending at 25.

  2. How many of these integers are greater than 15 and less than 21?

  3. What fraction of these integers satisfy the condition above?

Solution

x = 10:25
sum(x > 15 & x < 21)
## [1] 5
mean(x > 15 & x < 21)
## [1] 0.3125

Note that because I didn’t assign the results of these expressions to variables, R simply printed them. If I had wanted to use them later, I would have stored them in variables, but then R would no longer print them automatically; to see a stored value, I’d have to type the variable’s name. The code in the next chunk demonstrates this.

count_sat = sum(x > 15 & x < 21)
count_sat
## [1] 5
fract_sat = mean(x > 15 & x < 21)
fract_sat
## [1] 0.3125
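R also offers a shortcut for doing both at once: wrapping an assignment in parentheses stores the value and prints it in one step. This is standard R behavior; the variable names below simply reuse those from the chunk above.

```r
x = 10:25

# Parentheses around an assignment store the result AND print it
(count_sat = sum(x > 15 & x < 21))
(fract_sat = mean(x > 15 & x < 21))

# The stored values remain available for later use
count_sat / length(x)
```

Each of the three lines prints its value (5, 0.3125 and 0.3125), and count_sat and fract_sat stay defined afterwards.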

Subsetting

Probably the most common use of logical values in data exploration is the creation of subsets of dataframes or vectors. Consider the dataframe diamonds, which is in the ggplot2 package. Suppose you want to analyze the diamonds which have an “Ideal” cut but are priced at less than $1,000.

# If you need to get the ggplot2 package, un-comment the following
# line. Note that the package name must be quoted.

# install.packages("ggplot2")
library(ggplot2)

Cheap_ideal = diamonds[diamonds$cut == "Ideal" & diamonds$price < 1000,]

# Run str() to verify the result.
str(Cheap_ideal)
## Classes 'tbl_df', 'tbl' and 'data.frame':    6838 obs. of  10 variables:
##  $ carat  : num  0.23 0.23 0.31 0.3 0.33 0.33 0.33 0.23 0.32 0.3 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 7 7 6 6 6 7 4 6 6 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 5 2 2 2 2 3 5 3 2 ...
##  $ depth  : num  61.5 62.8 62.2 62 61.8 61.2 61.1 61.9 60.9 61 ...
##  $ table  : num  55 56 54 54 55 56 56 54 55 59 ...
##  $ price  : int  326 340 344 348 403 403 403 404 404 405 ...
##  $ x      : num  3.95 3.93 4.35 4.31 4.49 4.49 4.49 3.93 4.45 4.3 ...
##  $ y      : num  3.98 3.9 4.37 4.34 4.51 4.5 4.55 3.95 4.48 4.33 ...
##  $ z      : num  2.43 2.46 2.71 2.68 2.78 2.75 2.76 2.44 2.72 2.63 ...
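The same bracket notation works on a plain vector, which has only one dimension and so needs no comma. To see the data frame mechanics without loading ggplot2, the sketch below also uses a tiny made-up data frame; the name gems and its values are hypothetical stand-ins for diamonds. Note that sum() of the row condition tells you in advance how many rows the subset will have, and that base R’s subset() function selects the same rows.

```r
# Vector: keep only the elements for which the condition is TRUE
x = 1:10
x[x > 3 & x < 7]        # 4 5 6

# Hypothetical miniature stand-in for ggplot2's diamonds data
gems = data.frame(cut   = c("Ideal", "Good", "Ideal", "Ideal"),
                  price = c(500, 800, 1500, 900))

# One logical value per row
keep = gems$cut == "Ideal" & gems$price < 1000
sum(keep)               # 2 rows satisfy the condition

# Logical vector in the row slot; the empty column slot keeps every column
cheap_ideal_gems = gems[keep, ]
nrow(cheap_ideal_gems)  # 2, matching sum(keep)

# Base R's subset() evaluates the condition using the data frame's columns
subset(gems, cut == "Ideal" & price < 1000)
```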

Exercise

Create a subset named myCars of mtcars consisting of all the cars with more than 100 cubic inches of displacement and mpg above 20. Run str() to verify the result. Display a summary of horsepower (hp) for this subset.

Solution

myCars = mtcars[mtcars$disp > 100 & mtcars$mpg > 20,]
str(myCars)
## 'data.frame':    9 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 24.4 22.8 21.5 26 21.4
##  $ cyl : num  6 6 4 6 4 4 4 4 4
##  $ disp: num  160 160 108 258 147 ...
##  $ hp  : num  110 110 93 110 62 95 97 91 109
##  $ drat: num  3.9 3.9 3.85 3.08 3.69 3.92 3.7 4.43 4.11
##  $ wt  : num  2.62 2.88 2.32 3.21 3.19 ...
##  $ qsec: num  16.5 17 18.6 19.4 20 ...
##  $ vs  : num  0 0 1 1 1 1 1 0 1
##  $ am  : num  1 1 1 0 0 0 0 1 1
##  $ gear: num  4 4 4 3 4 4 3 5 4
##  $ carb: num  4 4 1 1 2 2 1 2 2
summary(myCars$hp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   62.00   93.00   97.00   97.44  110.00  110.00
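If the average horsepower is all you need, you can skip creating myCars and combine logical subsetting of a single column with mean(). This is a sketch using the same condition as the solution; the printed mean should agree with the Mean entry in the summary above.

```r
# mtcars ships with base R (datasets package), so no library() call is needed
cond = mtcars$disp > 100 & mtcars$mpg > 20

sum(cond)               # number of cars in the subset (9, as str() reported)
mean(mtcars$hp[cond])   # average horsepower of those cars
mean(cond)              # fraction of all 32 cars that satisfy the condition
```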