Harold Nelson
9/12/2020
Video is at https://www.youtube.com/watch?v=2lbj58zB7bc
These notes are designed to follow the Datacamp course Intermediate R. They include some details that I want to emphasize or extra points that I consider worth mentioning.
This particular set focuses on logical values, the first chapter of the course, but it does include other topics as appropriate. You will also hear the term Boolean values, named after George Boole, the originator of this area of mathematics. See https://en.wikipedia.org/wiki/Boolean_algebra for more information.
Working through these notes should give you enough experience with logical expressions to distinguish them easily from normal algebraic expressions.
The notes are in question and answer format. To get the most out of them, you should have RStudio open and respond actively to the questions as they come up.
How do you use NA values in logical expressions? It is clear that the numerical value 7 is not NA, so it would seem that the logical expression 7 == NA should evaluate to FALSE. It is also clear that NA itself is NA, so it would seem that NA == NA should be TRUE. Try both.
## [1] NA
## [1] NA
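That didn’t work. The key principle is that any comparison involving NA is NA, because R cannot know what the missing value is. More generally, an expression involving NA is NA unless the other operand settles the result by itself: NA & FALSE is FALSE and NA | TRUE is TRUE.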
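The two results above can be reproduced with a short sketch like the following. The variable names are only illustrative. It also checks a caveat worth knowing: the operators & and | can return a definite value when the known operand decides the outcome.

```r
# Comparisons with NA give NA, because the missing value could be anything.
cmp_seven <- 7 == NA
cmp_na    <- NA == NA
cmp_seven  # NA
cmp_na     # NA

# Logical operators can still give a definite answer when the known
# operand decides the result on its own.
and_false <- NA & FALSE  # FALSE, whatever the missing value is
or_true   <- NA | TRUE   # TRUE, whatever the missing value is
and_true  <- NA & TRUE   # NA, the result depends on the missing value
```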
We need to make use of the function is.na() to answer such questions with a TRUE/FALSE answer. It returns TRUE or FALSE depending on the value of the argument. Try it with 7 and with NA.
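A minimal sketch of the is.na() check described above. The vector x is a made-up example, included only to show that is.na() also works element by element.

```r
# is.na() always answers TRUE or FALSE, never NA.
is_seven_na <- is.na(7)   # FALSE
is_na_na    <- is.na(NA)  # TRUE

# Hypothetical vector with one missing value.
x <- c(7, NA, 3)
na_flags <- is.na(x)      # FALSE TRUE FALSE
na_flags
```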
Most presentations of logical operators use truth tables to give precise definitions of the operators. Given two logical variables, P and Q, there are four possible combinations of truth values. The standard arrangement of these possibilities is based on a tree diagram or a nested for loop in which P is in the outer loop or the first level of the tree. The table below is implemented as a dataframe and shows the value of P & Q for each of the four possibilities.
## P Q P_and_Q
## 1 TRUE TRUE TRUE
## 2 TRUE FALSE FALSE
## 3 FALSE TRUE FALSE
## 4 FALSE FALSE FALSE
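One way a table like the one above could be produced is sketched here. The name and_df is an assumption; the column names follow the printed output.

```r
# P varies slowest (outer loop), Q varies fastest (inner loop).
P <- c(TRUE, TRUE, FALSE, FALSE)
Q <- c(TRUE, FALSE, TRUE, FALSE)

# & works element by element on the two vectors.
P_and_Q <- P & Q

and_df <- data.frame(P, Q, P_and_Q)
and_df
```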
Following the example above, construct a truth table for the or operator, which is written “|” in R.
## P Q P_or_Q
## 1 TRUE TRUE TRUE
## 2 TRUE FALSE TRUE
## 3 FALSE TRUE TRUE
## 4 FALSE FALSE FALSE
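If you want to compare your answer, here is one possible solution. It is a sketch that reuses the same P and Q columns, and the name or_df is only illustrative.

```r
P <- c(TRUE, TRUE, FALSE, FALSE)
Q <- c(TRUE, FALSE, TRUE, FALSE)

# | is TRUE whenever at least one operand is TRUE.
P_or_Q <- P | Q

or_df <- data.frame(P, Q, P_or_Q)
or_df
```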
Not, written ! in R, is a unary operator, so its truth table only needs two rows. Create this table yourself.
## P1 not_P1
## 1 TRUE FALSE
## 2 FALSE TRUE
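One possible way to build this two-row table, using the column names from the output above. The name not_df is an assumption.

```r
P1 <- c(TRUE, FALSE)

# ! flips each value in the vector.
not_P1 <- !P1

not_df <- data.frame(P1, not_P1)
not_df
```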
Two logical expressions are equivalent if they have the same truth value for every possible combination of the basic values of their lowest level components. The way we check this in math courses is to build a truth table with columns containing the truth values for the two expressions. Then we visually examine the two columns. If the two columns are identical, the expressions are logically equivalent.
As an example, consider the following two logical expressions: !(P & Q) and !P | !Q.
Exercise: Let’s build a truth table containing these.
P = c(T,T,F,F)
Q = c(T,F,T,F)
Exp1 = !(P & Q)
Exp2 = !P | !Q
tt_df = data.frame(P,Q,Exp1,Exp2)
tt_df
## P Q Exp1 Exp2
## 1 TRUE TRUE FALSE FALSE
## 2 TRUE FALSE TRUE TRUE
## 3 FALSE TRUE TRUE TRUE
## 4 FALSE FALSE TRUE TRUE
Yes! They are identical. We can see that visually, but did we need to look? Could we just compare the two logical vectors?
Exercise: Do that.
## [1] TRUE TRUE TRUE TRUE
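A sketch of the comparison that gives output like the line above. It rebuilds Exp1 and Exp2 so that it runs by itself.

```r
P <- c(TRUE, TRUE, FALSE, FALSE)
Q <- c(TRUE, FALSE, TRUE, FALSE)
Exp1 <- !(P & Q)
Exp2 <- !P | !Q

# == compares the two vectors position by position.
same <- Exp1 == Exp2
same
```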
We still had to look and see that every position in the logical vector contained the value TRUE. How could we simplify the conclusion?
The idea is that we want to see if the count of TRUE values in the logical vector is the same as its length. Since TRUE counts as 1 and FALSE as 0, sum() gives that count.
Exercise: Do that.
Let’s just look at the two numbers.
## [1] 4
## [1] 4
Yes, they are the same. Of course I could ask this question even more directly.
Do that.
## [1] TRUE
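The steps above can be sketched as follows. The names n_true and n_total are illustrative, and all() is shown as an alternative that I am adding here, not necessarily the approach intended in the exercise.

```r
P <- c(TRUE, TRUE, FALSE, FALSE)
Q <- c(TRUE, FALSE, TRUE, FALSE)
Exp1 <- !(P & Q)
Exp2 <- !P | !Q

# sum() counts the TRUE values because TRUE is treated as 1.
n_true  <- sum(Exp1 == Exp2)
n_total <- length(Exp1 == Exp2)
n_true
n_total

# Ask the question directly.
n_true == n_total

# all() answers the same question in one step.
all(Exp1 == Exp2)
```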
This particular logical equivalence is one of De Morgan’s two laws. The second one is almost the same, but the two logical operators are reversed in their roles: !(P | Q) is equivalent to !P & !Q.
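The second law can be checked the same way. This is a minimal sketch, and the names Exp3 and Exp4 are my own choice to keep them apart from the first law.

```r
P <- c(TRUE, TRUE, FALSE, FALSE)
Q <- c(TRUE, FALSE, TRUE, FALSE)

# Second De Morgan law: "not (P or Q)" versus "(not P) and (not Q)".
Exp3 <- !(P | Q)
Exp4 <- !P & !Q

data.frame(P, Q, Exp3, Exp4)
second_law_holds <- all(Exp3 == Exp4)
second_law_holds
```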
In the algebra of real numbers, we know that multiplication distributes over addition. For any three numbers x, y and z,
\[x(y+z)=x y + x z\]
There are two similar laws in Boolean algebra. We’ll just look at one of them. For any three logical values P, Q and R, the expressions P & (Q | R) and (P & Q) | (P & R) are equivalent.
A truth table would contain eight rows and be a tedious chore. But computing the values of the two logical expressions for every possible combination of P, Q and R would be easy using nested for loops.
Do That:
TV = c(T,F)
for(P in TV){
  for(Q in TV){
    for(R in TV){
      Exp1 = P & (Q | R)
      Exp2 = (P & Q) | (P & R)
      print(Exp1 == Exp2)
    }
  }
}
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
That’s not a very R-ish way to do things. Experienced R users do manipulations with vectors instead of for loops. We can generate the base columns of the 8-row truth table as follows, using simple brute force.
P = c(T,T,T,T,F,F,F,F)
Q = c(T,T,F,F,T,T,F,F)
R = c(T,F,T,F,T,F,T,F)
base_cols = data.frame(P,Q,R)
base_cols
## P Q R
## 1 TRUE TRUE TRUE
## 2 TRUE TRUE FALSE
## 3 TRUE FALSE TRUE
## 4 TRUE FALSE FALSE
## 5 FALSE TRUE TRUE
## 6 FALSE TRUE FALSE
## 7 FALSE FALSE TRUE
## 8 FALSE FALSE FALSE
Then we can create the expressions we want using vector operations and test the equality of the two.
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
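A sketch of the vector version that gives output like the line above. It repeats the base columns so that it runs by itself, and it adds all() as a one-line summary.

```r
P <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
Q <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)
R <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)

# Each expression is computed for all eight rows at once.
Exp1 <- P & (Q | R)
Exp2 <- (P & Q) | (P & R)

Exp1 == Exp2
all(Exp1 == Exp2)
```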
You could also build the base columns using the rep() function. Visit http://www.endmemo.com/r/rep.php to see how rep() works. Then use it to build P, Q and R.
P = rep(c(T,F),each = 4)
Q = rep(c(T,T,F,F),times = 2)
R = rep(c(T,F),times = 4)
tt_df = data.frame(P,Q,R)
tt_df
## P Q R
## 1 TRUE TRUE TRUE
## 2 TRUE TRUE FALSE
## 3 TRUE FALSE TRUE
## 4 TRUE FALSE FALSE
## 5 FALSE TRUE TRUE
## 6 FALSE TRUE FALSE
## 7 FALSE FALSE TRUE
## 8 FALSE FALSE FALSE
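To see the difference between the each and times arguments used above, a small sketch on a two-element vector may help. The variable names are illustrative only.

```r
tf <- c(TRUE, FALSE)

# each repeats every element before moving on to the next one.
by_each <- rep(tf, each = 2)    # TRUE TRUE FALSE FALSE

# times repeats the whole vector.
by_times <- rep(tf, times = 2)  # TRUE FALSE TRUE FALSE

# Both together: each is applied first, then times.
# This is another way to write the Q column of the 8-row table.
Q_alt <- rep(tf, each = 2, times = 2)
Q_alt
```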
In R4DS, Hadley Wickham makes the point that in R, there are three potential values of a boolean variable: TRUE, FALSE and NA.
How can we investigate the equivalence of two expressions while incorporating the third value?
It’s fairly easy to do with the nested for loops. In fact, we only need to insert three characters in the code.
Here’s an example.
TV = c(T,F,NA)
for(P in TV){
  for(Q in TV){
    for(R in TV){
      Exp1 = P & (Q | R)
      Exp2 = (P & Q) | (P & R)
      print(Exp1 == Exp2)
    }
  }
}
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] NA
## [1] TRUE
## [1] NA
## [1] NA
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] NA
## [1] NA
## [1] NA
## [1] NA
## [1] TRUE
## [1] NA
## [1] NA
## [1] NA
## [1] NA
Expand on Exp1 == Exp2 to include the cases where both expressions are NA.
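TV = c(T,F,NA)
for(P in TV){
  for(Q in TV){
    for(R in TV){
      Exp1 = P & (Q | R)
      Exp2 = (P & Q) | (P & R)
      # TRUE if both are NA, or if the comparison is definitely TRUE.
      # isTRUE() turns an NA comparison into FALSE, so a mismatch is never hidden.
      print((is.na(Exp1) & is.na(Exp2)) | isTRUE(Exp1 == Exp2))
    }
  }
}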
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] TRUE
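The same three-valued check can also be done with vectors instead of loops. The following is a sketch under my own assumptions: the 27 rows are built with rep(), and the name same is illustrative.

```r
TV <- c(TRUE, FALSE, NA)

# 3 * 3 * 3 = 27 combinations; P varies slowest and R fastest.
P <- rep(TV, each = 9)
Q <- rep(rep(TV, each = 3), times = 3)
R <- rep(TV, times = 9)

Exp1 <- P & (Q | R)
Exp2 <- (P & Q) | (P & R)

# Two results match if both are NA, or if neither is NA and they are equal.
same <- (is.na(Exp1) & is.na(Exp2)) |
  (!is.na(Exp1) & !is.na(Exp2) & Exp1 == Exp2)

all(same)
```

Because FALSE & NA is FALSE, the vector same never contains NA, so all() gives a definite answer.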