This refresher course is based on:
Franken, W.M. & Bouts, R.A. (2002). Wiskunde voor statistiek: Een voorbereiding [Mathematics for statistics: A preparation]. Bussum, Netherlands: Uitgeverij Coutinho.
The refresher course consists of five chapters:
- Set Theory
- Mathematical Operations
- Equations
- Functions & Graphs
- Statistical Operations
This module illustrates all topics in the textbook and shows how they can be implemented in R.
Along the way, you will learn many of R’s basic functions.
1. Sets
1.1 Sets in Mathematics
- A set is a well-defined collection of objects.
- Each object in a set is called an element of the set.
- Two sets are equal if they have exactly the same elements in them.
- A set that contains no elements is called a null set or an empty set.
- If every element in Set A is also in Set B, then Set A is a subset of Set B.
Example: if A is the set of the first five letters of the alphabet, then A = {a,b,c,d,e}.
In R, we can create this set as follows.
We can choose the name ourselves. Here, we use “SetA”.
The two characters “<” and “-” together look like an arrow. The c() function combines the elements on the right-hand side of the arrow into one object, which the arrow then assigns to SetA on the left-hand side. The %in% operator tests whether an element is part of the set.
SetA <- c("a","b","c","d","e")
"a" %in% SetA
[1] TRUE
"f" %in% SetA
[1] FALSE
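The %in% operator also accepts several elements on its left-hand side and returns one TRUE or FALSE per element. A minimal sketch, in which the vector candidates is a hypothetical example; base R’s is.element() gives the same result:

```r
SetA <- c("a", "b", "c", "d", "e")
candidates <- c("a", "f", "e")    # hypothetical elements to look up
candidates %in% SetA              # one answer per candidate: TRUE FALSE TRUE
is.element(candidates, SetA)      # named equivalent of %in%: TRUE FALSE TRUE
```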
Likewise, we can create a set of numbers, from 1 to 100.
SetN <- c(1:100)
length(SetN)
[1] 100
88 %in% SetN # Is 88 an element of our set? [TRUE, it is!]
[1] TRUE
105 %in% SetN # Is 105 an element of our set? [FALSE, it is not!]
[1] FALSE
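The colon operator in 1:100 already produces the whole sequence, so the surrounding c() is optional. For sets with a fixed step size, the base R function seq() is convenient; it also appears in the exercises. A short sketch (the name Evens is only an example):

```r
# 1:100 already produces the integers 1 to 100; c() around it changes nothing
identical(c(1:100), 1:100)     # TRUE
# seq(from, to, by) builds a sequence with a fixed step
Evens <- seq(2, 20, by = 2)    # 2, 4, ..., 20
length(Evens)                  # 10
```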
1.2 Relations between sets
Empty Set
An empty set is a set without elements.
SetEmpty <- NULL
length(SetEmpty)
[1] 0
"2" %in% SetEmpty
[1] FALSE
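A quick sketch of how the empty set behaves in set operations, assuming a hypothetical two-element set SetX: the union with the empty set leaves a set unchanged, while the intersection with it is empty. Note that character(0) is another way to write an empty (character) set.

```r
SetEmpty <- NULL
SetX <- c("a", "b")                 # hypothetical example set
length(character(0))                # 0: another empty set
union(SetX, SetEmpty)               # "a" "b": SetX is unchanged
length(intersect(SetX, SetEmpty))   # 0: nothing in common with the empty set
```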
Identical sets
Apart from the order of their elements, the two sets below are identical.
All elements of A occur in B, and all elements of B occur in A.
SetA <- c("1","3","4","6")
SetB <- c("1","4","6","3")
SetB %in% SetA
[1] TRUE TRUE TRUE TRUE
SetA %in% SetB
[1] TRUE TRUE TRUE TRUE
We can test the differences and similarities using the commands below.
The overlap between two sets is called the intersection. The intersection is graphically displayed below.
All elements that are part of at least one of two sets (in A, in B, or in both) form the union of the two sets. In a graph:
The intersection of two sets is written as \(A \cap B\)
The union of two sets is written as \(A \cup B\)
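In the set-builder notation that Exercise 4 also uses, the intersection and union can be defined as below. The last line, the set difference \(A \setminus B\) (the elements of A that are not in B), is what R’s setdiff(A, B) computes.

```latex
\begin{aligned}
A \cap B &= \{x \mid x \in A \text{ and } x \in B\} \\
A \cup B &= \{x \mid x \in A \text{ or } x \in B\} \\
A \setminus B &= \{x \mid x \in A \text{ and } x \notin B\}
\end{aligned}
```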
setdiff(SetA, SetB)
character(0)
setequal(SetA, SetB)
[1] TRUE
intersect(SetA, SetB)
[1] "1" "3" "4" "6"
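setdiff(SetA, SetB) returns the elements of SetA that are not in SetB, which is why it is empty here. The sketch below contrasts setequal() with identical(), which does take the order of the elements into account, and shows that two sets are equal exactly when the set difference is empty in both directions:

```r
SetA <- c("1", "3", "4", "6")
SetB <- c("1", "4", "6", "3")
identical(SetA, SetB)    # FALSE: compares position by position, so order matters
setequal(SetA, SetB)     # TRUE: order is ignored
# Equal sets: nothing is left over in either direction
length(setdiff(SetA, SetB)) == 0 && length(setdiff(SetB, SetA)) == 0   # TRUE
```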
Let’s look at some other examples.
The two sets below are not identical, and both contain duplicates.
The result of union() shows that duplicates are removed!
SetA <- c("a","a","b","c","d")
SetB <- c("c","c","d","e")
union(SetA, SetB)
[1] "a" "b" "c" "d" "e"
(SetA)
[1] "a" "a" "b" "c" "d"
unique(SetA) # Removes duplicates
[1] "a" "b" "c" "d"
duplicated(SetA)
[1] FALSE TRUE FALSE FALSE FALSE
SetA[!duplicated(SetA)] # Removes duplicates
[1] "a" "b" "c" "d"
setdiff(SetA, SetB)
[1] "a" "b"
setdiff(SetB, SetA)
[1] "e"
To challenge your skills: the union of sets A and B can also be defined as the combination of:
- Elements in A but not in B, plus
- Elements in B but not in A, plus
- Elements in the intersection of A and B.
We can sort the combination of these three parts alphabetically and check whether the result indeed equals the union:
sort(c(setdiff(SetA, SetB), setdiff(SetB, SetA), intersect(SetB, SetA)))
[1] "a" "b" "c" "d" "e"
all(sort(c(setdiff(SetA, SetB), setdiff(SetB, SetA), intersect(SetB, SetA))) == union(SetA, SetB))
[1] TRUE
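The comparison with == only works here because both sides were sorted and have the same length: == compares the vectors position by position. The sketch below shows that setequal() does not need sort(), and illustrates a pitfall of all() combined with ==: comparing with an empty vector gives an empty result, and all() of an empty result is TRUE.

```r
SetA <- c("a", "a", "b", "c", "d")
SetB <- c("c", "c", "d", "e")
parts <- c(setdiff(SetA, SetB), setdiff(SetB, SetA), intersect(SetB, SetA))
all(parts == union(SetA, SetB))          # FALSE: parts is "a" "b" "e" "c" "d"
setequal(parts, union(SetA, SetB))       # TRUE: no sort() needed
# Pitfall: an empty comparison has no FALSE in it, so all() returns TRUE
all(character(0) == union(SetA, SetB))   # TRUE, although the sets differ
setequal(character(0), union(SetA, SetB))  # FALSE
```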
Combined with %in%, the all() function checks whether all elements of one set are contained in the other.
All elements of set C (the element “a”, twice) are part of D.
One element of D (“b”) is not contained in C.
SetC <- c("a","a"); SetD <- c("a","b")
SetC %in% SetD
[1] TRUE TRUE
all(SetC %in% SetD)
[1] TRUE
SetD %in% SetC
[1] TRUE FALSE
all(SetD %in% SetC)
[1] FALSE
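The combination all(x %in% y) is exactly the subset test from section 1.1, so it can be wrapped in a small function. A sketch; the name is_subset is our own, hypothetical helper and not part of base R:

```r
# Hypothetical helper: x is a subset of y when every element of x occurs in y
is_subset <- function(x, y) all(x %in% y)
SetC <- c("a", "a"); SetD <- c("a", "b")
is_subset(SetC, SetD)   # TRUE
is_subset(SetD, SetC)   # FALSE
is_subset(NULL, SetD)   # TRUE: the empty set is a subset of every set
```

The last line works because all() returns TRUE when there is nothing to check, which matches the rule that the empty set is a subset of every set.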
Sets of Numbers
(SetN <- c(-10:+10)) # Putting an assignment between parentheses prints the object in the console!
[1] -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10
length(SetN)
[1] 21
SetN[3]
[1] -8
SetN[3] < SetN[4]
[1] TRUE
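Square brackets select elements by position. A few more variations, sketched with the same set SetN:

```r
SetN <- -10:10
SetN[c(1, 21)]        # first and last element: -10 10
SetN[length(SetN)]    # last element, without counting by hand: 10
which(SetN == 0)      # position of the element 0: 11
```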
Exercises
Exercise 1
Given are the following sets:
A = {2,4,6,8,10,12,14,16}
B = {1,2,3,4,5,6,7}
C = {3,6,9,12,15,18}
- A \(\cap\) B = (Intersection of A and B)
- A \(\cap\) C =
- B \(\cap\) C =
- A \(\cap\) B \(\cap\) C =
- A \(\cup\) B = (Union of A and B)
- A \(\cup\) B \(\cup\) C = (Union of A, B and C)
Solution
Let’s first create the sets A, B and C
A <- seq(2, 16, 2)
B <- c(1:7)
C <- seq(3, 18, 3)
A; B; C # Prints the three sets
[1] 2 4 6 8 10 12 14 16
[1] 1 2 3 4 5 6 7
[1] 3 6 9 12 15 18
- A \(\cap\) B
(intersect(A, B)) # The parentheses are optional here: a top-level call prints its result anyway
[1] 2 4 6
- A \(\cap\) C
(intersect(A, C))
[1] 6 12
- B \(\cap\) C
(intersect(B, C))
[1] 3 6
- A \(\cap\) B \(\cap\) C
intersect(A, B, C) gives an error, because intersect() accepts only two sets!
We can do it in two steps: first, we determine the intersection of A and B; then we intersect that result with C.
This is doable for three sets, but to find the intersection of many sets, the expression would get very lengthy! A better alternative is the Reduce() function, which applies a two-set function such as intersect() step by step to a whole list of sets.
You can often find solutions to this kind of challenge by searching online. We found the Reduce() function, for example, via this link.
intersect(intersect(A,B),C)
[1] 6
Reduce(intersect, list(A,B,C))
[1] 6
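To see what Reduce() does, you can ask it to keep its intermediate results with the argument accumulate = TRUE. A sketch with the sets of this exercise: the first element of the result is A, the second is the intersection of A and B, and the third is the intersection of that result with C.

```r
A <- seq(2, 16, 2); B <- 1:7; C <- seq(3, 18, 3)
# accumulate = TRUE keeps every intermediate result of Reduce()
steps <- Reduce(intersect, list(A, B, C), accumulate = TRUE)
steps[[1]]   # A itself
steps[[2]]   # intersect(A, B): 2 4 6
steps[[3]]   # intersect(intersect(A, B), C): 6
```

The same call works unchanged for a longer list of sets, which is what makes Reduce() the better alternative.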
- A \(\cup\) B
sort(union(A, B))
[1] 1 2 3 4 5 6 7 8 10 12 14 16
- A \(\cup\) B \(\cup\) C
sort(Reduce(union, list(A, B, C)))
[1] 1 2 3 4 5 6 7 8 9 10 12 14 15 16 18
Are you in for a challenge?
Which elements are in the intersection of at least two sets, but not in all three?
(AB <- intersect(A,B)) # Pairwise intersection of A and B, stored as set AB
[1] 2 4 6
(AC <- intersect(A,C)) # Same, for A and C
[1] 6 12
(BC <- intersect(B,C)) # Same, for B and C
[1] 3 6
(pairs <- sort(Reduce(union, list(AB,AC,BC)))) # All pairwise intersections combined, stored as "pairs"
[1] 2 3 4 6 12
(trios <- Reduce(intersect, list(A,B,C))) # Elements present in the intersection of all three sets
[1] 6
sort(setdiff(pairs, trios)) # Elements in pairs, but not in trios
[1] 2 3 4 12
You see that elements 2, 3, 4 and 12 are in pairwise intersections, but not in the three-way intersection!
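An alternative route to the same answer, not used in the textbook: because none of the three sets contains duplicates, counting how often each element occurs in the combined vector c(A, B, C) tells you in how many sets it occurs. A sketch using base R’s table():

```r
A <- seq(2, 16, 2); B <- 1:7; C <- seq(3, 18, 3)
# Each set has no duplicates, so the count per element = number of sets containing it
counts <- table(c(A, B, C))
# Elements that occur in exactly two of the three sets
as.numeric(names(counts)[counts == 2])   # 2 3 4 12
```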
Graphically, just to convince you:
Exercise 2
Are the following statements true or false?
- {2,4} \(\subset\) (A \(\cap\) B)
- 6 \(\in\) (A \(\cap\) B \(\cap\) C)
- {6} \(\subset\) (A \(\cap\) B)
- (A \(\cap\) B) \(\subset\) (A \(\cup\) B)
Solutions
- {2,4} \(\subset\) (A \(\cap\) B)
SetTest <- c(2,4)
all(SetTest %in% intersect(A, B))
[1] TRUE
- 6 \(\in\) (A \(\cap\) B \(\cap\) C)
trios # Remember we stored the intersection of A, B and C as "trios"!
[1] 6
6 %in% trios # Here, 6 is a single element rather than a set of one element
[1] TRUE
- {6} \(\subset\) (A \(\cap\) B)
SetTest <- c(6) # Here, {6} is a set of one element
all(SetTest %in% intersect(A, B))
[1] TRUE
- (A \(\cap\) B) \(\subset\) (A \(\cup\) B)
This is true in general. Elements in the intersection of A and B are, by definition, part of both A and B. The intersection is therefore a subset of the union, which contains all elements in A or B!
To practice formulating this in R:
all(intersect(A, B) %in% union(A, B))
[1] TRUE
Exercise 3
A = {1,2,3,4,5,6}
B = {5,6}
C = {1,2,5,6}
D = {2,3,4}
E = {2,3,4,5}
Complete the statement with one of the symbols:
- \(\in\) (element of)
- \(\notin\) (not an element of)
- \(\subset\) (subset of)
- \(\supset\) (superset of; if A is a subset of B, then B is a superset of A)
- \(\cap\) (intersection)
- \(\cup\) (union)
- B….C
- B….C = B
- B….C = C
- B….D = \(\emptyset\) (empty set)
- C….D = A
- D….E = D
- 4….C \(\cap\) B
- D….E….A
Solutions
Use reasoning to answer each question! As an additional challenge, formulate the sets in R and use the commands introduced in this chapter to check your answers!
- \(\subset\) (B is obviously a subset of C, as all elements of B are also in C)
- \(\cap\) (the intersection of B and C is equal to B, as B is a subset of C)
- \(\cup\) (the combined elements of B and C are equal to C, as C is a superset of B)
- \(\cap\) (as B and D have no elements in common, the intersection is the empty set)
- \(\cup\) (all elements of C and D combined match A)
- \(\cap\) (as D \(\subset\) E, the intersection of D and E is D itself)
- \(\notin\) (4 is not part of the intersection of B and C; it is not even part of the union of B and C)
- \(\subset\) (D is a subset of E, which in turn is a subset of A; therefore, D is also a subset of A)
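For the additional challenge, a possible sketch that checks every answer of Exercise 3 in R. Each line should print TRUE. Note that it overwrites the sets A, B and C from Exercise 1.

```r
# Sets of Exercise 3 (overwrites A, B and C from Exercise 1)
A <- 1:6; B <- c(5, 6); C <- c(1, 2, 5, 6); D <- c(2, 3, 4); E <- c(2, 3, 4, 5)
all(B %in% C)                    # B is a subset of C
setequal(intersect(B, C), B)     # B intersect C equals B
setequal(union(B, C), C)         # B union C equals C
length(intersect(B, D)) == 0     # B intersect D is the empty set
setequal(union(C, D), A)         # C union D equals A
setequal(intersect(D, E), D)     # D intersect E equals D
!(4 %in% intersect(C, B))        # 4 is not an element of C intersect B
all(D %in% E) && all(E %in% A)   # D is a subset of E, E is a subset of A
```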
Working with data files (in data science) requires logical thinking. Set theory is a good exercise in logical thinking!
Exercise 4
Consider the following sets.
- A = {x | x is an even number and x<20; x is a positive natural number}
- B = {x | x is a multiple of 3 and x<20; x is a positive natural number}
- C = {x | x is an odd number and x<20; x is a positive natural number}
This looks cryptic. Set A, for example, reads as: the set of numbers x that satisfy the following rules: x is a positive natural number (1, 2, 3, and so on), smaller than 20, and divisible by 2. We can easily enumerate these numbers: 2, 4, 6, ... up to 18.
Set B then is 3, 6, 9, ... up to 18.
Set C is 1, 3, 5, ... up to 19.
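The condition after the vertical bar translates almost literally into R: first generate the candidate values, then keep only the values for which the condition is TRUE. A minimal sketch, using the modulo operator %% to test divisibility (x %% 2 == 0 means that x is divisible by 2):

```r
x <- 1:19               # positive natural numbers smaller than 20
A <- x[x %% 2 == 0]     # even numbers:    2, 4, ..., 18
B <- x[x %% 3 == 0]     # multiples of 3:  3, 6, ..., 18
C <- x[x %% 2 == 1]     # odd numbers:     1, 3, ..., 19
```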
Determine:
- A \(\cap\) B
- A \(\cap\) B \(\cap\) C
- A \(\cup\) B
Again, create the sets in R and use the proper functions to get the solutions!
Answers:
- {6, 12, 18}
- \(\emptyset\) (the empty set; A contains only even numbers and C only odd numbers)
- {2,3,4,6,8,9,10,12,14,15,16,18}
