Mathematics for Statistics

The module illustrates all topics discussed in the textbook, and shows how they can be implemented in R.

4.1 Relations between sets

We have discussed sets in chapter 1.

We have also discussed differences and overlaps between sets.

We can relate sets to one another by linking every element of one set to every element of the other set. This gives the product (also called the Cartesian product) of the two sets.

If we have two sets, a and b, each consisting of four elements, then the product of the two sets gives 16 pairs, as below. We use expand.grid() to generate the product, and store the result as a data frame.

a<-c(3,5,12,16)
b<-c(5,6,10,12)
ab <- as.data.frame(expand.grid(a,b)); ab
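
As a quick check, the number of pairs in the product should equal the number of elements in a times the number of elements in b. The lines below are a minimal sketch using base R only; the name n_pairs is our own and does not appear elsewhere in the module.

```r
a <- c(3, 5, 12, 16)
b <- c(5, 6, 10, 12)
ab <- expand.grid(a, b)   # expand.grid() already returns a data frame

# One row per pair, so the number of rows is the size of the product
n_pairs <- nrow(ab)
n_pairs == length(a) * length(b)   # TRUE
```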

A relation between two sets is a special version of the product: a subset of it, containing only the pairs that satisfy some condition. For example, we can keep only those pairs in which the element of a is greater than the element of b.

In the last line of code below, we select from the data frame only those records that satisfy the condition Var1>Var2. The variable names are generated by the software. We can apply our own labels, if we want to. Here, we will settle for the default names.

Between the brackets [] we specify the conditions. Since data frames have two dimensions, we have to specify both the rows (before the comma) and the columns (after the comma) that we want to keep. Here, we keep the rows that meet the condition, and we keep all variables.

a<-c(3,5,12,16)
b<-c(5,6,10,12)
ab <- as.data.frame(expand.grid(a,b))
abr <- ab[ab$Var1>ab$Var2,]; abr
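
If we prefer our own labels over the default names, one possible variation is to name the arguments of expand.grid() and to select the rows with subset(). This is only a sketch of an alternative, not a replacement for the bracket notation explained above; it reuses a and b as column names.

```r
a <- c(3, 5, 12, 16)
b <- c(5, 6, 10, 12)

# Naming the arguments labels the columns a and b instead of Var1 and Var2
ab <- expand.grid(a = a, b = b)

# Keep only the pairs in which the element of a exceeds the element of b
abr <- subset(ab, a > b)
abr
```

Inside subset(), the names a and b refer to the columns of the data frame, so the condition is evaluated row by row.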

4.2 Functions graphically displayed

We have already looked at plots of functions in the previous chapter. In that chapter, we examined pairs of linear equations, which are relations between x and y. Such relations are often called functions; traditionally, y is considered a function of x.

Linear equations, as the name implies, can be graphically displayed as straight lines. Unless two lines run parallel or coincide, there is exactly one point where they intersect. That point is the unique pair (x,y) for which both equations are true simultaneously.

Below, we plot the linear function $y = 0.5*x$. This can be rewritten as $0.5*x - y = 0$. The latter form is convenient when writing the corresponding R code. The coefficient matrix A is of size (1*2), that is, one row and two columns. The right-hand side, for a single equation, is just one number (0).

We have also applied our own labels, x and y, for the two variables, overriding the defaults x1 and x2.

library(matlib)

A <- matrix(c(0.5,-1),1,2)
b <- 0
showEqn(A,b,vars=list("x","y"))

## 0.5*x - 1*y  =  0

plotEqn(A,b,vars=list("x","y"))

## 0.5*x - 1*y  =  0

We can write

$0.5*x - y = 0$

or alternatively

$f(x) = 0.5*x$

indicating that $f(x)$ (or $g(x)$, and so on) is a function of $x$.

Consider two functions:

$f(x) = 0.5*x + 5$ and

$g(x) = 3*x - 2$

When plotting these functions by hand, it suffices to compute two ordered pairs of values for $x$ and $f(x)$ (or, alternatively, $y$), since two points determine a straight line.

For $f(x)$ we can derive those ordered pairs using, for example, $x=0$ and $x=10$.

For $x=0$, $f(x)= 0.5*0 + 5 = 5$, so we have the ordered pair (0,5)

For $x=10$, $f(x)= 0.5*10 + 5 = 10$, so we have the ordered pair (10,10)

Likewise, for $g(x)$ we can derive ordered pairs (0,-2) and (2,4). Check this for yourself.
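
The ordered pairs above can also be checked in R. The sketch below defines the two functions as R functions (using the names fx and gx, which are our choice) and evaluates them at the chosen values of $x$.

```r
fx <- function(x) 0.5*x + 5
gx <- function(x) 3*x - 2

# Ordered pairs (x, f(x)) for x = 0 and x = 10
cbind(x = c(0, 10), y = fx(c(0, 10)))

# Ordered pairs (x, g(x)) for x = 0 and x = 2
cbind(x = c(0, 2), y = gx(c(0, 2)))
```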

library(matlib)
A <- matrix(c(-0.5,1),1,2)
b <- 5
C <- matrix(c(-3,1),1,2)
d <- -2
showEqn(A,b,vars=list("x","y"))

## -0.5*x + 1*y  =  5

showEqn(C,d,vars=list("x","y"))

## -3*x + 1*y  =  -2

plotEqn(A,b,vars=list("x","y"))

## -0.5*x + y  =  5

plotEqn(C,d,vars=list("x","y"))

## -3*x + y  =  -2

Above, we have plotted the two functions separately. Because the minimum and maximum values of the $x$ and $y$ axes are set automatically for each plot, the lines seem to run parallel, while actually they don't.

It is better, of course, to draw both functions in one graph, to compare their behavior. Where do they cross the $x$ and $y$ axes? Where do the lines intersect? Which line is steeper? And so on.

Below, we use ggplot, which not only produces better-looking graphs but is also easier to adapt to our needs.

The commands look frightening, don't they? Don't worry, most of it is just an adaptation of what we found from smart googling. For example, the search string "ggplot plot two functions" guided us to a page from which we copied and adapted the code below.

library(ggplot2)
fx <- function(x) 0.5*x+5
gx <- function(x) 3*x-2
ggplot(data.frame(x=c(-10, 10)), aes(x=x)) +
  stat_function(fun=fx, color="red") +
  stat_function(fun=gx, color="blue") +
  geom_hline(yintercept=0, size=2) +
  geom_vline(xintercept=0, size=2)

There are some things to note:

The blue line is steeper
The function $f(x)$ intersects with the y-axis at $y=5$
The lines do intersect.
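
These observations can be confirmed numerically. The sketch below is one possible way to do so, assuming the two functions are defined as fx and gx; the variable names for slopes and intercepts are our own.

```r
fx <- function(x) 0.5*x + 5
gx <- function(x) 3*x - 2

# For a straight line, the slope is the change in y when x increases by 1
slope_f <- fx(1) - fx(0)
slope_g <- gx(1) - gx(0)

# The y-intercept is the function value at x = 0
yint_f <- fx(0)
yint_g <- gx(0)

# The x-intercept solves slope*x + intercept = 0
xint_f <- -yint_f / slope_f
xint_g <- -yint_g / slope_g

slope_g > slope_f   # TRUE: the line for g(x) is steeper
slope_f != slope_g  # TRUE: different slopes, so the lines must intersect
```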

In the previous chapter, section 3.4, we learned how to solve equations with two unknowns. We can use the same approach to solve for the values of $x$ and $y$ where the lines intersect!

# install.packages("matlib")
library(matlib)
A <- matrix(c(-0.5,-3,1,1),2,2)
b <- c(5,-2)
showEqn(A,b)

## -0.5*x1 + 1*x2  =   5 
##   -3*x1 + 1*x2  =  -2

plotEqn(A,b)

## -0.5*x[1] + x[2]  =   5 
##   -3*x[1] + x[2]  =  -2

Solve(A,b,fractions=TRUE)

## x1    =  14/5 
##   x2  =  32/5

Solve(A,b,fractions=FALSE)

## x1    =  2.8 
##   x2  =  6.4
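
As a cross-check that does not depend on matlib, the same system can be solved with base R's solve() (lower-case s). The sketch below also verifies that both functions give the same $y$ at the solution; the name xy is our own.

```r
A <- matrix(c(-0.5, -3, 1, 1), 2, 2)
b <- c(5, -2)

# solve() returns the solution as a numeric vector: first x, then y
xy <- solve(A, b)
xy

# At the intersection, both functions should give the same y
fx <- function(x) 0.5*x + 5
gx <- function(x) 3*x - 2
all.equal(fx(xy[1]), gx(xy[1]))   # compares with a numerical tolerance
```

We use all.equal() rather than == because the solution is computed in floating-point arithmetic, so tiny rounding differences are possible.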
