SandStat

Simple Operations
Distributions

Simple Operations

Expected Value

To calculate the expected value enter the two vectors into exp_value(outcomes, probabilities)

exp_value <- function(x, p) {
  return(sum(x %*% p))
}

Example:

exp_value(c(-50,-25,0,25,50), c(0.35,0.25,0.15,0.15,0.1))

## [1] -15

Variance with Samples and Populations

Enter the given vector to calculate the variance for (x), and give the second parameter TRUE for a sample calculation and FALSE for a population calculation (check_type), as var(vector, bool).

var <- function(x, check_type) {
  xAvg = sum(x)/length(x)
  if (check_type == TRUE) return(sum(((x - xAvg)^2)/(length(x)-1)))
  else return(sum(((x - xAvg)^2)/(length(x))))
}

Example:

var(c(6,3,8,5,3), TRUE)

## [1] 4.5

Standard Deviation of a population

To calculate Standard Deviation we work out the mean and thereafter subtract the mean for each number and square the result. Then work out the mean of the squared differences. Lastly we square root the mean to get the Standard Deviation. In short we take the square root of the variance.

sd <- function(x) {
  mu = sum(x)/length(x)
  return(sqrt((sum((x - mu)^2))/length(x)))
}

Example:

sd(c(5,6,5,10,9))

## [1] 2.097618

Covariance (used for Correlation Coefficient)

To calculate the correlation between two RV x and y we calculate the Covariance. The covariance is every index of x minus the mean of x which is then multiplied by y minus mean of y, lastly we divide the whole calculation by the sample N-1

coVar <- function(x,y) {
  mu_x = sum(x)/length(x)
  mu_y = sum(y)/length(y)
  c_var = (sum((x - mu_x)%*%(y - mu_y)))/(length(x)-1)
  return(c_var)
}

Example:

# Example of two stock prices changing over 5 years

coVar(c(1692,1978,1884,2151,2519), c(68,102,110,112,154))

## [1] 9107.3

Test of Independence between variables (x,y)

Check Variable Indepence on 2 or more non-discrete RVs.

library(MASS) 
tbl = table(c(1,2,3), c(4,5,6))
tbl

##    
##     4 5 6
##   1 1 0 0
##   2 0 1 0
##   3 0 0 1

Example:

chisq.test(tbl)

## Warning in chisq.test(tbl): Chi-squared approximation may be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  tbl
## X-squared = 6, df = 4, p-value = 0.1991

Distributions

Uniform

Uniform Plot

unif_area <- function(min = 0, max = 1, lb, ub, col = 1,
                      acolor = "lightgray", ...) {
    x <- seq(min - 0.25 * max, max + 0.25 * max, 0.001) 
    
    if (missing(lb)) {
       lb <- min(x)
    }
    if (missing(ub)) {
        ub <- max(x)
    }
    if(max < min) {
        stop("'min' must be lower than 'max'")
    }

    x2 <- seq(lb, ub, length = 1000) 
    plot(x, dunif(x, min = min, max = max),
         xlim = c(min - 0.25 * max, max + 0.25 * max), type = "l",
         ylab = "f(x)", lty = 0, ...)   

    y <- dunif(x2, min = min, max = max)
    polygon(c(lb, x2, ub), c(0, y, 0), col = acolor, lty = 0)
    segments(min, 1/(max - min), max, 1/(max - min), lwd = 2, col = col)
    segments(min - 2 * max, 0, min, 0, lwd = 2, col = col)
    segments(max, 0, max + 2 * max, 0, lwd = 2, col = col)
    points(min, 1/(max - min), pch = 19, col = col)
    points(max, 1/(max - min), pch = 19, col = col)
    segments(min, 0, min, 1/(max - min), lty = 2, col = col, lwd = 2)
    segments(max, 0, max, 1/(max - min), lty = 2, col = col, lwd = 2)
    points(0, min, pch = 21, col = col, bg = "white")
    points(max, min, pch = 21, col = col, bg = "white")
}

Calculate the Uniform Cumulative Distribution

Probability of a variable X taking a value lower than x, example: The probability of waiting less than 15 minutes:

punif(c(15), min = 0, max = 60)

## [1] 0.25

Parameters
- q - Vector of quantiles
- min - Lower limit of the distribution (a)
- max - Upper limit of the distribution (b)
- lower.tail - If TRUE, probabilities are P(X <= x), or P(X > x) otherwise
- log.p - If TRUE, probabilities are given as log

Corresponding quantile for any probability (p) for a Uniform Distribution

calculate the corresponding quantile for any probability (p) for a uniform distribution with the qunif function, which has the following syntax:

qunif(0.5, min = 0, max = 60)

## [1] 30

Plot Quantile for a Uniform Distribution

unif_area(min = 0, max = 60, lb = 0, ub = 30)
text(15, 0.008, "50%", srt = 90, cex = 1.2)
arrows(38, 0.005, 31, 0.0005, length = 0.15)
text(44, 0.006, "qunif(0.5)")

Bernoulli

PDF (dbern)

library("Rlab")

## Rlab 2.15.1 attached.

## 
## Attaching package: 'Rlab'

## The following object is masked from 'package:MASS':
## 
##     michelson

## The following objects are masked from 'package:stats':
## 
##     dexp, dgamma, dweibull, pexp, pgamma, pweibull, qexp, qgamma,
##     qweibull, rexp, rgamma, rweibull

## The following object is masked from 'package:datasets':
## 
##     precip

x_dbern <- seq(0, 10, by = 1)                    # Specify x-values for dbern function
y_dbern <- dbern(x_dbern, prob = 0.7)            # Apply dbern function
plot(y_dbern, type = "o")                        # Plot dbern values

CDF (pbern)

x_pbern <- seq(0, 10, by = 1)                    # Specify x-values for pbern function
y_pbern <- pbern(x_pbern, prob = 0.7)            # Apply pbern function
plot(y_pbern, type = "o")                        # Plot pbern values

Quantile (qbern)

x_qbern <- seq(0, 1, by = 0.1)                   # Specify x-values for qbern function
y_qbern <- qbern(x_qbern, prob = 0.7)            # Apply qbern function
plot(y_qbern, type = "o")                        # Plot qbern values

Binominal

Dbinom - Probability (PDF) (values, trials, success)

In order to calculate the binomial probability function for a set of values x, a number of trials n and a probability of success p you can make use of the dbinom function, which has the following syntax:
For instance, if you want to calculate the binomial probability mass function for x = 1, 2, , 10 and a probability of succces in each trial of 0.2, you can type:

dbinom(x = 1:10, size = 10, prob = 0.2)

##  [1] 0.2684354560 0.3019898880 0.2013265920 0.0880803840 0.0264241152
##  [6] 0.0055050240 0.0007864320 0.0000737280 0.0000040960 0.0000001024

Plotting the Binominal Probability function

# Grid of X-axis values
x <- 1:80

# size = 80, prob = 0.2
plot(dbinom(x, size = 80, prob = 0.2), type = "h", lwd = 2,
     main = "Binomial probability function",
     ylab = "P(X = x)", xlab = "Number of successes")

# size = 80, prob = 0.3
lines(dbinom(x, size = 80, prob = 0.3), type = "h",
      lwd = 2, col = rgb(1,0,0, 0.7))

# size = 80, prob = 0.4
lines(dbinom(x, size = 80, prob = 0.4), type = "h",
      lwd = 2, col = rgb(0, 1, 0, 0.7))

# Add a legend
legend("topright", legend = c("80  0.2", "80  0.3", "80  0.4"),
       title = "size  prob", title.adj = 0.95,
       lty = 1, col = 1:3, lwd = 2, box.lty = 0)

Pibnom - Distribution (CDF)

In order to calculate the probability of a variable x following a binomial distribution taking values lower than or equal to x you can use the pbinom function. By ways of illustration, the probability of the success occurring less than 3 times if the number of trials is 10 and the probability of success is 0.3 is:

pbinom(3, size = 10, prob = 0.3) # 0.6496107 or 64.96%

## [1] 0.6496107

Visualizing Pbinom

As the binomial distribution is discrete, the cumulative probability can be calculated adding the corresponding probabilities of the probability function. The following R function allows visualizing the probabilities that are added based on a lower bound and an upper bound.

# size: number of trials (n > = 0)
# prob: probability of success on each trial
# lb: lower bound of the sum
# ub: upper bound of the sum
# col: color
# lwd: line width
binom_sum <- function(size, prob, lb, ub, col = 4, lwd = 1, ...) {
    x <- 0:size
    
    if (missing(lb)) {
       lb <- min(x)
    }
    if (missing(ub)) {
        ub <- max(x)
    }
      
    plot(dbinom(x, size = size, prob = prob), type = "h", lwd = lwd, ...)
  
    if(lb == min(x) & ub == max(x)) {
        color <- col
    } else {
        color <- rep(1, length(x))
        color[(lb + 1):ub ] <- col
    }
    
    lines(dbinom(x, size = size, prob = prob), type = "h",
          col =  color, lwd = lwd, ...)
}

# Grid of X-axis values
x <- 1:80

# size = 80, prob = 0.2
plot(pbinom(x, size = 80, prob = 0.2), type = "s", lwd = 2,
     main = "Binomial distribution function",
     xlab = "Number of successes", ylab = "F(x)")

# size = 80, prob = 0.3
lines(pbinom(x, size = 80, prob = 0.3), type = "s",
      lwd = 2, col = 2)

# size = 80, prob = 0.4
lines(pbinom(x, size = 80, prob = 0.4), type = "s",
      lwd = 2, col = 3)

# Add a legend
legend("bottomright", legend = c("80  0.2", "80  0.3", "80  0.4"),
       title = "size  prob", title.adj = 0.95,
       lty = 1, col = 1:3, lwd = 2, box.lty = 0)

Pbinom Example - Basketball

In this section we will review a more complete example to understand how to calculate binomial probabilities in several scenarios. Consider that a basketball player scores 4 out of 10 baskets (p = 0.4). If the player throws 20 baskets (20 trials), The probability of scoring 6 or less baskets:

sum(dbinom(0:6, size = 20, prob = 0.4)) # 0.2500107 or 25%

## [1] 0.2500107

binom_sum(size = 20, prob = 0.4, ub = 6, lwd = 2,
          ylab = "P(X = x)", xlab = "Number of successes")

The probability of scoring more than 12 baskets

pbinom(12, size = 20, prob = 0.4, lower.tail = FALSE) # 0.02102893 or 2.1%

## [1] 0.02102893

The probability of scoring between 7 and 11 baskets, P(7≤X≤11), is:

pbinom(11, size = 20, prob = 0.4) - pbinom(7, size = 20, prob = 0.4) # 0.5275807 or 52.8%

## [1] 0.5275807

Qbinom - Quantile (Binominal Quantile)

Given a probability or a set of probabilities, the qbinom function allows you to obtain the corresponding binomial quantile. As an example, the binomial quantile for the probability 0.4 if n = 5 and p = 0.7 is:

qbinom(p = 0.4, size = 5, prob = 0.7)

## [1] 3

Geometric

Geometric Density

In the first example, we will illustrate the density of the geometric distribution in a plot. As a first step, we need to create a vector of quantiles:

x_dgeom <- seq(0, 20, by = 1)                         # Specify x-values for dgeom function

Now, we can apply the dgeom function to this vector as shown in the R code below. Note that I’m using a probability of 0.5 (i.e. 50%) in the examples of this tutorial. You may modify this probability to your specific preferences.

y_dgeom <- dgeom(x_dgeom, prob = 0.5)                 # Apply dgeom function
plot(y_dgeom)                                         # Plot dgeom values

Geometric Cumulative Distribution

Example 2 shows how to draw a plot of the geometric cumulative distribution function (CDF). As in Example 1, we first need to create a sequence of quantiles:

x_pgeom <- seq(0, 20, by = 1)                         # Specify x-values for pgeom function

We can use this sequence of quantiles as input for the pgeom function:

y_pgeom <- pgeom(x_pgeom, prob = 0.5)                 # Apply pgeom function

And then, we can apply the plot function to draw a graphic containing the values of the geometric cumulative distribution function:

plot(y_pgeom)                                         # Plot pgeom values

Negative Binominal

Negative Binomial Density

This explains how to create an R graphic showing the negative binomial density. As a first step, we need to create a sequence with non-negative integers in R

x_dnbinom <- seq(0, 100, by = 1)                         # Specify x-values for dnbinom function

Now, we can use the dnbinom R function to return the corresponding negative binomial values of each element of our input vector with non-negative integers. Note that we are using a size (i.e. number of trials) and a probability of 0.5 (i.e. 50%) in this example:

y_dnbinom <- dnbinom(x_dnbinom, size = 100, prob = 0.5)  # Apply dnbinom function
plot(y_dnbinom)                                          # Plot dnbinom values

Negative Binomial Cumulative Distribution Function

I’ll show you how to plot the cumulative distribution function of the negative binomial distribution based on the pnbinom command. Again, we need to create a sequence on non-negative integers as input for the pnbinom function:

x_pnbinom <- seq(0, 100, by = 1)                         # Specify x-values for pnbinom function
y_pnbinom <- pnbinom(x_pnbinom, size = 100, prob = 0.5)  # Apply pnbinom function
plot(y_pnbinom)                                          # Plot pnbinom values

Negative Binomial Quantile Function

Similar to the R syntax of Examples 1 and 2, we can create a plot containing the negative binomial quantile function. As input, we need to specify a vector of probabilities:

x_qnbinom <- seq(0, 1, by = 0.01)                        # Specify x-values for qnbinom function
y_qnbinom <- qnbinom(x_qnbinom, size = 100, prob = 0.5)  # Apply qnbinom function
plot(y_qnbinom)                                          # Plot qnbinom values

Poisson

Poisson Density in R

In order to create a poisson density in R, we first need to create a sequence of integer values:

x_dpois <- seq(- 5, 30, by = 1)                    # Specify x-values for dpois function

Now we can return the corresponding values of the poisson density for each of these values. In the example, we use a lambda of 10:

y_dpois <- dpois(x_dpois, lambda = 10)             # Apply dpois function
plot(y_dpois)                                      # Plot dpois values

Poisson Distribution Function

In the second example, we will use the ppois R command to plot the cumulative distribution function (CDF) of the poisson distribution.

Again, we first need to specify a vector of values, for which we want to return the corresponding value of the poisson distribution:

x_ppois <- seq(- 5, 30, by = 1)                    # Specify x-values for ppois function
y_ppois <- ppois(x_ppois, lambda = 10)             # Apply ppois function
plot(y_ppois)                                      # Plot ppois values

Poisson Quantile Function

Similar to the previous examples, we can also create a plot of the poisson quantile function. Let’s create a sequence of values to which we can apply the qpois function:

x_qpois <- seq(0, 1, by = 0.005)                   # Specify x-values for qpois function

Now, we can apply the qpois function with a lambda of 10 as follows:

y_qpois <- qpois(x_qpois, lambda = 10)             # Apply qpois function
plot(y_qpois)                                      # Plot qpois values