To calculate the expected value enter the two vectors into exp_value(outcomes, probabilities)
exp_value <- function(x, p) {
return(sum(x %*% p))
}
exp_value(c(-50,-25,0,25,50), c(0.35,0.25,0.15,0.15,0.1))
## [1] -15
Enter the given vector to calculate the variance for (x), and give the second parameter TRUE for a sample calculation and FALSE for a population calculation (check_type), as var(vector, bool).
var <- function(x, check_type) {
xAvg = sum(x)/length(x)
if (check_type == TRUE) return(sum(((x - xAvg)^2)/(length(x)-1)))
else return(sum(((x - xAvg)^2)/(length(x))))
}
var(c(6,3,8,5,3), TRUE)
## [1] 4.5
To calculate Standard Deviation we work out the mean and thereafter subtract the mean for each number and square the result. Then work out the mean of the squared differences. Lastly we square root the mean to get the Standard Deviation. In short we take the square root of the variance.
sd <- function(x) {
mu = sum(x)/length(x)
return(sqrt((sum((x - mu)^2))/length(x)))
}
sd(c(5,6,5,10,9))
## [1] 2.097618
To calculate the correlation between two RV x and y we calculate the Covariance. The covariance is every index of x minus the mean of x which is then multiplied by y minus mean of y, lastly we divide the whole calculation by the sample N-1
coVar <- function(x,y) {
mu_x = sum(x)/length(x)
mu_y = sum(y)/length(y)
c_var = (sum((x - mu_x)%*%(y - mu_y)))/(length(x)-1)
return(c_var)
}
# Example of two stock prices changing over 5 years
coVar(c(1692,1978,1884,2151,2519), c(68,102,110,112,154))
## [1] 9107.3
Check Variable Indepence on 2 or more non-discrete RVs.
library(MASS)
tbl = table(c(1,2,3), c(4,5,6))
tbl
##
## 4 5 6
## 1 1 0 0
## 2 0 1 0
## 3 0 0 1
chisq.test(tbl)
## Warning in chisq.test(tbl): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: tbl
## X-squared = 6, df = 4, p-value = 0.1991
unif_area <- function(min = 0, max = 1, lb, ub, col = 1,
acolor = "lightgray", ...) {
x <- seq(min - 0.25 * max, max + 0.25 * max, 0.001)
if (missing(lb)) {
lb <- min(x)
}
if (missing(ub)) {
ub <- max(x)
}
if(max < min) {
stop("'min' must be lower than 'max'")
}
x2 <- seq(lb, ub, length = 1000)
plot(x, dunif(x, min = min, max = max),
xlim = c(min - 0.25 * max, max + 0.25 * max), type = "l",
ylab = "f(x)", lty = 0, ...)
y <- dunif(x2, min = min, max = max)
polygon(c(lb, x2, ub), c(0, y, 0), col = acolor, lty = 0)
segments(min, 1/(max - min), max, 1/(max - min), lwd = 2, col = col)
segments(min - 2 * max, 0, min, 0, lwd = 2, col = col)
segments(max, 0, max + 2 * max, 0, lwd = 2, col = col)
points(min, 1/(max - min), pch = 19, col = col)
points(max, 1/(max - min), pch = 19, col = col)
segments(min, 0, min, 1/(max - min), lty = 2, col = col, lwd = 2)
segments(max, 0, max, 1/(max - min), lty = 2, col = col, lwd = 2)
points(0, min, pch = 21, col = col, bg = "white")
points(max, min, pch = 21, col = col, bg = "white")
}
Probability of a variable X taking a value lower than x, example: The probability of waiting less than 15 minutes:
punif(c(15), min = 0, max = 60)
## [1] 0.25
calculate the corresponding quantile for any probability (p) for a uniform distribution with the qunif function, which has the following syntax:
qunif(0.5, min = 0, max = 60)
## [1] 30
unif_area(min = 0, max = 60, lb = 0, ub = 30)
text(15, 0.008, "50%", srt = 90, cex = 1.2)
arrows(38, 0.005, 31, 0.0005, length = 0.15)
text(44, 0.006, "qunif(0.5)")
library("Rlab")
## Rlab 2.15.1 attached.
##
## Attaching package: 'Rlab'
## The following object is masked from 'package:MASS':
##
## michelson
## The following objects are masked from 'package:stats':
##
## dexp, dgamma, dweibull, pexp, pgamma, pweibull, qexp, qgamma,
## qweibull, rexp, rgamma, rweibull
## The following object is masked from 'package:datasets':
##
## precip
x_dbern <- seq(0, 10, by = 1) # Specify x-values for dbern function
y_dbern <- dbern(x_dbern, prob = 0.7) # Apply dbern function
plot(y_dbern, type = "o") # Plot dbern values
x_pbern <- seq(0, 10, by = 1) # Specify x-values for pbern function
y_pbern <- pbern(x_pbern, prob = 0.7) # Apply pbern function
plot(y_pbern, type = "o") # Plot pbern values
x_qbern <- seq(0, 1, by = 0.1) # Specify x-values for qbern function
y_qbern <- qbern(x_qbern, prob = 0.7) # Apply qbern function
plot(y_qbern, type = "o") # Plot qbern values
In order to calculate the binomial probability function for a set of values x, a number of trials n and a probability of success p you can make use of the dbinom function, which has the following syntax:
For instance, if you want to calculate the binomial probability mass function for x = 1, 2, , 10 and a probability of succces in each trial of 0.2, you can type:
dbinom(x = 1:10, size = 10, prob = 0.2)
## [1] 0.2684354560 0.3019898880 0.2013265920 0.0880803840 0.0264241152
## [6] 0.0055050240 0.0007864320 0.0000737280 0.0000040960 0.0000001024
# Grid of X-axis values
x <- 1:80
# size = 80, prob = 0.2
plot(dbinom(x, size = 80, prob = 0.2), type = "h", lwd = 2,
main = "Binomial probability function",
ylab = "P(X = x)", xlab = "Number of successes")
# size = 80, prob = 0.3
lines(dbinom(x, size = 80, prob = 0.3), type = "h",
lwd = 2, col = rgb(1,0,0, 0.7))
# size = 80, prob = 0.4
lines(dbinom(x, size = 80, prob = 0.4), type = "h",
lwd = 2, col = rgb(0, 1, 0, 0.7))
# Add a legend
legend("topright", legend = c("80 0.2", "80 0.3", "80 0.4"),
title = "size prob", title.adj = 0.95,
lty = 1, col = 1:3, lwd = 2, box.lty = 0)
In order to calculate the probability of a variable x following a binomial distribution taking values lower than or equal to x you can use the pbinom function. By ways of illustration, the probability of the success occurring less than 3 times if the number of trials is 10 and the probability of success is 0.3 is:
pbinom(3, size = 10, prob = 0.3) # 0.6496107 or 64.96%
## [1] 0.6496107
As the binomial distribution is discrete, the cumulative probability can be calculated adding the corresponding probabilities of the probability function. The following R function allows visualizing the probabilities that are added based on a lower bound and an upper bound.
# size: number of trials (n > = 0)
# prob: probability of success on each trial
# lb: lower bound of the sum
# ub: upper bound of the sum
# col: color
# lwd: line width
binom_sum <- function(size, prob, lb, ub, col = 4, lwd = 1, ...) {
x <- 0:size
if (missing(lb)) {
lb <- min(x)
}
if (missing(ub)) {
ub <- max(x)
}
plot(dbinom(x, size = size, prob = prob), type = "h", lwd = lwd, ...)
if(lb == min(x) & ub == max(x)) {
color <- col
} else {
color <- rep(1, length(x))
color[(lb + 1):ub ] <- col
}
lines(dbinom(x, size = size, prob = prob), type = "h",
col = color, lwd = lwd, ...)
}
# Grid of X-axis values
x <- 1:80
# size = 80, prob = 0.2
plot(pbinom(x, size = 80, prob = 0.2), type = "s", lwd = 2,
main = "Binomial distribution function",
xlab = "Number of successes", ylab = "F(x)")
# size = 80, prob = 0.3
lines(pbinom(x, size = 80, prob = 0.3), type = "s",
lwd = 2, col = 2)
# size = 80, prob = 0.4
lines(pbinom(x, size = 80, prob = 0.4), type = "s",
lwd = 2, col = 3)
# Add a legend
legend("bottomright", legend = c("80 0.2", "80 0.3", "80 0.4"),
title = "size prob", title.adj = 0.95,
lty = 1, col = 1:3, lwd = 2, box.lty = 0)
In this section we will review a more complete example to understand how to calculate binomial probabilities in several scenarios. Consider that a basketball player scores 4 out of 10 baskets (p = 0.4). If the player throws 20 baskets (20 trials), The probability of scoring 6 or less baskets:
sum(dbinom(0:6, size = 20, prob = 0.4)) # 0.2500107 or 25%
## [1] 0.2500107
binom_sum(size = 20, prob = 0.4, ub = 6, lwd = 2,
ylab = "P(X = x)", xlab = "Number of successes")
The probability of scoring more than 12 baskets
pbinom(12, size = 20, prob = 0.4, lower.tail = FALSE) # 0.02102893 or 2.1%
## [1] 0.02102893
The probability of scoring between 7 and 11 baskets, P(7≤X≤11), is:
pbinom(11, size = 20, prob = 0.4) - pbinom(7, size = 20, prob = 0.4) # 0.5275807 or 52.8%
## [1] 0.5275807
Given a probability or a set of probabilities, the qbinom function allows you to obtain the corresponding binomial quantile. As an example, the binomial quantile for the probability 0.4 if n = 5 and p = 0.7 is:
qbinom(p = 0.4, size = 5, prob = 0.7)
## [1] 3
In the first example, we will illustrate the density of the geometric distribution in a plot. As a first step, we need to create a vector of quantiles:
x_dgeom <- seq(0, 20, by = 1) # Specify x-values for dgeom function
Now, we can apply the dgeom function to this vector as shown in the R code below. Note that I’m using a probability of 0.5 (i.e. 50%) in the examples of this tutorial. You may modify this probability to your specific preferences.
y_dgeom <- dgeom(x_dgeom, prob = 0.5) # Apply dgeom function
plot(y_dgeom) # Plot dgeom values
Example 2 shows how to draw a plot of the geometric cumulative distribution function (CDF). As in Example 1, we first need to create a sequence of quantiles:
x_pgeom <- seq(0, 20, by = 1) # Specify x-values for pgeom function
We can use this sequence of quantiles as input for the pgeom function:
y_pgeom <- pgeom(x_pgeom, prob = 0.5) # Apply pgeom function
And then, we can apply the plot function to draw a graphic containing the values of the geometric cumulative distribution function:
plot(y_pgeom) # Plot pgeom values
This explains how to create an R graphic showing the negative binomial density. As a first step, we need to create a sequence with non-negative integers in R
x_dnbinom <- seq(0, 100, by = 1) # Specify x-values for dnbinom function
Now, we can use the dnbinom R function to return the corresponding negative binomial values of each element of our input vector with non-negative integers. Note that we are using a size (i.e. number of trials) and a probability of 0.5 (i.e. 50%) in this example:
y_dnbinom <- dnbinom(x_dnbinom, size = 100, prob = 0.5) # Apply dnbinom function
plot(y_dnbinom) # Plot dnbinom values
I’ll show you how to plot the cumulative distribution function of the negative binomial distribution based on the pnbinom command. Again, we need to create a sequence on non-negative integers as input for the pnbinom function:
x_pnbinom <- seq(0, 100, by = 1) # Specify x-values for pnbinom function
y_pnbinom <- pnbinom(x_pnbinom, size = 100, prob = 0.5) # Apply pnbinom function
plot(y_pnbinom) # Plot pnbinom values
Similar to the R syntax of Examples 1 and 2, we can create a plot containing the negative binomial quantile function. As input, we need to specify a vector of probabilities:
x_qnbinom <- seq(0, 1, by = 0.01) # Specify x-values for qnbinom function
y_qnbinom <- qnbinom(x_qnbinom, size = 100, prob = 0.5) # Apply qnbinom function
plot(y_qnbinom) # Plot qnbinom values
In order to create a poisson density in R, we first need to create a sequence of integer values:
x_dpois <- seq(- 5, 30, by = 1) # Specify x-values for dpois function
Now we can return the corresponding values of the poisson density for each of these values. In the example, we use a lambda of 10:
y_dpois <- dpois(x_dpois, lambda = 10) # Apply dpois function
plot(y_dpois) # Plot dpois values
In the second example, we will use the ppois R command to plot the cumulative distribution function (CDF) of the poisson distribution.
Again, we first need to specify a vector of values, for which we want to return the corresponding value of the poisson distribution:
x_ppois <- seq(- 5, 30, by = 1) # Specify x-values for ppois function
y_ppois <- ppois(x_ppois, lambda = 10) # Apply ppois function
plot(y_ppois) # Plot ppois values
Similar to the previous examples, we can also create a plot of the poisson quantile function. Let’s create a sequence of values to which we can apply the qpois function:
x_qpois <- seq(0, 1, by = 0.005) # Specify x-values for qpois function
Now, we can apply the qpois function with a lambda of 10 as follows:
y_qpois <- qpois(x_qpois, lambda = 10) # Apply qpois function
plot(y_qpois) # Plot qpois values