R02-STA1511
Random Variable
A random variable is a numerical description of the outcome of an experiment.
Discrete Random Variable
- A discrete random variable is a random variable whose possible values are distinct, separate points (there are gaps between them).
- A discrete random variable typically takes integer values, and the number of possible values can be finite or countably infinite.
- Example:
X: The number of car accidents in a city.
Y: The number of customers who come to a bank
Continuous Random Variable
- A continuous random variable can take any value (including decimal numbers) in a certain interval.
- Example:
X: The depth of drilling to find oil
Y: The weight of a truck in a truck-weighing station
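To make the contrast concrete, here is a small simulation sketch. It assumes, purely for illustration, a Poisson model for accident counts and a normal model for truck weights; the parameters (4 accidents, 15000 kg mean, 800 kg sd) are made up and not part of these notes.

```r
set.seed(1)                                    # arbitrary seed, for reproducibility
accidents <- rpois(5, lambda = 4)              # hypothetical model: counts of car accidents
weights   <- rnorm(5, mean = 15000, sd = 800)  # hypothetical model: truck weights in kg
accidents   # discrete: whole numbers only
weights     # continuous: decimal values within an interval
```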
Distributions of Discrete Random Variables
1. Binomial Distribution
Random variable \(X\) is distributed \(X∼b(n,p)\) with mean \(μ=np\) and variance \(σ^{2}=np(1−p)\) if \(X\) is the count of successful events in \(n\) identical and independent Bernoulli trials with constant probability of success \(p\).
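The formulas for the mean and variance can be checked directly from the definitions \(μ=\sum x\,P(X=x)\) and \(σ^{2}=\sum (x-μ)^{2}P(X=x)\). This sketch uses b(10, 0.3), an arbitrary choice that matches the coin example below.

```r
# b(10, 0.3): an illustrative choice of n and p
n <- 10
p <- 0.3
x <- 0:n
pmf <- dbinom(x, size = n, prob = p)   # P(X = x) for every possible x

mu <- sum(x * pmf)                     # mean: sum of x * P(X = x)
sigma2 <- sum((x - mu)^2 * pmf)        # variance: sum of (x - mu)^2 * P(X = x)

c(mu = mu, np = n * p)                       # both 3
c(sigma2 = sigma2, npq = n * p * (1 - p))    # both 2.1
```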
R function dbinom(x, size, prob) is the probability of x successes in size trials when the probability of success is prob.
R function pbinom(q, size, prob, lower.tail) is the cumulative probability: P(X ≤ q) when lower.tail = TRUE (left tail, the default) and P(X > q) when lower.tail = FALSE (right tail).
R function rbinom(n, size, prob) returns n random numbers from the binomial distribution X ~ b(size, prob).
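A short sketch of how the lower.tail argument and the random generator behave, again with b(10, 0.3); the seed value is arbitrary.

```r
left  <- pbinom(q = 5, size = 10, prob = 0.3, lower.tail = TRUE)   # P(X <= 5)
right <- pbinom(q = 5, size = 10, prob = 0.3, lower.tail = FALSE)  # P(X > 5), i.e. P(X >= 6)
left + right                               # the two tails cover all outcomes: 1
sum(dbinom(6:10, size = 10, prob = 0.3))   # same value as right

set.seed(42)                                  # arbitrary seed, for reproducibility
sims <- rbinom(n = 5, size = 10, prob = 0.3)  # 5 simulated runs of 10 flips
sims                                          # each entry is a head count in 0..10
```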
Example of use:
- What is the probability of <=5 heads in 10 coin flips where probability of heads is 0.3?
# exact: add P(X = 0), P(X = 1), ..., P(X = 5)
sum(dbinom(0:5, size = 10, prob = 0.3))
## [1] 0.952651
# using the cumulative distribution function
pbinom(q = 5, size = 10, prob = 0.3, lower.tail = TRUE)
## [1] 0.952651
library(dplyr)
library(ggplot2)
data.frame(heads = 0:10,
pmf = dbinom(x = 0:10, size = 10, prob = 0.3),
cdf = pbinom(q = 0:10, size = 10, prob = 0.3,
lower.tail = TRUE)) %>%
mutate(Heads = ifelse(heads <= 5, "<=5", "other")) %>%
ggplot(aes(x = factor(heads), y = cdf, fill = Heads)) +
geom_col() +
geom_text(
aes(label = round(cdf,2), y = cdf + 0.01),
position = position_dodge(0.9),
size = 3,
vjust = 0
) +
labs(title = "Probability of X <= 5 successes.",
subtitle = "b(10, .3)",
x = "Successes (x)",
y = "probability")
- What is the expected number and variance of heads in 25 coin flips where the probability of heads is 0.3?
# Expected number: exact
p = 0.3
n = 25
E <- n * p
E
## [1] 7.5
# Variance: exact
# Variance = n*p*q
q = 1 - 0.3  # probability of failure
variance <- n * p * q
variance
## [1] 5.25
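The exact values can be sanity-checked by simulation. This sketch draws 100,000 binomial counts with rbinom; the number of draws and the seed are arbitrary choices, so the results will be close to, not equal to, 7.5 and 5.25.

```r
set.seed(2024)   # arbitrary seed
flips <- rbinom(n = 100000, size = 25, prob = 0.3)  # 100,000 simulated sets of 25 flips
mean(flips)      # close to n * p = 7.5
var(flips)       # close to n * p * (1 - p) = 5.25
```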
2. Poisson Distribution
Random variable \(X\) is distributed \(X∼P(λ)\) with mean \(μ=λ\) and variance \(σ^{2}=λ\) if \(X\) is the number of successes in \(n\) (many) trials when the probability of success \(λ/n\) is small.
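The "many trials, small success probability" condition can be illustrated by comparing b(n, λ/n) with P(λ) for a large n. This sketch assumes λ = 3 and n = 10,000 purely for illustration.

```r
lambda <- 3       # illustrative mean number of successes
n <- 10000        # many trials, so lambda / n = 0.0003 is small
x <- 0:10
binom <- dbinom(x, size = n, prob = lambda / n)
pois  <- dpois(x, lambda = lambda)
round(cbind(x, binom, pois), 5)   # the two columns nearly coincide
max(abs(binom - pois))            # largest pointwise gap
```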
R function dpois(x, lambda) is the probability of x events in a period when the expected number of events is lambda.
R function ppois(q, lambda, lower.tail) is the cumulative probability: P(X ≤ q) when lower.tail = TRUE (left tail) and P(X > q) when lower.tail = FALSE (right tail).
R function rpois(n, lambda) returns n random numbers from the Poisson distribution X ~ P(lambda).
R function qpois(p, lambda, lower.tail) returns the value (quantile) at the specified cumulative probability (percentile) p.
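A brief sketch of qpois and rpois, using the rate λ = 3 from the sales example below; the seed and number of draws are arbitrary.

```r
q50 <- qpois(p = 0.5, lambda = 3)  # smallest x with P(X <= x) >= 0.5
q50                                # 3
ppois(q50 - 1, lambda = 3)         # P(X <= 2), below 0.5
ppois(q50, lambda = 3)             # P(X <= 3), at least 0.5

set.seed(7)                            # arbitrary seed
sales <- rpois(n = 10000, lambda = 3)  # 10,000 simulated weeks of sales
mean(sales)                            # close to lambda = 3
```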
Example of use:
What is the probability of making 2 to 4 sales in a week if the average sales rate is 3 per week?
# Using exact probability
dpois(x = 2, lambda = 3) +
dpois(x = 3, lambda = 3) +
dpois(x = 4, lambda = 3)
## [1] 0.616115
# using ppois: P(2 <= X <= 4) = P(X <= 4) - P(X <= 1)
ppois(4, lambda = 3, lower.tail = TRUE) - ppois(1, lambda = 3, lower.tail = TRUE)
## [1] 0.616115
Distributions of Continuous Random Variables
1. The F Distribution
The F distribution has numerous applications. The F test is used to test whether two population variances are equal, \(H_{0}: \sigma_A^{2} = \sigma_B^{2}\), using the ratio of sample variances \(F = s_A^{2}/s_B^{2}\).
Like the chi-square distribution, the F distribution takes only positive values and is nonsymmetrical (right-skewed). There is a different F distribution for each pair of degrees of freedom associated with \(s_A^{2}\) and \(s_B^{2}\).
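The F test for \(H_{0}: \sigma_A^{2} = \sigma_B^{2}\) can be sketched by hand and compared with R's built-in var.test. The two samples here are simulated, hypothetical data (normal with sd 5), sized so that the degrees of freedom are 10 and 20.

```r
set.seed(10)                         # arbitrary seed
a <- rnorm(11, mean = 50, sd = 5)    # hypothetical sample A, n = 11 -> df1 = 10
b <- rnorm(21, mean = 50, sd = 5)    # hypothetical sample B, n = 21 -> df2 = 20

F_stat <- var(a) / var(b)            # ratio of sample variances
# two-sided p-value: twice the smaller tail of F(10, 20)
p_manual <- 2 * min(pf(F_stat, df1 = 10, df2 = 20),
                    pf(F_stat, df1 = 10, df2 = 20, lower.tail = FALSE))

res <- var.test(a, b)                # built-in F test for equal variances
c(F_manual = F_stat, F_var_test = unname(res$statistic))
c(p_manual = p_manual, p_var_test = res$p.value)
```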
R function df(x, df1, df2) is the density of the F distribution at x when the degrees of freedom are df1 and df2 (for a continuous variable P(F = x) is 0, so this is a density, not a probability).
R function pf(q, df1, df2, lower.tail) is the cumulative probability: P(F ≤ q) when lower.tail = TRUE (left tail) and P(F > q) when lower.tail = FALSE (right tail).
R function qf(p, df1, df2, lower.tail) is the value q such that P(F ≤ q) = p, i.e. the quantile at cumulative probability p (lower.tail = TRUE).
R function rf(n, df1, df2) returns n random numbers from the F distribution.
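A sketch of how qf, pf and rf relate, using F(10, 20) as an arbitrary example. The simulated mean is compared with the known mean of an F distribution, df2/(df2 - 2) for df2 > 2.

```r
crit <- qf(p = 0.95, df1 = 10, df2 = 20)          # upper 5% critical value of F(10, 20)
crit
pf(crit, df1 = 10, df2 = 20)                      # pf undoes qf: 0.95
pf(crit, df1 = 10, df2 = 20, lower.tail = FALSE)  # right-tail area: 0.05

set.seed(3)                                       # arbitrary seed
draws <- rf(n = 100000, df1 = 10, df2 = 20)
mean(draws)                                       # close to 20 / 18, about 1.11
```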
library(dplyr)
library(ggplot2)
library(tidyr)
data.frame(f = 0:1000 / 100) %>%
mutate(df_10_20 = df(x = f, df1 = 10, df2 = 20),
df_05_10 = df(x = f, df1 = 5, df2 = 10)) %>%
gather(key = "df", value = "density", -f) %>%
ggplot() +
geom_line(aes(x = f, y = density, color = df)) +
labs(title = "F at Various Degrees of Freedom",
x = "F",
y = "Density")
2. Chi-Square Distribution
The chi-squared distribution has numerous applications. The chi-squared test of population variance tests whether a population variance equals a hypothesized value. The chi-squared goodness-of-fit test tests whether the observed category counts in categorical data match a hypothesized distribution, and the chi-squared test of independence tests whether two categorical variables are independent.
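As a sketch of the goodness-of-fit test, the statistic \(χ^{2}=\sum (O-E)^{2}/E\) and its right-tail p-value can be computed with pchisq and compared with R's chisq.test. The die-roll counts are hypothetical, made up for illustration.

```r
observed <- c(18, 24, 16, 22, 25, 15)    # hypothetical counts from 120 rolls of a die
expected <- rep(sum(observed) / 6, 6)    # 20 per face under H0: the die is fair

stat <- sum((observed - expected)^2 / expected)
p_value <- pchisq(stat, df = length(observed) - 1, lower.tail = FALSE)  # right tail

res <- chisq.test(observed)              # default H0: all categories equally likely
c(stat_manual = stat, stat_chisq_test = unname(res$statistic))   # both 4.5
c(p_manual = p_value, p_chisq_test = res$p.value)
```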
R function dchisq(x, df) is the density of \(χ^{2}\) at x when the degrees of freedom is df.
R function pchisq(q, df, lower.tail) is the cumulative probability: P(\(χ^{2}\) ≤ q) when lower.tail = TRUE (left tail) and P(\(χ^{2}\) > q) when lower.tail = FALSE (right tail).
R function rchisq(n, df) returns n random numbers from the chi-square distribution.
R function qchisq(p, df, lower.tail) is the value q such that P(\(χ^{2}\) ≤ q) = p (lower.tail = TRUE).
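One way to see where the chi-square distribution comes from: the sum of k squared independent standard normal variables has a chi-square distribution with k degrees of freedom. The simulation sketch below assumes k = 5 and uses an arbitrary seed.

```r
k <- 5                                      # degrees of freedom (illustrative)
set.seed(11)                                # arbitrary seed
z <- matrix(rnorm(100000 * k), ncol = k)    # 100,000 rows of k standard normals
chi <- rowSums(z^2)                         # each row sum follows chi-square with k df
mean(chi)                                   # close to the theoretical mean k = 5
mean(chi <= qchisq(0.95, df = k))           # close to 0.95
qchisq(0.95, df = k)                        # upper 5% critical value, about 11.07
```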
library(dplyr)
library(ggplot2)
library(tidyr)
data.frame(chisq = 0:7000 / 100) %>%
mutate(df_05 = dchisq(x = chisq, df = 5),
df_15 = dchisq(x = chisq, df = 15),
df_30 = dchisq(x = chisq, df = 30)) %>%
gather(key = "df", value = "density", -chisq) %>%
ggplot() +
geom_line(aes(x = chisq, y = density, color = df)) +
labs(title = "Chi-Square at Various Degrees of Freedom",
x = "Chi-square",
y = "Density")
3. Normal Distribution
R function dnorm(x, mean, sd) is the density at x when the mean is mean and the standard deviation is sd (a density, not a probability).
R function pnorm(q, mean, sd, lower.tail) is the cumulative probability: P(X ≤ q) when lower.tail = TRUE (left tail) and P(X > q) when lower.tail = FALSE (right tail).
R function rnorm(n, mean, sd) returns n random numbers from the normal distribution \(X \sim N(\mu, \sigma^{2})\).
R function qnorm(p, mean, sd, lower.tail) is the value q such that P(X ≤ q) = p (lower.tail = TRUE).
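A sketch of how the normal functions relate, using the numbers from the first IQ example below (mean 100, sd 12, x = 90). It also illustrates standardization, \(z=(x-μ)/σ\).

```r
x <- 90; mu <- 100; sigma <- 12      # values from the first IQ example
z <- (x - mu) / sigma                # standardized value
pnorm(x, mean = mu, sd = sigma)      # P(X < 90)
pnorm(z)                             # same probability from N(0, 1)
qnorm(pnorm(z))                      # qnorm inverts pnorm and returns z
dnorm(mu, mean = mu, sd = sigma)     # density at the mean, 1 / (sigma * sqrt(2 * pi))
```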
Example of use:
- IQ scores are distributed \(X\)∼\(N(100,144)\). What is the probability a randomly selected person’s IQ is <90?
mean1 = 100
sd1 = sqrt(144)
x1 = 90
# exact
pnorm(q = x1, mean = mean1, sd = sd1, lower.tail = TRUE)
## [1] 0.2023284
- IQ scores are distributed \(X\)∼\(N(100,16^{2})\). What is the probability a randomly selected person’s IQ is between 92 and 114?
my_mean = 100
my_sd = 16
my_x_l = 92
my_x_h = 114
# exact
probab2 <- pnorm(q = my_x_h, mean = my_mean, sd = my_sd, lower.tail = TRUE) -
pnorm(q = my_x_l, mean = my_mean, sd = my_sd, lower.tail = TRUE)
probab2
## [1] 0.5006755
EXERCISE
Use this data (download here) to make a box plot, a pie chart, and a histogram.
Let y be a normal random variable with \(\mu=500\) and \(\sigma^{2}=100\) . Find the following probabilities:
P(500<y<665)
P(y>665)
P(y<500)
Suppose the probability that a drug produces a certain side effect is p = 0.3% (that is, 0.003) and n = 3000 patients in a clinical trial receive the drug. What is the probability that exactly 4 patients experience the side effect?
The average number of car accidents in a city per month is 5.
What is the probability that there will be at most 2 accidents in the next month?
What is the expected value of the number of accidents in 1 year?