What is a Random Variable?

Let’s start by breaking it down:

A random variable is a quantity that can take on different values depending on the outcome of some random event.
  • Example 1: Coin Flip

Think about flipping a coin. There are two possible outcomes: heads or tails. We can’t predict which one will occur, but we know it will be one of the two. Now let’s define a random variable \(X\): \(X = 1\) if the coin lands on heads, and \(X = 0\) if it lands on tails. \(X\) is a random variable because its value depends on the outcome of the coin flip, which is random.
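To make the coin-flip example concrete, here is a minimal R sketch that simulates the random variable X. The seed, the number of flips (`n_flips`), and the assumption of a fair coin are illustrative choices, not part of the example above.

```r
# Simulate a coin flip as a random variable X (assumption: the coin is fair)
set.seed(42)      # arbitrary seed so the run is reproducible
n_flips <- 1000   # illustrative number of flips

# X = 1 for heads, X = 0 for tails
X <- sample(c(0, 1), size = n_flips, replace = TRUE)

table(X)   # how often each value occurred
mean(X)    # proportion of heads; should be close to 0.5
```

Each run of `sample()` plays the role of the random event, and the recorded 0 or 1 is the value X takes for that flip.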

Probability Density Function

Earlier we mentioned the normal distribution, with density \(f(x)=\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\), and the standard normal distribution, with density \(f(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\). We can always transform a normal random variable \(X\) into a standard normal random variable \(Z\) by using the relationship \(Z=\frac{X-\mu}{\sigma}\).
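The standardization can be checked numerically. The sketch below uses made-up values for `mu`, `sigma`, and `x0` (assumptions chosen only for illustration) and confirms that a probability computed on the original scale matches the one computed on the z scale.

```r
# Illustrative values (assumptions, not taken from the text)
mu <- 10
sigma <- 2
x0 <- 13

z0 <- (x0 - mu) / sigma   # standardize: z = (x - mu) / sigma

p_direct   <- pnorm(x0, mean = mu, sd = sigma)  # P(X <= x0) for X ~ N(mu, sigma)
p_standard <- pnorm(z0)                         # P(Z <= z0) for Z ~ N(0, 1)

all.equal(p_direct, p_standard)   # the two probabilities agree
```

This is why a single standard normal table is enough for every normal distribution.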

# Plot the standard normal density
x <- seq(-4, 4, length=1000)
y <- dnorm(x)
plot(x, y, type="l", lwd=1)
# Mark the point on the curve at x = 2
xp <- 2
yp <- dnorm(xp)
points(xp, yp, col="red", pch=19)
text(xp, yp+0.03, labels=paste0("(", xp, ", ", round(yp, 3), ")"))
# Shade the right-hand tail, i.e. the area to the right of xp
xs <- seq(xp, 4, length=500)
polygon(c(xp, xs, 4), c(0, dnorm(xs), 0), col=rgb(0, 0, 1, 0.2), border=NA)

We can use a Z-table to find \(P(Z>2)\), the area under the standard normal curve to the right of 2, as illustrated above.

How to use the distribution table:

  • Look at the row for \(z=2.0\).
  • Find the column corresponding to \(z=.00\).
  • At the intersection of the row for \(z=2.0\) and the column for \(z=.00\), you will find the probability to the right of 2, which is 0.0228.
  • This area in the right-hand tail of the distribution is the probability that \(Z\) is greater than 2.
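The lookup steps above can be mirrored in R by rebuilding a small piece of such a table. This is only a sketch: it assumes a right-hand-tail table, and the names `rows`, `cols`, and `ztab` are made up for the illustration.

```r
# Rebuild a few rows of a right-hand-tail standard normal table
rows <- c(1.9, 2.0, 2.1)   # z to one decimal place (table rows)
cols <- (0:9) / 100        # second decimal of z (table columns)

# Entry [i, j] is P(Z > rows[i] + cols[j])
ztab <- outer(rows, cols, function(r, s) pnorm(r + s, lower.tail = FALSE))
dimnames(ztab) <- list(sprintf("%.1f", rows), sprintf("%.2f", cols))

round(ztab, 4)
ztab["2.0", "0.00"]   # the entry for row 2.0, column .00, about 0.0228
```

Reading row `2.0` and column `0.00` of `ztab` is the same operation as finding the intersection in the printed table.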

Now, I encourage you to ask yourself what the following shaded area represents and how to find its probability in the table.

plot(x, y,type="l",lwd=1)   
polygon(c(-4,  seq(-4, xp, length=500), xp), c(0, dnorm( seq(-4, xp, length=500)), 0), col=rgb(0, 0, 1, 0.2), border=NA)

And the following area:

plot(x, y,type="l",lwd=1) 
# Shade the area between -2 and xp
xp <- 2
polygon(c(-2,  seq(-2, xp, length=500), xp), c(0, dnorm( seq(-2, xp, length=500)), 0), col=rgb(0, 0, 1, 0.2), border=NA)
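One way to reason about the two shaded areas above is to derive both from the right-hand-tail probability, using the facts that the total area is 1 and that the curve is symmetric about 0. The sketch below uses illustrative variable names.

```r
tail_right <- pnorm(2, lower.tail = FALSE)   # P(Z > 2), the right-tail table entry

area_left    <- 1 - tail_right       # P(Z < 2): complement of the right tail
area_between <- 1 - 2 * tail_right   # P(-2 < Z < 2): remove both tails (symmetry)

c(left = area_left, between = area_between)
```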

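Now let’s generalize to the normal distribution with \(\mu =2.2, \sigma =1.5\). I want to find the probability \(P(X>1.3)\). The z-score is \(z=\frac{1.3-2.2}{1.5}=-0.6\). Because \(z\) is negative, the area is found by symmetry: \(P(Z>-0.6)=1-P(Z>0.6)=1-0.2743=0.7257\).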

x <- seq(-4, 4, length=1000) 
y <- dnorm(x,mean=2.2,sd=1.5) 
plot(x, y,type="l",lwd=1) 

# The curve is centered at 2.2, not 0, so shift the plotting range.
x <- seq(-2, 6.4, length=1000)
y <- dnorm(x, mean=2.2, sd=1.5)
plot(x, y, type="l", lwd=1)
xp <- 1.3
# Shade the area to the right of xp, i.e. P(X > 1.3)
xs <- seq(xp, 6.4, length=500)
polygon(c(xp, xs, 6.4), c(0, dnorm(xs, mean=2.2, sd=1.5), 0), col=rgb(0, 0, 1, 0.2), border=NA)
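The shaded probability for \(\mu=2.2\), \(\sigma=1.5\) can be computed in two ways, directly and through the z-score with the symmetry argument. This is a small sketch with illustrative variable names.

```r
mu <- 2.2
sigma <- 1.5
z <- (1.3 - mu) / sigma   # z-score of 1.3, which is -0.6

# Direct: P(X > 1.3) on the original scale
p_direct <- pnorm(1.3, mean = mu, sd = sigma, lower.tail = FALSE)

# Symmetry: P(Z > -0.6) = 1 - P(Z > 0.6), using only a right-tail value
p_symmetry <- 1 - pnorm(abs(z), lower.tail = FALSE)

c(z = z, direct = p_direct, symmetry = p_symmetry)   # both about 0.7257
```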

We can also use R to find these probabilities.

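p1 <- pnorm(2)   # P(Z < 2), area to the left of 2
1 - p1           # P(Z > 2), area to the right of 2
## [1] 0.02275013
p2 <- pnorm(-2, mean = 0, sd = 1)   # P(Z < -2)
prob <- p1 - p2  # P(-2 < Z < 2), area between -2 and 2
# Redraw the standard normal curve and shade the area to the left of 2
x <- seq(-4, 4, length=1000)
y <- dnorm(x)
xp <- 2
xs <- seq(-4, xp, length=500)
ys <- dnorm(xs)
plot(x, y, type="l", lwd=1)
polygon(c(-4, xs, xp), c(0, ys, 0), col=rgb(0, 0, 1, 0.2), border=NA)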

A z-score is positive for values above the mean and negative for values below the mean.

Exercise:

  • Find the probability that a randomly selected value from a standard normal distribution is greater than \(z=-1.5\). In mathematical terms, calculate \(P(Z>-1.5)\).

  • Finding the area under the normal distribution: we have a random variable \(X\sim \mathcal{N}(2,2)\), where the second parameter is the standard deviation, so \(\mu=2\) and \(\sigma=2\). Find the probability that a randomly selected value from this distribution falls between \(x=1\) and \(x=3\).

Step-by-step guide:

  • We need the area under the normal curve between \(x=1\) and \(x=3\). This area is the probability that a randomly selected value lies between 1 and 3.

  • Convert to z-scores using \(Z=\frac{x-\mu}{\sigma}\): \(z_1=\frac{1-2}{2}=-0.5\) and \(z_2=\frac{3-2}{2}=0.5\).

  • Find the tail areas in the table: the right-hand tail beyond \(z=0.5\) is 0.3085, and by symmetry so is the left-hand tail beyond \(z=-0.5\).

  • Calculate the area in between: \(1-2(0.3085)=0.383\).

1-pnorm(-1.5) #E1
## [1] 0.9331928
pnorm(3,mean=2,sd=2) - pnorm(1,mean=2,sd=2) #E2
## [1] 0.3829249
  • A real example: the achievement scores for a college entrance examination are normally distributed with mean 75 and standard deviation 10. What fraction of the scores lies between 80 and 90? (Answer: 0.2417)
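The exam-score question can be worked with the same z-score steps. The sketch below assumes a right-hand-tail table, so the area between two positive z-scores is the difference of their tail areas. Variable names are illustrative.

```r
mu <- 75
sigma <- 10

z_low  <- (80 - mu) / sigma   # 0.5
z_high <- (90 - mu) / sigma   # 1.5

# Right-tail areas, as a right-hand-tail table would give them
tail_low  <- pnorm(z_low,  lower.tail = FALSE)
tail_high <- pnorm(z_high, lower.tail = FALSE)

frac <- tail_low - tail_high  # area between the two z-scores
round(frac, 4)                # 0.2417
```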

t-Distribution

The t-distribution is often used in hypothesis testing. Its shape is similar to the normal distribution but with heavier tails.

# Compare the standard normal PDF with the t-distribution PDF (df = 5)
x <- seq(-4, 4, length=1000)
y <- dnorm(x)
plot(x, y, type = "l", col="red")      # standard normal
yt <- dt(x, df = 5)
lines(x, yt, type="l", col="blue")     # t-distribution
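The "heavier tails" claim can also be seen numerically by comparing tail probabilities beyond the same cutoff. The cutoff of 2 and the degrees of freedom in `df_values` are arbitrary illustrative choices.

```r
df_values <- c(1, 5, 30)   # illustrative degrees of freedom

tail_t <- pt(2, df = df_values, lower.tail = FALSE)   # P(T > 2) for each df
tail_z <- pnorm(2, lower.tail = FALSE)                # P(Z > 2)

data.frame(df = df_values, tail_t = tail_t, tail_z = tail_z)
```

Every t tail is larger than the normal tail, and the gap shrinks as the degrees of freedom grow.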

[Table: Standard normal probability in right-hand tail]

Instructions for using a t-table

  1. Identify your degrees of freedom (df).
  2. Determine your desired alpha level.
  3. Locate the correct critical value.

Example of Reading a t-Table:

Suppose you have \(df=9\) and \(\alpha=0.05\).

  1. Find the row for \(df=9\) in the df column.
  2. Look across the row to the column that corresponds to \(\alpha=0.05\).
  3. The number in this cell is the critical t-value, which you compare with the test statistic from your data to make a decision about the hypothesis test.

For a two-tailed test (0.025 in each tail), the critical value is 2.262; for a one-tailed test, it is 1.833.
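The same critical values can be obtained in R with the quantile function `qt()`. In this sketch, `dof` and `alpha` are illustrative names for the degrees of freedom and the alpha level from the example.

```r
dof <- 9
alpha <- 0.05

t_two_tailed <- qt(1 - alpha / 2, df = dof)   # alpha split across both tails
t_one_tailed <- qt(1 - alpha, df = dof)       # all of alpha in one tail

round(c(two_tailed = t_two_tailed, one_tailed = t_one_tailed), 3)
```

The two-tailed value is larger because only half of alpha sits in each tail, which pushes the cutoff further from zero.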