Probability

Discrete Probability Distributions

To find the mean and standard deviation of a discrete probability distribution, you need to have the packages arm and Weighted.Desc.Stat loaded. Click on the box next to each of their names in the list of packages.

Typing in a probability distribution

General process:

variable<-c(type in data with commas)
probability<-c(type in probabilities for the x values as decimals with commas)

Finding the mean, variance, and standard deviation, and drawing a histogram

Mean: w.mean(variable, probability)

Variance: w.var(variable, probability)

Standard deviation: w.sd(variable, probability)

A histogram of the distribution: discrete.histogram(variable, probability, bar.width=1, main="title of graph")

You can also read the values from a CSV file and then type in the appropriate column names for variable and probability.
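As a rough sketch of the file-based route, the lines below write a small file and read it back with read.csv. The file name and the column names size and probability are assumptions made for illustration, and the mean is computed with base R so the sketch runs without extra packages.

```r
# Hypothetical CSV file: one column of values, one column of probabilities
csv_path <- file.path(tempdir(), "family_size.csv")  # assumed file name
writeLines(c("size,probability",
             "2,0.42", "3,0.23", "4,0.21",
             "5,0.10", "6,0.03", "7,0.01"), csv_path)

# Read the file; each column becomes a variable in the data frame
families <- read.csv(csv_path)

# Use the columns wherever the general process says variable and probability
csv_mean <- sum(families$size * families$probability)
csv_mean
```

With the packages loaded, families$size and families$probability can be passed to w.mean, w.var, and w.sd in the same positions.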

Example

Suppose you have the sizes of families and the percentage of families that are each size.

Size of family
size      2     3     4     5     6     7
percent   42%   23%   21%   10%   3%    1%

To find the mean size of a family, the standard deviation of the family size, and to draw the distribution of family size, do the following:

size<-c(2, 3, 4, 5, 6, 7)
probability<-c(0.42, 0.23, 0.21, 0.10, 0.03, 0.01)
w.mean(size, probability)
## [1] 3.12
w.var(size, probability)
## [1] 1.4456
w.sd(size, probability)
## [1] 1.202331
discrete.histogram(size, probability, bar.width = 1, main="Probability distribution for Size of a Family", xlab="Size")
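The results above can be checked against the definitions of the mean and variance of a discrete distribution. This is a minimal sketch in base R, not how the packaged functions are implemented; the names total, mu, sigma_sq, and sigma are chosen here for illustration.

```r
size <- c(2, 3, 4, 5, 6, 7)
probability <- c(0.42, 0.23, 0.21, 0.10, 0.03, 0.01)

# A valid probability distribution sums to 1
total <- sum(probability)

# Mean: sum of x * P(x)
mu <- sum(size * probability)

# Variance: sum of (x - mean)^2 * P(x)
sigma_sq <- sum((size - mu)^2 * probability)

# Standard deviation: square root of the variance
sigma <- sqrt(sigma_sq)

c(total = total, mean = mu, variance = sigma_sq, sd = sigma)
```

The three values agree with the output of w.mean, w.var, and w.sd shown above.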

Probability Distribution

Binomial Distribution

To find probabilities for a binomial experiment with n trials, where the probability of a success is p, use the following commands.

To find P(success=r), use

dbinom(r,n,p)

To find \(P(success\le r)\) use

pbinom(r,n,p, lower.tail=TRUE)

To find \(P(success \ge r)\) use

pbinom(r-1,n,p, lower.tail=FALSE)

Example

If there are 20 trials, the probability of a success is 0.01, and you want to find the probability of 9 successes, this would be P(success = 9). This means you use the dbinom command.

dbinom(9,20,0.01) 
## [1] 1.50381e-13
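To see what dbinom is calculating, the same probability can be built from the binomial formula, the number of ways to choose r successes times p^r times (1-p)^(n-r). This is a small sketch in base R; the names by_formula, by_dbinom, and ratio are illustrative.

```r
n <- 20    # number of trials
r <- 9     # number of successes
p <- 0.01  # probability of a success

# Binomial formula written out by hand
by_formula <- choose(n, r) * p^r * (1 - p)^(n - r)

# The built-in command
by_dbinom <- dbinom(r, n, p)

# The ratio is essentially 1 when the two agree
ratio <- by_formula / by_dbinom
ratio
```

A ratio is used for the comparison because both numbers are very small, so their difference would look like zero even if they disagreed.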

If you want to find the probability of at most 3 successes, then you want P(success is less than or equal to 3), which would be the pbinom command.

pbinom(3,20, 0.01, lower.tail=TRUE) 
## [1] 0.9999574

If you want to find the probability of at least 16 successes, then you want P(success is greater than or equal to 16), which would be the pbinom command.

pbinom(16-1, 20, 0.01, lower.tail = FALSE)
## [1] 4.665168e-29
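The pbinom results can be cross-checked by adding up individual dbinom probabilities. The sketch below, with illustrative variable names, also shows why r-1 is used for "at least": with lower.tail=FALSE, pbinom gives P(success > r-1), which is the same as P(success is greater than or equal to r).

```r
n <- 20
p <- 0.01

# At most 3 successes: add P(0), P(1), P(2), P(3)
at_most_3 <- sum(dbinom(0:3, n, p))

# At least 16 successes: add P(16), ..., P(20)
at_least_16 <- sum(dbinom(16:n, n, p))

# Ratios near 1 mean the sums agree with pbinom
ratio_lower <- at_most_3 / pbinom(3, n, p, lower.tail = TRUE)
ratio_upper <- at_least_16 / pbinom(16 - 1, n, p, lower.tail = FALSE)
c(ratio_lower, ratio_upper)
```

Subtracting from 1 instead, as in 1 - pbinom(15, 20, 0.01), would likely lose this tiny probability to rounding, which is one reason to prefer lower.tail=FALSE.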

Storing values of random variable and probability values.

If you want to store all values of a random variable, type success<-c(0:n). If you want to save the probabilities for all values of the random variable, use probability_values<-dbinom(0:n,n,p).

As an example, if you have 20 trials and the probability of a success is 0.01, and you want to save the values of the random variable and their probabilities, then you would do the following.

success<-c(0:20)
probability_values<-dbinom(0:20, 20, 0.01)
success #displays the value of the variable
##  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
probability_values #displays the probability values
##  [1] 8.179069e-01 1.652337e-01 1.585576e-02 9.609552e-04 4.125313e-05
##  [6] 1.333434e-06 3.367259e-08 6.802543e-10 1.116579e-11 1.503810e-13
## [11] 1.670900e-15 1.534344e-17 1.162381e-19 7.225371e-22 3.649177e-24
## [16] 1.474415e-26 4.654088e-29 1.106141e-31 1.862190e-34 1.980000e-37
## [21] 1.000000e-40
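Once the values and probabilities are stored, they can be used like any other discrete distribution. As a sketch in base R, the lines below check that the probabilities sum to 1 and that the mean and variance match the standard binomial results n*p and n*p*(1-p); the names total, binom_mean, and binom_var are illustrative.

```r
success <- c(0:20)
probability_values <- dbinom(0:20, 20, 0.01)

# The probabilities of all possible values sum to 1
total <- sum(probability_values)

# Mean from the definition; for a binomial this equals n * p = 0.2
binom_mean <- sum(success * probability_values)

# Variance from the definition; for a binomial this equals n * p * (1 - p)
binom_var <- sum((success - binom_mean)^2 * probability_values)

c(total = total, mean = binom_mean, variance = binom_var)
```

With the packages from the discrete distribution section loaded, the same two vectors can be passed to w.mean, w.var, w.sd, and discrete.histogram.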

Scientific Notation

Note: R doesn’t write really small numbers (roughly, those that begin more than 4 decimal places out) in standard notation. Instead, these numbers are given in scientific notation. As an example, 0.00000000532 would be written as 5.32e-09 in R. This is common with many computers and calculators. When you report such a number, either write it with all the decimal places or use correct scientific notation, \(5.32 \times 10^{-9}\), and not the way that R writes it.
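If you want R itself to show a small number with all of its decimal places, one option is the base R format command. This is a minimal sketch; the variable names are illustrative.

```r
x <- 0.00000000532

# How R writes the number in scientific notation
sci <- format(x, scientific = TRUE)

# The same number written out with all the decimal places
fixed <- format(x, scientific = FALSE)

sci
fixed
```

Both results are text, so they are meant for reading or reporting, not for further calculation.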

Normal Distribution

To find P(variable < r) for the normal distribution and show the graph, use

xpnorm(r, mean= , sd= , lower.tail=TRUE).

If you want the probability for P(variable > r), use

xpnorm(r, mean=, sd=, lower.tail=FALSE).

To find P(a < x < b), use

xpnorm(b, mean, sd, lower.tail=TRUE)-xpnorm(a, mean, sd, lower.tail=TRUE).

To find a data value when you know the area to the left, use

xqnorm(area to the left, mean, sd, lower.tail=TRUE).

To find a data value when you know the area to the right, use

xqnorm(area to the right, mean, sd, lower.tail=FALSE).

These commands also show the graph.

Example

If the mean is 272 and the standard deviation is 9, and you want P(x<250), then use

xpnorm(250, 272, 9, lower.tail=TRUE)
## 
## If X ~ N(272, 9), then
##  P(X <= 250) = P(Z <= -2.444) = 0.007254
##  P(X >  250) = P(Z >  -2.444) = 0.9927
## 

## [1] 0.007253771

If the mean is 272 and the standard deviation is 9, and you want P(x>305), then use

xpnorm(305, 272, 9, lower.tail=FALSE)
## 
## If X ~ N(272, 9), then
##  P(X <= 305) = P(Z <= 3.667) = 0.9999
##  P(X >  305) = P(Z >  3.667) = 0.0001229
## 

Normal Distribution with right tail shaded

## [1] 0.0001228664

If you know the mean is 272 and the standard deviation is 9, you know the probability to the left of an x value is 10%, and you want to find x, then use

xqnorm(0.1, mean = 272, sd = 9, lower.tail=TRUE) 
## 
## If X ~ N(272, 9), then
##  P(X <= 260.466) = 0.1
##  P(X >  260.466) = 0.9
## 

Normal Distribution with left tail shaded

## [1] 260.466
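The probability between two values has no worked example above, so here is a sketch using the same mean and standard deviation with a = 250 and b = 305. It uses the base R commands pnorm and qnorm, on the assumption that xpnorm and xqnorm return the same numbers and add the graph and printed summary; the variable names are illustrative.

```r
mean_x <- 272
sd_x <- 9
a <- 250
b <- 305

# P(a < x < b): area to the left of b minus area to the left of a
p_between <- pnorm(b, mean_x, sd_x, lower.tail = TRUE) -
  pnorm(a, mean_x, sd_x, lower.tail = TRUE)
p_between

# Data value with 10% of the area to its left, without the graph
x_cut <- qnorm(0.1, mean = mean_x, sd = sd_x, lower.tail = TRUE)
x_cut
```

The first result is also 1 minus the two tail areas found in the examples above, which is a quick way to check it.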

Student’s t distribution

To find probabilities for the t distribution, use xpt(r, df = degrees of freedom), which finds P(t less than r) with degrees of freedom equal to df. If you know P(t less than r) with degrees of freedom equal to df and you want to find r, use xqt(p, df = degrees of freedom).

Example:

If you know the degrees of freedom is 52, and you want to find P(t more than 7.707) then use

xpt(7.707, df = 52, lower.tail = FALSE)

t Distribution with far right tail shaded

## [1] 1.853009e-10

If you know the degrees of freedom is 19, and you know the P(t less than r) = 0.01, and you want to find r, then you use

xqt(0.01, df = 19, lower.tail = TRUE)

t Distribution with left tail shaded

## [1] -2.539483
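The two t distribution commands undo each other, which gives a quick check on the answers. The sketch below uses the base R commands pt and qt, on the assumption that xpt and xqt return the same numbers and add the graph; the variable names are illustrative.

```r
# P(t more than 7.707) with 52 degrees of freedom
right_tail <- pt(7.707, df = 52, lower.tail = FALSE)

# Value r with P(t less than r) = 0.01 and 19 degrees of freedom
r <- qt(0.01, df = 19, lower.tail = TRUE)

# Putting r back into pt should return the original 0.01
check <- pt(r, df = 19, lower.tail = TRUE)

c(right_tail = right_tail, r = r, check = check)
```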

Chi-Squared Distribution

To find probabilities for the chi-square distribution, use xpchisq(value, df = degrees of freedom, lower.tail=FALSE). With lower.tail=FALSE, this finds the area to the right of value.

Example

xpchisq(4, df=3, lower.tail=FALSE)

Chi-square Distribution with right tail shaded

## [1] 0.2614641
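As a sketch of how the right and left areas relate, the lines below use the base R commands pchisq and qchisq, on the assumption that xpchisq returns the same number as pchisq and adds the graph; the variable names are illustrative.

```r
# Area to the right of 4 with 3 degrees of freedom
right_area <- pchisq(4, df = 3, lower.tail = FALSE)

# Area to the left of the same value
left_area <- pchisq(4, df = 3, lower.tail = TRUE)

# The two areas together cover the whole distribution
total_area <- right_area + left_area

# Going backwards: the value that has right_area to its right
value <- qchisq(right_area, df = 3, lower.tail = FALSE)

c(right = right_area, left = left_area, total = total_area, value = value)
```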

Normal Quantile Plot

To create a normal quantile plot, use gf_qq(~variable, data=Dataset, title="title you want")

Example

Draw the normal quantile plot for the variable data in the dataset Example.

gf_qq(~data, data=Example, title="Graph of Example", color="blue")

Normal Quantile Plot for the Example
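The contents of the dataset Example are not shown here, so the sketch below makes up a stand-in with 50 simulated values (the mean of 272 and standard deviation of 9 are arbitrary choices). It uses the base R commands qqnorm and qqline as an alternative to gf_qq; the reference line is an addition that is not part of the command above.

```r
# Hypothetical stand-in for the dataset Example (simulated values)
set.seed(1)
Example <- data.frame(data = rnorm(50, mean = 272, sd = 9))

# Normal quantile plot: theoretical quantiles against the data values
qq <- qqnorm(Example$data, main = "Graph of Example", col = "blue")

# Reference line; points close to it suggest the data are roughly normal
qqline(Example$data)
```

Because the stand-in values are drawn from a normal distribution, the points should fall close to the line.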