To find the mean and standard deviation of a discrete probability distribution: You will need to have the packages arm and Weighted.Desc.Stat loaded. Click on the box next to their names in the list of packages.
General process:
variable<-c(type in data with commas)
probability<-c(type in probabilities for the x values as decimals with commas)
Mean: w.mean(variable, probability)
Variance: w.var(variable, probability)
Standard deviation: w.sd(variable, probability)
A histogram of the distribution: discrete.histogram(variable, probability, bar.width=1, main="title of graph")
You can also read the data from a CSV file and then use the appropriate column names for variable and probability.
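As a minimal sketch of the CSV approach, the code below writes a small illustrative file and reads it back. The file, its column names (x and prob), and the values in it are made up for illustration; with a real file you would pass its path to read.csv. To stay self-contained, the sketch uses base R's weighted.mean rather than w.mean.

```r
# Create a small CSV file in a temporary location (hypothetical data,
# used only so that this example runs on its own).
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = c(0, 1, 2), prob = c(0.25, 0.50, 0.25)),
          csv_path, row.names = FALSE)

# Read the file; each column becomes a variable in the data frame.
Dist <- read.csv(csv_path)

# Use the columns in place of the variable and probability vectors.
csv_mean <- weighted.mean(Dist$x, Dist$prob)
csv_mean
```

With the packages above loaded, w.mean(Dist$x, Dist$prob) would take the same two columns.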
Suppose you have the sizes of families and the percentage of families of each size.
| size | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|
| percent | 42% | 23% | 21% | 10% | 3% | 1% |
To find the mean size of a family, the standard deviation of the family size, and to draw the distribution of family size, do the following:
size<-c(2, 3, 4, 5, 6, 7)
probability<-c(0.42, 0.23, 0.21, 0.10, 0.03, 0.01)
w.mean(size, probability)
## [1] 3.12
w.var(size, probability)
## [1] 1.4456
w.sd(size, probability)
## [1] 1.202331
discrete.histogram(size, probability, bar.width = 1, main="Probability distribution for Size of a Family", xlab="Size")
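The results above can be cross-checked with base R, using the definitions of the mean, \(\mu = \sum x P(x)\), and the variance, \(\sigma^2 = \sum x^2 P(x) - \mu^2\). This is only a sketch of the arithmetic, not a description of how the package functions are implemented; the names mu, sigma2, and sigma are chosen here for illustration.

```r
size <- c(2, 3, 4, 5, 6, 7)
probability <- c(0.42, 0.23, 0.21, 0.10, 0.03, 0.01)

# The probabilities of a discrete distribution should add up to 1.
sum(probability)

# Mean: each value times its probability, summed.
mu <- sum(size * probability)

# Variance: expected squared value minus the squared mean.
sigma2 <- sum(size^2 * probability) - mu^2

# Standard deviation: square root of the variance.
sigma <- sqrt(sigma2)

c(mean = mu, variance = sigma2, sd = sigma)
```

The three values should match the output of w.mean, w.var, and w.sd shown above.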
To find probabilities for a binomial experiment with n trials, where the probability of a success is p, use the following commands.
To find P(success=r), use
dbinom(r,n,p)
To find \(P(success\le r)\) use
pbinom(r,n,p, lower.tail=TRUE)
To find \(P(success \ge r)\) use
pbinom(r-1,n,p, lower.tail=FALSE)
If there are 20 trials, the probability of a success is 0.01, and you want to find the probability of 9 successes, this would be P(success = 9). This means you use the dbinom command.
dbinom(9,20,0.01)
## [1] 1.50381e-13
If you want to find the probability of at most 3 successes, then you want P(success is less than or equal to 3), so you use the pbinom command.
pbinom(3,20, 0.01, lower.tail=TRUE)
## [1] 0.9999574
If you want to find the probability of at least 16 successes, then you want P(success is greater than or equal to 16), which would be the pbinom command.
pbinom(16-1, 20, 0.01, lower.tail = FALSE)
## [1] 4.665168e-29
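One way to see why r-1 is used for "at least" probabilities is to compare pbinom with a sum of dbinom values. The sketch below reuses the 20 trials and success probability 0.01 from the examples above; the variable names are chosen here for illustration.

```r
n <- 20
p <- 0.01

# P(success <= 3) is the sum of P(success = 0), ..., P(success = 3).
at_most_3 <- pbinom(3, n, p, lower.tail = TRUE)
sum_0_to_3 <- sum(dbinom(0:3, n, p))

# P(success >= 16) is the sum of P(success = 16), ..., P(success = 20).
# With lower.tail = FALSE, pbinom(q, ...) gives P(success > q),
# which is why 16 - 1 is used to include 16 itself.
at_least_16 <- pbinom(16 - 1, n, p, lower.tail = FALSE)
sum_16_to_20 <- sum(dbinom(16:20, n, p))

c(at_most_3, sum_0_to_3)
c(at_least_16, sum_16_to_20)
```

Each pair should agree up to rounding.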
If you want to store all values of a random variable, type
success<-c(0:n)
If you want to save the probabilities for all values of the random variable, use
probability_values<-dbinom(0:n,n,p)
As an example, if you have 20 trials, the probability of a success is 0.01, and you want to save the values of the successes and the probability values, then you would do the following.
success<-c(0:20)
probability_values<-dbinom(0:20, 20, 0.01)
success #displays the value of the variable
## [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
probability_values #displays the probability values
## [1] 8.179069e-01 1.652337e-01 1.585576e-02 9.609552e-04 4.125313e-05
## [6] 1.333434e-06 3.367259e-08 6.802543e-10 1.116579e-11 1.503810e-13
## [11] 1.670900e-15 1.534344e-17 1.162381e-19 7.225371e-22 3.649177e-24
## [16] 1.474415e-26 4.654088e-29 1.106141e-31 1.862190e-34 1.980000e-37
## [21] 1.000000e-40
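Once the values and probabilities are stored, they can be used in further calculations. As a sketch, the code below checks that the probabilities add up to 1 and computes the mean and standard deviation from the stored values; for a binomial distribution these agree with the known formulas \(np\) and \(\sqrt{np(1-p)}\). The names binom_mean and binom_sd are chosen here for illustration.

```r
success <- c(0:20)
probability_values <- dbinom(0:20, 20, 0.01)

# The probabilities over all possible values add up to 1.
sum(probability_values)

# Mean and standard deviation from the stored values.
binom_mean <- sum(success * probability_values)
binom_sd <- sqrt(sum(success^2 * probability_values) - binom_mean^2)

# For a binomial distribution these agree with n*p and sqrt(n*p*(1-p)).
c(binom_mean, 20 * 0.01)
c(binom_sd, sqrt(20 * 0.01 * 0.99))
```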
Note: R doesn't write very small numbers (roughly those smaller than 0.0001) in standard notation. Instead, these numbers are given in scientific notation. As an example, 0.00000000532 would be displayed as 5.32e-09 in R. This is common with many computers and calculators. When you report such a number, either write it with all the decimal places or use correct scientific notation, \(5.32 \times 10^{-9}\), and not the way that R writes it.
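If you want R itself to show all the decimal places, base R's format function can do this. The following is a small sketch; the variable names are chosen here for illustration.

```r
small <- 0.00000000532

# Default printing uses scientific notation for a number this small.
print(small)

# format() with scientific = FALSE writes out all the decimal places.
fixed_text <- format(small, scientific = FALSE)
fixed_text

# format() with scientific = TRUE forces scientific notation.
sci_text <- format(small, scientific = TRUE)
sci_text
```

Note that format returns text rather than a number, so it is meant for display and not for further calculation.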
To find P(variable < r) for the normal distribution, along with a graph of the area, use
xpnorm(r, mean= , sd= , lower.tail=TRUE).
If you want the probability for P(variable > r), use
xpnorm(r, mean= , sd= , lower.tail=FALSE).
To find P(a < x < b), use
xpnorm(b, mean, sd, lower.tail=TRUE)-xpnorm(a, mean, sd, lower.tail=TRUE).
To find a data value when you know the area to the left, use
xqnorm(area to the left, mean, sd, lower.tail=TRUE).
To find a data value when you know the area to the right, use
xqnorm(area to the right, mean, sd, lower.tail=FALSE).
This also shows the graph.
If the mean is 272 and the standard deviation is 9, and you want P(x<250), then use
xpnorm(250, 272, 9, lower.tail=TRUE)
##
## If X ~ N(272, 9), then
## P(X <= 250) = P(Z <= -2.444) = 0.007254
## P(X > 250) = P(Z > -2.444) = 0.9927
##
## [1] 0.007253771
If the mean is 272 and the standard deviation is 9, and you want P(x>305), then use
xpnorm(305, 272, 9, lower.tail=FALSE)
##
## If X ~ N(272, 9), then
## P(X <= 305) = P(Z <= 3.667) = 0.9999
## P(X > 305) = P(Z > 3.667) = 0.0001229
##
## [1] 0.0001228664
If you know the mean is 272 and the standard deviation is 9, you know the probability to the left of an x value is 10%, and you want to find x, then use
xqnorm(0.1, mean = 272, sd = 9, lower.tail=TRUE)
##
## If X ~ N(272, 9), then
## P(X <= 260.466) = 0.1
## P(X > 260.466) = 0.9
##
## [1] 260.466
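The "between" probability and the area-to-the-right case described above have no worked example, so here is a hedged sketch using the same mean of 272 and standard deviation of 9. It uses base R's pnorm and qnorm, which return the same numbers but do not draw the graph; the endpoints 250 and 305 are reused from the earlier examples, and the variable names are chosen here for illustration.

```r
mu <- 272
sigma <- 9

# P(250 < x < 305): area to the left of 305 minus area to the left of 250.
p_between <- pnorm(305, mean = mu, sd = sigma, lower.tail = TRUE) -
  pnorm(250, mean = mu, sd = sigma, lower.tail = TRUE)
p_between   # about 0.9926

# Data value with 10% of the area to its right.
x_right <- qnorm(0.1, mean = mu, sd = sigma, lower.tail = FALSE)
x_right     # about 283.534
```

By symmetry, the second value is as far above the mean as 260.466 is below it.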
To find probabilities for the t distribution, use xpt(r, df = degrees of freedom), which finds P(t less than r) with degrees of freedom equal to df. Use xqt(p, df = degrees of freedom) to find r when you know that P(t less than r) = p with degrees of freedom equal to df.
If you know the degrees of freedom is 52, and you want to find P(t more than 7.707) then use
xpt(7.707, df = 52, lower.tail = FALSE)
## [1] 1.853009e-10
If you know the degrees of freedom is 19, and you know the P(t less than r) = 0.01, and you want to find r, then you use
xqt(0.01, df = 19, lower.tail = TRUE)
## [1] -2.539483
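The two t distribution examples can be cross-checked with base R's pt and qt, which give the same numbers without a graph. This is a sketch for checking purposes; the variable names are chosen here for illustration.

```r
# P(t > 7.707) with 52 degrees of freedom (area to the right).
p_right <- pt(7.707, df = 52, lower.tail = FALSE)

# By symmetry of the t distribution, this equals P(t < -7.707).
p_left <- pt(-7.707, df = 52, lower.tail = TRUE)

# Value r with P(t < r) = 0.01 for 19 degrees of freedom.
r <- qt(0.01, df = 19, lower.tail = TRUE)

# Feeding r back into pt() recovers the probability 0.01.
pt(r, df = 19)
```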
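To find probabilities for the chi-square distribution, use xpchisq(value, df = degrees of freedom, lower.tail=FALSE), which finds the area to the right of value. For example, with a value of 4 and 3 degrees of freedom: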
xpchisq(4, df=3, lower.tail=FALSE)
## [1] 0.2614641
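As a quick check of what lower.tail does here, the sketch below uses base R's pchisq (same numbers, no graph) to compute the areas on both sides of 4. The variable names are chosen here for illustration.

```r
# Area to the right of 4 with 3 degrees of freedom.
right_area <- pchisq(4, df = 3, lower.tail = FALSE)

# Area to the left of 4; the two areas add up to 1.
left_area <- pchisq(4, df = 3, lower.tail = TRUE)

right_area
left_area + right_area
```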
To create a normal quantile plot, you use gf_qq(~variable, data=Dataset, title="title you want")
Draw the normal quantile plot for the variable data in the dataset Example:
gf_qq(~data, data=Example, title="Graph of Example", color="blue")
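The dataset Example is not shown here, so the sketch below builds a hypothetical stand-in with simulated values in a column named data. It uses base R's qqnorm and qqline, which is a different route from gf_qq, to draw a comparable plot without extra packages.

```r
# Hypothetical dataset: 30 simulated values in a column named data.
set.seed(1)
Example <- data.frame(data = rnorm(30, mean = 272, sd = 9))

# Base R normal quantile plot of the variable.
qq <- qqnorm(Example$data, main = "Graph of Example", col = "blue")

# Reference line through the first and third quartiles; points close to
# the line suggest the data are roughly normal.
qqline(Example$data)
```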