The sample function, sample(x, size, replace = FALSE, prob = NULL), takes a random sample from a set of data. Its arguments are as follows:
x: a positive integer, to select from 1 through x, or a vector of values to select from (e.g. one built with the c() function).
size: the sample size. Without replacement, the sample size cannot exceed the population size.
replace=: TRUE to sample with replacement, FALSE (the default) to sample without.
prob=: a vector giving the probability of each outcome in the population. By default, every outcome has an equal probability.
sample(c("H", "T"), 10, replace = TRUE, prob = c(0.8, 0.2))
## [1] "H" "T" "H" "H" "T" "H" "H" "H" "H" "T"
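A short sketch of a few more ways to call sample(), using only base R. set.seed() makes the draws reproducible (the seed 42 is arbitrary), a single positive integer gives a random permutation of 1 through that integer, and many weighted draws should produce frequencies close to prob=. One pitfall worth knowing: a numeric x of length 1 with x >= 1 is treated as 1:x, even if you meant it as a one-element population.

```r
# Fix the random number generator so the draws below are reproducible.
set.seed(42)

# A single positive integer x means "sample from 1:x"; with no size given,
# the result is a random permutation of 1:x.
perm <- sample(5)

# Without replacement, size may equal (but not exceed) the population size.
all_cards <- sample(c("A", "K", "Q"), 3)

# Many weighted draws: the observed proportion of "H" should be near 0.8.
flips <- sample(c("H", "T"), 10000, replace = TRUE, prob = c(0.8, 0.2))
prop_heads <- mean(flips == "H")
prop_heads
```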
prod(x) takes the product of the elements of a vector. choose(x, y) computes x!/(y!(x-y)!), the number of combinations: ways to choose y outcomes from x possibilities where order doesn't matter and there is no replacement.
sample(1:40, 5)
## [1] 13 15 6 1 20
1/prod(40:36) #Probability of getting a set of 5 numbers IN ORDER without replacement.
## [1] 1.266e-08
prod(5:1)/prod(40:36) #Probability of getting a set of 5 numbers in ANY ORDER without replacement.
## [1] 1.52e-06
1/choose(40, 5) # Same as above. choose(40, 5) counts the ways to choose 5 of 40 options: 40!/(5!35!)
## [1] 1.52e-06
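The two probabilities above differ only by the 5! orderings of each set. This sketch checks that counting ordered draws and dividing by 5!, calling choose(), and evaluating the factorial formula all give the same number of combinations.

```r
# Ordered draws of 5 from 40 without replacement: 40 * 39 * 38 * 37 * 36.
ordered <- prod(40:36)

# Each unordered set of 5 numbers appears in 5! = 120 different orders,
# so dividing by 5! gives the number of unordered combinations.
unordered <- ordered / factorial(5)

# choose() computes the same count directly: 40! / (5! * 35!).
n_comb <- choose(40, 5)
n_comb            # 658008

# The factorial formula agrees, up to floating-point rounding.
by_factorials <- factorial(40) / (factorial(5) * factorial(35))
all.equal(by_factorials, n_comb)
```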
Binomial Distribution:
$f(x) = \binom{n}{x} p^x (1-p)^{n-x}$
where n = # of trials, x = # of favorable outcomes, and p = probability of a favorable outcome in an individual trial.
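To see that dbinom() evaluates the binomial formula $\binom{n}{x} p^x (1-p)^{n-x}$, this sketch writes the formula out by hand with choose() and compares it against dbinom() for every possible x. The values n = 10 and p = 0.3 are arbitrary choices for illustration.

```r
n <- 10   # number of trials
p <- 0.3  # probability of a favorable outcome on each trial
x <- 0:n  # every possible number of favorable outcomes

# Binomial formula written out by hand.
by_formula <- choose(n, x) * p^x * (1 - p)^(n - x)

# dbinom() evaluates the same probability mass function.
by_dbinom <- dbinom(x, size = n, prob = p)

all.equal(by_formula, by_dbinom)   # TRUE
sum(by_formula)                    # probabilities over all x sum to 1
```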
Uniform Distribution: constant density over a specified interval.
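For a uniform distribution on an interval [a, b], the constant density is 1/(b - a) inside the interval and 0 outside. A minimal sketch with base R's dunif() and punif(), assuming an arbitrary interval [2, 6]:

```r
a <- 2  # lower end of the interval
b <- 6  # upper end of the interval

# Inside [a, b] the density is the constant 1 / (b - a) = 0.25; outside it is 0.
dens <- dunif(c(1, 2, 3.5, 6, 7), min = a, max = b)
dens

# Probability of landing in [3, 4]: width of the sub-interval times the density.
p_3_to_4 <- punif(4, min = a, max = b) - punif(3, min = a, max = b)
p_3_to_4   # 0.25
```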
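Normal Distribution: bell-shaped distribution determined by its mean $\mu$ and standard deviation $\sigma$. A larger standard deviation means a wider, flatter curve. It is a good model for measurement error, and it approximates the distribution of sample means when sample sizes are large. $f(x) = (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$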
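This sketch writes out the normal density $(2\pi\sigma^2)^{-1/2}\exp(-(x-\mu)^2/(2\sigma^2))$ by hand and checks it against dnorm(). It also confirms that a larger standard deviation lowers the peak of the curve. The mean 85 and sd 5 are arbitrary choices.

```r
mu    <- 85  # mean
sigma <- 5   # standard deviation
x <- seq(70, 100, by = 5)

# Normal density written out: (2*pi*sigma^2)^(-1/2) * exp(-(x - mu)^2 / (2*sigma^2)).
by_formula <- (2 * pi * sigma^2)^(-1/2) * exp(-(x - mu)^2 / (2 * sigma^2))

by_dnorm <- dnorm(x, mean = mu, sd = sigma)
all.equal(by_formula, by_dnorm)   # TRUE

# A larger sd spreads the curve out, so the peak height at the mean drops.
dnorm(mu, mean = mu, sd = 10) < dnorm(mu, mean = mu, sd = sigma)   # TRUE
```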
For a continuous distribution, the probability of getting a value in a particular interval = the area under the density curve over that interval.
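As a quick check of the area interpretation, this sketch integrates the standard normal density between -1 and 1 with base R's integrate() and compares the result with the same probability computed from pnorm().

```r
# Area under the standard normal density between -1 and 1, by numerical integration.
area <- integrate(dnorm, lower = -1, upper = 1)$value

# The same probability from the cumulative distribution function.
by_cdf <- pnorm(1) - pnorm(-1)

area     # about 0.683
by_cdf
```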
For continuous distributions:
seq(x, y, z) returns a sequence of numbers from x to y in steps of z.
dnorm(x) returns the density for a normal distribution. Default is mean = 0 and std dev = 1.
x <- seq(-4, 4, 0.05)
plot(x, dnorm(x), type = "l")
For discrete distributions:
dbinom(x, size=, prob=) returns the probability of exactly x favorable outcomes, where size= gives the # of trials and prob= gives the probability of a favorable outcome on each trial.
x <- seq(0, 100, 1)
plot(x, dbinom(x, size = 100, prob = 0.6), type = "h")
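The spikes in the plot above are probabilities, so they should add up to 1, and their weighted average should be size * prob. A short check under the same size = 100, prob = 0.6 setting:

```r
x <- 0:100
probs <- dbinom(x, size = 100, prob = 0.6)

sum(probs)          # the probabilities of all possible counts add up to 1
sum(x * probs)      # expected number of favorable outcomes: size * prob = 60
x[which.max(probs)] # most likely count: 60
```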
Find the probability of “hitting” x or less in a given distribution. (Looking for the lower tail, essentially.)
pnorm(x, mean=, sd=) returns the fraction of the area under the normal curve that lies below x, given a mean and sd.
pnorm(70, mean = 85, sd = 5) # Probability of scoring 70 or less on an exam with mean 85 and std dev 5, assuming scores are normally distributed.
## [1] 0.00135
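A score of 70 sits exactly 3 standard deviations below a mean of 85 when the sd is 5. This sketch standardizes the score into a z-score and shows that the standard normal gives the same lower-tail probability.

```r
# Standardize the score: how many standard deviations below the mean is 70?
z <- (70 - 85) / 5   # -3

# The standard normal gives the same lower-tail probability.
pnorm(z)
pnorm(70, mean = 85, sd = 5)
```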
Binomial tail probabilities: If you know the number of trials (size=) and the probability of a favorable outcome (prob=), you can find the probability that x or fewer outcomes were favorable using pbinom(x, size=, prob=).
# Be careful with discrete values! pbinom() measures x or less, so x itself
# is included in the lower tail. For x or more, use 1 - pbinom(x - 1, ...)
# so that x is not subtracted away with the lower tail.
pbinom(4, size = 20, prob = 0.4)
## [1] 0.05095
1 - pbinom(15, size = 20, prob = 0.6)
## [1] 0.05095
1 - pbinom(16, size = 20, prob = 0.6)
## [1] 0.01596
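The upper tail P(X >= 16) for 20 trials with prob 0.6 can be computed in several equivalent ways in base R. This sketch compares the complement used above, pbinom()'s lower.tail = FALSE argument, and a direct sum of dbinom() terms.

```r
# P(X >= 16) for X ~ Binomial(size = 20, prob = 0.6), computed three ways.
upper_1 <- 1 - pbinom(15, size = 20, prob = 0.6)                  # complement of P(X <= 15)
upper_2 <- pbinom(15, size = 20, prob = 0.6, lower.tail = FALSE)  # P(X > 15) directly
upper_3 <- sum(dbinom(16:20, size = 20, prob = 0.6))              # P(X = 16) + ... + P(X = 20)

c(upper_1, upper_2, upper_3)   # all about 0.05095
```

This also matches pbinom(4, size = 20, prob = 0.4): 16 or more favorable outcomes at probability 0.6 is the same event as 4 or fewer unfavorable outcomes at probability 0.4.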
qnorm(p) returns the quantile for probability p: the value with a fraction p of the area below it (the inverse of pnorm()). By default, mean = 0 and std dev = 1.
m <- 6
q75 <- qnorm(0.75, mean = m, sd = 2)
q25 <- qnorm(0.25, mean = m, sd = 2)
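isTRUE(all.equal(q75 - m, m - q25)) # The curve is symmetric about the mean; all.equal() avoids exact floating-point comparison.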
## [1] TRUE
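Since qnorm() inverts pnorm(), feeding a quantile back into pnorm() recovers the original probability. A short sketch using the familiar 97.5% point of the standard normal, plus the shift-and-scale rule for a non-default mean and sd (6 and 2, as above):

```r
# qnorm() inverts pnorm(): it returns the value with the given lower-tail probability.
q <- qnorm(0.975)
q              # about 1.96
pnorm(q)       # back to 0.975

# With a non-default mean and sd, the quantile shifts and scales accordingly.
qnorm(0.975, mean = 6, sd = 2)   # equals 6 + 2 * qnorm(0.975)
```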
rnorm(n, mean=, sd=) returns n random numbers drawn from a normal distribution with the given mean= and sd=.
rbinom(n, size = y, prob = p) returns n random observations, each counting the # of favorable outcomes in y trials, where each trial has probability p of a favorable outcome.
rnorm(10, mean = 3, sd = 1)
## [1] 3.215 2.319 3.886 4.521 3.609 4.127 4.554 2.578 4.547 5.041
rbinom(8, 500, prob = 0.5) # Like an experiment where you toss a coin 500 times and count the heads, performed 8 times.
## [1] 252 248 254 254 235 225 246 246
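The outputs above change on every run. This sketch uses set.seed() (the seed 1 is arbitrary) to make the draws reproducible and checks that large samples have summary statistics close to the population values.

```r
set.seed(1)   # arbitrary seed so the draws are reproducible

# Large samples should have summary statistics close to the population values.
normal_draws <- rnorm(10000, mean = 3, sd = 1)
mean(normal_draws)   # close to 3
sd(normal_draws)     # close to 1

coin_counts <- rbinom(10000, size = 500, prob = 0.5)
mean(coin_counts)    # close to size * prob = 250

# Re-seeding with the same value reproduces exactly the same draws.
set.seed(1)
identical(normal_draws, rnorm(10000, mean = 3, sd = 1))   # TRUE
```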
1 - pnorm(3)
## [1] 0.00135
1 - pnorm(42, mean = 35, sd = 6)
## [1] 0.1217
pbinom(0, size = 10, prob = 0.2)
## [1] 0.1074
# This asks 'What is the chance of 0 unfavorable outcomes when the
# probability of an unfavorable outcome is 20%?' That is the same event as
# 10 favorable outcomes out of 10 when the probability of a favorable one is 80%.
pbinom(0, size = 10, 0.2)
## [1] 0.1074
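To confirm that the two framings describe the same event, this sketch computes the probability from both sides and compares it with 0.8^10, the chance that all 10 trials are favorable.

```r
# 0 unfavorable outcomes in 10 trials, each unfavorable with probability 0.2 ...
p_zero_bad <- pbinom(0, size = 10, prob = 0.2)

# ... is the same event as 10 favorable outcomes, each favorable with probability 0.8.
p_all_good <- dbinom(10, size = 10, prob = 0.8)

# Both equal 0.8^10, since every one of the 10 trials must be favorable.
c(p_zero_bad, p_all_good, 0.8^10)
```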
# Set prob = 0.5. size= is the number of flips per experiment, and the first
# argument is the number of experiments. The result is a vector giving the
# number of heads in each experiment.
cointoss <- rbinom(1000, size = 500, prob = 0.5)
summary(cointoss)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 210 242 250 250 257 281
hist(cointoss, xlab = "# of Heads", main = "1000 Coin Tossing Experiments", col = "beige")
qqnorm(cointoss, main = "Normal Q-Q Plot for Coin Toss")
qqline(cointoss)
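The Q-Q plot checks visually whether the coin-toss counts look normal. A numeric counterpart, sketched below under the usual normal-approximation assumptions, compares an exact binomial tail probability with a normal curve of the same mean and sd, using a continuity correction of 0.5. The cutoff of 240 heads is an arbitrary choice.

```r
# Binomial(500, 0.5) has mean size * prob and sd sqrt(size * prob * (1 - prob)).
size  <- 500
p     <- 0.5
mu    <- size * p                     # 250
sigma <- sqrt(size * p * (1 - p))     # about 11.18

# Exact binomial probability of 240 heads or fewer ...
exact <- pbinom(240, size = size, prob = p)

# ... versus the normal approximation, with a continuity correction of 0.5.
approx <- pnorm(240.5, mean = mu, sd = sigma)

c(exact, approx)
```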