Dalgaard Chapter 3 Functions – Alex Crawford

3.1 Sampling

The sample function sample(x, size, replace=, prob=) draws a random sample from a set of data. The arguments are as follows:
x is either a positive integer, in which case the sample is drawn from 1 through x, or a vector of values to draw from (e.g. built with the c() function).
size is the sample size; it cannot exceed the population size unless sampling with replacement.
replace= is either TRUE or FALSE; FALSE by default.
prob= gives the probability weight for each element of the population. By default, every element is equally likely.

sample(c("H", "T"), 10, replace = TRUE, prob = c(0.8, 0.2))

##  [1] "H" "T" "H" "H" "T" "H" "H" "H" "H" "T"
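The size constraint can be checked directly. This is a minimal sketch; the seed is arbitrary and is assumed here only so that the draws are reproducible.

```r
set.seed(1)  # arbitrary seed, assumed only for reproducibility

# Without replacement, the sample can be as large as the population:
perm <- sample(5, 5)   # a random permutation of 1:5
sort(perm)             # always 1 2 3 4 5

# Asking for more than the population without replacement is an error:
too_big <- tryCatch(sample(5, 6), error = function(e) "error")
too_big

# With replacement, any sample size is allowed:
with_repl <- sample(5, 6, replace = TRUE)
length(with_repl)
```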

3.2 Probability calculations and combinatorics

prod(x) takes the product of the elements of a vector. choose(x,y) computes x!/(y!(x-y)!), the number of combinations (ways to choose y outcomes from x possibilities when order doesn't matter and there is no replacement).

sample(1:40, 5)

## [1] 13 15  6  1 20

1/prod(40:36)  #Probability of getting a set of 5 numbers IN ORDER without replacement.

## [1] 1.266e-08

prod(5:1)/prod(40:36)  #Probability of getting a set of 5 numbers in ANY ORDER without replacement.

## [1] 1.52e-06

1/choose(40, 5)  #Same as above.  This performs, out of 40 options, choose 5 = 40!/(5!35!)

## [1] 1.52e-06
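The relationship between the three calculations can be spelled out in a short sketch. The variable names are illustrative only.

```r
ordered    <- 1 / prod(40:36)          # one specific ordered draw
unordered  <- prod(5:1) / prod(40:36)  # any of the 5! orderings of the same set
via_choose <- 1 / choose(40, 5)        # one set out of all possible sets

factorial(5)                      # 120, the same as prod(5:1)
choose(40, 5)                     # 658008 possible sets of 5 numbers
all.equal(unordered, via_choose)  # the two routes agree
all.equal(unordered, ordered * factorial(5))
```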

3.3. Discrete Distributions

Binomial Distribution:
f(x) = (n choose x) * p^x * (1-p)^(n-x), where n = # of trials, x = # of favorable outcomes, and p = probability of a favorable outcome in an individual trial.
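The binomial point probabilities can be computed by hand and compared with R's built-in dbinom(). This is a minimal sketch; the values of n, x, and p are chosen arbitrarily for illustration.

```r
n <- 20   # number of trials
x <- 4    # number of favorable outcomes
p <- 0.4  # probability of a favorable outcome per trial

by_hand <- choose(n, x) * p^x * (1 - p)^(n - x)
all.equal(by_hand, dbinom(x, size = n, prob = p))

# The point probabilities over all possible x (0 through n) sum to 1.
sum(dbinom(0:n, size = n, prob = p))
```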

3.4. Continuous Distributions

Uniform Distribution: constant density over a specified interval.
Normal Distribution: bell-shaped distribution determined by its mean [mu] and standard deviation [sigma]. A larger std dev gives a wider curve. Often a good model for measurement errors and for averages from large samples. f(x) = (1/(sigma*sqrt(2*pi))) * exp(-(x-mu)^2/(2*sigma^2))
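Both densities can be evaluated directly and compared with the built-in dnorm() and dunif(). This is a sketch with illustrative values: a standard normal (mu = 0, sigma = 1) and a uniform distribution on the interval from 0 to 2.

```r
x <- seq(-4, 4, 0.05)
mu <- 0
sigma <- 1

# Normal density written out term by term.
by_hand <- exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
all.equal(by_hand, dnorm(x, mean = mu, sd = sigma))

# Uniform density: constant (1 / interval width) inside the interval, 0 outside.
dunif(c(0.2, 0.5, 3), min = 0, max = 2)  # 0.5 0.5 0.0
```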

3.5. The built-in distributions in R

3.5.1. Densities

Probability of getting a value in a particular interval = area under the distribution curve. For continuous distributions:
seq(x,y,z) returns a sequence of numbers from x to y in steps of z. dnorm(x) returns the density of a normal distribution at x. Default is mean = 0 and std dev = 1.

x <- seq(-4, 4, 0.05)

plot(x, dnorm(x), type = "l")


For discrete distributions:
dbinom(x,size=,prob=) where x is the # of favorable outcomes, size= gives the # of trials, and prob= gives the probability of a favorable outcome in a single trial.

x <- seq(0, 100, 1)

plot(x, dbinom(x, size = 100, prob = 0.6), type = "h")


3.5.2. Cumulative distribution functions

Find the probability of “hitting” x or less in a given distribution. (Looking for a tail, essentially.)
pnorm(x,mean=,sd=) returns the proportion of the area under the normal distribution that lies below x, given a mean and sd.

pnorm(70, mean = 85, sd = 5)  # Probability of scoring 70 or lower on an exam when the mean was 85 and the std dev was 5, assuming scores are normally distributed.

## [1] 0.00135
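The link between the cumulative distribution function and the area under the density curve can be checked numerically with integrate(). This is a sketch; the lower limit of 40 is an assumption standing in for minus infinity, since the area below 40 (nine std devs under the mean) is negligible.

```r
# Area under the normal density between 40 and 70.
area <- integrate(dnorm, lower = 40, upper = 70, mean = 85, sd = 5)
area$value

# The same probability from the cumulative distribution function.
pnorm(70, mean = 85, sd = 5)

# Standardizing gives the same answer: 70 is 3 std devs below the mean.
pnorm((70 - 85) / 5)
```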

Binomial tests: If you know the sample size (size=) and the probability of a favorable outcome (prob=), you can find the probability that x or fewer outcomes were favorable using pbinom(x,size=,prob=).

# Be careful with discrete distributions!  pbinom(x, ...) gives P(X <= x),
# so x itself is included.  For "x or more", use 1 - pbinom(x - 1, ...) so
# that x stays in the upper tail.
pbinom(4, size = 20, prob = 0.4)

## [1] 0.05095

1 - pbinom(15, size = 20, prob = 0.6)

## [1] 0.05095

1 - pbinom(16, size = 20, prob = 0.6)

## [1] 0.01596
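The off-by-one behaviour is easier to see when the upper tail is built three ways. This is a minimal sketch reusing the numbers above; lower.tail = FALSE is a standard pbinom() argument that returns P(X > x) directly.

```r
# P(X >= 16) for 20 trials with success probability 0.6, three ways.
upper_by_sum  <- sum(dbinom(16:20, size = 20, prob = 0.6))  # add the point probabilities
upper_by_cdf  <- 1 - pbinom(15, size = 20, prob = 0.6)      # 1 - P(X <= 15)
upper_by_tail <- pbinom(15, size = 20, prob = 0.6, lower.tail = FALSE)

all.equal(upper_by_sum, upper_by_cdf)
all.equal(upper_by_cdf, upper_by_tail)

# 16 or more successes at p = 0.6 is the same event as 4 or fewer failures at p = 0.4.
all.equal(upper_by_cdf, pbinom(4, size = 20, prob = 0.4))
```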

3.5.3. Quantiles

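qnorm(p) returns the quantile for cumulative probability p, i.e. the value below which a proportion p of the distribution lies, assuming by default mean = 0 and std dev = 1. It is the inverse of pnorm.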

m <- 6
q75 <- qnorm(0.75, mean = m, sd = 2)
q25 <- qnorm(0.25, mean = m, sd = 2)
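isTRUE(all.equal(q75 - m, m - q25))  # quartiles are symmetric about the mean; all.equal() tolerates floating-point rounding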

## [1] TRUE
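Quantiles and cumulative probabilities undo each other, which a short sketch can confirm. The probabilities in p are chosen arbitrarily for illustration.

```r
p <- c(0.025, 0.25, 0.5, 0.75, 0.975)
q <- qnorm(p, mean = 6, sd = 2)

# pnorm() maps the quantiles back to the original probabilities.
all.equal(pnorm(q, mean = 6, sd = 2), p)

qnorm(0.5, mean = 6, sd = 2)  # the median of a normal equals its mean, 6
round(qnorm(0.975), 2)        # 1.96, the familiar 95% two-sided cutoff
```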

3.5.4. Random Numbers

rnorm(x,mean=,sd=) returns x random numbers drawn from a normal distribution with the given mean= and sd=.
rbinom(x,size=y,prob=p) returns x observations, each counting the # of favorable outcomes in y trials with probability p of a favorable outcome in each trial.

rnorm(10, mean = 3, sd = 1)

##  [1] 3.215 2.319 3.886 4.521 3.609 4.127 4.554 2.578 4.547 5.041

rbinom(8, 500, prob = 0.5)  # Like an experiment where you toss a coin 500 times and count the number of heads.  Perform the experiment 8 times.

## [1] 252 248 254 254 235 225 246 246
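Because these functions draw new random numbers on every call, the outputs shown above will differ from run to run. This is a minimal sketch of making draws reproducible with set.seed(); the seed value 42 is arbitrary.

```r
set.seed(42)  # arbitrary seed
a <- rnorm(10, mean = 3, sd = 1)

set.seed(42)  # resetting the seed replays the same random stream
b <- rnorm(10, mean = 3, sd = 1)

identical(a, b)  # TRUE

# Each rbinom() value is a count of heads, so it lies between 0 and size.
heads <- rbinom(8, size = 500, prob = 0.5)
all(heads >= 0 & heads <= 500)
```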

3.6. Exercises

3.1.

1 - pnorm(3)

## [1] 0.00135

1 - pnorm(42, mean = 35, sd = 6)

## [1] 0.1217

pbinom(0, size = 10, prob = 0.2)

## [1] 0.1074

3.3.

# Actually testing 'What is the chance of 0 unfavorable outcomes when
# probability of the unfavorable is 20%?' (As opposed to testing the
# probability of 10 favorable with probability of 80%.)
pbinom(0, size = 10, prob = 0.2)

## [1] 0.1074

3.4.

# Make prob= 0.5.  size= is the number of flips per experiment.  x is the
# number of experiments.  The return is a vector reporting the number of
# heads.
cointoss <- rbinom(1000, size = 500, prob = 0.5)
summary(cointoss)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     210     242     250     250     257     281

hist(cointoss, xlab = "# of Heads", main = "1000 Coin Tossing Experiments", col = "beige")


qqnorm(cointoss, main = "Normal Q-Q Plot for Coin Toss")
qqline(cointoss)

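The near-straight Q-Q plot reflects the normal approximation to the binomial, which has mean n*p and std dev sqrt(n*p*(1-p)). This is a sketch comparing those theoretical values with a fresh simulation; the seed is arbitrary and the simulated values will not match the summary above exactly.

```r
set.seed(1)  # arbitrary seed, assumed only for reproducibility
n <- 500     # flips per experiment
p <- 0.5     # probability of heads
cointoss <- rbinom(1000, size = n, prob = p)

# Theoretical vs. simulated mean (theory: 250).
c(theory_mean = n * p, sample_mean = mean(cointoss))

# Theoretical vs. simulated std dev (theory: about 11.18).
c(theory_sd = sqrt(n * p * (1 - p)), sample_sd = sd(cointoss))
```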