Monday 25 March 2013 Stats 155 Class Notes

Definitions

A probability is a number between 0 and 1, inclusive. 0 means “impossible” and 1 means “certain”. Values in between indicate how likely an outcome is, with bigger numbers indicating a greater likelihood.

A random variable is a quantity (a number) whose value is determined by chance.

A sample space (poorly named) is the set of possible values for that number.

A probability model is an assignment of a probability to each member of the sample space.
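
As a small illustration of these definitions, here is a sketch of a probability model for one roll of a fair six-sided die. The die, and the names sample_space and probs, are assumptions chosen for the example rather than anything from the class.

```r
# Sample space: the possible values of one roll (illustrative assumption: a fair die)
sample_space <- 1:6

# Probability model: one probability for each member of the sample space
probs <- rep(1 / 6, length(sample_space))

# A valid model has probabilities between 0 and 1 that sum to 1
valid <- all(probs >= 0 & probs <= 1) && isTRUE(all.equal(sum(probs), 1))

# Mean and standard deviation implied by the model
model_mean <- sum(sample_space * probs)
model_sd <- sqrt(sum((sample_space - model_mean)^2 * probs))

print(valid)
print(c(mean = model_mean, sd = model_sd))
```

The check on valid is what makes this a probability model: every probability lies between 0 and 1, and together they sum to 1.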

It's helpful to distinguish between two kinds of sample spaces that apply to random variables:

For discrete numbers, it's possible to assign a probability to each outcome.

For continuous numbers, it's possible to assign a probability to a range of outcomes. Or, by dividing the probability by the extent of the range (for a narrow range), one can assign a probability density to each outcome. We usually treat this probability density as a function of the value of the random variable: \( p(x) \).
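
The relationship between the probability of a range and the probability density can be checked numerically. The sketch below assumes a standard normal model and a narrow range around 1; both choices are illustrative.

```r
# Narrow range around x = 1 (illustrative choice)
a <- 0.95
b <- 1.05

# Probability that a standard normal value falls in the range [a, b]
prob_range <- pnorm(b) - pnorm(a)

# Probability divided by the extent of the range approximates the density
density_estimate <- prob_range / (b - a)

# Density at the midpoint of the range, for comparison
density_exact <- dnorm(1)

print(c(estimate = density_estimate, exact = density_exact))
```

The narrower the range, the closer the ratio comes to the density at that point.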

We'll often use probabilities and probability densities in a similar way.

Our Main Use for Probability

The sampling distribution

For each probability model that we study, we'll name a setting that involves a confidence interval.

Some Important Probability Models

Introduce the rxxx() operation (for example, rnorm()) for each model. For equal probabilities, just use resample(1:k, size=n). Generate random numbers from each model. Ask the students to find the mean and standard deviation of each.
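
One way this exercise might look is sketched below. The particular models (uniform, normal, binomial) and their parameters are assumptions, since the notes do not list them, and base R's sample() with replace = TRUE stands in for mosaic's resample() so the code runs without extra packages.

```r
set.seed(155)  # arbitrary seed so the run is repeatable
n <- 10000     # number of random values per model
k <- 6         # number of equally likely outcomes

draws <- list(
  equal    = sample(1:k, size = n, replace = TRUE),  # base-R counterpart of resample(1:k, size = n)
  uniform  = runif(n, min = 0, max = 1),
  normal   = rnorm(n, mean = 0, sd = 1),
  binomial = rbinom(n, size = 10, prob = 0.5)
)

# Mean and standard deviation of the values generated from each model
summary_table <- data.frame(
  mean = sapply(draws, mean),
  sd   = sapply(draws, sd)
)
print(round(summary_table, 3))
```

Each row can then be compared with the mean and standard deviation that the model itself implies.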

Discrete

qt(c(0.025, 0.975), df = 1)
## [1] -12.71  12.71
qt(c(0.025, 0.975), df = 2)
## [1] -4.303  4.303
qt(c(0.025, 0.975), df = 10000)  # the famous 1.96
## [1] -1.96  1.96
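
To see where the 1.96 comes from, the t quantiles can be set beside the corresponding normal quantile. This is a sketch; the list of degrees of freedom is an illustrative choice.

```r
# Degrees of freedom to compare (illustrative choice)
dfs <- c(1, 2, 10, 100, 10000)

# Upper 97.5% quantile of the t distribution for each df
t_upper <- qt(0.975, df = dfs)

# The same quantile for the standard normal distribution
z_upper <- qnorm(0.975)

print(data.frame(df = dfs, t_upper = t_upper))
print(z_upper)
```

As the degrees of freedom grow, the t quantile shrinks toward the normal quantile.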

Why the “Normal” Distribution is Normal

Basic Operations: P and Q
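
A minimal sketch of the two operations, assuming that “P” and “Q” refer to R's pxxx() and qxxx() functions, shown here for the normal model as pnorm() and qnorm():

```r
# P: from a value to the probability of being at or below that value
p <- pnorm(1.96, mean = 0, sd = 1)

# Q: from a probability to the value with that much probability at or below it
q <- qnorm(0.975, mean = 0, sd = 1)

# The two operations undo each other
round_trip <- qnorm(pnorm(1.5))

print(c(p = p, q = q, round_trip = round_trip))
```

The same pairing exists for other models, such as pt() and qt() for the t distribution used above.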

The D operation

ACTIVITY: Returns on investments

The stock market gives a return of, say, 5% per year on average, with a standard deviation of about 6%. Simulate the total investment return over 50 years.

prod(1 + rnorm(50, mean = 0.05, sd = 0.06))
## [1] 7.265

Have each student do their own, then congratulate the student who got the highest return.

Then show the overall distribution:

trials = do(1000) * prod(1 + rnorm(50, mean = 0.05, sd = 0.06))
densityplot(~trials)

[Figure: density plot of the 1000 simulated 50-year returns]

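This distribution has a name: lognormal. The name reflects the fact that the log of the values has a normal distribution. Here that holds approximately: the log of a product is the sum of the logs of the yearly factors, and a sum of many independent random terms is close to normal.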

densityplot(~log(trials))

[Figure: density plot of the log of the simulated returns]
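
The same simulation can be sketched in base R, with replicate() standing in for mosaic's do(). The seed and the variable names are assumptions for illustration. The sketch checks two things: the log of a product equals the sum of the logs, and the raw returns are skewed to the right while their logs are roughly symmetric.

```r
set.seed(2013)  # arbitrary seed so the run is repeatable

# 1000 simulated total returns over 50 years
returns <- replicate(1000, prod(1 + rnorm(50, mean = 0.05, sd = 0.06)))
log_returns <- log(returns)

# For one 50-year history: log of the product equals the sum of the logs
yearly <- 1 + rnorm(50, mean = 0.05, sd = 0.06)
same <- isTRUE(all.equal(log(prod(yearly)), sum(log(yearly))))

# Right skew shows up as a mean above the median
raw_summary <- c(mean = mean(returns), median = median(returns))
log_summary <- c(mean = mean(log_returns), median = median(log_returns))

print(same)
print(raw_summary)
print(log_summary)
```

For the raw returns the mean sits above the median, which is the signature of the long right tail; for the logs the two are close together.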

Random Angles

Distribution of random angles.