Analyzing proportions

Comic-of-the-day

https://xkcd.com/539/

From means to proportions

So far, the population parameter of interest was a mean/average.

For means and averages, the random variable of interest was numerical.

In this chapter, the population parameter of interest is a proportion.

For proportions, the random variable of interest is categorical.

More specifically, we are interested in the proportion of times in the population that a particular level of a categorical variable occurs.

Example

Practice Problem #6 (estimation)

Population: All $1 US bills

Variable: Measurable cocaine? (Levels: Yes/No)

Parameter: Proportion of bills with measurable cocaine.

Sample: 50 $1 bills [in the actual study, 46 had measurable cocaine]

Example

Practice Problem #5 (hypothesis testing)

Population: All humans who could have been downwind of the site of 11 previous aboveground nuclear bomb tests in 1955.

Variable: Developed cancer by the 1980s? (Levels: Yes/No)

Parameter: Probability of developing cancer by the 1980s.

Sample: 220 actors in the film The Conqueror (including John Wayne) who were downwind of … [91 developed cancer by the 1980s]

Note: 14% of people in this age group would be expected to develop cancer within this time frame.

Preliminaries - What variable?

Suppose I have a population of size $N$ and a categorical variable $Y$ (e.g. $Y$ could be genotype with levels {aa, Aa, AA}).

For proportions, we need a categorical variable with only two levels:

“Success = has level of interest”
“Failure = does not have level of interest”.

For example, with the genotype case, we could be interested in the proportion of heterozygotes in a population, so we would have

“Success = Aa”
“Failure = not Aa = {aa, AA}”
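A minimal Python sketch of this recoding, using a made-up sample of genotypes (the values in `genotypes` are hypothetical, not course data). It also shows that a proportion is simply the mean of a 0/1 success indicator, which links this chapter back to the earlier work on means.

```python
# Hypothetical genotype sample (made-up values, for illustration only)
genotypes = ["Aa", "AA", "aa", "Aa", "Aa", "AA", "aa", "Aa"]

# Recode the three-level variable into two levels:
# 1 = "Success" (Aa, heterozygote), 0 = "Failure" (aa or AA)
success = [1 if g == "Aa" else 0 for g in genotypes]

# The proportion of successes is just the mean of the 0/1 indicator
p_hat = sum(success) / len(success)
print(success)  # [1, 0, 0, 1, 1, 0, 0, 1]
print(p_hat)    # 0.5
```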

Preliminaries - What parameter?

Given a categorical variable with levels “Success” and “Failure”, the proportion of successes in the population is denoted by

\[p = \frac{\mathrm{Number \ of \ successes \ in \ population}}{\mathrm{Total \ population \ size}} = \frac{X}{N}\]

Given a sample of size $n$, the sample estimate of the proportion $p$ is

\[\hat{p} = \frac{\mathrm{Number \ of \ successes \ in \ sample}}{\mathrm{Total \ sample \ size}} = \frac{\hat{X}}{n}\]
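As a small illustration, the Python sketch below applies the $\hat{p}$ formula to the sample counts quoted in the two practice problems above. The helper name `sample_proportion` is hypothetical, introduced only for this example.

```python
# Sample counts quoted in the two practice problems
cocaine_successes, cocaine_n = 46, 50   # $1 bills with measurable cocaine
cancer_successes, cancer_n = 91, 220    # actors who developed cancer by the 1980s

def sample_proportion(successes: int, n: int) -> float:
    """Return p-hat = (number of successes in sample) / (sample size)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return successes / n

p_hat_cocaine = sample_proportion(cocaine_successes, cocaine_n)
p_hat_cancer = sample_proportion(cancer_successes, cancer_n)
print(f"{p_hat_cocaine:.3f}")  # 0.920
print(f"{p_hat_cancer:.3f}")   # 0.414
```

The Conqueror sample proportion, about 0.41, can then be compared with the 14% expected for that age group; deciding whether the gap is too large to be due to chance is the hypothesis-testing question posed by Practice Problem #5.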

Estimation

What is the standard error for a proportion (i.e. what is the measure of precision for the sample proportion)?

Definition: The standard error of a proportion is the standard deviation of the sampling distribution of the sample proportion $\hat{p}$ and is given by \[\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\]
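One way to see this formula at work is to simulate the sampling distribution. The Python sketch below assumes a population proportion $p = 0.3$ and a sample size $n = 50$ (values chosen only for illustration). It treats each observation as an independent Success/Failure draw, which is the setting where $\sqrt{p(1-p)/n}$ holds (e.g., sampling with replacement, or a population much larger than the sample). The standard deviation of the simulated $\hat{p}$ values should land close to the formula's value.

```python
import math
import random
import statistics

def simulate_p_hats(p: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` samples of size n, each observation an independent
    Success (probability p) / Failure draw, and return each sample's p-hat."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

p, n = 0.3, 50  # assumed values for illustration
p_hats = simulate_p_hats(p, n, reps=20_000)

theoretical_sd = math.sqrt(p * (1 - p) / n)   # sigma_{p-hat} from the formula
simulated_sd = statistics.pstdev(p_hats)      # SD of the simulated p-hats
print(f"formula sigma_p_hat:    {theoretical_sd:.4f}")
print(f"simulated SD of p-hats: {simulated_sd:.4f}")
```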

Definition: Because $p$ is unknown in practice, we estimate the standard error of the proportion by substituting $\hat{p}$ for $p$: \[\mathrm{SE}_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]
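For a concrete number, the Python sketch below plugs the cocaine sample from Practice Problem #6 (46 of 50 bills) into the estimated standard error. The function name `estimated_se` is hypothetical, chosen only for this example.

```python
import math

def estimated_se(successes: int, n: int) -> float:
    """Estimated standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Cocaine example: 46 of 50 bills had measurable cocaine, so p_hat = 0.92
se = estimated_se(46, 50)
print(f"{se:.4f}")  # 0.0384
```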