The purpose of this report is to highlight several methods for analyzing population proportions. Several of the response variables (unique click, unique open, and enrollment) can be analyzed as proportions if they are treated as binomial random variables.
From Wikipedia, “the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success (with probability \(p\)) or failure (with probability \(q = 1 − p\)).”
Let the proportion be \(p = Y/n\), where \(n\) is the sample size and \(Y = \sum_{i=1}^{n} I_i(\ldots)\). Here \(I_i\) is an indicator function for customer \(i\) and “…” is one of clicked, opened, or enrolled: if customer \(i\) clicked, opened, or enrolled then \(I_i(\ldots) = 1\), otherwise \(I_i(\ldots) = 0\). The sum of these indicators is \(Y\), the number of successes.
For this example we will work with a fictitious data set. Let’s pretend the data below is from a recent experiment. The experimental design was a factorial design with one treatment factor and two levels (AKA one-way split, A/B test). The method of randomization was simple random assignment (AKA completely randomized design (CRD)).
Example Data: Data Frame

| treatment | y | n | p | no |
|---|---|---|---|---|
| A | 882 | 6,789 | 12.99% | 5,907 |
| B | 597 | 5,977 | 9.99% | 5,380 |
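As a quick consistency check, the derived columns of the table can be recomputed from the raw counts. This is a minimal Python sketch, not the code that produced the report; the dictionary layout and variable names are illustrative assumptions.

```python
# Fictitious counts copied from the table above: y successes out of n trials.
data = {
    "A": {"y": 882, "n": 6789},
    "B": {"y": 597, "n": 5977},
}

for treatment, row in data.items():
    row["p"] = row["y"] / row["n"]   # sample proportion p = Y / n
    row["no"] = row["n"] - row["y"]  # number of "no" responses
    print(f"{treatment}: p = {row['p']:.2%}, no = {row['no']:,}")
```

The printed proportions and "no" counts should agree with the `p` and `no` columns of the data frame.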
We can also consider the above data set as a 2x2 contingency table of the form seen below. This formulation will be useful for analyzing the data. Analyses of this kind are known as Categorical Data Analysis.
Example Data: 2x2 Contingency Table

| Treatment | Response: Yes | Response: No |
|---|---|---|
| A | 882 | 5,907 |
| B | 597 | 5,380 |
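Using the above example, the stakeholders want to test whether treatment A is better than treatment B. That is, did treatment A result in a higher click rate, open rate, or enrollment rate? The setup looks like this: \(H_0: p_A = p_B\) versus \(H_1: p_A > p_B\), where \(p_A\) and \(p_B\) are the population proportions for treatments A and B.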
Let us also assume that we know something about treatment A, for example that historically \(p \approx 0.09\). We can then test whether \(p\) for treatment A in this experiment is greater than the benchmark. That setup looks like this: \(H_0: p_A = 0.09\) versus \(H_1: p_A > 0.09\).
Finally, a note about \(\alpha\), AKA the size of the test or the level of significance. \(\alpha\) must be chosen prior to the start of the experiment. Let’s assume that for both of the hypothesis tests above the stakeholder wanted to be extra confident that the results were not due to chance. The stakeholder decided on \(\alpha = 0.01\).
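There are several ways to carry out a hypothesis test. Two are highlighted below: comparing the test statistic with a critical value, and comparing the p-value with \(\alpha\).
An analysis of the test of treatment A against the benchmark yields several useful pieces of information. Because \(n\) is large, the results are analyzed with the asymptotic (large-\(n\)) method, which relies on the central limit theorem (CLT) to approximate the distribution of the test statistic with a normal distribution.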
Treatment vs Benchmark: Distribution Statistics

| Test Statistic | Critical Value | Result |
|---|---|---|
| 11.49 | 2.33 | Reject null hypothesis |
In the table above, the null hypothesis (\(H_0\)) is rejected because the test statistic (11.49) is greater than the critical value (2.33).
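The test statistic and critical value in the table can be reproduced with a one-sample z-test for a proportion. The sketch below is a Python cross-check using only the standard library; it assumes the statistic is the usual large-sample z computed with the null standard error, which is consistent with the reported values but is not stated explicitly in the report.

```python
from math import sqrt
from statistics import NormalDist

y, n = 882, 6789   # treatment A: successes and sample size
p0 = 0.09          # historical benchmark
alpha = 0.01       # significance level chosen by the stakeholder

p_hat = y / n
se_null = sqrt(p0 * (1 - p0) / n)           # standard error under H0
z = (p_hat - p0) / se_null                  # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha)    # one-sided critical value

reject_null = z > z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject_null}")
```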
Treatment vs Benchmark: P-Value Assessment

| Alpha | P-Value | Result |
|---|---|---|
| 0.01 | 0 | Reject null hypothesis |
In the table above, the result of the analysis is the same: reject the null hypothesis (\(H_0\)). This time, though, the result was obtained by comparing the p-value of the test with the \(\alpha\) value chosen by the stakeholder. If the p-value is less than \(\alpha\), we reject the null hypothesis. The p-value shown as 0 is a rounded value; it is extremely small but not exactly zero.
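To see why the p-value displays as 0, the upper-tail probability can be computed in two ways. This is a Python sketch under the same large-sample z-test assumption as before; the variable names are illustrative. Subtracting the normal CDF from 1 loses all precision this far into the tail, while the complementary error function keeps it.

```python
from math import erfc, sqrt
from statistics import NormalDist

y, n, p0, alpha = 882, 6789, 0.09, 0.01
z = (y / n - p0) / sqrt(p0 * (1 - p0) / n)

# Naive upper tail: the CDF is indistinguishable from 1.0 in floating point.
p_naive = 1 - NormalDist().cdf(z)

# Upper tail via the complementary error function: P(Z > z) = erfc(z / sqrt(2)) / 2.
p_value = 0.5 * erfc(z / sqrt(2))

print(f"naive p-value = {p_naive}, erfc-based p-value = {p_value:.3e}")
print("reject H0:", p_value < alpha)
```

Either way the decision is the same; the second form simply shows that the p-value is tiny rather than literally zero.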
Below are other tests for \(p\) vs a benchmark.
Additional Hypothesis Tests: Treatment vs Benchmark

| method | alternative | p.value | result | conf.low | conf.high |
|---|---|---|---|---|---|
| Exact binomial test | greater | 0 | Reject Null Hypothesis | 0.1206 | 1 |
| 1-sample proportions test without continuity correction | greater | 0 | Reject Null Hypothesis | 0.1207 | 1 |
| Exact one-sided binomial test, mid-p version | greater | 0 | Reject Null Hypothesis | 0.1206 | 1 |
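For comparison with the asymptotic result, the one-sided exact binomial p-value is the probability of observing at least 882 successes in 6,789 trials when \(p = 0.09\). The Python sketch below sums the binomial probabilities on the log scale to avoid overflow. It illustrates the idea only and is not the implementation behind the table; the helper name `log_binom_pmf` is hypothetical.

```python
from math import exp, fsum, lgamma, log

y, n, p0, alpha = 882, 6789, 0.09, 0.01

def log_binom_pmf(k, n, p):
    """Log of the binomial probability P(Y = k) for Y ~ Binomial(n, p)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1 - p)

# Exact upper tail P(Y >= y); terms too small to represent underflow to 0.0.
p_exact = fsum(exp(log_binom_pmf(k, n, p0)) for k in range(y, n + 1))

print(f"exact one-sided p-value = {p_exact:.3e}")
print("reject H0:", p_exact < alpha)
```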
In conclusion, we can claim that the response rate for treatment A is statistically greater than the benchmark of 0.09 at \(\alpha = 0.01\).
Next, we analyze a comparison of treatment A vs treatment B. This branch of analysis is called Categorical Data Analysis. Specifically, the technique below is known as a 2x2 contingency table analysis.
##
## 2-sample test for equality of proportions without continuity correction
##
## data: example_data$y out of example_data$n
## X-squared = 27.99, df = 1, p-value = 6.098e-08
## alternative hypothesis: greater
## 1 percent confidence interval:
## 0.04312952 1.00000000
## sample estimates:
## prop 1 prop 2
## 0.12991604 0.09988288
Since the p-value of the test (\(6.098 \times 10^{-8}\)) is less than \(\alpha = 0.01\), we reject the null hypothesis (\(H_0\)) in favor of the alternative hypothesis (\(H_1\)): \(p\) for treatment A is statistically greater than \(p\) for treatment B. Note that the output labels the interval as a 1 percent confidence interval, which suggests the confidence level was set to 0.01 rather than 0.99, so the printed lower bound should not be read as a 99% bound.
The power of a test is the probability of correctly rejecting the null hypothesis when it is in fact false. Khan Academy has a great resource about power (link).
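The chi-squared statistic and p-value in the output can be cross-checked by hand. Without a continuity correction, the chi-squared statistic for a 2x2 table equals the square of the pooled two-sample z statistic, and the one-sided p-value is the upper normal tail of z. The Python sketch below makes that assumption explicit; it is an independent check, not the code behind the output.

```python
from math import erfc, sqrt

y_a, n_a = 882, 6789   # treatment A
y_b, n_b = 597, 5977   # treatment B
alpha = 0.01

p_a, p_b = y_a / n_a, y_b / n_b
p_pool = (y_a + y_b) / (n_a + n_b)   # pooled proportion under H0: p_A = p_B
se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z = (p_a - p_b) / se_pool
chi_sq = z ** 2                      # matches X-squared when no correction is used
p_value = 0.5 * erfc(z / sqrt(2))    # one-sided: P(Z > z)

print(f"z = {z:.3f}, X-squared = {chi_sq:.2f}, p-value = {p_value:.3e}")
print("reject H0:", p_value < alpha)
```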
##
## difference of proportion power calculation for binomial distribution (arcsine transformation)
##
## h = 0.09436566
## n1 = 6789
## n2 = 5977
## sig.level = 0.01
## power = 0.9986228
## alternative = greater
##
## NOTE: different sample sizes
From the results above it can be seen that the power of this hypothesis test is \(\approx\) 99.9%. That is, if the true difference between treatment A and treatment B were as large as the observed effect size (\(h \approx 0.094\)), a test with these sample sizes and \(\alpha = 0.01\) would reject the null hypothesis almost every time. This supports our previous decision to reject the null hypothesis (\(H_0\)) in favor of the alternative hypothesis (\(H_1\)).
It’s also possible to visually inspect the power analysis and how power relates to sample size. Generally, the larger the sample size, the greater the power, all else being equal.
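The effect size \(h\) and the power in the output can be approximated from the arcsine transformation named in its header. The Python sketch below assumes the standard normal approximation for a one-sided test with unequal sample sizes. It is a cross-check under that assumption, and the variable names are illustrative.

```python
from math import asin, sqrt
from statistics import NormalDist

n_a, n_b = 6789, 5977
p_a, p_b = 882 / n_a, 597 / n_b
alpha = 0.01

# Effect size on the arcsine scale: h = 2*asin(sqrt(p_A)) - 2*asin(sqrt(p_B)).
h = 2 * asin(sqrt(p_a)) - 2 * asin(sqrt(p_b))

# One-sided power under the normal approximation with unequal group sizes.
z_crit = NormalDist().inv_cdf(1 - alpha)
shift = h * sqrt(n_a * n_b / (n_a + n_b))
power = 1 - NormalDist().cdf(z_crit - shift)

print(f"h = {h:.5f}, power = {power:.5f}")
```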
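The relationship between power and sample size can be sketched numerically as well. The Python example below holds the effect size fixed at the \(h\) reported in the power output and assumes equal group sizes, which is a simplification of the unequal sizes in the experiment; `power_equal_n` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

H = 0.09436566   # effect size h taken from the power output above
ALPHA = 0.01     # significance level

def power_equal_n(n_per_group, h=H, alpha=ALPHA):
    """Approximate one-sided power with n_per_group observations in each arm."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_crit - h * sqrt(n_per_group / 2))

sizes = (500, 1000, 2000, 4000, 8000)
powers = [power_equal_n(n) for n in sizes]
for n, pw in zip(sizes, powers):
    print(f"n per group = {n:>5}: power = {pw:.3f}")
```

Power rises steadily with the per-group sample size for a fixed effect size and \(\alpha\).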
We can also visually analyze the hypothesis test via confidence intervals.
Simultaneous 99% Confidence Interval

| treatment | mean | lower | upper |
|---|---|---|---|
| A | 12.99% | 11.81% | 14.28% |
| B | 9.99% | 8.87% | 11.22% |
To construct 99% simultaneous confidence intervals, the \(\alpha\) of 0.01 has to be adjusted to 0.005. The table above shows that the confidence intervals of the two treatments don’t overlap.
The plot above is based on the simultaneous CI table and lets us visually inspect how the treatments differ. Since the confidence interval for treatment A (11.81% to 14.28%) lies entirely above the confidence interval for treatment B (8.87% to 11.22%) and the two don’t overlap, we can claim that \(p\) for treatment A is \(\gt\) \(p\) for treatment B.
When the response variable is categorical and the relationship between the response and a set of covariates is of interest, we turn our attention to a different analytic technique, the generalized linear model, and in particular logistic regression.
This data was obtained from Alan Agresti’s An Introduction to Categorical Data Analysis (Agresti 2019). From the book, “… a survey that asked students in their final year of high school near Dayton, Ohio, whether they had ever used marijuana.”
Example Data: Survey About Marijuana Use

| race | gender | Marijuana use: yes | Marijuana use: no |
|---|---|---|---|
| white | female | 420 | 620 |
| white | male | 483 | 579 |
| other | female | 25 | 55 |
| other | male | 32 | 62 |
This is an example of grouped data. The data can also be ungrouped, in which case each record/row has race, gender, and a 1 or 0 for marijuana use.
In this example, both covariates are categorical, but logistic regression can also handle numeric covariates. With a single covariate \(x\), the logistic regression model has the form \(\log\left(\frac{\pi(x)}{1-\pi(x)}\right) = \alpha + \beta x\), where \(\pi(x)\) is the probability of a “yes” response.
Below is the output of an analysis of how race and gender relate to marijuana use.
##
## Call:
## glm(formula = cbind(yes, no) ~ gender + race, family = binomial(),
## data = data_marijuana)
##
## Deviance Residuals:
## 1 2 3 4
## -0.04513 0.04402 0.17321 -0.15493
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.83035 0.16854 -4.927 8.37e-07 ***
## gendermale 0.20261 0.08519 2.378 0.01739 *
## racewhite 0.44374 0.16766 2.647 0.00813 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 12.752784 on 3 degrees of freedom
## Residual deviance: 0.057982 on 1 degrees of freedom
## AIC: 30.414
##
## Number of Fisher Scoring iterations: 3
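To show what the fitting procedure does, the coefficient estimates can be approximated by maximizing the grouped binomial likelihood with Newton's method, which for this model coincides with Fisher scoring. The Python sketch below uses only the standard library. The 0/1 coding (female and other as reference levels) is an assumption inferred from the coefficient names `gendermale` and `racewhite`, and the `solve` helper is illustrative.

```python
from math import exp

# Grouped data from the table: (white, male, yes, no).
rows = [(1, 0, 420, 620), (1, 1, 483, 579), (0, 0, 25, 55), (0, 1, 32, 62)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

# Design matrix columns: intercept, gendermale, racewhite.
X = [[1.0, float(male), float(white)] for white, male, _, _ in rows]
yes = [r[2] for r in rows]
total = [r[2] + r[3] for r in rows]

beta = [0.0, 0.0, 0.0]
for _ in range(25):
    eta = [sum(b * x for b, x in zip(beta, xi)) for xi in X]
    pi = [1 / (1 + exp(-e)) for e in eta]
    # Score vector X^T (y - n * pi) and information matrix X^T W X.
    score = [sum(X[i][j] * (yes[i] - total[i] * pi[i]) for i in range(4))
             for j in range(3)]
    info = [[sum(X[i][j] * X[i][k] * total[i] * pi[i] * (1 - pi[i])
                 for i in range(4)) for k in range(3)] for j in range(3)]
    step = solve(info, score)
    beta = [b + s for b, s in zip(beta, step)]
    if max(abs(s) for s in step) < 1e-10:
        break

print("intercept, gendermale, racewhite:", [round(b, 5) for b in beta])
```

The printed estimates should agree with the `Estimate` column of the summary above.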
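The model predicts that, on average, the odds of marijuana use for males are \(e^{0.20261} \approx 1.22\) times the odds for females of the same race, and the odds for white students are \(e^{0.44374} \approx 1.56\) times the odds for students of other races of the same gender.
Next is a test of whether gender or race has no effect on marijuana use.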
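The coefficients in the summary above are on the log-odds scale, so exponentiating them gives odds ratios, and the inverse logit gives predicted probabilities. The Python sketch below copies the printed estimates; the function name `predicted_probability` is a hypothetical helper.

```python
from math import exp

# Estimates copied from the model summary above (log-odds scale).
coef = {"(Intercept)": -0.83035, "gendermale": 0.20261, "racewhite": 0.44374}

# Odds ratios: multiplicative change in the odds of marijuana use.
odds_ratio_male = exp(coef["gendermale"])    # male vs female, same race
odds_ratio_white = exp(coef["racewhite"])    # white vs other, same gender

def predicted_probability(male, white):
    """Inverse logit of the linear predictor for one race-by-gender group."""
    eta = (coef["(Intercept)"]
           + coef["gendermale"] * male
           + coef["racewhite"] * white)
    return 1 / (1 + exp(-eta))

print(f"odds ratio (male vs female): {odds_ratio_male:.2f}")
print(f"odds ratio (white vs other): {odds_ratio_white:.2f}")
for white in (1, 0):
    for male in (0, 1):
        label = f"{'white' if white else 'other'} {'male' if male else 'female'}"
        print(f"{label}: predicted P(yes) = {predicted_probability(male, white):.3f}")
```

The predicted probabilities can be compared with the observed proportions in the survey table, for example 420 / 1,040 for white females.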
## Analysis of Deviance Table (Type II tests)
##
## Response: cbind(yes, no)
## LR Chisq Df Pr(>Chisq)
## gender 5.6662 1 0.017295 *
## race 7.2770 1 0.006984 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
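Both p-values are small (0.017 for gender and 0.007 for race), so for each covariate the null hypothesis of no effect is rejected at the conventional 0.05 level, although gender would not be significant at the stricter 0.01 level. The conclusion is that gender and race are both related to marijuana use.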
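The p-values in the analysis of deviance table follow from comparing each likelihood-ratio statistic with a chi-squared distribution with 1 degree of freedom. For 1 degree of freedom the upper tail has a closed form in terms of the complementary error function, which the Python sketch below uses. The statistics are copied from the table, and `chi2_sf_df1` is a hypothetical helper name.

```python
from math import erfc, sqrt

def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-squared variable with 1 df."""
    return erfc(sqrt(x / 2))

# Likelihood-ratio statistics copied from the analysis of deviance table.
lr_stats = {"gender": 5.6662, "race": 7.2770}

p_values = {name: chi2_sf_df1(stat) for name, stat in lr_stats.items()}
for name, p in p_values.items():
    print(f"{name}: LR Chisq = {lr_stats[name]}, p-value = {p:.6f}")
```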