Reporting and Analysis Capabilities

The purpose of this report is to highlight the capabilities available for analyzing population proportions. Several of the response variables (unique click, unique open, and enrollment) can be analyzed as proportions if they are treated as binomial random variables.

Binomial Random Variables

From Wikipedia, “the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success (with probability \(p\)) or failure (with probability \(q = 1 − p\)).”

Let the proportion \(p = Y/n\), where \(n\) is the sample size and \(Y = \sum_{i=1}^{n} I_i(\ldots)\). \(I_i(\ldots)\) is an indicator function and “…” stands for one of clicked, opened, or enrolled: if customer \(i\) clicked, opened, or enrolled then \(I_i(\ldots) = 1\), otherwise \(I_i(\ldots) = 0\). The sum of these indicators is \(Y\).
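
For concreteness, the indicator-and-sum construction looks like this in R (a minimal sketch with a made-up indicator vector; the object names are hypothetical):

clicked <- c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0)   # hypothetical 0/1 indicators, one per customer

y <- sum(clicked)       # Y, the number of successes
n <- length(clicked)    # sample size
p <- y / n              # sample proportion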

Example Data

For this example we will work with a fictitious data set. Let’s pretend the data below are from a recent experiment. The experimental design had one treatment factor with two levels (AKA a one-way split, or A/B test). The method of randomization was simple random sampling (AKA a completely randomized design (CRD)).


Example Data
Data Frame
treatment      y      n        p       no
A            882  6,789   12.99%    5,907
B            597  5,977    9.99%    5,380

(y = number of responses, n = sample size, p = y/n, no = n - y)


We can also consider the above data set as a 2x2 contingency table of the form seen below. This formulation will be useful for analyzing the data. This style of analysis is known as categorical data analysis.


Example Data
2x2 Contingency Table
                 Response
Treatment      Yes       No
A              882    5,907
B              597    5,380
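
In R, the data frame and the contingency-table view can be set up along the following lines (a sketch; the object name example_data matches the output shown later, while the column names and values are taken from the tables above):

example_data <- data.frame(
  treatment = c("A", "B"),
  y = c(882, 597),        # number of responses (e.g., unique clicks)
  n = c(6789, 5977)       # sample sizes
)
example_data$p  <- example_data$y / example_data$n    # 12.99%, 9.99%
example_data$no <- example_data$n - example_data$y    # non-responses: 5907, 5380

# 2x2 contingency table: rows are treatments, columns are the response
cont_table <- matrix(c(882, 5907,
                       597, 5380),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(Treatment = c("A", "B"),
                                     Response = c("Yes", "No")))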

Hypothesis Testing Framework

Using the above example, the stakeholders want to test whether treatment A is better than treatment B; that is, did treatment A result in a higher click, open, or enrollment rate than treatment B? The setup looks like this:

  • \(H_0: p_A \le p_B\)
  • \(H_1: p_A \gt p_B\)

Let us also assume that we know something about treatment A: for example, that historically \(p \approx 0.09\). We can then test whether \(p\) for treatment A in this experiment is greater than that benchmark. That setup looks like this:

  • \(H_0: p_A \le 0.09\)
  • \(H_1: p_A \gt 0.09\)

Finally, a note about \(\alpha\), AKA the size of the test or the level of significance. \(\alpha\) must be chosen prior to the start of the experiment. Let’s assume that for both of the hypothesis tests above the stakeholder wanted to be extra confident that the results were not due to chance, and so decided on \(\alpha = 0.01\).

Treatment A vs Benchmark

There are several ways to analyze a hypothesis test. Two such ways are highlighted below:

  • a comparison of the test statistic vs the critical value
  • comparing the p-value of the test to \(\alpha\)

An analysis of this hypothesis test yields many useful bits of information. For example, since \(n\) is “large”, the results can be analyzed with the asymptotic (large-\(n\)) method, i.e., the normal approximation justified by the central limit theorem (CLT).


Treatment vs Benchmark
Distribution Statistics
Test Statistic    Critical Value    Result
11.49             2.33              Reject null hypothesis


In the table above the null hypothesis (\(H_0\)) is rejected because the test statistic is greater than the critical value.
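
The numbers in the table can be reproduced by hand with the usual large-sample one-proportion \(z\) test (a sketch; the 0.09 benchmark and \(\alpha = 0.01\) come from the setup above):

p_hat <- 882 / 6789                                   # observed proportion for treatment A
p_0   <- 0.09                                         # benchmark
n     <- 6789

z      <- (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)   # test statistic, about 11.49
z_crit <- qnorm(1 - 0.01)                             # critical value, about 2.33
p_val  <- pnorm(z, lower.tail = FALSE)                # p-value, effectively 0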


Treatment vs Benchmark
P-Value Assessment
Alpha    P-Value    Result
0.01     0          Reject null hypothesis


In the table above it can be seen that the result of the analysis is the same: reject the null hypothesis (\(H_0\)). This time, though, the result was obtained by comparing the p-value of the test to the \(\alpha\) value chosen by the stakeholder. If the p-value of the test is less than \(\alpha\), we can reject the null hypothesis.

Below are other tests for \(p\) vs a benchmark; sketches of the corresponding R calls follow the table.


Additional Hypothesis Tests
Treatment vs Benchmark
method                                                      alternative   p.value   result                   conf.low   conf.high
Exact binomial test                                         greater       0         Reject null hypothesis   0.1206     1
1-sample proportions test without continuity correction     greater       0         Reject null hypothesis   0.1207     1
Exact one-sided binomial test, mid-p version                greater       0         Reject null hypothesis   0.1206     1
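
Sketches of the R calls behind these tests are below; the argument values are assumptions, and the mid-p version is not in base R (a package such as exactci would typically be used):

# exact binomial test
binom.test(x = 882, n = 6789, p = 0.09,
           alternative = "greater", conf.level = 0.99)

# 1-sample proportions test without continuity correction
prop.test(x = 882, n = 6789, p = 0.09,
          alternative = "greater", correct = FALSE, conf.level = 0.99)

# mid-p exact test, e.g. via the exactci package (assumed)
# exactci::binom.exact(x = 882, n = 6789, p = 0.09,
#                      alternative = "greater", midp = TRUE, conf.level = 0.99)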


In conclusion, we can claim that the response from treatment A is greater than the benchmark.

Treatment A vs Treatment B

2x2 Contingency Table

Next, we analyze a comparison of treatment A vs treatment B. This branch of analysis is called Categorical Data Analysis. Specifically, the technique below is known as a 2x2 contingency table analysis.

## 
##  2-sample test for equality of proportions without continuity correction
## 
## data:  example_data$y out of example_data$n
## X-squared = 27.99, df = 1, p-value = 6.098e-08
## alternative hypothesis: greater
## 1 percent confidence interval:
##  0.04312952 1.00000000
## sample estimates:
##     prop 1     prop 2 
## 0.12991604 0.09988288
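
Output of this form comes from prop.test() applied to the two success counts and sample sizes; a call of roughly this shape was used (the exact arguments, including the confidence level, are assumptions):

# two-sample test of H1: p_A > p_B, without continuity correction
prop.test(x = example_data$y, n = example_data$n,
          alternative = "greater", correct = FALSE)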

Since the p-value of the test is \(\approx\) 0, which is less than \(\alpha\), the results above show that \(p\) for treatment A is statistically greater than \(p\) for treatment B. Thus, we reject the null hypothesis (\(H_0\)) in favor of the alternative hypothesis (\(H_1\)).

Power Analysis

The power of a test is the probability of rejecting the null hypothesis given that the null hypothesis is false, i.e., the probability of correctly detecting a real effect. Khan Academy has a great resource about power (link).

## 
##      difference of proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.09436566
##              n1 = 6789
##              n2 = 5977
##       sig.level = 0.01
##           power = 0.9986228
##     alternative = greater
## 
## NOTE: different sample sizes

From the results above it can be seen that the power of this hypothesis test is \(\approx\) 100%. That is, the probability of detecting that treatment A is \(\gt\) treatment B, given that treatment A actually is \(\gt\) treatment B, is practically 100%. We can be confident in our previous decision to reject the null hypothesis (\(H_0\)) in favor of the alternative hypothesis (\(H_1\)).
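
The output above matches what the pwr package reports for a two-proportion test with unequal sample sizes; a sketch of the call (the use of pwr::ES.h and pwr::pwr.2p2n.test is an assumption based on the output header):

library(pwr)

h <- ES.h(p1 = 882 / 6789, p2 = 597 / 5977)   # arcsine-transformed effect size, about 0.094
pwr.2p2n.test(h = h, n1 = 6789, n2 = 5977,
              sig.level = 0.01, alternative = "greater")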

It’s also possible to visually inspect the power analysis and how power relates to sample size. Generally, the larger the sample size, the greater the power will be, all else equal.
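
One way to see that relationship is to compute power over a grid of sample sizes (a sketch that holds the observed effect size and \(\alpha\) fixed and, for simplicity, assumes equal group sizes):

library(pwr)

h <- ES.h(p1 = 882 / 6789, p2 = 597 / 5977)
n_grid <- seq(500, 8000, by = 500)
power_curve <- sapply(n_grid, function(n)
  pwr.2p.test(h = h, n = n, sig.level = 0.01, alternative = "greater")$power)

plot(n_grid, power_curve, type = "b",
     xlab = "sample size per group", ylab = "power")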



Confidence Intervals

We can also visually analyze the hypothesis test via confidence intervals.


Simultaneous 99% Confidence Interval
treatment    mean      lower     upper
A            12.99%    11.81%    14.28%
B             9.99%     8.87%    11.22%


To construct simultaneous 99% confidence intervals, the \(\alpha\) of 0.01 has to be split across the two intervals (a Bonferroni adjustment), so each individual interval uses 0.005. The table above shows that the confidence intervals for the two treatments don’t overlap.
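
One way to produce such intervals (a sketch; the original interval method is not stated, so exact Clopper-Pearson intervals from binom.test are assumed here):

# Bonferroni adjustment: two intervals, so each is built at level 1 - 0.01/2 = 0.995
conf_level <- 1 - 0.01 / 2

ci_A <- binom.test(x = 882, n = 6789, conf.level = conf_level)$conf.int
ci_B <- binom.test(x = 597, n = 5977, conf.level = conf_level)$conf.int
rbind(A = ci_A, B = ci_B)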

A plot of the simultaneous CI table allows us to visually inspect how the treatments differ. Since the confidence interval for treatment A lies entirely above (to the right of) the confidence interval for treatment B and the two intervals don’t overlap, we can claim that treatment A is \(\gt\) treatment B.

Analysis of Covariates

When the response variable is categorical and the relationship between the response and a set of covariates is of interest, we turn our attention to a different analytic technique, the generalized linear model:

Example Data

This data was obtained from Alan Agresti’s An Introduction to Categorical Data Analysis (Agresti 2019). From the book, “… a survey that asked students in their final year of high school near Dayton, Ohio, whether they had ever used marijuana.”


Example Data
Survey About Marijuana Use
                      Marijuana Use
race      gender      yes       no
white     female      420      620
white     male        483      579
other     female       25       55
other     male         32       62


This is an example of grouped data. The data could also be ungrouped, in which case each record/row would contain race, gender, and a 1 or 0 for marijuana use.
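
The grouped data can be entered directly as a data frame (a sketch; the object name data_marijuana and the yes/no column names match the model call shown below, the rest is an assumption):

data_marijuana <- data.frame(
  race   = c("white", "white", "other", "other"),
  gender = c("female", "male", "female", "male"),
  yes    = c(420, 483, 25, 32),    # count of students who had used marijuana
  no     = c(620, 579, 55, 62)     # count of students who had not
)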

Logistic Regression

In this example, both of the covariates are categorical; however, logistic regression can handle numeric covariates as well. With a single covariate the logistic regression model has the form \(\log\!\left(\frac{\pi(x)}{1-\pi(x)}\right) = \alpha + \beta x\); with several covariates the right-hand side extends to \(\alpha + \beta_1 x_1 + \dots + \beta_p x_p\).

Below is the output of an analysis of how race and gender relate to marijuana use.

## 
## Call:
## glm(formula = cbind(yes, no) ~ gender + race, family = binomial(), 
##     data = data_marijuana)
## 
## Deviance Residuals: 
##        1         2         3         4  
## -0.04513   0.04402   0.17321  -0.15493  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.83035    0.16854  -4.927 8.37e-07 ***
## gendermale   0.20261    0.08519   2.378  0.01739 *  
## racewhite    0.44374    0.16766   2.647  0.00813 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 12.752784  on 3  degrees of freedom
## Residual deviance:  0.057982  on 1  degrees of freedom
## AIC: 30.414
## 
## Number of Fisher Scoring iterations: 3

The model predicts that, on average:

  • the estimated odds that a male had used marijuana was 1.22 times the estimated odds that a female had used marijuana, holding race constant
  • the estimated odds that a person of white race had used marijuana was 1.56 times the estimated odds that a person of non-white race had used marijuana, holding gender constant
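
These odds ratios are simply the exponentiated model coefficients; a quick check (a sketch, assuming the fitted model object is called fit_marijuana):

fit_marijuana <- glm(cbind(yes, no) ~ gender + race,
                     family = binomial(), data = data_marijuana)
exp(coef(fit_marijuana))    # gendermale: about 1.22, racewhite: about 1.56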

Next is an analysis testing, for each of gender and race, the null hypothesis that it has no effect on marijuana use.

## Analysis of Deviance Table (Type II tests)
## 
## Response: cbind(yes, no)
##        LR Chisq Df Pr(>Chisq)   
## gender   5.6662  1   0.017295 * 
## race     7.2770  1   0.006984 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
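
A table of this form is produced by a Type II likelihood-ratio analysis of deviance, typically via car::Anova() applied to the fitted model (a sketch, assuming the same fit_marijuana object as above):

library(car)
Anova(fit_marijuana)    # Type II LR chi-square tests for gender and race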

The p-values for both gender (0.017) and race (0.007) are small, so the conclusion is that both gender and race are related to marijuana use.

Sources

Agresti, Alan. 2019. An Introduction to Categorical Data Analysis. 3rd ed. Hoboken, NJ: John Wiley & Sons, Inc.