Problem 1: NBC Pilot Survey

Part A. Consider the shows “Living with Ed” and “My Name is Earl.” Who makes people happier: Ed or Earl? Construct a filtered data set containing only viewer responses where Show == “Living with Ed” or Show == “My Name is Earl”. Then construct a 95% confidence interval for the difference in mean viewer response to the Q1_Happy question for these two shows. Is there evidence that one show consistently produces a higher mean Q1_Happy response among viewers?
Question:

Can we find statistically significant evidence that one show makes viewers happier on average?

Approach:

We filtered the data to include only responses for “Living with Ed” and “My Name is Earl.” We used the Q1_Happy variable, which reflects viewer happiness. Using bootstrapping (10,000 resamples) via the do() function in R, we approximated the sampling distribution of the difference in means (Ed – Earl). A 95% confidence interval was then generated from the bootstrap distribution using the percentile method.
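
Below is a minimal R sketch of this workflow with the mosaic package. The variable names Show and Q1_Happy come from the problem statement; the survey data frame name (nbc) is an assumption.

```r
library(mosaic)

# Keep only the two shows of interest ("nbc" is an assumed data frame name)
ed_earl <- subset(nbc, Show %in% c("Living with Ed", "My Name is Earl"))

# Observed group means and observed difference in means
mean(Q1_Happy ~ Show, data = ed_earl)
diffmean(Q1_Happy ~ Show, data = ed_earl)

# Bootstrap: resample viewers 10,000 times and recompute the difference in means.
# Note: diffmean() returns (second factor level) minus (first); with the default
# alphabetical ordering this is Earl minus Ed, so flip the sign for Ed - Earl.
boot_happy <- do(10000) * diffmean(Q1_Happy ~ Show, data = resample(ed_earl))

# 95% percentile confidence interval and histogram of the bootstrap distribution
confint(boot_happy, level = 0.95)
gf_histogram(~ diffmean, data = boot_happy)
```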

Results:
Bootstrap 95% CI for Difference in Mean Q1_Happy Scores (Ed - Earl)

| Comparison | Lower Bound (95% CI) | Upper Bound (95% CI) | Confidence Level | Method | Mean Difference Estimate |
|---|---|---|---|---|---|
| diff_in_means_Ed_vs_Earl | -0.098 | 0.396 | 0.95 | percentile | 0.149 |

Figure: Histogram showing the bootstrap distribution of the difference in mean Q1_Happy scores across 10,000 samples. Red dashed lines mark the 95% percentile confidence interval.

Conclusion:

Although “Living with Ed” had a slightly higher observed mean happiness score, the 95% confidence interval for the difference includes zero. There is therefore no statistically significant evidence that viewers found more joy or happiness in one show than the other.

Part B. Consider the shows “The Biggest Loser” and “The Apprentice: Los Angeles.” Which reality/contest show made people feel more annoyed? Construct a filtered data set containing only viewer responses where Show == “The Biggest Loser” or Show == “The Apprentice: Los Angeles”. Then construct a 95% confidence interval for the difference in mean viewer response to the Q1_Annoyed question for these two shows. Is there evidence that one show consistently produces a higher mean Q1_Annoyed response among viewers?
Question:

Is there statistically significant evidence that one show, “The Biggest Loser” or “The Apprentice: Los Angeles,” made viewers feel more annoyed on average?

Approach:

We filtered the data set to include only the two shows and used the do() and resample() functions from the mosaic package in R to generate 10,000 bootstrapped samples. For each sample, we computed the difference in mean Q1_Annoyed scores. We then constructed a 95% percentile confidence interval from the bootstrap distribution. Method: Bootstrap percentile confidence interval.
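
The code mirrors Part A; only the shows and the outcome variable change. A compact sketch under the same assumed data frame name (nbc):

```r
library(mosaic)

# Keep only the two reality/contest shows
loser_appr <- subset(nbc, Show %in% c("The Biggest Loser", "The Apprentice: Los Angeles"))

# Bootstrap the difference in mean Q1_Annoyed with 10,000 resamples.
# With alphabetical factor ordering, diffmean() here is Loser minus Apprentice,
# matching the direction reported in the table below.
boot_annoyed <- do(10000) * diffmean(Q1_Annoyed ~ Show, data = resample(loser_appr))
confint(boot_annoyed, level = 0.95)
```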

Results:
Bootstrap 95% CI for Difference in Mean Q1_Annoyed Scores (Loser - Apprentice)

| Comparison | Lower Bound (95% CI) | Upper Bound (95% CI) | Confidence Level | Method | Mean Difference Estimate |
|---|---|---|---|---|---|
| diff_in_means_Loser_vs_Apprentice | -0.526 | -0.020 | 0.95 | percentile | -0.271 |

Figure: Histogram showing the bootstrap distribution of the difference in mean Q1_Annoyed scores across 10,000 samples. Red dashed lines mark the 95% percentile confidence interval.

Conclusion:

“The Apprentice: Los Angeles” had a higher average annoyance score than “The Biggest Loser”: the 95% confidence interval for the difference (Loser - Apprentice) lies entirely below zero, indicating statistically significant evidence that “The Apprentice: Los Angeles” annoyed viewers more.

Part C. Consider the show “Dancing with the Stars.” This show has a straightforward premise: it is a dancing competition between couples, with each couple consisting of a celebrity paired with a professional dancer. Per Wikipedia: “Each couple performs predetermined dances and competes against the others for judges’ points and audience votes.”
Question:

What proportion of American TV watchers found Dancing with the Stars confusing?

Approach:

We filtered the dataset to include only the responses for Dancing with the Stars and created a new binary variable to mark viewers who responded with 4 or 5 on Q2_Confusing. We used the do() and resample() functions from the mosaic package to generate 10,000 bootstrap samples and calculated the proportion of “confused” responses in each sample. Method: Bootstrap percentile confidence interval.
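
A minimal sketch of this proportion bootstrap, again assuming the data frame is named nbc (Q2_Confusing is the variable named in the problem):

```r
library(mosaic)

# Responses for Dancing with the Stars only
dwts <- subset(nbc, Show == "Dancing with the Stars")

# Binary indicator: 1 if the viewer rated Q2_Confusing a 4 or 5, 0 otherwise
dwts$Confused <- ifelse(dwts$Q2_Confusing >= 4, 1, 0)

# Observed proportion of "confused" viewers (the mean of a 0/1 variable is a proportion)
mean(~ Confused, data = dwts)

# Bootstrap the proportion with 10,000 resamples and form a 95% percentile interval
boot_confused <- do(10000) * mean(~ Confused, data = resample(dwts))
confint(boot_confused, level = 0.95)
```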

Results:
Bootstrap 95% CI for Proportion of Viewers Who Found DWTS Confusing

| Statistic | Lower Bound (95% CI) | Upper Bound (95% CI) | Confidence Level | Method | Proportion Estimate |
|---|---|---|---|---|---|
| prop_confused_dwts | 0.039 | 0.116 | 0.95 | percentile | 0.077 |

Conclusion:

Based on 10,000 bootstrap simulations, we estimate that approximately 7.7% of viewers found Dancing with the Stars confusing. The 95% confidence interval ranges from 3.9% to 11.6%, meaning we are fairly confident that the true population proportion lies within this range.

Problem 2: EBay. In this problem, you’ll analyze data from an experiment run by EBay in order to assess whether the company’s paid advertising on Google’s search platform was improving EBay’s revenue.

Question:

Does paid advertising on Google create additional revenue for EBay? Specifically, is there a statistically significant difference in the revenue ratio (revenue after / revenue before) between treatment group DMAs (where advertising was paused) and control group DMAs (where advertising continued)?

Approach:

We calculated the revenue ratio for each DMA, defined as rev_after / rev_before. We then computed the difference in mean revenue ratio between the treatment group and the control group (treatment minus control). To estimate the uncertainty, we used a bootstrap simulation with 10,000 resamples to generate a 95% confidence interval for this difference. Method: Bootstrap percentile confidence interval using the do() and resample() functions from the mosaic package in R.
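
A minimal sketch under assumed names: the data frame is called ebay, rev_before and rev_after are the revenue columns described above, and adwords_pause is a hypothetical 0/1 treatment indicator (1 = DMA where ads were paused, 0 = control).

```r
library(mosaic)

# Revenue ratio for each DMA ("ebay" and "adwords_pause" are assumed names)
ebay$rev_ratio <- ebay$rev_after / ebay$rev_before

# Observed difference in mean revenue ratio (treatment minus control)
diffmean(rev_ratio ~ factor(adwords_pause), data = ebay)

# Bootstrap: resample DMAs 10,000 times and recompute the difference
boot_ebay <- do(10000) * diffmean(rev_ratio ~ factor(adwords_pause), data = resample(ebay))

# 95% percentile confidence interval
confint(boot_ebay, level = 0.95)
```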

Results:
Bootstrap 95% CI for Difference in Mean Revenue Ratio (Treatment - Control)

| Comparison | Lower Bound (95% CI) | Upper Bound (95% CI) | Confidence Level | Method | Mean Difference Estimate |
|---|---|---|---|---|---|
| diffmean | -0.091742 | -0.013486 | 0.95 | percentile | -0.052532 |

Figure: Histogram showing the bootstrap distribution of the difference in mean revenue ratio between the control and treatment DMAs. Red dashed lines indicate the 95% confidence interval.

Conclusion:

The mean revenue ratio was higher in the control group (where ads continued) than in the treatment group (where ads were paused). The 95% bootstrap confidence interval for the difference in means is approximately [-0.0917, -0.0135] and does not include zero. This provides statistically significant evidence that turning off paid search ads led to a decrease in revenue. Therefore, EBay’s paid advertising on Google appears to generate additional revenue.

Problem 3: Iron Bank. The Securities and Exchange Commission (SEC) is investigating the Iron Bank, where a cluster of employees has recently been identified in various suspicious patterns of securities trading that violate federal “insider trading” laws.

Question:

Are the observed data (70 flagged trades out of 2021) consistent with the SEC’s null hypothesis that trades are flagged at the baseline rate of 2.4%? Or does the evidence suggest a significantly higher flagging rate?

The null hypothesis

Let \(X \sim \text{Binomial}(n = 2021, \, p = 0.024)\), where 2.4% is the baseline flag rate.

  • \(H_0\): \(p = 0.024\) (Iron Bank trades are flagged at the normal rate)
  • \(H_A\): \(p > 0.024\) (Iron Bank trades are flagged more often)

This is a one-sided test.

The test statistic

The observed number of flagged trades is:

\[ T_{\text{obs}} = 70 \]

We compare this to values simulated under \(H_0\) to compute the p-value.
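
The simulation can be done directly with base R's binomial sampler; a minimal sketch (the 100,000 simulations and the seed are arbitrary choices):

```r
set.seed(1)

# Simulate flagged-trade counts under H0: X ~ Binomial(n = 2021, p = 0.024)
sim_flags <- rbinom(100000, size = 2021, prob = 0.024)

# One-sided p-value: share of simulations with 70 or more flagged trades
p_value <- mean(sim_flags >= 70)
p_value

# Null distribution with the observed count marked
hist(sim_flags, breaks = 30,
     main = "Flagged trades under the null hypothesis", xlab = "Number of flagged trades")
abline(v = 70, col = "red", lwd = 2)
```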

The plot assuming that the null hypothesis is true

The p-value is: 0.002040

This means that only 0.204% of the simulations produced 70 or more flagged trades under the null.

Conclusion

Since the p-value is extremely small, we reject the null hypothesis. There is very strong evidence that Iron Bank trades are being flagged at a higher rate than the SEC’s baseline of 2.4%. The data support the conclusion that this pattern is unlikely to be due to chance alone; it is highly likely that something irregular is happening with Iron Bank’s trades.

Problem 4: Milk demand, revisited. Use bootstrapping to quantify your uncertainty regarding the price elasticity of demand for milk based on this data.

Question:

How uncertain is our estimate of the price elasticity of demand for milk based on the sample data? Use bootstrapping to quantify the uncertainty and report a 95% confidence interval.

Approach:

We aim to estimate the price elasticity of demand for milk, defined as the percentage change in quantity demanded resulting from a 1% change in price.

Since both price and sales are positive and skewed, we take logarithms to linearize the relationship. Specifically, we model:

\[ \log(d) = \log(A) + \beta \cdot \log(p) \]

Here:

  • \(d\) is the quantity demanded (sales)
  • \(p\) is the price
  • \(\beta\) is the price elasticity of demand

To quantify uncertainty around \(\beta\), we use bootstrap resampling (a minimal sketch follows this list):

  • Resample the dataset 10,000 times with replacement
  • Fit a log-log linear regression to each resample
  • Extract the \(\beta\) coefficient (the elasticity)
  • Construct the bootstrap distribution and derive a 95% percentile confidence interval
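
A minimal sketch of this procedure, assuming the data frame is named milk with columns sales and price (both names are assumptions):

```r
library(mosaic)

# Bootstrap the log-log regression: each pass refits the model to a resampled data set
boot_elasticity <- do(10000) * lm(log(sales) ~ log(price), data = resample(milk))

# 95% percentile confidence intervals for the fitted quantities;
# the row for the log(price) coefficient is the price elasticity of demand
confint(boot_elasticity, level = 0.95)
```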

Results:

Figure: Histogram of 10,000 bootstrapped estimates for the price elasticity of milk demand. The solid blue line shows the average estimated elasticity, and the dashed red lines show the 95% confidence interval, which ranges from -1.77 to -1.46. This plot illustrates the uncertainty in estimating the elasticity based on our sample.

Conclusion:

Since the entire 95% confidence interval (-1.77 to -1.46) lies below -1, we conclude that demand for milk is elastic, meaning consumers are relatively sensitive to price changes.

Problem 5:

Suppose that \(X_1, \ldots, X_N \sim \text{Bernoulli}(p)\) and that \(Y_1, \ldots, Y_M \sim \text{Bernoulli}(q)\) (all independent). We will consider \(\hat{p} = \bar{X}_N\) and \(\hat{q} = \bar{Y}_M\) as estimators of \(p\) and \(q\), respectively.


i.

Show that \(E[\hat{p} - \hat{q}] = p - q\), the true difference in success probabilities.

Let

\[ \hat{p} = \frac{X_1 + \cdots + X_N}{N}, \quad \hat{q} = \frac{Y_1 + \cdots + Y_M}{M} \]

Then we compute the expectation:

\[ E[\hat{p} - \hat{q}] = E\left[\frac{X_1 + \cdots + X_N}{N} - \frac{Y_1 + \cdots + Y_M}{M} \right] \]

By linearity of expectation, pulling the constants \(1/N\) and \(1/M\) outside:

\[ = \frac{1}{N} \left(E[X_1] + \cdots + E[X_N] \right) - \frac{1}{M} \left(E[Y_1] + \cdots + E[Y_M] \right) \]

Using \(E[X_i] = p\) and \(E[Y_j] = q\), we simplify:

\[ E[\hat{p} - \hat{q}] = \frac{Np}{N} - \frac{Mq}{M} = p - q \]


ii.

Compute the standard error of \(\hat{p}\), the sampling distribution’s standard deviation.

We use:

\[ SE[\hat{p}] = \sqrt{\text{Var}[\hat{p}]} = \sqrt{\text{Var} \left( \frac{X_1 + \cdots + X_N}{N} \right)} \]

Since the \(X_i\) are independent:

\[ \text{Var}[\hat{p}] = \frac{1}{N^2} \sum_{i=1}^{N} \text{Var}(X_i) = \frac{1}{N^2} \cdot N \cdot p(1 - p) = \frac{p(1 - p)}{N} \]

So:

\[ SE[\hat{p}] = \sqrt{ \frac{p(1 - p)}{N} } \]


iii.

Compute the standard error of \(\hat{\Delta} = \hat{p} - \hat{q}\) as an estimator of the true difference \(\Delta = p - q\).

We use:

\[ \text{Var}[\hat{p} - \hat{q}] = \text{Var}[\hat{p}] + \text{Var}[\hat{q}] \quad \text{(independence)} \]

So:

\[ SE[\hat{\Delta}] = \sqrt{ \text{Var}[\hat{p}] + \text{Var}[\hat{q}] } = \sqrt{ \frac{p(1 - p)}{N} + \frac{q(1 - q)}{M} } \]
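
As a quick numerical check of this formula, a short simulation (with assumed example values of \(p\), \(q\), \(N\), and \(M\)) compares the empirical standard deviation of \(\hat{p} - \hat{q}\) with the analytic standard error:

```r
set.seed(1)

# Assumed example values, purely for illustration
p <- 0.3; q <- 0.5; N <- 200; M <- 150

# Many simulated realizations of p-hat minus q-hat
delta_hat <- replicate(100000, mean(rbinom(N, 1, p)) - mean(rbinom(M, 1, q)))

# Empirical SD vs. the analytic standard error sqrt(p(1-p)/N + q(1-q)/M)
sd(delta_hat)
sqrt(p * (1 - p) / N + q * (1 - q) / M)
```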


Part B

Suppose now that \(E[X_i] = \mu_X\), \(\text{Var}[X_i] = \sigma_X^2\) and \(E[Y_j] = \mu_Y\), \(\text{Var}[Y_j] = \sigma_Y^2\). We are interested in:

\[ \Delta = \mu_X - \mu_Y, \quad \hat{\Delta} = \bar{X}_N - \bar{Y}_M \]


Expected Value:

\[ E[\hat{\Delta}] = E[\bar{X}_N - \bar{Y}_M] = \mu_X - \mu_Y \]


Standard Error:

By properties of variance:

\[ \text{Var}[\hat{\Delta}] = \text{Var}[\bar{X}_N] + \text{Var}[\bar{Y}_M] = \frac{\sigma_X^2}{N} + \frac{\sigma_Y^2}{M} \]

So:

\[ SE[\hat{\Delta}] = \sqrt{ \frac{\sigma_X^2}{N} + \frac{\sigma_Y^2}{M} } \]