## Packages used across the course. APPLIED code (Weeks 1 & 3) needs only the first four;
## the rest are for the Week 4 TEMPLATE chunks (not run on render).
# install.packages(c('tidyverse','lmtest','effectsize','MASS',
# 'PSweight','boot','magrittr','lubridate'))
library(MASS) # stepAIC() (load BEFORE tidyverse so dplyr::select wins)
library(tidyverse) # dplyr/ggplot2/readr
library(lmtest) # bptest()
library(effectsize) # eta_squared()
options(scipen = 99) # readable p-values, no scientific notationMid-term Exam Helper — All Weeks (1–4)
Causal Analytics for Business 2026 · applied to newspaper_ads.csv
What this document is. A week-by-week compendium of every R technique from Weeks 1–4 (in-class and tutorials), with interpretations and connections to the weekly content after each result — the same style as the class/tutorial notebooks. Open it with
newspaper_ads.csvin the same folder and Render, or run chunks with ▶.Two kinds of section:
- APPLIED (runs) — computed on
newspaper_ads.csv, prints real numbers, and is interpreted below the chunk. All of Week 1 and Week 3 are applied (this dataset is a two-factor experiment, the natural Week 3 setting).- TEMPLATE (
eval: false) — needs a data structure this dataset lacks (Week 2 needs a causal system of variables; Week 4 needs a treatment column + covariates). These hold the exact class code patterns + interpretation of what they would show.Only methods/libraries used in class appear here. Conceptual/DAG/formula MCQs →
MIDTERM_CHEATSHEET.md.⚠️ The data spells Thursday as
Thrusday— that label is literal; do not “fix” it.
0 · Setup
df <- read.csv("newspaper_ads.csv", stringsAsFactors = FALSE)
df$day <- factor(df$day) # keep "Thrusday" exactly as in the data
df$section <- factor(df$section)
str(df)'data.frame': 60 obs. of 3 variables:
$ day : Factor w/ 5 levels "Friday","Monday",..: 2 2 2 2 4 4 4 4 5 5 ...
$ section : Factor w/ 3 levels "Business","News",..: 3 3 3 3 3 3 3 3 3 3 ...
$ inquiries: int 4 3 5 6 5 8 6 7 5 9 ...
summary(df) day section inquiries
Friday :12 Business:20 Min. : 3.000
Monday :12 News :20 1st Qu.: 6.000
Thrusday :12 Sports :20 Median : 8.000
Tuesday :12 Mean : 8.333
Wednesday:12 3rd Qu.:10.000
Max. :14.000
table(df$day, df$section) # 5 days x 3 sections x 4 reps = 60
Business News Sports
Friday 4 4 4
Monday 4 4 4
Thrusday 4 4 4
Tuesday 4 4 4
Wednesday 4 4 4
Interpretation. The cross-tab is 4 in every cell → a perfectly balanced design (equal n per treatment combination) and, because each
sectionappears equally on eachday, an orthogonal one. These two design properties (Week 3) are what make the analysis clean: balance minimises the standard errors, and orthogonality means each factor’s effect can be estimated independently of the other (we verify this numerically in §3.5). Note the alphabetical baseline ordering —BusinessandFridaycome first, so they become the reference levels in every regression below.
WEEK 1 — Regression, model fit & diagnostics · APPLIED
Week 1 toolkit: correlation, outcome transformation, simple vs multiple OLS, R²/adj-R²/AIC, stepAIC, polynomial terms, standardised coefficients, residual normality, Breusch-Pagan, and prediction (MAE on a hold-out). Class datasets: football managers (in-class), HKRE apartments (tutorial). Same machinery, applied to inquiries.
1.1 Correlation & outcome distribution
df %>% select_if(is.numeric) %>% summary() # 'inquiries' is the only numeric column inquiries
Min. : 3.000
1st Qu.: 6.000
Median : 8.000
Mean : 8.333
3rd Qu.:10.000
Max. :14.000
par(mfrow = c(2, 2))
hist(df$inquiries, main = "inquiries (raw)", xlab = "inquiries")
hist(log(df$inquiries), main = "log(inquiries)", xlab = "log(inquiries)")
qqnorm(df$inquiries, main = "QQ raw"); qqline(df$inquiries)
qqnorm(log(df$inquiries), main = "QQ log"); qqline(log(df$inquiries))par(mfrow = c(1, 1))Interpretation. A correlation matrix is N/A —
cor()needs ≥2 numeric variables andinquiriesis the only one (the predictors are categorical). On the distribution: rawinquiriesis already fairly symmetric (mean 8.33 ≈ median 8, range 3–14), and the raw QQ points sit close to the line. Logging actually makes the small counts more skewed, so there is no case for a log transform here — the opposite of the HKREpriceexample, where the raw outcome was strongly right-skewed andlog(price)was clearly better. As in class, remember the decision rule: we transform to get normal residuals, not because of the raw outcome per se (we re-check residuals in §1.5).
1.2 Simple vs multiple regression, R² / adjusted R² / AIC
fit_section <- lm(inquiries ~ section, data = df)
fit_day <- lm(inquiries ~ day, data = df)
summary(fit_section)
Call:
lm(formula = inquiries ~ section, data = df)
Residuals:
Min 1Q Median 3Q Max
-5.9 -1.1 -0.1 1.9 5.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.1000 0.5632 16.158 <0.0000000000000002 ***
sectionNews -0.2000 0.7965 -0.251 0.8026
sectionSports -2.1000 0.7965 -2.637 0.0108 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.519 on 57 degrees of freedom
Multiple R-squared: 0.1294, Adjusted R-squared: 0.09883
F-statistic: 4.235 on 2 and 57 DF, p-value: 0.01928
summary(fit_day)
Call:
lm(formula = inquiries ~ day, data = df)
Residuals:
Min 1Q Median 3Q Max
-5.083 -1.500 0.000 1.500 4.917
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.9167 0.6378 17.116 < 0.0000000000000002 ***
dayMonday -2.8333 0.9020 -3.141 0.00271 **
dayThrusday -4.9167 0.9020 -5.451 0.00000122 ***
dayTuesday -2.4167 0.9020 -2.679 0.00972 **
dayWednesday -2.7500 0.9020 -3.049 0.00353 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.209 on 55 degrees of freedom
Multiple R-squared: 0.3535, Adjusted R-squared: 0.3065
F-statistic: 7.519 on 4 and 55 DF, p-value: 0.00006612
fit_both <- lm(inquiries ~ section + day, data = df)
summary(fit_both)
Call:
lm(formula = inquiries ~ section + day, data = df)
Residuals:
Min 1Q Median 3Q Max
-3.7500 -1.5667 0.1167 1.3542 4.1500
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.6833 0.6876 16.992 < 0.0000000000000002 ***
sectionNews -0.2000 0.6366 -0.314 0.75461
sectionSports -2.1000 0.6366 -3.299 0.00174 **
dayMonday -2.8333 0.8218 -3.448 0.00112 **
dayThrusday -4.9167 0.8218 -5.983 0.000000193 ***
dayTuesday -2.4167 0.8218 -2.941 0.00485 **
dayWednesday -2.7500 0.8218 -3.346 0.00151 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.013 on 53 degrees of freedom
Multiple R-squared: 0.4829, Adjusted R-squared: 0.4244
F-statistic: 8.249 on 6 and 53 DF, p-value: 0.000002536
AIC(fit_section, fit_day, fit_both)Interpretation. Each coefficient is a mean difference vs the baseline (the Week 1 reading of dummy-coded factors), e.g. in
fit_sectionthesectionSportscoefficient ≈ −2.1 says Sports averages ~2 fewer inquiries than Business. Going from a single predictor to both, Adjusted R² rises from ~0.13 (sectionalone) to ~0.42 (section + day) and AIC falls —daycarries a lot of independent signal thatsectionalone misses. This is the multiple-vs-simple-regression theme from Week 1: adding a relevant variable improves fit and can change how much “credit” each predictor gets. AIC is used here exactly as in class — for model comparison — and it is valid because all three models share the same outcome (inquiries).
1.3 Stepwise AIC selection (MASS::stepAIC)
full_mod <- lm(inquiries ~ section * day, data = df)
selected <- stepAIC(full_mod, direction = "both", trace = FALSE)
summary(selected)
Call:
lm(formula = inquiries ~ section * day, data = df)
Residuals:
Min 1Q Median 3Q Max
-2.2500 -0.8125 -0.1250 0.7500 2.7500
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.0000 0.6625 13.585 < 0.0000000000000002 ***
sectionNews 3.5000 0.9369 3.736 0.000525 ***
sectionSports 2.2500 0.9369 2.402 0.020518 *
dayMonday 2.5000 0.9369 2.668 0.010560 *
dayThrusday -1.2500 0.9369 -1.334 0.188854
dayTuesday -0.2500 0.9369 -0.267 0.790813
dayWednesday -0.5000 0.9369 -0.534 0.596192
sectionNews:dayMonday -6.7500 1.3250 -5.094 0.0000067127 ***
sectionSports:dayMonday -9.2500 1.3250 -6.981 0.0000000109 ***
sectionNews:dayThrusday -7.0000 1.3250 -5.283 0.0000035643 ***
sectionSports:dayThrusday -4.0000 1.3250 -3.019 0.004168 **
sectionNews:dayTuesday -2.0000 1.3250 -1.509 0.138170
sectionSports:dayTuesday -4.5000 1.3250 -3.396 0.001437 **
sectionNews:dayWednesday -2.7500 1.3250 -2.076 0.043684 *
sectionSports:dayWednesday -4.0000 1.3250 -3.019 0.004168 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.325 on 45 degrees of freedom
Multiple R-squared: 0.8098, Adjusted R-squared: 0.7506
F-statistic: 13.68 on 14 and 45 DF, p-value: 0.000000000007697
AIC(selected)[1] 218.7788
Interpretation.
stepAICkeeps the fullsection * dayinteraction model (AIC ≈ 218.8) — it does not drop the interaction. That is a strong, automatic signal that the two factors do not act additively here: the effect of one depends on the other. This contrasts with the football-manager and salary exercises, where stepwise selection often pruned terms. Caveat from Week 1:stepAICoptimises fit, so treat its output as a guide and sanity-check against theory — here it happens to agree with the formal interaction test in §3.5.
1.4 Polynomial / squared terms & standardised coefficients — why N/A here
## Both require a NUMERIC predictor; section & day are categorical. In class these were used
## on numeric predictors (area, age, exp):
## lm(log(price) ~ poly(area, 2, raw = TRUE) + ..., data = hkre) # squared terms
## lm(salary ~ male + factor(education) + exp + I(exp^2), data = df) # quadratic
## lm(damage ~ scale(attended) + scale(area) + scale(grade), data = d) # standardised
cat("Polynomial & standardised-coefficient methods need a numeric predictor — N/A here.\n")Polynomial & standardised-coefficient methods need a numeric predictor — N/A here.
Interpretation. Squared/
poly()terms capture curvature in a numeric predictor (diminishing returns toarea,exp²for salary), andscale()puts numeric coefficients on a common SD scale so you can rank “which variable matters most” (the firemen/HKRE standardised bar charts). Neither is meaningful forsection/day: a factor has no slope to bend and no units to standardise. The right Week 1 reading for categorical predictors is simply the per-level mean differences already shown in §1.2.
1.5 Residual diagnostics: normality + Breusch-Pagan
par(mfrow = c(1, 2))
hist(fit_both$residuals, main = "Residuals", xlab = "residuals")
qqnorm(fit_both$residuals); qqline(fit_both$residuals)par(mfrow = c(1, 1))
bptest(fit_both) # H0: constant residual variance. p<0.05 => heteroscedasticity
studentized Breusch-Pagan test
data: fit_both
BP = 10.818, df = 6, p-value = 0.09418
Interpretation. The residual histogram is roughly bell-shaped and the QQ points track the line, so the normality assumption looks acceptable. The Breusch-Pagan test gives p ≈ 0.094 > 0.05, so we fail to reject homoscedasticity — the residual variance is effectively constant and the standard errors / p-values can be trusted (the same “can we trust our SEs?” check that closes the firemen and salary exercises). No remedy (robust SEs, transformation) is needed here.
1.6 Prediction & MAE on a hold-out
set.seed(1)
train_idx <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
train <- df[train_idx, ]; test <- df[-train_idx, ]
fit_tr <- lm(inquiries ~ section, data = train) # section only: 3 levels always in split
pred <- predict(fit_tr, newdata = test)
mae <- mean(abs(test$inquiries - pred))
cat("Hold-out MAE (inquiries ~ section):", round(mae, 3), "\n")Hold-out MAE (inquiries ~ section): 2.35
Interpretation. This is the Week 1 prediction mindset (Shmueli’s explain vs predict): fit on a training split, score on unseen data with MAE = mean(|actual − predicted|). A section-only model is off by ≈ 2.35 inquiries on average — not great, because (as §1.2–§1.3 showed)
dayand the interaction carry most of the signal. Adding them would lower the MAE. The key Week 1 distinction: for prediction you keep any input that helps out-of-sample accuracy; for explanation you’d instead let the causal diagram decide what belongs in the model.
Week 1 function map:
cor()·hist()/qqnorm()/qqline()·lm()·summary()(R²/adj-R²/coeffs) ·AIC()(same outcome only) ·MASS::stepAIC()·I(x^2)/poly()&scale()(numeric only) ·bptest()·predict()+ MAE.
WEEK 2 — Causal diagrams & system of regressions · TEMPLATE
Week 2 = observational data: draw a causal diagram, decide what to control for (confounder / mediator / collider), run a system of regressions for path coefficients, compute total effects (product along a path, sum across paths) and omitted-variable bias. Class datasets: salary (in-class), firemen (tutorial).
Why template, not applied — and the connection. newspaper_ads.csv is a designed experiment: day and section were set by the design and are orthogonal, so there is no confounding structure to model — no back-door path, no mediator, no collider among the three columns. In fact the design guarantees the Week 2 story can’t appear: because the factors are orthogonal, controlling for one does not change the other’s coefficient (we demonstrate this in §3.5) — the exact opposite of omitted-variable bias. A Week 2 MCQ on this exam is therefore most likely a pen-&-paper DAG question (MIDTERM_CHEATSHEET.md). The code below is the class pattern, ready for a real variable system.
2.1 Control rules
| Role of Z (between X and Y) | Structure | Control it? |
|---|---|---|
| Confounder | Z → X and Z → Y | YES (closes a back-door) |
| Mediator | X → Z → Y | NO for the total effect (it blocks the path) |
| Collider | X → Z ← Y | NO (controlling creates spurious association) |
Total effect of X→Y = sum over directed paths; each path’s effect = product of its coefficients. To estimate a total effect in one regression: control variables that block all back-door paths but are not mediators/colliders (the answer can be None).
2.2 System of regressions + total effect via path products
## Salary in-class pattern: each node with incoming arrows gets its own regression.
modelA <- lm(salary ~ male + education + exp, data = df) # exp & education -> salary
model1 <- lm(education ~ age, data = df) # age -> education (path a)
model2 <- lm(exp ~ age, data = df) # age -> exp (path b)
indirect_via_education <- coef(model1)["age"] * coef(modelA)["education"]
indirect_via_exp <- coef(model2)["age"] * coef(modelA)["exp"]
indirect_via_education + indirect_via_exp # total effect of age on salary
lm(salary ~ age + male, data = df) # cross-check: control confounder, NOT mediatorsInterpretation (what it would show). The two routes agree: summing the path products equals the
agecoefficient from the single regression that controls the confounder but omits the mediators. That is the Week 2 rule made concrete — to get a total effect you leave mediators out (the path stays open); to get a direct effect you control them.
2.3 Omitted-variable bias (OVB)
## Firemen pattern: omitting confounder 'area' biases 'attended'.
correct <- lm(damage ~ attended + area, data = df) # area -> damage (direct)
reverse <- lm(area ~ attended, data = df) # attended -> area (open backdoor leg)
OVB <- coef(reverse)["attended"] * coef(correct)["area"]
biased <- lm(damage ~ attended, data = df) # omit 'area'
coef(biased)["attended"] # = correct + OVB
coef(correct)["attended"] + OVBInterpretation (what it would show). OVB = (effect of omitted var on outcome) × (effect of treatment on omitted var), and
biased = correct + OVBexactly. In the firemen case this is why the mayor was wrong: the spurious +24.5 “firemen cause damage” decomposes into the true protective effect (−12.4) plus the area-confounding bias (+36.9).
2.4 Multicollinearity check (manual VIF)
## VIF = 1/(1-R^2) from regressing each predictor on the others. >5 a concern, >10 severe.
predictors <- c("attended", "area", "trucks", "grade")
sapply(predictors, function(p) {
others <- setdiff(predictors, p)
1 / (1 - summary(lm(reformulate(others, response = p), data = df))$r.squared)
})Interpretation (what it would show). High VIFs (firemen: ≈28 for
attended/area) flag that predictors carry little independent variation — Week 2’s lesson that collinearity costs precision (wide CIs, unstable coefficients), not bias; the remedy is the causal diagram, not deleting a variable.
Week 2 function map:
lm()per node ·coef()for path coefficients · products/sums for total effects · OVB = product of the two omitted-path coefficients · manual VIF viareformulate()+1/(1-R²).
WEEK 3 — Experiments & ANOVA · APPLIED (core of this dataset)
newspaper_ads.csv is a balanced full-factorial design: section (3 levels) × day (5 levels), 4 replicates per cell — exactly the in-class paper-towels / store-design setting.
3.1 Balance & orthogonality
table(df$section, df$day)
Friday Monday Thrusday Tuesday Wednesday
Business 4 4 4 4 4
News 4 4 4 4 4
Sports 4 4 4 4 4
replications(inquiries ~ section * day, data = df) section day section:day
20 12 4
Interpretation.
replicationsreturns equal counts (section 20, day 12, each cell 4), confirming the design is balanced and orthogonal (Week 3). The payoff, just like the paper-towels exercise: balance gives the smallest possible standard errors, and orthogonality means adding the second factor will not disturb the first factor’s Sum Sq — only shrink the error term (shown next in §3.5).
3.2 Boxplots (distribution by group)
par(mfrow = c(1, 2))
boxplot(inquiries ~ section, data = df, main = "by section")
boxplot(inquiries ~ day, data = df, las = 2, main = "by day")par(mfrow = c(1, 1))
boxplot(inquiries ~ section + day, data = df, las = 2, cex.axis = 0.7, main = "by cell")Interpretation. The Week 3 “compare the distributions first” step. By section, Business (~9.1) and News (~8.9) sit above Sports (~7.0). By day, Friday (~10.9) is clearly highest and Thrusday (~6.0) lowest. The per-cell boxplot already hints at an interaction — the section ranking is not the same on every day (e.g. Sports is strong on Friday but weakest on Monday), which the formal test in §3.5 confirms.
3.3 Two-group t-tests (Welch, var.equal = FALSE)
ttest_pair <- function(data, var, group, lvl1, lvl2) {
sub <- data[data[[group]] %in% c(lvl1, lvl2), ]
sub[[group]] <- factor(sub[[group]])
cat("\n----", lvl1, "vs", lvl2, "----\n")
print(t.test(sub[[var]] ~ sub[[group]], var.equal = FALSE))
}
for (p in combn(levels(df$section), 2, simplify = FALSE)) {
ttest_pair(df, "inquiries", "section", p[1], p[2])
}
---- Business vs News ----
Welch Two Sample t-test
data: sub[[var]] by sub[[group]]
t = 0.254, df = 30.172, p-value = 0.8012
alternative hypothesis: true difference in means between group Business and group News is not equal to 0
95 percent confidence interval:
-1.407702 1.807702
sample estimates:
mean in group Business mean in group News
9.1 8.9
---- Business vs Sports ----
Welch Two Sample t-test
data: sub[[var]] by sub[[group]]
t = 3.0195, df = 33.401, p-value = 0.004824
alternative hypothesis: true difference in means between group Business and group Sports is not equal to 0
95 percent confidence interval:
0.6856928 3.5143072
sample estimates:
mean in group Business mean in group Sports
9.1 7.0
---- News vs Sports ----
Welch Two Sample t-test
data: sub[[var]] by sub[[group]]
t = 2.125, df = 36.926, p-value = 0.04035
alternative hypothesis: true difference in means between group News and group Sports is not equal to 0
95 percent confidence interval:
0.0881928 3.7118072
sample estimates:
mean in group News mean in group Sports
8.9 7.0
for (p in combn(levels(df$day), 2, simplify = FALSE)) ttest_pair(df, "inquiries", "day", p[1], p[2])Interpretation. Welch t-tests on the section pairs: Business vs Sports is significant (t ≈ 3.02, p ≈ 0.005), News vs Sports is borderline (t ≈ 2.13, p ≈ 0.040), and Business vs News is not (p ≈ 0.80). Recall the Week 3 equivalence: for a 2-group comparison the t-test =
lm= one-way ANOVA give the same p-value. But note these are uncorrected pairwise tests — running several inflates the false-positive rate, which is exactly why we move to the omnibus ANOVA + Tukey correction (§3.4, §3.6).
3.4 One-way ANOVA (each factor) + effect size
aov_section <- aov(inquiries ~ section, data = df); summary(aov_section) Df Sum Sq Mean Sq F value Pr(>F)
section 2 53.7 26.867 4.235 0.0193 *
Residuals 57 361.6 6.344
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
eta_squared(aov_section)aov_day <- aov(inquiries ~ day, data = df); summary(aov_day) Df Sum Sq Mean Sq F value Pr(>F)
day 4 146.8 36.71 7.519 0.0000661 ***
Residuals 55 268.5 4.88
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
eta_squared(aov_day)Interpretation. The F-test is between-group variance ÷ within-group variance (signal/noise).
section: F(2,57) ≈ 4.24, p ≈ 0.019 (η² ≈ 0.13) andday: F(4,55) ≈ 7.52, p ≈ 6.6e-5 (η² ≈ 0.35) — both factors matter, anddayis the stronger driver (over twice the variance explained). η² is the Week 3 effect-size analogue of R²: it tells you about magnitude, which the p-value alone does not.
3.5 Two-way ANOVA: main effects, then interaction
m_main <- aov(inquiries ~ section + day, data = df)
summary(m_main); AIC(m_main); eta_squared(m_main) Df Sum Sq Mean Sq F value Pr(>F)
section 2 53.73 26.87 6.630 0.00269 **
day 4 146.83 36.71 9.059 0.0000119 ***
Residuals 53 214.77 4.05
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] 262.7851
m_int <- aov(inquiries ~ section * day, data = df)
summary(m_int); AIC(m_int); eta_squared(m_int) Df Sum Sq Mean Sq F value Pr(>F)
section 2 53.73 26.87 15.304 0.000008502752 ***
day 4 146.83 36.71 20.910 0.000000000852 ***
section:day 8 135.77 16.97 9.667 0.000000112469 ***
Residuals 45 79.00 1.76
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] 218.7788
Interpretation — orthogonality in action. Compare the Sum Sq with §3.4: in the two-way model
sectionstill has Sum Sq ≈ 53.7 anddaystill ≈ 146.8 — identical to their one-way values (this is orthogonality: no shared variance to fight over). What changes is the error term, which shrinks from 361.6 (one-way section) to 214.8, so the F-values rise (section 4.24 → 6.63, day 7.52 → 9.06) and p-values drop. Each factor “soaks up noise” for the other → more precision. This is the precise lesson of the in-class paper-towels / store-design exercises.Interpretation — the interaction (the big contrast with class). The
section:dayinteraction is highly significant: F(8,45) ≈ 9.67, p ≈ 1.1e-7, with a large partial η² ≈ 0.63, and AIC drops sharply (262.8 → 218.8) when it is added. This is the opposite of the paper-towels and promotion exercises, where the interaction was not significant and the additive model won. Here the additive model is not adequate: the effect ofsectiongenuinely depends onday(Sports is strong on Friday but worst on Monday; News peaks on Friday and bottoms out on Thrusday). Practical consequence (Week 3): when the interaction is significant you should read the cell means, not the main effects in isolation — averaging over the other factor can mislead.
interaction.plot(df$day, df$section, df$inquiries,
xlab = "day", ylab = "mean inquiries", trace.label = "section")Interpretation. Always plot the interaction (Week 3) — the interaction is the whole pattern, no single number shows it. Here the three section lines cross rather than run parallel, the visual signature of the significant interaction from §3.5. Parallel lines would have meant “no interaction” (the class examples); crossing lines mean the best section changes from day to day.
3.6 Post-hoc — Tukey HSD
TukeyHSD(aov_section) Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = inquiries ~ section, data = df)
$section
diff lwr upr p adj
News-Business -0.2 -2.116672 1.71667235 0.9658594
Sports-Business -2.1 -4.016672 -0.18332765 0.0286002
Sports-News -1.9 -3.816672 0.01667235 0.0525040
TukeyHSD(aov_day) Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = inquiries ~ day, data = df)
$day
diff lwr upr p adj
Monday-Friday -2.83333333 -5.37731936 -0.2893473 0.0218355
Thrusday-Friday -4.91666667 -7.46065269 -2.3726806 0.0000118
Tuesday-Friday -2.41666667 -4.96065269 0.1273194 0.0702094
Wednesday-Friday -2.75000000 -5.29398603 -0.2060140 0.0279337
Thrusday-Monday -2.08333333 -4.62731936 0.4606527 0.1573669
Tuesday-Monday 0.41666667 -2.12731936 2.9606527 0.9903820
Wednesday-Monday 0.08333333 -2.46065269 2.6273194 0.9999830
Tuesday-Thrusday 2.50000000 -0.04398603 5.0439860 0.0563243
Wednesday-Thrusday 2.16666667 -0.37731936 4.7106527 0.1301276
Wednesday-Tuesday -0.33333333 -2.87731936 2.2106527 0.9959087
TukeyHSD(m_main) Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = inquiries ~ section + day, data = df)
$section
diff lwr upr p adj
News-Business -0.2 -1.734936 1.3349362 0.9470957
Sports-Business -2.1 -3.634936 -0.5650638 0.0048701
Sports-News -1.9 -3.434936 -0.3650638 0.0117434
$day
diff lwr upr p adj
Monday-Friday -2.83333333 -5.1540343 -0.51263240 0.0094385
Thrusday-Friday -4.91666667 -7.2373676 -2.59596574 0.0000019
Tuesday-Friday -2.41666667 -4.7373676 -0.09596574 0.0373566
Wednesday-Friday -2.75000000 -5.0707009 -0.42929907 0.0125928
Thrusday-Monday -2.08333333 -4.4040343 0.23736760 0.0981402
Tuesday-Monday 0.41666667 -1.9040343 2.73736760 0.9863219
Wednesday-Monday 0.08333333 -2.2373676 2.40403426 0.9999753
Tuesday-Thrusday 2.50000000 0.1792991 4.82070093 0.0287660
Wednesday-Thrusday 2.16666667 -0.1540343 4.48736760 0.0780657
Wednesday-Tuesday -0.33333333 -2.6540343 1.98736760 0.9941367
Interpretation. Tukey HSD gives family-wise-corrected pairwise comparisons (the fix for the multiple-testing problem flagged in §3.3). For
section, only Sports−Business survives (diff −2.1, p ≈ 0.029); Sports−News is now p ≈ 0.053 — just misses, even though its uncorrected t-test in §3.3 was p ≈ 0.040. That gap is the exact Week 3 point: one planned contrast vs many unplanned ones — correction makes the borderline result non-significant. Forday, the big gaps from Friday are significant (e.g. Thrusday−Friday ≈ −4.9, p ≈ 1e-5). (Caveat: with a significant interaction, these main-effect contrasts are “averaged over” the other factor — interpret with the cell means in mind.)
3.7 Regression (lm) view + diagnostics
lm_main <- lm(inquiries ~ section + day, data = df); summary(lm_main)
Call:
lm(formula = inquiries ~ section + day, data = df)
Residuals:
Min 1Q Median 3Q Max
-3.7500 -1.5667 0.1167 1.3542 4.1500
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.6833 0.6876 16.992 < 0.0000000000000002 ***
sectionNews -0.2000 0.6366 -0.314 0.75461
sectionSports -2.1000 0.6366 -3.299 0.00174 **
dayMonday -2.8333 0.8218 -3.448 0.00112 **
dayThrusday -4.9167 0.8218 -5.983 0.000000193 ***
dayTuesday -2.4167 0.8218 -2.941 0.00485 **
dayWednesday -2.7500 0.8218 -3.346 0.00151 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.013 on 53 degrees of freedom
Multiple R-squared: 0.4829, Adjusted R-squared: 0.4244
F-statistic: 8.249 on 6 and 53 DF, p-value: 0.000002536
lm_int <- lm(inquiries ~ section * day, data = df); summary(lm_int)
Call:
lm(formula = inquiries ~ section * day, data = df)
Residuals:
Min 1Q Median 3Q Max
-2.2500 -0.8125 -0.1250 0.7500 2.7500
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.0000 0.6625 13.585 < 0.0000000000000002 ***
sectionNews 3.5000 0.9369 3.736 0.000525 ***
sectionSports 2.2500 0.9369 2.402 0.020518 *
dayMonday 2.5000 0.9369 2.668 0.010560 *
dayThrusday -1.2500 0.9369 -1.334 0.188854
dayTuesday -0.2500 0.9369 -0.267 0.790813
dayWednesday -0.5000 0.9369 -0.534 0.596192
sectionNews:dayMonday -6.7500 1.3250 -5.094 0.0000067127 ***
sectionSports:dayMonday -9.2500 1.3250 -6.981 0.0000000109 ***
sectionNews:dayThrusday -7.0000 1.3250 -5.283 0.0000035643 ***
sectionSports:dayThrusday -4.0000 1.3250 -3.019 0.004168 **
sectionNews:dayTuesday -2.0000 1.3250 -1.509 0.138170
sectionSports:dayTuesday -4.5000 1.3250 -3.396 0.001437 **
sectionNews:dayWednesday -2.7500 1.3250 -2.076 0.043684 *
sectionSports:dayWednesday -4.0000 1.3250 -3.019 0.004168 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.325 on 45 degrees of freedom
Multiple R-squared: 0.8098, Adjusted R-squared: 0.7506
F-statistic: 13.68 on 14 and 45 DF, p-value: 0.000000000007697
bptest(lm_main); bptest(lm_int)
studentized Breusch-Pagan test
data: lm_main
BP = 10.818, df = 6, p-value = 0.09418
studentized Breusch-Pagan test
data: lm_int
BP = 11.831, df = 14, p-value = 0.6199
Interpretation.
aov()andlm()are the same model: ANOVA reports one F-test per factor, OLS one t-test per level vs the baseline (Business on Friday, intercept ≈ 11.7). The additivelmhas Adjusted R² ≈ 0.42; adding the interaction lifts it to ≈ 0.75 — consistent with §3.5 and withstepAICkeeping the interaction. Breusch-Pagan is non-significant for both models (main p ≈ 0.094, interaction p ≈ 0.62), so homoscedasticity holds and the SEs are trustworthy — the Week 3 “can we trust the inference?” check passes.
3.8 Log-transformed outcome (optional remedy)
lm_log <- lm(log(inquiries) ~ section + day, data = df)
summary(lm_log); bptest(lm_log)
Call:
lm(formula = log(inquiries) ~ section + day, data = df)
Residuals:
Min 1Q Median 3Q Max
-0.7204 -0.1916 0.0371 0.2023 0.4383
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.50574 0.09410 26.627 < 0.0000000000000002 ***
sectionNews -0.07610 0.08712 -0.874 0.386320
sectionSports -0.30771 0.08712 -3.532 0.000864 ***
dayMonday -0.37905 0.11248 -3.370 0.001408 **
dayThrusday -0.62859 0.11248 -5.589 0.000000811 ***
dayTuesday -0.26660 0.11248 -2.370 0.021444 *
dayWednesday -0.29936 0.11248 -2.662 0.010274 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2755 on 53 degrees of freedom
Multiple R-squared: 0.4637, Adjusted R-squared: 0.403
F-statistic: 7.638 on 6 and 53 DF, p-value: 0.000006185
studentized Breusch-Pagan test
data: lm_log
BP = 7.4916, df = 6, p-value = 0.2778
Interpretation. Logging the outcome is a general Week 3 remedy for skew/ heteroscedasticity — but here it is unnecessary (residuals were already well-behaved, §1.1/§1.5). Two class caveats apply: coefficients now read as approximate % effects, and the AIC of a log-outcome model is NOT comparable to the level model (different outcome scale) — judge transforms on residual diagnostics, not AIC.
Week 3 function map:
table()/replications()→ balance & orthogonality ·boxplot()→ distributions ·t.test()→ AB test ·aov()→ F per factor ·lm()→ t per level ·interaction.plot()→ interactions ·TukeyHSD()→ corrected pairwise ·effectsize::eta_squared()→ effect size ·bptest()/qqnorm()→ assumptions ·AIC()→ model comparison (same outcome only).
Week 3 takeaway for this dataset. Both factors significantly affect inquiries,
daymore thansection, and there is a strong section×day interaction — so the design’s story is in the cell means: the best combinations are News on Friday (≈12.5) and Business on Monday (≈11.5); the worst are News on Thrusday (≈4.25) and Sports on Monday (≈4.5). Balance + orthogonality kept the estimates clean, and the assumption checks mean we can trust the inference. The methodological headline vs the class examples: here the interaction matters, so don’t stop at the main effects.
WEEK 4 — Matching, propensity scores & IPWE · TEMPLATE
Week 4 estimates causal effects from observational data via stratification, exact matching, and propensity-score weighting (IPWE). Class datasets: Titanic (in-class), job training RCT vs CPS (in-class), catholic schools (tutorial).
Why template, and the connection. Every Week 4 method needs a binary treatment D and covariates — newspaper_ads.csv has neither, so there is literally nothing to build a propensity score on (ps = P(D=1|X) requires a D). Conceptually this dataset sits at the top of the method pyramid (a clean designed experiment), whereas Week 4 lives at the bottom (messy observational data). The code below is the exact class workflow; paste in a real treat/outcome/covariates to use it.
Estimands & IPWE weights (non-normalised):
- τ_ATE = π·τ_ATT + (1−π)·τ_ATU; SMD = ATE + pre-treatment het. + treatment-effect het.
| Estimand | Treated (D=1) | Control (D=0) |
|---|---|---|
| ATE | 1/ps |
1/(1-ps) |
| ATT | 1 |
ps/(1-ps) |
4.1 Naïve SMD vs regression adjustment
naive <- lm(outcome ~ D, data = df); summary(naive) # simple mean difference
adj <- lm(outcome ~ D + x1 + x2 + x3, data = df); summary(adj) # regression adjustmentInterpretation (what it would show). A large gap between
naiveandadjis the signature of confounding (Titanic: 0.354 → 0.224 oncemaleis added; job training: −$8497 → +$1066). If the two barely differ — as in the Week 3 WallMonkeys experiment — that itself is evidence randomisation worked (controls change precision, not the estimate).
4.2 Balance & common support (the testable assumption)
library(magrittr)
table(df$D) # treatment-level balance
df %>% group_by(D, x_categorical) %>% # covariate balance (proportions)
summarise(cnt = n()) %>% mutate(prop = round(cnt / sum(cnt), 3))
df %>% mutate(grp = ifelse(D == 1, "Treatment", "Control")) %>% # common support (overlap)
ggplot(aes(x = x_continuous, group = grp, colour = grp, fill = grp)) +
geom_density(alpha = 0.2) + theme(legend.position = "bottom")Interpretation (what it would show). Common support / overlap is the one assumption we can test (ignorability we cannot). Where the two densities overlap is the only region with “doppelgangers” to match; little overlap (job training
re75) → positivity is threatened and we must trim.
4.3 Propensity score (logistic nuisance) + stepAIC
glm.ps <- glm(D ~ x1 + x2 + x3, data = df, family = binomial(link = "logit"))
selected.ps <- stepAIC(glm.ps, direction = "both") # guide only — keep true confounders
df$ps <- predict(selected.ps, type = "response")
df.trim <- df %>% filter(ps >= 0.1 & ps <= 0.9) # trim extreme PS (explosive weights)Interpretation (what it would show). The PS collapses all covariates into one number (curse-of-dimensionality fix). Goal is overlap, not predictive accuracy — don’t drop a genuine confounder just because it predicts treatment poorly. Trimming matters because extreme PS create explosive inverse weights (the job-training
−348blow-up).
4.4 Manual IPWE — ATE and ATT (non-normalised)
library(magrittr)
df.trim %>% mutate(y1 = D*outcome/ps, y0 = (1-D)*outcome/(1-ps), ate = y1 - y0) %>%
pull(ate) %>% mean() # ATE: 1/ps , 1/(1-ps)
n_treated <- sum(df.trim$D)
df.trim %>% mutate(y1 = D*outcome, y0 = (1-D)*outcome*ps/(1-ps), att = y1 - y0) %>%
pull(att) %>% sum() / n_treated # ATT: 1 , ps/(1-ps)Interpretation (what it would show). Everything is a weighted mean: each unit is weighted by 1/(prob of the status it actually got). ATT often differs from ATE (treatment-effect heterogeneity) — Titanic ATT 0.237 > ATE 0.190; job training ATT ≈ $1943 > ATE ≈ $1637.
4.5 Normalised IPWE weights (Hirano–Imbens)
N <- nrow(df.trim)
df.trim %<>% mutate(d1 = D/ps, d0 = (1-D)/(1-ps))
adj1 <- sum(df.trim$d1)/N; adj0 <- sum(df.trim$d0)/N
df.trim %>% mutate(y1 = (D*outcome/ps)/adj1, y0 = ((1-D)*outcome/(1-ps))/adj0,
ate.norm = y1 - y0) %>% pull(ate.norm) %>% mean()Interpretation (what it would show). Raw inverse weights don’t sum to N; rescaling each group by its adjustment factor = (Σ inverse weights)/N tames extreme weights and stabilises the estimate (the Week 4 normalisation step;
PSweight’s default).
4.6 PSweight — IPWE + love plots + bootstrap SE
library(PSweight)
ps.formula <- D ~ x1 + x2 + x3
bal <- SumStat(ps.formula = ps.formula, data = df.trim, weight = c("IPW"))
plot(bal, type = "balance"); plot(bal, type = "hist") # love plot: aim |d| < 0.1
summary(PSweight(ps.formula, yname = "outcome", data = df.trim, weight = "IPW")) # ATE
summary(PSweight(ps.formula, yname = "outcome", data = df.trim, weight = "treated")) # ATT
summary(PSweight(ps.formula, yname = "outcome", bootstrap = TRUE, R = 100,
data = df.trim, weight = "IPW")) # bootstrap SEInterpretation (what it would show). The love plot is how you choose a PS model — the spec that balances all covariates (|standardised mean diff| < 0.1) is the one to trust, not the one closest to any benchmark (you never see the RCT in real life). Use the bootstrap SE when the analytical SE warns.
4.7 Stratification + bootstrap; exact-match feasibility
library(magrittr); library(boot)
df %<>% mutate(subclass = case_when(x == 1 ~ 1, x == 0 ~ 2, TRUE ~ 0))
strata.ate <- function(d) {
ey11 <- d %>% filter(subclass==1 & D==1) %$% mean(outcome)
ey10 <- d %>% filter(subclass==1 & D==0) %$% mean(outcome)
ey21 <- d %>% filter(subclass==2 & D==1) %$% mean(outcome)
ey20 <- d %>% filter(subclass==2 & D==0) %$% mean(outcome)
wt1 <- d %>% filter(subclass==1) %$% (nrow(.)/nrow(d)) # ATE weight = stratum share
wt2 <- d %>% filter(subclass==2) %$% (nrow(.)/nrow(d))
(ey11-ey10)*wt1 + (ey21-ey20)*wt2
}
strata.ate(df)
boot(df, function(data, i) strata.ate(data[i, ]), R = 100) # bootstrap SE
chk <- df %>% group_by(x1, x2, x3) %>% summarise(n = n(), .groups = "keep") %>%
ungroup() %>% complete(x1, x2, x3) %>% as.data.frame()
any(is.na(chk)) # TRUE => exact match infeasibleInterpretation (what it would show). Stratification = post-hoc blocking: compare within confounder strata, weight by stratum share (ATE) or treated share (ATT).
any(is.na())TRUE means some covariate combination has no counterfactual → exact matching fails (curse of dimensionality) → use the propensity score instead. Hand-built estimators have no closed-form SE, so we bootstrap.
Week 4 function map:
lm()(SMD/adjustment) ·group_by + summarise(prop)&geom_density()(balance/overlap) ·glm(..., family=binomial)+stepAIC()(PS) · manual IPWE with the weight table · normalising adjustment factor ·PSweight(weight="IPW"/"treated")+SumStat()love plots ·case_when()/%$%(stratification) ·boot()(SE) ·complete()+any(is.na())(exact-match feasibility).
Conceptual / DAG / formula MCQs → MIDTERM_CHEATSHEET.md. The companion .R script prints the Week-1 & Week-3 numbers under searchable banners for fast console lookup.