Question(s) on interrupted time series

Le Kang
June 26, 2025

BCL Summer 2025

The classical pre vs post analysis

  • The (naïve) 2-sample independent z-test/t-test

  • The Mann-Whitney U test (Wilcoxon rank-sum test)

  • The 2-sample paired t-test

  • The Wilcoxon signed-rank test
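
As a rough illustration, the four tests listed above map onto two base R functions, `t.test()` and `wilcox.test()`. The sketch below runs them on simulated data; the vectors `pre` and `post` and their means are assumptions made up for this example and are not taken from ex1.csv or ex2.csv.

```r
# Simulated data for illustration only (not ex1.csv or ex2.csv)
set.seed(1)
pre  <- rnorm(50, mean = 6, sd = 1)   # hypothetical pre-period values
post <- rnorm(50, mean = 5, sd = 1)   # hypothetical post-period values

# Independent two-sample comparisons
res_t  <- t.test(pre, post)           # Welch t-test (R's default, var.equal = FALSE)
res_mw <- wilcox.test(pre, post)      # Mann-Whitney U / Wilcoxon rank-sum test

# Paired comparisons: only meaningful if pre[i] and post[i] are matched
res_pt <- t.test(pre, post, paired = TRUE)       # paired t-test
res_sr <- wilcox.test(pre, post, paired = TRUE)  # Wilcoxon signed-rank test

# Collect the four p-values side by side
sapply(list(welch = res_t, rank_sum = res_mw,
            paired_t = res_pt, signed_rank = res_sr),
       function(r) r$p.value)
```

The paired versions require equal-length vectors whose elements are matched one to one, which is usually not the case for monthly pre and post observations.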

An example

ex1 <- read.csv("ex1.csv", header = TRUE)
head(ex1,n = 10)
   Obs Year Month Infection.Rate Trt
1    1 2011     1       6.823340 Pre
2    2 2011     2       7.661713 Pre
3    3 2011     3       6.071718 Pre
4    4 2011     4       6.907877 Pre
5    5 2011     5       7.053463 Pre
6    6 2011     6       7.075412 Pre
7    7 2011     7       5.808710 Pre
8    8 2011     8       4.429035 Pre
9    9 2011     9       7.966634 Pre
10  10 2011    10       5.677933 Pre

file:ex1.csv

Histogram

[Figure: histogram of Infection.Rate, ex1]

Statistical inferences

with(ex1, t.test(Infection.Rate[Trt == "Pre"], Infection.Rate[Trt == "Post"], paired = FALSE))

    Welch Two Sample t-test

data:  Infection.Rate[Trt == "Pre"] and Infection.Rate[Trt == "Post"]
t = -0.6961, df = 97.888, p-value = 0.488
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.9752373  0.4687357
sample estimates:
mean of x mean of y 
 4.381114  4.634365 

Statistical inferences (cont'd)

with(ex1, wilcox.test(Infection.Rate[Trt == "Pre"], Infection.Rate[Trt == "Post"], paired = FALSE))

    Wilcoxon rank sum test with continuity correction

data:  Infection.Rate[Trt == "Pre"] and Infection.Rate[Trt == "Post"]
W = 1164, p-value = 0.5556
alternative hypothesis: true location shift is not equal to 0

So with either the parametric or the nonparametric test, the pre vs post difference is not statistically significant.

A second example

ex2 <- read.csv("ex2.csv", header = TRUE)
head(ex2,n = 10)
   Obs Year Month Infection.Rate Trt
1    1 2011     1      10.520862 Pre
2    2 2011     2      12.008921 Pre
3    3 2011     3       9.380510 Pre
4    4 2011     4      10.848942 Pre
5    5 2011     5       9.037392 Pre
6    6 2011     6      10.915647 Pre
7    7 2011     7      10.082868 Pre
8    8 2011     8       9.074189 Pre
9    9 2011     9      10.621038 Pre
10  10 2011    10      10.161518 Pre

file:ex2.csv

Histogram

[Figure: histogram of Infection.Rate, ex2]

Statistical inferences

with(ex2, t.test(Infection.Rate[Trt == "Pre"], Infection.Rate[Trt == "Post"], paired = FALSE))

    Welch Two Sample t-test

data:  Infection.Rate[Trt == "Pre"] and Infection.Rate[Trt == "Post"]
t = 13.707, df = 97.997, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 4.167357 5.578359
sample estimates:
mean of x mean of y 
 8.490436  3.617578 

Statistical inferences (cont'd)

with(ex2, wilcox.test(Infection.Rate[Trt == "Pre"], Infection.Rate[Trt == "Post"], paired = FALSE))

    Wilcoxon rank sum test with continuity correction

data:  Infection.Rate[Trt == "Pre"] and Infection.Rate[Trt == "Post"]
W = 2435, p-value = 3.195e-16
alternative hypothesis: true location shift is not equal to 0

So with either the parametric or the nonparametric test, the pre vs post difference is statistically significant.

Data display when adding an additional dimension - time

[Figure: the data plotted against time]

If these data points were infection rates for some disease, would we conclude that there is no interesting story in Ex. 1?

What are the problems with comparing only the pre vs post mean or median?
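
One possible answer can be sketched with a small simulation. The series below is entirely made up (the trend of 0.1 per time unit, the drop of 5, and the intervention after time 50 are assumptions for illustration, not properties of ex1.csv): it rises steadily, drops sharply at the intervention, and then keeps rising. Because the comparison ignores time, the two group means come out close and the trend inflates the within-group spread.

```r
# Simulated series for illustration only (assumed values, not ex1.csv)
set.seed(3)
time <- 1:100
trt  <- ifelse(time <= 50, "Pre", "Post")   # hypothetical intervention after time 50

# Rising trend of 0.1 per time unit, with a drop of 5 at the intervention
rate <- 2 + 0.1 * time - 5 * (trt == "Post") + rnorm(100, sd = 0.5)

# Pre vs post comparison that ignores the time ordering of the observations
res <- t.test(rate[trt == "Pre"], rate[trt == "Post"])
res$estimate   # the two group means
res$p.value    # p-value of the Welch t-test
```

In this constructed case a real level change is present by design, yet the two-sample test has little chance of detecting it, because the trend inside each period is treated as noise.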

Piecewise linear regression

[Figure: piecewise linear regression fit]
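
A minimal sketch of one common way to fit a piecewise (segmented) linear regression with `lm()`, assuming a single known intervention time. The data are simulated, and the names `t0`, `post`, and `time_since` are hypothetical choices for this example rather than anything taken from the slides or the CSV files.

```r
# Simulated series for illustration only (not ex1.csv or ex2.csv)
set.seed(2)
n    <- 100
t0   <- 50                          # hypothetical last pre-intervention time point
time <- 1:n
post <- as.numeric(time > t0)       # 0 = pre, 1 = post
time_since <- pmax(0, time - t0)    # time elapsed since the intervention

# Assumed truth: rising pre-trend, level drop of 5, no change in slope
y <- 2 + 0.1 * time - 5 * post + rnorm(n, sd = 0.5)

# Coefficients: baseline level, baseline slope, level change, slope change
fit <- lm(y ~ time + post + time_since)
round(coef(summary(fit)), 3)
```

Here the `post` coefficient estimates the jump in level at the intervention and `time_since` estimates the change in slope afterwards. Ordinary least squares assumes independent errors, so with real time series data the residuals would need to be checked for autocorrelation.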