Why Does the World Look Linear?

Bob O'Hara

04 December, 2017

Short answer: I'm not sure

I want some help!

Help!

Why?

[Figure: plot of the iris data]

The world is non-linear

But linear models often work very well

A similar (?) problem: ANOVA

BIG main effects, smaller interactions

Linear Modelling of non-linear curves

Bill Venables: Exegeses on Linear Models

https://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

A Taylor Series

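For reference, this is the standard Taylor expansion of a smooth function \( f(\cdot) \) about a point \( x_0 \):

\[ f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \frac{f'''(x_0)}{3!}(x - x_0)^3 + \cdots \]

Keeping only the first two terms gives a straight line in \( x \). Keeping a few more gives a low-order polynomial, whose coefficients depend on the expansion point \( x_0 \).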

And thus, to (low order) polynomials

Approximations

If a linear model works, the implication is that a low-order polynomial is a good approximation to \( f(\cdot) \), at least over the range of the data.
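Here is a quick numerical check of this idea, given as a sketch. It fits polynomials of increasing degree to the noise-free curve \( f(x) = x/(1+x) \) used later in these slides. The range \( 0 \le x \le 1 \) and the 100-point grid are assumptions chosen for illustration.

```r
# Hypothetical grid: 100 equally spaced x values on [0, 1] (an assumption)
x <- seq(0, 1, length.out = 100)
f <- x / (1 + x)            # the non-linear curve, with no noise added

# Residual sum of squares of least-squares polynomial fits of degree 1 to 4
rss <- sapply(1:4, function(d) sum(resid(lm(f ~ poly(x, d)))^2))
print(signif(rss, 3))

# Proportion of the variation in f that a straight line alone captures
r2_linear <- summary(lm(f ~ x))$r.squared
print(round(r2_linear, 3))
```

On this grid the straight line already captures most of the variation in the curve. Each additional degree reduces the residual sum of squares further.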

But why?

A Relevant Digression

Why are interactions less important in ANOVA?

  Obs row col           Y
1   1   1   1 -0.05500017
2   2   1   1  0.14942563
3   3   1   1  1.32273283
4   4   1   1 -0.49204261
5   5   1   1 -0.99309299
6   6   1   1 -1.54031970

An ANOVA

           Df  Sum Sq  Mean Sq  F value  Pr(>F)
row         1  363.25   363.25   344.57    0.00
col         1    0.05     0.05     0.04    0.84
row:col     1    1.00     1.00     0.95    0.33
Residuals 396  417.47     1.05
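The F values in this table follow from the other columns. Each term's mean square is divided by the residual mean square, and the p-value is the upper tail of an F distribution with 1 and 396 degrees of freedom. The worked check below uses the rounded numbers from the table, so the last digit can differ slightly from the printed values.

```r
# Rounded values copied from the ANOVA table above
ms_resid <- 417.47 / 396                                     # residual mean square
ms_terms <- c(row = 363.25, col = 0.05, `row:col` = 1.00)    # df = 1, so MS = SS

# F = term MS / residual MS; p-value from the upper tail of F(1, 396)
F_values <- ms_terms / ms_resid
p_values <- pf(F_values, df1 = 1, df2 = 396, lower.tail = FALSE)
print(round(cbind(F = F_values, p = p_values), 2))
```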

Lots of ANOVAs

# Simulate an NRow x NCol two-way layout with N observations per cell.
# Each cell mean is drawn independently from N(0, 1); the function returns
# (row MS + col MS) / interaction MS from the two-way ANOVA.
SimulateANOVA <- function(N = 100, NRow = 2, NCol = 2) {
  mu <- matrix(rnorm(NRow*NCol), nrow = NRow)   # independent cell means
  Data <- expand.grid(Obs = 1:N, row = 1:NRow, col = 1:NCol)
  # Index mu by (row, col) pairs so each observation gets its own cell mean
  Data$Y <- rnorm(nrow(Data), mu[cbind(Data$row, Data$col)], 1)
  Data$row <- factor(Data$row); Data$col <- factor(Data$col)
  an <- anova(lm(Y ~ row*col, data = Data))
  (an['row', 'Mean Sq'] + an['col', 'Mean Sq'])/an['row:col', 'Mean Sq']
}

RepANOVA <- replicate(1e3, SimulateANOVA(N = 100, NRow = 2, NCol = 2))

Main Effect MS/Interaction MS

[Figure: main effect MS / interaction MS across the simulated ANOVAs]

So...

We tend to get small interactions if the effects are independent.

Back to straight lines

A non-linear curve, simulated at three noise levels: the only difference between them is the size of the noise, \( \sigma \)

\[ y_i = \frac{x_i}{1+x_i} + \sigma \varepsilon_i \]

with \( \varepsilon_i \sim N(0,1) \)
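Here is a minimal sketch of this simulation. The x values and the three noise levels (\( \sigma \) = 0.01, 0.1, 1) are hypothetical, because the slides do not give the values that were used. `SimCurve` is likewise a hypothetical helper name.

```r
set.seed(1)   # for reproducibility
# Hypothetical design: 100 x values on [0, 1] and three assumed noise levels
x <- seq(0, 1, length.out = 100)
sigmas <- c(Small = 0.01, Medium = 0.1, Large = 1)

# Hypothetical helper: y_i = x_i / (1 + x_i) + sigma * epsilon_i, epsilon_i ~ N(0, 1)
SimCurve <- function(x, sigma) x / (1 + x) + sigma * rnorm(length(x))
sims <- lapply(sigmas, function(s) SimCurve(x, s))

# One panel per noise level, with the true curve overlaid in red
op <- par(mfrow = c(1, 3))
for (lev in names(sims)) {
  plot(x, sims[[lev]], main = paste(lev, "noise"), ylab = "y")
  lines(x, x / (1 + x), col = "red")
}
par(op)
```

The same underlying curve is used at every noise level, so any change in how "linear" the data look comes only from \( \sigma \).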

What it looks like

[Figure: the simulated curve at each noise level]

Baseline ANOVA

          Small noise             Medium noise              Large noise
df      SS        F  Pr(>F)      SS        F  Pr(>F)      SS        F  Pr(>F)
 1    0.65  9152.86    0.00    0.74   104.46    0.00    2.00     2.81    0.10
 1    0.12  1689.78    0.00    0.12    16.62    0.00    0.10     0.14    0.71
 1    0.02   324.92    0.00    0.03     4.43    0.04    0.19     0.26    0.61
 1    0.00    61.77    0.00    0.01     1.09    0.30    0.09     0.13    0.72
 1    0.00     2.47    0.12    0.01     1.79    0.18    1.88     2.65    0.11
 1    0.00    12.62    0.00    0.04     5.35    0.02    3.40     4.79    0.03
 1    0.00     4.77    0.03    0.02     2.75    0.10    1.83     2.58    0.11
 1    0.00     0.02    0.88    0.00     0.01    0.94    0.01     0.01    0.92
 1    0.00     0.06    0.81    0.00     0.12    0.73    0.09     0.12    0.73
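The row labels of this table did not survive. One plausible way to produce a table like it is sketched below. It assumes that the rows are orthogonal polynomial terms of degree 1 to 9, fitted in sequence. Splitting `poly(x, 9)` into separate columns makes `anova()` report one line per degree. The design, the noise levels and the helper name `PolyANOVA` are all hypothetical.

```r
set.seed(1)
x <- seq(0, 1, length.out = 100)                      # hypothetical design
sigmas <- c(Small = 0.01, Medium = 0.1, Large = 1)    # hypothetical noise levels

# Hypothetical helper: sequential ANOVA with one row per polynomial degree
PolyANOVA <- function(y, x, degree = 9) {
  dat <- data.frame(y = y, poly(x, degree))   # columns y, X1, ..., X9
  anova(lm(y ~ ., data = dat))
}

tabs <- lapply(sigmas, function(s) {
  y <- x / (1 + x) + s * rnorm(length(x))
  PolyANOVA(y, x)
})

# Side-by-side SS, F and p-value for the Small, Medium and Large noise levels
tab <- do.call(cbind, lapply(tabs, function(a)
  as.matrix(a[1:9, c("Sum Sq", "F value", "Pr(>F)")])))
print(round(tab, 2))
```

Because the polynomial columns are orthogonal, each degree's sum of squares does not depend on the order in which the terms enter the model.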

Coefficients

[Figure: coefficient estimates from the cubic fit]

As the residual error increases, the complexity of the "best" model decreases
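The sketch below shows one way to pick the "best" model at each noise level. It compares polynomials of degree 1 to 9 by AIC and keeps the one with the minimum. Using AIC is an assumption here, since the slides do not say which criterion was used. The design, the noise levels and the helper name `BestDegree` are also hypothetical.

```r
set.seed(1)
x <- seq(0, 1, length.out = 100)                      # hypothetical design
sigmas <- c(Small = 0.01, Medium = 0.1, Large = 1)    # hypothetical noise levels

# Hypothetical helper: degree (1 to max_degree) of the polynomial with lowest AIC
BestDegree <- function(y, x, max_degree = 9) {
  aics <- sapply(1:max_degree, function(d) AIC(lm(y ~ poly(x, d))))
  which.min(aics)
}

# Repeat the simulation 100 times per noise level and record the chosen degree
best <- sapply(sigmas, function(s)
  replicate(100, BestDegree(x / (1 + x) + s * rnorm(length(x)), x)))
print(apply(best, 2, median))
```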

[Figure: degree of the best polynomial model at each noise level]

But a straight line?

Fit a model \( y = x^p \), and optimise \( p \)
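Below is a sketch of the power fit, which rests on one assumption: the model is taken as \( y = a + b x^p \), so that \( p = 1 \) recovers the straight line. For each \( p \), the intercept and slope are fitted by least squares, and \( p \) is then optimised with `optimize()`. The data, the search interval and the helper name `RSSPower` are hypothetical.

```r
set.seed(1)
x <- seq(0, 1, length.out = 100)              # hypothetical design
y <- x / (1 + x) + 0.01 * rnorm(length(x))    # small-noise data (assumed sigma)

# Hypothetical helper: RSS of the least-squares fit y = a + b * x^p, for fixed p
RSSPower <- function(p, x, y) {
  z <- x^p
  Syy <- sum((y - mean(y))^2)
  Szy <- sum((z - mean(z)) * (y - mean(y)))
  Szz <- sum((z - mean(z))^2)
  Syy - Szy^2 / Szz          # residual SS after regressing y on z
}

# a and b are profiled out; minimise the RSS over p in an assumed interval
fit <- optimize(RSSPower, interval = c(0.05, 3), x = x, y = y)
p_hat <- fit$minimum
print(round(p_hat, 3))
```

Profiling out \( a \) and \( b \) reduces the problem to a one-dimensional search, which is what `optimize()` handles.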

[Figure: optimised p at each noise level]

On average, the optimised \( p \) increases as the noise grows, but so does its variance

The difference in deviance between the optimal and linear models decreases as the noise grows

[Figure: deviance difference between the optimal and linear models]
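Repeating the power fit many times at each noise level gives a sketch of the pattern described above. The code records the spread of \( \hat{p} \) and the deviance difference between the linear and optimal-power models. For a Gaussian model with \( \sigma \) estimated, that difference is \( n \log(\mathrm{RSS}_{\text{linear}}/\mathrm{RSS}_{\hat{p}}) \). As before, the design, the noise levels, the search interval, the \( a + b x^p \) form and the helper names are assumptions.

```r
set.seed(1)
x <- seq(0, 1, length.out = 100)                      # hypothetical design
sigmas <- c(Small = 0.01, Medium = 0.1, Large = 1)    # hypothetical noise levels

# Hypothetical helper: RSS of the least-squares fit y = a + b * x^p, for fixed p
RSSPower <- function(p, x, y) {
  z <- x^p
  sum((y - mean(y))^2) - sum((z - mean(z)) * (y - mean(y)))^2 / sum((z - mean(z))^2)
}

# Hypothetical helper: optimal p and deviance difference for one simulated data set
OnePowerFit <- function(sigma) {
  y <- x / (1 + x) + sigma * rnorm(length(x))
  fit <- optimize(RSSPower, interval = c(0.05, 3), x = x, y = y)
  c(p = fit$minimum,
    dev_diff = length(y) * log(RSSPower(1, x, y) / fit$objective))
}

res <- lapply(sigmas, function(s) replicate(200, OnePowerFit(s)))

# Mean and SD of p-hat, and mean deviance difference, for each noise level
summary_tab <- t(sapply(res, function(r)
  c(mean_p = mean(r["p", ]), sd_p = sd(r["p", ]),
    mean_dev_diff = mean(r["dev_diff", ]))))
print(round(summary_tab, 3))
```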

So...

We have straight lines

More variance tends to favour fewer parameters

Increasing noise makes one curve look more linear

In ANOVA, interactions tend to be less important

What Now?

Could look at orthogonal polynomials?

  • but what parameters?
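One property makes orthogonal polynomials attractive here. R's `poly()` returns columns that are orthonormal over the observed x values and orthogonal to the intercept, so each degree's coefficient can be examined on its own. The small check below uses a hypothetical design.

```r
x <- seq(0, 1, length.out = 100)     # hypothetical design
P <- poly(x, 3)                      # orthogonal polynomials of degree 1 to 3

# Cross-products: the identity matrix, up to rounding error
print(round(crossprod(P), 10))
# Column sums are zero, so each column is orthogonal to the intercept
print(round(colSums(P), 10))

# With orthonormal columns, the least-squares slopes are simply t(P) %*% y
y <- x / (1 + x)
coef_lm <- coef(lm(y ~ P))[-1]
coef_direct <- drop(crossprod(P, y))
print(rbind(coef_lm, coef_direct))
```

The open question from the slide remains: this gives a convenient basis, but it does not say which distribution of coefficients to average over.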

For a general result, we need to summarise over all possible curves

  • Maximum Entropy?

Any Ideas?

Help!