Why Does the World Look Linear?

Bob O'Hara

04 December, 2017

Short answer: I'm not sure

I want some help!

Help!

Why?

plot of chunk Irises

The world is non-linear

But linear models often work very well

A similar (?) problem: ANOVA

BIG main effects, smaller interactions

Linear Modelling of non-linear curves

Bill Venables: Exegeses on Linear Models

https://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

A Taylor Series

And thus, to (low order) polynomials
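
The expansion behind these two slides is presumably the standard Taylor series of a smooth \( f(\cdot) \) around some point \( x_0 \) (the expansion point and the notation are assumptions, not taken from the slides):

\[ f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots \]

Dropping the higher-order terms and collecting powers of \( x \) leaves a low-order polynomial, for example

\[ f(x) \approx \beta_0 + \beta_1 x + \beta_2 x^2 \]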

Approximations

The implication of linearity working: a low-order polynomial is a good approximation to \( f(\cdot) \).

But why?

A Relevant Digression

Why are interactions less important in ANOVA?

  Obs row col           Y
1   1   1   1 -0.05500017
2   2   1   1  0.14942563
3   3   1   1  1.32273283
4   4   1   1 -0.49204261
5   5   1   1 -0.99309299
6   6   1   1 -1.54031970

An ANOVA

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
row	1	363.25	363.25	344.57	0.00
col	1	0.05	0.05	0.04	0.84
row:col	1	1.00	1.00	0.95	0.33
Residuals	396	417.47	1.05	NA	NA

Lots of ANOVAs

N <- 100;  NRow <- 2;  NCol <- 2
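SimulateANOVA <- function(N = 100, NRow = 2, NCol = 2) {
  NClass <- NRow*NCol
  # One independent N(0,1) mean per row x col cell
  mu <- matrix(rnorm(NClass), nrow = NRow)
  Data <- expand.grid(Obs = 1:N, row = 1:NRow, col = 1:NCol)
  # Look up each observation's cell mean element-wise (two-column index matrix);
  # mu[Data$row, Data$col] would return a whole sub-matrix instead
  Data$Y <- rnorm(nrow(Data), mu[cbind(Data$row, Data$col)], 1)
  Data$row <- factor(Data$row); Data$col <- factor(Data$col)
  an <- anova(lm(Y~row*col, data = Data))
  # Sum of the two main-effect mean squares over the interaction mean square
  (an['row', 'Mean Sq'] + an['col', 'Mean Sq'])/an['row:col', 'Mean Sq']
}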

RepANOVA <- replicate(1e3, SimulateANOVA(N=100, NRow=2, NCol=2))

Main Effect MS/Interaction MS

plot of chunk PlotLotsofANOVAs

So...

We tend to get small interactions if the effects are independent.

Back to straight lines

A non-linear curve: only difference is size of noise

\[ y_i = \frac{x_i}{1+x_i} + \sigma \varepsilon_i \]

with \( \varepsilon_i \sim N(0,1) \)
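
A minimal sketch of how such data could be simulated. The range of x, the sample size, and the three values of sigma are assumptions (the slides do not give them), and `SimulateCurve` is a hypothetical helper name.

```r
# Hypothetical helper: y = x/(1+x) + sigma * eps, with eps ~ N(0,1)
SimulateCurve <- function(x, sigma) {
  x / (1 + x) + sigma * rnorm(length(x))
}

set.seed(1)                                         # arbitrary seed
x <- seq(0, 5, length.out = 50)                     # assumed design points
sigmas <- c(Small = 0.01, Medium = 0.1, Large = 1)  # assumed noise levels

# One simulated response per noise level: a 50 x 3 matrix
Sims <- sapply(sigmas, function(s) SimulateCurve(x, s))
```

Each column of `Sims` shares the same underlying curve; only the size of the noise differs.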

What it looks like

plot of chunk PlotSimCurve

Baseline ANOVA

	Small noise			Medium noise			Large noise
df	SS	F	Pr(>F)	SS	F	Pr(>F)	SS	F	Pr(>F)
1	0.65	9152.86	0.00	0.74	104.46	0.00	2.00	2.81	0.10
1	0.12	1689.78	0.00	0.12	16.62	0.00	0.10	0.14	0.71
1	0.02	324.92	0.00	0.03	4.43	0.04	0.19	0.26	0.61
1	0.00	61.77	0.00	0.01	1.09	0.30	0.09	0.13	0.72
1	0.00	2.47	0.12	0.01	1.79	0.18	1.88	2.65	0.11
1	0.00	12.62	0.00	0.04	5.35	0.02	3.40	4.79	0.03
1	0.00	4.77	0.03	0.02	2.75	0.10	1.83	2.58	0.11
1	0.00	0.02	0.88	0.00	0.01	0.94	0.01	0.01	0.92
1	0.00	0.06	0.81	0.00	0.12	0.73	0.09	0.12	0.73

Coefficients

plot of chunk PlotCubicEsts

As the residual error increases, the complexity of the "best" model decreases
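
One way to check this claim, as a rough sketch: pick the polynomial degree with the lowest AIC and see how the average choice changes with the noise. The use of AIC, the maximum degree, the noise levels and the helper name `BestDegree` are all assumptions; the slides do not say how the "best" model was chosen.

```r
set.seed(2)                       # arbitrary seed
x <- seq(0, 5, length.out = 50)   # assumed design points

# Hypothetical helper: simulate one data set and return the AIC-best degree
BestDegree <- function(sigma, MaxDegree = 6) {
  y <- x / (1 + x) + sigma * rnorm(length(x))
  aics <- sapply(1:MaxDegree, function(d) AIC(lm(y ~ poly(x, d))))
  which.min(aics)
}

sigmas <- c(0.01, 0.1, 1)         # assumed small, medium and large noise
# Average selected degree over 200 simulated data sets per noise level
MeanDegree <- sapply(sigmas, function(s) mean(replicate(200, BestDegree(s))))
```

Comparing the entries of `MeanDegree` shows whether lower-degree models are preferred as sigma grows.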

plot of chunk PlotDegreeEsts

But a straight line?

Fit a model \( y = x^p \), and optimise \( p \)
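
A sketch of one way to do this, assuming the power model is fitted with an intercept and slope, \( y = a + b x^p \), so that \( p = 1 \) is the ordinary straight line. That parameterisation, the search interval and the noise level are assumptions, not taken from the slides.

```r
set.seed(3)                       # arbitrary seed
x <- seq(0, 5, length.out = 50)   # assumed design points
y <- x / (1 + x) + 0.1 * rnorm(length(x))

# Residual sum of squares of y = a + b * x^p for a fixed power p
RSS <- function(p) deviance(lm(y ~ I(x^p)))

# One-dimensional search for the power that minimises the RSS
Opt <- optimise(RSS, interval = c(0.01, 3))

PHat <- Opt$minimum                # estimated power
Excess <- RSS(1) - Opt$objective   # extra RSS of the straight line (p = 1)
```

`Excess` is the quantity that should shrink, relative to the residual variance, as the noise grows.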

plot of chunk PlotPowers

On average, the estimated \( p \) increases with the noise, but its variance increases too

Difference between optimum & linear models decreases

plot of chunk PlotDeviance

So...

We have straight lines

More variance tends to favour fewer parameters

Increasing noise makes one curve look more linear

In ANOVA, interactions tend to be less important

What Now

Could look at orthogonal polynomials?

but what parameters?

For a general result, need to summarise over all possible curves

Maximum Entropy?

Any Ideas?

Help!