Basically, statisticians don’t like anything that is nonlinear. Nonlinear functions like the exponential, the logarithm, or the square root come up all the time, but since we don’t like nonlinear functions, a common strategy is to find a way to approximate a nonlinear function with a linear one. The most common way to do that is with what’s called a Taylor series expansion. In particular, we use a first-order Taylor series expansion (statisticians call this the delta method).

The Taylor series expansion works like this. Suppose you have a function \(f(x)\), where the function \(f\) could be the log or the exponential, or any nonlinear function. Then we can approximate \(f(x)\) with the following equation

\[ f(x) \approx f(x_0) + f^\prime(x_0)(x - x_0) \]

where \(x_0\) is any arbitrary point that I pick. It could be 3 or whatever. However, the choice of \(x_0\) will determine how good an approximation this is. \(f^\prime(x_0)\) is the first derivative of \(f\) evaluated at the point \(x_0\).

Now notice that in the equation above, if you view it as a function of \(x\), it is a linear function (literally a line). If you think of the equation of a line as \(m x + b\), then \(m = f^\prime(x_0)\) and \(b = f(x_0) - f^\prime(x_0)x_0\) (if you do a little rearranging of terms). And remember, lines are good!
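If it helps to see that as code, here’s a tiny R sketch (the name taylor1 is just something I made up for illustration):

taylor1 <- function(f, fprime, x0) {
        ## Return the tangent-line approximation of f at x0,
        ## i.e. the function f(x0) + f'(x0) * (x - x0)
        function(x) f(x0) + fprime(x0) * (x - x0)
}
f_approx <- taylor1(log, function(x) 1/x, 2)
f_approx(2)  ## 0.6931472, equal to log(2), since the line touches the curve there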

We can see how this works in practice. Here’s a picture of the log() function plotted from \(x=1\) to \(x=3\). Despite statisticians’ dislike of nonlinear functions, this is one of our favorites.

curve(log(x), 1, 3)

Notice that it’s not exactly a line; it’s a bit curved. However, it’s not that curved and you could kind of see it being close to some sort of line.

Recall that the first derivative of \(\log(x)\) is simply \(1 / x\), and so if we want to do a first-order Taylor series expansion of \(\log(x)\), we can do

\[ \log(x) \approx \log(x_0) + \frac{1}{x_0}(x - x_0) \]

where again, \(x_0\) is just something we choose. Obviously, we cannot choose \(x_0 = 0\) or we would be dividing by \(0\) (and \(\log(0)\) isn’t defined anyway). But any other value \(> 0\) would be fine.

Below I show what happens if we choose \(x_0 = 1.5\). The black curve is the log function and the red line is the Taylor series linear approximation.

curve(log(x), 1, 3)
curve(log(1.5) + (1/1.5) * (x - 1.5), 1, 3, add = TRUE, col = 2, lwd = 2)

You’ll notice that at \(x = 1.5\) the two curves touch, so the red line is tangent to the black curve there. Right around \(x = 1.5\) you can see that the red line and the black curve are pretty close, meaning that the linear approximation is good. But around \(x = 2.5\), the two diverge and the approximation gets worse.
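To put a couple of numbers on that (using the same formula as the red line):

log(1.6)                            ## 0.4700036
log(1.5) + (1/1.5) * (1.6 - 1.5)    ## 0.4721318, pretty close
log(2.5)                            ## 0.9162907
log(1.5) + (1/1.5) * (2.5 - 1.5)    ## 1.0721318, noticeably off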

What would happen if I chose \(x_0 = 2\)? That gives us this plot.

curve(log(x), 1, 3)
curve(log(2) + (1/2) * (x - 2), 1, 3, add = TRUE, col = 2, lwd = 2)

Now you can see that the approximation is really good between about \(x = 1.7\) and \(x = 2.5\) or so. But it gets pretty bad around \(x = 1\). There’s no free lunch here. A linear approximation from a Taylor series expansion will only work in a small region of a function.
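One quick way to see the “small region” point numerically (just a sketch; the grid spacing is arbitrary):

x <- seq(1, 3, 0.01)
err1 <- abs(log(x) - (log(1.5) + (1/1.5) * (x - 1.5)))  ## errors with x0 = 1.5
err2 <- abs(log(x) - (log(2) + (1/2) * (x - 2)))        ## errors with x0 = 2
max(err1)  ## about 0.31, worst at x = 3
max(err2)  ## about 0.19, worst at x = 1

Neither choice of \(x_0\) wins everywhere; each is best near its own \(x_0\).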

The problem that your mom and I were dealing with essentially goes like this. Suppose you collected some data on air pollution levels in people’s homes. Let’s say there were 100 people. You could imagine that the data looked something like this histogram.

set.seed(1)                      ## make the simulation reproducible
pollution <- rlnorm(100, 1, .9)  ## 100 simulated (log-normal) pollution values
hist(pollution)
rug(pollution)                   ## one tick per person along the bottom

The black ticks on the bottom of the plot show what everyone’s levels were, and the bars just group them into ranges. Two common numbers used to summarize this kind of data are the mean (or average) and the standard deviation. Here, the mean is 4.12 and the standard deviation is 3.77. The standard deviation is particularly important for doing all kinds of other calculations, like sample size and power (which I won’t talk about here).
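You can check those numbers from the simulated data yourself:

mean(pollution)  ## 4.118896
sd(pollution)    ## 3.765876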

This plot is typical and you’ll notice that it doesn’t exactly have a nice “bell-shape”. However, if you take the logarithm of each data point and make a histogram, you get this:

hist(log(pollution))
rug(log(pollution))

Now that’s better! Not a perfect bell, but pretty close. Statisticians also like bell-shaped histograms. They make us happy, just like linear functions.

The question now is: what is the standard deviation of this log-transformed dataset? It turns out that you can actually calculate this without using the log-transformed data. All you need is the mean and standard deviation of the original non-bell-shaped dataset (which were 4.12 and 3.77).

This gets a little hand-wavy now, but because we know that

\[ \log(x) \approx \log(x_0) + \frac{1}{x_0}(x - x_0) \]

we can say that (just trust me)

\[ \text{SD}(\log(x)) \approx \frac{\text{SD}(x)}{\text{mean}(x)} \]
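If you want a peek behind the hand-waving: the trick is to take \(x_0\) to be the mean of \(x\). Adding a constant doesn’t change a standard deviation, and multiplying by \(1/x_0\) scales it by \(1/x_0\), so

\[ \text{SD}\left(\log(x_0) + \frac{1}{x_0}(x - x_0)\right) = \frac{1}{x_0}\,\text{SD}(x) = \frac{\text{SD}(x)}{\text{mean}(x)} \]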

Or in other words

\[ \text{SD}(\log(x)) \approx \frac{3.765876}{4.1188958} = 0.9142926 \]

The actual standard deviation of the log-transformed data is 0.8083794, so we are a little off. That’s the nature of an approximation!
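Here’s the whole check in R, using the pollution data from before:

sd(pollution) / mean(pollution)  ## 0.9142926, the delta method approximation
sd(log(pollution))               ## 0.8083794, the actual standard deviation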

This might all seem a bit pointless with this simple example, but there are many situations when we need to transform summary statistics of the data with a logarithm or some other nonlinear function and it’s not possible to go back to the data to recalculate the mean and standard deviation (maybe the dataset’s too big or we no longer have it). So these mathematical approximations are extremely useful. We use them all the time.