Making Non-Linear Data Linear

Substitution Method

Inverse square relationships occur frequently in nature. Often times in situations where energy propagates with spherical symmetry (isotropically). Examples include how a light bulb’s brightness decreases with distance, electrostatic forces, and gravitational forces:

\[ B = \frac{L}{4\pi r^2} \space\text{Luminosity-Brightness} \\ F_e = \frac{q_1 q_2}{4\pi\epsilon_0 r^2} \space\text{Electrostatic Force} \\ F_g = \frac{G m_1 m_2}{ r^2} \space\text{Gravitational Force} \\ \]

Note that \(r\) denotes radial distance from the object of concern, q represents electric charge, m represents mass, \(\epsilon_0\) is electric permittivity in vacuum, G is the Universal Gravitation Constant.

The data plotted look like this. We will use The electrostatic force equation with \(q_1 = q_2 = 10^{-6} C\), and r in centimeters. Note \(\frac{1}{4\pi\epsilon_0} = 9*10^9 \frac{Nm^2}{C^2}\). I chose this example so that slope when the data are linearized will not be 1.

r <- c(1:10)/100
in_sq = (9*10^-3)/r^2
plot(r, in_sq, xlab = "r (m)", ylab = "F (N)")

If you try to do a linear regression:

fit <- lm(in_sq ~ r)
summary(fit)

## 
## Call:
## lm(formula = in_sq ~ r)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.206 -13.252  -5.676   7.073  48.587 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    47.52      14.78   3.214   0.0123 *
## r            -610.34     238.23  -2.562   0.0335 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.64 on 8 degrees of freedom
## Multiple R-squared:  0.4507, Adjusted R-squared:  0.382 
## F-statistic: 6.563 on 1 and 8 DF,  p-value: 0.03355

plot(r, in_sq, xlab = "r (m)", ylab = "F (N)")
abline(fit, col='blue')

hist(resid(fit))

qqnorm((resid(fit)))
qqline(resid(fit))

plot(fitted(fit),resid(fit))

You do get a statistically significant fit to the data, but you’ll note that the model violated our linear regression assumptions.

Let’s use an algebraic substitution to make the equation look linear: \[ x = \frac{1}{r^2} \\ F_e = 9*10^{-3} x \]

x <- 1/(c(1:10)/100)^2
in_sq = (9*10^-3)*x
plot(x, in_sq, xlab = "1/r^2 (m^-2)", ylab = "F (N)")

You’ll note that our force calculation has not changed, but the distance calculation has because we are plotting \(\frac{1}{{r^2}\) instead of \(r\). But, the plot looks very linear.

fit2 <- lm(in_sq ~ x)
summary(fit2)

## Warning in summary.lm(fit2): essentially perfect fit: summary may be
## unreliable

## 
## Call:
## lm(formula = in_sq ~ x)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -1.763e-15 -1.572e-15 -1.400e-15  7.271e-16  7.246e-15 
## 
## Coefficients:
##              Estimate Std. Error   t value Pr(>|t|)    
## (Intercept) 0.000e+00  1.103e-15 0.000e+00        1    
## x           9.000e-03  3.353e-19 2.684e+16   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.077e-15 on 8 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 7.203e+32 on 1 and 8 DF,  p-value: < 2.2e-16

plot(x, in_sq, xlab = "r (m)", ylab = "F (N)")
abline(fit2, col='blue')

hist(resid(fit2))

qqnorm((resid(fit2)))
qqline(resid(fit2))

plot(fitted(fit2),resid(fit2))

Note that in this case, The linear regression fails because the fit is perfect. Had this been an experiment with error, your residuals would meet the criteria.

This is a common method for Physicists to take non-linear data, and make it look linear so that linear regression can be performed.

The log-log Transformation.

The log-log transformation is good at making power-law relationships (\(y = \alpha x^k +\epsilon\)) look linear. In this case the breaking distance of a car from last week’s homework.

\[ \Delta KE = KE_f - KE_i\\ \text{The car stops.}\\ KE_f = 0\space J \\ - KE_i = W \\ \space\text{Recall work is negative.}\\ KE_i = W \\ \frac{1}{2}mv_i^2 = F*d \\ d = \frac{1}{2}\frac{m}{F}v_i^2 \\ \text{F, m and 1/2 are all constant, we can absorb them into a constant C.} \\ d = Cv_i^2 \\ log(d) = log(Cv_i^2)\\ log(d) = log(C) + 2log(v_i) \\ \text{Let: log(d) = y, B_o = log(C), x = log(v_i)} \\ y = B_o + 2x \]

fit3 <- lm(log(cars$dist) ~ log(cars$speed))
summary(fit3)

## 
## Call:
## lm(formula = log(cars$dist) ~ log(cars$speed))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.00215 -0.24578 -0.02898  0.20717  0.88289 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -0.7297     0.3758  -1.941   0.0581 .  
## log(cars$speed)   1.6024     0.1395  11.484 2.26e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4053 on 48 degrees of freedom
## Multiple R-squared:  0.7331, Adjusted R-squared:  0.7276 
## F-statistic: 131.9 on 1 and 48 DF,  p-value: 2.259e-15

plot(log(cars$speed), log(cars$dist), xlab = 'log(Speed) in log(mph)', ylab = 'log(Breaking distance) in log(m)', col = 'steelblue')
abline(fit3, col = 'orangered')

hist(resid(fit3))

plot(fitted(fit3), resid(fit3))

qqnorm(resid(fit3))
qqline(resid(fit3))

Note that after the log-log transformation the breaking distance linear regression model meets residual requirements whereas before they did not.

Making Non-Linear Data Linear

Nathan Cooper

November 15, 2017

Substitution Method

The log-log Transformation.