Summary
Piecewise regression

You can use the segmented package to conduct piecewise regression after fitting a linear model to the data, and the linear model can have multiple explanatory variables. However, it is unclear from the documentation how different combinations of variables affect the estimated breakpoint, the slopes, or the overall piecewise regression (PR) fit.

I looked at a tip skewness plot that had an obvious breakpoint (extinction = 0, beta = 0.1), and one that could plausibly be linear (extinction = 0, beta = 1).

To each, I then fitted linear models with the following combinations of variables, and conducted PR on each fit (a minimal sketch of the workflow follows the list):

- shift only (skewness ~ shift)
- shift + diversification
- shift * diversification, i.e. including their interaction
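A minimal sketch, assuming the data live in a data frame I'll call sims (the variable and model names match the output below; the beta = 0.1 models were fitted the same way from their own data):

library(segmented)

# One linear model per combination of explanatory variables
mod1_1 <- lm(skewness1 ~ shift1, data = sims)
mod2_1 <- lm(skewness1 ~ shift1 + diversification1, data = sims)
mod3_1 <- lm(skewness1 ~ shift1 * diversification1, data = sims)

# Piecewise regression on each, with shift1 as the segmented variable
my_seg1_1 <- segmented(mod1_1, seg.Z = ~shift1)
my_seg2_1 <- segmented(mod2_1, seg.Z = ~shift1)
my_seg3_1 <- segmented(mod3_1, seg.Z = ~shift1)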

The breakpoint estimate was not dramatically affected by the variables used, although its standard error generally shrank as variables were added.

rbind(my_seg1_01$psi, my_seg2_01$psi, my_seg3_01$psi)
##              Initial      Est.     St.Err
## psi1.shift01     0.5 0.3570321 0.03005497
## psi1.shift01     0.5 0.3599519 0.02065618
## psi1.shift01     0.5 0.3605539 0.02083951
rbind(my_seg1_1$psi, my_seg2_1$psi, my_seg3_1$psi)
##             Initial      Est.     St.Err
## psi1.shift1     0.5 0.7224358 0.17497295
## psi1.shift1     0.5 0.7197169 0.11090791
## psi1.shift1     0.5 0.7152755 0.08143257

However, the slope values and y-axis range varied hugely depending on the linear model used, particularly for the data set without a clear breakpoint (beta = 1).
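For reference, a hedged sketch of how one of these plots can be drawn (plot.segmented overlays the fitted segments on an existing plot; sims is again my assumed data frame name):

# Raw data first, then the fitted piecewise line on top
plot(skewness1 ~ shift1, data = sims, col = "grey")
plot(my_seg1_1, add = TRUE, lwd = 2)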

The beta = 1 graph that looks wrong is the one with the best (lowest) AIC. This holds whether you compare the AICs of the linear regressions or of the PR fits.

AIC(mod1_1, mod2_1, mod3_1)
##        df       AIC
## mod1_1  3 -162.7520
## mod2_1  4 -313.4574
## mod3_1  5 -416.9325
AIC(my_seg1_1, my_seg2_1, my_seg3_1)
##           df       AIC
## my_seg1_1  5 -161.8425
## my_seg2_1  6 -317.3701
## my_seg3_1  7 -428.2621

Let’s look at the model summaries for the beta = 1 PR to see what’s going on.

Apparently you interpret the output like an ordinary regression model, except that the U1 estimate is the difference between the two slopes. You don't get a p-value for it because the usual t test isn't valid for that parameter when testing for the existence of a breakpoint, which is why the p-value column shows NA (the Davies test below is the recommended alternative).
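If you want the slopes of the two segments themselves rather than their difference, the package provides slope(), which reports each segment's slope with standard errors and confidence intervals:

# Per-segment slopes for each beta = 1 PR fit
lapply(list(my_seg1_1, my_seg2_1, my_seg3_1), slope)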

I was hoping that maybe some of the PR terms would not be significant, which might have explained the strange plots. But nothing jumps out: each model has at least one highly significant slope term (in the third model the main effects are not significant on their own, but the shift1:diversification1 interaction clearly is).

summary(my_seg1_1)
## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.lm(obj = mod1_1, seg.Z = ~shift1)
## 
## Estimated Break-Point(s):
##    Est. St.Err 
##  0.722  0.175 
## 
## Meaningful coefficients of the linear terms:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.00983    0.02425  -0.405    0.686    
## shift1      -0.31619    0.05798  -5.453 1.83e-07 ***
## U1.shift1    0.33025    0.27194   1.214       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1455 on 161 degrees of freedom
## Multiple R-Squared: 0.238,  Adjusted R-squared: 0.2238 
## 
## Convergence attained in 3 iterations with relative change -1.302083e-16
summary(my_seg2_1)
## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.lm(obj = mod2_1, seg.Z = ~shift1)
## 
## Estimated Break-Point(s):
##    Est. St.Err 
##  0.720  0.111 
## 
## Meaningful coefficients of the linear terms:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       0.19899    0.01996   9.971  < 2e-16 ***
## shift1           -0.31652    0.03608  -8.772 2.49e-15 ***
## diversification1 -0.26094    0.01632 -15.990  < 2e-16 ***
## U1.shift1         0.32725    0.16925   1.934       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09057 on 160 degrees of freedom
## Multiple R-Squared: 0.7067,  Adjusted R-squared: 0.6993 
## 
## Convergence attained in 2 iterations with relative change -1.690561e-16
summary(my_seg3_1)
## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.lm(obj = mod3_1, seg.Z = ~shift1)
## 
## Estimated Break-Point(s):
##    Est. St.Err 
##  0.715  0.081 
## 
## Meaningful coefficients of the linear terms:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              0.01530    0.02046   0.748    0.456    
## shift1                   0.05057    0.03907   1.294    0.197    
## diversification1        -0.03119    0.02175  -1.434    0.154    
## U1.shift1                0.32236    0.12059   2.673       NA    
## shift1:diversification1 -0.45950    0.03677 -12.497   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06453 on 159 degrees of freedom
## Multiple R-Squared: 0.852,  Adjusted R-squared: 0.8474 
## 
## Convergence attained in 2 iterations with relative change 0

Maybe the beta = 1 model doesn't have a significant difference between the slopes. This can be tested with the Davies test, implemented as davies.test() in the segmented package; it tests the null hypothesis that the two slopes are equal (i.e. that there is no breakpoint).

Here, the test is not significant for the first model (p = 0.20), while it is for the other two (p = 0.024 and p = 0.0009). This confirms the importance of model selection when using this package.

davies.test(mod1_1, seg.Z = ~ shift1)
## 
##  Davies' test for a change in the slope
## 
## data:  formula = skewness1 ~ shift1 ,   method = lm 
## model = gaussian , link = identity  
## segmented variable = shift1
## 'best' at = 0.66667, n.points = 8, p-value = 0.1956
## alternative hypothesis: two.sided
davies.test(mod2_1, seg.Z = ~ shift1)
## 
##  Davies' test for a change in the slope
## 
## data:  formula = skewness1 ~ shift1 + diversification1 ,   method = lm 
## model = gaussian , link = identity  
## segmented variable = shift1
## 'best' at = 0.66667, n.points = 8, p-value = 0.02385
## alternative hypothesis: two.sided
davies.test(mod3_1, seg.Z = ~ shift1)
## 
##  Davies' test for a change in the slope
## 
## data:  formula = skewness1 ~ shift1 * diversification1 ,   method = lm 
## model = gaussian , link = identity  
## segmented variable = shift1
## 'best' at = 0.66667, n.points = 8, p-value = 0.0008914
## alternative hypothesis: two.sided