The SMATR package (Standardised Major Axis estimation and Testing Routines) is designed for when you are fitting lines and:
This situation commonly arises in allometry (the study of how one size variable scales against another), this is the main place these methods are useful in ecology.
These methods were NOT designed to deal with situations where you have measurement error in X (a very widely held misconception). That is a completely unrelated issue and measurement error should not be used to inform whether you use SMATR or not (nor does it inform which SMATR method you use).
WHY NOT USE REGRESSION (LM FUNCTION)?
Example: does brain size scale as the 2/3 power of body size? (Thus brain surface area would scale against mass)
library(MASS)
plot(Animals, log = "xy")
ftBrainBody = lm(log(Animals$brain) ~ log(Animals$body))
confint(ftBrainBody)
## 2.5 % 97.5 %
## (Intercept) 1.7057 3.4041
## log(Animals$body) 0.3353 0.6567
From a linear regression of brain size against body size, the estimated slope is about 0.5, 95% CI does not cover 2/3, so we would conclude that there is some evidence that the relationshpi is flatter than 2/3 scaling.
ftBodyBrain = lm(log(Animals$body) ~ log(Animals$brain))
confint(ftBodyBrain)
## 2.5 % 97.5 %
## (Intercept) -3.6397 0.3396
## log(Animals$brain) 0.8282 1.6219
Reversing axes now you might expect the slope to be 1/0.5=2, and the CI to not cover 1.5, the inverse of 2/3. Nope - the estimated slope is about 1.2 (much flatter than 2), and the 95% CI covers 1.5, so we would reach the opposite conclusion - there is no evidence that the relationship differs from 2/3 scaling.
The problem with using regression in allometry is regression to the mean, slopes are flatter than you might expect. Regression lines are designed for predicting Y from X and are great for that situation. But that is not what we are trying to do here - instead of minimising error predicting Y, we want to minimise the straight-line distance of points from the line, or something similiar).
WHAT DOES SMATR DO? The Major Axis (MA) minimises straight-line distance of points from the line, the Standardised Major Axis (SMA) first standardises data before doing so. SMA is scale invariant (doubling Y doubles the slope), MA is rotation invariant (who cares). Some have argued SMA is better if your variables are measured on different scales (e.g. weight in grams vs energy in Joules) hence their scaling is arbitrary. These methods are closely related to principal components analysis, and SMATR is designed for estimation and inference about (S)MA.
library(smatr)
## Loading required package: plyr
ft = sma(brain ~ body, data = Animals, log = "xy")
summary(ft)
## Call: sma(formula = brain ~ body, data = Animals, log = "xy")
##
## Fit using Standardized Major Axis
##
## These variables were log-transformed before fitting: xy
##
## Confidence intervals (CI) are at 95%
##
## ------------------------------------------------------------
## Coefficients:
## elevation slope
## estimate 0.8798 0.6363
## lower limit 0.4999 0.4956
## upper limit 1.2596 0.8170
##
## H0 : variables uncorrelated
## R-squared : 0.6076
## P-value : 1e-06
what happens if you reverse the axes? Do you get the same answer?
ft = sma(body ~ brain, data = Animals, log = "xy")
summary(ft)
## Call: sma(formula = body ~ brain, data = Animals, log = "xy")
##
## Fit using Standardized Major Axis
##
## These variables were log-transformed before fitting: xy
##
## Confidence intervals (CI) are at 95%
##
## ------------------------------------------------------------
## Coefficients:
## elevation slope
## estimate -1.3826 1.572
## lower limit -2.2585 1.224
## upper limit -0.5068 2.018
##
## H0 : variables uncorrelated
## R-squared : 0.6076
## P-value : 1e-06
to fit an MA instead:
ma(brain ~ body, data = Animals, log = "xy")
## Call: sma(formula = ..1, data = ..2, log = "xy", method = "MA")
##
## Fit using Major Axis
##
## These variables were log-transformed before fitting: xy
##
## Confidence intervals (CI) are at 95%
##
## ------------------------------------------------------------
## Coefficients:
## elevation slope
## estimate 0.9945 0.5662
## lower limit 0.5957 0.3930
## upper limit 1.3934 0.7697
##
## H0 : variables uncorrelated
## R-squared : 0.6076
## P-value : 1e-06
# TESTING FOR EVIDENCE AGAINST A SLOPE (of 2/3)
ft = sma(brain ~ body, data = Animals, log = "xy", slope.test = 2/3)
# USEFUL PLOTS
ft = ma(brain ~ body, data = Animals, log = "xy")
plot(ft) #plots the data with the line
plot(ft, which = "residual") #plots residuals
abline(a = 0, b = 0)
qqnorm(residuals(ft))
I've got to say I'm not happy about those outliers… do they have undue influence on the fit?
ROBUST SMATR
You can fit SMATR lines that are less sensitive to outliers. These methods were developed by Sara Taskinen (U Jyvaskyla, Finland) using Huber's M estimation (fancy method which downweights outliers).
ftRobust = sma(brain ~ body, data = Animals, log = "xy", robust = T)
summary(ftRobust)
## Call: sma(formula = brain ~ body, data = Animals, log = "xy", robust = T)
##
## Fit using Standardized Major Axis
##
## These variables were log-transformed before fitting: xy
##
## Confidence intervals (CI) are at 95%
##
## ------------------------------------------------------------
## Coefficients:
## elevation slope
## estimate 0.8368 0.7542
## lower limit 0.6057 0.6452
## upper limit 1.0679 0.8815
##
## H0 : variables uncorrelated
## R-squared : 0.6076
## P-value : 1e-06
note that while the interpretation hasn't changed here (2/3 still in interval) the fitted line is now much steeper and the CI is much narrower - it is getting thrown less by the outliers so gives a better estimate of the line
plot(ftRobust)
note that the line fits the bulk of the data better, it is not dragged down as much by the outliers
SMATR TO COMPARE LINES Sometimes you have multiple lines that you want to compare. It might be of interest to compare slopes, or to assume common slope and to compare elevations, or compare locations along a common axis. A cool graph of this, courtesy of Dan Falster (Macquarie):
Consider leaf longevity vs Leaf Mass per Area of a bunch of species at four sites with varying soil nutrients/rainfall:
data(leaflife)
ftComSlope = sma(longev ~ lma * site, log = "xy", data = leaflife)
summary(ftComSlope)
## Call: sma(formula = longev ~ lma * site, data = leaflife, log = "xy")
##
## Fit using Standardized Major Axis
##
## These variables were log-transformed before fitting: xy
##
## Confidence intervals (CI) are at 95%
##
## ------------------------------------------------------------
## Results of comparing lines among groups.
##
## H0 : slopes are equal.
## Likelihood ratio statistic : 9.382 with 3 degrees of freedom
## P-value : 0.025
## ------------------------------------------------------------
##
## Coefficients by group in variable "site"
##
## Group: 1
## elevation slope
## estimate -4.218 2.120
## lower limit -5.904 1.452
## upper limit -2.533 3.095
##
## H0 : variables uncorrelated.
## R-squared : 0.5039
## P-value : 0.0014
##
## Group: 2
## elevation slope
## estimate -2.322 1.1769
## lower limit -3.476 0.7632
## upper limit -1.168 1.8149
##
## H0 : variables uncorrelated.
## R-squared : 0.3407
## P-value : 0.014
##
## Group: 3
## elevation slope
## estimate -2.524 1.1825
## lower limit -3.135 0.9398
## upper limit -1.912 1.4879
##
## H0 : variables uncorrelated.
## R-squared : 0.7393
## P-value : 1.5e-07
##
## Group: 4
## elevation slope
## estimate -3.838 1.787
## lower limit -5.292 1.257
## upper limit -2.383 2.539
##
## H0 : variables uncorrelated.
## R-squared : 0.8065
## P-value : 0.00042
plot(ftComSlope)
Is there evidence that the slope of the lma-longevity SMA varies across sites?
Multiple comparisons - which sites differ from which in SMA slope? (using Sidak adjustment)
sma(longev ~ lma * site, log = "xy", data = leaflife, multcomp = T, multcompmethod = "adjust")
## Call: sma(formula = longev ~ lma * site, data = leaflife, log = "xy",
## multcomp = T, multcompmethod = "adjust")
##
## Fit using Standardized Major Axis
##
## These variables were log-transformed before fitting: xy
##
## Confidence intervals (CI) are at 95%
##
## ------------------------------------------------------------
## Results of comparing lines among groups.
##
## H0 : slopes are equal.
## Likelihood ratio statistic : 9.382 with 3 degrees of freedom
## P-value : 0.025
## ------------------------------------------------------------
##
## Results of multiple comparisons among groups.
##
## Test for pair-wise difference in slope :
## site_1 site_2 Pval TestStat
## 1 1 2 0.224 4.1606284
## 2 1 3 0.059 6.6183503
## 3 1 4 0.982 0.4852096
## 4 2 3 1.000 0.0003944
## 5 2 4 0.548 2.3667109
## 6 3 4 0.251 3.9426641
##
## ------------------------------------------------------------
## Use the summary() function to print coefficients by group.
Subset to sites 2 and 4 to compare elevation at these low soil P sites:
leaf.low.soilp <- subset(leaflife, soilp == "low")
ftElev <- sma(longev ~ lma + rain, log = "xy", data = leaf.low.soilp)
summary(ftElev)
## Call: sma(formula = longev ~ lma + rain, data = leaf.low.soilp, log = "xy")
##
## Fit using Standardized Major Axis
##
## These variables were log-transformed before fitting: xy
##
## Confidence intervals (CI) are at 95%
##
## ------------------------------------------------------------
## Results of comparing lines among groups.
##
## H0 : slopes are equal.
## Likelihood ratio statistic : 2.367 with 1 degrees of freedom
## P-value : 0.12
## ------------------------------------------------------------
##
## H0 : no difference in elevation.
## Wald statistic: 6.566 with 1 degrees of freedom
## P-value : 0.01
## ------------------------------------------------------------
##
## Coefficients by group in variable "rain"
##
## Group: high
## elevation slope
## estimate -3.141 1.551
## lower limit -4.080 1.109
## upper limit -2.202 2.012
##
## H0 : variables uncorrelated.
## R-squared : 0.3407
## P-value : 0.014
##
## Group: low
## elevation slope
## estimate -3.305 1.551
## lower limit -4.353 1.109
## upper limit -2.256 2.012
##
## H0 : variables uncorrelated.
## R-squared : 0.8065
## P-value : 0.00042
To test for evidence of shift along the SMA between sites:
ftShift <- sma(longev ~ lma + rain, log = "xy", type = "shift", data = leaf.low.soilp)
summary(ftShift)
## Call: sma(formula = longev ~ lma + rain, data = leaf.low.soilp, log = "xy",
## type = "shift")
##
## Fit using Standardized Major Axis
##
## These variables were log-transformed before fitting: xy
##
## Confidence intervals (CI) are at 95%
##
## ------------------------------------------------------------
## Results of comparing lines among groups.
##
## H0 : slopes are equal.
## Likelihood ratio statistic : 2.367 with 1 degrees of freedom
## P-value : 0.12
## ------------------------------------------------------------
##
## H0 : no shift along common axis.
## Wald statistic: 0.2091 with 1 degrees of freedom
## P-value : 0.65
## ------------------------------------------------------------
##
## Coefficients by group in variable "rain"
##
## Group: high
## elevation slope
## estimate -3.141 1.551
## lower limit -3.476 1.109
## upper limit -1.168 2.012
##
## H0 : variables uncorrelated.
## R-squared : 0.3407
## P-value : 0.014
##
## Group: low
## elevation slope
## estimate -3.305 1.551
## lower limit -5.292 1.109
## upper limit -2.383 2.012
##
## H0 : variables uncorrelated.
## R-squared : 0.8065
## P-value : 0.00042
There are robust versions of all of these tests too.
REFERENCES Warton et al (2006) Bivariate line-fitting methods for allometry, Biological Reviews http://onlinelibrary.wiley.com/doi/10.1017/S1464793106007007/abstract
Smith (2009) Use and misuse of the reduced major axis for line-fitting, American Journal of Physical Anthropology http://onlinelibrary.wiley.com/doi/10.1002/ajpa.21090/abstract
Warton et al (2012) smatr 3 - an R package for estimation and testing of allometric lines, Methods in Ecology and Evolution http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00153.x/full
Taskinen & Warton (2013) Robust tests for one or more allometric lines, Journal of Theoretical Biology http://www.sciencedirect.com/science/article/pii/S0022519313002257