Full Documentation For: An investigation of the relationship between long bone measurements and stature: Implications for estimating skeletal stature in subadults.
Overview
This research compendium provides code and steps needed to replicate the results of the publication:
Chu, E.Y. & Stull, K.E. (In press). An investigation of the relationship between long bone measurements and stature: Implications for estimating skeletal stature in subadults. International Journal of Legal Medicine
To run the code for this paper locally, please ensure that R (and, optionally, RStudio) is installed on your local system.
Additionally, the following packages are used and should be installed
using install.packages("package_name"):
* tidyverse
* magrittr
* caret
* car
* gridExtra
* ggrepel (loaded in Setup below)
* propagate (called later for nonlinear prediction intervals)
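For convenience, all of the above can be installed with a single call:

install.packages(c("tidyverse", "magrittr", "caret", "car", "gridExtra",
                   "ggrepel", "propagate"))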
How to cite
Please cite this compendium as:
Chu, E.Y. & Stull, K.E. (2024). Compendium of R code for “An investigation of the relationship between long bone measurements and stature: Implications for estimating skeletal stature in subadults”. Accessed [Current Date].
The Data
All data are from the Subadult Virtual Anthropology Database. Specifically, this project uses data from the United States (U.S.) which can be downloaded HERE. For this specific analysis, the U.S. data were filtered for individuals who had at least one long bone measurement and had known stature. Additionally, the data were visually checked and outliers were removed. The final data are provided in a GitHub repository, which can be accessed at: https://github.com/ElaineYChu/chu-and-stull_nonlinear-stature. Instructions for “cloning the repository” can be found in the README.
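The upstream filtering is not part of this compendium's code, but a minimal sketch of the "known stature" and "at least one long bone measurement" filters, assuming the downloaded U.S. data sit under a hypothetical filename and that the measurement columns span FDL through lower (as in the code below), could be:

library(dplyr)
us_full <- readRDS("data/us_full.rds")  # hypothetical filename for the downloaded U.S. data
us_filtered <- us_full %>%
  filter(!is.na(stature)) %>%                  # known stature
  filter(if_any(FDL:lower, ~ !is.na(.x)))      # at least one long bone measurement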
Organization
This document is divided into three sections:
1. Data Manipulation
2. Data Analysis
3. Data Visualization
Setup
After cloning the repository to your own system, the following code is used to set up your local R/RStudio environment:
rm(list=ls()) # clears the global environment
## Check that the cloned repository is set as the working directory
if (!(grepl("chu-and-stull_nonlinear-stature", getwd()))) {
stop("Please check your working directory")
}
## Load required libraries
library(tidyverse)
library(magrittr)
library(caret)
library(car)
library(gridExtra)
library(ggrepel)
## Initialize some objects for visualizations
source("elaine_theme.R")
colors <- c("#073b4c","#0b6583","#43c4ef","#06efb1",
"#ffd166","#ef436b","#cccccc","#444444")
xlab <- "Age (years)"
## Initialize seed from random.org
seed <- 167416

Data Manipulation
1. Import Data
- Located in data folder
- Stored as .rds file
2. Split into Training and Testing Sets
- 80% Training and 20% Testing
- Uses createDataPartition() from package caret
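The import itself is a single readRDS() call; a sketch, assuming a hypothetical filename for the .rds file provided in the repository's data folder:

og <- readRDS("data/us_data.rds")  # hypothetical filename; use the .rds file in the data folder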
base <- og %>% filter(!is.na(stature)) %>% droplevels() # must have stature
set.seed(seed) # for data reproducibility
idx <- createDataPartition(base$SEX, p=0.8, list=F)
train <- base[idx, ] # Pooled training set
test <- base[-idx, ] # Pooled testing set
train_F <- train %>% filter(SEX=="F") # Female training set
train_M <- train %>% filter(SEX=="M") # Male training set
test_F <- test %>% filter(SEX=="F") # Female testing set
test_M <- test %>% filter(SEX=="M") # Male testing set
## Store resulting sets in data folder
saveRDS(train, "data/train.rds")
saveRDS(test, "data/test.rds")
saveRDS(train_F, "data/train_F.rds")
saveRDS(train_M, "data/train_M.rds")
saveRDS(test_F, "data/test_F.rds")
saveRDS(test_M, "data/test_M.rds")

3. Define Columns that Contain Long Bone Measurements
- Column numbers
- Give them better (more descriptive) labels
uni_vars <- 9:28
uni_labels <- c("Femur Length (mm)",
"Femur Midshaft Breadth (mm)",
"Femur Distal Breadth (mm)",
"Tibia Length (mm)",
"Tibia Proximal Breadth (mm)",
"Tibia Midshaft Breadth (mm)",
"Tibia Distal Breadth (mm)",
"Fibula Length (mm)",
"Humerus Length (mm)",
"Humerus Proximal Breadth (mm)",
"Humers Midshaft Breadth (mm)",
"Humerus Distal Breadth (mm)",
"Radius Length (mm)",
"Radius Proximal Breadth (mm)",
"Radius Midshaft Breadth (mm)",
"Radius Distal Breadth (mm)",
"Ulna Length (mm)",
"Ulna Midshaft Breadth (mm)",
"Upper Limb Length (mm)",
"Lower Limb Length (mm)")Assumptions Testing
A Shapiro-Wilk test was used to test each long bone measurement for normality, given the large sample size:
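Note that the object data used below (and again in the correlation and plotting code) is not created in the snippets above; a hedged assumption is that it holds the full filtered dataset, e.g.:

data <- base  # assumption: 'data' refers to the full filtered (pooled) dataset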
sw_df <- as.data.frame(matrix(nrow=length(uni_vars), ncol=3))
names(sw_df) <- c("var", "W", "p")
for(i in 1:length(uni_vars)){
sw <- shapiro.test(data[[uni_vars[i]]])
sw_df[i, "var"] <- names(data)[uni_vars[i]]
sw_df[i, "W"] <- round(sw$statistic, 4)
sw_df[i, "p"] <- signif(sw$p.value, 4)
}
sw_df

## var W p
## 1 FDL 0.8801 5.603e-25
## 2 FMSB 0.9396 5.024e-18
## 3 FDB 0.9358 1.198e-19
## 4 TDL 0.8815 5.154e-25
## 5 TPB 0.9236 1.596e-21
## 6 TMSB 0.9365 9.102e-19
## 7 TDB 0.9235 2.342e-21
## 8 FBDL 0.8818 4.949e-25
## 9 HDL 0.8675 3.428e-25
## 10 HPB 0.9486 6.048e-15
## 11 HMSB 0.9657 2.574e-13
## 12 HDB 0.9311 1.715e-19
## 13 RDL 0.8779 1.592e-24
## 14 RPB 0.9389 2.199e-18
## 15 RMSB 0.9625 1.057e-13
## 16 RDB 0.9373 4.985e-19
## 17 UDL 0.8850 6.195e-24
## 18 UMSB 0.9747 9.058e-11
## 19 upper 0.8617 2.894e-25
## 20 lower 0.8767 4.534e-25
Data are not normally distributed, which suggests the use of nonparametric tests.
Data Analysis
1. Calculate Correlation Coefficients (Kendall’s Tau)
Correlation coefficients (\(r\)) are calculated between stature and all diaphyseal dimensions. Kendall’s tau was used for a conservative estimate of \(r\), especially considering the generally nonlinear relationship between diaphyseal dimensions and stature during ontogeny.
cor_mat <- cor(data[uni_vars],data["stature"], method="kendall", use="complete.obs")
cor_mat <- as.data.frame(cor_mat)
cor_mat$var <- rownames(cor_mat)
colnames(cor_mat) <- c("r","var")
cor_mat <- cor_mat[order(cor_mat$r, decreasing=T),] # reorder by decreasing r
cor_mat$r <- signif(cor_mat$r, digits=3)
write.csv(cor_mat, "results/correlations.csv", row.names=F)
cor_mat

## r var
## lower 0.907 lower
## FBDL 0.907 FBDL
## HDL 0.905 HDL
## FDL 0.905 FDL
## TDL 0.905 TDL
## upper 0.904 upper
## UDL 0.895 UDL
## RDL 0.895 RDL
## FDB 0.871 FDB
## TPB 0.869 TPB
## TDB 0.858 TDB
## HDB 0.854 HDB
## FMSB 0.839 FMSB
## HPB 0.830 HPB
## RPB 0.826 RPB
## HMSB 0.820 HMSB
## TMSB 0.819 TMSB
## RDB 0.810 RDB
## RMSB 0.808 RMSB
## UMSB 0.759 UMSB
All correlations yielded strong, positive relationships with stature. From the full cor_mat, we can see that all of the diaphyseal lengths have the strongest relationships, followed by the lower limb distal and proximal breadths, then the upper limb breadths.
- The strongest relationship: lower (0.907)
- The weakest relationship: UMSB (0.759)
With these strong, positive relationships confirmed, I move on to generating both linear and nonlinear models to estimate stature.
2. Stature Estimation Equations
First, let’s evaluate the type of bivariate relationship that femur length and femur distal breadth each have with stature:
plot1 <- ggplot(data, aes(x=FDL, y=stature)) + elaine_theme +
geom_point(aes(shape=meas_type), alpha=0.2, size=2, fill="grey85") +
geom_smooth(method="lm", lty="dashed", se=F, color="black") +
geom_smooth(method="loess", lty="solid", se=F, color="black") +
scale_shape_manual(values=c(24,21)) +
labs(x=uni_labels[1], y="Stature (cm)", title="a)") +
theme(legend.position="none", plot.title=element_text(hjust=0))
plot2 <- ggplot(data, aes(x=FDB, y=stature)) + elaine_theme +
geom_point(aes(shape=meas_type), alpha=0.2, size=2, fill="grey85") +
geom_smooth(method="lm", lty="dashed", se=F, color="black") +
geom_smooth(method="loess", lty="solid", se=F, color="black") +
scale_shape_manual(values=c(24,21)) +
labs(x=uni_labels[3], y="Stature (cm)", title="b)") +
theme(legend.position="none", plot.title=element_text(hjust=0))
grid.arrange(plot1, plot2, ncol=2)

## `geom_smooth()` using formula = 'y ~ x'
Figure 1 description: squares represent diaphyseal length, whereas triangles represent maximum length, which includes the epiphyses.
From the plots, none of the relationships are quite linear. Traditionally, subadult (diaphyseal dimension) stature estimation equations have used linear regression (Ruff, 2007; Smith, 2007). The present data appear to follow a relatively linear pattern for the length measures, but individuals at the lower end of lower limb length / stature, as well as at the middle values of lower limb length, are “missed” by the linear regression, so both types of models will be explored. Let’s try the following:
- Linear: \(y=mx+b\)
- 3-parameter exponential (similar to power law): \(y=a−be^{−cx}\)
- 3-parameter logistic (sigmoidal): \(y=\frac{a}{(1+be^{−cx})}\)
Let’s go!
2a. Linear Models
In this section, two types of linear regression models are constructed:
- Univariate linear models
- Stepwise multiple linear models
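The linear fits are referenced later as lm_models but are not reproduced here; a minimal sketch of the univariate fits, assuming each model is fit on the pooled training set under its original column name (so predict() can find that column in test), is:

## Sketch of the univariate linear fits (assumed; the published code may differ)
lm_models <- list()
for (i in uni_vars) {
  var <- names(train)[i]
  f <- as.formula(paste("stature ~", var))  # e.g., stature ~ FDL
  lm_models[[var]] <- lm(f, data=train)     # rows with NA in the predictor are dropped
}
## A stepwise multiple model could start from step(), e.g.:
## step(lm(stature ~ ., data=na.omit(train[c(which(names(train)=="stature"), uni_vars)])))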
2b. Nonlinear Models
Many many MANY nonlinear models were tested using nls() on the training data and visually assessed to find the models that best fit the data. Equations and nls() usage were adapted from: https://data-flair.training/blogs/r-nonlinear-regression/
- 3-parameter asymptotic exponential: \(y=a-be^{-cx}\) - closest for length (a=1500, b=1000, c=0.005)
- 3-parameter logistic: \(y=\frac{a}{(1+be^{-cx})}\) - great for breadth (a=1500, b=15, c=0.05)
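One quick, hedged sanity check of these starting values before calling nls() is to plot the implied curves over plausible measurement ranges (the ranges below, and the assumption that stature is recorded in mm consistent with a=1500, are mine):

curve(1500 - 1000*exp(-0.005*x), from=50, to=500,
      xlab="Diaphyseal length (mm)", ylab="Stature", main="3-parameter exponential")
curve(1500/(1 + 15*exp(-0.05*x)), from=5, to=80,
      xlab="Breadth (mm)", ylab="Stature", main="3-parameter logistic")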
Pooled
nls_uni <- list()
set.seed(seed)
for(i in uni_vars) {
var <- names(train)[i]
if (grepl("DL|upper|lower", var)) {
df <- data.frame(stature=train$stature, x=train[[var]])
nls_uni[[var]] <- tryCatch(
{
nls(stature~a-b*exp(-c*x), data=df,
start=list(a=1500, b=1000, c=0.005))
},
error=function(x) {
return(NA)
}
)
} else {
df <- data.frame(stature=train$stature, x=train[[var]])
nls_uni[[var]] <- tryCatch(
{
nls(stature~a/(1+b*exp(-c*x)), data=df,
start=list(a=1500, b=8, c=0.05))
},
error=function(x) {
return(NA)
}
)
}
}
saveRDS(nls_uni, "results/nonlinear_models.rds")

Only two variables had unsuccessful fits: TDB and RDB
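A quick way to confirm which fits failed (the tryCatch() above stores NA on error, so failed entries have length 1; this is the same length(model)==1 check the performance loop uses later):

names(nls_uni)[sapply(nls_uni, function(m) length(m) == 1)]  # expected: "TDB" "RDB"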
2c. Model Parameters
Extract the model coefficients and the residual standard deviation (“SEE”).
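Here, SEE is the residual standard error returned by sigma(): \(\mathrm{SEE}=\sqrt{\frac{\sum_i (y_i-\hat{y}_i)^2}{n-p}}\), where \(n\) is the number of observations and \(p\) is the number of estimated parameters.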
lm_params <- data.frame(matrix(nrow=length(lm_models), ncol=4))
names(lm_params) <- c("var","b","m","SEE")
for(i in 1:length(lm_models)) {
lm_params[i,"var"] <- names(lm_models)[i]
coefs <- lm_models[[i]]$coefficients
lm_params[i,2:(length(coefs)+1)] <- signif(coefs,3)
lm_params[i, "SEE"] <- signif(sigma(lm_models[[i]]), 3)
}
write.csv(lm_params, "results/linear_parameters.csv", row.names=F)

nls_models <- nls_uni # the nonlinear fits from above; equivalently, readRDS("results/nonlinear_models.rds")
nls_params <- data.frame(matrix(nrow=length(nls_models), ncol=5))
names(nls_params) <- c("var","a","b","c","SEE")
for(i in 1:length(nls_models)) {
if(names(nls_models)[i] %in% c("TDB","RDB")) {
next
}
nls_params[i,"var"] <- names(nls_models)[i]
coefs <- nls_models[[i]]$m$getAllPars()
nls_params[i,2:(length(coefs)+1)] <- signif(coefs, 3)
nls_params[i,"SEE"] <- signif(sigma(nls_models[[i]]), 3)
}
nls_params <- na.omit(nls_params)
write.csv(nls_params, "results/nonlinear_parameters.csv", row.names=F)

3. Model Comparisons - Accuracy and Precision
Now that the models are generated, calculate performance metrics for each model:
- Testing accuracy
- Bias (residuals)
- RMSE
- SEE
- MAD
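The perf_df container that both performance loops fill is never initialized in the snippets below; a minimal sketch, assuming one row per model (20 linear rows, then nonlinear rows at offsets j+20, with unused rows dropped later by the !is.na(perf_df$vars) filters):

perf_df <- data.frame(matrix(nrow=40, ncol=6))
names(perf_df) <- c("model","vars","KS_stat","KS_p","test_acc","MAD")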
Linear Testing Performance
for (i in 1:length(lm_models)) {
var <- names(lm_models)[i]
df0 <- data.frame(matrix(nrow=nrow(test),ncol=5))
names(df0) <- c("medrec","stature","point","lo","hi")
df0$medrec <- test$SVAD_medrec
df0$stature <- test$stature
model <- lm_models[[i]]
pred <- predict(model, newdata=test[names(model$model)], interval="prediction")
df0$point <- pred[,1]
df0$lo <- pred[,2]
df0$hi <- pred[,3]
df0$resid <- df0$stature - df0$point
write.csv(df0, paste0("results/",var,"_lm_predictions.csv"), row.names=F)
df <- na.omit(df0)
ks <- suppressWarnings(ks.test(df$point, df$stature))
perf_df$model[i] <- "linear"
perf_df$vars[i] <- ifelse(length(names(model$coefficients)[-1])>1,
paste(names(model$coefficients)[-1],collapse=", "),
names(model$coefficients)[-1])
correct <- which(df$stature <= df$hi & df$stature >= df$lo)
perf_df$test_acc[i] <- signif(length(correct) / nrow(df), 3)
perf_df$KS_stat[i] <- signif(ks$statistic, 3)
perf_df$KS_p[i] <- signif(ks$p.value, 3)
perf_df$MAD[i] <- signif(mad(df$resid, center=mean(df$resid)), 3)
}
write.csv(perf_df[1:length(lm_models),], "results/linear_performance.csv",
row.names=F)

Nonlinear Testing Performance
set.seed(seed)
for(i in 1:length(nls_models)) {
var <- names(nls_models)[i]
test0 <- na.omit(test[c("SVAD_medrec","stature",var)])
df0 <- data.frame(matrix(nrow=nrow(test0),ncol=5))
names(df0) <- c("medrec","stature","point","lo","hi")
df0$medrec <- test0$SVAD_medrec
df0$stature <- test0$stature
model <- nls_models[[i]]
if(length(model)==1) {
next
}
gc() # clear memory
pred <- propagate::predictNLS(model=model, newdata=data.frame(x=test0[[var]]),
interval="prediction")
df0$point <- pred$summary[[1]]
df0$lo <- pred$summary[[5]]
df0$hi <- pred$summary[[6]]
df0$resid <- df0$stature - df0$point
write.csv(df0, paste0("results/",var,"_nls_predictions.csv"), row.names=F)
}
pred_files <- list.files("results","_nls_predictions.csv")
for(j in 1:length(pred_files)) {
df0 <- read.csv(paste0("results/", pred_files[j]))
var <- stringr::str_remove(pred_files[j], "_nls_predictions.csv")
df <- na.omit(df0)
ks <- suppressWarnings(ks.test(df$point, df$stature))
perf_df$model[j+20] <- "nonlinear"
perf_df$vars[j+20] <- var
correct <- which(df$stature <= df$hi & df$stature >= df$lo)
perf_df$test_acc[j+20] <- signif(length(correct) / nrow(df), 3)
perf_df$KS_stat[j+20] <- signif(ks$statistic, 3)
perf_df$KS_p[j+20] <- signif(ks$p.value, 3)
perf_df$MAD[j+20] <- signif(mad(df$resid, center=mean(df$resid)), 3)
}
write.csv(perf_df[perf_df$model=="nonlinear" &
!is.na(perf_df$vars),],
"results/nonlinear_performance.csv", row.names=F)
write.csv(perf_df[!is.na(perf_df$vars),],
"results/model_performance.csv", row.names=F)

4. Model Evaluations
Now that all models have been constructed and performance metrics have been calculated, we can begin to provide overall recommendations for subadult stature estimation.
Full output of model performance:
## model vars KS_stat KS_p test_acc MAD
## 1 linear FDL 0.0778 0.648 0.961 5.10
## 2 linear FMSB 0.0889 0.476 0.956 7.31
## 3 linear FDB 0.0942 0.364 0.948 7.61
## 4 linear TDL 0.0829 0.563 0.978 5.65
## 5 linear TPB 0.0733 0.684 0.927 7.51
## 6 linear TMSB 0.0659 0.824 0.956 7.92
## 7 linear TDB 0.0785 0.598 0.958 8.34
## 8 linear FBDL 0.0820 0.570 0.973 4.85
## 9 linear HDL 0.0833 0.604 0.958 5.54
## 10 linear HPB 0.0544 0.981 0.966 8.31
## 11 linear HMSB 0.1140 0.206 0.972 10.20
## 12 linear HDB 0.0838 0.556 0.939 8.22
## 13 linear RDL 0.0888 0.519 0.988 6.14
## 14 linear RPB 0.0686 0.805 0.949 7.57
## 15 linear RMSB 0.0994 0.367 0.942 12.30
## 16 linear RDB 0.1000 0.329 0.933 9.04
## 17 linear UDL 0.0702 0.794 0.977 5.71
## 18 linear UMSB 0.1240 0.147 0.947 15.40
## 19 linear upper 0.0864 0.581 0.994 5.52
## 20 linear lower 0.0791 0.637 0.966 5.22
## 21 nonlinear FBDL 0.0601 0.896 0.967 3.65
## 22 nonlinear FDB 0.0733 0.684 0.958 6.96
## 23 nonlinear FDL 0.0500 0.978 0.956 4.06
## 24 nonlinear FMSB 0.0611 0.890 0.967 8.10
## 25 nonlinear HDB 0.0503 0.977 0.950 7.03
## 26 nonlinear HDL 0.0476 0.991 0.958 3.99
## 27 nonlinear HMSB 0.0909 0.461 0.943 9.98
## 28 nonlinear HPB 0.0544 0.981 0.959 7.75
## 29 nonlinear lower 0.0508 0.976 0.949 3.76
## 30 nonlinear RDL 0.0533 0.970 0.970 4.39
## 31 nonlinear RMSB 0.0819 0.615 0.930 11.90
## 32 nonlinear RPB 0.0629 0.880 0.914 7.84
## 33 nonlinear TDL 0.0552 0.945 0.961 3.82
## 34 nonlinear TMSB 0.0549 0.946 0.934 8.09
## 35 nonlinear TPB 0.0733 0.684 0.937 7.75
## 36 nonlinear UDL 0.0585 0.932 0.965 3.88
## 37 nonlinear UMSB 0.1010 0.360 0.953 15.90
## 38 nonlinear upper 0.0494 0.989 0.969 3.93
4a. Bland-Altman Plots
The goodness-of-fit tests, via the KS test, demonstrate an overall better “fit” of the nonlinear models compared to the linear ones. We can use Bland-Altman plots to illustrate this further:
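For reference, the solid line in the plots below marks the mean residual \(\bar{d}\) and the dotted lines mark the 95% limits of agreement, \(\bar{d} \pm 1.96\,s_d\), where \(d_i = y_i - \hat{y}_i\) and \(s_d\) is the standard deviation of the residuals (computed a few chunks below).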
# Import Linear Predictions
hdl_lm <- read.csv('results/HDL_lm_predictions.csv') %>% mutate(var="HDL")
hpb_lm <- read.csv('results/HPB_lm_predictions.csv') %>% mutate(var="HPB")
hmsb_lm <- read.csv('results/HMSB_lm_predictions.csv') %>% mutate(var="HMSB")
hdb_lm <- read.csv('results/HDB_lm_predictions.csv') %>% mutate(var="HDB")
rdl_lm <- read.csv('results/RDL_lm_predictions.csv') %>% mutate(var="RDL")
rpb_lm <- read.csv('results/RPB_lm_predictions.csv') %>% mutate(var="RPB")
rmsb_lm <- read.csv('results/RMSB_lm_predictions.csv') %>% mutate(var="RMSB")
rdb_lm <- read.csv('results/RDB_lm_predictions.csv') %>% mutate(var="RDB")
udl_lm <- read.csv('results/UDL_lm_predictions.csv') %>% mutate(var="UDL")
umsb_lm <- read.csv('results/UMSB_lm_predictions.csv') %>% mutate(var="UMSB")
upper_lm <- read.csv('results/upper_lm_predictions.csv') %>% mutate(var="upper")
fdl_lm <- read.csv('results/FDL_lm_predictions.csv') %>% mutate(var="FDL")
fmsb_lm <- read.csv('results/FMSB_lm_predictions.csv') %>% mutate(var="FMSB")
fdb_lm <- read.csv('results/FDB_lm_predictions.csv') %>% mutate(var="FDB")
tdl_lm <- read.csv('results/TDL_lm_predictions.csv') %>% mutate(var="TDL")
tpb_lm <- read.csv('results/TPB_lm_predictions.csv') %>% mutate(var="TPB")
tmsb_lm <- read.csv('results/TMSB_lm_predictions.csv') %>% mutate(var="TMSB")
tdb_lm <- read.csv('results/TDB_lm_predictions.csv') %>% mutate(var="TDB")
fbdl_lm <- read.csv('results/FBDL_lm_predictions.csv') %>% mutate(var="FBDL")
lower_lm <- read.csv('results/lower_lm_predictions.csv') %>% mutate(var="lower")
lm_pred <- rbind(hdl_lm, hpb_lm, hmsb_lm, hdb_lm, rdl_lm, rpb_lm, rmsb_lm, rdb_lm, udl_lm, umsb_lm, upper_lm, fdl_lm, fmsb_lm, fdb_lm, tdl_lm, tpb_lm, tmsb_lm, tdb_lm, fbdl_lm, lower_lm)
lm_pred %<>% mutate(type="Linear")
# Import Nonlinear predictions
hdl_nls <- read.csv('results/HDL_nls_predictions.csv') %>% mutate(var="HDL")
hpb_nls <- read.csv('results/HPB_nls_predictions.csv') %>% mutate(var="HPB")
hmsb_nls <- read.csv('results/HMSB_nls_predictions.csv') %>% mutate(var="HMSB")
hdb_nls <- read.csv('results/HDB_nls_predictions.csv') %>% mutate(var="HDB")
rdl_nls <- read.csv('results/RDL_nls_predictions.csv') %>% mutate(var="RDL")
rpb_nls <- read.csv('results/RPB_nls_predictions.csv') %>% mutate(var="RPB")
rmsb_nls <- read.csv('results/RMSB_nls_predictions.csv') %>% mutate(var="RMSB")
udl_nls <- read.csv('results/UDL_nls_predictions.csv') %>% mutate(var="UDL")
umsb_nls <- read.csv('results/UMSB_nls_predictions.csv') %>% mutate(var="UMSB")
upper_nls <- read.csv('results/upper_nls_predictions.csv') %>% mutate(var="upper")
fdl_nls <- read.csv('results/FDL_nls_predictions.csv') %>% mutate(var="FDL")
fmsb_nls <- read.csv('results/FMSB_nls_predictions.csv') %>% mutate(var="FMSB")
fdb_nls <- read.csv('results/FDB_nls_predictions.csv') %>% mutate(var="FDB")
tdl_nls <- read.csv('results/TDL_nls_predictions.csv') %>% mutate(var="TDL")
tpb_nls <- read.csv('results/TPB_nls_predictions.csv') %>% mutate(var="TPB")
tmsb_nls <- read.csv('results/TMSB_nls_predictions.csv') %>% mutate(var="TMSB")
fbdl_nls <- read.csv('results/FBDL_nls_predictions.csv') %>% mutate(var="FBDL")
lower_nls <- read.csv('results/lower_nls_predictions.csv') %>% mutate(var="lower")
nls_pred <- rbind(hdl_nls, hpb_nls, hmsb_nls, hdb_nls, rdl_nls, rpb_nls, rmsb_nls, udl_nls, umsb_nls, upper_nls, fdl_nls, fmsb_nls, fdb_nls, tdl_nls, tpb_nls, tmsb_nls, fbdl_nls, lower_nls)
nls_pred %<>% mutate(type="Nonlinear")
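The explicit reads above can also be expressed as a loop; a hedged, behavior-equivalent sketch (same file-naming pattern and resulting columns, up to row order):

## Sketch: read every prediction file for a given model type
read_preds <- function(suffix, label) {
  pattern <- paste0("_", suffix, "_predictions.csv")
  files <- list.files("results", pattern)
  purrr::map_dfr(files, function(f) {
    read.csv(file.path("results", f)) %>%
      dplyr::mutate(var = stringr::str_remove(f, pattern), type = label)
  })
}
## lm_pred <- read_preds("lm", "Linear")
## nls_pred <- read_preds("nls", "Nonlinear")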
# Combine datasets, save as .csv and .rds files
all_pred <- rbind(lm_pred, nls_pred)
all_pred %<>% mutate(mt = ifelse(grepl('DL$|upper|lower', var), "Length",
ifelse(grepl('PB$', var), "Proximal",
ifelse(grepl('MSB$', var), "Midshaft", "Distal"))))
all_pred$mt <- factor(all_pred$mt, levels=c("Length","Proximal","Midshaft","Distal"))
write.csv(all_pred, "results/all_predictions.csv", row.names=F)
saveRDS(all_pred, "results/all_predictions.rds")

all_pred$avg <- rowMeans(all_pred[c("stature","point")])
mean_diff <- mean(all_pred$resid)
lwr <- mean_diff - 1.96*sd(all_pred$resid)
upr <- mean_diff + 1.96*sd(all_pred$resid)
names(all_pred)[1] <- "SVAD_medrec"
all_pred2 <- left_join(all_pred, test[c("SVAD_medrec","meas_type","SEX","agey")], by="SVAD_medrec", relationship='many-to-many')
ggplot(all_pred2) + elaine_theme +
geom_hline(yintercept=mean_diff, color="black") +
geom_hline(yintercept=lwr, lty="dotted") +
geom_hline(yintercept=upr, lty="dotted") +
geom_point(aes(x=avg, y=resid), size=2, pch=21, fill=colors[4]) +
facet_grid(mt ~ type) +
labs(x="Mean of Known and Predicted Stature (cm)",
y="Residuals (cm)")