Missing data are everywhere in real-world datasets and can easily mislead analyses if we ignore them. In this project I use R and several packages to explore how different missing-data strategies affect results.
First, I use the built-in airquality data and the
naniar package to visualize where values
are missing and to show that missingness can depend on time. Second, I
use the nhanes data from the
mice package to compare a complete-case
regression with a regression that uses multiple
imputation. Finally, I simulate an income–education dataset and compare three simple methods (complete cases, overall mean imputation, and regression imputation) with multiple imputation, showing how each method changes means and distribution shapes.
The results highlight both how easy it is to mishandle missing data and how R tools can make better approaches (especially multiple imputation) much more accessible.
In almost every real dataset, some values are missing. Survey respondents skip questions, labs lose samples, and sensors malfunction. A common first reaction is to drop any row with a missing value, but this “complete-case” strategy often wastes information and can introduce serious bias.
This paper focuses on how to handle missing data in R, rather than on one particular scientific dataset. My main goals are:
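- to visualize missing-data patterns with naniar;
- to run multiple imputation with mice;
- to compare simple missing-data methods on a simulated dataset where the true answer is known.

Conceptually, you can think of the main approaches in this paper as:

- Complete cases: drop any row that has a missing value.
- Overall mean imputation: replace each missing value with the mean of the observed values.
- Regression imputation: predict each missing value from a regression on the other variables.
- Multiple imputation (mice): create several different completed versions of the data, analyze each one, and then combine the results to reflect extra uncertainty about the missing values.

Throughout the paper I keep the code relatively simple and use small, well-known datasets when possible, so that someone with limited R experience can reproduce the analyses.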
The airquality dataset is built into base R. It contains
daily air quality measurements in New York from May to September
1973.
head(airquality)
## Ozone Solar.R Wind Temp Month Day
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6
summary(airquality)
## Ozone Solar.R Wind Temp
## Min. : 1.00 Min. : 7.0 Min. : 1.700 Min. :56.00
## 1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400 1st Qu.:72.00
## Median : 31.50 Median :205.0 Median : 9.700 Median :79.00
## Mean : 42.13 Mean :185.9 Mean : 9.958 Mean :77.88
## 3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500 3rd Qu.:85.00
## Max. :168.00 Max. :334.0 Max. :20.700 Max. :97.00
## NA's :37 NA's :7
## Month Day
## Min. :5.000 Min. : 1.0
## 1st Qu.:6.000 1st Qu.: 8.0
## Median :7.000 Median :16.0
## Mean :6.993 Mean :15.8
## 3rd Qu.:8.000 3rd Qu.:23.0
## Max. :9.000 Max. :31.0
##
Only two columns contain NA values: Ozone (37 missing) and Solar.R (7 missing). To see this at a glance, I use gg_miss_var() from the naniar package.
Conceptually, naniar does not “fix” missing data; it
helps you see the holes. You can think of it as an
X-ray for a data frame: instead of plotting values, it plots where the
NAs are and how they are distributed across variables and groups.
library(naniar)
library(ggplot2)
library(dplyr)
p_air_var <- gg_miss_var(airquality) +
ggtitle("Number of missing values by variable in airquality") +
xlab("Variables") +
ylab("Number of missing values")
p_air_var
Figure 1. Number of missing values by variable in the
airquality dataset, created with
naniar::gg_miss_var().
Figure 1 shows that missingness is highly uneven across variables: Ozone is missing on 37 of the 153 days and Solar.R on 7, while the other four variables are complete. In a real analysis this would be a warning sign that these variables need special attention.
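The counts behind Figure 1 can also be checked without naniar. This base-R sketch counts NAs per column and the share of rows that a complete-case analysis would drop; it only uses functions from base R and the built-in airquality data.

```r
# Count missing values in each column of airquality (base R only)
na_counts <- colSums(is.na(airquality))
na_counts

# How many rows survive if we drop every row with at least one NA?
n_complete <- sum(complete.cases(airquality))
c(complete_rows = n_complete, total_rows = nrow(airquality))
```

In this dataset, dropping incomplete rows keeps 111 of the 153 days, which is the sample a default lm() call on these variables would use.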
Next, I want to check whether missingness depends on month. If entire months have more missing values, complete-case analysis could distort trends over time.
airquality_month <- airquality %>%
mutate(Month = factor(Month))
p_air_month <- gg_miss_var(airquality_month, facet = Month) +
ggtitle("Number of missing values by variable, faceted by month") +
xlab("Variables") +
ylab("Number of missing values")
p_air_month
Figure 2. Number of missing values by variable, faceted by month. Some months have many more missing Ozone and Solar.R values than others.
Figure 2 adds another layer: some months have far more missing Ozone and Solar.R values than others. A complete-case analysis would therefore drop many more days from some months than from others, which could distort any time trends.
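The monthly pattern in Figure 2 can also be tabulated with exact numbers. This minimal base-R sketch uses tapply() to count and compute the share of missing Ozone values in each month; it complements the plot rather than replacing it.

```r
# Missing Ozone values per month: counts and shares (base R only)
ozone_na_count <- tapply(is.na(airquality$Ozone), airquality$Month, sum)
ozone_na_share <- tapply(is.na(airquality$Ozone), airquality$Month, mean)

# One row of counts, one row of shares, one column per month (5 = May, ..., 9 = September)
rbind(count = ozone_na_count, share = round(ozone_na_share, 2))
```

If the shares differed only by chance, they would all sit near the overall rate of 37/153; large differences between columns are what makes complete-case trends over time suspect.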
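The mice package provides the nhanes dataset, a small example with 25 people and four variables: age group (age, coded 1 to 3), body mass index (bmi), hypertension status (hyp, coded 1 = no, 2 = yes), and total serum cholesterol (chl, in mg/dL). Three of the four variables have missing values.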
library(mice)
nhanes
## age bmi hyp chl
## 1 1 NA NA NA
## 2 2 22.7 1 187
## 3 1 NA 1 187
## 4 3 NA NA NA
## 5 1 20.4 1 113
## 6 3 NA NA 184
## 7 1 22.5 1 118
## 8 1 30.1 1 187
## 9 2 22.0 1 238
## 10 2 NA NA NA
## 11 1 NA NA NA
## 12 2 NA NA NA
## 13 3 21.7 1 206
## 14 2 28.7 2 204
## 15 1 29.6 1 NA
## 16 1 NA NA NA
## 17 3 27.2 2 284
## 18 2 26.3 2 199
## 19 1 35.3 1 218
## 20 3 25.5 2 NA
## 21 1 NA NA NA
## 22 1 33.2 1 229
## 23 1 27.5 1 131
## 24 3 24.9 1 NA
## 25 2 27.4 1 186
summary(nhanes)
## age bmi hyp chl
## Min. :1.00 Min. :20.40 Min. :1.000 Min. :113.0
## 1st Qu.:1.00 1st Qu.:22.65 1st Qu.:1.000 1st Qu.:185.0
## Median :2.00 Median :26.75 Median :1.000 Median :187.0
## Mean :1.76 Mean :26.56 Mean :1.235 Mean :191.4
## 3rd Qu.:2.00 3rd Qu.:28.93 3rd Qu.:1.000 3rd Qu.:212.0
## Max. :3.00 Max. :35.30 Max. :2.000 Max. :284.0
## NA's :9 NA's :8 NA's :10
For illustration, I fit a linear regression of cholesterol (chl) on age and BMI. I compare two approaches:

1. Complete-case regression: fit lm() directly, which drops every row with a missing value.
2. Multiple imputation: use mice() to create several imputed datasets, fit the model in each, and combine the results with pool().

Conceptually, multiple imputation in mice works like
this: it fills in the missing values several different
ways to create \(m\) complete
datasets (for example, \(m = 5\)); each
completed dataset has slightly different imputed values because the
algorithm adds randomness to reflect uncertainty; then you fit the same
model in each completed dataset and pool() combines the
estimates using both the variation within and between datasets.
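The pooling step follows Rubin's rules: the pooled estimate \(\bar{Q}\) is the average of the \(m\) estimates, and its total variance is \(T = \bar{U} + (1 + 1/m)B\), where \(\bar{U}\) is the average within-imputation variance and \(B\) is the between-imputation variance. The sketch below computes this for one coefficient in base R. rubin_pool() is a hypothetical helper written for illustration, not a mice function, and the five estimates and standard errors are made-up numbers; mice's pool() also computes the adjusted degrees of freedom reported in the df column, which this sketch omits.

```r
# Rubin's rules for pooling one coefficient over m imputed datasets.
# rubin_pool() is a hypothetical helper for illustration, not part of mice.
rubin_pool <- function(est, se) {
  m <- length(est)
  q_bar <- mean(est)                 # pooled point estimate
  u_bar <- mean(se^2)                # average within-imputation variance
  b     <- var(est)                  # between-imputation variance
  t_var <- u_bar + (1 + 1 / m) * b   # total variance
  c(estimate = q_bar, std.error = sqrt(t_var), within = u_bar, between = b)
}

# Made-up estimates and standard errors from m = 5 fits (not nhanes results)
res <- rubin_pool(est = c(5.1, 4.6, 5.4, 4.9, 5.0),
                  se  = c(2.0, 2.1, 1.9, 2.2, 2.0))
res
```

Because the between-imputation term is added on top of the average within-imputation variance, the pooled standard error is larger than the typical standard error from a single completed dataset analyzed as if it were real data.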
# 1. Complete-case regression
fit_cc <- lm(chl ~ age + bmi, data = nhanes)
# 2. Multiple imputation with mice
imp <- mice(nhanes, m = 5, method = "pmm", seed = 123)
##
## iter imp variable
## 1 1 bmi hyp chl
## 1 2 bmi hyp chl
## 1 3 bmi hyp chl
## 1 4 bmi hyp chl
## 1 5 bmi hyp chl
## 2 1 bmi hyp chl
## 2 2 bmi hyp chl
## 2 3 bmi hyp chl
## 2 4 bmi hyp chl
## 2 5 bmi hyp chl
## 3 1 bmi hyp chl
## 3 2 bmi hyp chl
## 3 3 bmi hyp chl
## 3 4 bmi hyp chl
## 3 5 bmi hyp chl
## 4 1 bmi hyp chl
## 4 2 bmi hyp chl
## 4 3 bmi hyp chl
## 4 4 bmi hyp chl
## 4 5 bmi hyp chl
## 5 1 bmi hyp chl
## 5 2 bmi hyp chl
## 5 3 bmi hyp chl
## 5 4 bmi hyp chl
## 5 5 bmi hyp chl
fit_mi <- with(imp, lm(chl ~ age + bmi))
pool_fit <- pool(fit_mi)
sum_mi <- summary(pool_fit)
fit_cc
##
## Call:
## lm(formula = chl ~ age + bmi, data = nhanes)
##
## Coefficients:
## (Intercept) age bmi
## -80.194 53.069 6.884
sum_mi
## term estimate std.error statistic df p.value
## 1 (Intercept) 11.430196 72.560560 0.1575263 8.020422 0.87872369
## 2 age 27.505289 10.784999 2.5503285 9.748109 0.02938517
## 3 bmi 4.979979 2.322195 2.1445134 7.835031 0.06503537
To compare them visually, I tidy up the coefficient estimates and plot each one with an approximate 95% interval (estimate ± 2 standard errors).
# Tidy complete-case coefficients
coef_cc <- broom::tidy(fit_cc) %>%
filter(term != "(Intercept)") %>%
mutate(method = "Complete cases")
# Tidy MI coefficients
coef_mi <- sum_mi %>%
filter(term != "(Intercept)") %>%
transmute(
term,
estimate = estimate,
std.error = `std.error`,
method = "Multiple imputation"
)
coef_all <- bind_rows(coef_cc, coef_mi)
ggplot(coef_all,
aes(x = term, y = estimate, color = method)) +
geom_point(position = position_dodge(width = 0.4)) +
geom_errorbar(aes(ymin = estimate - 2 * std.error,
ymax = estimate + 2 * std.error),
width = 0.1,
position = position_dodge(width = 0.4)) +
coord_flip() +
labs(
title = "Cholesterol regression: complete cases vs multiple imputation",
x = "Predictor",
y = "Estimated coefficient",
color = "Method"
) +
theme_minimal(base_size = 11)
Figure 3. Estimated coefficients for chl ~ age + bmi in the
nhanes data, comparing a complete-case model and a
multiple-imputation model using mice.
Figure 3 helps explain what multiple imputation is doing in practice. The red points and intervals come from the complete-case regression, which ignores any row where chl or bmi is missing (age is fully observed). This throws away data: it gives unbiased coefficients when whether a row is missing does not depend on cholesterol itself once age and BMI are taken into account, but it can be biased otherwise. The teal points and intervals come from the multiple-imputation model: mice fills in the missing values several different ways, fits the regression in each completed dataset, and then combines the estimates with pool().
Comparing the two sets of coefficients, the multiple-imputation estimates are noticeably smaller than the complete-case ones: the age coefficient drops from about 53 to about 27.5, and the BMI coefficient from about 6.9 to about 5.0. Including the partially observed cases can shift the fitted relationship this much because the complete-case model uses only 13 of the 25 rows. The pooled standard errors also include the variation between imputations, so they reflect the extra uncertainty about what the missing values could have been; the small pooled degrees of freedom (about 8 to 10) are another sign that the missing values carry a lot of that uncertainty. In other words, multiple imputation uses more of the available data while also being more honest about how much we do not know.
For the main part of the project, I simulate a simple income–education dataset and then apply several missing-data methods to it. In this artificial example I know the “truth,” so I can see how much each method distorts it.
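I create a sample of 2,000 people where:

- education has four levels (Less than HS, HS, Some college, Bachelor+), drawn with probabilities 0.2, 0.3, 0.3, and 0.2;
- age is drawn from a normal distribution with mean 40 and standard deviation 12, then rounded;
- income starts from a base value for each education group ($30,000 to $80,000), changes by $500 for each year of age above or below 40, and has normally distributed noise with a standard deviation of $10,000.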
set.seed(123)
n <- 2000
education_levels <- c("Less than HS", "HS", "Some college", "Bachelor+")
education <- sample(education_levels, size = n, replace = TRUE,
prob = c(0.2, 0.3, 0.3, 0.2))
education <- factor(education, levels = education_levels)
age <- round(rnorm(n, mean = 40, sd = 12))
# Base income by education group
base_income <- c(
"Less than HS" = 30000,
"HS" = 40000,
"Some college" = 55000,
"Bachelor+" = 80000
)
income_true <- base_income[as.character(education)] +
500 * (age - 40) + # small age effect
rnorm(n, mean = 0, sd = 10000)
sim_full <- data.frame(
education = education,
age = age,
income = income_true
)
head(sim_full)
## education age income
## 1 HS 28 32496.93
## 2 Bachelor+ 28 70722.43
## 3 Some college 40 40518.35
## 4 Less than HS 38 22027.15
## 5 Less than HS 9 40484.90
## 6 HS 52 45625.85
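Now I introduce missing values in income. People with lower income are more likely to have missing values (for example, because of nonresponse): the probability of a missing income is 0.40 below $40,000, 0.25 between $40,000 and $60,000, and 0.10 above $60,000. Because this probability depends on the income value itself, the mechanism is missing not at random (MNAR); education and age explain part of it, but not all.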
data_biased <- sim_full %>%
mutate(
miss_prob = case_when(
income < 40000 ~ 0.40,
income < 60000 ~ 0.25,
TRUE ~ 0.10
),
is_miss = rbinom(n(), size = 1, prob = miss_prob),
income = ifelse(is_miss == 1, NA, income)
)
# Complete-case dataset
data_cc <- data_biased %>%
filter(!is.na(income), !is.na(education))
mean(is.na(data_biased$income))
## [1] 0.2555
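To isolate why this mechanism matters, the sketch below compares it with missingness completely at random (MCAR) on a single hypothetical income variable. The normal income distribution and sample size are assumptions chosen for illustration; the missingness thresholds copy the ones used for data_biased. Under MCAR the observed mean stays close to the truth, while the value-dependent mechanism pushes it upward.

```r
set.seed(2024)
n_sim <- 5000
inc_sim <- rnorm(n_sim, mean = 50000, sd = 15000)   # hypothetical incomes

# MCAR: every value has the same 25% chance of being missing
inc_mcar <- ifelse(runif(n_sim) < 0.25, NA, inc_sim)

# Value-dependent (MNAR): lower incomes are more likely to be missing,
# using the same thresholds as data_biased above
p_miss <- ifelse(inc_sim < 40000, 0.40, ifelse(inc_sim < 60000, 0.25, 0.10))
inc_mnar <- ifelse(runif(n_sim) < p_miss, NA, inc_sim)

# True mean versus complete-case means under each mechanism
c(true    = mean(inc_sim),
  cc_mcar = mean(inc_mcar, na.rm = TRUE),
  cc_mnar = mean(inc_mnar, na.rm = TRUE))
```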
In the rest of this section I compare four methods for handling the
missing incomes: complete cases, overall mean imputation, regression
imputation, and multiple imputation with mice.
Complete-case analysis drops any row where income is missing. Conceptually, this is like saying “if you skipped at least one question on the survey, we pretend your whole survey never existed.” This is very simple and often the default in R, but if some groups are more likely to be missing (for example, low-income people), it can systematically bias results.
cc_means <- data_cc %>%
group_by(education) %>%
summarise(
mean_income = mean(income),
.groups = "drop"
) %>%
mutate(method = "Complete cases")
cc_means
## # A tibble: 4 × 3
## education mean_income method
## <fct> <dbl> <chr>
## 1 Less than HS 29914. Complete cases
## 2 HS 40507. Complete cases
## 3 Some college 56534. Complete cases
## 4 Bachelor+ 79961. Complete cases
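Complete-case analysis is common in practice partly because lm(), glm(), and many other modeling functions drop incomplete rows automatically: their na.action argument defaults to the global option na.action, which is na.omit unless changed. Summary functions behave differently: mean() and cor() return NA when any value is missing unless you ask them to drop it (na.rm = TRUE, use = "complete.obs").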
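A minimal base-R sketch of these defaults on a hypothetical five-row dataset with one missing outcome: lm() silently fits on four rows, while cor() and mean() return NA until told how to handle the gap.

```r
# Toy data with one missing outcome value (hypothetical numbers)
toy <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(2.1, NA, 6.2, 7.9, 10.1))

getOption("na.action")       # the global default, normally "na.omit"
fit_toy <- lm(y ~ x, data = toy)
nobs(fit_toy)                # rows actually used by lm(): 4 of 5

cor(toy$x, toy$y)                          # NA: cor() does not drop rows
cor(toy$x, toy$y, use = "complete.obs")    # drops the incomplete row
mean(toy$y)                                # NA
mean(toy$y, na.rm = TRUE)                  # mean of the four observed values
```

Checking nobs() after fitting is a cheap way to notice when a model has quietly become a complete-case analysis.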
Overall mean imputation replaces each missing income value with the overall mean of observed income. Conceptually, this is like saying “anyone who did not report income gets the average income.” This keeps all rows in the dataset, but it shrinks differences between groups and makes the data look more concentrated around the mean than it really is.
overall_mean <- mean(data_biased$income, na.rm = TRUE)
data_imp_mean <- data_biased %>%
mutate(
income_imp = ifelse(is.na(income), overall_mean, income)
)
mean_means <- data_imp_mean %>%
group_by(education) %>%
summarise(
mean_income = mean(income_imp),
.groups = "drop"
) %>%
mutate(method = "Overall mean impute")
mean_means
## # A tibble: 4 × 3
## education mean_income method
## <fct> <dbl> <chr>
## 1 Less than HS 38988. Overall mean impute
## 2 HS 44720. Overall mean impute
## 3 Some college 55902. Overall mean impute
## 4 Bachelor+ 77257. Overall mean impute
This method is simple but can strongly distort group differences by pulling low-income groups upward and high-income groups downward.
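The loss of spread from mean imputation can be made exact. Each imputed value adds zero squared deviation from the mean but still counts toward the sample size, so the variance shrinks by the factor \((n_{\text{obs}} - 1)/(n - 1)\) while the mean stays the same. The sketch below checks this on hypothetical normally distributed data with 60 of 200 values removed at random.

```r
set.seed(1)
x_full <- rnorm(200, mean = 50000, sd = 10000)   # hypothetical complete data
x_obs <- x_full
x_obs[sample(200, 60)] <- NA                     # 60 values missing at random

# Overall mean imputation
x_mean_imp <- ifelse(is.na(x_obs), mean(x_obs, na.rm = TRUE), x_obs)

c(sd_observed     = sd(x_obs, na.rm = TRUE),
  sd_mean_imputed = sd(x_mean_imp))

# Variance ratio equals (n_obs - 1) / (n - 1) = 139 / 199
var(x_mean_imp) / var(x_obs, na.rm = TRUE)
```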
Regression imputation uses a regression model to predict income from education and age, and plugs the predictions in for missing values. Conceptually, this is like saying “for each person with missing income, we predict what they probably would have earned given their education and age, using a regression fitted on the people who did report income.” This usually preserves relationships between variables better than overall mean imputation, but because imputed values sit close to the regression line, it tends to underestimate variability.
# Fit regression using only complete cases
fit_reg <- lm(income ~ education + age, data = data_cc)
# Use predicted values for missing incomes
data_imp_reg <- data_biased %>%
mutate(
income_imp = ifelse(
is.na(income),
predict(fit_reg, newdata = data_biased),
income
)
)
reg_means <- data_imp_reg %>%
group_by(education) %>%
summarise(
mean_income = mean(income_imp),
.groups = "drop"
) %>%
mutate(method = "Regression impute")
reg_means
## # A tibble: 4 × 3
## education mean_income method
## <fct> <dbl> <chr>
## 1 Less than HS 29597. Regression impute
## 2 HS 40216. Regression impute
## 3 Some college 56247. Regression impute
## 4 Bachelor+ 80185. Regression impute
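Because the regression includes education, the imputed incomes keep the gaps between education groups, so these group means are much closer to the truth than those from overall mean imputation. The cost of single regression imputation does not show up in the means; it shows up in the spread of the distribution (Figure 5).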
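A common partial fix, not used elsewhere in this paper, is stochastic regression imputation: add a random residual to each prediction so the imputed values scatter around the regression line instead of sitting on it (mice offers this as method = "norm.nob"). The base-R sketch below compares the two on hypothetical data with one predictor and roughly 30% of values missing completely at random. It still treats the fitted coefficients as known, which is one reason full multiple imputation remains preferable.

```r
set.seed(42)
n_toy <- 500
age_toy <- rnorm(n_toy, mean = 40, sd = 12)
inc_toy <- 20000 + 800 * age_toy + rnorm(n_toy, sd = 10000)   # hypothetical
miss_toy <- runif(n_toy) < 0.3                                # ~30% MCAR

toy_reg <- data.frame(age = age_toy, income = ifelse(miss_toy, NA, inc_toy))
fit_reg_toy <- lm(income ~ age, data = toy_reg)   # fitted on observed rows only
pred <- predict(fit_reg_toy, newdata = toy_reg[miss_toy, , drop = FALSE])

# Deterministic regression imputation: plug in the prediction itself
det_imp <- toy_reg$income
det_imp[miss_toy] <- pred

# Stochastic regression imputation: prediction plus a random residual
# drawn with the model's estimated residual standard deviation
sto_imp <- toy_reg$income
sto_imp[miss_toy] <- pred + rnorm(sum(miss_toy), mean = 0, sd = sigma(fit_reg_toy))

c(true = sd(inc_toy), deterministic = sd(det_imp), stochastic = sd(sto_imp))
```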
Finally, I apply multiple imputation to the simulated income data using the mice package. Conceptually, this works the same way as in the nhanes example: instead of filling in each missing income once, mice creates several different completed versions of the dataset, each with slightly different imputed incomes. The imputations come from a model that predicts income from education and age (by default, predictive mean matching for a numeric variable like income) and include randomness to reflect uncertainty.
For this tutorial, I use the first completed dataset from
mice to compute mean incomes by education and to compare
distribution shapes. In a full multiple-imputation workflow, you would
typically fit your model in each imputed dataset and then combine the
estimates with pool().
Here, the main goal is to see how a model-based method like
mice compares to complete cases, overall mean imputation,
and single regression imputation on the same simulated problem.
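Using all \(m\) completed datasets instead of the first one only takes a little more code: compute the group means within each imputation and average them, which is Rubin's pooled point estimate (standard errors would also need the within- and between-imputation variances). The sketch below assumes a long-format data frame with an imputation index column .imp, such as the one returned by mice::complete(imp_income, action = "long"). pool_group_means() is a hypothetical helper, and the small test data are made up so the example runs without mice.

```r
# Pool a group mean across m completed datasets stacked in long format.
# pool_group_means() is a hypothetical helper for illustration.
pool_group_means <- function(long_df, group, value) {
  # mean of `value` within each (imputation, group) cell
  per_imp <- aggregate(long_df[[value]],
                       by = list(.imp = long_df$.imp, group = long_df[[group]]),
                       FUN = mean)
  # Rubin's point estimate: average the m per-imputation means for each group
  pooled <- aggregate(per_imp$x, by = list(group = per_imp$group), FUN = mean)
  names(pooled) <- c(group, paste0("pooled_", value))
  pooled
}

# Made-up long data: two imputations, two groups, incomes in $1,000s
toy_long <- data.frame(
  .imp      = rep(1:2, each = 4),
  education = rep(c("HS", "HS", "BA", "BA"), times = 2),
  income    = c(40, 42, 80, 82,    # imputation 1
                44, 42, 78, 80)    # imputation 2
)
pooled <- pool_group_means(toy_long, "education", "income")
pooled
```

With real output, the call would look like pool_group_means(mice::complete(imp_income, action = "long"), "education", "income").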
## Multiple imputation for income using mice on simulated data ----
# We only need the variables used in imputation
data_for_mice <- data_biased %>%
select(income, education, age)
# Run mice: m = 5 imputed datasets
imp_income <- mice(data_for_mice, m = 5, seed = 123)
##
## iter imp variable
## 1 1 income
## 1 2 income
## 1 3 income
## 1 4 income
## 1 5 income
## 2 1 income
## 2 2 income
## 2 3 income
## 2 4 income
## 2 5 income
## 3 1 income
## 3 2 income
## 3 3 income
## 3 4 income
## 3 5 income
## 4 1 income
## 4 2 income
## 4 3 income
## 4 4 income
## 4 5 income
## 5 1 income
## 5 2 income
## 5 3 income
## 5 4 income
## 5 5 income
# Take the first completed dataset for summaries/plots
data_imp_mi <- complete(imp_income, 1)
# Mean income by education under mice
mi_means <- data_imp_mi %>%
group_by(education) %>%
summarise(
mean_income = mean(income),
.groups = "drop"
) %>%
mutate(method = "Multiple imputation (mice)")
mi_means
## # A tibble: 4 × 3
## education mean_income method
## <fct> <dbl> <chr>
## 1 Less than HS 29746. Multiple imputation (mice)
## 2 HS 40419. Multiple imputation (mice)
## 3 Some college 56101. Multiple imputation (mice)
## 4 Bachelor+ 80292. Multiple imputation (mice)
First I compute the true mean income by education (no missing data):
true_means <- sim_full %>%
group_by(education) %>%
summarise(
mean_income = mean(income),
.groups = "drop"
) %>%
mutate(method = "True (no missing)")
# Add complete cases, mean impute, regression impute, and mice
all_means <- bind_rows(
true_means,
cc_means,
mean_means,
reg_means,
mi_means
)
all_means
## # A tibble: 20 × 3
## education mean_income method
## <fct> <dbl> <chr>
## 1 Less than HS 29474. True (no missing)
## 2 HS 39457. True (no missing)
## 3 Some college 55553. True (no missing)
## 4 Bachelor+ 79768. True (no missing)
## 5 Less than HS 29914. Complete cases
## 6 HS 40507. Complete cases
## 7 Some college 56534. Complete cases
## 8 Bachelor+ 79961. Complete cases
## 9 Less than HS 38988. Overall mean impute
## 10 HS 44720. Overall mean impute
## 11 Some college 55902. Overall mean impute
## 12 Bachelor+ 77257. Overall mean impute
## 13 Less than HS 29597. Regression impute
## 14 HS 40216. Regression impute
## 15 Some college 56247. Regression impute
## 16 Bachelor+ 80185. Regression impute
## 17 Less than HS 29746. Multiple imputation (mice)
## 18 HS 40419. Multiple imputation (mice)
## 19 Some college 56101. Multiple imputation (mice)
## 20 Bachelor+ 80292. Multiple imputation (mice)
Now I visualize the mean income for each education group and each method.
all_means$method <- factor(
all_means$method,
levels = c(
"True (no missing)",
"Complete cases",
"Overall mean impute",
"Regression impute",
"Multiple imputation (mice)"
)
)
ggplot(all_means,
aes(x = education, y = mean_income, fill = method)) +
geom_col(position = "dodge") +
scale_y_continuous(labels = scales::dollar_format(prefix = "$")) +
scale_fill_manual(
values = c(
"True (no missing)" = "grey20",
"Complete cases" = "#D73027",
"Overall mean impute" = "#FC8D59",
"Regression impute" = "#4575B4",
"Multiple imputation (mice)" = "#1B9E77"
)
) +
labs(
title = "Average income by education under different methods",
x = "Education level",
y = "Mean income",
fill = "Method"
) +
theme_minimal(base_size = 12) +
theme(plot.title = element_text(face = "bold"))
Figure 4. Average income by education level under the true data (no missing) and four missing-data methods.
Figure 4 shows how each method changes the average income pattern across education groups. The true bars increase steadily with education. The complete-case bars are all a little too high (by about $200 to $1,050), because low-income people are more likely to be missing and get dropped. Overall mean imputation pulls the groups toward a single middle value, shrinking the gap between “Less than HS” and “Bachelor+” from about $50,000 to about $38,000. Regression imputation and mice both recover a pattern close to the truth, although both still sit slightly above the true mean in every group.
Interestingly, in this one simulated dataset the complete-case means are sometimes about as close to the truth as the mice means (for Bachelor+, the complete-case mean is actually closer). This can happen when the amount of missingness is moderate and the selection bias is not huge, and also because I am summarizing mice with only one completed dataset instead of pooling results across all \(m\) imputations. In addition, the missingness here depends on income itself, so even mice, which assumes that missingness can be explained by the observed variables, cannot remove all of the bias. In a larger simulation study averaged over many repetitions, we would usually expect a well-specified multiple imputation to perform at least as well as complete cases.
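The idea of averaging over many repetitions can be sketched as a small Monte Carlo study. To stay dependency-free, this sketch compares complete cases with single regression imputation rather than mice, and it targets the overall mean income instead of group means; the data-generating settings copy the simulation above, while the 200 replications and the seed are arbitrary choices.

```r
# Monte Carlo sketch: average bias of the overall mean income under two methods,
# using a data-generating process modeled on the simulation above (base R only).
one_rep <- function(n_obs = 2000) {
  edu <- factor(sample(1:4, n_obs, replace = TRUE, prob = c(0.2, 0.3, 0.3, 0.2)))
  age <- rnorm(n_obs, mean = 40, sd = 12)
  base_inc <- c(30000, 40000, 55000, 80000)[as.integer(edu)]
  income <- base_inc + 500 * (age - 40) + rnorm(n_obs, sd = 10000)

  # Same value-dependent missingness as data_biased
  p_miss <- ifelse(income < 40000, 0.40, ifelse(income < 60000, 0.25, 0.10))
  obs <- ifelse(runif(n_obs) < p_miss, NA, income)

  d <- data.frame(edu = edu, age = age, income = obs)
  fit <- lm(income ~ edu + age, data = d)            # complete cases only
  imp <- ifelse(is.na(obs), predict(fit, newdata = d), obs)

  c(complete_cases    = mean(obs, na.rm = TRUE) - mean(income),
    regression_impute = mean(imp) - mean(income))
}

set.seed(7)
bias <- replicate(200, one_rep())   # 2 x 200 matrix, one column per dataset
rowMeans(bias)                      # average bias of each method
```

Averaged over repetitions, both methods overestimate the mean under this mechanism, but regression imputation removes the part of the bias that education and age can explain.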
Means can hide important changes in the shape of the distribution. To see this, I focus on the “HS” group and compare the full income distributions under each method.
true_hs <- sim_full %>%
filter(education == "HS")
cc_hs <- data_cc %>%
filter(education == "HS")
mean_hs <- data_imp_mean %>%
filter(education == "HS")
reg_hs <- data_imp_reg %>%
filter(education == "HS")
# HS group from the first mice-completed dataset
mi_hs <- data_imp_mi %>%
filter(education == "HS") %>%
transmute(
income = income,
method = "Multiple imputation (mice)"
)
hs_all <- bind_rows(
true_hs %>%
select(income) %>%
mutate(method = "True (no missing)"),
cc_hs %>%
select(income) %>%
mutate(method = "Complete cases"),
mean_hs %>%
transmute(income = income_imp,
method = "Overall mean impute"),
reg_hs %>%
transmute(income = income_imp,
method = "Regression impute"),
mi_hs
)
hs_all$method <- factor(
hs_all$method,
levels = c(
"True (no missing)",
"Complete cases",
"Overall mean impute",
"Regression impute",
"Multiple imputation (mice)"
)
)
ggplot(hs_all,
aes(x = income, color = method, linetype = method)) +
geom_density(linewidth = 1) +
scale_color_manual(
values = c(
"True (no missing)" = "black",
"Complete cases" = "#D73027",
"Overall mean impute" = "#FC8D59",
"Regression impute" = "#4575B4",
"Multiple imputation (mice)" = "#1B9E77"
)
) +
labs(
title = "Income distribution for HS group under different methods",
x = "Income",
y = "Density",
color = "Method",
linetype = "Method"
) +
theme_minimal(base_size = 11)
Figure 5. Income distributions for the HS group under the true data and four missing-data methods.
Figure 5 focuses on the HS group and compares the whole income distribution under each method. The solid black curve is the true distribution. The complete-case curve is shifted slightly to the right, again showing that dropping missing incomes makes this group look richer than it really is. Overall mean imputation produces a tall, narrow spike at the overall mean of the observed incomes, because every missing value in this group is set to that same number; this badly underestimates the spread of incomes and also pulls the group's center upward, since the overall mean is higher than a typical HS income. Regression imputation tracks the center reasonably well but has a sharper peak and thinner tails than the true curve, reflecting the fact that predicted values sit close to the regression line. The mice curve is a bit rough (since it is based on one imputed dataset) but stays much closer to the true shape than overall mean or regression imputation, especially in the tails.
In a real dataset we never know the true means or regression coefficients, so there is no single “best” method for every situation. A useful way to think about missing data is to start by visualizing it with naniar to see how much is missing, which variables are affected, and whether missingness depends on time or groups, and then to choose a method that fits the situation, as summarized in Table 1.

Table 1. Summary of methods and when a beginner might use them
| Situation / goal | Method | R tools / functions | Simple description | Main pros | Main cons |
|---|---|---|---|---|---|
| Very small amount of missing data, looks random | Complete cases | na.omit(), default in lm() and many functions | Drop any row with an NA. | Very easy, already built in. | Throws away data; can bias results if some groups are missing more than others. |
| Need a quick fill just to run some code | Overall mean imputation | mutate(... = ifelse(is.na(x), mean(x, na.rm = TRUE), x)) | Replace each missing value with the overall mean of that variable. | Simple and keeps all rows. | Destroys variability and shrinks differences between groups; usually not recommended. |
| Missingness depends on predictors, single step | Regression imputation | lm(), predict() | Predict missing values from a regression using other variables, then plug in those predictions. | Uses extra information; preserves relationships better. | Underestimates variability; still only one completed dataset (single imputation). |
| Serious analysis with several variables missing | Multiple imputation (mice) | mice(), with(), pool() | Create several completed datasets with random variation in imputations, fit models, then combine. | Uses all data; more realistic standard errors; flexible. | More complex; requires more decisions and computing than single imputation. |
This project shows how different missing-data strategies, all easily implemented in R, can produce very different answers.
From the airquality and naniar example, I learned that visualizing missingness is an important first step. Missing values were not uniformly spread across months, so a simple complete-case analysis could change the apparent seasonal pattern.
From the nhanes example, I saw how mice can
perform multiple imputation with only a few lines of code. The
coefficient plot (Figure 3) highlights that using all the data through
multiple imputation can both change the point estimates and better
reflect uncertainty.
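In the simulated income example, where the true pattern is known, complete-case analysis overestimated mean income in every education group. Overall mean imputation severely distorted group differences and distribution shapes. Regression imputation and mice came much closer to the true group means, although both still ran slightly high because the missingness depended on income itself; regression imputation also underestimated variation, while mice preserved the shape of the distribution better.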
Overall, the main lessons are:

- Visualize missingness before analysis (for example with naniar).
- Prefer multiple imputation (mice) when the missingness mechanism is not completely random.

Future work could include adding a full multiple-imputation analysis to the simulated dataset, using more realistic missingness mechanisms, and comparing results across many simulated datasets rather than a single run.
R Core Team (2025). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
van Buuren, S. and Groothuis-Oudshoorn, K. (2011). “mice: Multivariate Imputation by Chained Equations in R.” Journal of Statistical Software, 45(3), 1–67.
Tierney, N. J. and Cook, D. (2018). “naniar: data structures for missing data in R.” Journal of Open Source Software, 3(26), 642.