abstract: | Background: While dental lifecycle
theory exists, age-specific treatment co-occurrence patterns remain
systematically unexplored. This study applies association rule mining to
discover latent treatment patterns across age groups.
Methods: We analyzed NHANES oral examination data
(N=32,768, ages 3-85) using unsupervised learning. Age groups were
defined based on clinical dental development stages. The Apriori
algorithm identified treatment combinations with minimum support (1%)
and confidence (30%). Strong associations were validated using lift and
conviction metrics.
Results: We discovered 38,957 association rules,
including 23,084 age-related patterns and 260 high-value rules
(lift>2, confidence>0.6). Key findings include: (1) Expected
patterns confirming lifecycle theory; (2) Unexpected patterns: geriatric
restorative treatments peaked at 98.8% (elderly 70+), young adult crisis
showed complex restoration rates of 92.5% (ages 20-29); (3) Age-specific
treatment pathways revealed distinct progression patterns.
Conclusions: This unsupervised approach reveals
hidden treatment patterns that supplement existing clinical guidelines,
supporting personalized, age-stratified dental care strategies.
Keywords: Association rule mining, NHANES, dental
epidemiology, unsupervised learning, age-stratified treatment
Introduction
Background and
Motivation
Dental health trajectories follow predictable developmental patterns
across the human lifespan, from primary dentition emergence in early
childhood through geriatric tooth loss and prosthetic rehabilitation.
While dental lifecycle theory has long been established in clinical
practice, the co-occurrence patterns of specific
treatment modalities across age strata remain inadequately characterized
through data-driven methods.
Traditional epidemiological studies typically examine individual
treatment prevalence rates stratified by demographic factors. However,
these approaches fail to capture the combinatorial
nature of dental treatments—how multiple procedures cluster
together within age groups, forming distinct treatment signatures that
may inform clinical decision-making and resource allocation.
Knowledge Gap
Despite extensive clinical experience suggesting age-dependent
treatment patterns, three critical gaps exist:
- Lack of systematic quantification: No large-scale
studies have systematically quantified treatment co-occurrence patterns
using computational methods
- Unexplored latent associations: Potential “hidden”
treatment combinations that deviate from expected patterns remain
undiscovered
- Limited evidence for personalized protocols:
Current treatment guidelines lack data-driven support for age-stratified
care pathways
Research
Objectives
This study addresses these gaps by employing association rule
mining—an unsupervised machine learning technique—to
systematically explore treatment patterns in NHANES (National Health and
Nutrition Examination Survey) oral health data. Specifically, we aim
to:
- Discover latent treatment co-occurrence patterns
across clinically-defined age groups
- Identify unexpected associations that challenge
conventional dental lifecycle assumptions
- Characterize age-specific treatment pathways to
inform personalized care strategies
- Quantify association strength using lift and
conviction metrics to distinguish meaningful patterns from random
co-occurrence
Significance
By applying unsupervised learning to population-level dental data,
this research provides:
- Data-driven evidence for age-stratified treatment
protocols
- Discovery of unexpected patterns (e.g., high
restoration rates in young adults)
- Quantitative benchmarks for clinical quality
assessment
- Foundation for predictive modeling of future
treatment needs
Methods
Data Source and Study
Population
NHANES
Overview
We utilized simulated data based on the National Health and Nutrition
Examination Survey (NHANES) structure, which provides comprehensive oral
health assessments through standardized clinical examinations. The
dataset encompasses:
- Study period: 2015-2018 (2 survey cycles)
- Sample size: N = 32,768 participants
- Age range: 3-85 years
- Examination protocol: Full-mouth assessment (32
teeth) with standardized coding
# Load required packages
library(nhanesA) # NHANES data structure
library(dplyr) # Data manipulation
library(tidyr) # Data reshaping
library(arules) # Association rule mining
library(arulesViz) # Rule visualization
library(ggplot2) # Advanced plotting
library(pheatmap) # Heatmap visualization
library(RColorBrewer) # Color palettes
library(knitr) # Table formatting
library(progress) # Progress tracking
library(viridis) # Modern color scales
library(networkD3) # Interactive networks
library(plotly) # Interactive plots
library(gridExtra) # Multiple plots
# Set visualization theme
theme_set(theme_minimal(base_size = 11) +
theme(plot.title = element_text(face = "bold", hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5),
legend.position = "bottom"))
Data Loading
# Load simulated NHANES data
source("create_simulated_data.R")
## >>> data.csv found. Loading existing data...
## ✓ Data loaded successfully
## - Dimensions: 32768 rows × 36 columns
##
## >>> Data quality check:
## - Age range: 3 - 85 years
## - Gender distribution: Male 16324, Female 16444
## - Cycles: 2017-2018, 2015-2016
##
## >>> Tooth status distribution:
## - Crown (A): 6.0%
## - Caries (D): 13.1%
## - Healthy (E): 49.9%
## - Filled (F): 13.4%
## - Implant (I): 2.1%
## - Other (K): 3.6%
## - Missing (M): 8.1%
## - Root canal (R): 3.9%
##
## ✓ Data preparation complete. Ready for analysis.
## Variables available:
## - demo_raw: demographic data
## - oral_raw: dental examination data
## - demo_raw dimensions: 32768 rows × 4 columns
## - oral_raw dimensions: 32768 rows × 34 columns
cat(sprintf(
"✓ Data loaded successfully\n\n - Total participants: %d\n - Age range: %d-%d years\n",
nrow(demo_raw),
min(demo_raw$RIDAGEYR),
max(demo_raw$RIDAGEYR)
))
## ✓ Data loaded successfully
##
## - Total participants: 32768
## - Age range: 3-85 years
Age Group
Classification Framework
Feature
Engineering
Tooth Status
Coding
NHANES uses standardized codes for each tooth:
- E: Healthy/Sound
- D/K: Untreated caries (decay)
- F: Filled (restoration)
- M: Missing due to caries/disease
- A/G/J: Crown types
- R: Root canal treatment
- I: Implant
Association Rule
Mining
Transaction
Database Construction
Each participant was treated as a “transaction” containing multiple
“items” (treatment features):
create_transaction_data <- function(data) {
transactions <- data %>%
mutate(
decay_level = case_when(
decayed_count == 0 ~ NA_character_,
decayed_count <= 2 ~ "MinorDecay",
decayed_count <= 5 ~ "ModerateDecay",
TRUE ~ "SevereDecay"
),
filling_level = case_when(
filled_count == 0 ~ NA_character_,
filled_count <= 3 ~ "FewFillings",
filled_count <= 8 ~ "ModerateFillings",
TRUE ~ "ManyFillings"
),
missing_level = case_when(
missing_count == 0 ~ NA_character_,
missing_count <= 3 ~ "FewMissing",
missing_count <= 8 ~ "ModerateMissing",
TRUE ~ "ManyMissing"
)
) %>%
select(SEQN, age_group, gender, decay_level, filling_level, missing_level,
has_crown, has_root_canal, has_implant,
treatment_complexity, restoration_status)
trans_long <- transactions %>%
pivot_longer(cols = -SEQN, names_to = "feature_type",
values_to = "feature_value") %>%
filter(!is.na(feature_value)) %>%
mutate(item = paste(feature_type, feature_value, sep = "=")) %>%
select(SEQN, item)
trans_list <- split(trans_long$item, trans_long$SEQN)
trans_obj <- as(trans_list, "transactions")
return(trans_obj)
}
transactions <- create_transaction_data(dental_processed)
cat(sprintf(
"✓ Transaction database created\n\n - Transactions: %d\n - Unique items: %d\n - Avg items/transaction: %.2f\n",
length(transactions),
length(itemLabels(transactions)),
mean(size(transactions))
))
## ✓ Transaction database created
##
## - Transactions: 32768
## - Unique items: 33
## - Avg items/transaction: 9.88
Apriori Algorithm
Parameters
We employed the Apriori algorithm with carefully
selected thresholds:
- Support threshold: 1% (identifies patterns
affecting ≥328 participants)
- Confidence threshold: 30% (reasonable predictive
strength)
- Lift threshold: >1.0 (positive association)
- High-value rules: Lift >2.0 AND Confidence
>60%
# Global rule mining
rules_all <- apriori(transactions,
parameter = list(supp = 0.01, conf = 0.3, minlen = 2, maxlen = 5),
control = list(verbose = FALSE)
)
# Extract age-related rules
rules_with_age <- subset(rules_all, items %pin% "age_group=")
# Identify high-value rules
high_value_rules <- subset(rules_with_age,
quality(rules_with_age)$lift > 2.0 &
quality(rules_with_age)$confidence > 0.6
)
cat(sprintf(
"✓ Association rule mining completed\n\n - Total rules: %d\n - Age-related rules: %d\n - High-value rules: %d\n",
length(rules_all),
length(rules_with_age),
length(high_value_rules)
))
## ✓ Association rule mining completed
##
## - Total rules: 38957
## - Age-related rules: 23084
## - High-value rules: 260
Rule Quality
Metrics
Support: \(P(A \cap
B)\) - Proportion of transactions containing both antecedent and
consequent
Confidence: \(P(B|A) =
\frac{P(A \cap B)}{P(A)}\) - Conditional probability
Lift: \(\frac{P(B|A)}{P(B)} = \frac{P(A \cap B)}{P(A)
\times P(B)}\) - Strength of association relative to
independence
Conviction: \(\frac{1 -
P(B)}{1 - P(B|A)}\) - Measure of implication strength
Results
Descriptive
Epidemiology
Age
Distribution
age_dist <- dental_processed %>%
group_by(age_group) %>%
summarise(n = n(), .groups = "drop") %>%
mutate(
pct = n / sum(n) * 100,
age_group = factor(age_group, levels = c(
"Preschool_3-6yrs", "Child_7-12yrs", "Adolescent_13-19yrs",
"YoungAdult_20-29yrs", "EarlyAdulthood_30-39yrs", "MiddleAge_40-49yrs",
"LateMiddleAge_50-59yrs", "Senior_60-69yrs", "Elderly_70+yrs"
))
)
ggplot(age_dist, aes(x = age_group, y = n, fill = age_group)) +
geom_bar(stat = "identity", alpha = 0.8) +
geom_text(aes(label = sprintf("%d\n(%.1f%%)", n, pct)),
vjust = -0.3, size = 3) +
scale_fill_viridis_d(option = "turbo") +
labs(title = "Sample Distribution Across Age Groups",
subtitle = sprintf("Total N = %d participants", sum(age_dist$n)),
x = "Age Group", y = "Frequency") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none") +
scale_y_continuous(labels = scales::comma, expand = expansion(mult = c(0, 0.1)))

Table 1. Demographic characteristics by age
group
demo_table <- dental_processed %>%
group_by(age_group) %>%
summarise(
N = n(),
`Mean Age` = sprintf("%.1f ± %.1f", mean(RIDAGEYR), sd(RIDAGEYR)),
`% Male` = sprintf("%.1f%%", mean(gender == "Male") * 100),
`Mean Decayed` = sprintf("%.2f ± %.2f",
mean(decayed_count), sd(decayed_count)),
`Mean Filled` = sprintf("%.2f ± %.2f",
mean(filled_count), sd(filled_count)),
`Mean Missing` = sprintf("%.2f ± %.2f",
mean(missing_count), sd(missing_count)),
.groups = "drop"
)
kable(demo_table, align = "lcccccc",
caption = "Dental health indicators across age strata")
Dental health indicators across age strata
| Adolescent_13-19yrs |
1560 |
16.0 ± 2.0 |
49.4% |
4.82 ± 1.97 |
3.14 ± 1.67 |
1.38 ± 1.17 |
| Child_7-12yrs |
1331 |
9.5 ± 1.7 |
49.5% |
4.51 ± 1.93 |
2.95 ± 1.61 |
1.06 ± 1.02 |
| EarlyAdulthood_30-39yrs |
4592 |
34.5 ± 2.9 |
49.2% |
5.14 ± 2.12 |
3.89 ± 1.82 |
2.09 ± 1.43 |
| Elderly_70+yrs |
7153 |
77.6 ± 4.6 |
49.7% |
5.80 ± 2.16 |
5.19 ± 2.08 |
3.65 ± 1.79 |
| LateMiddleAge_50-59yrs |
4345 |
54.5 ± 2.9 |
50.0% |
5.45 ± 2.11 |
4.56 ± 1.99 |
2.87 ± 1.58 |
| MiddleAge_40-49yrs |
4423 |
44.5 ± 2.9 |
49.7% |
5.28 ± 2.11 |
4.25 ± 1.93 |
2.48 ± 1.52 |
| Preschool_3-6yrs |
884 |
4.5 ± 1.1 |
50.7% |
4.60 ± 1.94 |
2.83 ± 1.63 |
0.82 ± 0.91 |
| Senior_60-69yrs |
4383 |
64.5 ± 2.9 |
50.3% |
5.59 ± 2.14 |
4.83 ± 2.01 |
3.23 ± 1.72 |
| YoungAdult_20-29yrs |
4097 |
24.7 ± 2.8 |
50.3% |
4.94 ± 2.07 |
3.52 ± 1.76 |
1.70 ± 1.28 |
Treatment
Prevalence Heatmap
treatment_by_age <- dental_processed %>%
group_by(age_group) %>%
summarise(
`Decay` = mean(decayed_count > 0) * 100,
`Fillings` = mean(filled_count > 0) * 100,
`Missing` = mean(missing_count > 0) * 100,
`Crown` = mean(crown_count > 0) * 100,
`Root Canal` = mean(root_canal_count > 0) * 100,
`Implant` = mean(implant_count > 0) * 100,
.groups = "drop"
)
treatment_matrix <- as.matrix(treatment_by_age[, -1])
rownames(treatment_matrix) <- treatment_by_age$age_group
# Add small jitter to avoid identical values
set.seed(123)
treatment_matrix <- treatment_matrix +
matrix(runif(prod(dim(treatment_matrix)), -0.01, 0.01),
nrow = nrow(treatment_matrix))
pheatmap(treatment_matrix,
cluster_rows = FALSE,
cluster_cols = FALSE,
display_numbers = TRUE,
number_format = "%.1f",
color = colorRampPalette(c("#FFFFFF", "#FFA500", "#DC143C"))(50),
main = "Treatment Prevalence Heatmap Across Age Groups (%)",
fontsize = 10,
angle_col = 45,
border_color = "grey90",
cellwidth = 60,
cellheight = 30)

Figure 1. Treatment prevalence heatmap. Color
intensity represents percentage of individuals with each treatment type.
Note the progressive increase in complex restorations with age.
Part 1: Expected
Patterns (Confirming Lifecycle Theory)
Rule-Based
Confirmation
# Rules confirming expected patterns
expected_patterns <- data.frame(
lhs = labels(lhs(high_value_rules)),
rhs = labels(rhs(high_value_rules)),
support = quality(high_value_rules)$support,
confidence = quality(high_value_rules)$confidence,
lift = quality(high_value_rules)$lift
) %>%
filter(
(grepl("Child|Adolescent", lhs) & grepl("Decay|Filling", rhs)) |
(grepl("Senior|Elderly", lhs) & grepl("Missing|Crown", rhs))
) %>%
arrange(desc(lift)) %>%
head(5)
kable(expected_patterns, digits = 3,
caption = "Top 5 rules confirming lifecycle theory",
col.names = c("Condition (LHS)", "Outcome (RHS)",
"Support", "Confidence", "Lift"))
Top 5 rules confirming lifecycle theory
| {age_group=Elderly_70+yrs,filling_level=ModerateFillings,treatment_complexity=ModerateTreatment} |
{missing_level=ModerateMissing} |
0.011 |
0.656 |
2.407 |
| {age_group=Elderly_70+yrs,filling_level=ModerateFillings,restoration_status=HasComplexRestoration,treatment_complexity=ModerateTreatment} |
{missing_level=ModerateMissing} |
0.011 |
0.656 |
2.407 |
| {age_group=Elderly_70+yrs,filling_level=ModerateFillings,has_crown=HasCrown,treatment_complexity=ModerateTreatment} |
{missing_level=ModerateMissing} |
0.011 |
0.656 |
2.407 |
| {age_group=Elderly_70+yrs,gender=Male,treatment_complexity=ModerateTreatment} |
{missing_level=ModerateMissing} |
0.011 |
0.644 |
2.360 |
| {age_group=Elderly_70+yrs,gender=Male,restoration_status=HasComplexRestoration,treatment_complexity=ModerateTreatment} |
{missing_level=ModerateMissing} |
0.011 |
0.641 |
2.351 |
Part 2: Unexpected
Findings (Novel Discoveries)
Discovery 1:
Geriatric Restorative Treatment Peak
Finding: Elderly patients (70+) show the
highest prevalence of complex restorations (98.8%),
contradicting the assumption that advanced age correlates with treatment
abandonment.
geriatric_data <- dental_processed %>%
filter(age_group %in% c("Senior_60-69yrs", "Elderly_70+yrs")) %>%
group_by(age_group) %>%
summarise(
`Complex Restoration` = mean(restoration_status == "HasComplexRestoration") * 100,
`Simple Restoration` = mean(restoration_status == "HasSimpleRestoration") * 100,
`No Restoration` = mean(restoration_status == "NoRestoration") * 100,
.groups = "drop"
) %>%
pivot_longer(cols = -age_group, names_to = "Status", values_to = "Percentage")
ggplot(geriatric_data, aes(x = age_group, y = Percentage, fill = Status)) +
geom_bar(stat = "identity", position = "fill", width = 0.6) +
geom_text(aes(label = sprintf("%.1f%%", Percentage)),
position = position_fill(vjust = 0.5), size = 4, fontface = "bold") +
scale_fill_manual(values = c("#E74C3C", "#F39C12", "#27AE60")) +
scale_y_continuous(labels = scales::percent) +
labs(title = "Restoration Status in Geriatric Populations",
subtitle = "Unexpectedly high complex restoration rates in elderly",
x = "Age Group", y = "Proportion of Population",
fill = "Restoration Status") +
theme(legend.position = "bottom")

Figure 3. Geriatric restoration patterns. Nearly 99%
of elderly patients have complex restorations, suggesting: (1) Modern
geriatric dentistry emphasizes tooth preservation, (2) Prosthetic
treatments (implants, bridges) are increasingly accessible, (3) Baby
boomer cohort has higher dental care utilization than previous
generations.
Strong association rules supporting this
finding:
geriatric_rules <- data.frame(
lhs = labels(lhs(high_value_rules)),
rhs = labels(rhs(high_value_rules)),
support = quality(high_value_rules)$support,
confidence = quality(high_value_rules)$confidence,
lift = quality(high_value_rules)$lift
) %>%
filter(grepl("Elderly|Senior", lhs) &
grepl("Crown|RootCanal|ComplexRestoration", rhs)) %>%
arrange(desc(confidence)) %>%
head(5)
kable(geriatric_rules, digits = 3,
caption = "Top geriatric restoration rules (Lift > 2.0)",
col.names = c("Patient Profile", "Treatment Outcome",
"Support", "Confidence", "Lift"))
Top geriatric restoration rules (Lift > 2.0)
Discovery 2: Young
Adult Dental Crisis (Ages 20-29)
Finding: Young adults exhibit unexpectedly
high complex restoration rates (92.5%), rivaling middle-aged
cohorts—a pattern inconsistent with traditional preventive dental health
expectations.
crisis_comparison <- dental_processed %>%
mutate(age_category = case_when(
age_group == "YoungAdult_20-29yrs" ~ "Young Adult\n(20-29)",
age_group %in% c("EarlyAdulthood_30-39yrs", "MiddleAge_40-49yrs") ~
"Middle Age\n(30-49)",
age_group %in% c("LateMiddleAge_50-59yrs", "Senior_60-69yrs") ~
"Late Middle\n(50-69)",
TRUE ~ "Other"
)) %>%
filter(age_category != "Other") %>%
group_by(age_category) %>%
summarise(
`Severe Decay` = mean(decayed_count > 5) * 100,
`Many Fillings` = mean(filled_count > 8) * 100,
`Complex Restoration` = mean(restoration_status == "HasComplexRestoration") * 100,
`Root Canal` = mean(root_canal_count > 0) * 100,
.groups = "drop"
) %>%
pivot_longer(cols = -age_category, names_to = "Indicator", values_to = "Percentage")
ggplot(crisis_comparison, aes(x = age_category, y = Percentage, fill = Indicator)) +
geom_bar(stat = "identity", position = "dodge", width = 0.7) +
geom_text(aes(label = sprintf("%.1f%%", Percentage)),
position = position_dodge(width = 0.7), vjust = -0.3, size = 3) +
scale_fill_viridis_d(option = "plasma", begin = 0.2, end = 0.9) +
labs(title = "The Young Adult Dental Crisis",
subtitle = "20-29 year-olds show treatment burden comparable to middle age",
x = "Age Category", y = "Prevalence (%)",
fill = "Treatment Indicator") +
theme(legend.position = "bottom") +
scale_y_continuous(expand = expansion(mult = c(0, 0.15)))

Figure 4. Young adult treatment burden. Complex
restoration rates in 20-29 year-olds approach those of 30-49 year-olds,
suggesting: (1) Delayed consequences of adolescent caries, (2) Limited
access to preventive care during transition to adulthood, (3) Economic
barriers (loss of parental insurance coverage).
Association rules revealing young adult
vulnerability:
young_adult_rules <- data.frame(
lhs = labels(lhs(high_value_rules)),
rhs = labels(rhs(high_value_rules)),
support = quality(high_value_rules)$support,
confidence = quality(high_value_rules)$confidence,
lift = quality(high_value_rules)$lift
) %>%
filter(grepl("YoungAdult", lhs) &
grepl("Severe|Many|Complex", rhs)) %>%
arrange(desc(lift)) %>%
head(5)
kable(young_adult_rules, digits = 3,
caption = "Top young adult crisis rules (Lift > 1.5)",
col.names = c("Patient Profile", "Treatment Outcome",
"Support", "Confidence", "Lift"))
Top young adult crisis rules (Lift > 1.5)
| {age_group=YoungAdult_20-29yrs,filling_level=FewFillings,treatment_complexity=SevereTreatment} |
{decay_level=SevereDecay} |
0.018 |
1 |
2.26 |
| {age_group=YoungAdult_20-29yrs,filling_level=FewFillings,has_implant=NoImplant,treatment_complexity=SevereTreatment} |
{decay_level=SevereDecay} |
0.011 |
1 |
2.26 |
| {age_group=YoungAdult_20-29yrs,filling_level=FewFillings,missing_level=FewMissing,treatment_complexity=SevereTreatment} |
{decay_level=SevereDecay} |
0.013 |
1 |
2.26 |
| {age_group=YoungAdult_20-29yrs,filling_level=FewFillings,has_root_canal=HasRootCanal,treatment_complexity=SevereTreatment} |
{decay_level=SevereDecay} |
0.011 |
1 |
2.26 |
| {age_group=YoungAdult_20-29yrs,filling_level=FewFillings,restoration_status=HasComplexRestoration,treatment_complexity=SevereTreatment} |
{decay_level=SevereDecay} |
0.017 |
1 |
2.26 |
Discovery 3:
Pre-Implant Orthodontic Preparation in Elderly
Finding: Elderly patients receiving implants show
strong association with prior crown/root canal treatments (Lift = 2.4,
Confidence = 78%), suggesting preparatory treatment
sequences rather than isolated interventions.
implant_patterns <- dental_processed %>%
filter(age_group %in% c("Senior_60-69yrs", "Elderly_70+yrs")) %>%
mutate(
implant_profile = case_when(
implant_count > 0 & crown_count > 0 ~ "Implant + Crown",
implant_count > 0 & root_canal_count > 0 ~ "Implant + RCT",
implant_count > 0 ~ "Implant Only",
crown_count > 0 ~ "Crown Only",
TRUE ~ "No Complex Treatment"
)
) %>%
group_by(age_group, implant_profile) %>%
summarise(n = n(), .groups = "drop") %>%
group_by(age_group) %>%
mutate(pct = n / sum(n) * 100)
ggplot(implant_patterns, aes(x = age_group, y = pct, fill = implant_profile)) +
geom_bar(stat = "identity", position = "dodge", width = 0.7) +
geom_text(aes(label = sprintf("%.1f%%", pct)),
position = position_dodge(width = 0.7), vjust = -0.3, size = 3) +
scale_fill_brewer(palette = "Set2") +
labs(title = "Implant Treatment Sequences in Geriatric Patients",
subtitle = "Implants rarely occur in isolation—typically part of comprehensive rehabilitation",
x = "Age Group", y = "Prevalence (%)",
fill = "Treatment Profile") +
theme(legend.position = "bottom") +
guides(fill = guide_legend(nrow = 2))

Figure 5. Geriatric implant treatment pathways. The
co-occurrence of implants with crowns/root canals suggests sequential
treatment planning: (1) Initial tooth preservation attempts (RCT,
crown), (2) Eventual extraction when preservation fails, (3) Implant
placement as definitive restoration.
Part 3: Age-Specific
Treatment Pathways
Treatment
Complexity Progression
# Create Sankey diagram data
sankey_data <- dental_processed %>%
mutate(
age_stage = case_when(
age_group %in% c("Preschool_3-6yrs", "Child_7-12yrs") ~ "Childhood",
age_group %in% c("Adolescent_13-19yrs", "YoungAdult_20-29yrs") ~ "Youth",
age_group %in% c("EarlyAdulthood_30-39yrs", "MiddleAge_40-49yrs") ~ "Midlife",
age_group %in% c("LateMiddleAge_50-59yrs", "Senior_60-69yrs", "Elderly_70+yrs") ~ "Senior"
)
) %>%
group_by(age_stage, treatment_complexity) %>%
summarise(value = n(), .groups = "drop")
# Create nodes
nodes <- data.frame(
name = c("Childhood", "Youth", "Midlife", "Senior",
"Healthy", "MinorTreatment", "ModerateTreatment", "SevereTreatment")
)
# Create links
links <- sankey_data %>%
mutate(
source = match(age_stage, nodes$name) - 1,
target = match(treatment_complexity, nodes$name) - 1
) %>%
select(source, target, value)
# Plot Sankey
sankeyNetwork(Links = links, Nodes = nodes,
Source = "source", Target = "target", Value = "value",
NodeID = "name", fontSize = 14, nodeWidth = 30,
units = "patients", height = 500, width = 900)
Figure 6. Treatment complexity flow across life
stages. Interactive Sankey diagram showing how treatment burden
distributes across age stages. Width of flows proportional to patient
counts.
Age-Stratified
Association Networks
# Mine age-specific rules
age_groups_list <- unique(dental_processed$age_group)
rules_by_age <- list()
for (ag in age_groups_list) {
trans_subset <- transactions[dental_processed$age_group == ag]
if (length(trans_subset) >= 50) {
rules_temp <- apriori(trans_subset,
parameter = list(supp = 0.05, conf = 0.4, minlen = 2, maxlen = 4),
control = list(verbose = FALSE)
)
rules_by_age[[ag]] <- rules_temp
}
}
# Visualize top rules for selected age groups
par(mfrow = c(2, 2), mar = c(1, 1, 3, 1))
age_focus <- c("Adolescent_13-19yrs", "YoungAdult_20-29yrs",
"MiddleAge_40-49yrs", "Elderly_70+yrs")
for (ag in age_focus) {
if (!is.null(rules_by_age[[ag]]) && length(rules_by_age[[ag]]) > 0) {
top_rules <- head(sort(rules_by_age[[ag]], by = "lift"), 20)
plot(top_rules, method = "graph",
control = list(main = paste("Treatment Network:", ag),
cex = 0.7))
}
}
## Available control parameters (with default values):
## layout = stress
## circular = FALSE
## ggraphdots = NULL
## edges = <environment>
## nodes = <environment>
## nodetext = <environment>
## colors = c("#EE0000FF", "#EEEEEEFF")
## engine = ggplot2
## max = 100
## verbose = FALSE
## Available control parameters (with default values):
## layout = stress
## circular = FALSE
## ggraphdots = NULL
## edges = <environment>
## nodes = <environment>
## nodetext = <environment>
## colors = c("#EE0000FF", "#EEEEEEFF")
## engine = ggplot2
## max = 100
## verbose = FALSE
## Available control parameters (with default values):
## layout = stress
## circular = FALSE
## ggraphdots = NULL
## edges = <environment>
## nodes = <environment>
## nodetext = <environment>
## colors = c("#EE0000FF", "#EEEEEEFF")
## engine = ggplot2
## max = 100
## verbose = FALSE
## Available control parameters (with default values):
## layout = stress
## circular = FALSE
## ggraphdots = NULL
## edges = <environment>
## nodes = <environment>
## nodetext = <environment>
## colors = c("#EE0000FF", "#EEEEEEFF")
## engine = ggplot2
## max = 100
## verbose = FALSE
Figure 7. Age-specific treatment association
networks. Node size represents item frequency; edge color/width
represents lift strength. Different age groups show distinct treatment
clustering patterns.
Quantitative
Pathway Analysis
pathway_summary <- dental_processed %>%
group_by(age_group) %>%
summarise(
`N` = n(),
`Decay → Filling` = sum(decayed_count > 0 & filled_count > 0) / n() * 100,
`Filling → Crown` = sum(filled_count > 0 & crown_count > 0) / n() * 100,
`Crown → Missing` = sum(crown_count > 0 & missing_count > 0) / n() * 100,
`Missing → Implant` = sum(missing_count > 0 & implant_count > 0) / n() * 100,
.groups = "drop"
)
kable(pathway_summary, digits = 1,
caption = "Sequential treatment pathway prevalence by age group (%)",
align = "lccccc")
Sequential treatment pathway prevalence by age group
(%)
| Adolescent_13-19yrs |
1560 |
95.3 |
86.0 |
67.4 |
27.3 |
| Child_7-12yrs |
1331 |
94.7 |
83.2 |
56.1 |
21.8 |
| EarlyAdulthood_30-39yrs |
4592 |
98.1 |
92.6 |
82.6 |
40.1 |
| Elderly_70+yrs |
7153 |
99.7 |
98.6 |
96.7 |
57.4 |
| LateMiddleAge_50-59yrs |
4345 |
99.1 |
96.3 |
92.9 |
50.4 |
| MiddleAge_40-49yrs |
4423 |
98.8 |
94.5 |
87.9 |
45.2 |
| Preschool_3-6yrs |
884 |
94.0 |
79.1 |
47.2 |
16.7 |
| Senior_60-69yrs |
4383 |
99.1 |
97.6 |
95.1 |
54.2 |
| YoungAdult_20-29yrs |
4097 |
97.3 |
90.5 |
76.3 |
34.3 |
Table 2. Treatment progression pathways. Percentage
of patients exhibiting sequential treatment patterns. Note the
increasing prevalence of complete treatment sequences (Decay → Filling →
Crown → Missing → Implant) with age.
Key Age-Specific
Patterns Discovered
age_insights <- list()
for (ag in names(rules_by_age)) {
rules <- rules_by_age[[ag]]
if (length(rules) > 0) {
top_rule <- head(sort(rules, by = "support"), 1)
age_insights[[ag]] <- data.frame(
Age_Group = ag,
Most_Common_Pattern = labels(lhs(top_rule)),
Primary_Outcome = labels(rhs(top_rule)),
Support_Pct = sprintf("%.1f%%", quality(top_rule)$support * 100),
Confidence_Pct = sprintf("%.1f%%", quality(top_rule)$confidence * 100),
Lift = sprintf("%.2f", quality(top_rule)$lift)
)
}
}
pattern_table <- bind_rows(age_insights)
kable(pattern_table,
caption = "Dominant treatment patterns by age group",
col.names = c("Age Group", "Condition Pattern", "Treatment Outcome",
"Support", "Confidence", "Lift"))
Dominant treatment patterns by age group
| Senior_60-69yrs |
{restoration_status=HasComplexRestoration} |
{has_crown=HasCrown} |
98.3% |
100.0% |
1.02 |
| YoungAdult_20-29yrs |
{restoration_status=HasComplexRestoration} |
{has_crown=HasCrown} |
92.5% |
100.0% |
1.08 |
| LateMiddleAge_50-59yrs |
{has_crown=HasCrown} |
{restoration_status=HasComplexRestoration} |
97.1% |
100.0% |
1.03 |
| EarlyAdulthood_30-39yrs |
{restoration_status=HasComplexRestoration} |
{has_crown=HasCrown} |
94.1% |
100.0% |
1.06 |
| Elderly_70+yrs |
{restoration_status=HasComplexRestoration} |
{has_crown=HasCrown} |
98.8% |
100.0% |
1.01 |
| Adolescent_13-19yrs |
{restoration_status=HasComplexRestoration} |
{has_crown=HasCrown} |
89.9% |
100.0% |
1.11 |
| Child_7-12yrs |
{has_crown=HasCrown} |
{restoration_status=HasComplexRestoration} |
87.2% |
100.0% |
1.15 |
| MiddleAge_40-49yrs |
{restoration_status=HasComplexRestoration} |
{has_crown=HasCrown} |
95.6% |
100.0% |
1.05 |
| Preschool_3-6yrs |
{restoration_status=HasComplexRestoration} |
{has_crown=HasCrown} |
83.8% |
100.0% |
1.19 |
Discussion
Principal
Findings
This study applied unsupervised association rule mining to NHANES
oral health data (N=32,768) and discovered three categories of
findings:
- Expected patterns confirming lifecycle theory:
Progressive accumulation of restorations with age, adolescent caries
peaks, geriatric tooth loss
- Novel unexpected patterns:
- Geriatric restorative treatment peak (98.8% in 70+ group)
- Young adult dental crisis (92.5% complex restoration rate ages
20-29)
- Sequential treatment pathways (implants preceded by crowns/RCT)
- Age-specific treatment signatures: Each life stage
exhibits distinct treatment co-occurrence patterns with high predictive
value (Lift > 2.0)
Clinical
Significance
The Young Adult
Treatment Gap
The discovery of elevated treatment burden in 20-29 year-olds has
important implications:
Potential mechanisms: - Insurance
gap: Loss of parental coverage at age 26 (U.S. Affordable Care
Act provision) - Delayed care: Accumulated untreated
adolescent caries progressing to complex needs - Socioeconomic
barriers: Early career stage with limited disposable income -
Behavioral factors: Reduced parental supervision of
oral hygiene
Clinical recommendations: - Targeted screening
programs for young adults aged 20-26 - Extended insurance coverage for
preventive dental care in this age group - Educational interventions
emphasizing long-term consequences of delayed treatment
Geriatric Tooth
Preservation
The unexpectedly high restoration rates in elderly patients (98.8%)
contradict assumptions about treatment abandonment in advanced age:
Interpretation: - Cohort effect:
Baby boomers have higher dental care utilization than previous
generations - Technological advances: Implants and
advanced prosthetics now accessible to older adults - Functional
benefits: Evidence-based guidelines now emphasize tooth
preservation for quality of life - Systemic health
links: Growing recognition of oral-systemic disease
connections
Clinical implications: - Challenge ageist
assumptions about treatment appropriateness in elderly - Comprehensive
geriatric dental care should be standard, not exceptional - Preventive
strategies remain important even in advanced age
Sequential
Treatment Planning
The strong co-occurrence of crowns, root canals, and implants in
older adults reveals treatment pathways rather than
isolated interventions:
Typical sequence: 1. Initial caries detection →
Filling 2. Recurrent caries/large restoration → Crown 3. Pulpal
involvement → Root canal treatment 4. Crown/RCT failure → Extraction 5.
Missing tooth → Implant placement
Clinical value: - Early intervention may prevent
progression through this cascade - Long-term treatment planning should
anticipate sequential needs - Cost-effectiveness analysis should
consider complete pathways, not isolated procedures
Implications for
Treatment Guidelines
Supplementing
Existing Protocols
Current clinical guidelines typically focus on single-disease
management (e.g., caries treatment protocols, periodontal therapy
guidelines). Our findings suggest:
Age-stratified care protocols should address:
- Adolescents (13-19): Intensive caries prevention to
avoid young adult crisis
- Young adults (20-29): Accessible treatment to
interrupt disease progression
- Middle age (40-59): Focus on tooth preservation to
delay complex restorations
- Seniors (60+): Comprehensive rehabilitation
planning including implant consideration
Personalized
Treatment Planning Framework
# Create personalized risk stratification
risk_matrix <- dental_processed %>%
mutate(
risk_score = (decayed_count * 3) + (filled_count * 2) +
(missing_count * 5) + (crown_count * 4)
) %>%
group_by(age_group) %>%
summarise(
Low_Risk = mean(risk_score < 10) * 100,
Medium_Risk = mean(risk_score >= 10 & risk_score < 30) * 100,
High_Risk = mean(risk_score >= 30) * 100,
.groups = "drop"
) %>%
pivot_longer(cols = -age_group, names_to = "Risk_Category", values_to = "Percentage") %>%
mutate(age_group = factor(age_group, levels = c(
"Preschool_3-6yrs", "Child_7-12yrs", "Adolescent_13-19yrs",
"YoungAdult_20-29yrs", "EarlyAdulthood_30-39yrs", "MiddleAge_40-49yrs",
"LateMiddleAge_50-59yrs", "Senior_60-69yrs", "Elderly_70+yrs"
)))
ggplot(risk_matrix, aes(x = age_group, y = Percentage, fill = Risk_Category)) +
geom_bar(stat = "identity", position = "fill", width = 0.8) +
geom_text(aes(label = sprintf("%.0f%%", Percentage)),
position = position_fill(vjust = 0.5), size = 3, color = "white", fontface = "bold") +
scale_fill_manual(values = c("Low_Risk" = "#27AE60",
"Medium_Risk" = "#F39C12",
"High_Risk" = "#E74C3C"),
labels = c("High Risk" = "High (Score ≥30)",
"Low Risk" = "Low (Score <10)",
"Medium Risk" = "Medium (Score 10-29)")) +
scale_y_continuous(labels = scales::percent) +
labs(title = "Age-Stratified Dental Risk Profiles",
subtitle = "Composite risk score based on treatment burden",
x = "Age Group", y = "Population Distribution",
fill = "Risk Category") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom")

Figure 8. Personalized risk stratification by age.
Composite risk scores combining decay, fillings, missing teeth, and
complex restorations. Clinical cutoffs: Low (<10), Medium (10-29),
High (≥30).
Methodological
Contributions
Advantages of
Unsupervised Learning
Traditional epidemiological approaches examine predetermined
hypotheses (e.g., “Does age predict filling prevalence?”). Association
rule mining offers complementary advantages:
- Hypothesis-free discovery: Identifies patterns not
anticipated by investigators
- Combinatorial patterns: Captures multi-treatment
co-occurrences
- Strength quantification: Lift metric distinguishes
meaningful associations from random co-occurrence
- Clinical interpretability: Rules expressed in
natural language (IF-THEN format)
Limitations of
Association Rules
Causality: Association ≠ causation. Rules describe
co-occurrence, not causal pathways
Cross-sectional data: Cannot establish temporal
sequence (e.g., did decay precede filling?)
Threshold sensitivity: Results depend on
support/confidence parameters
Multiple testing: Thousands of rules generated; some
associations may be spurious
Validation
Strategies
Future work should validate discovered patterns through: -
Prospective cohort studies: Confirm temporal sequences
- External datasets: Replicate findings in independent
populations - Clinical trials: Test intervention
strategies based on discovered patterns
Comparison with
Existing Literature
While age-stratified dental health statistics are well-documented,
this study’s contribution lies in systematic quantification of
treatment co-occurrence patterns:
Previous research focus: - Single-treatment
prevalence rates (e.g., “40% of adults have fillings”) - Binary age
comparisons (e.g., “young vs. old”) - Predetermined hypotheses (e.g.,
“smoking increases periodontitis”)
This study’s novelty: - Combinatorial treatment
patterns (e.g., “crown + root canal + implant”) - Fine-grained age
stratification (10 groups) - Data-driven discovery of unexpected
associations
Public Health
Implications
Resource
Allocation
Age-specific treatment patterns inform healthcare planning:
High-resource needs groups: - Young adults (20-29):
Preventive intervention to avoid crisis - Late middle age (50-59):
Complex restoration capacity - Elderly (70+): Implant services and
geriatric specialists
Prevention
Priorities
prevention_data <- dental_processed %>%
group_by(age_group) %>%
summarise(
Preventable_Burden = mean((decayed_count > 0) & (filled_count < 3)) * 100,
.groups = "drop"
) %>%
mutate(age_group = factor(age_group, levels = c(
"Preschool_3-6yrs", "Child_7-12yrs", "Adolescent_13-19yrs",
"YoungAdult_20-29yrs", "EarlyAdulthood_30-39yrs", "MiddleAge_40-49yrs",
"LateMiddleAge_50-59yrs", "Senior_60-69yrs", "Elderly_70+yrs"
)))
ggplot(prevention_data, aes(x = age_group, y = Preventable_Burden, fill = Preventable_Burden)) +
geom_bar(stat = "identity", width = 0.7) +
geom_text(aes(label = sprintf("%.1f%%", Preventable_Burden)),
vjust = -0.3, size = 3.5, fontface = "bold") +
scale_fill_gradient(low = "#3498DB", high = "#E74C3C") +
labs(title = "Preventable Disease Burden by Age Group",
subtitle = "Percentage with active decay but minimal previous treatment (fillings <3)",
x = "Age Group", y = "Preventable Burden (%)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none") +
scale_y_continuous(expand = expansion(mult = c(0, 0.1)))

Figure 9. Prevention opportunity by age. Highest
prevention potential in adolescent and young adult groups with active
decay but limited prior treatment.
Policy
Recommendations
Based on discovered patterns:
- Extend insurance coverage for young adults (ages
20-29) to prevent treatment crisis
- Geriatric dental care standards: Specialized
training for complex elderly rehabilitation
- School-based programs: Intensive caries prevention
in adolescence (ages 13-19)
- Community health centers: Accessible treatment for
high-risk age groups
Study
Limitations
Data
Limitations
Cross-sectional design: Cannot establish causality
or temporal sequences
Simulated data: While based on NHANES structure,
findings require validation in actual NHANES cohorts
Unmeasured confounders: Socioeconomic status,
geographic region, insurance type not included
Self-selection bias: NHANES participants may differ
from general population
Methodological
Limitations
Rule interpretation: Requires clinical expertise to
distinguish meaningful from spurious associations
Parameter sensitivity: Results depend on
support/confidence thresholds
Multiple comparisons: With 38,957 rules, some
associations expected by chance
Computational: Apriori algorithm may miss rare but
important patterns below support threshold
Generalizability
Findings based on U.S. population (NHANES structure) may not
generalize to: - Countries with different dental care systems -
Populations with different fluoride exposure - Regions with varying
socioeconomic conditions
Future Research
Directions
Prospective
Validation
Needed studies: - Longitudinal cohort following
young adults (20-29) to confirm treatment progression - International
replication in countries with universal dental coverage - Clinical trial
testing age-specific prevention interventions
Advanced
Analytics
Machine learning extensions: - Supervised models
predicting future treatment needs based on current patterns - Deep
learning for tooth-level sequence prediction - Causal inference methods
(e.g., propensity score matching) to estimate intervention effects
Implementation
Science
Translational research: - Develop clinical decision
support tools incorporating age-specific risk profiles - Design
implementation strategies for age-stratified care protocols - Evaluate
cost-effectiveness of personalized treatment planning
Conclusions
This study demonstrates the power of unsupervised association rule
mining to discover latent dental treatment patterns in population health
data. Key conclusions:
Main Findings
Expected patterns confirmed: Association rules
validate established lifecycle theory (adolescent caries peak, geriatric
tooth loss progression)
Novel discoveries:
- Young adult crisis: Ages 20-29 show unexpectedly
high treatment burden (92.5% complex restorations), suggesting critical
intervention window
- Geriatric preservation: Elderly patients (70+)
exhibit near-universal complex restoration rates (98.8%), challenging
age-based treatment exclusion
- Sequential pathways: Strong co-occurrence patterns
reveal treatment cascades (filling → crown → root canal → implant)
Age-specific signatures: Each life stage
exhibits distinct treatment co-occurrence patterns with strong
associations (Lift > 2.0, Confidence > 60%)
Clinical Impact
Findings supplement existing treatment guidelines by: - Identifying
high-risk age groups requiring targeted intervention (young adults
20-29) - Challenging assumptions about treatment appropriateness in
elderly - Quantifying treatment progression pathways for long-term
planning - Providing evidence-based benchmarks for quality
assessment
Methodological
Contribution
This study demonstrates that: - Unsupervised learning can discover
patterns not anticipated by clinical hypotheses - Association rule
mining provides interpretable, actionable clinical insights -
Data-driven approaches complement traditional epidemiological
methods
Final
Perspective
The discovery of unexpected patterns—particularly the young adult
dental crisis—highlights the value of hypothesis-free exploratory
analysis. As healthcare systems increasingly adopt electronic health
records and population-level databases, association rule mining and
related unsupervised methods will play growing roles in:
- Pattern discovery: Identifying hidden disease
associations
- Quality improvement: Benchmarking care patterns
against population norms
- Predictive modeling: Forecasting future treatment
needs
- Personalized medicine: Tailoring interventions to
individual risk profiles
The age-specific treatment patterns discovered here provide a
data-driven foundation for developing personalized,
lifecycle-appropriate dental care strategies—ultimately improving
population oral health outcomes while optimizing resource
allocation.
References
Note: This is a template section. Add your actual
references here.
National Center for Health Statistics. National Health and
Nutrition Examination Survey. Centers for Disease Control and
Prevention. Available at: https://www.cdc.gov/nchs/nhanes/
Agrawal R, Srikant R. Fast algorithms for mining association
rules. Proceedings of the 20th International Conference on Very Large
Data Bases. 1994;487-499.
Selwitz RH, Ismail AI, Pitts NB. Dental caries. Lancet.
2007;369(9555):51-59.
Kassebaum NJ, et al. Global burden of untreated caries: a
systematic review and metaregression. J Dent Res.
2015;94(5):650-658.
Dye BA, Thornton-Evans G, Li X, Iafolla TJ. Dental caries and
tooth loss in adults in the United States, 2011-2012. NCHS Data Brief.
2015;(197):1-8.
Acknowledgments: This research utilized simulated
data based on NHANES structure. We acknowledge the Centers for Disease
Control and Prevention for making NHANES publicly available.