dietaryindex R Package
Welcome, students! In public health nutrition and dietetics, understanding dietary intake is fundamental. While analyzing individual nutrients or food items is valuable, a growing body of evidence highlights the importance of studying dietary patterns. Dietary patterns reflect the overall eating habits and combinations of foods consumed, providing a more holistic view of diet-disease relationships.
However, calculating complex dietary indices (like HEI, DASH, aMED) can be time-consuming, prone to error, and challenging to standardize across studies. This is where computational tools become invaluable.
Today, we will introduce you to dietaryindex, a powerful
and user-friendly R package designed to streamline and standardize the
compilation of dietary intake data into various index-based dietary
patterns. This package has been peer-reviewed and published in the
American Journal of Clinical Nutrition, which lends support to its
scientific rigor and reliability.
dietaryindex R Package
The dietaryindex package is an R tool developed to
simplify and standardize the process of calculating various dietary
pattern indices. It aims to:
* Provide user-friendly, streamlined methods.
* Enable consistent assessment of adherence to dietary patterns in epidemiologic and clinical studies.
* Reduce manual calculation errors and improve research efficiency.
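How dietaryindex Works (Conceptual Overview)
The package performs calculations in two main steps: first, it converts dietary intake data into standardized serving sizes for each index component; second, it computes the component scores and the total index score from those servings.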
What Can dietaryindex Calculate?
The dietaryindex package is versatile and can calculate
a wide range of commonly used dietary pattern indices, including the HEI, DASH, and aMED indices mentioned earlier.
Installing dietaryindex in R
Since dietaryindex is not yet on CRAN (the official R
package repository), you need to install it directly from its GitHub
repository using the devtools package.
Step 1: Install devtools (if you don’t have it)
# Check if devtools is already installed. If not, install it.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# Load the devtools package
library(devtools)
Step 2: Install dietaryindex from GitHub
# Install the dietaryindex package from its GitHub repository
devtools::install_github("jamesjiadazhan/dietaryindex")
Troubleshooting Installation: If you encounter a
prompt asking to update packages (e.g., “These packages have more recent
versions available…”), it’s generally recommended to:
1. Try entering 1 to update all packages.
2. If that doesn’t work, try entering 2 to update CRAN packages only.
This process might take some time, especially if you are a new R user with many outdated
packages.
Step 3: Verify the Installation
After the installation process completes without errors, you can verify that the package is installed and ready to use by loading it:
# Load the dietaryindex package
library(dietaryindex)
# View the help documentation for the package
help(package = "dietaryindex")
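If you would rather have a check that does not stop with an error when the package is missing, a minimal sketch using base R's requireNamespace() and packageVersion() might look like the following. The variable name is_installed is only illustrative.

```r
# Check whether dietaryindex is available without attaching it
is_installed <- requireNamespace("dietaryindex", quietly = TRUE)

if (is_installed) {
  # Report the installed version
  message("dietaryindex version: ",
          as.character(utils::packageVersion("dietaryindex")))
} else {
  message("dietaryindex is not installed; repeat Step 2 above.")
}
```

Because requireNamespace() returns FALSE instead of raising an error, this pattern is convenient at the top of a script that other people will run.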
Using dietaryindex with R Code
Once installed, the general workflow involves loading the package, preparing your dietary intake data, calculating serving sizes, and then computing the desired dietary index.
Important Note on Data Input: The
dietaryindex package is designed to be flexible, working
with various dietary assessment tools (FFQs, 24-hour recalls, food
records). However, the quality of your output heavily depends on the
quality and format of your input data. You will need to ensure your data
is structured appropriately, with clear identification of food items,
amounts, and relevant nutrient information.
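Before handing data to any scoring routine, it can help to run a quick structural check. The sketch below is only an illustration: the helper check_intake_data() and the column names are assumptions that follow the simulated data used later in this lecture, not a format required by dietaryindex.

```r
# Hypothetical column names, for illustration only
required_cols <- c("ID", "TotalEnergy_kcal", "TotalFruits_serv")

check_intake_data <- function(df, required = required_cols) {
  # Stop early if any expected column is absent
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  numeric_cols <- setdiff(required, "ID")
  # Flag columns that are non-numeric or contain missing or negative values
  problems <- vapply(numeric_cols, function(col) {
    x <- df[[col]]
    !is.numeric(x) || anyNA(x) || any(x < 0)
  }, logical(1))
  if (any(problems)) {
    stop("Invalid values in: ", paste(numeric_cols[problems], collapse = ", "))
  }
  invisible(TRUE)
}

# A small toy data set that passes the check
toy <- data.frame(
  ID = 1:3,
  TotalEnergy_kcal = c(1800, 2200, 2500),
  TotalFruits_serv = c(0.5, 1.2, 2.0)
)
check_intake_data(toy)
```

Checks like this catch unit or coding problems (for example, negative intakes) before they silently distort an index score.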
The package developers provide two crucial Excel files that detail
the definitions and scoring:
* dietaryindex_SERVING_SIZE_DEFINITION.xlsx
* dietaryindex_SCORING_ALGORITHM.xlsx
These files are essential for understanding how the package interprets and scores
dietary components.
In this section, we will walk through a practical example. Since we
don’t have actual raw dietary intake data for this lecture, we will
simulate a dataset that represents the output of the
dietaryindex package’s first step (serving size
calculation). This allows us to focus on the subsequent steps
of index calculation and visualization, which are the core contributions
of the dietaryindex package.
Disclaimer: The HEI2015 scoring logic implemented
here is a simplified, conceptual representation for demonstration
purposes. In a real scenario, the dietaryindex package’s
dedicated functions would handle the precise scoring according to the
official HEI2015 guidelines. You would use the package’s functions
directly, rather than manually implementing the scoring.
Let’s imagine we have already used the dietaryindex
package (or a similar process) to convert raw food intake into
standardized servings for each HEI2015 component, along with total
energy intake. We’ll also include some demographic data.
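# Load the packages used throughout this example
library(dplyr)   # %>%, mutate(), left_join(), select(), group_by(), summarise()
library(tidyr)   # pivot_longer()
library(ggplot2) # plotting
# For reproducibility
set.seed(42)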
n_individuals <- 100
# Simulate demographic data
demographics <- data.frame(
  ID = 1:n_individuals,
  AgeGroup = sample(c("18-30", "31-50", "51-70", "70+"), n_individuals, replace = TRUE, prob = c(0.25, 0.35, 0.25, 0.15)),
  Sex = sample(c("Male", "Female"), n_individuals, replace = TRUE, prob = c(0.48, 0.52))
)
# Simulate HEI2015 component servings/amounts
# These values are illustrative and represent typical ranges.
simulated_servings_data <- data.frame(
  ID = 1:n_individuals,
  TotalEnergy_kcal = round(rnorm(n_individuals, mean = 2000, sd = 400), 0), # Total energy intake (kcal)
  TotalFruits_serv = pmax(0, rnorm(n_individuals, 1.5, 0.8)), # Total Fruits (cups)
  WholeFruits_serv = pmax(0, rnorm(n_individuals, 0.8, 0.5)), # Whole Fruits (cups)
  TotalVegetables_serv = pmax(0, rnorm(n_individuals, 2.0, 1.0)), # Total Vegetables (cups)
  GreensBeans_serv = pmax(0, rnorm(n_individuals, 0.3, 0.2)), # Greens & Beans (cups)
  WholeGrains_serv = pmax(0, rnorm(n_individuals, 3.0, 1.5)), # Whole Grains (oz eq)
  Dairy_serv = pmax(0, rnorm(n_individuals, 2.0, 1.0)), # Dairy (cup eq)
  TotalProteinFoods_serv = pmax(0, rnorm(n_individuals, 5.5, 2.0)), # Total Protein Foods (oz eq)
  SeafoodPlantProteins_serv = pmax(0, rnorm(n_individuals, 1.5, 1.0)), # Seafood & Plant Proteins (oz eq)
  FattyAcids_ratio = pmax(0, pmin(10, rnorm(n_individuals, 2.5, 1.5))), # Ratio of Unsaturated to Saturated Fatty Acids
  RefinedGrains_oz = pmax(0, rnorm(n_individuals, 4.0, 2.0)), # Refined Grains (oz eq)
  Sodium_mg = pmax(0, rnorm(n_individuals, 3500, 1000)), # Sodium (mg)
  AddedSugar_g = pmax(0, rnorm(n_individuals, 50, 25)), # Added Sugars (g)
  SaturatedFat_g = pmax(0, rnorm(n_individuals, 25, 10)), # Saturated Fat (g)
  TotalFat_g = pmax(0, rnorm(n_individuals, 70, 20)) # Total Fat (g)
) %>%
  # Cap some values so they stay within realistic intake ranges
  mutate(
    TotalFruits_serv = pmin(TotalFruits_serv, 5),
    TotalVegetables_serv = pmin(TotalVegetables_serv, 5),
    WholeGrains_serv = pmin(WholeGrains_serv, 10),
    Dairy_serv = pmin(Dairy_serv, 5),
    TotalProteinFoods_serv = pmin(TotalProteinFoods_serv, 15),
    SeafoodPlantProteins_serv = pmin(SeafoodPlantProteins_serv, 10),
    RefinedGrains_oz = pmin(RefinedGrains_oz, 15)
  )
# Combine demographics with serving data
diet_data_processed <- left_join(demographics, simulated_servings_data, by = "ID")
# Display a sample of the simulated data
print("Simulated Pre-processed Dietary Data (Servings/Amounts per day):")
## [1] "Simulated Pre-processed Dietary Data (Servings/Amounts per day):"
head(diet_data_processed)
## ID AgeGroup Sex TotalEnergy_kcal TotalFruits_serv WholeFruits_serv
## 1 1 70+ Male 2480 0.0000000 0.7976896
## 2 2 70+ Female 2418 1.7670218 1.1801211
## 3 3 31-50 Female 1599 2.4370601 0.8194955
## 4 4 18-30 Female 2739 3.1476314 1.1675361
## 5 5 18-30 Male 1733 0.3985107 0.7267637
## 6 6 51-70 Male 2042 0.5793155 0.7710563
## TotalVegetables_serv GreensBeans_serv WholeGrains_serv Dairy_serv
## 1 3.334913 0.5058281 2.627276 2.2946924
## 2 1.130728 0.4829550 3.633481 2.3927413
## 3 2.055487 0.2995087 4.481480 0.9991563
## 4 2.049067 0.3272019 4.253352 1.6742729
## 5 1.421644 0.1559693 2.009217 0.9916512
## 6 1.001261 0.2603751 5.346104 1.3645685
## TotalProteinFoods_serv SeafoodPlantProteins_serv FattyAcids_ratio
## 1 6.877616 2.441924 5.987588
## 2 6.950166 1.251386 3.286183
## 3 5.934760 1.596479 3.956100
## 4 5.096687 1.066069 3.065460
## 5 2.768620 3.678668 1.006100
## 6 4.882125 0.000000 1.603776
## RefinedGrains_oz Sodium_mg AddedSugar_g SaturatedFat_g TotalFat_g
## 1 2.699431 2753.484 20.36133 33.772947 57.97234
## 2 1.993634 3536.606 33.54027 7.266286 67.28368
## 3 2.929772 3823.310 77.23770 24.543127 50.25454
## 4 3.779171 3879.676 62.71965 21.051277 86.63850
## 5 5.200860 4376.556 46.60233 23.719437 54.09881
## 6 4.831690 4433.388 47.28043 35.962377 76.80929
Now, let’s conceptually apply the HEI2015 scoring rules. Remember,
the dietaryindex package provides dedicated functions for this (for example,
HEI2015(); consult the package documentation for the exact function names
and arguments) that take this kind of data and
perform these calculations automatically. We are doing it manually here
to illustrate the underlying logic.
HEI2015 has 13 components: 9 adequacy components (higher intake = higher score) and 4 moderation components (lower intake = higher score). Each component is scored from 0 to 5 or from 0 to 10, for a maximum total score of 100.
# Define HEI2015 scoring parameters (simplified for demonstration)
# These are loosely based on HEI2015 standards, but simplified for this example.
# The actual dietaryindex package would have these built-in.
hei_scores <- diet_data_processed %>%
  mutate(
    # Adequacy Components (score 0-5 or 0-10)
    # Official HEI2015 standards are densities (per 1000 kcal or % of energy).
    # For simplicity, this demo scores absolute daily intakes instead.
    # The dietaryindex package handles the density calculation.
    # 1. Total Fruits (5 points, official standard >= 0.8 cup eq/1000 kcal)
    # For demo, let's assume a target of 1.6 cups/day for 2000 kcal
    Score_TotalFruits = pmin(TotalFruits_serv / 1.6, 1) * 5,
    # 2. Whole Fruits (5 points, official standard >= 0.4 cup eq/1000 kcal)
    # For demo, let's assume a target of 0.8 cups/day for 2000 kcal
    Score_WholeFruits = pmin(WholeFruits_serv / 0.8, 1) * 5,
    # 3. Total Vegetables (5 points, official standard >= 1.1 cup eq/1000 kcal)
    # For demo, let's assume a target of 2.2 cups/day for 2000 kcal
    Score_TotalVegetables = pmin(TotalVegetables_serv / 2.2, 1) * 5,
    # 4. Greens and Beans (5 points, official standard >= 0.2 cup eq/1000 kcal)
    # For demo, let's assume a target of 0.4 cups/day for 2000 kcal
    Score_GreensBeans = pmin(GreensBeans_serv / 0.4, 1) * 5,
    # 5. Whole Grains (10 points, official standard >= 1.5 oz eq/1000 kcal)
    # For demo, let's assume a target of 3.0 oz eq/day for 2000 kcal
    Score_WholeGrains = pmin(WholeGrains_serv / 3.0, 1) * 10,
    # 6. Dairy (10 points, official standard >= 1.3 cup eq/1000 kcal)
    # For demo, let's assume a target of 2.6 cup eq/day for 2000 kcal
    Score_Dairy = pmin(Dairy_serv / 2.6, 1) * 10,
    # 7. Total Protein Foods (5 points, official standard >= 2.5 oz eq/1000 kcal)
    # The demo uses a simplified target of 2.6 oz eq/day
    # (the density-based equivalent at 2000 kcal would be 5.0 oz eq/day)
    Score_TotalProteinFoods = pmin(TotalProteinFoods_serv / 2.6, 1) * 5,
    # 8. Seafood and Plant Proteins (5 points, official standard >= 0.8 oz eq/1000 kcal)
    # The demo uses a simplified target of 0.8 oz eq/day
    # (the density-based equivalent at 2000 kcal would be 1.6 oz eq/day)
    Score_SeafoodPlantProteins = pmin(SeafoodPlantProteins_serv / 0.8, 1) * 5,
    # 9. Fatty Acids (10 points, ratio of unsaturated to saturated fat;
    # official maximum at >= 2.5 and zero at <= 1.2; the demo scales linearly from 0)
    Score_FattyAcids = pmin(FattyAcids_ratio / 2.5, 1) * 10,
    # Moderation Components (score 0-10, lower intake = higher score)
    # In this demo: 0 points at the assumed max intake, 10 points at zero intake.
    # We'll use a linear scale for demonstration.
    # 10. Refined Grains (10 points; official standard: max score at <= 1.8 oz eq/1000 kcal,
    # zero at >= 4.3 oz eq/1000 kcal)
    # For demo, let's assume a max of 4.0 oz eq/day (0 points)
    # and a min of 0 oz eq/day (10 points)
    Score_RefinedGrains = pmax(0, 10 - (RefinedGrains_oz / 4.0) * 10),
    # 11. Sodium (10 points; official standard: max score at <= 1.1 g/1000 kcal,
    # i.e. 2200 mg/day at 2000 kcal; zero at >= 2.0 g/1000 kcal)
    # For demo, let's assume a max of 4600 mg/day (0 points) and a min of 0 mg/day (10 points)
    Score_Sodium = pmax(0, 10 - (Sodium_mg / 4600) * 10),
    # 12. Added Sugars (10 points; official standard: max score at <= 6.5% of energy,
    # zero at >= 26% of energy)
    # For demo, let's assume a max of 50g/day (0 points) and a min of 0g/day (10 points)
    Score_AddedSugar = pmax(0, 10 - (AddedSugar_g / 50) * 10),
    # 13. Saturated Fats (10 points; official standard: max score at <= 8% of energy,
    # zero at >= 16% of energy)
    # For demo, let's assume a max of 44g/day (0 points) and a min of 0g/day (10 points)
    Score_SaturatedFat = pmax(0, 10 - (SaturatedFat_g / 44) * 10)
  ) %>%
  rowwise() %>%
  mutate(
    Total_HEI2015_Score = sum(
      Score_TotalFruits, Score_WholeFruits, Score_TotalVegetables, Score_GreensBeans,
      Score_WholeGrains, Score_Dairy, Score_TotalProteinFoods, Score_SeafoodPlantProteins,
      Score_FattyAcids, Score_RefinedGrains, Score_Sodium, Score_AddedSugar,
      Score_SaturatedFat, na.rm = TRUE
    )
  ) %>%
  ungroup()
# Display the final scores
print("Simulated HEI2015 Scores per Individual:")
## [1] "Simulated HEI2015 Scores per Individual:"
head(hei_scores %>% select(ID, AgeGroup, Sex, Total_HEI2015_Score))
## # A tibble: 6 × 4
## ID AgeGroup Sex Total_HEI2015_Score
## <int> <chr> <chr> <dbl>
## 1 1 70+ Male 68.1
## 2 2 70+ Female 75.7
## 3 3 31-50 Female 61.0
## 4 4 18-30 Female 62.5
## 5 5 18-30 Male 41.3
## 6 6 51-70 Male 41.6
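The manual demo above scores absolute daily intakes, whereas the real HEI2015 is density-based. As a rough sketch of what density-based linear scoring could look like, the helpers below (score_adequacy() and score_moderation(), both hypothetical names that are not part of dietaryindex) convert intakes to amounts per 1000 kcal and score them against the published HEI2015 standards for Total Fruits (0.8 cup eq/1000 kcal) and Refined Grains (1.8 and 4.3 oz eq/1000 kcal). The intake values are made up for illustration.

```r
# Adequacy: 0 points at zero intake, full points at or above the standard
score_adequacy <- function(density, standard_max, max_points) {
  pmin(density / standard_max, 1) * max_points
}

# Moderation: full points at or below `best`, 0 points at or above `worst`,
# linear in between
score_moderation <- function(density, best, worst, max_points) {
  frac <- (worst - density) / (worst - best)
  pmax(0, pmin(frac, 1)) * max_points
}

# Two illustrative participants (values are invented)
energy_kcal       <- c(2000, 2500)
total_fruits_cups <- c(0.8, 2.5)
refined_grains_oz <- c(3.6, 12)

# Convert absolute intakes to densities per 1000 kcal
fruits_density  <- total_fruits_cups / (energy_kcal / 1000)
refined_density <- refined_grains_oz / (energy_kcal / 1000)

fruit_score   <- score_adequacy(fruits_density, standard_max = 0.8, max_points = 5)
refined_score <- score_moderation(refined_density, best = 1.8, worst = 4.3, max_points = 10)

print(data.frame(fruits_density, fruit_score, refined_density, refined_score))
```

The first participant eats 0.4 cup eq of fruit per 1000 kcal and so earns half of the 5 available points, which shows why two people with the same absolute intake can score differently once energy intake is taken into account.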
Now let’s create some plots to visualize our simulated HEI2015 scores.
A histogram helps us understand the overall distribution of dietary quality in our simulated population.
ggplot(hei_scores, aes(x = Total_HEI2015_Score)) +
  geom_histogram(binwidth = 5, fill = "steelblue", color = "black", alpha = 0.7) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_vline(aes(xintercept = mean(Total_HEI2015_Score)), color = "red", linetype = "dashed", linewidth = 1) +
  labs(
    title = "Distribution of Simulated Total HEI2015 Scores",
    x = "Total HEI2015 Score (Max 100)",
    y = "Number of Individuals"
  ) +
  theme_minimal() +
  annotate("text", x = mean(hei_scores$Total_HEI2015_Score) + 10, y = 15,
           label = paste("Mean =", round(mean(hei_scores$Total_HEI2015_Score), 2)),
           color = "red")
Figure 1: Distribution of Simulated Total HEI2015 Scores
Figure 1 shows that our simulated population has a range of HEI2015 scores, with a mean score around 56.71. This indicates varying levels of adherence to healthy eating guidelines.
We can explore how dietary quality might differ across demographic characteristics.
ggplot(hei_scores, aes(x = Sex, y = Total_HEI2015_Score, fill = Sex)) +
  geom_boxplot(alpha = 0.7) +
  labs(
    title = "Simulated Total HEI2015 Scores by Sex",
    x = "Sex",
    y = "Total HEI2015 Score"
  ) +
  theme_minimal() +
  scale_fill_brewer(palette = "Pastel1")
Figure 2: Simulated Total HEI2015 Scores by Sex
Figure 2 illustrates the distribution of HEI2015 scores for males and females in our simulated data. We can observe if there are any apparent differences in median scores or variability between the groups.
ggplot(hei_scores, aes(x = AgeGroup, y = Total_HEI2015_Score, fill = AgeGroup)) +
  geom_boxplot(alpha = 0.7) +
  labs(
    title = "Simulated Total HEI2015 Scores by Age Group",
    x = "Age Group",
    y = "Total HEI2015 Score"
  ) +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3")
Figure 3: Simulated Total HEI2015 Scores by Age Group
Figure 3 presents the HEI2015 scores across different age groups. Such visualizations can help identify specific demographic segments that might benefit from targeted nutritional interventions.
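Boxplots are easier to interpret alongside the numbers behind them. The following base R sketch shows one way to tabulate group size, mean, and median by age group. It uses a small invented data frame (toy_scores) so that it runs on its own; with the lecture data you would substitute hei_scores.

```r
# Invented scores for illustration only
toy_scores <- data.frame(
  AgeGroup = c("18-30", "18-30", "31-50", "31-50", "51-70", "51-70"),
  Total_HEI2015_Score = c(40, 50, 55, 65, 60, 70)
)

# Split the scores by age group, then summarise each group
by_group <- split(toy_scores$Total_HEI2015_Score, toy_scores$AgeGroup)
group_summary <- data.frame(
  AgeGroup    = names(by_group),
  n           = vapply(by_group, length, integer(1)),
  MeanScore   = vapply(by_group, mean, numeric(1)),
  MedianScore = vapply(by_group, median, numeric(1)),
  row.names   = NULL
)
print(group_summary)
```

Reporting n next to each summary is a useful habit, since a striking difference between boxplots can rest on very few observations.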
Understanding which components contribute most or least to the total score can provide insights into specific dietary strengths and weaknesses.
# Select only the score columns and pivot longer
component_scores_long <- hei_scores %>%
  select(ID, starts_with("Score_")) %>%
  pivot_longer(cols = starts_with("Score_"), names_to = "Component", values_to = "Score") %>%
  mutate(Component = gsub("Score_", "", Component)) # Clean up component names
# Calculate average score per component
average_component_scores <- component_scores_long %>%
  group_by(Component) %>%
  summarise(MeanScore = mean(Score, na.rm = TRUE)) %>%
  arrange(desc(MeanScore))
ggplot(average_component_scores, aes(x = reorder(Component, MeanScore), y = MeanScore, fill = MeanScore)) +
  geom_col() + # Equivalent to geom_bar(stat = "identity")
  coord_flip() + # Flip coordinates for better readability of component names
  labs(
    title = "Average Simulated HEI2015 Component Scores",
    x = "HEI2015 Component",
    y = "Average Score (Max 5 or 10)"
  ) +
  theme_minimal() +
  scale_fill_gradient(low = "lightcoral", high = "darkgreen")
Figure 4: Average Simulated HEI2015 Component Scores
Figure 4 displays the average scores for each HEI2015 component. Components with lower average scores (e.g., RefinedGrains, AddedSugar, SaturatedFat in our simulated data) indicate areas where dietary intake in the population could be improved. This type of analysis is crucial for developing targeted public health nutrition strategies.
The dietaryindex package has undergone thorough
validation to ensure its accuracy and reliability.
* It has been compared against manually computed dietary index results using simulation datasets.
* For HEI2015, results were compared with SAS-calculated results from the National Cancer Institute using NHANES 2017-2018 data, ASA24 example data, and DHQ3 example data.
* The HEI2015 results showed a 100% match with SAS results after rounding to two decimal places.
This rigorous validation process gives
confidence in the package’s outputs.
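To make the idea of a "match after rounding to two decimal places" concrete, here is a minimal sketch of such a comparison. The two score vectors are invented for illustration and do not come from the actual validation.

```r
# Invented scores from two implementations for the same five participants
scores_r   <- c(56.712, 61.004, 48.331, 72.998, 39.503)
scores_sas <- c(56.7121, 61.0041, 48.3312, 72.9979, 39.5031)

# Compare after rounding to two decimals, with a small tolerance
# to avoid floating-point equality pitfalls
matches <- abs(round(scores_r, 2) - round(scores_sas, 2)) < 1e-9
match_rate <- mean(matches) * 100

message("Match rate after rounding: ", match_rate, "%")
```

The same pattern can be used to check your own manual calculation against the package output on a handful of participants before running a full analysis.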
While dietaryindex is a powerful tool, it’s crucial to
understand its limitations and follow best practices:
* Consult the package documentation (including the dietaryindex_SERVING_SIZE_DEFINITION.xlsx and dietaryindex_SCORING_ALGORITHM.xlsx files). Understand what each component means and how it’s scored.
* If you use dietaryindex in your research, please cite the original work, published in the American Journal of Clinical Nutrition.
The dietaryindex R package is an invaluable tool for MPH
students and researchers in Nutrition and Dietetics. It simplifies the
complex process of calculating dietary pattern indices, promotes
standardization, and enhances the efficiency and reliability of your
research. By understanding its capabilities, workflow, and limitations,
you can effectively leverage this package to advance your studies on
diet and health outcomes.
Please feel free to ask any questions you may have. We can also discuss potential applications of this package in your current or future research projects.