1. Introduction: The Importance of Dietary Patterns in Nutrition Research

Welcome, students! In public health nutrition and dietetics, understanding dietary intake is fundamental. While analyzing individual nutrients or food items is valuable, a growing body of evidence highlights the importance of studying dietary patterns. Dietary patterns reflect the overall eating habits and combinations of foods consumed, providing a more holistic view of diet-disease relationships.

However, calculating complex dietary indices (like HEI, DASH, aMED) can be time-consuming, prone to error, and challenging to standardize across studies. This is where computational tools become invaluable.

Today, we will introduce dietaryindex, a user-friendly R package designed to streamline and standardize the compilation of dietary intake data into index-based dietary patterns. The package has been described in a peer-reviewed article in the American Journal of Clinical Nutrition, which supports its scientific rigor and reliability.

2. Why Study Dietary Patterns?

  • Holistic View: Captures the synergistic and antagonistic effects of various food components.
  • Real-world Relevance: Reflects how people actually eat, rather than isolated nutrients.
  • Predictive Power: Often better predictors of chronic disease risk than single nutrients.
  • Public Health Messaging: Easier to translate into actionable dietary guidelines for the public.

3. Introducing the dietaryindex R Package

The dietaryindex package is an R tool developed to simplify and standardize the process of calculating various dietary pattern indices. It aims to:

  • Provide user-friendly, streamlined methods.
  • Enable consistent assessment of adherence to dietary patterns in epidemiologic and clinical studies.
  • Reduce manual calculation errors and improve research efficiency.

4. How dietaryindex Works (Conceptual Overview)

The package performs calculations in two main steps:

  1. Computation of the serving size of each food and nutrient category: This step translates raw dietary intake data (e.g., grams of food, nutrient amounts) into standardized serving sizes relevant to the specific dietary index.
  2. Computation of the individual dietary index: Using the calculated serving size information, the package then applies the specific scoring algorithm for the chosen dietary index to derive a total score for each individual.
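
The two steps can be sketched in base R for a single component. This is a minimal illustration, not the package's implementation: the column names and the conversion factor of 150 g of fruit per cup equivalent are assumptions made for the example, and the scoring rule is the HEI2015 Total Fruits standard (full score at 0.8 cup eq per 1000 kcal).

```r
# Hypothetical raw intake for two people (grams per day); column names are illustrative.
raw_intake <- data.frame(
  id = c(1, 2),
  kcal = c(2000, 2500),
  fruit_g = c(300, 75)
)

# Step 1: convert raw amounts to servings.
# ASSUMPTION: 150 g of fruit = 1 cup equivalent (illustrative conversion only).
grams_per_cup <- 150
raw_intake$fruit_cups <- raw_intake$fruit_g / grams_per_cup

# Step 2: apply a scoring rule to the serving data.
# 5 points at >= 0.8 cups per 1000 kcal, scaled linearly down to 0 points at 0 cups.
fruit_density <- raw_intake$fruit_cups / (raw_intake$kcal / 1000)
raw_intake$fruit_score <- pmin(fruit_density / 0.8, 1) * 5

print(raw_intake)
```

Person 1 reaches the standard (1.0 cup per 1000 kcal) and receives the full 5 points; person 2 (0.2 cups per 1000 kcal) receives 1.25 points.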

5. What Dietary Indices Can dietaryindex Calculate?

The dietaryindex package is versatile and can calculate a wide range of commonly used dietary pattern indices, including:

  • Healthy Eating Index 2020 (HEI2020 & HEI-Toddlers-2020)
  • Healthy Eating Index 2015 (HEI2015)
  • Alternative Healthy Eating Index (AHEI)
  • Dietary Approaches to Stop Hypertension Index (DASH)
  • DASH Index in serving sizes from the DASH trial (DASHI)
  • Alternate Mediterranean Diet Score (aMED)
  • MED Index in serving sizes from the PREDIMED trial (MEDI)
  • Dietary Inflammatory Index (DII)
  • American Cancer Society 2020 diet score (ACS2020_V1 and ACS2020_V2)
  • Planetary Health Diet Index from the EAT-Lancet Commission (PHDI)

6. Installation of dietaryindex in R

Since dietaryindex is not yet on CRAN (the official R package repository), you need to install it directly from its GitHub repository using the devtools package.

Step 1: Install devtools (if you don’t have it)

# Check if devtools is already installed. If not, install it.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Load the devtools package
library(devtools)

Step 2: Install dietaryindex from GitHub

# Install the dietaryindex package from its GitHub repository
devtools::install_github("jamesjiadazhan/dietaryindex")

Troubleshooting Installation: If you encounter a prompt asking to update packages (e.g., “These packages have more recent versions available…”), it’s generally recommended to:

  1. Try entering 1 to update all packages.
  2. If that doesn’t work, try entering 2 to update CRAN packages only.

This process might take some time, especially if you are a new R user with many outdated packages.

Step 3: Verify the Installation

After the installation process completes without errors, you can verify that the package is installed and ready to use by loading it:

# Load the dietaryindex package
library(dietaryindex)

# View the help documentation for the package
help(package = "dietaryindex")
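
If you prefer a check that does not stop your script when the package is missing, something like the following may help. It is a small sketch using only base R; the helper name is_installed() is our own and not part of dietaryindex.

```r
# Returns TRUE if the named package can be loaded, FALSE otherwise,
# without attaching it or raising an error.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (is_installed("dietaryindex")) {
  message("dietaryindex version: ",
          as.character(utils::packageVersion("dietaryindex")))
} else {
  message("dietaryindex is not installed; rerun Step 2.")
}
```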

7. Core Workflow: Using dietaryindex with R Code

Once installed, the general workflow involves loading the package, preparing your dietary intake data, calculating serving sizes, and then computing the desired dietary index.

Important Note on Data Input: The dietaryindex package is designed to be flexible, working with various dietary assessment tools (FFQs, 24-hour recalls, food records). However, the quality of your output heavily depends on the quality and format of your input data. You will need to ensure your data is structured appropriately, with clear identification of food items, amounts, and relevant nutrient information.

The package developers provide two crucial Excel files that detail the definitions and scoring:

  • dietaryindex_SERVING_SIZE_DEFINITION.xlsx
  • dietaryindex_SCORING_ALGORITHM.xlsx

These files are essential for understanding how the package interprets and scores dietary components.
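
Before passing data to any scoring function, it can help to check its structure explicitly. The sketch below is one possible approach in base R. The function check_input() and the required column names are assumptions for illustration (they match the simulated data used later in this lecture, not the package's own argument names).

```r
# ASSUMPTION: these column names are placeholders for whatever your dataset uses.
required_cols <- c("ID", "TotalEnergy_kcal", "TotalFruits_serv")

check_input <- function(df, required) {
  # Stop early if any required column is absent
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Intake columns must be numeric and non-negative
  intake_cols <- setdiff(required, "ID")
  for (col in intake_cols) {
    if (!is.numeric(df[[col]])) stop(col, " must be numeric")
    if (any(df[[col]] < 0, na.rm = TRUE)) stop(col, " contains negative values")
  }
  invisible(TRUE)
}

example <- data.frame(
  ID = 1:3,
  TotalEnergy_kcal = c(1800, 2200, 2000),
  TotalFruits_serv = c(1.0, 0.5, 2.0)
)
check_input(example, required_cols)
```

A failed check stops with a message naming the offending column, which is usually easier to debug than an unexpected score further downstream.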


8. Practical Example: Calculating a Simulated Healthy Eating Index 2015 (HEI2015)

In this section, we will walk through a practical example. Since we don’t have actual raw dietary intake data for this lecture, we will simulate a dataset that represents the output of the dietaryindex package’s first step (serving size calculation). This allows us to focus on the subsequent steps of index calculation and visualization, which are the core contributions of the dietaryindex package.

Disclaimer: The HEI2015 scoring logic implemented here is a simplified, conceptual representation for demonstration purposes. In a real scenario, the dietaryindex package’s dedicated functions would handle the precise scoring according to the official HEI2015 guidelines. You would use the package’s functions directly, rather than manually implementing the scoring.

8.1. Simulate “Pre-processed” Serving Size Data

Let’s imagine we have already used the dietaryindex package (or a similar process) to convert raw food intake into standardized servings for each HEI2015 component, along with total energy intake. We’ll also include some demographic data.

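# Load the packages used in this example (pipes, data wrangling, plotting)
library(dplyr)
library(tidyr)
library(ggplot2)

# For reproducibility
set.seed(42)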

n_individuals <- 100

# Simulate demographic data
demographics <- data.frame(
  ID = 1:n_individuals,
  AgeGroup = sample(c("18-30", "31-50", "51-70", "70+"), n_individuals, replace = TRUE, prob = c(0.25, 0.35, 0.25, 0.15)),
  Sex = sample(c("Male", "Female"), n_individuals, replace = TRUE, prob = c(0.48, 0.52))
)

# Simulate HEI2015 component servings/amounts
# These values are illustrative and represent typical ranges.
simulated_servings_data <- data.frame(
  ID = 1:n_individuals,
  TotalEnergy_kcal = round(rnorm(n_individuals, mean = 2000, sd = 400), 0), # Total energy intake (kcal)
  TotalFruits_serv = pmax(0, rnorm(n_individuals, 1.5, 0.8)), # Total Fruits (cups)
  WholeFruits_serv = pmax(0, rnorm(n_individuals, 0.8, 0.5)), # Whole Fruits (cups)
  TotalVegetables_serv = pmax(0, rnorm(n_individuals, 2.0, 1.0)), # Total Vegetables (cups)
  GreensBeans_serv = pmax(0, rnorm(n_individuals, 0.3, 0.2)), # Greens & Beans (cups)
  WholeGrains_serv = pmax(0, rnorm(n_individuals, 3.0, 1.5)), # Whole Grains (oz eq)
  Dairy_serv = pmax(0, rnorm(n_individuals, 2.0, 1.0)), # Dairy (cup eq)
  TotalProteinFoods_serv = pmax(0, rnorm(n_individuals, 5.5, 2.0)), # Total Protein Foods (oz eq)
  SeafoodPlantProteins_serv = pmax(0, rnorm(n_individuals, 1.5, 1.0)), # Seafood & Plant Proteins (oz eq)
  FattyAcids_ratio = pmax(0, pmin(10, rnorm(n_individuals, 2.5, 1.5))), # Ratio of Unsaturated to Saturated Fatty Acids
  RefinedGrains_oz = pmax(0, rnorm(n_individuals, 4.0, 2.0)), # Refined Grains (oz eq)
  Sodium_mg = pmax(0, rnorm(n_individuals, 3500, 1000)), # Sodium (mg)
  AddedSugar_g = pmax(0, rnorm(n_individuals, 50, 25)), # Added Sugars (g)
  SaturatedFat_g = pmax(0, rnorm(n_individuals, 25, 10)), # Saturated Fat (g)
  TotalFat_g = pmax(0, rnorm(n_individuals, 70, 20)) # Total Fat (g)
) %>%
  # Ensure some values are within realistic HEI ranges
  mutate(
    TotalFruits_serv = pmin(TotalFruits_serv, 5),
    TotalVegetables_serv = pmin(TotalVegetables_serv, 5),
    WholeGrains_serv = pmin(WholeGrains_serv, 10),
    Dairy_serv = pmin(Dairy_serv, 5),
    TotalProteinFoods_serv = pmin(TotalProteinFoods_serv, 15),
    SeafoodPlantProteins_serv = pmin(SeafoodPlantProteins_serv, 10),
    RefinedGrains_oz = pmin(RefinedGrains_oz, 15)
  )

# Combine demographics with serving data
diet_data_processed <- left_join(demographics, simulated_servings_data, by = "ID")

# Display a sample of the simulated data
print("Simulated Pre-processed Dietary Data (Servings/Amounts per day):")
## [1] "Simulated Pre-processed Dietary Data (Servings/Amounts per day):"
head(diet_data_processed)
##   ID AgeGroup    Sex TotalEnergy_kcal TotalFruits_serv WholeFruits_serv
## 1  1      70+   Male             2480        0.0000000        0.7976896
## 2  2      70+ Female             2418        1.7670218        1.1801211
## 3  3    31-50 Female             1599        2.4370601        0.8194955
## 4  4    18-30 Female             2739        3.1476314        1.1675361
## 5  5    18-30   Male             1733        0.3985107        0.7267637
## 6  6    51-70   Male             2042        0.5793155        0.7710563
##   TotalVegetables_serv GreensBeans_serv WholeGrains_serv Dairy_serv
## 1             3.334913        0.5058281         2.627276  2.2946924
## 2             1.130728        0.4829550         3.633481  2.3927413
## 3             2.055487        0.2995087         4.481480  0.9991563
## 4             2.049067        0.3272019         4.253352  1.6742729
## 5             1.421644        0.1559693         2.009217  0.9916512
## 6             1.001261        0.2603751         5.346104  1.3645685
##   TotalProteinFoods_serv SeafoodPlantProteins_serv FattyAcids_ratio
## 1               6.877616                  2.441924         5.987588
## 2               6.950166                  1.251386         3.286183
## 3               5.934760                  1.596479         3.956100
## 4               5.096687                  1.066069         3.065460
## 5               2.768620                  3.678668         1.006100
## 6               4.882125                  0.000000         1.603776
##   RefinedGrains_oz Sodium_mg AddedSugar_g SaturatedFat_g TotalFat_g
## 1         2.699431  2753.484     20.36133      33.772947   57.97234
## 2         1.993634  3536.606     33.54027       7.266286   67.28368
## 3         2.929772  3823.310     77.23770      24.543127   50.25454
## 4         3.779171  3879.676     62.71965      21.051277   86.63850
## 5         5.200860  4376.556     46.60233      23.719437   54.09881
## 6         4.831690  4433.388     47.28043      35.962377   76.80929

8.2. Conceptual HEI2015 Scoring

Now, let’s conceptually apply the HEI2015 scoring rules. Remember, the dietaryindex package provides dedicated functions for this (for example, HEI2015(); check the package documentation for the exact arguments) that take this kind of data and perform these calculations automatically. We are doing it manually here to illustrate the underlying logic.

HEI2015 has 13 components, 9 adequacy components (higher intake = higher score) and 4 moderation components (lower intake = higher score). Each component is scored from 0 to 5 or 0 to 10.

# Define HEI2015 scoring parameters (simplified for demonstration)
# These are based on HEI2015 guidelines, but simplified for this example.
# The actual dietaryindex package would have these built-in.

hei_scores <- diet_data_processed %>%
  mutate(
    # Adequacy Components (score 0-5 or 0-10)
    # All values are per 1000 kcal, so we need to adjust our simulated absolute values
    # For simplicity, we'll use absolute values for this demo, but note the real HEI is density-based.
    # The dietaryindex package handles this density calculation.

    # 1. Total Fruits (5 points, target >= 0.8 cups/1000 kcal)
    # For demo, let's assume a target of 1.6 cups/day for 2000 kcal
    Score_TotalFruits = pmin(TotalFruits_serv / 1.6, 1) * 5,

    # 2. Whole Fruits (5 points, target >= 0.4 cups/1000 kcal)
    # For demo, let's assume a target of 0.8 cups/day for 2000 kcal
    Score_WholeFruits = pmin(WholeFruits_serv / 0.8, 1) * 5,

    # 3. Total Vegetables (5 points, target >= 1.1 cups/1000 kcal)
    # For demo, let's assume a target of 2.2 cups/day for 2000 kcal
    Score_TotalVegetables = pmin(TotalVegetables_serv / 2.2, 1) * 5,

    # 4. Greens and Beans (5 points, target >= 0.2 cups/1000 kcal)
    # For demo, let's assume a target of 0.4 cups/day for 2000 kcal
    Score_GreensBeans = pmin(GreensBeans_serv / 0.4, 1) * 5,

    # 5. Whole Grains (10 points, target >= 1.5 oz eq/1000 kcal)
    # For demo, let's assume a target of 3.0 oz eq/day for 2000 kcal
    Score_WholeGrains = pmin(WholeGrains_serv / 3.0, 1) * 10,

    # 6. Dairy (10 points, target >= 1.3 cup eq/1000 kcal)
    # For demo, let's assume a target of 2.6 cup eq/day for 2000 kcal
    Score_Dairy = pmin(Dairy_serv / 2.6, 1) * 10,

    # 7. Total Protein Foods (5 points; official standard: full score at >= 2.5 oz eq/1000 kcal)
    # For demo, we use a simplified target of 2.6 oz eq/day, which is well below the official density standard
    Score_TotalProteinFoods = pmin(TotalProteinFoods_serv / 2.6, 1) * 5,

    # 8. Seafood and Plant Proteins (5 points; official standard: full score at >= 0.8 oz eq/1000 kcal)
    # For demo, we use a simplified target of 0.8 oz eq/day, which is well below the official density standard
    Score_SeafoodPlantProteins = pmin(SeafoodPlantProteins_serv / 0.8, 1) * 5,

    # 9. Fatty Acids (10 points, ratio of unsaturated to saturated fat, target >= 2.5)
    Score_FattyAcids = pmin(FattyAcids_ratio / 2.5, 1) * 10,

    # Moderation Components (score 0-10, lower intake = higher score)
    # For these, 0 points at max intake, 10 points at min intake.
    # We'll use a linear scale for demonstration.

    # 10. Refined Grains (10 points; official standard: full score at <= 1.8 oz eq/1000 kcal,
    #     0 points at >= 4.3 oz eq/1000 kcal)
    # For demo, we use simplified cut-offs: 0 points at 4.0 oz eq/day or more,
    # rising linearly to 10 points at 0 oz eq/day
    Score_RefinedGrains = pmax(0, 10 - (RefinedGrains_oz / 4.0) * 10),

    # 11. Sodium (10 points; official standard: full score at <= 1.1 g/1000 kcal,
    #     i.e. 2200 mg/day at 2000 kcal, 0 points at >= 2.0 g/1000 kcal)
    # For demo: 0 points at 4600 mg/day or more, rising linearly to 10 points at 0 mg/day
    Score_Sodium = pmax(0, 10 - (Sodium_mg / 4600) * 10),

    # 12. Added Sugars (10 points; official standard: full score at <= 6.5% of energy,
    #     0 points at >= 26% of energy)
    # For demo: 0 points at 50 g/day or more, rising linearly to 10 points at 0 g/day
    Score_AddedSugar = pmax(0, 10 - (AddedSugar_g / 50) * 10),

    # 13. Saturated Fats (10 points; official standard: full score at <= 8% of energy,
    #     0 points at >= 16% of energy)
    # For demo: 0 points at 44 g/day or more, rising linearly to 10 points at 0 g/day
    Score_SaturatedFat = pmax(0, 10 - (SaturatedFat_g / 44) * 10)
  ) %>%
  rowwise() %>%
  mutate(
    Total_HEI2015_Score = sum(
      Score_TotalFruits, Score_WholeFruits, Score_TotalVegetables, Score_GreensBeans,
      Score_WholeGrains, Score_Dairy, Score_TotalProteinFoods, Score_SeafoodPlantProteins,
      Score_FattyAcids, Score_RefinedGrains, Score_Sodium, Score_AddedSugar,
      Score_SaturatedFat, na.rm = TRUE
    )
  ) %>%
  ungroup()

# Display the final scores
print("Simulated HEI2015 Scores per Individual:")
## [1] "Simulated HEI2015 Scores per Individual:"
head(hei_scores %>% select(ID, AgeGroup, Sex, Total_HEI2015_Score))
## # A tibble: 6 × 4
##      ID AgeGroup Sex    Total_HEI2015_Score
##   <int> <chr>    <chr>                <dbl>
## 1     1 70+      Male                  68.1
## 2     2 70+      Female                75.7
## 3     3 31-50    Female                61.0
## 4     4 18-30    Female                62.5
## 5     5 18-30    Male                  41.3
## 6     6 51-70    Male                  41.6
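
The demo above scores absolute daily amounts. A density-based version, closer to how HEI2015 is defined, might look like the sketch below. The helper names score_adequacy() and score_moderation() are our own, not functions from dietaryindex. The standards used are the published HEI2015 values for Total Fruits (full score at 0.8 cup eq per 1000 kcal) and Sodium (full score at 1.1 g per 1000 kcal or less, zero at 2.0 g per 1000 kcal or more).

```r
# Adequacy: 0 points at zero intake, full points at or above the standard.
score_adequacy <- function(density, standard_max, points) {
  pmin(pmax(density, 0) / standard_max, 1) * points
}

# Moderation: full points at or below `standard_full`,
# 0 points at or above `standard_zero`, linear in between.
score_moderation <- function(density, standard_full, standard_zero, points) {
  frac <- (standard_zero - density) / (standard_zero - standard_full)
  pmin(pmax(frac, 0), 1) * points
}

# Worked example for one hypothetical person
kcal <- 2000
fruit_cups <- 1.2
sodium_mg <- 3500

fruit_density <- fruit_cups / (kcal / 1000)            # cups per 1000 kcal
sodium_density <- (sodium_mg / 1000) / (kcal / 1000)   # g per 1000 kcal

fruit_score <- score_adequacy(fruit_density, standard_max = 0.8, points = 5)
sodium_score <- score_moderation(sodium_density, standard_full = 1.1,
                                 standard_zero = 2.0, points = 10)

print(c(fruit = fruit_score, sodium = sodium_score))
```

Here the fruit density is 0.6 cups per 1000 kcal, giving 0.6 / 0.8 × 5 = 3.75 points, and the sodium density is 1.75 g per 1000 kcal, giving (2.0 − 1.75) / (2.0 − 1.1) × 10 ≈ 2.78 points.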

8.3. Visualizing the Results

Now let’s create some plots to visualize our simulated HEI2015 scores.

8.3.1. Distribution of Total HEI2015 Scores

A histogram helps us understand the overall distribution of dietary quality in our simulated population.

ggplot(hei_scores, aes(x = Total_HEI2015_Score)) +
  geom_histogram(binwidth = 5, fill = "steelblue", color = "black", alpha = 0.7) +
  geom_vline(aes(xintercept = mean(Total_HEI2015_Score)), color = "red", linetype = "dashed", linewidth = 1) + # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  labs(
    title = "Distribution of Simulated Total HEI2015 Scores",
    x = "Total HEI2015 Score (Max 100)",
    y = "Number of Individuals"
  ) +
  theme_minimal() +
  annotate("text", x = mean(hei_scores$Total_HEI2015_Score) + 10, y = 15,
           label = paste("Mean =", round(mean(hei_scores$Total_HEI2015_Score), 2)),
           color = "red")

Figure 1: Distribution of Simulated Total HEI2015 Scores

Figure 1 shows that our simulated population has a range of HEI2015 scores, with a mean score around 56.71. This indicates varying levels of adherence to healthy eating guidelines.

8.3.2. HEI2015 Scores by Demographic Groups (Sex and Age Group)

We can explore how dietary quality might differ across demographic characteristics.

ggplot(hei_scores, aes(x = Sex, y = Total_HEI2015_Score, fill = Sex)) +
  geom_boxplot(alpha = 0.7) +
  labs(
    title = "Simulated Total HEI2015 Scores by Sex",
    x = "Sex",
    y = "Total HEI2015 Score"
  ) +
  theme_minimal() +
  scale_fill_brewer(palette = "Pastel1")

Figure 2: Simulated Total HEI2015 Scores by Sex

Figure 2 illustrates the distribution of HEI2015 scores for males and females in our simulated data. We can observe if there are any apparent differences in median scores or variability between the groups.

ggplot(hei_scores, aes(x = AgeGroup, y = Total_HEI2015_Score, fill = AgeGroup)) +
  geom_boxplot(alpha = 0.7) +
  labs(
    title = "Simulated Total HEI2015 Scores by Age Group",
    x = "Age Group",
    y = "Total HEI2015 Score"
  ) +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3")

Figure 3: Simulated Total HEI2015 Scores by Age Group

Figure 3 presents the HEI2015 scores across different age groups. Such visualizations can help identify specific demographic segments that might benefit from targeted nutritional interventions.
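
Boxplots are easier to interpret alongside the numbers behind them. The following base R sketch computes the median and interquartile range per group. The data frame and the helper summarise_by() are hypothetical and use made-up scores rather than the simulated lecture data.

```r
# Hypothetical scores for illustration only (not the simulated lecture data)
scores <- data.frame(
  Sex = c("Male", "Male", "Male", "Female", "Female", "Female"),
  Total_HEI2015_Score = c(40, 50, 60, 55, 65, 75)
)

# Median and IQR of a score column within each level of a grouping column
summarise_by <- function(df, group_col, score_col = "Total_HEI2015_Score") {
  groups <- split(df[[score_col]], df[[group_col]])
  data.frame(
    group = names(groups),
    n = vapply(groups, length, integer(1)),
    median = vapply(groups, function(x) as.numeric(median(x)), numeric(1)),
    iqr = vapply(groups, function(x) as.numeric(IQR(x)), numeric(1)),
    row.names = NULL
  )
}

group_summary <- summarise_by(scores, "Sex")
print(group_summary)
```

The same helper works for any grouping column, for example summarise_by(hei_scores, "AgeGroup") on the data from this section.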

8.3.3. Average Component Scores

Understanding which components contribute most or least to the total score can provide insights into specific dietary strengths and weaknesses.

# Select only the score columns and pivot longer
component_scores_long <- hei_scores %>%
  select(ID, starts_with("Score_")) %>%
  pivot_longer(cols = starts_with("Score_"), names_to = "Component", values_to = "Score") %>%
  mutate(Component = gsub("Score_", "", Component)) # Clean up component names

# Calculate average score per component
average_component_scores <- component_scores_long %>%
  group_by(Component) %>%
  summarise(MeanScore = mean(Score, na.rm = TRUE)) %>%
  arrange(desc(MeanScore))

ggplot(average_component_scores, aes(x = reorder(Component, MeanScore), y = MeanScore, fill = MeanScore)) +
  geom_bar(stat = "identity") +
  coord_flip() + # Flip coordinates for better readability of component names
  labs(
    title = "Average Simulated HEI2015 Component Scores",
    x = "HEI2015 Component",
    y = "Average Score (Max 5 or 10)"
  ) +
  theme_minimal() +
  scale_fill_gradient(low = "lightcoral", high = "darkgreen")

Figure 4: Average Simulated HEI2015 Component Scores

Figure 4 displays the average scores for each HEI2015 component. Components with lower average scores (e.g., RefinedGrains, AddedSugar, SaturatedFat in our simulated data) indicate areas where dietary intake in the population could be improved. This type of analysis is crucial for developing targeted public health nutrition strategies.


9. Validation and Reliability

The dietaryindex package has undergone thorough validation to ensure its accuracy and reliability:

  • It has been compared against manually computed dietary index results using simulation datasets.
  • For HEI2015, results were compared with SAS-calculated results from the National Cancer Institute using NHANES 2017-2018 data, ASA24 example data, and DHQ3 example data.
  • The HEI2015 results showed a 100% match with SAS results after rounding to two decimal places.

This rigorous validation process gives confidence in the package’s outputs.
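
You can apply the same idea to your own work by comparing scores from two implementations after rounding. The sketch below shows one way to do this in base R. The score vectors are made-up numbers and compare_scores() is a hypothetical helper, not the validation code used by the package authors.

```r
# Hypothetical scores from two implementations (illustrative numbers only)
scores_r   <- c(56.7112, 72.3021, 41.0000)
scores_ref <- c(56.7108, 72.3018, 41.0000)

# Share of scores that agree after rounding, plus the positions that do not
compare_scores <- function(a, b, digits = 2) {
  matched <- round(a, digits) == round(b, digits)
  list(match_rate = mean(matched), mismatches = which(!matched))
}

result <- compare_scores(scores_r, scores_ref)
print(result$match_rate)
```

A match rate below 1 points you to the individual records listed in mismatches, which is where a scoring or data-preparation difference is most likely to be found.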

10. Limitations and Best Practices

While dietaryindex is a powerful tool, it’s crucial to understand its limitations and follow best practices:

  • Data Quality is Paramount: The accuracy of the calculated indices is entirely dependent on the quality and accuracy of your raw dietary intake data. “Garbage in, garbage out” applies here.
  • Understanding Index Definitions: Always refer to the original definitions and scoring algorithms of each dietary index (e.g., the dietaryindex_SERVING_SIZE_DEFINITION.xlsx and dietaryindex_SCORING_ALGORITHM.xlsx files). Understand what each component means and how it’s scored.
  • Food Categorization: Accurately mapping your specific food items to the general food/nutrient categories required by the indices is the most critical and often challenging step. The package provides some help, but you may need to develop your own mapping strategy based on your dietary assessment tool.
  • Specific Components: Be aware that some very specific dietary index components (e.g., low-fat dairy products, sugar-sweetened beverages) can be difficult to assess accurately from general food intake data. The package makes estimations (e.g., SSB serving based on added sugar content), but use your judgment to determine if these estimations are appropriate for your research.
  • Consult Original Guidelines: Always cross-reference with the original guidelines and publications for each dietary index to ensure your interpretation and application are correct.
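
For the food categorization step, a simple lookup table with an explicit check for unmapped items is often a reasonable starting point. The sketch below uses base R. The food names, category labels, and data frames are hypothetical examples, not definitions taken from the package.

```r
# ASSUMPTION: food names and category labels are hypothetical examples.
food_map <- data.frame(
  food = c("apple", "brown rice", "white bread", "spinach"),
  category = c("WholeFruits", "WholeGrains", "RefinedGrains", "GreensBeans")
)

intake <- data.frame(
  ID = c(1, 1, 2, 2),
  food = c("apple", "white bread", "spinach", "kombucha"),
  amount = c(1.0, 2.0, 0.5, 1.0)
)

# Left join: keep every intake record, even if no category is found
mapped <- merge(intake, food_map, by = "food", all.x = TRUE)

# List foods that have no category so they can be reviewed manually
unmapped <- unique(mapped$food[is.na(mapped$category)])
if (length(unmapped) > 0) {
  message("Unmapped foods need manual review: ", paste(unmapped, collapse = ", "))
}
```

Keeping unmapped foods visible, rather than silently dropping them, makes it less likely that a component score is underestimated because some items never reached a category.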

11. Further Resources

12. Conclusion

The dietaryindex R package is an invaluable tool for MPH students and researchers in Nutrition and Dietetics. It simplifies the complex process of calculating dietary pattern indices, promotes standardization, and enhances the efficiency and reliability of your research. By understanding its capabilities, workflow, and limitations, you can effectively leverage this package to advance your studies on diet and health outcomes.

13. Questions and Discussion

Please feel free to ask any questions you may have. We can also discuss potential applications of this package in your current or future research projects.