In this project, I analyze a nutrition dataset that contains information on various food items. It includes quantitative variables (such as carbohydrate content, protein content, and calorie values) and categorical variables (such as food category).
The variables analyzed include:
Data.Kilocalories (calories): The number of calories in a food item.
Data.Carbohydrate (carbohydrates): The amount of carbohydrates in grams.
Data.Protein (protein): The amount of protein in grams.
Category: The type or category of the food (e.g., dairy, meat, grains).
I chose this dataset to explore the relationship between macronutrients (carbohydrates and protein) and their caloric content, which is a key aspect of nutrition analysis.
Background Research
Nutrition plays a crucial role in maintaining health and preventing disease. Carbohydrates and protein are two of the main macronutrients that provide energy to the body. The balance between carbohydrate and protein content in foods matters because these nutrients support different physiological functions: carbohydrates are the body’s primary energy source, while proteins are essential for tissue repair and muscle growth.
The dataset used in this project provides an opportunity to explore how these macronutrients correlate with the calorie content of various food items, helping us understand food composition better.
To filter the data, I first looked at the 10 most common categories in the dataset. I counted how often each category appeared and then selected the top 10. After that, I filtered the dataset to keep only the rows that correspond to these top 10 categories. This way, I focused the analysis and the scatter plot just on the most frequent food categories, ensuring that the results were representative of the most common items in the dataset.
The 10 most frequent categories in the dataset, with the number of occurrences for each, are:
BEEF: 457 occurrences
PORK: 304 occurrences
CEREALS RTE: 291 occurrences
BABYFOOD: 221 occurrences
LAMB: 198 occurrences
No Category: 198 occurrences
SOUP: 188 occurrences
CHICKEN: 163 occurrences
CEREALS: 132 occurrences
FAST FOODS: 130 occurrences
These category counts highlight the most common food types in the dataset. The “No Category” group likely represents items that didn’t fit into a specific category or were missing category information. The scatter plot’s points, grouped by these categories, provide insights into how food types with different macronutrient compositions are distributed across carbohydrates and protein.
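The counting and filtering steps described above can be sketched as follows. This is only a minimal illustration on a small made-up data frame: the `foods` object, its rows, and the cutoff of 2 are hypothetical stand-ins for the real dataset and its top 10.

```r
library(dplyr)

# Hypothetical toy data, for illustration only (not the real dataset)
foods <- data.frame(
  Category = c("BEEF", "BEEF", "BEEF", "PORK", "PORK",
               "SOUP", "LAMB", "LAMB", "LAMB", "LAMB")
)

# Count how often each category appears, most frequent first
category_counts <- foods %>%
  count(Category, sort = TRUE)

# Keep the names of the most frequent categories (2 here; the report uses 10)
top_n_categories <- category_counts %>%
  slice_head(n = 2) %>%
  pull(Category)

# Keep only the rows that belong to those categories
foods_filtered <- foods %>%
  filter(Category %in% top_n_categories)

top_n_categories
nrow(foods_filtered)
```

When two categories tie at the cutoff, `slice_head()` keeps whichever comes first in the sorted table. Using `slice_max(n, n = 10)` instead would keep all tied categories by default.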
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 7413 Columns: 48
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): Category, Description, Data.Household Weights.1st Household Weight...
dbl (44): Nutrient Data Bank Number, Data.Alpha Carotene, Data.Ash, Data.Bet...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
top_categories <- c("BEEF", "PORK", "CEREALS RTE", "BABYFOOD", "LAMB", "No Category", "SOUP", "CHICKEN", "CEREALS", "FAST FOODS")

data_filtered <- data %>%
  filter(Category %in% top_categories)

ggplot(data_filtered, aes(x = `Data.Carbohydrate`, y = `Data.Protein`, color = Category)) +
  geom_point() +  # Add points to the plot
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(
    title = "Scatter Plot of Carbohydrates vs Protein with Regression Line (Top 10 Categories)",
    x = "Carbohydrates (g)",
    y = "Protein (g)",
    color = "Food Category"
  ) +
  theme_minimal()
`geom_smooth()` using formula = 'y ~ x'
In the scatter plot, each point represents a food item, with carbohydrate content on the x-axis and protein content on the y-axis. The points show a moderate negative relationship between the two variables: as carbohydrate content increases, protein content tends to decrease. The points don’t form a perfect line, but the general downward trend suggests that foods high in carbs (like grains and starchy foods) often have lower protein content, while protein-rich foods (such as meats) tend to have lower carbohydrate content.
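The red line in the plot comes from `geom_smooth(method = "lm")`, which fits a straight line of protein on carbohydrates by least squares. As a rough sketch of what that fit looks like outside the plot, the same kind of model can be fitted with `lm()`. The `toy` data frame and its values below are made up for illustration and are not taken from the dataset.

```r
# Hypothetical values, for illustration only (not from the dataset)
toy <- data.frame(
  carbs   = c(0, 5, 20, 60, 75),
  protein = c(25, 20, 10, 8, 3)
)

# Least-squares line: protein as a linear function of carbs
fit <- lm(protein ~ carbs, data = toy)

# The slope is the expected change in protein (g) per extra gram of carbs
slope     <- coef(fit)[["carbs"]]
intercept <- coef(fit)[["(Intercept)"]]

slope
intercept
```

A negative slope corresponds to the downward trend seen in the scatter plot.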
# Calculate the correlation coefficient between Carbohydrates and Protein
correlation <- cor(data_filtered$`Data.Carbohydrate`, data_filtered$`Data.Protein`, use = "complete.obs")

# Print the correlation coefficient
correlation
[1] -0.4315321
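To show what `cor()` computes here, the sketch below works out the Pearson correlation by hand on a few made-up values and compares it with the built-in result. The vectors `carbs` and `protein` are hypothetical and are not taken from the dataset. The `NA` is included only to show how `use = "complete.obs"` drops incomplete pairs.

```r
# Hypothetical values, for illustration only (not from the dataset)
carbs   <- c(0, 5, 20, 60, 75, NA)
protein <- c(25, 20, 10, 8, 3, 12)

# Keep only pairs where both values are present,
# which is what use = "complete.obs" does
ok <- complete.cases(carbs, protein)
x <- carbs[ok]
y <- protein[ok]

# Pearson r: co-variation of x and y, scaled by the spread of each
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Same quantity from the built-in function
r_builtin <- cor(carbs, protein, use = "complete.obs")

# Squaring r gives the share of variation captured by a straight line
r_squared <- r_builtin^2

all.equal(r_manual, r_builtin)
```

Squaring the coefficient is a quick way to judge strength: a value of r near -0.43 corresponds to an r squared of roughly 0.19.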
Conclusion
The scatter plot revealed a moderate negative correlation between carbohydrates and protein (correlation coefficient = -0.43). In general, as the carbohydrate content of a food increases, its protein content tends to decrease, but it’s not a strict rule. Squaring the coefficient gives about 0.19, so a straight-line relationship with carbohydrates accounts for only about 19% of the variation in protein. This makes sense when you think about it: grains and starchy foods often have more carbs and less protein, while meats are packed with protein but have few carbs. However, the data points in the scatter plot are pretty spread out, which tells us that other factors, like the type of food (e.g., dairy, meat, grains), could have a bigger impact on the balance of carbs and protein.
So, in the end, while there’s some connection between carbs and protein, it’s not strong enough to reliably predict one from the other. Other things are clearly affecting the nutritional makeup of these foods, and that could be an interesting area to dig deeper into.