This dataset consists of five variables: two are categorical and three are numeric. Two of the numeric variables measure energy, in Calories (kilocalories, the unit used on US food labels) and in kilojoules (kJ), the SI equivalent. The third numeric variable is simply the base serving of 100 grams shared by every food item. Although the data in this CSV were compiled from calories.info, fdc.nal.usda.gov/food-search is an official source where you can search for any food and get a more in-depth breakdown of its components. My plan is to explore 12 satiating foods, drawn from a variety of food categories, that are filling yet low in calories relative to their weight.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)
calories <- readr::read_csv("calories.csv")
Rows: 2225 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): FoodCategory, FoodItem, per100grams, Cals_per100grams, KJ_per100grams
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
This chunk filters the dataset down to the food categories I wanted to look at, so that I could narrow down which food items to choose. Lines 35-36 (of the original R Markdown file) convert two of the text columns into numeric columns. Values such as “100cal” were read in as character strings (see the column specification above), so they had to be converted before they could be plotted.
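The filtering chunk itself is not reproduced here, so the following is only a minimal sketch of the same two steps under stated assumptions. The category names, the example rows and the “172 cal” / “720 kJ” text format are hypothetical; the chicken values come from the essay below. readr’s parse_number() is used for the conversion because as.numeric() returns NA for strings like “172 cal”.

```r
library(dplyr)
library(readr)

# Hypothetical stand-in for calories.csv. As in the real file, every column
# is character; the "172 cal" / "720 kJ" text format is an assumption.
calories_demo <- tibble(
  FoodCategory     = c("Poultry", "Poultry", "Candy"),
  FoodItem         = c("Chicken Breast", "Chicken Thigh", "Example Candy"),
  per100grams      = c("100g", "100g", "100g"),
  Cals_per100grams = c("172 cal", "229 cal", "400 cal"),
  KJ_per100grams   = c("720 kJ", "958 kJ", "1674 kJ")
)

# Hypothetical list of categories to keep
keep_categories <- c("Poultry")

s1_sketch <- calories_demo %>%
  # Keep only the rows whose category is in the list
  filter(FoodCategory %in% keep_categories) %>%
  # parse_number() strips the unit text, e.g. "172 cal" becomes 172
  mutate(
    Cals_per100grams = parse_number(Cals_per100grams),
    KJ_per100grams   = parse_number(KJ_per100grams)
  )

s1_sketch
```

With the real data, `calories` would take the place of `calories_demo`, and `keep_categories` would hold the categories actually chosen.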
This chunk uses mutate() so that the food items are graphed in ascending order of calories per 100 grams.
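The mutate chunk is not shown either. One common way to get the bars in ascending order, sketched here with hypothetical items and values, is forcats::fct_reorder(). ggplot2 orders a discrete axis by factor levels, and a plain character column would otherwise appear alphabetically.

```r
library(dplyr)
library(forcats)

# Hypothetical food items and calorie values, for illustration only
s2_sketch <- tibble(
  FoodItem         = c("Item A", "Item B", "Item C"),
  Cals_per100grams = c(300, 120, 50)
)

s2_sketch <- s2_sketch %>%
  # Turn FoodItem into a factor whose levels follow Cals_per100grams
  # (ascending by default), so ggplot2 draws the bars in that order
  # instead of alphabetically.
  mutate(FoodItem = fct_reorder(FoodItem, Cals_per100grams))

levels(s2_sketch$FoodItem)  # "Item C" "Item B" "Item A"
```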
ggplot(s2, aes(x = FoodItem, y = Cals_per100grams, fill = FoodCategory)) +
  geom_col(width = 0.9, alpha = 1) +
  scale_fill_brewer() +  # fill (not colour) is mapped, so a fill scale is needed
  theme_bw(base_size = 10, base_family = "Georgia") +
  labs(
    x = "Food Items",
    y = "Calories Per 100 Grams",
    title = "12 Satiating Foods Per 100 Grams",
    subtitle = "A Variety of Filling Foods from Different Food Categories",
    fill = "Food Categories",
    caption = "Source: calories.info"
  ) +
  theme(
    plot.title = element_text(size = 16, hjust = 0.5),
    plot.subtitle = element_text(size = 8, hjust = 0.5),
    axis.text.x = element_text(angle = 30, hjust = 1, size = 8)
  )
This chunk plots the food items on the x-axis and calories per 100 grams on the y-axis. I mapped the bar fill to FoodCategory, so each bar is colored by its food category, and drew a bar graph that runs from least to most calories. I set the theme’s base font size to 10 and changed the font to Georgia, then adjusted the individual font sizes and angled the x-axis labels by 30 degrees so they were not cluttered.
Essay
The main way I cleaned my dataset was to filter for the food categories I wanted to look at; I then chose 12 generic food items that I personally researched based on their macronutrients (especially protein) and fiber content. My visualization is a simple bar graph showing the food items in ascending order from lowest to highest calories. I specifically chose foods that I have researched and eaten myself, because when dieting, understanding what I am consuming is important. For s1 (short for Satiating1), I researched food categories that are healthy but also filling, meaning they have a low “energy density”: these foods are high in fiber, protein, and volume (water or air weight). Then, in s2, I chose 12 foods that I felt are commonly eaten by the general public and graphed them in ascending order using the mutate chunk.
My personal gripe with the dataset, which I should have realized earlier, was that it did not let me fit a linear regression or relate multiple factors to the food types. For example, I wish there were more than just five variables, and two of them are honestly redundant (calories and kilojoules): they measure the same quantity in different unit systems. I also wish whoever collected the data had included macronutrients such as protein, carbohydrates and fats (unsaturated and saturated), along with other nutrients such as sodium. This could have given me and the audience a better understanding of, for example, animal versus plant protein content, or of how fattier cuts of meat relate to increased health risks depending on the type of fat. Leaner cuts of meat like chicken breast (172 calories per 100 grams) are high in protein, which keeps you fuller for longer, and are lower in calories than fattier cuts like chicken thighs (229 calories per 100 grams).
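The redundancy of the two energy columns can be made concrete. A food Calorie is a kilocalorie, and 1 kcal = 4.184 kJ, so the kJ values should be (up to rounding) a constant multiple of the Calorie values and add no new information to a regression. A small base-R sketch using the chicken values quoted above:

```r
# 1 kilocalorie (the "Calorie" on food labels) = 4.184 kilojoules
kcal_to_kj <- function(kcal) kcal * 4.184

# Chicken values quoted in the essay (kcal per 100 g)
chicken <- c(breast = 172, thigh = 229)

round(kcal_to_kj(chicken))  # breast about 720 kJ, thigh about 958 kJ

# Extra energy in a thigh relative to a breast, per 100 g
diff_kcal <- chicken[["thigh"]] - chicken[["breast"]]  # 57 kcal
pct_more  <- 100 * diff_kcal / chicken[["breast"]]     # about 33% more
```

Because one column is just the other multiplied by a constant, putting both into the same regression model would make them (almost) perfectly collinear.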
From what I researched, boiled potatoes, for example, are one of the most filling foods for their calorie content per 100 grams (100 grams is about 0.22 pounds; by volume it is roughly half a cup, although the exact amount depends on the food’s density). The big factor that most people don’t consider is that how you prepare your food is just as important as what the food is. A baked potato on its own may be relatively healthy, but adding oils, toppings, sauces, etc. is where the unhealthy increase in calories comes in. Seasonings like salt, pepper, paprika and garlic powder add negligible calories in the amounts typically used, so as long as you are maintaining your macros and keeping calories low, healthier foods can still be flavorful and tasty.
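The unit figures in this paragraph, and the scaling of a per-100-gram value to a real serving, can be checked with a few lines of base R. This is a sketch: the 150 g serving is a hypothetical example, and the cup conversion assumes the density of water (1 g/mL) and a US customary cup of about 236.6 mL. Real foods differ in density, which is why “half a cup” is only a rule of thumb.

```r
# 1 international avoirdupois pound is defined as exactly 453.59237 g
grams_to_pounds <- function(g) g / 453.59237

# Volume in US customary cups for a food with the given density (g per mL);
# the default of 1 g/mL is the density of water (an assumption for most foods)
grams_to_us_cups <- function(g, density_g_per_ml = 1) {
  (g / density_g_per_ml) / 236.5882365
}

# Scale a "calories per 100 grams" value to an arbitrary serving size
serving_kcal <- function(kcal_per_100g, grams) kcal_per_100g * grams / 100

grams_to_pounds(100)    # about 0.22 lb
grams_to_us_cups(100)   # about 0.42 cup for a water-like food
serving_kcal(172, 150)  # hypothetical 150 g serving of chicken breast: 258 kcal
```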