Which cheese is the healthiest, and which is the least healthy?
The purpose of this document is to rank cheeses by their fat, carbohydrate, protein, and calorie content. We weight each of these according to how important it is and assign each cheese a score. The cheeses come from an HTML table on the website https://www.fatsecret.com/calories-nutrition/food/cheese. We use a scraping method that takes the data from the table on the web and recreates it in R for analysis. The list contains 28 different cheeses. These are all popular cheeses, so it will be interesting to see which, according to our criteria, is the healthiest!
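As a rough illustration of the scraping step, here is a minimal sketch using rvest. It parses a small inline HTML table rather than the live page, so the rows are made-up placeholders, not real nutrition data. The commented-out line shows how the real URL would be read, and it assumes the cheese data is the first table on that page.

```r
library(rvest)

# Inline HTML standing in for the downloaded page.
# The rows are made-up placeholders, not real nutrition data.
page <- read_html('<table>
  <tr><th>Cheese</th><th>Fat</th><th>Protein</th></tr>
  <tr><td>Cheese A</td><td>9</td><td>7</td></tr>
  <tr><td>Cheese B</td><td>6</td><td>5</td></tr>
</table>')

# For the live site, the page would be read from its URL instead:
# page <- read_html("https://www.fatsecret.com/calories-nutrition/food/cheese")

# Pick the first <table> element and convert it to a data frame (tibble)
toy_table <- page %>%
  html_element("table") %>%
  html_table()

print(toy_table)
```

`html_table()` uses the header row for column names, so the result can be passed straight to dplyr verbs.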
To begin the analysis, we first load the libraries and the data.
# Load libraries
library(tidyverse) # The tidyverse collection of packages
library(httr) # Useful for web authentication
library(rvest) # Useful tools for working with HTML and XML
library(readr) # Help loading in the data
library(dplyr) # Useful for analysis

# Load data set
cheese_table <- read.csv("https://myxavier-my.sharepoint.com/:x:/g/personal/galuppoj_xavier_edu/EZX4f-klYnxGjmjcDHLq5uoBxMRTchhvYZil07wcEi3Oeg?download=1")
Next, we create our criteria for rating the healthiest cheese. To begin, we assign weights to fats, carbs, and protein, add up the weighted values, and rank the cheeses by the total. Protein gets the highest weight because protein is the most important for our health. Carbs get the second-highest weight because they are the second most important for our health. Fats get the second-lowest weight because they are not beneficial for our health. Finally, calories get the lowest weight: we want to eat as few calories as possible when we eat cheese, because cheese is not the highest-quality food to eat.
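The weighting described above could be sketched as a weighted sum per cheese. The example below is only an illustration: the column names (`Fat`, `Carbs`, `Protein`, `Calories`), the weight values and their signs, and the toy rows are all assumptions made for this sketch, not the values behind the chart that follows.

```r
# Toy data with made-up values; the real cheese_table columns may be named differently.
toy_cheese <- data.frame(
  Cheese   = c("Cheese A", "Cheese B", "Cheese C"),
  Fat      = c(9, 6, 2),
  Carbs    = c(1, 0, 3),
  Protein  = c(7, 5, 3),
  Calories = c(110, 75, 40)
)

# Hypothetical weights: positive values reward a nutrient, negative values penalize it.
# Protein is weighted highest, then carbs, then fat, then calories.
weights <- c(Protein = 4, Carbs = 3, Fat = -2, Calories = -0.05)

# Weighted sum per row: multiply each nutrient column by its weight and add them up
toy_cheese$Score <- as.numeric(as.matrix(toy_cheese[, names(weights)]) %*% weights)

# Rank from highest to lowest score
ranked <- toy_cheese[order(-toy_cheese$Score), ]
print(ranked[, c("Cheese", "Score")])
```

Keeping the weights in one named vector makes it easy to try different priorities and see how the ranking shifts.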
cheese_table %>%
  arrange(desc(Score)) %>%
  ggplot(aes(x = reorder(Popular.Types.of.Cheese............1.slice.or.1.oz.serving., -Score), y = Score, fill = Score)) +
  geom_bar(stat = "identity", color = "black") +
  scale_fill_viridis_c() +
  theme_minimal() +
  labs(x = "Cheese Types", y = "Score", title = "Cheese Ranking Based on Scores") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
We can see from this table and graph that the best cheese to eat in terms of our health criteria is... goat cheese! It is high in protein, with moderate fat and calories per serving. This would definitely be a good cheese to eat if you are looking for energy on the go.
Ricotta cheese has the lowest score, with low protein, fat, carbs, and calories. If you are looking to indulge in something that will give you long-lasting energy without having to eat a ton of it, you should choose something other than ricotta!
This graph shows how the popular cheeses compare under our criteria. There are no outliers in this data, which is to be expected, but there is definitely a difference between the healthier and less healthy cheeses. Make sure to come back to this if you find yourself indulging in one of these popular cheeses.
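One common way to check a claim like "no outliers" is the 1.5 × IQR rule. The sketch below applies it to a made-up vector of scores. Both the rule and the helper name `find_outliers` are assumptions chosen for illustration, not necessarily how the scores above were checked.

```r
# Hypothetical helper: return the values lying more than 1.5 * IQR
# below the first quartile or above the third quartile.
find_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x[x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr]
}

# Made-up scores, not the real cheese scores
toy_scores <- c(7.5, 4.25, 15, 9, 11)

no_outliers <- find_outliers(toy_scores)          # empty: all values sit within the fences
with_outlier <- find_outliers(c(toy_scores, 40))  # the extreme value 40 is flagged

print(no_outliers)
print(with_outlier)
```

With the real data, the same function could be called on `cheese_table$Score`.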