Week 7 - Project 1

Author

Betty Liu

Intro

This nutritional dataset contains more than 300 familiar food items, each with its macronutrient information: calories, fats (including subcategories such as saturated fat), protein, and carbohydrates (including a subcategory for dietary fiber). Today we will look specifically at calories, fat, and protein from typical protein sources: fish, seafood, meat, poultry, seeds, and nuts.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
setwd("C:/Users/It's Me Betty/Documents/MC. Data 110")
nutritionBASED <- read_csv("nutrients.csv", show_col_types = FALSE)
# Keep only the categories that are common protein sources.
proteins <- nutritionBASED |>
  filter(Category %in% c("Meat, Poultry", "Seeds and Nuts", "Fish, Seafood"))
# A few variables were read in as character, so we convert them to numeric.
numproteins <- proteins |>
  mutate(Calories = as.numeric(Calories),
         Fat = as.numeric(Fat),
         Protein = as.numeric(Protein))

# Remove rows judged to be outliers, based on my fitness and nutrition background.
outlie <- c(1, 12, 41, 54)
finalnum <- numproteins[-outlie, ]
ggplot(finalnum, aes(x = Fat, y = Protein, color = Category, size = Grams )) +
  geom_point(alpha = .5) +  
  labs(
    title = "Protein: Fat Ratio",
    subtitle = "From typical protein sources",
    x = "Fat (g)",
    y = "Protein (g)",
    color = "Protein Source",
    caption = "Data Source: https://en.wikipedia.org/wiki/Table_of_food_nutrients"
  ) +
  scale_color_manual(values = c("Meat, Poultry" = "#ED4242", "Seeds and Nuts" = "#599E0E", "Fish, Seafood" = "#4696CC")) +
  scale_size_continuous(name = "Serving in Grams") +
  theme_classic()+
  theme(
    legend.title = element_text(face = "bold", size = 10, color = "#000000"),
    legend.background = element_rect(fill = "#EDEDED", color = "#000000", 
                                     linewidth = .1),
    plot.title = element_text(face = "bold", size = 18 ),
    plot.background = element_rect(fill = "#DBEDFC"),
    plot.caption = element_text(size = 8, hjust = 0 ),
    plot.subtitle = element_text(size = 10),
    panel.background = element_rect(fill = "#EDEDED", color = "#000000"),
    axis.title.x = element_text(size = 9),
    axis.title.y = element_text(size = 10) 
  )

A Brief Essay

After diving into the dataset, I took a few steps to clean the data. First, I used the ‘filter’ function to keep only the rows in the three protein-source categories. Next, I used the ‘mutate’ function to convert Calories, Fat, and Protein from character to numeric format. Finally, I removed four rows that I judged to be outliers, based on my fitness and nutrition background.
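As a minimal sketch of these three cleaning steps, the example below runs them on a small made-up table rather than on nutrients.csv. The foods, values, and the `drop_rows` index are assumptions for illustration only. It also shows that `as.numeric()` turns any non-numeric text (here a hypothetical "t" for a trace amount) into `NA`.

```r
library(dplyr)

# Toy data standing in for nutrients.csv; all values are made up.
toy <- tibble(
  Food     = c("Almonds", "Cod", "Bacon", "Apple", "Clams"),
  Category = c("Seeds and Nuts", "Fish, Seafood", "Meat, Poultry",
               "Fruits A-F", "Fish, Seafood"),
  Protein  = c("21", "28", "4", "0", "t")   # stored as text
)

cleaned <- toy |>
  # Step 1: keep only the three protein-source categories.
  filter(Category %in% c("Meat, Poultry", "Seeds and Nuts", "Fish, Seafood")) |>
  # Step 2: convert text to numbers; non-numeric text becomes NA.
  mutate(Protein = suppressWarnings(as.numeric(Protein)))

# Step 3: drop rows by position (hypothetical index, not the real outliers).
drop_rows <- c(1)
final <- cleaned[-drop_rows, ]

print(final)
```

Because negative indexing removes rows by position, the indices only stay valid as long as the filtering step before it does not change.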

This data visualization plots protein against fat for the selected foods. It shows that fish and seafood tend to be leaner than meat and poultry, whereas seeds and nuts carry more fat relative to their protein than the other two groups. Toward the end of this document, you’ll find an interactive chart of the same data, which highlights that seeds and nuts provide more calories for a smaller serving size.
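One way to put a number on this comparison is to compute grams of fat per gram of protein for each category. The sketch below does this on made-up values, not on the real data, and the column name `fat_per_g_protein` is an assumption introduced here.

```r
library(dplyr)

# Made-up values for illustration only; not taken from nutrients.csv.
toy <- tibble(
  Category = c("Fish, Seafood", "Fish, Seafood", "Meat, Poultry",
               "Meat, Poultry", "Seeds and Nuts", "Seeds and Nuts"),
  Fat      = c(2, 4, 10, 20, 30, 50),
  Protein  = c(20, 20, 20, 20, 10, 10)
)

ratio_by_category <- toy |>
  group_by(Category) |>
  # Total fat divided by total protein within each category.
  summarise(
    fat_per_g_protein = sum(Fat) / sum(Protein),
    .groups = "drop"
  ) |>
  # Leanest category first.
  arrange(fat_per_g_protein)

print(ratio_by_category)
```

To try this on the project data, `toy` could be swapped for `finalnum`, adding `na.rm = TRUE` inside `sum()` if any values are missing.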

One thing I wanted to do but couldn’t was change the font of the plot title and axis titles. I managed a few changes but wanted more variety. I also wanted to change the color of the legend circles, which I was not able to do.
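For what it is worth, here is one possible approach to both items, sketched on made-up points. It assumes the generic font families "serif" and "mono", which most graphics devices provide, and it uses `override.aes` inside `guide_legend()` to restyle the legend keys without touching the plotted points. The colors and sizes are arbitrary choices.

```r
library(ggplot2)

# Made-up points for illustration only.
toy <- data.frame(
  Fat      = c(2, 10, 50),
  Protein  = c(20, 20, 10),
  Grams    = c(100, 85, 70),
  Category = c("Fish, Seafood", "Meat, Poultry", "Seeds and Nuts")
)

p <- ggplot(toy, aes(x = Fat, y = Protein, color = Category, size = Grams)) +
  geom_point(alpha = 0.5) +
  labs(title = "Protein: Fat Ratio", color = "Protein Source") +
  guides(
    # Draw the category keys larger and fully opaque.
    color = guide_legend(override.aes = list(size = 4, alpha = 1)),
    # Recolor the circles in the size legend (black by default).
    size = guide_legend(override.aes = list(color = "#021166"))
  ) +
  theme_classic() +
  theme(
    # The family argument sets the font for each text element.
    plot.title   = element_text(family = "serif", face = "bold", size = 18),
    axis.title   = element_text(family = "mono", size = 10),
    legend.title = element_text(family = "serif", face = "bold")
  )

print(p)
```

Fonts beyond the generic families usually need to be installed and registered with the graphics device first, so the generic names are the safest starting point.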

# This interactive plot lets you hover over a point to see which food it is, along with its calories and serving size in grams.
# This is a secondary graph for this project.

Final_ <- ggplot(finalnum, aes(x = Fat, y = Protein, text = Food, size = Calories, label = Grams, color = Category)) +
  geom_point(alpha = 0.8) +
  labs(
    title = "Protein to Fat Ratio",
    x = "Fat (g)",
    y = "Protein (g)",
    color = "Protein Source"
  ) +
  scale_color_manual(values = c("Meat, Poultry" = "#ED4242", "Seeds and Nuts" = "#599E0E", "Fish, Seafood" = "#4696CC")) +
  scale_size_continuous(name = "Calories") +
  theme_classic()+
  theme(
    legend.title = element_text(face = "bold", size = 15, color = "#021166"),
    legend.background = element_rect(fill = "#EDEDED", color = "#000000", linewidth = .1),
    plot.title = element_text(face = "bold", size = 18 ),
    plot.background = element_rect(fill = "#D7E8F5"),
    plot.caption = element_text( size = 30 ),
    plot.subtitle = element_text(size = 8),
    panel.background = element_rect(fill = "#EDEDED"),
    axis.title.x = element_text(size = 9),
    axis.title.y = element_text(size = 10) 
  )

Finally <- ggplotly(Final_)
Finally