Proj#2

Author

ZIHAO YU

1. How will I tackle the problem?

Upload the dataset to GitHub and read it into R from the raw file URL. After cleaning and analyzing the data, generate visualizations in code and draw conclusions.

2. What data challenges do I anticipate?

Data cleaning may be challenging. If the data is complex and choosing appropriate charts is difficult, I would use an LLM to assist with that task.

Sources:
https://github.com/XxY-coder/data607-Proj.2Y/raw/refs/heads/main/food_coded.csv
https://github.com/XxY-coder/data607-Proj.2Y/raw/refs/heads/main/4243802.csv
https://github.com/XxY-coder/data607-Proj.2Y/raw/refs/heads/main/wide_format_co2_emission_dataset.csv

3. Data source

The dataset is from Kaggle. “https://www.kaggle.com/datasets/borapajo/food-choices?select=food_coded.csv”

I used the janitor package to clean and standardize the column names.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.1     ✔ stringr   1.5.2
✔ ggplot2   4.0.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test
data_1 <- read.csv("https://github.com/XxY-coder/data607-Proj.2Y/raw/refs/heads/main/food_coded.csv") |>
  clean_names() |>
  mutate(id = row_number(), .before = 1)
names(data_1)
 [1] "id"                           "gpa"                         
 [3] "gender"                       "breakfast"                   
 [5] "calories_chicken"             "calories_day"                
 [7] "calories_scone"               "coffee"                      
 [9] "comfort_food"                 "comfort_food_reasons"        
[11] "comfort_food_reasons_coded"   "cook"                        
[13] "comfort_food_reasons_coded_1" "cuisine"                     
[15] "diet_current"                 "diet_current_coded"          
[17] "drink"                        "eating_changes"              
[19] "eating_changes_coded"         "eating_changes_coded1"       
[21] "eating_out"                   "employment"                  
[23] "ethnic_food"                  "exercise"                    
[25] "father_education"             "father_profession"           
[27] "fav_cuisine"                  "fav_cuisine_coded"           
[29] "fav_food"                     "food_childhood"              
[31] "fries"                        "fruit_day"                   
[33] "grade_level"                  "greek_food"                  
[35] "healthy_feeling"              "healthy_meal"                
[37] "ideal_diet"                   "ideal_diet_coded"            
[39] "income"                       "indian_food"                 
[41] "italian_food"                 "life_rewarding"              
[43] "marital_status"               "meals_dinner_friend"         
[45] "mother_education"             "mother_profession"           
[47] "nutritional_check"            "on_off_campus"               
[49] "parents_cook"                 "pay_meal_out"                
[51] "persian_food"                 "self_perception_weight"      
[53] "soup"                         "sports"                      
[55] "thai_food"                    "tortilla_calories"           
[57] "turkey_calories"              "type_sports"                 
[59] "veggies_day"                  "vitamins"                    
[61] "waffle_calories"              "weight"                      

4. Raw data structure

dim(data_1)
[1] 125  62

There are 125 rows and 62 columns.

head(data_1)
  id   gpa gender breakfast calories_chicken calories_day calories_scone coffee
1  1   2.4      2         1              430          NaN            315      1
2  2 3.654      1         1              610            3            420      2
3  3   3.3      1         1              720            4            420      2
4  4   3.2      1         1              430            3            420      2
5  5   3.5      1         1              720            2            420      2
6  6  2.25      1         1              610            3            980      2
                      comfort_food
1                             none
2      chocolate, chips, ice cream
3  frozen yogurt, pizza, fast food
4 Pizza, Mac and cheese, ice cream
5     Ice cream, chocolate, chips 
6        Candy, brownies and soda.
                                         comfort_food_reasons
1                                       we dont have comfort 
2                                        Stress, bored, anger
3                                             stress, sadness
4                                                     Boredom
5                                  Stress, boredom, cravings 
6 None, i don't eat comfort food. I just eat when i'm hungry.
  comfort_food_reasons_coded cook comfort_food_reasons_coded_1 cuisine
1                          9    2                            9     NaN
2                          1    3                            1       1
3                          1    1                            1       3
4                          2    2                            2       2
5                          1    1                            1       2
6                          4    3                            4     NaN
                                                                                                                                                               diet_current
1                                                                                                                                                     eat good and exercise
2          I eat about three times a day with some snacks. I try to eat healthy but it doesn't always work out that- sometimes eat fast food and mainly eat at Laker/ Egan 
3                                                        toast and fruit for breakfast, salad for lunch, usually grilled chicken and veggies (or some variation) for dinner
4                                                                      College diet, cheap and easy foods most nights. Weekends traditionally, cook better homemade meals  
5 I try to eat healthy but often struggle because of living on campus. I still try to keep the choices I do make balanced with fruits and vegetables and limit the sweats. 
6                                                            My current diet is terrible. I barely have time to eat a meal in a day. When i do eat it's mostly not healthy.
  diet_current_coded drink
1                  1     1
2                  2     2
3                  3     1
4                  2     2
5                  2     2
6                  2     2
                                                                                                                                        eating_changes
1                                                                                                                                          eat faster 
2                                                                                                                          I eat out more than usual. 
3                                                                        sometimes choosing to eat fast food instead of cooking simply for convenience
4                                                                                                       Accepting cheap and premade/store bought foods
5 I have eaten generally the same foods but I do find myself eating the same food frequently due to what I have found I like from egan and the laker. 
6                                                                                                     Eating rice everyday. Eating less homemade food.
  eating_changes_coded eating_changes_coded1 eating_out employment ethnic_food
1                    1                     1          3          3           1
2                    1                     2          2          2           4
3                    1                     3          2          3           5
4                    1                     3          2          3           5
5                    3                     4          2          2           4
6                    1                     3          1          3           4
  exercise father_education father_profession    fav_cuisine fav_cuisine_coded
1        1                5         profesor  Arabic cuisine                 3
2        1                2    Self employed         Italian                 1
3        2                2     owns business        italian                 1
4        3                2         Mechanic        Turkish                  3
5        1                4                IT       Italian                  1
6        2                1       Taxi Driver        African                 6
  fav_food                               food_childhood fries fruit_day
1        1                           rice  and chicken      2         5
2        1 chicken and biscuits, beef soup, baked beans     1         4
3        3                 mac and cheese, pizza, tacos     1         5
4        1                Beef stroganoff, tacos, pizza     2         4
5        3                Pasta, chicken tender, pizza      1         4
6        3                Fries, plaintain & fried fish     1         2
  grade_level greek_food healthy_feeling
1           2          5               2
2           4          4               5
3           3          5               6
4           4          5               7
5           4          4               6
6           2          2               4
                                                                                    healthy_meal
1                                                                                looks not oily 
2             Grains, Veggies, (more of grains and veggies), small protein and fruit with dairy 
3                                        usually includes natural ingredients; nonprocessed food
4                                                       Fresh fruits& vegetables, organic meats 
5 A lean protein such as grilled chicken, green vegetables and  brown rice or other whole grain 
6                                                   Requires veggies, fruits and a cooked meal. 
                                                                                                           ideal_diet
1                                                                                                      being healthy 
2 Try to eat 5-6 small meals a day. While trying to properly distribute carbs, protein, fruits, veggies, and dairy.  
3                                                                        i would say my ideal diet is my current diet
4                                                                      Healthy, fresh veggies/fruits & organic foods 
5                                   Ideally I would like to be able to eat healthier foods in order to loose weight. 
6                               My ideal diet is to eat 3 times a day including breakfast on time. Eat healthy food. 
  ideal_diet_coded income indian_food italian_food life_rewarding
1                8      5           5            5              1
2                3      4           4            4              1
3                6      6           5            5              7
4                2      6           5            5              2
5                2      6           2            5              1
6                2      1           5            5              4
  marital_status
1              1
2              2
3              2
4              2
5              1
6              2
                                                                                                     meals_dinner_friend
1                                                                                                   rice, chicken,  soup
2                                                                                                 Pasta, steak, chicken 
3                                                      chicken and rice with veggies, pasta, some kind of healthy recipe
4                                                                       Grilled chicken \nStuffed Shells\nHomemade Chili
5                                                                Chicken Parmesan, Pulled Pork, Spaghetti and meatballs 
6 Anything they'd want. I'd ask them before hand what they want to eat and it depends on which type of friend is coming.
  mother_education         mother_profession nutritional_check on_off_campus
1                1                unemployed                 5             1
2                4                 Nurse RN                  4             1
3                2             owns business                 4             2
4                4 Special Education Teacher                 2             1
5                5  Substance Abuse Conselor                 3             1
6                1              Hair Braider                 1             1
  parents_cook pay_meal_out persian_food self_perception_weight soup sports
1            1            2            5                      3    1      1
2            1            4            4                      3    1      1
3            1            3            5                      6    1      2
4            1            2            5                      5    1      2
5            1            4            2                      4    1      1
6            2            5            5                      5    1      2
  thai_food tortilla_calories turkey_calories type_sports veggies_day vitamins
1         1              1165             345  car racing           5        1
2         2               725             690 Basketball            4        2
3         5              1165             500        none           5        1
4         5               725             690         nan           3        1
5         4               940             500    Softball           4        2
6         4               940             345       None.           1        2
  waffle_calories                   weight
1            1315                      187
2             900                      155
3             900 I'm not answering this. 
4            1315            Not sure, 240
5             760                      190
6            1315                      190
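The weight column in the preview above mixes numbers with free text (for example "Not sure, 240" and "I'm not answering this."). The analysis in this report does not use that column, but if it were needed, one possible way to pull out the first number is sketched below in base R. The helper name extract_number is hypothetical, and the sample vector simply reuses the six values shown in the head() output.

```r
# Hypothetical helper: return the first number found in each string, or NA.
extract_number <- function(x) {
  # Position of the first run of digits (with optional decimals); -1 if none.
  hit <- regexpr("[0-9]+(\\.[0-9]+)?", x)
  ok <- !is.na(hit) & hit > 0
  out <- rep(NA_real_, length(x))
  # regmatches() returns only the elements that matched, in order.
  out[ok] <- as.numeric(regmatches(x, hit))
  out
}

# The six weight values visible in head(data_1)
weight_raw <- c("187", "155", "I'm not answering this. ",
                "Not sure, 240", "190", "190")
weight_num <- extract_number(weight_raw)
weight_num
```

Answers with no digits become NA, so they can be filtered out in the same way as other missing values.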

5. Transformation steps

new_data <- 
  data_1 %>%
  select(
    id,
    gender,
    calories_chicken,
    calories_scone,
    tortilla_calories,
    turkey_calories,
    waffle_calories
)  %>%
  mutate(
    gender = case_when(
      gender == 1 ~ "F",
      gender == 2 ~ "M",
      TRUE ~ NA_character_),
)

head(new_data)
  id gender calories_chicken calories_scone tortilla_calories turkey_calories
1  1      M              430            315              1165             345
2  2      F              610            420               725             690
3  3      F              720            420              1165             500
4  4      F              430            420               725             690
5  5      F              720            420               940             500
6  6      F              610            980               940             345
  waffle_calories
1            1315
2             900
3             900
4            1315
5             760
6            1315
dim(new_data)
[1] 125   7

There are 125 rows and 7 columns.
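The case_when() call above maps gender code 1 to "F" and 2 to "M", and turns any other value into NA. As an alternative sketch, the same mapping can be written with a named lookup vector in base R. The names gender_labels and codes, and the sample codes themselves, are illustrative only.

```r
# Lookup table: names are the raw codes, values are the labels.
gender_labels <- c("1" = "F", "2" = "M")

# Illustrative codes, including a missing value and an unexpected code.
codes <- c(2, 1, 1, NA, 3)

# Indexing by name returns NA for missing or unknown codes.
gender <- unname(gender_labels[as.character(codes)])
gender
```

Both approaches give the same result; the lookup vector is mainly convenient when there are many codes to map.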

The data was reshaped from wide to long format, and rows with a missing value in calories or gender were removed. The long table has 623 rows rather than the 125 × 5 = 625 that a complete dataset would give, so only two values were dropped. Their removal should have little effect on the subsequent aggregation, visualization, and analysis.

food_calories <-
  new_data %>%
  pivot_longer(
    cols = c(
      calories_chicken,
      calories_scone,
      tortilla_calories,
      turkey_calories,
      waffle_calories
),
    names_to = "food_item",
    values_to = "calories"
) %>%
  mutate(
    food_item = recode(
      food_item,
      "calories_chicken" = "chicken",
      "calories_scone" = "scone",
      "tortilla_calories" = "tortilla",
      "turkey_calories" = "turkey",
      "waffle_calories" = "waffle"
    ) 
) %>%
  filter(!is.na(calories), !is.na(gender))

food_calories
# A tibble: 623 × 4
      id gender food_item calories
   <int> <chr>  <chr>        <dbl>
 1     1 M      chicken        430
 2     1 M      scone          315
 3     1 M      tortilla      1165
 4     1 M      turkey         345
 5     1 M      waffle        1315
 6     2 F      chicken        610
 7     2 F      scone          420
 8     2 F      tortilla       725
 9     2 F      turkey         690
10     2 F      waffle         900
# ℹ 613 more rows
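The row count of the long table can be checked against the number of missing values in the wide table. The sketch below shows the arithmetic in base R on a small toy data frame; the values in toy are invented for illustration and are not taken from the survey, and the check assumes gender itself has no missing values.

```r
# Toy stand-in for new_data (illustrative values only).
toy <- data.frame(
  id = 1:4,
  gender = c("F", "M", "F", "F"),
  calories_scone = c(420, 315, NA, 980),
  tortilla_calories = c(725, 1165, 940, NA)
)

# Missing values per column before reshaping.
na_per_column <- colSums(is.na(toy))

# Each respondent contributes one long row per calorie column.
calorie_cols <- c("calories_scone", "tortilla_calories")
rows_before <- nrow(toy) * length(calorie_cols)

# Every missing calorie value removes exactly one long row.
rows_after <- rows_before - sum(na_per_column[calorie_cols])

na_per_column
c(rows_before = rows_before, rows_after = rows_after)
```

Applied to new_data, the same reasoning would be expected to account for the difference between 625 and 623 rows.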

6. Analysis and Conclusions

Compare the calorie values of the five foods to see which is highest and which is lowest.

calorie_summary <- 
  food_calories %>%
  group_by(food_item) %>%
  summarise(
    mean_calories = mean(calories, na.rm = TRUE),
    median_calories = median(calories, na.rm = TRUE),
    sd_calories = sd(calories, na.rm = TRUE),
    min_calories = min(calories, na.rm = TRUE),
    max_calories = max(calories, na.rm = TRUE),
    n = sum(!is.na(calories))
)

calorie_summary
# A tibble: 5 × 7
  food_item mean_calories median_calories sd_calories min_calories max_calories
  <chr>             <dbl>           <dbl>       <dbl>        <dbl>        <dbl>
1 chicken            577.             610        131.          265          720
2 scone              505.             420        231.          315          980
3 tortilla           948.             940        202.          580         1165
4 turkey             555.             500        152.          345          850
5 waffle            1073.             900        249.          575         1315
# ℹ 1 more variable: n <int>

Plot a bar chart of the mean calories for each food item.

ggplot(
  calorie_summary, 
  aes(x = food_item, y = mean_calories, fill = food_item)
) +
  geom_col() +
  geom_text(aes(label = round(mean_calories, 1)), vjust = -0.3) +
  labs(
    title = "Average Calories",
    x = "Food Item",
    y = "Average Calories"
) +
  theme_minimal()

Waffles have the highest mean at 1073.4 calories, with a median of 900 and a maximum of 1315. Tortillas rank second with a mean of 947.6 calories. Scones have the lowest mean at 505.2 calories.
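The ranking can be reproduced from the printed summary table. The sketch below re-enters the rounded means and medians shown in the calorie_summary output (so the figures are rounded to whole calories), sorts the foods by mean, and adds the gap between mean and median. The object name calorie_summary_printed is an assumption made for this example.

```r
# Rounded values copied from the printed calorie_summary table.
calorie_summary_printed <- data.frame(
  food_item = c("chicken", "scone", "tortilla", "turkey", "waffle"),
  mean_calories = c(577, 505, 948, 555, 1073),
  median_calories = c(610, 420, 940, 500, 900)
)

# Sort from highest to lowest mean.
ranked <- calorie_summary_printed[
  order(-calorie_summary_printed$mean_calories), ]

# A large gap between mean and median hints at an asymmetric distribution.
ranked$mean_minus_median <- ranked$mean_calories - ranked$median_calories
ranked
```

The waffle row has the largest gap between mean and median, which suggests its values are not spread symmetrically around the median.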

Although turkey and chicken are different types of poultry, their summary statistics are close (means of about 555 and 577 calories). No formal significance test was run on this difference.
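Because every respondent gave a value for both chicken and turkey, one way to test the difference formally would be a paired t-test. The sketch below uses only the six rows visible in head(new_data), so it illustrates the call and is not a finding about the full sample.

```r
# Values for the first six respondents, copied from head(new_data).
chicken <- c(430, 610, 720, 430, 720, 610)
turkey  <- c(345, 690, 500, 690, 500, 345)

# Paired test: each respondent contributes one chicken-turkey difference.
paired_result <- t.test(chicken, turkey, paired = TRUE)

# Mean of the per-respondent differences (chicken minus turkey).
mean_difference <- mean(chicken - turkey)

mean_difference
paired_result$p.value
```

On the full data, the same call could be run on the calories_chicken and turkey_calories columns of new_data after dropping rows where either value is missing.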


Next, I split participants into female and male groups to examine whether the calorie values differ between genders.

calories_by_gender <- 
  food_calories %>%
  group_by(gender, food_item) %>%
  summarise(
    Gmean_calories = mean(calories, na.rm = TRUE),
    Gmedian_calories = median(calories, na.rm = TRUE),
    n = sum(!is.na(calories)),
    .groups = "drop"
)

calories_by_gender
# A tibble: 10 × 5
   gender food_item Gmean_calories Gmedian_calories     n
   <chr>  <chr>              <dbl>            <dbl> <int>
 1 F      chicken             588.              610    76
 2 F      scone               476.              420    75
 3 F      tortilla            912.              940    75
 4 F      turkey              537.              500    76
 5 F      waffle             1044.              900    76
 6 M      chicken             561.              610    49
 7 M      scone               549.              420    49
 8 M      tortilla           1002.              940    49
 9 M      turkey              584.              500    49
10 M      waffle             1119.             1315    49

The mean calorie values were slightly higher among males for four of the five foods (chicken is the exception), but females outnumber males (about 76 versus 49 responses per food item), so the two groups are imbalanced. Therefore, this dataset does not support a clear conclusion about gender differences.
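The size of the gender gap for each food can be worked out from the printed table. The sketch below re-enters the rounded group means from the calories_by_gender output and computes the male minus female difference, both in calories and as a percentage of the female mean. The object and column names are assumptions made for this example.

```r
# Rounded group means copied from the printed calories_by_gender table.
by_gender <- data.frame(
  food_item = c("chicken", "scone", "tortilla", "turkey", "waffle"),
  mean_f = c(588, 476, 912, 537, 1044),
  mean_m = c(561, 549, 1002, 584, 1119)
)

# Difference in means (male minus female), absolute and relative.
by_gender$diff_m_minus_f <- by_gender$mean_m - by_gender$mean_f
by_gender$pct_of_f <- round(100 * by_gender$diff_m_minus_f / by_gender$mean_f, 1)

by_gender
# Number of foods where the male mean is higher
sum(by_gender$diff_m_minus_f > 0)
```

These differences are descriptive only. A formal comparison would need the row-level values in food_calories rather than the rounded means.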