Burger Dataset Project 1

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

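setwd("C:/Users/COCO3/Downloads")  # folder that holds burger.csv; change this path on another computer
burger <- read_csv("burger.csv")   # read_csv() comes from readr, which tidyverse already loads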

## Rows: 500 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): best_burger_place, gender
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Introduction:

Question: Which burger place is most popular among females and males? This dataset contains the answers of 500 males and females who were asked what their favorite burger place is. The answer options are Five Guys Burgers, In-N-Out Burger, Fat Burger, Tommy’s Hamburgers, Umami Burger, Other, and Not Sure.

I actually found this dataset while I was searching for a bird-related one. I accidentally typed the letters “b” and “u” and saw a dataset named “burger”. The name was so simple that I found it a little funny, so I decided to check it out, and I stuck with it because of how straightforward it is.

Continuing on, I will use all 500 answers and both columns: gender and best burger place. To find out which burger place is most popular across both genders, I will first count the votes for each burger place separately for each gender. Then I will add the male and female votes for the same burger place together. This dataset is from SurveyUSA, and it was collected in 2010. So, let’s get into it.

Data analysis:

Starting off, we use str() and head() to check the structure and the first few rows of the dataset. I wrote a little about each piece of code in the comments, but to expand on it, str() and head() show us what the column names and values look like. We use them to find anything unnecessary or hard to read in the dataset. For example, the original author might have used a number or a word instead of NA for missing answers, or used different quotation marks or spacing for the same value. In this dataset, there were no problems like that. Just to be sure, I used colSums(is.na()) to count the NAs in each column, and it showed that there were none.

Next, I grouped the data by burger place and gender and counted the rows. This gives the number of votes each burger place received from each gender. For example, Five Guys Burgers was chosen by 6 females and 5 males. After that, I grouped by burger place only to get the total number of votes for each place, regardless of gender. Finally, I used AI to help make a bar graph that shows how many people of each gender prefer each burger place. The graph also has a legend showing which color stands for which gender.

This checks the structure and the first rows of the dataset. The column names “best_burger_place” and “gender” are both fine when it comes to spaces, capitalization, trailing underscores, or other issues. So there is nothing to clean in the column names.

str(burger)

## spc_tbl_ [500 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ best_burger_place: chr [1:500] "Five Guys Burgers" "Five Guys Burgers" "Five Guys Burgers" "Five Guys Burgers" ...
##  $ gender           : chr [1:500] "Male" "Male" "Male" "Male" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   best_burger_place = col_character(),
##   ..   gender = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>

head(burger)

## # A tibble: 6 × 2
##   best_burger_place gender
##   <chr>             <chr> 
## 1 Five Guys Burgers Male  
## 2 Five Guys Burgers Male  
## 3 Five Guys Burgers Male  
## 4 Five Guys Burgers Male  
## 5 Five Guys Burgers Male  
## 6 Five Guys Burgers Female
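The column-name check above can also be done in code instead of by eye. Below is a small sketch of one way to do it. The rule it uses is my own assumption about what a clean name looks like: lowercase letters and digits joined by single underscores, with no spaces and no leading or trailing underscore. `is_clean_name()` is a hypothetical helper, not a function from any package.

```r
# Hypothetical helper: TRUE when a column name is lowercase snake_case
# (letters/digits joined by single underscores, no spaces, no leading or
# trailing underscore). The rule itself is an assumption, not a standard.
is_clean_name <- function(x) {
  grepl("^[a-z][a-z0-9]*(_[a-z0-9]+)*$", x)
}

# Column names as printed by str(burger) above
burger_names <- c("best_burger_place", "gender")
is_clean_name(burger_names)   # TRUE TRUE

# Examples of names that would need cleaning
is_clean_name(c("Best Burger Place", "gender_", "Gender"))   # FALSE FALSE FALSE
```

On the real data, `is_clean_name(names(burger))` would run the same check directly on the loaded columns.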

# Checking whether there are any NAs in the dataset

colSums(is.na(burger))

## best_burger_place            gender 
##                 0                 0
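One limit of this check is that `is.na()` only counts values that R already treats as missing. If a missing answer had been typed as text, like “N/A”, `colSums(is.na(burger))` would still show 0. `read_csv()` already turns empty strings and "NA" into real NA values by default through its `na = c("", "NA")` argument, but other spellings stay as text. The sketch below counts such placeholder strings in each column. `count_placeholders()` is a hypothetical helper and the placeholder list is an assumption. The small `demo` data frame is made up only to show the helper working; it is not the burger data.

```r
# Hypothetical helper: count, per column, text values that look "missing"
# even though is.na() would not flag them.
# The placeholder list is an assumption; adjust it for other datasets.
count_placeholders <- function(df,
                               placeholders = c("", "N/A", "n/a", "NA", "None", "-")) {
  sapply(df, function(col) sum(trimws(as.character(col)) %in% placeholders))
}

# Made-up example data (NOT the burger survey) with two hidden missing values
demo <- data.frame(
  best_burger_place = c("Fat Burger", "N/A", "Other"),
  gender = c("Female", "Male", " "),
  stringsAsFactors = FALSE
)

colSums(is.na(demo))      # is.na() finds nothing missing here
count_placeholders(demo)  # the helper finds one placeholder in each column
```

On the real data, this would be `count_placeholders(burger)`. “Not Sure” should not be added to the placeholder list, because it is one of the actual answer choices in this survey and not a missing value.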

This counts the votes for every combination of burger place and gender. From this table, we can see how many people of each gender prefer each burger place. For example, In-N-Out Burger has the highest count for both females (181) and males (162).

burger %>%
  group_by(best_burger_place, gender) %>%   # one group per place/gender pair
  summarize(votes_by_gender = n())          # n() counts the rows in each group

## `summarise()` has grouped output by 'best_burger_place'. You can override using
## the `.groups` argument.

## # A tibble: 14 × 3
## # Groups:   best_burger_place [7]
##    best_burger_place  gender votes_by_gender
##    <chr>              <chr>            <int>
##  1 Fat Burger         Female              12
##  2 Fat Burger         Male                10
##  3 Five Guys Burgers  Female               6
##  4 Five Guys Burgers  Male                 5
##  5 In-N-Out Burger    Female             181
##  6 In-N-Out Burger    Male               162
##  7 Not Sure           Female               5
##  8 Not Sure           Male                13
##  9 Other              Female              20
## 10 Other              Male                26
## 11 Tommy's Hamburgers Female              27
## 12 Tommy's Hamburgers Male                27
## 13 Umami Burger       Female               1
## 14 Umami Burger       Male                 5
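The grouped output above is in “long” form, with one row per burger place and gender. The counts can be reshaped with `pivot_wider()` from tidyr, which is part of the tidyverse. This shows them as a side-by-side table and adds the female and male votes for each place, following the plan from the introduction. This is a sketch. So that it runs on its own, it retypes the 14 counts printed above instead of reading burger.csv, and the object and column names (`votes_by_gender`, `votes`, `votes_wide`, `total`) are my own.

```r
library(dplyr)
library(tidyr)

# The 14 counts printed above, retyped so this chunk runs on its own
votes_by_gender <- tibble(
  best_burger_place = rep(c("Fat Burger", "Five Guys Burgers", "In-N-Out Burger",
                            "Not Sure", "Other", "Tommy's Hamburgers",
                            "Umami Burger"), each = 2),
  gender = rep(c("Female", "Male"), times = 7),
  votes = c(12, 10, 6, 5, 181, 162, 5, 13, 20, 26, 27, 27, 1, 5)
)

# One row per burger place, one column per gender, then add the two together
votes_wide <- votes_by_gender %>%
  pivot_wider(names_from = gender, values_from = votes) %>%
  mutate(total = Female + Male) %>%
  arrange(desc(total))

votes_wide
```

The `total` column should match the totals in the next chunk. For example, In-N-Out Burger gets 181 + 162 = 343.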

From the counts in the chunk above, In-N-Out Burger seems to have the most votes for both genders. To make sure, I grouped the data by burger place only and counted the votes, which gives the total for each burger place. As before, In-N-Out Burger has the highest number, with 343 of the 500 votes.

burger %>%
  group_by(best_burger_place) %>%   # ignore gender this time
  summarize(total_votes = n())      # total votes per burger place

## # A tibble: 7 × 2
##   best_burger_place  total_votes
##   <chr>                    <int>
## 1 Fat Burger                  22
## 2 Five Guys Burgers           11
## 3 In-N-Out Burger            343
## 4 Not Sure                    18
## 5 Other                       46
## 6 Tommy's Hamburgers          54
## 7 Umami Burger                 6
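A shorter way to get these totals is dplyr’s `count()`, which does the `group_by()` and `summarize(n())` steps in one call. On the real data, `burger %>% count(best_burger_place, name = "total_votes")` should give the same seven totals. Likewise, `burger %>% count(best_burger_place, gender)` should give the same counts as the per-gender table from earlier.

The sketch below shows this on a rebuilt 500-row table that has the same burger place and gender counts as the survey. It is made with tidyr’s `uncount()`. The row order differs from burger.csv, which does not matter for counting. The `sort = TRUE` option and the `share` column (each place’s fraction of all 500 votes) are my additions.

```r
library(dplyr)
library(tidyr)

# Rebuild 500 rows with the same place/gender counts shown earlier
# (row order differs from burger.csv, which does not affect counting)
burger_rebuilt <- tibble(
  best_burger_place = rep(c("Fat Burger", "Five Guys Burgers", "In-N-Out Burger",
                            "Not Sure", "Other", "Tommy's Hamburgers",
                            "Umami Burger"), each = 2),
  gender = rep(c("Female", "Male"), times = 7),
  n = c(12, 10, 6, 5, 181, 162, 5, 13, 20, 26, 27, 27, 1, 5)
) %>%
  uncount(n)   # repeat each row n times, then drop the n column

# Same totals as group_by() + summarize(n()), in one call, largest first
totals <- burger_rebuilt %>%
  count(best_burger_place, name = "total_votes", sort = TRUE) %>%
  mutate(share = total_votes / sum(total_votes))

totals
```

With `sort = TRUE`, In-N-Out Burger comes first with 343 votes, which is a share of 343 / 500 = 0.686.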

# I used AI for this: I asked it to make a bar graph of burger place preferences by gender.


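# Count responses for each gender (rows) and burger place (columns)
counts <- table(burger$gender, burger$best_burger_place)
bar_colors <- c("lightblue", "lightpink")  # one color per row of counts (Female, Male)

barplot(counts,
        beside = TRUE,
        main = "Burger Place Preferences by Gender",
        xlab = "Burger Place",
        ylab = "Count",
        col = bar_colors,
        las = 2,
        cex.names = 0.7)

# Build the legend from the table's row names and the same colors,
# so the labels always match the bars
legend("topright",
       legend = rownames(counts),
       fill = bar_colors)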
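Since library(tidyverse) already attaches ggplot2, the same grouped bar chart can also be drawn with `ggplot()`. One advantage is that ggplot2 builds the legend from the `gender` column itself, so the legend colors always match the bars. This is a sketch, not the chart above. So that it runs on its own, it retypes the counts printed earlier. It keeps the same colors as the base-R chart: light blue for Female and light pink for Male.

```r
library(ggplot2)

# The 14 counts printed earlier, retyped so this chunk runs on its own
votes_by_gender <- data.frame(
  best_burger_place = rep(c("Fat Burger", "Five Guys Burgers", "In-N-Out Burger",
                            "Not Sure", "Other", "Tommy's Hamburgers",
                            "Umami Burger"), each = 2),
  gender = rep(c("Female", "Male"), times = 7),
  votes = c(12, 10, 6, 5, 181, 162, 5, 13, 20, 26, 27, 27, 1, 5)
)

p <- ggplot(votes_by_gender,
            aes(x = best_burger_place, y = votes, fill = gender)) +
  geom_col(position = "dodge") +                       # bars side by side per place
  scale_fill_manual(values = c(Female = "lightblue", Male = "lightpink")) +
  labs(title = "Burger Place Preferences by Gender",
       x = "Burger Place", y = "Count", fill = "Gender") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # tilt long place names

p
```

With the full dataset, `ggplot(burger, aes(x = best_burger_place, fill = gender)) + geom_bar(position = "dodge")` would count the rows itself, so the retyped counts would not be needed.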

Conclusion:

So, after all, the answer to our question is In-N-Out Burger. It has the most votes from both females and males, and it has the tallest bars on the graph. So we could tell not only from the total votes for each burger place, but we could also see how big the difference is compared to the other burger places for each gender. Another thing I noticed is that Umami Burger has fewer votes than “Other” or “Not Sure”. This surprised me, because it means Umami Burger is not popular with either gender. In the future, I want to learn how to shorten the code for the totals part. There is probably a shorter way to get the total for each burger place without writing out group_by() and summarize(). I just forgot it now… Getting better at bar graphs would also be helpful.

References:

I used our other in-class assignments and homework, mostly the Week 4 HW and the air quality data analysis.