library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.1
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
setwd("C:/Users/COCO3/Downloads")
burger <- read_csv("burger.csv")
## Rows: 500 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): best_burger_place, gender
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Question: Which burger place is most popular among females and males? This dataset records the favorite burger place of 500 male and female respondents. The options are Five Guys Burgers, In-N-Out Burger, Fat Burger, Tommy's Hamburgers, Umami Burger, Other, and Not Sure. I actually found this dataset while searching for a bird-related one: I typed the letters “b” and “u” by accident and saw a dataset named “burger”. The name was very simple and I found it a little funny, so I decided to check it out, and I stuck with it because of its straightforwardness. I will be using all 500 answers and both columns: gender and best_burger_place. To find out which burger place is most popular with each gender, I will count the votes for each burger place within each gender, and then add the female and male votes together to get a total for each burger place. This dataset is from SurveyUSA, and it was collected in 2010. So, let’s get into it.
Starting off, we are going to use str() and head() to check the structure and the first few rows of the dataset. I wrote a little about the code in the comments, but to expand on it: str() and head() show us what the column names and values look like. We use this to find out whether there is anything unnecessary or hard to read in the dataset, for example if the original author had used a different number or term instead of NA, or used unusual quotation marks or characters to indicate spaces. In this particular dataset, there were no problems like that. Just to be sure, though, I used colSums(is.na(burger)) to count the NAs in each column, and it showed that there were none. For the next part, I used group_by() and summarize() with n() to count the votes for each burger place by gender. For example, Five Guys Burgers was chosen by 6 females and 5 males. After that, I grouped by burger place only to get the total votes for each burger place regardless of gender. This adds up all of the votes for a burger place and does not look at the gender column. For the last part, I used AI to make a bar graph, which shows how many people of each gender prefer each burger place. It also includes a legend so you know which color stands for which gender.
str(burger)
## spc_tbl_ [500 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ best_burger_place: chr [1:500] "Five Guys Burgers" "Five Guys Burgers" "Five Guys Burgers" "Five Guys Burgers" ...
## $ gender : chr [1:500] "Male" "Male" "Male" "Male" ...
## - attr(*, "spec")=
## .. cols(
## .. best_burger_place = col_character(),
## .. gender = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
head(burger)
## # A tibble: 6 × 2
## best_burger_place gender
## <chr> <chr>
## 1 Five Guys Burgers Male
## 2 Five Guys Burgers Male
## 3 Five Guys Burgers Male
## 4 Five Guys Burgers Male
## 5 Five Guys Burgers Male
## 6 Five Guys Burgers Female
# Check whether any column contains missing values (NA)
colSums(is.na(burger))
## best_burger_place gender
## 0 0
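One caveat with this check: colSums(is.na(...)) only counts true NA values, so a placeholder typed in as text (such as "N/A") would not be detected. A minimal sketch of a complementary check, using a small made-up data frame (`toy` and its values are hypothetical, not the burger data), is to list the distinct values in each column and scan them for stand-ins:

```r
# Hypothetical toy data: one true NA and one text placeholder "N/A"
toy <- data.frame(
  best_burger_place = c("Fat Burger", "N/A", "Other", NA),
  gender            = c("Male", "Female", "Female", "Male")
)

# Counts only true missing values; the string "N/A" is not detected
na_counts <- colSums(is.na(toy))

# Listing the distinct values per column exposes text placeholders
distinct_values <- lapply(toy, unique)

na_counts
distinct_values
```

With only seven answer options and two genders, the same idea applied to the burger data would make any odd label easy to spot by eye.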
burger %>%
  group_by(best_burger_place, gender) %>%
  summarize(votes_by_gender = n())
## `summarise()` has grouped output by 'best_burger_place'. You can override using
## the `.groups` argument.
## # A tibble: 14 × 3
## # Groups: best_burger_place [7]
## best_burger_place gender votes_by_gender
## <chr> <chr> <int>
## 1 Fat Burger Female 12
## 2 Fat Burger Male 10
## 3 Five Guys Burgers Female 6
## 4 Five Guys Burgers Male 5
## 5 In-N-Out Burger Female 181
## 6 In-N-Out Burger Male 162
## 7 Not Sure Female 5
## 8 Not Sure Male 13
## 9 Other Female 20
## 10 Other Male 26
## 11 Tommy's Hamburgers Female 27
## 12 Tommy's Hamburgers Male 27
## 13 Umami Burger Female 1
## 14 Umami Burger Male 5
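The table above has 252 female and 248 male respondents, so raw counts are close to comparable, but shares within each gender make the comparison exact. The following is a rough sketch rather than part of the original analysis: it rebuilds the counts from the table above into a small tibble (the names `votes`, `shares`, and `top_by_gender` are made up for illustration) and picks the top place for each gender.

```r
library(dplyr)

# Counts copied from the summary table above
votes <- tibble(
  best_burger_place = rep(c("Fat Burger", "Five Guys Burgers", "In-N-Out Burger",
                            "Not Sure", "Other", "Tommy's Hamburgers", "Umami Burger"),
                          each = 2),
  gender = rep(c("Female", "Male"), times = 7),
  votes_by_gender = c(12, 10, 6, 5, 181, 162, 5, 13, 20, 26, 27, 27, 1, 5)
)

# Share of each gender's votes that went to each burger place
shares <- votes %>%
  group_by(gender) %>%
  mutate(share = votes_by_gender / sum(votes_by_gender)) %>%
  ungroup()

# The burger place with the largest share within each gender
top_by_gender <- shares %>%
  group_by(gender) %>%
  slice_max(share, n = 1) %>%
  ungroup()

top_by_gender
```

On the real data, the same pipeline could start from the grouped summary directly instead of retyping the counts.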
burger %>%
  group_by(best_burger_place) %>%
  summarize(total_votes = n())
## # A tibble: 7 × 2
## best_burger_place total_votes
## <chr> <int>
## 1 Fat Burger 22
## 2 Five Guys Burgers 11
## 3 In-N-Out Burger 343
## 4 Not Sure 18
## 5 Other 46
## 6 Tommy's Hamburgers 54
## 7 Umami Burger 6
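# I used AI for this; I asked it to make a bar graph of burger place by gender.
# Build the gender-by-place table once so the bars and the legend
# share the same row order and the same colors.
gender_by_place <- table(burger$gender, burger$best_burger_place)
gender_colors <- c("lightblue", "lightpink")  # table rows are sorted: Female, Male
barplot(gender_by_place,
        beside = TRUE,
        main = "Burger Place Preferences by Gender",
        xlab = "Burger Place",
        ylab = "Count",
        col = gender_colors,
        las = 2,
        cex.names = 0.7)
legend("topright",
       legend = rownames(gender_by_place),
       fill = gender_colors)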
So, after all, we find that the answer to our question is In-N-Out Burger. It has the most votes among both females (181) and males (162), and 343 of the 500 votes overall. We can also see on the graph that it has the highest bars. So not only could we tell from the vote totals, but we could also see the visual difference from the other burger places when it comes to which gender prefers which burger place. Another thing I noticed is that Umami Burger has fewer votes than “Other” or “Not Sure”, which was surprising to me, because it must mean Umami Burger is not popular with either gender. In the future, I want to learn how to shorten the code for the vote-counting part. There is probably a shorter way to get the counts for each column; I just can’t remember it right now. Getting better at bar graphs would also be helpful.
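On the wish for shorter counting code: one likely candidate is dplyr's count(), which does the grouping and counting in a single call. This is a minimal sketch on a made-up stand-in for the data (`toy` and its four rows are hypothetical), not the code used in the analysis above.

```r
library(dplyr)

# Hypothetical stand-in for the burger data, so the sketch runs on its own
toy <- tibble(
  best_burger_place = c("In-N-Out Burger", "In-N-Out Burger", "Other", "Fat Burger"),
  gender            = c("Female", "Male", "Male", "Female")
)

# Equivalent to group_by(best_burger_place) %>% summarize(total_votes = n()),
# with sort = TRUE putting the most popular place first
total_votes <- toy %>%
  count(best_burger_place, sort = TRUE, name = "total_votes")

# Passing two columns counts every place/gender combination
votes_by_gender <- toy %>%
  count(best_burger_place, gender, name = "votes_by_gender")

total_votes
votes_by_gender
```

Replacing `toy` with `burger` would give the same two summaries as the longer code, without the grouped-output message.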
For reference, I used our other in-class assignments and homework, mostly the Week 4 HW and the air quality data analysis.