Loading Data into R
library(readr)
data <- read_csv("C:/Users/SoloTraveler/Downloads/data.csv")
## Parsed with column specification:
## cols(
## age = col_double(),
## stars = col_double(),
## race = col_character(),
## id = col_double(),
## restaurant = col_character(),
## sex = col_character()
## )
View(data)
Report
Find the total number of cases: 1391
length(data$id)
## [1] 1391
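The count above treats every element of `id` as one case. A minimal sketch of two related checks, using a small hypothetical data frame `toy` (not the report's data): `nrow()` counts rows directly, and comparing it with the number of distinct ids shows whether any id repeats.

```r
# Hypothetical stand-in for `data`; only the column names `id` and `stars` come from the report.
toy <- data.frame(id = c(1, 2, 3, 3), stars = c(4, 2, 5, 5))

n_rows <- nrow(toy)                # number of rows (cases)
n_ids  <- length(unique(toy$id))   # number of distinct ids

n_rows
n_ids
```

If the two numbers differ, some ids appear more than once and the row count overstates the number of distinct customers.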
For each restaurant, find the total number of customers.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
data %>%
  group_by(restaurant) %>%
  summarise(count = n()) %>%
  mutate(Percentage = count / sum(count) * 100) %>%
  arrange(desc(count))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 4 x 3
## restaurant count Percentage
## <chr> <int> <dbl>
## 1 Taco Bell 369 26.5
## 2 KFC 344 24.7
## 3 Burger King 340 24.4
## 4 McDonald 338 24.3
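The same count-and-percentage table can probably be produced more compactly with `dplyr::count()`. The sketch below runs on a small hypothetical tibble `toy`, since the original CSV is not available here; only the column name `restaurant` is taken from the report.

```r
library(dplyr)

# Hypothetical stand-in for `data`; the rows are made up for illustration.
toy <- tibble(
  restaurant = c("KFC", "KFC", "Taco Bell", "McDonald")
)

# count() groups, tallies, and ungroups in one step;
# name = "count" sets the tally column name, sort = TRUE orders by descending count.
counts <- toy %>%
  count(restaurant, name = "count", sort = TRUE) %>%
  mutate(Percentage = count / sum(count) * 100)

counts
```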
For each restaurant, find the average star rating by Sex.
Average customer rating by Sex
mydata1 <- matrix(c(3.0585, 3.017, 3.100,
                    3.0505, 2.982, 3.119,
                    2.9640, 2.979, 2.949,
                    2.8670, 2.913, 2.821),
                  ncol = 3, byrow = TRUE)
colnames(mydata1) <- c("Overall","Male","Female")
rownames(mydata1) <- c("McDonald","KFC","Taco Bell","Burger King")
mydata1 <- as.table(mydata1)
mydata1
## Overall Male Female
## McDonald 3.0585 3.0170 3.1000
## KFC 3.0505 2.9820 3.1190
## Taco Bell 2.9640 2.9790 2.9490
## Burger King 2.8670 2.9130 2.8210
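The table above is typed in by hand. One way such a table might be computed directly from the data is sketched below. It assumes the `sex` column holds the labels "Male" and "Female" (not confirmed by the report) and uses a hypothetical tibble `toy` in place of `data`. `Overall` is taken as the unweighted mean of the two group averages, which matches the numbers in the table.

```r
library(dplyr)
library(tidyr)

# Hypothetical rows; column names follow the report, values and sex labels are assumptions.
toy <- tibble(
  restaurant = c("KFC", "KFC", "KFC", "McDonald", "McDonald", "McDonald"),
  sex        = c("Male", "Female", "Female", "Male", "Male", "Female"),
  stars      = c(3, 5, 3, 5, 3, 4)
)

by_sex <- toy %>%
  group_by(restaurant, sex) %>%
  summarise(avg = mean(stars), .groups = "drop") %>%   # one average per restaurant and sex
  pivot_wider(names_from = sex, values_from = avg) %>% # one column per sex
  mutate(Overall = (Male + Female) / 2) %>%            # unweighted mean of the two averages
  select(restaurant, Overall, Male, Female) %>%
  arrange(desc(Overall))

by_sex
```

Because `Overall` averages the group means rather than the individual ratings, it can differ from `mean(stars)` per restaurant when the groups have different sizes.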
For each restaurant, find the average star rating by Race.
Average customer rating by Race
mydata2 <- matrix(c(3.05775, 3.038, 3.112, 3.011, 3.070,
                    3.05100, 3.000, 3.264, 2.963, 2.977,
                    2.96650, 3.009, 3.057, 2.787, 3.013,
                    2.86900, 2.894, 3.034, 2.859, 2.689),
                  ncol = 5, byrow = TRUE)
colnames(mydata2) <- c("Overall","White","Black","Hispanic", "Other")
rownames(mydata2) <- c("McDonald","KFC","Taco Bell","Burger King")
mydata2 <- as.table(mydata2)
mydata2
## Overall White Black Hispanic Other
## McDonald 3.05775 3.03800 3.11200 3.01100 3.07000
## KFC 3.05100 3.00000 3.26400 2.96300 2.97700
## Taco Bell 2.96650 3.00900 3.05700 2.78700 3.01300
## Burger King 2.86900 2.89400 3.03400 2.85900 2.68900
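With more than two groups, it may be easier to compute `Overall` before reshaping, so that no group names have to be written out. The sketch below is one possible approach, run on a hypothetical tibble `toy` with only two race labels for brevity; the real data has four, and the values here are made up.

```r
library(dplyr)
library(tidyr)

# Hypothetical rows; column names follow the report, values are assumptions.
toy <- tibble(
  restaurant = c("KFC", "KFC", "KFC", "McDonald", "McDonald", "McDonald"),
  race       = c("White", "Black", "Black", "White", "White", "Black"),
  stars      = c(3, 5, 3, 5, 3, 4)
)

# Average rating for each restaurant/race pair.
group_means <- toy %>%
  group_by(restaurant, race) %>%
  summarise(avg = mean(stars), .groups = "drop")

# Overall = unweighted mean of the group averages for each restaurant.
overall <- group_means %>%
  group_by(restaurant) %>%
  summarise(Overall = mean(avg), .groups = "drop")

# Spread the races into columns and attach Overall as the first value column.
by_race <- group_means %>%
  pivot_wider(names_from = race, values_from = avg) %>%
  left_join(overall, by = "restaurant") %>%
  select(restaurant, Overall, everything()) %>%
  arrange(desc(Overall))

by_race
```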
Analysis
In the 2nd and 3rd tables:
1. The restaurants are listed in descending order of the Overall average rating (McDonald highest, Burger King lowest), not in the frequency order of the first table.
2. The Overall column is the unweighted mean of the sex (or race) averages in the same row; for example, McDonald by sex is (3.017 + 3.100) / 2 = 3.0585.
3. The Overall column is placed first because it summarizes the sex/race columns to its right.
4. The sex/race columns are grouped together because they belong to the same category.
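The relationship between the Overall column and the group columns can be checked from the reported numbers alone. The short sketch below re-enters the Male and Female averages from the sex table and compares their row means with the reported Overall values; the object names are chosen here for illustration.

```r
# Male and Female averages copied from the sex table in the report.
sex_means <- matrix(
  c(3.017, 3.100,
    2.982, 3.119,
    2.979, 2.949,
    2.913, 2.821),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("McDonald", "KFC", "Taco Bell", "Burger King"),
                  c("Male", "Female"))
)

# Overall values as reported in the same table.
overall_reported <- c(3.0585, 3.0505, 2.9640, 2.8670)

# Row means of the group averages.
overall_check <- rowMeans(sex_means)

# TRUE when the reported values equal the row means (up to floating-point tolerance).
matches <- isTRUE(all.equal(unname(overall_check), overall_reported))
matches
```

The same check can be repeated for the race table by averaging its four race columns.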