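Analysis
In the 2nd and 3rd tables:
1. The restaurants appear in the order shown because the rows are sorted from the largest value to the smallest: the count table is sorted by Count (setorder(a,-Count)), and the rating tables are sorted by the overall column (order(-b3[,3]) and order(-c3[,5]));
2. The overall column is the mean of the sex/race averages in each row, that is, an unweighted mean of the group means for each restaurant;
3. The overall column appears as the last column because addmargins appends each margin after the existing levels: c3= addmargins(c2, FUN = overall). The overall row that the same call adds is then dropped with c3[-nrow(c3),];
4. The sex/race columns are grouped together because they are levels of the same dimension of the table.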
require(readxl)
## Loading required package: readxl
## Warning: package 'readxl' was built under R version 3.5.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(data.table)
## Warning: package 'data.table' was built under R version 3.5.3
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
library(vcd)
## Loading required package: grid
data= read.csv("C:/Users/zjoyi/Downloads/data.csv")
dft= data.table(data)
# total cases
totalcase= nrow(dft)
totalcase
## [1] 1192
1. The restaurants are displayed in descending order of Count, as set by the following code: setorder(a,-Count);
# counts: Burger King = 309, KFC = 273, McDonald = 296, Taco Bell = 314
options(digits = 4)
a= dft %>% group_by(restaurant) %>% summarise( Count = n()) %>% mutate (Percentage = (Count/totalcase)*100)
colnames(a)[1] <- "Restaurant"
# order table by Count
setorder(a,-Count)
a
## # A tibble: 4 x 3
## Restaurant Count Percentage
## * <fct> <int> <dbl>
## 1 Taco Bell 314 26.3
## 2 Burger King 309 25.9
## 3 McDonald 296 24.8
## 4 KFC 273 22.9
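As a quick cross-check of the table above, the percentages and the ordering can be recomputed in base R from the four printed counts alone. This is only a sketch: the names `counts`, `sorted_counts` and `check` are illustrative and do not appear in the original analysis.

```r
# Counts copied from the printed table above
counts <- c("Burger King" = 309, "KFC" = 273, "McDonald" = 296, "Taco Bell" = 314)

# The counts should add up to the total number of cases (1192)
total <- sum(counts)

# Sort from largest to smallest, then express each count as a share of the total
sorted_counts <- sort(counts, decreasing = TRUE)
check <- data.frame(
  Restaurant = names(sorted_counts),
  Count      = as.integer(sorted_counts),
  Percentage = round(100 * as.numeric(sorted_counts) / total, 1)
)
check
```

The resulting rows should match the tibble printed above, with Taco Bell first and KFC last.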
For each restaurant, find the average star rating by sex.
Average Customer Rating by Sex
2. The overall column is the mean of the Female and Male averages in each row, that is, an unweighted mean of the group means;
#Average star rating by Sex
b= dft %>% group_by(restaurant, sex) %>% summarise(val = mean(stars))
colnames(b)[1] <- "Restaurant"
b2= xtabs(val ~ Restaurant+ sex, data = b)
b2
## sex
## Restaurant Female Male
## Burger King 3.058 2.954
## KFC 2.890 3.247
## McDonald 2.830 2.920
## Taco Bell 3.144 3.170
#Average Customer Rating by Sex
overall<-mean
b3= addmargins(b2, FUN = overall)
## Margins computed over dimensions
## in the following order:
## 1: Restaurant
## 2: sex
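# drop the overall row added by addmargins, keeping only the overall column
b3<-b3[-nrow(b3),]
# sort the restaurants by the overall column (column 3), largest first
b4<-b3[order(-b3[,3]),]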
round(b4, 2)
## sex
## Restaurant Female Male overall
## Taco Bell 3.14 3.17 3.16
## KFC 2.89 3.25 3.07
## Burger King 3.06 2.95 3.01
## McDonald 2.83 2.92 2.87
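Because the overall column averages the group means rather than the individual ratings, it can differ from a restaurant's pooled mean whenever the groups have different sizes. The sketch below illustrates this on a small made-up data set. The data frame `toy` and its values are hypothetical and are not taken from data.csv.

```r
# Hypothetical toy ratings (NOT the restaurant data): unequal group sizes on purpose
toy <- data.frame(
  restaurant = c("A", "A", "A", "B", "B", "B"),
  sex        = c("Female", "Male", "Male", "Female", "Female", "Male"),
  stars      = c(4, 2, 3, 5, 3, 1)
)

# Mean rating per restaurant/sex cell, arranged as a two-way table
cell_means <- aggregate(stars ~ restaurant + sex, data = toy, FUN = mean)
tab <- xtabs(stars ~ restaurant + sex, data = cell_means)

# Same pattern as in the analysis: add mean margins, then drop the margin row
overall <- mean
with_margins <- addmargins(tab, FUN = overall, quiet = TRUE)
with_margins <- with_margins[-nrow(with_margins), ]

# Last column: unweighted mean of the Female and Male cell means
group_mean_avg <- with_margins[, ncol(with_margins)]

# Pooled mean: every individual rating counts once
pooled <- tapply(toy$stars, toy$restaurant, mean)

group_mean_avg
pooled
```

In this toy case the two summaries disagree for both restaurants, so the overall column is best read as an average of group averages rather than as the average rating over all customers.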
For each restaurant, find the average star rating by race.
Average Customer Rating by Race
3. The overall column appears as the last column because addmargins appends the margin after the existing race levels: c3= addmargins(c2, FUN = overall). The overall row that the same call adds is dropped by c3<-c3[-nrow(c3),];
4. The sex/race columns are grouped together because they are levels of the same dimension of the table.
#For each restaurant, find the average star rating by race.
c= dft %>% group_by(restaurant, race) %>% summarise(val = mean(stars))
colnames(c)[1] <- "Restaurant"
c2= xtabs(val ~ Restaurant+ race, data = c)
c2
## race
## Restaurant Black Hispanic Other White
## Burger King 2.926 2.877 3.176 3.011
## KFC 2.889 3.194 2.875 3.409
## McDonald 2.805 2.694 3.014 2.962
## Taco Bell 3.167 3.108 3.203 3.152
#Average Customer Rating by race
overall<-mean
c3= addmargins(c2, FUN = overall)
## Margins computed over dimensions
## in the following order:
## 1: Restaurant
## 2: race
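# drop the overall row added by addmargins, keeping only the overall column
c3<-c3[-nrow(c3),]
# sort the restaurants by the overall column (column 5), largest first
c4<-c3[order(-c3[,5]),]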
round(c4, 2)
## race
## Restaurant Black Hispanic Other White overall
## Taco Bell 3.17 3.11 3.20 3.15 3.16
## KFC 2.89 3.19 2.88 3.41 3.09
## Burger King 2.93 2.88 3.18 3.01 3.00
## McDonald 2.80 2.69 3.01 2.96 2.87
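The same overall column can also be obtained without addmargins, by taking row means of the table of group averages. The sketch below rebuilds the race table from the values printed earlier (rounded to three decimals, so small rounding differences are possible) and is offered as an alternative route, not as the method used in the analysis. The names `ratings`, `with_overall` and `ranked` are illustrative.

```r
# Group averages copied from the printed c2 table (three-decimal values)
ratings <- matrix(
  c(2.926, 2.877, 3.176, 3.011,
    2.889, 3.194, 2.875, 3.409,
    2.805, 2.694, 3.014, 2.962,
    3.167, 3.108, 3.203, 3.152),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("Burger King", "KFC", "McDonald", "Taco Bell"),
    c("Black", "Hispanic", "Other", "White")
  )
)

# Append the row means as an "overall" column on the right
with_overall <- cbind(ratings, overall = rowMeans(ratings))

# Sort the restaurants by the overall column, largest first
ranked <- with_overall[order(-with_overall[, "overall"]), ]
round(ranked, 2)
```

This route avoids creating and then deleting a margin row, at the cost of working on a plain matrix instead of an xtabs table.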