Module 2-Working with Categorical Data 2

Loading Data into R

library(readr)
data <- read_csv("C:/Users/SoloTraveler/Downloads/data.csv")

## Parsed with column specification:
## cols(
##   age = col_double(),
##   stars = col_double(),
##   race = col_character(),
##   id = col_double(),
##   restaurant = col_character(),
##   sex = col_character()
## )

View(data)

Report

Find the total number of cases: 1391

nrow(data)  # one row per case

## [1] 1391

For each restaurant, find the total number of customers.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

data %>%
  group_by(restaurant) %>%   # refer to the column by name, not data$restaurant
  summarise(count = n()) %>%
  mutate(Percentage = count / sum(count) * 100) %>%
  arrange(desc(count))

## `summarise()` ungrouping output (override with `.groups` argument)

## # A tibble: 4 x 3
##   restaurant  count Percentage
##   <chr>       <int>      <dbl>
## 1 Taco Bell     369       26.5
## 2 KFC           344       24.7
## 3 Burger King   340       24.4
## 4 McDonald      338       24.3
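
As a quick check on the percentages above, the same figures can be reproduced in base R from the counts alone. This is only a minimal sketch: the `counts` vector is retyped from the table rather than read from data.csv, and the variable names are illustrative.

```r
# Counts retyped from the table above (not read from the CSV).
counts <- c("Taco Bell" = 369, "KFC" = 344, "Burger King" = 340, "McDonald" = 338)

# The total should match the number of cases reported earlier.
total <- sum(counts)

# prop.table() divides each count by the total; scale to percent and round.
percentage <- round(prop.table(counts) * 100, 1)

total
percentage
```

If the counts are right, `total` should equal the 1391 cases found with the row count, which is a useful sanity check that no restaurant was dropped by the grouping.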

For each restaurant, find the average star rating by Sex.

Average customer rating by Sex

mydata1 <- matrix(c(3.0585, 3.017, 3.100,
                    3.0505, 2.982, 3.119,
                    2.9640, 2.979, 2.949,
                    2.8670, 2.913, 2.821),
                  ncol = 3, byrow = TRUE)
colnames(mydata1) <- c("Overall", "Male", "Female")
rownames(mydata1) <- c("McDonald", "KFC", "Taco Bell", "Burger King")
mydata1 <- as.table(mydata1)
mydata1

##             Overall   Male Female
## McDonald     3.0585 3.0170 3.1000
## KFC          3.0505 2.9820 3.1190
## Taco Bell    2.9640 2.9790 2.9490
## Burger King  2.8670 2.9130 2.8210
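
The table above was typed in by hand. One way such a table could be computed directly is sketched below in base R. The `toy` data frame is a made-up stand-in that only borrows the column names `restaurant`, `sex`, and `stars` from data.csv, and the "Male"/"Female" coding of `sex` is an assumption, since the actual values in the file are not shown. The Overall column is computed here as the unweighted mean of the group means, which matches how the Overall values above relate to the Male and Female columns.

```r
# Hypothetical toy data; the real data.csv is not reproduced here.
toy <- data.frame(
  restaurant = c("KFC", "KFC", "KFC", "McDonald", "McDonald", "McDonald"),
  sex        = c("Male", "Female", "Female", "Male", "Male", "Female"),
  stars      = c(3, 4, 2, 5, 3, 4)
)

# Mean star rating for every restaurant/sex combination (rows x columns).
by_sex <- tapply(toy$stars, list(toy$restaurant, toy$sex), mean)

# Put Male before Female, as in the table above.
by_sex <- by_sex[, c("Male", "Female"), drop = FALSE]

# Overall = unweighted mean of the group means in each row.
rating_table <- cbind(Overall = rowMeans(by_sex), by_sex)

# Sort rows from highest to lowest Overall rating.
rating_table <- rating_table[order(-rating_table[, "Overall"]), , drop = FALSE]
rating_table
```

Computing the table this way avoids transcription errors and keeps it in step with the data if the file changes.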

For each restaurant, find the average star rating by Race.

Average customer rating by Race

mydata2 <- matrix(c(3.05775, 3.038, 3.112, 3.011, 3.070,
                    3.05100, 3.000, 3.264, 2.963, 2.977,
                    2.96650, 3.009, 3.057, 2.787, 3.013,
                    2.86900, 2.894, 3.034, 2.859, 2.689),
                  ncol = 5, byrow = TRUE)
colnames(mydata2) <- c("Overall", "White", "Black", "Hispanic", "Other")
rownames(mydata2) <- c("McDonald", "KFC", "Taco Bell", "Burger King")
mydata2 <- as.table(mydata2)
mydata2

##             Overall   White   Black Hispanic   Other
## McDonald    3.05775 3.03800 3.11200  3.01100 3.07000
## KFC         3.05100 3.00000 3.26400  2.96300 2.97700
## Taco Bell   2.96650 3.00900 3.05700  2.78700 3.01300
## Burger King 2.86900 2.89400 3.03400  2.85900 2.68900

Analysis

In the 2nd and 3rd tables:

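1. The restaurants are listed in descending order of Overall rating (McDonald highest, Burger King lowest). This differs from the first table, which is ordered by frequency.

2. The Overall column is the average rating across the sex (or race) groups: each value is the unweighted mean of the group averages in its row, e.g. McDonald by sex: (3.017 + 3.100) / 2 = 3.0585.

3. The Overall column is placed first, before the group columns, because it summarizes the sex/race averages that follow it.

4. The sex columns (and likewise the race columns) are grouped together because they are levels of the same categorical variable.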
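
The relationship between the Overall column and the group columns can be checked with a little arithmetic. The sketch below retypes the race averages from the 3rd table (the variable names are illustrative, not part of the original analysis) and tests whether each Overall value equals the unweighted row mean of the four race columns, and whether the rows are in descending Overall order.

```r
# Group averages retyped from the race table; one row per restaurant.
by_race <- matrix(
  c(3.038, 3.112, 3.011, 3.070,
    3.000, 3.264, 2.963, 2.977,
    3.009, 3.057, 2.787, 3.013,
    2.894, 3.034, 2.859, 2.689),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("McDonald", "KFC", "Taco Bell", "Burger King"),
    c("White", "Black", "Hispanic", "Other")
  )
)

# Overall values as reported in the table.
reported_overall <- c(3.05775, 3.05100, 2.96650, 2.86900)

# Unweighted mean of the four group averages in each row.
computed_overall <- rowMeans(by_race)

# TRUE if the reported column matches the row means (up to rounding error).
overall_matches <- isTRUE(all.equal(unname(computed_overall), reported_overall))

# TRUE if the rows are already sorted from highest to lowest Overall.
sorted_desc <- !is.unsorted(rev(reported_overall))

overall_matches
sorted_desc
```

Because the groups need not be the same size, an unweighted mean of group averages is not necessarily the same as the mean over all of a restaurant's customers.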

Gayathri Mutyala

2020-09-29