myData <- read.table("./Independence100.csv", header = TRUE, sep = ",")
head(myData)
## Rank Restaurant Sales Average.Check City
## 1 1 Carmine's (Times Square) 39080335 40 New York
## 2 2 The Boathouse Orlando 35218364 43 Orlando
## 3 3 Old Ebbitt Grill 29104017 33 Washington
## 4 4 LAVO Italian Restaurant & Nightclub 26916180 90 New York
## 5 5 Bryant Park Grill & Cafe 26900000 62 New York
## 6 6 Gibsons Bar & Steakhouse 25409952 80 Chicago
## State Meals.Served
## 1 N.Y. 469803
## 2 Fla. 820819
## 3 D.C. 892830
## 4 N.Y. 198500
## 5 N.Y. 403000
## 6 Ill. 348567
library(dplyr)
# Rename the dotted column names to use underscores
myData <- myData %>%
  rename(Average_Check = Average.Check, Meals_Served = Meals.Served)
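# Drop Rank (1), Restaurant (2), and City (5), keeping Sales, Average_Check, State, Meals_Served
myData_sub <- myData[, -c(1, 2, 5)]
# Column 3 of myData_sub is the categorical State column, so exclude it from the numeric summary
summary(myData_sub[, -3])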
## Sales Average_Check Meals_Served
## Min. :11391678 Min. : 17.00 Min. : 87070
## 1st Qu.:14094836 1st Qu.: 39.00 1st Qu.:189492
## Median :17300776 Median : 65.50 Median :257097
## Mean :17833434 Mean : 69.05 Mean :317167
## 3rd Qu.:19903916 3rd Qu.: 95.00 3rd Qu.:372079
## Max. :39080335 Max. :194.00 Max. :959026
library(pastecs)
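# Round for display only: rounding to whole numbers collapses small statistics,
# e.g. coef.var (std.dev / mean) shows as 0 or 1
format(round(stat.desc(myData_sub[, -3])), scientific = FALSE)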
## Sales Average_Check Meals_Served
## nbr.val 100 100 100
## nbr.null 0 0 0
## nbr.na 0 0 0
## min 11391678 17 87070
## max 39080335 194 959026
## range 27688657 177 871956
## sum 1783343432 6905 31716666
## median 17300776 66 257097
## mean 17833434 69 317167
## SE.mean 501041 3 19221
## CI.mean.0.95 994174 7 38139
## var 25104188607197 1207 36945218450
## std.dev 5010408 35 192211
## coef.var 0 1 1
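# The coef.var, SE.mean, and CI.mean.0.95 rows follow from std.dev, mean, and n = 100.
# The sketch below recomputes them from the rounded values printed above (using the
# summary() mean of 69.05 for Average_Check and sqrt(var) for its standard deviation),
# so it matches the table only up to rounding. It assumes CI.mean.0.95 is the t-based
# half-width qt(0.975, n - 1) * SE.mean, which is consistent with the printed numbers.

```r
# Values copied from the printouts above (already rounded there); n = 100 restaurants
n        <- 100
mean_val <- c(Sales = 17833434, Average_Check = 69.05,      Meals_Served = 317167)
sd_val   <- c(Sales = 5010408,  Average_Check = sqrt(1207), Meals_Served = 192211)

# Coefficient of variation: standard deviation divided by the mean
coef_var <- sd_val / mean_val
print(round(coef_var, 2))  # Sales ~0.28, Average_Check ~0.50, Meals_Served ~0.61

# Standard error of the mean: sd / sqrt(n)
se_mean <- sd_val / sqrt(n)

# Half-width of the 95% confidence interval for the mean,
# using a t quantile with n - 1 degrees of freedom
ci_half <- qt(0.975, df = n - 1) * se_mean

print(round(se_mean))
print(round(ci_half))
```

Rounding coef_var to two decimals instead of whole numbers keeps the differences in relative spread visible.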
# Explanation of three sample statistics:
# - The range of Sales is $27,688,657: the highest-grossing restaurant in the
#   data set made $27,688,657 more in sales than the lowest-grossing one.
# - The mean of Average_Check is about $69 (69.05): averaged across the 100
#   restaurants, the typical average check is $69. This is an unweighted mean of
#   each restaurant's average check, not the mean of every individual check.
# - The median of Meals_Served is 257,097: half of the restaurants served no
#   more than 257,097 meals and half served no fewer. The median is a good
#   indicator of the typical number of meals served because, unlike the mean,
#   it is not pulled up or down by a few extreme values (outliers).
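# A minimal sketch of these three statistics on hypothetical toy values (not from
# Independence100.csv). It shows that the median of an even-length vector averages
# the two middle values, and that an outlier moves the mean but not the median.

```r
# Hypothetical meals-served figures for four restaurants (illustration only)
meals <- c(120000, 180000, 250000, 300000)

meals_range  <- diff(range(meals))  # max - min: 180000
meals_mean   <- mean(meals)         # 212500
meals_median <- median(meals)       # even n: (180000 + 250000) / 2 = 215000

# Replace the largest value with an extreme outlier
meals_outlier  <- c(120000, 180000, 250000, 3000000)
outlier_mean   <- mean(meals_outlier)    # 887500: pulled far upward
outlier_median <- median(meals_outlier)  # still 215000

print(c(range = meals_range, mean = meals_mean, median = meals_median))
print(c(mean = outlier_mean, median = outlier_median))
```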
#Distributions:
library(car)
scatterplotMatrix(myData_sub[,-3], smooth = FALSE)
# The scatterplot matrix shows that Sales, Average_Check, and Meals_Served are
# all skewed to the right (positive coefficient of asymmetry): observations are
# concentrated at the low end of each variable and become less frequent as values
# increase, leaving a long tail of high values. Right skew typically pulls the
# mean above the median, which the summary statistics confirm for all three
# variables (for example, mean Meals_Served is 317,167 versus a median of 257,097).
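# The coefficient of asymmetry can also be computed directly. The sketch below uses a
# hypothetical helper, skewness_g1() (not part of any package), implementing the
# moment-based skewness g1 = m3 / m2^(3/2) on hypothetical toy data. pastecs::stat.desc()
# called with norm = TRUE also reports a skewness value, possibly with a different
# small-sample formula.

```r
# Hypothetical helper (not from any package): moment-based sample skewness g1,
# the mean cubed deviation divided by the mean squared deviation to the 3/2 power
skewness_g1 <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Hypothetical right-skewed toy data: values cluster low with one long right tail
x <- c(1, 2, 2, 3, 3, 3, 4, 10)

g1 <- skewness_g1(x)
print(g1)                                     # positive, indicating right skew
print(c(mean = mean(x), median = median(x)))  # mean 3.5 exceeds median 3

# On the real data this would be, e.g., sapply(myData_sub[, -3], skewness_g1)
```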
# Sales and Average_Check have a positive relationship: restaurants with a higher average check tend to have higher sales. Sales and Meals_Served also have a positive relationship: restaurants that serve more meals tend to have higher sales.
# Average_Check and Meals_Served have a negative relationship: restaurants with a higher average check tend to serve fewer meals. These are associations across restaurants, not evidence that changing one variable would change another.
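# One way to put numbers on the relationships read off the scatterplot matrix is a
# correlation matrix. The sketch below uses hypothetical toy data (not from
# Independence100.csv) built to reproduce the sign pattern described above, and
# computes both Pearson and rank-based Spearman correlations with stats::cor().

```r
# Hypothetical toy data (NOT from Independence100.csv), constructed so that
# Sales rises with both Average_Check and Meals_Served while those two move
# in opposite directions, mirroring the pattern described above
toy <- data.frame(
  Sales         = c(15, 24, 15, 27, 18, 27),       # millions of dollars
  Average_Check = c(80, 120, 40, 100, 20, 60),     # dollars
  Meals_Served  = c(100, 200, 300, 400, 500, 600)  # thousands of meals
)

# Pearson correlation measures linear association
pearson <- cor(toy)
# Spearman correlation works on ranks, so it is less sensitive to the
# right skew and extreme values noted earlier
spearman <- cor(toy, method = "spearman")

print(round(pearson, 2))
print(round(spearman, 2))

# On the real data, the same calls would be cor(myData_sub[, -3]) and
# cor(myData_sub[, -3], method = "spearman")
```

Given the right skew noted above, Spearman is a reasonable companion to Pearson here, since a few very large restaurants cannot dominate a rank-based measure.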