myData <- read.table("./Independence100.csv", header = TRUE, sep = ",")

head(myData)

##   Rank                          Restaurant    Sales Average.Check       City
## 1    1            Carmine's (Times Square) 39080335            40   New York
## 2    2               The Boathouse Orlando 35218364            43   Orlando 
## 3    3                    Old Ebbitt Grill 29104017            33 Washington
## 4    4 LAVO Italian Restaurant & Nightclub 26916180            90   New York
## 5    5            Bryant Park Grill & Cafe 26900000            62   New York
## 6    6            Gibsons Bar & Steakhouse 25409952            80    Chicago
##   State Meals.Served
## 1  N.Y.       469803
## 2  Fla.       820819
## 3  D.C.       892830
## 4  N.Y.       198500
## 5  N.Y.       403000
## 6  Ill.       348567

Explanation of data:

The data set shows various sales metrics for the 100 highest-grossing individual restaurants in the United States.

Unit of observation - Individual restaurants

Sample Size - 100 restaurants

Variables:

- Rank (Categorical Ordinal): The rank of a restaurant from 1 to 100 based on total sales.

- Restaurant (Categorical Nominal): The name of each individual restaurant observed.

- Sales (Numerical Ratio): The total sales of a restaurant during the period of observation. Measured in US dollars ($).

- Average_Check (Numerical Ratio): The average check at a restaurant during the period of observation. Measured in US dollars ($).

- City (Categorical Nominal): The city where a restaurant is located.

- State (Categorical Nominal): The state where a restaurant is located.

- Meals_Served (Numerical Ratio): The total number of meals a restaurant served during the period of observation. Measured as a count of meals sold.

Source: Kaggle.com (2025), “Restaurant Business Rankings 2020”
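The measurement scales listed above can be encoded directly in R so that later functions treat each column appropriately. The sketch below is illustrative only: it rebuilds the six rows printed by head(myData) in a hypothetical data frame named demo instead of reading the CSV, and storing Rank as an ordered factor is an assumption, not a step the analysis itself takes.

```r
# Hypothetical stand-in: the six rows shown by head(myData)
demo <- data.frame(
  Rank = 1:6,
  Restaurant = c("Carmine's (Times Square)", "The Boathouse Orlando",
                 "Old Ebbitt Grill", "LAVO Italian Restaurant & Nightclub",
                 "Bryant Park Grill & Cafe", "Gibsons Bar & Steakhouse"),
  Sales = c(39080335, 35218364, 29104017, 26916180, 26900000, 25409952),
  Average.Check = c(40, 43, 33, 90, 62, 80),
  City = c("New York", "Orlando", "Washington", "New York", "New York", "Chicago"),
  State = c("N.Y.", "Fla.", "D.C.", "N.Y.", "N.Y.", "Ill."),
  Meals.Served = c(469803, 820819, 892830, 198500, 403000, 348567)
)

# Assumption: treat Rank as ordinal and State as nominal
demo$Rank  <- factor(demo$Rank, ordered = TRUE)
demo$State <- factor(demo$State)

# Sales, Average.Check, and Meals.Served stay numeric (ratio scale)
str(demo)
```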

Data Manipulation:

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

myData <- myData %>%
  rename(Average_Check = Average.Check,
         Meals_Served = Meals.Served)

# Drop Rank (column 1), Restaurant (column 2), and City (column 5) by position
myData_sub <- myData[,-c(1, 2, 5)]
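Dropping columns by position works here but silently breaks if the column order of the CSV ever changes. As a hedged alternative, the sketch below drops the same three columns by name with dplyr's select(). The data frames demo and demo_sub are hypothetical stand-ins built from the first three rows printed earlier; they are not objects from the analysis.

```r
library(dplyr)

# Hypothetical stand-in with the same column names as the raw CSV
demo <- data.frame(
  Rank = 1:3,
  Restaurant = c("Carmine's (Times Square)", "The Boathouse Orlando",
                 "Old Ebbitt Grill"),
  Sales = c(39080335, 35218364, 29104017),
  Average.Check = c(40, 43, 33),
  City = c("New York", "Orlando", "Washington"),
  State = c("N.Y.", "Fla.", "D.C."),
  Meals.Served = c(469803, 820819, 892830)
)

# Rename both columns in one call, then drop columns by name
demo_sub <- demo %>%
  rename(Average_Check = Average.Check,
         Meals_Served = Meals.Served) %>%
  select(-Rank, -Restaurant, -City)

names(demo_sub)
```

The remaining columns are Sales, Average_Check, State, and Meals_Served, which matches what the positional subset leaves behind.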

Descriptive Statistics:

summary(myData_sub[,-3])

##      Sales          Average_Check     Meals_Served   
##  Min.   :11391678   Min.   : 17.00   Min.   : 87070  
##  1st Qu.:14094836   1st Qu.: 39.00   1st Qu.:189492  
##  Median :17300776   Median : 65.50   Median :257097  
##  Mean   :17833434   Mean   : 69.05   Mean   :317167  
##  3rd Qu.:19903916   3rd Qu.: 95.00   3rd Qu.:372079  
##  Max.   :39080335   Max.   :194.00   Max.   :959026

library(pastecs)

## 
## Attaching package: 'pastecs'

## The following objects are masked from 'package:dplyr':
## 
##     first, last

format(round(stat.desc(myData_sub[,-3])),scientific = FALSE)

##                       Sales Average_Check Meals_Served
## nbr.val                 100           100          100
## nbr.null                  0             0            0
## nbr.na                    0             0            0
## min                11391678            17        87070
## max                39080335           194       959026
## range              27688657           177       871956
## sum              1783343432          6905     31716666
## median             17300776            66       257097
## mean               17833434            69       317167
## SE.mean              501041             3        19221
## CI.mean.0.95         994174             7        38139
## var          25104188607197          1207  36945218450
## std.dev             5010408            35       192211
## coef.var                  0             1            1
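Because the whole table is passed through round() with its default of zero decimal places, the coef.var row prints as 0, 1, and 1 and loses its information. One option would be to pass a digits argument to round(); another is to recompute the ratio by hand. The sketch below takes the second route using only the rounded values printed above (the Average_Check mean of 69.05 comes from the summary() output, and its standard deviation is taken as the square root of the printed variance), so the results are approximate.

```r
# Values copied from the printed output above (already rounded)
means <- c(Sales = 17833434, Average_Check = 69.05, Meals_Served = 317167)
sds   <- c(Sales = 5010408,  Average_Check = sqrt(1207), Meals_Served = 192211)

# Coefficient of variation = standard deviation / mean
coef_var <- sds / means
round(coef_var, 2)
```

All three ratios fall strictly between 0 and 1, which the integer-rounded table cannot show.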

# Explanation of three sample statistics:
# - The range of Sales is $27,688,657, meaning that the highest-grossing restaurant in the data set made $27,688,657 more than the lowest-grossing restaurant.
# - The mean of Average_Check is $69, meaning that the arithmetic average of the restaurants' average checks was $69.
# - The median of Meals_Served is 257,097, meaning that 50% of restaurants served 257,097 meals or fewer and 50% served more than 257,097 meals. The median is a good indicator of the typical number of meals served because it is not affected by outliers.
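The three statistics explained above can be reproduced by hand. The sketch below is a minimal illustration, not part of the original analysis: the range and mean use values printed by stat.desc(), while the median is demonstrated on the six Meals.Served values from head(myData) only, so that number is not the median of the full data set. The object names are hypothetical.

```r
# Range = max - min, using the Sales values printed by stat.desc()
sales_range <- 39080335 - 11391678

# Mean = sum / number of values, using the Average_Check row totals
mean_check <- 6905 / 100

# Median with an even number of observations: the average of the two
# middle values after sorting (shown on the six head() rows only)
meals_head    <- c(469803, 820819, 892830, 198500, 403000, 348567)
sorted_meals  <- sort(meals_head)
manual_median <- mean(sorted_meals[3:4])

sales_range
mean_check
manual_median
median(meals_head)   # built-in function gives the same value
```

With 100 restaurants the same rule applies: the median is the average of the 50th and 51st sorted values.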

#Distributions:

library(car)

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:dplyr':
## 
##     recode

scatterplotMatrix(myData_sub[,-3], smooth = FALSE)

# The scatterplot matrix shows that Sales, Average_Check, and Meals_Served are all skewed to the right, that is, each has a positive coefficient of asymmetry: observations are concentrated at the lower end of each variable and become less frequent as the variable increases. Right skew pulls the mean above the median, which the summary statistics confirm for all three variables (for example, mean Sales of 17,833,434 against a median of 17,300,776).
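The mean-versus-median comparison can be checked numerically from the summary output. The sketch below uses the printed means, medians, and standard deviations, and computes Pearson's second skewness coefficient, 3 * (mean - median) / sd, as a rough stand-in for the coefficient of asymmetry. That choice of measure is an assumption made for illustration; the analysis itself reads skewness from the plot.

```r
# Values copied from the summary() and stat.desc() output (rounded)
stats <- data.frame(
  variable = c("Sales", "Average_Check", "Meals_Served"),
  mean     = c(17833434, 69.05, 317167),
  median   = c(17300776, 65.5, 257097),
  sd       = c(5010408, sqrt(1207), 192211)
)

# A positive difference means the mean lies above the median
stats$mean_minus_median <- stats$mean - stats$median

# Pearson's second skewness coefficient (approximate, from rounded inputs)
stats$pearson_skew2 <- 3 * stats$mean_minus_median / stats$sd

stats
```

A positive value in both new columns for every variable is consistent with right skew.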

# Sales and Average_Check have a positive relationship: restaurants with a higher average check tend to have higher sales. Sales and Meals_Served also have a positive relationship: restaurants that serve more meals tend to have higher sales.

# Average_Check and Meals_Served have a negative relationship: restaurants with a higher average check tend to serve fewer meals.
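The direction of each relationship read from the scatterplots can also be quantified with a correlation matrix. The sketch below only demonstrates the mechanics on the six rows printed by head(myData), held in a hypothetical data frame named demo; six restaurants say nothing reliable about the full sample, so the resulting numbers should not be read as findings. On the full data, the equivalent call would be cor(myData_sub[,-3]).

```r
# Hypothetical stand-in: numeric columns from the six head() rows
demo <- data.frame(
  Sales         = c(39080335, 35218364, 29104017, 26916180, 26900000, 25409952),
  Average_Check = c(40, 43, 33, 90, 62, 80),
  Meals_Served  = c(469803, 820819, 892830, 198500, 403000, 348567)
)

# Pearson correlations between every pair of variables
cor_mat <- cor(demo)
round(cor_mat, 2)
```

A positive entry indicates that two variables tend to rise together, and a negative entry indicates that one tends to fall as the other rises.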

Joseph Holecek Homework 1 5708

2025-03-25