This assignment will give you a few directions, and you can approach it in any way you want (with any package you like). Throughout the assignment, you might encounter problems such as missing values or improper vector/column types. Troubleshoot these issues as they come up.
1. Setup
In the following chunk, load your libraries, set global chunk options (for a clean render), and load your data. The data for this assignment is the pizza.csv file uploaded to Canvas on the assignment page.
library(tidyverse)
pizza <- read_csv("pizza.csv")
New names:
• `` -> `...1`
Rows: 1209 Columns: 17
── Column specification ────────────────────────────────────────
Delimiter: ","
chr  (4): area, operator, driver, quality
dbl (10): ...1, index, week, weekday, count, price, delivery_min, temperatu...
lgl  (2): rabate, wrongpizza
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Rename columns so tables and plots display readable labels
pizza <- pizza %>%
  rename("Delivery Time (Minutes)" = delivery_min, "Pizza Temperature" = temperature)
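The setup steps above can be collected into one chunk, sketched below under a few assumptions: "clean render" is taken to mean hiding messages and warnings through `knitr::opts_chunk$set()`, and `clean_pizza()` is a hypothetical helper (not part of the original code) that drops readr's unnamed `...1` index column, turns the character columns into factors, and orders `quality` from low to high instead of alphabetically.

```r
# Global chunk options for a clean render (assumed: hide messages and warnings)
knitr::opts_chunk$set(message = FALSE, warning = FALSE)

library(tidyverse)

# Hypothetical helper: tidy the raw data returned by read_csv("pizza.csv")
clean_pizza <- function(raw) {
  raw %>%
    # drop the unnamed row-number column that readr renamed to `...1`
    select(-any_of("...1")) %>%
    mutate(
      # text columns (area, operator, driver, quality) become factors
      across(where(is.character), as.factor),
      # order quality levels logically instead of alphabetically
      quality = factor(quality, levels = c("low", "medium", "high"))
    )
}
```

With the helper defined, the data could be loaded with `pizza <- clean_pizza(read_csv("pizza.csv", show_col_types = FALSE))`; `show_col_types = FALSE` is readr's own switch for silencing the column-specification message.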
2. Summary Statistics Table
Create a summary stats table that has the following stats: Number of observations, mean/standard deviation (for numeric) and % of data (for categorical). Make sure the variable names are displayed professionally.
Table 1: Summary Statistics for Pizza Delivery Data

| Characteristic | Level | N = 1,209¹ |
|---|---|---|
| week | 9 | 88 / 1,177 (7.5%) |
| | 10 | 258 / 1,177 (22%) |
| | 11 | 264 / 1,177 (22%) |
| | 12 | 260 / 1,177 (22%) |
| | 13 | 273 / 1,177 (23%) |
| | 14 | 34 / 1,177 (2.9%) |
| | Unknown | 32 |
| weekday | 1 | 144 / 1,177 (12%) |
| | 2 | 117 / 1,177 (9.9%) |
| | 3 | 134 / 1,177 (11%) |
| | 4 | 147 / 1,177 (12%) |
| | 5 | 171 / 1,177 (15%) |
| | 6 | 244 / 1,177 (21%) |
| | 7 | 220 / 1,177 (19%) |
| | Unknown | 32 |
| area | Brent | 474 / 1,199 (40%) |
| | Camden | 344 / 1,199 (29%) |
| | Westminster | 381 / 1,199 (32%) |
| | Unknown | 10 |
| count | 1 | 108 / 1,197 (9.0%) |
| | 2 | 259 / 1,197 (22%) |
| | 3 | 300 / 1,197 (25%) |
| | 4 | 240 / 1,197 (20%) |
| | 5 | 152 / 1,197 (13%) |
| | 6 | 97 / 1,197 (8.1%) |
| | 7 | 34 / 1,197 (2.8%) |
| | 8 | 7 / 1,197 (0.6%) |
| | Unknown | 12 |
| rabate | | 596 / 1,197 (50%) |
| | Unknown | 12 |
| price | | 49 (22) |
| | Unknown | 12 |
| operator | Allanah | 367 / 1,201 (31%) |
| | Maria | 388 / 1,201 (32%) |
| | Rhonda | 446 / 1,201 (37%) |
| | Unknown | 8 |
| driver | Butcher | 96 / 1,204 (8.0%) |
| | Carpenter | 272 / 1,204 (23%) |
| | Carter | 234 / 1,204 (19%) |
| | Farmer | 117 / 1,204 (9.7%) |
| | Hunter | 156 / 1,204 (13%) |
| | Miller | 125 / 1,204 (10%) |
| | Taylor | 204 / 1,204 (17%) |
| | Unknown | 5 |
| delivery_min | | 26 (11) |
| temperature | | 48 (10) |
| | Unknown | 39 |
| wine_ordered | | 187 / 1,197 (16%) |
| | Unknown | 12 |
| wine_delivered | | 163 / 1,197 (14%) |
| | Unknown | 12 |
| wrongpizza | | 83 / 1,205 (6.9%) |
| | Unknown | 4 |
| quality | high | 496 / 1,008 (49%) |
| | low | 156 / 1,008 (15%) |
| | medium | 356 / 1,008 (35%) |
| | Unknown | 201 |

¹ n / N (%); Mean (SD)
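The layout of Table 1 (`n / N (%)` for categorical variables, `Mean (SD)` for continuous ones, missing values counted as "Unknown") resembles the output of `tbl_summary()` from the gtsummary package, so the sketch below assumes gtsummary is installed. `pizza_summary_table()` is a hypothetical helper, and converting snake_case names to Title Case is only one way to meet the "displayed professionally" requirement; the two hand-written labels are the ones the setup chunk passes to `rename()`.

```r
library(tidyverse)
library(gtsummary)

# Hypothetical helper: summary table with readable variable labels
pizza_summary_table <- function(df) {
  # identifiers and the raw date are not summarised
  df <- df %>% select(-any_of(c("...1", "index", "date")))

  # default label: snake_case -> Title Case (e.g. wine_ordered -> "Wine Ordered")
  labels <- as.list(str_to_title(str_replace_all(names(df), "_", " ")))
  names(labels) <- names(df)
  # hand-written labels for the two columns the setup chunk renames
  labels[names(labels) == "delivery_min"] <- "Delivery Time (Minutes)"
  labels[names(labels) == "temperature"] <- "Pizza Temperature"

  df %>%
    tbl_summary(
      label = labels,
      statistic = list(
        all_continuous() ~ "{mean} ({sd})",
        all_categorical() ~ "{n} / {N} ({p}%)"
      )
    ) %>%
    modify_caption("Table 1: Summary Statistics for Pizza Delivery Data")
}
```

Because Title Case leaves "Delivery Time (Minutes)" and "Pizza Temperature" unchanged, `pizza_summary_table(pizza)` should label those two columns the same way whether or not the rename has already run.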
3. Histogram
Produce a histogram of pizza temperature upon delivery. Is the distribution of this variable skewed? If so, which direction and how can you tell?
Explanation: The distribution of pizza temperature upon delivery appears to be slightly left-skewed. Most pizzas are clustered at higher temperatures, while a smaller number of deliveries have much lower temperatures, creating a longer tail on the left side of the distribution.
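Both the histogram and the skewness reading can be sketched in R. `plot_temperature_hist()` and `sample_skewness()` are hypothetical helpers; the column defaults to `temperature` (pass `"Pizza Temperature"` if the rename in the setup chunk has run), and `bins = 30` is only a starting value to adjust. The moment-based skewness gives a numeric check on the visual judgement: a negative value agrees with a longer left tail.

```r
library(tidyverse)

# Hypothetical helper: histogram of delivery temperature
plot_temperature_hist <- function(df, temp_col = "temperature", bins = 30) {
  ggplot(df, aes(x = .data[[temp_col]])) +
    # na.rm = TRUE quietly drops deliveries with no recorded temperature
    geom_histogram(bins = bins, fill = "steelblue", colour = "white", na.rm = TRUE) +
    labs(
      title = "Pizza Temperature upon Delivery",
      x = "Pizza Temperature",
      y = "Number of Deliveries"
    )
}

# Moment-based sample skewness, ignoring missing values:
# mean((x - mean)^3) / mean((x - mean)^2)^(3/2); negative means a longer left tail
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
```

For example, `sample_skewness(pizza$temperature)` could be reported next to `mean()` and `median()` (with `na.rm = TRUE`); in a left-skewed sample the mean typically sits below the median.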
4. Bar Plot
Produce a bar plot comparing the average delivery time by area. Explain your findings.
Explanation: The bar plot shows that Westminster has the highest average delivery time and Brent the lowest, with Camden in between. Deliveries therefore take noticeably longer on average in Westminster than in Brent. This may be because Westminster has heavier traffic or is farther away, although the plot alone cannot confirm either explanation.
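The bar plot can be sketched in two steps: summarise the mean delivery time per area, then draw one bar per area. `mean_delivery_by_area()` and `plot_mean_delivery()` are hypothetical helpers; dropping rows with a missing `area` and sorting the bars by their mean are illustrative choices, and `time_col` should be set to `"Delivery Time (Minutes)"` if the rename in the setup chunk has run. The `n` column records how many deliveries each mean is based on.

```r
library(tidyverse)

# Hypothetical helper: mean delivery time per area
mean_delivery_by_area <- function(df, time_col = "delivery_min") {
  df %>%
    # leave out deliveries with no recorded area (10 in Table 1)
    filter(!is.na(area)) %>%
    group_by(area) %>%
    summarise(
      mean_time = mean(.data[[time_col]], na.rm = TRUE),
      n = n(),
      .groups = "drop"
    )
}

# Bar plot of the means, areas sorted from fastest to slowest
plot_mean_delivery <- function(area_means) {
  ggplot(area_means, aes(x = reorder(area, mean_time), y = mean_time)) +
    geom_col(fill = "steelblue") +
    labs(
      title = "Average Delivery Time by Area",
      x = "Area",
      y = "Average Delivery Time (Minutes)"
    )
}
```

Usage would be `plot_mean_delivery(mean_delivery_by_area(pizza))`.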
5. Box Plots
Create box plots for delivery time, comparing the times across the different areas. Explain your findings.
Explanation: The box plot shows that Westminster has the highest median delivery time, while Brent has the lowest. Camden and the NA category (deliveries with no recorded area) fall in between. Westminster also displays more outliers, indicating greater variability and more instances of unusually long delivery times. In contrast, Brent's delivery times are more tightly clustered, suggesting more consistent and faster deliveries in that area.
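The box plots, plus the medians and interquartile ranges (IQR) behind them, can be sketched as follows. `plot_delivery_box()` and `delivery_spread_by_area()` are hypothetical helpers that default to the `delivery_min` column; rows with a missing `area` are deliberately not filtered out, so they form their own NA group, as in the plot described above. A wider IQR corresponds to the greater variability the explanation mentions.

```r
library(tidyverse)

# Hypothetical helper: box plots of delivery time by area.
# Rows with a missing area are kept and appear as a separate NA box.
plot_delivery_box <- function(df, time_col = "delivery_min") {
  ggplot(df, aes(x = area, y = .data[[time_col]])) +
    geom_boxplot(fill = "lightblue", outlier.colour = "red") +
    labs(
      title = "Delivery Time by Area",
      x = "Area",
      y = "Delivery Time (Minutes)"
    )
}

# Numbers behind the boxes: median and interquartile range per area
delivery_spread_by_area <- function(df, time_col = "delivery_min") {
  df %>%
    group_by(area) %>%
    summarise(
      median_time = median(.data[[time_col]], na.rm = TRUE),
      iqr_time = IQR(.data[[time_col]], na.rm = TRUE),
      .groups = "drop"
    )
}
```

Printing `delivery_spread_by_area(pizza)` alongside `plot_delivery_box(pizza)` would show whether Brent's IQR is in fact the narrowest.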