Lab Assignment 1

This assignment gives you a few directions, and you can approach it in any way you want (with any package you like). Throughout the assignment, you might encounter problems such as missing values or improper vector/column types. Troubleshoot and work through these errors.

1. Setup

In the following chunk, load your libraries, set global chunk options (for a clean render), and load your data. The data we will be using is the pizza.csv file uploaded to Canvas on the assignment page.

knitr::opts_chunk$set(
  echo = FALSE,
  message = FALSE,
  warning = FALSE
)

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
pizza = read_csv("pizza.csv")
New names:
• `` -> `...1`
Rows: 1209 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr   (4): area, operator, driver, quality
dbl  (10): ...1, index, week, weekday, count, price, delivery_min, temperatu...
lgl   (2): rabate, wrongpizza
date  (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Give the delivery time and temperature columns display-friendly names
pizza = pizza %>%
  rename(
    "Delivery Time (Minutes)" = delivery_min,
    "Pizza Temperature" = temperature
  )
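
The introduction warns about missing values and improper column types, and the import message shows an unnamed first column that readr renamed to `...1`. One possible clean-up step is sketched below. The helper names `clean_pizza` and `count_missing` are hypothetical and not part of the assignment. The sketch assumes the unnamed column is only a row index that can be dropped, which should be checked against the data first.

```r
library(dplyr)

# Hypothetical helper: tidy a freshly read pizza data frame.
clean_pizza <- function(df) {
  df %>%
    # Drop the unnamed column that read_csv renamed to `...1`, if it is present
    # (assumption: it is only a row index)
    select(-any_of("...1")) %>%
    # Convert text columns to categorical (factor) columns
    mutate(across(where(is.character), as.factor))
}

# Hypothetical helper: number of missing values in each column
count_missing <- function(df) {
  colSums(is.na(df))
}
```

Calling `count_missing(clean_pizza(pizza))` would then show which columns need attention before any summaries or plots are built.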

2. Summary Statistics Table

Create a summary stats table that has the following stats: Number of observations, mean/standard deviation (for numeric) and % of data (for categorical). Make sure the variable names are displayed professionally.

Table 1: Summary Statistics for Pizza Delivery Data
Characteristic    N = 1,209¹
week
    9 88 / 1,177 (7.5%)
    10 258 / 1,177 (22%)
    11 264 / 1,177 (22%)
    12 260 / 1,177 (22%)
    13 273 / 1,177 (23%)
    14 34 / 1,177 (2.9%)
    Unknown 32
weekday
    1 144 / 1,177 (12%)
    2 117 / 1,177 (9.9%)
    3 134 / 1,177 (11%)
    4 147 / 1,177 (12%)
    5 171 / 1,177 (15%)
    6 244 / 1,177 (21%)
    7 220 / 1,177 (19%)
    Unknown 32
area
    Brent 474 / 1,199 (40%)
    Camden 344 / 1,199 (29%)
    Westminster 381 / 1,199 (32%)
    Unknown 10
count
    1 108 / 1,197 (9.0%)
    2 259 / 1,197 (22%)
    3 300 / 1,197 (25%)
    4 240 / 1,197 (20%)
    5 152 / 1,197 (13%)
    6 97 / 1,197 (8.1%)
    7 34 / 1,197 (2.8%)
    8 7 / 1,197 (0.6%)
    Unknown 12
rabate 596 / 1,197 (50%)
    Unknown 12
price 49 (22)
    Unknown 12
operator
    Allanah 367 / 1,201 (31%)
    Maria 388 / 1,201 (32%)
    Rhonda 446 / 1,201 (37%)
    Unknown 8
driver
    Butcher 96 / 1,204 (8.0%)
    Carpenter 272 / 1,204 (23%)
    Carter 234 / 1,204 (19%)
    Farmer 117 / 1,204 (9.7%)
    Hunter 156 / 1,204 (13%)
    Miller 125 / 1,204 (10%)
    Taylor 204 / 1,204 (17%)
    Unknown 5
delivery_min 26 (11)
temperature 48 (10)
    Unknown 39
wine_ordered 187 / 1,197 (16%)
    Unknown 12
wine_delivered 163 / 1,197 (14%)
    Unknown 12
wrongpizza 83 / 1,205 (6.9%)
    Unknown 4
quality
    high 496 / 1,008 (49%)
    low 156 / 1,008 (15%)
    medium 356 / 1,008 (35%)
    Unknown 201
¹ n / N (%); Mean (SD)
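
A table in this style can be produced with the gtsummary package. The following is a minimal sketch, not necessarily the code behind Table 1. It assumes the original CSV column names (`area`, `delivery_min`, `temperature`) rather than the renamed ones, and it summarizes only three variables to stay short. The function name `make_summary_table` and the label "Delivery Area" are illustrative choices.

```r
library(dplyr)
library(gtsummary)

# Hypothetical helper: summary table with professional variable labels.
make_summary_table <- function(df) {
  df %>%
    select(area, delivery_min, temperature) %>%
    tbl_summary(
      # Force numeric columns to be summarized as continuous
      type = list(
        delivery_min ~ "continuous",
        temperature ~ "continuous"
      ),
      # Mean (SD) for numeric columns, n / N (%) for categorical columns
      statistic = list(
        all_continuous() ~ "{mean} ({sd})",
        all_categorical() ~ "{n} / {N} ({p}%)"
      ),
      # Display names shown in the table instead of raw column names
      label = list(
        area ~ "Delivery Area",
        delivery_min ~ "Delivery Time (Minutes)",
        temperature ~ "Pizza Temperature"
      ),
      missing_text = "Unknown"
    ) %>%
    modify_caption("Summary Statistics for Pizza Delivery Data")
}
```

Supplying labels through the `label` argument keeps the underlying column names short for later code while still displaying readable names in the rendered table.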

3. Histogram

Produce a histogram of pizza temperature upon delivery. Is the distribution of this variable skewed? If so, which direction and how can you tell?

Explanation: The distribution of pizza temperature upon delivery appears to be slightly left-skewed. Most pizzas are clustered at higher temperatures, while a smaller number of deliveries have much lower temperatures, creating a longer tail on the left side of the distribution.
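
The histogram and a quick numeric check of the skew could be sketched as follows. This assumes the temperature column still has its original name `temperature`. The helper names and the bin width of 2 are arbitrary illustrative choices. For a left-skewed variable the mean usually falls below the median, so comparing the two gives a rough cross-check of what the plot shows.

```r
library(ggplot2)

# Hypothetical helper: histogram of pizza temperature at delivery.
plot_temperature_hist <- function(df, binwidth = 2) {
  ggplot(df, aes(x = temperature)) +
    # na.rm = TRUE drops missing temperatures without a warning
    geom_histogram(binwidth = binwidth, fill = "steelblue",
                   colour = "white", na.rm = TRUE) +
    labs(
      title = "Distribution of Pizza Temperature at Delivery",
      x = "Pizza Temperature",
      y = "Number of Deliveries"
    )
}

# Hypothetical helper: mean and median, ignoring missing values.
# A mean below the median is consistent with a left skew.
mean_vs_median <- function(x) {
  c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE))
}
```

With the data loaded, `plot_temperature_hist(pizza)` draws the plot and `mean_vs_median(pizza$temperature)` returns the two values to compare.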

4. Bar Plot

Produce a bar plot comparing the average time for delivery by area. Explain your findings.

Explanation: The bar plot shows that Westminster has the highest average delivery time, while Brent has the lowest. Camden falls in between the two. This indicates that deliveries take longer on average in Westminster than in Brent, although the plot alone does not show whether the difference is statistically significant. The gap might arise because Westminster has worse traffic conditions or is located further away.
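
One way to build this plot is to compute the mean per area first and then draw the bars, as in the sketch below. It assumes the original column names `area` and `delivery_min`, and it drops rows with a missing area, which is a choice rather than a requirement. The helper names are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: mean delivery time for each area.
mean_delivery_by_area <- function(df) {
  df %>%
    filter(!is.na(area)) %>%  # leave out deliveries with no recorded area
    group_by(area) %>%
    summarise(mean_delivery = mean(delivery_min, na.rm = TRUE),
              .groups = "drop")
}

# Hypothetical helper: bar plot of the per-area means.
plot_mean_delivery <- function(df) {
  ggplot(mean_delivery_by_area(df), aes(x = area, y = mean_delivery)) +
    geom_col(fill = "steelblue") +
    labs(
      title = "Average Delivery Time by Area",
      x = "Area",
      y = "Average Delivery Time (Minutes)"
    )
}
```

Printing `mean_delivery_by_area(pizza)` alongside the plot makes it easy to quote the actual averages in the explanation.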

5. Box Plots

Create box plots for delivery time, comparing the times across the different areas. Explain your findings.

Explanation: The box plot shows that Westminster has the highest median delivery time, while Brent has the lowest. Camden and the NA category (deliveries with no recorded area) fall in between. Westminster also displays more outliers, indicating more instances of unusually long delivery times and greater variability. In contrast, Brent’s delivery times are more tightly clustered, suggesting more consistent and faster deliveries in that area.
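
A sketch of the box plot, together with a small table of medians and interquartile ranges to back up statements about spread, is given below. It assumes the original column names `area` and `delivery_min`. Unlike the plot described above, this version drops deliveries with a missing area so that no NA category appears, which is one possible choice. The helper names are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: median and IQR of delivery time for each area.
delivery_spread_by_area <- function(df) {
  df %>%
    filter(!is.na(area), !is.na(delivery_min)) %>%
    group_by(area) %>%
    summarise(
      median_min = median(delivery_min),
      iqr_min = IQR(delivery_min),
      n = n(),
      .groups = "drop"
    )
}

# Hypothetical helper: box plots of delivery time, one box per area.
plot_delivery_box <- function(df) {
  df %>%
    filter(!is.na(area), !is.na(delivery_min)) %>%
    ggplot(aes(x = area, y = delivery_min)) +
    geom_boxplot() +
    labs(
      title = "Delivery Time by Area",
      x = "Area",
      y = "Delivery Time (Minutes)"
    )
}
```

Comparing `iqr_min` across areas gives a numeric counterpart to the visual impression of which boxes are wider.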