Is there a relationship between delivery time and customer rating in food delivery services?
Each row represents one food delivery order. The dataset contains 8,500 orders (cases), each described by 11 variables.
These are secondary data downloaded from Kaggle. The dataset is synthetic: it simulates real-world food delivery operations rather than recording actual orders.
This is an observational study because no variables are controlled or manipulated, so the analysis can describe an association between variables but cannot establish causation.
Kaggle – Synthetic Food Delivery Dataset
Rating_Pelanggan (Customer Rating) – Numerical
Waktu_Tunggu_Menit (Waiting Time, in minutes) – Numerical
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data <- read.csv("~/Downloads/synthetic_fooddelivery_dataset.csv")
str(data)
## 'data.frame': 8500 obs. of 11 variables:
## $ ID_Pesanan : chr "ORD-2024-000001" "ORD-2024-000002" "ORD-2024-000003" "ORD-2024-000004" ...
## $ Waktu_Transaksi : chr "2024-03-22 13:15:14" "2024-01-14 17:05:37" "2024-01-04 12:32:38" "2024-01-26 22:34:26" ...
## $ Kategori_Menu : chr "Kopi" "Mie" "Martabak" "Kopi" ...
## $ Harga_Pesanan : int 9000 21000 33500 13500 47500 59500 60000 18500 37500 38500 ...
## $ Jarak_Kirim_KM : num NA 3.74 12.68 2.34 0.95 ...
## $ Waktu_Tunggu_Menit: int 27 37 49 20 27 13 43 12 17 12 ...
## $ Rating_Pelanggan : num 4 NA NA 5 4 5 NA 3 4 5 ...
## $ Ulasan_Teks : chr "Sesuai pesanan" "" "" "Mantap gan!" ...
## $ Status_Promo : chr "False" "False" "True" "False" ...
## $ Tingkat_Keluhan : chr "Tidak Ada" "Tidak Ada" "Rendah" "Tidak Ada" ...
## $ Status_Pesanan : chr "Selesai" "Selesai" "Selesai" "Selesai" ...
summary(data)
##  ID_Pesanan         Waktu_Transaksi    Kategori_Menu      Harga_Pesanan
##  Length:8500        Length:8500        Length:8500        Min.   :      0
##  Class :character   Class :character   Class :character   1st Qu.:  19000
##  Mode  :character   Mode  :character   Mode  :character   Median :  29000
##                                                           Mean   : 113474
##                                                           3rd Qu.:  45000
##                                                           Max.   :8302000
##
##  Jarak_Kirim_KM     Waktu_Tunggu_Menit   Rating_Pelanggan   Ulasan_Teks
##  Min.   : 0.5000    Min.   :  5.0        Min.   :1.000      Length:8500
##  1st Qu.: 0.9008    1st Qu.: 13.0        1st Qu.:4.000      Class :character
##  Median : 2.1419    Median : 21.0        Median :4.000      Mode  :character
##  Mean   : 3.0954    Mean   : 22.7        Mean   :4.175
##  3rd Qu.: 4.1569    3rd Qu.: 30.0        3rd Qu.:5.000
##  Max.   :25.0000    Max.   :112.0        Max.   :5.000
##  NA's   :595                             NA's   :1700
##  Status_Promo       Tingkat_Keluhan    Status_Pesanan
##  Length:8500        Length:8500        Length:8500
##  Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character
##
##
##
##
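The summary reports missing values in two numeric columns (595 in Jarak_Kirim_KM and 1700 in Rating_Pelanggan), and the str() output shows empty strings in Ulasan_Teks. A minimal sketch for tallying both per column is given below; `count_missing` is a hypothetical helper name introduced here for illustration, not part of the original analysis.

```r
# Hypothetical helper (not from the original report): count NA values per
# column, plus empty strings in character columns, since read.csv() keeps
# blank character fields as "" rather than converting them to NA.
count_missing <- function(df) {
  n_na <- vapply(df, function(x) sum(is.na(x)), integer(1))
  n_empty <- vapply(df, function(x) {
    if (is.character(x)) sum(!is.na(x) & x == "") else 0L
  }, integer(1))
  data.frame(
    column  = names(df),
    n_na    = unname(n_na),
    n_empty = unname(n_empty),
    row.names = NULL
  )
}
```

Calling `count_missing(data)` on the data frame loaded above should reproduce the NA counts shown by summary(). Because 1700 of the 8500 ratings are missing, any analysis involving Rating_Pelanggan rests on at most 6800 orders.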
ggplot(data, aes(x = Waktu_Tunggu_Menit, y = Rating_Pelanggan)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Waiting Time vs Customer Rating")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 1700 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 1700 rows containing missing values or values outside the scale range
## (`geom_point()`).
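The two warnings arise because the 1700 orders without a rating are dropped silently at plot time. In addition, the ratings shown by str() and summary() appear to take only a few distinct values between 1 and 5, so many points are likely to be drawn on top of each other. One possible variant, sketched below, removes incomplete rows explicitly and adds vertical jitter and transparency; `plot_wait_vs_rating` is a hypothetical helper name, and the jitter height and alpha values are arbitrary choices for illustration.

```r
library(ggplot2)

# Hypothetical helper (not from the original report): scatterplot of waiting
# time against rating with overplotting reduced.
plot_wait_vs_rating <- function(df) {
  # Keep only orders where both variables are observed, so ggplot2 has
  # nothing to drop and emits no missing-value warnings.
  complete <- df[!is.na(df$Waktu_Tunggu_Menit) & !is.na(df$Rating_Pelanggan), ]

  ggplot(complete, aes(x = Waktu_Tunggu_Menit, y = Rating_Pelanggan)) +
    # Jitter only vertically; the jitter affects the drawn points,
    # not the data used for the fitted line.
    geom_jitter(width = 0, height = 0.15, alpha = 0.2) +
    geom_smooth(method = "lm", formula = y ~ x) +
    labs(
      title = "Waiting Time vs Customer Rating",
      x = "Waiting time (minutes)",
      y = "Customer rating"
    )
}
```

With the data frame loaded above, `plot_wait_vs_rating(data)` would draw the same relationship as the original plot, with the density of points at each rating level easier to see.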
names(data)
## [1] "ID_Pesanan" "Waktu_Transaksi" "Kategori_Menu"
## [4] "Harga_Pesanan" "Jarak_Kirim_KM" "Waktu_Tunggu_Menit"
## [7] "Rating_Pelanggan" "Ulasan_Teks" "Status_Promo"
## [10] "Tingkat_Keluhan" "Status_Pesanan"
mean(data$Waktu_Tunggu_Menit, na.rm = TRUE)
## [1] 22.69694
mean(data$Rating_Pelanggan, na.rm = TRUE)
## [1] 4.175294
cor(data$Waktu_Tunggu_Menit, data$Rating_Pelanggan, use = "complete.obs")
## [1] -0.2627982
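A correlation of about -0.26 indicates a weak negative linear association: longer waits tend to go with somewhat lower ratings, and the squared correlation (roughly 0.07) suggests that waiting time accounts for only about 7% of the variation in ratings. To judge whether this could be due to chance, and to check that the result does not depend on treating the rating as continuous, one option is to run cor.test() with both the Pearson and the Spearman method. The sketch below assumes the column names from the dataset; `rating_wait_tests` is a hypothetical helper name.

```r
# Hypothetical helper (not from the original report): test the association
# between waiting time and rating using Pearson and Spearman correlation.
rating_wait_tests <- function(df) {
  x <- df$Waktu_Tunggu_Menit
  y <- df$Rating_Pelanggan
  list(
    # Number of orders with both values present; cor.test() uses only these.
    n_complete = sum(complete.cases(x, y)),
    # Pearson: linear association, same estimate as cor(..., "complete.obs").
    pearson = cor.test(x, y, method = "pearson"),
    # Spearman: rank-based, suited to ordinal ratings; exact = FALSE because
    # ties make an exact p-value unavailable.
    spearman = cor.test(x, y, method = "spearman", exact = FALSE)
  )
}
```

On the data frame loaded above, `rating_wait_tests(data)$pearson$estimate` should match the value returned by cor(). Since the study is observational and the data are synthetic, a significant result would support an association in this dataset only, not a causal effect of waiting time on ratings.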