Research Question

Is there a relationship between delivery time and customer rating in food delivery services?

Cases and Sample Size

Each row represents one food delivery order. The dataset contains 8,500 orders (rows) and 11 variables.

Method of Data Collection

The data is secondary data obtained from a Kaggle dataset. It is a synthetic dataset simulating real-world food delivery operations.

Type of Study

This is an observational study because no variables are controlled or manipulated.

Data Source

Kaggle – Synthetic Food Delivery Dataset

Response Variable

Rating_Pelanggan (Customer Rating) – Numerical, on a 1 to 5 scale

Explanatory Variable

Waktu_Tunggu_Menit (Waiting Time in Minutes) – Numerical

Summary Statistics

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load the dataset; adjust the path to wherever the CSV file is saved
data <- read.csv("~/Downloads/synthetic_fooddelivery_dataset.csv")

str(data)
## 'data.frame':    8500 obs. of  11 variables:
##  $ ID_Pesanan        : chr  "ORD-2024-000001" "ORD-2024-000002" "ORD-2024-000003" "ORD-2024-000004" ...
##  $ Waktu_Transaksi   : chr  "2024-03-22 13:15:14" "2024-01-14 17:05:37" "2024-01-04 12:32:38" "2024-01-26 22:34:26" ...
##  $ Kategori_Menu     : chr  "Kopi" "Mie" "Martabak" "Kopi" ...
##  $ Harga_Pesanan     : int  9000 21000 33500 13500 47500 59500 60000 18500 37500 38500 ...
##  $ Jarak_Kirim_KM    : num  NA 3.74 12.68 2.34 0.95 ...
##  $ Waktu_Tunggu_Menit: int  27 37 49 20 27 13 43 12 17 12 ...
##  $ Rating_Pelanggan  : num  4 NA NA 5 4 5 NA 3 4 5 ...
##  $ Ulasan_Teks       : chr  "Sesuai pesanan" "" "" "Mantap gan!" ...
##  $ Status_Promo      : chr  "False" "False" "True" "False" ...
##  $ Tingkat_Keluhan   : chr  "Tidak Ada" "Tidak Ada" "Rendah" "Tidak Ada" ...
##  $ Status_Pesanan    : chr  "Selesai" "Selesai" "Selesai" "Selesai" ...
summary(data)
##   ID_Pesanan        Waktu_Transaksi    Kategori_Menu      Harga_Pesanan    
##  Length:8500        Length:8500        Length:8500        Min.   :      0  
##  Class :character   Class :character   Class :character   1st Qu.:  19000  
##  Mode  :character   Mode  :character   Mode  :character   Median :  29000  
##                                                           Mean   : 113474  
##                                                           3rd Qu.:  45000  
##                                                           Max.   :8302000  
##                                                                            
##  Jarak_Kirim_KM    Waktu_Tunggu_Menit Rating_Pelanggan Ulasan_Teks       
##  Min.   : 0.5000   Min.   :  5.0      Min.   :1.000    Length:8500       
##  1st Qu.: 0.9008   1st Qu.: 13.0      1st Qu.:4.000    Class :character  
##  Median : 2.1419   Median : 21.0      Median :4.000    Mode  :character  
##  Mean   : 3.0954   Mean   : 22.7      Mean   :4.175                      
##  3rd Qu.: 4.1569   3rd Qu.: 30.0      3rd Qu.:5.000                      
##  Max.   :25.0000   Max.   :112.0      Max.   :5.000                      
##  NA's   :595                          NA's   :1700                       
##  Status_Promo       Tingkat_Keluhan    Status_Pesanan    
##  Length:8500        Length:8500        Length:8500       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
## 
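The summary above reports 1,700 missing values in Rating_Pelanggan and 595 in Jarak_Kirim_KM out of 8,500 orders, so any analysis of ratings rests on the 6,800 rated orders (80%). The sketch below shows one way to tally missing values per column. The helper name `missing_summary` is a hypothetical choice for illustration, and the toy data frame holds only the first four values of three columns as printed by str(), not the full dataset.

```r
# Hypothetical helper: count and percentage of NA values in each column.
missing_summary <- function(df) {
  n_missing <- colSums(is.na(df))
  data.frame(
    variable = names(n_missing),
    n_missing = as.integer(n_missing),
    pct_missing = round(100 * as.numeric(n_missing) / nrow(df), 1),
    row.names = NULL
  )
}

# Toy data for illustration only: the first four values of each column
# as shown in the str() output, not the full 8,500-row file.
toy <- data.frame(
  Jarak_Kirim_KM = c(NA, 3.74, 12.68, 2.34),
  Waktu_Tunggu_Menit = c(27L, 37L, 49L, 20L),
  Rating_Pelanggan = c(4, NA, NA, 5)
)

result <- missing_summary(toy)
print(result)
```

Applied to the full data frame, `missing_summary(data)` should reproduce the NA counts that summary() reports.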
ggplot(data, aes(x = Waktu_Tunggu_Menit, y = Rating_Pelanggan)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(
    title = "Waiting Time vs Customer Rating",
    x = "Waiting time (minutes)",
    y = "Customer rating (1 to 5)"
  )
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 1700 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 1700 rows containing missing values or values outside the scale range
## (`geom_point()`).
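
Because Rating_Pelanggan takes only the values 1 to 5, the points in a scatterplot stack into horizontal bands, which can hide how waiting times differ between rating levels. A complementary view is to summarize waiting time within each rating level. The following is a minimal sketch in base R. The helper name `wait_by_rating` is hypothetical, and the toy data frame contains only the first ten values printed by str(), so its numbers say nothing about the full dataset.

```r
# Hypothetical helper: number of orders, mean and median waiting time
# for each observed rating level, using complete pairs only.
wait_by_rating <- function(df) {
  keep <- complete.cases(df[, c("Waktu_Tunggu_Menit", "Rating_Pelanggan")])
  rated <- df[keep, ]
  groups <- split(as.numeric(rated$Waktu_Tunggu_Menit), rated$Rating_Pelanggan)
  data.frame(
    Rating_Pelanggan = as.numeric(names(groups)),
    n_orders = vapply(groups, length, integer(1)),
    mean_wait = vapply(groups, mean, numeric(1)),
    median_wait = vapply(groups, median, numeric(1)),
    row.names = NULL
  )
}

# Toy data for illustration only: first ten values from the str() output.
toy <- data.frame(
  Waktu_Tunggu_Menit = c(27, 37, 49, 20, 27, 13, 43, 12, 17, 12),
  Rating_Pelanggan = c(4, NA, NA, 5, 4, 5, NA, 3, 4, 5)
)

by_rating <- wait_by_rating(toy)
print(by_rating)
```

Calling `wait_by_rating(data)` on the full data frame would give one row per rating level, based on the 6,800 rated orders.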

names(data)
##  [1] "ID_Pesanan"         "Waktu_Transaksi"    "Kategori_Menu"     
##  [4] "Harga_Pesanan"      "Jarak_Kirim_KM"     "Waktu_Tunggu_Menit"
##  [7] "Rating_Pelanggan"   "Ulasan_Teks"        "Status_Promo"      
## [10] "Tingkat_Keluhan"    "Status_Pesanan"
mean(data$Waktu_Tunggu_Menit, na.rm = TRUE)
## [1] 22.69694
mean(data$Rating_Pelanggan, na.rm = TRUE)
## [1] 4.175294
cor(data$Waktu_Tunggu_Menit, data$Rating_Pelanggan, use = "complete.obs")
## [1] -0.2627982
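
The correlation of about -0.26 indicates a weak negative linear association: longer waiting times tend to go with slightly lower ratings. Squaring it gives roughly 0.069, so a straight-line fit would account for only about 7% of the variation in ratings. The sketch below shows how `cor.test()` from base R could be used to attach a confidence interval and p-value to such a correlation. It runs on a toy data frame built from the first ten values printed by str(), so its output is illustrative and is not a result for the full dataset. Since ratings are on a 1 to 5 scale, a rank-based correlation such as Spearman's might also be worth considering.

```r
# Toy data for illustration only: first ten values from the str() output.
toy <- data.frame(
  Waktu_Tunggu_Menit = c(27, 37, 49, 20, 27, 13, 43, 12, 17, 12),
  Rating_Pelanggan = c(4, NA, NA, 5, 4, 5, NA, 3, 4, 5)
)

# cor.test() drops incomplete pairs itself, which matches the
# use = "complete.obs" behaviour of cor().
res <- cor.test(toy$Waktu_Tunggu_Menit, toy$Rating_Pelanggan,
                method = "pearson")

r <- unname(res$estimate)   # sample correlation coefficient
r_squared <- r^2            # share of variance explained by a linear fit

print(res$conf.int)         # 95% confidence interval for the correlation
print(res$p.value)          # p-value for H0: true correlation equals 0

# On the full dataset the same call would be:
# cor.test(data$Waktu_Tunggu_Menit, data$Rating_Pelanggan)
```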