# Project Summary
Vivendo, a fast-food chain operating across Brazil, often handles compensation claims related to food poisoning. To support improvements in customer satisfaction, this analysis explored how long it takes to close claims across four legal office locations: Recife, São Luís, Fortaleza, and Natal. The dataset included 2,000 records and 8 fields, which were cleaned and preprocessed to ensure accuracy in the analysis. Key issues such as missing values in amount_paid and linked_cases, and inconsistent labeling in the cause field were addressed.
The findings revealed that Recife handles the highest number of claims, while Natal receives the fewest. Through visualizations such as histograms and boxplots, it became evident that most claims are closed within 200 days, though some outliers exceed 300 days. When comparing the claim closure times across locations (after removing outliers), São Luís demonstrated the most consistent and quickest response times, followed closely by Recife. In contrast, Natal exhibited greater variability, suggesting inconsistency in claim handling. These insights can guide the legal team in identifying best practices and improving efficiency in offices with longer or inconsistent closure times.
# Set knitr chunk options and the working directory
knitr::opts_chunk$set(echo = TRUE)
setwd("C:/Users/Yvonne/Downloads")
# Load library
library(ggplot2)
# Load the dataset
claim <- read.csv("food_claims_2212_cleaned.csv", header = TRUE, sep = ",")
# Preview first few rows
head(claim)
## claim_id time_to_close claim_amount amount_paid location
## 1 1 317 74474.55 51231.37 RECIFE
## 2 2 195 52137.83 42111.30 FORTALEZA
## 3 3 183 24447.20 23986.30 SAO LUIS
## 4 4 186 29006.28 27942.72 FORTALEZA
## 5 5 138 19520.60 16251.06 RECIFE
## 6 6 183 47529.14 38011.98 NATAL
## individuals_on_claim linked_cases cause
## 1 15 FALSE unknown
## 2 12 TRUE unknown
## 3 10 TRUE meat
## 4 11 FALSE meat
## 5 11 FALSE vegetable
## 6 11 FALSE unknown
# Plot: Total claims per location
# The bar chart below shows the total claims filed at each of the four legal offices: Recife, Sao Luis, Fortaleza, and Natal. Recife has the highest number of claims, followed by Sao Luis, Fortaleza, and finally Natal.
# The claims are unevenly distributed: the count at the busiest office (RECIFE) is almost 3 times the count at the least busy office (NATAL). The legal team should therefore investigate why claimants prefer to report at RECIFE.
ggplot(data = claim, aes(x = location)) +
  geom_bar(stat = "count") +
  ggtitle("TOTAL CLAIMS PER LOCATION") +
  theme(plot.title = element_text(hjust = 0.5))
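The "almost 3 times" comparison can also be checked numerically rather than read off the chart. The sketch below is a minimal illustration on a hypothetical toy data frame (`toy_claim` and the `OFFICE_*` labels are invented, not taken from the dataset); in the report, the same calls would be applied to `claim$location`.

```r
# Hypothetical toy data standing in for `claim`; the labels and counts are invented.
toy_claim <- data.frame(
  location = c("OFFICE_A", "OFFICE_A", "OFFICE_A",
               "OFFICE_B", "OFFICE_B", "OFFICE_C", "OFFICE_D")
)

# Count claims per location and sort from most to fewest.
claims_per_location <- sort(table(toy_claim$location), decreasing = TRUE)
print(claims_per_location)

# Ratio of the busiest office's count to the quietest office's count.
count_ratio <- as.numeric(max(claims_per_location) / min(claims_per_location))
print(count_ratio)
```

Reporting the ratio next to the bar chart makes the imbalance between offices explicit instead of leaving it to visual estimation.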
# Plot: Distribution of time to close
# For customers to be compensated quickly, the number of days it takes to close a claim must also be considered, since a longer closure time pushes compensation further out. The histogram below shows the distribution of the number of days it takes to close claims.
# The majority of claims are closed in less than 200 days; the rest take longer. The histogram shows that the distribution of time to close is right-skewed, with outliers exceeding 300 days.
ggplot(data = claim, aes(x = time_to_close)) +
  geom_histogram(binwidth = 10) +
  ggtitle("DISTRIBUTION OF TIME TO CLOSE CLAIMS") +
  theme(plot.title = element_text(hjust = 0.5))
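The statements about the 200-day mark and the right skew can be backed by a few summary numbers. The following is a minimal sketch on an invented vector (`toy_days` is hypothetical); in the report, `claim$time_to_close` would take its place. A mean above the median is used here only as a rough indicator of right skew, not a formal test.

```r
# Hypothetical closure times in days; these values are invented for illustration.
toy_days <- c(120, 150, 160, 175, 180, 190, 195, 210, 250, 320)

# Share of claims closed in fewer than 200 days.
share_within_200 <- mean(toy_days < 200)

# Number of claims that took more than 300 days.
n_over_300 <- sum(toy_days > 300)

# Rough skew indicator: in a right-skewed distribution the long tail
# usually pulls the mean above the median.
mean_above_median <- mean(toy_days) > median(toy_days)

print(share_within_200)
print(n_over_300)
print(mean_above_median)
```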
# Plot: Time to close by location
# So far, we know that the RECIFE legal office has received the most food poisoning claims. To compare the offices, we also need the number of days each one takes to close claims.
# A boxplot is useful for comparing the distribution of a continuous variable (time_to_close) across categories (the 4 locations).
# In the boxplot, the interquartile ranges and medians of the locations look very similar, which makes comparison difficult. This is due to the presence of outliers.
ggplot(data = claim, aes(x = location, y = time_to_close)) +
  geom_boxplot() +
  ggtitle("TIME TO CLOSE BY LOCATION") +
  theme(plot.title = element_text(hjust = 0.5))
# Remove outliers using IQR method
# To ease comparison, the outliers were removed using the IQR method and a new boxplot was plotted, as shown below.
# The interquartile range (IQR) of each distribution indicates that SAO LUIS has the smallest spread, followed by RECIFE, then FORTALEZA; NATAL has the highest IQR.
# The IQR of time to close in RECIFE is lower than that of the FORTALEZA and NATAL offices, even though RECIFE received the highest number of claims. Because the IQR measures spread rather than level, this suggests that RECIFE closes claims more consistently than those two offices, not necessarily faster. A higher IQR means the middle values are spread out, so closure times are less consistent; a lower IQR means the middle values are clustered, so closure times are more consistent and predictable. Here, the offices with lower IQR are RECIFE and SAO LUIS.
Q1 <- quantile(claim$time_to_close, 0.25)
Q3 <- quantile(claim$time_to_close, 0.75)
IQR_val <- Q3 - Q1
cleaned_claim <- subset(claim, time_to_close > (Q1 - 1.5 * IQR_val) & time_to_close < (Q3 + 1.5 * IQR_val))
# Check the cleaned dataset
dim(cleaned_claim)
## [1] 1877 8
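As a worked example of the IQR rule used above, the sketch below computes the lower and upper fences on a small invented vector (`toy_days` is hypothetical) and keeps only the values strictly inside them, mirroring the `subset()` call. In the report this filter reduced the 2,000 rows to 1,877.

```r
# Hypothetical closure times in days; one value (400) is an obvious outlier.
toy_days <- c(150, 160, 170, 180, 190, 200, 210, 400)

# Quartiles and interquartile range (R's default quantile type 7).
q1 <- unname(quantile(toy_days, 0.25))
q3 <- unname(quantile(toy_days, 0.75))
iqr_val <- q3 - q1

# Fences: values at or beyond these limits are treated as outliers.
lower_fence <- q1 - 1.5 * iqr_val
upper_fence <- q3 + 1.5 * iqr_val

# Keep only the values strictly inside the fences.
kept_days <- toy_days[toy_days > lower_fence & toy_days < upper_fence]

print(c(lower = lower_fence, upper = upper_fence))
print(kept_days)
```

Note that the fences in the report are computed once on the pooled data, not per location, so a value counts as an outlier relative to all offices combined.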
summary(cleaned_claim)
## claim_id time_to_close claim_amount amount_paid
## Min. : 2 Min. : 90 Min. : 1760 Min. : 1517
## 1st Qu.: 508 1st Qu.:156 1st Qu.:13203 1st Qu.:10824
## Median :1008 Median :177 Median :23832 Median :19406
## Mean :1006 Mean :178 Mean :25775 Mean :20582
## 3rd Qu.:1506 3rd Qu.:196 3rd Qu.:36693 3rd Qu.:29225
## Max. :2000 Max. :272 Max. :76107 Max. :52499
## location individuals_on_claim linked_cases cause
## Length:1877 Min. : 1.000 Mode :logical Length:1877
## Class :character 1st Qu.: 4.000 FALSE:1396 Class :character
## Mode :character Median : 8.000 TRUE :481 Mode :character
## Mean : 7.806
## 3rd Qu.:11.000
## Max. :15.000
# Plot: Cleaned time to close by location
# Based on the results below, the legal team should study the practices of locations that close claims in less than 200 days while also examining the locations that take more than 200 days. This will help it find ways to shorten the time to close claims, which in turn may shorten the time customers wait for compensation.
ggplot(data = cleaned_claim, aes(x = location, y = time_to_close)) +
  geom_boxplot() +
  ggtitle("TIME TO CLOSE BY LOCATION (CLEANED)") +
  theme(plot.title = element_text(hjust = 0.5))
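The IQR ranking of the offices is read off the boxplot, but it can also be computed directly. The sketch below is a minimal illustration on a hypothetical toy data frame (`toy_claim` and the `OFFICE_*` labels and values are invented); in the report, the same `tapply()` calls would be applied to `cleaned_claim$time_to_close` grouped by `cleaned_claim$location`.

```r
# Hypothetical toy data: two offices with the same median but different spread.
toy_claim <- data.frame(
  location = rep(c("OFFICE_A", "OFFICE_B"), each = 5),
  time_to_close = c(120, 150, 180, 210, 240,   # OFFICE_A: widely spread
                    170, 175, 180, 185, 190)   # OFFICE_B: tightly clustered
)

# Median (level) and IQR (spread) of time to close for each location.
median_by_location <- tapply(toy_claim$time_to_close, toy_claim$location, median)
iqr_by_location <- tapply(toy_claim$time_to_close, toy_claim$location, IQR)

# Order offices from most to least consistent (smallest IQR first).
print(sort(iqr_by_location))
print(median_by_location)
```

Looking at the median and the IQR side by side separates two questions that the boxplot shows together: how long an office typically takes, and how consistent it is.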
# Export cleaned dataset
write.csv(cleaned_claim, file = "cleaned_claim.csv", row.names = FALSE)