Question 13.2 was completed in Python (SimPy) rather than R, per the assignment’s allowed tools. The model simulates passengers arriving via a Poisson process (λ1 = 5/min) to an ID/boarding-pass check (modeled as a multi-server resource, exponential service time, mean 0.75 min), followed by routing to the shortest available personal-check queue (uniform service time, 0.5–1 min).
Different combinations of ID-checkers and personal-check queues were tested to find the minimum staffing levels that keep average wait time under 15 minutes.
(You can paste a summary of your results/table here, or insert a
screenshot of your Colab output using:
)
breastcancer <- read.table(
"C:/Users/moham/Downloads/Georgia Tech University/WK6/week 6 Homework-Summer/week 6 data-summer/data 14.1/breast-cancer-wisconsin.txt.txt",
header = FALSE,
na.strings = "?",
sep = ","
)
head(breastcancer)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
## 1 1000025 5 1 1 1 2 1 3 1 1 2
## 2 1002945 5 4 4 5 7 10 3 2 1 2
## 3 1015425 3 1 1 1 2 2 3 1 1 2
## 4 1016277 6 8 8 1 3 4 3 7 1 2
## 5 1017023 4 1 1 3 2 1 3 1 1 2
## 6 1017122 8 10 10 8 7 10 9 7 1 4
dim(breastcancer)
## [1] 699 11
str(breastcancer)
## 'data.frame': 699 obs. of 11 variables:
## $ V1 : int 1000025 1002945 1015425 1016277 1017023 1017122 1018099 1018561 1033078 1033078 ...
## $ V2 : int 5 5 3 6 4 8 1 2 2 4 ...
## $ V3 : int 1 4 1 8 1 10 1 1 1 2 ...
## $ V4 : int 1 4 1 8 1 10 1 2 1 1 ...
## $ V5 : int 1 5 1 1 3 8 1 1 1 1 ...
## $ V6 : int 2 7 2 3 2 7 2 2 2 2 ...
## $ V7 : int 1 10 2 4 1 10 10 1 1 1 ...
## $ V8 : int 3 3 3 3 3 9 3 3 1 2 ...
## $ V9 : int 1 2 1 7 1 7 1 1 1 1 ...
## $ V10: int 1 1 1 1 1 1 1 1 5 1 ...
## $ V11: int 2 2 2 2 2 4 2 2 2 2 ...
summary(breastcancer)
## V1 V2 V3 V4
## Min. : 61634 Min. : 1.000 Min. : 1.000 Min. : 1.000
## 1st Qu.: 870689 1st Qu.: 2.000 1st Qu.: 1.000 1st Qu.: 1.000
## Median : 1171710 Median : 4.000 Median : 1.000 Median : 1.000
## Mean : 1071704 Mean : 4.418 Mean : 3.134 Mean : 3.207
## 3rd Qu.: 1238298 3rd Qu.: 6.000 3rd Qu.: 5.000 3rd Qu.: 5.000
## Max. :13454352 Max. :10.000 Max. :10.000 Max. :10.000
##
## V5 V6 V7 V8
## Min. : 1.000 Min. : 1.000 Min. : 1.000 Min. : 1.000
## 1st Qu.: 1.000 1st Qu.: 2.000 1st Qu.: 1.000 1st Qu.: 2.000
## Median : 1.000 Median : 2.000 Median : 1.000 Median : 3.000
## Mean : 2.807 Mean : 3.216 Mean : 3.545 Mean : 3.438
## 3rd Qu.: 4.000 3rd Qu.: 4.000 3rd Qu.: 6.000 3rd Qu.: 5.000
## Max. :10.000 Max. :10.000 Max. :10.000 Max. :10.000
## NA's :16
## V9 V10 V11
## Min. : 1.000 Min. : 1.000 Min. :2.00
## 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.:2.00
## Median : 1.000 Median : 1.000 Median :2.00
## Mean : 2.867 Mean : 1.589 Mean :2.69
## 3rd Qu.: 4.000 3rd Qu.: 1.000 3rd Qu.:4.00
## Max. :10.000 Max. :10.000 Max. :4.00
##
anyNA(breastcancer)
## [1] TRUE
colSums(is.na(breastcancer))
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
## 0 0 0 0 0 0 16 0 0 0 0
sum(is.na(breastcancer))
## [1] 16
There are 16 missing values, all in one column.
colnames(breastcancer) <- c(
"Sample_code_number",
"Clump_Thickness",
"Uniformity_of_Cell_Size",
"Uniformity_of_Cell_Shape",
"Marginal_Adhesion",
"Single_Epithelial_Cell_Size",
"Bare_Nuclei",
"Bland_Chromatin",
"Normal_Nucleoli",
"Mitoses",
"Class"
)
colSums(is.na(breastcancer))
## Sample_code_number Clump_Thickness
## 0 0
## Uniformity_of_Cell_Size Uniformity_of_Cell_Shape
## 0 0
## Marginal_Adhesion Single_Epithelial_Cell_Size
## 0 0
## Bare_Nuclei Bland_Chromatin
## 16 0
## Normal_Nucleoli Mitoses
## 0 0
## Class
## 0
Missing values are confirmed to be only in
Bare_Nuclei.
breastcancer_mean <- breastcancer
mean_bare_nuclei <- mean(breastcancer_mean$Bare_Nuclei, na.rm = TRUE)
breastcancer_mean$Bare_Nuclei[is.na(breastcancer_mean$Bare_Nuclei)] <- mean_bare_nuclei
sum(is.na(breastcancer_mean$Bare_Nuclei))
## [1] 0
mean_bare_nuclei
## [1] 3.544656
breastcancer_reg <- breastcancer
complete_rows <- breastcancer_reg[!is.na(breastcancer_reg$Bare_Nuclei), ]
missing_rows <- breastcancer_reg[is.na(breastcancer_reg$Bare_Nuclei), ]
reg_model <- lm(Bare_Nuclei ~ Clump_Thickness + Uniformity_of_Cell_Size +
Uniformity_of_Cell_Shape + Marginal_Adhesion +
Single_Epithelial_Cell_Size + Bland_Chromatin +
Normal_Nucleoli + Mitoses,
data = complete_rows)
summary(reg_model)
##
## Call:
## lm(formula = Bare_Nuclei ~ Clump_Thickness + Uniformity_of_Cell_Size +
## Uniformity_of_Cell_Shape + Marginal_Adhesion + Single_Epithelial_Cell_Size +
## Bland_Chromatin + Normal_Nucleoli + Mitoses, data = complete_rows)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.7316 -0.9426 -0.3002 0.6725 8.6998
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.616652 0.194975 -3.163 0.00163 **
## Clump_Thickness 0.230156 0.041691 5.521 4.83e-08 ***
## Uniformity_of_Cell_Size -0.067980 0.076170 -0.892 0.37246
## Uniformity_of_Cell_Shape 0.340442 0.073420 4.637 4.25e-06 ***
## Marginal_Adhesion 0.339705 0.045919 7.398 4.13e-13 ***
## Single_Epithelial_Cell_Size 0.090392 0.062541 1.445 0.14883
## Bland_Chromatin 0.320577 0.059047 5.429 7.91e-08 ***
## Normal_Nucleoli 0.007293 0.044486 0.164 0.86983
## Mitoses -0.075230 0.059331 -1.268 0.20524
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.274 on 674 degrees of freedom
## Multiple R-squared: 0.615, Adjusted R-squared: 0.6104
## F-statistic: 134.6 on 8 and 674 DF, p-value: < 2.2e-16
predicted_values <- predict(reg_model, newdata = missing_rows)
predicted_values
## 24 41 140 146 159 165 236 250
## 5.3660666 8.2259101 0.8892805 1.6605574 1.0899300 2.2208736 2.7818889 1.7605617
## 276 293 295 298 316 322 412 618
## 2.1208694 5.8459477 0.9796727 2.3918282 5.5419942 1.7605617 0.8892805 0.5687034
predicted_values_clamped <- round(predicted_values)
predicted_values_clamped[predicted_values_clamped < 1] <- 1
predicted_values_clamped[predicted_values_clamped > 10] <- 10
breastcancer_reg$Bare_Nuclei[is.na(breastcancer_reg$Bare_Nuclei)] <- predicted_values_clamped
sum(is.na(breastcancer_reg$Bare_Nuclei))
## [1] 0
breastcancer_pert <- breastcancer
predicted_values_pert <- predict(reg_model, newdata = missing_rows)
residual_sd <- summary(reg_model)$sigma
set.seed(42)
noise <- rnorm(length(predicted_values_pert), mean = 0, sd = residual_sd)
predicted_values_pert_noisy <- predicted_values_pert + noise
predicted_values_pert_noisy <- round(predicted_values_pert_noisy)
predicted_values_pert_noisy[predicted_values_pert_noisy < 1] <- 1
predicted_values_pert_noisy[predicted_values_pert_noisy > 10] <- 10
breastcancer_pert$Bare_Nuclei[is.na(breastcancer_pert$Bare_Nuclei)] <- predicted_values_pert_noisy
sum(is.na(breastcancer_pert$Bare_Nuclei))
## [1] 0
All three imputed datasets (breastcancer_mean,
breastcancer_reg, breastcancer_pert) now have
zero missing values in Bare_Nuclei.
I work for an insurance company, where the end-to-end auto claims process presents a clear optimization opportunity: minimizing total customer wait time from the moment an accident is reported to the completion of vehicle repair, while working within limited staffing and shop capacity.
This spans three connected stages: (1) call center response when accidents are reported, (2) dispatch/field response time, and (3) repair shop turnaround once a vehicle is dropped off.
Decision variables would include: number of call center staff per shift, number of field adjusters assigned per region, and how repair jobs are scheduled or prioritized at partner shops.
Data needed would include: historical call volume by time of day/day of week, average call handling time and its variance, geographic/time-of-day accident frequency, repair shop capacity (bays, mechanics), historical repair duration by damage severity and part availability, and customer wait time at each stage of the process to identify where the true bottlenecks are.