Dataset Coffee Chain mencatat transaksi penjualan produk kopi dan teh dari sebuah jaringan kedai kopi di berbagai pasar (Central, East, South, West) selama periode Januari 2012 – Desember 2013. Dataset terdiri dari 4.248 baris dan 20 variabel, mencakup data aktual (Sales, Profit, Margin) dan target anggaran (Budget Sales, Budget Profit, Budget Margin).
Insight utama yang dianalisis:
Seberapa jauh kinerja profit aktual setiap produk menyimpang dari target anggaran, dan bagaimana tren profit berkembang dari 2012 ke 2013?
Insight ini relevan karena terdapat produk dengan profit negatif (Green Tea) di tengah tren keseluruhan yang meningkat, sehingga perlu dievaluasi lebih lanjut.
library(readxl)
# Load data
df <- readxl::read_excel("C:/Users/asus/Downloads/Tugas SIM 2025B - Coffee Chain Datasets.xlsx")
# Pastikan kolom Date bertipe Date
df$Date <- as.Date(df$Date)
df$YearMonth <- as.Date(format(df$Date, "%Y-%m-01"))
df$Year <- as.integer(format(df$Date, "%Y"))
df$profit_gap <- df$Profit - df$`Budget Profit` # selisih aktual vs target
str(df)## tibble [4,248 × 23] (S3: tbl_df/tbl/data.frame)
## $ Area Code : num [1:4248] 719 970 970 303 303 720 970 719 970 719 ...
## $ Date : Date[1:4248], format: "2012-01-01" "2012-01-01" ...
## $ Market : chr [1:4248] "Central" "Central" "Central" "Central" ...
## $ Market Size : chr [1:4248] "Major Market" "Major Market" "Major Market" "Major Market" ...
## $ Product : chr [1:4248] "Amaretto" "Colombian" "Decaf Irish Cream" "Green Tea" ...
## $ Product Line : chr [1:4248] "Beans" "Beans" "Beans" "Leaves" ...
## $ Product Type : chr [1:4248] "Coffee" "Coffee" "Coffee" "Tea" ...
## $ State : chr [1:4248] "Colorado" "Colorado" "Colorado" "Colorado" ...
## $ Type : chr [1:4248] "Regular" "Regular" "Decaf" "Regular" ...
## $ Budget COGS : num [1:4248] 90 80 100 30 60 80 140 50 50 40 ...
## $ Budget Margin : num [1:4248] 130 110 140 50 90 130 160 80 70 70 ...
## $ Budget Profit : num [1:4248] 100 80 110 30 70 80 110 20 40 20 ...
## $ Budget Sales : num [1:4248] 220 190 240 80 150 210 300 130 120 110 ...
## $ COGS : num [1:4248] 89 83 95 44 54 72 170 63 60 58 ...
## $ Inventory : num [1:4248] 777 623 821 623 456 ...
## $ Margin : num [1:4248] 130 107 139 56 80 108 171 87 80 72 ...
## $ Marketing : num [1:4248] 24 27 26 14 15 23 47 57 19 22 ...
## $ Profit : num [1:4248] 94 68 101 30 54 53 99 0 33 17 ...
## $ Sales : num [1:4248] 219 190 234 100 134 180 341 150 140 130 ...
## $ Total Expenses: num [1:4248] 36 39 38 26 26 55 72 87 47 55 ...
## $ YearMonth : Date[1:4248], format: "2012-01-01" "2012-01-01" ...
## $ Year : int [1:4248] 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
## $ profit_gap : num [1:4248] -6 -12 -9 0 -16 -27 -11 -20 -7 -3 ...
# Ringkasan statistik variabel kunci
summary(df[, c("Sales", "Profit", "Budget Profit", "Margin", "Marketing", "Total Expenses")])## Sales Profit Budget Profit Margin
## Min. : 17 Min. :-638.0 Min. :-320.00 Min. :-302.00
## 1st Qu.:100 1st Qu.: 17.0 1st Qu.: 20.00 1st Qu.: 52.75
## Median :138 Median : 40.0 Median : 40.00 Median : 76.00
## Mean :193 Mean : 61.1 Mean : 60.91 Mean : 104.29
## 3rd Qu.:230 3rd Qu.: 92.0 3rd Qu.: 80.00 3rd Qu.: 132.00
## Max. :912 Max. : 778.0 Max. : 560.00 Max. : 613.00
## Marketing Total Expenses
## Min. : 0.00 Min. : 10.00
## 1st Qu.: 13.00 1st Qu.: 33.00
## Median : 22.00 Median : 46.00
## Mean : 31.19 Mean : 54.06
## 3rd Qu.: 39.00 3rd Qu.: 65.00
## Max. :156.00 Max. :190.00
produk_summary <- do.call(rbind, lapply(split(df, df$Product), function(x) {
data.frame(
Product = x$Product[1],
n = nrow(x),
Profit = mean(x$Profit, na.rm = TRUE),
BudgetProfit = mean(x$`Budget Profit`, na.rm = TRUE),
ProfitGap = mean(x$profit_gap, na.rm = TRUE),
stringsAsFactors = FALSE
)
}))
produk_summary <- produk_summary[order(produk_summary$ProfitGap), ]
rownames(produk_summary) <- NULL
knitr::kable(produk_summary, digits = 2,
col.names = c("Produk", "N", "Avg Profit", "Avg Budget", "Gap (Aktual - Budget)"),
caption = "Rata-rata Profit Aktual vs Budget per Produk")| Produk | N | Avg Profit | Avg Budget | Gap (Aktual - Budget) |
|---|---|---|---|---|
| Green Tea | 288 | -0.80 | 13.19 | -14.00 |
| Decaf Irish Cream | 384 | 36.43 | 49.64 | -13.21 |
| Caffe Mocha | 480 | 36.83 | 48.50 | -11.67 |
| Amaretto | 192 | 25.47 | 36.56 | -11.09 |
| Colombian | 480 | 116.26 | 120.42 | -4.16 |
| Mint | 192 | 32.05 | 35.31 | -3.26 |
| Decaf Espresso | 408 | 72.31 | 72.21 | 0.10 |
| Caffe Latte | 216 | 52.66 | 46.76 | 5.90 |
| Lemon | 480 | 62.23 | 56.29 | 5.94 |
| Chamomile | 384 | 70.91 | 64.90 | 6.02 |
| Regular Espresso | 72 | 139.79 | 125.83 | 13.96 |
| Darjeeling | 384 | 75.66 | 59.53 | 16.13 |
| Earl Grey | 288 | 83.90 | 61.11 | 22.79 |
Temuan awal:
ps <- produk_summary[order(produk_summary$ProfitGap), ]
colors <- ifelse(ps$ProfitGap >= 0, "#2ecc71", "#e74c3c")
par(mar = c(4, 10, 4, 3))
barplot(
ps$ProfitGap,
names.arg = ps$Product,
horiz = TRUE,
col = colors,
border = NA,
las = 1,
xlab = "Rata-rata Gap Profit (Aktual − Budget)",
main = "Gap Profit Aktual vs Budget per Produk",
cex.names = 0.85
)
abline(v = 0, lty = 2, col = "gray40")
mtext("Nilai positif = profit aktual melampaui target; negatif = di bawah target",
side = 3, line = 0.3, cex = 0.8, col = "gray40")
# Label nilai
text(
x = ps$ProfitGap + ifelse(ps$ProfitGap >= 0, 0.3, -0.3),
y = seq(0.7, by = 1.2, length.out = nrow(ps)),
labels = round(ps$ProfitGap, 1),
cex = 0.75,
adj = ifelse(ps$ProfitGap >= 0, 0, 1)
)
legend("bottomright",
legend = c("Di atas Target", "Di bawah Target"),
fill = c("#2ecc71", "#e74c3c"),
border = NA,
bty = "n",
cex = 0.85)# Agregasi profit per bulan dan tahun
tren <- aggregate(Profit ~ YearMonth + Year, data = df, FUN = sum)
tren <- tren[order(tren$YearMonth), ]
df_2012 <- tren[tren$Year == 2012, ]
df_2013 <- tren[tren$Year == 2013, ]
par(mar = c(4, 5, 4, 2))
plot(
df_2012$YearMonth, df_2012$Profit,
type = "b", col = "#3498db", pch = 16, lwd = 2,
xlab = "Bulan", ylab = "Total Profit",
main = "Tren Total Profit Bulanan: 2012 vs 2013",
ylim = range(tren$Profit) * c(0.9, 1.1),
xaxt = "n",
yaxt = "n"
)
lines(df_2013$YearMonth, df_2013$Profit,
type = "b", col = "#e67e22", pch = 16, lwd = 2)
axis(1, at = df_2012$YearMonth,
labels = format(df_2012$YearMonth, "%b %Y"),
las = 2, cex.axis = 0.7)
axis(2, at = axTicks(2), las = 1,
labels = formatC(axTicks(2), format = "f", big.mark = ".", digits = 0))
mtext("Profit 2013 konsisten lebih tinggi di semua bulan",
side = 3, line = 0.3, cex = 0.8, col = "gray40")
legend("topleft",
legend = c("2012", "2013"),
col = c("#3498db", "#e67e22"),
lwd = 2, pch = 16, bty = "n", cex = 0.9)product_lines <- unique(df$`Product Line`)
par(mfrow = c(1, length(product_lines)), mar = c(5, 4, 4, 1))
fill_colors <- c("Beans" = "#8B4513", "Leaves" = "#228B22")
for (pl in product_lines) {
sub <- df[df$`Product Line` == pl, ]
boxplot(
Profit ~ Market,
data = sub,
col = fill_colors[pl],
border = "gray30",
outline = TRUE,
outpch = 1, outcex = 0.4, outcol = adjustcolor(fill_colors[pl], alpha.f = 0.3),
main = pl,
xlab = "Market",
ylab = if (pl == product_lines[1]) "Profit" else "",
las = 1,
cex.axis = 0.85
)
}
mtext("Distribusi Profit per Market dan Product Line",
side = 3, line = -1.5, outer = TRUE, font = 2, cex = 1)# Paired t-test: profit aktual vs budget profit
t_result <- t.test(df$Profit, df$`Budget Profit`, paired = TRUE)
t_result##
## Paired t-test
##
## data: df$Profit and df$`Budget Profit`
## t = 0.31011, df = 4247, p-value = 0.7565
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.9809752 1.3496193
## sample estimates:
## mean difference
## 0.184322
## Rata-rata profit aktual : 61.1
## Rata-rata budget profit : 60.91
## Selisih rata-rata : 0.18
## p-value : 0.7565
# Regresi profit terhadap sales, marketing, COGS, total expenses
model <- lm(Profit ~ Sales + Marketing + COGS + `Total Expenses`, data = df)
summary(model)##
## Call:
## lm(formula = Profit ~ Sales + Marketing + COGS + `Total Expenses`,
## data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -112.028 -5.748 0.526 5.525 105.228
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.366873 0.705850 3.353 0.000806 ***
## Sales 1.208316 0.003721 324.744 < 2e-16 ***
## Marketing 0.195850 0.040466 4.840 1.35e-06 ***
## COGS -1.241869 0.010250 -121.156 < 2e-16 ***
## `Total Expenses` -1.400421 0.031161 -44.942 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.86 on 4243 degrees of freedom
## Multiple R-squared: 0.9726, Adjusted R-squared: 0.9725
## F-statistic: 3.76e+04 on 4 and 4243 DF, p-value: < 2.2e-16
## Df Sum Sq Mean Sq F value Pr(>F)
## Product 12 4267412 355618 37.97 <2e-16 ***
## Residuals 4235 39666224 9366
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Post-hoc: produk mana saja yang berbeda signifikan?
tukey <- TukeyHSD(anova_model)
tukey_df <- as.data.frame(tukey$Product)
tukey_df$Perbandingan <- rownames(tukey_df)
rownames(tukey_df) <- NULL
tukey_df <- tukey_df[tukey_df$`p adj` < 0.05, ]
tukey_df <- tukey_df[order(tukey_df$`p adj`), ]
tukey_df <- tukey_df[, c("Perbandingan", "diff", "lwr", "upr", "p adj")]
cat("Pasangan produk dengan perbedaan profit signifikan (p < 0.05):\n")## Pasangan produk dengan perbedaan profit signifikan (p < 0.05):
## Jumlah pasangan signifikan: 52
knitr::kable(head(tukey_df, 10), digits = 4,
caption = "10 Pasangan Produk Teratas dengan Perbedaan Profit Signifikan")| Perbandingan | diff | lwr | upr | p adj | |
|---|---|---|---|---|---|
| 25 | Colombian-Caffe Mocha | 79.4292 | 58.7222 | 100.1361 | 0 |
| 45 | Decaf Irish Cream-Colombian | -79.8286 | -101.7917 | -57.8656 | 0 |
| 47 | Green Tea-Colombian | -117.0604 | -140.9707 | -93.1501 | 0 |
| 75 | Regular Espresso-Green Tea | 140.5938 | 98.3259 | 182.8616 | 0 |
| 4 | Colombian-Amaretto | 90.7896 | 63.3969 | 118.1823 | 0 |
| 73 | Lemon-Green Tea | 63.0292 | 39.1189 | 86.9395 | 0 |
| 48 | Lemon-Colombian | -54.0312 | -74.7382 | -33.3243 | 0 |
| 39 | Green Tea-Chamomile | -71.7161 | -96.7221 | -46.7101 | 0 |
| 12 | Regular Espresso-Amaretto | 114.3229 | 69.9920 | 158.6538 | 0 |
| 49 | Mint-Colombian | -84.2062 | -111.5989 | -56.8136 | 0 |
Uji t berpasangan menghasilkan p-value > 0.05, yang berarti secara statistik tidak ada perbedaan signifikan antara rata-rata profit aktual dan budget profit secara keseluruhan. Artinya, perencanaan anggaran profit cukup akurat di level agregat. Namun, analisis per produk menunjukkan variasi yang besar — beberapa produk sangat under-budget (Green Tea, Decaf Irish Cream) dan beberapa melampaui target (Earl Grey, Regular Espresso).
Dari model regresi linear:
ANOVA menunjukkan perbedaan profit yang sangat signifikan antar produk (p < 0.001). Uji Tukey mengkonfirmasi bahwa produk seperti Regular Espresso dan Colombian berbeda secara signifikan dari produk underperformer seperti Green Tea. Ini mengindikasikan perlunya strategi berbeda per produk, bukan pendekatan satu-ukuran-untuk-semua.
Secara agregat, profit aktual Coffee Chain hampir sesuai budget — tidak ada penyimpangan signifikan secara statistik.
Per produk, terdapat kesenjangan besar: Green Tea merugi dan konsisten di bawah budget, sedangkan Regular Espresso dan Earl Grey secara konsisten melampaui target.
Tren 2012 → 2013 menunjukkan pertumbuhan profit yang signifikan (~50% lebih tinggi di 2013), dengan pola musiman yang serupa di kedua tahun.
Driver utama profit adalah volume sales dan efisiensi biaya (COGS & Total Expenses) — strategi yang paling berdampak adalah meningkatkan sales sekaligus mengendalikan COGS.
Rekomendasi: Evaluasi ulang strategi produk Green Tea (harga, biaya, atau penghentian), dan replikasi formula sukses Regular Espresso ke produk lain.