Data Science Programming
Practicum Week 4
1 Dynamic Multi-Formula Function
1.1 Introduction
This task builds a function compute_formula(x, formula)
that calculates four mathematical formulas — linear, quadratic,
cubic, and exponential — for x = 1 to 20. Nested loops are used
to compute all formulas at once, with input validation to ensure only
valid formula names are accepted. Results are displayed in a table and
visualized in a single combined plot.
1.2 Function Definition
# ── compute_formula function ───────────────────────────
compute_formula <- function(x, formula) {
  # Validate the formula name
  valid_formulas <- c("linear", "quadratic", "cubic", "exponential")
  if (!(formula %in% valid_formulas)) {
    stop(paste("Invalid formula. Choose from:", paste(valid_formulas, collapse = ", ")))
  }
  # Compute y for the selected formula
  if (formula == "linear") return(2 * x + 1)
  if (formula == "quadratic") return(x^2 + 3 * x + 2)
  if (formula == "cubic") return(x^3 - 2 * x^2 + x + 5)
  if (formula == "exponential") return(exp(0.3 * x))
}
1.3 Compute All Formulas Using Nested Loops
library(kableExtra)  # provides kable_styling(), row_spec(), and the %>% pipe
x_values <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")
results <- list()
# Nested loop: for each formula, for each value of x
for (formula in formulas) {
  y_values <- c()
  for (x in x_values) {
    y <- compute_formula(x, formula)
    y_values <- c(y_values, y)
  }
  results[[formula]] <- y_values
}
# Build the data frame
df_results <- data.frame(
  x = x_values,
  Linear = round(results[["linear"]], 2),
  Quadratic = round(results[["quadratic"]], 2),
  Cubic = round(results[["cubic"]], 2),
  Exponential = round(results[["exponential"]], 4)
)
# Display the table with kableExtra
knitr::kable(df_results,
             col.names = c("x", "Linear", "Quadratic", "Cubic", "Exponential"),
             caption = "Table 1: Computed Formula Values for x = 1 to 20",
             format = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(df_results), 2), background = "#fcd5d5")
| x | Linear | Quadratic | Cubic | Exponential |
|---|---|---|---|---|
| 1 | 3 | 6 | 5 | 1.3499 |
| 2 | 5 | 12 | 7 | 1.8221 |
| 3 | 7 | 20 | 17 | 2.4596 |
| 4 | 9 | 30 | 41 | 3.3201 |
| 5 | 11 | 42 | 85 | 4.4817 |
| 6 | 13 | 56 | 155 | 6.0496 |
| 7 | 15 | 72 | 257 | 8.1662 |
| 8 | 17 | 90 | 397 | 11.0232 |
| 9 | 19 | 110 | 581 | 14.8797 |
| 10 | 21 | 132 | 815 | 20.0855 |
| 11 | 23 | 156 | 1105 | 27.1126 |
| 12 | 25 | 182 | 1457 | 36.5982 |
| 13 | 27 | 210 | 1877 | 49.4024 |
| 14 | 29 | 240 | 2371 | 66.6863 |
| 15 | 31 | 272 | 2945 | 90.0171 |
| 16 | 33 | 306 | 3605 | 121.5104 |
| 17 | 35 | 342 | 4357 | 164.0219 |
| 18 | 37 | 380 | 5207 | 221.4064 |
| 19 | 39 | 420 | 6161 | 298.8674 |
| 20 | 41 | 462 | 7225 | 403.4288 |
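The arithmetic inside compute_formula() already works on whole vectors, so the inner loop over x is not strictly required. The sketch below is one possible alternative, not the method used in this report: compute_formula_v and results_v are hypothetical names, and switch() replaces the chain of if statements.

```r
# Hypothetical vectorized variant of compute_formula(); same four formulas.
compute_formula_v <- function(x, formula) {
  switch(formula,
    linear      = 2 * x + 1,
    quadratic   = x^2 + 3 * x + 2,
    cubic       = x^3 - 2 * x^2 + x + 5,
    exponential = exp(0.3 * x),
    stop("Invalid formula: ", formula)   # default branch for unknown names
  )
}

x_values <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")

# One call per formula; each call handles all 20 x values at once.
results_v <- sapply(formulas, function(f) compute_formula_v(x_values, f))

dim(results_v)            # 20 rows (x values) by 4 columns (formulas)
results_v[20, "cubic"]    # 7225, matching the last row of Table 1
```

The result is a matrix with one column per formula, which can be passed to data.frame() in the same way as the list built by the nested loop.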
1.4 Visualization
y_max <- max(unlist(results))
y_min <- min(unlist(results))
colors <- c("linear" = "steelblue",
"quadratic" = "limegreen",
"cubic" = "tomato",
"exponential" = "#7f1d1d")
plot(x_values, results[["linear"]],
type = "b", col = colors["linear"],
lwd = 2.5, pch = 16, cex = 1,
ylim = c(y_min, y_max),
xlab = "x values (1 to 20)",
ylab = "y values",
main = "Comparison of Four Mathematical Formulas (x = 1 to 20)",
cex.main = 1.3, cex.lab = 1.1)
for (formula in c("quadratic", "cubic", "exponential")) {
lines(x_values, results[[formula]],
type = "b", col = colors[formula],
lwd = 2.5, pch = 16, cex = 1)
}
grid(lty = "dashed", col = "gray85")
legend("topleft",
legend = c("Linear", "Quadratic", "Cubic", "Exponential"),
col = colors,
lwd = 2.5, pch = 16, bty = "n", cex = 1)
1.5 Interpretation
The plot shows four distinct growth patterns as x increases from 1 to 20, but over this range the cubic formula, not the exponential, grows fastest: it reaches 7,225 at x = 20 and sets the scale of the y-axis. The quadratic formula reaches 462 and the exponential formula exp(0.3x) only about 403, so both appear compressed near the bottom of the plot, and the linear formula (41 at x = 20) looks nearly flat. The exponential curve does overtake the linear one between x = 10 and x = 11, and because an exponential eventually outgrows any polynomial it would overtake the quadratic and cubic curves as well, but only beyond x = 20. The comparison therefore depends on both the mathematical form of a model and the range over which it is evaluated.
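To check where the exponential formula actually overtakes each polynomial, the same four formulas can be evaluated on a longer range. This is a small sketch under one assumption: the range 1 to 60 is chosen here for illustration and is not part of the original task.

```r
x <- 1:60   # extended range (assumption; the task itself uses 1 to 20)

linear      <- 2 * x + 1
quadratic   <- x^2 + 3 * x + 2
cubic       <- x^3 - 2 * x^2 + x + 5
exponential <- exp(0.3 * x)

# First x at which the exponential exceeds each polynomial
first_exceeds <- function(poly) x[min(which(exponential > poly))]

crossover <- c(
  linear    = first_exceeds(linear),
  quadratic = first_exceeds(quadratic),
  cubic     = first_exceeds(cubic)
)
crossover   # linear 11, quadratic 21, cubic 36
```

Only the first crossover falls inside x = 1 to 20, which is why the cubic curve dominates the plot in this section.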
2 Nested Simulation - Multi-Sales & Discounts
2.1 Introduction
This task simulates daily sales data for multiple salespersons over
several days. A nested function apply_discount() calculates
conditional discount rates based on sales amount, while a
cumulative_sales() function tracks running totals per
salesperson. Results include summary statistics and cumulative sales
visualization.
2.2 Function Definition
# ── apply_discount function (nested helper) ───────────
apply_discount <- function(sales_amount) {
  if (sales_amount >= 900) return(0.20)
  else if (sales_amount >= 700) return(0.15)
  else if (sales_amount >= 500) return(0.10)
  else if (sales_amount >= 300) return(0.05)
  else return(0.00)
}
# ── cumulative_sales function (nested helper) ─────────
cumulative_sales <- function(sales_list) {
  total <- 0
  result <- c()
  for (s in sales_list) {
    total <- total + s
    result <- c(result, total)
  }
  return(result)
}
# ── Main function: simulate_sales ─────────────────────
simulate_sales <- function(n_salesperson, days) {
  data <- data.frame()
  # Loop over salespersons
  for (sp_id in 1:n_salesperson) {
    daily_sales <- c()
    # Loop over days
    for (day in 1:days) {
      sales_amount <- sample(100:1000, 1)
      discount_rate <- apply_discount(sales_amount)
      net_sales <- sales_amount * (1 - discount_rate)
      daily_sales <- c(daily_sales, sales_amount)
      data <- rbind(data, data.frame(
        sales_id = sp_id,
        day = day,
        sales_amount = sales_amount,
        discount_rate = discount_rate,
        net_sales = round(net_sales, 2)
      ))
    }
    cat("Salesperson", sp_id, "- Cumulative Total:", sum(daily_sales), "\n")
  }
  return(data)
}
2.3 Simulate Sales Data
# Run the simulation for 5 salespersons over 10 days
df_sales <- simulate_sales(n_salesperson = 5, days = 10)
## Salesperson 1 - Cumulative Total: 3587
## Salesperson 2 - Cumulative Total: 6300
## Salesperson 3 - Cumulative Total: 6411
## Salesperson 4 - Cumulative Total: 4224
## Salesperson 5 - Cumulative Total: 5466
knitr::kable(head(df_sales, 15),
col.names = c("Sales ID", "Day", "Sales Amount", "Discount Rate", "Net Sales"),
caption = "Table 2: Sales Simulation Data (First 15 Rows)",
format = "html") %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = TRUE) %>%
row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
row_spec(seq(2, 15, 2), background = "#fcd5d5")
| Sales ID | Day | Sales Amount | Discount Rate | Net Sales |
|---|---|---|---|---|
| 1 | 1 | 660 | 0.10 | 594.00 |
| 1 | 2 | 420 | 0.05 | 399.00 |
| 1 | 3 | 252 | 0.00 | 252.00 |
| 1 | 4 | 173 | 0.00 | 173.00 |
| 1 | 5 | 327 | 0.05 | 310.65 |
| 1 | 6 | 245 | 0.00 | 245.00 |
| 1 | 7 | 733 | 0.15 | 623.05 |
| 1 | 8 | 148 | 0.00 | 148.00 |
| 1 | 9 | 227 | 0.00 | 227.00 |
| 1 | 10 | 402 | 0.05 | 381.90 |
| 2 | 1 | 123 | 0.00 | 123.00 |
| 2 | 2 | 938 | 0.20 | 750.40 |
| 2 | 3 | 455 | 0.05 | 432.25 |
| 2 | 4 | 700 | 0.15 | 595.00 |
| 2 | 5 | 264 | 0.00 | 264.00 |
2.4 Summary Statistics
summary_sales <- aggregate(
cbind(sales_amount, discount_rate, net_sales) ~ sales_id,
data = df_sales,
FUN = function(x) round(mean(x), 2)
)
names(summary_sales) <- c("Sales ID", "Avg Sales", "Avg Discount", "Avg Net Sales")
knitr::kable(summary_sales,
caption = "Table 3: Summary Statistics per Salesperson",
format = "html") %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = TRUE) %>%
row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
row_spec(seq(2, nrow(summary_sales), 2), background = "#fcd5d5")
| Sales ID | Avg Sales | Avg Discount | Avg Net Sales |
|---|---|---|---|
| 1 | 358.7 | 0.04 | 335.36 |
| 2 | 630.0 | 0.12 | 537.07 |
| 3 | 641.1 | 0.12 | 547.84 |
| 4 | 422.4 | 0.06 | 390.29 |
| 5 | 546.6 | 0.09 | 486.40 |
2.5 Visualization
par(mar = c(5, 5, 4, 2),
cex.main = 1.6,
cex.lab = 1.4,
cex.axis = 1.2)
colors <- c("steelblue", "limegreen", "tomato", "#7f1d1d", "orange")
# Compute all cumulative sales first to set ylim
all_cum <- c()
for (i in 1:5) {
sp_data <- df_sales[df_sales$sales_id == i, ]
all_cum <- c(all_cum, cumsum(sp_data$sales_amount))
}
# Plot the first salesperson
sp1 <- df_sales[df_sales$sales_id == 1, ]
cum_sp1 <- cumsum(sp1$sales_amount)
plot(1:10, cum_sp1,
type = "b", col = colors[1],
lwd = 3, pch = 16, cex = 1.3,
ylim = c(0, max(all_cum) * 1.1),
xlab = "Day",
ylab = "Cumulative Sales",
main = "Cumulative Sales per Salesperson (10 Days)")
# Loop over the remaining salespersons
for (i in 2:5) {
sp_data <- df_sales[df_sales$sales_id == i, ]
cum_sp <- cumsum(sp_data$sales_amount)
lines(1:10, cum_sp,
type = "b", col = colors[i],
lwd = 3, pch = 16, cex = 1.3)
}
grid(lty = "dashed", col = "gray85")
legend("topleft",
legend = paste("Salesperson", 1:5),
col = colors,
lwd = 3, pch = 16, bty = "n", cex = 1.1)
2.6 Interpretation
The simulation covers 5 salespersons over 10 days. Cumulative sales rise at every step for every salesperson because each daily amount is positive (drawn from 100 to 1000), and the slope of each line reflects that salesperson's average daily sales. Discounts are applied automatically by tier: 5% from 300, 10% from 500, 15% from 700, and 20% from 900. Salespersons with higher average sales therefore give up a larger share to discounts (Table 3: an average discount of 0.12 for Salespersons 2 and 3 versus 0.04 for Salesperson 1). Differences in cumulative totals, from 3,587 to 6,411, reflect only the random variation in the generated daily sales.
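The discount tiers can also be applied to a whole vector of sales at once. The following is a minimal sketch, assuming the same thresholds and rates as apply_discount(); apply_discount_vec is a hypothetical name, and findInterval() is used here in place of the if/else chain.

```r
# Tier boundaries and rates taken from apply_discount()
breaks <- c(300, 500, 700, 900)
rates  <- c(0.00, 0.05, 0.10, 0.15, 0.20)

# Hypothetical vectorized counterpart of apply_discount().
# findInterval() counts how many boundaries each amount has reached (0 to 4).
apply_discount_vec <- function(sales_amount) {
  rates[findInterval(sales_amount, breaks) + 1]
}

sales    <- c(660, 420, 252, 733, 938, 700)   # amounts taken from Table 2
discount <- apply_discount_vec(sales)
net      <- round(sales * (1 - discount), 2)

data.frame(sales, discount, net, cumulative = cumsum(sales))
```

The built-in cumsum() returns the same running totals as the loop in cumulative_sales(), which is why the plotting code in section 2.5 can use it directly.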
3 Multi-Level Performance Categorization
3.1 Introduction
This task builds a function categorize_performance()
that classifies sales amounts into 5 performance levels —
Excellent, Very Good, Good, Average, and Poor. A loop
iterates through each value, calculates the percentage per category, and
results are visualized using a bar chart and pie chart.
3.2 Function Definition
# ── Performance categorization function ───────────────
categorize_performance <- function(sales_amount) {
  categories <- c()
  # Loop over each sales value
  for (sales in sales_amount) {
    if (sales >= 900) categories <- c(categories, "Excellent")
    else if (sales >= 700) categories <- c(categories, "Very Good")
    else if (sales >= 500) categories <- c(categories, "Good")
    else if (sales >= 300) categories <- c(categories, "Average")
    else categories <- c(categories, "Poor")
  }
  return(categories)
}
3.3 Categorize Performance Data
set.seed(42)
sales_data <- sample(100:1000, 200, replace = TRUE)
performance <- categorize_performance(sales_data)
df_perf <- data.frame(
sales_amount = sales_data,
performance = performance
)
# Compute the count and percentage per category
category_order <- c("Excellent", "Very Good", "Good", "Average", "Poor")
counts <- sapply(category_order, function(cat) sum(df_perf$performance == cat))
percentages <- round(counts / length(performance) * 100, 2)
df_summary <- data.frame(
Category = category_order,
Count = counts,
Percentage = paste0(percentages, "%")
)
knitr::kable(df_summary,
caption = "Table 4: Performance Category Distribution",
format = "html",
row.names = FALSE) %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = TRUE) %>%
row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
row_spec(seq(2, nrow(df_summary), 2), background = "#fcd5d5")
| Category | Count | Percentage |
|---|---|---|
| Excellent | 21 | 10.5% |
| Very Good | 49 | 24.5% |
| Good | 38 | 19% |
| Average | 49 | 24.5% |
| Poor | 43 | 21.5% |
3.4 Visualization
par(mfrow = c(1, 2),
mar = c(5, 5, 4, 2),
cex.main = 1.4,
cex.lab = 1.2,
cex.axis = 1.1)
colors_cat <- c("Excellent" = "#7f1d1d",
"Very Good" = "#b91c1c",
"Good" = "#ef4444",
"Average" = "#fca5a5",
"Poor" = "#fee2e2")
# ── Bar Chart ──────────────────────────────────────────
barplot(counts,
names.arg = category_order,
col = colors_cat,
border = "white",
main = "Performance Category Distribution\n(Bar Chart)",
xlab = "Category",
ylab = "Count",
ylim = c(0, max(counts) * 1.2))
# Add labels above the bars
text(x = seq(0.7, by = 1.2, length.out = 5),
y = counts + 1.5,
labels = paste0(counts, "\n(", percentages, "%)"),
cex = 1, font = 2)
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")
# ── Pie Chart ──────────────────────────────────────────
pie(counts,
labels = paste0(category_order, "\n", percentages, "%"),
col = colors_cat,
main = "Performance Category Distribution\n(Pie Chart)",
cex = 1.1)
3.5 Interpretation
The distribution covers 200 simulated sales values. Very Good and Average are the largest categories at 24.5% each (49 values), followed by Poor (21.5%), Good (19%), and Excellent (10.5%). This is roughly what uniform sampling from 100 to 1000 predicts: the four lower bands each span 200 values, while Excellent (900 and above) spans only 101, so it is expected to be about half as frequent. Excellent corresponds to sales of at least 900 and Poor to sales below 300. In a real dataset, this categorization would help identify which segments need improvement and which are performing well, supporting more targeted sales strategies and performance evaluations.
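The expected share of each category under uniform sampling can be checked directly. The sketch below is an illustration only: categorize_performance_vec is a hypothetical name, and cut() is used as a vectorized stand-in for the loop in categorize_performance(), with the same thresholds.

```r
category_order <- c("Poor", "Average", "Good", "Very Good", "Excellent")

# Hypothetical vectorized counterpart of categorize_performance()
categorize_performance_vec <- function(sales_amount) {
  cut(sales_amount,
      breaks = c(-Inf, 300, 500, 700, 900, Inf),
      labels = category_order,
      right  = FALSE)   # intervals are [lower, upper), so 900 is "Excellent"
}

# Counts if every integer from 100 to 1000 occurred exactly once
expected <- table(categorize_performance_vec(100:1000))
expected

# Expected percentage per category under uniform sampling
round(100 * prop.table(expected), 1)
```

Comparing these expected percentages with Table 4 shows how far a sample of 200 values can drift from the underlying distribution.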
4 Multi-Company Dataset Simulation
4.1 Introduction
This task simulates a multi-company employee dataset using nested
loops. The function generate_company_data() generates
employee records including salary, department, performance score, and
KPI score for each company. Employees with KPI > 90
are flagged as top performers. Results are summarized per company and
visualized through multiple plots.
4.2 Function Definition
# ── generate_company_data function ────────────────────
generate_company_data <- function(n_company, n_employees) {
  departments <- c("HR", "Finance", "Engineering", "Marketing", "Operations")
  data <- data.frame()
  # Nested loop: for each company, for each employee
  for (company_id in 1:n_company) {
    for (emp_num in 1:n_employees) {
      salary <- sample(4000:20000, 1)
      department <- sample(departments, 1)
      performance_score <- round(runif(1, 50, 100), 2)
      KPI_score <- round(runif(1, 50, 100), 2)
      is_top_performer <- KPI_score > 90
      data <- rbind(data, data.frame(
        company_id = paste0("Company ", company_id),
        employee_id = paste0("C", company_id, "_E", sprintf("%03d", emp_num)),
        salary = salary,
        department = department,
        performance_score = performance_score,
        KPI_score = KPI_score,
        is_top_performer = is_top_performer
      ))
    }
  }
  return(data)
}
4.3 Generate & Display Data
set.seed(42)
df_company <- generate_company_data(n_company = 5, n_employees = 30)
knitr::kable(head(df_company[, -7], 10),
col.names = c("Company ID", "Employee ID", "Salary",
"Department", "Performance Score", "KPI Score"),
caption = "Table 5: Company Dataset (First 10 Rows)",
format = "html") %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = TRUE) %>%
row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
row_spec(seq(2, 10, 2), background = "#fcd5d5")
| Company ID | Employee ID | Salary | Department | Performance Score | KPI Score |
|---|---|---|---|---|---|
| Company 1 | C1_E001 | 14800 | Operations | 64.31 | 91.52 |
| Company 1 | C1_E002 | 13289 | Marketing | 86.83 | 56.73 |
| Company 1 | C1_E003 | 14288 | Marketing | 73.11 | 97.00 |
| Company 1 | C1_E004 | 18957 | Marketing | 73.75 | 78.02 |
| Company 1 | C1_E005 | 14094 | Engineering | 99.44 | 97.33 |
| Company 1 | C1_E006 | 9402 | Marketing | 69.51 | 95.29 |
| Company 1 | C1_E007 | 16908 | Operations | 86.88 | 90.55 |
| Company 1 | C1_E008 | 13051 | Engineering | 91.65 | 50.37 |
| Company 1 | C1_E009 | 17609 | Engineering | 71.79 | 51.87 |
| Company 1 | C1_E010 | 18649 | Marketing | 94.39 | 82.00 |
4.4 Summary per Company
summary_company <- do.call(rbind, lapply(unique(df_company$company_id), function(comp) {
subset_df <- df_company[df_company$company_id == comp, ]
data.frame(
Company = comp,
Total_Employees= nrow(subset_df),
Avg_Salary = round(mean(subset_df$salary), 2),
Avg_Performance= round(mean(subset_df$performance_score), 2),
Max_KPI = round(max(subset_df$KPI_score), 2),
Top_Performers = sum(subset_df$is_top_performer)
)
}))
knitr::kable(summary_company,
col.names = c("Company", "Total Employees", "Avg Salary",
"Avg Performance", "Max KPI", "Top Performers"),
caption = "Table 6: Summary Statistics per Company",
format = "html",
row.names = FALSE) %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = TRUE) %>%
row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
row_spec(seq(2, nrow(summary_company), 2), background = "#fcd5d5")
| Company | Total Employees | Avg Salary | Avg Performance | Max KPI | Top Performers |
|---|---|---|---|---|---|
| Company 1 | 30 | 13077.33 | 77.37 | 99.14 | 12 |
| Company 2 | 30 | 12434.40 | 75.90 | 99.83 | 3 |
| Company 3 | 30 | 13440.43 | 71.97 | 99.54 | 7 |
| Company 4 | 30 | 11192.10 | 74.10 | 95.14 | 7 |
| Company 5 | 30 | 11353.83 | 73.26 | 97.75 | 8 |
4.5 Visualization
par(mfrow = c(2, 2),
mar = c(5, 5, 4, 2),
cex.main = 1.3,
cex.lab = 1.2,
cex.axis = 1.1)
colors <- c("#7f1d1d", "#b91c1c", "#ef4444", "#fca5a5", "#fee2e2")
companies <- unique(df_company$company_id)
# ── Plot 1: Avg Salary ─────────────────────────────────
barplot(summary_company$Avg_Salary,
names.arg = paste0("C", 1:5),
col = colors,
border = "white",
main = "Average Salary per Company",
xlab = "Company",
ylab = "Average Salary",
ylim = c(0, max(summary_company$Avg_Salary) * 1.2))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")
# ── Plot 2: Avg Performance ────────────────────────────
barplot(summary_company$Avg_Performance,
names.arg = paste0("C", 1:5),
col = colors,
border = "white",
main = "Average Performance Score per Company",
xlab = "Company",
ylab = "Avg Performance Score",
ylim = c(0, 110))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")
# ── Plot 3: Top Performers ─────────────────────────────
barplot(summary_company$Top_Performers,
names.arg = paste0("C", 1:5),
col = colors,
border = "white",
main = "Top Performers per Company (KPI > 90)",
xlab = "Company",
ylab = "Number of Top Performers",
ylim = c(0, max(summary_company$Top_Performers) * 1.3))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")
# ── Plot 4: Scatter KPI vs Performance ────────────────
plot(df_company$performance_score, df_company$KPI_score,
col = colors[as.numeric(as.factor(df_company$company_id))],
pch = 16, cex = 1.2,
xlab = "Performance Score",
ylab = "KPI Score",
main = "Performance Score vs KPI Score")
abline(h = 90, col = "black", lty = 2, lwd = 2)
legend("topleft",
legend = paste0("Company ", 1:5),
col = colors,
pch = 16, bty = "n", cex = 0.9)
grid(lty = "dashed", col = "gray85")
4.6 Interpretation
The multi-company simulation shows variation in salary, performance, and KPI scores across 5 companies. Average salaries range from 11,192.10 (Company 4) to 13,440.43 (Company 3) even though every salary is drawn from the same range (4,000 to 20,000), so these differences are sampling variation rather than real company effects. The scatter plot shows no strong linear relationship between performance score and KPI score, which is expected because the two scores are generated independently of each other. The number of top performers (KPI > 90) ranges from 3 in Company 2 to 12 in Company 1. Table 6 reports the maximum rather than the average KPI, so it does not by itself show whether companies with more top performers also have higher average KPI.
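Two of the points above, the average KPI per company and the strength of the relationship between the two scores, can be computed directly. The sketch below is a rough illustration, not the report's own code: df_vec is a hypothetical data frame generated column by column rather than row by row, so its random values will not match Table 5 even with the same seed.

```r
set.seed(42)
n_company   <- 5
n_employees <- 30
n_total     <- n_company * n_employees
departments <- c("HR", "Finance", "Engineering", "Marketing", "Operations")

# Each column is generated in a single call instead of one row at a time.
df_vec <- data.frame(
  company_id        = rep(paste0("Company ", 1:n_company), each = n_employees),
  salary            = sample(4000:20000, n_total, replace = TRUE),
  department        = sample(departments, n_total, replace = TRUE),
  performance_score = round(runif(n_total, 50, 100), 2),
  KPI_score         = round(runif(n_total, 50, 100), 2)
)
df_vec$is_top_performer <- df_vec$KPI_score > 90

# Average KPI and number of top performers per company
avg_kpi <- tapply(df_vec$KPI_score, df_vec$company_id, mean)
n_top   <- tapply(df_vec$is_top_performer, df_vec$company_id, sum)

# Pearson correlation between the two independently generated scores
r <- cor(df_vec$performance_score, df_vec$KPI_score)

data.frame(avg_kpi = round(avg_kpi, 2), n_top = n_top)
r
```

Building whole columns at once also avoids calling rbind() inside a loop, which copies the growing data frame on every iteration.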
5 Monte Carlo Simulation - Pi & Probability
5.1 Introduction
This task estimates the value of π (Pi) using the Monte Carlo method by randomly throwing points into the square [-1, 1] × [-1, 1] (side length 2) and checking how many fall inside the inscribed unit circle. Because the circle covers π/4 of the square's area, four times the fraction of points inside the circle is an estimate of π. An additional probability analysis computes the chance of a point landing in a defined sub-square. Results are visualized showing points inside vs outside the circle, and a convergence plot of π estimates.
5.2 Function Definition
# ── monte_carlo_pi function ───────────────────────────
monte_carlo_pi <- function(n_points) {
  # Generate random points (x, y) between -1 and 1
  x <- runif(n_points, -1, 1)
  y <- runif(n_points, -1, 1)
  # Compute the distance from the center (0, 0)
  distances <- sqrt(x^2 + y^2)
  inside_circle <- distances <= 1
  # Estimate Pi
  pi_estimate <- 4 * sum(inside_circle) / n_points
  # Probability that a point falls in the sub-square (0 to 0.5)
  in_subsquare <- sum(x >= 0 & x <= 0.5 & y >= 0 & y <= 0.5)
  prob_subsquare <- in_subsquare / n_points
  return(list(
    pi_estimate = pi_estimate,
    x_inside = x[inside_circle],
    y_inside = y[inside_circle],
    x_outside = x[!inside_circle],
    y_outside = y[!inside_circle],
    prob_subsquare = prob_subsquare
  ))
}
5.3 Monte Carlo Results
set.seed(42)
n_list <- c(100, 1000, 10000, 100000)
# Loop over several sample sizes
results_mc <- do.call(rbind, lapply(n_list, function(n) {
  res <- monte_carlo_pi(n)
  data.frame(
    # scientific = FALSE keeps 100000 from being printed as 1e+05
    N_Points = format(n, big.mark = ",", scientific = FALSE),
    Pi_Estimate = round(res$pi_estimate, 6),
    Error = round(abs(res$pi_estimate - pi), 6),
    Prob_SubSquare = round(res$prob_subsquare, 4)
  )
}))
knitr::kable(results_mc,
             col.names = c("N Points", "Pi Estimate", "Error", "Prob Sub-Square"),
             caption = "Table 7: Monte Carlo Pi Estimation Results",
             format = "html",
             row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(results_mc), 2), background = "#fcd5d5")
| N Points | Pi Estimate | Error | Prob Sub-Square |
|---|---|---|---|
| 100 | 3.1600 | 0.018407 | 0.0700 |
| 1,000 | 3.1400 | 0.001593 | 0.0520 |
| 10,000 | 3.1396 | 0.001993 | 0.0639 |
| 100,000 | 3.1390 | 0.002593 | 0.0606 |
5.4 Visualization
par(mfrow = c(1, 2),
mar = c(5, 5, 4, 2),
cex.main = 1.3,
cex.lab = 1.2,
cex.axis = 1.1)
# ── Plot 1: Points inside vs outside the circle (n = 1,000) ────
set.seed(42)
res1000 <- monte_carlo_pi(1000)
plot(res1000$x_outside, res1000$y_outside,
col = "#fca5a5", pch = 16, cex = 0.8,
xlim = c(-1.1, 1.1), ylim = c(-1.1, 1.1),
xlab = "x", ylab = "y",
main = paste0("Monte Carlo Simulation (n=1,000)\nEstimated π = ",
round(res1000$pi_estimate, 5)),
asp = 1)
points(res1000$x_inside, res1000$y_inside,
col = "#7f1d1d", pch = 16, cex = 0.8)
# Draw the circle
theta <- seq(0, 2*pi, length.out = 300)
lines(cos(theta), sin(theta), col = "black", lwd = 2)
# Draw the sub-square
rect(0, 0, 0.5, 0.5, border = "darkgreen", lwd = 2, lty = 2)
legend("topleft",
legend = c("Inside Circle", "Outside Circle", "Sub-Square"),
col = c("#7f1d1d", "#fca5a5", "darkgreen"),
pch = c(16, 16, NA), lty = c(NA, NA, 2),
bty = "n", cex = 0.95)
grid(lty = "dashed", col = "gray85")
# ── Plot 2: Convergence of the Pi estimate ────────────
set.seed(42)
n_iter <- 500
pi_conv <- c()
inside_count <- 0
for (i in 1:n_iter) {
x_r <- runif(1, -1, 1)
y_r <- runif(1, -1, 1)
if (x_r^2 + y_r^2 <= 1) inside_count <- inside_count + 1
pi_conv <- c(pi_conv, 4 * inside_count / i)
}
plot(1:n_iter, pi_conv,
type = "l", col = "#7f1d1d", lwd = 2,
xlab = "Number of Iterations",
ylab = "Estimated π",
main = "Convergence of π Estimation\n(Monte Carlo)")
abline(h = pi, col = "black", lty = 2, lwd = 2)
legend("topright",
legend = c("Estimated π", paste0("True π = ", round(pi, 5))),
col = c("#7f1d1d", "black"),
lty = c(1, 2), lwd = 2, bty = "n", cex = 0.95)
grid(lty = "dashed", col = "gray85")
5.5 Interpretation
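The Monte Carlo simulation shows that the estimate of π moves toward its true value (3.14159) as the number of random points grows, but not monotonically. In Table 7 the error drops from 0.018 at 100 points to about 0.002 at 1,000 points, then stays between roughly 0.002 and 0.003 for 10,000 and 100,000 points; the 100,000-point run (error 0.002593) is in fact slightly less accurate than the 1,000-point run (error 0.001593). This is normal for a single run: the standard error of the estimate shrinks in proportion to 1/√n, so accuracy improves only on average and individual runs fluctuate around that trend. The convergence plot shows the same behavior: the estimate fluctuates widely at first, then settles near the true value as iterations increase. The estimated probability of a point landing in the sub-square (0 to 0.5) ranges from 0.052 to 0.070 across runs and is closest to the theoretical value (0.5 × 0.5) / (2 × 2) = 0.0625 for the two largest samples.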
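The 1/√n behavior can be made concrete with a short sketch. It assumes that each point is treated as an independent Bernoulli trial with success probability π/4; the names pi_running and se are hypothetical, and the running estimate uses cumsum() instead of the explicit loop from the convergence plot.

```r
set.seed(42)
n <- 10000
x <- runif(n, -1, 1)
y <- runif(n, -1, 1)
inside <- x^2 + y^2 <= 1

# Running estimate of pi after each point, without an explicit loop
pi_running <- 4 * cumsum(inside) / seq_len(n)

# Theoretical standard error of the estimate for a given number of points:
# the estimate is 4 times a sample proportion with p = pi / 4
p  <- pi / 4
se <- function(n_points) 4 * sqrt(p * (1 - p) / n_points)

n_list <- c(100, 1000, 10000, 100000)
data.frame(n = n_list, std_error = round(se(n_list), 4))
```

The standard error is about 0.16 at 100 points and about 0.005 at 100,000 points, so every error in Table 7 lies well within the range that random variation alone would produce.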
6 Advanced Data Transformation & Feature Engineering
6.1 Introduction
This task applies two transformation techniques — Min-Max
Normalization and Z-Score Standardization — to
numerical columns using loop-based functions. New features are also
engineered from existing data: performance_category,
salary_bracket, and KPI_tier. Distributions
before and after transformation are compared using histograms and
boxplots.
6.2 Function Definition
# ── normalize_columns function (Min-Max) ──────────────
normalize_columns <- function(df, columns) {
  df_norm <- df
  for (col in columns) {
    min_val <- min(df_norm[[col]])
    max_val <- max(df_norm[[col]])
    df_norm[[paste0(col, "_normalized")]] <- (df_norm[[col]] - min_val) / (max_val - min_val)
  }
  return(df_norm)
}
# ── z_score function ──────────────────────────────────
z_score <- function(df, columns) {
  df_z <- df
  for (col in columns) {
    mean_val <- mean(df_z[[col]])
    std_val <- sd(df_z[[col]])
    df_z[[paste0(col, "_zscore")]] <- (df_z[[col]] - mean_val) / std_val
  }
  return(df_z)
}
# ── Feature engineering function ──────────────────────
create_features <- function(df) {
  perf_cat <- c()
  sal_bracket <- c()
  kpi_tier <- c()
  for (i in seq_len(nrow(df))) {
    # Performance category
    if (df$performance_score[i] >= 90) perf_cat <- c(perf_cat, "Excellent")
    else if (df$performance_score[i] >= 75) perf_cat <- c(perf_cat, "Very Good")
    else if (df$performance_score[i] >= 60) perf_cat <- c(perf_cat, "Good")
    else perf_cat <- c(perf_cat, "Average")
    # Salary bracket
    if (df$salary[i] >= 16000) sal_bracket <- c(sal_bracket, "Very High")
    else if (df$salary[i] >= 11000) sal_bracket <- c(sal_bracket, "High")
    else if (df$salary[i] >= 7000) sal_bracket <- c(sal_bracket, "Medium")
    else sal_bracket <- c(sal_bracket, "Low")
    # KPI tier
    if (df$KPI_score[i] >= 90) kpi_tier <- c(kpi_tier, "Platinum")
    else if (df$KPI_score[i] >= 75) kpi_tier <- c(kpi_tier, "Gold")
    else if (df$KPI_score[i] >= 60) kpi_tier <- c(kpi_tier, "Silver")
    else kpi_tier <- c(kpi_tier, "Bronze")
  }
  df$performance_category <- perf_cat
  df$salary_bracket <- sal_bracket
  df$KPI_tier <- kpi_tier
  return(df)
}
6.3 Generate Data & Apply Transformation
set.seed(42)
n <- 200
departments <- c("HR", "Finance", "Engineering", "Marketing", "Operations")
df_raw <- data.frame(
employee_id = paste0("E", sprintf("%03d", 1:n)),
salary = sample(4000:20000, n, replace = TRUE),
department = sample(departments, n, replace = TRUE),
performance_score = round(runif(n, 50, 100), 2),
KPI_score = round(runif(n, 50, 100), 2)
)
num_cols <- c("salary", "performance_score", "KPI_score")
df_norm <- normalize_columns(df_raw, num_cols)
df_z <- z_score(df_raw, num_cols)
df_featured <- create_features(df_raw)
# Display the normalization table
knitr::kable(head(df_norm[, c("employee_id", "salary", "salary_normalized",
"performance_score", "performance_score_normalized")], 8),
col.names = c("Employee ID", "Salary", "Salary Normalized",
"Performance", "Performance Normalized"),
caption = "Table 8: Min-Max Normalization Results (First 8 Rows)",
format = "html") %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = TRUE) %>%
row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
row_spec(seq(2, 8, 2), background = "#fcd5d5")
| Employee ID | Salary | Salary Normalized | Performance | Performance Normalized |
|---|---|---|---|---|
| E001 | 14800 | 0.6751596 | 87.96 | 0.7621535 |
| E002 | 16260 | 0.7665582 | 65.26 | 0.3061470 |
| E003 | 6368 | 0.1473019 | 58.28 | 0.1659301 |
| E004 | 9272 | 0.3290973 | 51.64 | 0.0325432 |
| E005 | 13289 | 0.5805684 | 56.83 | 0.1368019 |
| E006 | 5251 | 0.0773757 | 58.86 | 0.1775814 |
| E007 | 19505 | 0.9697008 | 75.98 | 0.5214946 |
| E008 | 12825 | 0.5515212 | 90.56 | 0.8143833 |
6.4 Feature Engineering Results
# Summary of the new features
feat_summary <- data.frame(
Feature = c("performance_category", "salary_bracket", "KPI_tier"),
Categories = c(paste(unique(df_featured$performance_category), collapse = ", "),
paste(unique(df_featured$salary_bracket), collapse = ", "),
paste(unique(df_featured$KPI_tier), collapse = ", "))
)
knitr::kable(feat_summary,
col.names = c("New Feature", "Categories"),
caption = "Table 9: New Engineered Features",
format = "html",
row.names = FALSE) %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = TRUE) %>%
row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
row_spec(seq(2, nrow(feat_summary), 2), background = "#fcd5d5")
| New Feature | Categories |
|---|---|
| performance_category | Very Good, Good, Average, Excellent |
| salary_bracket | High, Very High, Low, Medium |
| KPI_tier | Gold, Platinum, Silver, Bronze |
6.5 Visualization
par(mfrow = c(2, 3),
mar = c(5, 5, 4, 2),
cex.main = 1.2,
cex.lab = 1.1,
cex.axis = 1.0)
# ── Histograms: before vs after normalization ─────────
cols_before <- c("salary", "performance_score", "KPI_score")
cols_after <- c("salary_normalized", "performance_score_normalized", "KPI_score_normalized")
titles_b <- c("Salary (Before)", "Performance Score (Before)", "KPI Score (Before)")
titles_a <- c("Salary Normalized (After)", "Performance Normalized (After)", "KPI Normalized (After)")
for (i in 1:3) {
hist(df_raw[[cols_before[i]]],
col = "#fca5a5",
border = "white",
main = titles_b[i],
xlab = "Value",
ylab = "Frequency")
grid(lty = "dashed", col = "gray85")
}
for (i in 1:3) {
hist(df_norm[[cols_after[i]]],
col = "#7f1d1d",
border = "white",
main = titles_a[i],
xlab = "Normalized Value (0-1)",
ylab = "Frequency")
grid(lty = "dashed", col = "gray85")
}
par(mfrow = c(1, 2),
mar = c(6, 5, 4, 2),
cex.main = 1.2,
cex.lab = 1.1,
cex.axis = 1.0)
# ── Boxplots: original data vs Z-score ────────────────
boxplot(df_raw[, num_cols],
col = "#fca5a5",
border = "#7f1d1d",
main = "Original Data (Boxplot)",
ylab = "Value",
las = 2)
grid(lty = "dashed", col = "gray85")
z_cols <- paste0(num_cols, "_zscore")
boxplot(df_z[, z_cols],
col = "#7f1d1d",
border = "#b91c1c",
main = "After Z-Score Standardization (Boxplot)",
names = c("Salary\nZ-Score", "Performance\nZ-Score", "KPI\nZ-Score"),
ylab = "Z-Score Value",
las = 2)
grid(lty = "dashed", col = "gray85")
6.6 Interpretation
Min-Max normalization rescales all numerical columns to a range of
0 to 1, making them directly comparable regardless of
their original scale. Z-Score standardization transforms the data so
that each column has a mean of 0 and standard deviation of
1, which is useful for algorithms sensitive to data scale. Both are
linear transformations, so the shape of each distribution is
preserved: the histograms show this for Min-Max normalization and the
boxplots for Z-Score standardization, where only the scale changes,
not the underlying pattern. The newly engineered features
(performance_category, salary_bracket, and
KPI_tier) provide categorical labels that
simplify further analysis and reporting.
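The stated properties of the two transformations can be verified numerically. This is a small sketch on a single simulated column; min_max and z_std are hypothetical single-vector helpers, and the zero fallback for a constant column is an assumption that normalize_columns() does not implement.

```r
set.seed(42)
salary <- sample(4000:20000, 200, replace = TRUE)

# Min-Max scaling. For a constant column the range is 0, so zeros are
# returned instead of dividing by zero (assumed fallback).
min_max <- function(v) {
  rng <- max(v) - min(v)
  if (rng == 0) return(rep(0, length(v)))
  (v - min(v)) / rng
}

# Z-score using the sample standard deviation, as sd() computes it
z_std <- function(v) (v - mean(v)) / sd(v)

salary_mm <- min_max(salary)
salary_z  <- z_std(salary)

range(salary_mm)                                # 0 and 1
c(mean = mean(salary_z), sd = sd(salary_z))     # approximately 0 and 1
all.equal(salary_z, as.numeric(scale(salary)))  # scale() gives the same values
```

Base R's scale() performs the same centering and scaling as z_std(), so it can serve as an independent check on a hand-written standardization function.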
7 Mini Project - Company KPI Dashboard & Simulation
7.1 Introduction
This mini project generates a complete employee dataset for 7 companies with 50–200 employees each. The dataset includes employee ID, company ID, salary, department, performance score, and KPI score. Employees are categorized into KPI tiers using a loop-based function. Results are summarized per company and visualized through multiple advanced plots including grouped bar charts, scatter plots with regression lines, and salary distribution boxplots.
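As a rough preview of the tier categorization, the sketch below assigns tiers with cut() and tabulates them per company. It is an illustration under stated assumptions: df_demo is a hypothetical stand-in with 3 companies of 50 employees each, not the project dataset, and the tier boundaries (60, 75, 90) follow the KPI tiers used earlier in this report.

```r
set.seed(42)

# Hypothetical small sample standing in for the dashboard dataset
df_demo <- data.frame(
  company_id = rep(paste0("Company ", 1:3), each = 50),
  KPI_score  = round(runif(150, 50, 100), 2)
)

tier_labels <- c("Bronze", "Silver", "Gold", "Platinum")
df_demo$KPI_tier <- cut(df_demo$KPI_score,
                        breaks = c(-Inf, 60, 75, 90, Inf),
                        labels = tier_labels,
                        right  = FALSE)   # a score of exactly 90 is "Platinum"

# Tier counts per company, and the same as row percentages
tier_counts <- table(df_demo$company_id, df_demo$KPI_tier)
tier_share  <- round(100 * prop.table(tier_counts, margin = 1), 1)

tier_counts
tier_share
```

One detail to keep in mind when reading the summaries: a tier boundary of KPI >= 90 and a top-performer rule of KPI > 90 treat a score of exactly 90.00 differently, so the two counts can differ by a few employees.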
7.2 Function Definition
# ── Dataset generation function ───────────────────────
generate_dashboard_data <- function(n_company, min_emp, max_emp) {
  departments <- c("HR", "Finance", "Engineering", "Marketing", "Operations")
  data <- data.frame()
  # Nested loop: for each company, for each employee
  for (company_id in 1:n_company) {
    n_emp <- sample(min_emp:max_emp, 1)
    for (emp_num in 1:n_emp) {
      salary <- sample(4000:20000, 1)
      department <- sample(departments, 1)
      performance_score <- round(runif(1, 50, 100), 2)
      KPI_score <- round(runif(1, 50, 100), 2)
      data <- rbind(data, data.frame(
        employee_id = paste0("C", company_id, "_E", sprintf("%03d", emp_num)),
        company_id = paste0("Company ", company_id),
        salary = salary,
        department = department,
        performance_score = performance_score,
        KPI_score = KPI_score
      ))
    }
  }
  return(data)
}
# ── KPI tier categorization function ──────────────────
categorize_kpi <- function(df) {
  tiers <- c()
  for (kpi in df$KPI_score) {
    if (kpi >= 90) tiers <- c(tiers, "Platinum")
    else if (kpi >= 75) tiers <- c(tiers, "Gold")
    else if (kpi >= 60) tiers <- c(tiers, "Silver")
    else tiers <- c(tiers, "Bronze")
  }
  return(tiers)
}
# ── Per-company summary function ──────────────────────
summarize_companies <- function(df) {
  companies <- unique(df$company_id)
  summary <- data.frame()
  for (comp in companies) {
    subset_df <- df[df$company_id == comp, ]
    summary <- rbind(summary, data.frame(
      Company = comp,
      Total_Employees = nrow(subset_df),
      Avg_Salary = round(mean(subset_df$salary), 2),
      Avg_KPI = round(mean(subset_df$KPI_score), 2),
      Avg_Performance = round(mean(subset_df$performance_score), 2),
      Top_Performers = sum(subset_df$KPI_score > 90)
    ))
  }
  return(summary)
}
7.3 Generate Data
set.seed(42)
df <- generate_dashboard_data(n_company = 7, min_emp = 50, max_emp = 200)
df$KPI_tier <- categorize_kpi(df)
cat("Total Employees:", nrow(df), "\n")
cat("Total Companies:", length(unique(df$company_id)), "\n")
## Total Employees: 841
## Total Companies: 7
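The loop in categorize_kpi appends one tier at a time. As a minimal sketch, the same thresholds can be expressed in one vectorized call with base R's cut(). The name categorize_kpi_vec and the example scores are assumptions made for illustration and are not part of the original report.

```r
# Vectorized sketch of the tier rule; `categorize_kpi_vec` is a hypothetical name.
categorize_kpi_vec <- function(kpi) {
  # right = FALSE makes each interval closed on the left, e.g. [60, 75),
  # which matches the `>=` comparisons used in the loop version.
  tiers <- cut(kpi,
               breaks = c(-Inf, 60, 75, 90, Inf),
               labels = c("Bronze", "Silver", "Gold", "Platinum"),
               right  = FALSE)
  as.character(tiers)
}

# Illustrative scores chosen to sit on and around every threshold
example_scores <- c(50.37, 59.99, 60, 74.99, 75, 89.99, 90, 97.33)
example_tiers  <- categorize_kpi_vec(example_scores)
print(data.frame(KPI_score = example_scores, KPI_tier = example_tiers))
```

Under these assumptions, df$KPI_tier <- categorize_kpi_vec(df$KPI_score) should yield the same labels as the loop, without growing a vector element by element.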
knitr::kable(head(df, 10),
col.names = c("Employee ID", "Company ID", "Salary",
"Department", "Performance Score", "KPI Score", "KPI Tier"),
caption = "Table 10: Company Dataset Preview (First 10 Rows)",
format = "html") %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = TRUE) %>%
row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
row_spec(seq(2, 10, 2), background = "#fcd5d5")
| Employee ID | Company ID | Salary | Department | Performance Score | KPI Score | KPI Tier |
|---|---|---|---|---|---|---|
| C1_E001 | Company 1 | 16260 | HR | 91.52 | 82.09 | Gold |
| C1_E002 | Company 1 | 5251 | Finance | 56.73 | 82.85 | Gold |
| C1_E003 | Company 1 | 17439 | Marketing | 73.11 | 97.00 | Platinum |
| C1_E004 | Company 1 | 18957 | Marketing | 73.75 | 78.02 | Gold |
| C1_E005 | Company 1 | 14094 | Engineering | 99.44 | 97.33 | Platinum |
| C1_E006 | Company 1 | 9402 | Marketing | 69.51 | 95.29 | Platinum |
| C1_E007 | Company 1 | 16908 | Operations | 86.88 | 90.55 | Platinum |
| C1_E008 | Company 1 | 13051 | Engineering | 91.65 | 50.37 | Bronze |
| C1_E009 | Company 1 | 17609 | Engineering | 71.79 | 51.87 | Bronze |
| C1_E010 | Company 1 | 18649 | Marketing | 94.39 | 82.00 | Gold |
7.4 Summary per Company
summary_df <- summarize_companies(df)
knitr::kable(summary_df,
col.names = c("Company", "Total Employees", "Avg Salary",
"Avg KPI", "Avg Performance", "Top Performers"),
caption = "Table 11: Summary Statistics per Company",
format = "html",
row.names = FALSE) %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = TRUE) %>%
row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
row_spec(seq(2, nrow(summary_df), 2), background = "#fcd5d5")
| Company | Total Employees | Avg Salary | Avg KPI | Avg Performance | Top Performers |
|---|---|---|---|---|---|
| Company 1 | 98 | 12785.39 | 76.29 | 74.94 | 22 |
| Company 2 | 95 | 11197.06 | 76.14 | 74.10 | 25 |
| Company 3 | 76 | 11693.68 | 74.08 | 74.52 | 14 |
| Company 4 | 129 | 11852.91 | 73.62 | 74.82 | 17 |
| Company 5 | 168 | 11934.82 | 75.65 | 75.51 | 41 |
| Company 6 | 142 | 12184.23 | 75.98 | 73.79 | 30 |
| Company 7 | 133 | 11858.23 | 75.76 | 73.67 | 34 |
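The summary above is built by looping over companies and binding one row at a time. A possible alternative, sketched here on a small made-up data frame, is the split-then-apply pattern using base R's split() and vapply(). The data frame toy and its values are illustrative assumptions; only the column names mirror the report.

```r
# Toy data with the same column names as the report's df (values are made up)
toy <- data.frame(
  company_id = c("Company 1", "Company 1", "Company 2", "Company 2", "Company 2"),
  salary     = c(5000, 7000, 4000, 6000, 8000),
  KPI_score  = c(95, 70, 91, 92, 60)
)

# Split into one data frame per company, then compute each statistic per piece
by_company <- split(toy, toy$company_id)

summary_toy <- data.frame(
  Company         = names(by_company),
  Total_Employees = vapply(by_company, nrow, integer(1)),
  Avg_Salary      = vapply(by_company, function(d) round(mean(d$salary), 2), numeric(1)),
  Avg_KPI         = vapply(by_company, function(d) round(mean(d$KPI_score), 2), numeric(1)),
  Top_Performers  = vapply(by_company, function(d) sum(d$KPI_score > 90), integer(1)),
  row.names       = NULL
)
print(summary_toy)
```

This keeps the same statistics as summarize_companies while avoiding repeated rbind() calls. Note that split() orders groups by sorted company name, which may differ from the order of first appearance that unique() gives.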
7.5 Top Performers per Company
top_perf <- df[df$KPI_score > 90, ]
top_table <- do.call(rbind, lapply(unique(df$company_id), function(comp) {
subset_top <- top_perf[top_perf$company_id == comp, ]
if (nrow(subset_top) > 0) {
head(subset_top[order(-subset_top$KPI_score),
c("employee_id", "company_id", "department",
"salary", "performance_score", "KPI_score", "KPI_tier")], 3)
}
}))
knitr::kable(top_table,
col.names = c("Employee ID", "Company", "Department",
"Salary", "Performance", "KPI Score", "KPI Tier"),
caption = "Table 12: Top 3 Performers per Company (KPI > 90)",
format = "html",
row.names = FALSE) %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = TRUE) %>%
row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
row_spec(seq(2, nrow(top_table), 2), background = "#fcd5d5")
| Employee ID | Company | Department | Salary | Performance | KPI Score | KPI Tier |
|---|---|---|---|---|---|---|
| C1_E050 | Company 1 | Finance | 10626 | 50.88 | 99.83 | Platinum |
| C1_E068 | Company 1 | Finance | 7070 | 95.05 | 99.54 | Platinum |
| C1_E013 | Company 1 | HR | 15618 | 83.78 | 99.14 | Platinum |
| C2_E066 | Company 2 | HR | 7489 | 97.23 | 99.61 | Platinum |
| C2_E043 | Company 2 | Finance | 10826 | 60.23 | 97.75 | Platinum |
| C2_E089 | Company 2 | HR | 6419 | 71.73 | 96.81 | Platinum |
| C3_E042 | Company 3 | HR | 12940 | 70.48 | 98.50 | Platinum |
| C3_E040 | Company 3 | Finance | 8226 | 73.71 | 97.46 | Platinum |
| C3_E049 | Company 3 | Finance | 5535 | 67.89 | 96.61 | Platinum |
| C4_E091 | Company 4 | Engineering | 7261 | 59.55 | 99.97 | Platinum |
| C4_E072 | Company 4 | Engineering | 16350 | 55.75 | 99.89 | Platinum |
| C4_E123 | Company 4 | Finance | 6518 | 55.15 | 99.27 | Platinum |
| C5_E028 | Company 5 | Operations | 18944 | 59.55 | 99.99 | Platinum |
| C5_E060 | Company 5 | Marketing | 17872 | 84.50 | 99.77 | Platinum |
| C5_E032 | Company 5 | Marketing | 10836 | 86.00 | 99.17 | Platinum |
| C6_E009 | Company 6 | Engineering | 10282 | 93.18 | 99.96 | Platinum |
| C6_E139 | Company 6 | HR | 9008 | 90.55 | 99.51 | Platinum |
| C6_E078 | Company 6 | Operations | 13529 | 58.37 | 99.24 | Platinum |
| C7_E074 | Company 7 | Engineering | 7871 | 84.06 | 98.52 | Platinum |
| C7_E077 | Company 7 | Marketing | 18225 | 53.03 | 98.52 | Platinum |
| C7_E095 | Company 7 | HR | 7629 | 97.21 | 98.41 | Platinum |
7.6 Visualization
colors <- c("#7f1d1d","#b91c1c","#ef4444","#f97316","#fca5a5","#fcd5d5","#fee2e2")
companies <- unique(df$company_id)
par(mfrow = c(2, 2),
mar = c(6, 5, 4, 2),
cex.main = 1.2,
cex.lab = 1.1,
cex.axis = 0.95)
# ── Plot 1: Avg Salary per Company ────────────────────
barplot(summary_df$Avg_Salary,
names.arg = paste0("C", 1:7),
col = colors,
border = "white",
main = "Average Salary per Company",
xlab = "Company",
ylab = "Average Salary",
ylim = c(0, max(summary_df$Avg_Salary) * 1.2))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")
# ── Plot 2: Avg KPI per Company ───────────────────────
barplot(summary_df$Avg_KPI,
names.arg = paste0("C", 1:7),
col = colors,
border = "white",
main = "Average KPI Score per Company",
xlab = "Company",
ylab = "Average KPI Score",
ylim = c(0, max(summary_df$Avg_KPI) * 1.2))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")
# ── Plot 3: Top Performers per Company ───────────────
barplot(summary_df$Top_Performers,
names.arg = paste0("C", 1:7),
col = colors,
border = "white",
main = "Top Performers per Company (KPI > 90)",
xlab = "Company",
ylab = "Number of Top Performers",
ylim = c(0, max(summary_df$Top_Performers) * 1.3))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")
# ── Plot 4: Scatter Performance vs KPI + Regression ──
plot(df$performance_score, df$KPI_score,
col = colors[as.numeric(as.factor(df$company_id))],
pch = 16, cex = 0.9,
xlab = "Performance Score",
ylab = "KPI Score",
main = "Performance vs KPI Score\n(with Regression Line)")
# Regression line
fit <- lm(KPI_score ~ performance_score, data = df)
abline(fit, col = "black", lwd = 2.5, lty = 2)
legend("topleft",
legend = c(paste0("C", 1:7), "Regression"),
col = c(colors, "black"),
pch = c(rep(16, 7), NA),
lty = c(rep(NA, 7), 2),
lwd = c(rep(NA, 7), 2),
bty = "n", cex = 0.8)
grid(lty = "dashed", col = "gray85")
par(mfrow = c(1, 2),
mar = c(6, 5, 4, 2),
cex.main = 1.2,
cex.lab = 1.1,
cex.axis = 0.95)
# ── Plot 5: Salary Distribution Boxplot ───────────────
salary_list <- lapply(companies, function(comp) {
df[df$company_id == comp, "salary"]
})
boxplot(salary_list,
names = paste0("C", 1:7),
col = colors,
border = "#7f1d1d",
main = "Salary Distribution per Company",
xlab = "Company",
ylab = "Salary",
las = 1)
grid(lty = "dashed", col = "gray85")
# ── Plot 6: KPI Tier Distribution ─────────────────────
tier_order <- c("Platinum", "Gold", "Silver", "Bronze")
tier_colors <- c("#7f1d1d", "#b91c1c", "#ef4444", "#fca5a5")
tier_counts <- sapply(tier_order, function(t) sum(df$KPI_tier == t))
pie(tier_counts,
labels = paste0(tier_order, "\n", round(tier_counts/sum(tier_counts)*100, 1), "%"),
col = tier_colors,
main = "KPI Tier Distribution\n(All Companies)",
cex = 1.0)
par(mar = c(7, 5, 4, 2),
cex.main = 1.2,
cex.lab = 1.1,
cex.axis = 0.85)
# ── Plot 7: Grouped Bar Chart - Avg KPI per Dept ──────
dept_list <- c("HR", "Finance", "Engineering", "Marketing", "Operations")
n_dept <- length(dept_list)
n_comp <- length(companies)
dept_matrix <- matrix(0, nrow = n_dept, ncol = n_comp)
for (i in 1:n_dept) {
for (j in 1:n_comp) {
subset_dj <- df[df$department == dept_list[i] & df$company_id == companies[j], ]
dept_matrix[i, j] <- if (nrow(subset_dj) > 0) round(mean(subset_dj$KPI_score), 2) else 0
}
}
barplot(dept_matrix,
beside = TRUE,
names.arg = paste0("C", 1:7),
col = c("#7f1d1d","#b91c1c","#ef4444","#f97316","#fca5a5"),
border = "white",
main = "Average KPI Score per Department per Company\n(Grouped Bar Chart)",
xlab = "Company",
ylab = "Average KPI Score",
ylim = c(0, 110),
las = 1)
legend("topright",
legend = dept_list,
fill = c("#7f1d1d","#b91c1c","#ef4444","#f97316","#fca5a5"),
bty = "n", cex = 0.9)
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")
7.7 Interpretation
The KPI dashboard compares employee performance across the 7 simulated companies. Average salaries are relatively consistent, ranging from about 11,197 (Company 2) to 12,785 (Company 1), which reflects the uniform salary range used in the simulation. The scatter plot with regression line shows no strong linear relationship between performance score and KPI score; this is expected, because the two scores are drawn independently in generate_dashboard_data. The grouped bar chart shows that average KPI varies across departments within each company, with Finance leading in four of the seven companies (1, 3, 5, and 6). Since department and KPI score are also sampled independently, these gaps reflect sampling variation rather than a systematic department effect. The salary boxplot confirms a wide spread of salaries within each company, covering roughly 4,000 to 20,000. The KPI tier pie chart shows that the majority of employees fall in the Silver and Gold tiers, with a smaller proportion reaching Platinum status.
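Two of the observations above can be checked numerically. The sketch below assumes KPI scores follow a Uniform(50, 100) distribution, as in the runif(1, 50, 100) call, and ignores the rounding to two decimals. It computes the expected share of each tier, then simulates two independent score vectors to show how weak their correlation typically is. The seed and the variable names are assumptions for this example and do not reproduce the report's data.

```r
# Expected tier shares under Uniform(50, 100), using the tier cut-offs 60, 75, 90
cutoffs <- c(50, 60, 75, 90, 100)
expected_share <- diff(punif(cutoffs, min = 50, max = 100))
names(expected_share) <- c("Bronze", "Silver", "Gold", "Platinum")
print(round(expected_share * 100, 1))  # percentages per tier

# Two independently simulated score vectors (841 matches the dataset size)
set.seed(1)  # arbitrary seed for this sketch, not the report's seed
perf <- runif(841, 50, 100)
kpi  <- runif(841, 50, 100)

r        <- cor(perf, kpi)
fit_demo <- lm(kpi ~ perf)
r2       <- summary(fit_demo)$r.squared
cat("Correlation:", round(r, 3), "| R-squared:", round(r2, 3), "\n")
```

Under this assumption Silver and Gold together account for 60% of employees, consistent with the pie chart, and for a simple linear regression R-squared equals the squared correlation, so a near-zero correlation implies a nearly flat regression line.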
8 Automated Report Generation per Company
8.1 Introduction
This bonus task builds an automated report generation system using functions and loops. For each company in the dataset, a structured summary is generated automatically — including key statistics, top performers, department breakdown, and visualizations. The system loops through all companies and compiles results into a unified HTML-ready output, simulating a real-world automated reporting pipeline.
8.2 Function Definition
# ── Function: generate_single_report ──────────────────
generate_single_report <- function(df, company_name) {
subset_df <- df[df$company_id == company_name, ]
# Basic stats
total_emp <- nrow(subset_df)
avg_salary <- round(mean(subset_df$salary), 2)
avg_kpi <- round(mean(subset_df$KPI_score), 2)
avg_perf <- round(mean(subset_df$performance_score), 2)
top_count <- sum(subset_df$KPI_score > 90)
max_kpi <- round(max(subset_df$KPI_score), 2)
min_salary <- min(subset_df$salary)
max_salary <- max(subset_df$salary)
# KPI tier distribution
tier_order <- c("Platinum", "Gold", "Silver", "Bronze")
tier_counts <- sapply(tier_order, function(t) sum(subset_df$KPI_tier == t))
tier_pct <- round(tier_counts / total_emp * 100, 1)
# Department breakdown
dept_summary <- do.call(rbind, lapply(unique(subset_df$department), function(dept) {
dept_df <- subset_df[subset_df$department == dept, ]
data.frame(
Department = dept,
Count = nrow(dept_df),
Avg_Salary = round(mean(dept_df$salary), 2),
Avg_KPI = round(mean(dept_df$KPI_score), 2),
Top_Performers = sum(dept_df$KPI_score > 90)
)
}))
dept_summary <- dept_summary[order(-dept_summary$Avg_KPI), ]
return(list(
company = company_name,
total_emp = total_emp,
avg_salary = avg_salary,
avg_kpi = avg_kpi,
avg_perf = avg_perf,
top_count = top_count,
max_kpi = max_kpi,
min_salary = min_salary,
max_salary = max_salary,
tier_counts = tier_counts,
tier_pct = tier_pct,
dept_summary = dept_summary,
subset_df = subset_df
))
}
# ── Function: print_report_table ──────────────────────
print_report_table <- function(report) {
summary_tbl <- data.frame(
Metric = c("Total Employees", "Average Salary", "Average KPI Score",
"Average Performance", "Top Performers (KPI > 90)",
"Highest KPI Score", "Salary Range"),
Value = c(
report$total_emp,
paste0("$", format(report$avg_salary, big.mark = ",")),
report$avg_kpi,
report$avg_perf,
paste0(report$top_count, " employees"),
report$max_kpi,
paste0("$", format(report$min_salary, big.mark = ","),
" - $", format(report$max_salary, big.mark = ","))
)
)
print(
knitr::kable(summary_tbl,
col.names = c("Metric", "Value"),
format = "html",
row.names = FALSE) %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = TRUE) %>%
row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
row_spec(seq(2, nrow(summary_tbl), 2), background = "#fcd5d5")
)
}
# ── Function: print_dept_table ────────────────────────
print_dept_table <- function(report) {
print(
knitr::kable(report$dept_summary,
col.names = c("Department", "Headcount", "Avg Salary",
"Avg KPI", "Top Performers"),
format = "html",
row.names = FALSE) %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = TRUE) %>%
row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
row_spec(seq(2, nrow(report$dept_summary), 2), background = "#fcd5d5")
)
}
# ── Function: plot_company_report ─────────────────────
plot_company_report <- function(report) {
tier_order <- c("Platinum", "Gold", "Silver", "Bronze")
tier_colors <- c("#7f1d1d", "#b91c1c", "#ef4444", "#fca5a5")
subset_df <- report$subset_df
par(mfrow = c(1, 3),
mar = c(5, 5, 4, 2),
cex.main = 1.2,
cex.lab = 1.1,
cex.axis = 1.0)
# ── Plot 1: KPI Tier Pie Chart ─────────────────────
pie(report$tier_counts,
labels = paste0(tier_order, "\n", report$tier_pct, "%"),
col = tier_colors,
main = paste0(report$company, "\nKPI Tier Distribution"),
cex = 0.95)
# ── Plot 2: Salary Distribution Histogram ──────────
hist(subset_df$salary,
col = "#fca5a5",
border = "white",
main = paste0(report$company, "\nSalary Distribution"),
xlab = "Salary",
ylab = "Frequency")
abline(v = report$avg_salary,
col = "#7f1d1d", lwd = 2, lty = 2)
legend("topright",
legend = paste0("Avg: $", format(report$avg_salary, big.mark = ",")),
col = "#7f1d1d", lty = 2, lwd = 2,
bty = "n", cex = 0.9)
grid(lty = "dashed", col = "gray85")
# ── Plot 3: Scatter Performance vs KPI ─────────────
plot(subset_df$performance_score, subset_df$KPI_score,
col = "#7f1d1d", pch = 16, cex = 1.1,
xlab = "Performance Score",
ylab = "KPI Score",
main = paste0(report$company, "\nPerformance vs KPI"))
fit <- lm(KPI_score ~ performance_score, data = subset_df)
abline(fit, col = "black", lwd = 2, lty = 2)
abline(h = 90, col = "#b91c1c", lwd = 1.5, lty = 3)
legend("topleft",
legend = c("Regression", "KPI = 90"),
col = c("black", "#b91c1c"),
lty = c(2, 3), lwd = 2,
bty = "n", cex = 0.85)
grid(lty = "dashed", col = "gray85")
par(mfrow = c(1, 1))
}
8.3 Generate All Company Reports
The loop below automatically generates a complete structured report for each company in the dataset — including summary statistics, department breakdown table, and three visualizations. This simulates an automated reporting pipeline where output is produced per entity without manual intervention.
companies_list <- unique(df$company_id)
# ── Main loop: generate a report per company ──────────
for (comp in companies_list) {
# Company header
cat(paste0("\n\n### ", comp, "\n\n"))
# Generate report object
report <- generate_single_report(df, comp)
# Summary stats table
cat("**Summary Statistics**\n\n")
print_report_table(report)
# Department breakdown table
cat("\n\n**Department Breakdown**\n\n")
print_dept_table(report)
# Visualizations
cat("\n\n")
plot_company_report(report)
cat("\n\n---\n\n")
}
8.3.1 Company 1
Summary Statistics
| Metric | Value |
|---|---|
| Total Employees | 98 |
| Average Salary | $12,785.39 |
| Average KPI Score | 76.29 |
| Average Performance | 74.94 |
| Top Performers (KPI > 90) | 22 employees |
| Highest KPI Score | 99.83 |
| Salary Range | $4,001 - $19,721 |
Department Breakdown
| Department | Headcount | Avg Salary | Avg KPI | Top Performers |
|---|---|---|---|---|
| Finance | 17 | 10293.00 | 78.95 | 3 |
| Marketing | 19 | 12195.68 | 78.33 | 5 |
| HR | 28 | 13891.96 | 76.32 | 6 |
| Engineering | 14 | 11845.21 | 74.92 | 5 |
| Operations | 20 | 14573.05 | 73.03 | 3 |
8.3.2 Company 2
Summary Statistics
| Metric | Value |
|---|---|
| Total Employees | 95 |
| Average Salary | $11,197.06 |
| Average KPI Score | 76.14 |
| Average Performance | 74.1 |
| Top Performers (KPI > 90) | 25 employees |
| Highest KPI Score | 99.61 |
| Salary Range | $4,109 - $19,928 |
Department Breakdown
| Department | Headcount | Avg Salary | Avg KPI | Top Performers |
|---|---|---|---|---|
| Marketing | 14 | 11035.57 | 79.80 | 4 |
| Finance | 28 | 11174.54 | 77.21 | 8 |
| HR | 18 | 9884.22 | 76.63 | 5 |
| Operations | 18 | 12555.61 | 74.55 | 4 |
| Engineering | 17 | 11318.76 | 72.56 | 4 |
8.3.3 Company 3
Summary Statistics
| Metric | Value |
|---|---|
| Total Employees | 76 |
| Average Salary | $11,693.68 |
| Average KPI Score | 74.08 |
| Average Performance | 74.52 |
| Top Performers (KPI > 90) | 14 employees |
| Highest KPI Score | 98.5 |
| Salary Range | $4,416 - $19,949 |
Department Breakdown
| Department | Headcount | Avg Salary | Avg KPI | Top Performers |
|---|---|---|---|---|
| Finance | 17 | 9687.94 | 79.77 | 6 |
| HR | 18 | 10905.22 | 77.94 | 3 |
| Marketing | 14 | 12814.57 | 72.88 | 3 |
| Engineering | 19 | 13073.21 | 71.91 | 2 |
| Operations | 8 | 12492.00 | 60.57 | 0 |
8.3.4 Company 4
Summary Statistics
| Metric | Value |
|---|---|
| Total Employees | 129 |
| Average Salary | $11,852.91 |
| Average KPI Score | 73.62 |
| Average Performance | 74.82 |
| Top Performers (KPI > 90) | 17 employees |
| Highest KPI Score | 99.97 |
| Salary Range | $4,083 - $19,855 |
Department Breakdown
| Department | Headcount | Avg Salary | Avg KPI | Top Performers |
|---|---|---|---|---|
| Engineering | 32 | 10565.44 | 78.36 | 8 |
| Operations | 24 | 12426.54 | 72.66 | 1 |
| Marketing | 23 | 12826.91 | 72.32 | 5 |
| Finance | 25 | 12323.04 | 72.03 | 3 |
| HR | 25 | 11584.00 | 71.28 | 0 |
8.3.5 Company 5
Summary Statistics
| Metric | Value |
|---|---|
| Total Employees | 168 |
| Average Salary | $11,934.82 |
| Average KPI Score | 75.65 |
| Average Performance | 75.51 |
| Top Performers (KPI > 90) | 41 employees |
| Highest KPI Score | 99.99 |
| Salary Range | $4,091 - $19,916 |
Department Breakdown
| Department | Headcount | Avg Salary | Avg KPI | Top Performers |
|---|---|---|---|---|
| Finance | 31 | 12049.68 | 79.08 | 12 |
| Engineering | 29 | 11635.17 | 76.53 | 7 |
| HR | 37 | 12636.59 | 76.07 | 8 |
| Marketing | 39 | 11867.87 | 75.03 | 10 |
| Operations | 32 | 11365.28 | 71.79 | 4 |
8.3.6 Company 6
Summary Statistics
| Metric | Value |
|---|---|
| Total Employees | 142 |
| Average Salary | $12,184.23 |
| Average KPI Score | 75.98 |
| Average Performance | 73.79 |
| Top Performers (KPI > 90) | 30 employees |
| Highest KPI Score | 99.96 |
| Salary Range | $4,096 - $19,955 |
Department Breakdown
| Department | Headcount | Avg Salary | Avg KPI | Top Performers |
|---|---|---|---|---|
| Finance | 23 | 12689.96 | 78.75 | 5 |
| Marketing | 27 | 11231.67 | 77.04 | 5 |
| Operations | 36 | 12376.19 | 75.72 | 9 |
| Engineering | 26 | 12510.42 | 75.64 | 7 |
| HR | 30 | 12140.73 | 73.48 | 4 |
8.3.7 Company 7
Summary Statistics
| Metric | Value |
|---|---|
| Total Employees | 133 |
| Average Salary | $11,858.23 |
| Average KPI Score | 75.76 |
| Average Performance | 73.67 |
| Top Performers (KPI > 90) | 34 employees |
| Highest KPI Score | 98.52 |
| Salary Range | $4,002 - $19,922 |
Department Breakdown
| Department | Headcount | Avg Salary | Avg KPI | Top Performers |
|---|---|---|---|---|
| HR | 24 | 11913.12 | 82.09 | 9 |
| Engineering | 26 | 12432.00 | 77.60 | 10 |
| Finance | 28 | 11077.61 | 77.49 | 10 |
| Operations | 26 | 10503.73 | 71.76 | 2 |
| Marketing | 29 | 13266.45 | 70.79 | 3 |
8.4 Consolidated Report Summary
# ── Summary table for all companies ───────────────────
all_reports <- lapply(companies_list, function(comp) {
r <- generate_single_report(df, comp)
data.frame(
Company = r$company,
Total_Employees = r$total_emp,
Avg_Salary = r$avg_salary,
Avg_KPI = r$avg_kpi,
Avg_Performance = r$avg_perf,
Top_Performers = r$top_count,
Max_KPI = r$max_kpi
)
})
consolidated <- do.call(rbind, all_reports)
knitr::kable(consolidated,
col.names = c("Company", "Employees", "Avg Salary",
"Avg KPI", "Avg Performance",
"Top Performers", "Max KPI"),
caption = "Table 13: Consolidated Report — All Companies",
format = "html",
row.names = FALSE) %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = TRUE) %>%
row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
row_spec(seq(2, nrow(consolidated), 2), background = "#fcd5d5")
| Company | Employees | Avg Salary | Avg KPI | Avg Performance | Top Performers | Max KPI |
|---|---|---|---|---|---|---|
| Company 1 | 98 | 12785.39 | 76.29 | 74.94 | 22 | 99.83 |
| Company 2 | 95 | 11197.06 | 76.14 | 74.10 | 25 | 99.61 |
| Company 3 | 76 | 11693.68 | 74.08 | 74.52 | 14 | 98.50 |
| Company 4 | 129 | 11852.91 | 73.62 | 74.82 | 17 | 99.97 |
| Company 5 | 168 | 11934.82 | 75.65 | 75.51 | 41 | 99.99 |
| Company 6 | 142 | 12184.23 | 75.98 | 73.79 | 30 | 99.96 |
| Company 7 | 133 | 11858.23 | 75.76 | 73.67 | 34 | 98.52 |
8.5 Export to CSV
# ── Export the summary to CSV ─────────────────────────
write.csv(consolidated,
file = "company_report_summary.csv",
row.names = FALSE)
cat("Report exported successfully: company_report_summary.csv\n")
cat("Rows:", nrow(consolidated), "| Columns:", ncol(consolidated), "\n")
## Report exported successfully: company_report_summary.csv
## Rows: 7 | Columns: 7
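One way to confirm that an export of this kind worked is to read the file back and compare it with the original data frame. The sketch below uses a temporary file and a two-row excerpt of Table 13; the names demo, out_path, and read_back are assumptions introduced for illustration.

```r
# Two rows taken from Table 13, used as a stand-in for `consolidated`
demo <- data.frame(
  Company = c("Company 1", "Company 2"),
  Avg_KPI = c(76.29, 76.14)
)

# Write to a temporary file so the example leaves no file behind in the project
out_path <- tempfile(fileext = ".csv")
write.csv(demo, file = out_path, row.names = FALSE)

# Read the file back and report its shape
read_back <- read.csv(out_path, stringsAsFactors = FALSE)
cat("File exists:", file.exists(out_path), "\n")
cat("Rows:", nrow(read_back), "| Columns:", ncol(read_back), "\n")

# all.equal() tolerates tiny floating-point differences from the text round trip
print(isTRUE(all.equal(read_back, demo)))
```

Setting row.names = FALSE, as in the report's own export, matters here: without it the file gains an extra unnamed index column and the round trip no longer matches.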
8.6 Interpretation
The automated report generation system loops through all 7 companies and produces a structured report for each without manual repetition. Each company report includes a summary statistics table, a department breakdown, and three visualizations: KPI tier distribution, salary histogram, and a performance vs KPI scatter plot with regression line. The consolidated summary table (Table 13) aggregates all companies into a single comparative view, making it easy to identify which companies have the highest average KPI, salary, or top performer count. The CSV export allows the results to be shared or used in downstream reporting tools. This approach demonstrates how functions and loops together create scalable, reusable reporting pipelines, a core pattern in real-world data science workflows.
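The comparison described above can also be automated. As a minimal sketch, the values from Table 13 are re-entered by hand below and which.max() picks the leading company for each metric. The names consolidated_demo, metrics, and leaders are assumptions for this example; in the report itself the consolidated data frame could be used directly.

```r
# Values copied from Table 13 (three of its metric columns)
consolidated_demo <- data.frame(
  Company        = paste("Company", 1:7),
  Avg_Salary     = c(12785.39, 11197.06, 11693.68, 11852.91, 11934.82, 12184.23, 11858.23),
  Avg_KPI        = c(76.29, 76.14, 74.08, 73.62, 75.65, 75.98, 75.76),
  Top_Performers = c(22, 25, 14, 17, 41, 30, 34)
)

# For each metric, find the company holding the maximum value
metrics <- c("Avg_Salary", "Avg_KPI", "Top_Performers")
leaders <- vapply(metrics, function(m) {
  consolidated_demo$Company[which.max(consolidated_demo[[m]])]
}, character(1))
print(leaders)
```

With these values, Company 1 leads on both average salary and average KPI, while Company 5 has the most top performers. Company 5 is also the largest company, so a share of headcount may be a fairer comparison than a raw count.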