Data Science Programming

Practicum Week 4


1 Dynamic Multi-Formula Function

1.1 Introduction

This task builds a function compute_formula(x, formula) that calculates four mathematical formulas — linear, quadratic, cubic, and exponential — for x = 1 to 20. Nested loops are used to compute all formulas at once, with input validation to ensure only valid formula names are accepted. Results are displayed in a table and visualized in a single combined plot.


1.2 Function Definition

# ── compute_formula function ───────────────────────────
compute_formula <- function(x, formula) {

  # Validate the formula argument
  valid_formulas <- c("linear", "quadratic", "cubic", "exponential")
  if (!(formula %in% valid_formulas)) {
    stop(paste("Invalid formula. Choose from:", paste(valid_formulas, collapse = ", ")))
  }

  # Compute y for the selected formula
  if (formula == "linear")      return(2 * x + 1)
  if (formula == "quadratic")   return(x^2 + 3 * x + 2)
  if (formula == "cubic")       return(x^3 - 2*x^2 + x + 5)
  if (formula == "exponential") return(exp(0.3 * x))
}
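
The function body uses only vectorized arithmetic, so it can also be called once per formula with the whole x vector. The sketch below illustrates this and shows the validation error for an unknown name. It restates the function compactly with switch() so that it runs on its own; the names vectorized and error_message and the invalid name "sine" are illustrative assumptions, not part of the original task.

```r
# Compact restatement of compute_formula() so this sketch is self-contained.
compute_formula <- function(x, formula) {
  valid_formulas <- c("linear", "quadratic", "cubic", "exponential")
  if (!(formula %in% valid_formulas)) {
    stop(paste("Invalid formula. Choose from:",
               paste(valid_formulas, collapse = ", ")))
  }
  switch(formula,
         linear      = 2 * x + 1,
         quadratic   = x^2 + 3 * x + 2,
         cubic       = x^3 - 2 * x^2 + x + 5,
         exponential = exp(0.3 * x))
}

x_values <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")

# One vectorized call per formula; sapply() binds the results into a
# 20 x 4 matrix with one named column per formula.
vectorized <- sapply(formulas, function(f) compute_formula(x_values, f))

# An unknown formula name triggers the validation error;
# tryCatch() captures the message instead of halting the script.
error_message <- tryCatch(compute_formula(1, "sine"),
                          error = function(e) conditionMessage(e))
print(head(vectorized, 3))
print(error_message)
```

The nested loop in the next section is kept because the task asks for it; the vectorized call is simply a shorter route to the same numbers.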

1.3 Compute All Formulas Using Nested Loops

x_values <- 1:20
formulas  <- c("linear", "quadratic", "cubic", "exponential")
results   <- list()

# Nested loop: per formula → per x value
for (formula in formulas) {
  y_values <- c()
  for (x in x_values) {
    y        <- compute_formula(x, formula)
    y_values <- c(y_values, y)
  }
  results[[formula]] <- y_values
}

# Build the data frame
df_results <- data.frame(
  x           = x_values,
  Linear      = round(results[["linear"]],      2),
  Quadratic   = round(results[["quadratic"]],   2),
  Cubic       = round(results[["cubic"]],       2),
  Exponential = round(results[["exponential"]], 4)
)

# Display the table with kableExtra (assumes library(kableExtra) is loaded; it also provides %>%)
knitr::kable(df_results,
             col.names = c("x", "Linear", "Quadratic", "Cubic", "Exponential"),
             caption   = "Table 1: Computed Formula Values for x = 1 to 20",
             format    = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(df_results), 2), background = "#fcd5d5")
Table 1: Computed Formula Values for x = 1 to 20
 x  Linear  Quadratic  Cubic  Exponential
 1       3          6      5       1.3499
 2       5         12      7       1.8221
 3       7         20     17       2.4596
 4       9         30     41       3.3201
 5      11         42     85       4.4817
 6      13         56    155       6.0496
 7      15         72    257       8.1662
 8      17         90    397      11.0232
 9      19        110    581      14.8797
10      21        132    815      20.0855
11      23        156   1105      27.1126
12      25        182   1457      36.5982
13      27        210   1877      49.4024
14      29        240   2371      66.6863
15      31        272   2945      90.0171
16      33        306   3605     121.5104
17      35        342   4357     164.0219
18      37        380   5207     221.4064
19      39        420   6161     298.8674
20      41        462   7225     403.4288

1.4 Visualization

y_max  <- max(unlist(results))
y_min  <- min(unlist(results))
colors <- c("linear"      = "steelblue",
            "quadratic"   = "limegreen",
            "cubic"       = "tomato",
            "exponential" = "#7f1d1d")

plot(x_values, results[["linear"]],
     type = "b", col = colors["linear"],
     lwd = 2.5, pch = 16, cex = 1,
     ylim = c(y_min, y_max),
     xlab = "x values (1 to 20)",
     ylab = "y values",
     main = "Comparison of Four Mathematical Formulas (x = 1 to 20)",
     cex.main = 1.3, cex.lab = 1.1)

for (formula in c("quadratic", "cubic", "exponential")) {
  lines(x_values, results[[formula]],
        type = "b", col = colors[formula],
        lwd = 2.5, pch = 16, cex = 1)
}

grid(lty = "dashed", col = "gray85")
legend("topleft",
       legend = c("Linear", "Quadratic", "Cubic", "Exponential"),
       col    = colors,
       lwd = 2.5, pch = 16, bty = "n", cex = 1)


1.5 Interpretation

The plot shows four distinct growth patterns as x increases from 1 to 20. Over this range the cubic formula dominates, reaching 7,225 at x = 20, far above the quadratic (462), the exponential (403.43), and the linear formula (41). The exponential curve exp(0.3x) has the fastest relative growth, multiplying by about 1.35 at each step, but it starts from such a small base that it stays below the quadratic curve across the whole plotted range and would overtake the polynomials only at larger x. Because the y-axis is scaled to the cubic values, the other three curves appear nearly flat in comparison. This highlights how both the mathematical structure of a model and the range over which it is evaluated determine the scale and rate of its output growth.
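
To see where the exponential curve eventually overtakes each polynomial, the same four formulas can be evaluated beyond the plotted range. This is a minimal sketch: the extended range 1 to 60 and the helper first_above() are assumptions made for illustration and are not part of the original task.

```r
# Evaluate the four formulas on an extended integer range (assumed: 1 to 60)
x           <- 1:60
linear      <- 2 * x + 1
quadratic   <- x^2 + 3 * x + 2
cubic       <- x^3 - 2 * x^2 + x + 5
exponential <- exp(0.3 * x)

# Hypothetical helper: first integer x at which the exponential
# exceeds the given polynomial
first_above <- function(other) x[which(exponential > other)[1]]

crossover <- c(linear    = first_above(linear),
               quadratic = first_above(quadratic),
               cubic     = first_above(cubic))
print(crossover)
```

All three crossover points for the quadratic and cubic lie outside 1 to 20, which is why the exponential curve looks modest in the plot.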

2 Nested Simulation - Multi-Sales & Discounts

2.1 Introduction

This task simulates daily sales data for multiple salespersons over several days. A nested function apply_discount() calculates conditional discount rates based on sales amount, while a cumulative_sales() function tracks running totals per salesperson. Results include summary statistics and cumulative sales visualization.


2.2 Function Definition

# ── apply_discount function (nested) ──────────────────
apply_discount <- function(sales_amount) {
  if      (sales_amount >= 900) return(0.20)
  else if (sales_amount >= 700) return(0.15)
  else if (sales_amount >= 500) return(0.10)
  else if (sales_amount >= 300) return(0.05)
  else                          return(0.00)
}

# ── cumulative_sales function (nested) ────────────────
cumulative_sales <- function(sales_list) {
  total  <- 0
  result <- c()
  for (s in sales_list) {
    total  <- total + s
    result <- c(result, total)
  }
  return(result)
}

# ── Main function simulate_sales ──────────────────────
simulate_sales <- function(n_salesperson, days) {
  data <- data.frame()

  # Loop over salespersons
  for (sp_id in 1:n_salesperson) {
    daily_sales <- c()

    # Loop over days
    for (day in 1:days) {
      sales_amount  <- sample(100:1000, 1)
      discount_rate <- apply_discount(sales_amount)
      net_sales     <- sales_amount * (1 - discount_rate)
      daily_sales   <- c(daily_sales, sales_amount)

      data <- rbind(data, data.frame(
        sales_id      = sp_id,
        day           = day,
        sales_amount  = sales_amount,
        discount_rate = discount_rate,
        net_sales     = round(net_sales, 2)
      ))
    }
    # Running totals from cumulative_sales(); the last element is the overall total
    running_totals <- cumulative_sales(daily_sales)
    cat("Salesperson", sp_id, "- Cumulative Total:", tail(running_totals, 1), "\n")
  }
  return(data)
}

2.3 Simulate Sales Data

set.seed(42)
df_sales <- simulate_sales(n_salesperson = 5, days = 10)
## Salesperson 1 - Cumulative Total: 3587 
## Salesperson 2 - Cumulative Total: 6300 
## Salesperson 3 - Cumulative Total: 6411 
## Salesperson 4 - Cumulative Total: 4224 
## Salesperson 5 - Cumulative Total: 5466
knitr::kable(head(df_sales, 15),
             col.names = c("Sales ID", "Day", "Sales Amount", "Discount Rate", "Net Sales"),
             caption   = "Table 2: Sales Simulation Data (First 15 Rows)",
             format    = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, 15, 2), background = "#fcd5d5")
Table 2: Sales Simulation Data (First 15 Rows)
Sales ID  Day  Sales Amount  Discount Rate  Net Sales
       1    1           660           0.10     594.00
       1    2           420           0.05     399.00
       1    3           252           0.00     252.00
       1    4           173           0.00     173.00
       1    5           327           0.05     310.65
       1    6           245           0.00     245.00
       1    7           733           0.15     623.05
       1    8           148           0.00     148.00
       1    9           227           0.00     227.00
       1   10           402           0.05     381.90
       2    1           123           0.00     123.00
       2    2           938           0.20     750.40
       2    3           455           0.05     432.25
       2    4           700           0.15     595.00
       2    5           264           0.00     264.00

2.4 Summary Statistics

summary_sales <- aggregate(
  cbind(sales_amount, discount_rate, net_sales) ~ sales_id,
  data = df_sales,
  FUN  = function(x) round(mean(x), 2)
)
names(summary_sales) <- c("Sales ID", "Avg Sales", "Avg Discount", "Avg Net Sales")

knitr::kable(summary_sales,
             caption = "Table 3: Summary Statistics per Salesperson",
             format  = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(summary_sales), 2), background = "#fcd5d5")
Table 3: Summary Statistics per Salesperson
Sales ID  Avg Sales  Avg Discount  Avg Net Sales
       1      358.7          0.04         335.36
       2      630.0          0.12         537.07
       3      641.1          0.12         547.84
       4      422.4          0.06         390.29
       5      546.6          0.09         486.40

2.5 Visualization

par(mar      = c(5, 5, 4, 2),
    cex.main = 1.6,
    cex.lab  = 1.4,
    cex.axis = 1.2)

colors <- c("steelblue", "limegreen", "tomato", "#7f1d1d", "orange")

# Compute all cumulative sales first to set ylim
all_cum <- c()
for (i in 1:5) {
  sp_data <- df_sales[df_sales$sales_id == i, ]
  all_cum <- c(all_cum, cumsum(sp_data$sales_amount))
}

# Plot the first salesperson
sp1     <- df_sales[df_sales$sales_id == 1, ]
cum_sp1 <- cumsum(sp1$sales_amount)

plot(1:10, cum_sp1,
     type = "b", col = colors[1],
     lwd = 3, pch = 16, cex = 1.3,
     ylim = c(0, max(all_cum) * 1.1),
     xlab = "Day",
     ylab = "Cumulative Sales",
     main = "Cumulative Sales per Salesperson (10 Days)")

# Loop over the remaining salespersons
for (i in 2:5) {
  sp_data <- df_sales[df_sales$sales_id == i, ]
  cum_sp  <- cumsum(sp_data$sales_amount)
  lines(1:10, cum_sp,
        type = "b", col = colors[i],
        lwd = 3, pch = 16, cex = 1.3)
}

grid(lty = "dashed", col = "gray85")
legend("topleft",
       legend = paste("Salesperson", 1:5),
       col    = colors,
       lwd = 3, pch = 16, bty = "n", cex = 1.1)


2.6 Interpretation

The simulation covers 5 salespersons over 10 days. Cumulative sales rise every day for every salesperson because each daily amount is positive (between 100 and 1,000). The discount rate steps up automatically as a sale reaches 300, 500, 700, and 900, from 5% up to 20%, so larger transactions carry a larger gap between gross and net sales: the average discount ranges from 0.04 for Salesperson 1 to 0.12 for Salespersons 2 and 3. Differences in cumulative totals, from 3,587 to 6,411, reflect only the natural variation in randomly generated daily sales.
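
The discount thresholds and running totals can also be expressed without explicit loops. The sketch below is an alternative to the loop-based functions, not a replacement required by the task; the sample sales values and the names thresholds and rates are assumptions chosen to exercise every boundary.

```r
# Illustrative sales amounts that sit on and around each threshold
sales      <- c(299, 300, 499, 500, 700, 900, 1000)

# Same tiers as apply_discount(): >= 300, >= 500, >= 700, >= 900
thresholds <- c(300, 500, 700, 900)
rates      <- c(0.00, 0.05, 0.10, 0.15, 0.20)

# findInterval() counts how many thresholds each amount has reached (0 to 4);
# adding 1 turns that count into an index into rates
discount_rate <- rates[findInterval(sales, thresholds) + 1]
net_sales     <- sales * (1 - discount_rate)

# cumsum() yields the same running totals as the loop in cumulative_sales()
running_total <- cumsum(sales)

print(data.frame(sales, discount_rate, net_sales, running_total))
```

Because findInterval() treats each threshold as the closed lower end of its interval, an amount of exactly 300 already receives the 5% rate, matching the >= comparisons in apply_discount().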

3 Multi-Level Performance Categorization

3.1 Introduction

This task builds a function categorize_performance() that classifies sales amounts into five performance levels: Excellent, Very Good, Good, Average, and Poor. A loop inside the function assigns a category to each value; the count and percentage per category are then computed and visualized with a bar chart and a pie chart.


3.2 Function Definition

# ── Performance categorization function ───────────────
categorize_performance <- function(sales_amount) {
  categories <- c()

  # Loop over each sales value
  for (sales in sales_amount) {
    if      (sales >= 900) categories <- c(categories, "Excellent")
    else if (sales >= 700) categories <- c(categories, "Very Good")
    else if (sales >= 500) categories <- c(categories, "Good")
    else if (sales >= 300) categories <- c(categories, "Average")
    else                   categories <- c(categories, "Poor")
  }
  return(categories)
}

3.3 Categorize Performance Data

set.seed(42)
sales_data   <- sample(100:1000, 200, replace = TRUE)
performance  <- categorize_performance(sales_data)

df_perf <- data.frame(
  sales_amount = sales_data,
  performance  = performance
)

# Compute the count and percentage per category
category_order <- c("Excellent", "Very Good", "Good", "Average", "Poor")
counts         <- sapply(category_order, function(cat) sum(df_perf$performance == cat))
percentages    <- round(counts / length(performance) * 100, 2)

df_summary <- data.frame(
  Category   = category_order,
  Count      = counts,
  Percentage = paste0(percentages, "%")
)

knitr::kable(df_summary,
             caption  = "Table 4: Performance Category Distribution",
             format   = "html",
             row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(df_summary), 2), background = "#fcd5d5")
Table 4: Performance Category Distribution
Category   Count  Percentage
Excellent     21       10.5%
Very Good     49       24.5%
Good          38         19%
Average       49       24.5%
Poor          43       21.5%

3.4 Visualization

par(mfrow    = c(1, 2),
    mar      = c(5, 5, 4, 2),
    cex.main = 1.4,
    cex.lab  = 1.2,
    cex.axis = 1.1)

colors_cat <- c("Excellent" = "#7f1d1d",
                "Very Good" = "#b91c1c",
                "Good"      = "#ef4444",
                "Average"   = "#fca5a5",
                "Poor"      = "#fee2e2")

# ── Bar Chart ──────────────────────────────────────────
# barplot() returns the bar midpoints, which are reused to position the labels
bar_mid <- barplot(counts,
        names.arg = category_order,
        col       = colors_cat,
        border    = "white",
        main      = "Performance Category Distribution\n(Bar Chart)",
        xlab      = "Category",
        ylab      = "Count",
        ylim      = c(0, max(counts) * 1.2))

# Add labels above the bars
text(x      = bar_mid,
     y      = counts + 1.5,
     labels = paste0(counts, "\n(", percentages, "%)"),
     cex    = 1, font = 2)

grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")

# ── Pie Chart ──────────────────────────────────────────
pie(counts,
    labels  = paste0(category_order, "\n", percentages, "%"),
    col     = colors_cat,
    main    = "Performance Category Distribution\n(Pie Chart)",
    cex     = 1.1)

par(mfrow = c(1, 1))

3.5 Interpretation

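The distribution of performance categories summarizes sales quality across 200 data points. Very Good and Average are the largest categories at 24.5% each, followed by Poor (21.5%), Good (19%), and Excellent (10.5%). This pattern follows from the sampling design: sales are drawn uniformly from 100 to 1,000, the four lower categories each span 200 values, and Excellent spans only about half as many (900 to 1,000), so it is expected to be roughly half as frequent. Excellent performers have sales of 900 or more, while Poor performers fall below 300. With real sales data, the same categorization would help identify which segments need improvement and which are performing well, supporting more targeted sales strategies and performance evaluations.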
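
The expected share of each category under uniform sampling can be checked directly. The sketch below uses cut() as a vectorized stand-in for the loop in categorize_performance(); the helper name categorize_cut and the boundary test values are assumptions for illustration.

```r
# Hypothetical vectorized equivalent of categorize_performance().
# right = FALSE makes each interval closed on the left, so 300 is "Average"
# and 900 is "Excellent", matching the >= comparisons in the loop version.
categorize_cut <- function(sales_amount) {
  cut(sales_amount,
      breaks = c(-Inf, 300, 500, 700, 900, Inf),
      labels = c("Poor", "Average", "Good", "Very Good", "Excellent"),
      right  = FALSE)
}

# Boundary values on either side of each threshold
boundary_check <- as.character(categorize_cut(c(299, 300, 499, 500, 699, 700, 899, 900)))
print(boundary_check)

# Expected percentage per category if every integer from 100 to 1000
# is equally likely, as in sample(100:1000, ...)
expected_share <- round(prop.table(table(categorize_cut(100:1000))) * 100, 2)
print(expected_share)
```

Comparing expected_share with Table 4 shows how far the 200 simulated values deviate from the design purely by chance.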

4 Multi-Company Dataset Simulation

4.1 Introduction

This task simulates a multi-company employee dataset using nested loops. The function generate_company_data() generates employee records including salary, department, performance score, and KPI score for each company. Employees with KPI > 90 are flagged as top performers. Results are summarized per company and visualized through multiple plots.


4.2 Function Definition

# ── generate_company_data function ────────────────────
generate_company_data <- function(n_company, n_employees) {
  departments <- c("HR", "Finance", "Engineering", "Marketing", "Operations")
  data        <- data.frame()

  # Nested loop: per company → per employee
  for (company_id in 1:n_company) {
    for (emp_num in 1:n_employees) {
      salary            <- sample(4000:20000, 1)
      department        <- sample(departments, 1)
      performance_score <- round(runif(1, 50, 100), 2)
      KPI_score         <- round(runif(1, 50, 100), 2)
      is_top_performer  <- KPI_score > 90

      data <- rbind(data, data.frame(
        company_id        = paste0("Company ", company_id),
        employee_id       = paste0("C", company_id, "_E", sprintf("%03d", emp_num)),
        salary            = salary,
        department        = department,
        performance_score = performance_score,
        KPI_score         = KPI_score,
        is_top_performer  = is_top_performer
      ))
    }
  }
  return(data)
}

4.3 Generate & Display Data

set.seed(42)
df_company <- generate_company_data(n_company = 5, n_employees = 30)

knitr::kable(head(df_company[, -7], 10),
             col.names = c("Company ID", "Employee ID", "Salary",
                           "Department", "Performance Score", "KPI Score"),
             caption   = "Table 5: Company Dataset (First 10 Rows)",
             format    = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, 10, 2), background = "#fcd5d5")
Table 5: Company Dataset (First 10 Rows)
Company ID  Employee ID  Salary  Department   Performance Score  KPI Score
Company 1   C1_E001       14800  Operations               64.31      91.52
Company 1   C1_E002       13289  Marketing                86.83      56.73
Company 1   C1_E003       14288  Marketing                73.11      97.00
Company 1   C1_E004       18957  Marketing                73.75      78.02
Company 1   C1_E005       14094  Engineering              99.44      97.33
Company 1   C1_E006        9402  Marketing                69.51      95.29
Company 1   C1_E007       16908  Operations               86.88      90.55
Company 1   C1_E008       13051  Engineering              91.65      50.37
Company 1   C1_E009       17609  Engineering              71.79      51.87
Company 1   C1_E010       18649  Marketing                94.39      82.00

4.4 Summary per Company

summary_company <- do.call(rbind, lapply(unique(df_company$company_id), function(comp) {
  subset_df <- df_company[df_company$company_id == comp, ]
  data.frame(
    Company        = comp,
    Total_Employees= nrow(subset_df),
    Avg_Salary     = round(mean(subset_df$salary), 2),
    Avg_Performance= round(mean(subset_df$performance_score), 2),
    Max_KPI        = round(max(subset_df$KPI_score), 2),
    Top_Performers = sum(subset_df$is_top_performer)
  )
}))

knitr::kable(summary_company,
             col.names = c("Company", "Total Employees", "Avg Salary",
                           "Avg Performance", "Max KPI", "Top Performers"),
             caption   = "Table 6: Summary Statistics per Company",
             format    = "html",
             row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(summary_company), 2), background = "#fcd5d5")
Table 6: Summary Statistics per Company
Company    Total Employees  Avg Salary  Avg Performance  Max KPI  Top Performers
Company 1               30    13077.33            77.37    99.14              12
Company 2               30    12434.40            75.90    99.83               3
Company 3               30    13440.43            71.97    99.54               7
Company 4               30    11192.10            74.10    95.14               7
Company 5               30    11353.83            73.26    97.75               8

4.5 Visualization

par(mfrow    = c(2, 2),
    mar      = c(5, 5, 4, 2),
    cex.main = 1.3,
    cex.lab  = 1.2,
    cex.axis = 1.1)

colors   <- c("#7f1d1d", "#b91c1c", "#ef4444", "#fca5a5", "#fee2e2")
companies <- unique(df_company$company_id)

# ── Plot 1: Avg Salary ─────────────────────────────────
barplot(summary_company$Avg_Salary,
        names.arg = paste0("C", 1:5),
        col       = colors,
        border    = "white",
        main      = "Average Salary per Company",
        xlab      = "Company",
        ylab      = "Average Salary",
        ylim      = c(0, max(summary_company$Avg_Salary) * 1.2))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")

# ── Plot 2: Avg Performance ────────────────────────────
barplot(summary_company$Avg_Performance,
        names.arg = paste0("C", 1:5),
        col       = colors,
        border    = "white",
        main      = "Average Performance Score per Company",
        xlab      = "Company",
        ylab      = "Avg Performance Score",
        ylim      = c(0, 110))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")

# ── Plot 3: Top Performers ─────────────────────────────
barplot(summary_company$Top_Performers,
        names.arg = paste0("C", 1:5),
        col       = colors,
        border    = "white",
        main      = "Top Performers per Company (KPI > 90)",
        xlab      = "Company",
        ylab      = "Number of Top Performers",
        ylim      = c(0, max(summary_company$Top_Performers) * 1.3))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")

# ── Plot 4: Scatter KPI vs Performance ────────────────
plot(df_company$performance_score, df_company$KPI_score,
     col  = colors[as.numeric(as.factor(df_company$company_id))],
     pch  = 16, cex = 1.2,
     xlab = "Performance Score",
     ylab = "KPI Score",
     main = "Performance Score vs KPI Score")
abline(h = 90, col = "black", lty = 2, lwd = 2)
legend("topleft",
       legend = paste0("Company ", 1:5),
       col    = colors,
       pch    = 16, bty = "n", cex = 0.9)
grid(lty = "dashed", col = "gray85")

par(mfrow = c(1, 1))

4.6 Interpretation

The multi-company simulation shows variation in salary, performance, and KPI scores across 5 companies. Average salaries range from 11,192.10 (Company 4) to 13,440.43 (Company 3) even though every salary is drawn from the same range, so the differences reflect sampling variation rather than real company effects. The scatter plot shows no linear relationship between performance score and KPI score, which is expected because the two scores are generated independently. The number of top performers (KPI > 90) ranges from 3 in Company 2 to 12 in Company 1; with 30 employees per company and a 20% chance of a KPI above 90, about 6 are expected per company, so these counts again reflect random variation.
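
The expected number of top performers can be worked out from the simulation design. The sketch below treats the count per company as a binomial variable, which assumes independent KPI draws from a continuous uniform distribution on 50 to 100 and ignores the rounding to two decimals; all variable names are illustrative.

```r
n_employees <- 30

# P(KPI > 90) for a uniform draw on [50, 100]
p_top <- (100 - 90) / (100 - 50)

# Mean and standard deviation of the number of top performers per company
expected_top <- n_employees * p_top
sd_top       <- sqrt(n_employees * p_top * (1 - p_top))

# Tail probabilities for the two most extreme companies in Table 6
p_at_least_12 <- 1 - pbinom(11, size = n_employees, prob = p_top)
p_at_most_3   <- pbinom(3, size = n_employees, prob = p_top)

print(c(expected = expected_top, sd = sd_top,
        p_at_least_12 = p_at_least_12, p_at_most_3 = p_at_most_3))
```

The tail probabilities indicate how unusual counts such as 12 or 3 are for a single company under this design.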

5 Monte Carlo Simulation - Pi & Probability

5.1 Introduction

This task estimates the value of π (Pi) using the Monte Carlo method by randomly throwing points into the square [-1, 1] × [-1, 1] and checking how many fall inside the inscribed unit circle. Since the circle covers π/4 of the square, four times the fraction of points inside estimates π. An additional probability analysis computes the chance of a point landing in a defined sub-square. The results are visualized as a plot of points inside versus outside the circle and a convergence plot of the π estimates.


5.2 Function Definition

# ── monte_carlo_pi function ───────────────────────────
monte_carlo_pi <- function(n_points) {

  # Generate random points (x, y) between -1 and 1
  x <- runif(n_points, -1, 1)
  y <- runif(n_points, -1, 1)

  # Compute the distance from the center (0, 0)
  distances     <- sqrt(x^2 + y^2)
  inside_circle <- distances <= 1

  # Estimate Pi
  pi_estimate <- 4 * sum(inside_circle) / n_points

  # Probability of a point landing in the sub-square (0 to 0.5 on both axes)
  in_subsquare   <- sum(x >= 0 & x <= 0.5 & y >= 0 & y <= 0.5)
  prob_subsquare <- in_subsquare / n_points

  return(list(
    pi_estimate    = pi_estimate,
    x_inside       = x[inside_circle],
    y_inside       = y[inside_circle],
    x_outside      = x[!inside_circle],
    y_outside      = y[!inside_circle],
    prob_subsquare = prob_subsquare
  ))
}

5.3 Monte Carlo Results

set.seed(42)
n_list <- c(100, 1000, 10000, 100000)

# Loop over the different numbers of points
results_mc <- do.call(rbind, lapply(n_list, function(n) {
  res <- monte_carlo_pi(n)
  data.frame(
    # scientific = FALSE keeps 100000 from being printed as 1e+05
    N_Points       = format(n, big.mark = ",", scientific = FALSE),
    Pi_Estimate    = round(res$pi_estimate, 6),
    Error          = round(abs(res$pi_estimate - pi), 6),
    Prob_SubSquare = round(res$prob_subsquare, 4)
  )
}))

knitr::kable(results_mc,
             col.names = c("N Points", "Pi Estimate", "Error", "Prob Sub-Square"),
             caption   = "Table 7: Monte Carlo Pi Estimation Results",
             format    = "html",
             row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(results_mc), 2), background = "#fcd5d5")
Table 7: Monte Carlo Pi Estimation Results
N Points  Pi Estimate     Error  Prob Sub-Square
     100       3.1600  0.018407           0.0700
   1,000       3.1400  0.001593           0.0520
  10,000       3.1396  0.001993           0.0639
 100,000       3.1390  0.002593           0.0606

5.4 Visualization

par(mfrow    = c(1, 2),
    mar      = c(5, 5, 4, 2),
    cex.main = 1.3,
    cex.lab  = 1.2,
    cex.axis = 1.1)

# ── Plot 1: Points inside vs outside the circle (n=1000) ─
set.seed(42)
res1000 <- monte_carlo_pi(1000)

plot(res1000$x_outside, res1000$y_outside,
     col = "#fca5a5", pch = 16, cex = 0.8,
     xlim = c(-1.1, 1.1), ylim = c(-1.1, 1.1),
     xlab = "x", ylab = "y",
     main = paste0("Monte Carlo Simulation (n=1,000)\nEstimated π = ",
                   round(res1000$pi_estimate, 5)),
     asp  = 1)
points(res1000$x_inside, res1000$y_inside,
       col = "#7f1d1d", pch = 16, cex = 0.8)

# Draw the circle
theta <- seq(0, 2*pi, length.out = 300)
lines(cos(theta), sin(theta), col = "black", lwd = 2)

# Draw the sub-square
rect(0, 0, 0.5, 0.5, border = "darkgreen", lwd = 2, lty = 2)

legend("topleft",
       legend = c("Inside Circle", "Outside Circle", "Sub-Square"),
       col    = c("#7f1d1d", "#fca5a5", "darkgreen"),
       pch    = c(16, 16, NA), lty = c(NA, NA, 2),
       bty    = "n", cex = 0.95)
grid(lty = "dashed", col = "gray85")

# ── Plot 2: Convergence of the Pi estimate ────────────
set.seed(42)
n_iter       <- 500
pi_conv      <- c()
inside_count <- 0

for (i in 1:n_iter) {
  x_r <- runif(1, -1, 1)
  y_r <- runif(1, -1, 1)
  if (x_r^2 + y_r^2 <= 1) inside_count <- inside_count + 1
  pi_conv <- c(pi_conv, 4 * inside_count / i)
}

plot(1:n_iter, pi_conv,
     type = "l", col = "#7f1d1d", lwd = 2,
     xlab = "Number of Iterations",
     ylab = "Estimated π",
     main = "Convergence of π Estimation\n(Monte Carlo)")
abline(h = pi, col = "black", lty = 2, lwd = 2)
legend("topright",
       legend = c("Estimated π", paste0("True π = ", round(pi, 5))),
       col    = c("#7f1d1d", "black"),
       lty    = c(1, 2), lwd = 2, bty = "n", cex = 0.95)
grid(lty = "dashed", col = "gray85")

par(mfrow = c(1, 1))

5.5 Interpretation

The Monte Carlo simulation shows that the estimate of π moves toward its true value (3.14159) as more random points are used, although not monotonically. With 100 points the error is 0.018; from 1,000 points onward it stays between about 0.0016 and 0.0026, and in this run the 1,000-point estimate happens to be closer than the 100,000-point one. This is normal for a random estimate: its standard error shrinks only in proportion to 1/√n, so single runs can fluctuate around the trend. The convergence plot shows the same behavior: the estimated π fluctuates widely at first, then settles near the true value as iterations increase. The estimated probability of a point landing in the sub-square (0 to 0.5 on both axes) approaches the theoretical value of (0.5 × 0.5) / (2 × 2) = 0.0625, with 0.0606 at 100,000 points.
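
The typical size of the error at each sample size can be computed from the binomial variance of the hit fraction. This is a minimal sketch under the assumption of independent uniform points; the names p_inside and se_pi are illustrative and do not appear in the original code.

```r
n_list <- c(100, 1000, 10000, 100000)

# True probability that a point in the square [-1, 1] x [-1, 1]
# lands inside the unit circle
p_inside <- pi / 4

# The estimate is 4 * (hits / n), so its standard error is
# 4 * sqrt(p * (1 - p) / n)
se_pi <- 4 * sqrt(p_inside * (1 - p_inside) / n_list)

print(data.frame(n_points = n_list, std_error = signif(se_pi, 3)))
```

Each hundredfold increase in points reduces the standard error only tenfold, which is why the errors in Table 7 for 1,000 to 100,000 points are all of similar magnitude.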

6 Advanced Data Transformation & Feature Engineering

6.1 Introduction

This task applies two transformation techniques — Min-Max Normalization and Z-Score Standardization — to numerical columns using loop-based functions. New features are also engineered from existing data: performance_category, salary_bracket, and KPI_tier. Distributions before and after transformation are compared using histograms and boxplots.


6.2 Function Definition

# ── normalize_columns function (Min-Max) ──────────────
normalize_columns <- function(df, columns) {
  df_norm <- df
  for (col in columns) {
    min_val <- min(df_norm[[col]])
    max_val <- max(df_norm[[col]])
    df_norm[[paste0(col, "_normalized")]] <- (df_norm[[col]] - min_val) / (max_val - min_val)
  }
  return(df_norm)
}

# ── z_score function ──────────────────────────────────
z_score <- function(df, columns) {
  df_z <- df
  for (col in columns) {
    mean_val <- mean(df_z[[col]])
    std_val  <- sd(df_z[[col]])
    df_z[[paste0(col, "_zscore")]] <- (df_z[[col]] - mean_val) / std_val
  }
  return(df_z)
}

# ── Feature engineering function ──────────────────────
create_features <- function(df) {
  perf_cat    <- c()
  sal_bracket <- c()
  kpi_tier    <- c()

  for (i in seq_len(nrow(df))) {
    # Performance category
    if      (df$performance_score[i] >= 90) perf_cat <- c(perf_cat, "Excellent")
    else if (df$performance_score[i] >= 75) perf_cat <- c(perf_cat, "Very Good")
    else if (df$performance_score[i] >= 60) perf_cat <- c(perf_cat, "Good")
    else                                    perf_cat <- c(perf_cat, "Average")

    # Salary bracket
    if      (df$salary[i] >= 16000) sal_bracket <- c(sal_bracket, "Very High")
    else if (df$salary[i] >= 11000) sal_bracket <- c(sal_bracket, "High")
    else if (df$salary[i] >= 7000)  sal_bracket <- c(sal_bracket, "Medium")
    else                            sal_bracket <- c(sal_bracket, "Low")

    # KPI tier
    if      (df$KPI_score[i] >= 90) kpi_tier <- c(kpi_tier, "Platinum")
    else if (df$KPI_score[i] >= 75) kpi_tier <- c(kpi_tier, "Gold")
    else if (df$KPI_score[i] >= 60) kpi_tier <- c(kpi_tier, "Silver")
    else                            kpi_tier <- c(kpi_tier, "Bronze")
  }

  df$performance_category <- perf_cat
  df$salary_bracket       <- sal_bracket
  df$KPI_tier             <- kpi_tier
  return(df)
}

6.3 Generate Data & Apply Transformation

set.seed(42)
n           <- 200
departments <- c("HR", "Finance", "Engineering", "Marketing", "Operations")

df_raw <- data.frame(
  employee_id       = paste0("E", sprintf("%03d", 1:n)),
  salary            = sample(4000:20000, n, replace = TRUE),
  department        = sample(departments, n, replace = TRUE),
  performance_score = round(runif(n, 50, 100), 2),
  KPI_score         = round(runif(n, 50, 100), 2)
)

num_cols    <- c("salary", "performance_score", "KPI_score")
df_norm     <- normalize_columns(df_raw, num_cols)
df_z        <- z_score(df_raw, num_cols)
df_featured <- create_features(df_raw)

# Display the normalization table
knitr::kable(head(df_norm[, c("employee_id", "salary", "salary_normalized",
                               "performance_score", "performance_score_normalized")], 8),
             col.names = c("Employee ID", "Salary", "Salary Normalized",
                           "Performance", "Performance Normalized"),
             caption   = "Table 8: Min-Max Normalization Results (First 8 Rows)",
             format    = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, 8, 2), background = "#fcd5d5")
Table 8: Min-Max Normalization Results (First 8 Rows)
Employee ID  Salary  Salary Normalized  Performance  Performance Normalized
E001          14800          0.6751596        87.96               0.7621535
E002          16260          0.7665582        65.26               0.3061470
E003           6368          0.1473019        58.28               0.1659301
E004           9272          0.3290973        51.64               0.0325432
E005          13289          0.5805684        56.83               0.1368019
E006           5251          0.0773757        58.86               0.1775814
E007          19505          0.9697008        75.98               0.5214946
E008          12825          0.5515212        90.56               0.8143833

6.4 Feature Engineering Results

# Summary of the new features
feat_summary <- data.frame(
  Feature    = c("performance_category", "salary_bracket", "KPI_tier"),
  Categories = c(paste(unique(df_featured$performance_category), collapse = ", "),
                 paste(unique(df_featured$salary_bracket),       collapse = ", "),
                 paste(unique(df_featured$KPI_tier),             collapse = ", "))
)

knitr::kable(feat_summary,
             col.names = c("New Feature", "Categories"),
             caption   = "Table 9: New Engineered Features",
             format    = "html",
             row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(feat_summary), 2), background = "#fcd5d5")
Table 9: New Engineered Features
New Feature           Categories
performance_category  Very Good, Good, Average, Excellent
salary_bracket        High, Very High, Low, Medium
KPI_tier              Gold, Platinum, Silver, Bronze

6.5 Visualization

par(mfrow    = c(2, 3),
    mar      = c(5, 5, 4, 2),
    cex.main = 1.2,
    cex.lab  = 1.1,
    cex.axis = 1.0)

# ── Histogram: Before vs After Normalization ──────────
cols_before <- c("salary", "performance_score", "KPI_score")
cols_after  <- c("salary_normalized", "performance_score_normalized", "KPI_score_normalized")
titles_b    <- c("Salary (Before)", "Performance Score (Before)", "KPI Score (Before)")
titles_a    <- c("Salary Normalized (After)", "Performance Normalized (After)", "KPI Normalized (After)")

for (i in 1:3) {
  hist(df_raw[[cols_before[i]]],
       col    = "#fca5a5",
       border = "white",
       main   = titles_b[i],
       xlab   = "Value",
       ylab   = "Frequency")
  grid(lty = "dashed", col = "gray85")
}

for (i in 1:3) {
  hist(df_norm[[cols_after[i]]],
       col    = "#7f1d1d",
       border = "white",
       main   = titles_a[i],
       xlab   = "Normalized Value (0-1)",
       ylab   = "Frequency")
  grid(lty = "dashed", col = "gray85")
}

par(mfrow = c(1, 1))
par(mfrow    = c(1, 2),
    mar      = c(6, 5, 4, 2),
    cex.main = 1.2,
    cex.lab  = 1.1,
    cex.axis = 1.0)

# ── Boxplot: Original Data vs Z-Score ─────────────────
boxplot(df_raw[, num_cols],
        col    = "#fca5a5",
        border = "#7f1d1d",
        main   = "Original Data (Boxplot)",
        ylab   = "Value",
        las    = 2)
grid(lty = "dashed", col = "gray85")

z_cols <- paste0(num_cols, "_zscore")
boxplot(df_z[, z_cols],
        col    = "#7f1d1d",
        border = "#b91c1c",
        main   = "After Z-Score Standardization (Boxplot)",
        names  = c("Salary\nZ-Score", "Performance\nZ-Score", "KPI\nZ-Score"),
        ylab   = "Z-Score Value",
        las    = 2)
grid(lty = "dashed", col = "gray85")

par(mfrow = c(1, 1))

6.6 Interpretation

Min-Max normalization rescales all numerical columns to a range of 0 to 1, making them directly comparable regardless of their original scale. Z-Score standardization transforms the data so that each column has a mean of 0 and standard deviation of 1, which is useful for algorithms sensitive to data scale. Both are linear transformations, so they preserve the shape of each distribution: the histograms show this for Min-Max normalization, where only the axis scale changes, and the boxplots show the three z-scored columns on a common scale centered at 0. The newly engineered features (performance_category, salary_bracket, and KPI_tier) provide categorical labels that simplify further analysis and reporting.
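
The properties described above can be verified on a small vector. The sketch below uses made-up salary values, and the helper safe_min_max() is a hypothetical addition showing one way to handle a constant column, where the Min-Max formula would otherwise divide by zero; returning zeros in that case is an assumption, not the document's method.

```r
# Illustrative values only
salary <- c(4000, 8000, 12000, 20000)

# Min-Max normalization: maps the minimum to 0 and the maximum to 1
min_max <- (salary - min(salary)) / (max(salary) - min(salary))

# Z-score standardization: mean 0, standard deviation 1
z <- (salary - mean(salary)) / sd(salary)

# Both are linear rescalings, so the correlation with the raw values is 1
linear_check <- c(cor(salary, min_max), cor(salary, z))

# Hypothetical guard for a constant column (max equals min)
safe_min_max <- function(v) {
  value_range <- max(v) - min(v)
  if (value_range == 0) return(rep(0, length(v)))
  (v - min(v)) / value_range
}

print(min_max)
print(round(z, 4))
print(safe_min_max(c(5, 5, 5)))
```

The same guard could be placed inside the loop of normalize_columns() if constant columns are possible in the input.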

7 Mini Project - Company KPI Dashboard & Simulation

7.1 Introduction

This mini project generates a complete employee dataset for 7 companies with 50–200 employees each. The dataset includes employee ID, company ID, salary, department, performance score, and KPI score. Employees are categorized into KPI tiers using a loop-based function. Results are summarized per company and visualized through multiple advanced plots including grouped bar charts, scatter plots with regression lines, and salary distribution boxplots.


7.2 Function Definition

# ── Dataset generation function ───────────────────────
generate_dashboard_data <- function(n_company, min_emp, max_emp) {
  departments <- c("HR", "Finance", "Engineering", "Marketing", "Operations")
  data        <- data.frame()

  # Nested loop: per company → per employee
  for (company_id in 1:n_company) {
    n_emp <- sample(min_emp:max_emp, 1)
    for (emp_num in 1:n_emp) {
      salary            <- sample(4000:20000, 1)
      department        <- sample(departments, 1)
      performance_score <- round(runif(1, 50, 100), 2)
      KPI_score         <- round(runif(1, 50, 100), 2)

      data <- rbind(data, data.frame(
        employee_id       = paste0("C", company_id, "_E", sprintf("%03d", emp_num)),
        company_id        = paste0("Company ", company_id),
        salary            = salary,
        department        = department,
        performance_score = performance_score,
        KPI_score         = KPI_score
      ))
    }
  }
  return(data)
}

# ── KPI tier categorization function ──────────────────
categorize_kpi <- function(df) {
  tiers <- c()
  for (kpi in df$KPI_score) {
    if      (kpi >= 90) tiers <- c(tiers, "Platinum")
    else if (kpi >= 75) tiers <- c(tiers, "Gold")
    else if (kpi >= 60) tiers <- c(tiers, "Silver")
    else                tiers <- c(tiers, "Bronze")
  }
  return(tiers)
}

# ── Per-company summary function ──────────────────────
summarize_companies <- function(df) {
  companies <- unique(df$company_id)
  summary   <- data.frame()

  for (comp in companies) {
    subset_df <- df[df$company_id == comp, ]
    summary   <- rbind(summary, data.frame(
      Company         = comp,
      Total_Employees = nrow(subset_df),
      Avg_Salary      = round(mean(subset_df$salary), 2),
      Avg_KPI         = round(mean(subset_df$KPI_score), 2),
      Avg_Performance = round(mean(subset_df$performance_score), 2),
      Top_Performers  = sum(subset_df$KPI_score > 90)
    ))
  }
  return(summary)
}
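
The loop in `categorize_kpi` can also be written without a loop. The sketch below is one possible vectorized counterpart using base R's `cut()`; the name `categorize_kpi_vec` and the test scores are hypothetical, and the breaks are chosen to reproduce the same `>=` thresholds (60, 75, 90).

```r
# Hypothetical vectorized counterpart to categorize_kpi()
categorize_kpi_vec <- function(kpi) {
  as.character(cut(kpi,
                   breaks = c(-Inf, 60, 75, 90, Inf),
                   labels = c("Bronze", "Silver", "Gold", "Platinum"),
                   right  = FALSE))  # intervals are [lower, upper), i.e. ">= lower"
}

# Illustrative scores placed on and around each threshold
kpi <- c(50.37, 60, 74.99, 75, 89.99, 90, 97.33)
categorize_kpi_vec(kpi)
# "Bronze" "Silver" "Silver" "Gold" "Gold" "Platinum" "Platinum"
```

The Platinum tier uses `>= 90`, whereas the Top_Performers count in `summarize_companies` uses `> 90`, so a score of exactly 90.00 would be Platinum without being counted as a top performer.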

7.3 Generate Data

set.seed(42)
df <- generate_dashboard_data(n_company = 7, min_emp = 50, max_emp = 200)
df$KPI_tier <- categorize_kpi(df)

cat("Total Employees:", nrow(df), "\n")
## Total Employees: 841
cat("Total Companies:", length(unique(df$company_id)), "\n")
## Total Companies: 7
knitr::kable(head(df, 10),
             col.names = c("Employee ID", "Company ID", "Salary",
                           "Department", "Performance Score", "KPI Score", "KPI Tier"),
             caption   = "Table 10: Company Dataset Preview (First 10 Rows)",
             format    = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, 10, 2), background = "#fcd5d5")
Table 10: Company Dataset Preview (First 10 Rows)

| Employee ID | Company ID | Salary | Department | Performance Score | KPI Score | KPI Tier |
|---|---|---|---|---|---|---|
| C1_E001 | Company 1 | 16260 | HR | 91.52 | 82.09 | Gold |
| C1_E002 | Company 1 | 5251 | Finance | 56.73 | 82.85 | Gold |
| C1_E003 | Company 1 | 17439 | Marketing | 73.11 | 97.00 | Platinum |
| C1_E004 | Company 1 | 18957 | Marketing | 73.75 | 78.02 | Gold |
| C1_E005 | Company 1 | 14094 | Engineering | 99.44 | 97.33 | Platinum |
| C1_E006 | Company 1 | 9402 | Marketing | 69.51 | 95.29 | Platinum |
| C1_E007 | Company 1 | 16908 | Operations | 86.88 | 90.55 | Platinum |
| C1_E008 | Company 1 | 13051 | Engineering | 91.65 | 50.37 | Bronze |
| C1_E009 | Company 1 | 17609 | Engineering | 71.79 | 51.87 | Bronze |
| C1_E010 | Company 1 | 18649 | Marketing | 94.39 | 82.00 | Gold |

7.4 Summary per Company

summary_df <- summarize_companies(df)

knitr::kable(summary_df,
             col.names = c("Company", "Total Employees", "Avg Salary",
                           "Avg KPI", "Avg Performance", "Top Performers"),
             caption   = "Table 11: Summary Statistics per Company",
             format    = "html",
             row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(summary_df), 2), background = "#fcd5d5")
Table 11: Summary Statistics per Company

| Company | Total Employees | Avg Salary | Avg KPI | Avg Performance | Top Performers |
|---|---|---|---|---|---|
| Company 1 | 98 | 12785.39 | 76.29 | 74.94 | 22 |
| Company 2 | 95 | 11197.06 | 76.14 | 74.10 | 25 |
| Company 3 | 76 | 11693.68 | 74.08 | 74.52 | 14 |
| Company 4 | 129 | 11852.91 | 73.62 | 74.82 | 17 |
| Company 5 | 168 | 11934.82 | 75.65 | 75.51 | 41 |
| Company 6 | 142 | 12184.23 | 75.98 | 73.79 | 30 |
| Company 7 | 133 | 11858.23 | 75.76 | 73.67 | 34 |
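
The per-company figures produced by `summarize_companies` can be cross-checked without an explicit loop. The following is a sketch using base R's `aggregate()` and `merge()` on a small hypothetical data frame (`toy`, with made-up values), not the report's dataset.

```r
# Hypothetical toy data with the same column names as the report's df
toy <- data.frame(
  company_id = c("Company 1", "Company 1", "Company 2", "Company 2", "Company 2"),
  salary     = c(5000, 7000, 6000, 8000, 10000),
  KPI_score  = c(92, 70, 91, 95, 60)
)

# Mean salary and mean KPI per company
avg_tbl <- aggregate(cbind(salary, KPI_score) ~ company_id, data = toy, FUN = mean)

# Number of employees with KPI above 90 per company
top_tbl <- aggregate(KPI_score ~ company_id, data = toy,
                     FUN = function(k) sum(k > 90))
names(top_tbl)[2] <- "Top_Performers"

# Combine both summaries into one table keyed by company
check <- merge(avg_tbl, top_tbl, by = "company_id")
check
```

Applied to the real `df`, the same calls should reproduce the Avg_Salary, Avg_KPI, and Top_Performers columns after rounding.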

7.5 Top Performers per Company

top_perf <- df[df$KPI_score > 90, ]

top_table <- do.call(rbind, lapply(unique(df$company_id), function(comp) {
  subset_top <- top_perf[top_perf$company_id == comp, ]
  if (nrow(subset_top) > 0) {
    head(subset_top[order(-subset_top$KPI_score),
                    c("employee_id", "company_id", "department",
                      "salary", "performance_score", "KPI_score", "KPI_tier")], 3)
  }
}))

knitr::kable(top_table,
             col.names = c("Employee ID", "Company", "Department",
                           "Salary", "Performance", "KPI Score", "KPI Tier"),
             caption   = "Table 12: Top 3 Performers per Company (KPI > 90)",
             format    = "html",
             row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(top_table), 2), background = "#fcd5d5")
Table 12: Top 3 Performers per Company (KPI > 90)

| Employee ID | Company | Department | Salary | Performance | KPI Score | KPI Tier |
|---|---|---|---|---|---|---|
| C1_E050 | Company 1 | Finance | 10626 | 50.88 | 99.83 | Platinum |
| C1_E068 | Company 1 | Finance | 7070 | 95.05 | 99.54 | Platinum |
| C1_E013 | Company 1 | HR | 15618 | 83.78 | 99.14 | Platinum |
| C2_E066 | Company 2 | HR | 7489 | 97.23 | 99.61 | Platinum |
| C2_E043 | Company 2 | Finance | 10826 | 60.23 | 97.75 | Platinum |
| C2_E089 | Company 2 | HR | 6419 | 71.73 | 96.81 | Platinum |
| C3_E042 | Company 3 | HR | 12940 | 70.48 | 98.50 | Platinum |
| C3_E040 | Company 3 | Finance | 8226 | 73.71 | 97.46 | Platinum |
| C3_E049 | Company 3 | Finance | 5535 | 67.89 | 96.61 | Platinum |
| C4_E091 | Company 4 | Engineering | 7261 | 59.55 | 99.97 | Platinum |
| C4_E072 | Company 4 | Engineering | 16350 | 55.75 | 99.89 | Platinum |
| C4_E123 | Company 4 | Finance | 6518 | 55.15 | 99.27 | Platinum |
| C5_E028 | Company 5 | Operations | 18944 | 59.55 | 99.99 | Platinum |
| C5_E060 | Company 5 | Marketing | 17872 | 84.50 | 99.77 | Platinum |
| C5_E032 | Company 5 | Marketing | 10836 | 86.00 | 99.17 | Platinum |
| C6_E009 | Company 6 | Engineering | 10282 | 93.18 | 99.96 | Platinum |
| C6_E139 | Company 6 | HR | 9008 | 90.55 | 99.51 | Platinum |
| C6_E078 | Company 6 | Operations | 13529 | 58.37 | 99.24 | Platinum |
| C7_E074 | Company 7 | Engineering | 7871 | 84.06 | 98.52 | Platinum |
| C7_E077 | Company 7 | Marketing | 18225 | 53.03 | 98.52 | Platinum |
| C7_E095 | Company 7 | HR | 7629 | 97.21 | 98.41 | Platinum |
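
The top-3 selection above relies on `lapply()` over companies. An alternative sketch, assuming the same column names, ranks rows within each company using `order()` and `ave()`; the `toy` data frame and the `rank_in_company` column are hypothetical.

```r
# Hypothetical toy data: four employees in company A, two in company B
toy <- data.frame(
  company_id = c("A", "A", "A", "A", "B", "B"),
  KPI_score  = c(95, 91, 99, 93, 92, 97)
)

# Sort by company, then by KPI in descending order
toy <- toy[order(toy$company_id, -toy$KPI_score), ]

# Position of each row within its company after sorting (1 = highest KPI)
toy$rank_in_company <- ave(toy$KPI_score, toy$company_id,
                           FUN = function(v) seq_along(v))

# Keep at most three rows per company
top3 <- toy[toy$rank_in_company <= 3, ]
top3
```

Companies with fewer than three qualifying employees simply contribute fewer rows, which matches the behaviour of `head(..., 3)` in the original code.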

7.6 Visualization

colors    <- c("#7f1d1d","#b91c1c","#ef4444","#f97316","#fca5a5","#fcd5d5","#fee2e2")
companies <- unique(df$company_id)

par(mfrow    = c(2, 2),
    mar      = c(6, 5, 4, 2),
    cex.main = 1.2,
    cex.lab  = 1.1,
    cex.axis = 0.95)

# ── Plot 1: Avg Salary per Company ────────────────────
barplot(summary_df$Avg_Salary,
        names.arg = paste0("C", 1:7),
        col       = colors,
        border    = "white",
        main      = "Average Salary per Company",
        xlab      = "Company",
        ylab      = "Average Salary",
        ylim      = c(0, max(summary_df$Avg_Salary) * 1.2))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")

# ── Plot 2: Avg KPI per Company ───────────────────────
barplot(summary_df$Avg_KPI,
        names.arg = paste0("C", 1:7),
        col       = colors,
        border    = "white",
        main      = "Average KPI Score per Company",
        xlab      = "Company",
        ylab      = "Average KPI Score",
        ylim      = c(0, max(summary_df$Avg_KPI) * 1.2))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")

# ── Plot 3: Top Performers per Company ───────────────
barplot(summary_df$Top_Performers,
        names.arg = paste0("C", 1:7),
        col       = colors,
        border    = "white",
        main      = "Top Performers per Company (KPI > 90)",
        xlab      = "Company",
        ylab      = "Number of Top Performers",
        ylim      = c(0, max(summary_df$Top_Performers) * 1.3))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")

# ── Plot 4: Scatter Performance vs KPI + Regression ──
plot(df$performance_score, df$KPI_score,
     col  = colors[as.numeric(as.factor(df$company_id))],
     pch  = 16, cex = 0.9,
     xlab = "Performance Score",
     ylab = "KPI Score",
     main = "Performance vs KPI Score\n(with Regression Line)")

# Regression line
fit <- lm(KPI_score ~ performance_score, data = df)
abline(fit, col = "black", lwd = 2.5, lty = 2)
legend("topleft",
       legend = c(paste0("C", 1:7), "Regression"),
       col    = c(colors, "black"),
       pch    = c(rep(16, 7), NA),
       lty    = c(rep(NA, 7), 2),
       lwd    = c(rep(NA, 7), 2),
       bty    = "n", cex = 0.8)
grid(lty = "dashed", col = "gray85")

par(mfrow = c(1, 1))
par(mfrow    = c(1, 2),
    mar      = c(6, 5, 4, 2),
    cex.main = 1.2,
    cex.lab  = 1.1,
    cex.axis = 0.95)

# ── Plot 5: Salary Distribution Boxplot ───────────────
salary_list <- lapply(companies, function(comp) {
  df[df$company_id == comp, "salary"]
})

boxplot(salary_list,
        names  = paste0("C", 1:7),
        col    = colors,
        border = "#7f1d1d",
        main   = "Salary Distribution per Company",
        xlab   = "Company",
        ylab   = "Salary",
        las    = 1)
grid(lty = "dashed", col = "gray85")

# ── Plot 6: KPI Tier Distribution ─────────────────────
tier_order  <- c("Platinum", "Gold", "Silver", "Bronze")
tier_colors <- c("#7f1d1d", "#b91c1c", "#ef4444", "#fca5a5")
tier_counts <- sapply(tier_order, function(t) sum(df$KPI_tier == t))

pie(tier_counts,
    labels  = paste0(tier_order, "\n", round(tier_counts/sum(tier_counts)*100, 1), "%"),
    col     = tier_colors,
    main    = "KPI Tier Distribution\n(All Companies)",
    cex     = 1.0)

par(mfrow = c(1, 1))
par(mar      = c(7, 5, 4, 2),
    cex.main = 1.2,
    cex.lab  = 1.1,
    cex.axis = 0.85)

# ── Plot 7: Grouped Bar Chart - Avg KPI per Dept ──────
dept_list   <- c("HR", "Finance", "Engineering", "Marketing", "Operations")
n_dept      <- length(dept_list)
n_comp      <- length(companies)
dept_matrix <- matrix(0, nrow = n_dept, ncol = n_comp)

for (i in 1:n_dept) {
  for (j in 1:n_comp) {
    subset_dj <- df[df$department == dept_list[i] & df$company_id == companies[j], ]
    dept_matrix[i, j] <- if (nrow(subset_dj) > 0) round(mean(subset_dj$KPI_score), 2) else 0
  }
}

barplot(dept_matrix,
        beside    = TRUE,
        names.arg = paste0("C", 1:7),
        col       = c("#7f1d1d","#b91c1c","#ef4444","#f97316","#fca5a5"),
        border    = "white",
        main      = "Average KPI Score per Department per Company\n(Grouped Bar Chart)",
        xlab      = "Company",
        ylab      = "Average KPI Score",
        ylim      = c(0, 110),
        las       = 1)

legend("topright",
       legend = dept_list,
       fill   = c("#7f1d1d","#b91c1c","#ef4444","#f97316","#fca5a5"),
       bty    = "n", cex = 0.9)
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")
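
The nested loop that fills `dept_matrix` computes a mean for every department and company pair. As a sketch of the same idea in one call, base R's `tapply()` can build the matrix directly; the `toy` data frame below is hypothetical.

```r
# Hypothetical toy data; Finance has no employees in Company 2
toy <- data.frame(
  department = c("HR", "HR", "Finance", "HR"),
  company_id = c("Company 1", "Company 1", "Company 1", "Company 2"),
  KPI_score  = c(80, 90, 70, 75)
)

# Rows = departments, columns = companies, cells = mean KPI
dept_means <- tapply(toy$KPI_score,
                     list(toy$department, toy$company_id),
                     mean)

# Empty department/company combinations come back as NA;
# set them to 0 to follow the same convention as the loop.
dept_means[is.na(dept_means)] <- 0
dept_means
```

`tapply()` orders rows and columns alphabetically, so indexing the result with `dept_means[dept_list, ]` would be needed to keep the department order used in the legend.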


7.7 Interpretation

The KPI dashboard summarizes employee performance across the 7 simulated companies. Average salaries are similar across companies (roughly 11,200 to 12,800), reflecting the common salary range of 4,000 to 20,000 used in the simulation. The scatter plot with regression line shows no strong linear relationship between performance score and KPI score, which is expected because the two scores are drawn independently in generate_dashboard_data. The grouped bar chart shows that average KPI varies across departments within each company: Finance ranks first in four of the seven companies (1, 3, 5, and 6), but since every score comes from the same uniform distribution, these differences reflect sampling variation rather than a real department effect. The salary boxplot confirms a wide spread of salaries within each company. The KPI tier pie chart shows that most employees fall in the Silver and Gold tiers, while 183 of 841 employees (about 22%) score above 90.
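
The tier proportions follow directly from how the scores are generated. As a short worked sketch, assuming KPI scores are uniform on 50 to 100 as in `generate_dashboard_data`, the expected share of each tier is its interval width divided by 50. The simulated sample below is a large hypothetical one, not the report's data, and it ignores the rounding to two decimals.

```r
# Tier boundaries: Bronze [50, 60), Silver [60, 75), Gold [75, 90), Platinum [90, 100]
tier_breaks <- c(50, 60, 75, 90, 100)

# Expected share = interval width / total range
expected <- diff(tier_breaks) / (100 - 50)
names(expected) <- c("Bronze", "Silver", "Gold", "Platinum")
expected
# Bronze 0.2, Silver 0.3, Gold 0.3, Platinum 0.2

# Simulated check with a large hypothetical sample
set.seed(1)
kpi_sim  <- runif(100000, 50, 100)
observed <- table(cut(kpi_sim,
                      breaks = tier_breaks,
                      labels = names(expected),
                      right  = FALSE,
                      include.lowest = TRUE)) / length(kpi_sim)
round(observed, 3)
```

Silver and Gold together are expected to hold about 60% of employees and Platinum about 20%, which is consistent with the 183 of 841 employees above 90 reported in Table 11.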

8 Automated Report Generation per Company

8.1 Introduction

This bonus task builds an automated report generation system using functions and loops. For each company in the dataset, a structured summary is generated automatically — including key statistics, top performers, department breakdown, and visualizations. The system loops through all companies and compiles results into a unified HTML-ready output, simulating a real-world automated reporting pipeline.


8.2 Function Definition

# ── generate_single_report function ───────────────────
generate_single_report <- function(df, company_name) {
  subset_df <- df[df$company_id == company_name, ]

  # Basic stats
  total_emp  <- nrow(subset_df)
  avg_salary <- round(mean(subset_df$salary), 2)
  avg_kpi    <- round(mean(subset_df$KPI_score), 2)
  avg_perf   <- round(mean(subset_df$performance_score), 2)
  top_count  <- sum(subset_df$KPI_score > 90)
  max_kpi    <- round(max(subset_df$KPI_score), 2)
  min_salary <- min(subset_df$salary)
  max_salary <- max(subset_df$salary)

  # KPI tier distribution
  tier_order  <- c("Platinum", "Gold", "Silver", "Bronze")
  tier_counts <- sapply(tier_order, function(t) sum(subset_df$KPI_tier == t))
  tier_pct    <- round(tier_counts / total_emp * 100, 1)

  # Department breakdown
  dept_summary <- do.call(rbind, lapply(unique(subset_df$department), function(dept) {
    dept_df <- subset_df[subset_df$department == dept, ]
    data.frame(
      Department     = dept,
      Count          = nrow(dept_df),
      Avg_Salary     = round(mean(dept_df$salary), 2),
      Avg_KPI        = round(mean(dept_df$KPI_score), 2),
      Top_Performers = sum(dept_df$KPI_score > 90)
    )
  }))
  dept_summary <- dept_summary[order(-dept_summary$Avg_KPI), ]

  return(list(
    company      = company_name,
    total_emp    = total_emp,
    avg_salary   = avg_salary,
    avg_kpi      = avg_kpi,
    avg_perf     = avg_perf,
    top_count    = top_count,
    max_kpi      = max_kpi,
    min_salary   = min_salary,
    max_salary   = max_salary,
    tier_counts  = tier_counts,
    tier_pct     = tier_pct,
    dept_summary = dept_summary,
    subset_df    = subset_df
  ))
}

# ── print_report_table function ───────────────────────
print_report_table <- function(report) {
  summary_tbl <- data.frame(
    Metric = c("Total Employees", "Average Salary", "Average KPI Score",
               "Average Performance", "Top Performers (KPI > 90)",
               "Highest KPI Score", "Salary Range"),
    Value  = c(
      report$total_emp,
      paste0("$", format(report$avg_salary, big.mark = ",")),
      report$avg_kpi,
      report$avg_perf,
      paste0(report$top_count, " employees"),
      report$max_kpi,
      paste0("$", format(report$min_salary, big.mark = ","),
             " - $", format(report$max_salary, big.mark = ","))
    )
  )

  print(
    knitr::kable(summary_tbl,
                 col.names = c("Metric", "Value"),
                 format    = "html",
                 row.names = FALSE) %>%
      kable_styling(bootstrap_options = c("striped", "hover"),
                    full_width        = TRUE) %>%
      row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
      row_spec(seq(2, nrow(summary_tbl), 2), background = "#fcd5d5")
  )
}

# ── print_dept_table function ─────────────────────────
print_dept_table <- function(report) {
  print(
    knitr::kable(report$dept_summary,
                 col.names = c("Department", "Headcount", "Avg Salary",
                               "Avg KPI", "Top Performers"),
                 format    = "html",
                 row.names = FALSE) %>%
      kable_styling(bootstrap_options = c("striped", "hover"),
                    full_width        = TRUE) %>%
      row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
      row_spec(seq(2, nrow(report$dept_summary), 2), background = "#fcd5d5")
  )
}

# ── plot_company_report function ──────────────────────
plot_company_report <- function(report) {
  tier_order  <- c("Platinum", "Gold", "Silver", "Bronze")
  tier_colors <- c("#7f1d1d", "#b91c1c", "#ef4444", "#fca5a5")
  subset_df   <- report$subset_df

  par(mfrow    = c(1, 3),
      mar      = c(5, 5, 4, 2),
      cex.main = 1.2,
      cex.lab  = 1.1,
      cex.axis = 1.0)

  # ── Plot 1: KPI Tier Pie Chart ─────────────────────
  pie(report$tier_counts,
      labels = paste0(tier_order, "\n", report$tier_pct, "%"),
      col    = tier_colors,
      main   = paste0(report$company, "\nKPI Tier Distribution"),
      cex    = 0.95)

  # ── Plot 2: Salary Distribution Histogram ──────────
  hist(subset_df$salary,
       col    = "#fca5a5",
       border = "white",
       main   = paste0(report$company, "\nSalary Distribution"),
       xlab   = "Salary",
       ylab   = "Frequency")
  abline(v   = report$avg_salary,
         col = "#7f1d1d", lwd = 2, lty = 2)
  legend("topright",
         legend = paste0("Avg: $", format(report$avg_salary, big.mark = ",")),
         col    = "#7f1d1d", lty = 2, lwd = 2,
         bty    = "n", cex = 0.9)
  grid(lty = "dashed", col = "gray85")

  # ── Plot 3: Scatter Performance vs KPI ─────────────
  plot(subset_df$performance_score, subset_df$KPI_score,
       col  = "#7f1d1d", pch = 16, cex = 1.1,
       xlab = "Performance Score",
       ylab = "KPI Score",
       main = paste0(report$company, "\nPerformance vs KPI"))
  fit <- lm(KPI_score ~ performance_score, data = subset_df)
  abline(fit, col = "black", lwd = 2, lty = 2)
  abline(h = 90, col = "#b91c1c", lwd = 1.5, lty = 3)
  legend("topleft",
         legend = c("Regression", "KPI = 90"),
         col    = c("black", "#b91c1c"),
         lty    = c(2, 3), lwd = 2,
         bty    = "n", cex = 0.85)
  grid(lty = "dashed", col = "gray85")

  par(mfrow = c(1, 1))
}

8.3 Generate All Company Reports

The loop below automatically generates a complete structured report for each company in the dataset — including summary statistics, department breakdown table, and three visualizations. This simulates an automated reporting pipeline where output is produced per entity without manual intervention.

companies_list <- unique(df$company_id)

# ── Main loop: generate a report per company ──────────
for (comp in companies_list) {

  # Company header
  cat(paste0("\n\n### ", comp, "\n\n"))

  # Generate report object
  report <- generate_single_report(df, comp)

  # Summary stats table
  cat("**Summary Statistics**\n\n")
  print_report_table(report)

  # Department breakdown table
  cat("\n\n**Department Breakdown**\n\n")
  print_dept_table(report)

  # Visualizations
  cat("\n\n")
  plot_company_report(report)

  cat("\n\n---\n\n")
}

8.3.1 Company 1

Summary Statistics

| Metric | Value |
|---|---|
| Total Employees | 98 |
| Average Salary | $12,785.39 |
| Average KPI Score | 76.29 |
| Average Performance | 74.94 |
| Top Performers (KPI > 90) | 22 employees |
| Highest KPI Score | 99.83 |
| Salary Range | $4,001 - $19,721 |

Department Breakdown

| Department | Headcount | Avg Salary | Avg KPI | Top Performers |
|---|---|---|---|---|
| Finance | 17 | 10293.00 | 78.95 | 3 |
| Marketing | 19 | 12195.68 | 78.33 | 5 |
| HR | 28 | 13891.96 | 76.32 | 6 |
| Engineering | 14 | 11845.21 | 74.92 | 5 |
| Operations | 20 | 14573.05 | 73.03 | 3 |


8.3.2 Company 2

Summary Statistics

| Metric | Value |
|---|---|
| Total Employees | 95 |
| Average Salary | $11,197.06 |
| Average KPI Score | 76.14 |
| Average Performance | 74.1 |
| Top Performers (KPI > 90) | 25 employees |
| Highest KPI Score | 99.61 |
| Salary Range | $4,109 - $19,928 |

Department Breakdown

| Department | Headcount | Avg Salary | Avg KPI | Top Performers |
|---|---|---|---|---|
| Marketing | 14 | 11035.57 | 79.80 | 4 |
| Finance | 28 | 11174.54 | 77.21 | 8 |
| HR | 18 | 9884.22 | 76.63 | 5 |
| Operations | 18 | 12555.61 | 74.55 | 4 |
| Engineering | 17 | 11318.76 | 72.56 | 4 |


8.3.3 Company 3

Summary Statistics

| Metric | Value |
|---|---|
| Total Employees | 76 |
| Average Salary | $11,693.68 |
| Average KPI Score | 74.08 |
| Average Performance | 74.52 |
| Top Performers (KPI > 90) | 14 employees |
| Highest KPI Score | 98.5 |
| Salary Range | $4,416 - $19,949 |

Department Breakdown

| Department | Headcount | Avg Salary | Avg KPI | Top Performers |
|---|---|---|---|---|
| Finance | 17 | 9687.94 | 79.77 | 6 |
| HR | 18 | 10905.22 | 77.94 | 3 |
| Marketing | 14 | 12814.57 | 72.88 | 3 |
| Engineering | 19 | 13073.21 | 71.91 | 2 |
| Operations | 8 | 12492.00 | 60.57 | 0 |


8.3.4 Company 4

Summary Statistics

| Metric | Value |
|---|---|
| Total Employees | 129 |
| Average Salary | $11,852.91 |
| Average KPI Score | 73.62 |
| Average Performance | 74.82 |
| Top Performers (KPI > 90) | 17 employees |
| Highest KPI Score | 99.97 |
| Salary Range | $4,083 - $19,855 |

Department Breakdown

| Department | Headcount | Avg Salary | Avg KPI | Top Performers |
|---|---|---|---|---|
| Engineering | 32 | 10565.44 | 78.36 | 8 |
| Operations | 24 | 12426.54 | 72.66 | 1 |
| Marketing | 23 | 12826.91 | 72.32 | 5 |
| Finance | 25 | 12323.04 | 72.03 | 3 |
| HR | 25 | 11584.00 | 71.28 | 0 |


8.3.5 Company 5

Summary Statistics

| Metric | Value |
|---|---|
| Total Employees | 168 |
| Average Salary | $11,934.82 |
| Average KPI Score | 75.65 |
| Average Performance | 75.51 |
| Top Performers (KPI > 90) | 41 employees |
| Highest KPI Score | 99.99 |
| Salary Range | $4,091 - $19,916 |

Department Breakdown

| Department | Headcount | Avg Salary | Avg KPI | Top Performers |
|---|---|---|---|---|
| Finance | 31 | 12049.68 | 79.08 | 12 |
| Engineering | 29 | 11635.17 | 76.53 | 7 |
| HR | 37 | 12636.59 | 76.07 | 8 |
| Marketing | 39 | 11867.87 | 75.03 | 10 |
| Operations | 32 | 11365.28 | 71.79 | 4 |


8.3.6 Company 6

Summary Statistics

| Metric | Value |
|---|---|
| Total Employees | 142 |
| Average Salary | $12,184.23 |
| Average KPI Score | 75.98 |
| Average Performance | 73.79 |
| Top Performers (KPI > 90) | 30 employees |
| Highest KPI Score | 99.96 |
| Salary Range | $4,096 - $19,955 |

Department Breakdown

| Department | Headcount | Avg Salary | Avg KPI | Top Performers |
|---|---|---|---|---|
| Finance | 23 | 12689.96 | 78.75 | 5 |
| Marketing | 27 | 11231.67 | 77.04 | 5 |
| Operations | 36 | 12376.19 | 75.72 | 9 |
| Engineering | 26 | 12510.42 | 75.64 | 7 |
| HR | 30 | 12140.73 | 73.48 | 4 |


8.3.7 Company 7

Summary Statistics

| Metric | Value |
|---|---|
| Total Employees | 133 |
| Average Salary | $11,858.23 |
| Average KPI Score | 75.76 |
| Average Performance | 73.67 |
| Top Performers (KPI > 90) | 34 employees |
| Highest KPI Score | 98.52 |
| Salary Range | $4,002 - $19,922 |

Department Breakdown

| Department | Headcount | Avg Salary | Avg KPI | Top Performers |
|---|---|---|---|---|
| HR | 24 | 11913.12 | 82.09 | 9 |
| Engineering | 26 | 12432.00 | 77.60 | 10 |
| Finance | 28 | 11077.61 | 77.49 | 10 |
| Operations | 26 | 10503.73 | 71.76 | 2 |
| Marketing | 29 | 13266.45 | 70.79 | 3 |



8.4 Consolidated Report Summary

# ── Summary table for all companies ───────────────────
all_reports <- lapply(companies_list, function(comp) {
  r <- generate_single_report(df, comp)
  data.frame(
    Company         = r$company,
    Total_Employees = r$total_emp,
    Avg_Salary      = r$avg_salary,
    Avg_KPI         = r$avg_kpi,
    Avg_Performance = r$avg_perf,
    Top_Performers  = r$top_count,
    Max_KPI         = r$max_kpi
  )
})

consolidated <- do.call(rbind, all_reports)

knitr::kable(consolidated,
             col.names = c("Company", "Employees", "Avg Salary",
                           "Avg KPI", "Avg Performance",
                           "Top Performers", "Max KPI"),
             caption   = "Table 13: Consolidated Report — All Companies",
             format    = "html",
             row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(consolidated), 2), background = "#fcd5d5")
Table 13: Consolidated Report — All Companies

| Company | Employees | Avg Salary | Avg KPI | Avg Performance | Top Performers | Max KPI |
|---|---|---|---|---|---|---|
| Company 1 | 98 | 12785.39 | 76.29 | 74.94 | 22 | 99.83 |
| Company 2 | 95 | 11197.06 | 76.14 | 74.10 | 25 | 99.61 |
| Company 3 | 76 | 11693.68 | 74.08 | 74.52 | 14 | 98.50 |
| Company 4 | 129 | 11852.91 | 73.62 | 74.82 | 17 | 99.97 |
| Company 5 | 168 | 11934.82 | 75.65 | 75.51 | 41 | 99.99 |
| Company 6 | 142 | 12184.23 | 75.98 | 73.79 | 30 | 99.96 |
| Company 7 | 133 | 11858.23 | 75.76 | 73.67 | 34 | 98.52 |

8.5 Export to CSV

# ── Export summary to CSV ─────────────────────────────
write.csv(consolidated,
          file      = "company_report_summary.csv",
          row.names = FALSE)

cat("Report exported successfully: company_report_summary.csv\n")
## Report exported successfully: company_report_summary.csv
cat("  Rows:", nrow(consolidated), "| Columns:", ncol(consolidated), "\n")
##   Rows: 7 | Columns: 7
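
A quick way to confirm that an export is usable downstream is to read the file back and compare it with the original table. The sketch below does this with a small hypothetical data frame and a temporary file path rather than the report's `consolidated` object and `company_report_summary.csv`.

```r
# Hypothetical stand-in for the consolidated table
toy <- data.frame(
  Company = c("Company 1", "Company 2"),
  Avg_KPI = c(76.29, 76.14)
)

# Write to a temporary path (not the report's output file)
out_file <- tempfile(fileext = ".csv")
write.csv(toy, file = out_file, row.names = FALSE)

# Read the file back and compare with the original
back <- read.csv(out_file, stringsAsFactors = FALSE)
all.equal(toy, back)   # TRUE when the round trip preserves names and values

# Clean up the temporary file
unlink(out_file)
```

Setting `row.names = FALSE`, as the export above does, avoids an extra unnamed index column appearing when the file is read again.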

8.6 Interpretation

The automated report generation system loops through all 7 companies, producing a structured report for each without any manual repetition. Each company report includes a summary statistics table, a department breakdown, and three visualizations: KPI tier distribution, salary histogram, and a performance vs KPI scatter plot with regression line. The consolidated summary table aggregates all companies into a single comparative view: in Table 13, Company 1 has the highest average salary (12,785.39) and average KPI (76.29), while Company 5, the largest company with 168 employees, has the most top performers (41). The CSV export ensures the results can be shared or used in downstream reporting tools. This approach demonstrates how functions and loops together create scalable, reusable reporting pipelines, a core pattern in real-world data science workflows.
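
The same function-plus-loop pattern can also be expressed with `split()` and `lapply()`, which hands each company's rows to a report function in turn. This is a simplified sketch: `build_report` and the `toy` data frame are hypothetical and much smaller than `generate_single_report` and the report's dataset.

```r
# Hypothetical toy data with the report's column names
toy <- data.frame(
  company_id = c("Company 1", "Company 1", "Company 2"),
  salary     = c(5000, 7000, 9000),
  KPI_score  = c(92, 70, 88)
)

# Hypothetical, simplified per-company report builder
build_report <- function(sub) {
  data.frame(
    Company         = sub$company_id[1],
    Total_Employees = nrow(sub),
    Avg_Salary      = mean(sub$salary),
    Top_Performers  = sum(sub$KPI_score > 90)
  )
}

# Split the data by company, build one report per piece, then stack them
reports          <- lapply(split(toy, toy$company_id), build_report)
consolidated_toy <- do.call(rbind, reports)
rownames(consolidated_toy) <- NULL
consolidated_toy
```

Adding a company to the data requires no change to the code, since `split()` creates one group per distinct `company_id`.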