Data Science Programming

Practicum Week 4

1 Dynamic Multi-Formula Function

1.1 Introduction

This task builds a function compute_formula(x, formula) that evaluates one of four formulas (linear, quadratic, cubic, or exponential) at a given x. Nested loops apply it to every formula for x = 1 to 20, and input validation rejects any formula name outside that set. The results are shown in a table and in a single combined plot.
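
As a compact illustration of the validate-then-dispatch idea, the sketch below uses match.arg() and switch() instead of a chain of if statements. The name compute_formula_sketch is hypothetical and not part of the task. One difference to be aware of: match.arg() also accepts unambiguous partial names such as "lin", which a strict %in% check rejects.

```r
# Hypothetical alternative to an if-chain: validate with match.arg(), dispatch with switch()
compute_formula_sketch <- function(x, formula) {
  # Stops with an error if `formula` does not match one of the four names
  formula <- match.arg(formula, c("linear", "quadratic", "cubic", "exponential"))
  switch(formula,
         linear      = 2 * x + 1,
         quadratic   = x^2 + 3 * x + 2,
         cubic       = x^3 - 2 * x^2 + x + 5,
         exponential = exp(0.3 * x))
}

# The arithmetic is vectorized, so a whole vector of x values works in one call
x_values   <- 1:20
all_values <- sapply(c("linear", "quadratic", "cubic", "exponential"),
                     function(f) compute_formula_sketch(x_values, f))
print(head(all_values, 3))
```

The result is a 20 × 4 matrix with one column per formula, which can be passed to data.frame() directly.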

1.2 Function Definition

# ── compute_formula function ───────────────────────────
compute_formula <- function(x, formula) {

  # Validate the formula name
  valid_formulas <- c("linear", "quadratic", "cubic", "exponential")
  if (!(formula %in% valid_formulas)) {
    stop(paste("Invalid formula. Choose from:", paste(valid_formulas, collapse = ", ")))
  }

  # Compute y for the selected formula
  if (formula == "linear")      return(2 * x + 1)
  if (formula == "quadratic")   return(x^2 + 3 * x + 2)
  if (formula == "cubic")       return(x^3 - 2*x^2 + x + 5)
  if (formula == "exponential") return(exp(0.3 * x))
}

1.3 Compute All Formulas Using Nested Loops

x_values <- 1:20
formulas  <- c("linear", "quadratic", "cubic", "exponential")
results   <- list()

# Nested loop: outer loop over formulas, inner loop over x values
for (formula in formulas) {
  y_values <- c()
  for (x in x_values) {
    y        <- compute_formula(x, formula)
    y_values <- c(y_values, y)
  }
  results[[formula]] <- y_values
}

# Build the data frame
df_results <- data.frame(
  x           = x_values,
  Linear      = round(results[["linear"]],      2),
  Quadratic   = round(results[["quadratic"]],   2),
  Cubic       = round(results[["cubic"]],       2),
  Exponential = round(results[["exponential"]], 4)
)

# Display the table with kableExtra (assumes library(kableExtra) is loaded; it also provides %>%)
knitr::kable(df_results,
             col.names = c("x", "Linear", "Quadratic", "Cubic", "Exponential"),
             caption   = "Table 1: Computed Formula Values for x = 1 to 20",
             format    = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(df_results), 2), background = "#fcd5d5")

Table 1: Computed Formula Values for x = 1 to 20
x	Linear	Quadratic	Cubic	Exponential
1	3	6	5	1.3499
2	5	12	7	1.8221
3	7	20	17	2.4596
4	9	30	41	3.3201
5	11	42	85	4.4817
6	13	56	155	6.0496
7	15	72	257	8.1662
8	17	90	397	11.0232
9	19	110	581	14.8797
10	21	132	815	20.0855
11	23	156	1105	27.1126
12	25	182	1457	36.5982
13	27	210	1877	49.4024
14	29	240	2371	66.6863
15	31	272	2945	90.0171
16	33	306	3605	121.5104
17	35	342	4357	164.0219
18	37	380	5207	221.4064
19	39	420	6161	298.8674
20	41	462	7225	403.4288

1.4 Visualization

y_max  <- max(unlist(results))
y_min  <- min(unlist(results))
colors <- c("linear"      = "steelblue",
            "quadratic"   = "limegreen",
            "cubic"       = "tomato",
            "exponential" = "#7f1d1d")

plot(x_values, results[["linear"]],
     type = "b", col = colors["linear"],
     lwd = 2.5, pch = 16, cex = 1,
     ylim = c(y_min, y_max),
     xlab = "x values (1 to 20)",
     ylab = "y values",
     main = "Comparison of Four Mathematical Formulas (x = 1 to 20)",
     cex.main = 1.3, cex.lab = 1.1)

for (formula in c("quadratic", "cubic", "exponential")) {
  lines(x_values, results[[formula]],
        type = "b", col = colors[formula],
        lwd = 2.5, pch = 16, cex = 1)
}

grid(lty = "dashed", col = "gray85")
legend("topleft",
       legend = c("Linear", "Quadratic", "Cubic", "Exponential"),
       col    = colors,
       lwd = 2.5, pch = 16, bty = "n", cex = 1)

1.5 Interpretation

The plot shows four distinct growth patterns as x increases from 1 to 20. Over this range the cubic formula dominates: it reaches 7225 at x = 20, far above the quadratic (462), the exponential (about 403), and the linear formula (41). The exponential curve exp(0.3x) starts as the smallest of the four, overtakes the linear formula only at x = 11, and is still below the quadratic at x = 20. Its growth rate keeps increasing, so it would eventually exceed every polynomial, but not within the plotted range. Because all four curves share one axis scaled to the cubic, the linear formula looks nearly flat. The example shows that the functional form of a model determines both the scale and the rate of growth of its output, and that any ranking of growth depends on the range of x considered.
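
To check where the exponential formula overtakes the polynomials, the sketch below evaluates the same four formulas on a longer grid. The upper limit of 60 is an arbitrary choice for illustration, and the list formulas is a self-contained copy of the task's definitions rather than the original function.

```r
# Self-contained copies of the four formulas from the task
formulas <- list(
  linear      = function(x) 2 * x + 1,
  quadratic   = function(x) x^2 + 3 * x + 2,
  cubic       = function(x) x^3 - 2 * x^2 + x + 5,
  exponential = function(x) exp(0.3 * x)
)

# Extend x beyond the plotted range (the upper limit 60 is an assumption)
x_ext    <- 1:60
exp_vals <- formulas$exponential(x_ext)

# For each polynomial, find the first integer x where the exponential is larger
first_overtake <- sapply(c("linear", "quadratic", "cubic"), function(name) {
  x_ext[which(exp_vals > formulas[[name]](x_ext))[1]]
})
print(first_overtake)
```

On this integer grid the exponential passes the linear formula at x = 11, the quadratic at x = 21, and the cubic at x = 36, so only the first crossing falls inside the range plotted above.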

2 Nested Simulation - Multi-Sales & Discounts

2.1 Introduction

This task simulates daily sales data for multiple salespersons over several days. The helper function apply_discount() returns a discount rate that depends on the sales amount, and cumulative_sales() computes running totals for a vector of sales. The main function simulate_sales() loops over salespersons and days. Results include summary statistics and a cumulative sales plot.

2.2 Function Definition

# ── apply_discount function (helper) ──────────────────
apply_discount <- function(sales_amount) {
  if      (sales_amount >= 900) return(0.20)
  else if (sales_amount >= 700) return(0.15)
  else if (sales_amount >= 500) return(0.10)
  else if (sales_amount >= 300) return(0.05)
  else                          return(0.00)
}

# ── cumulative_sales function (helper) ────────────────
cumulative_sales <- function(sales_list) {
  total  <- 0
  result <- c()
  for (s in sales_list) {
    total  <- total + s
    result <- c(result, total)
  }
  return(result)
}

# ── Main function simulate_sales ──────────────────────
simulate_sales <- function(n_salesperson, days) {
  data <- data.frame()

  # Loop over salespersons
  for (sp_id in 1:n_salesperson) {
    daily_sales <- c()

    # Loop over days
    for (day in 1:days) {
      sales_amount  <- sample(100:1000, 1)
      discount_rate <- apply_discount(sales_amount)
      net_sales     <- sales_amount * (1 - discount_rate)
      daily_sales   <- c(daily_sales, sales_amount)

      data <- rbind(data, data.frame(
        sales_id      = sp_id,
        day           = day,
        sales_amount  = sales_amount,
        discount_rate = discount_rate,
        net_sales     = round(net_sales, 2)
      ))
    }
    # The last running total equals the salesperson's overall total
    cat("Salesperson", sp_id, "- Cumulative Total:",
        tail(cumulative_sales(daily_sales), 1), "\n")
  }
  return(data)
}

2.3 Simulate Sales Data

set.seed(42)
df_sales <- simulate_sales(n_salesperson = 5, days = 10)

## Salesperson 1 - Cumulative Total: 3587 
## Salesperson 2 - Cumulative Total: 6300 
## Salesperson 3 - Cumulative Total: 6411 
## Salesperson 4 - Cumulative Total: 4224 
## Salesperson 5 - Cumulative Total: 5466

knitr::kable(head(df_sales, 15),
             col.names = c("Sales ID", "Day", "Sales Amount", "Discount Rate", "Net Sales"),
             caption   = "Table 2: Sales Simulation Data (First 15 Rows)",
             format    = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, 15, 2), background = "#fcd5d5")

Table 2: Sales Simulation Data (First 15 Rows)
Sales ID	Day	Sales Amount	Discount Rate	Net Sales
1	1	660	0.10	594.00
1	2	420	0.05	399.00
1	3	252	0.00	252.00
1	4	173	0.00	173.00
1	5	327	0.05	310.65
1	6	245	0.00	245.00
1	7	733	0.15	623.05
1	8	148	0.00	148.00
1	9	227	0.00	227.00
1	10	402	0.05	381.90
2	1	123	0.00	123.00
2	2	938	0.20	750.40
2	3	455	0.05	432.25
2	4	700	0.15	595.00
2	5	264	0.00	264.00

2.4 Summary Statistics

summary_sales <- aggregate(
  cbind(sales_amount, discount_rate, net_sales) ~ sales_id,
  data = df_sales,
  FUN  = function(x) round(mean(x), 2)
)
names(summary_sales) <- c("Sales ID", "Avg Sales", "Avg Discount", "Avg Net Sales")

knitr::kable(summary_sales,
             caption = "Table 3: Summary Statistics per Salesperson",
             format  = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(summary_sales), 2), background = "#fcd5d5")

Table 3: Summary Statistics per Salesperson
Sales ID	Avg Sales	Avg Discount	Avg Net Sales
1	358.7	0.04	335.36
2	630.0	0.12	537.07
3	641.1	0.12	547.84
4	422.4	0.06	390.29
5	546.6	0.09	486.40

2.5 Visualization

par(mar      = c(5, 5, 4, 2),
    cex.main = 1.6,
    cex.lab  = 1.4,
    cex.axis = 1.2)

colors <- c("steelblue", "limegreen", "tomato", "#7f1d1d", "orange")

# Compute all cumulative sales first to set ylim
all_cum <- c()
for (i in 1:5) {
  sp_data <- df_sales[df_sales$sales_id == i, ]
  all_cum <- c(all_cum, cumsum(sp_data$sales_amount))
}

# Plot the first salesperson
sp1     <- df_sales[df_sales$sales_id == 1, ]
cum_sp1 <- cumsum(sp1$sales_amount)

plot(1:10, cum_sp1,
     type = "b", col = colors[1],
     lwd = 3, pch = 16, cex = 1.3,
     ylim = c(0, max(all_cum) * 1.1),
     xlab = "Day",
     ylab = "Cumulative Sales",
     main = "Cumulative Sales per Salesperson (10 Days)")

# Loop over the remaining salespersons
for (i in 2:5) {
  sp_data <- df_sales[df_sales$sales_id == i, ]
  cum_sp  <- cumsum(sp_data$sales_amount)
  lines(1:10, cum_sp,
        type = "b", col = colors[i],
        lwd = 3, pch = 16, cex = 1.3)
}

grid(lty = "dashed", col = "gray85")
legend("topleft",
       legend = paste("Salesperson", 1:5),
       col    = colors,
       lwd = 3, pch = 16, bty = "n", cex = 1.1)

2.6 Interpretation

The simulation covers 5 salespersons over 10 days. Cumulative sales rise every day for every salesperson because each daily amount is positive (at least 100). The discount rate steps up automatically once a sale reaches 300, 500, 700, or 900, so larger sales carry a larger discount and a proportionally smaller net amount; in Table 3 the average discount ranges from 0.04 to 0.12. Differences in the cumulative totals (3587 to 6411) reflect only random variation, since all salespersons draw their daily sales from the same distribution.
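
The tier logic and the running total can also be written without explicit loops. The sketch below is an illustration, not the method used in the task: it takes the ten sales amounts of salesperson 1 from Table 2, looks up the discount tier with findInterval(), and uses cumsum() for the running total. The names thresholds and rates are introduced here for the example.

```r
# Sales amounts of salesperson 1, copied from Table 2
sales_amount <- c(660, 420, 252, 173, 327, 245, 733, 148, 227, 402)

# Tier boundaries and the rate that applies from each boundary upward
thresholds <- c(300, 500, 700, 900)
rates      <- c(0.00, 0.05, 0.10, 0.15, 0.20)

# findInterval() returns 0 below the first threshold, 1 for [300, 500), and so on
discount_rate <- rates[findInterval(sales_amount, thresholds) + 1]
net_sales     <- round(sales_amount * (1 - discount_rate), 2)
running_total <- cumsum(sales_amount)

print(data.frame(day = seq_along(sales_amount), sales_amount,
                 discount_rate, net_sales, running_total))
```

The discount rates and net sales reproduce the first ten rows of Table 2, and the last running total is 3587, the cumulative total printed for salesperson 1.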

3 Multi-Level Performance Categorization

3.1 Introduction

This task builds a function categorize_performance() that classifies sales amounts into 5 performance levels: Excellent, Very Good, Good, Average, and Poor. The function loops over each value and assigns a category; the count and percentage per category are then computed and visualized with a bar chart and a pie chart.

3.2 Function Definition

# ── Performance categorization function ───────────────
categorize_performance <- function(sales_amount) {
  categories <- c()

  # Loop over each sales value
  for (sales in sales_amount) {
    if      (sales >= 900) categories <- c(categories, "Excellent")
    else if (sales >= 700) categories <- c(categories, "Very Good")
    else if (sales >= 500) categories <- c(categories, "Good")
    else if (sales >= 300) categories <- c(categories, "Average")
    else                   categories <- c(categories, "Poor")
  }
  return(categories)
}

3.3 Categorize Performance Data

set.seed(42)
sales_data   <- sample(100:1000, 200, replace = TRUE)
performance  <- categorize_performance(sales_data)

df_perf <- data.frame(
  sales_amount = sales_data,
  performance  = performance
)

# Compute the count and percentage per category
category_order <- c("Excellent", "Very Good", "Good", "Average", "Poor")
counts         <- sapply(category_order, function(cat) sum(df_perf$performance == cat))
percentages    <- round(counts / length(performance) * 100, 2)

df_summary <- data.frame(
  Category   = category_order,
  Count      = counts,
  Percentage = paste0(percentages, "%")
)

knitr::kable(df_summary,
             caption  = "Table 4: Performance Category Distribution",
             format   = "html",
             row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(df_summary), 2), background = "#fcd5d5")

Table 4: Performance Category Distribution
Category	Count	Percentage
Excellent	21	10.5%
Very Good	49	24.5%
Good	38	19%
Average	49	24.5%
Poor	43	21.5%

3.4 Visualization

par(mfrow    = c(1, 2),
    mar      = c(5, 5, 4, 2),
    cex.main = 1.4,
    cex.lab  = 1.2,
    cex.axis = 1.1)

colors_cat <- c("Excellent" = "#7f1d1d",
                "Very Good" = "#b91c1c",
                "Good"      = "#ef4444",
                "Average"   = "#fca5a5",
                "Poor"      = "#fee2e2")

# ── Bar Chart ──────────────────────────────────────────
# barplot() returns the bar midpoints, which are used below to position the labels
bar_mid <- barplot(counts,
        names.arg = category_order,
        col       = colors_cat,
        border    = "white",
        main      = "Performance Category Distribution\n(Bar Chart)",
        xlab      = "Category",
        ylab      = "Count",
        ylim      = c(0, max(counts) * 1.2))

# Add labels above the bars
text(x      = bar_mid,
     y      = counts + 1.5,
     labels = paste0(counts, "\n(", percentages, "%)"),
     cex    = 1, font = 2)

grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")

# ── Pie Chart ──────────────────────────────────────────
pie(counts,
    labels  = paste0(category_order, "\n", percentages, "%"),
    col     = colors_cat,
    main    = "Performance Category Distribution\n(Pie Chart)",
    cex     = 1.1)

par(mfrow = c(1, 1))

3.5 Interpretation

The table summarizes the performance categories of 200 simulated sales amounts. Very Good and Average are the largest categories (49 values, 24.5% each), followed by Poor (21.5%), Good (19%), and Excellent (10.5%). This is close to what uniform sampling from 100 to 1000 implies: each of the four lower bands covers 200 of the 901 possible values (about 22.2%), while Excellent covers only 101 (about 11.2%). Excellent corresponds to sales of 900 or more and Poor to sales below 300. Applied to real data, the same categorization would help identify which segments need improvement and which are performing well, supporting more targeted sales strategies and performance evaluations.
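
The expected share of each category can be checked by categorizing every value the simulation can draw. The sketch below uses cut() as a vectorized stand-in for the loop; the helper name categorize_cut is hypothetical, and the breaks mirror the thresholds 300, 500, 700, and 900 from the function definition.

```r
category_labels <- c("Poor", "Average", "Good", "Very Good", "Excellent")
breaks          <- c(-Inf, 300, 500, 700, 900, Inf)

# right = FALSE makes the intervals closed on the left, matching the >= comparisons
categorize_cut <- function(sales_amount) {
  cut(sales_amount, breaks = breaks, labels = category_labels, right = FALSE)
}

# Every value the simulation can draw, each exactly once
population     <- 100:1000
expected_share <- round(prop.table(table(categorize_cut(population))) * 100, 1)
print(expected_share)
```

Comparing these expected shares with Table 4 shows how much of the observed spread is sampling noise from drawing only 200 values.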

4 Multi-Company Dataset Simulation

4.1 Introduction

This task simulates a multi-company employee dataset using nested loops. The function generate_company_data() generates employee records including salary, department, performance score, and KPI score for each company. Employees with KPI > 90 are flagged as top performers. Results are summarized per company and visualized through multiple plots.
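
For comparison with the nested-loop approach, the same kind of table can be built by drawing whole columns at once. The sketch below is a hypothetical variant (generate_company_data_vec is not part of the task). Because it draws random numbers in a different order, it produces different values from a loop-based version even with the same seed.

```r
# Hypothetical vectorized variant: draw all columns at once instead of row by row
generate_company_data_vec <- function(n_company, n_employees) {
  departments <- c("HR", "Finance", "Engineering", "Marketing", "Operations")
  n_total     <- n_company * n_employees
  company_num <- rep(seq_len(n_company), each = n_employees)
  emp_num     <- rep(seq_len(n_employees), times = n_company)
  kpi         <- round(runif(n_total, 50, 100), 2)

  data.frame(
    company_id        = paste0("Company ", company_num),
    employee_id       = sprintf("C%d_E%03d", company_num, emp_num),
    salary            = sample(4000:20000, n_total, replace = TRUE),
    department        = sample(departments, n_total, replace = TRUE),
    performance_score = round(runif(n_total, 50, 100), 2),
    KPI_score         = kpi,
    is_top_performer  = kpi > 90   # same flag rule: KPI > 90
  )
}

set.seed(42)
df_vec <- generate_company_data_vec(n_company = 5, n_employees = 30)
str(df_vec)
```

Avoiding rbind() inside a loop matters mainly for larger datasets, where repeatedly copying the growing data frame becomes slow.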

4.2 Function Definition

# ── generate_company_data function ────────────────────
generate_company_data <- function(n_company, n_employees) {
  departments <- c("HR", "Finance", "Engineering", "Marketing", "Operations")
  data        <- data.frame()

  # Nested loop: outer loop over companies, inner loop over employees
  for (company_id in 1:n_company) {
    for (emp_num in 1:n_employees) {
      salary            <- sample(4000:20000, 1)
      department        <- sample(departments, 1)
      performance_score <- round(runif(1, 50, 100), 2)
      KPI_score         <- round(runif(1, 50, 100), 2)
      is_top_performer  <- KPI_score > 90

      data <- rbind(data, data.frame(
        company_id        = paste0("Company ", company_id),
        employee_id       = paste0("C", company_id, "_E", sprintf("%03d", emp_num)),
        salary            = salary,
        department        = department,
        performance_score = performance_score,
        KPI_score         = KPI_score,
        is_top_performer  = is_top_performer
      ))
    }
  }
  return(data)
}

4.3 Generate & Display Data

set.seed(42)
df_company <- generate_company_data(n_company = 5, n_employees = 30)

knitr::kable(head(df_company[, -7], 10),
             col.names = c("Company ID", "Employee ID", "Salary",
                           "Department", "Performance Score", "KPI Score"),
             caption   = "Table 5: Company Dataset (First 10 Rows)",
             format    = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, 10, 2), background = "#fcd5d5")

Table 5: Company Dataset (First 10 Rows)
Company ID	Employee ID	Salary	Department	Performance Score	KPI Score
Company 1	C1_E001	14800	Operations	64.31	91.52
Company 1	C1_E002	13289	Marketing	86.83	56.73
Company 1	C1_E003	14288	Marketing	73.11	97.00
Company 1	C1_E004	18957	Marketing	73.75	78.02
Company 1	C1_E005	14094	Engineering	99.44	97.33
Company 1	C1_E006	9402	Marketing	69.51	95.29
Company 1	C1_E007	16908	Operations	86.88	90.55
Company 1	C1_E008	13051	Engineering	91.65	50.37
Company 1	C1_E009	17609	Engineering	71.79	51.87
Company 1	C1_E010	18649	Marketing	94.39	82.00

4.4 Summary per Company

summary_company <- do.call(rbind, lapply(unique(df_company$company_id), function(comp) {
  subset_df <- df_company[df_company$company_id == comp, ]
  data.frame(
    Company        = comp,
    Total_Employees= nrow(subset_df),
    Avg_Salary     = round(mean(subset_df$salary), 2),
    Avg_Performance= round(mean(subset_df$performance_score), 2),
    Max_KPI        = round(max(subset_df$KPI_score), 2),
    Top_Performers = sum(subset_df$is_top_performer)
  )
}))

knitr::kable(summary_company,
             col.names = c("Company", "Total Employees", "Avg Salary",
                           "Avg Performance", "Max KPI", "Top Performers"),
             caption   = "Table 6: Summary Statistics per Company",
             format    = "html",
             row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(summary_company), 2), background = "#fcd5d5")

Table 6: Summary Statistics per Company
Company	Total Employees	Avg Salary	Avg Performance	Max KPI	Top Performers
Company 1	30	13077.33	77.37	99.14	12
Company 2	30	12434.40	75.90	99.83	3
Company 3	30	13440.43	71.97	99.54	7
Company 4	30	11192.10	74.10	95.14	7
Company 5	30	11353.83	73.26	97.75	8

4.5 Visualization

par(mfrow    = c(2, 2),
    mar      = c(5, 5, 4, 2),
    cex.main = 1.3,
    cex.lab  = 1.2,
    cex.axis = 1.1)

colors   <- c("#7f1d1d", "#b91c1c", "#ef4444", "#fca5a5", "#fee2e2")
companies <- unique(df_company$company_id)

# ── Plot 1: Avg Salary ─────────────────────────────────
barplot(summary_company$Avg_Salary,
        names.arg = paste0("C", 1:5),
        col       = colors,
        border    = "white",
        main      = "Average Salary per Company",
        xlab      = "Company",
        ylab      = "Average Salary",
        ylim      = c(0, max(summary_company$Avg_Salary) * 1.2))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")

# ── Plot 2: Avg Performance ────────────────────────────
barplot(summary_company$Avg_Performance,
        names.arg = paste0("C", 1:5),
        col       = colors,
        border    = "white",
        main      = "Average Performance Score per Company",
        xlab      = "Company",
        ylab      = "Avg Performance Score",
        ylim      = c(0, 110))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")

# ── Plot 3: Top Performers ─────────────────────────────
barplot(summary_company$Top_Performers,
        names.arg = paste0("C", 1:5),
        col       = colors,
        border    = "white",
        main      = "Top Performers per Company (KPI > 90)",
        xlab      = "Company",
        ylab      = "Number of Top Performers",
        ylim      = c(0, max(summary_company$Top_Performers) * 1.3))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")

# ── Plot 4: Scatter KPI vs Performance ────────────────
plot(df_company$performance_score, df_company$KPI_score,
     col  = colors[as.numeric(as.factor(df_company$company_id))],
     pch  = 16, cex = 1.2,
     xlab = "Performance Score",
     ylab = "KPI Score",
     main = "Performance Score vs KPI Score")
abline(h = 90, col = "black", lty = 2, lwd = 2)
legend("topleft",
       legend = paste0("Company ", 1:5),
       col    = colors,
       pch    = 16, bty = "n", cex = 0.9)
grid(lty = "dashed", col = "gray85")

par(mfrow = c(1, 1))

4.6 Interpretation

The multi-company simulation shows variation in salary, performance, and KPI scores across 5 companies. Because every employee is drawn from the same distributions, the differences in average salary (11192.10 to 13440.43) are sampling noise rather than structural differences between companies. The scatter plot shows no linear relationship between performance score and KPI score, which is expected: the two scores are generated independently of each other. The number of top performers (KPI > 90) ranges from 3 to 12 per company, around the expected value of 6 (30 employees × a 20% chance that a uniform score on [50, 100] exceeds 90).
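
The two statistical points above, zero correlation between independently drawn scores and a 20% chance of a KPI above 90, can be checked with a larger sample. This is a minimal sketch; the seed and the sample size of 100,000 are arbitrary choices made for the illustration.

```r
set.seed(1)      # arbitrary seed, chosen only for reproducibility
n <- 100000      # far more draws than the 150 simulated employees, to reduce noise

performance_score <- runif(n, 50, 100)
KPI_score         <- runif(n, 50, 100)

# Independent draws: the sample correlation should be close to 0
score_cor <- cor(performance_score, KPI_score)

# Share of KPI scores above 90; theory gives (100 - 90) / (100 - 50) = 0.2
share_top <- mean(KPI_score > 90)
expected_top_per_company <- 30 * 0.2

cat("correlation:", round(score_cor, 4), "\n")
cat("share above 90:", round(share_top, 4), "\n")
cat("expected top performers per 30 employees:", expected_top_per_company, "\n")
```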

5 Monte Carlo Simulation - Pi & Probability

5.1 Introduction

This task estimates the value of π (Pi) using the Monte Carlo method: points are drawn uniformly in the square [-1, 1] × [-1, 1], and the fraction that falls inside the unit circle estimates π/4, the ratio of the circle's area (π) to the square's area (4). An additional analysis estimates the probability that a point lands in the sub-square [0, 0.5] × [0, 0.5]. The results are visualized as a scatter plot of points inside and outside the circle and as a convergence plot of the π estimate.
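
As a minimal sketch of the running estimate behind a convergence plot, the code below draws all points at once and uses cumsum() to track how many have landed inside the circle so far. The variable names are chosen for this example, and the number of points (500) is an assumption.

```r
set.seed(42)
n_iter <- 500

x <- runif(n_iter, -1, 1)
y <- runif(n_iter, -1, 1)
inside <- x^2 + y^2 <= 1   # TRUE when the point lies inside the unit circle

# Running estimate after 1, 2, ..., n_iter points
pi_running <- 4 * cumsum(inside) / seq_len(n_iter)

cat("estimate after", n_iter, "points:", pi_running[n_iter], "\n")
```

Drawing all x values before all y values consumes the random stream in a different order than drawing one (x, y) pair per iteration, so the two approaches trace different paths even with the same seed.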

5.2 Function Definition

# ── monte_carlo_pi function ───────────────────────────
monte_carlo_pi <- function(n_points) {

  # Generate random points (x, y) between -1 and 1
  x <- runif(n_points, -1, 1)
  y <- runif(n_points, -1, 1)

  # Compute the distance from the center (0,0)
  distances     <- sqrt(x^2 + y^2)
  inside_circle <- distances <= 1

  # Estimate Pi
  pi_estimate <- 4 * sum(inside_circle) / n_points

  # Probability that a point falls in the sub-square (0 to 0.5)
  in_subsquare   <- sum(x >= 0 & x <= 0.5 & y >= 0 & y <= 0.5)
  prob_subsquare <- in_subsquare / n_points

  return(list(
    pi_estimate    = pi_estimate,
    x_inside       = x[inside_circle],
    y_inside       = y[inside_circle],
    x_outside      = x[!inside_circle],
    y_outside      = y[!inside_circle],
    prob_subsquare = prob_subsquare
  ))
}

5.3 Monte Carlo Results

set.seed(42)
n_list <- c(100, 1000, 10000, 100000)

# Run the simulation for several numbers of points
results_mc <- do.call(rbind, lapply(n_list, function(n) {
  res <- monte_carlo_pi(n)
  data.frame(
    # scientific = FALSE keeps 100000 from being printed as "1e+05"
    N_Points       = format(n, big.mark = ",", scientific = FALSE),
    Pi_Estimate    = round(res$pi_estimate, 6),
    Error          = round(abs(res$pi_estimate - pi), 6),
    Prob_SubSquare = round(res$prob_subsquare, 4)
  )
}))

knitr::kable(results_mc,
             col.names = c("N Points", "Pi Estimate", "Error", "Prob Sub-Square"),
             caption   = "Table 7: Monte Carlo Pi Estimation Results",
             format    = "html",
             row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(results_mc), 2), background = "#fcd5d5")

Table 7: Monte Carlo Pi Estimation Results
N Points	Pi Estimate	Error	Prob Sub-Square
100	3.1600	0.018407	0.0700
1,000	3.1400	0.001593	0.0520
10,000	3.1396	0.001993	0.0639
100,000	3.1390	0.002593	0.0606

5.4 Visualization

par(mfrow    = c(1, 2),
    mar      = c(5, 5, 4, 2),
    cex.main = 1.3,
    cex.lab  = 1.2,
    cex.axis = 1.1)

# ── Plot 1: Points inside vs outside the circle (n=1000) ──
set.seed(42)
res1000 <- monte_carlo_pi(1000)

plot(res1000$x_outside, res1000$y_outside,
     col = "#fca5a5", pch = 16, cex = 0.8,
     xlim = c(-1.1, 1.1), ylim = c(-1.1, 1.1),
     xlab = "x", ylab = "y",
     main = paste0("Monte Carlo Simulation (n=1,000)\nEstimated π = ",
                   round(res1000$pi_estimate, 5)),
     asp  = 1)
points(res1000$x_inside, res1000$y_inside,
       col = "#7f1d1d", pch = 16, cex = 0.8)

# Draw the circle
theta <- seq(0, 2*pi, length.out = 300)
lines(cos(theta), sin(theta), col = "black", lwd = 2)

# Draw the sub-square
rect(0, 0, 0.5, 0.5, border = "darkgreen", lwd = 2, lty = 2)

legend("topleft",
       legend = c("Inside Circle", "Outside Circle", "Sub-Square"),
       col    = c("#7f1d1d", "#fca5a5", "darkgreen"),
       pch    = c(16, 16, NA), lty = c(NA, NA, 2),
       bty    = "n", cex = 0.95)
grid(lty = "dashed", col = "gray85")

# ── Plot 2: Convergence of the Pi estimate ────────────
set.seed(42)
n_iter       <- 500
pi_conv      <- c()
inside_count <- 0

for (i in 1:n_iter) {
  x_r <- runif(1, -1, 1)
  y_r <- runif(1, -1, 1)
  if (x_r^2 + y_r^2 <= 1) inside_count <- inside_count + 1
  pi_conv <- c(pi_conv, 4 * inside_count / i)
}

plot(1:n_iter, pi_conv,
     type = "l", col = "#7f1d1d", lwd = 2,
     xlab = "Number of Iterations",
     ylab = "Estimated π",
     main = "Convergence of π Estimation\n(Monte Carlo)")
abline(h = pi, col = "black", lty = 2, lwd = 2)
legend("topright",
       legend = c("Estimated π", paste0("True π = ", round(pi, 5))),
       col    = c("#7f1d1d", "black"),
       lty    = c(1, 2), lwd = 2, bty = "n", cex = 0.95)
grid(lty = "dashed", col = "gray85")

par(mfrow = c(1, 1))

5.5 Interpretation

The Monte Carlo estimates in Table 7 are all close to the true value of π (3.14159), but the errors do not shrink monotonically: the error is 0.0184 at 100 points, 0.0016 at 1,000, 0.0020 at 10,000, and 0.0026 at 100,000. This is expected from single runs, because the error of any one run is random. What decreases reliably is the standard error, which is proportional to 1/√n, so 100 times more points buys roughly one more decimal digit of accuracy. The convergence plot shows the same behavior: the running estimate fluctuates widely at first and then settles near the true value as iterations increase. The estimated probability of landing in the sub-square [0, 0.5] × [0, 0.5] ranges from 0.052 to 0.070 across the four runs, consistent with the theoretical value of (0.5 × 0.5) / (2 × 2) = 0.0625.
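
The 1/√n behavior can be made concrete with the standard error of the estimator. The sketch below treats each point as a Bernoulli trial with success probability π/4, which is a standard result for this setup rather than something computed in the task.

```r
n_list <- c(100, 1000, 10000, 100000)

# Each point is a Bernoulli trial with success probability p = pi / 4,
# so the estimator 4 * mean(inside) has standard error 4 * sqrt(p * (1 - p) / n)
p         <- pi / 4
std_error <- 4 * sqrt(p * (1 - p) / n_list)

print(data.frame(
  N_Points  = format(n_list, big.mark = ",", scientific = FALSE),
  Std_Error = signif(std_error, 3)
))
```

The observed errors in Table 7 can be read against these values: a single run typically lands within one or two standard errors of π, so an error of 0.0026 at 100,000 points is unremarkable, while the small error at 1,000 points is partly luck.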

6 Advanced Data Transformation & Feature Engineering

6.1 Introduction

This task applies two transformation techniques, Min-Max Normalization and Z-Score Standardization, to numerical columns using loop-based functions. New features are also engineered from existing data: performance_category, salary_bracket, and KPI_tier. Distributions before and after Min-Max normalization are compared using histograms, and the original data is compared with its Z-scores using boxplots.

6.2 Function Definition

# ── normalize_columns function (Min-Max) ──────────────
normalize_columns <- function(df, columns) {
  df_norm <- df
  for (col in columns) {
    min_val <- min(df_norm[[col]])
    max_val <- max(df_norm[[col]])
    df_norm[[paste0(col, "_normalized")]] <- (df_norm[[col]] - min_val) / (max_val - min_val)
  }
  return(df_norm)
}

# ── z_score function ──────────────────────────────────
z_score <- function(df, columns) {
  df_z <- df
  for (col in columns) {
    mean_val <- mean(df_z[[col]])
    std_val  <- sd(df_z[[col]])
    df_z[[paste0(col, "_zscore")]] <- (df_z[[col]] - mean_val) / std_val
  }
  return(df_z)
}

# ── Feature engineering function ──────────────────────
create_features <- function(df) {
  perf_cat    <- c()
  sal_bracket <- c()
  kpi_tier    <- c()

  for (i in 1:nrow(df)) {
    # Performance category
    if      (df$performance_score[i] >= 90) perf_cat <- c(perf_cat, "Excellent")
    else if (df$performance_score[i] >= 75) perf_cat <- c(perf_cat, "Very Good")
    else if (df$performance_score[i] >= 60) perf_cat <- c(perf_cat, "Good")
    else                                    perf_cat <- c(perf_cat, "Average")

    # Salary bracket
    if      (df$salary[i] >= 16000) sal_bracket <- c(sal_bracket, "Very High")
    else if (df$salary[i] >= 11000) sal_bracket <- c(sal_bracket, "High")
    else if (df$salary[i] >= 7000)  sal_bracket <- c(sal_bracket, "Medium")
    else                            sal_bracket <- c(sal_bracket, "Low")

    # KPI tier
    if      (df$KPI_score[i] >= 90) kpi_tier <- c(kpi_tier, "Platinum")
    else if (df$KPI_score[i] >= 75) kpi_tier <- c(kpi_tier, "Gold")
    else if (df$KPI_score[i] >= 60) kpi_tier <- c(kpi_tier, "Silver")
    else                            kpi_tier <- c(kpi_tier, "Bronze")
  }

  df$performance_category <- perf_cat
  df$salary_bracket       <- sal_bracket
  df$KPI_tier             <- kpi_tier
  return(df)
}

6.3 Generate Data & Apply Transformation

set.seed(42)
n           <- 200
departments <- c("HR", "Finance", "Engineering", "Marketing", "Operations")

df_raw <- data.frame(
  employee_id       = paste0("E", sprintf("%03d", 1:n)),
  salary            = sample(4000:20000, n, replace = TRUE),
  department        = sample(departments, n, replace = TRUE),
  performance_score = round(runif(n, 50, 100), 2),
  KPI_score         = round(runif(n, 50, 100), 2)
)

num_cols    <- c("salary", "performance_score", "KPI_score")
df_norm     <- normalize_columns(df_raw, num_cols)
df_z        <- z_score(df_raw, num_cols)
df_featured <- create_features(df_raw)

# Display the normalization table
knitr::kable(head(df_norm[, c("employee_id", "salary", "salary_normalized",
                               "performance_score", "performance_score_normalized")], 8),
             col.names = c("Employee ID", "Salary", "Salary Normalized",
                           "Performance", "Performance Normalized"),
             caption   = "Table 8: Min-Max Normalization Results (First 8 Rows)",
             format    = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, 8, 2), background = "#fcd5d5")

Table 8: Min-Max Normalization Results (First 8 Rows)
Employee ID	Salary	Salary Normalized	Performance	Performance Normalized
E001	14800	0.6751596	87.96	0.7621535
E002	16260	0.7665582	65.26	0.3061470
E003	6368	0.1473019	58.28	0.1659301
E004	9272	0.3290973	51.64	0.0325432
E005	13289	0.5805684	56.83	0.1368019
E006	5251	0.0773757	58.86	0.1775814
E007	19505	0.9697008	75.98	0.5214946
E008	12825	0.5515212	90.56	0.8143833

6.4 Feature Engineering Results

# Summary of the new features
feat_summary <- data.frame(
  Feature    = c("performance_category", "salary_bracket", "KPI_tier"),
  Categories = c(paste(unique(df_featured$performance_category), collapse = ", "),
                 paste(unique(df_featured$salary_bracket),       collapse = ", "),
                 paste(unique(df_featured$KPI_tier),             collapse = ", "))
)

knitr::kable(feat_summary,
             col.names = c("New Feature", "Categories"),
             caption   = "Table 9: New Engineered Features",
             format    = "html",
             row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(feat_summary), 2), background = "#fcd5d5")

Table 9: New Engineered Features
New Feature	Categories
performance_category	Very Good, Good, Average, Excellent
salary_bracket	High, Very High, Low, Medium
KPI_tier	Gold, Platinum, Silver, Bronze

6.5 Visualization

par(mfrow    = c(2, 3),
    mar      = c(5, 5, 4, 2),
    cex.main = 1.2,
    cex.lab  = 1.1,
    cex.axis = 1.0)

# ── Histogram: Before vs After Normalization ───────────
cols_before <- c("salary", "performance_score", "KPI_score")
cols_after  <- c("salary_normalized", "performance_score_normalized", "KPI_score_normalized")
titles_b    <- c("Salary (Before)", "Performance Score (Before)", "KPI Score (Before)")
titles_a    <- c("Salary Normalized (After)", "Performance Normalized (After)", "KPI Normalized (After)")

for (i in 1:3) {
  hist(df_raw[[cols_before[i]]],
       col    = "#fca5a5",
       border = "white",
       main   = titles_b[i],
       xlab   = "Value",
       ylab   = "Frequency")
  grid(lty = "dashed", col = "gray85")
}

for (i in 1:3) {
  hist(df_norm[[cols_after[i]]],
       col    = "#7f1d1d",
       border = "white",
       main   = titles_a[i],
       xlab   = "Normalized Value (0-1)",
       ylab   = "Frequency")
  grid(lty = "dashed", col = "gray85")
}

par(mfrow = c(1, 1))

par(mfrow    = c(1, 2),
    mar      = c(6, 5, 4, 2),
    cex.main = 1.2,
    cex.lab  = 1.1,
    cex.axis = 1.0)

# ── Boxplot: Original Data vs Z-Score ─────────────────
boxplot(df_raw[, num_cols],
        col    = "#fca5a5",
        border = "#7f1d1d",
        main   = "Original Data (Boxplot)",
        ylab   = "Value",
        las    = 2)
grid(lty = "dashed", col = "gray85")

z_cols <- paste0(num_cols, "_zscore")
boxplot(df_z[, z_cols],
        col    = "#7f1d1d",
        border = "#b91c1c",
        main   = "After Z-Score Standardization (Boxplot)",
        names  = c("Salary\nZ-Score", "Performance\nZ-Score", "KPI\nZ-Score"),
        ylab   = "Z-Score Value",
        las    = 2)
grid(lty = "dashed", col = "gray85")

par(mfrow = c(1, 1))

6.6 Interpretation

Min-Max normalization rescales each numerical column to the range 0 to 1, making columns directly comparable regardless of their original scale. Z-Score standardization transforms each column to have a mean of 0 and a standard deviation of 1, which is useful for algorithms sensitive to data scale. Both are linear transformations, so they change the scale but not the shape of a distribution. The histograms confirm this for Min-Max normalization, and the boxplots show that after Z-Score standardization all three columns sit on a common scale, whereas salary dwarfs the two scores in the original data. The newly engineered features (performance_category, salary_bracket, and KPI_tier) provide categorical labels that simplify further analysis and reporting.
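
A small worked example makes the two transformations concrete. The four salary values below are toy numbers chosen for easy arithmetic and are not taken from the dataset.

```r
salary <- c(4000, 8000, 12000, 20000)   # toy values, not taken from the dataset

min_max <- (salary - min(salary)) / (max(salary) - min(salary))
z       <- (salary - mean(salary)) / sd(salary)

# Both are linear maps a * x + b with a > 0, so ordering and relative spacing are unchanged
print(min_max)
print(round(z, 3))
print(c(mean = mean(z), sd = sd(z)))

# scale() from base R computes the same z-scores
stopifnot(isTRUE(all.equal(as.numeric(scale(salary)), z)))
```

Note that Min-Max normalization divides by max minus min, so a column with a single repeated value would produce NaN; a guard for that case is worth adding before applying it to arbitrary data.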

7 Mini Project - Company KPI Dashboard & Simulation

7.1 Introduction

This mini project generates a complete employee dataset for 7 companies with 50–200 employees each. The dataset includes employee ID, company ID, salary, department, performance score, and KPI score. Employees are categorized into KPI tiers using a loop-based function. Results are summarized per company and visualized through multiple advanced plots including grouped bar charts, scatter plots with regression lines, and salary distribution boxplots.

7.2 Function Definition

# ── Dataset generation function ───────────────────────
generate_dashboard_data <- function(n_company, min_emp, max_emp) {
  departments <- c("HR", "Finance", "Engineering", "Marketing", "Operations")
  data        <- data.frame()

  # Nested loop: per company → per employee
  for (company_id in 1:n_company) {
    n_emp <- sample(min_emp:max_emp, 1)
    for (emp_num in 1:n_emp) {
      salary            <- sample(4000:20000, 1)
      department        <- sample(departments, 1)
      performance_score <- round(runif(1, 50, 100), 2)
      KPI_score         <- round(runif(1, 50, 100), 2)

      data <- rbind(data, data.frame(
        employee_id       = paste0("C", company_id, "_E", sprintf("%03d", emp_num)),
        company_id        = paste0("Company ", company_id),
        salary            = salary,
        department        = department,
        performance_score = performance_score,
        KPI_score         = KPI_score
      ))
    }
  }
  return(data)
}

# ── KPI tier categorization function ──────────────────
categorize_kpi <- function(df) {
  tiers <- c()
  for (kpi in df$KPI_score) {
    if      (kpi >= 90) tiers <- c(tiers, "Platinum")
    else if (kpi >= 75) tiers <- c(tiers, "Gold")
    else if (kpi >= 60) tiers <- c(tiers, "Silver")
    else                tiers <- c(tiers, "Bronze")
  }
  return(tiers)
}

# ── Per-company summary function ──────────────────────
summarize_companies <- function(df) {
  companies <- unique(df$company_id)
  summary   <- data.frame()

  for (comp in companies) {
    subset_df <- df[df$company_id == comp, ]
    summary   <- rbind(summary, data.frame(
      Company         = comp,
      Total_Employees = nrow(subset_df),
      Avg_Salary      = round(mean(subset_df$salary), 2),
      Avg_KPI         = round(mean(subset_df$KPI_score), 2),
      Avg_Performance = round(mean(subset_df$performance_score), 2),
      Top_Performers  = sum(subset_df$KPI_score > 90)
    ))
  }
  return(summary)
}
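
The loop in categorize_kpi applies the thresholds 90, 75, and 60 one score at a time. As a hedged alternative, the same thresholds can be applied to a whole vector with base R's cut(). The function name categorize_kpi_vec is hypothetical and is not used elsewhere in this report.

```r
# Hypothetical vectorized variant of the tier rule (name is illustrative).
# right = FALSE makes each interval closed on the left, i.e. [60, 75),
# which matches the ">=" comparisons in the loop-based version.
categorize_kpi_vec <- function(kpi) {
  as.character(cut(kpi,
                   breaks = c(-Inf, 60, 75, 90, Inf),
                   labels = c("Bronze", "Silver", "Gold", "Platinum"),
                   right  = FALSE))
}

# Boundary values land in the higher tier, as in the loop version
categorize_kpi_vec(c(50.37, 60, 74.99, 75, 90, 97.33))
# "Bronze" "Silver" "Silver" "Gold" "Platinum" "Platinum"
```

Note that a score of exactly 90 is Platinum under this rule, while the Top_Performers count uses the strict comparison KPI_score > 90.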

7.3 Generate Data

set.seed(42)
df <- generate_dashboard_data(n_company = 7, min_emp = 50, max_emp = 200)
df$KPI_tier <- categorize_kpi(df)

cat("Total Employees:", nrow(df), "\n")

## Total Employees: 841

cat("Total Companies:", length(unique(df$company_id)), "\n")

## Total Companies: 7

knitr::kable(head(df, 10),
             col.names = c("Employee ID", "Company ID", "Salary",
                           "Department", "Performance Score", "KPI Score", "KPI Tier"),
             caption   = "Table 10: Company Dataset Preview (First 10 Rows)",
             format    = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, 10, 2), background = "#fcd5d5")

Table 10: Company Dataset Preview (First 10 Rows)
Employee ID	Company ID	Salary	Department	Performance Score	KPI Score	KPI Tier
C1_E001	Company 1	16260	HR	91.52	82.09	Gold
C1_E002	Company 1	5251	Finance	56.73	82.85	Gold
C1_E003	Company 1	17439	Marketing	73.11	97.00	Platinum
C1_E004	Company 1	18957	Marketing	73.75	78.02	Gold
C1_E005	Company 1	14094	Engineering	99.44	97.33	Platinum
C1_E006	Company 1	9402	Marketing	69.51	95.29	Platinum
C1_E007	Company 1	16908	Operations	86.88	90.55	Platinum
C1_E008	Company 1	13051	Engineering	91.65	50.37	Bronze
C1_E009	Company 1	17609	Engineering	71.79	51.87	Bronze
C1_E010	Company 1	18649	Marketing	94.39	82.00	Gold
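
Growing a data frame with rbind() inside a nested loop copies the accumulated rows on every iteration, which becomes slow as the dataset grows. The sketch below shows one possible vectorized alternative; the function name generate_dashboard_data_vec is hypothetical. It draws random numbers in a different order than the nested loop, so even with set.seed(42) it would not reproduce the values shown in the tables of this report.

```r
# Hypothetical vectorized generator (illustrative, not the report's function)
generate_dashboard_data_vec <- function(n_company, min_emp, max_emp) {
  departments <- c("HR", "Finance", "Engineering", "Marketing", "Operations")

  # One headcount per company, then expand to one row per employee
  n_emp       <- sample(min_emp:max_emp, n_company, replace = TRUE)
  total       <- sum(n_emp)
  company_idx <- rep(seq_len(n_company), times = n_emp)
  emp_num     <- sequence(n_emp)   # 1..n within each company

  data.frame(
    employee_id       = paste0("C", company_idx, "_E", sprintf("%03d", emp_num)),
    company_id        = paste0("Company ", company_idx),
    salary            = sample(4000:20000, total, replace = TRUE),
    department        = sample(departments, total, replace = TRUE),
    performance_score = round(runif(total, 50, 100), 2),
    KPI_score         = round(runif(total, 50, 100), 2)
  )
}

set.seed(42)
df_vec <- generate_dashboard_data_vec(n_company = 7, min_emp = 50, max_emp = 200)
str(df_vec)
```

The columns and value ranges match the loop-based version, so the downstream functions should accept either data frame.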

7.4 Summary per Company

summary_df <- summarize_companies(df)

knitr::kable(summary_df,
             col.names = c("Company", "Total Employees", "Avg Salary",
                           "Avg KPI", "Avg Performance", "Top Performers"),
             caption   = "Table 11: Summary Statistics per Company",
             format    = "html",
             row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(summary_df), 2), background = "#fcd5d5")

Table 11: Summary Statistics per Company
Company	Total Employees	Avg Salary	Avg KPI	Avg Performance	Top Performers
Company 1	98	12785.39	76.29	74.94	22
Company 2	95	11197.06	76.14	74.10	25
Company 3	76	11693.68	74.08	74.52	14
Company 4	129	11852.91	73.62	74.82	17
Company 5	168	11934.82	75.65	75.51	41
Company 6	142	12184.23	75.98	73.79	30
Company 7	133	11858.23	75.76	73.67	34
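
A per-company summary like Table 11 can also be computed without an explicit loop. The following is a minimal sketch using base R's aggregate() and tapply() on a small toy data frame; the toy values are invented for illustration and are not taken from the simulated dataset.

```r
# Toy data with the same column names as df (values are illustrative)
toy <- data.frame(
  company_id        = rep(c("Company 1", "Company 2"), times = c(4, 3)),
  salary            = c(5000, 7000, 9000, 11000, 6000, 8000, 10000),
  KPI_score         = c(95, 80, 91, 60, 90, 92.5, 70),
  performance_score = c(70, 80, 90, 60, 75, 85, 65),
  stringsAsFactors  = FALSE
)

# Group means for the three numeric columns in one call
summary_toy <- aggregate(cbind(salary, KPI_score, performance_score) ~ company_id,
                         data = toy, FUN = mean)

# Headcount and count of KPI > 90, aligned to the rows of summary_toy
key <- as.character(summary_toy$company_id)
summary_toy$Total_Employees <- as.vector(table(toy$company_id)[key])
summary_toy$Top_Performers  <- as.vector(
  tapply(toy$KPI_score > 90, toy$company_id, sum)[key]
)

summary_toy
```

In the toy data, the employee with a KPI score of exactly 90 is not counted as a top performer, because the comparison is strict.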

7.5 Top Performers per Company

top_perf <- df[df$KPI_score > 90, ]

top_table <- do.call(rbind, lapply(unique(df$company_id), function(comp) {
  subset_top <- top_perf[top_perf$company_id == comp, ]
  if (nrow(subset_top) > 0) {
    head(subset_top[order(-subset_top$KPI_score),
                    c("employee_id", "company_id", "department",
                      "salary", "performance_score", "KPI_score", "KPI_tier")], 3)
  }
}))

knitr::kable(top_table,
             col.names = c("Employee ID", "Company", "Department",
                           "Salary", "Performance", "KPI Score", "KPI Tier"),
             caption   = "Table 12: Top 3 Performers per Company (KPI > 90)",
             format    = "html",
             row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(top_table), 2), background = "#fcd5d5")

Table 12: Top 3 Performers per Company (KPI > 90)
Employee ID	Company	Department	Salary	Performance	KPI Score	KPI Tier
C1_E050	Company 1	Finance	10626	50.88	99.83	Platinum
C1_E068	Company 1	Finance	7070	95.05	99.54	Platinum
C1_E013	Company 1	HR	15618	83.78	99.14	Platinum
C2_E066	Company 2	HR	7489	97.23	99.61	Platinum
C2_E043	Company 2	Finance	10826	60.23	97.75	Platinum
C2_E089	Company 2	HR	6419	71.73	96.81	Platinum
C3_E042	Company 3	HR	12940	70.48	98.50	Platinum
C3_E040	Company 3	Finance	8226	73.71	97.46	Platinum
C3_E049	Company 3	Finance	5535	67.89	96.61	Platinum
C4_E091	Company 4	Engineering	7261	59.55	99.97	Platinum
C4_E072	Company 4	Engineering	16350	55.75	99.89	Platinum
C4_E123	Company 4	Finance	6518	55.15	99.27	Platinum
C5_E028	Company 5	Operations	18944	59.55	99.99	Platinum
C5_E060	Company 5	Marketing	17872	84.50	99.77	Platinum
C5_E032	Company 5	Marketing	10836	86.00	99.17	Platinum
C6_E009	Company 6	Engineering	10282	93.18	99.96	Platinum
C6_E139	Company 6	HR	9008	90.55	99.51	Platinum
C6_E078	Company 6	Operations	13529	58.37	99.24	Platinum
C7_E074	Company 7	Engineering	7871	84.06	98.52	Platinum
C7_E077	Company 7	Marketing	18225	53.03	98.52	Platinum
C7_E095	Company 7	HR	7629	97.21	98.41	Platinum

7.6 Visualization

colors    <- c("#7f1d1d","#b91c1c","#ef4444","#f97316","#fca5a5","#fcd5d5","#fee2e2")
companies <- unique(df$company_id)

par(mfrow    = c(2, 2),
    mar      = c(6, 5, 4, 2),
    cex.main = 1.2,
    cex.lab  = 1.1,
    cex.axis = 0.95)

# ── Plot 1: Avg Salary per Company ────────────────────
barplot(summary_df$Avg_Salary,
        names.arg = paste0("C", 1:7),
        col       = colors,
        border    = "white",
        main      = "Average Salary per Company",
        xlab      = "Company",
        ylab      = "Average Salary",
        ylim      = c(0, max(summary_df$Avg_Salary) * 1.2))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")

# ── Plot 2: Avg KPI per Company ───────────────────────
barplot(summary_df$Avg_KPI,
        names.arg = paste0("C", 1:7),
        col       = colors,
        border    = "white",
        main      = "Average KPI Score per Company",
        xlab      = "Company",
        ylab      = "Average KPI Score",
        ylim      = c(0, max(summary_df$Avg_KPI) * 1.2))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")

# ── Plot 3: Top Performers per Company ───────────────
barplot(summary_df$Top_Performers,
        names.arg = paste0("C", 1:7),
        col       = colors,
        border    = "white",
        main      = "Top Performers per Company (KPI > 90)",
        xlab      = "Company",
        ylab      = "Number of Top Performers",
        ylim      = c(0, max(summary_df$Top_Performers) * 1.3))
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")

# ── Plot 4: Scatter Performance vs KPI + Regression ──
plot(df$performance_score, df$KPI_score,
     col  = colors[as.numeric(as.factor(df$company_id))],
     pch  = 16, cex = 0.9,
     xlab = "Performance Score",
     ylab = "KPI Score",
     main = "Performance vs KPI Score\n(with Regression Line)")

# Regression line
fit <- lm(KPI_score ~ performance_score, data = df)
abline(fit, col = "black", lwd = 2.5, lty = 2)
legend("topleft",
       legend = c(paste0("C", 1:7), "Regression"),
       col    = c(colors, "black"),
       pch    = c(rep(16, 7), NA),
       lty    = c(rep(NA, 7), 2),
       lwd    = c(rep(NA, 7), 2),
       bty    = "n", cex = 0.8)
grid(lty = "dashed", col = "gray85")

par(mfrow = c(1, 1))

par(mfrow    = c(1, 2),
    mar      = c(6, 5, 4, 2),
    cex.main = 1.2,
    cex.lab  = 1.1,
    cex.axis = 0.95)

# ── Plot 5: Salary Distribution Boxplot ───────────────
salary_list <- lapply(companies, function(comp) {
  df[df$company_id == comp, "salary"]
})

boxplot(salary_list,
        names  = paste0("C", 1:7),
        col    = colors,
        border = "#7f1d1d",
        main   = "Salary Distribution per Company",
        xlab   = "Company",
        ylab   = "Salary",
        las    = 1)
grid(lty = "dashed", col = "gray85")

# ── Plot 6: KPI Tier Distribution ─────────────────────
tier_order  <- c("Platinum", "Gold", "Silver", "Bronze")
tier_colors <- c("#7f1d1d", "#b91c1c", "#ef4444", "#fca5a5")
tier_counts <- sapply(tier_order, function(t) sum(df$KPI_tier == t))

pie(tier_counts,
    labels  = paste0(tier_order, "\n", round(tier_counts/sum(tier_counts)*100, 1), "%"),
    col     = tier_colors,
    main    = "KPI Tier Distribution\n(All Companies)",
    cex     = 1.0)

par(mfrow = c(1, 1))

par(mar      = c(7, 5, 4, 2),
    cex.main = 1.2,
    cex.lab  = 1.1,
    cex.axis = 0.85)

# ── Plot 7: Grouped Bar Chart - Avg KPI per Dept ──────
dept_list   <- c("HR", "Finance", "Engineering", "Marketing", "Operations")
n_dept      <- length(dept_list)
n_comp      <- length(companies)
dept_matrix <- matrix(0, nrow = n_dept, ncol = n_comp)

for (i in 1:n_dept) {
  for (j in 1:n_comp) {
    subset_dj <- df[df$department == dept_list[i] & df$company_id == companies[j], ]
    dept_matrix[i, j] <- if (nrow(subset_dj) > 0) round(mean(subset_dj$KPI_score), 2) else 0
  }
}

barplot(dept_matrix,
        beside    = TRUE,
        names.arg = paste0("C", 1:7),
        col       = c("#7f1d1d","#b91c1c","#ef4444","#f97316","#fca5a5"),
        border    = "white",
        main      = "Average KPI Score per Department per Company\n(Grouped Bar Chart)",
        xlab      = "Company",
        ylab      = "Average KPI Score",
        ylim      = c(0, 110),
        las       = 1)

legend("topright",
       legend = dept_list,
       fill   = c("#7f1d1d","#b91c1c","#ef4444","#f97316","#fca5a5"),
       bty    = "n", cex = 0.9)
grid(nx = NA, ny = NULL, lty = "dashed", col = "gray85")

7.7 Interpretation

The KPI dashboard summarizes employee performance across the 7 simulated companies. Average salaries are close across companies (from 11,197.06 to 12,785.39), reflecting the identical salary range used for every company in the simulation; average KPI scores likewise fall in a narrow band (73.62 to 76.29). The scatter plot with regression line shows no strong linear relationship between performance score and KPI score, which is expected because the two scores are drawn independently in generate_dashboard_data. The grouped bar chart shows that average KPI varies across departments within each company, but because departments are assigned at random, these differences reflect sampling variation rather than a systematic departmental effect: Finance has the highest average KPI in Companies 1, 3, 5, and 6, while Marketing, Engineering, and HR lead in Companies 2, 4, and 7 respectively. The salary boxplot confirms a wide spread of salaries within each company, consistent with the 4,000 to 20,000 sampling range. The KPI tier pie chart shows that the majority of employees fall in the Silver and Gold tiers, with roughly a fifth reaching Platinum status.
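
The weak relationship between performance score and KPI score can be quantified with a correlation coefficient and the R-squared of the fitted line. The sketch below regenerates two independent uniform score vectors of the same size as the dataset; the seed and variable names are arbitrary choices for illustration, so the numbers will differ from those behind the plots above.

```r
set.seed(123)   # arbitrary seed for this illustration, not the report's seed
n    <- 841     # same number of rows as the simulated dataset
perf <- round(runif(n, 50, 100), 2)
kpi  <- round(runif(n, 50, 100), 2)

fit <- lm(kpi ~ perf)
r   <- cor(perf, kpi)
r2  <- summary(fit)$r.squared

# For simple linear regression, R-squared equals the squared correlation
c(correlation = r, r_squared = r2)

# Expected tier shares under a uniform(50, 100) score:
# Bronze [50, 60) 20%, Silver [60, 75) 30%, Gold [75, 90) 30%, Platinum [90, 100] 20%
```

With independent draws, the correlation should be close to zero, so the slope of the regression line carries little information.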

8 Automated Report Generation per Company

8.1 Introduction

This bonus task builds an automated report generation system using functions and loops. For each company in the dataset, a structured summary is generated automatically — including key statistics, top performers, department breakdown, and visualizations. The system loops through all companies and compiles results into a unified HTML-ready output, simulating a real-world automated reporting pipeline.

8.2 Function Definition

# ── Function generate_single_report ───────────────────
generate_single_report <- function(df, company_name) {
  subset_df <- df[df$company_id == company_name, ]

  # Basic stats
  total_emp  <- nrow(subset_df)
  avg_salary <- round(mean(subset_df$salary), 2)
  avg_kpi    <- round(mean(subset_df$KPI_score), 2)
  avg_perf   <- round(mean(subset_df$performance_score), 2)
  top_count  <- sum(subset_df$KPI_score > 90)
  max_kpi    <- round(max(subset_df$KPI_score), 2)
  min_salary <- min(subset_df$salary)
  max_salary <- max(subset_df$salary)

  # KPI tier distribution
  tier_order  <- c("Platinum", "Gold", "Silver", "Bronze")
  tier_counts <- sapply(tier_order, function(t) sum(subset_df$KPI_tier == t))
  tier_pct    <- round(tier_counts / total_emp * 100, 1)

  # Department breakdown
  dept_summary <- do.call(rbind, lapply(unique(subset_df$department), function(dept) {
    dept_df <- subset_df[subset_df$department == dept, ]
    data.frame(
      Department     = dept,
      Count          = nrow(dept_df),
      Avg_Salary     = round(mean(dept_df$salary), 2),
      Avg_KPI        = round(mean(dept_df$KPI_score), 2),
      Top_Performers = sum(dept_df$KPI_score > 90)
    )
  }))
  dept_summary <- dept_summary[order(-dept_summary$Avg_KPI), ]

  return(list(
    company      = company_name,
    total_emp    = total_emp,
    avg_salary   = avg_salary,
    avg_kpi      = avg_kpi,
    avg_perf     = avg_perf,
    top_count    = top_count,
    max_kpi      = max_kpi,
    min_salary   = min_salary,
    max_salary   = max_salary,
    tier_counts  = tier_counts,
    tier_pct     = tier_pct,
    dept_summary = dept_summary,
    subset_df    = subset_df
  ))
}

# ── Function print_report_table ───────────────────────
print_report_table <- function(report) {
  summary_tbl <- data.frame(
    Metric = c("Total Employees", "Average Salary", "Average KPI Score",
               "Average Performance", "Top Performers (KPI > 90)",
               "Highest KPI Score", "Salary Range"),
    Value  = c(
      report$total_emp,
      paste0("$", format(report$avg_salary, big.mark = ",")),
      report$avg_kpi,
      report$avg_perf,
      paste0(report$top_count, " employees"),
      report$max_kpi,
      paste0("$", format(report$min_salary, big.mark = ","),
             " - $", format(report$max_salary, big.mark = ","))
    )
  )

  print(
    knitr::kable(summary_tbl,
                 col.names = c("Metric", "Value"),
                 format    = "html",
                 row.names = FALSE) %>%
      kable_styling(bootstrap_options = c("striped", "hover"),
                    full_width        = TRUE) %>%
      row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
      row_spec(seq(2, nrow(summary_tbl), 2), background = "#fcd5d5")
  )
}

# ── Function print_dept_table ─────────────────────────
print_dept_table <- function(report) {
  print(
    knitr::kable(report$dept_summary,
                 col.names = c("Department", "Headcount", "Avg Salary",
                               "Avg KPI", "Top Performers"),
                 format    = "html",
                 row.names = FALSE) %>%
      kable_styling(bootstrap_options = c("striped", "hover"),
                    full_width        = TRUE) %>%
      row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
      row_spec(seq(2, nrow(report$dept_summary), 2), background = "#fcd5d5")
  )
}

# ── Function plot_company_report ──────────────────────
plot_company_report <- function(report) {
  tier_order  <- c("Platinum", "Gold", "Silver", "Bronze")
  tier_colors <- c("#7f1d1d", "#b91c1c", "#ef4444", "#fca5a5")
  subset_df   <- report$subset_df

  par(mfrow    = c(1, 3),
      mar      = c(5, 5, 4, 2),
      cex.main = 1.2,
      cex.lab  = 1.1,
      cex.axis = 1.0)

  # ── Plot 1: KPI Tier Pie Chart ─────────────────────
  pie(report$tier_counts,
      labels = paste0(tier_order, "\n", report$tier_pct, "%"),
      col    = tier_colors,
      main   = paste0(report$company, "\nKPI Tier Distribution"),
      cex    = 0.95)

  # ── Plot 2: Salary Distribution Histogram ──────────
  hist(subset_df$salary,
       col    = "#fca5a5",
       border = "white",
       main   = paste0(report$company, "\nSalary Distribution"),
       xlab   = "Salary",
       ylab   = "Frequency")
  abline(v   = report$avg_salary,
         col = "#7f1d1d", lwd = 2, lty = 2)
  legend("topright",
         legend = paste0("Avg: $", format(report$avg_salary, big.mark = ",")),
         col    = "#7f1d1d", lty = 2, lwd = 2,
         bty    = "n", cex = 0.9)
  grid(lty = "dashed", col = "gray85")

  # ── Plot 3: Scatter Performance vs KPI ─────────────
  plot(subset_df$performance_score, subset_df$KPI_score,
       col  = "#7f1d1d", pch = 16, cex = 1.1,
       xlab = "Performance Score",
       ylab = "KPI Score",
       main = paste0(report$company, "\nPerformance vs KPI"))
  fit <- lm(KPI_score ~ performance_score, data = subset_df)
  abline(fit, col = "black", lwd = 2, lty = 2)
  abline(h = 90, col = "#b91c1c", lwd = 1.5, lty = 3)
  legend("topleft",
         legend = c("Regression", "KPI = 90"),
         col    = c("black", "#b91c1c"),
         lty    = c(2, 3), lwd = 2,
         bty    = "n", cex = 0.85)
  grid(lty = "dashed", col = "gray85")

  par(mfrow = c(1, 1))
}
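
In print_report_table, the Value column is built with c() from a mix of numbers and strings, so every element is coerced to character and numeric values lose trailing zeros (for example 74.1 instead of 74.10). A possible way to keep two decimals is to format each number explicitly before combining. The helper name fmt2 is hypothetical.

```r
# c() with mixed types coerces everything to character
c(95, 74.1, "25 employees")
# "95" "74.1" "25 employees"

# Hypothetical helper: fixed two decimals with a thousands separator
fmt2 <- function(x) formatC(x, format = "f", digits = 2, big.mark = ",")

fmt2(74.1)       # "74.10"
fmt2(11197.06)   # "11,197.06"

# Formatting before combining keeps the display consistent
c(paste0("$", fmt2(11197.06)), fmt2(76.14), fmt2(74.1))
```

Using such a helper for avg_salary, avg_kpi, avg_perf, and max_kpi would make the summary tables consistent with the two-decimal values in the consolidated table.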

8.3 Generate All Company Reports

The loop below automatically generates a complete structured report for each company in the dataset — including summary statistics, department breakdown table, and three visualizations. This simulates an automated reporting pipeline where output is produced per entity without manual intervention.

companies_list <- unique(df$company_id)

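# ── Main loop: generate one report per company ────────
# Note: the chunk needs the option results = 'asis' so that the cat()
# headings and the printed kable tables render as markdown/HTML.
for (comp in companies_list) {

  # Company header
  cat(paste0("\n\n### ", comp, "\n\n"))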

  # Generate report object
  report <- generate_single_report(df, comp)

  # Summary stats table
  cat("**Summary Statistics**\n\n")
  print_report_table(report)

  # Department breakdown table
  cat("\n\n**Department Breakdown**\n\n")
  print_dept_table(report)

  # Visualizations
  cat("\n\n")
  plot_company_report(report)

  cat("\n\n---\n\n")
}

8.3.1 Company 1

Summary Statistics

Metric	Value
Total Employees	98
Average Salary	$12,785.39
Average KPI Score	76.29
Average Performance	74.94
Top Performers (KPI > 90)	22 employees
Highest KPI Score	99.83
Salary Range	$4,001 - $19,721

Department Breakdown

Department	Headcount	Avg Salary	Avg KPI	Top Performers
Finance	17	10293.00	78.95	3
Marketing	19	12195.68	78.33	5
HR	28	13891.96	76.32	6
Engineering	14	11845.21	74.92	5
Operations	20	14573.05	73.03	3

8.3.2 Company 2

Summary Statistics

Metric	Value
Total Employees	95
Average Salary	$11,197.06
Average KPI Score	76.14
Average Performance	74.1
Top Performers (KPI > 90)	25 employees
Highest KPI Score	99.61
Salary Range	$4,109 - $19,928

Department Breakdown

Department	Headcount	Avg Salary	Avg KPI	Top Performers
Marketing	14	11035.57	79.80	4
Finance	28	11174.54	77.21	8
HR	18	9884.22	76.63	5
Operations	18	12555.61	74.55	4
Engineering	17	11318.76	72.56	4

8.3.3 Company 3

Summary Statistics

Metric	Value
Total Employees	76
Average Salary	$11,693.68
Average KPI Score	74.08
Average Performance	74.52
Top Performers (KPI > 90)	14 employees
Highest KPI Score	98.5
Salary Range	$4,416 - $19,949

Department Breakdown

Department	Headcount	Avg Salary	Avg KPI	Top Performers
Finance	17	9687.94	79.77	6
HR	18	10905.22	77.94	3
Marketing	14	12814.57	72.88	3
Engineering	19	13073.21	71.91	2
Operations	8	12492.00	60.57	0

8.3.4 Company 4

Summary Statistics

Metric	Value
Total Employees	129
Average Salary	$11,852.91
Average KPI Score	73.62
Average Performance	74.82
Top Performers (KPI > 90)	17 employees
Highest KPI Score	99.97
Salary Range	$4,083 - $19,855

Department Breakdown

Department	Headcount	Avg Salary	Avg KPI	Top Performers
Engineering	32	10565.44	78.36	8
Operations	24	12426.54	72.66	1
Marketing	23	12826.91	72.32	5
Finance	25	12323.04	72.03	3
HR	25	11584.00	71.28	0

8.3.5 Company 5

Summary Statistics

Metric	Value
Total Employees	168
Average Salary	$11,934.82
Average KPI Score	75.65
Average Performance	75.51
Top Performers (KPI > 90)	41 employees
Highest KPI Score	99.99
Salary Range	$4,091 - $19,916

Department Breakdown

Department	Headcount	Avg Salary	Avg KPI	Top Performers
Finance	31	12049.68	79.08	12
Engineering	29	11635.17	76.53	7
HR	37	12636.59	76.07	8
Marketing	39	11867.87	75.03	10
Operations	32	11365.28	71.79	4

8.3.6 Company 6

Summary Statistics

Metric	Value
Total Employees	142
Average Salary	$12,184.23
Average KPI Score	75.98
Average Performance	73.79
Top Performers (KPI > 90)	30 employees
Highest KPI Score	99.96
Salary Range	$4,096 - $19,955

Department Breakdown

Department	Headcount	Avg Salary	Avg KPI	Top Performers
Finance	23	12689.96	78.75	5
Marketing	27	11231.67	77.04	5
Operations	36	12376.19	75.72	9
Engineering	26	12510.42	75.64	7
HR	30	12140.73	73.48	4

8.3.7 Company 7

Summary Statistics

Metric	Value
Total Employees	133
Average Salary	$11,858.23
Average KPI Score	75.76
Average Performance	73.67
Top Performers (KPI > 90)	34 employees
Highest KPI Score	98.52
Salary Range	$4,002 - $19,922

Department Breakdown

Department	Headcount	Avg Salary	Avg KPI	Top Performers
HR	24	11913.12	82.09	9
Engineering	26	12432.00	77.60	10
Finance	28	11077.61	77.49	10
Operations	26	10503.73	71.76	2
Marketing	29	13266.45	70.79	3

8.4 Consolidated Report Summary

# ── Summary table for all companies ───────────────────
all_reports <- lapply(companies_list, function(comp) {
  r <- generate_single_report(df, comp)
  data.frame(
    Company         = r$company,
    Total_Employees = r$total_emp,
    Avg_Salary      = r$avg_salary,
    Avg_KPI         = r$avg_kpi,
    Avg_Performance = r$avg_perf,
    Top_Performers  = r$top_count,
    Max_KPI         = r$max_kpi
  )
})

consolidated <- do.call(rbind, all_reports)

knitr::kable(consolidated,
             col.names = c("Company", "Employees", "Avg Salary",
                           "Avg KPI", "Avg Performance",
                           "Top Performers", "Max KPI"),
             caption   = "Table 13: Consolidated Report — All Companies",
             format    = "html",
             row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width        = TRUE) %>%
  row_spec(0, background = "#7f1d1d", color = "white", bold = TRUE) %>%
  row_spec(seq(2, nrow(consolidated), 2), background = "#fcd5d5")

Table 13: Consolidated Report — All Companies
Company	Employees	Avg Salary	Avg KPI	Avg Performance	Top Performers	Max KPI
Company 1	98	12785.39	76.29	74.94	22	99.83
Company 2	95	11197.06	76.14	74.10	25	99.61
Company 3	76	11693.68	74.08	74.52	14	98.50
Company 4	129	11852.91	73.62	74.82	17	99.97
Company 5	168	11934.82	75.65	75.51	41	99.99
Company 6	142	12184.23	75.98	73.79	30	99.96
Company 7	133	11858.23	75.76	73.67	34	98.52

8.5 Export to CSV

# ── Export summary to CSV ─────────────────────────────
write.csv(consolidated,
          file      = "company_report_summary.csv",
          row.names = FALSE)

cat("Report exported successfully: company_report_summary.csv\n")

## Report exported successfully: company_report_summary.csv

cat("  Rows:", nrow(consolidated), "| Columns:", ncol(consolidated), "\n")

##   Rows: 7 | Columns: 7

8.6 Interpretation

The automated report generation system successfully loops through all 7 companies, producing a structured report for each without any manual repetition. Each company report includes a summary statistics table, department breakdown, and three visualizations: KPI tier distribution, salary histogram, and a performance vs KPI scatter plot with regression line. The consolidated summary table aggregates all companies into a single comparative view, making it easy to identify which companies have the highest average KPI, salary, or top performer count. The CSV export ensures the results can be shared or used in downstream reporting tools. This approach demonstrates how functions and loops together create scalable, reusable reporting pipelines — a core pattern in real-world data science workflows.
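
Before handing an exported file to downstream tools, it can be useful to read it back and confirm that nothing changed in the round trip. The sketch below uses a temporary file and the first two rows of Table 13 as toy input; the variable names are illustrative.

```r
# Two rows taken from Table 13, used here as toy input
toy <- data.frame(
  Company          = c("Company 1", "Company 2"),
  Total_Employees  = c(98L, 95L),
  Avg_Salary       = c(12785.39, 11197.06),
  stringsAsFactors = FALSE
)

# Write to a temporary file so the working directory is left untouched
path <- tempfile(fileext = ".csv")
write.csv(toy, file = path, row.names = FALSE)

# Read back and compare with the original data frame
back <- read.csv(path, stringsAsFactors = FALSE)
round_trip_ok <- isTRUE(all.equal(toy, back))
round_trip_ok

unlink(path)   # clean up the temporary file
```

The same check could be applied to company_report_summary.csv by comparing the result of read.csv() with consolidated.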