Syntax and Control Flow

Practice: Conditional Statements and Loops

Ni.MD Aurora Sekarningrum

Student at Bandung Institute of Science and Technology

Major: Data Science

NIM: 5225072

Date: 2026-03-02

R Programming
Statistics
Data Science

1 Setup Library

# Install any package that is not already available
packages <- c("ggplot2", "dplyr", "tidyr", "scales",
              "gridExtra", "knitr", "kableExtra", "RColorBrewer")

for (pkg in packages) {
  if (!require(pkg, character.only = TRUE, quietly = TRUE)) {
    install.packages(pkg, quiet = TRUE)
    library(pkg, character.only = TRUE)
  }
}
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
set.seed(42)
cat("✅ All libraries loaded successfully!\n")
## ✅ All libraries loaded successfully!

2 Task 1: Dynamic Multi-Formula Function

Description: Build a function compute_formula(x, formula) that evaluates linear, quadratic, cubic, and exponential formulas, then plot all of them on a single chart for x = 1:20.

2.1 Fungsi & Komputasi

# ============================================================
# TASK 1: Dynamic Multi-Formula Function
# ============================================================

compute_formula <- function(x, formula) {
  # Validate the formula argument
  valid_formulas <- c("linear", "quadratic", "cubic", "exponential")
  if (!(formula %in% valid_formulas)) {
    stop(paste("❌ Invalid formula:", formula,
               "\nChoose from:", paste(valid_formulas, collapse = ", ")))
  }
  
  # Compute according to the formula type
  result <- switch(formula,
    "linear"      = 2 * x + 1,              # f(x) = 2x + 1
    "quadratic"   = x^2 + 3*x + 2,          # f(x) = x² + 3x + 2
    "cubic"       = x^3 - 2*x^2 + x,        # f(x) = x³ - 2x² + x
    "exponential" = exp(0.3 * x)             # f(x) = e^(0.3x)
  )
  return(result)
}

# ── Nested loop: compute every formula for x = 1:20 ──
x_values <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")

# Build the data frame with a nested loop
df_formulas <- data.frame(x = x_values)

for (formula in formulas) {
  values <- c()
  for (x in x_values) {
    values <- c(values, compute_formula(x, formula))  # nested loop
  }
  df_formulas[[formula]] <- values
}

# Display the results table
df_formulas %>%
  mutate(across(where(is.numeric), ~ round(., 2))) %>%
  kable(caption = "📊 Multi-Formula Computation Results (x = 1 to 20)",
        align = "c") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  column_spec(1, bold = TRUE, color = "#2c3e50")
📊 Multi-Formula Computation Results (x = 1 to 20)
x linear quadratic cubic exponential
1 3 6 0 1.35
2 5 12 2 1.82
3 7 20 12 2.46
4 9 30 36 3.32
5 11 42 80 4.48
6 13 56 150 6.05
7 15 72 252 8.17
8 17 90 392 11.02
9 19 110 576 14.88
10 21 132 810 20.09
11 23 156 1100 27.11
12 25 182 1452 36.60
13 27 210 1872 49.40
14 29 240 2366 66.69
15 31 272 2940 90.02
16 33 306 3600 121.51
17 35 342 4352 164.02
18 37 380 5202 221.41
19 39 420 6156 298.87
20 41 462 7220 403.43

2.2 Plot

library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
# ── Reshape to long format ──
df_long <- df_formulas %>%
  pivot_longer(cols = -x, names_to = "formula", values_to = "y") %>%
  mutate(formula = factor(formula,
                          levels = c("linear","quadratic","cubic","exponential"),
                          labels = c("Linear: f(x)=2x+1",
                                     "Quadratic: f(x)=x²+3x+2",
                                     "Cubic: f(x)=x³-2x²+x",
                                     "Exponential: f(x)=e^(0.3x)")))

# ── Build the ggplot ──
p <- ggplot(df_long, aes(x = x, y = y, color = formula, group = formula,
                         text = paste("x:", x, "<br>y:", round(y,2)))) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2.5, alpha = 0.8) +
  scale_color_manual(values = c("#2196F3","#4CAF50","#FF5722","#9C27B0")) +
  scale_x_continuous(breaks = 1:20) +
  labs(
    title    = "📈 Task 1: Multi-Formula Comparison",
    x        = "x",
    y        = "f(x)",
    color    = "Formula"
  ) +
  theme_minimal()

ggplotly(p, tooltip = "text")

Task 1 complete! The compute_formula() function validates its input and computes all four formula types inside a nested loop.
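
Because the arithmetic inside compute_formula() already works on whole vectors, the inner loop over x is not strictly required. A minimal sketch of the vectorized alternative, assuming a hypothetical list `formula_set` that restates the four formulas so the snippet runs on its own:

```r
# Hypothetical restatement of the four Task 1 formulas as a named list,
# so this sketch does not depend on objects defined elsewhere.
formula_set <- list(
  linear      = function(x) 2 * x + 1,
  quadratic   = function(x) x^2 + 3 * x + 2,
  cubic       = function(x) x^3 - 2 * x^2 + x,
  exponential = function(x) exp(0.3 * x)
)

x_values <- 1:20

# Each function receives the full vector at once; lapply() returns a
# named list that data.frame() expands into one column per formula.
df_vec <- data.frame(
  x = x_values,
  lapply(formula_set, function(f) f(x_values))
)

head(df_vec, 3)
```

The nested loop is kept in the task itself because the exercise asks for it; the vectorized form mainly avoids growing a vector one element at a time.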


3 Task 2: Nested Simulation — Multi-Sales & Discounts

Description: Simulate sales data with simulate_sales(n_salesperson, days), applying a conditional discount and computing cumulative sales per salesperson.

3.1 Fungsi & Simulasi

# ============================================================
# TASK 2: Nested Simulation — Multi-Sales & Discounts
# ============================================================

# Helper function (called from inside the simulation loop): choose the discount rate from the sales amount
apply_discount <- function(sales_amount) {
  if (sales_amount >= 900) return(0.15)
  else if (sales_amount >= 700) return(0.10)
  else if (sales_amount >= 500) return(0.07)
  else if (sales_amount >= 300) return(0.05)
  else return(0.00)
}

# Main sales simulation function
simulate_sales <- function(n_salesperson, days) {
  records <- list()
  idx     <- 1
  
  # Nested loop: per salesperson, per day
  for (sp_id in 1:n_salesperson) {
    for (day in 1:days) {
      sales_amount  <- round(runif(1, min = 100, max = 1000), 2)
      discount_rate <- apply_discount(sales_amount)  # nested function call
      net_sales     <- round(sales_amount * (1 - discount_rate), 2)
      
      records[[idx]] <- data.frame(
        sales_id      = sp_id,
        day           = day,
        sales_amount  = sales_amount,
        discount_rate = discount_rate,
        net_sales     = net_sales
      )
      idx <- idx + 1
    }
  }
  return(do.call(rbind, records))
}

# ── Simulation: 5 salespeople × 30 days ──
set.seed(42)
df_sales <- simulate_sales(n_salesperson = 5, days = 30)

# ── Compute cumulative sales per salesperson ──
df_sales <- df_sales %>%
  group_by(sales_id) %>%
  mutate(cumulative_sales = cumsum(net_sales)) %>%
  ungroup()

# ── Summary statistics ──
summary_sales <- df_sales %>%
  group_by(sales_id) %>%
  summarise(
    Total_Sales   = round(sum(sales_amount), 2),
    Total_Net     = round(sum(net_sales), 2),
    Avg_Discount  = paste0(round(mean(discount_rate) * 100, 1), "%"),
    Avg_Daily_Sales = round(mean(sales_amount), 2),
    .groups = "drop"
  ) %>%
  rename("Salesperson ID" = sales_id)

summary_sales %>%
  kable(caption = "📊 Summary Statistics per Salesperson (30 Days)",
        align = "c") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  row_spec(0, background = "#3498db", color = "white")
📊 Summary Statistics per Salesperson (30 Days)
Salesperson ID Total_Sales Total_Net Avg_Discount Avg_Daily_Sales
1 19575.11 17486.71 8.8% 652.50
2 17220.40 15659.25 6.9% 574.01
3 14295.31 13237.04 5.2% 476.51
4 17910.00 16274.47 7.3% 597.00
5 18565.48 16815.22 7.6% 618.85

3.2 Plot

# ── Plot 1: Cumulative Sales per Salesperson ──
p1 <- ggplot(df_sales,
             aes(x = day, y = cumulative_sales,
                 color = factor(sales_id), group = factor(sales_id))) +
  geom_line(linewidth = 1.1) +
  geom_point(size = 1.5, alpha = 0.7) +
  scale_color_brewer(palette = "Set1") +
  labs(
    title  = "📈 Cumulative Net Sales per Salesperson",
    x      = "Day", y = "Cumulative Net Sales",
    color  = "Salesperson"
  ) +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold"),
        legend.position = "bottom")

# ── Plot 2: Total Net Sales per Salesperson ──
total_net <- df_sales %>%
  group_by(sales_id) %>%
  summarise(total_net = sum(net_sales), .groups = "drop")

p2 <- ggplot(total_net, aes(x = factor(sales_id), y = total_net,
                             fill = factor(sales_id))) +
  geom_col(width = 0.65, show.legend = FALSE) +
  geom_text(aes(label = paste0("$", format(round(total_net), big.mark=","))),
            vjust = -0.4, size = 3.8, fontface = "bold") +
  scale_fill_brewer(palette = "Set1") +
  scale_x_discrete(labels = paste0("SP ", 1:5)) +
  labs(
    title = "💰 Total Net Sales per Salesperson",
    x     = "Salesperson", y = "Total Net Sales"
  ) +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold"),
        panel.grid.major.x = element_blank())

grid.arrange(p1, p2, ncol = 2)

Task 2 complete! simulate_sales() and its helper apply_discount() simulate daily sales with a conditional discount.
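
The if/else ladder in apply_discount() handles one value per call. A minimal sketch of a vectorized lookup with the same thresholds, using base R's findInterval(); the names `discount_breaks`, `discount_rates`, and `apply_discount_vec` are hypothetical and not part of the original task:

```r
# Thresholds and rates copied from apply_discount(); the object names
# are illustrative assumptions.
discount_breaks <- c(300, 500, 700, 900)
discount_rates  <- c(0.00, 0.05, 0.07, 0.10, 0.15)

apply_discount_vec <- function(sales_amount) {
  # findInterval() counts how many breaks are <= each amount (0 to 4),
  # which matches the ">=" comparisons of the if/else ladder.
  discount_rates[findInterval(sales_amount, discount_breaks) + 1]
}

amounts <- c(100, 300, 699.99, 700, 900, 1000)
data.frame(amount = amounts, discount = apply_discount_vec(amounts))
```

With this form, a whole column of sales amounts can be discounted in a single call, which removes the need to call the helper once per row.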


4 Task 3: Multi-Level Performance Categorization

Description: A function categorize_performance() with 5 categories: Excellent, Very Good, Good, Average, Poor.

4.1 Fungsi & Kategorisasi

# ============================================================
# TASK 3: Multi-Level Performance Categorization
# ============================================================

categorize_performance <- function(sales_amount) {
  # Categorize by sales amount
  if      (sales_amount >= 850) return("Excellent")
  else if (sales_amount >= 700) return("Very Good")
  else if (sales_amount >= 500) return("Good")
  else if (sales_amount >= 300) return("Average")
  else                          return("Poor")
}

# ── Loop through the vector to categorize each value ──
set.seed(99)
sales_vector <- runif(200, min = 100, max = 1000)

categories <- c()
for (amount in sales_vector) {              # loop through vector
  categories <- c(categories, categorize_performance(amount))
}

df_perf <- data.frame(sales_amount = sales_vector, category = categories)

# ── Compute the percentage per category ──
cat_order   <- c("Excellent", "Very Good", "Good", "Average", "Poor")
cat_summary <- df_perf %>%
  group_by(category) %>%
  summarise(Count = n(), .groups = "drop") %>%
  mutate(
    category   = factor(category, levels = cat_order),
    Percentage = round(Count / sum(Count) * 100, 2),
    Label      = paste0(Count, " (", Percentage, "%)")
  ) %>%
  arrange(category)

cat_summary %>%
  select(Category = category, Count, Percentage) %>%
  mutate(Percentage = paste0(Percentage, "%")) %>%
  kable(caption = "📊 Performance Category Distribution",
        align = "c") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE) %>%
  row_spec(1, background = "#d5f5e3") %>%
  row_spec(2, background = "#a9dfbf") %>%
  row_spec(3, background = "#fdebd0") %>%
  row_spec(4, background = "#fad7a0") %>%
  row_spec(5, background = "#f5b7b1")
📊 Performance Category Distribution
Category Count Percentage
Excellent 30 15%
Very Good 37 18.5%
Good 55 27.5%
Average 36 18%
Poor 42 21%

4.2 Plot

colors_perf <- c("Excellent" = "#1a9641",
                 "Very Good" = "#a6d96a",
                 "Good"      = "#ffffbf",
                 "Average"   = "#fdae61",
                 "Poor"      = "#d7191c")

# ── Bar Chart ──
p_bar <- ggplot(cat_summary,
                aes(x = category, y = Count, fill = category)) +
  geom_col(width = 0.65, show.legend = FALSE) +
  geom_text(aes(label = paste0(Count, "\n(", Percentage, "%)")),
            vjust = -0.3, size = 3.8, fontface = "bold") +
  scale_fill_manual(values = colors_perf) +
  scale_x_discrete(limits = cat_order) +
  labs(title = "📊 Bar Chart: Performance Category Distribution",
       x = "Category", y = "Count") +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold"),
        panel.grid.major.x = element_blank())

# ── Pie Chart ──
p_pie <- ggplot(cat_summary,
                aes(x = "", y = Count, fill = category)) +
  geom_col(width = 1, color = "white", linewidth = 0.8) +
  coord_polar("y", start = 0) +
  geom_text(aes(label = paste0(Percentage, "%")),
            position = position_stack(vjust = 0.5),
            size = 4, fontface = "bold") +
  scale_fill_manual(values = colors_perf,
                    limits = cat_order) +
  labs(title = "🥧 Pie Chart: Performance Category Distribution",
       fill = "Category") +
  theme_void(base_size = 12) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5),
        legend.position = "right")

grid.arrange(p_bar, p_pie, ncol = 2)

Task 3 complete! categorize_performance() sorts 200 simulated sales values into 5 performance tiers.
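
Since the sales values are drawn from Uniform(100, 1000), the share expected in each tier follows directly from the width of its interval. A minimal sketch that reproduces the categorization with cut() and computes those expected shares; `perf_breaks`, `perf_labels`, and `categorize_performance_vec` are hypothetical names introduced for illustration:

```r
# Tier boundaries copied from categorize_performance(); right = FALSE
# makes each interval closed on the left, matching the ">=" tests.
perf_breaks <- c(-Inf, 300, 500, 700, 850, Inf)
perf_labels <- c("Poor", "Average", "Good", "Very Good", "Excellent")

categorize_performance_vec <- function(sales_amount) {
  as.character(cut(sales_amount, breaks = perf_breaks,
                   labels = perf_labels, right = FALSE))
}

categorize_performance_vec(c(299.99, 300, 700, 849.99, 850))

# Expected share of each tier when amounts are Uniform(100, 1000):
# interval width divided by the total range of 900.
edges <- c(100, 300, 500, 700, 850, 1000)
expected_share <- setNames(diff(edges) / (1000 - 100), perf_labels)
round(100 * expected_share, 2)
```

The three lower tiers each cover 200 of the 900 units (about 22.2%) and the two upper tiers cover 150 each (about 16.7%), so the observed split of 21%, 18%, 27.5%, 18.5%, and 15% is in the range one would expect from a sample of only 200 values.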


5 Task 4: Multi-Company Dataset Simulation

Description: A function generate_company_data() with a nested loop over companies and employees, plus conditional logic that flags top performers (KPI > 90).

5.1 Fungsi & Simulasi

# ============================================================
# TASK 4: Multi-Company Dataset Simulation
# ============================================================

departments <- c("Engineering", "Marketing", "Finance", "HR", "Operations")

generate_company_data <- function(n_company, n_employees) {
  records <- list()
  emp_id  <- 1
  
  # Nested loop: per company, per employee
  for (c_id in 1:n_company) {
    for (e in 1:n_employees) {
      salary            <- max(2000, round(rnorm(1, mean = 5000, sd = 1500), 2))
      performance_score <- round(runif(1, 50, 100), 2)
      kpi_score         <- round(runif(1, 50, 100), 2)
      department        <- sample(departments, 1)
      
      # Conditional logic: top performers (KPI > 90)
      is_top <- kpi_score > 90
      
      records[[emp_id]] <- data.frame(
        company_id        = paste0("C", sprintf("%02d", c_id)),
        employee_id       = paste0("E", sprintf("%04d", emp_id)),
        salary            = salary,
        department        = department,
        performance_score = performance_score,
        KPI_score         = kpi_score,
        top_performer     = is_top
      )
      emp_id <- emp_id + 1
    }
  }
  return(do.call(rbind, records))
}

# ── Generate: 4 companies × 50 employees ──
set.seed(7)
df_company <- generate_company_data(n_company = 4, n_employees = 50)

# ── Summary per company ──
company_summary <- df_company %>%
  group_by(company_id) %>%
  summarise(
    Avg_Salary      = round(mean(salary), 2),
    Avg_Performance = round(mean(performance_score), 2),
    Max_KPI         = round(max(KPI_score), 2),
    Top_Performers  = sum(top_performer),
    Total_Employees = n(),
    .groups = "drop"
  )

company_summary %>%
  kable(caption = "📊 Summary per Company",
        align = "c",
        col.names = c("Company", "Avg Salary", "Avg Performance",
                      "Max KPI", "Top Performers", "Total Employees")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) %>%
  row_spec(0, background = "#2c3e50", color = "white")
📊 Summary per Company
Company Avg Salary Avg Performance Max KPI Top Performers Total Employees
C01 5385.29 77.46 97.45 7 50
C02 4541.99 69.13 99.29 12 50
C03 4939.54 75.80 99.25 7 50
C04 4964.16 76.17 98.51 9 50

5.2 Plot

pal4 <- brewer.pal(4, "Set2")

# 1. Avg Salary per Company
p4a <- ggplot(company_summary,
              aes(x = company_id, y = Avg_Salary, fill = company_id)) +
  geom_col(show.legend = FALSE, width = 0.6) +
  geom_text(aes(label = paste0("$", format(Avg_Salary, big.mark=","))),
            vjust = -0.3, size = 4, fontface = "bold") +
  scale_fill_manual(values = pal4) +
  labs(title = "💵 Avg Salary per Company", x = "Company", y = "Avg Salary") +
  theme_minimal() + theme(plot.title = element_text(face="bold"),
                          panel.grid.major.x = element_blank())

# 2. Avg Performance Score
p4b <- ggplot(company_summary,
              aes(x = company_id, y = Avg_Performance, fill = company_id)) +
  geom_col(show.legend = FALSE, width = 0.6) +
  geom_text(aes(label = round(Avg_Performance, 1)),
            vjust = -0.3, size = 4, fontface = "bold") +
  scale_fill_manual(values = pal4) +
  labs(title = "🏆 Avg Performance Score", x = "Company", y = "Score") +
  theme_minimal() + theme(plot.title = element_text(face="bold"),
                          panel.grid.major.x = element_blank())

# 3. KPI Score — Boxplot
p4c <- ggplot(df_company,
              aes(x = company_id, y = KPI_score, fill = company_id)) +
  geom_boxplot(show.legend = FALSE, alpha = 0.85, outlier.color = "red") +
  scale_fill_manual(values = pal4) +
  labs(title = "📦 KPI Score Distribution (Boxplot)",
       x = "Company", y = "KPI Score") +
  theme_minimal() + theme(plot.title = element_text(face="bold"))

# 4. Top Performers Count
p4d <- ggplot(company_summary,
              aes(x = company_id, y = Top_Performers, fill = company_id)) +
  geom_col(show.legend = FALSE, width = 0.6) +
  geom_text(aes(label = Top_Performers), vjust = -0.3, size = 4.5, fontface = "bold") +
  scale_fill_manual(values = pal4) +
  labs(title = "⭐ Top Performers (KPI > 90)", x = "Company", y = "Count") +
  theme_minimal() + theme(plot.title = element_text(face="bold"),
                          panel.grid.major.x = element_blank())

grid.arrange(p4a, p4b, p4c, p4d, ncol = 2,
             top = "Task 4: Multi-Company Dataset Analysis")

Task 4 complete! generate_company_data() uses a nested loop and conditional logic to flag top performers.
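
The top-performer counts can be sanity-checked against the generating distribution. KPI scores are drawn from Uniform(50, 100), so the chance of exceeding 90 is roughly 10/50 = 0.2 (ignoring the small effect of rounding to two decimals), and the count per company is approximately Binomial(50, 0.2). A minimal sketch of that check, with the observed counts copied from the summary table above:

```r
n_employees <- 50

# P(KPI > 90) for a Uniform(50, 100) draw, ignoring rounding
p_top <- (100 - 90) / (100 - 50)

# Mean and standard deviation of a Binomial(n_employees, p_top) count
expected_top <- n_employees * p_top
sd_top       <- sqrt(n_employees * p_top * (1 - p_top))

# Observed counts from the "Summary per Company" table
observed <- c(C01 = 7, C02 = 12, C03 = 7, C04 = 9)

# Distance of each observed count from the expectation, in SD units
z <- (observed - expected_top) / sd_top
round(z, 2)
```

The expectation is 10 top performers per company with a standard deviation of about 2.8, so counts between 7 and 12 are unremarkable and do not by themselves indicate that one company outperforms another.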


6 Task 5: Monte Carlo Simulation — Pi & Probability

Description: Estimate π with a Monte Carlo simulation, compute the probability that a point falls in a sub-square, and visualize points inside versus outside the circle.

6.1 Fungsi & Hasil

# ============================================================
# TASK 5: Monte Carlo Simulation — Pi & Probability
# ============================================================

monte_carlo_pi <- function(n_points, seed = 42) {
  set.seed(seed)
  inside_circle  <- 0
  in_subsquare   <- 0
  
  x_in <- c(); y_in <- c()
  x_out <- c(); y_out <- c()
  
  # Iterate over the points
  for (i in 1:n_points) {
    x <- runif(1, -1, 1)
    y <- runif(1, -1, 1)
    
    # Check whether the point lies inside the unit circle
    if (x^2 + y^2 <= 1) {
      inside_circle <- inside_circle + 1
      x_in <- c(x_in, x); y_in <- c(y_in, y)
    } else {
      x_out <- c(x_out, x); y_out <- c(y_out, y)
    }
    
    # Check the sub-square [0, 0.5] × [0, 0.5]
    if (x >= 0 && x <= 0.5 && y >= 0 && y <= 0.5) {
      in_subsquare <- in_subsquare + 1
    }
  }
  
  pi_estimate    <- 4 * inside_circle / n_points
  subsquare_prob <- in_subsquare / n_points
  
  list(
    pi_estimate    = pi_estimate,
    inside_circle  = inside_circle,
    outside_circle = n_points - inside_circle,
    subsquare_prob = subsquare_prob,
    x_in = x_in, y_in = y_in,
    x_out = x_out, y_out = y_out
  )
}

# ── Test with several values of n_points ──
n_list    <- c(100, 500, 1000, 5000, 10000)
pi_results <- data.frame(
  n_points    = n_list,
  pi_estimate = sapply(n_list, function(n) monte_carlo_pi(n)$pi_estimate)
) %>%
  mutate(error = round(abs(pi_estimate - pi), 6),
         pi_estimate = round(pi_estimate, 6))

pi_results %>%
  kable(caption = paste("🎯 Convergence of the π Estimate (true π =", round(pi, 6), ")"),
        align = "c",
        col.names = c("n Points", "π Estimate", "Absolute Error")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE) %>%
  row_spec(5, bold = TRUE, background = "#d5f5e3")
🎯 Convergence of the π Estimate (true π = 3.141593 )
n Points π Estimate Absolute Error
100 3.0400 0.101593
500 3.0880 0.053593
1000 3.0920 0.049593
5000 3.0992 0.042393
10000 3.1208 0.020793
final <- monte_carlo_pi(10000)
cat(sprintf("\n🎯 π estimate (n = 10,000): %.5f\n", final$pi_estimate))
## 
## 🎯 π estimate (n = 10,000): 3.12080
cat(sprintf("📦 Sub-square probability: %.4f  (expected: 0.0625)\n",
            final$subsquare_prob))
## 📦 Sub-square probability: 0.0625  (expected: 0.0625)

6.2 Plot

# ── Plot 1: Scatter Inside vs Outside Circle ──
df_circle <- rbind(
  data.frame(x = final$x_in,  y = final$y_in,  status = "Inside Circle"),
  data.frame(x = final$x_out, y = final$y_out, status = "Outside Circle")
)

# Build the circle and sub-square outlines
theta  <- seq(0, 2*pi, length.out = 300)
circle_df  <- data.frame(x = cos(theta), y = sin(theta))
square_df  <- data.frame(
  x = c(0, 0.5, 0.5, 0, 0),
  y = c(0, 0, 0.5, 0.5, 0)
)

p5a <- ggplot(df_circle, aes(x = x, y = y, color = status)) +
  geom_point(size = 0.3, alpha = 0.4) +
  geom_path(data = circle_df, aes(x, y),
            color = "navy", linewidth = 1.2, inherit.aes = FALSE) +
  geom_path(data = square_df, aes(x, y),
            color = "orange", linewidth = 1.2, linetype = "dashed",
            inherit.aes = FALSE) +
  scale_color_manual(values = c("Inside Circle" = "#27ae60",
                                "Outside Circle" = "#e74c3c")) +
  coord_fixed() +
  labs(
    title = sprintf("🎯 Monte Carlo π ≈ %.5f (n = 10,000)", final$pi_estimate),
    color = NULL
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"),
        legend.position = "bottom")

# ── Plot 2: Convergence of the π estimate ──
p5b <- ggplot(pi_results, aes(x = n_points, y = pi_estimate)) +
  geom_line(color = "#2196F3", linewidth = 1.2) +
  geom_point(color = "#2196F3", size = 3) +
  geom_hline(yintercept = pi, color = "red",
             linetype = "dashed", linewidth = 1) +
  annotate("text", x = max(n_list)*0.6, y = pi + 0.015,
           label = paste0("True π = ", round(pi, 5)),
           color = "red", size = 3.8) +
  scale_x_continuous(labels = scales::comma) +
  labs(
    title = "📉 Convergence of the π Estimate",
    x     = "Number of Points",
    y     = "π Estimate"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

grid.arrange(p5a, p5b, ncol = 2)

Task 5 complete! monte_carlo_pi() estimates π with an iterative loop and also computes the sub-square probability.
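
The estimate is 4 times a sample proportion, so its standard error is 4·sqrt(p(1 − p)/n) with p ≈ π/4. A minimal sketch of a vectorized variant that also reports this standard error; `monte_carlo_pi_vec` is a hypothetical name, and because it draws all x values before all y values it consumes the random stream in a different order than the loop version, so its numbers will not match the table above:

```r
monte_carlo_pi_vec <- function(n_points, seed = 42) {
  set.seed(seed)
  # Draw every coordinate at once instead of one point per iteration
  x <- runif(n_points, -1, 1)
  y <- runif(n_points, -1, 1)

  inside <- x^2 + y^2 <= 1      # logical vector: TRUE if inside the circle
  p_hat  <- mean(inside)        # estimated probability of landing inside

  list(
    pi_estimate = 4 * p_hat,
    # Standard error of 4 * p_hat for a binomial proportion
    std_error   = 4 * sqrt(p_hat * (1 - p_hat) / n_points)
  )
}

res <- monte_carlo_pi_vec(10000)
res$pi_estimate
res$std_error
```

For n = 10,000 the theoretical standard error is about 0.016, so the absolute error of 0.0208 in the convergence table is roughly 1.3 standard errors. Halving the error would take about four times as many points, since the error shrinks with the square root of n.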


7 Task 6: Advanced Data Transformation & Feature Engineering

Description: Loop-based functions normalize_columns() and z_score(), feature engineering, and a comparison of the distributions before and after transformation.

7.1 Fungsi & Transformasi

# ============================================================
# TASK 6: Advanced Data Transformation & Feature Engineering
# ============================================================

normalize_columns <- function(df, cols) {
  # Min-max normalization using a loop
  for (col in cols) {
    col_min <- min(df[[col]], na.rm = TRUE)
    col_max <- max(df[[col]], na.rm = TRUE)
    df[[paste0(col, "_norm")]] <- (df[[col]] - col_min) / (col_max - col_min)
  }
  return(df)
}

z_score <- function(df, cols) {
  # Z-score standardization using a loop
  for (col in cols) {
    mu    <- mean(df[[col]], na.rm = TRUE)
    sigma <- sd(df[[col]], na.rm = TRUE)
    df[[paste0(col, "_zscore")]] <- (df[[col]] - mu) / sigma
  }
  return(df)
}

# ── Use df_company from Task 4 ──
df_t6 <- df_company

# ── Feature Engineering ──
# 1. performance_category
df_t6$performance_category <- ifelse(
  df_t6$performance_score >= 80, "High",
  ifelse(df_t6$performance_score >= 65, "Mid", "Low")
)

# 2. salary_bracket
df_t6$salary_bracket <- cut(
  df_t6$salary,
  breaks = c(0, 3000, 5000, 7000, Inf),
  labels = c("Low", "Medium", "High", "Very High"),
  include.lowest = TRUE
)

# ── Apply the transformations ──
num_cols <- c("salary", "performance_score", "KPI_score")
df_t6    <- normalize_columns(df_t6, num_cols)
df_t6    <- z_score(df_t6, num_cols)

# Show a preview
df_t6 %>%
  select(employee_id, salary, salary_norm, salary_zscore,
         performance_score, performance_score_norm,
         performance_category, salary_bracket) %>%
  head(8) %>%
  mutate(across(where(is.numeric), ~round(., 3))) %>%
  kable(caption = "📊 Data Preview After Transformation & Feature Engineering",
        align = "c") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE, font_size = 12) %>%
  row_spec(0, background = "#8e44ad", color = "white")
📊 Data Preview After Transformation & Feature Engineering
employee_id salary salary_norm salary_zscore performance_score performance_score_norm performance_category salary_bracket
E0001 8430.87 0.886 2.314 55.78 0.107 Low Very High
E0002 4381.56 0.328 -0.384 58.29 0.158 Low Medium
E0003 6122.21 0.568 0.776 72.67 0.449 Mid High
E0004 8284.97 0.866 2.217 81.97 0.638 High Very High
E0005 8422.18 0.885 2.308 81.35 0.625 High Very High
E0006 5701.52 0.510 0.496 59.29 0.178 Low High
E0007 6535.63 0.625 1.051 89.53 0.791 High High
E0008 4549.43 0.351 -0.272 71.84 0.433 Mid Medium

7.2 Plot — Before & After

# ── Histograms: before & after for the 3 columns ──
plot_list <- list()
col_labels <- c("Salary", "Performance Score", "KPI Score")

for (i in seq_along(num_cols)) {
  col <- num_cols[i]
  lbl <- col_labels[i]
  
  # aes_string() is deprecated; .data[[...]] is the tidy-evaluation
  # replacement. The !! injects the current column name immediately,
  # because aes() is otherwise evaluated only when the plot is drawn
  # (by which time `col` would hold its last loop value).
  
  # Original
  p_orig <- ggplot(df_t6, aes(x = .data[[!!col]])) +
    geom_histogram(bins = 15, fill = "#3498db", color = "white", alpha = 0.85) +
    labs(title = paste(lbl, "— Original"), x = col, y = "Freq") +
    theme_minimal(base_size = 11) +
    theme(plot.title = element_text(face = "bold", size = 10))
  
  # Normalized
  p_norm <- ggplot(df_t6, aes(x = .data[[!!paste0(col, "_norm")]])) +
    geom_histogram(bins = 15, fill = "#2ecc71", color = "white", alpha = 0.85) +
    labs(title = paste(lbl, "— Min-Max Norm"), x = "Normalized", y = "Freq") +
    theme_minimal(base_size = 11) +
    theme(plot.title = element_text(face = "bold", size = 10))
  
  # Z-Score
  p_z <- ggplot(df_t6, aes(x = .data[[!!paste0(col, "_zscore")]])) +
    geom_histogram(bins = 15, fill = "#e74c3c", color = "white", alpha = 0.85) +
    labs(title = paste(lbl, "— Z-Score"), x = "Z-Score", y = "Freq") +
    theme_minimal(base_size = 11) +
    theme(plot.title = element_text(face = "bold", size = 10))
  
  plot_list <- c(plot_list, list(p_orig, p_norm, p_z))
}
do.call(grid.arrange, c(plot_list, ncol = 3,
        top = "Task 6: Distributions Before and After Transformation"))

7.3 Plot — Boxplot

ggplot(df_t6, aes(x = salary_bracket, y = KPI_score, fill = salary_bracket)) +
  geom_boxplot(alpha = 0.8, outlier.color = "red", show.legend = FALSE) +
  scale_fill_brewer(palette = "Set3") +
  scale_x_discrete(limits = c("Low", "Medium", "High", "Very High")) +
  labs(
    title = "📦 KPI Score by Salary Bracket",
    x     = "Salary Bracket",
    y     = "KPI Score"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Task 6 complete! Loop-based normalization and z-score functions, feature engineering, and before-versus-after visualizations are all in place.
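
Both transformations have simple properties that can be verified: min-max output spans exactly [0, 1], and z-scores have mean 0 and standard deviation 1. A minimal sketch on synthetic data, using hypothetical single-vector helpers `min_max` and `z_std`; the handling of a constant column (returning zeros instead of dividing by zero) is an assumption added here, not something the original functions do:

```r
min_max <- function(v) {
  rng <- range(v, na.rm = TRUE)
  # Assumption: a constant column is mapped to 0 rather than NaN
  if (rng[1] == rng[2]) return(v - rng[1])
  (v - rng[1]) / (rng[2] - rng[1])
}

z_std <- function(v) {
  (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
}

# Synthetic salaries, loosely mimicking the Task 4 generator
set.seed(1)
v <- rnorm(200, mean = 5000, sd = 1500)

range(min_max(v))          # lower and upper bound after min-max scaling
round(mean(z_std(v)), 10)  # mean after standardization
sd(z_std(v))               # standard deviation after standardization

min_max(rep(5, 4))         # constant input: zeros, not NaN
```

Neither transformation changes the shape of a distribution, which is why the before and after histograms differ only in their x-axis scale.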


8 Task 7: Mini Project — Company KPI Dashboard

Description: A full dataset of 7 companies × 80 employees, KPI tiers, grouped bar charts, a scatter plot with regression lines, and a department analysis.

8.1 Data & Summary

# ============================================================
# TASK 7: Mini Project — Company KPI Dashboard
# ============================================================

set.seed(2024)
df_dash <- generate_company_data(n_company = 7, n_employees = 80)

# ── KPI Tier Categorization (loop-based) ──
kpi_tier <- function(score) {
  if      (score >= 90) return("Platinum")
  else if (score >= 75) return("Gold")
  else if (score >= 60) return("Silver")
  else                  return("Bronze")
}

kpi_tiers <- c()
for (score in df_dash$KPI_score) {   # loop per employee
  kpi_tiers <- c(kpi_tiers, kpi_tier(score))
}
df_dash$KPI_tier <- factor(kpi_tiers,
                            levels = c("Platinum","Gold","Silver","Bronze"))

# ── Summary per Company ──
dash_summary <- df_dash %>%
  group_by(company_id) %>%
  summarise(
    Avg_Salary    = round(mean(salary), 0),
    Avg_KPI       = round(mean(KPI_score), 2),
    Top_Performers = sum(top_performer),
    Total_Emp     = n(),
    .groups = "drop"
  )

dash_summary %>%
  kable(caption = "📋 Company KPI Summary (7 Companies × 80 Employees)",
        align   = "c",
        col.names = c("Company","Avg Salary","Avg KPI","Top Performers","Total Emp")) %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = FALSE) %>%
  row_spec(0, background = "#2c3e50", color = "white") %>%
  column_spec(3, bold = TRUE, color = "#2980b9")
📋 Company KPI Summary (7 Companies × 80 Employees)
Company Avg Salary Avg KPI Top Performers Total Emp
C01 4925 72.50 11 80
C02 5022 76.43 15 80
C03 4990 74.02 14 80
C04 4860 73.43 12 80
C05 4759 73.37 12 80
C06 5102 76.33 15 80
C07 5448 75.66 17 80

8.2 KPI Tier & Top Performers

tier_colors <- c("Platinum" = "#8e44ad",
                 "Gold"     = "#f39c12",
                 "Silver"   = "#95a5a6",
                 "Bronze"   = "#a04000")

# ── Grouped Bar: KPI Tier per Company ──
tier_counts <- df_dash %>%
  group_by(company_id, KPI_tier) %>%
  summarise(count = n(), .groups = "drop")

p7a <- ggplot(tier_counts,
              aes(x = company_id, y = count, fill = KPI_tier)) +
  geom_col(position = "dodge", width = 0.75) +
  scale_fill_manual(values = tier_colors) +
  labs(title = "🏅 KPI Tier Distribution per Company",
       x = "Company", y = "Count", fill = "KPI Tier") +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face="bold"),
        legend.position = "bottom")

# ── Top Performers per Company ──
p7b <- ggplot(dash_summary,
              aes(x = company_id, y = Top_Performers, fill = company_id)) +
  geom_col(show.legend = FALSE, width = 0.6) +
  geom_text(aes(label = Top_Performers), vjust = -0.3,
            size = 4.5, fontface = "bold") +
  scale_fill_brewer(palette = "Set1") +
  labs(title = "⭐ Top Performers per Company",
       x = "Company", y = "Count") +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face="bold"),
        panel.grid.major.x = element_blank())

grid.arrange(p7a, p7b, ncol = 2)

8.3 Scatter + Regression

ggplot(df_dash, aes(x = salary, y = KPI_score,
                    color = company_id)) +
  geom_point(size = 1.8, alpha = 0.55) +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1,
              aes(fill = company_id), alpha = 0.1) +
  scale_color_brewer(palette = "Set1") +
  scale_fill_brewer(palette = "Set1") +
  labs(
    title    = "📊 Salary vs KPI Score with Regression Lines per Company",
    subtitle = "Each color represents one company; lines = linear regression",
    x        = "Salary",
    y        = "KPI Score",
    color    = "Company",
    fill     = "Company"
  ) +
  theme_minimal(base_size = 12) +
  theme(plot.title    = element_text(face = "bold"),
        legend.position = "bottom") +
  guides(color = guide_legend(nrow = 1))
## `geom_smooth()` using formula = 'y ~ x'

8.4 Salary & Department Analysis

# ── Boxplot Salary per Company ──
p_sal <- ggplot(df_dash,
                aes(x = company_id, y = salary, fill = company_id)) +
  geom_boxplot(alpha = 0.85, show.legend = FALSE,
               outlier.color = "red", outlier.size = 1.5) +
  scale_fill_brewer(palette = "Set1") +
  labs(title = "💰 Salary Distribution per Company",
       x = "Company", y = "Salary") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

# ── Department: Avg KPI per Company (Grouped Bar) ──
dept_kpi <- df_dash %>%
  group_by(department, company_id) %>%
  summarise(avg_kpi = round(mean(KPI_score), 2), .groups = "drop")

p_dept <- ggplot(dept_kpi,
                 aes(x = department, y = avg_kpi, fill = company_id)) +
  geom_col(position = "dodge", width = 0.75) +
  scale_fill_brewer(palette = "Set1") +
  labs(title = "🏢 Avg KPI per Department per Company",
       x = "Department", y = "Avg KPI Score",
       fill = "Company") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"),
        axis.text.x = element_text(angle = 15, hjust = 1),
        legend.position = "bottom")

grid.arrange(p_sal, p_dept, nrow = 2,
             top = "Task 7: Company KPI Dashboard — Salary & Department Analysis")

Task 7 complete! The dashboard includes a grouped bar chart, a scatter plot with regression lines, boxplots, and a department analysis.


9 Task 8 (BONUS): Automated Report Generation

Description: Use a function and a loop to generate an automated summary report for each company, complete with tables, statistics, and plots.

9.1 Report Function

# ============================================================
# TASK 8 (BONUS): Automated Report Generation
# ============================================================

# Function: build the summary for one company automatically
generate_company_report <- function(df, c_id) {
  df_sub <- df %>% filter(company_id == c_id)
  
  # Key statistics
  stats <- list(
    company      = c_id,
    n_emp        = nrow(df_sub),
    avg_salary   = round(mean(df_sub$salary), 0),
    avg_kpi      = round(mean(df_sub$KPI_score), 2),
    avg_perf     = round(mean(df_sub$performance_score), 2),
    n_top        = sum(df_sub$top_performer),
    max_kpi      = round(max(df_sub$KPI_score), 2),
    min_salary   = round(min(df_sub$salary), 0),
    max_salary   = round(max(df_sub$salary), 0)
  )
  
  # Top 5 performers
  top5 <- df_sub %>%
    arrange(desc(KPI_score)) %>%
    select(employee_id, department, salary, performance_score, KPI_score) %>%
    head(5) %>%
    mutate(across(where(is.numeric), ~round(., 2)))
  
  # KPI Tier distribution
  tier_dist <- df_sub %>%
    group_by(KPI_tier) %>%
    summarise(count = n(), pct = round(n()/nrow(df_sub)*100, 1), .groups="drop") %>%
    arrange(KPI_tier)
  
  # Department summary
  dept_sum <- df_sub %>%
    group_by(department) %>%
    summarise(
      Avg_KPI    = round(mean(KPI_score), 2),
      Avg_Salary = round(mean(salary), 0),
      Count      = n(),
      .groups = "drop"
    ) %>%
    arrange(desc(Avg_KPI))
  
  return(list(stats = stats, top5 = top5,
              tier_dist = tier_dist, dept_sum = dept_sum,
              data = df_sub))
}

cat("✅ Function generate_company_report() is ready to use.\n")
## ✅ Function generate_company_report() is ready to use.
cat("✅ The loop below generates the report for every company.\n")
## ✅ The loop below generates the report for every company.
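
As written, `generate_company_report()` does not check whether `c_id` exists in the data. For an empty subset, `mean()` returns `NaN` and `max()`/`min()` return `-Inf`/`Inf` with a warning. A minimal sketch of a guard that could be called before the filter step is shown below. The helper name `check_company()` and the data frame `toy` are hypothetical and not part of the original code.

```r
# Hypothetical helper: stop early when the requested company has no rows.
check_company <- function(df, c_id) {
  if (!("company_id" %in% names(df))) {
    stop("Column 'company_id' not found in data frame.")
  }
  n_rows <- sum(df$company_id == c_id)
  if (n_rows == 0) {
    stop(sprintf("No rows found for company '%s'.", c_id))
  }
  invisible(n_rows)  # number of matching rows, returned quietly
}

# Made-up data for illustration only.
toy <- data.frame(company_id = c("C01", "C01", "C02"),
                  salary     = c(3000, 4000, 5000))

check_company(toy, "C01")  # passes silently

# An unknown ID raises an error; capture its message for display.
res <- tryCatch(check_company(toy, "C99"),
                error = function(e) conditionMessage(e))
print(res)
```

Failing fast with a clear message is usually easier to debug than a report full of `NaN` and `-Inf` values.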

9.2 Loop — Auto Report

company_ids <- sort(unique(df_dash$company_id))
pal_tier    <- c("Platinum"="#8e44ad","Gold"="#f39c12","Silver"="#95a5a6","Bronze"="#a04000")

# ── LOOP: Generate a report for each company ──
for (c_id in company_ids) {
  
  rpt <- generate_company_report(df_dash, c_id)
  s   <- rpt$stats
  
  # ── Section Header ──
  cat(sprintf("\n\n### 📌 %s — Automated Report\n\n", c_id))
  
  # ── KPI Stat Cards (in a small table) ──
  stats_df <- data.frame(
    Metric = c("Total Employees", "Avg Salary", "Avg KPI Score",
               "Avg Performance", "Top Performers", "Max KPI",
               "Min Salary", "Max Salary"),
    Value  = c(s$n_emp,
               paste0("$", format(s$avg_salary, big.mark=",")),
               s$avg_kpi, s$avg_perf, s$n_top, s$max_kpi,
               paste0("$", format(s$min_salary, big.mark=",")),
               paste0("$", format(s$max_salary, big.mark=",")))
  )
  
  print(
    stats_df %>%
      kable(align = "c",
            caption = paste("📊 Key Metrics —", c_id)) %>%
      kable_styling(bootstrap_options = c("striped","hover"),
                    full_width = FALSE, font_size = 12) %>%
      row_spec(0, background = "#2c3e50", color = "white") %>%
      column_spec(2, bold = TRUE, color = "#2980b9")
  )
  
  # ── Top 5 Performers ──
  cat("\n**🥇 Top 5 Performers:**\n\n")
  print(
    rpt$top5 %>%
      kable(align = "c",
            col.names = c("Employee ID","Department","Salary","Performance","KPI Score")) %>%
      kable_styling(bootstrap_options = c("striped","hover","condensed"),
                    full_width = FALSE, font_size = 12) %>%
      row_spec(1, bold = TRUE, background = "#d5f5e3")
  )
  
  # ── KPI Tier Distribution ──
  cat("\n**🏷️ KPI Tier Breakdown:**\n\n")
  print(
    rpt$tier_dist %>%
      mutate(label = paste0(count, " employees (", pct, "%)")) %>%
      select(KPI_tier, label) %>%
      kable(align = "c", col.names = c("KPI Tier","Count & Percentage")) %>%
      kable_styling(bootstrap_options = c("striped"),
                    full_width = FALSE, font_size = 12)
  )
  
  # ── Department Summary ──
  cat("\n**🏢 Department Summary:**\n\n")
  print(
    rpt$dept_sum %>%
      kable(align = "c",
            col.names = c("Department","Avg KPI","Avg Salary","Employee Count")) %>%
      kable_styling(bootstrap_options = c("striped","hover"),
                    full_width = FALSE, font_size = 12)
  )
  
  # ── Plots for this company ──
  df_sub <- rpt$data
  
  p_hist <- ggplot(df_sub, aes(x = KPI_score)) +
    geom_histogram(bins = 15, fill = "#3498db", color = "white", alpha = 0.85) +
    geom_vline(xintercept = mean(df_sub$KPI_score),
               color = "red", linetype = "dashed", linewidth = 1) +
    annotate("text", x = mean(df_sub$KPI_score) + 1.5, y = Inf,
             label = paste0("Mean: ", round(mean(df_sub$KPI_score),1)),
             vjust = 1.5, color = "red", size = 3.5) +
    labs(title = paste(c_id, "— KPI Score Distribution"),
         x = "KPI Score", y = "Count") +
    theme_minimal(base_size = 11) +
    theme(plot.title = element_text(face = "bold"))
  
  p_dept2 <- ggplot(rpt$dept_sum,
                    aes(x = reorder(department, Avg_KPI), y = Avg_KPI,
                        fill = Avg_KPI)) +
    geom_col(show.legend = FALSE) +
    geom_text(aes(label = Avg_KPI), hjust = -0.2, size = 3.5, fontface = "bold") +
    scale_fill_gradient(low = "#f39c12", high = "#27ae60") +
    coord_flip() +
    labs(title = paste(c_id, "— Avg KPI per Department"),
         x = "Department", y = "Avg KPI") +
    theme_minimal(base_size = 11) +
    theme(plot.title = element_text(face = "bold"))
  
  grid.arrange(p_hist, p_dept2, ncol = 2)
  
  cat("\n---\n")
}
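
The loop prints one report per company, but the `stats` lists it produces could also be stacked into a single comparison table. The sketch below illustrates that idea with base R. The `stats_list` entries are hypothetical stand-ins with made-up values, and `avg_salary_fmt` is a name introduced here for illustration.

```r
# Hypothetical stand-ins for the `stats` element returned per company;
# the values are illustrative and not taken from the report.
stats_list <- list(
  list(company = "A", n_emp = 3L, avg_salary = 4000, avg_kpi = 70.5),
  list(company = "B", n_emp = 2L, avg_salary = 5000, avg_kpi = 75.0)
)

# Turn each list into a one-row data frame, then stack the rows.
comparison <- do.call(rbind, lapply(stats_list, as.data.frame))

# Format salary with a thousands separator, as the report loop does.
comparison$avg_salary_fmt <- paste0("$", format(comparison$avg_salary,
                                                big.mark = ","))
print(comparison)
```

Keeping the numeric `avg_salary` column next to the formatted one leaves the table sortable, since the formatted values are character strings.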

9.2.1 📌 C01 — Automated Report

📊 Key Metrics — C01
| Metric | Value |
| --- | --- |
| Total Employees | 80 |
| Avg Salary | $4,925 |
| Avg KPI Score | 72.5 |
| Avg Performance | 76.41 |
| Top Performers | 11 |
| Max KPI | 99.71 |
| Min Salary | $2,000 |
| Max Salary | $8,897 |

🥇 Top 5 Performers:

| Employee ID | Department | Salary | Performance | KPI Score |
| --- | --- | --- | --- | --- |
| E0034 | Operations | 4218.15 | 62.02 | 99.71 |
| E0074 | Operations | 3307.80 | 93.34 | 99.46 |
| E0019 | HR | 3645.48 | 75.34 | 98.69 |
| E0044 | Marketing | 4602.81 | 90.89 | 96.84 |
| E0075 | HR | 2280.26 | 75.04 | 94.30 |

🏷️ KPI Tier Breakdown:

| KPI Tier | Count & Percentage |
| --- | --- |
| Platinum | 11 employees (13.8%) |
| Gold | 22 employees (27.5%) |
| Silver | 26 employees (32.5%) |
| Bronze | 21 employees (26.2%) |

🏢 Department Summary:

| Department | Avg KPI | Avg Salary | Employee Count |
| --- | --- | --- | --- |
| Operations | 76.87 | 5027 | 13 |
| HR | 74.28 | 4913 | 18 |
| Marketing | 73.30 | 4707 | 17 |
| Finance | 72.36 | 4911 | 16 |
| Engineering | 66.26 | 5101 | 16 |
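
As a sanity check on the tier breakdown, the counts should add up to the company's headcount and the unrounded percentages to 100. The sketch below uses the C01 figures reported above. The variable names are chosen here for illustration.

```r
# Tier counts and headcount as reported for C01.
counts <- c(Platinum = 11, Gold = 22, Silver = 26, Bronze = 21)
n_emp  <- 80

# The tiers should partition the employees exactly.
stopifnot(sum(counts) == n_emp)

# Same formula as in generate_company_report(): share of headcount in percent.
pct <- counts / n_emp * 100
print(round(pct, 1))
```

In every company report in this section the Platinum count equals the "Top Performers" count, which suggests, though the section does not state it, that both flags come from the same rule.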

9.3.1 📌 C02 — Automated Report

📊 Key Metrics — C02
| Metric | Value |
| --- | --- |
| Total Employees | 80 |
| Avg Salary | $5,022 |
| Avg KPI Score | 76.43 |
| Avg Performance | 73.31 |
| Top Performers | 15 |
| Max KPI | 97.79 |
| Min Salary | $2,000 |
| Max Salary | $8,249 |

🥇 Top 5 Performers:

| Employee ID | Department | Salary | Performance | KPI Score |
| --- | --- | --- | --- | --- |
| E0095 | Marketing | 5828.61 | 56.24 | 97.79 |
| E0142 | Finance | 6809.20 | 75.29 | 97.77 |
| E0112 | Finance | 5873.98 | 63.51 | 97.22 |
| E0155 | Marketing | 5687.39 | 56.00 | 96.39 |
| E0157 | Operations | 6650.24 | 93.63 | 96.29 |

🏷️ KPI Tier Breakdown:

| KPI Tier | Count & Percentage |
| --- | --- |
| Platinum | 15 employees (18.8%) |
| Gold | 31 employees (38.8%) |
| Silver | 23 employees (28.7%) |
| Bronze | 11 employees (13.8%) |

🏢 Department Summary:

| Department | Avg KPI | Avg Salary | Employee Count |
| --- | --- | --- | --- |
| Operations | 79.69 | 4229 | 11 |
| Finance | 77.09 | 5477 | 19 |
| Engineering | 75.92 | 5440 | 16 |
| HR | 75.90 | 4853 | 15 |
| Marketing | 74.74 | 4810 | 19 |

9.4.1 📌 C03 — Automated Report

📊 Key Metrics — C03
| Metric | Value |
| --- | --- |
| Total Employees | 80 |
| Avg Salary | $4,990 |
| Avg KPI Score | 74.02 |
| Avg Performance | 74.35 |
| Top Performers | 14 |
| Max KPI | 98.94 |
| Min Salary | $2,215 |
| Max Salary | $8,350 |

🥇 Top 5 Performers:

| Employee ID | Department | Salary | Performance | KPI Score |
| --- | --- | --- | --- | --- |
| E0194 | Operations | 3271.21 | 97.97 | 98.94 |
| E0203 | HR | 4868.60 | 79.44 | 98.76 |
| E0219 | Operations | 5195.30 | 79.28 | 97.33 |
| E0239 | Operations | 4527.36 | 71.29 | 96.66 |
| E0216 | HR | 4342.96 | 68.31 | 96.58 |

🏷️ KPI Tier Breakdown:

| KPI Tier | Count & Percentage |
| --- | --- |
| Platinum | 14 employees (17.5%) |
| Gold | 19 employees (23.8%) |
| Silver | 32 employees (40%) |
| Bronze | 15 employees (18.8%) |

🏢 Department Summary:

| Department | Avg KPI | Avg Salary | Employee Count |
| --- | --- | --- | --- |
| Operations | 80.54 | 5183 | 18 |
| Engineering | 74.18 | 5314 | 18 |
| HR | 73.70 | 4623 | 23 |
| Finance | 69.42 | 5098 | 9 |
| Marketing | 68.07 | 4837 | 12 |

9.5.1 📌 C04 — Automated Report

📊 Key Metrics — C04
| Metric | Value |
| --- | --- |
| Total Employees | 80 |
| Avg Salary | $4,860 |
| Avg KPI Score | 73.43 |
| Avg Performance | 75.24 |
| Top Performers | 12 |
| Max KPI | 99.22 |
| Min Salary | $2,000 |
| Max Salary | $8,284 |

🥇 Top 5 Performers:

| Employee ID | Department | Salary | Performance | KPI Score |
| --- | --- | --- | --- | --- |
| E0305 | Finance | 6471.22 | 77.83 | 99.22 |
| E0308 | HR | 5725.36 | 99.56 | 99.08 |
| E0273 | Operations | 3487.90 | 74.52 | 98.65 |
| E0284 | Finance | 7078.38 | 71.67 | 97.44 |
| E0248 | Engineering | 5757.39 | 80.87 | 96.58 |

🏷️ KPI Tier Breakdown:

| KPI Tier | Count & Percentage |
| --- | --- |
| Platinum | 12 employees (15%) |
| Gold | 24 employees (30%) |
| Silver | 27 employees (33.8%) |
| Bronze | 17 employees (21.2%) |

🏢 Department Summary:

| Department | Avg KPI | Avg Salary | Employee Count |
| --- | --- | --- | --- |
| Finance | 78.06 | 5155 | 13 |
| HR | 73.94 | 5060 | 12 |
| Engineering | 73.71 | 4864 | 13 |
| Marketing | 72.70 | 4826 | 22 |
| Operations | 70.73 | 4582 | 20 |

9.6.1 📌 C05 — Automated Report

📊 Key Metrics — C05
| Metric | Value |
| --- | --- |
| Total Employees | 80 |
| Avg Salary | $4,759 |
| Avg KPI Score | 73.37 |
| Avg Performance | 74.44 |
| Top Performers | 12 |
| Max KPI | 99.92 |
| Min Salary | $2,000 |
| Max Salary | $8,772 |

🥇 Top 5 Performers:

| Employee ID | Department | Salary | Performance | KPI Score |
| --- | --- | --- | --- | --- |
| E0323 | Finance | 6395.96 | 67.19 | 99.92 |
| E0336 | Engineering | 2520.98 | 89.65 | 99.79 |
| E0344 | Engineering | 5202.98 | 61.86 | 97.65 |
| E0349 | Finance | 2121.47 | 65.93 | 95.82 |
| E0339 | Engineering | 5385.21 | 95.54 | 95.59 |

🏷️ KPI Tier Breakdown:

| KPI Tier | Count & Percentage |
| --- | --- |
| Platinum | 12 employees (15%) |
| Gold | 24 employees (30%) |
| Silver | 26 employees (32.5%) |
| Bronze | 18 employees (22.5%) |

🏢 Department Summary:

| Department | Avg KPI | Avg Salary | Employee Count |
| --- | --- | --- | --- |
| Engineering | 79.69 | 4578 | 14 |
| Finance | 74.59 | 4622 | 16 |
| Operations | 71.65 | 4578 | 17 |
| Marketing | 71.29 | 5489 | 15 |
| HR | 70.71 | 4583 | 18 |

9.7.1 📌 C06 — Automated Report

📊 Key Metrics — C06
| Metric | Value |
| --- | --- |
| Total Employees | 80 |
| Avg Salary | $5,102 |
| Avg KPI Score | 76.33 |
| Avg Performance | 75.51 |
| Top Performers | 15 |
| Max KPI | 98.38 |
| Min Salary | $2,000 |
| Max Salary | $9,073 |

🥇 Top 5 Performers:

| Employee ID | Department | Salary | Performance | KPI Score |
| --- | --- | --- | --- | --- |
| E0477 | Marketing | 3000.72 | 61.14 | 98.38 |
| E0431 | Operations | 4932.59 | 72.28 | 98.05 |
| E0479 | HR | 4738.47 | 70.08 | 98.05 |
| E0447 | Marketing | 6179.19 | 83.02 | 97.77 |
| E0421 | Finance | 4926.45 | 50.29 | 97.14 |

🏷️ KPI Tier Breakdown:

| KPI Tier | Count & Percentage |
| --- | --- |
| Platinum | 15 employees (18.8%) |
| Gold | 26 employees (32.5%) |
| Silver | 26 employees (32.5%) |
| Bronze | 13 employees (16.2%) |

🏢 Department Summary:

| Department | Avg KPI | Avg Salary | Employee Count |
| --- | --- | --- | --- |
| Marketing | 81.20 | 5607 | 15 |
| Operations | 77.73 | 4927 | 16 |
| Engineering | 76.06 | 6087 | 15 |
| HR | 75.53 | 4376 | 17 |
| Finance | 71.73 | 4677 | 17 |

9.8.1 📌 C07 — Automated Report

📊 Key Metrics — C07
| Metric | Value |
| --- | --- |
| Total Employees | 80 |
| Avg Salary | $5,448 |
| Avg KPI Score | 75.66 |
| Avg Performance | 76.54 |
| Top Performers | 17 |
| Max KPI | 99.58 |
| Min Salary | $2,239 |
| Max Salary | $8,809 |

🥇 Top 5 Performers:

| Employee ID | Department | Salary | Performance | KPI Score |
| --- | --- | --- | --- | --- |
| E0556 | HR | 7311.07 | 74.50 | 99.58 |
| E0523 | HR | 5941.86 | 70.68 | 99.30 |
| E0489 | Engineering | 3911.32 | 54.13 | 98.98 |
| E0499 | HR | 4040.96 | 70.72 | 97.72 |
| E0514 | Finance | 4785.85 | 69.57 | 96.49 |

🏷️ KPI Tier Breakdown:

| KPI Tier | Count & Percentage |
| --- | --- |
| Platinum | 17 employees (21.2%) |
| Gold | 26 employees (32.5%) |
| Silver | 20 employees (25%) |
| Bronze | 17 employees (21.2%) |

🏢 Department Summary:

| Department | Avg KPI | Avg Salary | Employee Count |
| --- | --- | --- | --- |
| HR | 82.29 | 5212 | 14 |
| Marketing | 78.63 | 5757 | 9 |
| Engineering | 76.35 | 5703 | 22 |
| Finance | 74.90 | 5608 | 16 |
| Operations | 69.21 | 5049 | 19 |

Task 8 (BONUS) complete! The loop generated an automated report for each company, including a statistics table, top performers, KPI tier breakdown, department summary, and visualizations.


10 Summary of All Tasks

📋 Summary of All Practicum Tasks

| No | Task | Key Concepts | Status |
| --- | --- | --- | --- |
| 1 | Dynamic Multi-Formula Function | Nested loop, input validation, multi-plot | ✅ Done |
| 2 | Nested Simulation: Multi-Sales & Discounts | Nested function, conditional discount, cumulative stats | ✅ Done |
| 3 | Multi-Level Performance Categorization | Vector loop, percentages, bar + pie chart | ✅ Done |
| 4 | Multi-Company Dataset Simulation | Nested loop, conditional KPI flag, summary table | ✅ Done |
| 5 | Monte Carlo Simulation: Pi & Probability | Iterative loop, π estimation, sub-square probability | ✅ Done |
| 6 | Advanced Data Transformation & Feature Engineering | Normalization loop, z-score, feature engineering | ✅ Done |
| 7 | Mini Project: Company KPI Dashboard | Full dashboard, grouped bar, scatter + regression | ✅ Done |
| 8 | Automated Report Generation (BONUS) | Automated loop, table + plot per company | ✅ Done |

Submitted for Data Science Programming — Instructor: Bakti Siregar, M.Sc
07 April 2026, 00:41 WIB