Practicum Week ~ 5

Risky Nurhidayah

ADVANCED PRACTICUM

FUNCTION & LOOPS + DATA SCIENCE ~ Week 5


Study Program: Data Science
University: Institut Teknologi Sains Bandung
Lecturer: Bakti Siregar, M.Sc., CSD

1 Introduction

This practicum focuses on the application of functions, loops, and conditional logic in the context of data science. Students are encouraged to build structured workflows, from raw data simulations to automated reporting through interactive visualizations.

2 Task 1 - Dynamic Multi-Formula Function

In this task, we build a function called compute_formula() that dynamically computes four types of mathematical formulas (linear, quadratic, cubic, and exponential) for input values x = 1:20.


2.1 Step 1 — Build & Compute Formulas

library(ggplot2)
library(plotly)
library(reshape2)
library(dplyr)

#  Function: compute_formula 
compute_formula <- function(x, formula) {
  valid_formulas <- c("linear", "quadratic", "cubic", "exponential")
  
  if (!(formula %in% valid_formulas)) {
    stop(paste("Invalid formula! Choose one of:", paste(valid_formulas, collapse = ", ")))
  }
  
  result <- if (formula == "linear") {
    2 * x + 3
  } else if (formula == "quadratic") {
    x^2 + 2 * x + 1
  } else if (formula == "cubic") {
    x^3 - x^2 + x
  } else if (formula == "exponential") {
    exp(0.3 * x)
  }
  return(result)
}

# Nested Loop Computation 
x_vals   <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")
results  <- data.frame(x = x_vals)

for (f in formulas) {
  values <- c()
  for (x in x_vals) {
    values <- c(values, compute_formula(x, f))
  }
  results[[f]] <- values
}

head(results, 5)
##   x linear quadratic cubic exponential
## 1 1      5         4     1    1.349859
## 2 2      7         9     6    1.822119
## 3 3      9        16    21    2.459603
## 4 4     11        25    52    3.320117
## 5 5     13        36   105    4.481689

2.2 Step 2 — Visual Comparison

results_long <- melt(results, id.vars = "x", variable.name = "Formula", value.name = "Value")
results_long$Label <- factor(results_long$Formula,
  levels = c("linear","quadratic","cubic","exponential"),
  labels = c("Linear: 2x+3","Quadratic: x²+2x+1","Cubic: x³-x²+x","Exponential: e^0.3x"))

colors_plt <- c("#a78bfa","#38bdf8","#f472b6","#fb923c")

plot_ly(results_long, x = ~x, y = ~Value, color = ~Label, colors = colors_plt,
        type = "scatter", mode = "lines+markers") %>%
  layout(
    title = list(text = "<b>Dynamic Multi-Formula Analysis</b>", font = list(color = "white")),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e",
    xaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.1)"),
    yaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.1)"),
    legend = list(font = list(color = "white"))
  )

2.3 Conclusion — Task 1

The compute_formula() function evaluates four mathematical models for \(x = 1,2,\ldots,20\) using nested loops. The formulas include linear \(f(x)=2x+3\), quadratic \(f(x)=x^2+2x+1\), cubic \(f(x)=x^3-x^2+x\), and exponential \(f(x)=e^{0.3x}\).

At \(x=20\), the cubic function produces the largest value (\(7620\)), followed by the quadratic function (\(441\)), the exponential function (\(e^{6}\approx 403.4\)), and the linear function (\(43\)). Within this finite range the higher-degree polynomials dominate, although the exponential eventually overtakes any polynomial for sufficiently large \(x\). The nested loop structure computes all four formulas for every \(x\) in a single pass.
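The ranking at \(x=20\) can be checked with a short vectorised sketch. The name formula_list and the use of sapply() are illustrative choices, not part of the original function; the formulas themselves are copied from compute_formula().

```r
# Same four formulas as compute_formula(), stored as functions in a named list
# (formula_list is a hypothetical helper introduced for this check)
formula_list <- list(
  linear      = function(x) 2 * x + 3,
  quadratic   = function(x) x^2 + 2 * x + 1,
  cubic       = function(x) x^3 - x^2 + x,
  exponential = function(x) exp(0.3 * x)
)

x_vals <- 1:20

# sapply() applies each function to the whole vector at once,
# giving a 20 x 4 matrix with one column per formula
value_matrix <- sapply(formula_list, function(f) f(x_vals))

# Values at x = 20, sorted from largest to smallest
at_20 <- sort(value_matrix[20, ], decreasing = TRUE)
print(round(at_20, 2))
```

The sorted vector lists cubic first, then quadratic, exponential, and linear, which is the order stated in the conclusion.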

3 Task 2 - Nested Simulation: Multi Sales & Discounts

In this task, we build simulate_sales() that simulates daily sales data with conditional discounts and tracks cumulative sales per salesperson.

3.1 Step 1 — Simulation Logic

get_discount <- function(sales_amount) {
  if      (sales_amount >= 1000) return(0.20)
  else if (sales_amount >= 500)  return(0.10)
  else if (sales_amount >= 200)  return(0.05)
  else                           return(0.00)
}

simulate_sales <- function(n_salesperson, days) {
  set.seed(42)
  all_data <- data.frame()
  
  for (i in 1:n_salesperson) {
    sales_id   <- paste0("SP-", sprintf("%02d", i))
    cumulative <- 0
    for (d in 1:days) {
      sales_amount  <- round(runif(1, 100, 1500), 2)
      discount_rate <- get_discount(sales_amount)
      cumulative    <- cumulative + sales_amount
      all_data      <- rbind(all_data, data.frame(
        sales_id = sales_id, day = d, sales_amount = sales_amount,
        discount_rate = discount_rate, cumulative = round(cumulative, 2)
      ))
    }
  }
  return(all_data)
}

sales_data <- simulate_sales(n_salesperson = 3, days = 30)
head(sales_data, 5)
##   sales_id day sales_amount discount_rate cumulative
## 1    SP-01   1      1380.73           0.2    1380.73
## 2    SP-01   2      1411.91           0.2    2792.64
## 3    SP-01   3       500.60           0.1    3293.24
## 4    SP-01   4      1262.63           0.2    4555.87
## 5    SP-01   5       998.44           0.1    5554.31

3.2 Step 2 — Performance Tracking

plot_ly(sales_data, x = ~day, y = ~cumulative, color = ~sales_id,
        colors = c("#a78bfa", "#38bdf8", "#f472b6"),
        type = "scatter", mode = "lines+markers") %>%
  layout(
    title = list(text = "<b>Cumulative Sales Performance</b>", font = list(color = "white")),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e",
    xaxis = list(title = "Day", color = "white"),
    yaxis = list(title = "Cumulative USD", color = "white"),
    legend = list(font = list(color = "white"))
  )

3.3 Conclusion — Task 2

The simulate_sales() function models sales activity using nested loops across salespersons and days. A conditional function applies discount rates based on transaction value:

\[ d(s) = \begin{cases} 0.20 & s \geq 1000 \\ 0.10 & 500 \leq s < 1000 \\ 0.05 & 200 \leq s < 500 \\ 0 & s < 200 \end{cases} \]

Cumulative sales follow:

\[ C_d = \sum_{i=1}^{d} s_i \]

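Each daily sale is drawn from a uniform distribution on \([100, 1500]\) with mean \(800\), so the expected 30-day total per salesperson is \(30 \times 800 = 24{,}000\) USD, with a standard deviation of roughly \(2{,}200\) USD. The simulated totals vary around this value depending on the random draws.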
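As a rough sketch of the same logic without explicit loops, the discount rule and the cumulative sum can be written in vectorised form. The helper get_discount_vec() and the objects daily_sales and one_person are hypothetical names introduced here; findInterval() and cumsum() are base R functions.

```r
# Vectorised counterpart of get_discount(): thresholds at 200, 500 and 1000
# (get_discount_vec is a hypothetical name, not part of the original code)
get_discount_vec <- function(sales_amount) {
  rates <- c(0.00, 0.05, 0.10, 0.20)
  # findInterval() counts how many thresholds each amount has reached (0 to 3)
  rates[findInterval(sales_amount, c(200, 500, 1000)) + 1]
}

set.seed(42)
daily_sales <- round(runif(30, 100, 1500), 2)   # one salesperson, 30 days

one_person <- data.frame(
  day           = 1:30,
  sales_amount  = daily_sales,
  discount_rate = get_discount_vec(daily_sales),
  cumulative    = cumsum(daily_sales)           # C_d = s_1 + ... + s_d
)
head(one_person, 3)

# Theoretical benchmark for a 30-day total under U(100, 1500)
expected_total <- 30 * (100 + 1500) / 2               # 24000
sd_total       <- sqrt(30) * (1500 - 100) / sqrt(12)  # about 2214
```

The benchmark values give a quick sanity check for the final cumulative figure of each salesperson.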

4 Task 3 - Multi Level Performance Categorization

In this task, categorize_performance() classifies sales into 5 levels, and the resulting distribution is visualized.

4.1 Step 1 — Build the Function

library(plotly)
library(dplyr)

categorize_performance <- function(sales_amount) {
  categories <- c()
  for (sales in sales_amount) {
    if      (sales >= 1200) categories <- c(categories, "Excellent")
    else if (sales >= 900)  categories <- c(categories, "Very Good")
    else if (sales >= 600)  categories <- c(categories, "Good")
    else if (sales >= 300)  categories <- c(categories, "Average")
    else                    categories <- c(categories, "Poor")
  }
  return(categories)
}

4.2 Step 2 — Apply Function & Calculate Distribution

set.seed(42)
sales_vector <- round(runif(150, 100, 1500), 2)
performance_category <- categorize_performance(sales_vector)

perf_data <- data.frame(sales_amount = sales_vector, category = performance_category)

category_summary <- as.data.frame(table(perf_data$category))
colnames(category_summary) <- c("Category", "Count")
category_summary$Percentage <- round(category_summary$Count / sum(category_summary$Count) * 100, 1)
category_summary$Category <- factor(category_summary$Category,
                                    levels = c("Excellent","Very Good","Good","Average","Poor"))
category_summary <- category_summary[order(category_summary$Category), ]

print(category_summary)
##    Category Count Percentage
## 2 Excellent    35       23.3
## 5 Very Good    41       27.3
## 3      Good    29       19.3
## 1   Average    26       17.3
## 4      Poor    19       12.7

4.3 Step 3 — Interactive Bar Plot

cat_colors <- c("Excellent"="#34d399","Very Good"="#38bdf8",
                "Good"="#a78bfa","Average"="#fb923c","Poor"="#f472b6")

plot_ly(
  data = category_summary, x = ~Category, y = ~Count,
  type = "bar", color = ~Category, colors = unname(cat_colors),
  text  = ~paste0(Count, " records | ", Percentage, "%"),
  hovertemplate = "<b>%{x}</b><br>Count: %{y}<br>%{text}<extra></extra>",
  marker = list(line = list(color = "#0f0f1e", width = 1.5))
) %>% layout(
  title = list(text = "<b>Performance Category Distribution</b><br><sup>150 Observations</sup>",
               font = list(color = "white", size = 15)),
  xaxis = list(title = "Category", color = "white",
               gridcolor = "rgba(255,255,255,0.08)",
               categoryorder = "array",
               categoryarray = c("Excellent","Very Good","Good","Average","Poor")),
  yaxis = list(title = "Count", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
  showlegend = FALSE,
  paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)

4.4 Step 4 — Interactive Pie Chart

plot_ly(
  data = category_summary, labels = ~Category, values = ~Count,
  type = "pie",
  marker = list(colors = unname(cat_colors),
                line = list(color = "#0f0f1e", width = 2)),
  textinfo = "label+percent",
  textfont = list(color = "white", size = 13),
  hovertemplate = "<b>%{label}</b><br>Count: %{value}<br>%{percent}<extra></extra>",
  pull = c(0.05, 0, 0, 0, 0)
) %>% layout(
  title = list(text = "<b>Performance Category — Pie Chart</b>",
               font = list(color = "white", size = 15)),
  legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)",
                bordercolor = "rgba(255,255,255,0.15)", borderwidth = 1),
  paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)

4.5 Step 5 — Interactive Box Plot by Category

perf_data$category <- factor(perf_data$category,
  levels = c("Excellent","Very Good","Good","Average","Poor"))

plot_ly(
  data = perf_data, x = ~category, y = ~sales_amount,
  color = ~category, colors = unname(cat_colors),
  type = "box", boxpoints = "all", jitter = 0.3, pointpos = 0,
  marker = list(size = 4, opacity = 0.5),
  hovertemplate = "<b>%{x}</b><br>Sales: $%{y:,.2f}<extra></extra>"
) %>% layout(
  title = list(text = "<b>Sales Distribution per Performance Category</b>",
               font = list(color = "white", size = 15)),
  xaxis = list(title = "Category", color = "white",
               gridcolor = "rgba(255,255,255,0.08)"),
  yaxis = list(title = "Sales Amount (USD)", color = "white",
               gridcolor = "rgba(255,255,255,0.1)", tickprefix = "$"),
  showlegend = FALSE,
  paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)

4.6 Conclusion — Task 3

The categorize_performance() function classifies sales into five categories based on threshold values:

\[ \text{Category}(x) = \begin{cases} \text{Excellent} & x \geq 1200 \\ \text{Very Good} & 900 \leq x < 1200 \\ \text{Good} & 600 \leq x < 900 \\ \text{Average} & 300 \leq x < 600 \\ \text{Poor} & x < 300 \end{cases} \]

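Because the data are drawn uniformly from \([100,1500]\), the expected share of each category is proportional to the width of its interval. The four upper categories each span 300 units (about 21.4% each), while Poor spans only 200 units (about 14.3%), so the distribution is close to uniform except for a smaller Poor group. The simulated shares (12.7% to 27.3%) are consistent with this, given sampling variation over 150 observations.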
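The expected shares follow directly from the interval widths. The sketch below computes them; the object names breaks, labels, expected_share, and expected_count are illustrative.

```r
# Category boundaries implied by categorize_performance() on U(100, 1500)
breaks <- c(100, 300, 600, 900, 1200, 1500)
labels <- c("Poor", "Average", "Good", "Very Good", "Excellent")

# For a uniform distribution, probability = interval width / total width
expected_share <- setNames(diff(breaks) / (1500 - 100), labels)
print(round(100 * expected_share, 1))

# Expected counts for the 150 simulated observations
expected_count <- 150 * expected_share
print(round(expected_count, 1))
```

Comparing expected_count with the Count column of category_summary shows how far the simulated sample deviates from the theoretical split.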

5 Task 4 - Multi Company Dataset Simulation

generate_company_data() uses nested loops to generate company and employee data with KPI-based conditional logic.

5.1 Step 1 — Build the Function

library(plotly)
library(dplyr)

generate_company_data <- function(n_company, n_employees) {
  set.seed(123)
  departments <- c("HR","Finance","Engineering","Marketing","Operations")
  all_data    <- data.frame()
  
  for (i in 1:n_company) {
    company_id <- paste0("COMP-", sprintf("%02d", i))
    for (j in 1:n_employees) {
      salary            <- round(runif(1, 3000, 15000), 2)
      department        <- sample(departments, 1)
      performance_score <- round(runif(1, 50, 100), 1)
      KPI_score         <- round(runif(1, 60, 100), 1)
      is_top            <- ifelse(KPI_score > 90, "Top Performer", "Regular")
      
      all_data <- rbind(all_data, data.frame(
        company_id = company_id,
        employee_id = paste0("EMP-", sprintf("%03d", j)),
        salary = salary, department = department,
        performance_score = performance_score,
        KPI_score = KPI_score, performer_status = is_top
      ))
    }
  }
  return(all_data)
}

cat(" Function generate_company_data() created successfully!\n")
##  Function generate_company_data() created successfully!
cat("   Conditional: KPI > 90 → Top Performer\n")
##    Conditional: KPI > 90 → Top Performer

5.2 Step 2 — Generate Dataset

company_data <- generate_company_data(n_company = 5, n_employees = 50)
write.csv(company_data, "company_data.csv", row.names = FALSE)

# Show a concise summary in the HTML output
print(data.frame(
  Message = "Dataset generated successfully!",
  Rows    = nrow(company_data),
  Columns = ncol(company_data)
))
##                           Message Rows Columns
## 1 Dataset generated successfully!  250       7

5.3 Step 3 — Summary per Company

library(knitr)
library(kableExtra)

company_summary <- company_data %>%
  group_by(company_id) %>%
  summarise(
    Avg_Salary      = round(mean(salary), 2),
    Avg_Performance = round(mean(performance_score), 1),
    Avg_KPI         = round(mean(KPI_score), 1),
    Max_KPI         = round(max(KPI_score), 1),
    .groups = "drop"
  )

# Display a tidy table
company_summary %>%
  kable("html", caption = "Summary Per Company") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped","hover","condensed"),
                position = "center")

Summary Per Company

company_id  Avg_Salary  Avg_Performance  Avg_KPI  Max_KPI
COMP-01        8696.40             74.4     82.5     99.4
COMP-02        8345.26             74.5     79.4     98.9
COMP-03        9456.89             76.3     78.7     99.7
COMP-04        8620.38             71.4     78.7     97.7
COMP-05        9274.18             76.7     77.4     99.7

5.4 Step 4 — Interactive Summary Table

plot_ly(
  type = "table",
  header = list(
    values = list("<b>Company</b>","<b>Avg Salary</b>",
                  "<b>Avg Performance</b>","<b>Avg KPI</b>","<b>Max KPI</b>"),
    fill = list(color = "#1e1b4b"),
    font = list(color = "white", size = 12),
    align = "center",
    line = list(color = "#0f0f1e", width = 1)
  ),
  cells = list(
    values = list(
      company_summary$company_id,
      paste0("$", format(company_summary$Avg_Salary, big.mark = ",")),
      company_summary$Avg_Performance,
      company_summary$Avg_KPI,
      company_summary$Max_KPI
    ),
    fill = list(color = list("#0f0f1e","#111128")),
    font = list(color = "white", size = 11),
    align = "center",
    line = list(color = "#1e1b4b", width = 1)
  )
) %>%
  layout(paper_bgcolor = "#0f0f1e")

5.5 Step 5 — Interactive Bar: Avg Salary & KPI

comp_colors <- c("#a78bfa","#38bdf8","#f472b6","#fb923c","#34d399")

p4a <- plot_ly(company_summary, x = ~company_id, y = ~Avg_Salary,
               type = "bar", name = "Avg Salary",
               marker = list(color = "#a78bfa",
                             line = list(color = "#0f0f1e", width = 1)),
               hovertemplate = "<b>%{x}</b><br>Avg Salary: $%{y:,.2f}<extra></extra>")

p4b <- plot_ly(company_summary, x = ~company_id, y = ~Avg_KPI,
               type = "bar", name = "Avg KPI",
               marker = list(color = "#38bdf8",
                             line = list(color = "#0f0f1e", width = 1)),
               hovertemplate = "<b>%{x}</b><br>Avg KPI: %{y:.1f}<extra></extra>")

subplot(p4a, p4b, nrows = 1, shareX = TRUE, titleX = TRUE) %>%
  layout(
    title = list(text = "<b>Avg Salary & Avg KPI per Company</b>",
                 font = list(color = "white", size = 15)),
    xaxis  = list(title = "Company", color = "white",
                  gridcolor = "rgba(255,255,255,0.08)"),
    xaxis2 = list(title = "Company", color = "white",
                  gridcolor = "rgba(255,255,255,0.08)"),
    yaxis  = list(title = "Avg Salary (USD)", color = "white",
                  gridcolor = "rgba(255,255,255,0.1)", tickprefix = "$"),
    yaxis2 = list(title = "Avg KPI Score", color = "white",
                  gridcolor = "rgba(255,255,255,0.1)"),
    legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

5.6 Step 6 — Interactive Scatter: Performance vs KPI

p_scatter <- plot_ly()

for (i in seq_along(unique(company_data$company_id))) {
  comp <- unique(company_data$company_id)[i]
  df_c <- company_data[company_data$company_id == comp, ]
  
  p_scatter <- p_scatter %>% add_trace(
    data = df_c, x = ~performance_score, y = ~KPI_score,
    type = "scatter", mode = "markers", name = comp,
    marker = list(color = comp_colors[i], size = 8, opacity = 0.75,
                  symbol = ifelse(df_c$performer_status == "Top Performer",
                                  "star","circle"),
                  line = list(color = "white", width = 0.5)),
    hovertemplate = paste0("<b>", comp, "</b><br>",
                           "Employee: %{customdata}<br>",
                           "Performance: %{x:.1f}<br>KPI: %{y:.1f}<extra></extra>"),
    customdata = ~employee_id
  )
}

p_scatter %>%
  add_lines(x = c(50,100), y = c(90,90),
            line = list(color = "rgba(251,146,60,0.6)", dash = "dash", width = 1.5),
            name = "KPI = 90 threshold", showlegend = TRUE, hoverinfo = "skip") %>%
  layout(
    title = list(text = "<b>Performance vs KPI Score</b><br><sup>⭐ = Top Performer</sup>",
                 font = list(color = "white", size = 15)),
    xaxis = list(title = "Performance Score", color = "white",
                 gridcolor = "rgba(255,255,255,0.08)", range = c(45,105)),
    yaxis = list(title = "KPI Score", color = "white",
                 gridcolor = "rgba(255,255,255,0.08)", range = c(55,105)),
    legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

5.7 Conclusion — Task 4

The generate_company_data() function simulates structured employee data using nested loops across companies and employees. Each observation includes salary, department, performance score, and KPI score.

The classification rule is defined as:

\[ \text{status} = \begin{cases} \text{Top Performer} & \text{if } KPI > 90 \\ \text{Regular} & \text{otherwise} \end{cases} \]

The generated dataset consists of \(5 \times 50 = 250\) observations, with salary ranging from $3,000 to $15,000. Summary statistics such as mean salary, mean performance, and maximum KPI are computed efficiently.
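For larger datasets, growing a data frame with rbind() inside a loop becomes slow. A minimal vectorised sketch is shown below, assuming the same value ranges; generate_company_data_vec() and its seed argument are hypothetical, and because the random numbers are drawn in a different order, the generated values will not match the loop-based version.

```r
# Hypothetical vectorised variant of generate_company_data()
generate_company_data_vec <- function(n_company, n_employees, seed = 123) {
  set.seed(seed)
  n <- n_company * n_employees
  departments <- c("HR", "Finance", "Engineering", "Marketing", "Operations")
  kpi <- round(runif(n, 60, 100), 1)

  data.frame(
    company_id        = rep(sprintf("COMP-%02d", seq_len(n_company)), each = n_employees),
    employee_id       = rep(sprintf("EMP-%03d", seq_len(n_employees)), times = n_company),
    salary            = round(runif(n, 3000, 15000), 2),
    department        = sample(departments, n, replace = TRUE),
    performance_score = round(runif(n, 50, 100), 1),
    KPI_score         = kpi,
    # Same rule as in the loop version: KPI > 90 marks a Top Performer
    performer_status  = ifelse(kpi > 90, "Top Performer", "Regular")
  )
}

sketch_data <- generate_company_data_vec(n_company = 5, n_employees = 50)
dim(sketch_data)

# Share of Top Performers; for KPI ~ U(60, 100) the expectation is about 25%
mean(sketch_data$performer_status == "Top Performer")
```

All columns are built once as whole vectors, so the data frame is allocated a single time instead of 250 times.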

6 Task 5 - Monte Carlo Simulation: Pi & Probability

monte_carlo_pi() estimates π using random point simulation and computes the probability that a point falls in the sub-square \([0, 0.5]^2\).

6.1 Step 1 — Build the Function

library(plotly)

monte_carlo_pi <- function(n_points) {
  set.seed(99)
  x <- c(); y <- c(); inside_circle <- c(); in_subsquare <- c()
  
  for (i in 1:n_points) {
    xi <- runif(1, -1, 1)
    yi <- runif(1, -1, 1)
    x  <- c(x, xi); y <- c(y, yi)
    inside_circle <- c(inside_circle, sqrt(xi^2 + yi^2) <= 1)
    in_subsquare  <- c(in_subsquare, xi >= 0 & xi <= 0.5 & yi >= 0 & yi <= 0.5)
  }
  
  pi_estimate    <- 4 * sum(inside_circle) / n_points
  prob_subsquare <- sum(in_subsquare) / n_points
  
  cat(sprintf("  estimate : %.6f\n", pi_estimate))
  cat(sprintf("   Actual π   : %.6f\n", pi))
  cat(sprintf("   Error      : %.4f%%\n", abs(pi_estimate - pi) / pi * 100))
  cat(sprintf("   P(sub-sq)  : %.6f (theoretical: 0.062500)\n", prob_subsquare))
  
  return(list(x = x, y = y, inside_circle = inside_circle,
              in_subsquare = in_subsquare, pi_estimate = pi_estimate,
              prob_subsquare = prob_subsquare, n_points = n_points))
}

6.2 Step 2 — Run Simulation

mc_result <- monte_carlo_pi(3000)
##   estimate : 3.145333
##    Actual π   : 3.141593
##    Error      : 0.1191%
##    P(sub-sq)  : 0.059000 (theoretical: 0.062500)

6.3 Step 3 — Interactive Scatter: Inside vs Outside Circle

x      <- mc_result$x; y      <- mc_result$y
inside <- mc_result$inside_circle; in_sub <- mc_result$in_subsquare
theta  <- seq(0, 2*pi, length.out = 300)

plot_ly() %>%
  add_trace(x = x[!inside], y = y[!inside], type = "scatter", mode = "markers",
            name = "Outside Circle",
            marker = list(color = "#f472b6", size = 3, opacity = 0.5),
            hovertemplate = "Outside<br>x:%{x:.3f} y:%{y:.3f}<extra></extra>") %>%
  add_trace(x = x[inside], y = y[inside], type = "scatter", mode = "markers",
            name = "Inside Circle",
            marker = list(color = "#a78bfa", size = 3, opacity = 0.6),
            hovertemplate = "Inside<br>x:%{x:.3f} y:%{y:.3f}<extra></extra>") %>%
  add_trace(x = x[in_sub], y = y[in_sub], type = "scatter", mode = "markers",
            name = "Sub-Square [0,0.5]²",
            marker = list(color = "#34d399", size = 4, symbol = "diamond"),
            hovertemplate = "Sub-sq<br>x:%{x:.3f} y:%{y:.3f}<extra></extra>") %>%
  add_trace(x = cos(theta), y = sin(theta), type = "scatter", mode = "lines",
            name = "Unit Circle",
            line = list(color = "#fb923c", width = 2, dash = "dot"),
            hoverinfo = "skip") %>%
  add_trace(x = c(0,0.5,0.5,0,0), y = c(0,0,0.5,0.5,0),
            type = "scatter", mode = "lines", name = "Sub-Square Border",
            line = list(color = "#34d399", width = 2, dash = "dash"),
            hoverinfo = "skip") %>%
  layout(
    title = list(
      text = paste0("<b>Monte Carlo — π Estimation</b><br>",
                    "<sup>n=", mc_result$n_points,
                    " | π≈", round(mc_result$pi_estimate, 5), "</sup>"),
      font = list(color = "white", size = 14)),
    xaxis = list(title = "x", color = "white", range = c(-1.1,1.1),
                 gridcolor = "rgba(255,255,255,0.08)", zeroline = FALSE),
    yaxis = list(title = "y", color = "white", range = c(-1.1,1.1),
                 gridcolor = "rgba(255,255,255,0.08)", zeroline = FALSE,
                 scaleanchor = "x"),
    legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

6.4 Step 4 — π Convergence Plot

set.seed(99)
iter_sizes   <- c(10, 50, 100, 250, 500, 1000, 2000, 3000, 5000)
pi_estimates <- c()
x_all <- runif(5000, -1, 1); y_all <- runif(5000, -1, 1)

for (n in iter_sizes) {
  pi_estimates <- c(pi_estimates,
                    4 * sum(sqrt(x_all[1:n]^2 + y_all[1:n]^2) <= 1) / n)
}

conv_df <- data.frame(n = iter_sizes, pi_estimate = pi_estimates, actual_pi = pi)

plot_ly(conv_df) %>%
  add_trace(x = ~n, y = ~pi_estimate, type = "scatter", mode = "lines+markers",
            name = "π Estimate",
            line = list(color = "#a78bfa", width = 2.5),
            marker = list(color = "#a78bfa", size = 8,
                          line = list(color = "white", width = 1)),
            hovertemplate = "n=%{x}<br>π≈%{y:.5f}<extra></extra>") %>%
  add_trace(x = ~n, y = ~actual_pi, type = "scatter", mode = "lines",
            name = "Actual π",
            line = list(color = "#fb923c", width = 2, dash = "dash"),
            hoverinfo = "skip") %>%
  layout(
    title = list(text = "<b>π Estimate Convergence</b>",
                 font = list(color = "white", size = 15)),
    xaxis = list(title = "n", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
    yaxis = list(title = "π Estimate", color = "white",
                 gridcolor = "rgba(255,255,255,0.08)", range = c(2.5,4.0)),
    legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

6.5 Step 5 — Probability Bar Chart

prob_df <- data.frame(Type = c("Estimated","Theoretical"),
                      Probability = c(mc_result$prob_subsquare, 0.0625))

plot_ly(prob_df, x = ~Type, y = ~Probability, type = "bar",
        marker = list(color = c("#38bdf8","#fb923c"),
                      line = list(color = "#0f0f1e", width = 1.5)),
        text = ~round(Probability, 5), textposition = "outside",
        textfont = list(color = "white"),
        hovertemplate = "<b>%{x}</b><br>P = %{y:.5f}<extra></extra>") %>%
  layout(
    title = list(text = "<b>P(point in sub-square [0,0.5]²)</b>",
                 font = list(color = "white", size = 14)),
    xaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.08)"),
    yaxis = list(title = "Probability", color = "white",
                 gridcolor = "rgba(255,255,255,0.1)",
                 range = c(0, max(prob_df$Probability) * 1.3)),
    showlegend = FALSE,
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

6.6 Conclusion — Task 5

The monte carlo pi function estimates \(\pi\) using random sampling within a square region:

\[ \hat{\pi} = 4 \cdot \frac{\text{number of points inside the circle}}{n} \]

This method is based on the ratio between the area of the unit circle (\(\pi\)) and the area of the enclosing square (\(4\)). As the number of points increases, the estimate converges to \(\pi\), consistent with the Law of Large Numbers. For example, with \(n=3000\) the simulation above gives \(\hat{\pi}=3.145333\), a relative error of about \(0.12\%\).
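The size of the error to expect can be sketched from the binomial standard error, \(4\sqrt{p(1-p)/n}\) with \(p=\pi/4\). The function estimate_pi_vec() below is a hypothetical vectorised variant; it draws all x values before all y values, so its estimate will generally differ from the loop-based result even with the same seed.

```r
# Hypothetical vectorised Monte Carlo estimator with a standard error
estimate_pi_vec <- function(n_points, seed = 99) {
  set.seed(seed)
  x <- runif(n_points, -1, 1)
  y <- runif(n_points, -1, 1)
  p_hat <- mean(x^2 + y^2 <= 1)   # share of points inside the unit circle
  list(
    pi_estimate = 4 * p_hat,
    # Standard error of 4 * p_hat for a binomial proportion
    std_error   = 4 * sqrt(p_hat * (1 - p_hat) / n_points)
  )
}

res <- estimate_pi_vec(3000)
print(res)

# Theoretical standard error at n = 3000, using the true p = pi / 4
theoretical_se <- 4 * sqrt((pi / 4) * (1 - pi / 4) / 3000)
print(round(theoretical_se, 4))
```

The standard error shrinks with \(1/\sqrt{n}\), so reducing it by a factor of 10 requires about 100 times as many points.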

7 Task 6 - Advanced Data Transformation & Feature Engineering

normalize_columns() and z_score() transform the data with loop-based normalization, followed by the creation of new features.

7.1 Step 1 — Build Transformation Functions

library(plotly)
library(dplyr)
library(knitr)

normalize_columns <- function(df) {
  df_norm <- df
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      mn <- min(df[[col]], na.rm = TRUE)
      mx <- max(df[[col]], na.rm = TRUE)
      df_norm[[col]] <- if (mx - mn == 0) 0 else (df[[col]] - mn) / (mx - mn)
    }
  }
  return(df_norm)
}

z_score <- function(df) {
  df_z <- df
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      m <- mean(df[[col]], na.rm = TRUE)
      s <- sd(df[[col]], na.rm = TRUE)
      df_z[[col]] <- if (s == 0) 0 else (df[[col]] - m) / s
    }
  }
  return(df_z)
}

kable(data.frame(
  Function = c("normalize_columns()", "z_score()"),
  Description = c("Min-Max Scaling [0–1]", "Standardization (Mean=0, SD=1)")
), caption = "Transformation Functions Overview")

Transformation Functions Overview

Function             Description
normalize_columns()  Min-Max Scaling [0–1]
z_score()            Standardization (Mean=0, SD=1)

7.2 Step 2 — Prepare Dataset

set.seed(123)
departments <- c("HR","Finance","Engineering","Marketing","Operations")

raw_data <- data.frame(
  employee_id       = paste0("EMP-", sprintf("%03d", 1:250)),
  company_id        = rep(paste0("COMP-", sprintf("%02d", 1:5)), each = 50),
  salary            = round(runif(250, 3000, 15000), 2),
  performance_score = round(runif(250, 50, 100), 1),
  KPI_score         = round(runif(250, 60, 100), 1),
  department        = sample(departments, 250, replace = TRUE)
)

numeric_cols <- raw_data %>% select(salary, performance_score, KPI_score)

kable(data.frame(
  Total_Rows = nrow(numeric_cols),
  Numeric_Columns = ncol(numeric_cols)
), caption = "Dataset Summary")

Dataset Summary

Total_Rows  Numeric_Columns
250         3

7.3 Step 3 — Apply Transformations

library(knitr)

# Required: apply both transformations before computing the summaries
df_normalized <- normalize_columns(numeric_cols)
df_zscore     <- z_score(numeric_cols)

# Min-Max Summary
minmax_summary <- summary(df_normalized$salary)
df_minmax <- data.frame(
  Statistic = names(minmax_summary),
  Value = as.numeric(minmax_summary)
)

kable(df_minmax, caption = "Summary Min-Max (Salary)")

Summary Min-Max (Salary)

Statistic  Value
Min.       0.0000000
1st Qu.    0.2753457
Median     0.4849292
Mean       0.5102624
3rd Qu.    0.7301807
Max.       1.0000000

# Z-Score Summary
zscore_summary <- summary(df_zscore$salary)
df_zscore_tbl <- data.frame(
  Statistic = names(zscore_summary),
  Value = round(as.numeric(zscore_summary), 3)
)

kable(df_zscore_tbl, caption = "Summary Z-Score (Salary)")

Summary Z-Score (Salary)

Statistic  Value
Min.       -1.855
1st Qu.    -0.854
Median     -0.092
Mean        0.000
3rd Qu.     0.800
Max.        1.781

7.4 Step 4 — Feature Engineering

engineered_data <- raw_data %>%
  mutate(
    performance_category = case_when(
      performance_score >= 90 ~ "Excellent",
      performance_score >= 75 ~ "Good",
      performance_score >= 60 ~ "Average",
      TRUE ~ "Poor"
    ),
    salary_bracket = case_when(
      salary >= 12000 ~ "High (>=12k)",
      salary >= 8000  ~ "Mid (8k-12k)",
      salary >= 5000  ~ "Low-Mid (5k-8k)",
      TRUE ~ "Low (<5k)"
    ),
    KPI_tier = case_when(
      KPI_score >= 90 ~ "Top Tier",
      KPI_score >= 75 ~ "Mid Tier",
      TRUE ~ "Base Tier"
    )
  )

kable(as.data.frame(table(engineered_data$performance_category)),
      col.names = c("Performance Category", "Count"),
      caption = "Performance Category Distribution")

Performance Category Distribution

Performance Category  Count
Average               82
Excellent             47
Good                  66
Poor                  55

kable(as.data.frame(table(engineered_data$salary_bracket)),
      col.names = c("Salary Bracket", "Count"),
      caption = "Salary Bracket Distribution")

Salary Bracket Distribution

Salary Bracket   Count
High (>=12k)     57
Low-Mid (5k-8k)  76
Low (<5k)        31
Mid (8k-12k)     86

kable(as.data.frame(table(engineered_data$KPI_tier)),
      col.names = c("KPI Tier", "Count"),
      caption = "KPI Tier Distribution")

KPI Tier Distribution

KPI Tier   Count
Base Tier  90
Mid Tier   87
Top Tier   73

7.5 Step 5 — Histogram: Before vs After

p6a <- plot_ly(alpha = 0.75) %>%
  add_histogram(x = numeric_cols$salary, name = "Original",
                marker = list(color = "#a78bfa"),
                hovertemplate = "Original<br>Range:%{x}<br>Count:%{y}<extra></extra>")

p6b <- plot_ly(alpha = 0.75) %>%
  add_histogram(x = df_normalized$salary, name = "Min-Max",
                marker = list(color = "#38bdf8"),
                hovertemplate = "Min-Max<br>Range:%{x:.3f}<br>Count:%{y}<extra></extra>")

p6c <- plot_ly(alpha = 0.75) %>%
  add_histogram(x = df_zscore$salary, name = "Z-Score",
                marker = list(color = "#f472b6"),
                hovertemplate = "Z-Score<br>Range:%{x:.3f}<br>Count:%{y}<extra></extra>")

subplot(p6a, p6b, p6c, nrows = 1, shareY = TRUE, titleX = TRUE) %>%
  layout(
    title = list(text = "<b>Salary Distribution: Before vs After Transformation</b>",
                 font = list(color = "white", size = 15)),
    showlegend = FALSE,
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

7.6 Step 6 — Boxplot: Before vs After

# Use add_trace() with type = "box" for each series
plot_ly() %>%
  add_trace(y = numeric_cols$salary, type = "box", name = "Original",
            marker = list(color = "#a78bfa"), line = list(color = "#a78bfa"),
            hovertemplate = "Original<br>Value:%{y:.2f}<extra></extra>") %>%
  add_trace(y = df_normalized$salary, type = "box", name = "Min-Max",
            marker = list(color = "#38bdf8"), line = list(color = "#38bdf8"),
            hovertemplate = "Min-Max<br>Value:%{y:.4f}<extra></extra>") %>%
  add_trace(y = df_zscore$salary, type = "box", name = "Z-Score",
            marker = list(color = "#f472b6"), line = list(color = "#f472b6"),
            hovertemplate = "Z-Score<br>Value:%{y:.4f}<extra></extra>") %>%
  layout(
    title = list(text = "<b>Salary: Before vs After Transformation</b>",
                 font = list(color = "white", size = 15)),
    yaxis = list(title = "Value", color = "white",
                 gridcolor = "rgba(255,255,255,0.1)"),
    xaxis = list(color = "white"),
    legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

7.7 Step 7 — New Features Distribution

# Use subplot() so that both plots are displayed together
perf_dist   <- as.data.frame(table(engineered_data$performance_category))
salary_dist <- as.data.frame(table(engineered_data$salary_bracket))
colnames(perf_dist)   <- c("Category","Count")
colnames(salary_dist) <- c("Category","Count")

p_perf <- plot_ly(perf_dist, x = ~Category, y = ~Count, type = "bar",
                  name = "Performance",
                  marker = list(color = c("#34d399","#38bdf8","#fb923c","#f472b6"),
                                line = list(color = "#0f0f1e", width = 1)),
                  hovertemplate = "<b>%{x}</b><br>Count: %{y}<extra></extra>") %>%
  layout(xaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.06)"),
         yaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.08)"),
         paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e")

p_sal <- plot_ly(salary_dist, x = ~Category, y = ~Count, type = "bar",
                 name = "Salary Bracket",
                 marker = list(color = c("#a78bfa","#38bdf8","#fb923c","#34d399"),
                               line = list(color = "#0f0f1e", width = 1)),
                 hovertemplate = "<b>%{x}</b><br>Count: %{y}<extra></extra>") %>%
  layout(xaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.06)"),
         yaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.08)"),
         paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e")

subplot(p_perf, p_sal, nrows = 1, shareY = FALSE, titleX = TRUE, margin = 0.06) %>%
  layout(
    title = list(text = "<b>New Features Distribution</b><br><sup>performance_category | salary_bracket</sup>",
                 font = list(color = "white", size = 15)),
    showlegend = FALSE,
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

7.8 Conclusion — Task 6

Two data transformation methods are applied:

Min-Max normalization: \[ x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \]

Z-score standardization: \[ z = \frac{x - \mu}{\sigma} \]

Min-Max rescales data into the range \([0,1]\), while Z-score standardizes data to have mean \(0\) and standard deviation \(1\). These transformations improve comparability across variables.
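
The two formulas can be sketched as small helper functions. This is a minimal illustration, not necessarily the implementation used in the steps above: the names `min_max`, `z_score`, and `salary_example` are hypothetical, and the input values are made up. Note that `sd()` in R uses the sample standard deviation (denominator n - 1).

```r
# Min-Max normalization: rescales values to the range [0, 1]
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # Constant column: return zeros instead of dividing by zero
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Z-score standardization: mean 0, standard deviation 1
z_score <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

salary_example <- c(3000, 6000, 9000, 15000)  # illustrative values only
min_max(salary_example)            # 0.00 0.25 0.50 1.00
round(z_score(salary_example), 3)  # centered on 0
```

Min-Max depends only on the two extreme values, so a single outlier compresses all other points; Z-score is less affected by this but does not bound the result.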

Additional categorical features are created using conditional logic, including performance categories, salary brackets, and KPI tiers.
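
As a hedged alternative to row-by-row conditional logic, the same kind of feature can be derived in one vectorized call with `cut()`. The sketch below assumes cut-offs of 75 and 90 for the KPI tiers, and the scores in `kpi_example` are invented for illustration.

```r
kpi_example <- c(60, 74.9, 75, 89.9, 90, 100)  # illustrative scores only

kpi_tier <- cut(
  kpi_example,
  breaks = c(-Inf, 75, 90, Inf),
  labels = c("Base Tier", "Mid Tier", "Top Tier"),
  right  = FALSE  # intervals are [a, b): 75 falls in Mid Tier, 90 in Top Tier
)

table(kpi_tier)
```

`cut()` returns a factor directly, so no separate conversion step is needed before tabulating or plotting.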

8 Task 7 - Mini Project: Company KPI Dashboard & Simulation

This task generates a simulated dataset for 7 companies, each with between 50 and 200 employees, and builds a full KPI dashboard with interactive visualizations.

8.1 Step 1 — Generate Dataset

library(dplyr)
library(plotly)
library(knitr)   # provides kable(), which is used in this step

set.seed(123)
n_companies           <- 7
employees_per_company <- sample(50:200, n_companies, replace = TRUE)
departments           <- c("HR","Finance","Engineering","Marketing","Operations")
company_data7         <- data.frame()

for (i in 1:n_companies) {
  n_emp <- employees_per_company[i]
  temp  <- data.frame(
    employee_id       = paste0("EMP-", sprintf("%04d", seq_len(n_emp) + (i*1000))),
    company_id        = paste0("COMP-", sprintf("%02d", i)),
    salary            = round(runif(n_emp, 3000, 15000), 2),
    performance_score = round(runif(n_emp, 50, 100), 1),
    KPI_score         = round(runif(n_emp, 60, 100), 1),
    department        = sample(departments, n_emp, replace = TRUE)
  )
  company_data7 <- rbind(company_data7, temp)
}

kable(data.frame(
  Total_Rows = nrow(company_data7),
  Total_Companies = dplyr::n_distinct(company_data7$company_id)
))
| Total_Rows | Total_Companies |
|---|---|
| 790 | 7 |
kable(head(company_data7, 5), caption = "Sample Data (First 5 Rows)")
Sample Data (First 5 Rows)
| employee_id | company_id | salary | performance_score | KPI_score | department |
|---|---|---|---|---|---|
| EMP-1001 | COMP-01 | 13797.90 | 67.6 | 76.4 | Engineering |
| EMP-1002 | COMP-01 | 5953.05 | 55.6 | 60.4 | Marketing |
| EMP-1003 | COMP-01 | 3504.71 | 62.2 | 67.4 | Engineering |
| EMP-1004 | COMP-01 | 6935.05 | 83.4 | 93.7 | Operations |
| EMP-1005 | COMP-01 | 14454.04 | 70.9 | 69.2 | Marketing |
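
Growing a data frame with `rbind()` inside a loop copies the accumulated rows on every iteration. A common alternative, sketched here under assumptions, builds one data frame per company with `lapply()` and binds them once. The names `sizes`, `company_list`, and `company_small` are hypothetical, and the tiny company sizes are chosen only to keep the example short.

```r
set.seed(123)
sizes <- c(3, 5, 4)  # hypothetical company sizes for illustration

# One data frame per company, built without modifying shared state
company_list <- lapply(seq_along(sizes), function(i) {
  n_emp <- sizes[i]
  data.frame(
    employee_id = paste0("EMP-", sprintf("%04d", seq_len(n_emp) + i * 1000)),
    company_id  = paste0("COMP-", sprintf("%02d", i)),
    salary      = round(runif(n_emp, 3000, 15000), 2)
  )
})

# Bind all pieces in a single step
company_small <- do.call(rbind, company_list)
nrow(company_small)  # 12
```

For 7 companies the difference is negligible; the pattern matters mainly when the number of groups grows large.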

8.2 Step 2 — KPI Tier (Loop-Based)

library(knitr)
company_data7$KPI_tier <- ""
# seq_len() is safe for an empty data frame, whereas 1:0 would iterate twice
for (i in seq_len(nrow(company_data7))) {
  s <- company_data7$KPI_score[i]
  company_data7$KPI_tier[i] <- if (s >= 90) "Top Tier" else
                                if (s >= 75) "Mid Tier" else "Base Tier"
}
company_data7$KPI_tier <- factor(company_data7$KPI_tier,
  levels = c("Top Tier","Mid Tier","Base Tier"))
print(table(company_data7$KPI_tier))
## 
##  Top Tier  Mid Tier Base Tier 
##       192       300       298
kpi_table <- as.data.frame(table(company_data7$KPI_tier))
colnames(kpi_table) <- c("KPI Tier", "Count")

kable(kpi_table,
      caption = "KPI Tier Distribution",
      align = "c")
KPI Tier Distribution
| KPI Tier | Count |
|:---:|:---:|
| Top Tier | 192 |
| Mid Tier | 300 |
| Base Tier | 298 |

8.3 Step 3 — Company Summary

company_summary7 <- company_data7 %>%
  group_by(company_id) %>%
  summarise(
    Total_Employees = n(),
    Avg_Salary      = round(mean(salary), 2),
    Avg_KPI         = round(mean(KPI_score), 1),
    Avg_Performance = round(mean(performance_score), 1),
    Top_Performers  = sum(KPI_score >= 90),
    .groups = "drop"
  ) %>%
  mutate(Top_Pct = round(Top_Performers / Total_Employees * 100, 1))

print(company_summary7)
## # A tibble: 7 × 7
##   company_id Total_Employees Avg_Salary Avg_KPI Avg_Performance Top_Performers
##   <chr>                <int>      <dbl>   <dbl>           <dbl>          <int>
## 1 COMP-01                 63      8925.    79.6            76.2             10
## 2 COMP-02                 99      8871.    80.9            74.1             30
## 3 COMP-03                167      9021.    79.1            73.9             39
## 4 COMP-04                 92      9048.    81.1            77.2             23
## 5 COMP-05                 63      9020.    79.6            74.7             14
## 6 COMP-06                167      8900.    79.7            75.1             43
## 7 COMP-07                139      8922.    79.9            75               33
## # ℹ 1 more variable: Top_Pct <dbl>
kable(company_summary7, caption = "Company KPI Summary")
Company KPI Summary
| company_id | Total_Employees | Avg_Salary | Avg_KPI | Avg_Performance | Top_Performers | Top_Pct |
|---|---|---|---|---|---|---|
| COMP-01 | 63 | 8924.97 | 79.6 | 76.2 | 10 | 15.9 |
| COMP-02 | 99 | 8870.87 | 80.9 | 74.1 | 30 | 30.3 |
| COMP-03 | 167 | 9021.36 | 79.1 | 73.9 | 39 | 23.4 |
| COMP-04 | 92 | 9048.25 | 81.1 | 77.2 | 23 | 25.0 |
| COMP-05 | 63 | 9020.40 | 79.6 | 74.7 | 14 | 22.2 |
| COMP-06 | 167 | 8900.16 | 79.7 | 75.1 | 43 | 25.7 |
| COMP-07 | 139 | 8921.98 | 79.9 | 75.0 | 33 | 23.7 |

8.4 Step 4 — Interactive Summary Table

plot_ly(type = "table",
  header = list(
    values = list("<b>Company</b>","<b>Employees</b>","<b>Avg Salary</b>",
                  "<b>Avg KPI</b>","<b>Avg Perf</b>","<b>Top Performers</b>","<b>Top %</b>"),
    fill = list(color = "#1e1b4b"), font = list(color = "white", size = 12),
    align = "center", line = list(color = "#0f0f1e", width = 1)
  ),
  cells = list(
    values = list(
      company_summary7$company_id, company_summary7$Total_Employees,
      paste0("$", format(round(company_summary7$Avg_Salary), big.mark = ",")),
      company_summary7$Avg_KPI, company_summary7$Avg_Performance,
      company_summary7$Top_Performers, paste0(company_summary7$Top_Pct, "%")
    ),
    fill = list(color = list("#0f0f1e","#111128")),
    font = list(color = "white", size = 11),
    align = "center", line = list(color = "#1e1b4b", width = 1)
  )
) %>% layout(paper_bgcolor = "#0f0f1e")

8.5 Step 5 — Top Performers Table

top_performers7 <- company_data7 %>%
  filter(KPI_score >= 90) %>%
  arrange(desc(KPI_score)) %>%
  select(employee_id, company_id, department, KPI_score, performance_score, salary)

kable(data.frame(Total_Top_Performers = nrow(top_performers7)))
| Total_Top_Performers |
|---|
| 192 |
kable(head(top_performers7, 10),
      caption = "Top 10 Performers",
      digits = 2)
Top 10 Performers
| employee_id | company_id | department | KPI_score | performance_score | salary |
|---|---|---|---|---|---|
| EMP-3106 | COMP-03 | Operations | 100.0 | 62.1 | 11318.39 |
| EMP-6025 | COMP-06 | Finance | 100.0 | 66.0 | 7441.39 |
| EMP-5040 | COMP-05 | Finance | 99.9 | 71.2 | 11938.03 |
| EMP-4058 | COMP-04 | Operations | 99.8 | 71.8 | 5140.69 |
| EMP-6074 | COMP-06 | HR | 99.8 | 76.5 | 11564.85 |
| EMP-7096 | COMP-07 | Marketing | 99.8 | 80.4 | 5721.67 |
| EMP-4003 | COMP-04 | Marketing | 99.7 | 87.3 | 4003.88 |
| EMP-2030 | COMP-02 | Operations | 99.6 | 93.1 | 10406.82 |
| EMP-5042 | COMP-05 | HR | 99.6 | 84.0 | 5097.30 |
| EMP-3100 | COMP-03 | HR | 99.5 | 71.3 | 5037.23 |
plot_ly(type = "table",
  header = list(
    values = list("<b>Employee</b>","<b>Company</b>","<b>Department</b>",
                  "<b>KPI</b>","<b>Performance</b>","<b>Salary</b>"),
    fill = list(color = "#1e1b4b"), font = list(color = "white", size = 12),
    align = "center", line = list(color = "#0f0f1e", width = 1)
  ),
  cells = list(
    values = list(
      head(top_performers7$employee_id, 15),
      head(top_performers7$company_id, 15),
      head(top_performers7$department, 15),
      head(top_performers7$KPI_score, 15),
      head(top_performers7$performance_score, 15),
      paste0("$", format(round(head(top_performers7$salary,15)), big.mark=","))
    ),
    fill = list(color = list("#0f0f1e","#111128")),
    font = list(color = "white", size = 11),
    align = "center", line = list(color = "#1e1b4b", width = 1)
  )
) %>% layout(paper_bgcolor = "#0f0f1e")

8.6 Step 6 — Salary Distribution (Histogram)

comp_colors7 <- c("#a78bfa","#38bdf8","#f472b6","#fb923c","#34d399","#fbbf24","#60a5fa")

p7_hist <- plot_ly()
for (i in seq_along(unique(company_data7$company_id))) {
  comp <- unique(company_data7$company_id)[i]
  df_c <- company_data7[company_data7$company_id == comp, ]
  p7_hist <- p7_hist %>% add_histogram(
    x = df_c$salary, name = comp, nbinsx = 25, opacity = 0.6,
    marker = list(color = comp_colors7[i],
                  line = list(color = "#0f0f1e", width = 0.5)),
    hovertemplate = paste0("<b>", comp, "</b><br>Range:%{x}<br>Count:%{y}<extra></extra>")
  )
}
p7_hist %>% layout(
  title = list(text = "<b>Salary Distribution by Company</b>",
               font = list(color = "white", size = 15)),
  barmode = "overlay",
  xaxis = list(title = "Salary (USD)", color = "white",
               gridcolor = "rgba(255,255,255,0.08)", tickprefix = "$"),
  yaxis = list(title = "Count", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
  legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
  paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)

8.7 Step 7 — Grouped Bar: Avg KPI & Top Performers

p7a <- plot_ly(company_summary7, x = ~company_id, y = ~Avg_KPI, type = "bar",
               name = "Avg KPI",
               marker = list(color = "#a78bfa", line = list(color = "#0f0f1e", width=1)),
               text = ~round(Avg_KPI,1), textposition = "outside",
               textfont = list(color = "white"),
               hovertemplate = "<b>%{x}</b><br>Avg KPI: %{y:.1f}<extra></extra>")

p7b <- plot_ly(company_summary7, x = ~company_id, y = ~Top_Performers, type = "bar",
               name = "Top Performers",
               marker = list(color = "#34d399", line = list(color = "#0f0f1e", width=1)),
               text = ~Top_Performers, textposition = "outside",
               textfont = list(color = "white"),
               hovertemplate = "<b>%{x}</b><br>Top Performers: %{y}<extra></extra>")

subplot(p7a, p7b, nrows = 1, shareX = FALSE, titleX = TRUE) %>%
  layout(
    title = list(text = "<b>Avg KPI & Top Performers per Company</b>",
                 font = list(color = "white", size = 15)),
    xaxis  = list(title = "Company", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
    xaxis2 = list(title = "Company", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
    yaxis  = list(title = "Avg KPI", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
    yaxis2 = list(title = "Top Performers", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
    legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

8.8 Step 8 — Scatter: Salary vs KPI + Regression Line

lm_model <- lm(KPI_score ~ salary, data = company_data7)
reg_x    <- seq(min(company_data7$salary), max(company_data7$salary), length.out = 200)
reg_y    <- predict(lm_model, newdata = data.frame(salary = reg_x))

p7_scatter <- plot_ly()
for (i in seq_along(unique(company_data7$company_id))) {
  comp <- unique(company_data7$company_id)[i]
  df_c <- company_data7[company_data7$company_id == comp, ]
  p7_scatter <- p7_scatter %>% add_trace(
    data = df_c, x = ~salary, y = ~KPI_score,
    type = "scatter", mode = "markers", name = comp,
    marker = list(color = comp_colors7[i], size = 6, opacity = 0.65,
                  line = list(color = "white", width = 0.4)),
    hovertemplate = paste0("<b>", comp, "</b><br>",
                           "Employee: %{customdata}<br>",
                           "Salary: $%{x:,.0f}<br>KPI: %{y:.1f}<extra></extra>"),
    customdata = ~employee_id
  )
}

p7_scatter %>%
  add_trace(x = reg_x, y = reg_y, type = "scatter", mode = "lines",
            name = "Regression Line",
            line = list(color = "#fb923c", width = 2.5, dash = "dash"),
            hoverinfo = "skip") %>%
  layout(
    title = list(text = "<b>Salary vs KPI Score + Regression Line</b>",
                 font = list(color = "white", size = 15)),
    xaxis = list(title = "Salary (USD)", color = "white",
                 gridcolor = "rgba(255,255,255,0.08)", tickprefix = "$"),
    yaxis = list(title = "KPI Score", color = "white",
                 gridcolor = "rgba(255,255,255,0.08)"),
    legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

8.9 Step 9 — Department Analysis

dept_summary7 <- company_data7 %>%
  group_by(company_id, department) %>%
  summarise(avg_KPI = round(mean(KPI_score),1), count = n(), .groups = "drop")

dept_colors <- c("HR"="#a78bfa","Finance"="#38bdf8","Engineering"="#f472b6",
                 "Marketing"="#fb923c","Operations"="#34d399")

p7_dept <- plot_ly()
for (dept in unique(dept_summary7$department)) {
  df_d <- dept_summary7[dept_summary7$department == dept, ]
  p7_dept <- p7_dept %>% add_trace(
    data = df_d, x = ~company_id, y = ~avg_KPI, type = "bar", name = dept,
    marker = list(color = dept_colors[dept],
                  line = list(color = "#0f0f1e", width = 0.8)),
    hovertemplate = paste0("<b>", dept, " — %{x}</b><br>",
                           "Avg KPI: %{y:.1f}<extra></extra>")
  )
}

p7_dept %>% layout(
  title = list(text = "<b>Avg KPI per Department per Company</b>",
               font = list(color = "white", size = 15)),
  barmode = "group",
  xaxis = list(title = "Company", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
  yaxis = list(title = "Avg KPI", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
  legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
  paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)

8.10 Step 10 — KPI Tier Pie Chart

kpi_dist7 <- company_data7 %>%
  count(KPI_tier) %>% rename(Tier = KPI_tier, Count = n) %>%
  mutate(Pct = round(Count / sum(Count) * 100, 1))

tier_colors <- c("Top Tier"="#34d399","Mid Tier"="#a78bfa","Base Tier"="#f472b6")

plot_ly(kpi_dist7, labels = ~Tier, values = ~Count, type = "pie",
  marker = list(colors = unname(tier_colors[as.character(kpi_dist7$Tier)]),
                line = list(color = "#0f0f1e", width = 2)),
  textinfo = "label+percent", textfont = list(color = "white", size = 13),
  hovertemplate = "<b>%{label}</b><br>Count:%{value}<br>%{percent}<extra></extra>",
  pull = c(0.05, 0, 0)
) %>% layout(
  title = list(text = "<b>KPI Tier Distribution</b>",
               font = list(color = "white", size = 15)),
  legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
  paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)

8.11 Conclusion — Task 7

A KPI dashboard is generated for multiple companies using simulated data. The relationship between salary and KPI is modeled as:

\[ KPI = \beta_0 + \beta_1 \cdot salary + \varepsilon \]

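Because salary and KPI_score are drawn independently from uniform distributions, the slope \(\beta_1\) is expected to be close to zero. The fitted regression accordingly shows at most a weak relationship: in this simulated data, salary alone is not a useful predictor of KPI score. The histogram, bar charts, scatter plot, and pie chart above support this reading.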
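
The strength of such a relationship can be checked numerically rather than only visually. The sketch below is an illustration under assumptions: it redraws independent uniform salary and KPI values with the same ranges and row count as the simulation (the object names `sim` and `fit` are hypothetical, and the draws are not the report's actual data), then inspects the slope and \(R^2\).

```r
set.seed(123)
n <- 790  # same row count as the simulated dataset; the draws are new

sim <- data.frame(
  salary    = runif(n, 3000, 15000),
  KPI_score = runif(n, 60, 100)
)

fit <- lm(KPI_score ~ salary, data = sim)

coef(fit)                       # beta_0 (intercept) and beta_1 (slope)
summary(fit)$r.squared          # share of KPI variance explained by salary
cor(sim$salary, sim$KPI_score)  # Pearson correlation; its square equals R^2
```

For a simple linear regression, \(R^2\) equals the squared Pearson correlation, so a value near zero confirms that the dashed line in the scatter plot carries almost no predictive information.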

9 Task 8 (Bonus) - Automated Report Generation

Functions and loops are combined to auto-generate summary reports for all companies.

9.1 Step 1 — Build Report Function

library(dplyr)
library(plotly)

generate_company_report <- function(data, company_name) {
  df        <- data %>% filter(company_id == company_name)
  total_emp <- nrow(df)
  
  df$KPI_tier <- ""
  # seq_len() avoids iterating when the filtered company has no rows
  for (i in seq_len(nrow(df))) {
    s <- df$KPI_score[i]
    df$KPI_tier[i] <- if (s >= 90) "Top Tier" else if (s >= 75) "Mid Tier" else "Base Tier"
  }
  
  return(list(
    company = company_name, data = df,
    stats = list(
      total_emp  = total_emp,
      avg_salary = round(mean(df$salary), 2),
      avg_kpi    = round(mean(df$KPI_score), 1),
      avg_perf   = round(mean(df$performance_score), 1),
      top_count  = sum(df$KPI_score >= 90),
      top_pct    = round(sum(df$KPI_score >= 90) / total_emp * 100, 1),
      max_kpi    = round(max(df$KPI_score), 1),
      min_salary = round(min(df$salary), 2),
      max_salary = round(max(df$salary), 2)
    )
  ))
}

cat(" Function generate_company_report() created successfully!\n")
##  Function generate_company_report() created successfully!

9.2 Step 2 — Generate All Reports

# FIX: use company_data7 (not company_data)
all_companies <- unique(company_data7$company_id)
all_reports   <- list()

for (comp in all_companies) {
  all_reports[[comp]] <- generate_company_report(company_data7, comp)
  cat(sprintf(" %s | %d employees | Avg KPI: %.1f\n",
              comp,
              all_reports[[comp]]$stats$total_emp,
              all_reports[[comp]]$stats$avg_kpi))
}
##  COMP-01 | 63 employees | Avg KPI: 79.6
##  COMP-02 | 99 employees | Avg KPI: 80.9
##  COMP-03 | 167 employees | Avg KPI: 79.1
##  COMP-04 | 92 employees | Avg KPI: 81.1
##  COMP-05 | 63 employees | Avg KPI: 79.6
##  COMP-06 | 167 employees | Avg KPI: 79.7
##  COMP-07 | 139 employees | Avg KPI: 79.9
report_log <- data.frame()

# The reports already exist in all_reports, so only the log is built here
for (comp in all_companies) {
  report_log <- rbind(report_log, data.frame(
    Company = comp,
    Employees = all_reports[[comp]]$stats$total_emp,
    Avg_KPI = all_reports[[comp]]$stats$avg_kpi
  ))
}

kable(report_log, caption = "Generated Reports Summary")
Generated Reports Summary
| Company | Employees | Avg_KPI |
|---|---|---|
| COMP-01 | 63 | 79.6 |
| COMP-02 | 99 | 80.9 |
| COMP-03 | 167 | 79.1 |
| COMP-04 | 92 | 81.1 |
| COMP-05 | 63 | 79.6 |
| COMP-06 | 167 | 79.7 |
| COMP-07 | 139 | 79.9 |

9.3 Step 3 — Automated Summary Table

summary_compiled <- data.frame()

for (comp in names(all_reports)) {
  s <- all_reports[[comp]]$stats
  summary_compiled <- rbind(summary_compiled, data.frame(
    Company        = comp,
    Employees      = s$total_emp,
    Avg_Salary     = s$avg_salary,
    Avg_KPI        = s$avg_kpi,
    Avg_Perf       = s$avg_perf,
    Top_Performers = s$top_count,
    Top_Pct        = paste0(s$top_pct, "%"),
    Max_KPI        = s$max_kpi,
    Salary_Range   = paste0("$", format(round(s$min_salary), big.mark=","),
                            "–$", format(round(s$max_salary), big.mark=","))
  ))
}

plot_ly(type = "table",
  header = list(
    values = list("<b>Company</b>","<b>Employees</b>","<b>Avg Salary</b>",
                  "<b>Avg KPI</b>","<b>Avg Perf</b>","<b>Top</b>","<b>Top%</b>",
                  "<b>Max KPI</b>","<b>Salary Range</b>"),
    fill = list(color = "#1e1b4b"), font = list(color = "white", size = 11),
    align = "center", line = list(color = "#0f0f1e", width = 1)
  ),
  cells = list(
    values = list(
      summary_compiled$Company, summary_compiled$Employees,
      paste0("$", format(round(summary_compiled$Avg_Salary), big.mark=",")),
      summary_compiled$Avg_KPI, summary_compiled$Avg_Perf,
      summary_compiled$Top_Performers, summary_compiled$Top_Pct,
      summary_compiled$Max_KPI, summary_compiled$Salary_Range
    ),
    fill = list(color = list("#0f0f1e","#111128")),
    font = list(color = "white", size = 11),
    align = "center", line = list(color = "#1e1b4b", width = 1)
  )
) %>% layout(paper_bgcolor = "#0f0f1e")
kable(summary_compiled, caption = "Compiled Summary (Static View)")
Compiled Summary (Static View)
| Company | Employees | Avg_Salary | Avg_KPI | Avg_Perf | Top_Performers | Top_Pct | Max_KPI | Salary_Range |
|---|---|---|---|---|---|---|---|---|
| COMP-01 | 63 | 8924.97 | 79.6 | 76.2 | 10 | 15.9% | 99.4 | $3,008–$14,931 |
| COMP-02 | 99 | 8870.87 | 80.9 | 74.1 | 30 | 30.3% | 99.6 | $3,099–$14,993 |
| COMP-03 | 167 | 9021.36 | 79.1 | 73.9 | 39 | 23.4% | 100.0 | $3,047–$14,901 |
| COMP-04 | 92 | 9048.25 | 81.1 | 77.2 | 23 | 25% | 99.8 | $3,073–$14,863 |
| COMP-05 | 63 | 9020.40 | 79.6 | 74.7 | 14 | 22.2% | 99.9 | $3,042–$14,972 |
| COMP-06 | 167 | 8900.16 | 79.7 | 75.1 | 43 | 25.7% | 100.0 | $3,060–$14,944 |
| COMP-07 | 139 | 8921.98 | 79.9 | 75.0 | 33 | 23.7% | 99.8 | $3,192–$14,850 |

9.4 Step 4 — Automated KPI Bar Chart

comp_colors8 <- c("#a78bfa","#38bdf8","#f472b6","#fb923c","#34d399","#fbbf24","#60a5fa")

plot_ly(summary_compiled, x = ~Company, y = ~Avg_KPI, type = "bar",
        color = ~Company, colors = comp_colors8,
        text  = ~paste0(Avg_KPI, "\n(Top:", Top_Performers, ")"),
        textposition = "outside", textfont = list(color = "white", size = 10),
        hovertemplate = paste0("<b>%{x}</b><br>Avg KPI: %{y:.1f}<br>",
                               "Employees: ", summary_compiled$Employees,
                               "<extra></extra>"),
        marker = list(line = list(color = "#0f0f1e", width = 1))
) %>% layout(
  title = list(text = "<b>Automated KPI Report — All Companies</b>",
               font = list(color = "white", size = 15)),
  xaxis = list(title = "Company", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
  yaxis = list(title = "Avg KPI Score", color = "white",
               gridcolor = "rgba(255,255,255,0.1)",
               range = c(0, max(summary_compiled$Avg_KPI) * 1.2)),
  showlegend = FALSE,
  paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)

9.5 Step 5 — Automated Box Plot (Loop)

p8_box <- plot_ly()

for (i in seq_along(names(all_reports))) {
  comp   <- names(all_reports)[i]
  df_rep <- all_reports[[comp]]$data
  
  p8_box <- p8_box %>% add_trace(
    y = df_rep$KPI_score, type = "box", name = comp,
    marker    = list(color = comp_colors8[i], size = 4),
    line      = list(color = comp_colors8[i]),
    fillcolor = paste0(comp_colors8[i], "40"),
    boxpoints = "outliers",
    hovertemplate = paste0("<b>", comp, "</b><br>KPI:%{y:.1f}<extra></extra>")
  )
}

p8_box %>%
  add_lines(x = c(-0.5, length(all_reports) - 0.5), y = c(90, 90),
            line = list(color = "rgba(251,191,36,0.7)", dash = "dash", width = 1.5),
            name = "Top Tier (90)", hoverinfo = "skip") %>%
  layout(
    title = list(text = "<b>KPI Distribution per Company (Auto-Generated)</b><br><sup>Yellow line = Top Tier ≥ 90</sup>",
                 font = list(color = "white", size = 15)),
    xaxis = list(title = "Company", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
    yaxis = list(title = "KPI Score", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
    legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

9.6 Step 6 — Export to CSV

# FIX: use company_data7
write.csv(summary_compiled, "automated_report_summary.csv", row.names = FALSE)
write.csv(company_data7,    "company_data_full.csv",        row.names = FALSE)

kable(data.frame(
  Files = c("automated_report_summary.csv", "company_data_full.csv")
), caption = "Exported Files")
Exported Files
| Files |
|---|
| automated_report_summary.csv |
| company_data_full.csv |
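
An export step can be verified by reading the file back. The following is a minimal sketch, assuming a temporary directory is acceptable for the check: `report`, `out_file`, and `check` are hypothetical names, and the two rows reuse values from the compiled summary table purely as sample input.

```r
# Two rows taken from the compiled summary, used only as sample input
report <- data.frame(
  Company = c("COMP-01", "COMP-02"),
  Avg_KPI = c(79.6, 80.9)
)

# Write to a temporary location so the working directory is not touched
out_file <- file.path(tempdir(), "automated_report_summary.csv")
write.csv(report, out_file, row.names = FALSE)

# Read the file back and confirm that the shape survived the round trip
check <- read.csv(out_file, stringsAsFactors = FALSE)
stopifnot(file.exists(out_file), nrow(check) == nrow(report))
```

Setting `row.names = FALSE` keeps the row index out of the file, so the re-imported data frame has the same columns as the original.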

9.7 Conclusion — Task 8 (Bonus)

The generate_company_report() function automates the reporting pipeline: a loop calls it once per company and collects the results.

All results, including summary statistics, visualizations, and exported files, are generated programmatically. This approach demonstrates a scalable workflow where adding new companies only requires adjusting input parameters without modifying the core logic.
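
The scalability claim can be illustrated with a toy example. The sketch below uses a simplified, hypothetical stand-in (`summarise_company`) rather than the full generate_company_report() function, and the data frame `toy` is invented. It shows that a new company (here "COMP-08") is picked up automatically because the list of companies is derived from the data.

```r
# Simplified, hypothetical stand-in for generate_company_report()
summarise_company <- function(data, company_name) {
  df <- data[data$company_id == company_name, ]
  data.frame(
    Company   = company_name,
    Employees = nrow(df),
    Avg_KPI   = round(mean(df$KPI_score), 1),
    Top_Count = sum(df$KPI_score >= 90)
  )
}

# Invented data; COMP-08 plays the role of a newly added company
toy <- data.frame(
  company_id = c("COMP-01", "COMP-01", "COMP-02", "COMP-08"),
  KPI_score  = c(92, 70, 88, 95)
)

# The company list comes from the data, so no code change is needed
reports <- do.call(
  rbind,
  lapply(unique(toy$company_id), summarise_company, data = toy)
)
reports
```

Deriving the iteration set with `unique()` is the design choice that keeps the core logic unchanged when the input grows.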