Practicum Week ~ 5

RISKY NURHIDAYAH

ADVANCED PRACTICUM

FUNCTION & LOOPS + DATA SCIENCE ~ Week 5

Study Program
Data Science

University
Institut Teknologi Sains Bandung

Lecturer
Bakti Siregar, M.Sc., CSD

1 Introduction

This practicum focuses on the application of functions, loops, and conditional logic in the context of data science. Students are encouraged to build structured workflows, from raw data simulations to automated reporting through interactive visualizations.

2 Task 1 - Dynamic Multi-Formula Function

In this task, we build a function called compute_formula() that dynamically computes four types of mathematical formulas (linear, quadratic, cubic, and exponential) for input values x = 1:20.

2.1 Step 1 — Build & Compute Formulas

library(ggplot2)
library(plotly)
library(reshape2)
library(dplyr)

#  Function: compute_formula 
compute_formula <- function(x, formula) {
  valid_formulas <- c("linear", "quadratic", "cubic", "exponential")
  
  if (!(formula %in% valid_formulas)) {
    stop(paste("Invalid formula! Choose one of:", paste(valid_formulas, collapse = ", ")))
  }
  
  result <- if (formula == "linear") {
    2 * x + 3
  } else if (formula == "quadratic") {
    x^2 + 2 * x + 1
  } else if (formula == "cubic") {
    x^3 - x^2 + x
  } else if (formula == "exponential") {
    exp(0.3 * x)
  }
  return(result)
}

# Nested Loop Computation 
x_vals   <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")
results  <- data.frame(x = x_vals)

for (f in formulas) {
  values <- c()
  for (x in x_vals) {
    values <- c(values, compute_formula(x, f))
  }
  results[[f]] <- values
}

head(results, 5)

##   x linear quadratic cubic exponential
## 1 1      5         4     1    1.349859
## 2 2      7         9     6    1.822119
## 3 3      9        16    21    2.459603
## 4 4     11        25    52    3.320117
## 5 5     13        36   105    4.481689

2.2 Step 2 — Visual Comparison

results_long <- melt(results, id.vars = "x", variable.name = "Formula", value.name = "Value")
results_long$Label <- factor(results_long$Formula,
  levels = c("linear","quadratic","cubic","exponential"),
  labels = c("Linear: 2x+3","Quadratic: x²+2x+1","Cubic: x³-x²+x","Exponential: e^0.3x"))

colors_plt <- c("#a78bfa","#38bdf8","#f472b6","#fb923c")

plot_ly(results_long, x = ~x, y = ~Value, color = ~Label, colors = colors_plt,
        type = "scatter", mode = "lines+markers") %>%
  layout(
    title = list(text = "<b>Dynamic Multi-Formula Analysis</b>", font = list(color = "white")),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e",
    xaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.1)"),
    yaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.1)"),
    legend = list(font = list(color = "white"))
  )

2.3 Conclusion — Task 1

The compute_formula() function evaluates four mathematical models for $x = 1,2,\ldots,20$ using nested loops. The formulas include linear $f(x)=2x+3$, quadratic $f(x)=x^2+2x+1$, cubic $f(x)=x^3-x^2+x$, and exponential $f(x)=e^{0.3x}$.

At $x=20$, the cubic function produces the largest value ($7620$), followed by the quadratic ($441$), the exponential ($e^{6} \approx 403.4$), and the linear function ($43$). This shows that a higher-degree polynomial can dominate within a finite range, even though the exponential eventually outgrows every polynomial as $x$ increases. The nested loop structure computes all four formulas for every $x$ in a single pass.
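
As a quick check of the values at $x = 20$, the sketch below evaluates all four formulas in one call and sorts them. It re-declares the formulas in a hypothetical named list, `formulas_at`, instead of calling compute_formula(), so that it runs on its own; the list and variable names are illustrative assumptions, not part of the practicum code.

```r
# Hypothetical stand-alone check: the four formulas as a named list of functions
formulas_at <- list(
  linear      = function(x) 2 * x + 3,
  quadratic   = function(x) x^2 + 2 * x + 1,
  cubic       = function(x) x^3 - x^2 + x,
  exponential = function(x) exp(0.3 * x)
)

# Evaluate every formula at x = 20 and sort from largest to smallest
values_at_20 <- sapply(formulas_at, function(f) f(20))
ranking      <- sort(values_at_20, decreasing = TRUE)
print(round(ranking, 2))
```

Because each formula is vectorised, calling `f(1:20)` would return all twenty values at once, so the inner loop over x is optional.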

3 Task 2 - Nested Simulation: Multi Sales & Discounts

In this task, we build simulate_sales() that simulates daily sales data with conditional discounts and tracks cumulative sales per salesperson.

3.1 Step 1 — Simulation Logic

get_discount <- function(sales_amount) {
  if      (sales_amount >= 1000) return(0.20)
  else if (sales_amount >= 500)  return(0.10)
  else if (sales_amount >= 200)  return(0.05)
  else                           return(0.00)
}

simulate_sales <- function(n_salesperson, days) {
  set.seed(42)
  all_data <- data.frame()
  
  for (i in 1:n_salesperson) {
    sales_id   <- paste0("SP-", sprintf("%02d", i))
    cumulative <- 0
    for (d in 1:days) {
      sales_amount  <- round(runif(1, 100, 1500), 2)
      discount_rate <- get_discount(sales_amount)
      cumulative    <- cumulative + sales_amount
      all_data      <- rbind(all_data, data.frame(
        sales_id = sales_id, day = d, sales_amount = sales_amount,
        discount_rate = discount_rate, cumulative = round(cumulative, 2)
      ))
    }
  }
  return(all_data)
}

sales_data <- simulate_sales(n_salesperson = 3, days = 30)
head(sales_data, 5)

##   sales_id day sales_amount discount_rate cumulative
## 1    SP-01   1      1380.73           0.2    1380.73
## 2    SP-01   2      1411.91           0.2    2792.64
## 3    SP-01   3       500.60           0.1    3293.24
## 4    SP-01   4      1262.63           0.2    4555.87
## 5    SP-01   5       998.44           0.1    5554.31

3.2 Step 2 — Performance Tracking

plot_ly(sales_data, x = ~day, y = ~cumulative, color = ~sales_id,
        colors = c("#a78bfa", "#38bdf8", "#f472b6"),
        type = "scatter", mode = "lines+markers") %>%
  layout(
    title = list(text = "<b>Cumulative Sales Performance</b>", font = list(color = "white")),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e",
    xaxis = list(title = "Day", color = "white"),
    yaxis = list(title = "Cumulative USD", color = "white"),
    legend = list(font = list(color = "white"))
  )

3.3 Conclusion — Task 2

The simulate_sales() function models sales activity using nested loops across salespersons and days. A conditional function applies discount rates based on transaction value:

\[ d(s) = \begin{cases} 0.20 & s \geq 1000 \\ 0.10 & s \geq 500 \\ 0.05 & s \geq 200 \\ 0 & s < 200 \end{cases} \]

Cumulative sales follow:

\[ C_d = \sum_{i=1}^{d} s_i \]

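Because each daily sale is drawn uniformly from $[100, 1500]$, the expected daily sale is $800$, so the expected 30-day total per salesperson is $30 \times 800 = 24{,}000$ USD. Actual totals vary around this value with the random draws; SP-01, for instance, already reaches 5,554.31 USD after five days.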
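
The discount rule and the cumulative sum can also be written without explicit loops. The following is a minimal sketch, assuming the same thresholds as get_discount(); the helper name `get_discount_vec` and the vectors `discount_breaks` and `discount_rates` are hypothetical names introduced for illustration. It is applied to the five SP-01 amounts shown in the output above.

```r
# Illustrative vectorised discount lookup (thresholds copied from get_discount())
discount_breaks <- c(200, 500, 1000)          # lower bounds of each discount tier
discount_rates  <- c(0.00, 0.05, 0.10, 0.20)  # rate below 200, then one per tier

get_discount_vec <- function(sales_amount) {
  # findInterval() returns 0 for values below 200, 1 for [200, 500), and so on
  discount_rates[findInterval(sales_amount, discount_breaks) + 1]
}

# The five SP-01 amounts from the printed head() of sales_data
sales_example <- c(1380.73, 1411.91, 500.60, 1262.63, 998.44)
example_df <- data.frame(
  sales_amount  = sales_example,
  discount_rate = get_discount_vec(sales_example),
  cumulative    = cumsum(sales_example)
)
print(example_df)
```

Here cumsum() plays the role of the running `cumulative` variable, and findInterval() replaces the if/else chain while keeping the same inclusive lower bounds.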

4 Task 3 - Multi Level Performance Categorization

In this task, categorize_performance() classifies sales into five levels, and the resulting distribution is visualized.

4.1 Step 1 — Build the Function

library(plotly)

categorize_performance <- function(sales_amount) {
  categories <- c()
  for (sales in sales_amount) {
    if      (sales >= 1200) categories <- c(categories, "Excellent")
    else if (sales >= 900)  categories <- c(categories, "Very Good")
    else if (sales >= 600)  categories <- c(categories, "Good")
    else if (sales >= 300)  categories <- c(categories, "Average")
    else                    categories <- c(categories, "Poor")
  }
  return(categories)
}

4.2 Step 2 — Apply Function & Calculate Distribution

set.seed(42)
sales_vector <- round(runif(150, 100, 1500), 2)
performance_category <- categorize_performance(sales_vector)

perf_data <- data.frame(sales_amount = sales_vector, category = performance_category)

category_summary <- as.data.frame(table(perf_data$category))
colnames(category_summary) <- c("Category", "Count")
category_summary$Percentage <- round(
  category_summary$Count / sum(category_summary$Count) * 100, 1)
category_summary$Category <- factor(category_summary$Category,
  levels = c("Excellent","Very Good","Good","Average","Poor"))
category_summary <- category_summary[order(category_summary$Category), ]

print(category_summary)

##    Category Count Percentage
## 2 Excellent    35       23.3
## 5 Very Good    41       27.3
## 3      Good    29       19.3
## 1   Average    26       17.3
## 4      Poor    19       12.7

4.3 Step 3 — Interactive Bar Plot

cat_colors <- c("Excellent"="#34d399","Very Good"="#38bdf8",
                "Good"="#a78bfa","Average"="#fb923c","Poor"="#f472b6")

plot_ly(
  data = category_summary, x = ~Category, y = ~Count,
  type = "bar", color = ~Category, colors = unname(cat_colors),
  text  = ~paste0(Count, " records | ", Percentage, "%"),
  hovertemplate = "<b>%{x}</b><br>Count: %{y}<br>%{text}<extra></extra>",
  marker = list(line = list(color = "#0f0f1e", width = 1.5))
) %>% layout(
  title = list(text = "<b>Performance Category Distribution</b><br><sup>150 Observations</sup>",
               font = list(color = "white", size = 15)),
  xaxis = list(title = "Category", color = "white",
               gridcolor = "rgba(255,255,255,0.08)",
               categoryorder = "array",
               categoryarray = c("Excellent","Very Good","Good","Average","Poor")),
  yaxis = list(title = "Count", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
  showlegend = FALSE,
  paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)

4.4 Step 4 — Interactive Pie Chart

plot_ly(
  data = category_summary, labels = ~Category, values = ~Count,
  type = "pie",
  marker = list(colors = unname(cat_colors),
                line = list(color = "#0f0f1e", width = 2)),
  textinfo = "label+percent",
  textfont = list(color = "white", size = 13),
  hovertemplate = "<b>%{label}</b><br>Count: %{value}<br>%{percent}<extra></extra>",
  pull = c(0.05, 0, 0, 0, 0)
) %>% layout(
  title = list(text = "<b>Performance Category — Pie Chart</b>",
               font = list(color = "white", size = 15)),
  legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)",
                bordercolor = "rgba(255,255,255,0.15)", borderwidth = 1),
  paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)

4.5 Step 5 — Interactive Box Plot by Category

perf_data$category <- factor(perf_data$category,
  levels = c("Excellent","Very Good","Good","Average","Poor"))

plot_ly(
  data = perf_data, x = ~category, y = ~sales_amount,
  color = ~category, colors = unname(cat_colors),
  type = "box", boxpoints = "all", jitter = 0.3, pointpos = 0,
  marker = list(size = 4, opacity = 0.5),
  hovertemplate = "<b>%{x}</b><br>Sales: $%{y:,.2f}<extra></extra>"
) %>% layout(
  title = list(text = "<b>Sales Distribution per Performance Category</b>",
               font = list(color = "white", size = 15)),
  xaxis = list(title = "Category", color = "white",
               gridcolor = "rgba(255,255,255,0.08)"),
  yaxis = list(title = "Sales Amount (USD)", color = "white",
               gridcolor = "rgba(255,255,255,0.1)", tickprefix = "$"),
  showlegend = FALSE,
  paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)

4.6 Conclusion — Task 3

The categorize_performance() function classifies sales into five categories based on threshold values:

\[ \text{Category}(x) = \begin{cases} \text{Excellent} & x \geq 1200 \\ \text{Very Good} & x \geq 900 \\ \text{Good} & x \geq 600 \\ \text{Average} & x \geq 300 \\ \text{Poor} & x < 300 \end{cases} \]

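Because the data are uniformly distributed over $[100,1500]$, the expected share of each category is proportional to the width of its range. The top four categories each span 300 units (expected share $300/1400 \approx 21.4\%$), while Poor spans only $[100, 300)$ (expected share $200/1400 \approx 14.3\%$). The simulated shares, from 12.7% to 27.3%, are roughly consistent with these expectations given the sample size of 150.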
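
The expected share of each category follows directly from the band widths. The sketch below computes those shares and shows a loop-free way to categorize with cut(); `breaks`, `labels`, `expected_share`, and `categorize_cut` are hypothetical names, and the upper bound of 1500 is an assumption taken from the sampling range rather than from categorize_performance() itself.

```r
# Illustrative check of the expected category shares under Uniform(100, 1500)
breaks <- c(100, 300, 600, 900, 1200, 1500)
labels <- c("Poor", "Average", "Good", "Very Good", "Excellent")

# Share of each band = band width / total width of the sampling interval
expected_share <- setNames(diff(breaks) / (1500 - 100), labels)
print(round(100 * expected_share, 1))

# Vectorised categorisation with cut(); right = FALSE gives [lower, upper) bands,
# and include.lowest = TRUE closes the last band at 1500
categorize_cut <- function(sales_amount) {
  cut(sales_amount, breaks = breaks, labels = labels,
      right = FALSE, include.lowest = TRUE)
}
as.character(categorize_cut(c(100, 299.99, 300, 1200, 1500)))
```

Unlike the if/else chain, this variant returns NA for values outside $[100, 1500]$, so it only matches the original function on data drawn from that range.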

5 Task 4 - Multi Company Dataset Simulation

generate_company_data() uses nested loops to generate company and employee data with KPI-based conditional logic.

5.1 Step 1 — Build the Function

library(plotly)
library(dplyr)

generate_company_data <- function(n_company, n_employees) {
  set.seed(123)
  departments <- c("HR","Finance","Engineering","Marketing","Operations")
  all_data    <- data.frame()
  
  for (i in 1:n_company) {
    company_id <- paste0("COMP-", sprintf("%02d", i))
    for (j in 1:n_employees) {
      salary            <- round(runif(1, 3000, 15000), 2)
      department        <- sample(departments, 1)
      performance_score <- round(runif(1, 50, 100), 1)
      KPI_score         <- round(runif(1, 60, 100), 1)
      is_top            <- ifelse(KPI_score > 90, "Top Performer", "Regular")
      
      all_data <- rbind(all_data, data.frame(
        company_id = company_id,
        employee_id = paste0("EMP-", sprintf("%03d", j)),
        salary = salary, department = department,
        performance_score = performance_score,
        KPI_score = KPI_score, performer_status = is_top
      ))
    }
  }
  return(all_data)
}

cat(" Function generate_company_data() created successfully!\n")

##  Function generate_company_data() created successfully!

cat("   Conditional: KPI > 90 → Top Performer\n")

##    Conditional: KPI > 90 → Top Performer

5.2 Step 2 — Generate Dataset

company_data <- generate_company_data(n_company = 5, n_employees = 50)
write.csv(company_data, "company_data.csv", row.names = FALSE)
cat(sprintf(" Dataset: %d rows × %d cols | Saved to company_data.csv\n",
            nrow(company_data), ncol(company_data)))

##  Dataset: 250 rows × 7 cols | Saved to company_data.csv

5.3 Step 3 — Summary per Company

company_summary <- company_data %>%
  group_by(company_id) %>%
  summarise(
    Avg_Salary      = round(mean(salary), 2),
    Avg_Performance = round(mean(performance_score), 1),
    Avg_KPI         = round(mean(KPI_score), 1),
    Max_KPI         = round(max(KPI_score), 1),
    .groups = "drop"
  )
print(company_summary)

## # A tibble: 5 × 5
##   company_id Avg_Salary Avg_Performance Avg_KPI Max_KPI
##   <chr>           <dbl>           <dbl>   <dbl>   <dbl>
## 1 COMP-01         8696.            74.4    82.5    99.4
## 2 COMP-02         8345.            74.5    79.4    98.9
## 3 COMP-03         9457.            76.3    78.7    99.7
## 4 COMP-04         8620.            71.4    78.7    97.7
## 5 COMP-05         9274.            76.7    77.4    99.7

5.4 Step 4 — Interactive Summary Table

plot_ly(
  type = "table",
  header = list(
    values = list("<b>Company</b>","<b>Avg Salary</b>",
                  "<b>Avg Performance</b>","<b>Avg KPI</b>","<b>Max KPI</b>"),
    fill = list(color = "#1e1b4b"), font = list(color = "white", size = 12),
    align = "center", line = list(color = "#0f0f1e", width = 1)
  ),
  cells = list(
    values = list(
      company_summary$company_id,
      paste0("$", format(company_summary$Avg_Salary, big.mark = ",")),
      company_summary$Avg_Performance,
      company_summary$Avg_KPI,
      company_summary$Max_KPI
    ),
    fill = list(color = list("#0f0f1e","#111128")),
    font = list(color = "white", size = 11),
    align = "center", line = list(color = "#1e1b4b", width = 1)
  )
) %>% layout(paper_bgcolor = "#0f0f1e")

5.5 Step 5 — Interactive Bar: Avg Salary & KPI

comp_colors <- c("#a78bfa","#38bdf8","#f472b6","#fb923c","#34d399")

p4a <- plot_ly(company_summary, x = ~company_id, y = ~Avg_Salary,
               type = "bar", name = "Avg Salary",
               marker = list(color = "#a78bfa",
                             line = list(color = "#0f0f1e", width = 1)),
               hovertemplate = "<b>%{x}</b><br>Avg Salary: $%{y:,.2f}<extra></extra>")

p4b <- plot_ly(company_summary, x = ~company_id, y = ~Avg_KPI,
               type = "bar", name = "Avg KPI",
               marker = list(color = "#38bdf8",
                             line = list(color = "#0f0f1e", width = 1)),
               hovertemplate = "<b>%{x}</b><br>Avg KPI: %{y:.1f}<extra></extra>")

subplot(p4a, p4b, nrows = 1, shareX = TRUE, titleX = TRUE) %>%
  layout(
    title = list(text = "<b>Avg Salary & Avg KPI per Company</b>",
                 font = list(color = "white", size = 15)),
    xaxis  = list(title = "Company", color = "white",
                  gridcolor = "rgba(255,255,255,0.08)"),
    xaxis2 = list(title = "Company", color = "white",
                  gridcolor = "rgba(255,255,255,0.08)"),
    yaxis  = list(title = "Avg Salary (USD)", color = "white",
                  gridcolor = "rgba(255,255,255,0.1)", tickprefix = "$"),
    yaxis2 = list(title = "Avg KPI Score", color = "white",
                  gridcolor = "rgba(255,255,255,0.1)"),
    legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

5.6 Step 6 — Interactive Scatter: Performance vs KPI

p_scatter <- plot_ly()

for (i in seq_along(unique(company_data$company_id))) {
  comp <- unique(company_data$company_id)[i]
  df_c <- company_data[company_data$company_id == comp, ]
  
  p_scatter <- p_scatter %>% add_trace(
    data = df_c, x = ~performance_score, y = ~KPI_score,
    type = "scatter", mode = "markers", name = comp,
    marker = list(color = comp_colors[i], size = 8, opacity = 0.75,
                  symbol = ifelse(df_c$performer_status == "Top Performer",
                                  "star","circle"),
                  line = list(color = "white", width = 0.5)),
    hovertemplate = paste0("<b>", comp, "</b><br>",
                           "Employee: %{customdata}<br>",
                           "Performance: %{x:.1f}<br>KPI: %{y:.1f}<extra></extra>"),
    customdata = ~employee_id
  )
}

p_scatter %>%
  add_lines(x = c(50,100), y = c(90,90),
            line = list(color = "rgba(251,146,60,0.6)", dash = "dash", width = 1.5),
            name = "KPI = 90 threshold", showlegend = TRUE, hoverinfo = "skip") %>%
  layout(
    title = list(text = "<b>Performance vs KPI Score</b><br><sup>⭐ = Top Performer</sup>",
                 font = list(color = "white", size = 15)),
    xaxis = list(title = "Performance Score", color = "white",
                 gridcolor = "rgba(255,255,255,0.08)", range = c(45,105)),
    yaxis = list(title = "KPI Score", color = "white",
                 gridcolor = "rgba(255,255,255,0.08)", range = c(55,105)),
    legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

5.7 Conclusion — Task 4

The generate_company_data() function simulates structured employee data using nested loops across companies and employees. Each observation includes salary, department, performance score, and KPI score.

The classification rule is defined as:

\[ \text{status} = \begin{cases} \text{Top Performer} & \text{if } KPI > 90 \\ \text{Regular} & \text{otherwise} \end{cases} \]

The generated dataset consists of $5 \times 50 = 250$ observations, with salaries ranging from 3,000 to 15,000 USD. Summary statistics such as mean salary, mean performance, mean KPI, and maximum KPI are then computed per company.
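
Growing a data frame with rbind() inside a loop becomes slow for large inputs, so the same structure can be built in one step. The following is a sketch of a hypothetical vectorised variant, `generate_company_data_vec` (the name and the `seed` argument are assumptions). Its random draws occur in a different order, so the values will not reproduce the loop version above.

```r
# Hypothetical vectorised variant of generate_company_data(); the random draws
# occur in a different order, so the values will not match the loop version.
generate_company_data_vec <- function(n_company, n_employees, seed = 123) {
  set.seed(seed)
  n <- n_company * n_employees
  departments <- c("HR", "Finance", "Engineering", "Marketing", "Operations")
  KPI_score   <- round(runif(n, 60, 100), 1)

  data.frame(
    company_id        = rep(sprintf("COMP-%02d", seq_len(n_company)), each = n_employees),
    employee_id       = rep(sprintf("EMP-%03d", seq_len(n_employees)), times = n_company),
    salary            = round(runif(n, 3000, 15000), 2),
    department        = sample(departments, n, replace = TRUE),
    performance_score = round(runif(n, 50, 100), 1),
    KPI_score         = KPI_score,
    performer_status  = ifelse(KPI_score > 90, "Top Performer", "Regular")
  )
}

demo <- generate_company_data_vec(n_company = 5, n_employees = 50)
dim(demo)                                        # rows and columns
mean(demo$performer_status == "Top Performer")   # expected to be near 0.25
```

Since the KPI score is uniform on $[60, 100]$, about $10/40 = 25\%$ of employees are expected to exceed the threshold of 90.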

6 Task 5 - Monte Carlo Simulation: Pi & Probability

monte_carlo_pi() estimates π using random point simulation and computes the probability of a point falling in a sub-square.

6.1 Step 1 — Build the Function

library(plotly)

monte_carlo_pi <- function(n_points) {
  set.seed(99)
  x <- c(); y <- c(); inside_circle <- c(); in_subsquare <- c()
  
  for (i in 1:n_points) {
    xi <- runif(1, -1, 1)
    yi <- runif(1, -1, 1)
    x  <- c(x, xi); y <- c(y, yi)
    inside_circle <- c(inside_circle, sqrt(xi^2 + yi^2) <= 1)
    in_subsquare  <- c(in_subsquare, xi >= 0 & xi <= 0.5 & yi >= 0 & yi <= 0.5)
  }
  
  pi_estimate    <- 4 * sum(inside_circle) / n_points
  prob_subsquare <- sum(in_subsquare) / n_points
  
  cat(sprintf("  estimate : %.6f\n", pi_estimate))
  cat(sprintf("   Actual π   : %.6f\n", pi))
  cat(sprintf("   Error      : %.4f%%\n", abs(pi_estimate - pi) / pi * 100))
  cat(sprintf("   P(sub-sq)  : %.6f (theoretical: 0.062500)\n", prob_subsquare))
  
  return(list(x = x, y = y, inside_circle = inside_circle,
              in_subsquare = in_subsquare, pi_estimate = pi_estimate,
              prob_subsquare = prob_subsquare, n_points = n_points))
}

6.2 Step 2 — Run Simulation

mc_result <- monte_carlo_pi(3000)

##   estimate : 3.145333
##    Actual π   : 3.141593
##    Error      : 0.1191%
##    P(sub-sq)  : 0.059000 (theoretical: 0.062500)

6.3 Step 3 — Interactive Scatter: Inside vs Outside Circle

x      <- mc_result$x; y      <- mc_result$y
inside <- mc_result$inside_circle; in_sub <- mc_result$in_subsquare
theta  <- seq(0, 2*pi, length.out = 300)

plot_ly() %>%
  add_trace(x = x[!inside], y = y[!inside], type = "scatter", mode = "markers",
            name = "Outside Circle",
            marker = list(color = "#f472b6", size = 3, opacity = 0.5),
            hovertemplate = "Outside<br>x:%{x:.3f} y:%{y:.3f}<extra></extra>") %>%
  add_trace(x = x[inside], y = y[inside], type = "scatter", mode = "markers",
            name = "Inside Circle",
            marker = list(color = "#a78bfa", size = 3, opacity = 0.6),
            hovertemplate = "Inside<br>x:%{x:.3f} y:%{y:.3f}<extra></extra>") %>%
  add_trace(x = x[in_sub], y = y[in_sub], type = "scatter", mode = "markers",
            name = "Sub-Square [0,0.5]²",
            marker = list(color = "#34d399", size = 4, symbol = "diamond"),
            hovertemplate = "Sub-sq<br>x:%{x:.3f} y:%{y:.3f}<extra></extra>") %>%
  add_trace(x = cos(theta), y = sin(theta), type = "scatter", mode = "lines",
            name = "Unit Circle",
            line = list(color = "#fb923c", width = 2, dash = "dot"),
            hoverinfo = "skip") %>%
  add_trace(x = c(0,0.5,0.5,0,0), y = c(0,0,0.5,0.5,0),
            type = "scatter", mode = "lines", name = "Sub-Square Border",
            line = list(color = "#34d399", width = 2, dash = "dash"),
            hoverinfo = "skip") %>%
  layout(
    title = list(
      text = paste0("<b>Monte Carlo — π Estimation</b><br>",
                    "<sup>n=", mc_result$n_points,
                    " | π≈", round(mc_result$pi_estimate, 5), "</sup>"),
      font = list(color = "white", size = 14)),
    xaxis = list(title = "x", color = "white", range = c(-1.1,1.1),
                 gridcolor = "rgba(255,255,255,0.08)", zeroline = FALSE),
    yaxis = list(title = "y", color = "white", range = c(-1.1,1.1),
                 gridcolor = "rgba(255,255,255,0.08)", zeroline = FALSE,
                 scaleanchor = "x"),
    legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

6.4 Step 4 — π Convergence Plot

set.seed(99)
iter_sizes   <- c(10, 50, 100, 250, 500, 1000, 2000, 3000, 5000)
pi_estimates <- c()
x_all <- runif(5000, -1, 1); y_all <- runif(5000, -1, 1)

for (n in iter_sizes) {
  pi_estimates <- c(pi_estimates,
                    4 * sum(sqrt(x_all[1:n]^2 + y_all[1:n]^2) <= 1) / n)
}

conv_df <- data.frame(n = iter_sizes, pi_estimate = pi_estimates, actual_pi = pi)

plot_ly(conv_df) %>%
  add_trace(x = ~n, y = ~pi_estimate, type = "scatter", mode = "lines+markers",
            name = "π Estimate",
            line = list(color = "#a78bfa", width = 2.5),
            marker = list(color = "#a78bfa", size = 8,
                          line = list(color = "white", width = 1)),
            hovertemplate = "n=%{x}<br>π≈%{y:.5f}<extra></extra>") %>%
  add_trace(x = ~n, y = ~actual_pi, type = "scatter", mode = "lines",
            name = "Actual π",
            line = list(color = "#fb923c", width = 2, dash = "dash"),
            hoverinfo = "skip") %>%
  layout(
    title = list(text = "<b>π Estimate Convergence</b>",
                 font = list(color = "white", size = 15)),
    xaxis = list(title = "n", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
    yaxis = list(title = "π Estimate", color = "white",
                 gridcolor = "rgba(255,255,255,0.08)", range = c(2.5,4.0)),
    legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

6.5 Step 5 — Probability Bar Chart

prob_df <- data.frame(Type = c("Estimated","Theoretical"),
                      Probability = c(mc_result$prob_subsquare, 0.0625))

plot_ly(prob_df, x = ~Type, y = ~Probability, type = "bar",
        marker = list(color = c("#38bdf8","#fb923c"),
                      line = list(color = "#0f0f1e", width = 1.5)),
        text = ~round(Probability, 5), textposition = "outside",
        textfont = list(color = "white"),
        hovertemplate = "<b>%{x}</b><br>P = %{y:.5f}<extra></extra>") %>%
  layout(
    title = list(text = "<b>P(point in sub-square [0,0.5]²)</b>",
                 font = list(color = "white", size = 14)),
    xaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.08)"),
    yaxis = list(title = "Probability", color = "white",
                 gridcolor = "rgba(255,255,255,0.1)",
                 range = c(0, max(prob_df$Probability) * 1.3)),
    showlegend = FALSE,
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

6.6 Conclusion — Task 5

The monte_carlo_pi() function estimates $\pi$ using random sampling within a square region:

\[ \hat{\pi} = 4 \cdot \frac{\text{number of points inside the circle}}{n} \]

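This method is based on the ratio between the area of the unit circle ($\pi$) and the area of the enclosing square ($4$), so a uniformly drawn point falls inside the circle with probability $\pi/4$. As the number of points increases, the estimate converges to $\pi$, consistent with the Law of Large Numbers. For example, with $n=3000$ the simulation above yields $\hat{\pi} = 3.145333$, a relative error of about $0.12\%$. The same run estimates the sub-square probability as $0.059$, close to the theoretical value $0.5^2/4 = 0.0625$.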
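
The convergence can also be quantified with a standard error. The sketch below is a vectorised version of the estimator together with the binomial standard error of $4\hat{p}$, which shrinks in proportion to $1/\sqrt{n}$. The function name `estimate_pi_vec` and its `seed` argument are assumptions for illustration, and the draws differ from those of monte_carlo_pi() because x and y are generated in separate batches.

```r
# Illustrative vectorised estimator; function and argument names are assumptions
estimate_pi_vec <- function(n_points, seed = 99) {
  set.seed(seed)
  x <- runif(n_points, -1, 1)
  y <- runif(n_points, -1, 1)
  p_hat <- mean(x^2 + y^2 <= 1)   # share of points inside the unit circle
  list(
    pi_estimate = 4 * p_hat,
    # Binomial standard error of 4 * p_hat, which shrinks like 1 / sqrt(n)
    std_error   = 4 * sqrt(p_hat * (1 - p_hat) / n_points)
  )
}

for (n in c(100, 1000, 10000, 100000)) {
  est <- estimate_pi_vec(n)
  cat(sprintf("n = %6d | estimate = %.5f | std. error = %.5f\n",
              n, est$pi_estimate, est$std_error))
}
```

Each tenfold increase in n reduces the standard error by a factor of about $\sqrt{10} \approx 3.2$, which is why Monte Carlo estimates of $\pi$ improve slowly.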

7 Task 6 - Advanced Data Transformation & Feature Engineering

normalize_columns() and z_score() transform the data with loop-based normalization; new features are then created from the raw columns.

7.1 Step 1 — Build Transformation Functions

library(plotly)
library(dplyr)

normalize_columns <- function(df) {
  df_norm <- df
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      mn <- min(df[[col]], na.rm = TRUE)
      mx <- max(df[[col]], na.rm = TRUE)
      df_norm[[col]] <- if (mx - mn == 0) 0 else (df[[col]] - mn) / (mx - mn)
    }
  }
  return(df_norm)
}

z_score <- function(df) {
  df_z <- df
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      m <- mean(df[[col]], na.rm = TRUE)
      s <- sd(df[[col]], na.rm = TRUE)
      df_z[[col]] <- if (s == 0) 0 else (df[[col]] - m) / s
    }
  }
  return(df_z)
}

cat(" normalize_columns() → Min-Max [0,1]\n")

##  normalize_columns() → Min-Max [0,1]

cat(" z_score()           → Mean=0, SD=1\n")

##  z_score()           → Mean=0, SD=1

7.2 Step 2 — Prepare Dataset

set.seed(123)
departments <- c("HR","Finance","Engineering","Marketing","Operations")

raw_data <- data.frame(
  employee_id       = paste0("EMP-", sprintf("%03d", 1:250)),
  company_id        = rep(paste0("COMP-", sprintf("%02d", 1:5)), each = 50),
  salary            = round(runif(250, 3000, 15000), 2),
  performance_score = round(runif(250, 50, 100), 1),
  KPI_score         = round(runif(250, 60, 100), 1),
  department        = sample(departments, 250, replace = TRUE)
)

numeric_cols <- raw_data %>% select(salary, performance_score, KPI_score)
cat(sprintf(" Dataset: %d rows × %d numeric columns\n",
            nrow(numeric_cols), ncol(numeric_cols)))

##  Dataset: 250 rows × 3 numeric columns

7.3 Step 3 — Apply Transformations

df_normalized <- normalize_columns(numeric_cols)
df_zscore     <- z_score(numeric_cols)

cat(" Summary Min-Max (salary):\n")

##  Summary Min-Max (salary):

print(summary(df_normalized$salary))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.2753  0.4849  0.5103  0.7302  1.0000

cat("\n Summary Z-Score (salary):\n")

## 
##  Summary Z-Score (salary):

print(round(summary(df_zscore$salary), 3))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -1.855  -0.854  -0.092   0.000   0.800   1.781

7.4 Step 4 — Feature Engineering

engineered_data <- raw_data %>%
  mutate(
    performance_category = case_when(
      performance_score >= 90 ~ "Excellent",
      performance_score >= 75 ~ "Good",
      performance_score >= 60 ~ "Average",
      TRUE ~ "Poor"
    ),
    salary_bracket = case_when(
      salary >= 12000 ~ "High (>=12k)",
      salary >= 8000  ~ "Mid (8k-12k)",
      salary >= 5000  ~ "Low-Mid (5k-8k)",
      TRUE ~ "Low (<5k)"
    ),
    KPI_tier = case_when(
      KPI_score >= 90 ~ "Top Tier",
      KPI_score >= 75 ~ "Mid Tier",
      TRUE ~ "Base Tier"
    )
  )

cat(" performance_category:\n"); print(table(engineered_data$performance_category))

##  performance_category:

## 
##   Average Excellent      Good      Poor 
##        82        47        66        55

cat(" salary_bracket:\n");       print(table(engineered_data$salary_bracket))

##  salary_bracket:

## 
##    High (>=12k) Low-Mid (5k-8k)       Low (<5k)    Mid (8k-12k) 
##              57              76              31              86

cat(" KPI_tier:\n");             print(table(engineered_data$KPI_tier))

##  KPI_tier:

## 
## Base Tier  Mid Tier  Top Tier 
##        90        87        73

7.5 Step 5 — Histogram: Before vs After

p6a <- plot_ly(alpha = 0.75) %>%
  add_histogram(x = numeric_cols$salary, name = "Original",
                marker = list(color = "#a78bfa"),
                hovertemplate = "Original<br>Range:%{x}<br>Count:%{y}<extra></extra>")

p6b <- plot_ly(alpha = 0.75) %>%
  add_histogram(x = df_normalized$salary, name = "Min-Max",
                marker = list(color = "#38bdf8"),
                hovertemplate = "Min-Max<br>Range:%{x:.3f}<br>Count:%{y}<extra></extra>")

p6c <- plot_ly(alpha = 0.75) %>%
  add_histogram(x = df_zscore$salary, name = "Z-Score",
                marker = list(color = "#f472b6"),
                hovertemplate = "Z-Score<br>Range:%{x:.3f}<br>Count:%{y}<extra></extra>")

subplot(p6a, p6b, p6c, nrows = 1, shareY = TRUE, titleX = TRUE) %>%
  layout(
    title = list(text = "<b>Salary Distribution: Before vs After Transformation</b>",
                 font = list(color = "white", size = 15)),
    showlegend = FALSE,
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

7.6 Step 6 — Boxplot: Before vs After

# FIX: use add_trace(type = "box") in place of the earlier add_boxplot() call
plot_ly() %>%
  add_trace(y = numeric_cols$salary, type = "box", name = "Original",
            marker = list(color = "#a78bfa"), line = list(color = "#a78bfa"),
            hovertemplate = "Original<br>Value:%{y:.2f}<extra></extra>") %>%
  add_trace(y = df_normalized$salary, type = "box", name = "Min-Max",
            marker = list(color = "#38bdf8"), line = list(color = "#38bdf8"),
            hovertemplate = "Min-Max<br>Value:%{y:.4f}<extra></extra>") %>%
  add_trace(y = df_zscore$salary, type = "box", name = "Z-Score",
            marker = list(color = "#f472b6"), line = list(color = "#f472b6"),
            hovertemplate = "Z-Score<br>Value:%{y:.4f}<extra></extra>") %>%
  layout(
    title = list(text = "<b>Salary: Before vs After Transformation</b>",
                 font = list(color = "white", size = 15)),
    yaxis = list(title = "Value", color = "white",
                 gridcolor = "rgba(255,255,255,0.1)"),
    xaxis = list(color = "white"),
    legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

7.7 Step 7 — New Features Distribution

# FIX: use subplot() so both plots are displayed together
perf_dist   <- as.data.frame(table(engineered_data$performance_category))
salary_dist <- as.data.frame(table(engineered_data$salary_bracket))
colnames(perf_dist)   <- c("Category","Count")
colnames(salary_dist) <- c("Category","Count")

p_perf <- plot_ly(perf_dist, x = ~Category, y = ~Count, type = "bar",
                  name = "Performance",
                  marker = list(color = c("#34d399","#38bdf8","#fb923c","#f472b6"),
                                line = list(color = "#0f0f1e", width = 1)),
                  hovertemplate = "<b>%{x}</b><br>Count: %{y}<extra></extra>") %>%
  layout(xaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.06)"),
         yaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.08)"),
         paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e")

p_sal <- plot_ly(salary_dist, x = ~Category, y = ~Count, type = "bar",
                 name = "Salary Bracket",
                 marker = list(color = c("#a78bfa","#38bdf8","#fb923c","#34d399"),
                               line = list(color = "#0f0f1e", width = 1)),
                 hovertemplate = "<b>%{x}</b><br>Count: %{y}<extra></extra>") %>%
  layout(xaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.06)"),
         yaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.08)"),
         paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e")

subplot(p_perf, p_sal, nrows = 1, shareY = FALSE, titleX = TRUE, margin = 0.06) %>%
  layout(
    title = list(text = "<b>New Features Distribution</b><br><sup>performance_category | salary_bracket</sup>",
                 font = list(color = "white", size = 15)),
    showlegend = FALSE,
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

7.8 Conclusion — Task 6

Two data transformation methods are applied:

Min-Max normalization: \[ x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \]

Z-score standardization: \[ z = \frac{x - \mu}{\sigma} \]

Min-Max rescales data into the range $[0,1]$, while Z-score standardizes data to have mean $0$ and standard deviation $1$. These transformations improve comparability across variables.
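The two formulas above can be sketched in base R as follows. The helper names `min_max` and `z_score` and the four salary values are illustrative assumptions, not the code used in Task 6. Note that `sd()` uses the sample (n - 1) denominator, which may differ slightly from a population $\sigma$.

```r
# Min-Max normalization: rescales x to the [0, 1] interval.
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Z-score standardization: centers x at 0 and scales by the sample sd.
z_score <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

salary    <- c(3000, 6000, 9000, 15000)  # illustrative values only
salary_mm <- min_max(salary)
salary_z  <- z_score(salary)

print(salary_mm)  # 0.00 0.25 0.50 1.00
print(salary_z)
```

Both helpers divide by a spread measure, so a constant column (where the range and the standard deviation are zero) would produce `NaN` and would need separate handling.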

Additional categorical features are created using conditional logic, including performance categories, salary brackets, and KPI tiers.

8 Task 7 - Mini Project: Company KPI Dashboard & Simulation

This task generates a simulated dataset for 7 companies, each with 50 to 200 employees, and builds a full KPI dashboard with advanced visualizations.

8.1 Step 1 — Generate Dataset

library(dplyr)
library(plotly)

set.seed(123)
n_companies           <- 7
employees_per_company <- sample(50:200, n_companies, replace = TRUE)
departments           <- c("HR","Finance","Engineering","Marketing","Operations")
company_data7         <- data.frame()

for (i in 1:n_companies) {
  n_emp <- employees_per_company[i]
  temp  <- data.frame(
    employee_id       = paste0("EMP-", sprintf("%04d", seq_len(n_emp) + (i*1000))),
    company_id        = paste0("COMP-", sprintf("%02d", i)),
    salary            = round(runif(n_emp, 3000, 15000), 2),
    performance_score = round(runif(n_emp, 50, 100), 1),
    KPI_score         = round(runif(n_emp, 60, 100), 1),
    department        = sample(departments, n_emp, replace = TRUE)
  )
  company_data7 <- rbind(company_data7, temp)
}

cat(sprintf(" Total rows: %d | Companies: %d\n",
            nrow(company_data7), n_distinct(company_data7$company_id)))

##  Total rows: 790 | Companies: 7

head(company_data7, 5)

##   employee_id company_id   salary performance_score KPI_score  department
## 1    EMP-1001    COMP-01 13797.90              67.6      76.4 Engineering
## 2    EMP-1002    COMP-01  5953.05              55.6      60.4   Marketing
## 3    EMP-1003    COMP-01  3504.71              62.2      67.4 Engineering
## 4    EMP-1004    COMP-01  6935.05              83.4      93.7  Operations
## 5    EMP-1005    COMP-01 14454.04              70.9      69.2   Marketing
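Growing a data frame with `rbind()` inside a loop works at this scale, but each iteration copies the accumulated rows. A possible alternative, sketched here under the assumption of a smaller 3-company dataset with fewer columns, builds one data frame per company in a list and binds them once at the end. The names `sizes`, `pieces`, and `company_df` are hypothetical.

```r
set.seed(123)
n_companies <- 3  # smaller than the 7 used above, to keep the sketch short
sizes       <- sample(50:200, n_companies, replace = TRUE)
departments <- c("HR", "Finance", "Engineering", "Marketing", "Operations")

# Build one data frame per company, collect them in a list, bind once.
pieces <- lapply(seq_len(n_companies), function(i) {
  n_emp <- sizes[i]
  data.frame(
    employee_id = sprintf("EMP-%04d", seq_len(n_emp) + i * 1000),
    company_id  = sprintf("COMP-%02d", i),
    salary      = round(runif(n_emp, 3000, 15000), 2),
    KPI_score   = round(runif(n_emp, 60, 100), 1),
    department  = sample(departments, n_emp, replace = TRUE)
  )
})
company_df <- do.call(rbind, pieces)

# The combined row count should equal the sum of the company sizes.
print(nrow(company_df) == sum(sizes))
```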

8.2 Step 2 — KPI Tier (Loop-Based)

company_data7$KPI_tier <- ""
for (i in seq_len(nrow(company_data7))) {
  s <- company_data7$KPI_score[i]
  company_data7$KPI_tier[i] <- if (s >= 90) "Top Tier" else
                                if (s >= 75) "Mid Tier" else "Base Tier"
}
company_data7$KPI_tier <- factor(company_data7$KPI_tier,
  levels = c("Top Tier","Mid Tier","Base Tier"))
print(table(company_data7$KPI_tier))

## 
##  Top Tier  Mid Tier Base Tier 
##       192       300       298
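As a cross-check on the loop-based tiers, the same thresholds (90 and 75) can be applied in one vectorized call with `cut()`. This is a minimal sketch on freshly simulated scores, not on `company_data7`; the seed and the vector `kpi` are assumptions for illustration.

```r
set.seed(1)  # arbitrary seed, for a reproducible illustration
kpi <- c(round(runif(200, 60, 100), 1), 75, 90)  # include both boundary values

# Loop-based assignment with the same thresholds as the step above
tier_loop <- character(length(kpi))
for (i in seq_along(kpi)) {
  s <- kpi[i]
  tier_loop[i] <- if (s >= 90) "Top Tier" else if (s >= 75) "Mid Tier" else "Base Tier"
}

# Vectorized version: right = FALSE gives [-Inf, 75), [75, 90), [90, Inf)
tier_cut <- cut(kpi,
                breaks = c(-Inf, 75, 90, Inf),
                labels = c("Base Tier", "Mid Tier", "Top Tier"),
                right  = FALSE)

print(identical(tier_loop, as.character(tier_cut)))  # TRUE
print(table(tier_cut))
```

Setting `right = FALSE` matters here: with the default `right = TRUE`, a score of exactly 90 would fall into the middle interval instead of "Top Tier".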

8.3 Step 3 — Company Summary

company_summary7 <- company_data7 %>%
  group_by(company_id) %>%
  summarise(
    Total_Employees = n(),
    Avg_Salary      = round(mean(salary), 2),
    Avg_KPI         = round(mean(KPI_score), 1),
    Avg_Performance = round(mean(performance_score), 1),
    Top_Performers  = sum(KPI_score >= 90),
    .groups = "drop"
  ) %>%
  mutate(Top_Pct = round(Top_Performers / Total_Employees * 100, 1))

print(company_summary7)

## # A tibble: 7 × 7
##   company_id Total_Employees Avg_Salary Avg_KPI Avg_Performance Top_Performers
##   <chr>                <int>      <dbl>   <dbl>           <dbl>          <int>
## 1 COMP-01                 63      8925.    79.6            76.2             10
## 2 COMP-02                 99      8871.    80.9            74.1             30
## 3 COMP-03                167      9021.    79.1            73.9             39
## 4 COMP-04                 92      9048.    81.1            77.2             23
## 5 COMP-05                 63      9020.    79.6            74.7             14
## 6 COMP-06                167      8900.    79.7            75.1             43
## 7 COMP-07                139      8922.    79.9            75               33
## # ℹ 1 more variable: Top_Pct <dbl>

8.4 Step 4 — Interactive Summary Table

plot_ly(type = "table",
  header = list(
    values = list("<b>Company</b>","<b>Employees</b>","<b>Avg Salary</b>",
                  "<b>Avg KPI</b>","<b>Avg Perf</b>","<b>Top Performers</b>","<b>Top %</b>"),
    fill = list(color = "#1e1b4b"), font = list(color = "white", size = 12),
    align = "center", line = list(color = "#0f0f1e", width = 1)
  ),
  cells = list(
    values = list(
      company_summary7$company_id, company_summary7$Total_Employees,
      paste0("$", format(round(company_summary7$Avg_Salary), big.mark = ",")),
      company_summary7$Avg_KPI, company_summary7$Avg_Performance,
      company_summary7$Top_Performers, paste0(company_summary7$Top_Pct, "%")
    ),
    fill = list(color = list("#0f0f1e","#111128")),
    font = list(color = "white", size = 11),
    align = "center", line = list(color = "#1e1b4b", width = 1)
  )
) %>% layout(paper_bgcolor = "#0f0f1e")

8.5 Step 5 — Top Performers Table

top_performers7 <- company_data7 %>%
  filter(KPI_score >= 90) %>%
  arrange(desc(KPI_score)) %>%
  select(employee_id, company_id, department, KPI_score, performance_score, salary)

cat(sprintf(" Total Top Performers: %d\n", nrow(top_performers7)))

##  Total Top Performers: 192

plot_ly(type = "table",
  header = list(
    values = list("<b>Employee</b>","<b>Company</b>","<b>Department</b>",
                  "<b>KPI</b>","<b>Performance</b>","<b>Salary</b>"),
    fill = list(color = "#1e1b4b"), font = list(color = "white", size = 12),
    align = "center", line = list(color = "#0f0f1e", width = 1)
  ),
  cells = list(
    values = list(
      head(top_performers7$employee_id, 15),
      head(top_performers7$company_id, 15),
      head(top_performers7$department, 15),
      head(top_performers7$KPI_score, 15),
      head(top_performers7$performance_score, 15),
      paste0("$", format(round(head(top_performers7$salary,15)), big.mark=","))
    ),
    fill = list(color = list("#0f0f1e","#111128")),
    font = list(color = "white", size = 11),
    align = "center", line = list(color = "#1e1b4b", width = 1)
  )
) %>% layout(paper_bgcolor = "#0f0f1e")

8.6 Step 6 — Salary Distribution (Histogram)

comp_colors7 <- c("#a78bfa","#38bdf8","#f472b6","#fb923c","#34d399","#fbbf24","#60a5fa")

p7_hist <- plot_ly()
for (i in seq_along(unique(company_data7$company_id))) {
  comp <- unique(company_data7$company_id)[i]
  df_c <- company_data7[company_data7$company_id == comp, ]
  p7_hist <- p7_hist %>% add_histogram(
    x = df_c$salary, name = comp, nbinsx = 25, opacity = 0.6,
    marker = list(color = comp_colors7[i],
                  line = list(color = "#0f0f1e", width = 0.5)),
    hovertemplate = paste0("<b>", comp, "</b><br>Range:%{x}<br>Count:%{y}<extra></extra>")
  )
}
p7_hist %>% layout(
  title = list(text = "<b>Salary Distribution by Company</b>",
               font = list(color = "white", size = 15)),
  barmode = "overlay",
  xaxis = list(title = "Salary (USD)", color = "white",
               gridcolor = "rgba(255,255,255,0.08)", tickprefix = "$"),
  yaxis = list(title = "Count", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
  legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
  paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)

8.7 Step 7 — Grouped Bar: Avg KPI & Top Performers

p7a <- plot_ly(company_summary7, x = ~company_id, y = ~Avg_KPI, type = "bar",
               name = "Avg KPI",
               marker = list(color = "#a78bfa", line = list(color = "#0f0f1e", width=1)),
               text = ~round(Avg_KPI,1), textposition = "outside",
               textfont = list(color = "white"),
               hovertemplate = "<b>%{x}</b><br>Avg KPI: %{y:.1f}<extra></extra>")

p7b <- plot_ly(company_summary7, x = ~company_id, y = ~Top_Performers, type = "bar",
               name = "Top Performers",
               marker = list(color = "#34d399", line = list(color = "#0f0f1e", width=1)),
               text = ~Top_Performers, textposition = "outside",
               textfont = list(color = "white"),
               hovertemplate = "<b>%{x}</b><br>Top Performers: %{y}<extra></extra>")

subplot(p7a, p7b, nrows = 1, shareX = FALSE, titleX = TRUE) %>%
  layout(
    title = list(text = "<b>Avg KPI & Top Performers per Company</b>",
                 font = list(color = "white", size = 15)),
    xaxis  = list(title = "Company", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
    xaxis2 = list(title = "Company", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
    yaxis  = list(title = "Avg KPI", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
    yaxis2 = list(title = "Top Performers", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
    legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

8.8 Step 8 — Scatter: Salary vs KPI + Regression Line

lm_model <- lm(KPI_score ~ salary, data = company_data7)
reg_x    <- seq(min(company_data7$salary), max(company_data7$salary), length.out = 200)
reg_y    <- predict(lm_model, newdata = data.frame(salary = reg_x))

p7_scatter <- plot_ly()
for (i in seq_along(unique(company_data7$company_id))) {
  comp <- unique(company_data7$company_id)[i]
  df_c <- company_data7[company_data7$company_id == comp, ]
  p7_scatter <- p7_scatter %>% add_trace(
    data = df_c, x = ~salary, y = ~KPI_score,
    type = "scatter", mode = "markers", name = comp,
    marker = list(color = comp_colors7[i], size = 6, opacity = 0.65,
                  line = list(color = "white", width = 0.4)),
    hovertemplate = paste0("<b>", comp, "</b><br>",
                           "Employee: %{customdata}<br>",
                           "Salary: $%{x:,.0f}<br>KPI: %{y:.1f}<extra></extra>"),
    customdata = ~employee_id
  )
}

p7_scatter %>%
  add_trace(x = reg_x, y = reg_y, type = "scatter", mode = "lines",
            name = "Regression Line",
            line = list(color = "#fb923c", width = 2.5, dash = "dash"),
            hoverinfo = "skip") %>%
  layout(
    title = list(text = "<b>Salary vs KPI Score + Regression Line</b>",
                 font = list(color = "white", size = 15)),
    xaxis = list(title = "Salary (USD)", color = "white",
                 gridcolor = "rgba(255,255,255,0.08)", tickprefix = "$"),
    yaxis = list(title = "KPI Score", color = "white",
                 gridcolor = "rgba(255,255,255,0.08)"),
    legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

8.9 Step 9 — Department Analysis

dept_summary7 <- company_data7 %>%
  group_by(company_id, department) %>%
  summarise(avg_KPI = round(mean(KPI_score),1), count = n(), .groups = "drop")

dept_colors <- c("HR"="#a78bfa","Finance"="#38bdf8","Engineering"="#f472b6",
                 "Marketing"="#fb923c","Operations"="#34d399")

p7_dept <- plot_ly()
for (dept in unique(dept_summary7$department)) {
  df_d <- dept_summary7[dept_summary7$department == dept, ]
  p7_dept <- p7_dept %>% add_trace(
    data = df_d, x = ~company_id, y = ~avg_KPI, type = "bar", name = dept,
    marker = list(color = dept_colors[dept],
                  line = list(color = "#0f0f1e", width = 0.8)),
    hovertemplate = paste0("<b>", dept, " — %{x}</b><br>",
                           "Avg KPI: %{y:.1f}<extra></extra>")
  )
}

p7_dept %>% layout(
  title = list(text = "<b>Avg KPI per Department per Company</b>",
               font = list(color = "white", size = 15)),
  barmode = "group",
  xaxis = list(title = "Company", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
  yaxis = list(title = "Avg KPI", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
  legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
  paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)

8.10 Step 10 — KPI Tier Pie Chart

kpi_dist7 <- company_data7 %>%
  count(KPI_tier) %>% rename(Tier = KPI_tier, Count = n) %>%
  mutate(Pct = round(Count / sum(Count) * 100, 1))

tier_colors <- c("Top Tier"="#34d399","Mid Tier"="#a78bfa","Base Tier"="#f472b6")

plot_ly(kpi_dist7, labels = ~Tier, values = ~Count, type = "pie",
  marker = list(colors = unname(tier_colors[as.character(kpi_dist7$Tier)]),
                line = list(color = "#0f0f1e", width = 2)),
  textinfo = "label+percent", textfont = list(color = "white", size = 13),
  hovertemplate = "<b>%{label}</b><br>Count:%{value}<br>%{percent}<extra></extra>",
  pull = c(0.05, 0, 0)
) %>% layout(
  title = list(text = "<b>KPI Tier Distribution</b>",
               font = list(color = "white", size = 15)),
  legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
  paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)

8.11 Conclusion — Task 7

A KPI dashboard is generated for multiple companies using simulated data. The relationship between salary and KPI is modeled as:

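\[ \text{KPI} = \beta_0 + \beta_1 \cdot \text{salary} + \varepsilon \]

The fitted regression shows only a weak relationship. This is expected in this dataset, because salary and KPI_score are drawn independently in the simulation, so salary alone is not a strong predictor of an employee's KPI score. The scatter plot in Step 8 is consistent with this reading.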
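In the simulation, `salary` and `KPI_score` come from separate `runif()` calls, so the slope $\beta_1$ should be close to zero. The sketch below shows one way to quantify this with the slope, $R^2$, and the Pearson correlation. It uses a fresh simulated sample (the seed, sample size, and the names `sim` and `fit` are assumptions), not the `lm_model` object fitted in Step 8.

```r
set.seed(42)  # arbitrary seed
n <- 500
sim <- data.frame(
  salary    = runif(n, 3000, 15000),
  KPI_score = runif(n, 60, 100)  # drawn independently of salary
)

fit    <- lm(KPI_score ~ salary, data = sim)
slope  <- unname(coef(fit)["salary"])
r2     <- summary(fit)$r.squared
cor_xy <- cor(sim$salary, sim$KPI_score)

# For simple linear regression, R^2 equals the squared Pearson correlation.
cat(sprintf("slope = %.6f | R^2 = %.4f | r = %.4f\n", slope, r2, cor_xy))
```

The same three quantities could be read from `lm_model` directly, which would back the "weak relationship" statement with numbers rather than the plot alone.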

9 Task 8 (Bonus) - Automated Report Generation

Use functions + loops to auto-generate summary reports for all companies.

9.1 Step 1 — Build Report Function

library(dplyr)
library(plotly)

generate_company_report <- function(data, company_name) {
  df        <- data %>% filter(company_id == company_name)
  total_emp <- nrow(df)
  
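  df$KPI_tier <- character(nrow(df))   # empty strings; also valid when no rows match
  for (i in seq_len(nrow(df))) {       # seq_len() skips the loop for zero rows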
    s <- df$KPI_score[i]
    df$KPI_tier[i] <- if (s >= 90) "Top Tier" else if (s >= 75) "Mid Tier" else "Base Tier"
  }
  
  return(list(
    company = company_name, data = df,
    stats = list(
      total_emp  = total_emp,
      avg_salary = round(mean(df$salary), 2),
      avg_kpi    = round(mean(df$KPI_score), 1),
      avg_perf   = round(mean(df$performance_score), 1),
      top_count  = sum(df$KPI_score >= 90),
      top_pct    = round(sum(df$KPI_score >= 90) / total_emp * 100, 1),
      max_kpi    = round(max(df$KPI_score), 1),
      min_salary = round(min(df$salary), 2),
      max_salary = round(max(df$salary), 2)
    )
  ))
}

cat(" Function generate_company_report() created successfully!\n")

##  Function generate_company_report() created successfully!
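One edge case is worth keeping in mind for row-wise loops like the one in this function: when a filter matches no rows, `1:nrow(df)` counts down from 1 to 0 instead of skipping the loop, whereas `seq_len(nrow(df))` returns an empty sequence. The data frame `empty_df` below is a hypothetical stand-in for an unmatched company ID.

```r
empty_df <- data.frame(KPI_score = numeric(0))  # e.g. a company ID with no rows

# The colon operator counts down when the upper bound is 0.
idx_colon <- 1:nrow(empty_df)
# seq_len() returns an empty integer vector instead.
idx_safe  <- seq_len(nrow(empty_df))

print(idx_colon)         # 1 0
print(length(idx_safe))  # 0

# A loop over seq_len() is skipped entirely for empty input.
visits <- 0
for (i in seq_len(nrow(empty_df))) visits <- visits + 1
print(visits)            # 0
```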

9.2 Step 2 — Generate All Reports

# Use company_data7 (not company_data)
all_companies <- unique(company_data7$company_id)
all_reports   <- list()

for (comp in all_companies) {
  all_reports[[comp]] <- generate_company_report(company_data7, comp)
  cat(sprintf(" %s | %d employees | Avg KPI: %.1f\n",
              comp,
              all_reports[[comp]]$stats$total_emp,
              all_reports[[comp]]$stats$avg_kpi))
}

##  COMP-01 | 63 employees | Avg KPI: 79.6
##  COMP-02 | 99 employees | Avg KPI: 80.9
##  COMP-03 | 167 employees | Avg KPI: 79.1
##  COMP-04 | 92 employees | Avg KPI: 81.1
##  COMP-05 | 63 employees | Avg KPI: 79.6
##  COMP-06 | 167 employees | Avg KPI: 79.7
##  COMP-07 | 139 employees | Avg KPI: 79.9

cat(sprintf("\n Total reports: %d companies\n", length(all_reports)))

## 
##  Total reports: 7 companies

9.3 Step 3 — Automated Summary Table

summary_compiled <- data.frame()

for (comp in names(all_reports)) {
  s <- all_reports[[comp]]$stats
  summary_compiled <- rbind(summary_compiled, data.frame(
    Company        = comp,
    Employees      = s$total_emp,
    Avg_Salary     = s$avg_salary,
    Avg_KPI        = s$avg_kpi,
    Avg_Perf       = s$avg_perf,
    Top_Performers = s$top_count,
    Top_Pct        = paste0(s$top_pct, "%"),
    Max_KPI        = s$max_kpi,
    Salary_Range   = paste0("$", format(round(s$min_salary), big.mark=","),
                            "–$", format(round(s$max_salary), big.mark=","))
  ))
}

plot_ly(type = "table",
  header = list(
    values = list("<b>Company</b>","<b>Employees</b>","<b>Avg Salary</b>",
                  "<b>Avg KPI</b>","<b>Avg Perf</b>","<b>Top</b>","<b>Top%</b>",
                  "<b>Max KPI</b>","<b>Salary Range</b>"),
    fill = list(color = "#1e1b4b"), font = list(color = "white", size = 11),
    align = "center", line = list(color = "#0f0f1e", width = 1)
  ),
  cells = list(
    values = list(
      summary_compiled$Company, summary_compiled$Employees,
      paste0("$", format(round(summary_compiled$Avg_Salary), big.mark=",")),
      summary_compiled$Avg_KPI, summary_compiled$Avg_Perf,
      summary_compiled$Top_Performers, summary_compiled$Top_Pct,
      summary_compiled$Max_KPI, summary_compiled$Salary_Range
    ),
    fill = list(color = list("#0f0f1e","#111128")),
    font = list(color = "white", size = 11),
    align = "center", line = list(color = "#1e1b4b", width = 1)
  )
) %>% layout(paper_bgcolor = "#0f0f1e")

9.4 Step 4 — Automated KPI Bar Chart

comp_colors8 <- c("#a78bfa","#38bdf8","#f472b6","#fb923c","#34d399","#fbbf24","#60a5fa")

plot_ly(summary_compiled, x = ~Company, y = ~Avg_KPI, type = "bar",
        color = ~Company, colors = comp_colors8,
        text  = ~paste0(Avg_KPI, "\n(Top:", Top_Performers, ")"),
        textposition = "outside", textfont = list(color = "white", size = 10),
        hovertemplate = paste0("<b>%{x}</b><br>Avg KPI: %{y:.1f}<br>",
                               "Employees: ", summary_compiled$Employees,
                               "<extra></extra>"),
        marker = list(line = list(color = "#0f0f1e", width = 1))
) %>% layout(
  title = list(text = "<b>Automated KPI Report — All Companies</b>",
               font = list(color = "white", size = 15)),
  xaxis = list(title = "Company", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
  yaxis = list(title = "Avg KPI Score", color = "white",
               gridcolor = "rgba(255,255,255,0.1)",
               range = c(0, max(summary_compiled$Avg_KPI) * 1.2)),
  showlegend = FALSE,
  paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)

9.5 Step 5 — Automated Box Plot (Loop)

p8_box <- plot_ly()

for (i in seq_along(names(all_reports))) {
  comp   <- names(all_reports)[i]
  df_rep <- all_reports[[comp]]$data
  
  p8_box <- p8_box %>% add_trace(
    y = df_rep$KPI_score, type = "box", name = comp,
    marker    = list(color = comp_colors8[i], size = 4),
    line      = list(color = comp_colors8[i]),
    fillcolor = paste0(comp_colors8[i], "40"),
    boxpoints = "outliers",
    hovertemplate = paste0("<b>", comp, "</b><br>KPI:%{y:.1f}<extra></extra>")
  )
}

p8_box %>%
  add_lines(x = c(-0.5, length(all_reports) - 0.5), y = c(90, 90),
            line = list(color = "rgba(251,191,36,0.7)", dash = "dash", width = 1.5),
            name = "Top Tier (90)", hoverinfo = "skip") %>%
  layout(
    title = list(text = "<b>KPI Distribution per Company (Auto-Generated)</b><br><sup>Yellow line = Top Tier ≥ 90</sup>",
                 font = list(color = "white", size = 15)),
    xaxis = list(title = "Company", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
    yaxis = list(title = "KPI Score", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
    legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
    paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
  )

9.6 Step 6 — Export to CSV

# Use company_data7
write.csv(summary_compiled, "automated_report_summary.csv", row.names = FALSE)
write.csv(company_data7,    "company_data_full.csv",        row.names = FALSE)

cat(" Files exported successfully:\n")

##  Files exported successfully:

cat("    automated_report_summary.csv\n")

##     automated_report_summary.csv

cat("    company_data_full.csv\n")

##     company_data_full.csv
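A quick way to confirm an export is to read the file back and compare it with the original object. The sketch below does this round trip on a two-row excerpt (values taken from the COMP-01 and COMP-02 lines above) written to a temporary directory; the file name `report_check.csv` is a hypothetical choice so that the real exports are not overwritten.

```r
report <- data.frame(
  Company   = c("COMP-01", "COMP-02"),  # two-row excerpt for illustration
  Employees = c(63L, 99L),
  Avg_KPI   = c(79.6, 80.9)
)

out_path <- file.path(tempdir(), "report_check.csv")  # hypothetical file name
write.csv(report, out_path, row.names = FALSE)

# Read the file back and compare structure and values with the original.
report_back <- read.csv(out_path, stringsAsFactors = FALSE)
print(file.exists(out_path))
print(report_back)

unlink(out_path)  # clean up the temporary file
```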

9.7 Conclusion — Task 8 (Bonus)

The generate_company_report() function, called in a loop over all company IDs, automates the reporting pipeline.

All results, including summary statistics, visualizations, and exported files, are generated programmatically. This approach demonstrates a scalable workflow: because the loop reads the company IDs from the data, adding new companies only requires extending the input dataset, without modifying the core logic.
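The scalability claim can be illustrated with a reduced sketch in base R. The helpers `summarise_company` and `run_all_reports` and the `toy` data are hypothetical simplifications of the pipeline above, keeping only three statistics.

```r
# Hypothetical helper: one summary row for a single company.
summarise_company <- function(data, company_name) {
  df <- data[data$company_id == company_name, ]
  data.frame(
    Company   = company_name,
    Employees = nrow(df),
    Avg_KPI   = round(mean(df$KPI_score), 1),
    Top_Count = sum(df$KPI_score >= 90)
  )
}

# Hypothetical driver: discovers the companies from the data itself.
run_all_reports <- function(data) {
  companies <- unique(data$company_id)
  rows <- lapply(companies, function(comp) summarise_company(data, comp))
  do.call(rbind, rows)
}

toy <- data.frame(
  company_id = c("COMP-01", "COMP-01", "COMP-02", "COMP-02"),
  KPI_score  = c(92, 70, 80, 95)
)
res_two <- run_all_reports(toy)
print(res_two)

# Adding a company changes only the input data, not the functions.
toy_more  <- rbind(toy, data.frame(company_id = "COMP-03", KPI_score = 88))
res_three <- run_all_reports(toy_more)
print(res_three)
```

Because the company list comes from `unique(data$company_id)`, the third company appears in the output without any change to either function.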