Practicum Week ~ 5
RISKY NURHIDAYAH
ADVANCED PRACTICUM
FUNCTION & LOOPS + DATA SCIENCE ~ Week 5
Sains Data
Institut Teknologi Sains Bandung
Bakti Siregar, M.Sc., CSD
1 Introduction
This practicum focuses on the application of functions, loops, and conditional logic in the context of data science. Students are encouraged to build structured workflows, from raw data simulations to automated reporting through interactive visualizations.
2 Task 1 - Dynamic Multi-Formula Function
In this task, we build a function called compute_formula() that dynamically computes four types of mathematical formulas (linear, quadratic, cubic, and exponential) for input values x = 1:20.
2.1 Step 1 — Build & Compute Formulas
library(ggplot2)
library(plotly)
library(reshape2)
library(dplyr)
# Function: compute_formula
compute_formula <- function(x, formula) {
valid_formulas <- c("linear", "quadratic", "cubic", "exponential")
if (!(formula %in% valid_formulas)) {
stop(paste("Invalid formula! Choose from:", paste(valid_formulas, collapse = ", ")))
}
result <- if (formula == "linear") {
2 * x + 3
} else if (formula == "quadratic") {
x^2 + 2 * x + 1
} else if (formula == "cubic") {
x^3 - x^2 + x
} else if (formula == "exponential") {
exp(0.3 * x)
}
return(result)
}
# Nested Loop Computation
x_vals <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")
results <- data.frame(x = x_vals)
for (f in formulas) {
values <- c()
for (x in x_vals) {
values <- c(values, compute_formula(x, f))
}
results[[f]] <- values
}
head(results, 5)
## x linear quadratic cubic exponential
## 1 1 5 4 1 1.349859
## 2 2 7 9 6 1.822119
## 3 3 9 16 21 2.459603
## 4 4 11 25 52 3.320117
## 5 5 13 36 105 4.481689
2.2 Step 2 — Visual Comparison
results_long <- melt(results, id.vars = "x", variable.name = "Formula", value.name = "Value")
results_long$Label <- factor(results_long$Formula,
levels = c("linear","quadratic","cubic","exponential"),
labels = c("Linear: 2x+3","Quadratic: x²+2x+1","Cubic: x³-x²+x","Exponential: e^0.3x"))
colors_plt <- c("#a78bfa","#38bdf8","#f472b6","#fb923c")
plot_ly(results_long, x = ~x, y = ~Value, color = ~Label, colors = colors_plt,
type = "scatter", mode = "lines+markers") %>%
layout(
title = list(text = "<b>Dynamic Multi-Formula Analysis</b>", font = list(color = "white")),
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e",
xaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.1)"),
yaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.1)"),
legend = list(font = list(color = "white"))
)
2.3 Conclusion — Task 1
The compute_formula() function evaluates four mathematical models for \(x = 1,2,\ldots,20\) using nested loops. The formulas include linear \(f(x)=2x+3\), quadratic \(f(x)=x^2+2x+1\), cubic \(f(x)=x^3-x^2+x\), and exponential \(f(x)=e^{0.3x}\).
At \(x=20\), the cubic function produces the largest value (\(7620\)), followed by the quadratic function (\(441\)), the exponential function (\(\approx 403\)), and the linear function (\(43\)). This shows that a higher-degree polynomial can dominate growth within a finite range, even though the exponential eventually overtakes any polynomial as \(x\) grows. The nested loop structure evaluates every formula at every \(x\) in a single pass.
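The formulas are built from vectorized arithmetic, so the inner loop over x is not strictly required. The following is a minimal sketch, assuming a hypothetical helper named compute_formula_vec() (not part of the original report) that uses switch() and is called once per formula:

```r
# Hypothetical helper (not in the original report): switch()-based, vectorized over x
compute_formula_vec <- function(x, formula) {
  switch(formula,
    linear      = 2 * x + 3,
    quadratic   = x^2 + 2 * x + 1,
    cubic       = x^3 - x^2 + x,
    exponential = exp(0.3 * x),
    stop("Invalid formula: ", formula)  # default branch, reached only for unknown names
  )
}

x_vals <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")

# One call per formula; sapply() returns a 20 x 4 matrix with named columns
values <- sapply(formulas, function(f) compute_formula_vec(x_vals, f))
results_vec <- data.frame(x = x_vals, values)

# Values at x = 20, one column per formula
results_vec[20, ]
```

This variant avoids growing a vector with c() inside the loop, which is usually the slower pattern in R as the input size increases.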
3 Task 2 - Nested Simulation: Multi Sales & Discounts
In this task, we build simulate_sales(), which simulates daily sales data with conditional discounts and tracks cumulative sales per salesperson.
3.1 Step 1 — Simulation Logic
get_discount <- function(sales_amount) {
if (sales_amount >= 1000) return(0.20)
else if (sales_amount >= 500) return(0.10)
else if (sales_amount >= 200) return(0.05)
else return(0.00)
}
simulate_sales <- function(n_salesperson, days) {
set.seed(42)
all_data <- data.frame()
for (i in 1:n_salesperson) {
sales_id <- paste0("SP-", sprintf("%02d", i))
cumulative <- 0
for (d in 1:days) {
sales_amount <- round(runif(1, 100, 1500), 2)
discount_rate <- get_discount(sales_amount)
cumulative <- cumulative + sales_amount
all_data <- rbind(all_data, data.frame(
sales_id = sales_id, day = d, sales_amount = sales_amount,
discount_rate = discount_rate, cumulative = round(cumulative, 2)
))
}
}
return(all_data)
}
sales_data <- simulate_sales(n_salesperson = 3, days = 30)
head(sales_data, 5)
## sales_id day sales_amount discount_rate cumulative
## 1 SP-01 1 1380.73 0.2 1380.73
## 2 SP-01 2 1411.91 0.2 2792.64
## 3 SP-01 3 500.60 0.1 3293.24
## 4 SP-01 4 1262.63 0.2 4555.87
## 5 SP-01 5 998.44 0.1 5554.31
3.2 Step 2 — Performance Tracking
plot_ly(sales_data, x = ~day, y = ~cumulative, color = ~sales_id,
colors = c("#a78bfa", "#38bdf8", "#f472b6"),
type = "scatter", mode = "lines+markers") %>%
layout(
title = list(text = "<b>Cumulative Sales Performance</b>", font = list(color = "white")),
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e",
xaxis = list(title = "Day", color = "white"),
yaxis = list(title = "Cumulative USD", color = "white"),
legend = list(font = list(color = "white"))
)
3.3 Conclusion — Task 2
The simulate_sales() function models sales activity using nested loops across salespersons and days. A conditional function, get_discount(), assigns a discount rate based on the transaction value \(s\):
\[ d(s) = \begin{cases} 0.20 & s \geq 1000 \\ 0.10 & 500 \leq s < 1000 \\ 0.05 & 200 \leq s < 500 \\ 0 & s < 200 \end{cases} \]
Cumulative sales follow:
\[ C_d = \sum_{i=1}^{d} s_i \]
After 30 days, total sales per salesperson range approximately between $18,000 and $22,000, depending on daily variation.
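As a point of reference, a uniform draw on [100, 1500] has mean 800, so the expected 30-day total is 30 × 800 = 24,000 per salesperson. The sketch below shows one way to compute the totals directly. It assumes two hypothetical helpers, get_discount_vec() and simulate_sales_vec(), which are vectorized stand-ins for the report's functions and are not part of the original code:

```r
# Hypothetical vectorized counterpart of get_discount(); thresholds are the report's
get_discount_vec <- function(sales_amount) {
  rates <- c(0.00, 0.05, 0.10, 0.20)
  # findInterval() returns 0..3: the number of thresholds that are <= the amount
  rates[findInterval(sales_amount, c(200, 500, 1000)) + 1]
}

# Hypothetical vectorized counterpart of simulate_sales(); builds the frame in one step
simulate_sales_vec <- function(n_salesperson, days, seed = 42) {
  set.seed(seed)
  n <- n_salesperson * days
  out <- data.frame(
    sales_id = rep(sprintf("SP-%02d", seq_len(n_salesperson)), each = days),
    day = rep(seq_len(days), times = n_salesperson),
    sales_amount = round(runif(n, 100, 1500), 2)
  )
  out$discount_rate <- get_discount_vec(out$sales_amount)
  # ave() applies cumsum() separately within each salesperson
  out$cumulative <- round(ave(out$sales_amount, out$sales_id, FUN = cumsum), 2)
  out
}

sales_vec <- simulate_sales_vec(n_salesperson = 3, days = 30)

# Total sales per salesperson after the final day
tapply(sales_vec$sales_amount, sales_vec$sales_id, sum)
```

Building the data frame once avoids calling rbind() inside the loop, which copies the whole frame on every iteration.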
4 Task 3 - Multi Level Performance Categorization
In this task, categorize_performance() classifies sales into five levels, and the resulting distribution is visualized.
4.1 Step 1 — Build the Function
library(plotly)
categorize_performance <- function(sales_amount) {
categories <- c()
for (sales in sales_amount) {
if (sales >= 1200) categories <- c(categories, "Excellent")
else if (sales >= 900) categories <- c(categories, "Very Good")
else if (sales >= 600) categories <- c(categories, "Good")
else if (sales >= 300) categories <- c(categories, "Average")
else categories <- c(categories, "Poor")
}
return(categories)
}
4.2 Step 2 — Apply Function & Calculate Distribution
set.seed(42)
sales_vector <- round(runif(150, 100, 1500), 2)
performance_category <- categorize_performance(sales_vector)
perf_data <- data.frame(sales_amount = sales_vector, category = performance_category)
category_summary <- as.data.frame(table(perf_data$category))
colnames(category_summary) <- c("Category", "Count")
category_summary$Percentage <- round(
category_summary$Count / sum(category_summary$Count) * 100, 1)
category_summary$Category <- factor(category_summary$Category,
levels = c("Excellent","Very Good","Good","Average","Poor"))
category_summary <- category_summary[order(category_summary$Category), ]
print(category_summary)
## Category Count Percentage
## 2 Excellent 35 23.3
## 5 Very Good 41 27.3
## 3 Good 29 19.3
## 1 Average 26 17.3
## 4 Poor 19 12.7
4.3 Step 3 — Interactive Bar Plot
cat_colors <- c("Excellent"="#34d399","Very Good"="#38bdf8",
"Good"="#a78bfa","Average"="#fb923c","Poor"="#f472b6")
plot_ly(
data = category_summary, x = ~Category, y = ~Count,
type = "bar", color = ~Category, colors = unname(cat_colors),
text = ~paste0(Count, " records | ", Percentage, "%"),
hovertemplate = "<b>%{x}</b><br>Count: %{y}<br>%{text}<extra></extra>",
marker = list(line = list(color = "#0f0f1e", width = 1.5))
) %>% layout(
title = list(text = "<b>Performance Category Distribution</b><br><sup>150 Observations</sup>",
font = list(color = "white", size = 15)),
xaxis = list(title = "Category", color = "white",
gridcolor = "rgba(255,255,255,0.08)",
categoryorder = "array",
categoryarray = c("Excellent","Very Good","Good","Average","Poor")),
yaxis = list(title = "Count", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
showlegend = FALSE,
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
4.4 Step 4 — Interactive Pie Chart
plot_ly(
data = category_summary, labels = ~Category, values = ~Count,
type = "pie",
marker = list(colors = unname(cat_colors),
line = list(color = "#0f0f1e", width = 2)),
textinfo = "label+percent",
textfont = list(color = "white", size = 13),
hovertemplate = "<b>%{label}</b><br>Count: %{value}<br>%{percent}<extra></extra>",
pull = c(0.05, 0, 0, 0, 0)
) %>% layout(
title = list(text = "<b>Performance Category — Pie Chart</b>",
font = list(color = "white", size = 15)),
legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)",
bordercolor = "rgba(255,255,255,0.15)", borderwidth = 1),
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
4.5 Step 5 — Interactive Box Plot by Category
perf_data$category <- factor(perf_data$category,
levels = c("Excellent","Very Good","Good","Average","Poor"))
plot_ly(
data = perf_data, x = ~category, y = ~sales_amount,
color = ~category, colors = unname(cat_colors),
type = "box", boxpoints = "all", jitter = 0.3, pointpos = 0,
marker = list(size = 4, opacity = 0.5),
hovertemplate = "<b>%{x}</b><br>Sales: $%{y:,.2f}<extra></extra>"
) %>% layout(
title = list(text = "<b>Sales Distribution per Performance Category</b>",
font = list(color = "white", size = 15)),
xaxis = list(title = "Category", color = "white",
gridcolor = "rgba(255,255,255,0.08)"),
yaxis = list(title = "Sales Amount (USD)", color = "white",
gridcolor = "rgba(255,255,255,0.1)", tickprefix = "$"),
showlegend = FALSE,
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
4.6 Conclusion — Task 3
The categorize_performance() function classifies sales
into five categories based on threshold values:
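\[ \text{Category}(x) = \begin{cases} \text{Excellent} & x \geq 1200 \\ \text{Very Good} & 900 \leq x < 1200 \\ \text{Good} & 600 \leq x < 900 \\ \text{Average} & 300 \leq x < 600 \\ \text{Poor} & x < 300 \end{cases} \]
Because the data is uniformly distributed over \([100,1500]\), the expected share of each category is proportional to the width of its interval. Excellent, Very Good, Good, and Average each span 300 units (\(300/1400 \approx 21.4\%\)), while Poor spans only 200 units (\(200/1400 \approx 14.3\%\)). The observed shares (23.3%, 27.3%, 19.3%, 17.3%, and 12.7%) are broadly consistent with these expectations, given the sampling variation in 150 observations.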
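The expected shares follow directly from the interval widths, and the same boundaries can drive a vectorized classifier. This is a minimal sketch, assuming a hypothetical helper categorize_cut() built on base R's cut(); it is an alternative to the loop, not the report's implementation:

```r
# Category boundaries from the report; 100 and 1500 are the runif() limits
breaks <- c(100, 300, 600, 900, 1200, 1500)
labels <- c("Poor", "Average", "Good", "Very Good", "Excellent")

# Expected share of each category under a uniform distribution on [100, 1500]
expected_share <- setNames(diff(breaks) / (max(breaks) - min(breaks)), labels)
round(expected_share * 100, 1)

# Hypothetical vectorized classifier: right = FALSE makes intervals [lower, upper),
# include.lowest = TRUE closes the last interval so 1500 still maps to "Excellent"
categorize_cut <- function(sales_amount) {
  as.character(cut(sales_amount, breaks = breaks, labels = labels,
                   right = FALSE, include.lowest = TRUE))
}

categorize_cut(c(150, 300, 899.99, 900, 1200, 1500))
```

One difference to keep in mind: cut() returns NA for values outside [100, 1500], whereas the loop-based function assigns "Poor" or "Excellent" to them.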
5 Task 4 - Multi Company Dataset Simulation
In this task, generate_company_data() uses nested loops to generate company and employee data with KPI-based conditional logic.
5.1 Step 1 — Build the Function
library(plotly)
library(dplyr)
generate_company_data <- function(n_company, n_employees) {
set.seed(123)
departments <- c("HR","Finance","Engineering","Marketing","Operations")
all_data <- data.frame()
for (i in 1:n_company) {
company_id <- paste0("COMP-", sprintf("%02d", i))
for (j in 1:n_employees) {
salary <- round(runif(1, 3000, 15000), 2)
department <- sample(departments, 1)
performance_score <- round(runif(1, 50, 100), 1)
KPI_score <- round(runif(1, 60, 100), 1)
is_top <- ifelse(KPI_score > 90, "Top Performer", "Regular")
all_data <- rbind(all_data, data.frame(
company_id = company_id,
employee_id = paste0("EMP-", sprintf("%03d", j)),
salary = salary, department = department,
performance_score = performance_score,
KPI_score = KPI_score, performer_status = is_top
))
}
}
return(all_data)
}
cat(" Function generate_company_data() created successfully!\n")
cat(" Conditional: KPI > 90 → Top Performer\n")
## Function generate_company_data() created successfully!
## Conditional: KPI > 90 → Top Performer
5.2 Step 2 — Generate Dataset
company_data <- generate_company_data(n_company = 5, n_employees = 50)
write.csv(company_data, "company_data.csv", row.names = FALSE)
cat(sprintf(" Dataset: %d rows × %d cols | Saved to company_data.csv\n",
nrow(company_data), ncol(company_data)))
## Dataset: 250 rows × 7 cols | Saved to company_data.csv
5.3 Step 3 — Summary per Company
company_summary <- company_data %>%
group_by(company_id) %>%
summarise(
Avg_Salary = round(mean(salary), 2),
Avg_Performance = round(mean(performance_score), 1),
Avg_KPI = round(mean(KPI_score), 1),
Max_KPI = round(max(KPI_score), 1),
.groups = "drop"
)
print(company_summary)
## # A tibble: 5 × 5
## company_id Avg_Salary Avg_Performance Avg_KPI Max_KPI
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 COMP-01 8696. 74.4 82.5 99.4
## 2 COMP-02 8345. 74.5 79.4 98.9
## 3 COMP-03 9457. 76.3 78.7 99.7
## 4 COMP-04 8620. 71.4 78.7 97.7
## 5 COMP-05 9274. 76.7 77.4 99.7
5.4 Step 4 — Interactive Summary Table
plot_ly(
type = "table",
header = list(
values = list("<b>Company</b>","<b>Avg Salary</b>",
"<b>Avg Performance</b>","<b>Avg KPI</b>","<b>Max KPI</b>"),
fill = list(color = "#1e1b4b"), font = list(color = "white", size = 12),
align = "center", line = list(color = "#0f0f1e", width = 1)
),
cells = list(
values = list(
company_summary$company_id,
paste0("$", format(company_summary$Avg_Salary, big.mark = ",")),
company_summary$Avg_Performance,
company_summary$Avg_KPI,
company_summary$Max_KPI
),
fill = list(color = list("#0f0f1e","#111128")),
font = list(color = "white", size = 11),
align = "center", line = list(color = "#1e1b4b", width = 1)
)
) %>% layout(paper_bgcolor = "#0f0f1e")
5.5 Step 5 — Interactive Bar: Avg Salary & KPI
comp_colors <- c("#a78bfa","#38bdf8","#f472b6","#fb923c","#34d399")
p4a <- plot_ly(company_summary, x = ~company_id, y = ~Avg_Salary,
type = "bar", name = "Avg Salary",
marker = list(color = "#a78bfa",
line = list(color = "#0f0f1e", width = 1)),
hovertemplate = "<b>%{x}</b><br>Avg Salary: $%{y:,.2f}<extra></extra>")
p4b <- plot_ly(company_summary, x = ~company_id, y = ~Avg_KPI,
type = "bar", name = "Avg KPI",
marker = list(color = "#38bdf8",
line = list(color = "#0f0f1e", width = 1)),
hovertemplate = "<b>%{x}</b><br>Avg KPI: %{y:.1f}<extra></extra>")
subplot(p4a, p4b, nrows = 1, shareX = TRUE, titleX = TRUE) %>%
layout(
title = list(text = "<b>Avg Salary & Avg KPI per Company</b>",
font = list(color = "white", size = 15)),
xaxis = list(title = "Company", color = "white",
gridcolor = "rgba(255,255,255,0.08)"),
xaxis2 = list(title = "Company", color = "white",
gridcolor = "rgba(255,255,255,0.08)"),
yaxis = list(title = "Avg Salary (USD)", color = "white",
gridcolor = "rgba(255,255,255,0.1)", tickprefix = "$"),
yaxis2 = list(title = "Avg KPI Score", color = "white",
gridcolor = "rgba(255,255,255,0.1)"),
legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
5.6 Step 6 — Interactive Scatter: Performance vs KPI
p_scatter <- plot_ly()
for (i in seq_along(unique(company_data$company_id))) {
comp <- unique(company_data$company_id)[i]
df_c <- company_data[company_data$company_id == comp, ]
p_scatter <- p_scatter %>% add_trace(
data = df_c, x = ~performance_score, y = ~KPI_score,
type = "scatter", mode = "markers", name = comp,
marker = list(color = comp_colors[i], size = 8, opacity = 0.75,
symbol = ifelse(df_c$performer_status == "Top Performer",
"star","circle"),
line = list(color = "white", width = 0.5)),
hovertemplate = paste0("<b>", comp, "</b><br>",
"Employee: %{customdata}<br>",
"Performance: %{x:.1f}<br>KPI: %{y:.1f}<extra></extra>"),
customdata = ~employee_id
)
}
p_scatter %>%
add_lines(x = c(50,100), y = c(90,90),
line = list(color = "rgba(251,146,60,0.6)", dash = "dash", width = 1.5),
name = "KPI = 90 threshold", showlegend = TRUE, hoverinfo = "skip") %>%
layout(
title = list(text = "<b>Performance vs KPI Score</b><br><sup>⭐ = Top Performer</sup>",
font = list(color = "white", size = 15)),
xaxis = list(title = "Performance Score", color = "white",
gridcolor = "rgba(255,255,255,0.08)", range = c(45,105)),
yaxis = list(title = "KPI Score", color = "white",
gridcolor = "rgba(255,255,255,0.08)", range = c(55,105)),
legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
5.7 Conclusion — Task 4
The generate_company_data() function simulates structured employee data using nested loops across companies and employees. Each observation includes salary, department, performance score, and KPI score.
The classification rule is defined as:
\[ \text{status} = \begin{cases} \text{Top Performer} & \text{if } KPI > 90 \\ \text{Regular} & \text{otherwise} \end{cases} \]
The generated dataset consists of \(5 \times 50 = 250\) observations, with salary ranging from $3,000 to $15,000. Summary statistics such as mean salary, mean performance, and maximum KPI are computed per company with group_by() and summarise().
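The same structure can be produced without calling rbind() inside the loops. The sketch below assumes a hypothetical function generate_company_data_vec(), which is not in the original report. It draws each column in one call, so the random values differ from the report's output even with the same seed:

```r
# Hypothetical vectorized variant of generate_company_data(); same columns,
# but values are drawn column by column, so the numbers differ from the report
generate_company_data_vec <- function(n_company, n_employees, seed = 123) {
  set.seed(seed)
  n <- n_company * n_employees
  departments <- c("HR", "Finance", "Engineering", "Marketing", "Operations")
  kpi <- round(runif(n, 60, 100), 1)
  data.frame(
    company_id = rep(sprintf("COMP-%02d", seq_len(n_company)), each = n_employees),
    employee_id = rep(sprintf("EMP-%03d", seq_len(n_employees)), times = n_company),
    salary = round(runif(n, 3000, 15000), 2),
    department = sample(departments, n, replace = TRUE),
    performance_score = round(runif(n, 50, 100), 1),
    KPI_score = kpi,
    performer_status = ifelse(kpi > 90, "Top Performer", "Regular")
  )
}

company_vec <- generate_company_data_vec(n_company = 5, n_employees = 50)
dim(company_vec)

# Share of top performers; with KPI ~ Uniform(60, 100) the expected share is roughly 10/40 = 25%
mean(company_vec$performer_status == "Top Performer")
```

Because the KPI score is uniform on [60, 100], roughly a quarter of employees should be labeled Top Performer in any sufficiently large sample.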
6 Task 5 - Monte Carlo Simulation: Pi & Probability
In this task, monte_carlo_pi() estimates π using random point simulation and computes the probability that a point falls in the sub-square [0, 0.5]².
6.1 Step 1 — Build the Function
library(plotly)
monte_carlo_pi <- function(n_points) {
set.seed(99)
x <- c(); y <- c(); inside_circle <- c(); in_subsquare <- c()
for (i in 1:n_points) {
xi <- runif(1, -1, 1)
yi <- runif(1, -1, 1)
x <- c(x, xi); y <- c(y, yi)
inside_circle <- c(inside_circle, sqrt(xi^2 + yi^2) <= 1)
in_subsquare <- c(in_subsquare, xi >= 0 & xi <= 0.5 & yi >= 0 & yi <= 0.5)
}
pi_estimate <- 4 * sum(inside_circle) / n_points
prob_subsquare <- sum(in_subsquare) / n_points
cat(sprintf(" estimate : %.6f\n", pi_estimate))
cat(sprintf(" Actual π : %.6f\n", pi))
cat(sprintf(" Error : %.4f%%\n", abs(pi_estimate - pi) / pi * 100))
cat(sprintf(" P(sub-sq) : %.6f (theoretical: 0.062500)\n", prob_subsquare))
return(list(x = x, y = y, inside_circle = inside_circle,
in_subsquare = in_subsquare, pi_estimate = pi_estimate,
prob_subsquare = prob_subsquare, n_points = n_points))
}
6.2 Step 2 — Run Simulation
# Assumption: the original call is not shown. n_points = 3000 is the smallest value
# consistent with the printed output (4 * 2359 / 3000 = 3.145333, 177 / 3000 = 0.059).
mc_result <- monte_carlo_pi(n_points = 3000)
## estimate : 3.145333
## Actual π : 3.141593
## Error : 0.1191%
## P(sub-sq) : 0.059000 (theoretical: 0.062500)
6.3 Step 3 — Interactive Scatter: Inside vs Outside Circle
x <- mc_result$x; y <- mc_result$y
inside <- mc_result$inside_circle; in_sub <- mc_result$in_subsquare
theta <- seq(0, 2*pi, length.out = 300)
plot_ly() %>%
add_trace(x = x[!inside], y = y[!inside], type = "scatter", mode = "markers",
name = "Outside Circle",
marker = list(color = "#f472b6", size = 3, opacity = 0.5),
hovertemplate = "Outside<br>x:%{x:.3f} y:%{y:.3f}<extra></extra>") %>%
add_trace(x = x[inside], y = y[inside], type = "scatter", mode = "markers",
name = "Inside Circle",
marker = list(color = "#a78bfa", size = 3, opacity = 0.6),
hovertemplate = "Inside<br>x:%{x:.3f} y:%{y:.3f}<extra></extra>") %>%
add_trace(x = x[in_sub], y = y[in_sub], type = "scatter", mode = "markers",
name = "Sub-Square [0,0.5]²",
marker = list(color = "#34d399", size = 4, symbol = "diamond"),
hovertemplate = "Sub-sq<br>x:%{x:.3f} y:%{y:.3f}<extra></extra>") %>%
add_trace(x = cos(theta), y = sin(theta), type = "scatter", mode = "lines",
name = "Unit Circle",
line = list(color = "#fb923c", width = 2, dash = "dot"),
hoverinfo = "skip") %>%
add_trace(x = c(0,0.5,0.5,0,0), y = c(0,0,0.5,0.5,0),
type = "scatter", mode = "lines", name = "Sub-Square Border",
line = list(color = "#34d399", width = 2, dash = "dash"),
hoverinfo = "skip") %>%
layout(
title = list(
text = paste0("<b>Monte Carlo — π Estimation</b><br>",
"<sup>n=", mc_result$n_points,
" | π≈", round(mc_result$pi_estimate, 5), "</sup>"),
font = list(color = "white", size = 14)),
xaxis = list(title = "x", color = "white", range = c(-1.1,1.1),
gridcolor = "rgba(255,255,255,0.08)", zeroline = FALSE),
yaxis = list(title = "y", color = "white", range = c(-1.1,1.1),
gridcolor = "rgba(255,255,255,0.08)", zeroline = FALSE,
scaleanchor = "x"),
legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
6.4 Step 4 — π Convergence Plot
set.seed(99)
iter_sizes <- c(10, 50, 100, 250, 500, 1000, 2000, 3000, 5000)
pi_estimates <- c()
x_all <- runif(5000, -1, 1); y_all <- runif(5000, -1, 1)
for (n in iter_sizes) {
pi_estimates <- c(pi_estimates,
4 * sum(sqrt(x_all[1:n]^2 + y_all[1:n]^2) <= 1) / n)
}
conv_df <- data.frame(n = iter_sizes, pi_estimate = pi_estimates, actual_pi = pi)
plot_ly(conv_df) %>%
add_trace(x = ~n, y = ~pi_estimate, type = "scatter", mode = "lines+markers",
name = "π Estimate",
line = list(color = "#a78bfa", width = 2.5),
marker = list(color = "#a78bfa", size = 8,
line = list(color = "white", width = 1)),
hovertemplate = "n=%{x}<br>π≈%{y:.5f}<extra></extra>") %>%
add_trace(x = ~n, y = ~actual_pi, type = "scatter", mode = "lines",
name = "Actual π",
line = list(color = "#fb923c", width = 2, dash = "dash"),
hoverinfo = "skip") %>%
layout(
title = list(text = "<b>π Estimate Convergence</b>",
font = list(color = "white", size = 15)),
xaxis = list(title = "n", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
yaxis = list(title = "π Estimate", color = "white",
gridcolor = "rgba(255,255,255,0.08)", range = c(2.5,4.0)),
legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
6.5 Step 5 — Probability Bar Chart
prob_df <- data.frame(Type = c("Estimated","Theoretical"),
Probability = c(mc_result$prob_subsquare, 0.0625))
plot_ly(prob_df, x = ~Type, y = ~Probability, type = "bar",
marker = list(color = c("#38bdf8","#fb923c"),
line = list(color = "#0f0f1e", width = 1.5)),
text = ~round(Probability, 5), textposition = "outside",
textfont = list(color = "white"),
hovertemplate = "<b>%{x}</b><br>P = %{y:.5f}<extra></extra>") %>%
layout(
title = list(text = "<b>P(point in sub-square [0,0.5]²)</b>",
font = list(color = "white", size = 14)),
xaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.08)"),
yaxis = list(title = "Probability", color = "white",
gridcolor = "rgba(255,255,255,0.1)",
range = c(0, max(prob_df$Probability) * 1.3)),
showlegend = FALSE,
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
6.6 Conclusion — Task 5
The monte_carlo_pi() function estimates \(\pi\) using random sampling within the square \([-1,1]^2\):
\[ \hat{\pi} = 4 \cdot \frac{\text{number of points inside the circle}}{n} \]
This method is based on the ratio between the area of the unit circle (\(\pi\)) and the area of the enclosing square (\(4\)). As the number of points increases, the estimate converges to \(\pi\), consistent with the Law of Large Numbers. In the Step 2 run the estimate is \(3.145333\), a relative error of \(0.1191\%\), and with \(n=5000\) the estimate approaches \(3.143\). The estimated sub-square probability (\(0.059\)) is likewise close to its theoretical value \(0.25/4 = 0.0625\).
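The convergence rate can be made concrete with the binomial standard error of the estimate, \(4\sqrt{p(1-p)/n}\) with \(p = \pi/4\), which is roughly \(1.64/\sqrt{n}\). The following sketch assumes a hypothetical vectorized function monte_carlo_pi_vec(), which is not the report's implementation. It draws all x values first and then all y values, so its numbers will not match the loop-based output:

```r
# Hypothetical vectorized variant; draws all points at once instead of growing vectors in a loop
monte_carlo_pi_vec <- function(n_points, seed = 99) {
  set.seed(seed)
  x <- runif(n_points, -1, 1)
  y <- runif(n_points, -1, 1)
  p_hat <- mean(x^2 + y^2 <= 1)  # share of points inside the unit circle
  list(
    pi_estimate = 4 * p_hat,
    # Binomial standard error of the pi estimate: 4 * sqrt(p(1 - p) / n)
    std_error = 4 * sqrt(p_hat * (1 - p_hat) / n_points),
    prob_subsquare = mean(x >= 0 & x <= 0.5 & y >= 0 & y <= 0.5)
  )
}

res <- monte_carlo_pi_vec(10000)
res$pi_estimate
res$std_error
res$prob_subsquare
```

Since the standard error shrinks with \(1/\sqrt{n}\), halving the error requires about four times as many points.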
7 Task 6 - Advanced Data Transformation & Feature Engineering
In this task, normalize_columns() and z_score() transform data with loop-based normalization, followed by new feature creation.
7.1 Step 1 — Build Transformation Functions
library(plotly)
library(dplyr)
normalize_columns <- function(df) {
df_norm <- df
for (col in names(df)) {
if (is.numeric(df[[col]])) {
mn <- min(df[[col]], na.rm = TRUE)
mx <- max(df[[col]], na.rm = TRUE)
df_norm[[col]] <- if (mx - mn == 0) 0 else (df[[col]] - mn) / (mx - mn)
}
}
return(df_norm)
}
z_score <- function(df) {
df_z <- df
for (col in names(df)) {
if (is.numeric(df[[col]])) {
m <- mean(df[[col]], na.rm = TRUE)
s <- sd(df[[col]], na.rm = TRUE)
df_z[[col]] <- if (s == 0) 0 else (df[[col]] - m) / s
}
}
return(df_z)
}
cat(" normalize_columns() → Min-Max [0,1]\n")
cat(" z_score() → Mean=0, SD=1\n")
## normalize_columns() → Min-Max [0,1]
## z_score() → Mean=0, SD=1
7.2 Step 2 — Prepare Dataset
set.seed(123)
departments <- c("HR","Finance","Engineering","Marketing","Operations")
raw_data <- data.frame(
employee_id = paste0("EMP-", sprintf("%03d", 1:250)),
company_id = rep(paste0("COMP-", sprintf("%02d", 1:5)), each = 50),
salary = round(runif(250, 3000, 15000), 2),
performance_score = round(runif(250, 50, 100), 1),
KPI_score = round(runif(250, 60, 100), 1),
department = sample(departments, 250, replace = TRUE)
)
numeric_cols <- raw_data %>% select(salary, performance_score, KPI_score)
cat(sprintf(" Dataset: %d rows × %d numeric columns\n",
nrow(numeric_cols), ncol(numeric_cols)))
## Dataset: 250 rows × 3 numeric columns
7.3 Step 3 — Apply Transformations
df_normalized <- normalize_columns(numeric_cols)
df_zscore <- z_score(numeric_cols)
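cat(" Summary Min-Max (salary):\n")
print(summary(df_normalized$salary))
cat("\n Summary Z-Score (salary):\n")
print(summary(df_zscore$salary))
## Summary Min-Max (salary):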
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2753 0.4849 0.5103 0.7302 1.0000
##
## Summary Z-Score (salary):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.855 -0.854 -0.092 0.000 0.800 1.781
7.4 Step 4 — Feature Engineering
engineered_data <- raw_data %>%
mutate(
performance_category = case_when(
performance_score >= 90 ~ "Excellent",
performance_score >= 75 ~ "Good",
performance_score >= 60 ~ "Average",
TRUE ~ "Poor"
),
salary_bracket = case_when(
salary >= 12000 ~ "High (>=12k)",
salary >= 8000 ~ "Mid (8k-12k)",
salary >= 5000 ~ "Low-Mid (5k-8k)",
TRUE ~ "Low (<5k)"
),
KPI_tier = case_when(
KPI_score >= 90 ~ "Top Tier",
KPI_score >= 75 ~ "Mid Tier",
TRUE ~ "Base Tier"
)
)
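cat(" performance_category:\n"); print(table(engineered_data$performance_category))
cat(" salary_bracket:\n"); print(table(engineered_data$salary_bracket))
cat(" KPI_tier:\n"); print(table(engineered_data$KPI_tier))
## performance_category: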
##
## Average Excellent Good Poor
## 82 47 66 55
## salary_bracket:
##
## High (>=12k) Low-Mid (5k-8k) Low (<5k) Mid (8k-12k)
## 57 76 31 86
## KPI_tier:
##
## Base Tier Mid Tier Top Tier
## 90 87 73
7.5 Step 5 — Histogram: Before vs After
p6a <- plot_ly(alpha = 0.75) %>%
add_histogram(x = numeric_cols$salary, name = "Original",
marker = list(color = "#a78bfa"),
hovertemplate = "Original<br>Range:%{x}<br>Count:%{y}<extra></extra>")
p6b <- plot_ly(alpha = 0.75) %>%
add_histogram(x = df_normalized$salary, name = "Min-Max",
marker = list(color = "#38bdf8"),
hovertemplate = "Min-Max<br>Range:%{x:.3f}<br>Count:%{y}<extra></extra>")
p6c <- plot_ly(alpha = 0.75) %>%
add_histogram(x = df_zscore$salary, name = "Z-Score",
marker = list(color = "#f472b6"),
hovertemplate = "Z-Score<br>Range:%{x:.3f}<br>Count:%{y}<extra></extra>")
subplot(p6a, p6b, p6c, nrows = 1, shareY = TRUE, titleX = TRUE) %>%
layout(
title = list(text = "<b>Salary Distribution: Before vs After Transformation</b>",
font = list(color = "white", size = 15)),
showlegend = FALSE,
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
7.6 Step 6 — Boxplot: Before vs After
# FIX: use add_trace() with type = "box" in place of the add_boxplot() call
plot_ly() %>%
add_trace(y = numeric_cols$salary, type = "box", name = "Original",
marker = list(color = "#a78bfa"), line = list(color = "#a78bfa"),
hovertemplate = "Original<br>Value:%{y:.2f}<extra></extra>") %>%
add_trace(y = df_normalized$salary, type = "box", name = "Min-Max",
marker = list(color = "#38bdf8"), line = list(color = "#38bdf8"),
hovertemplate = "Min-Max<br>Value:%{y:.4f}<extra></extra>") %>%
add_trace(y = df_zscore$salary, type = "box", name = "Z-Score",
marker = list(color = "#f472b6"), line = list(color = "#f472b6"),
hovertemplate = "Z-Score<br>Value:%{y:.4f}<extra></extra>") %>%
layout(
title = list(text = "<b>Salary: Before vs After Transformation</b>",
font = list(color = "white", size = 15)),
yaxis = list(title = "Value", color = "white",
gridcolor = "rgba(255,255,255,0.1)"),
xaxis = list(color = "white"),
legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
7.7 Step 7 — New Features Distribution
# FIX: use subplot() so that both plots are displayed together
perf_dist <- as.data.frame(table(engineered_data$performance_category))
salary_dist <- as.data.frame(table(engineered_data$salary_bracket))
colnames(perf_dist) <- c("Category","Count")
colnames(salary_dist) <- c("Category","Count")
p_perf <- plot_ly(perf_dist, x = ~Category, y = ~Count, type = "bar",
name = "Performance",
marker = list(color = c("#34d399","#38bdf8","#fb923c","#f472b6"),
line = list(color = "#0f0f1e", width = 1)),
hovertemplate = "<b>%{x}</b><br>Count: %{y}<extra></extra>") %>%
layout(xaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.06)"),
yaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.08)"),
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e")
p_sal <- plot_ly(salary_dist, x = ~Category, y = ~Count, type = "bar",
name = "Salary Bracket",
marker = list(color = c("#a78bfa","#38bdf8","#fb923c","#34d399"),
line = list(color = "#0f0f1e", width = 1)),
hovertemplate = "<b>%{x}</b><br>Count: %{y}<extra></extra>") %>%
layout(xaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.06)"),
yaxis = list(color = "white", gridcolor = "rgba(255,255,255,0.08)"),
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e")
subplot(p_perf, p_sal, nrows = 1, shareY = FALSE, titleX = TRUE, margin = 0.06) %>%
layout(
title = list(text = "<b>New Features Distribution</b><br><sup>performance_category | salary_bracket</sup>",
font = list(color = "white", size = 15)),
showlegend = FALSE,
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
7.8 Conclusion — Task 6
Two data transformation methods are applied:
Min-Max normalization: \[ x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \]
Z-score standardization: \[ z = \frac{x - \mu}{\sigma} \]
Min-Max rescales data into the range \([0,1]\), while Z-score standardizes data to have mean \(0\) and standard deviation \(1\). These transformations improve comparability across variables.
Additional categorical features are created using conditional logic, including performance categories, salary brackets, and KPI tiers.
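The stated properties of both transformations can be checked numerically. This is a small sketch on made-up data: the column names mirror the report, but the values and the helper names min_max() and z_std() are illustrative assumptions. It applies the transforms column-wise with lapply() instead of an explicit loop:

```r
# Illustrative data only; column names mirror the report, values are generated here
set.seed(1)
demo <- data.frame(
  salary = runif(100, 3000, 15000),
  KPI_score = runif(100, 60, 100)
)

# Hypothetical column-level helpers (assume no constant columns, so no division by zero)
min_max <- function(v) (v - min(v, na.rm = TRUE)) / (max(v, na.rm = TRUE) - min(v, na.rm = TRUE))
z_std   <- function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)

demo_norm <- as.data.frame(lapply(demo, min_max))
demo_z    <- as.data.frame(lapply(demo, z_std))

sapply(demo_norm, range)  # each column should span 0 to 1
sapply(demo_z, mean)      # each mean should be numerically 0
sapply(demo_z, sd)        # each standard deviation should be 1
```

Both transformations are linear, so they rescale the values without changing the shape of the distribution.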
8 Task 7 - Mini Project: Company KPI Dashboard & Simulation
This mini project generates a dataset for 7 companies, each with 50 to 200 employees, and builds a full KPI dashboard with advanced visualizations.
8.1 Step 1 — Generate Dataset
library(dplyr)
library(plotly)
set.seed(123)
n_companies <- 7
employees_per_company <- sample(50:200, n_companies, replace = TRUE)
departments <- c("HR","Finance","Engineering","Marketing","Operations")
company_data7 <- data.frame()
for (i in 1:n_companies) {
n_emp <- employees_per_company[i]
temp <- data.frame(
employee_id = paste0("EMP-", sprintf("%04d", seq_len(n_emp) + (i*1000))),
company_id = paste0("COMP-", sprintf("%02d", i)),
salary = round(runif(n_emp, 3000, 15000), 2),
performance_score = round(runif(n_emp, 50, 100), 1),
KPI_score = round(runif(n_emp, 60, 100), 1),
department = sample(departments, n_emp, replace = TRUE)
)
company_data7 <- rbind(company_data7, temp)
}
cat(sprintf(" Total rows: %d | Companies: %d\n",
nrow(company_data7), n_distinct(company_data7$company_id)))
head(company_data7, 5)
## Total rows: 790 | Companies: 7
## employee_id company_id salary performance_score KPI_score department
## 1 EMP-1001 COMP-01 13797.90 67.6 76.4 Engineering
## 2 EMP-1002 COMP-01 5953.05 55.6 60.4 Marketing
## 3 EMP-1003 COMP-01 3504.71 62.2 67.4 Engineering
## 4 EMP-1004 COMP-01 6935.05 83.4 93.7 Operations
## 5 EMP-1005 COMP-01 14454.04 70.9 69.2 Marketing
8.2 Step 2 — KPI Tier (Loop-Based)
company_data7$KPI_tier <- ""
for (i in 1:nrow(company_data7)) {
s <- company_data7$KPI_score[i]
company_data7$KPI_tier[i] <- if (s >= 90) "Top Tier" else
if (s >= 75) "Mid Tier" else "Base Tier"
}
company_data7$KPI_tier <- factor(company_data7$KPI_tier,
levels = c("Top Tier","Mid Tier","Base Tier"))
print(table(company_data7$KPI_tier))
##
## Top Tier Mid Tier Base Tier
## 192 300 298
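The row-by-row loop can also be written as a single vectorized lookup, and the printed counts can be compared with what a uniform KPI score would predict. The sketch below assumes a hypothetical helper kpi_tier_vec() based on findInterval(); the thresholds and counts are taken from the report:

```r
# Hypothetical vectorized counterpart of the row-by-row loop; thresholds are the report's
kpi_tier_vec <- function(kpi_score) {
  tiers <- c("Base Tier", "Mid Tier", "Top Tier")
  # findInterval() gives 0 for scores < 75, 1 for [75, 90), and 2 for >= 90
  factor(tiers[findInterval(kpi_score, c(75, 90)) + 1],
         levels = c("Top Tier", "Mid Tier", "Base Tier"))
}

kpi_tier_vec(c(60.4, 74.9, 75, 89.9, 90, 93.7))

# Expected shares when KPI ~ Uniform(60, 100), next to the observed counts (192, 300, 298 of 790)
expected <- c("Top Tier" = 10, "Mid Tier" = 15, "Base Tier" = 15) / 40
observed <- c("Top Tier" = 192, "Mid Tier" = 300, "Base Tier" = 298) / 790
round(rbind(expected, observed) * 100, 1)
```

The expected shares are 25%, 37.5%, and 37.5%, which the observed distribution follows closely.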
8.3 Step 3 — Company Summary
company_summary7 <- company_data7 %>%
group_by(company_id) %>%
summarise(
Total_Employees = n(),
Avg_Salary = round(mean(salary), 2),
Avg_KPI = round(mean(KPI_score), 1),
Avg_Performance = round(mean(performance_score), 1),
Top_Performers = sum(KPI_score >= 90),
.groups = "drop"
) %>%
mutate(Top_Pct = round(Top_Performers / Total_Employees * 100, 1))
print(company_summary7)
## # A tibble: 7 × 7
## company_id Total_Employees Avg_Salary Avg_KPI Avg_Performance Top_Performers
## <chr> <int> <dbl> <dbl> <dbl> <int>
## 1 COMP-01 63 8925. 79.6 76.2 10
## 2 COMP-02 99 8871. 80.9 74.1 30
## 3 COMP-03 167 9021. 79.1 73.9 39
## 4 COMP-04 92 9048. 81.1 77.2 23
## 5 COMP-05 63 9020. 79.6 74.7 14
## 6 COMP-06 167 8900. 79.7 75.1 43
## 7 COMP-07 139 8922. 79.9 75 33
## # ℹ 1 more variable: Top_Pct <dbl>
8.4 Step 4 — Interactive Summary Table
plot_ly(type = "table",
header = list(
values = list("<b>Company</b>","<b>Employees</b>","<b>Avg Salary</b>",
"<b>Avg KPI</b>","<b>Avg Perf</b>","<b>Top Performers</b>","<b>Top %</b>"),
fill = list(color = "#1e1b4b"), font = list(color = "white", size = 12),
align = "center", line = list(color = "#0f0f1e", width = 1)
),
cells = list(
values = list(
company_summary7$company_id, company_summary7$Total_Employees,
paste0("$", format(round(company_summary7$Avg_Salary), big.mark = ",")),
company_summary7$Avg_KPI, company_summary7$Avg_Performance,
company_summary7$Top_Performers, paste0(company_summary7$Top_Pct, "%")
),
fill = list(color = list("#0f0f1e","#111128")),
font = list(color = "white", size = 11),
align = "center", line = list(color = "#1e1b4b", width = 1)
)
) %>% layout(paper_bgcolor = "#0f0f1e")
8.5 Step 5 — Top Performers Table
top_performers7 <- company_data7 %>%
filter(KPI_score >= 90) %>%
arrange(desc(KPI_score)) %>%
select(employee_id, company_id, department, KPI_score, performance_score, salary)
cat(sprintf(" Total Top Performers: %d\n", nrow(top_performers7)))
## Total Top Performers: 192
plot_ly(type = "table",
header = list(
values = list("<b>Employee</b>","<b>Company</b>","<b>Department</b>",
"<b>KPI</b>","<b>Performance</b>","<b>Salary</b>"),
fill = list(color = "#1e1b4b"), font = list(color = "white", size = 12),
align = "center", line = list(color = "#0f0f1e", width = 1)
),
cells = list(
values = list(
head(top_performers7$employee_id, 15),
head(top_performers7$company_id, 15),
head(top_performers7$department, 15),
head(top_performers7$KPI_score, 15),
head(top_performers7$performance_score, 15),
paste0("$", format(round(head(top_performers7$salary,15)), big.mark=","))
),
fill = list(color = list("#0f0f1e","#111128")),
font = list(color = "white", size = 11),
align = "center", line = list(color = "#1e1b4b", width = 1)
)
) %>% layout(paper_bgcolor = "#0f0f1e")
8.6 Step 6 — Salary Distribution (Histogram)
comp_colors7 <- c("#a78bfa","#38bdf8","#f472b6","#fb923c","#34d399","#fbbf24","#60a5fa")
p7_hist <- plot_ly()
for (i in seq_along(unique(company_data7$company_id))) {
comp <- unique(company_data7$company_id)[i]
df_c <- company_data7[company_data7$company_id == comp, ]
p7_hist <- p7_hist %>% add_histogram(
x = df_c$salary, name = comp, nbinsx = 25, opacity = 0.6,
marker = list(color = comp_colors7[i],
line = list(color = "#0f0f1e", width = 0.5)),
hovertemplate = paste0("<b>", comp, "</b><br>Range:%{x}<br>Count:%{y}<extra></extra>")
)
}
p7_hist %>% layout(
title = list(text = "<b>Salary Distribution by Company</b>",
font = list(color = "white", size = 15)),
barmode = "overlay",
xaxis = list(title = "Salary (USD)", color = "white",
gridcolor = "rgba(255,255,255,0.08)", tickprefix = "$"),
yaxis = list(title = "Count", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
8.7 Step 7 — Grouped Bar: Avg KPI & Top Performers
p7a <- plot_ly(company_summary7, x = ~company_id, y = ~Avg_KPI, type = "bar",
name = "Avg KPI",
marker = list(color = "#a78bfa", line = list(color = "#0f0f1e", width=1)),
text = ~round(Avg_KPI,1), textposition = "outside",
textfont = list(color = "white"),
hovertemplate = "<b>%{x}</b><br>Avg KPI: %{y:.1f}<extra></extra>")
p7b <- plot_ly(company_summary7, x = ~company_id, y = ~Top_Performers, type = "bar",
name = "Top Performers",
marker = list(color = "#34d399", line = list(color = "#0f0f1e", width=1)),
text = ~Top_Performers, textposition = "outside",
textfont = list(color = "white"),
hovertemplate = "<b>%{x}</b><br>Top Performers: %{y}<extra></extra>")
subplot(p7a, p7b, nrows = 1, shareX = FALSE, titleX = TRUE) %>%
layout(
title = list(text = "<b>Avg KPI & Top Performers per Company</b>",
font = list(color = "white", size = 15)),
xaxis = list(title = "Company", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
xaxis2 = list(title = "Company", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
yaxis = list(title = "Avg KPI", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
yaxis2 = list(title = "Top Performers", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
8.8 Step 8 — Scatter: Salary vs KPI + Regression Line
lm_model <- lm(KPI_score ~ salary, data = company_data7)
reg_x <- seq(min(company_data7$salary), max(company_data7$salary), length.out = 200)
reg_y <- predict(lm_model, newdata = data.frame(salary = reg_x))
p7_scatter <- plot_ly()
for (i in seq_along(unique(company_data7$company_id))) {
comp <- unique(company_data7$company_id)[i]
df_c <- company_data7[company_data7$company_id == comp, ]
p7_scatter <- p7_scatter %>% add_trace(
data = df_c, x = ~salary, y = ~KPI_score,
type = "scatter", mode = "markers", name = comp,
marker = list(color = comp_colors7[i], size = 6, opacity = 0.65,
line = list(color = "white", width = 0.4)),
hovertemplate = paste0("<b>", comp, "</b><br>",
"Employee: %{customdata}<br>",
"Salary: $%{x:,.0f}<br>KPI: %{y:.1f}<extra></extra>"),
customdata = ~employee_id
)
}
p7_scatter %>%
add_trace(x = reg_x, y = reg_y, type = "scatter", mode = "lines",
name = "Regression Line",
line = list(color = "#fb923c", width = 2.5, dash = "dash"),
hoverinfo = "skip") %>%
layout(
title = list(text = "<b>Salary vs KPI Score + Regression Line</b>",
font = list(color = "white", size = 15)),
xaxis = list(title = "Salary (USD)", color = "white",
gridcolor = "rgba(255,255,255,0.08)", tickprefix = "$"),
yaxis = list(title = "KPI Score", color = "white",
gridcolor = "rgba(255,255,255,0.08)"),
legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
8.9 Step 9 — Department Analysis
dept_summary7 <- company_data7 %>%
group_by(company_id, department) %>%
summarise(avg_KPI = round(mean(KPI_score),1), count = n(), .groups = "drop")
dept_colors <- c("HR"="#a78bfa","Finance"="#38bdf8","Engineering"="#f472b6",
"Marketing"="#fb923c","Operations"="#34d399")
p7_dept <- plot_ly()
for (dept in unique(dept_summary7$department)) {
df_d <- dept_summary7[dept_summary7$department == dept, ]
p7_dept <- p7_dept %>% add_trace(
data = df_d, x = ~company_id, y = ~avg_KPI, type = "bar", name = dept,
marker = list(color = dept_colors[dept],
line = list(color = "#0f0f1e", width = 0.8)),
hovertemplate = paste0("<b>", dept, " — %{x}</b><br>",
"Avg KPI: %{y:.1f}<extra></extra>")
)
}
p7_dept %>% layout(
title = list(text = "<b>Avg KPI per Department per Company</b>",
font = list(color = "white", size = 15)),
barmode = "group",
xaxis = list(title = "Company", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
yaxis = list(title = "Avg KPI", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
8.10 Step 10 — KPI Tier Pie Chart
kpi_dist7 <- company_data7 %>%
count(KPI_tier) %>% rename(Tier = KPI_tier, Count = n) %>%
mutate(Pct = round(Count / sum(Count) * 100, 1))
tier_colors <- c("Top Tier"="#34d399","Mid Tier"="#a78bfa","Base Tier"="#f472b6")
plot_ly(kpi_dist7, labels = ~Tier, values = ~Count, type = "pie",
marker = list(colors = unname(tier_colors[as.character(kpi_dist7$Tier)]),
line = list(color = "#0f0f1e", width = 2)),
textinfo = "label+percent", textfont = list(color = "white", size = 13),
hovertemplate = "<b>%{label}</b><br>Count:%{value}<br>%{percent}<extra></extra>",
pull = c(0.05, 0, 0)
) %>% layout(
title = list(text = "<b>KPI Tier Distribution</b>",
font = list(color = "white", size = 15)),
legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
8.11 Conclusion — Task 7
A KPI dashboard is generated for seven companies (790 employees in total) using simulated data. The relationship between salary and KPI is modeled as a simple linear regression:
\[ \text{KPI} = \beta_0 + \beta_1 \cdot \text{salary} + \varepsilon \]
The fitted regression shows only a weak relationship, indicating that salary alone is not a strong predictor of KPI score in this simulated data set. The summary tables, histogram, bar charts, scatter plot, and pie chart support this analysis.
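One way to put a number on "weak" is to read the R² and the slope's p-value from summary(). The sketch below is an illustration under stated assumptions: sim is a hypothetical data frame in which KPI is generated independently of salary, and the seed, sample size, and value ranges are arbitrary. On the practicum data, the same calls could be applied to lm_model from Step 8.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Hypothetical stand-in for company_data7: KPI is drawn independently of salary
sim <- data.frame(
  salary    = runif(500, min = 4000, max = 14000),
  KPI_score = pmin(100, pmax(0, rnorm(500, mean = 80, sd = 10)))
)

fit         <- lm(KPI_score ~ salary, data = sim)
fit_summary <- summary(fit)

r_squared <- fit_summary$r.squared                    # share of KPI variance explained
slope_p   <- coef(fit_summary)["salary", "Pr(>|t|)"]  # p-value for beta_1
pearson_r <- cor(sim$salary, sim$KPI_score)           # correlation coefficient

cat(sprintf("R-squared: %.4f | slope p-value: %.3f | r: %.3f\n",
            r_squared, slope_p, pearson_r))
```

For a simple regression with one predictor and an intercept, R² equals the squared Pearson correlation, so either number can be reported. An R² close to zero means salary explains almost none of the variation in KPI.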
9 Task 8 (Bonus) - Automated Report Generation
Use functions + loops to auto-generate summary reports for all companies.
9.1 Step 1 — Build Report Function
library(dplyr)
library(plotly)
generate_company_report <- function(data, company_name) {
df <- data %>% filter(company_id == company_name)
total_emp <- nrow(df)
df$KPI_tier <- ""
# seq_len() avoids iterating over c(1, 0) when the filter returns zero rows
for (i in seq_len(nrow(df))) {
s <- df$KPI_score[i]
df$KPI_tier[i] <- if (s >= 90) "Top Tier" else if (s >= 75) "Mid Tier" else "Base Tier"
}
return(list(
company = company_name, data = df,
stats = list(
total_emp = total_emp,
avg_salary = round(mean(df$salary), 2),
avg_kpi = round(mean(df$KPI_score), 1),
avg_perf = round(mean(df$performance_score), 1),
top_count = sum(df$KPI_score >= 90),
top_pct = round(sum(df$KPI_score >= 90) / total_emp * 100, 1),
max_kpi = round(max(df$KPI_score), 1),
min_salary = round(min(df$salary), 2),
max_salary = round(max(df$salary), 2)
)
))
}
cat(" Function generate_company_report() created successfully!\n")
## Function generate_company_report() created successfully!
9.2 Step 2 — Generate All Reports
# FIX: use company_data7 (not company_data)
all_companies <- unique(company_data7$company_id)
all_reports <- list()
for (comp in all_companies) {
all_reports[[comp]] <- generate_company_report(company_data7, comp)
cat(sprintf(" %s | %d employees | Avg KPI: %.1f\n",
comp,
all_reports[[comp]]$stats$total_emp,
all_reports[[comp]]$stats$avg_kpi))
}
# Print the number of generated reports (statement reconstructed from the output below)
cat(sprintf("\n Total reports: %d companies\n", length(all_reports)))
## COMP-01 | 63 employees | Avg KPI: 79.6
## COMP-02 | 99 employees | Avg KPI: 80.9
## COMP-03 | 167 employees | Avg KPI: 79.1
## COMP-04 | 92 employees | Avg KPI: 81.1
## COMP-05 | 63 employees | Avg KPI: 79.6
## COMP-06 | 167 employees | Avg KPI: 79.7
## COMP-07 | 139 employees | Avg KPI: 79.9
##
## Total reports: 7 companies
9.3 Step 3 — Automated Summary Table
summary_compiled <- data.frame()
for (comp in names(all_reports)) {
s <- all_reports[[comp]]$stats
summary_compiled <- rbind(summary_compiled, data.frame(
Company = comp,
Employees = s$total_emp,
Avg_Salary = s$avg_salary,
Avg_KPI = s$avg_kpi,
Avg_Perf = s$avg_perf,
Top_Performers = s$top_count,
Top_Pct = paste0(s$top_pct, "%"),
Max_KPI = s$max_kpi,
Salary_Range = paste0("$", format(round(s$min_salary), big.mark=","),
"–$", format(round(s$max_salary), big.mark=","))
))
}
plot_ly(type = "table",
header = list(
values = list("<b>Company</b>","<b>Employees</b>","<b>Avg Salary</b>",
"<b>Avg KPI</b>","<b>Avg Perf</b>","<b>Top</b>","<b>Top%</b>",
"<b>Max KPI</b>","<b>Salary Range</b>"),
fill = list(color = "#1e1b4b"), font = list(color = "white", size = 11),
align = "center", line = list(color = "#0f0f1e", width = 1)
),
cells = list(
values = list(
summary_compiled$Company, summary_compiled$Employees,
paste0("$", format(round(summary_compiled$Avg_Salary), big.mark=",")),
summary_compiled$Avg_KPI, summary_compiled$Avg_Perf,
summary_compiled$Top_Performers, summary_compiled$Top_Pct,
summary_compiled$Max_KPI, summary_compiled$Salary_Range
),
fill = list(color = list("#0f0f1e","#111128")),
font = list(color = "white", size = 11),
align = "center", line = list(color = "#1e1b4b", width = 1)
)
) %>% layout(paper_bgcolor = "#0f0f1e")
9.4 Step 4 — Automated KPI Bar Chart
comp_colors8 <- c("#a78bfa","#38bdf8","#f472b6","#fb923c","#34d399","#fbbf24","#60a5fa")
plot_ly(summary_compiled, x = ~Company, y = ~Avg_KPI, type = "bar",
color = ~Company, colors = comp_colors8,
text = ~paste0(Avg_KPI, "\n(Top:", Top_Performers, ")"),
textposition = "outside", textfont = list(color = "white", size = 10),
hovertemplate = paste0("<b>%{x}</b><br>Avg KPI: %{y:.1f}<br>",
"Employees: ", summary_compiled$Employees,
"<extra></extra>"),
marker = list(line = list(color = "#0f0f1e", width = 1))
) %>% layout(
title = list(text = "<b>Automated KPI Report — All Companies</b>",
font = list(color = "white", size = 15)),
xaxis = list(title = "Company", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
yaxis = list(title = "Avg KPI Score", color = "white",
gridcolor = "rgba(255,255,255,0.1)",
range = c(0, max(summary_compiled$Avg_KPI) * 1.2)),
showlegend = FALSE,
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
9.5 Step 5 — Automated Box Plot (Loop)
p8_box <- plot_ly()
for (i in seq_along(names(all_reports))) {
comp <- names(all_reports)[i]
df_rep <- all_reports[[comp]]$data
p8_box <- p8_box %>% add_trace(
y = df_rep$KPI_score, type = "box", name = comp,
marker = list(color = comp_colors8[i], size = 4),
line = list(color = comp_colors8[i]),
fillcolor = paste0(comp_colors8[i], "40"),
boxpoints = "outliers",
hovertemplate = paste0("<b>", comp, "</b><br>KPI:%{y:.1f}<extra></extra>")
)
}
p8_box %>%
add_lines(x = c(-0.5, length(all_reports) - 0.5), y = c(90, 90),
line = list(color = "rgba(251,191,36,0.7)", dash = "dash", width = 1.5),
name = "Top Tier (90)", hoverinfo = "skip") %>%
layout(
title = list(text = "<b>KPI Distribution per Company (Auto-Generated)</b><br><sup>Yellow line = Top Tier ≥ 90</sup>",
font = list(color = "white", size = 15)),
xaxis = list(title = "Company", color = "white", gridcolor = "rgba(255,255,255,0.08)"),
yaxis = list(title = "KPI Score", color = "white", gridcolor = "rgba(255,255,255,0.1)"),
legend = list(font = list(color = "white"), bgcolor = "rgba(30,27,75,0.8)"),
paper_bgcolor = "#0f0f1e", plot_bgcolor = "#0f0f1e"
)
9.6 Step 6 — Export to CSV
# FIX: use company_data7
write.csv(summary_compiled, "automated_report_summary.csv", row.names = FALSE)
write.csv(company_data7, "company_data_full.csv", row.names = FALSE)
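cat(" Files exported successfully:\n")
# File-name lines reconstructed from the output below
cat(" automated_report_summary.csv\n")
cat(" company_data_full.csv\n")
## Files exported successfully:
## automated_report_summary.csv
## company_data_full.csv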
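After an export, a quick check is to read the file back and compare it with the object that was written. The sketch below is a minimal illustration. demo_summary is a hypothetical stand-in for summary_compiled, and a temporary file is used so nothing is left in the working directory.

```r
# Hypothetical summary table standing in for summary_compiled
demo_summary <- data.frame(
  Company   = c("COMP-A", "COMP-B"),
  Employees = c(10L, 20L),
  Avg_KPI   = c(79.5, 81.2)
)

# Write to a temporary path instead of the project folder
out_path <- tempfile(fileext = ".csv")
write.csv(demo_summary, out_path, row.names = FALSE)

# Read the file back and confirm the shape and column names survived
check <- read.csv(out_path, stringsAsFactors = FALSE)
same_shape <- nrow(check) == nrow(demo_summary) &&
  identical(names(check), names(demo_summary))

cat("File exists:", file.exists(out_path), "| same shape:", same_shape, "\n")
```

Using row.names = FALSE, as in the export step, keeps the file free of an extra unnamed index column, which is why the column names match after the round trip.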
9.7 Conclusion — Task 8 (Bonus)
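The generate_company_report() function automates the reporting pipeline: a loop calls it once per company and collects the results in a list.
All results, including summary statistics, visualizations, and exported files, are generated programmatically. This approach demonstrates a scalable workflow where adding new companies only requires extending the input data, without modifying the core logic. One caveat is the fixed seven-color palette, which would need more entries if the data contained more than seven companies.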
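The same "one function, applied per company" pattern can also be written without an explicit loop. The sketch below is an alternative, not the method used in this practicum. summarise_company() is a hypothetical, simplified stand-in for generate_company_report(), and demo is made-up data with three companies.

```r
# Hypothetical employee data; company IDs and values are made up for illustration
demo <- data.frame(
  company_id = c("COMP-A", "COMP-A", "COMP-B", "COMP-B", "COMP-C"),
  KPI_score  = c(92, 70, 85, 95, 60),
  salary     = c(9000, 7000, 8000, 10000, 6500)
)

# Simplified stand-in for generate_company_report(): one summary row per company
summarise_company <- function(df) {
  data.frame(
    Company        = df$company_id[1],
    Employees      = nrow(df),
    Avg_KPI        = round(mean(df$KPI_score), 1),
    Top_Performers = sum(df$KPI_score >= 90)
  )
}

# split() yields one data frame per company; lapply() applies the function to each
rows <- lapply(split(demo, demo$company_id), summarise_company)

# Bind all rows once, instead of growing a data frame inside a loop
summary_tbl <- do.call(rbind, rows)
rownames(summary_tbl) <- NULL
print(summary_tbl)
```

Adding a fourth company to demo would produce a fourth row with no change to the function or the apply step, which is the scalability property described above.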