This practicum report is a consolidated document covering Task 1 through Task 8 of the Data Science Programming — Functions and Loops module. All tasks are unified into a single file to provide a coherent, end-to-end reference that traces the progressive application of core programming concepts across increasingly complex analytical scenarios.
The practicum centres on three fundamental pillars of programming: functions, loops, and conditional branching. These concepts underpin virtually all data wrangling, simulation, and automated reporting workflows in professional data science practice. Each task is implemented in both R and Python to reinforce language-agnostic thinking and to highlight syntactic differences between the two ecosystems.
All visualisations in this document are interactive. They are rendered with the plotly library in both R and Python, so readers can hover for precise values, zoom into regions of interest, and toggle series visibility directly in the browser.
The overarching objectives of this practicum are:

- for loops: automate repetitive computation across multi-dimensional datasets.
- Conditional branching (if-else if-else in R / if-elif-else in Python) for data-driven decision making: tiered discounts, performance classification, and KPI assignment.
- Interactive visualisation with plotly for hover-enabled, zoomable charts.
This task covers Section 3 of Data Science Programming — Functions and Loops. The specific objectives are:
| Formula Type | Expression | Behaviour |
|---|---|---|
| Linear | f(x) = 2x + 3 | Constant growth rate — straight line |
| Quadratic | f(x) = x² - 2x + 1 | Accelerating growth — U-shaped curve |
| Cubic | f(x) = 0.5x³ - 3x + 2 | Initial dip, then steep rise |
| Exponential | f(x) = 2e^(0.2x) | Fastest growth — compounds repeatedly |
compute_formula() accepts a numeric x and a string formula and returns the computed result. The else branch acts as an input guard: for an unrecognised formula name it returns NA (R) or None (Python) with a descriptive warning instead of stopping with an error.
| Feature | R | Python |
|---|---|---|
| Branch keyword | else if | elif |
| Exponential | exp() | math.exp() |
| Invalid input handling | warning() + return(NA) | print() + return None |

# Packages used by the R code in this report
library(dplyr) # %>%, group_by(), summarise(), filter()
library(tidyr) # pivot_longer()
library(plotly) # plot_ly(), add_trace(), layout()
library(knitr) # kable()
library(RColorBrewer) # brewer.pal()
# compute_formula: evaluates one of four mathematical formulas at a given x.
# Returns NA with a warning if an unrecognised formula name is supplied.
compute_formula <- function(x, formula) {
if (formula == "linear") return(2 * x + 3)
else if (formula == "quadratic") return(x^2 - 2 * x + 1)
else if (formula == "cubic") return(0.5 * x^3 - 3 * x + 2)
else if (formula == "exponential") return(2 * exp(0.2 * x))
else {
warning(paste0("[compute_formula] Unknown formula: '", formula, "'."))
return(NA)
}
}
cat("=== Spot Checks (x = 5) ===\n")
## === Spot Checks (x = 5) ===
cat("linear :", compute_formula(5, "linear"), "\n")
## linear : 13
cat("quadratic :", compute_formula(5, "quadratic"), "\n")
## quadratic : 16
cat("cubic :", compute_formula(5, "cubic"), "\n")
## cubic : 49.5
cat("exponential :", round(compute_formula(5, "exponential"), 4), "\n")
## exponential : 5.4366
cat("\n=== Validation — Unknown Formula ===\n")
##
## === Validation — Unknown Formula ===
cat("Returned :", compute_formula(5, "logarithmic"), "\n")
## Returned : NA
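For comparison with the R function above, the Python side of the R/Python table can be sketched as follows. This is a minimal sketch rather than the report's own Python code: it assumes the same function name and formula strings, and uses print() plus return None for the invalid-input branch, as the table describes.

```python
import math


def compute_formula(x, formula):
    """Evaluate one of four formulas at x; return None for unknown names."""
    if formula == "linear":
        return 2 * x + 3
    elif formula == "quadratic":
        return x**2 - 2 * x + 1
    elif formula == "cubic":
        return 0.5 * x**3 - 3 * x + 2
    elif formula == "exponential":
        return 2 * math.exp(0.2 * x)
    else:
        # Input guard: report the problem and return None instead of raising
        print(f"[compute_formula] Unknown formula: '{formula}'.")
        return None


# Spot checks at x = 5, mirroring the R output above
for name in ("linear", "quadratic", "cubic", "exponential"):
    print(name, ":", round(compute_formula(5, name), 4))
```

The spot-check values agree with the R output (13, 16, 49.5 and 5.4366), since both versions evaluate the same expressions.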
A nested loop evaluates every formula at every x from 1 to 20, producing an 80-row dataset (4 formulas × 20 values).
formula_list <- c("linear", "quadratic", "cubic", "exponential")
x_range <- 1:20
# FIX: collect rows in a list, then bind once with do.call(rbind); much faster than calling rbind() on every iteration
rows_list <- vector("list", length(formula_list) * length(x_range))
idx <- 1L
for (f in formula_list) {
for (x in x_range) {
rows_list[[idx]] <- data.frame(x = x, y = compute_formula(x, f),
formula = f, stringsAsFactors = FALSE)
idx <- idx + 1L
}
}
results_df <- do.call(rbind, rows_list)
cat("Rows generated:", nrow(results_df),
"(", length(formula_list), "formulas ×", length(x_range), "x values )\n\n")
## Rows generated: 80 ( 4 formulas × 20 x values )
head(results_df, 8)
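The same nested-loop pattern (collect rows first, build the dataset once at the end) might look like this in Python. The dictionary of lambdas is a swapped-in dispatch technique used here for brevity, not the if/elif chain of the report, and the names FORMULAS and rows are assumptions made for this sketch.

```python
import math

# Assumption: dict-based dispatch stands in for the if/elif chain
FORMULAS = {
    "linear": lambda x: 2 * x + 3,
    "quadratic": lambda x: x**2 - 2 * x + 1,
    "cubic": lambda x: 0.5 * x**3 - 3 * x + 2,
    "exponential": lambda x: 2 * math.exp(0.2 * x),
}

rows = []
for name, func in FORMULAS.items():  # outer loop: one pass per formula
    for x in range(1, 21):           # inner loop: x = 1..20
        rows.append({"x": x, "y": func(x), "formula": name})

print("Rows generated:", len(rows))
```

Appending dictionaries to a list and converting once afterwards plays the same role as the list plus do.call(rbind) pattern in the R code: the growing structure is cheap to extend, and the expensive table construction happens a single time.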
formula_colours <- c(
"linear" = "#2196F3",
"quadratic" = "#4CAF50",
"cubic" = "#FF9800",
"exponential" = "#E91E63"
)
formula_labels <- c(
"linear" = "Linear: f(x) = 2x + 3",
"quadratic" = "Quadratic: f(x) = x\u00B2 - 2x + 1",
"cubic" = "Cubic: f(x) = 0.5x\u00B3 - 3x + 2",
"exponential" = "Exponential: f(x) = 2e^(0.2x)"
)
p1 <- plot_ly()
for (f in formula_list) {
df_sub <- results_df[results_df$formula == f, ]
p1 <- add_trace(p1,
data = df_sub, x = ~x, y = ~y,
type = "scatter", mode = "lines+markers",
name = formula_labels[[f]],
line = list(color = formula_colours[[f]], width = 2.5),
marker = list(color = formula_colours[[f]], size = 6))
}
p1 <- layout(p1,
title = list(text = "Task 1: Comparison of Four Mathematical Formulas (x = 1 to 20)",
font = list(size = 15)),
xaxis = list(title = "x value", showgrid = TRUE, gridcolor = "#e8e8e8"),
yaxis = list(title = "f(x)", showgrid = TRUE, gridcolor = "#e8e8e8"),
legend = list(title = list(text = "<b>Formula</b>")),
hovermode = "x unified",
plot_bgcolor = "white",
paper_bgcolor = "white")
p1
summary_table <- results_df %>%
group_by(formula) %>%
summarise(
Min = round(min(y), 2),
Max = round(max(y), 2),
Mean = round(mean(y), 2),
Median = round(median(y), 2),
SD = round(sd(y), 2),
.groups = "drop"
) %>% rename(Formula = formula)
kable(summary_table,
col.names = c("Formula","Min","Max","Mean","Median","Std Dev"),
align = "lrrrrr",
caption = "Descriptive statistics for each formula across x = 1 to 20")
| Formula | Min | Max | Mean | Median | Std Dev |
|---|---|---|---|---|---|
| cubic | -0.50 | 3942.0 | 1073.00 | 553.25 | 1236.10 |
| exponential | 2.44 | 109.2 | 29.57 | 16.41 | 31.35 |
| linear | 5.00 | 43.0 | 24.00 | 24.00 | 11.83 |
| quadratic | 0.00 | 361.0 | 123.50 | 90.50 | 116.44 |
The else branch ensures safe failure: for an unknown formula name the function returns NA (R) or None (Python) with a descriptive warning.
The objectives for Task 2 are:
- simulate_sales(): a nested simulation generating a multi-salesperson, multi-day sales dataset.

| Parameter / Rule | Detail |
|---|---|
| n_salesperson | Number of salespersons to simulate |
| days | Number of trading days per salesperson |
| Discount — High | sales_amount > 800 → 20% discount |
| Discount — Medium | sales_amount > 500 → 10% discount |
| No Discount | sales_amount ≤ 500 → no discount |
set.seed(42)
# simulate_sales: generates a sales dataset with tiered discount logic.
# FIX: collect rows in a list, then bind once with do.call(rbind); much faster than calling rbind() on every iteration
simulate_sales <- function(n_salesperson, days) {
rows_list <- vector("list", n_salesperson * days)
idx <- 1L
for (s in 1:n_salesperson) {
for (d in 1:days) {
amount <- round(runif(1, 200, 1000), 2)
discount <- if (amount > 800) 0.20
else if (amount > 500) 0.10
else 0.00
rows_list[[idx]] <- data.frame(sales_id = s, day = d,
sales_amount = amount, discount_rate = discount,
stringsAsFactors = FALSE)
idx <- idx + 1L
}
}
do.call(rbind, rows_list)
}
sales_data <- simulate_sales(n_salesperson = 5, days = 10)
cat("Dataset:", nrow(sales_data), "rows ×", ncol(sales_data), "columns\n\n")
## Dataset: 50 rows × 4 columns
head(sales_data, 10)
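A Python counterpart of the tiered-discount simulation could be sketched as below. The helper discount_rate() and the seed parameter are hypothetical names introduced for this sketch, and Python's random generator differs from R's, so the simulated amounts will not match the R output even with the same seed.

```python
import random


def discount_rate(amount):
    """Tiered discount: > 800 gives 20%, > 500 gives 10%, otherwise none."""
    if amount > 800:
        return 0.20
    elif amount > 500:
        return 0.10
    return 0.00


def simulate_sales(n_salesperson, days, seed=42):
    """Return one dict per (salesperson, day) with amount and discount."""
    rng = random.Random(seed)  # local generator keeps the run reproducible
    rows = []
    for s in range(1, n_salesperson + 1):
        for d in range(1, days + 1):
            amount = round(rng.uniform(200, 1000), 2)
            rows.append({
                "sales_id": s,
                "day": d,
                "sales_amount": amount,
                "discount_rate": discount_rate(amount),
            })
    return rows


sales_data = simulate_sales(n_salesperson=5, days=10)
print("Dataset:", len(sales_data), "rows")
```

Because the branches are tested from the highest threshold down, an amount of exactly 800 falls into the 10% tier and an amount of exactly 500 receives no discount.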
calc_cumulative <- function(df, sid) {
sub_df <- df[df$sales_id == sid, ]
sub_df <- sub_df[order(sub_df$day), ]
cumsum(sub_df$sales_amount)
}
# FIX: collect per-salesperson frames in a list, then bind once with do.call(rbind)
cum_rows <- vector("list", length(unique(sales_data$sales_id)))
for (i in seq_along(unique(sales_data$sales_id))) {
s <- sort(unique(sales_data$sales_id))[i]
cum_vals <- calc_cumulative(sales_data, s)
cum_rows[[i]] <- data.frame(sales_id = s, day = seq_along(cum_vals),
cumulative = cum_vals, stringsAsFactors = FALSE)
}
cum_df <- do.call(rbind, cum_rows)
head(cum_df, 10)
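The cumulative-sales step can be illustrated in Python with itertools.accumulate. The four rows below are made-up toy values chosen only to show the ordering and summing behaviour; they are not taken from the simulated dataset.

```python
from itertools import accumulate


def calc_cumulative(rows, sid):
    """Running total of sales_amount for one salesperson, ordered by day."""
    own = sorted(
        (r for r in rows if r["sales_id"] == sid),
        key=lambda r: r["day"],
    )
    return list(accumulate(r["sales_amount"] for r in own))


# Toy input (hypothetical values), deliberately out of day order
toy_rows = [
    {"sales_id": 1, "day": 2, "sales_amount": 300.0},
    {"sales_id": 1, "day": 1, "sales_amount": 250.0},
    {"sales_id": 2, "day": 1, "sales_amount": 900.0},
    {"sales_id": 1, "day": 3, "sales_amount": 450.0},
]

print(calc_cumulative(toy_rows, 1))  # [250.0, 550.0, 1000.0]
```

Sorting by day before accumulating matters: a running total is only meaningful when the rows are in chronological order, which is why the R version also calls order() first.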
summary_sales <- sales_data %>%
group_by(sales_id) %>%
summarise(
Total_Sales = round(sum(sales_amount), 2),
Avg_Daily = round(mean(sales_amount), 2),
Total_Discount = round(sum(sales_amount * discount_rate), 2),
Days_High = sum(discount_rate == 0.20),
Days_Medium = sum(discount_rate == 0.10),
Days_None = sum(discount_rate == 0.00),
.groups = "drop"
) %>% rename("Salesperson ID" = sales_id)
kable(summary_sales,
col.names = c("Salesperson","Total Sales","Avg Daily",
"Total Discount","Days High","Days Medium","Days None"),
align = "crrrrrr",
caption = "Summary statistics per salesperson (5 persons × 10 days)")
| Salesperson | Total Sales | Avg Daily | Total Discount | Days High | Days Medium | Days None |
|---|---|---|---|---|---|---|
| 1 | 7090.09 | 709.01 | 909.93 | 3 | 5 | 2 |
| 2 | 6720.24 | 672.02 | 890.42 | 3 | 5 | 2 |
| 3 | 6923.09 | 692.31 | 1101.12 | 5 | 3 | 2 |
| 4 | 6153.74 | 615.37 | 801.90 | 3 | 4 | 3 |
| 5 | 7066.52 | 706.65 | 1066.84 | 4 | 5 | 1 |
pal2 <- brewer.pal(5, "Set1")
p2 <- plot_ly()
for (i in seq_along(unique(cum_df$sales_id))) {
s <- sort(unique(cum_df$sales_id))[i]
df_sub <- cum_df[cum_df$sales_id == s, ]
p2 <- add_trace(p2,
data = df_sub, x = ~day, y = ~cumulative,
type = "scatter", mode = "lines+markers",
name = paste("Salesperson", s),
line = list(color = pal2[i], width = 2.2),
marker = list(color = pal2[i], size = 6))
}
p2 <- layout(p2,
title = list(text = "Task 2: Cumulative Sales per Salesperson (10 Days)"),
xaxis = list(title = "Day", dtick = 1),
yaxis = list(title = "Cumulative Sales Amount"),
hovermode = "x unified",
legend = list(title = list(text = "<b>Salesperson</b>")),
plot_bgcolor = "white",
paper_bgcolor = "white")
p2
The objectives for Task 3 are:
- categorize_performance(): a function assigning one of five performance labels to a sales amount.

| Category | Condition | Plot Colour |
|---|---|---|
| Excellent | sales_amount > 800 | #2ecc71 |
| Very Good | sales_amount > 650 | #3498db |
| Good | sales_amount > 500 | #f39c12 |
| Average | sales_amount > 350 | #e67e22 |
| Poor | sales_amount ≤ 350 | #e74c3c |
set.seed(7)
categorize_performance <- function(sales_amount) {
if (sales_amount > 800) return("Excellent")
else if (sales_amount > 650) return("Very Good")
else if (sales_amount > 500) return("Good")
else if (sales_amount > 350) return("Average")
else return("Poor")
}
sales_vector <- round(runif(100, 100, 1000), 2)
# FIX: pre-allocate a character vector instead of row-binding one row at a time
categories <- character(length(sales_vector))
for (i in seq_along(sales_vector)) {
categories[i] <- categorize_performance(sales_vector[i])
}
result_df <- data.frame(sales_amount = sales_vector, category = categories,
stringsAsFactors = FALSE)
head(result_df, 10)
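The five-way classification and the frequency count can be sketched in Python as follows. The sample values are hypothetical and were picked to sit on or near the thresholds, which makes the strict greater-than comparisons visible.

```python
from collections import Counter


def categorize_performance(sales_amount):
    """Map a sales amount to one of five labels using strict > thresholds."""
    if sales_amount > 800:
        return "Excellent"
    elif sales_amount > 650:
        return "Very Good"
    elif sales_amount > 500:
        return "Good"
    elif sales_amount > 350:
        return "Average"
    else:
        return "Poor"


# Hypothetical sample, including values exactly on a threshold
sample = [820.0, 800.0, 650.01, 500.0, 351.0, 100.0]
categories = [categorize_performance(v) for v in sample]
freq = Counter(categories)

for label in ("Excellent", "Very Good", "Good", "Average", "Poor"):
    print(label, freq.get(label, 0))
```

A value exactly on a threshold (800 or 500 here) drops to the next lower category, which matches the conditions listed in the category table.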
level_order <- c("Excellent","Very Good","Good","Average","Poor")
freq_table <- as.data.frame(table(result_df$category))
colnames(freq_table) <- c("Category","Count")
freq_table$Percentage <- round(freq_table$Count / nrow(result_df) * 100, 1)
freq_table$Category <- factor(freq_table$Category, levels = level_order)
freq_table <- freq_table[order(freq_table$Category), ]
kable(freq_table, col.names = c("Category","Count","Percentage (%)"),
align = "lrr",
caption = "Distribution of performance categories across 100 sales values")
| | Category | Count | Percentage (%) |
|---|---|---|---|
| 2 | Excellent | 22 | 22 |
| 5 | Very Good | 17 | 17 |
| 3 | Good | 15 | 15 |
| 1 | Average | 21 | 21 |
| 4 | Poor | 25 | 25 |
cat_colours <- c("Excellent"="#2ecc71","Very Good"="#3498db",
"Good"="#f39c12","Average"="#e67e22","Poor"="#e74c3c")
freq_plot <- freq_table
freq_plot$Category <- as.character(freq_plot$Category)
p_bar3 <- plot_ly(freq_plot,
x = ~Category, y = ~Count,
type = "bar",
marker = list(color = unname(cat_colours[freq_plot$Category])),
text = ~paste0(Percentage, "%"),
textposition = "outside",
hovertemplate = "<b>%{x}</b><br>Count: %{y}<br>Share: %{text}<extra></extra>"
) %>% layout(
title = "Task 3: Performance Category Distribution (n = 100)",
xaxis = list(title = "Category", categoryorder = "array",
categoryarray = level_order),
yaxis = list(title = "Count"),
showlegend = FALSE,
plot_bgcolor = "white",
paper_bgcolor = "white")
p_pie3 <- plot_ly(freq_plot,
labels = ~Category, values = ~Count,
type = "pie",
marker = list(colors = unname(cat_colours[freq_plot$Category])),
textinfo = "label+percent",
hovertemplate = "<b>%{label}</b><br>Count: %{value}<br>%{percent}<extra></extra>"
) %>% layout(
title = "Task 3: Proportional Share by Category",
plot_bgcolor = "white",
paper_bgcolor = "white")
p_bar3
p_pie3
The objectives for Task 4 are:
- generate_company_data(): uses a nested loop to simulate employee records across multiple companies.

| Column | Type | Description |
|---|---|---|
| company_id | integer | Company identifier (1 to n_company) |
| employee_id | integer | Employee identifier (1 to n_employees per company) |
| salary | numeric | Monthly salary in thousands — Uniform(3, 15) |
| department | character | One of: HR, IT, Finance, Marketing, Operations |
| performance_score | numeric | Performance score — Normal(70, 15), capped to [0, 100] |
| KPI_score | numeric | KPI: Uniform(91,100) if top performer (performance_score > 75), else Uniform(50,89) |
set.seed(123)
# FIX: collect rows in a list, then bind once with do.call(rbind); much faster than calling rbind() on every iteration
generate_company_data <- function(n_company, n_employees) {
departments <- c("HR","IT","Finance","Marketing","Operations")
rows_list <- vector("list", n_company * n_employees)
idx <- 1L
for (c in 1:n_company) {
for (e in 1:n_employees) {
salary <- round(runif(1, 3, 15), 2)
dept <- sample(departments, 1)
perf <- round(max(0, min(100, rnorm(1, mean = 70, sd = 15))), 1)
kpi <- if (perf > 75) round(runif(1, 91, 100), 1) else round(runif(1, 50, 89), 1)
rows_list[[idx]] <- data.frame(company_id = c, employee_id = e, salary = salary,
department = dept, performance_score = perf,
KPI_score = kpi, stringsAsFactors = FALSE)
idx <- idx + 1L
}
}
do.call(rbind, rows_list)
}
company_data <- generate_company_data(n_company = 4, n_employees = 20)
cat("Dataset:", nrow(company_data), "rows ×", ncol(company_data), "columns\n\n")
## Dataset: 80 rows × 6 columns
head(company_data, 8)
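A Python sketch of the same generator is shown below, assuming the column definitions from the table above. The clamp() helper, the DEPARTMENTS constant and the seed parameter are names introduced for this sketch, and the random values will differ from the R run.

```python
import random

DEPARTMENTS = ["HR", "IT", "Finance", "Marketing", "Operations"]


def clamp(value, low=0.0, high=100.0):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def generate_company_data(n_company, n_employees, seed=123):
    """Simulate one record per employee per company."""
    rng = random.Random(seed)
    rows = []
    for c in range(1, n_company + 1):
        for e in range(1, n_employees + 1):
            perf = round(clamp(rng.gauss(70, 15)), 1)
            # KPI range depends on whether the employee is a top performer
            if perf > 75:
                kpi = round(rng.uniform(91, 100), 1)
            else:
                kpi = round(rng.uniform(50, 89), 1)
            rows.append({
                "company_id": c,
                "employee_id": e,
                "salary": round(rng.uniform(3, 15), 2),
                "department": rng.choice(DEPARTMENTS),
                "performance_score": perf,
                "KPI_score": kpi,
            })
    return rows


company_data = generate_company_data(n_company=4, n_employees=20)
print("Dataset:", len(company_data), "rows")
```

Because the two KPI ranges do not overlap, a KPI of 91 or more always identifies a record whose performance score exceeded 75, which is what produces the two separate clusters in the performance-versus-KPI scatter plot.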
company_summary <- company_data %>%
group_by(company_id) %>%
summarise(
Avg_Salary = round(mean(salary), 2),
Avg_Performance = round(mean(performance_score), 1),
Max_KPI = round(max(KPI_score), 1),
Top_Performers = sum(performance_score > 75),
.groups = "drop"
) %>% rename("Company ID" = company_id)
kable(company_summary,
col.names = c("Company","Avg Salary (k)","Avg Performance","Max KPI","Top Performers"),
align = "crrrc",
caption = "Per-company summary: 4 companies × 20 employees each")
| Company | Avg Salary (k) | Avg Performance | Max KPI | Top Performers |
|---|---|---|---|---|
| 1 | 9.31 | 73.3 | 97.4 | 8 |
| 2 | 9.29 | 71.0 | 99.4 | 7 |
| 3 | 9.92 | 67.8 | 99.6 | 5 |
| 4 | 8.71 | 71.8 | 99.2 | 6 |
bar_data <- company_summary %>%
rename(company_id = "Company ID") %>%
select(company_id, Avg_Salary, Avg_Performance) %>%
pivot_longer(cols = c(Avg_Salary, Avg_Performance),
names_to = "Metric", values_to = "Value")
p_bar4 <- plot_ly(bar_data,
x = ~factor(company_id), y = ~Value,
color = ~Metric,
colors = c("Avg_Salary" = "#3498db","Avg_Performance" = "#e74c3c"),
type = "bar",
text = ~round(Value, 2), textposition = "outside"
) %>% layout(
barmode = "group",
title = "Task 4: Average Salary vs. Performance by Company",
xaxis = list(title = "Company ID"),
yaxis = list(title = "Value"),
legend = list(title = list(text = "<b>Metric</b>")),
plot_bgcolor = "white",
paper_bgcolor = "white")
company_data$company_label <- paste("Company", company_data$company_id)
p_scatter4 <- plot_ly(company_data,
x = ~performance_score, y = ~KPI_score,
color = ~company_label,
type = "scatter", mode = "markers",
marker = list(size = 8, opacity = 0.7),
hovertemplate = "<b>%{fullData.name}</b><br>Performance: %{x}<br>KPI: %{y}<extra></extra>"
) %>% layout(
title = "Task 4: Performance Score vs. KPI Score (per Employee)",
xaxis = list(title = "Performance Score"),
yaxis = list(title = "KPI Score"),
legend = list(title = list(text = "<b>Company</b>")),
plot_bgcolor = "white",
paper_bgcolor = "white")
p_bar4
p_scatter4
The objectives for Task 5 are:
- monte_carlo_pi(): estimates π by random point sampling in a unit square.

| Item | Description |
|---|---|
| Unit square | x ∈ [0,1], y ∈ [0,1] — area = 1 |
| Quarter-circle | x² + y² ≤ 1 — area = π/4 |
| Key ratio | points inside circle / total ≈ π/4 |
| π estimate | π ≈ 4 × (points inside / total points) |
| Sub-square | x ∈ [0,0.5], y ∈ [0,0.5] — area = 0.25 → P ≈ 0.25 |
set.seed(2024)
# FIX: pre-allocate numeric vectors instead of row-binding on every iteration
monte_carlo_pi <- function(n_points) {
inside <- 0L; outside <- 0L
# Pre-allocate vectors for plotting (at most 2000 points)
plot_n <- min(n_points, 2000L)
pts_x <- numeric(plot_n)
pts_y <- numeric(plot_n)
pts_st <- character(plot_n)
for (i in seq_len(n_points)) {
x <- runif(1); y <- runif(1)
if (x^2 + y^2 <= 1) { inside <- inside + 1L; status <- "inside" }
else { outside <- outside + 1L; status <- "outside" }
if (i <= plot_n) {
pts_x[i] <- x; pts_y[i] <- y; pts_st[i] <- status
}
}
pts <- data.frame(x = pts_x, y = pts_y, status = pts_st, stringsAsFactors = FALSE)
list(pi_estimate = 4 * inside / n_points,
inside_count = inside,
outside_count = outside,
points = pts)
}
mc_result <- monte_carlo_pi(10000)
cat("=== Monte Carlo π Estimation (n = 10,000) ===\n")
## === Monte Carlo π Estimation (n = 10,000) ===
cat("Points inside circle :", mc_result$inside_count, "\n")
## Points inside circle : 7865
cat("Points outside circle:", mc_result$outside_count, "\n")
## Points outside circle: 2135
cat("Estimated π :", round(mc_result$pi_estimate, 6), "\n")
## Estimated π : 3.146
cat("True π :", round(pi, 6), "\n")
## True π : 3.141593
cat("Absolute error :", round(abs(mc_result$pi_estimate - pi), 6), "\n")
## Absolute error : 0.004407
set.seed(2024)
all_x <- runif(10000); all_y <- runif(10000)
in_subsquare <- sum(all_x <= 0.5 & all_y <= 0.5)
prob_subsquare <- round(in_subsquare / 10000, 4)
cat("=== Sub-Square Probability (x ≤ 0.5 AND y ≤ 0.5) ===\n")
## === Sub-Square Probability (x ≤ 0.5 AND y ≤ 0.5) ===
cat("Points in sub-square :", in_subsquare, "\n")
## Points in sub-square : 2549
cat("Estimated probability :", prob_subsquare, "\n")
## Estimated probability : 0.2549
cat("Theoretical P : 0.2500\n")
## Theoretical P : 0.2500
cat("Absolute error :", round(abs(prob_subsquare - 0.25), 4), "\n")
## Absolute error : 0.0049
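The estimator can be sketched in Python as follows. Unlike the R code, this sketch counts the sub-square hits in the same pass instead of drawing a second sample, and the seed parameter and the p_subsquare key are assumptions of the sketch. Python's generator differs from R's, so the counts will not reproduce the R output.

```python
import math
import random


def monte_carlo_pi(n_points, seed=2024):
    """Estimate pi and the sub-square probability from uniform points."""
    rng = random.Random(seed)
    inside = 0
    in_subsquare = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1:        # point lies in the quarter-circle
            inside += 1
        if x <= 0.5 and y <= 0.5:     # point lies in the sub-square
            in_subsquare += 1
    return {
        "pi_estimate": 4 * inside / n_points,
        "inside_count": inside,
        "outside_count": n_points - inside,
        "p_subsquare": in_subsquare / n_points,
    }


result = monte_carlo_pi(10_000)
print("Estimated pi  :", result["pi_estimate"])
print("Absolute error:", abs(result["pi_estimate"] - math.pi))
print("P(sub-square) :", result["p_subsquare"])
```

For scale, the standard error of the estimate is 4·sqrt(p(1 − p)/n) with p = π/4, which is roughly 0.016 at n = 10,000, so the absolute error of 0.004407 reported above is well within the expected sampling noise.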
points_df <- mc_result$points
theta_seq <- seq(0, pi / 2, length.out = 200)
arc_df <- data.frame(x = cos(theta_seq), y = sin(theta_seq))
p_mc <- plot_ly() %>%
add_trace(data = points_df[points_df$status == "outside", ],
x = ~x, y = ~y, type = "scatter", mode = "markers",
name = "Outside circle",
marker = list(color = "#e74c3c", size = 3, opacity = 0.5)) %>%
add_trace(data = points_df[points_df$status == "inside", ],
x = ~x, y = ~y, type = "scatter", mode = "markers",
name = "Inside circle",
marker = list(color = "#3498db", size = 3, opacity = 0.5)) %>%
add_trace(data = arc_df, x = ~x, y = ~y,
type = "scatter", mode = "lines", name = "Circle boundary",
line = list(color = "black", width = 1.8)) %>%
layout(
title = list(text = paste0(
"Task 5: Monte Carlo Simulation (2,000 points shown)<br>",
"<sub>Estimated π = ", round(mc_result$pi_estimate, 5),
" | True π = ", round(pi, 5), "</sub>")),
xaxis = list(title = "x", range = c(0, 1), scaleanchor = "y"),
yaxis = list(title = "y", range = c(0, 1)),
shapes = list(list(
type = "rect", x0 = 0, x1 = 0.5, y0 = 0, y1 = 0.5,
line = list(color = "purple", dash = "dash", width = 1.5),
fillcolor = "rgba(128,0,128,0.05)")),
annotations = list(list(
x = 0.25, y = 0.55, text = "Sub-square (P ≈ 0.25)",
showarrow = FALSE, font = list(color = "purple", size = 11))),
plot_bgcolor = "white",
paper_bgcolor = "white")
p_mc
The objectives for Task 6 are:
- normalize_columns(): loop-based min-max normalisation (scales to [0, 1]).
- z_score(): loop-based Z-score standardisation (mean = 0, sd = 1).
- Two derived categorical columns: performance_category and salary_bracket.

| Method | Formula | Output Range | Typical Use Case |
|---|---|---|---|
| Min-Max Normalisation | x_norm = (x − min) / (max − min) | [0, 1] — bounded | Distance-based models, neural networks |
| Z-Score Standardisation | x_z = (x − mean) / sd | Unbounded — centred at 0 | Regression, clustering — unit variance |
set.seed(99)
# FIX: generate_company_data_t6 also collects rows in a list and binds once with do.call(rbind)
generate_company_data_t6 <- function(n_company, n_employees) {
departments <- c("HR","IT","Finance","Marketing","Operations")
rows_list <- vector("list", n_company * n_employees)
idx <- 1L
for (c in 1:n_company) {
for (e in 1:n_employees) {
salary <- round(runif(1, 3, 15), 2)
dept <- sample(departments, 1)
perf <- round(max(0, min(100, rnorm(1, 70, 15))), 1)
kpi <- if (perf > 75) round(runif(1, 91, 100), 1) else round(runif(1, 50, 89), 1)
rows_list[[idx]] <- data.frame(company_id = c, employee_id = e, salary = salary,
department = dept, performance_score = perf,
KPI_score = kpi, stringsAsFactors = FALSE)
idx <- idx + 1L
}
}
do.call(rbind, rows_list)
}
company_data_t6 <- generate_company_data_t6(4, 20)
normalize_columns <- function(df) {
df_norm <- df
for (col in names(df_norm)) {
if (is.numeric(df_norm[[col]])) {
col_min <- min(df_norm[[col]], na.rm = TRUE)
col_max <- max(df_norm[[col]], na.rm = TRUE)
if (col_max != col_min)
df_norm[[col]] <- round((df_norm[[col]] - col_min) / (col_max - col_min), 4)
}
}
return(df_norm)
}
z_score <- function(df) {
df_z <- df
for (col in names(df_z)) {
if (is.numeric(df_z[[col]])) {
col_mean <- mean(df_z[[col]], na.rm = TRUE)
col_sd <- sd(df_z[[col]], na.rm = TRUE)
if (col_sd != 0)
df_z[[col]] <- round((df_z[[col]] - col_mean) / col_sd, 4)
}
}
return(df_z)
}
df_norm_t6 <- normalize_columns(company_data_t6)
df_z_t6 <- z_score(company_data_t6)
cat("=== Original (first 5 rows) ===\n")
## === Original (first 5 rows) ===
head(company_data_t6[, c("salary","performance_score","KPI_score")], 5)
cat("=== After min-max normalisation ===\n")
## === After min-max normalisation ===
head(df_norm_t6[, c("salary","performance_score","KPI_score")], 5)
cat("=== After Z-score standardisation ===\n")
## === After Z-score standardisation ===
head(df_z_t6[, c("salary","performance_score","KPI_score")], 5)
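Both transformations can be sketched in plain Python as below. One point worth noting is that the R functions above transform every numeric column, which includes company_id and employee_id; the skip argument in this sketch is an assumption added to leave identifier columns untouched, and transform_columns() is a hypothetical helper name.

```python
import statistics


def min_max(values):
    """Scale values to [0, 1]; a constant column is returned unchanged."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre on the mean and divide by the sample sd (n - 1), like R's sd()."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return list(values)
    return [(v - mean) / sd for v in values]


def transform_columns(table, func, skip=("company_id", "employee_id")):
    """Apply func to each numeric column of a dict-of-lists table."""
    out = {}
    for col, values in table.items():
        numeric = all(isinstance(v, (int, float)) for v in values)
        out[col] = func(values) if numeric and col not in skip else list(values)
    return out


# Toy table (hypothetical values)
table = {
    "company_id": [1, 1, 2],
    "salary": [3.0, 9.0, 15.0],
    "department": ["HR", "IT", "HR"],
}
print(transform_columns(table, min_max)["salary"])  # [0.0, 0.5, 1.0]
print(transform_columns(table, z_score)["salary"])  # [-1.0, 0.0, 1.0]
```

The guards for a zero range and a zero standard deviation mirror the col_max != col_min and col_sd != 0 checks in the R code and avoid division by zero on constant columns.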
company_data_t6$performance_category <- NA_character_
company_data_t6$salary_bracket <- NA_character_
for (i in seq_len(nrow(company_data_t6))) {
perf <- company_data_t6$performance_score[i]
salary <- company_data_t6$salary[i]
company_data_t6$performance_category[i] <-
if (perf > 80) "Excellent"
else if (perf > 65) "Very Good"
else if (perf > 50) "Good"
else if (perf > 35) "Average"
else "Poor"
company_data_t6$salary_bracket[i] <-
if (salary > 10) "High"
else if (salary > 6) "Mid"
else "Low"
}
head(company_data_t6[, c("salary","performance_score",
"performance_category","salary_bracket")], 8)
sal_compare <- data.frame(
Original = company_data_t6$salary,
Normalised = df_norm_t6$salary,
Z_Score = df_z_t6$salary
) %>% pivot_longer(everything(), names_to = "Transformation", values_to = "Value")
perf_compare <- data.frame(
Original = company_data_t6$performance_score,
Normalised = df_norm_t6$performance_score,
Z_Score = df_z_t6$performance_score
) %>% pivot_longer(everything(), names_to = "Transformation", values_to = "Value")
tr_colours <- c("Original"="#3498db","Normalised"="#2ecc71","Z_Score"="#e74c3c")
p_sal_hist <- plot_ly()
for (tr in c("Original","Normalised","Z_Score")) {
vals <- sal_compare$Value[sal_compare$Transformation == tr]
p_sal_hist <- add_trace(p_sal_hist,
x = vals, type = "histogram", nbinsx = 15, name = tr,
marker = list(color = tr_colours[[tr]], opacity = 0.7))
}
p_sal_hist <- layout(p_sal_hist,
barmode = "overlay",
title = "Task 6: Salary Distribution — Before vs. After Transformation",
xaxis = list(title = "Value"), yaxis = list(title = "Count"),
plot_bgcolor = "white", paper_bgcolor = "white")
p_perf_hist <- plot_ly()
for (tr in c("Original","Normalised","Z_Score")) {
vals <- perf_compare$Value[perf_compare$Transformation == tr]
p_perf_hist <- add_trace(p_perf_hist,
x = vals, type = "histogram", nbinsx = 15, name = tr,
marker = list(color = tr_colours[[tr]], opacity = 0.7))
}
p_perf_hist <- layout(p_perf_hist,
barmode = "overlay",
title = "Task 6: Performance Score Distribution — Before vs. After Transformation",
xaxis = list(title = "Value"), yaxis = list(title = "Count"),
plot_bgcolor = "white", paper_bgcolor = "white")
p_sal_box <- plot_ly(sal_compare,
x = ~Transformation, y = ~Value,
color = ~Transformation, colors = tr_colours,
type = "box"
) %>% layout(
title = "Task 6: Salary — Boxplot Comparison",
xaxis = list(title = ""), yaxis = list(title = "Value"),
showlegend = FALSE,
plot_bgcolor = "white",
paper_bgcolor = "white")
p_sal_hist
p_perf_hist
p_sal_box
Task 7 is the capstone mini project integrating all concepts from Tasks 1–6. The objectives are:
| Parameter | Value / Definition |
|---|---|
| Companies | 7 |
| Employees per company | 100 |
| Total records | 700 |
| KPI Tier: High | KPI_score > 85 |
| KPI Tier: Mid | 65 ≤ KPI_score ≤ 85 |
| KPI Tier: Low | KPI_score < 65 |
set.seed(777)
# FIX: collect rows in a list, then bind once with do.call(rbind) for the 700-row dataset
generate_company_data_t7 <- function(n_company, n_employees) {
departments <- c("HR","IT","Finance","Marketing","Operations")
rows_list <- vector("list", n_company * n_employees)
idx <- 1L
for (c in 1:n_company) {
for (e in 1:n_employees) {
salary <- round(runif(1, 3, 15), 2)
dept <- sample(departments, 1)
perf <- round(max(0, min(100, rnorm(1, 70, 15))), 1)
kpi <- if (perf > 75) round(runif(1, 85, 100), 1) else round(runif(1, 40, 84), 1)
rows_list[[idx]] <- data.frame(company_id = c, employee_id = e, salary = salary,
department = dept, performance_score = perf,
KPI_score = kpi, stringsAsFactors = FALSE)
idx <- idx + 1L
}
}
do.call(rbind, rows_list)
}
kpi_df <- generate_company_data_t7(n_company = 7, n_employees = 100)
cat("Dataset:", nrow(kpi_df), "rows ×", ncol(kpi_df), "columns\n\n")
## Dataset: 700 rows × 6 columns
head(kpi_df, 6)
# FIX: use vectorised ifelse(); much faster than a row-by-row loop over 700 rows
kpi_df$KPI_tier <- ifelse(kpi_df$KPI_score > 85, "High",
ifelse(kpi_df$KPI_score >= 65, "Mid", "Low"))
company_summary_t7 <- kpi_df %>%
group_by(company_id) %>%
summarise(
Avg_Salary = round(mean(salary), 2),
Avg_KPI = round(mean(KPI_score), 1),
Top_N = sum(KPI_tier == "High"),
Mid_N = sum(KPI_tier == "Mid"),
Low_N = sum(KPI_tier == "Low"),
.groups = "drop"
)
kable(company_summary_t7,
col.names = c("Company","Avg Salary (k)","Avg KPI","KPI High","KPI Mid","KPI Low"),
align = "crrrrr",
caption = "KPI dashboard summary: 7 companies × 100 employees each")
| Company | Avg Salary (k) | Avg KPI | KPI High | KPI Mid | KPI Low |
|---|---|---|---|---|---|
| 1 | 8.80 | 70.9 | 30 | 30 | 40 |
| 2 | 8.99 | 73.7 | 33 | 34 | 33 |
| 3 | 8.34 | 73.5 | 37 | 26 | 37 |
| 4 | 8.30 | 73.9 | 39 | 26 | 35 |
| 5 | 8.88 | 73.7 | 36 | 30 | 34 |
| 6 | 8.58 | 74.9 | 45 | 24 | 31 |
| 7 | 8.43 | 70.5 | 32 | 24 | 44 |
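The tier boundaries used for this dashboard are easy to get wrong by one comparison operator, so a small Python sketch of the same rule may help. The function name kpi_tier and the sample scores are hypothetical and chosen to sit on the boundaries.

```python
from collections import Counter


def kpi_tier(kpi_score):
    """High if > 85, Mid if between 65 and 85 inclusive, otherwise Low."""
    if kpi_score > 85:
        return "High"
    elif kpi_score >= 65:
        return "Mid"
    return "Low"


# Hypothetical scores placed on and around the two boundaries
scores = [85.0, 85.1, 65.0, 64.9, 99.9, 40.0]
tiers = [kpi_tier(s) for s in scores]
tier_counts = Counter(tiers)

for score, tier in zip(scores, tiers):
    print(score, "->", tier)
```

A score of exactly 85 is Mid and a score of exactly 65 is also Mid, matching the nested ifelse() call in the R code.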
tier_colours <- c("High"="#2ecc71","Mid"="#f39c12","Low"="#e74c3c")
tier_counts <- kpi_df %>%
group_by(company_id, KPI_tier) %>%
summarise(count = n(), .groups = "drop")
p_tier <- plot_ly(tier_counts,
x = ~factor(company_id), y = ~count,
color = ~KPI_tier, colors = tier_colours,
type = "bar",
hovertemplate = "Company %{x} | Tier: %{fullData.name} | Count: %{y}<extra></extra>"
) %>% layout(
barmode = "group",
title = "Chart 1: KPI Tier Distribution per Company",
xaxis = list(title = "Company ID"),
yaxis = list(title = "Number of Employees"),
legend = list(title = list(text = "<b>KPI Tier</b>")),
plot_bgcolor = "white",
paper_bgcolor = "white")
dept_top <- kpi_df %>%
filter(KPI_tier == "High") %>%
group_by(department) %>%
summarise(top_count = n(), .groups = "drop") %>%
arrange(top_count)
p_dept <- plot_ly(dept_top,
y = ~department, x = ~top_count,
type = "bar", orientation = "h",
marker = list(color = "#3498db", opacity = 0.85),
hovertemplate = "%{y}: %{x} top performers<extra></extra>"
) %>% layout(
title = "Chart 2: Top Performers by Department (KPI > 85)",
xaxis = list(title = "Count of Top Performers"),
yaxis = list(title = ""),
plot_bgcolor = "white",
paper_bgcolor = "white")
kpi_df$company_label <- paste("Company", kpi_df$company_id)
p_salary <- plot_ly(kpi_df,
x = ~company_label, y = ~salary,
color = ~company_label, type = "box", boxpoints = "outliers",
hovertemplate = "%{x}<br>Salary: %{y:.2f}k<extra></extra>"
) %>% layout(
title = "Chart 3: Salary Distribution per Company",
xaxis = list(title = "Company"),
yaxis = list(title = "Salary (thousands)"),
showlegend = FALSE,
plot_bgcolor = "white",
paper_bgcolor = "white")
p_perf_kpi <- plot_ly(kpi_df,
x = ~performance_score, y = ~KPI_score,
color = ~company_label, type = "scatter", mode = "markers",
marker = list(size = 5, opacity = 0.6),
hovertemplate = "<b>%{fullData.name}</b><br>Performance: %{x}<br>KPI: %{y}<extra></extra>"
) %>% layout(
title = "Chart 4: Performance Score vs. KPI Score",
xaxis = list(title = "Performance Score"),
yaxis = list(title = "KPI Score"),
legend = list(title = list(text = "<b>Company</b>")),
hovermode = "closest",
plot_bgcolor = "white",
paper_bgcolor = "white")
p_tier
p_dept
p_salary
p_perf_kpi
The objective of Task 8 is to demonstrate automated report generation: one function encapsulates all reporting logic for a single company, and a loop calls it for every company — producing consistent, structured outputs at scale without any repetitive code.
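The "one function, one loop" pattern described above can be sketched in Python as follows. This is a reduced illustration under stated assumptions: the toy rows are made up, the summary keeps only two metrics, and the demo writes into a temporary folder rather than a reports directory. The CSV is written only when the file does not already exist, mirroring the guard in the R code.

```python
import csv
import tempfile
from pathlib import Path


def generate_company_report(rows, cid, folder="reports"):
    """Summarise one company and export its rows to CSV if not yet present."""
    co_rows = [r for r in rows if r["company_id"] == cid]
    if not co_rows:
        raise ValueError(f"No records for company {cid}")
    summary = {
        "company_id": cid,
        "total_employees": len(co_rows),
        "avg_kpi": round(sum(r["KPI_score"] for r in co_rows) / len(co_rows), 1),
    }
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    file_path = out_dir / f"company_{cid}_report.csv"
    if not file_path.exists():  # skip the export when the file is already there
        with file_path.open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(co_rows[0].keys()))
            writer.writeheader()
            writer.writerows(co_rows)
    summary["csv_path"] = str(file_path)
    return summary


# Toy records (hypothetical values)
toy_records = [
    {"company_id": 1, "employee_id": 1, "KPI_score": 90.0},
    {"company_id": 1, "employee_id": 2, "KPI_score": 70.0},
    {"company_id": 2, "employee_id": 1, "KPI_score": 60.0},
]

with tempfile.TemporaryDirectory() as tmp:
    # A single loop produces one report per company
    for cid in sorted({r["company_id"] for r in toy_records}):
        report = generate_company_report(toy_records, cid, folder=tmp)
        print(report["company_id"], report["total_employees"], report["avg_kpi"])
```

One trade-off of the exists-check is that a stale CSV is never refreshed when the underlying data changes; deleting the file or dropping the check would be needed in that case.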
set.seed(777)
# FIX: collect rows in a list, then bind once with do.call(rbind) for the 700-row dataset
generate_company_data_t8 <- function(n_company, n_employees) {
departments <- c("HR","IT","Finance","Marketing","Operations")
rows_list <- vector("list", n_company * n_employees)
idx <- 1L
for (c in 1:n_company) {
for (e in 1:n_employees) {
salary <- round(runif(1, 3, 15), 2)
dept <- sample(departments, 1)
perf <- round(max(0, min(100, rnorm(1, 70, 15))), 1)
kpi <- if (perf > 75) round(runif(1, 85, 100), 1) else round(runif(1, 40, 84), 1)
rows_list[[idx]] <- data.frame(company_id = c, employee_id = e, salary = salary,
department = dept, performance_score = perf,
KPI_score = kpi, stringsAsFactors = FALSE)
idx <- idx + 1L
}
}
do.call(rbind, rows_list)
}
kpi_df_t8 <- generate_company_data_t8(7, 100)
kpi_df_t8$KPI_tier <- ifelse(kpi_df_t8$KPI_score > 85, "High",
ifelse(kpi_df_t8$KPI_score >= 65, "Mid", "Low"))
generate_company_report <- function(df, cid) {
co_df <- df %>% filter(company_id == cid)
tier_colours <- c("High"="#2ecc71","Mid"="#f39c12","Low"="#e74c3c")
stats <- data.frame(
Metric = c("Total Employees","Avg Salary (k)","Avg Performance Score",
"Avg KPI Score","KPI High (>85)","KPI Mid (65–85)","KPI Low (<65)"),
Value = c(nrow(co_df),
round(mean(co_df$salary), 2),
round(mean(co_df$performance_score), 1),
round(mean(co_df$KPI_score), 1),
sum(co_df$KPI_tier == "High"),
sum(co_df$KPI_tier == "Mid"),
sum(co_df$KPI_tier == "Low")))
cat("\n══════════════════════════════════════════════\n")
cat(paste0(" COMPANY ", cid, " — AUTOMATED KPI REPORT\n"))
cat("══════════════════════════════════════════════\n\n")
cat("[ Summary Statistics ]\n")
print(kable(stats, col.names = c("Metric","Value"), align = "lr"))
top_df <- co_df %>%
filter(KPI_tier == "High") %>%
select(employee_id, department, salary, performance_score, KPI_score) %>%
arrange(desc(KPI_score)) %>%
head(5)
cat("\n[ Top 5 Performers ]\n")
print(kable(top_df,
col.names = c("Employee ID","Department","Salary (k)","Performance","KPI"),
align = "clrrr"))
dept_summary <- co_df %>%
group_by(department) %>%
summarise(Count = n(), Avg_KPI = round(mean(KPI_score), 1), .groups = "drop")
cat("\n[ Department Breakdown ]\n")
print(kable(dept_summary,
col.names = c("Department","Headcount","Avg KPI"), align = "lrr"))
p1 <- plot_ly(co_df, x = ~department, color = ~KPI_tier, colors = tier_colours,
type = "histogram",
hovertemplate = "%{x}, %{fullData.name}: %{y:.1%}<extra></extra>"
) %>% layout(
title = paste0("Company ", cid, ": KPI Tier Proportion by Department"),
xaxis = list(title = "Department"),
yaxis = list(title = "Proportion", tickformat = ".0%"),
barmode = "stack",
barnorm = "fraction", # barnorm is a layout attribute, so it is set here rather than on the trace
plot_bgcolor = "white",
paper_bgcolor = "white")
p2 <- plot_ly(co_df, x = ~salary, color = ~KPI_tier, colors = tier_colours,
type = "histogram", nbinsx = 15, opacity = 0.8,
hovertemplate = "Salary: %{x:.1f}k<br>Count: %{y}<extra></extra>"
) %>% layout(
barmode = "overlay",
title = paste0("Company ", cid, ": Salary Distribution by KPI Tier"),
xaxis = list(title = "Salary (thousands)"),
yaxis = list(title = "Count"),
plot_bgcolor = "white",
paper_bgcolor = "white")
print(p1)
print(p2)
# FIX: CSV hanya ditulis jika belum ada — mencegah duplikasi di setiap run/knit
folder <- "reports"
if (!dir.exists(folder)) dir.create(folder)
file_name <- file.path(folder, paste0("company_", cid, "_report.csv"))
if (!file.exists(file_name)) {
write.csv(co_df, file = file_name, row.names = FALSE)
cat(paste0("\n CSV exported: ", file_name, "\n\n"))
} else {
cat(paste0("\n CSV sudah ada, skip: ", file_name, "\n\n"))
}
}
# A single for loop produces a complete report for every company automatically.
for (cid in sort(unique(kpi_df_t8$company_id))) {
generate_company_report(kpi_df_t8, cid)
}
##
## ══════════════════════════════════════════════
## COMPANY 1 — AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
##
## [ Summary Statistics ]
##
##
## |Metric | Value|
## |:---------------------|-----:|
## |Total Employees | 100.0|
## |Avg Salary (k) | 8.8|
## |Avg Performance Score | 69.7|
## |Avg KPI Score | 70.9|
## |KPI High (>85) | 30.0|
## |KPI Mid (65–85) | 30.0|
## |KPI Low (<65) | 40.0|
##
## [ Top 5 Performers ]
##
##
## | Employee ID |Department | Salary (k)| Performance| KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## | 49 |Operations | 5.26| 94.8| 99.9|
## | 94 |Marketing | 3.00| 89.1| 99.7|
## | 86 |Finance | 9.80| 85.6| 99.0|
## | 8 |HR | 6.98| 100.0| 98.6|
## | 79 |Marketing | 3.61| 79.6| 98.4|
##
## [ Department Breakdown ]
##
##
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance | 22| 61.2|
## |HR | 19| 77.1|
## |IT | 25| 70.9|
## |Marketing | 18| 73.9|
## |Operations | 16| 73.7|
##
## CSV sudah ada, skip: reports/company_1_report.csv
##
##
## ══════════════════════════════════════════════
## COMPANY 2 — AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
##
## [ Summary Statistics ]
##
##
## |Metric | Value|
## |:---------------------|------:|
## |Total Employees | 100.00|
## |Avg Salary (k) | 8.99|
## |Avg Performance Score | 70.40|
## |Avg KPI Score | 73.70|
## |KPI High (>85) | 33.00|
## |KPI Mid (65–85) | 34.00|
## |KPI Low (<65) | 33.00|
##
## [ Top 5 Performers ]
##
##
## | Employee ID |Department | Salary (k)| Performance| KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## | 85 |Marketing | 12.87| 82.0| 99.8|
## | 93 |Marketing | 11.43| 88.2| 99.0|
## | 97 |Operations | 12.72| 81.6| 98.9|
## | 98 |Marketing | 4.03| 92.7| 98.8|
## | 39 |Marketing | 8.39| 83.5| 98.2|
##
## [ Department Breakdown ]
##
##
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance | 24| 72.5|
## |HR | 17| 72.4|
## |IT | 22| 69.8|
## |Marketing | 22| 77.9|
## |Operations | 15| 76.5|
##
## CSV sudah ada, skip: reports/company_2_report.csv
##
##
## ══════════════════════════════════════════════
## COMPANY 3 — AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
##
## [ Summary Statistics ]
##
##
## |Metric | Value|
## |:---------------------|------:|
## |Total Employees | 100.00|
## |Avg Salary (k) | 8.34|
## |Avg Performance Score | 70.50|
## |Avg KPI Score | 73.50|
## |KPI High (>85) | 37.00|
## |KPI Mid (65–85) | 26.00|
## |KPI Low (<65) | 37.00|
##
## [ Top 5 Performers ]
##
##
## | Employee ID |Department | Salary (k)| Performance| KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## | 51 |Marketing | 3.84| 100.0| 99.7|
## | 70 |Operations | 13.13| 79.8| 99.7|
## | 11 |Operations | 6.45| 84.5| 99.3|
## | 59 |Finance | 5.62| 90.1| 98.9|
## | 26 |HR | 13.20| 79.9| 98.7|
##
## [ Department Breakdown ]
##
##
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance | 23| 69.7|
## |HR | 17| 81.0|
## |IT | 20| 70.2|
## |Marketing | 21| 73.8|
## |Operations | 19| 74.6|
##
## CSV sudah ada, skip: reports/company_3_report.csv
##
##
## ══════════════════════════════════════════════
## COMPANY 4 — AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
##
## [ Summary Statistics ]
##
##
## |Metric | Value|
## |:---------------------|-----:|
## |Total Employees | 100.0|
## |Avg Salary (k) | 8.3|
## |Avg Performance Score | 70.4|
## |Avg KPI Score | 73.9|
## |KPI High (>85) | 39.0|
## |KPI Mid (65–85) | 26.0|
## |KPI Low (<65) | 35.0|
##
## [ Top 5 Performers ]
##
##
## | Employee ID |Department | Salary (k)| Performance| KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## | 37 |Operations | 13.87| 75.9| 99.4|
## | 19 |HR | 6.70| 75.6| 99.0|
## | 79 |Operations | 14.66| 100.0| 98.6|
## | 72 |Marketing | 14.38| 77.0| 98.0|
## | 28 |Finance | 3.52| 90.2| 97.2|
##
## [ Department Breakdown ]
##
##
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance | 19| 75.2|
## |HR | 24| 72.6|
## |IT | 16| 73.5|
## |Marketing | 15| 77.4|
## |Operations | 26| 72.5|
##
## CSV sudah ada, skip: reports/company_4_report.csv
##
##
## ══════════════════════════════════════════════
## COMPANY 5 — AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
##
## [ Summary Statistics ]
##
##
## |Metric | Value|
## |:---------------------|------:|
## |Total Employees | 100.00|
## |Avg Salary (k) | 8.88|
## |Avg Performance Score | 70.10|
## |Avg KPI Score | 73.70|
## |KPI High (>85) | 36.00|
## |KPI Mid (65–85) | 30.00|
## |KPI Low (<65) | 34.00|
##
## [ Top 5 Performers ]
##
##
## | Employee ID |Department | Salary (k)| Performance| KPI|
## |:-----------:|:----------|----------:|-----------:|-----:|
## | 47 |Marketing | 8.21| 79.2| 100.0|
## | 6 |Finance | 10.40| 86.7| 99.8|
## | 67 |IT | 11.92| 85.5| 99.7|
## | 98 |HR | 7.73| 76.9| 99.5|
## | 86 |IT | 13.85| 76.8| 99.1|
##
## [ Department Breakdown ]
##
##
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance | 21| 74.9|
## |HR | 27| 73.3|
## |IT | 23| 72.8|
## |Marketing | 16| 74.9|
## |Operations | 13| 73.1|
##
## CSV sudah ada, skip: reports/company_5_report.csv
##
##
## ══════════════════════════════════════════════
## COMPANY 6 — AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
##
## [ Summary Statistics ]
##
##
## |Metric | Value|
## |:---------------------|------:|
## |Total Employees | 100.00|
## |Avg Salary (k) | 8.58|
## |Avg Performance Score | 71.20|
## |Avg KPI Score | 74.90|
## |KPI High (>85) | 45.00|
## |KPI Mid (65–85) | 24.00|
## |KPI Low (<65) | 31.00|
##
## [ Top 5 Performers ]
##
##
## | Employee ID |Department | Salary (k)| Performance| KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## | 51 |IT | 14.91| 77.1| 99.5|
## | 19 |Finance | 11.75| 80.5| 99.1|
## | 25 |Operations | 5.88| 78.2| 99.0|
## | 34 |IT | 8.45| 78.5| 98.3|
## | 41 |IT | 7.79| 100.0| 98.1|
##
## [ Department Breakdown ]
##
##
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance | 18| 75.7|
## |HR | 22| 71.2|
## |IT | 21| 80.3|
## |Marketing | 20| 75.7|
## |Operations | 19| 71.6|
##
## CSV sudah ada, skip: reports/company_6_report.csv
##
##
## ══════════════════════════════════════════════
## COMPANY 7 — AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
##
## [ Summary Statistics ]
##
##
## |Metric | Value|
## |:---------------------|------:|
## |Total Employees | 100.00|
## |Avg Salary (k) | 8.43|
## |Avg Performance Score | 69.40|
## |Avg KPI Score | 70.50|
## |KPI High (>85) | 32.00|
## |KPI Mid (65–85) | 24.00|
## |KPI Low (<65) | 44.00|
##
## [ Top 5 Performers ]
##
##
## | Employee ID |Department | Salary (k)| Performance| KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## | 79 |Operations | 3.18| 77.7| 99.0|
## | 11 |Operations | 9.97| 90.2| 98.9|
## | 64 |Marketing | 5.36| 77.8| 98.0|
## | 40 |HR | 12.21| 95.0| 97.9|
## | 86 |Finance | 13.85| 76.2| 97.9|
##
## [ Department Breakdown ]
##
##
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance | 26| 69.9|
## |HR | 22| 71.1|
## |IT | 13| 72.2|
## |Marketing | 19| 74.2|
## |Operations | 20| 66.0|
##
## CSV sudah ada, skip: reports/company_7_report.csv
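For comparison, the report loop above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the practicum's own Python implementation: the toy rows, the temporary output folder, and the returned summary dictionary are hypothetical, and the formatted tables and plotly charts are left out. What it does mirror is the structure of the R code: filter by company, summarise, and export the CSV only when the file does not exist yet.

```python
import csv
import tempfile
from pathlib import Path
from statistics import mean


def generate_company_report(rows, cid, folder):
    """Summarise one company and export its rows to CSV at most once."""
    co_rows = [r for r in rows if r["company_id"] == cid]
    if not co_rows:
        raise ValueError(f"no rows found for company {cid}")

    tiers = [r["KPI_tier"] for r in co_rows]
    summary = {
        "company_id": cid,
        "total_employees": len(co_rows),
        "avg_kpi": round(mean(r["KPI_score"] for r in co_rows), 1),
        "high": tiers.count("High"),
        "mid": tiers.count("Mid"),
        "low": tiers.count("Low"),
    }

    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    file_name = folder / f"company_{cid}_report.csv"

    # Write only when the file is absent, so repeated runs do not rewrite it.
    summary["csv_written"] = not file_name.exists()
    if summary["csv_written"]:
        with file_name.open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(co_rows[0].keys()))
            writer.writeheader()
            writer.writerows(co_rows)
    return summary


# Hypothetical toy data standing in for kpi_df_t8 (tiers already assigned).
rows = [
    {"company_id": 1, "employee_id": 1, "KPI_score": 91.0, "KPI_tier": "High"},
    {"company_id": 1, "employee_id": 2, "KPI_score": 70.0, "KPI_tier": "Mid"},
    {"company_id": 2, "employee_id": 1, "KPI_score": 40.0, "KPI_tier": "Low"},
]

out_dir = tempfile.mkdtemp()  # assumption: a throwaway folder instead of "reports"
company_ids = sorted({r["company_id"] for r in rows})

first_run, second_run = [], []
for cid in company_ids:  # one loop, one report per company
    first_run.append(generate_company_report(rows, cid, out_dir))
for cid in company_ids:  # a second pass finds the files and skips the export
    second_run.append(generate_company_report(rows, cid, out_dir))
```

Returning the summary as a dictionary, rather than only printing it, is a design choice made here so that the result of each report can be inspected or collected by the calling loop.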
This practicum has shown, task by task, that functions, loops, and conditional branching are sufficient to build a complete, end-to-end data science pipeline.
Tasks 1–2 established the foundation: validated functions and nested loops for formula evaluation and sales simulation. Tasks 3–4 extended the pattern to classification and hierarchical data generation, mirroring the structure of real business datasets. Task 5 applied the same loop-and-condition skeleton to stochastic simulation, showing that a basic Monte Carlo estimate needs only random sampling and counting rather than specialised libraries. Task 6 addressed the pre-modelling pipeline through data transformation and feature engineering, confirming that linear transformations rescale the axis while preserving distributional shape. Task 7 consolidated all prior concepts into a full KPI dashboard (700 records across 7 companies), showing that the same code patterns carry over unchanged to a larger dataset. Task 8 completed the cycle with automated report generation: a single function called inside a loop produces consistent, structured output for every company, which is the core pattern behind repeatable reporting pipelines.
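The Task 6 observation above, that linear transformations preserve distributional shape, can be checked numerically. The sketch below is an illustration rather than the practicum's own code: the salary values are hypothetical, and min-max scaling and z-score standardisation are assumed here as two typical linear transformations. Skewness is used as the shape measure because it is unchanged by any transformation of the form a·x + b with a > 0.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mu = mean(values)
    sigma = pstdev(values)
    return mean(((v - mu) / sigma) ** 3 for v in values)


def min_max(values):
    """Rescale values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Rescale values linearly to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical salaries in thousands (not taken from the practicum data).
salaries = [3.0, 3.6, 5.3, 7.0, 8.4, 9.8, 12.9, 14.9]

raw_skew = skewness(salaries)
mm_skew = skewness(min_max(salaries))
z_skew = skewness(z_score(salaries))

# The three values agree up to floating-point rounding.
print(raw_skew, mm_skew, z_skew)
```

Only the axis changes: the min-max values run from 0 to 1 and the z-scores are centred on 0, but the asymmetry of the distribution is the same in all three versions.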
Throughout all tasks, R and Python implementations were maintained in
parallel. The syntax differs (else if vs. elif, rbind() vs.
list.append(), base R vs. pandas), but the underlying logic is
identical, which reinforces the language-agnostic programming thinking
that is essential in modern data science practice.
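The syntactic parallel can be made concrete with a short Python sketch, with the R counterparts noted in comments. The thresholds (85 and 65) follow the tier labels used in the Task 8 report, and the function name classify_kpi and the sample scores are hypothetical.

```python
def classify_kpi(score):
    """Assign a KPI tier; assumed thresholds are 85 (High) and 65 (Mid)."""
    # R counterpart: if (...) { } else if (...) { } else { }
    if score >= 85:
        return "High"
    elif score >= 65:
        return "Mid"
    else:
        return "Low"


records = []
for employee_id, score in enumerate([92.5, 71.0, 48.3], start=1):
    # R counterpart: df <- rbind(df, data.frame(...)) inside the loop.
    records.append({
        "employee_id": employee_id,
        "KPI_score": score,
        "KPI_tier": classify_kpi(score),
    })

for row in records:
    print(row)
```

The control flow is the same in both languages: one branch chain decides the tier, and one loop accumulates a row per employee.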