Preface

This practicum report is a consolidated document covering Task 1 through Task 8 of the Data Science Programming — Functions and Loops module. All tasks are unified into a single file to provide a coherent, end-to-end reference that traces the progressive application of core programming concepts across increasingly complex analytical scenarios.

The practicum centres on three fundamental pillars of programming: functions, loops, and conditional branching. These concepts underpin virtually all data wrangling, simulation, and automated reporting workflows in professional data science practice. Each task is implemented in both R and Python to reinforce language-agnostic thinking and to highlight syntactic differences between the two ecosystems.

All visualisations in this document are interactive, rendered with the plotly library in both R and Python, so readers can hover for precise values, zoom into regions of interest, and toggle series visibility directly in the browser.

Research Objectives

The overarching objectives of this practicum are:

Understand and implement functions in R and Python as reusable, self-contained units of logic — including input validation and safe error handling.
Master loop constructs — both single and nested for loops — to automate repetitive computation across multi-dimensional datasets.
Apply conditional logic (if-else if-else / if-elif-else) for data-driven decision making: tiered discounts, performance classification, and KPI assignment.
Build and evaluate data simulations using Monte Carlo sampling and multi-entity nested loop structures.
Perform data transformation and feature engineering through min-max normalisation, Z-score standardisation, and categorical variable creation.
Communicate results through interactive visualisation using plotly for hover-enabled, zoomable charts.
Automate report generation by encapsulating all reporting logic inside a function called iteratively via a loop — the foundation of production-grade automated reporting pipelines.

Task 1 — Function Definition, Nested Loops & Formula Comparison

This task covers Section 3 of Data Science Programming — Functions and Loops. The specific objectives are:

Build a dynamic multi-formula function with conditional branching to evaluate four mathematical expressions.
Apply a nested loop to compute formula values across x = 1 to 20.
Implement input validation so the function fails safely on unrecognised inputs.
Produce an interactive comparative visualisation of all four formulas.

Formulas Overview

Formula Type	Expression	Behaviour
Linear	f(x) = 2x + 3	Constant growth rate — straight line
Quadratic	f(x) = x² - 2x + 1	Accelerating growth — U-shaped curve
Cubic	f(x) = 0.5x³ - 3x + 2	Initial dip, then steep rise
Exponential	f(x) = 2e^(0.2x)	Compounding growth: asymptotically fastest, but below the cubic for x ≤ 20

Task 1.1 — Function Definition & Input Validation

compute_formula() accepts a numeric x and a string formula, and returns the computed result. The else branch acts as an input guard: for an unrecognised formula name it returns NA (R) or None (Python) together with a descriptive warning, instead of stopping with an error or returning a misleading number.

Feature	R	Python
Branch keyword	`else if`	`elif`
Exponential	`exp()`	`math.exp()`
Invalid input handling	`warning()` + `return(NA)`	`print()` + `return None`

— R

# compute_formula: evaluates one of four mathematical formulas at a given x.
# Returns NA with a warning if an unrecognised formula name is supplied.
compute_formula <- function(x, formula) {
  if      (formula == "linear")      return(2 * x + 3)
  else if (formula == "quadratic")   return(x^2 - 2 * x + 1)
  else if (formula == "cubic")       return(0.5 * x^3 - 3 * x + 2)
  else if (formula == "exponential") return(2 * exp(0.2 * x))
  else {
    warning(paste0("[compute_formula] Unknown formula: '", formula, "'."))
    return(NA)
  }
}

cat("=== Spot Checks (x = 5) ===\n")

## === Spot Checks (x = 5) ===

cat("linear      :", compute_formula(5, "linear"),                "\n")

## linear      : 13

cat("quadratic   :", compute_formula(5, "quadratic"),             "\n")

## quadratic   : 16

cat("cubic       :", compute_formula(5, "cubic"),                 "\n")

## cubic       : 49.5

cat("exponential :", round(compute_formula(5, "exponential"), 4), "\n")

## exponential : 5.4366

cat("\n=== Validation — Unknown Formula ===\n")

## 
## === Validation — Unknown Formula ===

cat("Returned   :", compute_formula(5, "logarithmic"), "\n")

## Returned   : NA
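The Python column of the comparison table in this task can be sketched as follows. This is a minimal illustration rather than the report's own Python listing; it assumes the same four formula names and, as the table states, uses `elif`, `math.exp()`, and `print()` followed by `return None` for unrecognised input.

```python
import math


def compute_formula(x, formula):
    """Evaluate one of four formulas at x; return None for an unknown name."""
    if formula == "linear":
        return 2 * x + 3
    elif formula == "quadratic":
        return x**2 - 2 * x + 1
    elif formula == "cubic":
        return 0.5 * x**3 - 3 * x + 2
    elif formula == "exponential":
        return 2 * math.exp(0.2 * x)
    else:
        # Input guard: report the problem and return a sentinel instead of raising.
        print(f"[compute_formula] Unknown formula: '{formula}'.")
        return None


# Spot checks at x = 5, mirroring the R calls above.
for name in ("linear", "quadratic", "cubic", "exponential", "logarithmic"):
    print(name, compute_formula(5, name))
```

Returning `None` keeps the calling loop running, at the cost that callers must check for it before doing arithmetic on the result.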

Task 1.2 — Nested Loop & Dataset Construction

A nested loop evaluates every formula at every x from 1 to 20, producing an 80-row dataset (4 formulas × 20 values).

— R

formula_list <- c("linear", "quadratic", "cubic", "exponential")
x_range      <- 1:20

# FIX: collect rows in a pre-allocated list, then call do.call(rbind) once; much faster than rbind() on every iteration
rows_list <- vector("list", length(formula_list) * length(x_range))
idx <- 1L
for (f in formula_list) {
  for (x in x_range) {
    rows_list[[idx]] <- data.frame(x = x, y = compute_formula(x, f),
                                   formula = f, stringsAsFactors = FALSE)
    idx <- idx + 1L
  }
}
results_df <- do.call(rbind, rows_list)

cat("Rows generated:", nrow(results_df),
    "(", length(formula_list), "formulas ×", length(x_range), "x values )\n\n")

## Rows generated: 80 ( 4 formulas × 20 x values )

head(results_df, 8)
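The same two-level iteration can be sketched in Python. The example below is an assumption-laden illustration: it stores the four formulas in a dictionary named `FORMULAS` (a hypothetical name) instead of calling compute_formula(), and it collects rows as plain dictionaries rather than a data frame.

```python
import math

# Formula lookup, assumed equivalent to the four branches of compute_formula().
FORMULAS = {
    "linear": lambda x: 2 * x + 3,
    "quadratic": lambda x: x**2 - 2 * x + 1,
    "cubic": lambda x: 0.5 * x**3 - 3 * x + 2,
    "exponential": lambda x: 2 * math.exp(0.2 * x),
}

rows = []
for name, func in FORMULAS.items():   # outer loop: one pass per formula
    for x in range(1, 21):            # inner loop: x = 1, 2, ..., 20
        rows.append({"x": x, "y": func(x), "formula": name})

print("Rows generated:", len(rows))
```

Appending to a list and converting once at the end follows the same reasoning as the R version: growing a table row by row is the expensive part.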

Task 1.3 — Interactive Visualisation: Four Formulas on One Plot

— R

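# Packages used by the plotting and summary code from here on; the report
# presumably loads them in a setup chunk that is not shown.
library(plotly); library(dplyr); library(tidyr); library(knitr); library(RColorBrewer)
formula_colours <- c(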
  "linear"      = "#2196F3",
  "quadratic"   = "#4CAF50",
  "cubic"       = "#FF9800",
  "exponential" = "#E91E63"
)
formula_labels <- c(
  "linear"      = "Linear: f(x) = 2x + 3",
  "quadratic"   = "Quadratic: f(x) = x\u00B2 - 2x + 1",
  "cubic"       = "Cubic: f(x) = 0.5x\u00B3 - 3x + 2",
  "exponential" = "Exponential: f(x) = 2e^(0.2x)"
)

p1 <- plot_ly()
for (f in formula_list) {
  df_sub <- results_df[results_df$formula == f, ]
  p1 <- add_trace(p1,
    data      = df_sub, x = ~x, y = ~y,
    type      = "scatter", mode = "lines+markers",
    name      = formula_labels[[f]],
    line      = list(color = formula_colours[[f]], width = 2.5),
    marker    = list(color = formula_colours[[f]], size = 6))
}
p1 <- layout(p1,
  title     = list(text = "Task 1: Comparison of Four Mathematical Formulas (x = 1 to 20)",
                   font = list(size = 15)),
  xaxis     = list(title = "x value", showgrid = TRUE, gridcolor = "#e8e8e8"),
  yaxis     = list(title = "f(x)",    showgrid = TRUE, gridcolor = "#e8e8e8"),
  legend    = list(title = list(text = "<b>Formula</b>")),
  hovermode = "x unified",
  plot_bgcolor  = "white",
  paper_bgcolor = "white")
p1

Task 1.4 — Descriptive Statistics per Formula

— R

summary_table <- results_df %>%
  group_by(formula) %>%
  summarise(
    Min    = round(min(y),    2),
    Max    = round(max(y),    2),
    Mean   = round(mean(y),   2),
    Median = round(median(y), 2),
    SD     = round(sd(y),     2),
    .groups = "drop"
  ) %>% rename(Formula = formula)

kable(summary_table,
      col.names = c("Formula","Min","Max","Mean","Median","Std Dev"),
      align = "lrrrrr",
      caption = "Descriptive statistics for each formula across x = 1 to 20")

Descriptive statistics for each formula across x = 1 to 20
Formula	Min	Max	Mean	Median	Std Dev
cubic	-0.50	3942.0	1073.00	553.25	1236.10
exponential	2.44	109.2	29.57	16.41	31.35
linear	5.00	43.0	24.00	24.00	11.83
quadratic	0.00	361.0	123.50	90.50	116.44
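Some of the figures in the table can be checked independently. The following Python sketch recomputes the cubic and exponential rows with the standard `statistics` module; it is an added verification, not part of the original report, and it assumes the sample standard deviation (n − 1 denominator) that R's sd() uses.

```python
import math
import statistics

x_values = range(1, 21)
cubic = [0.5 * x**3 - 3 * x + 2 for x in x_values]
exponential = [2 * math.exp(0.2 * x) for x in x_values]

for name, ys in (("cubic", cubic), ("exponential", exponential)):
    print(
        name,
        round(min(ys), 2),
        round(max(ys), 2),
        round(statistics.mean(ys), 2),
        round(statistics.median(ys), 2),
        round(statistics.stdev(ys), 2),  # sample SD, as in R's sd()
    )
```

Over x = 1 to 20 the cubic maximum (3942) is far above the exponential maximum (about 109.2), which matches the table.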

Summary — Task 1

Function design with conditional branching provides a clean, reusable interface for four distinct formula types.
Nested loops are the natural structure for two-dimensional computation, producing an 80-row dataset with minimal code.
Input validation via the else branch ensures safe failure — the function returns NA/None with a descriptive warning.
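Visualisation shows that, over x = 1 to 20, the cubic formula reaches by far the largest values (maximum 3942 against 109.2 for the exponential); the exponential only overtakes the cubic at much larger x, outside the plotted range. Linear remains the most predictable throughout.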

Task 2 — Nested Sales Simulation with Conditional Discounts

The objectives for Task 2 are:

Build simulate_sales() — a nested simulation generating a multi-salesperson, multi-day sales dataset.
Apply tiered conditional discount logic based on sales amount thresholds.
Compute cumulative sales per salesperson using a dedicated helper function.
Produce an interactive cumulative sales chart.

Simulation Parameters

Parameter / Rule	Detail
n_salesperson	Number of salespersons to simulate
days	Number of trading days per salesperson
Discount — High	sales_amount > 800 → 20% discount
Discount — Medium	sales_amount > 500 → 10% discount
No Discount	sales_amount ≤ 500 → no discount

Task 2.1 — Sales Simulation Function

— R

set.seed(42)

# simulate_sales: generates a sales dataset with tiered discount logic.
# FIX: collect rows in a pre-allocated list, then call do.call(rbind) once; much faster than rbind() on every iteration
simulate_sales <- function(n_salesperson, days) {
  rows_list <- vector("list", n_salesperson * days)
  idx <- 1L
  for (s in 1:n_salesperson) {
    for (d in 1:days) {
      amount   <- round(runif(1, 200, 1000), 2)
      discount <- if      (amount > 800) 0.20
                  else if (amount > 500) 0.10
                  else                  0.00
      rows_list[[idx]] <- data.frame(sales_id = s, day = d,
                                     sales_amount = amount, discount_rate = discount,
                                     stringsAsFactors = FALSE)
      idx <- idx + 1L
    }
  }
  do.call(rbind, rows_list)
}

sales_data <- simulate_sales(n_salesperson = 5, days = 10)
cat("Dataset:", nrow(sales_data), "rows ×", ncol(sales_data), "columns\n\n")

## Dataset: 50 rows × 4 columns

head(sales_data, 10)
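A Python counterpart might look like the sketch below. It is illustrative only: `discount_rate()` is a hypothetical helper that isolates the tier rule, the `seed` parameter is an added convenience, and Python's random stream differs from R's, so the generated amounts will not match the R output.

```python
import random


def discount_rate(amount):
    """Tiered discount: above 800 gives 20%, above 500 gives 10%, otherwise none."""
    if amount > 800:
        return 0.20
    elif amount > 500:
        return 0.10
    else:
        return 0.00


def simulate_sales(n_salesperson, days, seed=42):
    """Return one record per (salesperson, day) with a random amount and its discount."""
    rng = random.Random(seed)  # local generator keeps the run reproducible
    rows = []
    for s in range(1, n_salesperson + 1):      # outer loop: salespersons
        for d in range(1, days + 1):           # inner loop: trading days
            amount = round(rng.uniform(200, 1000), 2)
            rows.append({
                "sales_id": s,
                "day": d,
                "sales_amount": amount,
                "discount_rate": discount_rate(amount),
            })
    return rows


sales_data = simulate_sales(n_salesperson=5, days=10)
print("Dataset:", len(sales_data), "rows")
```

Because the conditions are checked from the highest threshold down, an amount of exactly 800 falls into the 10% tier and exactly 500 receives no discount.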

Task 2.2 — Cumulative Sales per Salesperson

— R

calc_cumulative <- function(df, sid) {
  sub_df <- df[df$sales_id == sid, ]
  sub_df <- sub_df[order(sub_df$day), ]
  cumsum(sub_df$sales_amount)
}

# FIX: collect per-salesperson frames in a list, then call do.call(rbind) once
cum_rows <- vector("list", length(unique(sales_data$sales_id)))
for (i in seq_along(unique(sales_data$sales_id))) {
  s <- sort(unique(sales_data$sales_id))[i]
  cum_vals <- calc_cumulative(sales_data, s)
  cum_rows[[i]] <- data.frame(sales_id = s, day = seq_along(cum_vals),
                               cumulative = cum_vals, stringsAsFactors = FALSE)
}
cum_df <- do.call(rbind, cum_rows)

head(cum_df, 10)
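The running-total logic can be sketched in Python with `itertools.accumulate`. The records below are hypothetical toy values standing in for sales_data, chosen so the result is easy to check by hand.

```python
from itertools import accumulate

# Hypothetical toy records (deliberately out of day order).
records = [
    {"sales_id": 1, "day": 2, "sales_amount": 300.0},
    {"sales_id": 1, "day": 1, "sales_amount": 250.0},
    {"sales_id": 2, "day": 1, "sales_amount": 900.0},
    {"sales_id": 2, "day": 2, "sales_amount": 100.0},
    {"sales_id": 1, "day": 3, "sales_amount": 450.0},
]


def calc_cumulative(rows, sid):
    """Running total of sales_amount for one salesperson, ordered by day."""
    own = sorted((r for r in rows if r["sales_id"] == sid), key=lambda r: r["day"])
    return list(accumulate(r["sales_amount"] for r in own))


cumulative = {
    sid: calc_cumulative(records, sid)
    for sid in sorted({r["sales_id"] for r in records})
}
print(cumulative)  # {1: [250.0, 550.0, 1000.0], 2: [900.0, 1000.0]}
```

Sorting by day before accumulating matters: a cumulative sum over unordered rows would still end at the right total but give wrong intermediate values.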

Task 2.3 — Summary Statistics & Interactive Cumulative Sales Plot

— R

summary_sales <- sales_data %>%
  group_by(sales_id) %>%
  summarise(
    Total_Sales    = round(sum(sales_amount), 2),
    Avg_Daily      = round(mean(sales_amount), 2),
    Total_Discount = round(sum(sales_amount * discount_rate), 2),
    Days_High      = sum(discount_rate == 0.20),
    Days_Medium    = sum(discount_rate == 0.10),
    Days_None      = sum(discount_rate == 0.00),
    .groups = "drop"
  ) %>% rename("Salesperson ID" = sales_id)

kable(summary_sales,
      col.names = c("Salesperson","Total Sales","Avg Daily",
                    "Total Discount","Days High","Days Medium","Days None"),
      align = "crrrrrr",
      caption = "Summary statistics per salesperson (5 persons × 10 days)")

Summary statistics per salesperson (5 persons × 10 days)
Salesperson	Total Sales	Avg Daily	Total Discount	Days High	Days Medium	Days None
1	7090.09	709.01	909.93	3	5	2
2	6720.24	672.02	890.42	3	5	2
3	6923.09	692.31	1101.12	5	3	2
4	6153.74	615.37	801.90	3	4	3
5	7066.52	706.65	1066.84	4	5	1

pal2 <- brewer.pal(5, "Set1")
p2   <- plot_ly()
for (i in seq_along(unique(cum_df$sales_id))) {
  s      <- sort(unique(cum_df$sales_id))[i]
  df_sub <- cum_df[cum_df$sales_id == s, ]
  p2 <- add_trace(p2,
    data = df_sub, x = ~day, y = ~cumulative,
    type = "scatter", mode = "lines+markers",
    name = paste("Salesperson", s),
    line   = list(color = pal2[i], width = 2.2),
    marker = list(color = pal2[i], size  = 6))
}
p2 <- layout(p2,
  title     = list(text = "Task 2: Cumulative Sales per Salesperson (10 Days)"),
  xaxis     = list(title = "Day", dtick = 1),
  yaxis     = list(title = "Cumulative Sales Amount"),
  hovermode = "x unified",
  legend    = list(title = list(text = "<b>Salesperson</b>")),
  plot_bgcolor  = "white",
  paper_bgcolor = "white")
p2

Task 3 — Multi-Level Performance Categorisation

The objectives for Task 3 are:

Build categorize_performance() — a function assigning one of five performance labels to a sales amount.
Apply the function across a vector of 100 values using a loop.
Compute the percentage distribution across all five categories.
Visualise with an interactive bar chart and pie chart.

Performance Categories

Category	Condition	Plot Colour
Excellent	sales_amount > 800	#2ecc71
Very Good	sales_amount > 650	#3498db
Good	sales_amount > 500	#f39c12
Average	sales_amount > 350	#e67e22
Poor	sales_amount ≤ 350	#e74c3c

Task 3.1 — Five-Level Classification Function

— R

set.seed(7)

categorize_performance <- function(sales_amount) {
  if      (sales_amount > 800) return("Excellent")
  else if (sales_amount > 650) return("Very Good")
  else if (sales_amount > 500) return("Good")
  else if (sales_amount > 350) return("Average")
  else                          return("Poor")
}

sales_vector <- round(runif(100, 100, 1000), 2)
# FIX: pre-allocate a character vector instead of binding row by row
categories   <- character(length(sales_vector))
for (i in seq_along(sales_vector)) {
  categories[i] <- categorize_performance(sales_vector[i])
}

result_df <- data.frame(sales_amount = sales_vector, category = categories,
                         stringsAsFactors = FALSE)
head(result_df, 10)
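In Python the same five-level rule uses an `if`/`elif`/`else` chain. This is a minimal sketch under the thresholds listed in the Performance Categories table; the random values come from Python's generator and will differ from the R sample.

```python
import random


def categorize_performance(sales_amount):
    """Map a sales amount to one of five labels, checking the highest tier first."""
    if sales_amount > 800:
        return "Excellent"
    elif sales_amount > 650:
        return "Very Good"
    elif sales_amount > 500:
        return "Good"
    elif sales_amount > 350:
        return "Average"
    else:
        return "Poor"


rng = random.Random(7)
sales_vector = [round(rng.uniform(100, 1000), 2) for _ in range(100)]

categories = []
for amount in sales_vector:  # explicit loop, mirroring the R version
    categories.append(categorize_performance(amount))

print(list(zip(sales_vector, categories))[:5])
```

All comparisons are strict, so a value sitting exactly on a threshold (for example 650) is assigned to the tier below it.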

Task 3.2 — Category Frequency Distribution

— R

level_order           <- c("Excellent","Very Good","Good","Average","Poor")
freq_table            <- as.data.frame(table(result_df$category))
colnames(freq_table)  <- c("Category","Count")
freq_table$Percentage <- round(freq_table$Count / nrow(result_df) * 100, 1)
freq_table$Category   <- factor(freq_table$Category, levels = level_order)
freq_table            <- freq_table[order(freq_table$Category), ]

kable(freq_table, row.names = FALSE,
      col.names = c("Category","Count","Percentage (%)"),
      align = "lrr",
      caption = "Distribution of performance categories across 100 sales values")

Distribution of performance categories across 100 sales values
Category	Count	Percentage (%)
Excellent	22	22
Very Good	17	17
Good	15	15
Average	21	21
Poor	25	25
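Each percentage is the category count divided by the total number of values. A minimal Python sketch with `collections.Counter` follows; the label list is a stand-in rebuilt from the counts reported in the table, not the simulated data itself.

```python
from collections import Counter

LEVEL_ORDER = ["Excellent", "Very Good", "Good", "Average", "Poor"]

# Stand-in labels reconstructed from the counts reported in the table.
reported_counts = {"Excellent": 22, "Very Good": 17, "Good": 15, "Average": 21, "Poor": 25}
labels = [cat for cat, n in reported_counts.items() for _ in range(n)]

counts = Counter(labels)
total = sum(counts.values())

# One (category, count, percentage) tuple per level, in display order.
freq_table = [
    (cat, counts[cat], round(counts[cat] / total * 100, 1))
    for cat in LEVEL_ORDER
]
for row in freq_table:
    print(*row, sep="\t")
```

Iterating over `LEVEL_ORDER` rather than over the counter fixes the display order, which plays the role of the factor levels in the R code.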

Task 3.3 — Interactive Bar Chart & Pie Chart

— R

cat_colours  <- c("Excellent"="#2ecc71","Very Good"="#3498db",
                  "Good"="#f39c12","Average"="#e67e22","Poor"="#e74c3c")
freq_plot    <- freq_table
freq_plot$Category <- as.character(freq_plot$Category)

p_bar3 <- plot_ly(freq_plot,
  x      = ~Category, y = ~Count,
  type   = "bar",
  marker = list(color = unname(cat_colours[freq_plot$Category])),
  text   = ~paste0(Percentage, "%"),
  textposition = "outside",
  hovertemplate = "<b>%{x}</b><br>Count: %{y}<br>Share: %{text}<extra></extra>"
) %>% layout(
  title  = "Task 3: Performance Category Distribution (n = 100)",
  xaxis  = list(title = "Category", categoryorder = "array",
                categoryarray = level_order),
  yaxis  = list(title = "Count"),
  showlegend   = FALSE,
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

p_pie3 <- plot_ly(freq_plot,
  labels  = ~Category, values = ~Count,
  type    = "pie",
  marker  = list(colors = unname(cat_colours[freq_plot$Category])),
  textinfo      = "label+percent",
  hovertemplate = "<b>%{label}</b><br>Count: %{value}<br>%{percent}<extra></extra>"
) %>% layout(
  title         = "Task 3: Proportional Share by Category",
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

p_bar3

p_pie3

Task 4 — Multi-Company Dataset Simulation

The objectives for Task 4 are:

Build generate_company_data() with a nested loop to simulate employee records across multiple companies.
Apply conditional KPI assignment for top performers (performance score > 75).
Produce a per-company summary table.
Visualise with an interactive grouped bar chart and scatter plot.

Dataset Schema

Column	Type	Description
company_id	integer	Company identifier (1 to n_company)
employee_id	integer	Employee identifier (1 to n_employees per company)
salary	numeric	Monthly salary in thousands — Uniform(3, 15)
department	character	One of: HR, IT, Finance, Marketing, Operations
performance_score	numeric	Performance score — Normal(70, 15), capped to [0, 100]
KPI_score	numeric	KPI: Uniform(91,100) if top performer, else Uniform(50,89)

Task 4.1 — Company Data Generation Function

— R

set.seed(123)

# FIX: collect rows in a pre-allocated list, then call do.call(rbind) once; much faster than rbind() on every iteration
generate_company_data <- function(n_company, n_employees) {
  departments <- c("HR","IT","Finance","Marketing","Operations")
  rows_list   <- vector("list", n_company * n_employees)
  idx <- 1L
  for (c in 1:n_company) {
    for (e in 1:n_employees) {
      salary <- round(runif(1, 3, 15), 2)
      dept   <- sample(departments, 1)
      perf   <- round(max(0, min(100, rnorm(1, mean = 70, sd = 15))), 1)
      kpi    <- if (perf > 75) round(runif(1, 91, 100), 1) else round(runif(1, 50, 89), 1)
      rows_list[[idx]] <- data.frame(company_id = c, employee_id = e, salary = salary,
                                     department = dept, performance_score = perf,
                                     KPI_score = kpi, stringsAsFactors = FALSE)
      idx <- idx + 1L
    }
  }
  do.call(rbind, rows_list)
}

company_data <- generate_company_data(n_company = 4, n_employees = 20)
cat("Dataset:", nrow(company_data), "rows ×", ncol(company_data), "columns\n\n")

## Dataset: 80 rows × 6 columns

head(company_data, 8)
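A possible Python version of the generator is sketched below, following the Dataset Schema table. It is an illustration under assumptions: records are plain dictionaries, the `seed` parameter is added for reproducibility, and the values will differ from the R output because the random streams are not shared.

```python
import random

DEPARTMENTS = ["HR", "IT", "Finance", "Marketing", "Operations"]


def generate_company_data(n_company, n_employees, seed=123):
    """Simulate employee records for several companies with a nested loop."""
    rng = random.Random(seed)
    rows = []
    for c in range(1, n_company + 1):          # outer loop: companies
        for e in range(1, n_employees + 1):    # inner loop: employees
            salary = round(rng.uniform(3, 15), 2)
            dept = rng.choice(DEPARTMENTS)
            # Normal(70, 15) score, clipped to the range [0, 100].
            perf = round(max(0.0, min(100.0, rng.gauss(70, 15))), 1)
            # Conditional KPI: top performers draw from a higher range.
            if perf > 75:
                kpi = round(rng.uniform(91, 100), 1)
            else:
                kpi = round(rng.uniform(50, 89), 1)
            rows.append({
                "company_id": c, "employee_id": e, "salary": salary,
                "department": dept, "performance_score": perf, "KPI_score": kpi,
            })
    return rows


company_data = generate_company_data(n_company=4, n_employees=20)
print("Dataset:", len(company_data), "rows")
```

The two KPI ranges do not overlap, so the KPI score alone is enough to tell whether a record was generated as a top performer.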

Task 4.2 — Per-Company Summary Table

— R

company_summary <- company_data %>%
  group_by(company_id) %>%
  summarise(
    Avg_Salary      = round(mean(salary), 2),
    Avg_Performance = round(mean(performance_score), 1),
    Max_KPI         = round(max(KPI_score), 1),
    Top_Performers  = sum(performance_score > 75),
    .groups = "drop"
  ) %>% rename("Company ID" = company_id)

kable(company_summary,
      col.names = c("Company","Avg Salary (k)","Avg Performance","Max KPI","Top Performers"),
      align = "crrrc",
      caption = "Per-company summary: 4 companies × 20 employees each")

Per-company summary: 4 companies × 20 employees each
Company	Avg Salary (k)	Avg Performance	Max KPI	Top Performers
1	9.31	73.3	97.4	8
2	9.29	71.0	99.4	7
3	9.92	67.8	99.6	5
4	8.71	71.8	99.2	6

Task 4.3 — Interactive Grouped Bar Chart & Scatter Plot

— R

bar_data <- company_summary %>%
  rename(company_id = "Company ID") %>%
  select(company_id, Avg_Salary, Avg_Performance) %>%
  pivot_longer(cols = c(Avg_Salary, Avg_Performance),
               names_to = "Metric", values_to = "Value")

p_bar4 <- plot_ly(bar_data,
  x      = ~factor(company_id), y = ~Value,
  color  = ~Metric,
  colors = c("Avg_Salary" = "#3498db","Avg_Performance" = "#e74c3c"),
  type   = "bar",
  text   = ~round(Value, 2), textposition = "outside"
) %>% layout(
  barmode = "group",
  title   = "Task 4: Average Salary vs. Performance by Company",
  xaxis   = list(title = "Company ID"),
  yaxis   = list(title = "Value"),
  legend  = list(title = list(text = "<b>Metric</b>")),
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

company_data$company_label <- paste("Company", company_data$company_id)

p_scatter4 <- plot_ly(company_data,
  x      = ~performance_score, y = ~KPI_score,
  color  = ~company_label,
  type   = "scatter", mode = "markers",
  marker = list(size = 8, opacity = 0.7),
  hovertemplate = "<b>%{fullData.name}</b><br>Performance: %{x}<br>KPI: %{y}<extra></extra>"
) %>% layout(
  title  = "Task 4: Performance Score vs. KPI Score (per Employee)",
  xaxis  = list(title = "Performance Score"),
  yaxis  = list(title = "KPI Score"),
  legend = list(title = list(text = "<b>Company</b>")),
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

p_bar4

p_scatter4

Task 5 — Monte Carlo Simulation: π Estimation & Probability

The objectives for Task 5 are:

Build monte_carlo_pi() to estimate π by random point sampling in a unit square.
Use a loop to count points falling inside the quarter-circle x² + y² ≤ 1.
Compute the probability of a point landing in a sub-square region.
Visualise with an interactive scatter plot.

The Monte Carlo Principle

Item	Description
Unit square	x ∈ [0,1], y ∈ [0,1] — area = 1
Quarter-circle	x² + y² ≤ 1 — area = π/4
Key ratio	points inside circle / total ≈ π/4
π estimate	π ≈ 4 × (points inside / total points)
Sub-square	x ∈ [0,0.5], y ∈ [0,0.5] — area = 0.25 → P ≈ 0.25

Task 5.1 — Monte Carlo π Estimation

— R

set.seed(2024)

# FIX: pre-allocate numeric vectors instead of calling rbind() on every iteration
monte_carlo_pi <- function(n_points) {
  inside <- 0L; outside <- 0L
  # Pre-allocate vectors for plotting (at most 2,000 points)
  plot_n  <- min(n_points, 2000L)
  pts_x   <- numeric(plot_n)
  pts_y   <- numeric(plot_n)
  pts_st  <- character(plot_n)

  for (i in seq_len(n_points)) {
    x <- runif(1); y <- runif(1)
    if (x^2 + y^2 <= 1) { inside  <- inside  + 1L; status <- "inside"  }
    else                 { outside <- outside + 1L; status <- "outside" }
    if (i <= plot_n) {
      pts_x[i] <- x; pts_y[i] <- y; pts_st[i] <- status
    }
  }
  pts <- data.frame(x = pts_x, y = pts_y, status = pts_st, stringsAsFactors = FALSE)
  list(pi_estimate   = 4 * inside / n_points,
       inside_count  = inside,
       outside_count = outside,
       points        = pts)
}

mc_result <- monte_carlo_pi(10000)

cat("=== Monte Carlo π Estimation (n = 10,000) ===\n")

## === Monte Carlo π Estimation (n = 10,000) ===

cat("Points inside circle :", mc_result$inside_count,                      "\n")

## Points inside circle : 7865

cat("Points outside circle:", mc_result$outside_count,                     "\n")

## Points outside circle: 2135

cat("Estimated π          :", round(mc_result$pi_estimate, 6),              "\n")

## Estimated π          : 3.146

cat("True π               :", round(pi, 6),                                "\n")

## True π               : 3.141593

cat("Absolute error       :", round(abs(mc_result$pi_estimate - pi), 6),   "\n")

## Absolute error       : 0.004407
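The estimator can be sketched in Python as follows. This is a minimal version that only counts hits and omits the stored plotting points; the `seed` parameter is an added assumption, and the counts will differ from the R run because the random streams are different.

```python
import math
import random


def monte_carlo_pi(n_points, seed=2024):
    """Estimate pi from the share of random points inside the quarter-circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()   # uniform point in the unit square
        if x * x + y * y <= 1:              # inside the quarter-circle
            inside += 1
    return {
        "pi_estimate": 4 * inside / n_points,
        "inside_count": inside,
        "outside_count": n_points - inside,
    }


mc_result = monte_carlo_pi(10_000)
print("Estimated pi  :", mc_result["pi_estimate"])
print("Absolute error:", abs(mc_result["pi_estimate"] - math.pi))
```

The factor of 4 comes from the key ratio in the table: the hit share approximates π/4, the area of the quarter-circle inside a square of area 1.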

Task 5.2 — Sub-Square Probability Analysis

— R

set.seed(2024)
all_x <- runif(10000); all_y <- runif(10000)

in_subsquare   <- sum(all_x <= 0.5 & all_y <= 0.5)
prob_subsquare <- round(in_subsquare / 10000, 4)

cat("=== Sub-Square Probability (x ≤ 0.5 AND y ≤ 0.5) ===\n")

## === Sub-Square Probability (x ≤ 0.5 AND y ≤ 0.5) ===

cat("Points in sub-square  :", in_subsquare,                           "\n")

## Points in sub-square  : 2549

cat("Estimated probability :", prob_subsquare,                         "\n")

## Estimated probability : 0.2549

cat("Theoretical P         : 0.2500\n")

## Theoretical P         : 0.2500

cat("Absolute error        :", round(abs(prob_subsquare - 0.25), 4),  "\n")

## Absolute error        : 0.0049
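To judge whether the errors reported in Tasks 5.1 and 5.2 are plausible, they can be compared with the binomial standard error of a proportion, sqrt(p(1 − p)/n). The sketch below is an added check, not part of the original report; it uses the theoretical p values from the table (π/4 and 0.25) and the observed errors from the printed output.

```python
import math

n = 10_000

# Sub-square: theoretical hit probability p = 0.25.
se_sub = math.sqrt(0.25 * 0.75 / n)

# Pi estimate: the hit share has p = pi/4 and the estimate is 4 times that share,
# so its standard error is 4 times the standard error of the share.
p_circle = math.pi / 4
se_pi = 4 * math.sqrt(p_circle * (1 - p_circle) / n)

observed_sub_error = 0.0049    # from the Task 5.2 output
observed_pi_error = 0.004407   # from the Task 5.1 output

print("SE of sub-square probability:", round(se_sub, 5))
print("SE of pi estimate           :", round(se_pi, 5))
print("Sub-square error in SE units:", round(observed_sub_error / se_sub, 2))
print("Pi error in SE units        :", round(observed_pi_error / se_pi, 2))
```

With n = 10,000 the standard errors are about 0.0043 for the sub-square probability and about 0.0164 for π, so both observed errors are of the size expected from sampling noise alone.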

Task 5.3 — Interactive Scatter Plot: Inside vs. Outside the Circle

— R

points_df <- mc_result$points
theta_seq <- seq(0, pi / 2, length.out = 200)
arc_df    <- data.frame(x = cos(theta_seq), y = sin(theta_seq))

p_mc <- plot_ly() %>%
  add_trace(data = points_df[points_df$status == "outside", ],
    x = ~x, y = ~y, type = "scatter", mode = "markers",
    name   = "Outside circle",
    marker = list(color = "#e74c3c", size = 3, opacity = 0.5)) %>%
  add_trace(data = points_df[points_df$status == "inside", ],
    x = ~x, y = ~y, type = "scatter", mode = "markers",
    name   = "Inside circle",
    marker = list(color = "#3498db", size = 3, opacity = 0.5)) %>%
  add_trace(data = arc_df, x = ~x, y = ~y,
    type = "scatter", mode = "lines", name = "Circle boundary",
    line = list(color = "black", width = 1.8)) %>%
  layout(
    title = list(text = paste0(
      "Task 5: Monte Carlo Simulation (2,000 points shown)<br>",
      "<sub>Estimated π = ", round(mc_result$pi_estimate, 5),
      " | True π = ", round(pi, 5), "</sub>")),
    xaxis   = list(title = "x", range = c(0, 1), scaleanchor = "y"),
    yaxis   = list(title = "y", range = c(0, 1)),
    shapes  = list(list(
      type = "rect", x0 = 0, x1 = 0.5, y0 = 0, y1 = 0.5,
      line = list(color = "purple", dash = "dash", width = 1.5),
      fillcolor = "rgba(128,0,128,0.05)")),
    annotations = list(list(
      x = 0.25, y = 0.55, text = "Sub-square (P ≈ 0.25)",
      showarrow = FALSE, font = list(color = "purple", size = 11))),
    plot_bgcolor  = "white",
    paper_bgcolor = "white")
p_mc

Task 6 — Advanced Data Transformation & Feature Engineering

The objectives for Task 6 are:

Build normalize_columns() — loop-based min-max normalisation (scales to [0, 1]).
Build z_score() — loop-based Z-score standardisation (mean = 0, sd = 1).
Engineer two categorical features: performance_category and salary_bracket.
Compare distributions before and after transformation with interactive histograms.

Transformation Reference

Method	Formula	Output Range	Typical Use Case
Min-Max Normalisation	x_norm = (x − min) / (max − min)	[0, 1] — bounded	Distance-based models, neural networks
Z-Score Standardisation	x_z = (x − mean) / sd	Unbounded — centred at 0	Regression, clustering — unit variance

Task 6.1 — Normalisation Functions

— R

set.seed(99)

# FIX: generate_company_data_t6 also uses the list + do.call(rbind) pattern
generate_company_data_t6 <- function(n_company, n_employees) {
  departments <- c("HR","IT","Finance","Marketing","Operations")
  rows_list   <- vector("list", n_company * n_employees)
  idx <- 1L
  for (c in 1:n_company) {
    for (e in 1:n_employees) {
      salary <- round(runif(1, 3, 15), 2)
      dept   <- sample(departments, 1)
      perf   <- round(max(0, min(100, rnorm(1, 70, 15))), 1)
      kpi    <- if (perf > 75) round(runif(1, 91, 100), 1) else round(runif(1, 50, 89), 1)
      rows_list[[idx]] <- data.frame(company_id = c, employee_id = e, salary = salary,
                                     department = dept, performance_score = perf,
                                     KPI_score = kpi, stringsAsFactors = FALSE)
      idx <- idx + 1L
    }
  }
  do.call(rbind, rows_list)
}

company_data_t6 <- generate_company_data_t6(4, 20)

normalize_columns <- function(df) {
  df_norm <- df
  for (col in names(df_norm)) {
    if (is.numeric(df_norm[[col]])) {
      col_min <- min(df_norm[[col]], na.rm = TRUE)
      col_max <- max(df_norm[[col]], na.rm = TRUE)
      if (col_max != col_min)
        df_norm[[col]] <- round((df_norm[[col]] - col_min) / (col_max - col_min), 4)
    }
  }
  return(df_norm)
}

z_score <- function(df) {
  df_z <- df
  for (col in names(df_z)) {
    if (is.numeric(df_z[[col]])) {
      col_mean <- mean(df_z[[col]], na.rm = TRUE)
      col_sd   <- sd(df_z[[col]],   na.rm = TRUE)
      if (col_sd != 0)
        df_z[[col]] <- round((df_z[[col]] - col_mean) / col_sd, 4)
    }
  }
  return(df_z)
}

df_norm_t6 <- normalize_columns(company_data_t6)
df_z_t6    <- z_score(company_data_t6)

cat("=== Original (first 5 rows) ===\n")

## === Original (first 5 rows) ===

head(company_data_t6[, c("salary","performance_score","KPI_score")], 5)

cat("=== After min-max normalisation ===\n")

## === After min-max normalisation ===

head(df_norm_t6[, c("salary","performance_score","KPI_score")], 5)

cat("=== After Z-score standardisation ===\n")

## === After Z-score standardisation ===

head(df_z_t6[, c("salary","performance_score","KPI_score")], 5)
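The two transformations can be sketched for a single column in Python. The example is illustrative: it works on plain lists, the salary values are hypothetical, and `statistics.stdev` is used because it applies the same n − 1 denominator as R's sd().

```python
import statistics


def min_max(values):
    """Scale values to [0, 1]; a constant column is returned unchanged."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # guard against division by zero
    return [round((v - lo) / (hi - lo), 4) for v in values]


def z_score(values):
    """Centre on the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1), matching R's sd()
    if sd == 0:
        return list(values)  # guard against division by zero
    return [round((v - mean) / sd, 4) for v in values]


salaries = [3.0, 6.0, 9.0, 12.0, 15.0]  # hypothetical column
print(min_max(salaries))   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_score(salaries))   # [-1.2649, -0.6325, 0.0, 0.6325, 1.2649]
```

Both functions keep the ordering and relative spacing of the values; only the scale changes, which is why the transformed histograms have the same shape as the original.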

Task 6.2 — Feature Engineering

— R

company_data_t6$performance_category <- NA_character_
company_data_t6$salary_bracket       <- NA_character_

for (i in seq_len(nrow(company_data_t6))) {
  perf   <- company_data_t6$performance_score[i]
  salary <- company_data_t6$salary[i]

  company_data_t6$performance_category[i] <-
    if      (perf > 80) "Excellent"
    else if (perf > 65) "Very Good"
    else if (perf > 50) "Good"
    else if (perf > 35) "Average"
    else                "Poor"

  company_data_t6$salary_bracket[i] <-
    if      (salary > 10) "High"
    else if (salary >  6) "Mid"
    else                  "Low"
}

head(company_data_t6[, c("salary","performance_score",
                           "performance_category","salary_bracket")], 8)
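The two engineered features are simple threshold rules, sketched below in Python. The function names `performance_category()` and `salary_bracket()` and the sample records are hypothetical; the thresholds are those used in the R loop above.

```python
def performance_category(perf):
    """Five-level label from a performance score (strict upper thresholds)."""
    if perf > 80:
        return "Excellent"
    elif perf > 65:
        return "Very Good"
    elif perf > 50:
        return "Good"
    elif perf > 35:
        return "Average"
    else:
        return "Poor"


def salary_bracket(salary):
    """Three-level bracket from a salary expressed in thousands."""
    if salary > 10:
        return "High"
    elif salary > 6:
        return "Mid"
    else:
        return "Low"


# Hypothetical records standing in for company_data_t6.
employees = [
    {"salary": 12.4, "performance_score": 81.0},
    {"salary": 6.0, "performance_score": 65.0},
    {"salary": 8.1, "performance_score": 30.5},
]
for row in employees:  # row-wise loop, mirroring the R version
    row["performance_category"] = performance_category(row["performance_score"])
    row["salary_bracket"] = salary_bracket(row["salary"])

print(employees)
```

Keeping each rule in its own function makes the thresholds testable in isolation before they are applied to the full dataset.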

Task 6.3 — Interactive Distribution Comparison

— R

sal_compare <- data.frame(
  Original   = company_data_t6$salary,
  Normalised = df_norm_t6$salary,
  Z_Score    = df_z_t6$salary
) %>% pivot_longer(everything(), names_to = "Transformation", values_to = "Value")

perf_compare <- data.frame(
  Original   = company_data_t6$performance_score,
  Normalised = df_norm_t6$performance_score,
  Z_Score    = df_z_t6$performance_score
) %>% pivot_longer(everything(), names_to = "Transformation", values_to = "Value")

tr_colours <- c("Original"="#3498db","Normalised"="#2ecc71","Z_Score"="#e74c3c")

p_sal_hist <- plot_ly()
for (tr in c("Original","Normalised","Z_Score")) {
  vals <- sal_compare$Value[sal_compare$Transformation == tr]
  p_sal_hist <- add_trace(p_sal_hist,
    x = vals, type = "histogram", nbinsx = 15, name = tr,
    marker = list(color = tr_colours[[tr]], opacity = 0.7))
}
p_sal_hist <- layout(p_sal_hist,
  barmode = "overlay",
  title   = "Task 6: Salary Distribution — Before vs. After Transformation",
  xaxis   = list(title = "Value"), yaxis = list(title = "Count"),
  plot_bgcolor = "white", paper_bgcolor = "white")

p_perf_hist <- plot_ly()
for (tr in c("Original","Normalised","Z_Score")) {
  vals <- perf_compare$Value[perf_compare$Transformation == tr]
  p_perf_hist <- add_trace(p_perf_hist,
    x = vals, type = "histogram", nbinsx = 15, name = tr,
    marker = list(color = tr_colours[[tr]], opacity = 0.7))
}
p_perf_hist <- layout(p_perf_hist,
  barmode = "overlay",
  title   = "Task 6: Performance Score Distribution — Before vs. After Transformation",
  xaxis   = list(title = "Value"), yaxis = list(title = "Count"),
  plot_bgcolor = "white", paper_bgcolor = "white")

p_sal_box <- plot_ly(sal_compare,
  x = ~Transformation, y = ~Value,
  color = ~Transformation, colors = tr_colours,
  type  = "box"
) %>% layout(
  title  = "Task 6: Salary — Boxplot Comparison",
  xaxis  = list(title = ""), yaxis = list(title = "Value"),
  showlegend   = FALSE,
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

p_sal_hist

p_perf_hist

p_sal_box

Task 7 — Mini Project: Company KPI Dashboard

Task 7 is the capstone mini project integrating all concepts from Tasks 1–6. The objectives are:

Generate a large-scale dataset for 7 companies, each with 100 employees.
Categorise every employee into KPI tiers using vectorised conditional logic (ifelse()) in place of a row-by-row loop.
Summarise per-company statistics.
Produce a full interactive dashboard of four charts.

Dataset Scale

Parameter	Value / Definition
Companies	7
Employees per company	100
Total records	700
KPI Tier — High	KPI_score > 85
KPI Tier — Mid	KPI_score 65–85
KPI Tier — Low	KPI_score < 65

Task 7.1 — Large-Scale Dataset Generation

— R

set.seed(777)

# FIX: use a list + do.call(rbind) for the 700-row dataset
generate_company_data_t7 <- function(n_company, n_employees) {
  departments <- c("HR","IT","Finance","Marketing","Operations")
  rows_list   <- vector("list", n_company * n_employees)
  idx <- 1L
  for (c in 1:n_company) {
    for (e in 1:n_employees) {
      salary <- round(runif(1, 3, 15), 2)
      dept   <- sample(departments, 1)
      perf   <- round(max(0, min(100, rnorm(1, 70, 15))), 1)
      kpi    <- if (perf > 75) round(runif(1, 85, 100), 1) else round(runif(1, 40, 84), 1)
      rows_list[[idx]] <- data.frame(company_id = c, employee_id = e, salary = salary,
                                     department = dept, performance_score = perf,
                                     KPI_score = kpi, stringsAsFactors = FALSE)
      idx <- idx + 1L
    }
  }
  do.call(rbind, rows_list)
}

kpi_df <- generate_company_data_t7(n_company = 7, n_employees = 100)
cat("Dataset:", nrow(kpi_df), "rows ×", ncol(kpi_df), "columns\n\n")

## Dataset: 700 rows × 6 columns

head(kpi_df, 6)

Task 7.2 — KPI Tier Assignment & Company Summary

— R

# FIX: use vectorised ifelse(); much faster than a row-by-row loop over 700 rows
kpi_df$KPI_tier <- ifelse(kpi_df$KPI_score > 85, "High",
                    ifelse(kpi_df$KPI_score >= 65, "Mid", "Low"))

company_summary_t7 <- kpi_df %>%
  group_by(company_id) %>%
  summarise(
    Avg_Salary = round(mean(salary), 2),
    Avg_KPI    = round(mean(KPI_score), 1),
    Top_N      = sum(KPI_tier == "High"),
    Mid_N      = sum(KPI_tier == "Mid"),
    Low_N      = sum(KPI_tier == "Low"),
    .groups = "drop"
  )

kable(company_summary_t7,
      col.names = c("Company","Avg Salary (k)","Avg KPI","KPI High","KPI Mid","KPI Low"),
      align     = "crrrrr",
      caption   = "KPI dashboard summary: 7 companies × 100 employees each")

KPI dashboard summary: 7 companies × 100 employees each
Company	Avg Salary (k)	Avg KPI	KPI High	KPI Mid	KPI Low
1	8.80	70.9	30	30	40
2	8.99	73.7	33	34	33
3	8.34	73.5	37	26	37
4	8.30	73.9	39	26	35
5	8.88	73.7	36	30	34
6	8.58	74.9	45	24	31
7	8.43	70.5	32	24	44
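
The group-and-summarise step behind this table can also be written with plain loops. The sketch below is a minimal Python illustration: the names `tier` and `summarise_by_company` are hypothetical, and the four records are toy values chosen so the arithmetic is easy to follow, not data from the report.

```python
from collections import defaultdict

# Toy records (illustrative values only, not taken from the report).
records = [
    {"company_id": 1, "salary": 8.0, "KPI_score": 90.0},
    {"company_id": 1, "salary": 6.0, "KPI_score": 70.0},
    {"company_id": 2, "salary": 10.0, "KPI_score": 50.0},
    {"company_id": 2, "salary": 12.0, "KPI_score": 86.0},
]


def tier(score):
    """Same thresholds as the report: >85 High, 65-85 Mid, <65 Low."""
    if score > 85:
        return "High"
    return "Mid" if score >= 65 else "Low"


def summarise_by_company(rows):
    # First pass: bucket the rows by company.
    groups = defaultdict(list)
    for row in rows:
        groups[row["company_id"]].append(row)

    # Second pass: one summary dict per company, in company order.
    summary = []
    for cid in sorted(groups):
        members = groups[cid]
        tiers = [tier(r["KPI_score"]) for r in members]
        summary.append({
            "company_id": cid,
            "avg_salary": round(sum(r["salary"] for r in members) / len(members), 2),
            "avg_kpi": round(sum(r["KPI_score"] for r in members) / len(members), 1),
            "high_n": tiers.count("High"),
            "mid_n": tiers.count("Mid"),
            "low_n": tiers.count("Low"),
        })
    return summary


for line in summarise_by_company(records):
    print(line)
```

A useful sanity check on any such summary is that the three tier counts add up to the company's headcount, which holds for every row of the table above (each row sums to 100).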

Task 7.3 — Interactive KPI Dashboard

— R

tier_colours <- c("High"="#2ecc71","Mid"="#f39c12","Low"="#e74c3c")

tier_counts <- kpi_df %>%
  group_by(company_id, KPI_tier) %>%
  summarise(count = n(), .groups = "drop")

p_tier <- plot_ly(tier_counts,
  x      = ~factor(company_id), y = ~count,
  color  = ~KPI_tier, colors = tier_colours,
  type   = "bar",
  hovertemplate = "Company %{x} | Tier: %{fullData.name} | Count: %{y}<extra></extra>"
) %>% layout(
  barmode = "group",
  title   = "Chart 1: KPI Tier Distribution per Company",
  xaxis   = list(title = "Company ID"),
  yaxis   = list(title = "Number of Employees"),
  legend  = list(title = list(text = "<b>KPI Tier</b>")),
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

dept_top <- kpi_df %>%
  filter(KPI_tier == "High") %>%
  group_by(department) %>%
  summarise(top_count = n(), .groups = "drop") %>%
  arrange(top_count)

p_dept <- plot_ly(dept_top,
  y = ~department, x = ~top_count,
  type = "bar", orientation = "h",
  marker = list(color = "#3498db", opacity = 0.85),
  hovertemplate = "%{y}: %{x} top performers<extra></extra>"
) %>% layout(
  title  = "Chart 2: Top Performers by Department (KPI > 85)",
  xaxis  = list(title = "Count of Top Performers"),
  yaxis  = list(title = ""),
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

kpi_df$company_label <- paste("Company", kpi_df$company_id)

p_salary <- plot_ly(kpi_df,
  x = ~company_label, y = ~salary,
  color = ~company_label, type = "box", boxpoints = "outliers",
  hovertemplate = "%{x}<br>Salary: %{y:.2f}k<extra></extra>"
) %>% layout(
  title      = "Chart 3: Salary Distribution per Company",
  xaxis      = list(title = "Company"),
  yaxis      = list(title = "Salary (thousands)"),
  showlegend = FALSE,
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

p_perf_kpi <- plot_ly(kpi_df,
  x = ~performance_score, y = ~KPI_score,
  color = ~company_label, type = "scatter", mode = "markers",
  marker = list(size = 5, opacity = 0.6),
  hovertemplate = "<b>%{fullData.name}</b><br>Performance: %{x}<br>KPI: %{y}<extra></extra>"
) %>% layout(
  title     = "Chart 4: Performance Score vs. KPI Score",
  xaxis     = list(title = "Performance Score"),
  yaxis     = list(title = "KPI Score"),
  legend    = list(title = list(text = "<b>Company</b>")),
  hovermode = "closest",
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

p_tier

p_dept

p_salary

p_perf_kpi

Task 8 (Bonus) — Automated Report Generation

The objective of Task 8 is to demonstrate automated report generation: one function encapsulates all reporting logic for a single company, and a loop calls it for every company — producing consistent, structured outputs at scale without any repetitive code.
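
The function-plus-loop pattern described here can be reduced to a few lines. The following Python sketch assumes a hypothetical `build_company_report` helper that returns the report as text; the three input rows are toy values for illustration only.

```python
def build_company_report(rows, cid):
    """Return a short text report for one company (hypothetical helper)."""
    company_rows = [r for r in rows if r["company_id"] == cid]
    avg_kpi = sum(r["KPI_score"] for r in company_rows) / len(company_rows)
    top = max(company_rows, key=lambda r: r["KPI_score"])
    lines = [
        f"COMPANY {cid} REPORT",
        f"Employees: {len(company_rows)}",
        f"Avg KPI: {avg_kpi:.1f}",
        f"Top performer: employee {top['employee_id']} (KPI {top['KPI_score']})",
    ]
    return "\n".join(lines)


# Toy input (illustrative values only).
rows = [
    {"company_id": 1, "employee_id": 1, "KPI_score": 91.0},
    {"company_id": 1, "employee_id": 2, "KPI_score": 60.0},
    {"company_id": 2, "employee_id": 1, "KPI_score": 72.5},
]

# One loop, one report per company: the loop body never changes.
reports = {}
for cid in sorted({r["company_id"] for r in rows}):
    reports[cid] = build_company_report(rows, cid)
    print(reports[cid], end="\n\n")
```

Returning the report as a string instead of printing inside the function is a design choice made for this sketch: it keeps the function easy to test and lets the caller decide where the output goes.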

Task 8.1 — Report Generation Function

— R

set.seed(777)

# FIX: use a list + do.call(rbind) for the 700-row dataset
generate_company_data_t8 <- function(n_company, n_employees) {
  departments <- c("HR","IT","Finance","Marketing","Operations")
  rows_list   <- vector("list", n_company * n_employees)
  idx <- 1L
  for (c in 1:n_company) {
    for (e in 1:n_employees) {
      salary <- round(runif(1, 3, 15), 2)
      dept   <- sample(departments, 1)
      perf   <- round(max(0, min(100, rnorm(1, 70, 15))), 1)
      kpi    <- if (perf > 75) round(runif(1, 85, 100), 1) else round(runif(1, 40, 84), 1)
      rows_list[[idx]] <- data.frame(company_id = c, employee_id = e, salary = salary,
                                     department = dept, performance_score = perf,
                                     KPI_score = kpi, stringsAsFactors = FALSE)
      idx <- idx + 1L
    }
  }
  do.call(rbind, rows_list)
}

kpi_df_t8 <- generate_company_data_t8(7, 100)
kpi_df_t8$KPI_tier <- ifelse(kpi_df_t8$KPI_score > 85, "High",
                       ifelse(kpi_df_t8$KPI_score >= 65, "Mid", "Low"))

generate_company_report <- function(df, cid) {
  co_df        <- df %>% filter(company_id == cid)
  tier_colours <- c("High"="#2ecc71","Mid"="#f39c12","Low"="#e74c3c")

  stats <- data.frame(
    Metric = c("Total Employees","Avg Salary (k)","Avg Performance Score",
               "Avg KPI Score","KPI High (>85)","KPI Mid (65–85)","KPI Low (<65)"),
    Value  = c(nrow(co_df),
               round(mean(co_df$salary), 2),
               round(mean(co_df$performance_score), 1),
               round(mean(co_df$KPI_score), 1),
               sum(co_df$KPI_tier == "High"),
               sum(co_df$KPI_tier == "Mid"),
               sum(co_df$KPI_tier == "Low")))

  cat("\n══════════════════════════════════════════════\n")
  cat(paste0("  COMPANY ", cid, "  —  AUTOMATED KPI REPORT\n"))
  cat("══════════════════════════════════════════════\n\n")

  cat("[ Summary Statistics ]\n")
  print(kable(stats, col.names = c("Metric","Value"), align = "lr"))

  top_df <- co_df %>%
    filter(KPI_tier == "High") %>%
    select(employee_id, department, salary, performance_score, KPI_score) %>%
    arrange(desc(KPI_score)) %>%
    head(5)

  cat("\n[ Top 5 Performers ]\n")
  print(kable(top_df,
    col.names = c("Employee ID","Department","Salary (k)","Performance","KPI"),
    align     = "clrrr"))

  dept_summary <- co_df %>%
    group_by(department) %>%
    summarise(Count = n(), Avg_KPI = round(mean(KPI_score), 1), .groups = "drop")

  cat("\n[ Department Breakdown ]\n")
  print(kable(dept_summary,
    col.names = c("Department","Headcount","Avg KPI"), align = "lrr"))

  # barnorm is a layout attribute, so it is set in layout() rather than on the trace
  p1 <- plot_ly(co_df, x = ~department, color = ~KPI_tier, colors = tier_colours,
    type = "histogram",
    hovertemplate = "%{x} | %{fullData.name}: %{y:.1%}<extra></extra>"
  ) %>% layout(
    title   = paste0("Company ", cid, ": KPI Tier Proportion by Department"),
    xaxis   = list(title = "Department"),
    yaxis   = list(title = "Proportion", tickformat = ".0%"),
    barmode = "stack",
    barnorm = "fraction",
    plot_bgcolor  = "white",
    paper_bgcolor = "white")

  p2 <- plot_ly(co_df, x = ~salary, color = ~KPI_tier, colors = tier_colours,
    type = "histogram", nbinsx = 15, opacity = 0.8,
    hovertemplate = "Salary: %{x:.1f}k<br>Count: %{y}<extra></extra>"
  ) %>% layout(
    barmode = "overlay",
    title   = paste0("Company ", cid, ": Salary Distribution by KPI Tier"),
    xaxis   = list(title = "Salary (thousands)"),
    yaxis   = list(title = "Count"),
    plot_bgcolor  = "white",
    paper_bgcolor = "white")

  print(p1)
  print(p2)

  # FIX: write the CSV only if it does not exist yet, to avoid rewriting it on every run/knit
  folder    <- "reports"
  if (!dir.exists(folder)) dir.create(folder)
  file_name <- file.path(folder, paste0("company_", cid, "_report.csv"))

  if (!file.exists(file_name)) {
    write.csv(co_df, file = file_name, row.names = FALSE)
    cat(paste0("\n  CSV exported: ", file_name, "\n\n"))
  } else {
    cat(paste0("\n  CSV already exists, skipping: ", file_name, "\n\n"))
  }
}
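
The write-once export at the end of the function is worth isolating. The Python sketch below shows the same idea with the standard library; `export_once` is a hypothetical name, and a temporary directory stands in for the reports folder so that the example leaves no files behind.

```python
import csv
import tempfile
from pathlib import Path


def export_once(rows, folder, cid):
    """Write company_<cid>_report.csv unless it already exists.

    Returns (path, written), where written is False when the file was skipped.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"company_{cid}_report.csv"
    if path.exists():
        return path, False
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path, True


with tempfile.TemporaryDirectory() as tmp:
    sample = [{"employee_id": 1, "KPI_score": 90.0}]  # toy row, illustrative only
    first = export_once(sample, Path(tmp) / "reports", 1)
    second = export_once(sample, Path(tmp) / "reports", 1)
    print(first[1], second[1])  # True False
```

One trade-off of skipping existing files is that a stale CSV is kept if the underlying data changes; overwriting on every run would be the alternative when freshness matters more than avoiding repeated writes.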

Task 8.2 — Automated Loop: One Loop, Seven Reports

— R

# A single for loop produces a complete report for every company automatically.
for (cid in sort(unique(kpi_df_t8$company_id))) {
  generate_company_report(kpi_df_t8, cid)
}

## 
## ══════════════════════════════════════════════
##   COMPANY 1  —  AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
## 
## [ Summary Statistics ]
## 
## 
## |Metric                | Value|
## |:---------------------|-----:|
## |Total Employees       | 100.0|
## |Avg Salary (k)        |   8.8|
## |Avg Performance Score |  69.7|
## |Avg KPI Score         |  70.9|
## |KPI High (>85)        |  30.0|
## |KPI Mid (65–85)       |  30.0|
## |KPI Low (<65)         |  40.0|
## 
## [ Top 5 Performers ]
## 
## 
## | Employee ID |Department | Salary (k)| Performance|  KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## |     49      |Operations |       5.26|        94.8| 99.9|
## |     94      |Marketing  |       3.00|        89.1| 99.7|
## |     86      |Finance    |       9.80|        85.6| 99.0|
## |      8      |HR         |       6.98|       100.0| 98.6|
## |     79      |Marketing  |       3.61|        79.6| 98.4|
## 
## [ Department Breakdown ]
## 
## 
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance    |        22|    61.2|
## |HR         |        19|    77.1|
## |IT         |        25|    70.9|
## |Marketing  |        18|    73.9|
## |Operations |        16|    73.7|

## 
##   CSV already exists, skipping: reports/company_1_report.csv
## 
## 
## ══════════════════════════════════════════════
##   COMPANY 2  —  AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
## 
## [ Summary Statistics ]
## 
## 
## |Metric                |  Value|
## |:---------------------|------:|
## |Total Employees       | 100.00|
## |Avg Salary (k)        |   8.99|
## |Avg Performance Score |  70.40|
## |Avg KPI Score         |  73.70|
## |KPI High (>85)        |  33.00|
## |KPI Mid (65–85)       |  34.00|
## |KPI Low (<65)         |  33.00|
## 
## [ Top 5 Performers ]
## 
## 
## | Employee ID |Department | Salary (k)| Performance|  KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## |     85      |Marketing  |      12.87|        82.0| 99.8|
## |     93      |Marketing  |      11.43|        88.2| 99.0|
## |     97      |Operations |      12.72|        81.6| 98.9|
## |     98      |Marketing  |       4.03|        92.7| 98.8|
## |     39      |Marketing  |       8.39|        83.5| 98.2|
## 
## [ Department Breakdown ]
## 
## 
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance    |        24|    72.5|
## |HR         |        17|    72.4|
## |IT         |        22|    69.8|
## |Marketing  |        22|    77.9|
## |Operations |        15|    76.5|

## 
##   CSV already exists, skipping: reports/company_2_report.csv
## 
## 
## ══════════════════════════════════════════════
##   COMPANY 3  —  AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
## 
## [ Summary Statistics ]
## 
## 
## |Metric                |  Value|
## |:---------------------|------:|
## |Total Employees       | 100.00|
## |Avg Salary (k)        |   8.34|
## |Avg Performance Score |  70.50|
## |Avg KPI Score         |  73.50|
## |KPI High (>85)        |  37.00|
## |KPI Mid (65–85)       |  26.00|
## |KPI Low (<65)         |  37.00|
## 
## [ Top 5 Performers ]
## 
## 
## | Employee ID |Department | Salary (k)| Performance|  KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## |     51      |Marketing  |       3.84|       100.0| 99.7|
## |     70      |Operations |      13.13|        79.8| 99.7|
## |     11      |Operations |       6.45|        84.5| 99.3|
## |     59      |Finance    |       5.62|        90.1| 98.9|
## |     26      |HR         |      13.20|        79.9| 98.7|
## 
## [ Department Breakdown ]
## 
## 
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance    |        23|    69.7|
## |HR         |        17|    81.0|
## |IT         |        20|    70.2|
## |Marketing  |        21|    73.8|
## |Operations |        19|    74.6|

## 
##   CSV already exists, skipping: reports/company_3_report.csv
## 
## 
## ══════════════════════════════════════════════
##   COMPANY 4  —  AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
## 
## [ Summary Statistics ]
## 
## 
## |Metric                | Value|
## |:---------------------|-----:|
## |Total Employees       | 100.0|
## |Avg Salary (k)        |   8.3|
## |Avg Performance Score |  70.4|
## |Avg KPI Score         |  73.9|
## |KPI High (>85)        |  39.0|
## |KPI Mid (65–85)       |  26.0|
## |KPI Low (<65)         |  35.0|
## 
## [ Top 5 Performers ]
## 
## 
## | Employee ID |Department | Salary (k)| Performance|  KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## |     37      |Operations |      13.87|        75.9| 99.4|
## |     19      |HR         |       6.70|        75.6| 99.0|
## |     79      |Operations |      14.66|       100.0| 98.6|
## |     72      |Marketing  |      14.38|        77.0| 98.0|
## |     28      |Finance    |       3.52|        90.2| 97.2|
## 
## [ Department Breakdown ]
## 
## 
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance    |        19|    75.2|
## |HR         |        24|    72.6|
## |IT         |        16|    73.5|
## |Marketing  |        15|    77.4|
## |Operations |        26|    72.5|

## 
##   CSV already exists, skipping: reports/company_4_report.csv
## 
## 
## ══════════════════════════════════════════════
##   COMPANY 5  —  AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
## 
## [ Summary Statistics ]
## 
## 
## |Metric                |  Value|
## |:---------------------|------:|
## |Total Employees       | 100.00|
## |Avg Salary (k)        |   8.88|
## |Avg Performance Score |  70.10|
## |Avg KPI Score         |  73.70|
## |KPI High (>85)        |  36.00|
## |KPI Mid (65–85)       |  30.00|
## |KPI Low (<65)         |  34.00|
## 
## [ Top 5 Performers ]
## 
## 
## | Employee ID |Department | Salary (k)| Performance|   KPI|
## |:-----------:|:----------|----------:|-----------:|-----:|
## |     47      |Marketing  |       8.21|        79.2| 100.0|
## |      6      |Finance    |      10.40|        86.7|  99.8|
## |     67      |IT         |      11.92|        85.5|  99.7|
## |     98      |HR         |       7.73|        76.9|  99.5|
## |     86      |IT         |      13.85|        76.8|  99.1|
## 
## [ Department Breakdown ]
## 
## 
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance    |        21|    74.9|
## |HR         |        27|    73.3|
## |IT         |        23|    72.8|
## |Marketing  |        16|    74.9|
## |Operations |        13|    73.1|

## 
##   CSV already exists, skipping: reports/company_5_report.csv
## 
## 
## ══════════════════════════════════════════════
##   COMPANY 6  —  AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
## 
## [ Summary Statistics ]
## 
## 
## |Metric                |  Value|
## |:---------------------|------:|
## |Total Employees       | 100.00|
## |Avg Salary (k)        |   8.58|
## |Avg Performance Score |  71.20|
## |Avg KPI Score         |  74.90|
## |KPI High (>85)        |  45.00|
## |KPI Mid (65–85)       |  24.00|
## |KPI Low (<65)         |  31.00|
## 
## [ Top 5 Performers ]
## 
## 
## | Employee ID |Department | Salary (k)| Performance|  KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## |     51      |IT         |      14.91|        77.1| 99.5|
## |     19      |Finance    |      11.75|        80.5| 99.1|
## |     25      |Operations |       5.88|        78.2| 99.0|
## |     34      |IT         |       8.45|        78.5| 98.3|
## |     41      |IT         |       7.79|       100.0| 98.1|
## 
## [ Department Breakdown ]
## 
## 
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance    |        18|    75.7|
## |HR         |        22|    71.2|
## |IT         |        21|    80.3|
## |Marketing  |        20|    75.7|
## |Operations |        19|    71.6|

## 
##   CSV already exists, skipping: reports/company_6_report.csv
## 
## 
## ══════════════════════════════════════════════
##   COMPANY 7  —  AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
## 
## [ Summary Statistics ]
## 
## 
## |Metric                |  Value|
## |:---------------------|------:|
## |Total Employees       | 100.00|
## |Avg Salary (k)        |   8.43|
## |Avg Performance Score |  69.40|
## |Avg KPI Score         |  70.50|
## |KPI High (>85)        |  32.00|
## |KPI Mid (65–85)       |  24.00|
## |KPI Low (<65)         |  44.00|
## 
## [ Top 5 Performers ]
## 
## 
## | Employee ID |Department | Salary (k)| Performance|  KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## |     79      |Operations |       3.18|        77.7| 99.0|
## |     11      |Operations |       9.97|        90.2| 98.9|
## |     64      |Marketing  |       5.36|        77.8| 98.0|
## |     40      |HR         |      12.21|        95.0| 97.9|
## |     86      |Finance    |      13.85|        76.2| 97.9|
## 
## [ Department Breakdown ]
## 
## 
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance    |        26|    69.9|
## |HR         |        22|    71.1|
## |IT         |        13|    72.2|
## |Marketing  |        19|    74.2|
## |Operations |        20|    66.0|

## 
##   CSV already exists, skipping: reports/company_7_report.csv

Conclusion

This practicum has demonstrated, in a progressive and integrated manner, how functions, loops, and conditional branching are sufficient to build a complete, end-to-end data science pipeline.

Tasks 1–2 established the foundation: validated functions and nested loops for formula evaluation and sales simulation. Tasks 3–4 extended the pattern to classification and hierarchical data generation of the kind found in business datasets. Task 5 applied the same loop-and-condition skeleton to stochastic simulation, showing that a basic Monte Carlo estimate needs only random sampling and counting rather than a specialised library. Task 6 addressed the pre-modelling pipeline through data transformation and feature engineering, confirming that linear transformations such as min-max normalisation and Z-score standardisation rescale the axis while preserving the shape of the distribution. Task 7 consolidated all prior concepts into a full KPI dashboard (700 records, 7 companies), showing that the same code patterns carry over to a larger dataset without structural changes. Task 8 completed the cycle with automated report generation, where a single function called by a loop produces consistent, structured output for every company, the basic pattern behind automated reporting pipelines.
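
The Task 6 observation about linear transformations can be checked numerically. Both min-max normalisation and Z-score standardisation have the form a·x + b with a > 0, so standardising min-max-scaled values should give the same Z-scores as standardising the raw values. The sketch below is a minimal illustration: the helper names and the five salary values are assumptions made for the example, and it uses the population standard deviation (R's sd() uses the sample version, which does not affect the comparison).

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale values linearly to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


salaries = [3.0, 4.5, 6.0, 9.0, 15.0]  # illustrative values only

z_raw = z_score(salaries)
z_scaled = z_score(min_max(salaries))

# The two sets of Z-scores agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(z_raw, z_scaled))
print(max_gap < 1e-9)  # True
```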

Throughout all tasks, R and Python implementations were maintained in parallel. While syntax differs — else if vs. elif, rbind() vs. list.append(), base R vs. pandas — the underlying logic is identical, reinforcing language-agnostic programming thinking that is essential in modern data science practice.

Functions Loops Practicum

Assignment Week 5

Fityanandra Athar Adyaksa (52250059)

April 07, 2026
