
Fityanandra Athar Adyaksa (52250059)


Data Science student at

Enthusiastic about learning

April 07, 2026



Preface

This practicum report is a consolidated document covering Task 1 through Task 8 of the Data Science Programming — Functions and Loops module. All tasks are unified into a single file to provide a coherent, end-to-end reference that traces the progressive application of core programming concepts across increasingly complex analytical scenarios.

The practicum centres on three fundamental pillars of programming: functions, loops, and conditional branching. These concepts underpin virtually all data wrangling, simulation, and automated reporting workflows in professional data science practice. Each task is implemented in both R and Python to reinforce language-agnostic thinking and to highlight syntactic differences between the two ecosystems.

All visualisations in this document are interactive. They are rendered with the plotly package in R and the plotly library in Python, so readers can hover for exact values, zoom into regions of interest, and toggle series on and off directly in the browser.



Research Objectives

The overarching objectives of this practicum are:

  1. Understand and implement functions in R and Python as reusable, self-contained units of logic — including input validation and safe error handling.
  2. Master loop constructs — both single and nested for loops — to automate repetitive computation across multi-dimensional datasets.
  3. Apply conditional logic (if-else if-else / if-elif-else) for data-driven decision making: tiered discounts, performance classification, and KPI assignment.
  4. Build and evaluate data simulations using Monte Carlo sampling and multi-entity nested loop structures.
  5. Perform data transformation and feature engineering through min-max normalisation, Z-score standardisation, and categorical variable creation.
  6. Communicate results through interactive visualisation using plotly for hover-enabled, zoomable charts.
  7. Automate report generation by encapsulating all reporting logic inside a function called iteratively via a loop — the foundation of production-grade automated reporting pipelines.



Task 1 — Function Definition, Nested Loops & Formula Comparison

This task covers Section 3 of Data Science Programming — Functions and Loops. The specific objectives are:

  1. Build a dynamic multi-formula function with conditional branching to evaluate four mathematical expressions.
  2. Apply a nested loop to compute formula values across x = 1 to 20.
  3. Implement input validation so the function fails safely on unrecognised inputs.
  4. Produce an interactive comparative visualisation of all four formulas.

Formulas Overview

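Formula Type | Expression | Behaviour
Linear | f(x) = 2x + 3 | Constant growth rate; straight line
Quadratic | f(x) = x² - 2x + 1 = (x - 1)² | Accelerating growth; U-shaped curve with its minimum (0) at x = 1
Cubic | f(x) = 0.5x³ - 3x + 2 | Dips to a local minimum near x = √2 ≈ 1.41, then rises steeply
Exponential | f(x) = 2e^(0.2x) | Multiplies by e^0.2 ≈ 1.22 per unit of x; fastest in the long run, but at x = 20 (109.2) it is still below the quadratic (361) and the cubic (3942)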


Task 1.1 — Function Definition & Input Validation

compute_formula() accepts a numeric x and a formula name (a string) and returns the computed value. The final else branch acts as an input guard. For an unrecognised name it emits a descriptive warning and returns NA (R) or None (Python), rather than stopping with an error or returning a misleading number.

Feature | R | Python
Branch keyword | else if | elif
Exponential | exp() | math.exp()
Invalid input handling | warning() + return(NA) | print() + return None

— R

# compute_formula: evaluates one of four mathematical formulas at a given x.
# Returns NA with a warning if an unrecognised formula name is supplied.
compute_formula <- function(x, formula) {
  if      (formula == "linear")      return(2 * x + 3)
  else if (formula == "quadratic")   return(x^2 - 2 * x + 1)
  else if (formula == "cubic")       return(0.5 * x^3 - 3 * x + 2)
  else if (formula == "exponential") return(2 * exp(0.2 * x))
  else {
    warning(paste0("[compute_formula] Unknown formula: '", formula, "'."))
    return(NA)
  }
}

cat("=== Spot Checks (x = 5) ===\n")
## === Spot Checks (x = 5) ===
cat("linear      :", compute_formula(5, "linear"),                "\n")
## linear      : 13
cat("quadratic   :", compute_formula(5, "quadratic"),             "\n")
## quadratic   : 16
cat("cubic       :", compute_formula(5, "cubic"),                 "\n")
## cubic       : 49.5
cat("exponential :", round(compute_formula(5, "exponential"), 4), "\n")
## exponential : 5.4366
cat("\n=== Validation — Unknown Formula ===\n")
## 
## === Validation — Unknown Formula ===
cat("Returned   :", compute_formula(5, "logarithmic"), "\n")
## Returned   : NA
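The feature table above lists the Python equivalents of the R constructs. Below is a minimal Python sketch built from that table. It assumes the Python version keeps the R function name and uses print() for the warning, as the table indicates. It is not necessarily the report's exact Python code.

```python
import math

def compute_formula(x, formula):
    """Evaluate one of four formulas at x; return None for an unknown name."""
    if formula == "linear":
        return 2 * x + 3
    elif formula == "quadratic":
        return x ** 2 - 2 * x + 1
    elif formula == "cubic":
        return 0.5 * x ** 3 - 3 * x + 2
    elif formula == "exponential":
        return 2 * math.exp(0.2 * x)
    else:
        # Input guard: report the problem and fail safely instead of raising
        print(f"[compute_formula] Unknown formula: '{formula}'.")
        return None

for name in ["linear", "quadratic", "cubic", "exponential"]:
    print(f"{name:<12}:", round(compute_formula(5, name), 4))
print("Returned    :", compute_formula(5, "logarithmic"))
```

The spot checks should agree with the R output at x = 5: 13, 16, 49.5 and 5.4366.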


Task 1.2 — Nested Loop & Dataset Construction

A nested loop evaluates every formula at every x from 1 to 20, producing an 80-row dataset (4 formulas × 20 values).

— R

formula_list <- c("linear", "quadratic", "cubic", "exponential")
x_range      <- 1:20

# Collect rows in a pre-allocated list and combine them once with do.call(rbind);
# this is much faster than growing a data frame with rbind() on every iteration
rows_list <- vector("list", length(formula_list) * length(x_range))
idx <- 1L
for (f in formula_list) {
  for (x in x_range) {
    rows_list[[idx]] <- data.frame(x = x, y = compute_formula(x, f),
                                   formula = f, stringsAsFactors = FALSE)
    idx <- idx + 1L
  }
}
results_df <- do.call(rbind, rows_list)

cat("Rows generated:", nrow(results_df),
    "(", length(formula_list), "formulas ×", length(x_range), "x values )\n\n")
## Rows generated: 80 ( 4 formulas × 20 x values )
head(results_df, 8)
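Here is a Python sketch of the same nested loop. It assumes a list of dictionaries as the row container; a pandas DataFrame could be built from it, but pandas is not needed here. The compact compute_formula below is a stand-in for the Task 1.1 function, not the report's code.

```python
import math

def compute_formula(x, formula):
    # Compact stand-in for the Task 1.1 function (same four formulas)
    funcs = {
        "linear": lambda v: 2 * v + 3,
        "quadratic": lambda v: v ** 2 - 2 * v + 1,
        "cubic": lambda v: 0.5 * v ** 3 - 3 * v + 2,
        "exponential": lambda v: 2 * math.exp(0.2 * v),
    }
    f = funcs.get(formula)
    return f(x) if f else None

formula_list = ["linear", "quadratic", "cubic", "exponential"]
x_range = range(1, 21)

# Outer loop over formulas, inner loop over x: 4 x 20 = 80 rows
rows = []
for f in formula_list:
    for x in x_range:
        rows.append({"x": x, "y": compute_formula(x, f), "formula": f})

print("Rows generated:", len(rows))
for row in rows[:8]:
    print(row)
```

Appending to a Python list is amortised constant time. That makes it the natural counterpart of filling a pre-allocated R list and binding once.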


Task 1.3 — Interactive Visualisation: Four Formulas on One Plot

— R

library(plotly)  # plot_ly(), add_trace(), layout()

formula_colours <- c(
  "linear"      = "#2196F3",
  "quadratic"   = "#4CAF50",
  "cubic"       = "#FF9800",
  "exponential" = "#E91E63"
)
formula_labels <- c(
  "linear"      = "Linear: f(x) = 2x + 3",
  "quadratic"   = "Quadratic: f(x) = x\u00B2 - 2x + 1",
  "cubic"       = "Cubic: f(x) = 0.5x\u00B3 - 3x + 2",
  "exponential" = "Exponential: f(x) = 2e^(0.2x)"
)

p1 <- plot_ly()
for (f in formula_list) {
  df_sub <- results_df[results_df$formula == f, ]
  p1 <- add_trace(p1,
    data      = df_sub, x = ~x, y = ~y,
    type      = "scatter", mode = "lines+markers",
    name      = formula_labels[[f]],
    line      = list(color = formula_colours[[f]], width = 2.5),
    marker    = list(color = formula_colours[[f]], size = 6))
}
p1 <- layout(p1,
  title     = list(text = "Task 1: Comparison of Four Mathematical Formulas (x = 1 to 20)",
                   font = list(size = 15)),
  xaxis     = list(title = "x value", showgrid = TRUE, gridcolor = "#e8e8e8"),
  yaxis     = list(title = "f(x)",    showgrid = TRUE, gridcolor = "#e8e8e8"),
  legend    = list(title = list(text = "<b>Formula</b>")),
  hovermode = "x unified",
  plot_bgcolor  = "white",
  paper_bgcolor = "white")
p1


Task 1.4 — Descriptive Statistics per Formula

— R

library(dplyr)  # group_by(), summarise(), rename(), %>%
library(knitr)  # kable()

summary_table <- results_df %>%
  group_by(formula) %>%
  summarise(
    Min    = round(min(y),    2),
    Max    = round(max(y),    2),
    Mean   = round(mean(y),   2),
    Median = round(median(y), 2),
    SD     = round(sd(y),     2),
    .groups = "drop"
  ) %>% rename(Formula = formula)

kable(summary_table,
      col.names = c("Formula","Min","Max","Mean","Median","Std Dev"),
      align = "lrrrrr",
      caption = "Descriptive statistics for each formula across x = 1 to 20")
Descriptive statistics for each formula across x = 1 to 20
Formula | Min | Max | Mean | Median | Std Dev
cubic | -0.50 | 3942.0 | 1073.00 | 553.25 | 1236.10
exponential | 2.44 | 109.2 | 29.57 | 16.41 | 31.35
linear | 5.00 | 43.0 | 24.00 | 24.00 | 11.83
quadratic | 0.00 | 361.0 | 123.50 | 90.50 | 116.44
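As a cross-check, the table can be reproduced in plain Python with the standard-library statistics module. This sketch assumes the same x = 1 to 20 grid. statistics.stdev() divides by n − 1, matching R's sd(), and sorting by name gives the same row order as the R table.

```python
import math
import statistics

formulas = {
    "linear": lambda x: 2 * x + 3,
    "quadratic": lambda x: x ** 2 - 2 * x + 1,
    "cubic": lambda x: 0.5 * x ** 3 - 3 * x + 2,
    "exponential": lambda x: 2 * math.exp(0.2 * x),
}

summary = {}
for name, f in sorted(formulas.items()):
    ys = [f(x) for x in range(1, 21)]
    summary[name] = {
        "Min": round(min(ys), 2),
        "Max": round(max(ys), 2),
        "Mean": round(statistics.mean(ys), 2),
        "Median": round(statistics.median(ys), 2),
        # Sample standard deviation (n - 1 denominator), like R's sd()
        "SD": round(statistics.stdev(ys), 2),
    }

for name, stats in summary.items():
    print(name, stats)
```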


Summary — Task 1

  • Function design with conditional branching provides a clean, reusable interface for four distinct formula types.
  • Nested loops are the natural structure for two-dimensional computation, producing an 80-row dataset with minimal code.
  • Input validation via the else branch ensures safe failure — the function returns NA/None with a descriptive warning.
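  • Visualisation shows that over x = 1 to 20 the cubic dominates (3942 at x = 20). The exponential only overtakes the linear formula from x = 14 onward, and its long-run advantage appears beyond the plotted range: it passes the cubic between x = 52 and x = 53. The linear formula remains the most predictable throughout.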
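The crossover points behind this summary can be checked numerically. The helper first_x_where() below is hypothetical and exists only for this sketch. It scans the integer grid and returns the first x where one curve exceeds another.

```python
import math

def linear(x):
    return 2 * x + 3

def cubic(x):
    return 0.5 * x ** 3 - 3 * x + 2

def exponential(x):
    return 2 * math.exp(0.2 * x)

def first_x_where(f, g, xs):
    """Return the first x in xs with f(x) > g(x), or None if there is none."""
    for x in xs:
        if f(x) > g(x):
            return x
    return None

# Exponential vs. linear on the plotted grid x = 1..20
print("exponential > linear from x =", first_x_where(exponential, linear, range(1, 21)))
# The cubic leads from x = 3, so search well beyond the plotted range
print("exponential > cubic again from x =", first_x_where(exponential, cubic, range(3, 101)))
```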



Task 2 — Nested Sales Simulation with Conditional Discounts

The objectives for Task 2 are:

  1. Build simulate_sales(), a nested-loop simulation that generates a multi-salesperson, multi-day sales dataset.
  2. Apply tiered conditional discount logic based on sales amount thresholds.
  3. Compute cumulative sales per salesperson with a helper function called inside a loop.
  4. Produce an interactive cumulative sales chart.

Simulation Parameters

Parameter / Rule | Detail
n_salesperson | Number of salespersons to simulate
days | Number of trading days per salesperson
Discount, High | sales_amount > 800 → 20% discount
Discount, Medium | 500 < sales_amount ≤ 800 → 10% discount
No discount | sales_amount ≤ 500 → 0% discount


Task 2.1 — Sales Simulation Function

— R

set.seed(42)

# simulate_sales: generates a sales dataset with tiered discount logic.
# Rows are collected in a pre-allocated list and combined once with do.call(rbind),
# which avoids the cost of calling rbind() on every iteration.
simulate_sales <- function(n_salesperson, days) {
  rows_list <- vector("list", n_salesperson * days)
  idx <- 1L
  for (s in 1:n_salesperson) {
    for (d in 1:days) {
      amount   <- round(runif(1, 200, 1000), 2)
      discount <- if      (amount > 800) 0.20
                  else if (amount > 500) 0.10
                  else                  0.00
      rows_list[[idx]] <- data.frame(sales_id = s, day = d,
                                     sales_amount = amount, discount_rate = discount,
                                     stringsAsFactors = FALSE)
      idx <- idx + 1L
    }
  }
  do.call(rbind, rows_list)
}

sales_data <- simulate_sales(n_salesperson = 5, days = 10)
cat("Dataset:", nrow(sales_data), "rows ×", ncol(sales_data), "columns\n\n")
## Dataset: 50 rows × 4 columns
head(sales_data, 10)
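Here is a Python sketch of simulate_sales(), assuming the same Uniform(200, 1000) amounts and discount tiers. The seed parameter and the separate discount_rate() helper are assumptions of this sketch. Python's random module uses a different generator from R, so seed 42 here will not reproduce the R figures.

```python
import random

def discount_rate(amount):
    """Tiered discount: > 800 -> 20%, > 500 -> 10%, otherwise none."""
    if amount > 800:
        return 0.20
    elif amount > 500:
        return 0.10
    else:
        return 0.00

def simulate_sales(n_salesperson, days, seed=42):
    rng = random.Random(seed)  # local generator so results are reproducible
    rows = []
    for s in range(1, n_salesperson + 1):
        for d in range(1, days + 1):
            amount = round(rng.uniform(200, 1000), 2)
            rows.append({"sales_id": s, "day": d,
                         "sales_amount": amount,
                         "discount_rate": discount_rate(amount)})
    return rows

sales_data = simulate_sales(n_salesperson=5, days=10)
print("Dataset:", len(sales_data), "rows x", len(sales_data[0]), "columns")
```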


Task 2.2 — Cumulative Sales per Salesperson

— R

calc_cumulative <- function(df, sid) {
  sub_df <- df[df$sales_id == sid, ]
  sub_df <- sub_df[order(sub_df$day), ]
  cumsum(sub_df$sales_amount)
}

# Collect one data frame per salesperson in a list, then combine once with do.call(rbind)
ids      <- sort(unique(sales_data$sales_id))
cum_rows <- vector("list", length(ids))
for (i in seq_along(ids)) {
  s        <- ids[i]
  cum_vals <- calc_cumulative(sales_data, s)
  cum_rows[[i]] <- data.frame(sales_id = s, day = seq_along(cum_vals),
                               cumulative = cum_vals, stringsAsFactors = FALSE)
}
cum_df <- do.call(rbind, cum_rows)

head(cum_df, 10)
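The cumulative step maps naturally onto itertools.accumulate in Python. This sketch uses a tiny hypothetical input rather than the simulated data, so the running totals are easy to verify by hand. Like the R code, it sorts by day and then numbers the days 1..n.

```python
from itertools import accumulate

def calc_cumulative(rows, sid):
    """Running total of sales_amount for one salesperson, ordered by day."""
    sub = sorted((r for r in rows if r["sales_id"] == sid), key=lambda r: r["day"])
    return list(accumulate(r["sales_amount"] for r in sub))

# Hypothetical toy input (deliberately unsorted by day)
rows = [
    {"sales_id": 1, "day": 2, "sales_amount": 300.0},
    {"sales_id": 1, "day": 1, "sales_amount": 100.0},
    {"sales_id": 2, "day": 1, "sales_amount": 50.0},
    {"sales_id": 1, "day": 3, "sales_amount": 600.0},
]

cum_rows = []
for sid in sorted({r["sales_id"] for r in rows}):
    for day, total in enumerate(calc_cumulative(rows, sid), start=1):
        cum_rows.append({"sales_id": sid, "day": day, "cumulative": total})
print(cum_rows)
```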


Task 2.3 — Summary Statistics & Interactive Cumulative Sales Plot

— R

summary_sales <- sales_data %>%
  group_by(sales_id) %>%
  summarise(
    Total_Sales    = round(sum(sales_amount), 2),
    Avg_Daily      = round(mean(sales_amount), 2),
    Total_Discount = round(sum(sales_amount * discount_rate), 2),
    Days_High      = sum(discount_rate == 0.20),
    Days_Medium    = sum(discount_rate == 0.10),
    Days_None      = sum(discount_rate == 0.00),
    .groups = "drop"
  ) %>% rename("Salesperson ID" = sales_id)

kable(summary_sales,
      col.names = c("Salesperson","Total Sales","Avg Daily",
                    "Total Discount","Days High","Days Medium","Days None"),
      align = "crrrrrr",
      caption = "Summary statistics per salesperson (5 persons × 10 days)")
Summary statistics per salesperson (5 persons × 10 days)
Salesperson | Total Sales | Avg Daily | Total Discount | Days High | Days Medium | Days None
1 | 7090.09 | 709.01 | 909.93 | 3 | 5 | 2
2 | 6720.24 | 672.02 | 890.42 | 3 | 5 | 2
3 | 6923.09 | 692.31 | 1101.12 | 5 | 3 | 2
4 | 6153.74 | 615.37 | 801.90 | 3 | 4 | 3
5 | 7066.52 | 706.65 | 1066.84 | 4 | 5 | 1
library(RColorBrewer)  # brewer.pal()
pal2 <- brewer.pal(5, "Set1")
p2   <- plot_ly()
for (i in seq_along(unique(cum_df$sales_id))) {
  s      <- sort(unique(cum_df$sales_id))[i]
  df_sub <- cum_df[cum_df$sales_id == s, ]
  p2 <- add_trace(p2,
    data = df_sub, x = ~day, y = ~cumulative,
    type = "scatter", mode = "lines+markers",
    name = paste("Salesperson", s),
    line   = list(color = pal2[i], width = 2.2),
    marker = list(color = pal2[i], size  = 6))
}
p2 <- layout(p2,
  title     = list(text = "Task 2: Cumulative Sales per Salesperson (10 Days)"),
  xaxis     = list(title = "Day", dtick = 1),
  yaxis     = list(title = "Cumulative Sales Amount"),
  hovermode = "x unified",
  legend    = list(title = list(text = "<b>Salesperson</b>")),
  plot_bgcolor  = "white",
  paper_bgcolor = "white")
p2



Task 3 — Multi-Level Performance Categorisation

The objectives for Task 3 are:

  1. Build categorize_performance() — a function assigning one of five performance labels to a sales amount.
  2. Apply the function across a vector of 100 values using a loop.
  3. Compute the percentage distribution across all five categories.
  4. Visualise with an interactive bar chart and pie chart.

Performance Categories

Category | Condition | Plot Colour
Excellent | sales_amount > 800 | #2ecc71
Very Good | 650 < sales_amount ≤ 800 | #3498db
Good | 500 < sales_amount ≤ 650 | #f39c12
Average | 350 < sales_amount ≤ 500 | #e67e22
Poor | sales_amount ≤ 350 | #e74c3c


Task 3.1 — Five-Level Classification Function

— R

set.seed(7)

categorize_performance <- function(sales_amount) {
  if      (sales_amount > 800) return("Excellent")
  else if (sales_amount > 650) return("Very Good")
  else if (sales_amount > 500) return("Good")
  else if (sales_amount > 350) return("Average")
  else                          return("Poor")
}

sales_vector <- round(runif(100, 100, 1000), 2)
# Pre-allocate a character vector and fill it by index (no row-by-row rbind)
categories   <- character(length(sales_vector))
for (i in seq_along(sales_vector)) {
  categories[i] <- categorize_performance(sales_vector[i])
}

result_df <- data.frame(sales_amount = sales_vector, category = categories,
                         stringsAsFactors = FALSE)
head(result_df, 10)
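Here is a Python equivalent of the classifier and the pre-sized result vector, offered as a sketch. The generator and seed differ from R's set.seed(7). As a result, the 100 simulated amounts, and hence the category counts, will not match the R output.

```python
import random

def categorize_performance(sales_amount):
    """Map a sales amount to one of five ordered performance labels."""
    if sales_amount > 800:
        return "Excellent"
    elif sales_amount > 650:
        return "Very Good"
    elif sales_amount > 500:
        return "Good"
    elif sales_amount > 350:
        return "Average"
    else:
        return "Poor"

rng = random.Random(7)
sales_vector = [round(rng.uniform(100, 1000), 2) for _ in range(100)]

# Pre-sized list filled by index inside a loop, mirroring the R version
categories = [None] * len(sales_vector)
for i, amount in enumerate(sales_vector):
    categories[i] = categorize_performance(amount)

print(list(zip(sales_vector, categories))[:10])
```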


Task 3.2 — Category Frequency Distribution

— R

level_order           <- c("Excellent","Very Good","Good","Average","Poor")
freq_table            <- as.data.frame(table(result_df$category))
colnames(freq_table)  <- c("Category","Count")
freq_table$Percentage <- round(freq_table$Count / nrow(result_df) * 100, 1)
freq_table$Category   <- factor(freq_table$Category, levels = level_order)
freq_table            <- freq_table[order(freq_table$Category), ]

kable(freq_table, col.names = c("Category","Count","Percentage (%)"),
      align = "lrr", row.names = FALSE,
      caption = "Distribution of performance categories across 100 sales values")
Distribution of performance categories across 100 sales values
Category | Count | Percentage (%)
Excellent | 22 | 22
Very Good | 17 | 17
Good | 15 | 15
Average | 21 | 21
Poor | 25 | 25
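In Python, collections.Counter gives the same frequency table. One design difference: Counter returns 0 for an absent key, so a category with no observations still gets a row. Base R's table() on a character vector omits unobserved values unless the column is a factor with explicit levels. The five-item input below is hypothetical.

```python
from collections import Counter

LEVEL_ORDER = ["Excellent", "Very Good", "Good", "Average", "Poor"]

def frequency_table(categories):
    """Count and percentage per category, in a fixed level order."""
    counts = Counter(categories)
    n = len(categories)
    return [(level, counts[level], round(counts[level] / n * 100, 1))
            for level in LEVEL_ORDER]

# Hypothetical toy input, not the report's 100 simulated values
demo = ["Good", "Poor", "Excellent", "Poor", "Average"]
for level, count, pct in frequency_table(demo):
    print(f"{level:<10} {count:>3} {pct:>6}%")
```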


Task 3.3 — Interactive Bar Chart & Pie Chart

— R

cat_colours  <- c("Excellent"="#2ecc71","Very Good"="#3498db",
                  "Good"="#f39c12","Average"="#e67e22","Poor"="#e74c3c")
freq_plot    <- freq_table
freq_plot$Category <- as.character(freq_plot$Category)

p_bar3 <- plot_ly(freq_plot,
  x      = ~Category, y = ~Count,
  type   = "bar",
  marker = list(color = unname(cat_colours[freq_plot$Category])),
  text   = ~paste0(Percentage, "%"),
  textposition = "outside",
  hovertemplate = "<b>%{x}</b><br>Count: %{y}<br>Share: %{text}<extra></extra>"
) %>% layout(
  title  = "Task 3: Performance Category Distribution (n = 100)",
  xaxis  = list(title = "Category", categoryorder = "array",
                categoryarray = level_order),
  yaxis  = list(title = "Count"),
  showlegend   = FALSE,
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

p_pie3 <- plot_ly(freq_plot,
  labels  = ~Category, values = ~Count,
  type    = "pie",
  marker  = list(colors = unname(cat_colours[freq_plot$Category])),
  textinfo      = "label+percent",
  hovertemplate = "<b>%{label}</b><br>Count: %{value}<br>%{percent}<extra></extra>"
) %>% layout(
  title         = "Task 3: Proportional Share by Category",
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

p_bar3
p_pie3



Task 4 — Multi-Company Dataset Simulation

The objectives for Task 4 are:

  1. Build generate_company_data() with a nested loop to simulate employee records across multiple companies.
  2. Apply conditional KPI assignment for top performers (performance score > 75).
  3. Produce a per-company summary table.
  4. Visualise with an interactive grouped bar chart and scatter plot.

Dataset Schema

Column | Type | Description
company_id | integer | Company identifier (1 to n_company)
employee_id | integer | Employee identifier (1 to n_employees per company)
salary | numeric | Monthly salary in thousands, drawn from Uniform(3, 15)
department | character | One of: HR, IT, Finance, Marketing, Operations
performance_score | numeric | Drawn from Normal(mean 70, sd 15), clamped to [0, 100]
KPI_score | numeric | Uniform(91, 100) if performance_score > 75 (top performer), else Uniform(50, 89)


Task 4.1 — Company Data Generation Function

— R

set.seed(123)

# Rows are collected in a pre-allocated list and combined once with do.call(rbind)
generate_company_data <- function(n_company, n_employees) {
  departments <- c("HR","IT","Finance","Marketing","Operations")
  rows_list   <- vector("list", n_company * n_employees)
  idx <- 1L
  for (c in 1:n_company) {
    for (e in 1:n_employees) {
      salary <- round(runif(1, 3, 15), 2)
      dept   <- sample(departments, 1)
      perf   <- round(max(0, min(100, rnorm(1, mean = 70, sd = 15))), 1)
      kpi    <- if (perf > 75) round(runif(1, 91, 100), 1) else round(runif(1, 50, 89), 1)
      rows_list[[idx]] <- data.frame(company_id = c, employee_id = e, salary = salary,
                                     department = dept, performance_score = perf,
                                     KPI_score = kpi, stringsAsFactors = FALSE)
      idx <- idx + 1L
    }
  }
  do.call(rbind, rows_list)
}

company_data <- generate_company_data(n_company = 4, n_employees = 20)
cat("Dataset:", nrow(company_data), "rows ×", ncol(company_data), "columns\n\n")
## Dataset: 80 rows × 6 columns
head(company_data, 8)
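Here is a Python sketch of the nested generator. It assumes random.gauss() for the Normal(70, 15) draw and random.choice() for the department. Values will not match the R run, because the two languages use different random number generators.

```python
import random

DEPARTMENTS = ["HR", "IT", "Finance", "Marketing", "Operations"]

def generate_company_data(n_company, n_employees, seed=123):
    rng = random.Random(seed)
    rows = []
    for company in range(1, n_company + 1):
        for employee in range(1, n_employees + 1):
            salary = round(rng.uniform(3, 15), 2)
            dept = rng.choice(DEPARTMENTS)
            # Normal(70, 15) clamped to [0, 100], then rounded to 1 decimal
            perf = round(max(0.0, min(100.0, rng.gauss(70, 15))), 1)
            # Conditional KPI: top performers (> 75) draw from a higher band
            if perf > 75:
                kpi = round(rng.uniform(91, 100), 1)
            else:
                kpi = round(rng.uniform(50, 89), 1)
            rows.append({"company_id": company, "employee_id": employee,
                         "salary": salary, "department": dept,
                         "performance_score": perf, "KPI_score": kpi})
    return rows

company_data = generate_company_data(n_company=4, n_employees=20)
print("Dataset:", len(company_data), "rows x", len(company_data[0]), "columns")
```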


Task 4.2 — Per-Company Summary Table

— R

company_summary <- company_data %>%
  group_by(company_id) %>%
  summarise(
    Avg_Salary      = round(mean(salary), 2),
    Avg_Performance = round(mean(performance_score), 1),
    Max_KPI         = round(max(KPI_score), 1),
    Top_Performers  = sum(performance_score > 75),
    .groups = "drop"
  ) %>% rename("Company ID" = company_id)

kable(company_summary,
      col.names = c("Company","Avg Salary (k)","Avg Performance","Max KPI","Top Performers"),
      align = "crrrc",
      caption = "Per-company summary: 4 companies × 20 employees each")
Per-company summary: 4 companies × 20 employees each
Company | Avg Salary (k) | Avg Performance | Max KPI | Top Performers
1 | 9.31 | 73.3 | 97.4 | 8
2 | 9.29 | 71.0 | 99.4 | 7
3 | 9.92 | 67.8 | 99.6 | 5
4 | 8.71 | 71.8 | 99.2 | 6


Task 4.3 — Interactive Grouped Bar Chart & Scatter Plot

— R

library(tidyr)  # pivot_longer()

bar_data <- company_summary %>%
  rename(company_id = "Company ID") %>%
  select(company_id, Avg_Salary, Avg_Performance) %>%
  pivot_longer(cols = c(Avg_Salary, Avg_Performance),
               names_to = "Metric", values_to = "Value")

p_bar4 <- plot_ly(bar_data,
  x      = ~factor(company_id), y = ~Value,
  color  = ~Metric,
  colors = c("Avg_Salary" = "#3498db","Avg_Performance" = "#e74c3c"),
  type   = "bar",
  text   = ~round(Value, 2), textposition = "outside"
) %>% layout(
  barmode = "group",
  title   = "Task 4: Average Salary vs. Performance by Company",
  xaxis   = list(title = "Company ID"),
  yaxis   = list(title = "Value"),
  legend  = list(title = list(text = "<b>Metric</b>")),
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

company_data$company_label <- paste("Company", company_data$company_id)

p_scatter4 <- plot_ly(company_data,
  x      = ~performance_score, y = ~KPI_score,
  color  = ~company_label,
  type   = "scatter", mode = "markers",
  marker = list(size = 8, opacity = 0.7),
  hovertemplate = "<b>%{fullData.name}</b><br>Performance: %{x}<br>KPI: %{y}<extra></extra>"
) %>% layout(
  title  = "Task 4: Performance Score vs. KPI Score (per Employee)",
  xaxis  = list(title = "Performance Score"),
  yaxis  = list(title = "KPI Score"),
  legend = list(title = list(text = "<b>Company</b>")),
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

p_bar4
p_scatter4



Task 5 — Monte Carlo Simulation: π Estimation & Probability

The objectives for Task 5 are:

  1. Build monte_carlo_pi() to estimate π by random point sampling in a unit square.
  2. Use a loop to count points falling inside the quarter unit circle (x² + y² ≤ 1).
  3. Compute the probability of a point landing in a sub-square region.
  4. Visualise with an interactive scatter plot.

The Monte Carlo Principle

Item | Description
Unit square | x ∈ [0, 1], y ∈ [0, 1]; area = 1
Quarter circle | x² + y² ≤ 1 within the unit square; area = π/4
Key ratio | points inside circle / total points ≈ π/4
π estimate | π ≈ 4 × (points inside / total points)
Sub-square | x ∈ [0, 0.5], y ∈ [0, 0.5]; area = 0.25, so P ≈ 0.25


Task 5.1 — Monte Carlo π Estimation

— R

set.seed(2024)

# Pre-allocates the plotting vectors instead of calling rbind() on every iteration
monte_carlo_pi <- function(n_points) {
  inside <- 0L; outside <- 0L
  # Pre-allocate vectors for plotting (at most 2,000 points)
  plot_n  <- min(n_points, 2000L)
  pts_x   <- numeric(plot_n)
  pts_y   <- numeric(plot_n)
  pts_st  <- character(plot_n)

  for (i in seq_len(n_points)) {
    x <- runif(1); y <- runif(1)
    if (x^2 + y^2 <= 1) { inside  <- inside  + 1L; status <- "inside"  }
    else                 { outside <- outside + 1L; status <- "outside" }
    if (i <= plot_n) {
      pts_x[i] <- x; pts_y[i] <- y; pts_st[i] <- status
    }
  }
  pts <- data.frame(x = pts_x, y = pts_y, status = pts_st, stringsAsFactors = FALSE)
  list(pi_estimate   = 4 * inside / n_points,
       inside_count  = inside,
       outside_count = outside,
       points        = pts)
}

mc_result <- monte_carlo_pi(10000)

cat("=== Monte Carlo π Estimation (n = 10,000) ===\n")
## === Monte Carlo π Estimation (n = 10,000) ===
cat("Points inside circle :", mc_result$inside_count,                      "\n")
## Points inside circle : 7865
cat("Points outside circle:", mc_result$outside_count,                     "\n")
## Points outside circle: 2135
cat("Estimated π          :", round(mc_result$pi_estimate, 6),              "\n")
## Estimated π          : 3.146
cat("True π               :", round(pi, 6),                                "\n")
## True π               : 3.141593
cat("Absolute error       :", round(abs(mc_result$pi_estimate - pi), 6),   "\n")
## Absolute error       : 0.004407
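Below is a Python sketch of the same estimator. As an assumption of this sketch (not part of the original function), it also returns the binomial standard error 4·√(p(1 − p)/n) with p = π/4. For n = 10,000 this is about 0.016, so the R run's absolute error of 0.0044 is well within one standard error.

```python
import math
import random

def monte_carlo_pi(n_points, seed=2024):
    """Estimate pi from the share of uniform points inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1:
            inside += 1
    estimate = 4 * inside / n_points
    # Binomial standard error of the estimate: 4 * sqrt(p(1 - p) / n), p = pi / 4
    p = math.pi / 4
    std_error = 4 * math.sqrt(p * (1 - p) / n_points)
    return estimate, inside, n_points - inside, std_error

est, inside, outside, se = monte_carlo_pi(10_000)
print(f"Estimated pi: {est:.6f} | abs error: {abs(est - math.pi):.6f} | SE: {se:.4f}")
```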


Task 5.2 — Sub-Square Probability Analysis

— R

set.seed(2024)
all_x <- runif(10000); all_y <- runif(10000)

in_subsquare   <- sum(all_x <= 0.5 & all_y <= 0.5)
prob_subsquare <- round(in_subsquare / 10000, 4)

cat("=== Sub-Square Probability (x ≤ 0.5 AND y ≤ 0.5) ===\n")
## === Sub-Square Probability (x ≤ 0.5 AND y ≤ 0.5) ===
cat("Points in sub-square  :", in_subsquare,                           "\n")
## Points in sub-square  : 2549
cat("Estimated probability :", prob_subsquare,                         "\n")
## Estimated probability : 0.2549
cat("Theoretical P         : 0.2500\n")
## Theoretical P         : 0.2500
cat("Absolute error        :", round(abs(prob_subsquare - 0.25), 4),  "\n")
## Absolute error        : 0.0049
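The same idea gives a quick sanity check on the sub-square result. The standard error of the estimated probability is √(0.25 × 0.75 / 10,000) ≈ 0.0043. The observed absolute error of 0.0049 is therefore roughly 1.1 standard errors, which is ordinary sampling noise. The Python sketch below uses a hypothetical function name and return tuple.

```python
import math
import random

def subsquare_probability(n_points, half=0.5, seed=2024):
    """Estimate P(x <= half and y <= half) for uniform points in the unit square."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x <= half and y <= half:
            hits += 1
    p_hat = hits / n_points
    p_true = half * half  # area of the sub-square
    std_error = math.sqrt(p_true * (1 - p_true) / n_points)
    return p_hat, p_true, std_error

p_hat, p_true, se = subsquare_probability(10_000)
print(f"estimate {p_hat:.4f} | theory {p_true:.4f} | SE {se:.5f} | "
      f"error in SEs {abs(p_hat - p_true) / se:.2f}")
```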


Task 5.3 — Interactive Scatter Plot: Inside vs. Outside the Circle

— R

points_df <- mc_result$points
theta_seq <- seq(0, pi / 2, length.out = 200)
arc_df    <- data.frame(x = cos(theta_seq), y = sin(theta_seq))

p_mc <- plot_ly() %>%
  add_trace(data = points_df[points_df$status == "outside", ],
    x = ~x, y = ~y, type = "scatter", mode = "markers",
    name   = "Outside circle",
    marker = list(color = "#e74c3c", size = 3, opacity = 0.5)) %>%
  add_trace(data = points_df[points_df$status == "inside", ],
    x = ~x, y = ~y, type = "scatter", mode = "markers",
    name   = "Inside circle",
    marker = list(color = "#3498db", size = 3, opacity = 0.5)) %>%
  add_trace(data = arc_df, x = ~x, y = ~y,
    type = "scatter", mode = "lines", name = "Circle boundary",
    line = list(color = "black", width = 1.8)) %>%
  layout(
    title = list(text = paste0(
      "Task 5: Monte Carlo Simulation (2,000 points shown)<br>",
      "<sub>Estimated π = ", round(mc_result$pi_estimate, 5),
      " | True π = ", round(pi, 5), "</sub>")),
    xaxis   = list(title = "x", range = c(0, 1), scaleanchor = "y"),
    yaxis   = list(title = "y", range = c(0, 1)),
    shapes  = list(list(
      type = "rect", x0 = 0, x1 = 0.5, y0 = 0, y1 = 0.5,
      line = list(color = "purple", dash = "dash", width = 1.5),
      fillcolor = "rgba(128,0,128,0.05)")),
    annotations = list(list(
      x = 0.25, y = 0.55, text = "Sub-square (P ≈ 0.25)",
      showarrow = FALSE, font = list(color = "purple", size = 11))),
    plot_bgcolor  = "white",
    paper_bgcolor = "white")
p_mc



Task 6 — Advanced Data Transformation & Feature Engineering

The objectives for Task 6 are:

  1. Build normalize_columns() — loop-based min-max normalisation (scales to [0, 1]).
  2. Build z_score() — loop-based Z-score standardisation (mean = 0, sd = 1).
  3. Engineer two categorical features: performance_category and salary_bracket.
  4. Compare distributions before and after transformation with interactive histograms.

Transformation Reference

Method | Formula | Output Range | Typical Use Case
Min-Max Normalisation | x_norm = (x − min) / (max − min) | [0, 1], bounded | Distance-based models, neural networks
Z-Score Standardisation | x_z = (x − mean) / sd | Unbounded, centred at 0 with unit variance | Regression, clustering


Task 6.1 — Normalisation Functions

— R

set.seed(99)

# Same generator as Task 4 (list + do.call(rbind)), redefined so Task 6 is self-contained
generate_company_data_t6 <- function(n_company, n_employees) {
  departments <- c("HR","IT","Finance","Marketing","Operations")
  rows_list   <- vector("list", n_company * n_employees)
  idx <- 1L
  for (c in 1:n_company) {
    for (e in 1:n_employees) {
      salary <- round(runif(1, 3, 15), 2)
      dept   <- sample(departments, 1)
      perf   <- round(max(0, min(100, rnorm(1, 70, 15))), 1)
      kpi    <- if (perf > 75) round(runif(1, 91, 100), 1) else round(runif(1, 50, 89), 1)
      rows_list[[idx]] <- data.frame(company_id = c, employee_id = e, salary = salary,
                                     department = dept, performance_score = perf,
                                     KPI_score = kpi, stringsAsFactors = FALSE)
      idx <- idx + 1L
    }
  }
  do.call(rbind, rows_list)
}

company_data_t6 <- generate_company_data_t6(4, 20)

normalize_columns <- function(df) {
  df_norm <- df
  for (col in names(df_norm)) {
    if (is.numeric(df_norm[[col]])) {
      col_min <- min(df_norm[[col]], na.rm = TRUE)
      col_max <- max(df_norm[[col]], na.rm = TRUE)
      if (col_max != col_min)
        df_norm[[col]] <- round((df_norm[[col]] - col_min) / (col_max - col_min), 4)
    }
  }
  return(df_norm)
}

z_score <- function(df) {
  df_z <- df
  for (col in names(df_z)) {
    if (is.numeric(df_z[[col]])) {
      col_mean <- mean(df_z[[col]], na.rm = TRUE)
      col_sd   <- sd(df_z[[col]],   na.rm = TRUE)
      if (col_sd != 0)
        df_z[[col]] <- round((df_z[[col]] - col_mean) / col_sd, 4)
    }
  }
  return(df_z)
}

df_norm_t6 <- normalize_columns(company_data_t6)
df_z_t6    <- z_score(company_data_t6)

cat("=== Original (first 5 rows) ===\n")
## === Original (first 5 rows) ===
head(company_data_t6[, c("salary","performance_score","KPI_score")], 5)
cat("=== After min-max normalisation ===\n")
## === After min-max normalisation ===
head(df_norm_t6[, c("salary","performance_score","KPI_score")], 5)
cat("=== After Z-score standardisation ===\n")
## === After Z-score standardisation ===
head(df_z_t6[, c("salary","performance_score","KPI_score")], 5)
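Here is a Python sketch of both loop-based transformations on a dict-of-lists table, using hypothetical toy data. As in the R functions, constant columns are left unchanged and non-numeric columns are copied through. The standard deviation uses the n − 1 denominator.

```python
import statistics

def _is_numeric(values):
    return all(isinstance(v, (int, float)) for v in values)

def normalize_columns(table):
    """Min-max scale every numeric column to [0, 1]."""
    out = {}
    for col, values in table.items():
        if _is_numeric(values):
            lo, hi = min(values), max(values)
            # Constant columns are left as-is to avoid division by zero
            out[col] = ([round((v - lo) / (hi - lo), 4) for v in values]
                        if hi != lo else list(values))
        else:
            out[col] = list(values)
    return out

def z_score(table):
    """Standardise every numeric column to mean 0 and sample sd 1."""
    out = {}
    for col, values in table.items():
        if _is_numeric(values):
            mu = statistics.mean(values)
            sd = statistics.stdev(values)  # n - 1 denominator, like R's sd()
            out[col] = ([round((v - mu) / sd, 4) for v in values]
                        if sd != 0 else list(values))
        else:
            out[col] = list(values)
    return out

# Hypothetical toy table (salary in thousands)
demo = {"salary": [3.0, 6.0, 9.0, 12.0, 15.0],
        "department": ["HR", "IT", "HR", "IT", "HR"]}
print(normalize_columns(demo)["salary"])
print(z_score(demo)["salary"])
```

Like the R version, this rescales every numeric column, including identifier columns such as company_id and employee_id; in practice those would usually be excluded.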


Task 6.2 — Feature Engineering

— R

company_data_t6$performance_category <- NA_character_
company_data_t6$salary_bracket       <- NA_character_

for (i in seq_len(nrow(company_data_t6))) {
  perf   <- company_data_t6$performance_score[i]
  salary <- company_data_t6$salary[i]

  company_data_t6$performance_category[i] <-
    if      (perf > 80) "Excellent"
    else if (perf > 65) "Very Good"
    else if (perf > 50) "Good"
    else if (perf > 35) "Average"
    else                "Poor"

  company_data_t6$salary_bracket[i] <-
    if      (salary > 10) "High"
    else if (salary >  6) "Mid"
    else                  "Low"
}

head(company_data_t6[, c("salary","performance_score",
                           "performance_category","salary_bracket")], 8)

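The two feature rules translate directly into Python functions. The records below are hypothetical and chosen to sit on the thresholds (65 is not > 65, and 6 is not > 6). This shows how the strict > comparisons resolve boundary values.

```python
def performance_category(perf):
    if perf > 80:
        return "Excellent"
    elif perf > 65:
        return "Very Good"
    elif perf > 50:
        return "Good"
    elif perf > 35:
        return "Average"
    else:
        return "Poor"

def salary_bracket(salary):
    if salary > 10:
        return "High"
    elif salary > 6:
        return "Mid"
    else:
        return "Low"

# Hypothetical employee records (salary in thousands)
employees = [
    {"salary": 12.5, "performance_score": 82.0},
    {"salary": 6.0, "performance_score": 65.0},
    {"salary": 8.4, "performance_score": 34.9},
]
for emp in employees:
    emp["performance_category"] = performance_category(emp["performance_score"])
    emp["salary_bracket"] = salary_bracket(emp["salary"])
    print(emp)
```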

Task 6.3 — Interactive Distribution Comparison

— R

sal_compare <- data.frame(
  Original   = company_data_t6$salary,
  Normalised = df_norm_t6$salary,
  Z_Score    = df_z_t6$salary
) %>% pivot_longer(everything(), names_to = "Transformation", values_to = "Value")

perf_compare <- data.frame(
  Original   = company_data_t6$performance_score,
  Normalised = df_norm_t6$performance_score,
  Z_Score    = df_z_t6$performance_score
) %>% pivot_longer(everything(), names_to = "Transformation", values_to = "Value")

tr_colours <- c("Original"="#3498db","Normalised"="#2ecc71","Z_Score"="#e74c3c")

p_sal_hist <- plot_ly()
for (tr in c("Original","Normalised","Z_Score")) {
  vals <- sal_compare$Value[sal_compare$Transformation == tr]
  p_sal_hist <- add_trace(p_sal_hist,
    x = vals, type = "histogram", nbinsx = 15, name = tr,
    marker = list(color = tr_colours[[tr]], opacity = 0.7))
}
p_sal_hist <- layout(p_sal_hist,
  barmode = "overlay",
  title   = "Task 6: Salary Distribution — Before vs. After Transformation",
  xaxis   = list(title = "Value"), yaxis = list(title = "Count"),
  plot_bgcolor = "white", paper_bgcolor = "white")

p_perf_hist <- plot_ly()
for (tr in c("Original","Normalised","Z_Score")) {
  vals <- perf_compare$Value[perf_compare$Transformation == tr]
  p_perf_hist <- add_trace(p_perf_hist,
    x = vals, type = "histogram", nbinsx = 15, name = tr,
    marker = list(color = tr_colours[[tr]], opacity = 0.7))
}
p_perf_hist <- layout(p_perf_hist,
  barmode = "overlay",
  title   = "Task 6: Performance Score Distribution — Before vs. After Transformation",
  xaxis   = list(title = "Value"), yaxis = list(title = "Count"),
  plot_bgcolor = "white", paper_bgcolor = "white")

p_sal_box <- plot_ly(sal_compare,
  x = ~Transformation, y = ~Value,
  color = ~Transformation, colors = tr_colours,
  type  = "box"
) %>% layout(
  title  = "Task 6: Salary — Boxplot Comparison",
  xaxis  = list(title = ""), yaxis = list(title = "Value"),
  showlegend   = FALSE,
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

p_sal_hist
p_perf_hist
p_sal_box



Task 7 — Mini Project: Company KPI Dashboard

Task 7 is the capstone mini project integrating all concepts from Tasks 1–6. The objectives are:

  1. Generate a large-scale dataset for 7 companies, each with 100 employees.
  2. Categorise every employee into a KPI tier (High / Mid / Low); the implementation below uses vectorised ifelse() in place of a per-row loop.
  3. Summarise per-company statistics.
  4. Produce a full interactive dashboard of four charts.

Dataset Scale

Parameter | Value / Definition
Companies | 7
Employees per company | 100
Total records | 700
KPI Tier, High | KPI_score > 85
KPI Tier, Mid | 65 ≤ KPI_score ≤ 85
KPI Tier, Low | KPI_score < 65


Task 7.1 — Large-Scale Dataset Generation

— R

set.seed(777)

# Same list + do.call(rbind) pattern for the 700-row dataset. Note the wider KPI bands
# here: Uniform(85, 100) for top performers, otherwise Uniform(40, 84).
generate_company_data_t7 <- function(n_company, n_employees) {
  departments <- c("HR","IT","Finance","Marketing","Operations")
  rows_list   <- vector("list", n_company * n_employees)
  idx <- 1L
  for (c in 1:n_company) {
    for (e in 1:n_employees) {
      salary <- round(runif(1, 3, 15), 2)
      dept   <- sample(departments, 1)
      perf   <- round(max(0, min(100, rnorm(1, 70, 15))), 1)
      kpi    <- if (perf > 75) round(runif(1, 85, 100), 1) else round(runif(1, 40, 84), 1)
      rows_list[[idx]] <- data.frame(company_id = c, employee_id = e, salary = salary,
                                     department = dept, performance_score = perf,
                                     KPI_score = kpi, stringsAsFactors = FALSE)
      idx <- idx + 1L
    }
  }
  do.call(rbind, rows_list)
}

kpi_df <- generate_company_data_t7(n_company = 7, n_employees = 100)
cat("Dataset:", nrow(kpi_df), "rows ×", ncol(kpi_df), "columns\n\n")
## Dataset: 700 rows × 6 columns
head(kpi_df, 6)
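For comparison with the R generator, here is a hedged Python sketch of the same nested-loop scheme using only the standard library. It assumes one dict per employee instead of a data frame. The function name and the `seed` parameter are illustrative. Python's random number generator differs from R's, so the values will not match the R output even with seed 777.

```python
import random

DEPARTMENTS = ["HR", "IT", "Finance", "Marketing", "Operations"]


def generate_company_data(n_company: int, n_employees: int, seed: int = 777) -> list[dict]:
    """Nested-loop generator following the R version (one dict per employee)."""
    rng = random.Random(seed)  # local RNG: reproducible, no global state touched
    rows = []
    for company_id in range(1, n_company + 1):         # outer loop: companies
        for employee_id in range(1, n_employees + 1):  # inner loop: employees
            salary = round(rng.uniform(3, 15), 2)       # thousands
            dept = rng.choice(DEPARTMENTS)
            perf = round(max(0.0, min(100.0, rng.gauss(70, 15))), 1)  # clipped to [0, 100]
            # KPI depends on performance, as in the R code
            kpi = round(rng.uniform(85, 100), 1) if perf > 75 else round(rng.uniform(40, 84), 1)
            rows.append({"company_id": company_id, "employee_id": employee_id,
                         "salary": salary, "department": dept,
                         "performance_score": perf, "KPI_score": kpi})
    return rows


data = generate_company_data(7, 100)
print(len(data), "rows x", len(data[0]), "columns")  # prints: 700 rows x 6 columns
```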


Task 7.2 — KPI Tier Assignment & Company Summary

— R

# Vectorised ifelse() assigns every tier in one call: simpler and faster than a per-row loop
kpi_df$KPI_tier <- ifelse(kpi_df$KPI_score > 85, "High",
                    ifelse(kpi_df$KPI_score >= 65, "Mid", "Low"))

company_summary_t7 <- kpi_df %>%
  group_by(company_id) %>%
  summarise(
    Avg_Salary = round(mean(salary), 2),
    Avg_KPI    = round(mean(KPI_score), 1),
    Top_N      = sum(KPI_tier == "High"),
    Mid_N      = sum(KPI_tier == "Mid"),
    Low_N      = sum(KPI_tier == "Low"),
    .groups = "drop"
  )

kable(company_summary_t7,
      col.names = c("Company","Avg Salary (k)","Avg KPI","KPI High","KPI Mid","KPI Low"),
      align     = "crrrrr",
      caption   = "KPI dashboard summary: 7 companies × 100 employees each")
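KPI dashboard summary: 7 companies × 100 employees each

| Company | Avg Salary (k) | Avg KPI | KPI High | KPI Mid | KPI Low |
|:-------:|---------------:|--------:|---------:|--------:|--------:|
| 1 | 8.80 | 70.9 | 30 | 30 | 40 |
| 2 | 8.99 | 73.7 | 33 | 34 | 33 |
| 3 | 8.34 | 73.5 | 37 | 26 | 37 |
| 4 | 8.30 | 73.9 | 39 | 26 | 35 |
| 5 | 8.88 | 73.7 | 36 | 30 | 34 |
| 6 | 8.58 | 74.9 | 45 | 24 | 31 |
| 7 | 8.43 | 70.5 | 32 | 24 | 44 |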
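The R code above assigns tiers with vectorised `ifelse()` rather than an explicit loop. For comparison, the following minimal Python sketch shows the loop-based alternative. It assumes employees are stored as a list of dicts with the column names used above. `company_summary` is a hypothetical helper that computes the same columns as the dplyr summary in a single pass.

```python
from collections import defaultdict


def kpi_tier(score: float) -> str:
    if score > 85:
        return "High"
    elif score >= 65:
        return "Mid"
    return "Low"


def company_summary(rows: list[dict]) -> dict[int, dict]:
    """Per-company averages and tier counts, computed with one explicit loop."""
    acc = defaultdict(lambda: {"n": 0, "salary_sum": 0.0, "kpi_sum": 0.0,
                               "High": 0, "Mid": 0, "Low": 0})
    for row in rows:  # single pass over all employees
        row["KPI_tier"] = kpi_tier(row["KPI_score"])  # adds the tier to each row in place
        a = acc[row["company_id"]]
        a["n"] += 1
        a["salary_sum"] += row["salary"]
        a["kpi_sum"] += row["KPI_score"]
        a[row["KPI_tier"]] += 1
    summary = {}
    for cid in sorted(acc):  # companies in ascending id order
        a = acc[cid]
        summary[cid] = {"Avg_Salary": round(a["salary_sum"] / a["n"], 2),
                        "Avg_KPI": round(a["kpi_sum"] / a["n"], 1),
                        "Top_N": a["High"], "Mid_N": a["Mid"], "Low_N": a["Low"]}
    return summary


rows = [
    {"company_id": 1, "salary": 8.0, "KPI_score": 90.0},
    {"company_id": 1, "salary": 10.0, "KPI_score": 70.0},
    {"company_id": 2, "salary": 5.0, "KPI_score": 50.0},
]
print(company_summary(rows))
```

The loop touches each row once, so it scales linearly like the vectorised version. In R, however, the vectorised form avoids per-iteration interpreter overhead.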


Task 7.3 — Interactive KPI Dashboard

— R

tier_colours <- c("High"="#2ecc71","Mid"="#f39c12","Low"="#e74c3c")

tier_counts <- kpi_df %>%
  group_by(company_id, KPI_tier) %>%
  summarise(count = n(), .groups = "drop")

p_tier <- plot_ly(tier_counts,
  x      = ~factor(company_id), y = ~count,
  color  = ~KPI_tier, colors = tier_colours,
  type   = "bar",
  hovertemplate = "Company %{x} | Tier: %{fullData.name} | Count: %{y}<extra></extra>"
) %>% layout(
  barmode = "group",
  title   = "Chart 1: KPI Tier Distribution per Company",
  xaxis   = list(title = "Company ID"),
  yaxis   = list(title = "Number of Employees"),
  legend  = list(title = list(text = "<b>KPI Tier</b>")),
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

dept_top <- kpi_df %>%
  filter(KPI_tier == "High") %>%
  group_by(department) %>%
  summarise(top_count = n(), .groups = "drop") %>%
  arrange(top_count)

p_dept <- plot_ly(dept_top,
  y = ~department, x = ~top_count,
  type = "bar", orientation = "h",
  marker = list(color = "#3498db", opacity = 0.85),
  hovertemplate = "%{y}: %{x} top performers<extra></extra>"
) %>% layout(
  title  = "Chart 2: Top Performers by Department (KPI > 85)",
  xaxis  = list(title = "Count of Top Performers"),
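  # plotly may not follow the row order set by arrange(), so sort the category axis explicitly
  yaxis  = list(title = "", categoryorder = "total ascending"),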
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

kpi_df$company_label <- paste("Company", kpi_df$company_id)

p_salary <- plot_ly(kpi_df,
  x = ~company_label, y = ~salary,
  color = ~company_label, type = "box", boxpoints = "outliers",
  hovertemplate = "%{x}<br>Salary: %{y:.2f}k<extra></extra>"
) %>% layout(
  title      = "Chart 3: Salary Distribution per Company",
  xaxis      = list(title = "Company"),
  yaxis      = list(title = "Salary (thousands)"),
  showlegend = FALSE,
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

p_perf_kpi <- plot_ly(kpi_df,
  x = ~performance_score, y = ~KPI_score,
  color = ~company_label, type = "scatter", mode = "markers",
  marker = list(size = 5, opacity = 0.6),
  hovertemplate = "<b>%{fullData.name}</b><br>Performance: %{x}<br>KPI: %{y}<extra></extra>"
) %>% layout(
  title     = "Chart 4: Performance Score vs. KPI Score",
  xaxis     = list(title = "Performance Score"),
  yaxis     = list(title = "KPI Score"),
  legend    = list(title = list(text = "<b>Company</b>")),
  hovermode = "closest",
  plot_bgcolor  = "white",
  paper_bgcolor = "white")

p_tier
p_dept
p_salary
p_perf_kpi
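Charts 1 and 2 rely on two small aggregations: counts per (company, tier), and High-tier counts per department sorted in ascending order. The following hedged Python sketch computes the same aggregations with `collections.Counter`. It assumes rows already carry a `KPI_tier` field. The function names are hypothetical.

```python
from collections import Counter


def tier_counts(rows: list[dict]) -> Counter:
    """(company_id, KPI_tier) -> number of employees: the data behind Chart 1."""
    return Counter((r["company_id"], r["KPI_tier"]) for r in rows)


def top_performers_by_dept(rows: list[dict]) -> list[tuple[str, int]]:
    """High-tier headcount per department, sorted ascending: the data behind Chart 2."""
    counts = Counter(r["department"] for r in rows if r["KPI_tier"] == "High")
    return sorted(counts.items(), key=lambda item: item[1])


rows = [
    {"company_id": 1, "department": "IT", "KPI_tier": "High"},
    {"company_id": 1, "department": "IT", "KPI_tier": "High"},
    {"company_id": 1, "department": "HR", "KPI_tier": "Low"},
    {"company_id": 2, "department": "HR", "KPI_tier": "High"},
]
print(tier_counts(rows))
print(top_performers_by_dept(rows))
```

A `Counter` returns 0 for combinations that never occur. By contrast, the dplyr `group_by()` / `summarise()` result simply has no row for them, so a grouped bar is omitted rather than drawn at zero height.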



Task 8 (Bonus) — Automated Report Generation

The objective of Task 8 is to demonstrate automated report generation: one function encapsulates all reporting logic for a single company, and a loop calls it for every company — producing consistent, structured outputs at scale without any repetitive code.



Task 8.1 — Report Generation Function

— R

set.seed(777)

# Reuse the Task 7 generator instead of redefining it: with the same seed it
# reproduces the identical 700-row dataset
kpi_df_t8 <- generate_company_data_t7(7, 100)
kpi_df_t8$KPI_tier <- ifelse(kpi_df_t8$KPI_score > 85, "High",
                       ifelse(kpi_df_t8$KPI_score >= 65, "Mid", "Low"))

generate_company_report <- function(df, cid) {
  co_df        <- df %>% filter(company_id == cid)
  tier_colours <- c("High"="#2ecc71","Mid"="#f39c12","Low"="#e74c3c")

  stats <- data.frame(
    Metric = c("Total Employees","Avg Salary (k)","Avg Performance Score",
               "Avg KPI Score","KPI High (>85)","KPI Mid (65–85)","KPI Low (<65)"),
    Value  = c(nrow(co_df),
               round(mean(co_df$salary), 2),
               round(mean(co_df$performance_score), 1),
               round(mean(co_df$KPI_score), 1),
               sum(co_df$KPI_tier == "High"),
               sum(co_df$KPI_tier == "Mid"),
               sum(co_df$KPI_tier == "Low")))

  cat("\n══════════════════════════════════════════════\n")
  cat(paste0("  COMPANY ", cid, "  —  AUTOMATED KPI REPORT\n"))
  cat("══════════════════════════════════════════════\n\n")

  cat("[ Summary Statistics ]\n")
  print(kable(stats, col.names = c("Metric","Value"), align = "lr"))

  top_df <- co_df %>%
    filter(KPI_tier == "High") %>%
    select(employee_id, department, salary, performance_score, KPI_score) %>%
    arrange(desc(KPI_score)) %>%
    head(5)

  cat("\n[ Top 5 Performers ]\n")
  print(kable(top_df,
    col.names = c("Employee ID","Department","Salary (k)","Performance","KPI"),
    align     = "clrrr"))

  dept_summary <- co_df %>%
    group_by(department) %>%
    summarise(Count = n(), Avg_KPI = round(mean(KPI_score), 1), .groups = "drop")

  cat("\n[ Department Breakdown ]\n")
  print(kable(dept_summary,
    col.names = c("Department","Headcount","Avg KPI"), align = "lrr"))

  # barnorm is a layout attribute (not a trace attribute): with barmode = "stack"
  # it rescales each department's stacked bar to proportions that sum to 1
  p1 <- plot_ly(co_df, x = ~department, color = ~KPI_tier, colors = tier_colours,
    type = "histogram",
    hovertemplate = "%{x}, %{fullData.name}: %{y:.1%}<extra></extra>"
  ) %>% layout(
    title   = paste0("Company ", cid, ": KPI Tier Proportion by Department"),
    xaxis   = list(title = "Department"),
    yaxis   = list(title = "Proportion", tickformat = ".0%"),
    barmode = "stack",
    barnorm = "fraction",
    plot_bgcolor  = "white",
    paper_bgcolor = "white")

  p2 <- plot_ly(co_df, x = ~salary, color = ~KPI_tier, colors = tier_colours,
    type = "histogram", nbinsx = 15, opacity = 0.8,
    hovertemplate = "Salary: %{x:.1f}k<br>Count: %{y}<extra></extra>"
  ) %>% layout(
    barmode = "overlay",
    title   = paste0("Company ", cid, ": Salary Distribution by KPI Tier"),
    xaxis   = list(title = "Salary (thousands)"),
    yaxis   = list(title = "Count"),
    plot_bgcolor  = "white",
    paper_bgcolor = "white")

  print(p1)
  print(p2)

  # Write each CSV only if it does not exist yet, so re-running or re-knitting the report does not export it again
  folder    <- "reports"
  if (!dir.exists(folder)) dir.create(folder)
  file_name <- file.path(folder, paste0("company_", cid, "_report.csv"))

  if (!file.exists(file_name)) {
    write.csv(co_df, file = file_name, row.names = FALSE)
    cat(paste0("\n  CSV exported: ", file_name, "\n\n"))
  } else {
    cat(paste0("\n  CSV already exists, skipped: ", file_name, "\n\n"))
  }
}
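The following hedged Python sketch shows the same one-function-per-company pattern. It assumes employees are stored as a list of dicts with the column names above, including `KPI_tier`. Instead of printing tables and charts, it returns the report pieces as a dict. It also reproduces the write-only-if-missing CSV rule. The function signature, the `folder` parameter, and the returned keys are illustrative and are not part of the original R code.

```python
import csv
import os
from statistics import mean


def generate_company_report(rows: list[dict], cid: int, folder: str = "reports") -> dict:
    """Summary stats, top 5 High-tier employees, department breakdown, one-time CSV export.

    Assumes company `cid` has at least one employee in `rows`.
    """
    co = [r for r in rows if r["company_id"] == cid]
    stats = {
        "Total Employees": len(co),
        "Avg Salary (k)": round(mean(r["salary"] for r in co), 2),
        "Avg KPI Score": round(mean(r["KPI_score"] for r in co), 1),
        "KPI High (>85)": sum(r["KPI_tier"] == "High" for r in co),
        "KPI Mid (65-85)": sum(r["KPI_tier"] == "Mid" for r in co),
        "KPI Low (<65)": sum(r["KPI_tier"] == "Low" for r in co),
    }
    # Top 5 High-tier employees by KPI, highest first
    top5 = sorted((r for r in co if r["KPI_tier"] == "High"),
                  key=lambda r: r["KPI_score"], reverse=True)[:5]
    # Department -> (headcount, average KPI)
    by_dept: dict[str, list[float]] = {}
    for r in co:
        by_dept.setdefault(r["department"], []).append(r["KPI_score"])
    dept_summary = {d: (len(v), round(mean(v), 1)) for d, v in sorted(by_dept.items())}

    # Export this company's rows once; skip if the file already exists
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, f"company_{cid}_report.csv")
    written = not os.path.exists(path)
    if written:
        with open(path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(co[0].keys()))
            writer.writeheader()
            writer.writerows(co)
    return {"stats": stats, "top5": top5, "departments": dept_summary,
            "csv": path, "csv_written": written}
```

Returning a dict rather than printing keeps the function easy to test. A caller can then format the pieces however the report requires.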


Task 8.2 — Automated Loop: One Loop, Seven Reports

— R

# A single for loop produces a complete report for every company automatically.
for (cid in sort(unique(kpi_df_t8$company_id))) {
  generate_company_report(kpi_df_t8, cid)
}
## 
## ══════════════════════════════════════════════
##   COMPANY 1  —  AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
## 
## [ Summary Statistics ]
## 
## 
## |Metric                | Value|
## |:---------------------|-----:|
## |Total Employees       | 100.0|
## |Avg Salary (k)        |   8.8|
## |Avg Performance Score |  69.7|
## |Avg KPI Score         |  70.9|
## |KPI High (>85)        |  30.0|
## |KPI Mid (65–85)       |  30.0|
## |KPI Low (<65)         |  40.0|
## 
## [ Top 5 Performers ]
## 
## 
## | Employee ID |Department | Salary (k)| Performance|  KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## |     49      |Operations |       5.26|        94.8| 99.9|
## |     94      |Marketing  |       3.00|        89.1| 99.7|
## |     86      |Finance    |       9.80|        85.6| 99.0|
## |      8      |HR         |       6.98|       100.0| 98.6|
## |     79      |Marketing  |       3.61|        79.6| 98.4|
## 
## [ Department Breakdown ]
## 
## 
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance    |        22|    61.2|
## |HR         |        19|    77.1|
## |IT         |        25|    70.9|
## |Marketing  |        18|    73.9|
## |Operations |        16|    73.7|
## 
##   CSV already exists, skipped: reports/company_1_report.csv
## 
## 
## ══════════════════════════════════════════════
##   COMPANY 2  —  AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
## 
## [ Summary Statistics ]
## 
## 
## |Metric                |  Value|
## |:---------------------|------:|
## |Total Employees       | 100.00|
## |Avg Salary (k)        |   8.99|
## |Avg Performance Score |  70.40|
## |Avg KPI Score         |  73.70|
## |KPI High (>85)        |  33.00|
## |KPI Mid (65–85)       |  34.00|
## |KPI Low (<65)         |  33.00|
## 
## [ Top 5 Performers ]
## 
## 
## | Employee ID |Department | Salary (k)| Performance|  KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## |     85      |Marketing  |      12.87|        82.0| 99.8|
## |     93      |Marketing  |      11.43|        88.2| 99.0|
## |     97      |Operations |      12.72|        81.6| 98.9|
## |     98      |Marketing  |       4.03|        92.7| 98.8|
## |     39      |Marketing  |       8.39|        83.5| 98.2|
## 
## [ Department Breakdown ]
## 
## 
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance    |        24|    72.5|
## |HR         |        17|    72.4|
## |IT         |        22|    69.8|
## |Marketing  |        22|    77.9|
## |Operations |        15|    76.5|
## 
##   CSV already exists, skipped: reports/company_2_report.csv
## 
## 
## ══════════════════════════════════════════════
##   COMPANY 3  —  AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
## 
## [ Summary Statistics ]
## 
## 
## |Metric                |  Value|
## |:---------------------|------:|
## |Total Employees       | 100.00|
## |Avg Salary (k)        |   8.34|
## |Avg Performance Score |  70.50|
## |Avg KPI Score         |  73.50|
## |KPI High (>85)        |  37.00|
## |KPI Mid (65–85)       |  26.00|
## |KPI Low (<65)         |  37.00|
## 
## [ Top 5 Performers ]
## 
## 
## | Employee ID |Department | Salary (k)| Performance|  KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## |     51      |Marketing  |       3.84|       100.0| 99.7|
## |     70      |Operations |      13.13|        79.8| 99.7|
## |     11      |Operations |       6.45|        84.5| 99.3|
## |     59      |Finance    |       5.62|        90.1| 98.9|
## |     26      |HR         |      13.20|        79.9| 98.7|
## 
## [ Department Breakdown ]
## 
## 
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance    |        23|    69.7|
## |HR         |        17|    81.0|
## |IT         |        20|    70.2|
## |Marketing  |        21|    73.8|
## |Operations |        19|    74.6|
## 
##   CSV already exists, skipped: reports/company_3_report.csv
## 
## 
## ══════════════════════════════════════════════
##   COMPANY 4  —  AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
## 
## [ Summary Statistics ]
## 
## 
## |Metric                | Value|
## |:---------------------|-----:|
## |Total Employees       | 100.0|
## |Avg Salary (k)        |   8.3|
## |Avg Performance Score |  70.4|
## |Avg KPI Score         |  73.9|
## |KPI High (>85)        |  39.0|
## |KPI Mid (65–85)       |  26.0|
## |KPI Low (<65)         |  35.0|
## 
## [ Top 5 Performers ]
## 
## 
## | Employee ID |Department | Salary (k)| Performance|  KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## |     37      |Operations |      13.87|        75.9| 99.4|
## |     19      |HR         |       6.70|        75.6| 99.0|
## |     79      |Operations |      14.66|       100.0| 98.6|
## |     72      |Marketing  |      14.38|        77.0| 98.0|
## |     28      |Finance    |       3.52|        90.2| 97.2|
## 
## [ Department Breakdown ]
## 
## 
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance    |        19|    75.2|
## |HR         |        24|    72.6|
## |IT         |        16|    73.5|
## |Marketing  |        15|    77.4|
## |Operations |        26|    72.5|
## 
##   CSV already exists, skipped: reports/company_4_report.csv
## 
## 
## ══════════════════════════════════════════════
##   COMPANY 5  —  AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
## 
## [ Summary Statistics ]
## 
## 
## |Metric                |  Value|
## |:---------------------|------:|
## |Total Employees       | 100.00|
## |Avg Salary (k)        |   8.88|
## |Avg Performance Score |  70.10|
## |Avg KPI Score         |  73.70|
## |KPI High (>85)        |  36.00|
## |KPI Mid (65–85)       |  30.00|
## |KPI Low (<65)         |  34.00|
## 
## [ Top 5 Performers ]
## 
## 
## | Employee ID |Department | Salary (k)| Performance|   KPI|
## |:-----------:|:----------|----------:|-----------:|-----:|
## |     47      |Marketing  |       8.21|        79.2| 100.0|
## |      6      |Finance    |      10.40|        86.7|  99.8|
## |     67      |IT         |      11.92|        85.5|  99.7|
## |     98      |HR         |       7.73|        76.9|  99.5|
## |     86      |IT         |      13.85|        76.8|  99.1|
## 
## [ Department Breakdown ]
## 
## 
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance    |        21|    74.9|
## |HR         |        27|    73.3|
## |IT         |        23|    72.8|
## |Marketing  |        16|    74.9|
## |Operations |        13|    73.1|
## 
##   CSV already exists, skipped: reports/company_5_report.csv
## 
## 
## ══════════════════════════════════════════════
##   COMPANY 6  —  AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
## 
## [ Summary Statistics ]
## 
## 
## |Metric                |  Value|
## |:---------------------|------:|
## |Total Employees       | 100.00|
## |Avg Salary (k)        |   8.58|
## |Avg Performance Score |  71.20|
## |Avg KPI Score         |  74.90|
## |KPI High (>85)        |  45.00|
## |KPI Mid (65–85)       |  24.00|
## |KPI Low (<65)         |  31.00|
## 
## [ Top 5 Performers ]
## 
## 
## | Employee ID |Department | Salary (k)| Performance|  KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## |     51      |IT         |      14.91|        77.1| 99.5|
## |     19      |Finance    |      11.75|        80.5| 99.1|
## |     25      |Operations |       5.88|        78.2| 99.0|
## |     34      |IT         |       8.45|        78.5| 98.3|
## |     41      |IT         |       7.79|       100.0| 98.1|
## 
## [ Department Breakdown ]
## 
## 
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance    |        18|    75.7|
## |HR         |        22|    71.2|
## |IT         |        21|    80.3|
## |Marketing  |        20|    75.7|
## |Operations |        19|    71.6|
## 
##   CSV already exists, skipped: reports/company_6_report.csv
## 
## 
## ══════════════════════════════════════════════
##   COMPANY 7  —  AUTOMATED KPI REPORT
## ══════════════════════════════════════════════
## 
## [ Summary Statistics ]
## 
## 
## |Metric                |  Value|
## |:---------------------|------:|
## |Total Employees       | 100.00|
## |Avg Salary (k)        |   8.43|
## |Avg Performance Score |  69.40|
## |Avg KPI Score         |  70.50|
## |KPI High (>85)        |  32.00|
## |KPI Mid (65–85)       |  24.00|
## |KPI Low (<65)         |  44.00|
## 
## [ Top 5 Performers ]
## 
## 
## | Employee ID |Department | Salary (k)| Performance|  KPI|
## |:-----------:|:----------|----------:|-----------:|----:|
## |     79      |Operations |       3.18|        77.7| 99.0|
## |     11      |Operations |       9.97|        90.2| 98.9|
## |     64      |Marketing  |       5.36|        77.8| 98.0|
## |     40      |HR         |      12.21|        95.0| 97.9|
## |     86      |Finance    |      13.85|        76.2| 97.9|
## 
## [ Department Breakdown ]
## 
## 
## |Department | Headcount| Avg KPI|
## |:----------|---------:|-------:|
## |Finance    |        26|    69.9|
## |HR         |        22|    71.1|
## |IT         |        13|    72.2|
## |Marketing  |        19|    74.2|
## |Operations |        20|    66.0|
## 
##   CSV already exists, skipped: reports/company_7_report.csv
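The rerun behaviour in the output above, where every CSV is skipped because it already exists, follows from checking for the file before writing. The following minimal Python sketch shows the same driver loop. It assumes company rows are stored as a list of dicts. `export_once` and `run_all` are hypothetical names. Running the driver twice on the same folder exports every file the first time and skips all of them the second time.

```python
import csv
import os


def export_once(rows: list[dict], cid: int, folder: str) -> str:
    """Write company_<cid>_report.csv unless it already exists; return a status line."""
    path = os.path.join(folder, f"company_{cid}_report.csv")
    if os.path.exists(path):
        return f"CSV already exists, skipped: {path}"
    co = [r for r in rows if r["company_id"] == cid]
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(co[0].keys()))
        writer.writeheader()
        writer.writerows(co)
    return f"CSV exported: {path}"


def run_all(rows: list[dict], folder: str = "reports") -> list[str]:
    """One loop, one call per company (sorted ids), as in the R driver loop."""
    os.makedirs(folder, exist_ok=True)
    messages = []
    for cid in sorted({r["company_id"] for r in rows}):
        messages.append(export_once(rows, cid, folder))
    return messages
```

Skipping existing files makes the loop safe to re-run. The trade-off is that a changed dataset is not re-exported until the old CSV files are deleted.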



Conclusion

This practicum has demonstrated, in a progressive and integrated manner, how functions, loops, and conditional branching are sufficient to build a complete, end-to-end data science pipeline.

Tasks 1–2 established the foundation: well-validated functions and nested loops for formula evaluation and sales simulation. Tasks 3–4 extended the pattern to classification and hierarchical data generation, reflecting the structure of real-world business datasets. Task 5 applied the same loop-and-condition skeleton to stochastic simulation, showing that a basic Monte Carlo estimate needs no specialised libraries, only random sampling and counting. Task 6 addressed the pre-modelling pipeline through data transformation and feature engineering, confirming that min-max normalisation and Z-score standardisation, being linear transformations, preserve the shape of a distribution and only rescale its axis. Task 7 consolidated all prior concepts into a full KPI dashboard (700 records across 7 companies), showing that the same code patterns carry over from small examples to a larger dataset with little change. Task 8 completed the cycle with automated report generation, where a single function called by a loop produces consistent, structured outputs for every entity, the hallmark of production-ready reporting pipelines.

Throughout all tasks, R and Python implementations were maintained in parallel. While syntax differs — else if vs. elif, rbind() vs. list.append(), base R vs. pandas — the underlying logic is identical, reinforcing language-agnostic programming thinking that is essential in modern data science practice.




References

  1. Data Science Programming: Functions and Loops. https://bookdown.org/dsciencelabs/data_science_programming/03-Functions-and-Loops.html
  2. Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
  3. Sievert, C. (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. CRC Press.
  4. Plotly Technologies Inc. (2015). Collaborative data science. https://plot.ly
  5. R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation.
  6. Van Rossum, G., & Drake, F. L. (2009). Python 3 Reference Manual. CreateSpace.