Practicum Week-5 focuses on building advanced functions and loops and applying them to real-world data science workflows in R.
All tasks rely on the following R packages. Make sure they are installed before knitting.
# ============================================================
# LIBRARIES — load all required packages
# ============================================================
library(ggplot2) # Grammar of Graphics — all visualizations
library(dplyr) # Data wrangling and transformation
library(tidyr) # Data tidying (pivot, gather, spread)
library(scales) # Axis formatting (comma, percent, etc.)
library(knitr) # Table rendering in HTML output
library(kableExtra) # Enhanced HTML/LaTeX table styling
This practicum uses simulated datasets generated inside each task function. Each dataset mimics real-world data science scenarios:
# ============================================================
# DATASET OVERVIEW — preview structure for Task 4 / 6 / 7
# ============================================================
# Preview the structure of the company dataset that will be
# generated in Task 4 and reused in Tasks 6, 7, and 8.
# Columns: company_id, employee_id, salary, department,
# performance_score, KPI_score, top_performer
preview_structure <- data.frame(
Column = c("company_id","employee_id","salary","department",
"performance_score","KPI_score","top_performer"),
Type = c("character","character","numeric","character",
"numeric","numeric","character"),
Description = c("Unique company identifier (C1–C5)",
"Unique employee identifier per company",
"Monthly salary (IDR 3,000–15,000)",
"Department name (HR/Finance/IT/Marketing/Ops)",
"Performance score (50–100)",
"KPI score (60–100)",
"Top performer flag: Yes if KPI > 90")
)
kable(preview_structure, caption = "Dataset Schema — Company HR Simulation") %>%
kable_styling(bootstrap_options = c("striped","hover","bordered"),
full_width = TRUE, font_size = 14)
| Column | Type | Description |
|---|---|---|
| company_id | character | Unique company identifier (C1–C5) |
| employee_id | character | Unique employee identifier per company |
| salary | numeric | Monthly salary (IDR 3,000–15,000) |
| department | character | Department name (HR/Finance/IT/Marketing/Ops) |
| performance_score | numeric | Performance score (50–100) |
| KPI_score | numeric | KPI score (60–100) |
| top_performer | character | Top performer flag: Yes if KPI > 90 |
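The schema above can also be checked programmatically once a dataset has been generated. The following is a minimal sketch in base R; `check_schema()`, `expected_types`, and the `demo` row are hypothetical names and values introduced here for illustration and are not part of the practicum code.

```r
# Hypothetical helper: verify that a data frame matches the schema table.
expected_types <- c(
  company_id        = "character",
  employee_id       = "character",
  salary            = "numeric",
  department        = "character",
  performance_score = "numeric",
  KPI_score         = "numeric",
  top_performer     = "character"
)

check_schema <- function(df, expected = expected_types) {
  # Columns that the schema requires but the data frame lacks
  missing_cols <- setdiff(names(expected), names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Compare the type of each required column with the schema
  for (col in names(expected)) {
    ok <- if (expected[[col]] == "numeric") {
      is.numeric(df[[col]])
    } else {
      is.character(df[[col]])
    }
    if (!ok) stop("Column '", col, "' is not ", expected[[col]])
  }
  invisible(TRUE)
}

# One hand-written example row (values are made up)
demo <- data.frame(
  company_id = "C1", employee_id = "C1_E1", salary = 5000,
  department = "IT", performance_score = 80, KPI_score = 92,
  top_performer = "Yes", stringsAsFactors = FALSE
)
check_schema(demo)
```

Failing early with `stop()` mirrors the input validation style used in the task functions.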
Build a function compute_formula(x, formula) that computes one of four mathematical formulas: linear, quadratic, cubic, and exponential. The function validates input, uses a loop for computation, and all four results are plotted on the same graph.
formula input: stop with a clear error message if invalid.
x = 1:20.
# ============================================================
# TASK 1: Dynamic Multi-Formula Function
# compute_formula(x, formula) — returns a numeric vector
# Formulas: linear | quadratic | cubic | exponential
# ============================================================
compute_formula <- function(x, formula) {
# Define allowed formula names
valid_formulas <- c("linear", "quadratic", "cubic", "exponential")
# Input validation: stop if formula is not recognized
if (!(formula %in% valid_formulas)) {
stop(paste("Invalid formula! Choose from:",
paste(valid_formulas, collapse = ", ")))
}
# Pre-allocate result vector for efficiency
result <- numeric(length(x))
# Inner loop: compute formula value at each x[i]
for (i in seq_along(x)) {
if (formula == "linear") {
result[i] <- 2 * x[i] + 3 # f(x) = 2x + 3
} else if (formula == "quadratic") {
result[i] <- x[i]^2 + 2 * x[i] + 1 # f(x) = x² + 2x + 1
} else if (formula == "cubic") {
result[i] <- x[i]^3 - 3 * x[i]^2 + 2 * x[i] # f(x) = x³ - 3x² + 2x
} else if (formula == "exponential") {
result[i] <- exp(0.3 * x[i]) # f(x) = e^(0.3x)
}
}
return(result)
}
# ---- Outer loop: iterate over all 4 formulas ----
x_vals <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")
results_list <- list()
for (f in formulas) {
# Call compute_formula for each formula type
results_list[[f]] <- data.frame(
x = x_vals,
y = compute_formula(x_vals, f),
formula = f
)
}
# Combine all formula results into one tidy data frame
df_formulas <- bind_rows(results_list)
# Show sample values for all 4 formulas at x = 1, 5, 10, 20
df_formulas %>%
filter(x %in% c(1, 5, 10, 20)) %>%
tidyr::pivot_wider(names_from = formula, values_from = y)
# ============================================================
# TASK 1: Plot — All 4 Formulas on the Same Graph
# Color-coded lines with points, log scale for visibility
# ============================================================
formula_colors <- c(
"linear" = "#4CAF50",
"quadratic" = "#2196F3",
"cubic" = "#E91E63",
"exponential" = "#FF9800"
)
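# Cubic is 0 at x = 1 and x = 2, and log10(0) is -Inf, so plot only positive values
ggplot(dplyr::filter(df_formulas, y > 0), aes(x = x, y = y, color = formula)) +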
geom_line(linewidth = 1.5, alpha = 0.9) + # main trend line
geom_point(size = 2.8, alpha = 0.85) + # individual data points
scale_color_manual(values = formula_colors) +
scale_y_log10(labels = comma) + # log scale to show all 4 on same canvas
labs(
title = "Task 1 — Dynamic Multi-Formula Comparison",
subtitle = "Linear · Quadratic · Cubic · Exponential (x = 1 to 20, log-y scale)",
x = "x value",
y = "f(x) — log scale",
color = "Formula Type",
caption = "Source: Practicum Week-5 — compute_formula()"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 15, color = "#4A148C"),
plot.subtitle = element_text(color = "#6A1B9A", size = 10),
legend.position = "top",
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#FAFAFA", color = NA)
)
Build simulate_sales(n_salesperson, days) with a nested helper function for discount logic. The outer loop iterates over salespersons and the inner loop iterates over days. Conditional discounts are applied based on the sales amount.
# ============================================================
# TASK 2: Nested Simulation — Multi-Sales & Discounts
# simulate_sales(n_salesperson, days) → tidy data frame
# Nested function: get_discount(amount) inside simulate_sales
# ============================================================
set.seed(42) # for reproducibility
simulate_sales <- function(n_salesperson, days) {
# ---- NESTED HELPER FUNCTION: discount rule ----
# Returns discount rate based on sales amount thresholds
get_discount <- function(amount) {
if (amount > 800) {
return(0.15) # 15% discount for high-value sales
} else if (amount > 500) {
return(0.10) # 10% for medium-value sales
} else {
return(0.05) # 5% for low-value sales
}
}
# ---- END NESTED FUNCTION ----
records <- list() # accumulate rows
idx <- 1 # flat index for list
# OUTER LOOP: iterate over each salesperson
for (sp in 1:n_salesperson) {
cumulative <- 0 # reset cumulative total per salesperson
# INNER LOOP: iterate over each day for this salesperson
for (d in 1:days) {
# Simulate random daily sales amount (200–1000)
amount <- round(runif(1, 200, 1000), 2)
# Apply discount via nested helper function
disc_rate <- get_discount(amount)
net_amount <- amount * (1 - disc_rate)
# Track cumulative net sales for this salesperson
cumulative <- cumulative + net_amount
# Store each record
records[[idx]] <- data.frame(
sales_id = sp,
day = d,
sales_amount = amount,
discount_rate = disc_rate,
net_amount = round(net_amount, 2),
cumulative_net = round(cumulative, 2)
)
idx <- idx + 1
}
}
return(bind_rows(records))
}
# Run: 5 salespersons, 10 days each
sales_df <- simulate_sales(n_salesperson = 5, days = 10)
# Show first 10 rows
head(sales_df, 10)
# ============================================================
# TASK 2: Summary statistics aggregated per salesperson
# ============================================================
sales_summary <- sales_df %>%
group_by(sales_id) %>%
summarise(
Total_Sales_Amount = round(sum(sales_amount), 0),
Total_Net_Revenue = round(sum(net_amount), 0),
Avg_Discount_Pct = paste0(round(mean(discount_rate) * 100, 1), "%"),
Max_Cumulative = round(max(cumulative_net), 0),
.groups = "drop"
)
kable(sales_summary,
caption = "Task 2 — Salesperson Performance Summary (5 SP × 10 Days)",
col.names = c("Sales ID","Total Sales","Net Revenue","Avg Discount %","Max Cumulative")) %>%
kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE)
| Sales ID | Total Sales | Net Revenue | Avg Discount % | Max Cumulative |
|---|---|---|---|---|
| 1 | 7090 | 6281 | 10.5% | 6281 |
| 2 | 6720 | 5939 | 10.5% | 5939 |
| 3 | 6923 | 6026 | 11.5% | 6026 |
| 4 | 6154 | 5445 | 10% | 5445 |
| 5 | 7067 | 6180 | 11.5% | 6180 |
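The discount rule and the running total in simulate_sales() can also be expressed without explicit loops. The following is a minimal sketch in base R under the same thresholds (above 800, above 500, otherwise); `get_discount_vec()` and `sales_vec` are hypothetical names introduced for illustration, not part of the original solution.

```r
# Hypothetical vectorized counterpart of get_discount()
get_discount_vec <- function(amount) {
  ifelse(amount > 800, 0.15,
         ifelse(amount > 500, 0.10, 0.05))
}

set.seed(42)
n_salesperson <- 5
days <- 10

# One row per salesperson-day, in the same order as the nested loops
sales_vec <- data.frame(
  sales_id     = rep(seq_len(n_salesperson), each = days),
  day          = rep(seq_len(days), times = n_salesperson),
  sales_amount = round(runif(n_salesperson * days, 200, 1000), 2)
)
sales_vec$discount_rate <- get_discount_vec(sales_vec$sales_amount)
sales_vec$net_amount    <- sales_vec$sales_amount * (1 - sales_vec$discount_rate)

# ave() applies cumsum() separately within each salesperson
sales_vec$cumulative_net <- ave(sales_vec$net_amount, sales_vec$sales_id,
                                FUN = cumsum)

head(sales_vec)
```

The loop version makes the control flow explicit, which is the point of the exercise; the vectorized form is shorter and avoids building one data frame per row.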
# ============================================================
# TASK 2: Line Chart — Cumulative Net Sales per Salesperson
# ============================================================
# Convert sales_id to labelled factor for better legend
sales_df$sales_id <- factor(sales_df$sales_id,
labels = paste0("SP-", 1:5))
ggplot(sales_df, aes(x = day, y = cumulative_net, color = sales_id)) +
geom_line(linewidth = 1.3) +
geom_point(size = 3, shape = 21,
aes(fill = sales_id), color = "white", stroke = 1.5) +
scale_color_brewer(palette = "Set1") +
scale_fill_brewer(palette = "Set1") +
scale_y_continuous(labels = comma) +
scale_x_continuous(breaks = 1:10) +
labs(
title = "Task 2 — Cumulative Net Sales per Salesperson (10 Days)",
subtitle = "Discount rule: Amount >800 → 15% | >500 → 10% | else → 5%",
x = "Day",
y = "Cumulative Net Sales",
color = "Salesperson",
fill = "Salesperson",
caption = "Source: Practicum Week-5 — simulate_sales()"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 15, color = "#01579B"),
plot.subtitle = element_text(color = "#0277BD", size = 10),
legend.position = "right",
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#F0F8FF", color = NA)
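)
get_discount() demonstrates a common design pattern: encapsulating business logic in a helper nested inside the parent function, which keeps the rule next to the code that uses it. Because the helper is not visible outside simulate_sales(), it cannot be called or unit-tested in isolation.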
Build categorize_performance(sales_amount) that assigns one of five categories: Excellent, Very Good, Good, Average, Poor. Output percentage breakdown, bar chart, and pie chart.
# ============================================================
# TASK 3: Multi-Level Performance Categorization
# categorize_performance(x) → character vector of 5 tiers
# Uses a for loop + if-else chain over the full vector
# ============================================================
categorize_performance <- function(sales_amount) {
# Pre-allocate output vector
category <- character(length(sales_amount))
# Loop through every element and assign a category
for (i in seq_along(sales_amount)) {
val <- sales_amount[i]
if (val >= 900) {
category[i] <- "Excellent" # Top tier: amount >= 900
} else if (val >= 700) {
category[i] <- "Very Good" # 700 ≤ amount < 900
} else if (val >= 500) {
category[i] <- "Good" # 500 ≤ amount < 700
} else if (val >= 300) {
category[i] <- "Average" # 300 ≤ amount < 500
} else {
category[i] <- "Poor" # amount < 300 — lowest tier
}
}
return(category)
}
# ---- Apply function to 100 simulated sales values ----
set.seed(42)
raw_sales <- round(runif(100, 100, 1000), 2) # uniform random: 100–1000
categories <- categorize_performance(raw_sales)
# Build data frame with ordered factor for correct plot ordering
perf_df <- data.frame(
sales_amount = raw_sales,
category = factor(categories,
levels = c("Poor","Average","Good","Very Good","Excellent"))
)
# Calculate count and percentage per category
perf_pct <- perf_df %>%
count(category) %>%
mutate(pct = round(n / sum(n) * 100, 1),
label = paste0(n, " (", pct, "%)"))
kable(perf_pct, caption = "Task 3 — Category Distribution (n=100 sales records)",
col.names = c("Category","Count","Percentage (%)","Label")) %>%
kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE)
| Category | Count | Percentage (%) | Label |
|---|---|---|---|
| Poor | 22 | 22 | 22 (22%) |
| Average | 18 | 18 | 18 (18%) |
| Good | 21 | 21 | 21 (21%) |
| Very Good | 23 | 23 | 23 (23%) |
| Excellent | 16 | 16 | 16 (16%) |
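The if-else chain in categorize_performance() maps half-open intervals to labels, which is the job of base R's `cut()`. The sketch below is one possible vectorized equivalent; `categorize_performance_vec()` is a hypothetical name, and the test values are chosen only to probe the tier boundaries.

```r
# Hypothetical vectorized counterpart of categorize_performance()
categorize_performance_vec <- function(sales_amount) {
  cut(
    sales_amount,
    breaks = c(-Inf, 300, 500, 700, 900, Inf),
    labels = c("Poor", "Average", "Good", "Very Good", "Excellent"),
    right  = FALSE  # intervals are [lower, upper), matching the >= tests
  )
}

# Boundary values on both sides of each threshold
x <- c(299.99, 300, 499.99, 500, 700, 899.99, 900, 1000)
data.frame(sales_amount = x, category = categorize_performance_vec(x))
```

`cut()` returns a factor whose levels already follow the order Poor to Excellent, so the separate `factor(..., levels = ...)` step used for plotting would not be needed with this variant.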
# ============================================================
# TASK 3: Bar Chart — performance category frequency
# ============================================================
cat_colors <- c("Poor"="#EF9A9A","Average"="#FFCC80",
"Good"="#A5D6A7","Very Good"="#81D4FA","Excellent"="#CE93D8")
ggplot(perf_pct, aes(x = category, y = n, fill = category)) +
geom_col(width = 0.65, show.legend = FALSE, alpha = 0.9) +
geom_text(aes(label = label), vjust = -0.5,
fontface = "bold", size = 4, color = "#333333") +
scale_fill_manual(values = cat_colors) +
labs(
title = "Task 3 — Performance Category Distribution (Bar Chart)",
subtitle = "Based on 100 simulated sales values (uniform: 100–1000)",
x = "Performance Category",
y = "Number of Records",
caption = "Source: Practicum Week-5 — categorize_performance()"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14, color = "#004D40"),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank()
)
# ============================================================
# TASK 3: Pie Chart — proportion by category
# ============================================================
ggplot(perf_pct, aes(x = "", y = n, fill = category)) +
geom_col(width = 1, color = "white", linewidth = 0.9) +
coord_polar(theta = "y") +
geom_text(aes(label = paste0(category, "\n", pct, "%")),
position = position_stack(vjust = 0.5),
fontface = "bold", size = 3.8, color = "#1A1A1A") +
scale_fill_manual(values = cat_colors) +
labs(
title = "Task 3 — Performance Category Distribution (Pie Chart)",
fill = "Category",
caption = "Source: Practicum Week-5 — categorize_performance()"
) +
theme_void() +
theme(
plot.title = element_text(face = "bold", size = 14,
hjust = 0.5, color = "#004D40"),
legend.position = "right"
)
categorize_performance() loop. In a real sales scenario with a skewed distribution, the chart would reveal which tier is most concentrated, guiding incentive program design.
Build generate_company_data(n_company, n_employees) using nested loops (outer: company, inner: employee). Conditional KPI logic flags top performers. Output includes a summary table and scatter visualization.
# ============================================================
# TASK 4: Multi-Company Dataset Simulation
# generate_company_data(n_company, n_employees) → data frame
# Nested loops: outer = company, inner = employee
# Conditional logic: KPI > 90 → top_performer = "Yes"
# ============================================================
set.seed(123) # fixed seed for reproducibility
generate_company_data <- function(n_company, n_employees) {
# Possible department names
departments <- c("HR", "Finance", "IT", "Marketing", "Operations")
records <- list() # accumulate all employee records
idx <- 1 # flat row index
# OUTER LOOP: iterate over each company
for (c in 1:n_company) {
# INNER LOOP: iterate over each employee in this company
for (e in 1:n_employees) {
# Generate random numeric attributes
salary <- round(runif(1, 3000, 15000), 0) # monthly salary
dept <- sample(departments, 1) # random department
perf_score <- round(runif(1, 50, 100), 1) # performance score
kpi_score <- round(runif(1, 60, 100), 1) # KPI score
# CONDITIONAL: flag top performers (KPI > 90)
top_performer <- ifelse(kpi_score > 90, "Yes", "No")
# Save record
records[[idx]] <- data.frame(
company_id = paste0("C", c),
employee_id = paste0("C", c, "_E", e),
salary = salary,
department = dept,
performance_score = perf_score,
KPI_score = kpi_score,
top_performer = top_performer,
stringsAsFactors = FALSE
)
idx <- idx + 1
}
}
return(bind_rows(records))
}
# Generate: 3 companies, 20 employees each → 60 rows total
company_df <- generate_company_data(n_company = 3, n_employees = 20)
# Show first 8 rows
head(company_df, 8)
# ============================================================
# TASK 4: Company-level aggregation using dplyr
# ============================================================
company_summary <- company_df %>%
group_by(company_id) %>%
summarise(
Avg_Salary = formatC(round(mean(salary), 0), big.mark=",", format="d"),
Avg_Performance = round(mean(performance_score), 2),
Max_KPI = max(KPI_score),
Top_Performers = sum(top_performer == "Yes"), # conditional count
.groups = "drop"
)
kable(company_summary,
caption = "Task 4 — Company Summary (3 Companies × 20 Employees)",
col.names = c("Company","Avg Salary","Avg Performance","Max KPI","Top Performers")) %>%
kable_styling(bootstrap_options = c("striped","hover","bordered"),
full_width = TRUE)
| Company | Avg Salary | Avg Performance | Max KPI | Top Performers |
|---|---|---|---|---|
| C1 | 8,147 | 76.97 | 97.6 | 5 |
| C2 | 8,937 | 75.44 | 99.4 | 6 |
| C3 | 8,628 | 68.32 | 98.9 | 7 |
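The Top Performers column can be sanity-checked against theory. KPI scores are drawn with runif(1, 60, 100), so under the simplifying assumption that rounding to one decimal is ignored, the chance of KPI > 90 is (100 - 90) / (100 - 60) = 0.25. The sketch below works out the expected count per company; the variable names are introduced here for illustration.

```r
# Assumption: KPI ~ Uniform(60, 100); rounding to one decimal is ignored
p_top <- punif(90, min = 60, max = 100, lower.tail = FALSE)  # P(KPI > 90)
n_employees <- 20

expected_top <- n_employees * p_top                      # binomial mean
sd_top       <- sqrt(n_employees * p_top * (1 - p_top))  # binomial std dev

c(p_top = p_top, expected = expected_top, sd = sd_top)
```

With an expected count of 5 and a standard deviation just under 2, the observed counts of 5, 6, and 7 are close to what the simulation design implies.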
# ============================================================
# TASK 4: Scatter Plot — Salary vs KPI, color by company
# Shape encodes top performer status (KPI > 90)
# ============================================================
ggplot(company_df, aes(x = KPI_score, y = salary,
color = company_id, shape = top_performer)) +
geom_point(size = 3.5, alpha = 0.85) +
# Regression line per company (no standard error band for clarity)
geom_smooth(method = "lm", se = FALSE, linewidth = 1,
aes(group = company_id)) +
scale_color_manual(values = c("C1"="#E91E63","C2"="#2196F3","C3"="#4CAF50")) +
scale_shape_manual(values = c("Yes"=17, "No"=16),
labels = c("Yes"="Top Performer (KPI>90)","No"="Regular")) +
scale_y_continuous(labels = comma) +
labs(
title = "Task 4 — Salary vs KPI Score by Company",
subtitle = "Triangle = Top Performer (KPI > 90) | Lines = Linear trend per company",
x = "KPI Score",
y = "Monthly Salary",
color = "Company",
shape = "Status",
caption = "Source: Practicum Week-5 — generate_company_data()"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14, color = "#BF360C"),
plot.subtitle = element_text(color = "#D84315", size = 10),
legend.position = "right",
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#FFF8F5", color = NA)
)
Build monte_carlo_pi(n_points) that estimates π by checking whether random points in the square [-1, 1] × [-1, 1] fall inside the unit circle. A secondary analysis estimates the probability of points landing in the sub-square [-0.5, 0.5] × [-0.5, 0.5].
# ============================================================
# TASK 5: Monte Carlo Simulation — Pi Estimation
# monte_carlo_pi(n_points) → list with pi estimate, prob, coords
# Uses a for loop over n_points iterations
# ============================================================
set.seed(7) # fixed seed for reproducibility
monte_carlo_pi <- function(n_points) {
# Generate n random points in the square [-1, 1] x [-1, 1]
x_pts <- runif(n_points, -1, 1)
y_pts <- runif(n_points, -1, 1)
# Initialize tracking vectors
in_circle <- numeric(n_points) # 1 if inside unit circle
in_subsquare <- numeric(n_points) # 1 if inside sub-square [-0.5,0.5]^2
# FOR LOOP: check each point
for (i in 1:n_points) {
# Check if point (x, y) is inside the unit circle (r = 1)
# Condition: x² + y² ≤ 1
if (x_pts[i]^2 + y_pts[i]^2 <= 1) {
in_circle[i] <- 1
}
# Check if point is inside the sub-square [-0.5, 0.5]
# Area of sub-square = 1×1 = 1, out of full square 2×2 = 4
# Expected probability ≈ 1/4 = 0.25
if (abs(x_pts[i]) <= 0.5 && abs(y_pts[i]) <= 0.5) {
in_subsquare[i] <- 1
}
}
# Estimate π: area ratio × 4 = π
pi_estimate <- 4 * sum(in_circle) / n_points
prob_subsquare <- sum(in_subsquare) / n_points
return(list(
pi_estimate = pi_estimate,
prob_subsquare = prob_subsquare,
x = x_pts,
y = y_pts,
in_circle = in_circle
))
}
# Run Monte Carlo with 5,000 random points
mc_result <- monte_carlo_pi(5000)
# Print results with comparison to true π
cat("=== Monte Carlo Simulation Results (n = 5,000) ===\n")
=== Monte Carlo Simulation Results (n = 5,000) ===
Estimated Pi : 3.131200
True Pi (R built-in): 3.141593
Absolute Error : 0.010393
Error Percentage : 0.3308%
Sub-square Prob : 0.2592
Expected Prob (1/4) : 0.2500
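The size of the error above can be put in context with the standard error of the estimator. Each point lands inside the circle with probability π/4, so the hit fraction is a binomial proportion and 4 times that fraction has standard error 4·sqrt(p(1 − p)/n), roughly 0.023 for n = 5,000. The sketch below is a vectorized variant that also reports this quantity; `monte_carlo_pi_vec()` is a hypothetical name, and the sample sizes are arbitrary choices for illustration.

```r
# Hypothetical vectorized variant of monte_carlo_pi()
monte_carlo_pi_vec <- function(n_points) {
  x <- runif(n_points, -1, 1)
  y <- runif(n_points, -1, 1)
  inside <- x^2 + y^2 <= 1  # logical vector, TRUE = inside the unit circle
  p_hat  <- mean(inside)    # estimated area ratio, about pi / 4
  list(
    pi_estimate = 4 * p_hat,
    # Standard error of 4 * p_hat from the binomial variance of p_hat
    std_error   = 4 * sqrt(p_hat * (1 - p_hat) / n_points)
  )
}

set.seed(7)
for (n in c(1000, 10000, 100000)) {
  res <- monte_carlo_pi_vec(n)
  cat(sprintf("n = %6d  estimate = %.4f  std. error = %.4f\n",
              as.integer(n), res$pi_estimate, res$std_error))
}
```

The standard error shrinks in proportion to 1/sqrt(n), so each extra decimal digit of accuracy costs about 100 times more points.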
# ============================================================
# TASK 5: Scatter plot — visualize Monte Carlo sampling
# Green = inside circle, red = outside
# Blue circle overlay = theoretical boundary
# Orange dashed rectangle = sub-square region
# ============================================================
mc_plot_df <- data.frame(
x = mc_result$x,
y = mc_result$y,
in_circle = factor(mc_result$in_circle,
levels = c(0, 1),
labels = c("Outside Circle", "Inside Circle"))
)
ggplot(mc_plot_df, aes(x = x, y = y, color = in_circle)) +
geom_point(size = 0.55, alpha = 0.55) +
# Theoretical unit circle overlay
annotate("path",
x = cos(seq(0, 2 * pi, length.out = 300)),
y = sin(seq(0, 2 * pi, length.out = 300)),
color = "#1A237E", linewidth = 1.2) +
# Sub-square region overlay (dashed orange)
annotate("rect", xmin = -0.5, xmax = 0.5, ymin = -0.5, ymax = 0.5,
color = "#E65100", fill = NA, linewidth = 1.1, linetype = "dashed") +
scale_color_manual(values = c("Outside Circle" = "#EF9A9A",
"Inside Circle" = "#66BB6A")) +
coord_fixed() +
labs(
title = "Task 5 — Monte Carlo Pi Estimation (n = 5,000 points)",
subtitle = paste0("Estimated π = ", round(mc_result$pi_estimate, 5),
" | Sub-square hit rate = ",
round(mc_result$prob_subsquare, 4)),
x = "x coordinate",
y = "y coordinate",
color = "Point Position",
caption = "Blue = unit circle | Orange dashed = sub-square [-0.5, 0.5]²"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14, color = "#1B5E20"),
plot.subtitle = element_text(color = "#2E7D32", size = 10),
legend.position = "bottom",
panel.grid.minor = element_blank()
)
Build normalize_columns(df) and z_score(df) using loop-based iteration over column names. Then engineer two new categorical features. Compare distributions before and after transformation using a faceted histogram and a violin + boxplot.
# ============================================================
# TASK 6: Loop-based Normalization & Standardization
# normalize_columns(df) → Min-Max to [0, 1]
# z_score(df) → Zero mean, unit standard deviation
# Both iterate over column names with a for loop
# ============================================================
# ---- Function 1: Min-Max Normalization ----
normalize_columns <- function(df) {
# Identify numeric columns automatically
num_cols <- names(df)[sapply(df, is.numeric)]
result <- df
# Loop over each numeric column
for (col in num_cols) {
min_val <- min(df[[col]], na.rm = TRUE)
max_val <- max(df[[col]], na.rm = TRUE)
# Apply Min-Max formula: (x - min) / (max - min)
result[[col]] <- (df[[col]] - min_val) / (max_val - min_val)
}
return(result)
}
# ---- Function 2: Z-Score Standardization ----
z_score <- function(df) {
num_cols <- names(df)[sapply(df, is.numeric)]
result <- df
for (col in num_cols) {
mu <- mean(df[[col]], na.rm = TRUE) # column mean
sigma <- sd(df[[col]], na.rm = TRUE) # column std dev
# Apply Z-score formula: (x - μ) / σ
result[[col]] <- (df[[col]] - mu) / sigma
}
return(result)
}
# ---- Apply to numeric columns from company dataset (Task 4) ----
company_num <- company_df %>% select(salary, performance_score, KPI_score)
company_norm <- normalize_columns(company_num) # Min-Max
company_zscore <- z_score(company_num) # Z-Score
# Compare summary statistics
cat("--- ORIGINAL ---\n")
--- ORIGINAL ---
salary performance_score KPI_score
Min. : 3505 Min. :50.30 Min. :60.00
1st Qu.: 5170 1st Qu.:63.20 1st Qu.:73.15
Median : 8430 Median :70.55 Median :85.35
Mean : 8571 Mean :73.58 Mean :82.64
3rd Qu.:11562 3rd Qu.:82.92 3rd Qu.:91.85
Max. :14609 Max. :99.70 Max. :99.40
--- MIN-MAX NORMALIZED [0,1] ---
salary performance_score KPI_score
Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.1499 1st Qu.:0.2611 1st Qu.:0.3338
Median :0.4436 Median :0.4099 Median :0.6434
Mean :0.4562 Mean :0.4712 Mean :0.5746
3rd Qu.:0.7256 3rd Qu.:0.6604 3rd Qu.:0.8084
Max. :1.0000 Max. :1.0000 Max. :1.0000
--- Z-SCORE STANDARDIZED (mean=0, sd=1) ---
salary performance_score KPI_score
Min. :-1.44752 Min. :-1.7673 Min. :-1.9302
1st Qu.:-0.97188 1st Qu.:-0.7878 1st Qu.:-0.8091
Median :-0.04004 Median :-0.2297 Median : 0.2310
Mean : 0.00000 Mean : 0.0000 Mean : 0.0000
3rd Qu.: 0.85473 3rd Qu.: 0.7100 3rd Qu.: 0.7852
Max. : 1.72549 Max. : 1.9837 Max. : 1.4289
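Two properties of these transformations are worth checking by hand: min-max scaling divides by (max − min), which is zero for a constant column, and the z-score formula should agree with base R's `scale()`. The sketch below illustrates both on a single vector; `min_max_safe()` is a hypothetical helper, and returning zeros for a constant column is an assumption made here, not a rule from the practicum.

```r
# Hypothetical guarded min-max helper for a single numeric vector
min_max_safe <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # Constant column: (x - min) / 0 would give NaN, so return zeros instead
    # (this fallback is an assumption; another convention may suit better)
    return(ifelse(is.na(x), NA_real_, 0))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Five-number summary of salary taken from the ORIGINAL output above
salary <- c(3505, 5170, 8430, 11562, 14609)

min_max_safe(salary)      # smallest value maps to 0, largest to 1
min_max_safe(c(7, 7, 7))  # constant input: zeros rather than NaN

# scale() centers by the mean and divides by sd(), like z_score()
z_manual <- (salary - mean(salary)) / sd(salary)
all.equal(z_manual, as.numeric(scale(salary)))
```

Both transformations are linear, which is why the histograms in the comparison plot keep the same shape.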
# ============================================================
# TASK 6: Feature Engineering — create 2 new categorical columns
# performance_category: Low / Medium / High (from perf_score)
# salary_bracket: Entry / Mid / Top (from salary)
# ============================================================
company_df <- company_df %>%
mutate(
# New feature 1: performance tier based on score
performance_category = case_when(
performance_score >= 85 ~ "High",
performance_score >= 70 ~ "Medium",
TRUE ~ "Low"
),
# New feature 2: salary bracket
salary_bracket = case_when(
salary >= 12000 ~ "Top",
salary >= 7000 ~ "Mid",
TRUE ~ "Entry"
)
)
# Distribution summary of new features
cat("=== New Feature: performance_category ===\n")
=== New Feature: performance_category ===
High Low Medium
14 28 18
=== New Feature: salary_bracket ===
Entry Mid Top
23 23 14
# ============================================================
# TASK 6: Histogram Comparison — salary before & after transform
# Faceted into 3 panels: Original | Min-Max | Z-Score
# ============================================================
# Stack all three salary distributions into one tidy data frame
hist_df <- bind_rows(
data.frame(value = company_num$salary, type = "1. Original"),
data.frame(value = company_norm$salary, type = "2. Min-Max Normalized"),
data.frame(value = company_zscore$salary, type = "3. Z-Score Standardized")
)
ggplot(hist_df, aes(x = value, fill = type)) +
geom_histogram(bins = 18, color = "white", alpha = 0.88) +
facet_wrap(~type, scales = "free_x", nrow = 1) +
scale_fill_manual(values = c("1. Original" = "#90CAF9",
"2. Min-Max Normalized" = "#A5D6A7",
"3. Z-Score Standardized"= "#CE93D8")) +
labs(
title = "Task 6 — Salary Distribution: Before vs After Transformation",
subtitle = "Shape is identical across all 3 — only the scale changes",
x = "Value",
y = "Count",
caption = "Source: Practicum Week-5 — normalize_columns() & z_score()"
) +
theme_minimal(base_size = 11) +
theme(
plot.title = element_text(face = "bold", size = 13, color = "#4A148C"),
plot.subtitle = element_text(color = "#6A1B9A", size = 10),
legend.position = "none",
strip.text = element_text(face = "bold", size = 10),
panel.grid.minor = element_blank()
)
# ============================================================
# TASK 6: Violin + Boxplot — shows full distribution shape
# Violins reveal the density, boxplots show median & IQR
# ============================================================
company_df$performance_category <- factor(company_df$performance_category,
levels = c("Low","Medium","High"))
ggplot(company_df, aes(x = performance_category, y = salary,
fill = performance_category)) +
# Violin: shows density/distribution shape
geom_violin(alpha = 0.55, trim = FALSE) +
# Boxplot overlay: shows median, IQR, and outliers
geom_boxplot(width = 0.18, alpha = 0.85, outlier.size = 3,
outlier.color = "#B71C1C", color = "#333333") +
# Individual data points
geom_jitter(width = 0.06, alpha = 0.45, size = 2, color = "#455A64") +
scale_fill_manual(values = c("Low"="#FFCDD2","Medium"="#FFF9C4","High"="#C8E6C9")) +
scale_y_continuous(labels = comma) +
labs(
title = "Task 6 — Salary Distribution by Performance Category",
subtitle = "Violin = distribution shape | Box = median & IQR | Points = individual employees",
x = "Performance Category",
y = "Monthly Salary",
fill = "Category",
caption = "Source: Practicum Week-5 — Feature Engineering"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14, color = "#4A148C"),
plot.subtitle = element_text(color = "#6A1B9A", size = 10),
legend.position = "none",
panel.grid.minor = element_blank()
)
Generate a full dataset for 5 companies × 50 employees each (250 rows total). Use a loop to categorize employees into 4 KPI tiers. Output: summary table, grouped bar chart, scatter with regression, department analysis, and salary distribution.
# ============================================================
# TASK 7: Full KPI Dashboard Dataset — 5 companies × 50 employees
# Reuses generate_company_data() from Task 4
# Adds KPI tier classification via a for loop
# ============================================================
set.seed(99)
dashboard_df <- generate_company_data(n_company = 5, n_employees = 50)
# ---- FOR LOOP: classify each employee into a KPI tier ----
kpi_tier <- character(nrow(dashboard_df)) # pre-allocate
for (i in seq_len(nrow(dashboard_df))) {
kpi <- dashboard_df$KPI_score[i]
if (kpi >= 90) {
kpi_tier[i] <- "Elite" # KPI >= 90 (includes exactly 90, unlike top_performer)
} else if (kpi >= 75) {
kpi_tier[i] <- "Strong" # solid performers
} else if (kpi >= 60) {
kpi_tier[i] <- "Developing" # on track but room to grow
} else {
kpi_tier[i] <- "At Risk" # KPI < 60; never reached here because simulated KPI is 60 to 100
}
}
# Attach tier as ordered factor
dashboard_df$kpi_tier <- factor(kpi_tier,
levels = c("At Risk","Developing","Strong","Elite"))
cat(sprintf("Dataset: %d employees across %d companies\n",
nrow(dashboard_df), length(unique(dashboard_df$company_id))))
Dataset: 250 employees across 5 companies
# ============================================================
# TASK 7: Aggregate KPI dashboard summary per company
# ============================================================
dashboard_summary <- dashboard_df %>%
group_by(company_id) %>%
summarise(
Avg_Salary = formatC(round(mean(salary), 0), big.mark=",", format="d"),
Avg_KPI = round(mean(KPI_score), 2),
Top_Performers = sum(top_performer == "Yes"),
Elite_Count = sum(kpi_tier == "Elite"),
Developing_Count = sum(kpi_tier == "Developing"),
At_Risk_Count = sum(kpi_tier == "At Risk"),
.groups = "drop"
)
kable(dashboard_summary,
caption = "Task 7 — KPI Dashboard: Company Summary (5 Companies × 50 Employees)",
col.names = c("Company","Avg Salary","Avg KPI","Top Performers",
"Elite","Developing","At Risk")) %>%
kable_styling(bootstrap_options = c("striped","hover","bordered"),
full_width = TRUE)
| Company | Avg Salary | Avg KPI | Top Performers | Elite | Developing | At Risk |
|---|---|---|---|---|---|---|
| C1 | 8,617 | 82.73 | 15 | 15 | 14 | 0 |
| C2 | 8,928 | 80.05 | 14 | 14 | 20 | 0 |
| C3 | 9,304 | 79.57 | 11 | 11 | 19 | 0 |
| C4 | 8,564 | 79.70 | 10 | 10 | 17 | 0 |
| C5 | 8,492 | 80.15 | 11 | 11 | 14 | 0 |
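Two patterns in the table follow directly from the tier rules: At Risk is always 0 because simulated KPI scores never fall below 60, and Elite (KPI >= 90) differs from Top Performers (KPI > 90) only for a score of exactly 90. The sketch below reproduces the tier assignment with `findInterval()` and probes the boundaries; `kpi_to_tier()` and the test scores are hypothetical and chosen for illustration.

```r
# Hypothetical vectorized counterpart of the KPI tier loop
tier_levels <- c("At Risk", "Developing", "Strong", "Elite")

kpi_to_tier <- function(kpi) {
  # findInterval() gives 0 below 60, 1 for [60, 75), 2 for [75, 90), 3 for >= 90
  idx <- findInterval(kpi, c(60, 75, 90)) + 1
  factor(tier_levels[idx], levels = tier_levels)
}

# Scores on both sides of each threshold
kpi <- c(59.9, 60, 74.9, 75, 89.9, 90, 90.1)
data.frame(
  KPI_score     = kpi,
  kpi_tier      = kpi_to_tier(kpi),
  top_performer = ifelse(kpi > 90, "Yes", "No")  # rule from Task 4
)
```

In the output, the row with KPI 90 is Elite but not a top performer, which is the only case where the two columns can disagree.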
# ============================================================
# TASK 7: Grouped Bar — employee count per KPI tier, per company
# ============================================================
tier_dist <- dashboard_df %>%
count(company_id, kpi_tier) %>%
group_by(company_id) %>%
mutate(pct = round(n / sum(n) * 100, 1))
tier_colors <- c("At Risk" = "#EF9A9A",
"Developing" = "#FFCC80",
"Strong" = "#81D4FA",
"Elite" = "#CE93D8")
ggplot(tier_dist, aes(x = company_id, y = n, fill = kpi_tier)) +
geom_col(position = "dodge", width = 0.72, alpha = 0.9) +
geom_text(aes(label = paste0(pct, "%")),
position = position_dodge(width = 0.72),
vjust = -0.5, size = 3.2, fontface = "bold") +
scale_fill_manual(values = tier_colors) +
labs(
title = "Task 7 — KPI Tier Distribution per Company",
subtitle = "Elite (≥90) · Strong (75–89) · Developing (60–74) · At Risk (<60)",
x = "Company",
y = "Number of Employees",
fill = "KPI Tier",
caption = "Source: Practicum Week-5 — KPI Dashboard (n=250)"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14, color = "#8C6900"),
plot.subtitle = element_text(color = "#A07800", size = 10),
legend.position = "top",
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank()
)
# ============================================================
# TASK 7: Faceted Scatter — Salary vs KPI per company
# Regression line + 95% CI band per facet
# ============================================================
comp_colors <- c("C1"="#E91E63","C2"="#2196F3","C3"="#4CAF50",
"C4"="#FF9800","C5"="#9C27B0")
ggplot(dashboard_df, aes(x = KPI_score, y = salary, color = company_id)) +
geom_point(aes(shape = kpi_tier), size = 2.5, alpha = 0.72) +
geom_smooth(method = "lm", se = TRUE, linewidth = 1.1,
aes(fill = company_id), alpha = 0.10) +
scale_color_manual(values = comp_colors) +
scale_fill_manual(values = comp_colors) +
scale_y_continuous(labels = comma) +
scale_shape_manual(values = c("At Risk"=4,"Developing"=16,
"Strong"=17,"Elite"=18)) +
facet_wrap(~company_id, nrow = 2) +
labs(
title = "Task 7 — Salary vs KPI Score with Regression Lines (Faceted)",
subtitle = "Shaded area = 95% confidence interval | Shape = KPI tier",
x = "KPI Score",
y = "Monthly Salary",
color = "Company",
shape = "KPI Tier",
caption = "Source: Practicum Week-5 — KPI Dashboard"
) +
theme_minimal(base_size = 11) +
theme(
plot.title = element_text(face = "bold", size = 14, color = "#311B92"),
plot.subtitle = element_text(color = "#4527A0", size = 10),
legend.position = "bottom",
panel.grid.minor = element_blank(),
strip.text = element_text(face = "bold", size = 10)
)

# ============================================================
# TASK 7: Horizontal bar — department vs avg salary
# Secondary axis shows avg KPI score
# ============================================================
dept_summary <- dashboard_df %>%
group_by(department) %>%
summarise(
Avg_Salary = mean(salary),
Avg_KPI = mean(KPI_score),
Count = n(),
.groups = "drop"
) %>%
arrange(desc(Avg_Salary))
ggplot(dept_summary, aes(x = reorder(department, Avg_Salary))) +
geom_col(aes(y = Avg_Salary, fill = department), alpha = 0.85, width = 0.55) +
geom_point(aes(y = Avg_KPI * 100), color = "#1A237E", size = 5) +
geom_line(aes(y = Avg_KPI * 100, group = 1),
color = "#1A237E", linewidth = 1.1, linetype = "dashed") +
scale_fill_brewer(palette = "Pastel1") +
scale_y_continuous(
name = "Avg Monthly Salary",
labels = comma,
sec.axis = sec_axis(~. / 100, name = "Avg KPI Score")
) +
coord_flip() +
labs(
title = "Task 7 — Department: Avg Salary & KPI Score",
x = "Department",
fill = "Department",
caption = "Bars = Avg Salary | Blue points/line = Avg KPI (scaled ×100)"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14, color = "#311B92"),
legend.position = "none",
panel.grid.minor = element_blank()
)

# ============================================================
# TASK 7: Area Chart — salary density by KPI tier
# Overlapping filled density curves per tier
# ============================================================
ggplot(dashboard_df, aes(x = salary, fill = kpi_tier, color = kpi_tier)) +
geom_density(alpha = 0.35, linewidth = 0.8) +
scale_fill_manual(values = tier_colors) +
scale_color_manual(values = c("At Risk" = "#C62828",
"Developing" = "#E65100",
"Strong" = "#01579B",
"Elite" = "#6A1B9A")) +
scale_x_continuous(labels = comma) +
labs(
title = "Task 7 — Salary Density by KPI Tier",
subtitle = "Area chart showing salary distribution overlap across all four KPI tiers",
x = "Monthly Salary",
y = "Density",
fill = "KPI Tier",
color = "KPI Tier",
caption = "Source: Practicum Week-5 — KPI Dashboard (n=250)"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14, color = "#311B92"),
plot.subtitle = element_text(color = "#4527A0", size = 10),
legend.position = "right",
panel.grid.minor = element_blank()
)

Use functions and loops to automatically generate a structured summary report for each company in the dashboard dataset. The report includes per-company statistics, a KPI tier breakdown, a department breakdown, a top-performer listing, and a mini ggplot2 visualization. All content is generated programmatically by looping over the company IDs.
# ============================================================
# TASK 8 (BONUS): Automated Report Generation per Company
# generate_company_report(cid, df) → prints full summary
# Called inside a for loop to process all companies
# ============================================================
generate_company_report <- function(cid, df) {
# Filter data for this specific company
cdata <- df %>% filter(company_id == cid)
# ---- Compute summary statistics ----
avg_salary <- round(mean(cdata$salary), 0)
avg_kpi <- round(mean(cdata$KPI_score), 2)
avg_perf <- round(mean(cdata$performance_score), 2)
n_employees <- nrow(cdata)
# Count top performers (KPI > 90)
n_top <- sum(cdata$top_performer == "Yes")
pct_top <- round(n_top / n_employees * 100, 1)
# KPI tier breakdown (using table)
tier_tbl <- table(cdata$kpi_tier)
# Department breakdown
dept_tbl <- cdata %>%
group_by(department) %>%
summarise(Count = n(), Avg_KPI = round(mean(KPI_score), 1), .groups = "drop") %>%
arrange(desc(Avg_KPI))
# List top 3 employees by KPI
top3 <- cdata %>%
arrange(desc(KPI_score)) %>%
select(employee_id, department, KPI_score, salary) %>%
head(3)
# ---- Print formatted report ----
cat(rep("=", 60), "\n", sep="")
cat(sprintf(" AUTOMATED REPORT — COMPANY %s\n", cid))
cat(rep("=", 60), "\n\n", sep="")
cat(sprintf(" Total Employees : %d\n", n_employees))
cat(sprintf(" Avg Monthly Salary : IDR %s\n", formatC(avg_salary, big.mark=",", format="d")))
cat(sprintf(" Avg KPI Score : %.2f\n", avg_kpi))
cat(sprintf(" Avg Perf Score : %.2f\n", avg_perf))
cat(sprintf(" Top Performers : %d (%.1f%% of workforce)\n\n", n_top, pct_top))
cat(" --- KPI TIER BREAKDOWN ---\n")
for (tier in names(tier_tbl)) {
bar_len <- round(tier_tbl[[tier]] / n_employees * 30) # progress bar scale
bar <- paste0(rep("█", bar_len), collapse = "")
cat(sprintf(" %-12s : %2d employees %s\n", tier, tier_tbl[[tier]], bar))
}
cat("\n --- DEPARTMENT SUMMARY ---\n")
for (r in seq_len(nrow(dept_tbl))) { # seq_len() is safe when there are zero rows
cat(sprintf(" %-12s : %2d emp | Avg KPI = %.1f\n",
dept_tbl$department[r], dept_tbl$Count[r], dept_tbl$Avg_KPI[r]))
}
cat("\n --- TOP 3 PERFORMERS ---\n")
for (r in seq_len(nrow(top3))) { # seq_len() is safe when there are zero rows
cat(sprintf(" #%d %-12s | Dept: %-12s | KPI: %.1f | Salary: %s\n",
r,
top3$employee_id[r],
top3$department[r],
top3$KPI_score[r],
formatC(top3$salary[r], big.mark=",", format="d")))
}
cat("\n")
}

# ============================================================
# TASK 8 (BONUS): Loop over all companies and print reports
# ============================================================
# Get sorted list of company IDs
all_companies <- sort(unique(dashboard_df$company_id))
# MAIN LOOP: generate report for each company automatically
for (cid in all_companies) {
generate_company_report(cid, dashboard_df)
}

============================================================
AUTOMATED REPORT — COMPANY C1
============================================================
Total Employees : 50
Avg Monthly Salary : IDR 8,617
Avg KPI Score : 82.73
Avg Perf Score : 72.66
Top Performers : 15 (30.0% of workforce)
--- KPI TIER BREAKDOWN ---
At Risk : 0 employees
Developing : 14 employees ████████
Strong : 21 employees █████████████
Elite : 15 employees █████████
--- DEPARTMENT SUMMARY ---
HR : 5 emp | Avg KPI = 94.1
IT : 8 emp | Avg KPI = 87.1
Operations : 10 emp | Avg KPI = 84.8
Marketing : 12 emp | Avg KPI = 81.2
Finance : 15 emp | Avg KPI = 76.5
--- TOP 3 PERFORMERS ---
#1 C1_E25 | Dept: HR | KPI: 99.9 | Salary: 4,893
#2 C1_E1 | Dept: HR | KPI: 99.7 | Salary: 10,017
#3 C1_E27 | Dept: IT | KPI: 99.6 | Salary: 5,724
============================================================
AUTOMATED REPORT — COMPANY C2
============================================================
Total Employees : 50
Avg Monthly Salary : IDR 8,928
Avg KPI Score : 80.05
Avg Perf Score : 74.62
Top Performers : 14 (28.0% of workforce)
--- KPI TIER BREAKDOWN ---
At Risk : 0 employees
Developing : 20 employees ████████████
Strong : 16 employees ██████████
Elite : 14 employees ████████
--- DEPARTMENT SUMMARY ---
IT : 7 emp | Avg KPI = 82.6
Finance : 12 emp | Avg KPI = 81.7
HR : 12 emp | Avg KPI = 81.0
Marketing : 10 emp | Avg KPI = 77.7
Operations : 9 emp | Avg KPI = 77.3
--- TOP 3 PERFORMERS ---
#1 C2_E12 | Dept: Finance | KPI: 99.4 | Salary: 10,431
#2 C2_E27 | Dept: Marketing | KPI: 98.6 | Salary: 10,790
#3 C2_E9 | Dept: Marketing | KPI: 97.8 | Salary: 13,123
============================================================
AUTOMATED REPORT — COMPANY C3
============================================================
Total Employees : 50
Avg Monthly Salary : IDR 9,304
Avg KPI Score : 79.57
Avg Perf Score : 77.65
Top Performers : 11 (22.0% of workforce)
--- KPI TIER BREAKDOWN ---
At Risk : 0 employees
Developing : 19 employees ███████████
Strong : 20 employees ████████████
Elite : 11 employees ███████
--- DEPARTMENT SUMMARY ---
Operations : 11 emp | Avg KPI = 85.4
Finance : 6 emp | Avg KPI = 81.2
IT : 11 emp | Avg KPI = 78.2
HR : 13 emp | Avg KPI = 77.4
Marketing : 9 emp | Avg KPI = 76.2
--- TOP 3 PERFORMERS ---
#1 C3_E7 | Dept: Operations | KPI: 99.4 | Salary: 5,298
#2 C3_E25 | Dept: Operations | KPI: 97.6 | Salary: 12,027
#3 C3_E31 | Dept: IT | KPI: 97.6 | Salary: 13,232
============================================================
AUTOMATED REPORT — COMPANY C4
============================================================
Total Employees : 50
Avg Monthly Salary : IDR 8,564
Avg KPI Score : 79.70
Avg Perf Score : 76.71
Top Performers : 10 (20.0% of workforce)
--- KPI TIER BREAKDOWN ---
At Risk : 0 employees
Developing : 17 employees ██████████
Strong : 23 employees ██████████████
Elite : 10 employees ██████
--- DEPARTMENT SUMMARY ---
HR : 9 emp | Avg KPI = 80.6
Operations : 17 emp | Avg KPI = 80.2
Finance : 11 emp | Avg KPI = 79.8
IT : 8 emp | Avg KPI = 79.0
Marketing : 5 emp | Avg KPI = 77.2
--- TOP 3 PERFORMERS ---
#1 C4_E7 | Dept: Finance | KPI: 99.6 | Salary: 4,168
#2 C4_E31 | Dept: Finance | KPI: 99.1 | Salary: 8,905
#3 C4_E22 | Dept: IT | KPI: 97.4 | Salary: 13,243
============================================================
AUTOMATED REPORT — COMPANY C5
============================================================
Total Employees : 50
Avg Monthly Salary : IDR 8,492
Avg KPI Score : 80.15
Avg Perf Score : 71.12
Top Performers : 11 (22.0% of workforce)
--- KPI TIER BREAKDOWN ---
At Risk : 0 employees
Developing : 14 employees ████████
Strong : 25 employees ███████████████
Elite : 11 employees ███████
--- DEPARTMENT SUMMARY ---
HR : 12 emp | Avg KPI = 83.2
Marketing : 13 emp | Avg KPI = 82.8
Operations : 7 emp | Avg KPI = 79.8
IT : 7 emp | Avg KPI = 77.4
Finance : 11 emp | Avg KPI = 75.7
--- TOP 3 PERFORMERS ---
#1 C5_E8 | Dept: HR | KPI: 98.4 | Salary: 3,197
#2 C5_E12 | Dept: Marketing | KPI: 97.5 | Salary: 7,022
#3 C5_E26 | Dept: Marketing | KPI: 97.1 | Salary: 3,995
# ============================================================
# TASK 8 (BONUS): Export company summary to CSV
# Demonstrates automated file output built up in a loop
# ============================================================
# Build export data frame using a loop
export_rows <- list()
for (cid in all_companies) {
cdata <- dashboard_df %>% filter(company_id == cid)
export_rows[[cid]] <- data.frame(
company_id = cid,
n_employees = nrow(cdata),
avg_salary = round(mean(cdata$salary), 0),
avg_kpi = round(mean(cdata$KPI_score), 2),
avg_performance = round(mean(cdata$performance_score), 2),
top_performers = sum(cdata$top_performer == "Yes"),
elite_count = sum(cdata$kpi_tier == "Elite"),
at_risk_count = sum(cdata$kpi_tier == "At Risk")
)
}
export_df <- bind_rows(export_rows)
# Write to CSV (will appear in knit working directory)
write.csv(export_df, "company_kpi_summary.csv", row.names = FALSE)
cat("CSV exported: company_kpi_summary.csv\n\n")

CSV exported: company_kpi_summary.csv

kable(export_df, caption = "Task 8 (Bonus) — Automated Export: Company KPI Summary") %>%
kable_styling(bootstrap_options = c("striped","hover","bordered"), full_width = TRUE)

| company_id | n_employees | avg_salary | avg_kpi | avg_performance | top_performers | elite_count | at_risk_count |
|---|---|---|---|---|---|---|---|
| C1 | 50 | 8617 | 82.73 | 72.66 | 15 | 15 | 0 |
| C2 | 50 | 8928 | 80.05 | 74.62 | 14 | 14 | 0 |
| C3 | 50 | 9304 | 79.57 | 77.65 | 11 | 11 | 0 |
| C4 | 50 | 8564 | 79.70 | 76.71 | 10 | 10 | 0 |
| C5 | 50 | 8492 | 80.15 | 71.12 | 11 | 11 | 0 |
# ============================================================
# TASK 8 (BONUS): Loop-generated plots — KPI distribution
# Uses a for loop to build one ggplot per company,
# then arranges them with gridExtra into a single dashboard figure
# ============================================================
# Build a list of plots (one per company)
plot_list <- list()
for (cid in all_companies) {
cdata <- dashboard_df %>% filter(company_id == cid)
p <- ggplot(cdata, aes(x = kpi_tier, fill = kpi_tier)) +
geom_bar(alpha = 0.88, show.legend = FALSE) +
geom_text(stat = "count", aes(label = after_stat(count)),
vjust = -0.4, fontface = "bold", size = 3.8) +
scale_fill_manual(values = tier_colors) +
scale_y_continuous(limits = c(0, 28)) +
labs(title = paste0("Company ", cid),
x = NULL, y = "Employees") +
theme_minimal(base_size = 10) +
theme(
plot.title = element_text(face = "bold", color = "#311B92", size = 11),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank()
)
plot_list[[cid]] <- p
}
# Use gridExtra to arrange all 5 plots in one figure
library(gridExtra)
grid.arrange(grobs = plot_list, nrow = 2,
top = "Task 8 (Bonus) — Automated KPI Tier Chart per Company")

The generate_company_report() function encapsulates all the summary logic, and a for loop runs it for every company without manual repetition. The CSV export and the loop-generated grid of plots show how this pattern scales to any number of companies. In a real business context, an analyst could refresh an entire portfolio-level HR report by re-running one loop, which is a core principle of reproducible data science workflows.
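The same function-plus-loop pattern can also be sketched without an explicit for loop. The block below is a minimal illustration on a small made-up data frame: toy_df and summarise_group() are hypothetical names that do not appear in the practicum code, and the Elite cutoff of 90 is taken from the Task 7 chart subtitle. split() partitions the rows by company, lapply() applies one summary function to each part, and do.call(rbind, ...) stacks the results.

```r
# Toy data: hypothetical values, not the practicum dataset
toy_df <- data.frame(
  company_id = c("C1", "C1", "C2", "C2", "C2"),
  KPI_score  = c(95, 70, 88, 92, 61),
  salary     = c(5000, 7000, 9000, 4000, 8000),
  stringsAsFactors = FALSE
)

# One function owns the summary logic for a single company
summarise_group <- function(g) {
  data.frame(
    company_id  = g$company_id[1],
    n_employees = nrow(g),
    avg_kpi     = round(mean(g$KPI_score), 2),
    n_elite     = sum(g$KPI_score >= 90)  # assumed Elite cutoff
  )
}

# split() gives one data frame per company; lapply() summarises each one
groups      <- split(toy_df, toy_df$company_id)
toy_summary <- do.call(rbind, lapply(groups, summarise_group))
rownames(toy_summary) <- NULL
print(toy_summary)
```

Adding a sixth or a sixtieth company changes only the input data; neither the summary function nor the apply step needs to be edited.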
# ============================================================
# 11. Summary Table — all 8 tasks
# ============================================================
summary_df <- data.frame(
Task = c("Task 1","Task 2","Task 3","Task 4",
"Task 5","Task 6","Task 7","Task 8 ⭐"),
Concept = c(
"Dynamic Function + Nested Loop",
"Nested Simulation + Discount Logic",
"Categorization Function + Loop",
"Multi-Company Data Generation",
"Monte Carlo Pi Estimation",
"Normalization & Feature Engineering",
"KPI Dashboard — Mini Project",
"Automated Report Generation (Bonus)"
),
Key_Function = c(
"compute_formula(x, formula)",
"simulate_sales(n_sp, days) + get_discount()",
"categorize_performance(sales_amount)",
"generate_company_data(n_co, n_emp)",
"monte_carlo_pi(n_points)",
"normalize_columns(df) + z_score(df)",
"Integrated pipeline across Tasks 4–6",
"generate_company_report(cid, df)"
),
Visualization = c(
"Multi-line chart (log scale)",
"Cumulative line chart per salesperson",
"Bar chart + pie chart",
"Scatter with regression + summary table",
"Point plot with circle & sub-square overlay",
"Faceted histogram + violin+boxplot",
"Grouped bar + faceted scatter + area chart",
"Automated text reports + grid of bar charts"
),
check.names = FALSE
)
kable(summary_df,
caption = "Practicum Week-5 — Complete Task Summary",
col.names = c("Task","Concept","Key Function","Visualization")) %>%
kable_styling(bootstrap_options = c("striped","hover","bordered"),
full_width = TRUE, font_size = 13)

| Task | Concept | Key Function | Visualization |
|---|---|---|---|
| Task 1 | Dynamic Function + Nested Loop | compute_formula(x, formula) | Multi-line chart (log scale) |
| Task 2 | Nested Simulation + Discount Logic | simulate_sales(n_sp, days) + get_discount() | Cumulative line chart per salesperson |
| Task 3 | Categorization Function + Loop | categorize_performance(sales_amount) | Bar chart + pie chart |
| Task 4 | Multi-Company Data Generation | generate_company_data(n_co, n_emp) | Scatter with regression + summary table |
| Task 5 | Monte Carlo Pi Estimation | monte_carlo_pi(n_points) | Point plot with circle & sub-square overlay |
| Task 6 | Normalization & Feature Engineering | normalize_columns(df) + z_score(df) | Faceted histogram + violin+boxplot |
| Task 7 | KPI Dashboard — Mini Project | Integrated pipeline across Tasks 4–6 | Grouped bar + faceted scatter + area chart |
| Task 8 ⭐ | Automated Report Generation (Bonus) | generate_company_report(cid, df) | Automated text reports + grid of bar charts |
Practicum Week-5 demonstrated how functions, loops, simulation, and ggplot2 visualization combine in R for advanced data science workflows. The following key points were established:
Separating discount logic (get_discount) from simulation orchestration (simulate_sales) shows the single responsibility principle in action.
Together, these eight tasks confirm that reusable functions, disciplined looping, expressive visualization, and automated reporting form the core toolkit of a professional data scientist working in R.
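As a rough illustration of that separation, the sketch below keeps the discount rule in one small function and the simulation loop in another. Only the function names and the (n_sp, days) parameters come from the summary table; the discount thresholds, the sales range, the seed argument, and the output columns are assumptions made for this example and may differ from the Task 2 code.

```r
# Hypothetical discount rule: thresholds are illustrative assumptions
get_discount <- function(sales_amount) {
  if (sales_amount >= 1000) {
    0.10
  } else if (sales_amount >= 500) {
    0.05
  } else {
    0
  }
}

# Orchestration only: loops over salespeople and days and
# delegates the pricing decision to get_discount()
simulate_sales <- function(n_sp, days, seed = 42) {
  set.seed(seed)  # reproducible draws (assumed, for the example)
  rows <- vector("list", n_sp * days)
  k <- 1
  for (sp in seq_len(n_sp)) {
    for (d in seq_len(days)) {
      amount <- round(runif(1, min = 100, max = 1500))
      rows[[k]] <- data.frame(
        salesperson  = sp,
        day          = d,
        sales_amount = amount,
        discount     = get_discount(amount)
      )
      k <- k + 1
    }
  }
  do.call(rbind, rows)
}

sales <- simulate_sales(n_sp = 2, days = 3)
print(sales)
```

Because the rule lives in its own function, changing a threshold touches get_discount() only, and the rule can be checked in isolation, for example with get_discount(499).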