ITSB

Institut Teknologi Sains Bandung

Academic Year 2026 / 2027

Practicum Week 5

Functions & Loops

Student

Nailatul Wafiroh

NIM 52250003

Student Major in Data Science

Institut Teknologi Sains Bandung

R Programming Data Science Statistics

April 06, 2026

Lecturer

Bakti Siregar, M.Sc., CDS

ITSB Data Science Program

Even Semester 2026/2027

1 Task 1 — Multi-Formula Function

1.1 Description

Build a function compute_formula(x, formula) that computes values for linear, quadratic, cubic, and exponential formulas, then plots all on the same graph for x = 1:20.

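# Packages used throughout this report (assumed to be loaded in a hidden setup chunk)
library(dplyr)      # %>%, group_by(), summarise(), mutate(), case_when(), select()
library(kableExtra) # kable_styling(), row_spec()

compute_formula <- function(x, formula) {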
  # Validate formula input
  valid_formulas <- c("linear", "quadratic", "cubic", "exponential")
  if (!formula %in% valid_formulas) {
    stop(paste("Invalid formula. Choose from:", paste(valid_formulas, collapse = ", ")))
  }

  # Compute based on formula type
  if (formula == "linear") {
    return(2 * x + 3)
  } else if (formula == "quadratic") {
    return(x^2 + 2 * x + 1)
  } else if (formula == "cubic") {
    return(x^3 - 2 * x^2 + x - 1)
  } else if (formula == "exponential") {
    return(exp(0.3 * x))
  }
}

# Define x range
x_vals   <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")

# Build result data frame using nested loop
results <- data.frame()

for (formula in formulas) {
  for (x in x_vals) {
    y       <- compute_formula(x, formula)
    results <- rbind(results, data.frame(x = x, y = round(y, 4), formula = formula))
  }
}

# Show sample output (first 3 rows per formula) as table
sample_output <- do.call(rbind, lapply(formulas, function(f) {
  head(results[results$formula == f, ], 3)
}))

knitr::kable(sample_output, align = "c", row.names = FALSE,
             caption = "Sample Output: First 3 Rows per Formula") |>
  kable_styling(full_width = TRUE) |>
  row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
  row_spec(c(1,3,5,7,9,11), extra_css = "background: #ffffff; color: #1a3a5c;") |>
  row_spec(c(2,4,6,8,10,12), extra_css = "background: #f4f1ea; color: #1a3a5c;")

Table 1.1: Sample Output: First 3 Rows per Formula
x	y	formula
1	5.0000	linear
2	7.0000	linear
3	9.0000	linear
1	4.0000	quadratic
2	9.0000	quadratic
3	16.0000	quadratic
1	-1.0000	cubic
2	1.0000	cubic
3	11.0000	cubic
1	1.3499	exponential
2	1.8221	exponential
3	2.4596	exponential

Figure 1.1: All formulas plotted for x = 1 to 20

Interpretation: The compute_formula() function shows how a single function can handle several mathematical models by branching on the formula argument. Each formula has a distinct growth pattern: the linear function increases at a constant rate, while the quadratic and cubic functions accelerate because of their higher-order terms. Within the plotted range the cubic curve dominates, reaching 7219 at x = 20, compared with 441 for the quadratic and about 403 for the exponential. The exponential function, exp(0.3x), grows multiplicatively and will eventually outpace any polynomial, but with a rate of 0.3 that happens only beyond x = 20. The comparison shows how strongly the choice of model and its coefficients affects outcomes over the same input range.
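A quick numeric check of how the four curves compare can be sketched as follows. The formulas are copied from compute_formula(); the list name `f` and the search range 21:60 are illustrative assumptions, not part of the original task.

```r
# Same four formulas as in compute_formula(), written as a named list
# (the list name `f` is illustrative, not part of the original code)
f <- list(
  linear      = function(x) 2 * x + 3,
  quadratic   = function(x) x^2 + 2 * x + 1,
  cubic       = function(x) x^3 - 2 * x^2 + x - 1,
  exponential = function(x) exp(0.3 * x)
)

# Value of each formula at the right edge of the plotted range
at_20 <- sapply(f, function(fun) fun(20))
print(round(at_20, 2))

# Search beyond the plotted range (assumed window 21:60) for the first
# integer x at which the exponential curve exceeds the cubic curve
x_far     <- 21:60
crossover <- x_far[which(f$exponential(x_far) > f$cubic(x_far))[1]]
print(crossover)
```

At x = 20 the cubic gives 7219, the quadratic 441, and the exponential about 403.43; the exponential first exceeds the cubic at x = 36, well outside the plotted range.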

2 Task 2 — Sales Simulation

2.1 Description

Build a function simulate_sales(n_salesperson, days) that generates a dataset of sales_id, day, sales_amount, and discount_rate, with conditional discounts and cumulative sales per salesperson.

set.seed(42)

simulate_sales <- function(n_salesperson, days) {

  # Inner function to calculate cumulative sales
  calc_cumulative <- function(amounts) {
    cum_vals <- numeric(length(amounts))
    running  <- 0
    for (i in seq_along(amounts)) {
      running     <- running + amounts[i]
      cum_vals[i] <- running
    }
    return(cum_vals)
  }

  # Apply discount based on sales amount thresholds
  get_discount <- function(amount) {
    if (amount >= 9000) {
      return(0.20)
    } else if (amount >= 6000) {
      return(0.15)
    } else if (amount >= 3000) {
      return(0.10)
    } else {
      return(0.05)
    }
  }

  # Generate random sales values
  sales_data <- data.frame()

  for (sp in 1:n_salesperson) {
    amounts   <- round(runif(days, min = 1000, max = 12000), 0)
    discounts <- sapply(amounts, get_discount)
    cum_sales <- calc_cumulative(amounts)

    sp_data <- data.frame(
      sales_id         = paste0("SP", sprintf("%02d", sp)),
      day              = 1:days,
      sales_amount     = amounts,
      discount_rate    = discounts,
      net_sales        = amounts * (1 - discounts),
      cumulative_sales = cum_sales
    )

    sales_data <- rbind(sales_data, sp_data)
  }

  return(sales_data)
}

# Run simulation: 5 salespeople over 10 days
sales_df <- simulate_sales(n_salesperson = 5, days = 10)

# Summary statistics table
summary_sales <- sales_df %>%
  group_by(sales_id) %>%
  summarise(
    Total_Sales  = sum(sales_amount),
    Avg_Sales    = round(mean(sales_amount), 2),
    Max_Sales    = max(sales_amount),
    Min_Sales    = min(sales_amount),
    Avg_Discount = paste0(round(mean(discount_rate) * 100, 1), "%"),
    Total_Net    = round(sum(net_sales), 2)
  )

knitr::kable(summary_sales, align = "c", caption = "Summary Statistics per Salesperson") |>
  kable_styling(full_width = TRUE) |>
  row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
  row_spec(seq(1, nrow(summary_sales), 2), extra_css = "background: #ffffff; color: #1a3a5c;") |>
  row_spec(seq(2, nrow(summary_sales), 2), extra_css = "background: #f4f1ea; color: #1a3a5c;")

Table 2.1: Summary Statistics per Salesperson
sales_id	Total_Sales	Avg_Sales	Max_Sales	Min_Sales	Avg_Discount	Total_Net
SP01	79989	7998.9	11308	2481	15.5%	66365.75
SP02	74902	7490.2	11760	2292	15%	62367.35
SP03	77692	7769.2	11878	1907	14.5%	64272.25
SP04	67115	6711.5	10973	1043	14%	55679.25
SP05	79664	7966.4	11709	1412	14.5%	66407.05

Figure 2.1: Cumulative Sales per Salesperson

Interpretation: The simulation shows that cumulative sales are influenced not only by large individual transactions but also by consistency over time. Salespersons with stable daily performance can achieve competitive cumulative results compared to those with occasional high sales. Additionally, the discount system reduces net sales, creating a trade-off between generating high revenue and maintaining profitability. This reflects real-world business scenarios where discount strategies must be applied carefully.
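The loop-based running total and the if/else discount ladder both have vectorized base R counterparts. The sketch below uses made-up amounts chosen to sit on the discount thresholds; the variable names are illustrative.

```r
# Hypothetical daily sales amounts for one salesperson
amounts <- c(2500, 3000, 5999, 6000, 9000, 11500)

# Loop-based running total, as in calc_cumulative()
running  <- 0
cum_loop <- numeric(length(amounts))
for (i in seq_along(amounts)) {
  running     <- running + amounts[i]
  cum_loop[i] <- running
}

# Built-in equivalent of the loop above
cum_builtin <- cumsum(amounts)

# Vectorized discount lookup: findInterval() returns how many
# thresholds each amount has reached (0 to 3)
thresholds <- c(3000, 6000, 9000)
rates      <- c(0.05, 0.10, 0.15, 0.20)
discounts  <- rates[findInterval(amounts, thresholds) + 1]

print(data.frame(amounts, discounts, cum_builtin))
```

Because findInterval() uses left-closed intervals by default, an amount exactly equal to a threshold (for example 6000) receives the higher rate, matching the >= comparisons in get_discount().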

3 Task 3 — Performance Categorization

3.1 Description

Build a function categorize_performance(sales_amount) with 5 categories: Excellent, Very Good, Good, Average, and Poor. Loop through a vector, calculate percentages, and visualize with a bar plot and pie chart.

categorize_performance <- function(sales_amount) {
  categories <- character(length(sales_amount))

  # Loop through each value and assign category
  for (i in seq_along(sales_amount)) {
    val <- sales_amount[i]
    if (val >= 10000) {
      categories[i] <- "Excellent"
    } else if (val >= 7500) {
      categories[i] <- "Very Good"
    } else if (val >= 5000) {
      categories[i] <- "Good"
    } else if (val >= 2500) {
      categories[i] <- "Average"
    } else {
      categories[i] <- "Poor"
    }
  }
  return(categories)
}

# Generate sales vector
set.seed(123)
sales_vector <- round(runif(200, min = 500, max = 12000), 0)

# Apply categorization
categories <- categorize_performance(sales_vector)

# Build frequency table
cat_table            <- as.data.frame(table(Category = categories))
cat_table$Percentage <- round(cat_table$Freq / sum(cat_table$Freq) * 100, 2)
cat_table$Category   <- factor(cat_table$Category,
                                levels = c("Excellent","Very Good","Good","Average","Poor"))
cat_table <- cat_table[order(cat_table$Category), ]

knitr::kable(cat_table, align = "c", row.names = FALSE,
             col.names = c("Category", "Count", "Percentage (%)"),
             caption   = "Performance Category Distribution") |>
  kable_styling(full_width = TRUE) |>
  row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
  row_spec(c(1,3,5), extra_css = "background: #ffffff; color: #1a3a5c;") |>
  row_spec(c(2,4),   extra_css = "background: #f4f1ea; color: #1a3a5c;")

Table 3.1: Performance Category Distribution
Category	Count	Percentage (%)
Excellent	33	16.5
Very Good	45	22.5
Good	45	22.5
Average	50	25.0
Poor	27	13.5

Figure 3.1: Performance Category Distribution

Interpretation: The categorization process groups sales data into performance levels based on predefined thresholds. Because the values are drawn uniformly from 500 to 12000, each category's expected share is proportional to the width of its interval: about 17.4% each for Poor and Excellent (2000 wide) and about 21.7% each for Average, Good, and Very Good (2500 wide). The observed shares, from 13.5% (Poor) to 25.0% (Average), scatter around these values as expected for a sample of 200. The bar chart highlights the frequency of each category, while the pie chart emphasizes the proportional distribution. Converting numeric values into categories makes the data easier to summarize and compare.
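The expected category shares under the uniform draw can be computed directly from the thresholds, and the loop can be replaced by cut(). This is a sketch that reuses the thresholds and seed from the code above; `expected_pct` is an illustrative name.

```r
# Category boundaries taken from categorize_performance(),
# with the sampling limits 500 and 12000 as outer edges
breaks <- c(500, 2500, 5000, 7500, 10000, 12000)
labels <- c("Poor", "Average", "Good", "Very Good", "Excellent")

# Under runif(min = 500, max = 12000) the expected share of each
# category is its interval width divided by the total range
expected_pct <- round(diff(breaks) / (12000 - 500) * 100, 2)
names(expected_pct) <- labels
print(expected_pct)

# Vectorized alternative to the loop: cut() with left-closed intervals
set.seed(123)
sales_vector <- round(runif(200, min = 500, max = 12000), 0)
categories   <- cut(sales_vector,
                    breaks = c(-Inf, 2500, 5000, 7500, 10000, Inf),
                    labels = labels, right = FALSE)
print(round(prop.table(table(categories)) * 100, 2))
```

Setting right = FALSE makes each interval include its lower bound, so a value of exactly 7500 is labelled "Very Good", the same as the >= comparison in the loop.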

4 Task 4 — Multi-Company Simulation

4.1 Description

Build a function generate_company_data(n_company, n_employees) that generates company_id, employee_id, salary, department, performance_score, and KPI_score, with conditional logic for top performers.

set.seed(2024)

generate_company_data <- function(n_company, n_employees) {

  departments <- c("Finance","Marketing","Operations","IT","HR")
  all_data    <- data.frame()

  # Nested loops: company -> employee
  for (comp in 1:n_company) {
    for (emp in 1:n_employees) {
      salary     <- round(runif(1, 4000, 20000), 0)
      perf_score <- round(runif(1, 50, 100), 1)
      kpi_score  <- round(runif(1, 40, 100), 1)
      dept       <- sample(departments, 1)

      # Apply KPI boost for high-performing employees
      if (perf_score >= 90) {
        kpi_score <- min(100, kpi_score + 10)
      }

      row <- data.frame(
        company_id        = paste0("COMP", sprintf("%02d", comp)),
        employee_id       = paste0("EMP", sprintf("%03d", (comp - 1) * n_employees + emp)),
        salary            = salary,
        department        = dept,
        performance_score = perf_score,
        KPI_score         = kpi_score
      )
      all_data <- rbind(all_data, row)
    }
  }
  return(all_data)
}

# Generate dataset: 4 companies, 15 employees each
company_df <- generate_company_data(n_company = 4, n_employees = 15)

# Summary per company
company_summary <- company_df %>%
  group_by(company_id) %>%
  summarise(
    Avg_Salary      = round(mean(salary), 0),
    Avg_Performance = round(mean(performance_score), 2),
    Max_KPI         = max(KPI_score),
    Top_Performers  = sum(performance_score >= 90)
  )

knitr::kable(company_summary, align = "c", caption = "Summary per Company") |>
  kable_styling(full_width = TRUE) |>
  row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
  row_spec(c(1,3), extra_css = "background: #ffffff; color: #1a3a5c;") |>
  row_spec(c(2,4), extra_css = "background: #f4f1ea; color: #1a3a5c;")

Table 4.1: Summary per Company
company_id	Avg_Salary	Avg_Performance	Max_KPI	Top_Performers
COMP01	11392	78.54	100.0	5
COMP02	13182	70.89	98.5	2
COMP03	11060	69.44	97.9	1
COMP04	11715	73.14	95.9	3

Figure 4.1: Average Salary and KPI per Company

Interpretation: The generated dataset illustrates how employee-level data can be structured across multiple companies. The conditional logic, which adds 10 KPI points (capped at 100) for employees with a performance score of at least 90, mimics a simple performance-reward rule. From the summary, companies differ in average salary and performance metrics. Because every company is drawn from the same random distributions, these differences reflect sampling variation with 15 employees per company rather than real organizational effects.
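Growing a data frame with rbind() inside nested loops satisfies the task requirement but becomes slow for larger inputs. A vectorized variant might look like the sketch below. The function name generate_company_data_vec is hypothetical, and because the random draws happen in a different order, its values will not match the loop version for the same seed.

```r
# Vectorized sketch of generate_company_data(); the function name
# generate_company_data_vec is hypothetical. Random draws occur in a
# different order than in the nested loops, so values differ from the
# original for the same seed.
generate_company_data_vec <- function(n_company, n_employees) {
  n           <- n_company * n_employees
  departments <- c("Finance", "Marketing", "Operations", "IT", "HR")

  perf_score <- round(runif(n, 50, 100), 1)
  kpi_score  <- round(runif(n, 40, 100), 1)

  # KPI boost of 10 points for performance >= 90, capped at 100
  boosted   <- perf_score >= 90
  kpi_score <- ifelse(boosted, pmin(100, kpi_score + 10), kpi_score)

  data.frame(
    company_id        = rep(sprintf("COMP%02d", seq_len(n_company)),
                            each = n_employees),
    employee_id       = sprintf("EMP%03d", seq_len(n)),
    salary            = round(runif(n, 4000, 20000), 0),
    department        = sample(departments, n, replace = TRUE),
    performance_score = perf_score,
    KPI_score         = kpi_score
  )
}

set.seed(2024)
demo_df <- generate_company_data_vec(n_company = 4, n_employees = 15)
str(demo_df)
```

The column names and ID formats follow the loop version, so the downstream summaries should work unchanged on the result.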

5 Task 5 — Monte Carlo: Pi & Probability

5.1 Description

Build monte_carlo_pi(n_points) that estimates pi by simulating random points inside a unit circle, plus a probability analysis for points falling in a sub-square.

set.seed(99)

monte_carlo_pi <- function(n_points) {

  x      <- runif(n_points, -1, 1)
  y      <- runif(n_points, -1, 1)
  inside <- integer(n_points)

  # Check whether each point is inside the unit circle
  for (i in seq_len(n_points)) {
    if (x[i]^2 + y[i]^2 <= 1) {
      inside[i] <- 1
    } else {
      inside[i] <- 0
    }
  }

  # Estimate pi
  pi_estimate <- 4 * sum(inside) / n_points

  # Probability inside sub-square
  in_subsquare   <- sum(abs(x) <= 0.5 & abs(y) <= 0.5)
  prob_subsquare <- in_subsquare / n_points

  return(list(
    pi_estimate    = pi_estimate,
    prob_subsquare = prob_subsquare,
    x              = x,
    y              = y,
    inside         = inside
  ))
}

# Run with 5000 points
mc_result <- monte_carlo_pi(5000)

# Display results as table
mc_summary <- data.frame(
  Metric = c("Estimated Pi", "Actual Pi", "Error", "P(Sub-square)", "Theoretical P"),
  Value  = c(
    round(mc_result$pi_estimate, 5),
    round(pi, 5),
    round(abs(mc_result$pi_estimate - pi), 5),
    round(mc_result$prob_subsquare, 4),
    0.25
  )
)

knitr::kable(mc_summary, align = "c", caption = "Monte Carlo Results (n = 5000)") |>
  kable_styling(full_width = TRUE) |>
  row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
  row_spec(c(1,3,5), extra_css = "background: #ffffff; color: #1a3a5c;") |>
  row_spec(c(2,4),   extra_css = "background: #f4f1ea; color: #1a3a5c;")

Table 5.1: Monte Carlo Results (n = 5000)
Metric	Value
Estimated Pi	3.14960
Actual Pi	3.14159
Error	0.00801
P(Sub-square)	0.25220
Theoretical P	0.25000

Figure 5.1: Monte Carlo: Points Inside vs Outside Circle

Interpretation: The Monte Carlo simulation estimates π from the proportion of random points in the square [-1, 1] × [-1, 1] that fall inside the unit circle. That proportion approximates the area ratio π/4, so multiplying it by 4 gives the estimate. With n = 5000 the estimate is 3.14960, an absolute error of 0.00801. By the Law of Large Numbers, the estimate tends toward the true value of π as the number of points grows. The same reasoning applies to the sub-square: it covers 1 of the 4 square units of the sampling area, so its theoretical probability is 0.25, and the simulated value of 0.2522 is close to it.
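The effect of sample size can be examined by repeating the estimate for several values of n. The sketch below uses a vectorized helper; the name estimate_pi and the chosen sample sizes are assumptions for illustration.

```r
# Vectorized pi estimate; estimate_pi is a hypothetical helper name
estimate_pi <- function(n_points) {
  x <- runif(n_points, -1, 1)
  y <- runif(n_points, -1, 1)
  # Share of points inside the unit circle, scaled by 4
  4 * mean(x^2 + y^2 <= 1)
}

set.seed(99)
sizes     <- c(100, 1000, 10000, 100000)
estimates <- sapply(sizes, estimate_pi)

convergence <- data.frame(
  n_points  = sizes,
  estimate  = round(estimates, 5),
  abs_error = round(abs(estimates - pi), 5)
)
print(convergence)
```

The error shrinks on average in proportion to 1/sqrt(n), but not necessarily monotonically within a single run, so a larger n can occasionally give a slightly worse estimate than a smaller one.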

6 Task 6 — Data Transformation

6.1 Description

Build functions normalize_columns(df, cols) and z_score(df, cols) for loop-based normalization of the selected columns, then create new engineered features. Visualize distributions before and after transformation.

# Reuse company_df from Task 4
df_raw <- company_df

# Min-Max normalization using loop
normalize_columns <- function(df, cols) {
  df_norm <- df
  for (col in cols) {
    min_val <- min(df[[col]], na.rm = TRUE)
    max_val <- max(df[[col]], na.rm = TRUE)
    df_norm[[paste0(col, "_norm")]] <- (df[[col]] - min_val) / (max_val - min_val)
  }
  return(df_norm)
}

# Z-score standardization using loop
z_score <- function(df, cols) {
  df_z <- df
  for (col in cols) {
    mu    <- mean(df[[col]], na.rm = TRUE)
    sigma <- sd(df[[col]],   na.rm = TRUE)
    df_z[[paste0(col, "_zscore")]] <- (df[[col]] - mu) / sigma
  }
  return(df_z)
}

# Apply transformations
numeric_cols   <- c("salary", "performance_score", "KPI_score")
df_transformed <- normalize_columns(df_raw, numeric_cols)
df_transformed <- z_score(df_transformed, numeric_cols)

# Feature Engineering
df_transformed <- df_transformed %>%
  mutate(
    performance_category = case_when(
      performance_score >= 90 ~ "Excellent",
      performance_score >= 75 ~ "Very Good",
      performance_score >= 60 ~ "Good",
      performance_score >= 45 ~ "Average",
      TRUE                    ~ "Poor"
    ),
    salary_bracket = case_when(
      salary >= 15000 ~ "High",
      salary >= 9000  ~ "Mid",
      TRUE            ~ "Low"
    )
  )

# Display sample as table
knitr::kable(
  head(df_transformed %>%
         select(employee_id, salary, salary_norm, salary_zscore,
                performance_category, salary_bracket), 8),
  align   = "c",
  digits  = 4,
  caption = "Sample Transformed Data (First 8 Rows)"
) |>
  kable_styling(full_width = TRUE) |>
  row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
  row_spec(c(1,3,5,7), extra_css = "background: #ffffff; color: #1a3a5c;") |>
  row_spec(c(2,4,6,8), extra_css = "background: #f4f1ea; color: #1a3a5c;")

Table 6.1: Sample Transformed Data (First 8 Rows)
employee_id	salary	salary_norm	salary_zscore	performance_category	salary_bracket
EMP001	17391	0.8377	1.2516	Good	High
EMP002	11312	0.4484	-0.1184	Very Good	Mid
EMP003	5905	0.1021	-1.3369	Excellent	Low
EMP004	12253	0.5086	0.0937	Good	Mid
EMP005	6097	0.1144	-1.2936	Good	Low
EMP006	14887	0.6773	0.6873	Excellent	Mid
EMP007	13844	0.6105	0.4522	Excellent	Mid
EMP008	11058	0.4321	-0.1756	Good	Mid

Figure 6.1: Salary Distribution Before and After Transformation

Figure 6.2: Salary Distribution Before and After Transformation

Interpretation: Normalization and standardization transform data into comparable scales without altering the overall distribution shape. Min-Max normalization rescales values into a fixed range, while Z-score standardization measures how far each value deviates from the mean. These techniques are essential in data analysis and machine learning to prevent variables with larger scales from dominating others. Feature engineering further enhances interpretability by grouping raw numerical data into meaningful categories.
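A small worked example makes the two rescalings concrete. The input vector is made up for illustration.

```r
# Small illustrative vector (values are made up for the example)
x <- c(2, 4, 6, 8, 10)

# Min-max normalization: maps the minimum to 0 and the maximum to 1
x_norm <- (x - min(x)) / (max(x) - min(x))

# Z-score standardization: mean 0, standard deviation 1
x_z <- (x - mean(x)) / sd(x)

print(x_norm)
print(round(x_z, 4))

# Both are linear rescalings, so they correlate perfectly with the input
print(c(cor(x, x_norm), cor(x, x_z)))

# Base R's scale() gives the same z-scores
print(all.equal(as.numeric(scale(x)), x_z))
```

One caveat applies to both formulas: a constant column has max equal to min and a standard deviation of 0, so the division yields NaN. A guard for that case may be worth adding before applying the functions to arbitrary data.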

7 Task 7 — KPI Dashboard

7.1 Description

Generate a dataset for 5 companies with 50 employees each, summarize KPIs, categorize employees into tiers, and produce advanced visualizations.

set.seed(777)

# Generate dataset: 5 companies x 50 employees
kpi_df <- generate_company_data(n_company = 5, n_employees = 50)

# Categorize employees into KPI tiers based on KPI score
kpi_df$kpi_tier <- ""
for (i in seq_len(nrow(kpi_df))) {
  kpi <- kpi_df$KPI_score[i]
  if (kpi >= 90) {
    kpi_df$kpi_tier[i] <- "Platinum"
  } else if (kpi >= 75) {
    kpi_df$kpi_tier[i] <- "Gold"
  } else if (kpi >= 60) {
    kpi_df$kpi_tier[i] <- "Silver"
  } else {
    kpi_df$kpi_tier[i] <- "Bronze"
  }
}

# Summary per company
company_kpi_summary <- kpi_df %>%
  group_by(company_id) %>%
  summarise(
    Avg_Salary      = round(mean(salary),            0),
    Avg_KPI         = round(mean(KPI_score),         2),
    Avg_Performance = round(mean(performance_score), 2),
    Top_Performers  = sum(performance_score >= 90),
    Platinum_Count  = sum(kpi_tier == "Platinum")
  )

knitr::kable(company_kpi_summary, align = "c", caption = "Company KPI Dashboard Summary") |>
  kable_styling(full_width = TRUE) |>
  row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
  row_spec(c(1,3,5), extra_css = "background: #ffffff; color: #1a3a5c;") |>
  row_spec(c(2,4),   extra_css = "background: #f4f1ea; color: #1a3a5c;")

Table 7.1: Company KPI Dashboard Summary
company_id	Avg_Salary	Avg_KPI	Avg_Performance	Top_Performers	Platinum_Count
COMP01	11540	73.46	74.19	11	8
COMP02	11721	69.99	73.88	8	8
COMP03	12835	72.53	73.85	9	9
COMP04	11567	70.45	75.43	11	6
COMP05	11677	74.58	72.97	9	8

Figure 7.1: Company KPI Dashboard

Interpretation: The KPI dashboard provides a comprehensive overview of company performance using multiple metrics. The categorization of employees into KPI tiers helps identify the distribution of performance levels within each company. The visualizations reveal patterns such as the relationship between performance scores and KPI values, as well as differences across companies and departments. This type of analysis supports data-driven decision-making in organizational settings.
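The tier assignment can also be expressed without an explicit loop. The sketch below applies cut() to a few made-up scores placed on and just below each threshold, to check that the boundaries behave like the >= comparisons.

```r
# Hypothetical KPI scores covering each tier boundary
kpi_scores <- c(40, 59.9, 60, 74.9, 75, 89.9, 90, 100)

# Left-closed intervals reproduce the >= comparisons in the loop
kpi_tier <- cut(kpi_scores,
                breaks = c(-Inf, 60, 75, 90, Inf),
                labels = c("Bronze", "Silver", "Gold", "Platinum"),
                right  = FALSE)

print(data.frame(kpi_scores, kpi_tier))
print(table(kpi_tier))
```

A side benefit is that cut() returns a factor whose levels follow the order of the labels, so plots and tables list the tiers from Bronze to Platinum rather than alphabetically.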

8 Task 8 — Automated Report (Bonus)

8.1 Description

Use functions and loops to generate an automated summary report per company, including tables and plots. The summary is rendered in this HTML report and exported to CSV and PDF.

if (!exists("kpi_df")) stop("Run Task 7 first to create kpi_df")

library(grid)
library(gridExtra)

# function to generate summary report for each company
generate_company_report <- function(df, company) {
  comp_data <- df[df$company_id == company, ]
  list(
    company        = company,
    n_employees    = nrow(comp_data),
    avg_salary     = round(mean(comp_data$salary), 0),
    avg_kpi        = round(mean(comp_data$KPI_score), 2),
    avg_perf       = round(mean(comp_data$performance_score), 2),
    top_performers = sum(comp_data$performance_score >= 90),
    dominant_dept  = names(which.max(table(comp_data$department)))
  )
}

# generate all reports via loop
companies   <- sort(unique(kpi_df$company_id))
all_reports <- lapply(companies, function(comp) generate_company_report(kpi_df, comp))
names(all_reports) <- companies

# build summary table
report_summary_table <- do.call(rbind, lapply(companies, function(comp) {
  r <- all_reports[[comp]]
  data.frame(
    Company         = r$company,
    Employees       = r$n_employees,
    Avg_Salary      = format(r$avg_salary, big.mark = ","),
    Avg_KPI         = r$avg_kpi,
    Avg_Performance = r$avg_perf,
    Top_Performers  = r$top_performers,
    Dominant_Dept   = r$dominant_dept
  )
}))

# render summary table
knitr::kable(report_summary_table, align = "c", caption = "Automated Report Summary") |>
  kable_styling(full_width = TRUE) |>
  row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
  row_spec(c(1,3,5), extra_css = "background: #ffffff; color: #1a3a5c;") |>
  row_spec(c(2,4),   extra_css = "background: #f4f1ea; color: #1a3a5c;")

Table 8.1: Automated Report Summary
Company	Employees	Avg_Salary	Avg_KPI	Avg_Performance	Top_Performers	Dominant_Dept
COMP01	50	11,540	73.46	74.19	11	HR
COMP02	50	11,721	69.99	73.88	8	Marketing
COMP03	50	12,835	72.53	73.85	9	Marketing
COMP04	50	11,567	70.45	75.43	11	HR
COMP05	50	11,677	74.58	72.97	9	IT

📊 COMP01

Employees	50
Avg Salary	11,540
Avg KPI	73.46
Avg Performance	74.19
Top Performers	11
Dominant Dept	HR

📊 COMP02

Employees	50
Avg Salary	11,721
Avg KPI	69.99
Avg Performance	73.88
Top Performers	8
Dominant Dept	Marketing

📊 COMP03

Employees	50
Avg Salary	12,835
Avg KPI	72.53
Avg Performance	73.85
Top Performers	9
Dominant Dept	Marketing

📊 COMP04

Employees	50
Avg Salary	11,567
Avg KPI	70.45
Avg Performance	75.43
Top Performers	11
Dominant Dept	HR

📊 COMP05

Employees	50
Avg Salary	11,677
Avg KPI	74.58
Avg Performance	72.97
Top Performers	9
Dominant Dept	IT

# ── Export to CSV ─────────────────────────────────────────────
write.csv(report_summary_table, "company_report.csv", row.names = FALSE)

# ── Export to PDF ─────────────────────────────────────────────
pdf("company_report.pdf", width = 11, height = 8.5)

# page 1: summary table
grid.newpage()

# header
grid.rect(x = 0.5, y = 0.93, width = 1, height = 0.13,
          gp = gpar(fill = "#1a3a5c", col = NA))
grid.text("Automated Company Report - Task 8",
          x = 0.5, y = 0.93,
          gp = gpar(col = "white", fontsize = 16, fontface = "bold"))
grid.text("Data Science Programming | ITSB | Even Semester 2026/2027",
          x = 0.5, y = 0.88,
          gp = gpar(col = "white", fontsize = 9))

# summary table
tbl <- tableGrob(
  report_summary_table,
  rows  = NULL,
  theme = ttheme_minimal(
    core    = list(
      fg_params = list(col = "#1a3a5c", fontsize = 9),
      bg_params = list(fill = c("#ffffff", "#f4f1ea"), col = "#dddddd")
    ),
    colhead = list(
      fg_params = list(col = "white", fontface = "bold", fontsize = 10),
      bg_params = list(fill = "#1a3a5c", col = "#1a3a5c")
    )
  )
)
grid.draw(tbl)

# footer
grid.rect(x = 0.5, y = 0.03, width = 1, height = 0.06,
          gp = gpar(fill = "#f4f1ea", col = NA))
grid.text("Generated automatically using functions & loops in R",
          x = 0.5, y = 0.03,
          gp = gpar(col = "#1a3a5c", fontsize = 8, fontface = "italic"))

# page 2: individual company cards
grid.newpage()

# page title
grid.rect(x = 0.5, y = 0.95, width = 1, height = 0.09,
          gp = gpar(fill = "#c9972e", col = NA))
grid.text("Company Summary Cards",
          x = 0.5, y = 0.95,
          gp = gpar(col = "white", fontsize = 14, fontface = "bold"))

# layout: 2 columns x 3 rows for cards
n_comp   <- length(companies)
n_cols   <- 2
n_rows   <- ceiling(n_comp / n_cols)
card_w   <- 0.44
card_h   <- 0.22
x_starts <- c(0.04, 0.52)
y_start  <- 0.83

for (i in seq_along(companies)) {
  r     <- all_reports[[companies[i]]]
  col_i <- ((i - 1) %% n_cols) + 1
  row_i <- ceiling(i / n_cols)
  cx    <- x_starts[col_i]
  cy    <- y_start - (row_i - 1) * (card_h + 0.04)

  # card border
  grid.rect(x = cx + card_w / 2, y = cy - card_h / 2,
            width = card_w, height = card_h,
            gp = gpar(col = "#c9972e", fill = "#fdfaf3", lwd = 1.5))

  # card header
  grid.rect(x = cx + card_w / 2, y = cy - 0.025,
            width = card_w, height = 0.05,
            gp = gpar(fill = "#1a3a5c", col = NA))
  grid.text(r$company,
            x = cx + 0.04, y = cy - 0.025,
            just = "left",
            gp = gpar(col = "white", fontsize = 10, fontface = "bold"))

  # card content
  labels <- c("Employees", "Avg Salary", "Avg KPI",
               "Avg Performance", "Top Performers", "Dominant Dept")
  values <- c(r$n_employees,
               format(r$avg_salary, big.mark = ","),
               r$avg_kpi, r$avg_perf,
               r$top_performers, r$dominant_dept)

  for (j in seq_along(labels)) {
    row_y <- cy - 0.06 - (j - 1) * 0.026
    grid.text(labels[j],
              x = cx + 0.02, y = row_y,
              just = "left",
              gp = gpar(col = "#555555", fontsize = 8))
    grid.text(as.character(values[j]),
              x = cx + card_w - 0.02, y = row_y,
              just = "right",
              gp = gpar(col = "#1a3a5c", fontsize = 8, fontface = "bold"))
  }
}

dev.off()


Figure 8.1: Automated Report: Company Overview Heatmap

Interpretation: The automated report demonstrates how functions and loops can be used to generate consistent summaries for multiple entities. Each company is analyzed using the same structure, allowing for easy comparison across different metrics. The use of visual summaries, such as heatmaps, enhances the ability to quickly identify patterns and differences between companies, making the reporting process more efficient and scalable.
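The same pattern extends to writing one file per company. The sketch below uses a small made-up data frame in place of kpi_df and writes to a temporary directory; the file naming scheme is an assumption for illustration.

```r
# Small stand-in for kpi_df (values are made up for the example)
demo_df <- data.frame(
  company_id        = rep(c("COMP01", "COMP02"), each = 3),
  salary            = c(5000, 7000, 9000, 6000, 8000, 10000),
  KPI_score         = c(70, 80, 90, 60, 75, 95),
  performance_score = c(65, 92, 88, 91, 93, 70)
)

# One output file per company, written to a temporary directory
out_dir   <- tempdir()
out_files <- character(0)

for (comp in unique(demo_df$company_id)) {
  comp_data <- demo_df[demo_df$company_id == comp, ]

  # One-row summary using the same metrics as generate_company_report()
  summary_row <- data.frame(
    company        = comp,
    n_employees    = nrow(comp_data),
    avg_salary     = mean(comp_data$salary),
    avg_kpi        = mean(comp_data$KPI_score),
    top_performers = sum(comp_data$performance_score >= 90)
  )

  path <- file.path(out_dir, paste0(comp, "_report.csv"))
  write.csv(summary_row, path, row.names = FALSE)
  out_files <- c(out_files, path)
}

print(basename(out_files))
```

Because the loop iterates over whatever company IDs are present, adding a company to the data produces an additional file without any change to the code.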

9 Conclusion

This practicum demonstrates how fundamental programming concepts such as functions, loops, and conditional statements can be applied to solve complex data-related problems in a structured and scalable way. Each task represents a different aspect of data science, including mathematical modeling, simulation, data transformation, and performance analysis.

The use of simulations, such as the Monte Carlo method, highlights the importance of probabilistic approaches in approximating real-world phenomena. In addition, data transformation techniques like normalization and standardization ensure that data is properly prepared for analysis and modeling.

Furthermore, feature engineering and categorization help convert raw data into meaningful insights, while visualization and dashboard development improve interpretability and communication of results. Overall, this practicum not only strengthens technical programming skills but also enhances analytical thinking and the ability to extract insights from data to support decision-making.

10 References

Siregar, B. (2024). Data Science Programming: Functions and Loops.
Retrieved from https://bookdown.org/dsciencelabs/data_science_programming/