ITSB
Institut Teknologi Sains Bandung
Academic Year 2026 / 2027
Practicum Week 5
Functions & Loops
Student
Nailatul Wafiroh
NIM 52250003
Student Major in Data Science
Institut Teknologi Sains Bandung
R Programming Data Science Statistics
April 06, 2026
Lecturer
Bakti Siregar, M.Sc., CDS
ITSB Data Science Program
Even Semester 2026/2027

1 Task 1 — Multi-Formula Function

1.1 Description

Build a function compute_formula(x, formula) that computes values for linear, quadratic, cubic, and exponential formulas, then plots all on the same graph for x = 1:20.

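# Packages used throughout the report: dplyr for %>%, group_by(), summarise(),
# mutate(), case_when(), and select(); kableExtra for kable_styling() and row_spec()
library(dplyr)
library(kableExtra)

compute_formula <- function(x, formula) {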
  # Validate formula input
  valid_formulas <- c("linear", "quadratic", "cubic", "exponential")
  if (!formula %in% valid_formulas) {
    stop(paste("Invalid formula. Choose from:", paste(valid_formulas, collapse = ", ")))
  }

  # Compute based on formula type
  if (formula == "linear") {
    return(2 * x + 3)
  } else if (formula == "quadratic") {
    return(x^2 + 2 * x + 1)
  } else if (formula == "cubic") {
    return(x^3 - 2 * x^2 + x - 1)
  } else if (formula == "exponential") {
    return(exp(0.3 * x))
  }
}

# Define x range
x_vals   <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")

# Build result data frame using nested loop
results <- data.frame()

for (formula in formulas) {
  for (x in x_vals) {
    y       <- compute_formula(x, formula)
    results <- rbind(results, data.frame(x = x, y = round(y, 4), formula = formula))
  }
}

# Show sample output (first 3 rows per formula) as table
sample_output <- do.call(rbind, lapply(formulas, function(f) {
  head(results[results$formula == f, ], 3)
}))

knitr::kable(sample_output, align = "c", row.names = FALSE,
             caption = "Sample Output: First 3 Rows per Formula") |>
  kable_styling(full_width = TRUE) |>
  row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
  row_spec(c(1,3,5,7,9,11), extra_css = "background: #ffffff; color: #1a3a5c;") |>
  row_spec(c(2,4,6,8,10,12), extra_css = "background: #f4f1ea; color: #1a3a5c;")
Table 1.1: Sample Output: First 3 Rows per Formula
x y formula
1 5.0000 linear
2 7.0000 linear
3 9.0000 linear
1 4.0000 quadratic
2 9.0000 quadratic
3 16.0000 quadratic
1 -1.0000 cubic
2 1.0000 cubic
3 11.0000 cubic
1 1.3499 exponential
2 1.8221 exponential
3 2.4596 exponential

Figure 1.1: All formulas plotted for x = 1 to 20

Interpretation: The compute_formula() function shows how a single function can dispatch to several mathematical models through one argument. On the graph, each formula exhibits a distinct growth pattern. The linear function increases at a constant rate of 2 per unit of x, while the quadratic and cubic functions accelerate because of their higher-order terms. Within the plotted range the cubic function dominates: at x = 20 it reaches 7219, compared with 441 for the quadratic, about 403 for the exponential exp(0.3x), and 43 for the linear formula. The exponential curve grows multiplicatively, so it eventually overtakes every polynomial, but with a rate of 0.3 that crossover lies beyond x = 20. This comparison highlights how different mathematical models can produce very different outcomes over the same input range.
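The growth comparison can be checked numerically. The sketch below is a hypothetical compact variant, compute_formula_vec(), that uses switch() and R's vectorized arithmetic to evaluate all of x = 1:20 in one call per formula. The function name and the at_20 helper table are assumptions made for illustration and are not part of the original solution.

```r
# Hypothetical compact variant of compute_formula(); same four formulas
compute_formula_vec <- function(x, formula) {
  switch(formula,
    linear      = 2 * x + 3,
    quadratic   = x^2 + 2 * x + 1,
    cubic       = x^3 - 2 * x^2 + x - 1,
    exponential = exp(0.3 * x),
    stop("Invalid formula: ", formula)  # default branch for unknown names
  )
}

formulas <- c("linear", "quadratic", "cubic", "exponential")
x_vals   <- 1:20

# One vectorized call per formula instead of one call per (formula, x) pair
results_vec <- do.call(rbind, lapply(formulas, function(f) {
  data.frame(x = x_vals,
             y = round(compute_formula_vec(x_vals, f), 4),
             formula = f)
}))

# Values at the right edge of the plotted range
at_20 <- results_vec[results_vec$x == 20, ]
print(at_20)
```

Building the data frame once per formula also avoids growing it row by row with rbind(), which matters little for 80 rows but becomes slow for large inputs.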


2 Task 2 — Sales Simulation

2.1 Description

Build a function simulate_sales(n_salesperson, days) that generates a dataset of sales_id, day, sales_amount, and discount_rate, with conditional discounts and cumulative sales per salesperson.

set.seed(42)

simulate_sales <- function(n_salesperson, days) {

  # Inner function to calculate cumulative sales
  calc_cumulative <- function(amounts) {
    cum_vals <- numeric(length(amounts))
    running  <- 0
    for (i in seq_along(amounts)) {
      running     <- running + amounts[i]
      cum_vals[i] <- running
    }
    return(cum_vals)
  }

  # Apply discount based on sales amount thresholds
  get_discount <- function(amount) {
    if (amount >= 9000) {
      return(0.20)
    } else if (amount >= 6000) {
      return(0.15)
    } else if (amount >= 3000) {
      return(0.10)
    } else {
      return(0.05)
    }
  }

  # Generate random sales values
  sales_data <- data.frame()

  for (sp in 1:n_salesperson) {
    amounts   <- round(runif(days, min = 1000, max = 12000), 0)
    discounts <- sapply(amounts, get_discount)
    cum_sales <- calc_cumulative(amounts)

    sp_data <- data.frame(
      sales_id         = paste0("SP", sprintf("%02d", sp)),
      day              = 1:days,
      sales_amount     = amounts,
      discount_rate    = discounts,
      net_sales        = amounts * (1 - discounts),
      cumulative_sales = cum_sales
    )

    sales_data <- rbind(sales_data, sp_data)
  }

  return(sales_data)
}

# Run simulation: 5 salespeople over 10 days
sales_df <- simulate_sales(n_salesperson = 5, days = 10)

# Summary statistics table
summary_sales <- sales_df %>%
  group_by(sales_id) %>%
  summarise(
    Total_Sales  = sum(sales_amount),
    Avg_Sales    = round(mean(sales_amount), 2),
    Max_Sales    = max(sales_amount),
    Min_Sales    = min(sales_amount),
    Avg_Discount = paste0(round(mean(discount_rate) * 100, 1), "%"),
    Total_Net    = round(sum(net_sales), 2)
  )

knitr::kable(summary_sales, align = "c", caption = "Summary Statistics per Salesperson") |>
  kable_styling(full_width = TRUE) |>
  row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
  row_spec(seq(1, nrow(summary_sales), 2), extra_css = "background: #ffffff; color: #1a3a5c;") |>
  row_spec(seq(2, nrow(summary_sales), 2), extra_css = "background: #f4f1ea; color: #1a3a5c;")
Table 2.1: Summary Statistics per Salesperson
sales_id Total_Sales Avg_Sales Max_Sales Min_Sales Avg_Discount Total_Net
SP01 79989 7998.9 11308 2481 15.5% 66365.75
SP02 74902 7490.2 11760 2292 15% 62367.35
SP03 77692 7769.2 11878 1907 14.5% 64272.25
SP04 67115 6711.5 10973 1043 14% 55679.25
SP05 79664 7966.4 11709 1412 14.5% 66407.05

Figure 2.1: Cumulative Sales per Salesperson

Interpretation: The simulation shows that cumulative sales depend on consistency over time as well as on large individual transactions. In Table 2.1, SP01 has the highest total (79,989) without the highest single-day sale (11,308, against 11,878 for SP03), while SP04 has the lowest total (67,115). Because the discount rate rises with the sales amount, large sales give up a larger share of revenue: SP01's net sales of 66,365.75 are about 17% below its gross total, which is more than its unweighted average discount rate of 15.5%. This reflects the real-world trade-off between generating high revenue and maintaining profitability, where discount strategies must be applied carefully.
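The discount tiers and the per-salesperson running totals can also be computed without explicit loops. The sketch below uses a small hand-made data frame (toy_sales, a hypothetical example rather than the simulated sales_df) together with the base R functions findInterval() and ave(); the thresholds are the ones used in get_discount().

```r
# Hypothetical toy data: two salespeople, three days each
toy_sales <- data.frame(
  sales_id     = rep(c("SP01", "SP02"), each = 3),
  sales_amount = c(2500, 3000, 6500, 9000, 11000, 1000)
)

# findInterval() returns 0..3 depending on how many thresholds are reached;
# intervals are closed on the left, which matches the >= comparisons
rates <- c(0.05, 0.10, 0.15, 0.20)
toy_sales$discount_rate <-
  rates[findInterval(toy_sales$sales_amount, c(3000, 6000, 9000)) + 1]

# ave() applies cumsum() within each sales_id group and keeps row order
toy_sales$cumulative_sales <-
  ave(toy_sales$sales_amount, toy_sales$sales_id, FUN = cumsum)

toy_sales$net_sales <- toy_sales$sales_amount * (1 - toy_sales$discount_rate)
print(toy_sales)
```

The loop-based helpers in simulate_sales() make the logic explicit, which suits the practicum; the vectorized form is shown only as an alternative for larger datasets.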


3 Task 3 — Performance Categorization

3.1 Description

Build a function categorize_performance(sales_amount) with 5 categories: Excellent, Very Good, Good, Average, and Poor. Loop through a vector, calculate percentages, and visualize with a bar plot and pie chart.

categorize_performance <- function(sales_amount) {
  categories <- character(length(sales_amount))

  # Loop through each value and assign category
  for (i in seq_along(sales_amount)) {
    val <- sales_amount[i]
    if (val >= 10000) {
      categories[i] <- "Excellent"
    } else if (val >= 7500) {
      categories[i] <- "Very Good"
    } else if (val >= 5000) {
      categories[i] <- "Good"
    } else if (val >= 2500) {
      categories[i] <- "Average"
    } else {
      categories[i] <- "Poor"
    }
  }
  return(categories)
}

# Generate sales vector
set.seed(123)
sales_vector <- round(runif(200, min = 500, max = 12000), 0)

# Apply categorization
categories <- categorize_performance(sales_vector)

# Build frequency table
cat_table            <- as.data.frame(table(Category = categories))
cat_table$Percentage <- round(cat_table$Freq / sum(cat_table$Freq) * 100, 2)
cat_table$Category   <- factor(cat_table$Category,
                                levels = c("Excellent","Very Good","Good","Average","Poor"))
cat_table <- cat_table[order(cat_table$Category), ]

knitr::kable(cat_table, align = "c", row.names = FALSE,
             col.names = c("Category", "Count", "Percentage (%)"),
             caption   = "Performance Category Distribution") |>
  kable_styling(full_width = TRUE) |>
  row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
  row_spec(c(1,3,5), extra_css = "background: #ffffff; color: #1a3a5c;") |>
  row_spec(c(2,4),   extra_css = "background: #f4f1ea; color: #1a3a5c;")
Table 3.1: Performance Category Distribution
Category Count Percentage (%)
Excellent 33 16.5
Very Good 45 22.5
Good 45 22.5
Average 50 25.0
Poor 27 13.5

Figure 3.1: Performance Category Distribution

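Interpretation: The categorization groups sales data into performance levels based on predefined thresholds. Because the values are drawn uniformly between 500 and 12,000, the expected share of each category is proportional to the width of its band: about 17.4% each for Excellent and Poor (bands of width 2,000) and about 21.7% each for Very Good, Good, and Average (bands of width 2,500). The observed shares (16.5%, 22.5%, 22.5%, 25.0%, and 13.5% for Excellent through Poor) are close to these expectations, with the differences attributable to sampling variation in 200 draws. The bar chart highlights the frequency of each category, while the pie chart emphasizes the proportional distribution. Converting numeric values into categories in this way makes the data easier to summarize and compare.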
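As a cross-check on the category shares, the sketch below categorizes the same kind of vector with cut() and compares the observed percentages against the shares expected under a uniform distribution on [500, 12000]. The object names (band_edges, expected_pct, comparison) are assumptions introduced here, and cut() is used as a vectorized stand-in for the loop in categorize_performance().

```r
set.seed(123)
sales_vector <- round(runif(200, min = 500, max = 12000), 0)

labels <- c("Poor", "Average", "Good", "Very Good", "Excellent")

# right = FALSE makes each interval closed on the left, e.g. [2500, 5000),
# which matches the >= thresholds of categorize_performance()
cats <- cut(sales_vector,
            breaks = c(-Inf, 2500, 5000, 7500, 10000, Inf),
            labels = labels, right = FALSE)

# Expected share of each band under a uniform draw on [500, 12000]
band_edges   <- c(500, 2500, 5000, 7500, 10000, 12000)
expected_pct <- diff(band_edges) / (12000 - 500) * 100

comparison <- data.frame(
  Category     = labels,
  Observed_pct = as.numeric(prop.table(table(cats))) * 100,
  Expected_pct = round(expected_pct, 1)
)
print(comparison)
```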


4 Task 4 — Multi-Company Simulation

4.1 Description

Build a function generate_company_data(n_company, n_employees) that generates company_id, employee_id, salary, department, performance_score, and KPI_score, with conditional logic for top performers.

set.seed(2024)

generate_company_data <- function(n_company, n_employees) {

  departments <- c("Finance","Marketing","Operations","IT","HR")
  all_data    <- data.frame()

  # Nested loops: company -> employee
  for (comp in 1:n_company) {
    for (emp in 1:n_employees) {
      salary     <- round(runif(1, 4000, 20000), 0)
      perf_score <- round(runif(1, 50, 100), 1)
      kpi_score  <- round(runif(1, 40, 100), 1)
      dept       <- sample(departments, 1)

      # Apply KPI boost for high-performing employees
      if (perf_score >= 90) {
        kpi_score <- min(100, kpi_score + 10)
      }

      row <- data.frame(
        company_id        = paste0("COMP", sprintf("%02d", comp)),
        employee_id       = paste0("EMP", sprintf("%03d", (comp - 1) * n_employees + emp)),
        salary            = salary,
        department        = dept,
        performance_score = perf_score,
        KPI_score         = kpi_score
      )
      all_data <- rbind(all_data, row)
    }
  }
  return(all_data)
}

# Generate dataset: 4 companies, 15 employees each
company_df <- generate_company_data(n_company = 4, n_employees = 15)

# Summary per company
company_summary <- company_df %>%
  group_by(company_id) %>%
  summarise(
    Avg_Salary      = round(mean(salary), 0),
    Avg_Performance = round(mean(performance_score), 2),
    Max_KPI         = max(KPI_score),
    Top_Performers  = sum(performance_score >= 90)
  )

knitr::kable(company_summary, align = "c", caption = "Summary per Company") |>
  kable_styling(full_width = TRUE) |>
  row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
  row_spec(c(1,3), extra_css = "background: #ffffff; color: #1a3a5c;") |>
  row_spec(c(2,4), extra_css = "background: #f4f1ea; color: #1a3a5c;")
Table 4.1: Summary per Company
company_id Avg_Salary Avg_Performance Max_KPI Top_Performers
COMP01 11392 78.54 100.0 5
COMP02 13182 70.89 98.5 2
COMP03 11060 69.44 97.9 1
COMP04 11715 73.14 95.9 3

Figure 4.1: Average Salary and KPI per Company

Interpretation: The generated dataset illustrates how employee-level data can be structured across multiple companies. The conditional logic, which adds 10 points to the KPI score (capped at 100) for employees with a performance score of at least 90, mimics real-world performance evaluation systems. The summary shows variation between companies, for example an average salary of 13,182 in COMP02 against 11,060 in COMP03, and five top performers in COMP01 against one in COMP03. Because every company is drawn from the same distributions, these differences reflect sampling variation in groups of only 15 employees rather than genuine organizational effects.
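For larger simulations, growing a data frame with rbind() inside nested loops becomes slow. The sketch below is a hypothetical vectorized variant, generate_company_data_vec(), that draws all values at once and applies the same KPI boost rule. It consumes random numbers in a different order than the loop version, so it will not reproduce the figures in the summary table even with the same seed.

```r
# Hypothetical vectorized variant; same columns and rules as the loop version
generate_company_data_vec <- function(n_company, n_employees) {
  n           <- n_company * n_employees
  departments <- c("Finance", "Marketing", "Operations", "IT", "HR")

  salary <- round(runif(n, 4000, 20000), 0)
  perf   <- round(runif(n, 50, 100), 1)
  kpi    <- round(runif(n, 40, 100), 1)

  # KPI boost of 10 points, capped at 100, for performance scores >= 90
  kpi <- ifelse(perf >= 90, pmin(100, kpi + 10), kpi)

  data.frame(
    company_id        = sprintf("COMP%02d", rep(seq_len(n_company), each = n_employees)),
    employee_id       = sprintf("EMP%03d", seq_len(n)),
    salary            = salary,
    department        = sample(departments, n, replace = TRUE),
    performance_score = perf,
    KPI_score         = kpi
  )
}

set.seed(2024)
company_vec <- generate_company_data_vec(n_company = 4, n_employees = 15)
str(company_vec)
```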


5 Task 5 — Monte Carlo: Pi & Probability

5.1 Description

Build monte_carlo_pi(n_points) that estimates pi by simulating random points in the square [-1, 1] × [-1, 1] and counting those that fall inside the unit circle, plus a probability analysis for points falling in a sub-square.

set.seed(99)

monte_carlo_pi <- function(n_points) {

  x      <- runif(n_points, -1, 1)
  y      <- runif(n_points, -1, 1)
  inside <- integer(n_points)

  # Check whether each point is inside the unit circle
  for (i in seq_len(n_points)) {
    if (x[i]^2 + y[i]^2 <= 1) {
      inside[i] <- 1
    } else {
      inside[i] <- 0
    }
  }

  # Estimate pi
  pi_estimate <- 4 * sum(inside) / n_points

  # Probability inside sub-square
  in_subsquare   <- sum(abs(x) <= 0.5 & abs(y) <= 0.5)
  prob_subsquare <- in_subsquare / n_points

  return(list(
    pi_estimate    = pi_estimate,
    prob_subsquare = prob_subsquare,
    x              = x,
    y              = y,
    inside         = inside
  ))
}

# Run with 5000 points
mc_result <- monte_carlo_pi(5000)

# Display results as table
mc_summary <- data.frame(
  Metric = c("Estimated Pi", "Actual Pi", "Error", "P(Sub-square)", "Theoretical P"),
  Value  = c(
    round(mc_result$pi_estimate, 5),
    round(pi, 5),
    round(abs(mc_result$pi_estimate - pi), 5),
    round(mc_result$prob_subsquare, 4),
    0.25
  )
)

knitr::kable(mc_summary, align = "c", caption = "Monte Carlo Results (n = 5000)") |>
  kable_styling(full_width = TRUE) |>
  row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
  row_spec(c(1,3,5), extra_css = "background: #ffffff; color: #1a3a5c;") |>
  row_spec(c(2,4),   extra_css = "background: #f4f1ea; color: #1a3a5c;")
Table 5.1: Monte Carlo Results (n = 5000)
Metric Value
Estimated Pi 3.14960
Actual Pi 3.14159
Error 0.00801
P(Sub-square) 0.25220
Theoretical P 0.25000

Figure 5.1: Monte Carlo: Points Inside vs Outside Circle

Interpretation: The Monte Carlo simulation estimates π from the proportion of uniformly random points in the square [-1, 1] × [-1, 1] that fall inside the unit circle. That proportion approximates the area ratio π/4, so multiplying it by 4 gives the estimate. With 5000 points the estimate is 3.1496, an absolute error of about 0.008. By the Law of Large Numbers the estimate converges to π as the number of points grows, with a typical error that shrinks in proportion to 1/√n. The sub-square with |x| ≤ 0.5 and |y| ≤ 0.5 covers one quarter of the sampling square, so its theoretical probability is 0.25; the simulated value of 0.2522 is close to it, showing that random sampling can approximate geometric probabilities.
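The convergence claim can be illustrated by repeating the estimate for several sample sizes. The sketch below uses a hypothetical helper, estimate_pi(), which is a vectorized shortcut rather than the loop-based monte_carlo_pi(), and compares the absolute error with the theoretical standard error 4 * sqrt(p * (1 - p) / n), where p = pi / 4 is the probability of landing inside the circle. The sample sizes are arbitrary choices for illustration.

```r
set.seed(99)

# Hypothetical helper: vectorized estimate of pi from n uniform points
estimate_pi <- function(n) {
  x <- runif(n, -1, 1)
  y <- runif(n, -1, 1)
  4 * mean(x^2 + y^2 <= 1)  # share of points inside the unit circle, times 4
}

n_values  <- c(100, 1000, 5000, 10000, 100000)
estimates <- vapply(n_values, estimate_pi, numeric(1))

# Standard error of the estimator for each n (binomial proportion scaled by 4)
p_inside  <- pi / 4
std_error <- 4 * sqrt(p_inside * (1 - p_inside) / n_values)

convergence <- data.frame(
  n         = n_values,
  estimate  = estimates,
  abs_error = abs(estimates - pi),
  std_error = round(std_error, 4)
)
print(convergence)
```

Individual errors fluctuate from run to run, so abs_error need not decrease at every step; it is the std_error column that shrinks steadily, by a factor of about 3.16 for every tenfold increase in n.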


6 Task 6 — Data Transformation

6.1 Description

Build functions normalize_columns(df, cols) and z_score(df, cols) for loop-based normalization of the selected columns, then create new engineered features. Visualize distributions before and after transformation.

# Reuse company_df from Task 4
df_raw <- company_df

# Min-Max normalization using loop
normalize_columns <- function(df, cols) {
  df_norm <- df
  for (col in cols) {
    min_val <- min(df[[col]], na.rm = TRUE)
    max_val <- max(df[[col]], na.rm = TRUE)
    df_norm[[paste0(col, "_norm")]] <- (df[[col]] - min_val) / (max_val - min_val)
  }
  return(df_norm)
}

# Z-score standardization using loop
z_score <- function(df, cols) {
  df_z <- df
  for (col in cols) {
    mu    <- mean(df[[col]], na.rm = TRUE)
    sigma <- sd(df[[col]],   na.rm = TRUE)
    df_z[[paste0(col, "_zscore")]] <- (df[[col]] - mu) / sigma
  }
  return(df_z)
}

# Apply transformations
numeric_cols   <- c("salary", "performance_score", "KPI_score")
df_transformed <- normalize_columns(df_raw, numeric_cols)
df_transformed <- z_score(df_transformed, numeric_cols)

# Feature Engineering
df_transformed <- df_transformed %>%
  mutate(
    performance_category = case_when(
      performance_score >= 90 ~ "Excellent",
      performance_score >= 75 ~ "Very Good",
      performance_score >= 60 ~ "Good",
      performance_score >= 45 ~ "Average",
      TRUE                    ~ "Poor"
    ),
    salary_bracket = case_when(
      salary >= 15000 ~ "High",
      salary >= 9000  ~ "Mid",
      TRUE            ~ "Low"
    )
  )

# Display sample as table
knitr::kable(
  head(df_transformed %>%
         select(employee_id, salary, salary_norm, salary_zscore,
                performance_category, salary_bracket), 8),
  align   = "c",
  digits  = 4,
  caption = "Sample Transformed Data (First 8 Rows)"
) |>
  kable_styling(full_width = TRUE) |>
  row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
  row_spec(c(1,3,5,7), extra_css = "background: #ffffff; color: #1a3a5c;") |>
  row_spec(c(2,4,6,8), extra_css = "background: #f4f1ea; color: #1a3a5c;")
Table 6.1: Sample Transformed Data (First 8 Rows)
employee_id salary salary_norm salary_zscore performance_category salary_bracket
EMP001 17391 0.8377 1.2516 Good High
EMP002 11312 0.4484 -0.1184 Very Good Mid
EMP003 5905 0.1021 -1.3369 Excellent Low
EMP004 12253 0.5086 0.0937 Good Mid
EMP005 6097 0.1144 -1.2936 Good Low
EMP006 14887 0.6773 0.6873 Excellent Mid
EMP007 13844 0.6105 0.4522 Excellent Mid
EMP008 11058 0.4321 -0.1756 Good Mid

Figure 6.1: Salary Distribution Before and After Transformation

Figure 6.2: Salary Distribution Before and After Transformation

Interpretation: Normalization and standardization put variables on comparable scales without altering the shape of their distributions, because both are linear transformations. Min-Max normalization rescales values into the range [0, 1], while Z-score standardization expresses each value as its distance from the mean in units of standard deviation. For example, EMP001's salary of 17,391 becomes 0.8377 on the Min-Max scale and 1.2516 standard deviations above the mean. These techniques are essential in data analysis and machine learning to prevent variables with larger scales from dominating others. Feature engineering further improves interpretability by grouping raw numerical data into meaningful categories.
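The properties described above can be verified on a small vector. The sketch below defines two hypothetical single-vector helpers, min_max() and z_std(), and checks the resulting range, mean, and standard deviation. The guard for constant columns is an addition assumed here; normalize_columns() as written would divide by zero when max equals min.

```r
# Hypothetical helper: Min-Max scaling with a guard for constant vectors
min_max <- function(v) {
  rng <- range(v, na.rm = TRUE)
  if (diff(rng) == 0) {
    return(rep(0, length(v)))  # assumption: map a constant column to 0
  }
  (v - rng[1]) / diff(rng)
}

# Hypothetical helper: Z-score standardization of a single vector
z_std <- function(v) {
  (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
}

salary <- c(17391, 11312, 5905, 12253, 6097, 14887, 13844, 11058)

salary_mm <- min_max(salary)
salary_z  <- z_std(salary)

range(salary_mm)        # lower and upper bound after Min-Max scaling
mean(salary_z)          # approximately 0
sd(salary_z)            # 1
cor(salary, salary_mm)  # 1: a linear rescaling keeps the ordering and shape
min_max(c(5, 5, 5))     # constant input handled without division by zero
```

The salary values are the eight shown in the sample table; because the scaling here uses only these eight values rather than all 60 employees, the results differ from the salary_norm and salary_zscore columns above.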


7 Task 7 — KPI Dashboard

7.1 Description

Generate a dataset for 5 companies with 50 employees each, summarize KPIs, categorize employees into tiers, and produce advanced visualizations.

set.seed(777)

# Generate dataset: 5 companies x 50 employees
kpi_df <- generate_company_data(n_company = 5, n_employees = 50)

# Categorize employees into KPI tiers based on KPI score
kpi_df$kpi_tier <- ""
for (i in seq_len(nrow(kpi_df))) {
  kpi <- kpi_df$KPI_score[i]
  if (kpi >= 90) {
    kpi_df$kpi_tier[i] <- "Platinum"
  } else if (kpi >= 75) {
    kpi_df$kpi_tier[i] <- "Gold"
  } else if (kpi >= 60) {
    kpi_df$kpi_tier[i] <- "Silver"
  } else {
    kpi_df$kpi_tier[i] <- "Bronze"
  }
}

# Summary per company
company_kpi_summary <- kpi_df %>%
  group_by(company_id) %>%
  summarise(
    Avg_Salary      = round(mean(salary),            0),
    Avg_KPI         = round(mean(KPI_score),         2),
    Avg_Performance = round(mean(performance_score), 2),
    Top_Performers  = sum(performance_score >= 90),
    Platinum_Count  = sum(kpi_tier == "Platinum")
  )

knitr::kable(company_kpi_summary, align = "c", caption = "Company KPI Dashboard Summary") |>
  kable_styling(full_width = TRUE) |>
  row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
  row_spec(c(1,3,5), extra_css = "background: #ffffff; color: #1a3a5c;") |>
  row_spec(c(2,4),   extra_css = "background: #f4f1ea; color: #1a3a5c;")
Table 7.1: Company KPI Dashboard Summary
company_id Avg_Salary Avg_KPI Avg_Performance Top_Performers Platinum_Count
COMP01 11540 73.46 74.19 11 8
COMP02 11721 69.99 73.88 8 8
COMP03 12835 72.53 73.85 9 9
COMP04 11567 70.45 75.43 11 6
COMP05 11677 74.58 72.97 9 8

Figure 7.1: Company KPI Dashboard

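Interpretation: The KPI dashboard gives an overview of company performance across several metrics. Categorizing employees into KPI tiers (Platinum for a KPI score of at least 90, Gold from 75, Silver from 60, Bronze below 60) shows how performance levels are distributed within each company. In the simulated data, performance and KPI scores are generated independently, except that employees with a performance score of at least 90 receive a 10-point KPI boost, so any relationship between the two in the plots comes mainly from that rule. The company averages in Table 7.1 are close to one another (average KPI between 69.99 and 74.58), as expected when all companies are drawn from the same distributions. This type of analysis supports data-driven decision-making in organizational settings.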
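The tier distribution within each company can be tabulated directly. The sketch below uses a small hypothetical data frame (toy_kpi) instead of kpi_df, assigns tiers with the same thresholds as the loop above, and builds a company-by-tier cross-tabulation with table() and prop.table().

```r
# Hypothetical toy data: seven employees in two companies
toy_kpi <- data.frame(
  company_id = c("COMP01", "COMP01", "COMP01",
                 "COMP02", "COMP02", "COMP02", "COMP02"),
  KPI_score  = c(95, 90, 89.9, 75, 60, 59.9, 40)
)

# Same thresholds as the tier loop: >= 90 Platinum, >= 75 Gold, >= 60 Silver
toy_kpi$kpi_tier <- cut(toy_kpi$KPI_score,
                        breaks = c(-Inf, 60, 75, 90, Inf),
                        labels = c("Bronze", "Silver", "Gold", "Platinum"),
                        right  = FALSE)

# Counts per company and tier, then row-wise shares within each company
tier_counts <- table(company = toy_kpi$company_id, tier = toy_kpi$kpi_tier)
tier_shares <- round(prop.table(tier_counts, margin = 1), 2)

print(tier_counts)
print(tier_shares)
```

Storing the tier as an ordered set of factor levels also keeps the tiers in a sensible order in tables and plots, instead of the alphabetical order a plain character column would produce.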


8 Task 8 — Automated Report (Bonus)

8.1 Description

Use functions and loops to generate an automated HTML summary report per company, including tables and plots.

if (!exists("kpi_df")) stop("Run Task 7 first to create kpi_df")

library(grid)
library(gridExtra)

# function to generate summary report for each company
generate_company_report <- function(df, company) {
  comp_data <- df[df$company_id == company, ]
  list(
    company        = company,
    n_employees    = nrow(comp_data),
    avg_salary     = round(mean(comp_data$salary), 0),
    avg_kpi        = round(mean(comp_data$KPI_score), 2),
    avg_perf       = round(mean(comp_data$performance_score), 2),
    top_performers = sum(comp_data$performance_score >= 90),
    dominant_dept  = names(which.max(table(comp_data$department)))
  )
}

# generate all reports via loop
companies   <- sort(unique(kpi_df$company_id))
all_reports <- lapply(companies, function(comp) generate_company_report(kpi_df, comp))
names(all_reports) <- companies

# build summary table
report_summary_table <- do.call(rbind, lapply(companies, function(comp) {
  r <- all_reports[[comp]]
  data.frame(
    Company         = r$company,
    Employees       = r$n_employees,
    Avg_Salary      = format(r$avg_salary, big.mark = ","),
    Avg_KPI         = r$avg_kpi,
    Avg_Performance = r$avg_perf,
    Top_Performers  = r$top_performers,
    Dominant_Dept   = r$dominant_dept
  )
}))

# render summary table
knitr::kable(report_summary_table, align = "c", caption = "Automated Report Summary") |>
  kable_styling(full_width = TRUE) |>
  row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
  row_spec(c(1,3,5), extra_css = "background: #ffffff; color: #1a3a5c;") |>
  row_spec(c(2,4),   extra_css = "background: #f4f1ea; color: #1a3a5c;")
Table 8.1: Automated Report Summary
Company Employees Avg_Salary Avg_KPI Avg_Performance Top_Performers Dominant_Dept
COMP01 50 11,540 73.46 74.19 11 HR
COMP02 50 11,721 69.99 73.88 8 Marketing
COMP03 50 12,835 72.53 73.85 9 Marketing
COMP04 50 11,567 70.45 75.43 11 HR
COMP05 50 11,677 74.58 72.97 9 IT
📊 COMP01
Employees 50
Avg Salary 11,540
Avg KPI 73.46
Avg Performance 74.19
Top Performers 11
Dominant Dept HR
📊 COMP02
Employees 50
Avg Salary 11,721
Avg KPI 69.99
Avg Performance 73.88
Top Performers 8
Dominant Dept Marketing
📊 COMP03
Employees 50
Avg Salary 12,835
Avg KPI 72.53
Avg Performance 73.85
Top Performers 9
Dominant Dept Marketing
📊 COMP04
Employees 50
Avg Salary 11,567
Avg KPI 70.45
Avg Performance 75.43
Top Performers 11
Dominant Dept HR
📊 COMP05
Employees 50
Avg Salary 11,677
Avg KPI 74.58
Avg Performance 72.97
Top Performers 9
Dominant Dept IT
# ── Export to CSV ─────────────────────────────────────────────
write.csv(report_summary_table, "company_report.csv", row.names = FALSE)

# ── Export to PDF ─────────────────────────────────────────────
pdf("company_report.pdf", width = 11, height = 8.5)

# page 1: summary table
grid.newpage()

# header
grid.rect(x = 0.5, y = 0.93, width = 1, height = 0.13,
          gp = gpar(fill = "#1a3a5c", col = NA))
grid.text("Automated Company Report - Task 8",
          x = 0.5, y = 0.93,
          gp = gpar(col = "white", fontsize = 16, fontface = "bold"))
grid.text("Data Science Programming | ITSB | Even Semester 2026/2027",
          x = 0.5, y = 0.88,
          gp = gpar(col = "white", fontsize = 9))

# summary table
tbl <- tableGrob(
  report_summary_table,
  rows  = NULL,
  theme = ttheme_minimal(
    core    = list(
      fg_params = list(col = "#1a3a5c", fontsize = 9),
      bg_params = list(fill = c("#ffffff", "#f4f1ea"), col = "#dddddd")
    ),
    colhead = list(
      fg_params = list(col = "white", fontface = "bold", fontsize = 10),
      bg_params = list(fill = "#1a3a5c", col = "#1a3a5c")
    )
  )
)
grid.draw(tbl)

# footer
grid.rect(x = 0.5, y = 0.03, width = 1, height = 0.06,
          gp = gpar(fill = "#f4f1ea", col = NA))
grid.text("Generated automatically using functions & loops in R",
          x = 0.5, y = 0.03,
          gp = gpar(col = "#1a3a5c", fontsize = 8, fontface = "italic"))

# page 2: individual company cards
grid.newpage()

# page title
grid.rect(x = 0.5, y = 0.95, width = 1, height = 0.09,
          gp = gpar(fill = "#c9972e", col = NA))
grid.text("Company Summary Cards",
          x = 0.5, y = 0.95,
          gp = gpar(col = "white", fontsize = 14, fontface = "bold"))

# layout: 2 columns x 3 rows for cards
n_comp   <- length(companies)
n_cols   <- 2
n_rows   <- ceiling(n_comp / n_cols)
card_w   <- 0.44
card_h   <- 0.22
x_starts <- c(0.04, 0.52)
y_start  <- 0.83

for (i in seq_along(companies)) {
  r     <- all_reports[[companies[i]]]
  col_i <- ((i - 1) %% n_cols) + 1
  row_i <- ceiling(i / n_cols)
  cx    <- x_starts[col_i]
  cy    <- y_start - (row_i - 1) * (card_h + 0.04)

  # card border
  grid.rect(x = cx + card_w / 2, y = cy - card_h / 2,
            width = card_w, height = card_h,
            gp = gpar(col = "#c9972e", fill = "#fdfaf3", lwd = 1.5))

  # card header
  grid.rect(x = cx + card_w / 2, y = cy - 0.025,
            width = card_w, height = 0.05,
            gp = gpar(fill = "#1a3a5c", col = NA))
  grid.text(r$company,
            x = cx + 0.04, y = cy - 0.025,
            just = "left",
            gp = gpar(col = "white", fontsize = 10, fontface = "bold"))

  # card content
  labels <- c("Employees", "Avg Salary", "Avg KPI",
               "Avg Performance", "Top Performers", "Dominant Dept")
  values <- c(r$n_employees,
               format(r$avg_salary, big.mark = ","),
               r$avg_kpi, r$avg_perf,
               r$top_performers, r$dominant_dept)

  for (j in seq_along(labels)) {
    row_y <- cy - 0.06 - (j - 1) * 0.026
    grid.text(labels[j],
              x = cx + 0.02, y = row_y,
              just = "left",
              gp = gpar(col = "#555555", fontsize = 8))
    grid.text(as.character(values[j]),
              x = cx + card_w - 0.02, y = row_y,
              just = "right",
              gp = gpar(col = "#1a3a5c", fontsize = 8, fontface = "bold"))
  }
}

# close the PDF device; invisible() suppresses the printed device number
invisible(dev.off())

Figure 8.1: Automated Report: Company Overview Heatmap

Interpretation: The automated report demonstrates how functions and loops can be used to generate consistent summaries for multiple entities. Each company is analyzed using the same structure, allowing for easy comparison across different metrics. The use of visual summaries, such as heatmaps, enhances the ability to quickly identify patterns and differences between companies, making the reporting process more efficient and scalable.
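The same loop pattern extends to writing one output file per company. The sketch below is a minimal illustration with hypothetical toy data (toy_df) and file names; it writes each company's summary to a CSV file in a temporary directory so that nothing is left in the working directory.

```r
# Hypothetical toy data standing in for kpi_df
toy_df <- data.frame(
  company_id = rep(c("COMP01", "COMP02"), each = 3),
  salary     = c(5000, 7000, 9000, 12000, 8000, 10000),
  KPI_score  = c(80, 65, 95, 70, 55, 90)
)

out_dir   <- tempdir()
companies <- sort(unique(toy_df$company_id))
paths     <- setNames(character(length(companies)), companies)

# One pass per company: filter, summarise, write, remember the file path
for (comp in companies) {
  comp_data   <- toy_df[toy_df$company_id == comp, ]
  summary_row <- data.frame(
    company     = comp,
    n_employees = nrow(comp_data),
    avg_salary  = mean(comp_data$salary),
    avg_kpi     = round(mean(comp_data$KPI_score), 2)
  )
  paths[comp] <- file.path(out_dir, paste0(comp, "_summary.csv"))
  write.csv(summary_row, paths[comp], row.names = FALSE)
}

print(paths)
```

Because every company passes through the same code, adding a metric means changing one data.frame() call rather than editing each report by hand.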


9 Conclusion

This practicum demonstrates how fundamental programming concepts such as functions, loops, and conditional statements can be applied to solve complex data-related problems in a structured and scalable way. Each task represents a different aspect of data science, including mathematical modeling, simulation, data transformation, and performance analysis.

The use of simulations, such as the Monte Carlo method, highlights the importance of probabilistic approaches in approximating real-world phenomena. In addition, data transformation techniques like normalization and standardization ensure that data is properly prepared for analysis and modeling.

Furthermore, feature engineering and categorization help convert raw data into meaningful insights, while visualization and dashboard development improve interpretability and communication of results. Overall, this practicum not only strengthens technical programming skills but also enhances analytical thinking and the ability to extract insights from data to support decision-making.


10 References

Siregar, B. (2024). Data Science Programming: Functions and Loops.
Retrieved from https://bookdown.org/dsciencelabs/data_science_programming/