Data Science Programming
April 06, 2026
1 Task 1 — Multi-Formula Function
1.1 Description
Build a function compute_formula(x, formula) that computes values for linear, quadratic, cubic, and exponential formulas, then plots all on the same graph for x = 1:20.
# Packages used by the code in this report
library(dplyr)       # %>%, group_by(), summarise(), mutate(), case_when(), select()
library(kableExtra)  # kable_styling(), row_spec()
compute_formula <- function(x, formula) {
# Validate formula input
valid_formulas <- c("linear", "quadratic", "cubic", "exponential")
if (!formula %in% valid_formulas) {
stop(paste("Invalid formula. Choose from:", paste(valid_formulas, collapse = ", ")))
}
# Compute based on formula type
if (formula == "linear") {
return(2 * x + 3)
} else if (formula == "quadratic") {
return(x^2 + 2 * x + 1)
} else if (formula == "cubic") {
return(x^3 - 2 * x^2 + x - 1)
} else if (formula == "exponential") {
return(exp(0.3 * x))
}
}
# Define x range
x_vals <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")
# Build result data frame using nested loop
results <- data.frame()
for (formula in formulas) {
for (x in x_vals) {
y <- compute_formula(x, formula)
results <- rbind(results, data.frame(x = x, y = round(y, 4), formula = formula))
}
}
# Show sample output (first 3 rows per formula) as table
sample_output <- do.call(rbind, lapply(formulas, function(f) {
head(results[results$formula == f, ], 3)
}))
knitr::kable(sample_output, align = "c", row.names = FALSE,
caption = "Sample Output: First 3 Rows per Formula") |>
kable_styling(full_width = TRUE) |>
row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
row_spec(c(1,3,5,7,9,11), extra_css = "background: #ffffff; color: #1a3a5c;") |>
row_spec(c(2,4,6,8,10,12), extra_css = "background: #f4f1ea; color: #1a3a5c;")
| x | y | formula |
|---|---|---|
| 1 | 5.0000 | linear |
| 2 | 7.0000 | linear |
| 3 | 9.0000 | linear |
| 1 | 4.0000 | quadratic |
| 2 | 9.0000 | quadratic |
| 3 | 16.0000 | quadratic |
| 1 | -1.0000 | cubic |
| 2 | 1.0000 | cubic |
| 3 | 11.0000 | cubic |
| 1 | 1.3499 | exponential |
| 2 | 1.8221 | exponential |
| 3 | 2.4596 | exponential |
Figure 1.1: All formulas plotted for x = 1 to 20
Interpretation: The compute_formula() function shows how a single function can dispatch to several mathematical models. Each formula has a distinct growth pattern on the graph. The linear function increases at a constant rate, while the quadratic and cubic functions accelerate because of their higher-order terms. Within the plotted range the cubic dominates: at x = 20 it reaches 7219, compared with 441 for the quadratic, about 403 for the exponential, and 43 for the linear function. The exponential grows multiplicatively and will eventually overtake every polynomial, but with a rate of 0.3 it does so only beyond x = 20. The comparison shows how strongly the choice of model and its coefficients affects the outcome over the same input range.
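The growth comparison can be checked numerically. The sketch below re-states the four formulas with the same coefficients as compute_formula(), evaluates them at x = 20, and searches for the first integer x from which the exponential stays above each polynomial. The names formulas_list, at_20, x_wide, and overtake, and the search range 1:60, are choices made for this illustration only.

```r
# Same coefficients as compute_formula(), written as vectorized functions
formulas_list <- list(
  linear      = function(x) 2 * x + 3,
  quadratic   = function(x) x^2 + 2 * x + 1,
  cubic       = function(x) x^3 - 2 * x^2 + x - 1,
  exponential = function(x) exp(0.3 * x)
)

# Values at the right edge of the plotted range (x = 20)
at_20 <- sapply(formulas_list, function(f) f(20))
print(round(at_20, 2))

# Smallest integer x from which the exponential stays above each polynomial,
# searched on 1:60 (an arbitrary range chosen for this sketch)
x_wide <- 1:60
overtake <- sapply(formulas_list[c("linear", "quadratic", "cubic")], function(f) {
  max(x_wide[formulas_list$exponential(x_wide) <= f(x_wide)]) + 1
})
print(overtake)
```

On this grid the exponential passes the linear function inside the plotted range, the quadratic just after it, and the cubic considerably later.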
2 Task 2 — Sales Simulation
2.1 Description
Build a function simulate_sales(n_salesperson, days) that generates a dataset of sales_id, day, sales_amount, and discount_rate, with conditional discounts and cumulative sales per salesperson.
set.seed(42)
simulate_sales <- function(n_salesperson, days) {
# Inner function to calculate cumulative sales
calc_cumulative <- function(amounts) {
cum_vals <- numeric(length(amounts))
running <- 0
for (i in seq_along(amounts)) {
running <- running + amounts[i]
cum_vals[i] <- running
}
return(cum_vals)
}
# Apply discount based on sales amount thresholds
get_discount <- function(amount) {
if (amount >= 9000) {
return(0.20)
} else if (amount >= 6000) {
return(0.15)
} else if (amount >= 3000) {
return(0.10)
} else {
return(0.05)
}
}
# Generate random sales values
sales_data <- data.frame()
for (sp in seq_len(n_salesperson)) {
amounts <- round(runif(days, min = 1000, max = 12000), 0)
discounts <- sapply(amounts, get_discount)
cum_sales <- calc_cumulative(amounts)
sp_data <- data.frame(
sales_id = paste0("SP", sprintf("%02d", sp)),
day = 1:days,
sales_amount = amounts,
discount_rate = discounts,
net_sales = amounts * (1 - discounts),
cumulative_sales = cum_sales
)
sales_data <- rbind(sales_data, sp_data)
}
return(sales_data)
}
# Run simulation: 5 salespeople over 10 days
sales_df <- simulate_sales(n_salesperson = 5, days = 10)
# Summary statistics table
summary_sales <- sales_df %>%
group_by(sales_id) %>%
summarise(
Total_Sales = sum(sales_amount),
Avg_Sales = round(mean(sales_amount), 2),
Max_Sales = max(sales_amount),
Min_Sales = min(sales_amount),
Avg_Discount = paste0(round(mean(discount_rate) * 100, 1), "%"),
Total_Net = round(sum(net_sales), 2)
)
knitr::kable(summary_sales, align = "c", caption = "Summary Statistics per Salesperson") |>
kable_styling(full_width = TRUE) |>
row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
row_spec(seq(1, nrow(summary_sales), 2), extra_css = "background: #ffffff; color: #1a3a5c;") |>
row_spec(seq(2, nrow(summary_sales), 2), extra_css = "background: #f4f1ea; color: #1a3a5c;")
| sales_id | Total_Sales | Avg_Sales | Max_Sales | Min_Sales | Avg_Discount | Total_Net |
|---|---|---|---|---|---|---|
| SP01 | 79989 | 7998.9 | 11308 | 2481 | 15.5% | 66365.75 |
| SP02 | 74902 | 7490.2 | 11760 | 2292 | 15% | 62367.35 |
| SP03 | 77692 | 7769.2 | 11878 | 1907 | 14.5% | 64272.25 |
| SP04 | 67115 | 6711.5 | 10973 | 1043 | 14% | 55679.25 |
| SP05 | 79664 | 7966.4 | 11709 | 1412 | 14.5% | 66407.05 |
Figure 2.1: Cumulative Sales per Salesperson
Interpretation: Cumulative sales depend on consistency over time as well as on large individual transactions, so a salesperson with stable daily performance can match one with occasional high sales. The tiered discount (5% below 3000, then 10%, 15%, and 20% from 3000, 6000, and 9000) also reduces net sales more for larger transactions. For example, SP05 has a lower gross total than SP01 (79,664 vs 79,989) but a higher net total (66,407.05 vs 66,365.75) because it gives up less to discounts (average rate 14.5% vs 15.5%). This mirrors the real-world trade-off between generating high revenue and maintaining profitability, where discount strategies must be applied carefully.
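The loops in simulate_sales() make the logic explicit, but the same discount tiers and running totals can also be computed without loops. The sketch below is one possible vectorized equivalent, not the method used in the report: it re-states the get_discount() thresholds as a lookup with findInterval() and uses cumsum() for the running total. The names thresholds, rates, vec_discounts, and loop_discounts are illustrative.

```r
set.seed(42)  # any seed works here; 42 mirrors the report
amounts <- round(runif(10, min = 1000, max = 12000), 0)

# Loop-style reference with the same thresholds as get_discount() in the report
get_discount <- function(amount) {
  if (amount >= 9000) 0.20
  else if (amount >= 6000) 0.15
  else if (amount >= 3000) 0.10
  else 0.05
}
loop_discounts <- sapply(amounts, get_discount)

# Vectorized lookup: findInterval() counts how many thresholds each amount has reached
thresholds <- c(3000, 6000, 9000)
rates <- c(0.05, 0.10, 0.15, 0.20)
vec_discounts <- rates[findInterval(amounts, thresholds) + 1]

# cumsum() gives the same running total as the explicit loop in calc_cumulative()
cum_sales <- cumsum(amounts)

print(data.frame(amounts, vec_discounts, cum_sales))
```

Both versions agree on every amount, including values that sit exactly on a threshold, because findInterval() treats each interval as closed on the left by default.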
3 Task 3 — Performance Categorization
3.1 Description
Build a function categorize_performance(sales_amount) with 5 categories: Excellent, Very Good, Good, Average, and Poor. Loop through a vector, calculate percentages, and visualize with a bar plot and pie chart.
categorize_performance <- function(sales_amount) {
categories <- character(length(sales_amount))
# Loop through each value and assign category
for (i in seq_along(sales_amount)) {
val <- sales_amount[i]
if (val >= 10000) {
categories[i] <- "Excellent"
} else if (val >= 7500) {
categories[i] <- "Very Good"
} else if (val >= 5000) {
categories[i] <- "Good"
} else if (val >= 2500) {
categories[i] <- "Average"
} else {
categories[i] <- "Poor"
}
}
return(categories)
}
# Generate sales vector
set.seed(123)
sales_vector <- round(runif(200, min = 500, max = 12000), 0)
# Apply categorization
categories <- categorize_performance(sales_vector)
# Build frequency table
cat_table <- as.data.frame(table(Category = categories))
cat_table$Percentage <- round(cat_table$Freq / sum(cat_table$Freq) * 100, 2)
cat_table$Category <- factor(cat_table$Category,
levels = c("Excellent","Very Good","Good","Average","Poor"))
cat_table <- cat_table[order(cat_table$Category), ]
knitr::kable(cat_table, align = "c", row.names = FALSE,
col.names = c("Category", "Count", "Percentage (%)"),
caption = "Performance Category Distribution") |>
kable_styling(full_width = TRUE) |>
row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
row_spec(c(1,3,5), extra_css = "background: #ffffff; color: #1a3a5c;") |>
row_spec(c(2,4), extra_css = "background: #f4f1ea; color: #1a3a5c;")
| Category | Count | Percentage (%) |
|---|---|---|
| Excellent | 33 | 16.5 |
| Very Good | 45 | 22.5 |
| Good | 45 | 22.5 |
| Average | 50 | 25.0 |
| Poor | 27 | 13.5 |
Figure 3.1: Performance Category Distribution
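Interpretation: The categorization groups sales values into performance levels based on fixed thresholds. Because the values are drawn from a uniform distribution on [500, 12000], each category's expected share is proportional to the width of its interval: the three middle categories span 2500 each and are expected to be somewhat larger than Poor and Excellent, which span 2000 each. The observed counts follow this pattern only roughly (Poor is the smallest at 13.5% and Average the largest at 25%), which is ordinary sampling variation for 200 values. The bar chart highlights the frequency of each category, while the pie chart emphasizes the proportional distribution. This approach simplifies interpretation by converting numerical data into categories that are easier to analyze.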
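To see how far the observed shares are from what a uniform draw would give on average, the expected percentages can be computed from the interval widths alone. The sketch below assumes the continuous range [500, 12000] (it ignores the small effect of rounding to whole numbers) and copies the observed percentages from the table above. The names cuts, expected_pct, observed_pct, and comparison are illustrative.

```r
lower <- 500
upper <- 12000

# Category boundaries taken from categorize_performance(), ordered from Poor to Excellent
cuts <- c(lower, 2500, 5000, 7500, 10000, upper)
category <- c("Poor", "Average", "Good", "Very Good", "Excellent")

# Under a uniform distribution the expected share equals interval width / total width
expected_pct <- round(diff(cuts) / (upper - lower) * 100, 2)

# Observed percentages copied from the distribution table in this section
observed_pct <- c(13.5, 25.0, 22.5, 22.5, 16.5)

comparison <- data.frame(
  category,
  expected_pct,
  observed_pct,
  difference = observed_pct - expected_pct
)
print(comparison)
```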
4 Task 4 — Multi-Company Simulation
4.1 Description
Build a function generate_company_data(n_company, n_employees) that generates company_id, employee_id, salary, department, performance_score, and KPI_score, with conditional logic for top performers.
set.seed(2024)
generate_company_data <- function(n_company, n_employees) {
departments <- c("Finance","Marketing","Operations","IT","HR")
all_data <- data.frame()
# Nested loops: company -> employee
for (comp in seq_len(n_company)) {
for (emp in seq_len(n_employees)) {
salary <- round(runif(1, 4000, 20000), 0)
perf_score <- round(runif(1, 50, 100), 1)
kpi_score <- round(runif(1, 40, 100), 1)
dept <- sample(departments, 1)
# Apply KPI boost for high-performing employees
if (perf_score >= 90) {
kpi_score <- min(100, kpi_score + 10)
}
row <- data.frame(
company_id = paste0("COMP", sprintf("%02d", comp)),
employee_id = paste0("EMP", sprintf("%03d", (comp - 1) * n_employees + emp)),
salary = salary,
department = dept,
performance_score = perf_score,
KPI_score = kpi_score
)
all_data <- rbind(all_data, row)
}
}
return(all_data)
}
# Generate dataset: 4 companies, 15 employees each
company_df <- generate_company_data(n_company = 4, n_employees = 15)
# Summary per company
company_summary <- company_df %>%
group_by(company_id) %>%
summarise(
Avg_Salary = round(mean(salary), 0),
Avg_Performance = round(mean(performance_score), 2),
Max_KPI = max(KPI_score),
Top_Performers = sum(performance_score >= 90)
)
knitr::kable(company_summary, align = "c", caption = "Summary per Company") |>
kable_styling(full_width = TRUE) |>
row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
row_spec(c(1,3), extra_css = "background: #ffffff; color: #1a3a5c;") |>
row_spec(c(2,4), extra_css = "background: #f4f1ea; color: #1a3a5c;")
| company_id | Avg_Salary | Avg_Performance | Max_KPI | Top_Performers |
|---|---|---|---|---|
| COMP01 | 11392 | 78.54 | 100.0 | 5 |
| COMP02 | 13182 | 70.89 | 98.5 | 2 |
| COMP03 | 11060 | 69.44 | 97.9 | 1 |
| COMP04 | 11715 | 73.14 | 95.9 | 3 |
Figure 4.1: Average Salary and KPI per Company
Interpretation: The generated dataset illustrates how employee-level data can be structured across multiple companies. The conditional logic, which adds 10 KPI points (capped at 100) for employees with a performance score of at least 90, mimics a simple performance-reward rule. The summary shows differences between companies in average salary and performance; for example, COMP01 has 5 top performers while COMP03 has 1. Because every company is drawn from the same distributions with only 15 employees each, these differences reflect random sampling variation rather than real organizational characteristics.
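The nested loops grow the data frame one row at a time with rbind(), which is easy to follow but slow for large inputs. The sketch below is one possible vectorized alternative with the same columns, value ranges, and KPI boost rule. The function name generate_company_data_vec is hypothetical, and because it draws the random numbers in a different order, it will not reproduce the exact values in the table above even with the same seed.

```r
# Hypothetical vectorized counterpart of generate_company_data()
generate_company_data_vec <- function(n_company, n_employees) {
  n <- n_company * n_employees
  departments <- c("Finance", "Marketing", "Operations", "IT", "HR")

  perf <- round(runif(n, 50, 100), 1)
  kpi  <- round(runif(n, 40, 100), 1)
  # KPI boost for high performers, capped at 100 (same rule as the loop version)
  kpi  <- ifelse(perf >= 90, pmin(100, kpi + 10), kpi)

  data.frame(
    company_id        = rep(sprintf("COMP%02d", seq_len(n_company)), each = n_employees),
    employee_id       = sprintf("EMP%03d", seq_len(n)),
    salary            = round(runif(n, 4000, 20000), 0),
    department        = sample(departments, n, replace = TRUE),
    performance_score = perf,
    KPI_score         = kpi
  )
}

set.seed(2024)
company_df_vec <- generate_company_data_vec(n_company = 4, n_employees = 15)
str(company_df_vec)
```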
5 Task 5 — Monte Carlo: Pi & Probability
5.1 Description
Build monte_carlo_pi(n_points) that estimates pi by sampling random points uniformly in the square [-1, 1] × [-1, 1] and counting those that fall inside the unit circle, plus a probability analysis for points falling in the sub-square |x| ≤ 0.5, |y| ≤ 0.5.
set.seed(99)
monte_carlo_pi <- function(n_points) {
x <- runif(n_points, -1, 1)
y <- runif(n_points, -1, 1)
inside <- integer(n_points)
# Check whether each point is inside the unit circle
for (i in seq_len(n_points)) {
if (x[i]^2 + y[i]^2 <= 1) {
inside[i] <- 1
} else {
inside[i] <- 0
}
}
# Estimate pi
pi_estimate <- 4 * sum(inside) / n_points
# Probability inside sub-square
in_subsquare <- sum(abs(x) <= 0.5 & abs(y) <= 0.5)
prob_subsquare <- in_subsquare / n_points
return(list(
pi_estimate = pi_estimate,
prob_subsquare = prob_subsquare,
x = x,
y = y,
inside = inside
))
}
# Run with 5000 points
mc_result <- monte_carlo_pi(5000)
# Display results as table
mc_summary <- data.frame(
Metric = c("Estimated Pi", "Actual Pi", "Error", "P(Sub-square)", "Theoretical P"),
Value = c(
round(mc_result$pi_estimate, 5),
round(pi, 5),
round(abs(mc_result$pi_estimate - pi), 5),
round(mc_result$prob_subsquare, 4),
0.25
)
)
knitr::kable(mc_summary, align = "c", caption = "Monte Carlo Results (n = 5000)") |>
kable_styling(full_width = TRUE) |>
row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
row_spec(c(1,3,5), extra_css = "background: #ffffff; color: #1a3a5c;") |>
row_spec(c(2,4), extra_css = "background: #f4f1ea; color: #1a3a5c;")
| Metric | Value |
|---|---|
| Estimated Pi | 3.14960 |
| Actual Pi | 3.14159 |
| Error | 0.00801 |
| P(Sub-square) | 0.25220 |
| Theoretical P | 0.25000 |
Figure 5.1: Monte Carlo: Points Inside vs Outside Circle
Interpretation: The Monte Carlo simulation estimates π from the proportion of uniformly sampled points in the square [-1, 1] × [-1, 1] that fall inside the unit circle. The circle has area π and the square has area 4, so that proportion approximates π/4 and multiplying it by 4 gives the estimate. With 5000 points the estimate is 3.1496, an absolute error of about 0.008. By the Law of Large Numbers the estimate converges to π as the number of points increases. The sub-square |x| ≤ 0.5, |y| ≤ 0.5 has area 1, so its theoretical probability is 1/4 = 0.25, and the simulated value of 0.2522 is close to it, showing that random sampling can approximate geometric probabilities.
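The report runs the simulation once with 5000 points. To illustrate the convergence claim, the sketch below repeats the estimate for several sample sizes and places the absolute error next to the theoretical standard error of the estimator, 4 * sqrt(p * (1 - p) / n) with p = π/4. The helper estimate_pi and the grid of sample sizes are assumptions made for this example, and the vectorized comparison replaces the explicit loop used in monte_carlo_pi().

```r
set.seed(99)

# Compact, vectorized version of the estimator (illustrative helper)
estimate_pi <- function(n_points) {
  x <- runif(n_points, -1, 1)
  y <- runif(n_points, -1, 1)
  4 * mean(x^2 + y^2 <= 1)  # share of points inside the unit circle, times 4
}

n_grid <- c(100, 1000, 10000, 100000)  # arbitrary sample sizes for the comparison

convergence <- data.frame(
  n_points    = n_grid,
  pi_estimate = sapply(n_grid, estimate_pi)
)
convergence$abs_error <- abs(convergence$pi_estimate - pi)

# Theoretical standard error: each point is a Bernoulli trial with p = pi / 4
p <- pi / 4
convergence$std_error <- 4 * sqrt(p * (1 - p) / n_grid)

print(convergence)
```

The standard error shrinks in proportion to 1 / sqrt(n), so each hundredfold increase in points gains roughly one more correct decimal digit. Any single run can still land above or below that typical error.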
6 Task 6 — Data Transformation
6.1 Description
Build functions normalize_columns(df, cols) and z_score(df, cols) for loop-based normalization, then create new engineered features. Visualize distributions before and after transformation.
# Reuse company_df from Task 4
df_raw <- company_df
# Min-Max normalization using loop
normalize_columns <- function(df, cols) {
df_norm <- df
for (col in cols) {
min_val <- min(df[[col]], na.rm = TRUE)
max_val <- max(df[[col]], na.rm = TRUE)
df_norm[[paste0(col, "_norm")]] <- (df[[col]] - min_val) / (max_val - min_val)
}
return(df_norm)
}
# Z-score standardization using loop
z_score <- function(df, cols) {
df_z <- df
for (col in cols) {
mu <- mean(df[[col]], na.rm = TRUE)
sigma <- sd(df[[col]], na.rm = TRUE)
df_z[[paste0(col, "_zscore")]] <- (df[[col]] - mu) / sigma
}
return(df_z)
}
# Apply transformations
numeric_cols <- c("salary", "performance_score", "KPI_score")
df_transformed <- normalize_columns(df_raw, numeric_cols)
df_transformed <- z_score(df_transformed, numeric_cols)
# Feature Engineering
df_transformed <- df_transformed %>%
mutate(
performance_category = case_when(
performance_score >= 90 ~ "Excellent",
performance_score >= 75 ~ "Very Good",
performance_score >= 60 ~ "Good",
performance_score >= 45 ~ "Average",
TRUE ~ "Poor"
),
salary_bracket = case_when(
salary >= 15000 ~ "High",
salary >= 9000 ~ "Mid",
TRUE ~ "Low"
)
)
# Display sample as table
knitr::kable(
head(df_transformed %>%
select(employee_id, salary, salary_norm, salary_zscore,
performance_category, salary_bracket), 8),
align = "c",
digits = 4,
caption = "Sample Transformed Data (First 8 Rows)"
) |>
kable_styling(full_width = TRUE) |>
row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
row_spec(c(1,3,5,7), extra_css = "background: #ffffff; color: #1a3a5c;") |>
row_spec(c(2,4,6,8), extra_css = "background: #f4f1ea; color: #1a3a5c;")
| employee_id | salary | salary_norm | salary_zscore | performance_category | salary_bracket |
|---|---|---|---|---|---|
| EMP001 | 17391 | 0.8377 | 1.2516 | Good | High |
| EMP002 | 11312 | 0.4484 | -0.1184 | Very Good | Mid |
| EMP003 | 5905 | 0.1021 | -1.3369 | Excellent | Low |
| EMP004 | 12253 | 0.5086 | 0.0937 | Good | Mid |
| EMP005 | 6097 | 0.1144 | -1.2936 | Good | Low |
| EMP006 | 14887 | 0.6773 | 0.6873 | Excellent | Mid |
| EMP007 | 13844 | 0.6105 | 0.4522 | Excellent | Mid |
| EMP008 | 11058 | 0.4321 | -0.1756 | Good | Mid |
Figure 6.1: Salary Distribution Before and After Transformation
Figure 6.2: Salary Distribution Before and After Transformation
Interpretation: Min-Max normalization and Z-score standardization are linear transformations, so they put variables on comparable scales without altering the shape of the distribution. Min-Max normalization rescales values into the range [0, 1], while Z-score standardization expresses each value as its distance from the mean in units of standard deviation. These techniques are essential in data analysis and machine learning to prevent variables with larger scales from dominating others. Feature engineering further enhances interpretability by grouping raw numerical data into meaningful categories.
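The properties described above can be verified directly. The sketch below applies both transformations to a simulated salary vector (hypothetical data drawn from the same range as in Task 4, with an arbitrary seed) and checks the resulting range, mean, and standard deviation. It also confirms that the transformed values are perfectly correlated with the originals, which is what preserving the distribution shape means for a linear rescaling.

```r
set.seed(1)  # arbitrary seed for this illustration
salary <- round(runif(60, 4000, 20000), 0)

# Min-Max normalization: (x - min) / (max - min)
salary_norm <- (salary - min(salary)) / (max(salary) - min(salary))

# Z-score standardization: (x - mean) / sd
salary_z <- (salary - mean(salary)) / sd(salary)

# Min-Max output spans exactly [0, 1]
print(range(salary_norm))

# Z-scores have mean 0 and standard deviation 1 (up to floating-point error)
print(c(mean = mean(salary_z), sd = sd(salary_z)))

# Both are linear rescalings, so the correlation with the raw values is 1
print(c(norm = cor(salary, salary_norm), z = cor(salary, salary_z)))

# Base R's scale() produces the same z-scores
print(all.equal(as.numeric(scale(salary)), salary_z))
```

Both formulas divide by a spread measure, so a column with a single repeated value yields NaN. A guard for that case may be worth adding before applying the functions to arbitrary data.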
7 Task 7 — KPI Dashboard
7.1 Description
Generate a dataset for 5 companies with 50 employees each, summarize KPIs, categorize employees into tiers, and produce advanced visualizations.
set.seed(777)
# Generate dataset: 5 companies x 50 employees
kpi_df <- generate_company_data(n_company = 5, n_employees = 50)
# Categorize employees into KPI tiers based on KPI score
kpi_df$kpi_tier <- ""
for (i in seq_len(nrow(kpi_df))) {
kpi <- kpi_df$KPI_score[i]
if (kpi >= 90) {
kpi_df$kpi_tier[i] <- "Platinum"
} else if (kpi >= 75) {
kpi_df$kpi_tier[i] <- "Gold"
} else if (kpi >= 60) {
kpi_df$kpi_tier[i] <- "Silver"
} else {
kpi_df$kpi_tier[i] <- "Bronze"
}
}
# Summary per company
company_kpi_summary <- kpi_df %>%
group_by(company_id) %>%
summarise(
Avg_Salary = round(mean(salary), 0),
Avg_KPI = round(mean(KPI_score), 2),
Avg_Performance = round(mean(performance_score), 2),
Top_Performers = sum(performance_score >= 90),
Platinum_Count = sum(kpi_tier == "Platinum")
)
knitr::kable(company_kpi_summary, align = "c", caption = "Company KPI Dashboard Summary") |>
kable_styling(full_width = TRUE) |>
row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
row_spec(c(1,3,5), extra_css = "background: #ffffff; color: #1a3a5c;") |>
row_spec(c(2,4), extra_css = "background: #f4f1ea; color: #1a3a5c;")
| company_id | Avg_Salary | Avg_KPI | Avg_Performance | Top_Performers | Platinum_Count |
|---|---|---|---|---|---|
| COMP01 | 11540 | 73.46 | 74.19 | 11 | 8 |
| COMP02 | 11721 | 69.99 | 73.88 | 8 | 8 |
| COMP03 | 12835 | 72.53 | 73.85 | 9 | 9 |
| COMP04 | 11567 | 70.45 | 75.43 | 11 | 6 |
| COMP05 | 11677 | 74.58 | 72.97 | 9 | 8 |
Figure 7.1: Company KPI Dashboard
Interpretation: The KPI dashboard provides a comprehensive overview of company performance using multiple metrics. The categorization of employees into KPI tiers helps identify the distribution of performance levels within each company. The visualizations reveal patterns such as the relationship between performance scores and KPI values, as well as differences across companies and departments. This type of analysis supports data-driven decision-making in organizational settings.
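The tier distribution per company mentioned above can be tabulated in a few lines. The sketch below is a loop-free alternative to the tier assignment in this task, using cut() with right = FALSE so that each threshold belongs to the higher tier, followed by a cross-tabulation. It runs on a small hypothetical data frame toy_kpi rather than on kpi_df, and the names tier_breaks and tier_labels are illustrative.

```r
set.seed(777)

# Hypothetical stand-in for kpi_df with only the columns needed here
toy_kpi <- data.frame(
  company_id = rep(c("COMP01", "COMP02"), each = 50),
  KPI_score  = round(runif(100, 40, 100), 1)
)

# Same thresholds as the loop: >= 90 Platinum, >= 75 Gold, >= 60 Silver, else Bronze
tier_breaks <- c(-Inf, 60, 75, 90, Inf)
tier_labels <- c("Bronze", "Silver", "Gold", "Platinum")
toy_kpi$kpi_tier <- cut(toy_kpi$KPI_score, breaks = tier_breaks,
                        labels = tier_labels, right = FALSE)

# Counts and row percentages of tiers within each company
tier_counts <- table(toy_kpi$company_id, toy_kpi$kpi_tier)
print(tier_counts)
print(round(prop.table(tier_counts, margin = 1) * 100, 1))
```

Because cut() returns an ordered set of factor levels, the tiers appear in Bronze-to-Platinum order in tables and plots without further sorting.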
8 Task 8 — Automated Report (Bonus)
8.1 Description
Use functions and loops to generate an automated HTML summary report per company, including tables and plots.
if (!exists("kpi_df")) stop("Run Task 7 first to create kpi_df")
library(grid)
library(gridExtra)
# function to generate summary report for each company
generate_company_report <- function(df, company) {
comp_data <- df[df$company_id == company, ]
list(
company = company,
n_employees = nrow(comp_data),
avg_salary = round(mean(comp_data$salary), 0),
avg_kpi = round(mean(comp_data$KPI_score), 2),
avg_perf = round(mean(comp_data$performance_score), 2),
top_performers = sum(comp_data$performance_score >= 90),
dominant_dept = names(which.max(table(comp_data$department)))
)
}
# generate all reports via loop
companies <- sort(unique(kpi_df$company_id))
all_reports <- lapply(companies, function(comp) generate_company_report(kpi_df, comp))
names(all_reports) <- companies
# build summary table
report_summary_table <- do.call(rbind, lapply(companies, function(comp) {
r <- all_reports[[comp]]
data.frame(
Company = r$company,
Employees = r$n_employees,
Avg_Salary = format(r$avg_salary, big.mark = ","),
Avg_KPI = r$avg_kpi,
Avg_Performance = r$avg_perf,
Top_Performers = r$top_performers,
Dominant_Dept = r$dominant_dept
)
}))
# render summary table
knitr::kable(report_summary_table, align = "c", caption = "Automated Report Summary") |>
kable_styling(full_width = TRUE) |>
row_spec(0, extra_css = "background: #1a3a5c; color: white;") |>
row_spec(c(1,3,5), extra_css = "background: #ffffff; color: #1a3a5c;") |>
row_spec(c(2,4), extra_css = "background: #f4f1ea; color: #1a3a5c;")
| Company | Employees | Avg_Salary | Avg_KPI | Avg_Performance | Top_Performers | Dominant_Dept |
|---|---|---|---|---|---|---|
| COMP01 | 50 | 11,540 | 73.46 | 74.19 | 11 | HR |
| COMP02 | 50 | 11,721 | 69.99 | 73.88 | 8 | Marketing |
| COMP03 | 50 | 12,835 | 72.53 | 73.85 | 9 | Marketing |
| COMP04 | 50 | 11,567 | 70.45 | 75.43 | 11 | HR |
| COMP05 | 50 | 11,677 | 74.58 | 72.97 | 9 | IT |
# ── Export to CSV ─────────────────────────────────────────────
write.csv(report_summary_table, "company_report.csv", row.names = FALSE)
# ── Export to PDF ─────────────────────────────────────────────
pdf("company_report.pdf", width = 11, height = 8.5)
# page 1: summary table
grid.newpage()
# header
grid.rect(x = 0.5, y = 0.93, width = 1, height = 0.13,
gp = gpar(fill = "#1a3a5c", col = NA))
grid.text("Automated Company Report - Task 8",
x = 0.5, y = 0.93,
gp = gpar(col = "white", fontsize = 16, fontface = "bold"))
grid.text("Data Science Programming | ITSB | Even Semester 2026/2027",
x = 0.5, y = 0.88,
gp = gpar(col = "white", fontsize = 9))
# summary table
tbl <- tableGrob(
report_summary_table,
rows = NULL,
theme = ttheme_minimal(
core = list(
fg_params = list(col = "#1a3a5c", fontsize = 9),
bg_params = list(fill = c("#ffffff", "#f4f1ea"), col = "#dddddd")
),
colhead = list(
fg_params = list(col = "white", fontface = "bold", fontsize = 10),
bg_params = list(fill = "#1a3a5c", col = "#1a3a5c")
)
)
)
grid.draw(tbl)
# footer
grid.rect(x = 0.5, y = 0.03, width = 1, height = 0.06,
gp = gpar(fill = "#f4f1ea", col = NA))
grid.text("Generated automatically using functions & loops in R",
x = 0.5, y = 0.03,
gp = gpar(col = "#1a3a5c", fontsize = 8, fontface = "italic"))
# page 2: individual company cards
grid.newpage()
# page title
grid.rect(x = 0.5, y = 0.95, width = 1, height = 0.09,
gp = gpar(fill = "#c9972e", col = NA))
grid.text("Company Summary Cards",
x = 0.5, y = 0.95,
gp = gpar(col = "white", fontsize = 14, fontface = "bold"))
# layout: 2 columns x 3 rows for cards
n_comp <- length(companies)
n_cols <- 2
n_rows <- ceiling(n_comp / n_cols)
card_w <- 0.44
card_h <- 0.22
x_starts <- c(0.04, 0.52)
y_start <- 0.83
for (i in seq_along(companies)) {
r <- all_reports[[companies[i]]]
col_i <- ((i - 1) %% n_cols) + 1
row_i <- ceiling(i / n_cols)
cx <- x_starts[col_i]
cy <- y_start - (row_i - 1) * (card_h + 0.04)
# card border
grid.rect(x = cx + card_w / 2, y = cy - card_h / 2,
width = card_w, height = card_h,
gp = gpar(col = "#c9972e", fill = "#fdfaf3", lwd = 1.5))
# card header
grid.rect(x = cx + card_w / 2, y = cy - 0.025,
width = card_w, height = 0.05,
gp = gpar(fill = "#1a3a5c", col = NA))
grid.text(r$company,
x = cx + 0.04, y = cy - 0.025,
just = "left",
gp = gpar(col = "white", fontsize = 10, fontface = "bold"))
# card content
labels <- c("Employees", "Avg Salary", "Avg KPI",
"Avg Performance", "Top Performers", "Dominant Dept")
values <- c(r$n_employees,
format(r$avg_salary, big.mark = ","),
r$avg_kpi, r$avg_perf,
r$top_performers, r$dominant_dept)
for (j in seq_along(labels)) {
row_y <- cy - 0.06 - (j - 1) * 0.026
grid.text(labels[j],
x = cx + 0.02, y = row_y,
just = "left",
gp = gpar(col = "#555555", fontsize = 8))
grid.text(as.character(values[j]),
x = cx + card_w - 0.02, y = row_y,
just = "right",
gp = gpar(col = "#1a3a5c", fontsize = 8, fontface = "bold"))
}
}
dev.off()
Figure 8.1: Automated Report: Company Overview Heatmap
Interpretation: The automated report demonstrates how functions and loops can be used to generate consistent summaries for multiple entities. Each company is analyzed using the same structure, allowing for easy comparison across different metrics. The use of visual summaries, such as heatmaps, enhances the ability to quickly identify patterns and differences between companies, making the reporting process more efficient and scalable.
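The code in this task exports a combined CSV and a PDF. For the per-company HTML output named in the task description, one minimal approach is to build a small HTML string per company inside a loop and write each to its own file. The sketch below uses only base R and a hypothetical toy data frame toy_df in place of kpi_df. The helper company_html, the file naming scheme, and the use of tempdir() as the output folder are all assumptions made for this example.

```r
# Hypothetical stand-in for kpi_df (same column names, made-up values)
toy_df <- data.frame(
  company_id = rep(c("COMP01", "COMP02"), each = 3),
  salary     = c(5000, 7000, 9000, 6000, 8000, 10000),
  KPI_score  = c(70, 80, 90, 60, 75, 85)
)

# Build a minimal HTML page summarizing one company (illustrative helper)
company_html <- function(df, company) {
  comp <- df[df$company_id == company, ]
  rows <- sprintf(
    "<tr><td>%s</td><td>%s</td></tr>",
    c("Employees", "Avg Salary", "Avg KPI"),
    c(nrow(comp),
      format(round(mean(comp$salary)), big.mark = ","),
      round(mean(comp$KPI_score), 2))
  )
  paste0("<html><body><h1>", company, "</h1><table>",
         paste(rows, collapse = ""), "</table></body></html>")
}

# Loop over companies and write one HTML file each into a temporary folder
out_dir <- tempdir()
report_files <- character(0)
for (comp in sort(unique(toy_df$company_id))) {
  path <- file.path(out_dir, paste0("report_", comp, ".html"))
  writeLines(company_html(toy_df, comp), path)
  report_files <- c(report_files, path)
}
print(basename(report_files))
```

The same loop structure would work with the list returned by generate_company_report(), with each list element supplying one table row.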
9 Conclusion
This practicum demonstrates how fundamental programming concepts such as functions, loops, and conditional statements can be applied to solve complex data-related problems in a structured and scalable way. Each task represents a different aspect of data science, including mathematical modeling, simulation, data transformation, and performance analysis.
The use of simulations, such as the Monte Carlo method, highlights the importance of probabilistic approaches in approximating real-world phenomena. In addition, data transformation techniques like normalization and standardization ensure that data is properly prepared for analysis and modeling.
Furthermore, feature engineering and categorization help convert raw data into meaningful insights, while visualization and dashboard development improve interpretability and communication of results. Overall, this practicum not only strengthens technical programming skills but also enhances analytical thinking and the ability to extract insights from data to support decision-making.
10 References
Siregar, B. (2024). Data Science Programming: Functions and Loops.
Retrieved from https://bookdown.org/dsciencelabs/data_science_programming/