Syntax and Control Flow

Practicum ~ Week 4

Data Science | ITSB

Naifah Edria Arta

Digging into data, uncovering stories, and shaping the future one insight at a time.

Skill Focus

R Programming, Data Visualization, Data Analysis, Statistics

Course: Data Science Programming
Academic Advisor: Bakti Siregar, M.Sc., CDS

Introduction

This report was prepared to fulfill the Advanced Practicum requirements for the Data Science Programming course under the guidance of Bakti Siregar, M.Sc. The primary objective of this practicum is to develop an automated data science workflow by integrating multi-layer functions, nested loops, and complex conditional logic. The tasks simulate real-world data science challenges, ranging from dynamic formula computations and Monte Carlo simulations to multi-company KPI analysis. By focusing on advanced statistics, data transformation, and visualization, this report demonstrates the practical application of R and Python in solving sophisticated analytical problems.

1 Dynamic Multi-Formula Function

1.1 Implementation

# =========================
# LIBRARY
# =========================
library(ggplot2)
library(tidyr)
library(plotly)

# =========================
# FUNCTION
# =========================
compute_formula <- function(x, formulas) {
  results <- list()
  
  for (f in formulas) {
    y <- numeric(length(x))
    
    for (i in seq_along(x)) {
      if (f == "linear") {
        y[i] <- x[i]
      } else if (f == "quadratic") {
        y[i] <- x[i]^2
      } else if (f == "cubic") {
        y[i] <- x[i]^3
      } else if (f == "exponential") {
        y[i] <- exp(x[i] / 5)
      } else {
        stop(paste("Invalid formula:", f))
      }
    }
    
    results[[f]] <- y
  }
  
  return(as.data.frame(results))
}

# =========================
# INPUT
# =========================
x_values <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")

# =========================
# RUN FUNCTION
# =========================
df <- compute_formula(x_values, formulas)
df$x <- x_values

# =========================
# TRANSFORM
# =========================
df_long <- pivot_longer(df,
                        cols = -x,
                        names_to = "formula",
                        values_to = "y")

# =========================
# PLOT
# =========================
p <- ggplot(
  df_long,
  aes(
    x = x,
    y = y,
    color = formula,
    text = paste0(
      "x: ", x,
      "<br>y: ", round(y,2),
      "<br>Formula: ", formula
    )
  )
) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  labs(
    title = "Dynamic Multi-Formula Plot",
    subtitle = "Linear, Quadratic, Cubic, Exponential",
    x = "X Value",
    y = "Y Value"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "top"
  )

ggplotly(p, tooltip = "text") %>%
  layout(hovermode = "x unified")

1.2 Interpretation

This implementation demonstrates how different mathematical models behave across the same range of input values using nested loops and conditional logic.

The visualization shows that each formula has a distinct growth pattern. The linear function increases at a constant rate, while the quadratic and cubic functions grow progressively faster as x increases. On the plotted range (x = 1 to 20) the cubic curve dominates, reaching 8000 at x = 20, compared with 400 for the quadratic, about 54.6 for the exponential exp(x/5), and 20 for the linear function. The exponential curve only overtakes the polynomials at much larger x, because its exponent is scaled down by 5.

Overall, the comparison illustrates how the choice of functional form changes the output by orders of magnitude over the same inputs, which matters for model selection and for choosing a sensible axis scale when several curves share one plot.
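The nested loop and if/else chain can also be written as a lookup of vectorised functions. The following is only a sketch under assumptions: `formula_table` and `compute_formula_vec` are hypothetical names that do not appear in the implementation above, and the formulas are copied from `compute_formula`.

```r
# Hypothetical lookup table: formula name -> vectorised function
formula_table <- list(
  linear      = function(x) x,
  quadratic   = function(x) x^2,
  cubic       = function(x) x^3,
  exponential = function(x) exp(x / 5)
)

compute_formula_vec <- function(x, formulas) {
  # Reject any name that is not in the lookup table
  unknown <- setdiff(formulas, names(formula_table))
  if (length(unknown) > 0) {
    stop("Invalid formula: ", paste(unknown, collapse = ", "))
  }
  # Apply each function to the whole vector; list names become column names
  as.data.frame(lapply(formula_table[formulas], function(f) f(x)))
}

df_vec <- compute_formula_vec(1:20, c("linear", "quadratic", "cubic", "exponential"))

# Largest value of each curve on x = 1..20
sapply(df_vec, max)
```

Comparing the column maxima is a quick way to check which curve actually dominates on the plotted range before reading it off the chart.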

2 Nested Simulation – Multi-Sales & Discounts

2.1 Implementation

library(dplyr)
library(plotly)
library(knitr)

# Simulation function (unchanged)
simulate_sales <- function(n_salesperson, days) {
  
  all_data <- data.frame()
  
  for (sp in 1:n_salesperson) {
    
    cumulative_sales <- 0
    
    for (d in 1:days) {
      
      sales_amount <- sample(100:1000, 1)
      
      if (sales_amount > 800) {
        discount_rate <- 0.20
      } else if (sales_amount > 500) {
        discount_rate <- 0.10
      } else {
        discount_rate <- 0.05
      }
      
      cumulative_sales <- cumulative_sales + sales_amount
      
      temp <- data.frame(
        salesperson = paste0("SP", sp),
        day = d,
        sales_amount = sales_amount,
        discount_rate = discount_rate,
        cumulative_sales = cumulative_sales
      )
      
      all_data <- rbind(all_data, temp)
    }
  }
  
  return(all_data)
}

# Run the simulation
set.seed(123)
data_sales <- simulate_sales(3, 10)


cat(" Table 1: Sales Data\n")
##  Table 1: Sales Data
kable(data_sales)
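| salesperson | day | sales_amount | discount_rate | cumulative_sales |
|:---|---:|---:|---:|---:|
| SP1 | 1 | 514 | 0.10 | 514 |
| SP1 | 2 | 562 | 0.10 | 1076 |
| SP1 | 3 | 278 | 0.05 | 1354 |
| SP1 | 4 | 625 | 0.10 | 1979 |
| SP1 | 5 | 294 | 0.05 | 2273 |
| SP1 | 6 | 917 | 0.20 | 3190 |
| SP1 | 7 | 217 | 0.05 | 3407 |
| SP1 | 8 | 398 | 0.05 | 3805 |
| SP1 | 9 | 328 | 0.05 | 4133 |
| SP1 | 10 | 343 | 0.05 | 4476 |
| SP2 | 1 | 113 | 0.05 | 113 |
| SP2 | 2 | 473 | 0.05 | 586 |
| SP2 | 3 | 764 | 0.10 | 1350 |
| SP2 | 4 | 701 | 0.10 | 2051 |
| SP2 | 5 | 702 | 0.10 | 2753 |
| SP2 | 6 | 867 | 0.20 | 3620 |
| SP2 | 7 | 808 | 0.20 | 4428 |
| SP2 | 8 | 190 | 0.05 | 4618 |
| SP2 | 9 | 447 | 0.05 | 5065 |
| SP2 | 10 | 748 | 0.10 | 5813 |
| SP3 | 1 | 454 | 0.05 | 454 |
| SP3 | 2 | 939 | 0.20 | 1393 |
| SP3 | 3 | 125 | 0.05 | 1518 |
| SP3 | 4 | 618 | 0.10 | 2136 |
| SP3 | 5 | 525 | 0.10 | 2661 |
| SP3 | 6 | 748 | 0.10 | 3409 |
| SP3 | 7 | 865 | 0.20 | 4274 |
| SP3 | 8 | 310 | 0.05 | 4584 |
| SP3 | 9 | 689 | 0.10 | 5273 |
| SP3 | 10 | 692 | 0.10 | 5965 |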
summary_stats <- data_sales %>%
  group_by(salesperson) %>%
  summarise(
    total_sales = sum(sales_amount),
    mean_sales = mean(sales_amount)
  )

cat(" Table 2: Summary Statistics\n")
##  Table 2: Summary Statistics
kable(summary_stats)
| salesperson | total_sales | mean_sales |
|:---|---:|---:|
| SP1 | 4476 | 447.6 |
| SP2 | 5813 | 581.3 |
| SP3 | 5965 | 596.5 |
# =========================
# 📈 PLOTLY VISUALIZATION
# =========================
fig <- plot_ly(data_sales,
               x = ~day,
               y = ~cumulative_sales,
               color = ~salesperson,
               type = 'scatter',
               mode = 'lines+markers')

fig <- fig %>%
  layout(title = "Cumulative Sales per Salesperson",
         xaxis = list(title = "Day"),
         yaxis = list(title = "Cumulative Sales"))

fig

2.2 Interpretation

This implementation simulates daily sales activity for multiple salespersons over a given period using a structured approach with functions, loops, and conditionals.

The simulate_sales function generates a random sales amount between 100 and 1000 for each salesperson on each day. Conditional logic then assigns the discount rate: 20% for sales above 800, 10% for sales above 500, and 5% otherwise. The nested loops iterate over each salesperson (outer loop) and each day (inner loop).

Cumulative sales are accumulated day by day, which makes it possible to track performance over time. The summary table then reports total and average sales per salesperson: SP3 has the highest total (5965), followed by SP2 (5813) and SP1 (4476).

Finally, the interactive Plotly visualization helps illustrate how cumulative sales grow over time for each salesperson, making it easier to identify trends and differences in sales performance.
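Growing a data frame with rbind() inside a loop copies it on every iteration. As a sketch only, the same table could be built in one pass with vectorised calls. `simulate_sales_vec` is a hypothetical name, and this version is not guaranteed to reproduce the exact random draws of `simulate_sales`.

```r
simulate_sales_vec <- function(n_salesperson, days) {
  n <- n_salesperson * days

  # One draw per salesperson-day, with replacement as in the looped version
  sales_amount <- sample(100:1000, n, replace = TRUE)

  # Same thresholds as the if/else chain: >800 -> 20%, >500 -> 10%, else 5%
  discount_rate <- ifelse(sales_amount > 800, 0.20,
                          ifelse(sales_amount > 500, 0.10, 0.05))

  salesperson <- rep(paste0("SP", seq_len(n_salesperson)), each = days)

  data.frame(
    salesperson = salesperson,
    day = rep(seq_len(days), times = n_salesperson),
    sales_amount = sales_amount,
    discount_rate = discount_rate,
    # ave() applies cumsum() separately within each salesperson
    cumulative_sales = ave(sales_amount, salesperson, FUN = cumsum)
  )
}

set.seed(123)
sales_vec <- simulate_sales_vec(3, 10)
head(sales_vec)
```

For 3 salespersons and 10 days the difference is negligible; the vectorised form mainly pays off when the number of rows grows.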

3 Multi-Level Performance Categorization

3.1 Implementation

library(dplyr)
library(plotly)
library(knitr)

# =========================
# DATA SIMULATION
# =========================
set.seed(123)
sales_amount <- sample(100:1000, 30)

# =========================
# FUNCTION: Categorize Performance
# =========================
categorize_performance <- function(sales_amount) {
  
  categories <- c()
  
  for (s in sales_amount) {
    if (s >= 900) {
      categories <- c(categories, "Excellent")
    } else if (s >= 700) {
      categories <- c(categories, "Very Good")
    } else if (s >= 500) {
      categories <- c(categories, "Good")
    } else if (s >= 300) {
      categories <- c(categories, "Average")
    } else {
      categories <- c(categories, "Poor")
    }
  }
  
  return(categories)
}

# =========================
# APPLY FUNCTION
# =========================
performance <- categorize_performance(sales_amount)

data_perf <- data.frame(
  sales_amount = sales_amount,
  category = performance
)

# Add an ID column for tidiness
data_perf <- data_perf %>%
  mutate(id = row_number()) %>%
  select(id, everything())

# =========================
# 📋 TABLE: IMPLEMENTATION RESULT
# =========================
cat("### Table 1: Sales Performance Categorization\n")
## ### Table 1: Sales Performance Categorization
kable(data_perf)
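| id | sales_amount | category |
|---:|---:|:---|
| 1 | 514 | Good |
| 2 | 562 | Good |
| 3 | 278 | Poor |
| 4 | 625 | Good |
| 5 | 294 | Poor |
| 6 | 917 | Excellent |
| 7 | 217 | Poor |
| 8 | 398 | Average |
| 9 | 328 | Average |
| 10 | 343 | Average |
| 11 | 113 | Poor |
| 12 | 473 | Average |
| 13 | 764 | Very Good |
| 14 | 701 | Very Good |
| 15 | 702 | Very Good |
| 16 | 867 | Very Good |
| 17 | 808 | Very Good |
| 18 | 190 | Poor |
| 19 | 447 | Average |
| 20 | 748 | Very Good |
| 21 | 454 | Average |
| 22 | 939 | Excellent |
| 23 | 125 | Poor |
| 24 | 618 | Good |
| 25 | 525 | Good |
| 26 | 981 | Excellent |
| 27 | 865 | Very Good |
| 28 | 310 | Average |
| 29 | 689 | Good |
| 30 | 692 | Good |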
# =========================
# 📊 SUMMARY TABLE
# =========================
summary_perf <- data_perf %>%
  group_by(category) %>%
  summarise(count = n()) %>%
  mutate(percentage = round((count / sum(count)) * 100, 2))

cat("\n### Table 2: Summary Statistics\n")
## 
## ### Table 2: Summary Statistics
kable(summary_perf)
| category | count | percentage |
|:---|---:|---:|
| Average | 7 | 23.33 |
| Excellent | 3 | 10.00 |
| Good | 7 | 23.33 |
| Poor | 6 | 20.00 |
| Very Good | 7 | 23.33 |
# =========================
# 📈 BAR PLOT (Plotly)
# =========================
bar_plot <- plot_ly(summary_perf,
                   x = ~category,
                   y = ~count,
                   type = "bar")

bar_plot <- bar_plot %>%
  layout(title = "Performance Distribution (Bar Plot)",
         xaxis = list(title = "Category"),
         yaxis = list(title = "Count"))

bar_plot
# =========================
# 🥧 PIE CHART (Plotly)
# =========================
pie_chart <- plot_ly(summary_perf,
                    labels = ~category,
                    values = ~percentage,
                    type = "pie")

pie_chart <- pie_chart %>%
  layout(title = "Performance Distribution (Pie Chart)")

pie_chart

3.2 Interpretation

This implementation categorizes sales performance into five levels: Excellent, Very Good, Good, Average, and Poor based on sales amount thresholds.

A loop assigns each sales value to a category using conditional logic, simulating a real-world evaluation system. The results show the distribution of performance levels, both in counts and percentages.

The bar plot shows the number of occurrences in each category, while the pie chart shows their proportions. In this sample no single level dominates: Average, Good, and Very Good each account for 7 of the 30 values (23.33%), followed by Poor with 6 (20%) and Excellent with 3 (10%).
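The same thresholds can be expressed without a loop using base R's cut(). This is a sketch rather than the report's method; `categorize_performance_vec` and `boundary_values` are hypothetical names, and the breaks are copied from the if/else chain in `categorize_performance`.

```r
categorize_performance_vec <- function(sales_amount) {
  cut(
    sales_amount,
    breaks = c(-Inf, 300, 500, 700, 900, Inf),
    labels = c("Poor", "Average", "Good", "Very Good", "Excellent"),
    right = FALSE  # intervals are [lower, upper), matching the >= thresholds
  )
}

# Check the values on either side of each threshold
boundary_values <- c(299, 300, 499, 500, 699, 700, 899, 900)
data.frame(
  sales_amount = boundary_values,
  category = categorize_performance_vec(boundary_values)
)
```

cut() returns a factor whose levels run from Poor to Excellent, so table() lists the categories in performance order rather than alphabetically.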

4 Multi-Company Dataset Simulation

4.1 Implementation

library(dplyr)
library(plotly)
library(knitr)

# =========================
# FUNCTION: Generate Company Data
# =========================
generate_company_data <- function(n_company, n_employees) {
  
  all_data <- data.frame()
  departments <- c("HR", "Finance", "IT", "Marketing")
  
  for (c in 1:n_company) {
    
    for (e in 1:n_employees) {
      
      salary <- sample(3000:10000, 1)
      performance_score <- sample(60:100, 1)
      KPI_score <- sample(70:100, 1)
      department <- sample(departments, 1)
      
      # Conditional: Top Performer
      if (KPI_score > 90) {
        category <- "Top Performer"
      } else {
        category <- "Regular"
      }
      
      temp <- data.frame(
        company_id = paste0("C", c),
        employee_id = paste0("E", e),
        department = department,
        salary = salary,
        performance_score = performance_score,
        KPI_score = KPI_score,
        category = category
      )
      
      all_data <- rbind(all_data, temp)
    }
  }
  
  return(all_data)
}

# =========================
# GENERATE DATA
# =========================
set.seed(123)
company_data <- generate_company_data(3, 10)

# =========================
# 📋 TABLE 1: FULL DATA
# =========================
cat("### Table 1: Company Employee Data\n")
## ### Table 1: Company Employee Data
kable(company_data)
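| company_id | employee_id | department | salary | performance_score | KPI_score | category |
|:---|:---|:---|---:|---:|---:|:---|
| C1 | E1 | Finance | 5462 | 74 | 88 | Regular |
| C1 | E2 | Finance | 7290 | 96 | 89 | Regular |
| C1 | E3 | IT | 6445 | 84 | 95 | Top Performer |
| C1 | E4 | Marketing | 5756 | 86 | 94 | Top Performer |
| C1 | E5 | IT | 4016 | 68 | 98 | Top Performer |
| C1 | E6 | Finance | 5887 | 85 | 76 | Regular |
| C1 | E7 | Finance | 8768 | 78 | 73 | Regular |
| C1 | E8 | Marketing | 9736 | 98 | 90 | Regular |
| C1 | E9 | HR | 4166 | 91 | 79 | Regular |
| C1 | E10 | Finance | 4798 | 68 | 78 | Regular |
| C2 | E1 | HR | 4046 | 86 | 97 | Top Performer |
| C2 | E2 | HR | 6206 | 86 | 75 | Regular |
| C2 | E3 | Marketing | 4313 | 88 | 74 | Regular |
| C2 | E4 | HR | 3587 | 72 | 87 | Regular |
| C2 | E5 | Finance | 7088 | 86 | 94 | Top Performer |
| C2 | E6 | IT | 3276 | 74 | 78 | Regular |
| C2 | E7 | Marketing | 9233 | 90 | 85 | Regular |
| C2 | E8 | Finance | 5821 | 67 | 91 | Top Performer |
| C2 | E9 | Finance | 4182 | 76 | 91 | Top Performer |
| C2 | E10 | HR | 9128 | 93 | 73 | Regular |
| C3 | E1 | Finance | 5116 | 84 | 89 | Regular |
| C3 | E2 | HR | 8208 | 91 | 83 | Regular |
| C3 | E3 | Finance | 5338 | 99 | 85 | Regular |
| C3 | E4 | Finance | 6979 | 90 | 94 | Top Performer |
| C3 | E5 | HR | 6229 | 94 | 83 | Regular |
| C3 | E6 | IT | 7575 | 66 | 72 | Regular |
| C3 | E7 | HR | 4913 | 74 | 90 | Regular |
| C3 | E8 | Finance | 4074 | 69 | 87 | Regular |
| C3 | E9 | Finance | 5283 | 93 | 79 | Regular |
| C3 | E10 | Finance | 7222 | 71 | 89 | Regular |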
# =========================
# 📊 SUMMARY PER COMPANY
# =========================
summary_company <- company_data %>%
  group_by(company_id) %>%
  summarise(
    avg_salary = mean(salary),
    avg_performance = mean(performance_score),
    max_KPI = max(KPI_score)
  )

cat("\n### Table 2: Company Summary\n")
## 
## ### Table 2: Company Summary
kable(summary_company)
| company_id | avg_salary | avg_performance | max_KPI |
|:---|---:|---:|---:|
| C1 | 6232.4 | 82.8 | 98 |
| C2 | 5688.0 | 81.8 | 97 |
| C3 | 6093.7 | 83.1 | 94 |
# =========================
# 📈 PLOT 1: AVG SALARY
# =========================
plot_salary <- plot_ly(summary_company,
                      x = ~company_id,
                      y = ~avg_salary,
                      type = "bar")

plot_salary <- plot_salary %>%
  layout(title = "Average Salary per Company",
         xaxis = list(title = "Company"),
         yaxis = list(title = "Average Salary"))

plot_salary
# =========================
# 📈 PLOT 2: AVG PERFORMANCE
# =========================
plot_perf <- plot_ly(summary_company,
                    x = ~company_id,
                    y = ~avg_performance,
                    type = "bar")

plot_perf <- plot_perf %>%
  layout(title = "Average Performance per Company",
         xaxis = list(title = "Company"),
         yaxis = list(title = "Performance Score"))

plot_perf
# =========================
# 🥧 PIE CHART: CATEGORY DISTRIBUTION
# =========================
category_dist <- company_data %>%
  group_by(category) %>%
  summarise(count = n()) %>%
  mutate(percentage = round(count/sum(count)*100,2))

pie_chart <- plot_ly(category_dist,
                    labels = ~category,
                    values = ~percentage,
                    type = "pie")

pie_chart <- pie_chart %>%
  layout(title = "Employee Category Distribution")

pie_chart

4.2 Interpretation

This implementation simulates employee data across multiple companies using nested loops to represent companies and their employees. Each employee is assigned attributes such as salary, department, performance score, and KPI score.

A conditional rule is applied to classify employees as “Top Performer” when their KPI score exceeds 90, reflecting performance evaluation in real-world organizations.

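The summary table reports average salary, average performance score, and maximum KPI score per company: C1 has the highest average salary (6232.4) and the highest maximum KPI (98), while C3 has the highest average performance score (83.1). The bar charts compare these averages across companies, and the pie chart shows that 8 of the 30 employees (26.67%) are Top Performers.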
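The pie chart pools all companies together. To see the share of Top Performers within each company, a row-wise proportion table is one option. The sketch below uses a small hypothetical data frame (`toy`), not the simulated data above, and applies the same KPI > 90 rule.

```r
# Toy data with made-up KPI scores, for illustration only
toy <- data.frame(
  company_id = c("C1", "C1", "C1", "C2", "C2", "C2"),
  KPI_score  = c(95, 90, 80, 91, 97, 70)
)

# Same rule as in generate_company_data: strictly greater than 90
toy$category <- ifelse(toy$KPI_score > 90, "Top Performer", "Regular")

# Counts per company and category, then proportions within each row (company)
counts <- table(toy$company_id, toy$category)
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

Note that a KPI score of exactly 90 stays in the Regular group because the comparison is strict.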

5 Monte Carlo Simulation (Pi & Probability)

5.1 Implementation

library(plotly)
library(dplyr)
library(knitr)

# =========================
# FUNCTION: Monte Carlo Pi
# =========================
monte_carlo_pi <- function(n_points) {
  
  x_vals <- c()
  y_vals <- c()
  inside <- c()
  
  count_inside <- 0
  count_square <- 0
  
  for (i in 1:n_points) {
    
    # Generate random point
    x <- runif(1, -1, 1)
    y <- runif(1, -1, 1)
    
    x_vals <- c(x_vals, x)
    y_vals <- c(y_vals, y)
    
    # Check inside circle
    if (x^2 + y^2 <= 1) {
      inside <- c(inside, "Inside Circle")
      count_inside <- count_inside + 1
    } else {
      inside <- c(inside, "Outside Circle")
    }
    
    # Probability: sub-square (-0.5 to 0.5)
    if (x >= -0.5 && x <= 0.5 && y >= -0.5 && y <= 0.5) {
      count_square <- count_square + 1
    }
  }
  
  # Estimate Pi
  pi_estimate <- 4 * (count_inside / n_points)
  
  # Probability result
  prob_square <- count_square / n_points
  
  # Data frame
  data <- data.frame(
    x = x_vals,
    y = y_vals,
    status = inside
  )
  
  return(list(
    data = data,
    pi_estimate = pi_estimate,
    prob_square = prob_square
  ))
}


set.seed(123)
result <- monte_carlo_pi(1000)

data_mc <- result$data

# =========================
# 📋 TABLE RESULT
# =========================
cat(" Table: Monte Carlo Sample Points\n")
##  Table: Monte Carlo Sample Points
kable(head(data_mc, 20)) 
| x | y | status |
|---:|---:|:---|
| -0.4248450 | 0.5766103 | Inside Circle |
| -0.1820462 | 0.7660348 | Inside Circle |
| 0.8809346 | -0.9088870 | Outside Circle |
| 0.0562110 | 0.7848381 | Inside Circle |
| 0.1028700 | -0.0867705 | Inside Circle |
| 0.9136667 | -0.0933317 | Inside Circle |
| 0.3551413 | 0.1452668 | Inside Circle |
| -0.7941506 | 0.7996499 | Outside Circle |
| -0.5078245 | -0.9158809 | Outside Circle |
| -0.3441586 | 0.9090073 | Inside Circle |
| 0.7790786 | 0.3856068 | Inside Circle |
| 0.2810136 | 0.9885396 | Outside Circle |
| 0.3114116 | 0.4170609 | Inside Circle |
| 0.0881320 | 0.1882840 | Inside Circle |
| -0.4216805 | -0.7057727 | Inside Circle |
| 0.9260485 | 0.8045981 | Outside Circle |
| 0.3814106 | 0.5909348 | Inside Circle |
| -0.9507726 | -0.0444081 | Inside Circle |
| 0.5169191 | -0.5671841 | Inside Circle |
| -0.3636380 | -0.5367484 | Inside Circle |
cat("Estimated Pi:", result$pi_estimate, "\n")
## Estimated Pi: 3.16
cat("Probability (point in sub-square):", result$prob_square, "\n")
## Probability (point in sub-square): 0.252
summary_mc <- data_mc %>%
  group_by(status) %>%
  summarise(count = n()) %>%
  mutate(percentage = round(count/sum(count)*100,2))


kable(summary_mc)
| status | count | percentage |
|:---|---:|---:|
| Inside Circle | 790 | 79 |
| Outside Circle | 210 | 21 |
plot_mc <- plot_ly(data_mc,
                   x = ~x,
                   y = ~y,
                   color = ~status,
                   type = "scatter",
                   mode = "markers")

plot_mc <- plot_mc %>%
  layout(title = "Monte Carlo Simulation for Pi",
         xaxis = list(title = "X"),
         yaxis = list(title = "Y"))

plot_mc

5.2 Interpretation

This simulation uses the Monte Carlo method to estimate π by generating uniformly random points in the square [-1, 1] × [-1, 1] and counting how many fall inside the unit circle. Because the circle covers π/4 of the square's area, four times the fraction of points inside the circle approximates π; here 790 of 1000 points fall inside, giving an estimate of 3.16.

Additionally, the simulation computes the probability of points falling within a smaller sub-square, demonstrating probability estimation through random sampling.
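Both estimates rest on area ratios, since the points are uniform on a square of area 4. Written out, with the counts taken from the output above (n = 1000):

```latex
\[
P\left(x^2 + y^2 \le 1\right) = \frac{\pi \cdot 1^2}{4} = \frac{\pi}{4}
\quad\Longrightarrow\quad
\hat{\pi} = 4 \cdot \frac{\text{count\_inside}}{n} = 4 \cdot \frac{790}{1000} = 3.16
\]
\[
P\left(|x| \le 0.5,\ |y| \le 0.5\right) = \frac{1 \cdot 1}{4} = 0.25
\qquad \text{(observed: } 252/1000 = 0.252\text{)}
\]
```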

The scatter plot visualizes the distribution of points, clearly distinguishing those inside and outside the circle. As the number of points increases, the estimate of π tends to become more accurate, reflecting the law of large numbers; the typical error shrinks roughly in proportion to 1/√n.
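The effect of sample size can be checked by repeating the estimate for several values of n. The following is a minimal vectorised sketch; `estimate_pi` and `sizes` are hypothetical names, and the sample sizes are arbitrary choices. The error does not have to fall at every step in a single run, only on average.

```r
estimate_pi <- function(n_points) {
  # Draw all points at once instead of one per loop iteration
  x <- runif(n_points, -1, 1)
  y <- runif(n_points, -1, 1)
  # mean() of a logical vector is the fraction of points inside the circle
  4 * mean(x^2 + y^2 <= 1)
}

set.seed(123)
sizes <- c(100, 1000, 10000, 100000)
estimates <- sapply(sizes, estimate_pi)

data.frame(
  n_points = sizes,
  pi_estimate = estimates,
  abs_error = abs(estimates - pi)
)
```

Drawing the points in one call also avoids growing `x_vals`, `y_vals`, and `inside` element by element, which becomes slow for large n.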

6 Advanced Data Transformation & Feature Engineering

6.1 Implementation

library(dplyr)
library(plotly)
library(knitr)

# =========================
# SAMPLE DATA
# =========================
set.seed(123)
df <- data.frame(
  salary = sample(3000:10000, 30),
  performance_score = sample(60:100, 30)
)

# =========================
# FUNCTION: NORMALIZATION (Min-Max)
# =========================
normalize_columns <- function(df) {
  
  df_norm <- df
  
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      min_val <- min(df[[col]])
      max_val <- max(df[[col]])
      
      df_norm[[col]] <- (df[[col]] - min_val) / (max_val - min_val)
    }
  }
  
  return(df_norm)
}

# =========================
# FUNCTION: Z-SCORE
# =========================
z_score <- function(df) {
  
  df_z <- df
  
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      mean_val <- mean(df[[col]])
      sd_val <- sd(df[[col]])
      
      df_z[[col]] <- (df[[col]] - mean_val) / sd_val
    }
  }
  
  return(df_z)
}

# =========================
# APPLY TRANSFORMATION
# =========================
df_norm <- normalize_columns(df)
df_z <- z_score(df)

# =========================
# FEATURE ENGINEERING
# =========================
df_feat <- df %>%
  mutate(
    performance_category = case_when(
      performance_score >= 90 ~ "Excellent",
      performance_score >= 80 ~ "Very Good",
      performance_score >= 70 ~ "Good",
      performance_score >= 65 ~ "Average",
      TRUE ~ "Poor"
    ),
    
    salary_bracket = case_when(
      salary >= 8000 ~ "High",
      salary >= 5000 ~ "Medium",
      TRUE ~ "Low"
    )
  )

# =========================
# 📋 TABLE
# =========================
cat("### Table: Original Data with New Features\n")
## ### Table: Original Data with New Features
kable(df_feat)
| salary | performance_score | performance_category | salary_bracket |
|---:|---:|:---|:---|
| 5462 | 78 | Good | Medium |
| 5510 | 95 | Excellent | Medium |
| 5226 | 73 | Good | Medium |
| 3525 | 76 | Good | Low |
| 7290 | 71 | Good | Medium |
| 5985 | 74 | Good | Medium |
| 4841 | 91 | Excellent | Low |
| 4141 | 66 | Average | Low |
| 6370 | 68 | Average | Medium |
| 8348 | 92 | Excellent | High |
| 8363 | 69 | Average | High |
| 8133 | 82 | Very Good | High |
| 6445 | 86 | Very Good | Medium |
| 7760 | 87 | Very Good | Medium |
| 9745 | 80 | Very Good | High |
| 4626 | 93 | Excellent | Low |
| 5756 | 88 | Very Good | Medium |
| 8106 | 65 | Average | High |
| 8210 | 61 | Poor | High |
| 3952 | 64 | Poor | Low |
| 7443 | 67 | Average | Medium |
| 4016 | 96 | Excellent | Low |
| 5012 | 72 | Good | Medium |
| 8474 | 77 | Good | High |
| 5887 | 60 | Poor | Medium |
| 9169 | 94 | Excellent | High |
| 5566 | 70 | Good | Medium |
| 4449 | 75 | Good | Low |
| 8768 | 83 | Very Good | High |
| 4789 | 81 | Very Good | Low |
# =========================
# 📊 COMPARISON DATA
# =========================
compare_df <- data.frame(
  original_salary = df$salary,
  normalized_salary = df_norm$salary,
  zscore_salary = df_z$salary
)

# =========================
# 📈 HISTOGRAM (Plotly)
# =========================
hist_plot <- plot_ly(compare_df, x = ~original_salary, type = "histogram", name = "Original") %>%
  add_trace(x = ~normalized_salary, name = "Normalized") %>%
  add_trace(x = ~zscore_salary, name = "Z-Score") %>%
  layout(title = "Salary Distribution Comparison")

hist_plot
# =========================
#  BOXPLOT
# =========================
library(tidyr)

compare_long <- compare_df %>%
  pivot_longer(cols = everything(),
               names_to = "type",
               values_to = "value")

box_plot <- plot_ly(compare_long,
                    x = ~type,
                    y = ~value,
                    type = "box")

box_plot <- box_plot %>%
  layout(title = "Boxplot Comparison (Original vs Normalized vs Z-Score)",
         xaxis = list(title = "Data Type"),
         yaxis = list(title = "Value"))

box_plot

6.2 Interpretation

This implementation applies advanced data transformation techniques, including normalization and z-score standardization, using loop-based functions. These methods rescale the data to make features comparable and suitable for analysis.

Additionally, new features are created to categorize performance and salary levels, enhancing the dataset with meaningful groupings. This reflects real-world feature engineering practices used in data science.

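The visualizations compare the original and transformed salary values. Because min-max normalization and z-score standardization are both linear rescalings, they change the location and scale of the data but not the shape of its distribution: the histograms and boxplots differ mainly in their axis ranges (0 to 1 for the normalized values, centered on 0 for the z-scores). Plotting all three on one shared axis therefore compresses the transformed series next to the original salaries. Overall, the transformations put features on comparable scales and prepare the data for further analysis or modeling.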
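Both transformations can be written as short vectorised helpers, which also makes it easy to handle a constant column, where max equals min and the min-max formula would divide by zero. This is a sketch under assumptions: `min_max` and `z_std` are hypothetical names, the salary values are made up, and mapping a constant column to 0 is one possible convention, not the report's.

```r
min_max <- function(v) {
  rng <- range(v, na.rm = TRUE)
  # Assumption: a constant column is mapped to 0 instead of producing NaN
  if (rng[1] == rng[2]) return(rep(0, length(v)))
  (v - rng[1]) / (rng[2] - rng[1])
}

z_std <- function(v) {
  (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
}

salary <- c(3000, 4500, 6000, 10000)  # hypothetical values
scaled <- data.frame(
  original = salary,
  min_max = min_max(salary),
  z = z_std(salary)
)
scaled

# A correlation of 1 confirms the z-score is a linear rescaling of the original
cor(scaled$original, scaled$z)
```

Since the ordering and relative spacing of the values are preserved, any difference seen between the three boxplots comes from the axis scale, not from a change in the data's shape.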

7 Mini Project – Company KPI Dashboard & Simulation

7.1 Implementation

library(dplyr)
library(plotly)
library(knitr)

# =========================
# FUNCTION: GENERATE DATA
# =========================
generate_kpi_data <- function(n_company = 5, min_emp = 50, max_emp = 100) {
  
  all_data <- data.frame()
  departments <- c("HR", "Finance", "IT", "Marketing", "Operations")
  
  for (c in 1:n_company) {
    
    n_employees <- sample(min_emp:max_emp, 1)
    
    for (e in 1:n_employees) {
      
      salary <- sample(3000:12000, 1)
      performance_score <- sample(60:100, 1)
      KPI_score <- sample(70:100, 1)
      department <- sample(departments, 1)
      
      temp <- data.frame(
        employee_id = paste0("E", c, "_", e),
        company_id = paste0("C", c),
        salary = salary,
        performance_score = performance_score,
        KPI_score = KPI_score,
        department = department
      )
      
      all_data <- rbind(all_data, temp)
    }
  }
  
  return(all_data)
}

# =========================
# GENERATE DATA
# =========================
set.seed(123)
df <- generate_kpi_data(5, 50, 100)

# =========================
# KPI TIER (LOOP)
# =========================
kpi_tier <- c()

for (k in df$KPI_score) {
  if (k >= 90) {
    kpi_tier <- c(kpi_tier, "Top Performer")
  } else if (k >= 80) {
    kpi_tier <- c(kpi_tier, "High")
  } else if (k >= 70) {
    kpi_tier <- c(kpi_tier, "Medium")
  } else {
    kpi_tier <- c(kpi_tier, "Low")
  }
}

df$kpi_tier <- kpi_tier

# =========================
# 📋 TABLE: SAMPLE DATA
# =========================
cat(" Table 1: Sample Employee Data\n")
##  Table 1: Sample Employee Data
kable(head(df, 20))
| employee_id | company_id | salary | performance_score | KPI_score | department | kpi_tier |
|:---|:---|---:|---:|---:|:---|:---|
| E1_1 | C1 | 5510 | 73 | 72 | Finance | Medium |
| E1_2 | C1 | 4841 | 96 | 89 | HR | High |
| E1_3 | C1 | 9745 | 86 | 74 | IT | Medium |
| E1_4 | C1 | 5887 | 85 | 76 | Finance | Medium |
| E1_5 | C1 | 5979 | 73 | 86 | IT | High |
| E1_6 | C1 | 7468 | 71 | 84 | Finance | High |
| E1_7 | C1 | 10788 | 66 | 78 | HR | Medium |
| E1_8 | C1 | 4046 | 86 | 97 | Operations | Top Performer |
| E1_9 | C1 | 6206 | 86 | 75 | HR | Medium |
| E1_10 | C1 | 11156 | 64 | 77 | Marketing | Medium |
| E1_11 | C1 | 4598 | 72 | 87 | HR | High |
| E1_12 | C1 | 7088 | 86 | 94 | Operations | Top Performer |
| E1_13 | C1 | 3040 | 85 | 97 | Marketing | Top Performer |
| E1_14 | C1 | 5503 | 81 | 91 | HR | Top Performer |
| E1_15 | C1 | 11565 | 93 | 73 | Operations | Medium |
| E1_16 | C1 | 5116 | 84 | 89 | HR | High |
| E1_17 | C1 | 10126 | 94 | 77 | Marketing | Medium |
| E1_18 | C1 | 6229 | 94 | 83 | Operations | High |
| E1_19 | C1 | 7575 | 66 | 72 | Finance | Medium |
| E1_20 | C1 | 8966 | 80 | 74 | IT | Medium |
# =========================
# 📊 SUMMARY PER COMPANY
# =========================
summary_company <- df %>%
  group_by(company_id) %>%
  summarise(
    avg_salary = mean(salary),
    avg_KPI = mean(KPI_score),
    top_performers = sum(kpi_tier == "Top Performer")
  )

cat(" Table 2: Company Summary\n")
##  Table 2: Company Summary
kable(summary_company)
| company_id | avg_salary | avg_KPI | top_performers |
|:---|---:|---:|---:|
| C1 | 7480.175 | 83.65000 | 19 |
| C2 | 7101.016 | 84.63934 | 22 |
| C3 | 7890.163 | 84.62245 | 34 |
| C4 | 7193.750 | 86.32292 | 38 |
| C5 | 7903.556 | 86.95556 | 40 |
# =========================
# 📊 DEPARTMENT ANALYSIS
# =========================
dept_analysis <- df %>%
  group_by(company_id, department) %>%
  summarise(count = n(), .groups = "drop")

cat("Table 3: Department Distribution\n")
## Table 3: Department Distribution
kable(dept_analysis)
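| company_id | department | count |
|:---|:---|---:|
| C1 | Finance | 18 |
| C1 | HR | 14 |
| C1 | IT | 11 |
| C1 | Marketing | 16 |
| C1 | Operations | 21 |
| C2 | Finance | 11 |
| C2 | HR | 9 |
| C2 | IT | 14 |
| C2 | Marketing | 19 |
| C2 | Operations | 8 |
| C3 | Finance | 19 |
| C3 | HR | 19 |
| C3 | IT | 17 |
| C3 | Marketing | 24 |
| C3 | Operations | 19 |
| C4 | Finance | 13 |
| C4 | HR | 26 |
| C4 | IT | 21 |
| C4 | Marketing | 19 |
| C4 | Operations | 17 |
| C5 | Finance | 20 |
| C5 | HR | 19 |
| C5 | IT | 21 |
| C5 | Marketing | 14 |
| C5 | Operations | 16 |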
# =========================
# 📈 GROUPED BAR (DEPARTMENT)
# =========================
bar_dept <- plot_ly(dept_analysis,
                    x = ~department,
                    y = ~count,
                    color = ~company_id,
                    type = "bar")

bar_dept <- bar_dept %>%
  layout(title = "Department Distribution per Company",
         barmode = "group")

bar_dept
# =========================
# 📈 SCATTER + REGRESSION
# =========================
scatter <- plot_ly(df,
                   x = ~salary,
                   y = ~KPI_score,
                   color = ~company_id,
                   type = "scatter",
                   mode = "markers")

scatter <- scatter %>%
  layout(title = "Salary vs KPI Score")

scatter
# Add a simple regression line
model <- lm(KPI_score ~ salary, data = df)

df$pred <- predict(model)

scatter_reg <- plot_ly(df,
                       x = ~salary,
                       y = ~KPI_score,
                       color = ~company_id,
                       type = "scatter",
                       mode = "markers") %>%
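  # inherit = FALSE keeps the per-company color mapping off the line, so one fitted line is drawn
  add_lines(x = ~salary, y = ~pred, name = "Regression Line", inherit = FALSE)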

scatter_reg
# =========================
# 📈 SALARY DISTRIBUTION
# =========================
hist_salary <- plot_ly(df,
                       x = ~salary,
                       color = ~company_id,
                       type = "histogram")

hist_salary <- hist_salary %>%
  layout(title = "Salary Distribution")

hist_salary

7.2 Interpretation

This mini project simulates a company KPI dashboard by generating employee data across multiple companies. Each employee is assigned attributes such as salary, performance score, KPI score, and department.

A loop-based categorization classifies employees into KPI tiers. Because KPI scores are sampled from 70 to 100, the Low tier (below 70) never occurs in this dataset. The summary table provides key metrics per company, including average salary, average KPI, and the number of top performers.

The visualizations offer deeper insights:

  • Grouped bar charts show department distribution across companies.
  • Scatter plots with a regression line show the relationship between salary and KPI score; since the two are sampled independently here, the fitted line is expected to be nearly flat.
  • Histograms illustrate salary distribution patterns.

Overall, this simulation reflects real-world data analysis workflows, combining data generation, transformation, and visualization into a comprehensive KPI dashboard.
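Whether the regression line carries any signal can be checked numerically. The sketch below re-creates only the two relevant columns with the same sampling ranges as `generate_kpi_data`; the sample size of 500 and the variable names are assumptions for illustration, not values from the dashboard.

```r
set.seed(123)
n <- 500  # arbitrary sample size for this check

# Salary and KPI are drawn independently, as in generate_kpi_data
salary <- sample(3000:12000, n, replace = TRUE)
kpi <- sample(70:100, n, replace = TRUE)

fit <- lm(kpi ~ salary)

# Correlation and slope should both be close to zero for independent draws
c(correlation = cor(salary, kpi), slope = unname(coef(fit)["salary"]))

# Share of KPI variance explained by salary
summary(fit)$r.squared
```

In a simple linear regression, R-squared equals the squared correlation, so a value near zero indicates that salary explains almost none of the variation in KPI score in simulated data of this kind.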

8 Automated Report Generation (Bonus)

library(ggplot2)
library(dplyr)
library(knitr)

# =========================
# SAMPLE DATA (if not already available)
# =========================
set.seed(123)
df_company <- data.frame(
  company_id = sample(1:3, 150, replace = TRUE),
  salary = runif(150, 3000, 10000),
  KPI_score = runif(150, 50, 100),
  performance_score = runif(150, 50, 100),
  department = sample(c("IT","HR","Finance","Marketing"), 150, replace = TRUE)
)

# =========================
# KPI TIER
# =========================
df_company$kpi_tier <- ifelse(df_company$KPI_score >= 90, "Top Performer",
                              ifelse(df_company$KPI_score >= 80, "High",
                                     ifelse(df_company$KPI_score >= 70, "Medium", "Low")))

# =========================
# FUNCTION: AUTOMATED REPORT
# =========================
generate_report <- function(data) {
  
  for(c in unique(data$company_id)){
    
    cat("\n====================================\n")
    cat("Company ID:", c, "\n")
    cat("====================================\n")
    
    data_subset <- data %>% filter(company_id == c)
    
    # =========================
    # TABLE 1: SUMMARY
    # =========================
    summary_table <- data_subset %>%
      summarise(
        avg_salary = round(mean(salary),2),
        avg_KPI = round(mean(KPI_score),2),
        total_employee = n(),
        top_performer = sum(kpi_tier == "Top Performer")
      )
    
    cat("\nTable 1: Summary\n")
    print(kable(summary_table))
    
    # =========================
    # TABLE 2: TOP PERFORMERS
    # =========================
    top_data <- data_subset %>%
      filter(kpi_tier == "Top Performer") %>%
      arrange(desc(KPI_score)) %>%
      head(5)
    
    cat("\nTable 2: Top Performers\n")
    print(kable(top_data))
    
    # =========================
    # PLOT 1: DEPARTMENT DISTRIBUTION
    # =========================
    p1 <- ggplot(data_subset, aes(x = department, fill = department)) +
      geom_bar() +
      labs(title = paste("Department Distribution - Company", c),
           x = "Department", y = "Number of Employees") +
      theme_minimal() +
      theme(legend.position = "none")
    
    print(p1)
    
    # =========================
    # PLOT 2: SALARY vs KPI (IMPROVED)
    # =========================
    p2 <- ggplot(data_subset, aes(x = salary, y = KPI_score, color = department)) +
      geom_point(alpha = 0.7) +
      geom_smooth(method = "lm", se = FALSE, color = "black") +
      labs(title = paste("Salary vs KPI - Company", c),
           x = "Salary", y = "KPI Score") +
      theme_minimal()
    
    print(p2)
    
    # =========================
    # PLOT 3: SALARY DISTRIBUTION (IMPROVED)
    # =========================
    p3 <- ggplot(data_subset, aes(x = salary, fill = department)) +
      geom_histogram(bins = 15, alpha = 0.6, position = "identity") +
      labs(title = paste("Salary Distribution - Company", c),
           x = "Salary", y = "Frequency") +
      theme_minimal()
    
    print(p3)
    
    # =========================
    # EXPORT CSV
    # =========================
    write.csv(data_subset,
              paste0("company_", c, ".csv"),
              row.names = FALSE)
    
    cat("\n\n")
  }
}

# =========================
# RUN
# =========================
generate_report(df_company)
## 
## ====================================
## Company ID: 3 
## ====================================
## 
## Table 1: Summary
## 
## 
## | avg_salary| avg_KPI| total_employee| top_performer|
## |----------:|-------:|--------------:|-------------:|
## |    6640.58|   75.04|             54|            10|
## 
## Table 2: Top Performers
## 
## 
## | company_id|   salary| KPI_score| performance_score|department |kpi_tier      |
## |----------:|--------:|---------:|-----------------:|:----------|:-------------|
## |          3| 4800.517|  99.30271|          66.37987|HR         |Top Performer |
## |          3| 9313.121|  98.89267|          91.72005|Marketing  |Top Performer |
## |          3| 5727.110|  98.73629|          81.48727|Marketing  |Top Performer |
## |          3| 9409.785|  98.56712|          65.14438|HR         |Top Performer |
## |          3| 9736.513|  98.37347|          68.32207|IT         |Top Performer |

## 
## 
## 
## ====================================
## Company ID: 2 
## ====================================
## 
## Table 1: Summary
## 
## 
## | avg_salary| avg_KPI| total_employee| top_performer|
## |----------:|-------:|--------------:|-------------:|
## |    6260.05|    71.2|             54|             8|
## 
## Table 2: Top Performers
## 
## 
## | company_id|   salary| KPI_score| performance_score|department |kpi_tier      |
## |----------:|--------:|---------:|-----------------:|:----------|:-------------|
## |          2| 4574.897|  99.83086|          68.39480|Finance    |Top Performer |
## |          2| 8310.152|  97.65506|          93.93370|Finance    |Top Performer |
## |          2| 4513.784|  94.83693|          93.00534|HR         |Top Performer |
## |          2| 6366.376|  94.50390|          90.07148|HR         |Top Performer |
## |          2| 8098.761|  93.32417|          65.60564|Finance    |Top Performer |

## 
## 
## 
## ====================================
## Company ID: 1 
## ====================================
## 
## Table 1: Summary
## 
## 
## | avg_salary| avg_KPI| total_employee| top_performer|
## |----------:|-------:|--------------:|-------------:|
## |    6523.54|   77.18|             42|            10|
## 
## Table 2: Top Performers
## 
## 
## | company_id|   salary| KPI_score| performance_score|department |kpi_tier      |
## |----------:|--------:|---------:|-----------------:|:----------|:-------------|
## |          1| 8086.918|  99.56183|          69.25868|Marketing  |Top Performer |
## |          1| 9161.726|  99.29771|          58.90069|HR         |Top Performer |
## |          1| 5847.828|  99.16751|          62.65495|Finance    |Top Performer |
## |          1| 6827.783|  98.85495|          90.04741|Marketing  |Top Performer |
## |          1| 5766.541|  98.19217|          83.95067|Marketing  |Top Performer |

9 Conclusion & References

This practicum demonstrates the application of advanced programming concepts in data science using R, particularly through the integration of functions, loops, and conditional logic. Each task simulates real-world analytical scenarios, enabling a deeper understanding of how structured programming supports data-driven decision-making.

The Dynamic Multi-Formula function shows how a single function can evaluate several formulas (linear, quadratic, cubic, and exponential) over the same input and return the results side by side for comparison. The Nested Simulation and Performance Categorization tasks illustrate how nested loops and conditional logic can be used to simulate business operations and assign records to performance categories.
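The KPI tiering used in the report (Top Performer at 90 and above, High from 80, Medium from 70, otherwise Low) can also be expressed without nested `ifelse()` calls. The following is a minimal sketch using base R's `cut()` with left-closed intervals; the helper name `assign_kpi_tier` and the sample scores are hypothetical and serve only as an illustration.

```r
# Hypothetical helper: maps KPI scores to the same four tiers as the
# nested ifelse() in the report, using left-closed intervals [a, b).
assign_kpi_tier <- function(score) {
  tiers <- cut(
    score,
    breaks = c(-Inf, 70, 80, 90, Inf),
    labels = c("Low", "Medium", "High", "Top Performer"),
    right = FALSE  # 90 falls in [90, Inf), matching `>= 90`
  )
  as.character(tiers)
}

# Illustrative scores (not taken from the practicum data)
sample_scores <- c(65, 70, 79.9, 80, 90, 99.3)
tier_result <- assign_kpi_tier(sample_scores)
print(tier_result)
```

Setting `right = FALSE` is what keeps the boundary values (70, 80, 90) in the higher tier, consistent with the `>=` comparisons in the original code.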

Furthermore, the Monte Carlo Simulation showcases the power of probabilistic methods in estimating mathematical constants and analyzing uncertainty through random sampling. The Advanced Data Transformation and Feature Engineering task emphasizes the importance of preparing and transforming data to improve interpretability and analytical quality.
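As a reminder of how random sampling can estimate a constant, the sketch below approximates pi by drawing points uniformly in the unit square and measuring the share that falls inside the quarter circle. It is not necessarily the exact procedure used earlier in this report; the function name `estimate_pi`, the sample size, and the seed are assumptions made for illustration.

```r
# Hypothetical helper: estimates pi by sampling points uniformly in the
# unit square and counting how many land inside the quarter circle.
estimate_pi <- function(n_points, seed = 123) {
  set.seed(seed)                  # assumed seed, only for reproducibility
  x <- runif(n_points)
  y <- runif(n_points)
  inside <- (x^2 + y^2) <= 1      # TRUE when the point is inside the circle
  4 * mean(inside)                # area ratio (pi / 4) scaled back to pi
}

pi_hat <- estimate_pi(100000)
cat("Estimated pi:", pi_hat, "\n")
cat("Absolute error:", abs(pi_hat - pi), "\n")
```

The standard error of this estimator shrinks in proportion to one over the square root of `n_points`, so each additional digit of accuracy needs roughly a hundred times more samples.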

The Mini Project and Automated Report Generation tasks represent a comprehensive data science workflow, combining data generation, transformation, visualization, and reporting. These tasks demonstrate how automated systems can generate insights efficiently across multiple entities, such as companies or departments.
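The per-company summary step of such a workflow can also be sketched in base R with `split()` and `lapply()`, as a loop-free alternative to the `for` loop in `generate_report()`. The data frame `demo` below is synthetic and the helper `summarise_company` is hypothetical; only the column names follow the report.

```r
# Synthetic illustrative data; values are made up, not practicum results.
demo <- data.frame(
  company_id = c(1, 1, 2, 2, 2),
  salary     = c(5000, 7000, 4000, 6000, 8000),
  KPI_score  = c(92, 75, 88, 95, 60)
)

# Hypothetical helper: one summary row per company, mirroring the
# fields of "Table 1: Summary" in the report.
summarise_company <- function(d) {
  data.frame(
    company_id     = d$company_id[1],
    avg_salary     = round(mean(d$salary), 2),
    avg_KPI        = round(mean(d$KPI_score), 2),
    total_employee = nrow(d),
    top_performer  = sum(d$KPI_score >= 90)
  )
}

# Split by company, summarise each piece, and stack the rows.
company_summary <- do.call(
  rbind,
  lapply(split(demo, demo$company_id), summarise_company)
)
rownames(company_summary) <- NULL
print(company_summary)
```

Collecting the summaries into one data frame makes it easy to compare companies in a single table, whereas the loop in the report prints one table per company.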

Overall, this practicum reinforces the importance of combining programming logic with analytical thinking. It shows that well-structured code can be used not only to process data but also to generate meaningful insights, build interactive visualizations, and automate reporting processes in real-world data science applications.

| No | Author             | Year | Title                                                              | Publisher               |
|---:|:-------------------|-----:|:-------------------------------------------------------------------|:------------------------|
|  1 | Wickham, H.        | 2016 | ggplot2: Elegant Graphics for Data Analysis                        | Springer                |
|  2 | Wickham, H. et al. | 2023 | dplyr: A Grammar of Data Manipulation                              | R Package Documentation |
|  3 | Sievert, C.        | 2020 | Interactive Web-Based Data Visualization with R, plotly, and shiny | CRC Press               |
|  4 | R Core Team        | 2023 | R: A Language and Environment for Statistical Computing            | R Foundation            |
|  5 | James, G. et al.   | 2021 | An Introduction to Statistical Learning                            | Springer                |
|  6 | Ross, S.           | 2014 | Introduction to Probability Models                                 | Academic Press          |