Syntax and Control Flow

Practicum ~ Week 4

Data Science | ITSB

Naifah Edria Arta

Digging into data, uncovering stories, and shaping the future one insight at a time.

Skill Focus

R Program, Data Visualization, Data Analysis, Statistics

Course: Data Science Programming

Academic Advisor: Bakti Siregar, M.Sc., CDS

Introduction

This report is prepared to fulfill the Advanced Practicum requirements for the Data Science Programming course under the guidance of Bakti Siregar, M.Sc. The primary objective of this practicum is to develop an automated data science workflow by integrating multi-layer functions, nested loops, and complex conditional logic.

The tasks within this practicum simulate real-world data science challenges, ranging from dynamic formula computations and Monte Carlo simulations to multi-company KPI analysis. By focusing on advanced statistics, data transformation, and visualization, this report demonstrates the practical application of R and Python in solving sophisticated analytical problems.

1 Dynamic Multi-Formula Function

1.1 Implementation

# =========================
# LIBRARY
# =========================
library(ggplot2)
library(tidyr)
library(plotly)

# =========================
# FUNCTION
# =========================
compute_formula <- function(x, formulas) {
  results <- list()
  
  for (f in formulas) {
    y <- numeric(length(x))
    
    for (i in seq_along(x)) {
      if (f == "linear") {
        y[i] <- x[i]
      } else if (f == "quadratic") {
        y[i] <- x[i]^2
      } else if (f == "cubic") {
        y[i] <- x[i]^3
      } else if (f == "exponential") {
        y[i] <- exp(x[i] / 5)
      } else {
        stop(paste("Invalid formula:", f))
      }
    }
    
    results[[f]] <- y
  }
  
  return(as.data.frame(results))
}

# =========================
# INPUT
# =========================
x_values <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")

# =========================
# RUN FUNCTION
# =========================
df <- compute_formula(x_values, formulas)
df$x <- x_values

# =========================
# TRANSFORM
# =========================
df_long <- pivot_longer(df,
                        cols = -x,
                        names_to = "formula",
                        values_to = "y")

# =========================
# PLOT
# =========================
p <- ggplot(
  df_long,
  aes(
    x = x,
    y = y,
    color = formula,
    text = paste0(
      "x: ", x,
      "<br>y: ", round(y,2),
      "<br>Formula: ", formula
    )
  )
) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  labs(
    title = "Dynamic Multi-Formula Plot",
    subtitle = "Linear, Quadratic, Cubic, Exponential",
    x = "X Value",
    y = "Y Value"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "top"
  )

ggplotly(p, tooltip = "text") %>%
  layout(hovermode = "x unified")

1.2 Interpretation

This implementation demonstrates how different mathematical models behave across the same range of input values using nested loops and conditional logic.

The visualization shows that each formula has a distinct growth pattern. The linear function increases at a constant rate, while the quadratic and cubic functions grow progressively faster as x increases. Over the plotted range (x = 1 to 20), the cubic function dominates, reaching 8,000 at x = 20 against 400 for the quadratic. The exponential term is scaled as exp(x / 5), so it reaches only about 54.6 at x = 20; it would overtake the polynomials only at much larger values of x.

Overall, the comparison illustrates how the choice of functional form, together with the range over which it is evaluated, determines the scale of the outputs. This matters for model selection and for reading plots in which one series dwarfs the others.
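
The loop-and-conditional structure used in the implementation can also be expressed with a lookup table of functions. The following is a minimal sketch rather than the report's implementation: `formula_table` and `compute_formula_vec` are hypothetical names introduced here, and the formulas simply mirror those in `compute_formula`. It also prints the row for x = 20, which makes the relative scale of the four curves explicit.

```r
# Hypothetical lookup table; names mirror the formulas used in compute_formula()
formula_table <- list(
  linear      = function(x) x,
  quadratic   = function(x) x^2,
  cubic       = function(x) x^3,
  exponential = function(x) exp(x / 5)
)

# Vectorized variant: each formula is applied to the whole vector at once
compute_formula_vec <- function(x, formulas) {
  unknown <- setdiff(formulas, names(formula_table))
  if (length(unknown) > 0) {
    stop("Invalid formula: ", paste(unknown, collapse = ", "))
  }
  as.data.frame(lapply(formula_table[formulas], function(f) f(x)))
}

res <- compute_formula_vec(1:20, c("linear", "quadratic", "cubic", "exponential"))

# Values at x = 20 for all four formulas
res[20, ]
```

With this layout, adding a formula only requires a new entry in the list; the control flow stays unchanged.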

2 Nested Simulation – Multi-Sales & Discounts

2.1 Implementation

library(dplyr)
library(plotly)
library(knitr)

# Simulation function (unchanged)
simulate_sales <- function(n_salesperson, days) {
  
  all_data <- data.frame()
  
  for (sp in 1:n_salesperson) {
    
    cumulative_sales <- 0
    
    for (d in 1:days) {
      
      sales_amount <- sample(100:1000, 1)
      
      if (sales_amount > 800) {
        discount_rate <- 0.20
      } else if (sales_amount > 500) {
        discount_rate <- 0.10
      } else {
        discount_rate <- 0.05
      }
      
      cumulative_sales <- cumulative_sales + sales_amount
      
      temp <- data.frame(
        salesperson = paste0("SP", sp),
        day = d,
        sales_amount = sales_amount,
        discount_rate = discount_rate,
        cumulative_sales = cumulative_sales
      )
      
      all_data <- rbind(all_data, temp)
    }
  }
  
  return(all_data)
}

# Run the simulation
set.seed(123)
data_sales <- simulate_sales(3, 10)


cat(" Table 1: Sales Data\n")

##  Table 1: Sales Data

kable(data_sales)

salesperson	day	sales_amount	discount_rate	cumulative_sales
SP1	1	514	0.10	514
SP1	2	562	0.10	1076
SP1	3	278	0.05	1354
SP1	4	625	0.10	1979
SP1	5	294	0.05	2273
SP1	6	917	0.20	3190
SP1	7	217	0.05	3407
SP1	8	398	0.05	3805
SP1	9	328	0.05	4133
SP1	10	343	0.05	4476
SP2	1	113	0.05	113
SP2	2	473	0.05	586
SP2	3	764	0.10	1350
SP2	4	701	0.10	2051
SP2	5	702	0.10	2753
SP2	6	867	0.20	3620
SP2	7	808	0.20	4428
SP2	8	190	0.05	4618
SP2	9	447	0.05	5065
SP2	10	748	0.10	5813
SP3	1	454	0.05	454
SP3	2	939	0.20	1393
SP3	3	125	0.05	1518
SP3	4	618	0.10	2136
SP3	5	525	0.10	2661
SP3	6	748	0.10	3409
SP3	7	865	0.20	4274
SP3	8	310	0.05	4584
SP3	9	689	0.10	5273
SP3	10	692	0.10	5965

summary_stats <- data_sales %>%
  group_by(salesperson) %>%
  summarise(
    total_sales = sum(sales_amount),
    mean_sales = mean(sales_amount)
  )

cat(" Table 2: Summary Statistics\n")

##  Table 2: Summary Statistics

kable(summary_stats)

salesperson	total_sales	mean_sales
SP1	4476	447.6
SP2	5813	581.3
SP3	5965	596.5

# =========================
# 📈 PLOTLY VISUALIZATION
# =========================
fig <- plot_ly(data_sales,
               x = ~day,
               y = ~cumulative_sales,
               color = ~salesperson,
               type = 'scatter',
               mode = 'lines+markers')

fig <- fig %>%
  layout(title = "Cumulative Sales per Salesperson",
         xaxis = list(title = "Day"),
         yaxis = list(title = "Cumulative Sales"))

fig

2.2 Interpretation

This implementation simulates daily sales activity for multiple salespersons over a given period using a structured approach with functions, loops, and conditionals.

The simulate_sales function generates a random sales amount for each salesperson on each day. Conditional logic then assigns a discount rate based on the sales value (5% up to 500, 10% above 500, and 20% above 800), mimicking a tiered business rule; the rate is recorded but not applied to the sales amount. Nested loops let the model iterate systematically through each salesperson and each day.

Cumulative sales are calculated progressively, enabling tracking of overall performance over time. The resulting dataset is then summarized to show total and average sales per salesperson, providing a clear comparison of performance.

Finally, the interactive Plotly visualization helps illustrate how cumulative sales grow over time for each salesperson, making it easier to identify trends and differences in sales performance.
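
The same simulation can be sketched without explicit loops. This is an illustrative alternative under stated assumptions, not the author's method: the tier boundaries are copied from `simulate_sales`, `net_sales` is a hypothetical extra column showing how the recorded discount could be applied, and the random draws are not guaranteed to match the table above.

```r
set.seed(123)
n_salesperson <- 3
days <- 10

# One row per salesperson-day; day varies fastest within each salesperson
sales <- expand.grid(day = seq_len(days), sp = seq_len(n_salesperson))
sales$salesperson <- paste0("SP", sales$sp)
sales$sales_amount <- sample(100:1000, nrow(sales), replace = TRUE)

# Discount tiers: up to 500 -> 5%, above 500 up to 800 -> 10%, above 800 -> 20%
tier <- cut(sales$sales_amount, breaks = c(-Inf, 500, 800, Inf), labels = FALSE)
sales$discount_rate <- c(0.05, 0.10, 0.20)[tier]

# Running total within each salesperson
sales$cumulative_sales <- ave(sales$sales_amount, sales$salesperson, FUN = cumsum)

# Hypothetical extra column: revenue after applying the discount
sales$net_sales <- sales$sales_amount * (1 - sales$discount_rate)

head(sales)
```

Building the full table in one step avoids growing a data frame with `rbind` inside a loop, which copies the accumulated rows on every iteration.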

3 Multi-Level Performance Categorization

3.1 Implementation

library(dplyr)
library(plotly)
library(knitr)

# =========================
# DATA SIMULATION
# =========================
set.seed(123)
sales_amount <- sample(100:1000, 30)

# =========================
# FUNCTION: Categorize Performance
# =========================
categorize_performance <- function(sales_amount) {
  
  categories <- c()
  
  for (s in sales_amount) {
    if (s >= 900) {
      categories <- c(categories, "Excellent")
    } else if (s >= 700) {
      categories <- c(categories, "Very Good")
    } else if (s >= 500) {
      categories <- c(categories, "Good")
    } else if (s >= 300) {
      categories <- c(categories, "Average")
    } else {
      categories <- c(categories, "Poor")
    }
  }
  
  return(categories)
}

# =========================
# APPLY FUNCTION
# =========================
performance <- categorize_performance(sales_amount)

data_perf <- data.frame(
  sales_amount = sales_amount,
  category = performance
)

# Add an ID column to keep the table tidy
data_perf <- data_perf %>%
  mutate(id = row_number()) %>%
  select(id, everything())

# =========================
# 📋 TABLE: IMPLEMENTATION RESULT
# =========================
cat("### Table 1: Sales Performance Categorization\n")

## ### Table 1: Sales Performance Categorization

kable(data_perf)

id	sales_amount	category
1	514	Good
2	562	Good
3	278	Poor
4	625	Good
5	294	Poor
6	917	Excellent
7	217	Poor
8	398	Average
9	328	Average
10	343	Average
11	113	Poor
12	473	Average
13	764	Very Good
14	701	Very Good
15	702	Very Good
16	867	Very Good
17	808	Very Good
18	190	Poor
19	447	Average
20	748	Very Good
21	454	Average
22	939	Excellent
23	125	Poor
24	618	Good
25	525	Good
26	981	Excellent
27	865	Very Good
28	310	Average
29	689	Good
30	692	Good

# =========================
# 📊 SUMMARY TABLE
# =========================
summary_perf <- data_perf %>%
  group_by(category) %>%
  summarise(count = n()) %>%
  mutate(percentage = round((count / sum(count)) * 100, 2))

cat("\n### Table 2: Summary Statistics\n")

## 
## ### Table 2: Summary Statistics

kable(summary_perf)

category	count	percentage
Average	7	23.33
Excellent	3	10.00
Good	7	23.33
Poor	6	20.00
Very Good	7	23.33

# =========================
# 📈 BAR PLOT (Plotly)
# =========================
bar_plot <- plot_ly(summary_perf,
                   x = ~category,
                   y = ~count,
                   type = "bar")

bar_plot <- bar_plot %>%
  layout(title = "Performance Distribution (Bar Plot)",
         xaxis = list(title = "Category"),
         yaxis = list(title = "Count"))

bar_plot

# =========================
# 🥧 PIE CHART (Plotly)
# =========================
pie_chart <- plot_ly(summary_perf,
                    labels = ~category,
                    values = ~percentage,
                    type = "pie")

pie_chart <- pie_chart %>%
  layout(title = "Performance Distribution (Pie Chart)")

pie_chart

3.2 Interpretation

This implementation categorizes sales performance into five levels: Excellent, Very Good, Good, Average, and Poor based on sales amount thresholds.

A loop is used to assign each sales value into a category using conditional logic, simulating a real-world evaluation system. The results show the distribution of performance levels, both in counts and percentages.

The bar plot shows the number of occurrences in each category, while the pie chart shows their proportional distribution. In this run no single level dominates: Very Good, Good, and Average each account for 7 of the 30 values (23.33%), followed by Poor with 6 (20%) and Excellent with 3 (10%). A breakdown like this supports decision-making when evaluating overall sales performance.
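
The threshold logic can also be written as a single vectorized call. The sketch below assumes the same cut-offs as `categorize_performance` (300, 500, 700, 900, each inclusive on the lower side); `categorize_performance_vec` is a hypothetical name introduced for illustration.

```r
categorize_performance_vec <- function(sales_amount) {
  # right = FALSE makes each interval closed on the left, e.g. [700, 900)
  as.character(cut(
    sales_amount,
    breaks = c(-Inf, 300, 500, 700, 900, Inf),
    labels = c("Poor", "Average", "Good", "Very Good", "Excellent"),
    right = FALSE
  ))
}

# Values taken from the categorization table, plus the four boundary cases
check <- c(278, 398, 514, 764, 917, 300, 500, 700, 900)
data.frame(sales_amount = check, category = categorize_performance_vec(check))
```

Checking the boundary values explicitly is a quick way to confirm that `>=` in the loop and `right = FALSE` in `cut()` agree.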

4 Multi-Company Dataset Simulation

4.1 Implementation

library(dplyr)
library(plotly)
library(knitr)

# =========================
# FUNCTION: Generate Company Data
# =========================
generate_company_data <- function(n_company, n_employees) {
  
  all_data <- data.frame()
  departments <- c("HR", "Finance", "IT", "Marketing")
  
  for (c in 1:n_company) {
    
    for (e in 1:n_employees) {
      
      salary <- sample(3000:10000, 1)
      performance_score <- sample(60:100, 1)
      KPI_score <- sample(70:100, 1)
      department <- sample(departments, 1)
      
      # Conditional: Top Performer
      if (KPI_score > 90) {
        category <- "Top Performer"
      } else {
        category <- "Regular"
      }
      
      temp <- data.frame(
        company_id = paste0("C", c),
        employee_id = paste0("E", e),
        department = department,
        salary = salary,
        performance_score = performance_score,
        KPI_score = KPI_score,
        category = category
      )
      
      all_data <- rbind(all_data, temp)
    }
  }
  
  return(all_data)
}

# =========================
# GENERATE DATA
# =========================
set.seed(123)
company_data <- generate_company_data(3, 10)

# =========================
# 📋 TABLE 1: FULL DATA
# =========================
cat("### Table 1: Company Employee Data\n")

## ### Table 1: Company Employee Data

kable(company_data)

company_id	employee_id	department	salary	performance_score	KPI_score	category
C1	E1	Finance	5462	74	88	Regular
C1	E2	Finance	7290	96	89	Regular
C1	E3	IT	6445	84	95	Top Performer
C1	E4	Marketing	5756	86	94	Top Performer
C1	E5	IT	4016	68	98	Top Performer
C1	E6	Finance	5887	85	76	Regular
C1	E7	Finance	8768	78	73	Regular
C1	E8	Marketing	9736	98	90	Regular
C1	E9	HR	4166	91	79	Regular
C1	E10	Finance	4798	68	78	Regular
C2	E1	HR	4046	86	97	Top Performer
C2	E2	HR	6206	86	75	Regular
C2	E3	Marketing	4313	88	74	Regular
C2	E4	HR	3587	72	87	Regular
C2	E5	Finance	7088	86	94	Top Performer
C2	E6	IT	3276	74	78	Regular
C2	E7	Marketing	9233	90	85	Regular
C2	E8	Finance	5821	67	91	Top Performer
C2	E9	Finance	4182	76	91	Top Performer
C2	E10	HR	9128	93	73	Regular
C3	E1	Finance	5116	84	89	Regular
C3	E2	HR	8208	91	83	Regular
C3	E3	Finance	5338	99	85	Regular
C3	E4	Finance	6979	90	94	Top Performer
C3	E5	HR	6229	94	83	Regular
C3	E6	IT	7575	66	72	Regular
C3	E7	HR	4913	74	90	Regular
C3	E8	Finance	4074	69	87	Regular
C3	E9	Finance	5283	93	79	Regular
C3	E10	Finance	7222	71	89	Regular

# =========================
# 📊 SUMMARY PER COMPANY
# =========================
summary_company <- company_data %>%
  group_by(company_id) %>%
  summarise(
    avg_salary = mean(salary),
    avg_performance = mean(performance_score),
    max_KPI = max(KPI_score)
  )

cat("\n### Table 2: Company Summary\n")

## 
## ### Table 2: Company Summary

kable(summary_company)

company_id	avg_salary	avg_performance	max_KPI
C1	6232.4	82.8	98
C2	5688.0	81.8	97
C3	6093.7	83.1	94

# =========================
# 📈 PLOT 1: AVG SALARY
# =========================
plot_salary <- plot_ly(summary_company,
                      x = ~company_id,
                      y = ~avg_salary,
                      type = "bar")

plot_salary <- plot_salary %>%
  layout(title = "Average Salary per Company",
         xaxis = list(title = "Company"),
         yaxis = list(title = "Average Salary"))

plot_salary

# =========================
# 📈 PLOT 2: AVG PERFORMANCE
# =========================
plot_perf <- plot_ly(summary_company,
                    x = ~company_id,
                    y = ~avg_performance,
                    type = "bar")

plot_perf <- plot_perf %>%
  layout(title = "Average Performance per Company",
         xaxis = list(title = "Company"),
         yaxis = list(title = "Performance Score"))

plot_perf

# =========================
# 🥧 PIE CHART: CATEGORY DISTRIBUTION
# =========================
category_dist <- company_data %>%
  group_by(category) %>%
  summarise(count = n()) %>%
  mutate(percentage = round(count/sum(count)*100,2))

pie_chart <- plot_ly(category_dist,
                    labels = ~category,
                    values = ~percentage,
                    type = "pie")

pie_chart <- pie_chart %>%
  layout(title = "Employee Category Distribution")

pie_chart

4.2 Interpretation

This implementation simulates employee data across multiple companies using nested loops to represent companies and their employees. Each employee is assigned attributes such as salary, department, performance score, and KPI score.

A conditional rule is applied to classify employees as “Top Performer” when their KPI score exceeds 90, reflecting performance evaluation in real-world organizations.

The summary table provides key insights per company, including average salary, average performance, and maximum KPI score. The visualizations help compare company performance and highlight the distribution of top-performing employees.
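
For larger simulations, the nested loop with `rbind` becomes slow because the accumulated data frame is copied on every iteration. The following sketch shows one way to generate the same structure in a single step. It is an assumption-laden alternative, not the report's implementation: `generate_company_data_vec` is a hypothetical name, the value ranges are copied from `generate_company_data`, and the random draws will differ from the table above.

```r
generate_company_data_vec <- function(n_company, n_employees) {
  n <- n_company * n_employees
  kpi <- sample(70:100, n, replace = TRUE)

  data.frame(
    company_id        = rep(paste0("C", seq_len(n_company)), each = n_employees),
    employee_id       = rep(paste0("E", seq_len(n_employees)), times = n_company),
    department        = sample(c("HR", "Finance", "IT", "Marketing"), n, replace = TRUE),
    salary            = sample(3000:10000, n, replace = TRUE),
    performance_score = sample(60:100, n, replace = TRUE),
    KPI_score         = kpi,
    # Same rule as the loop version: strictly above 90 is a Top Performer
    category          = ifelse(kpi > 90, "Top Performer", "Regular")
  )
}

set.seed(123)
company_data_vec <- generate_company_data_vec(3, 10)

# Number of employees in each category per company
table(company_data_vec$company_id, company_data_vec$category)
```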

5 Monte Carlo Simulation (Pi & Probability)

5.1 Implementation

library(plotly)
library(dplyr)
library(knitr)

# =========================
# FUNCTION: Monte Carlo Pi
# =========================
monte_carlo_pi <- function(n_points) {
  
  x_vals <- c()
  y_vals <- c()
  inside <- c()
  
  count_inside <- 0
  count_square <- 0
  
  for (i in 1:n_points) {
    
    # Generate random point
    x <- runif(1, -1, 1)
    y <- runif(1, -1, 1)
    
    x_vals <- c(x_vals, x)
    y_vals <- c(y_vals, y)
    
    # Check inside circle
    if (x^2 + y^2 <= 1) {
      inside <- c(inside, "Inside Circle")
      count_inside <- count_inside + 1
    } else {
      inside <- c(inside, "Outside Circle")
    }
    
    # Probability: sub-square (-0.5 to 0.5)
    if (x >= -0.5 && x <= 0.5 && y >= -0.5 && y <= 0.5) {
      count_square <- count_square + 1
    }
  }
  
  # Estimate Pi
  pi_estimate <- 4 * (count_inside / n_points)
  
  # Probability result
  prob_square <- count_square / n_points
  
  # Data frame
  data <- data.frame(
    x = x_vals,
    y = y_vals,
    status = inside
  )
  
  return(list(
    data = data,
    pi_estimate = pi_estimate,
    prob_square = prob_square
  ))
}


set.seed(123)
result <- monte_carlo_pi(1000)

data_mc <- result$data

# =========================
# 📋 TABLE RESULT
# =========================
cat(" Table: Monte Carlo Sample Points\n")

##  Table: Monte Carlo Sample Points

kable(head(data_mc, 20))

x	y	status
-0.4248450	0.5766103	Inside Circle
-0.1820462	0.7660348	Inside Circle
0.8809346	-0.9088870	Outside Circle
0.0562110	0.7848381	Inside Circle
0.1028700	-0.0867705	Inside Circle
0.9136667	-0.0933317	Inside Circle
0.3551413	0.1452668	Inside Circle
-0.7941506	0.7996499	Outside Circle
-0.5078245	-0.9158809	Outside Circle
-0.3441586	0.9090073	Inside Circle
0.7790786	0.3856068	Inside Circle
0.2810136	0.9885396	Outside Circle
0.3114116	0.4170609	Inside Circle
0.0881320	0.1882840	Inside Circle
-0.4216805	-0.7057727	Inside Circle
0.9260485	0.8045981	Outside Circle
0.3814106	0.5909348	Inside Circle
-0.9507726	-0.0444081	Inside Circle
0.5169191	-0.5671841	Inside Circle
-0.3636380	-0.5367484	Inside Circle

cat("Estimated Pi:", result$pi_estimate, "\n")

## Estimated Pi: 3.16

cat("Probability (point in sub-square):", result$prob_square, "\n")

## Probability (point in sub-square): 0.252

summary_mc <- data_mc %>%
  group_by(status) %>%
  summarise(count = n()) %>%
  mutate(percentage = round(count/sum(count)*100,2))


kable(summary_mc)

status	count	percentage
Inside Circle	790	79
Outside Circle	210	21

plot_mc <- plot_ly(data_mc,
                   x = ~x,
                   y = ~y,
                   color = ~status,
                   type = "scatter",
                   mode = "markers")

plot_mc <- plot_mc %>%
  layout(title = "Monte Carlo Simulation for Pi",
         xaxis = list(title = "X"),
         yaxis = list(title = "Y"))

plot_mc

5.2 Interpretation

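This simulation uses the Monte Carlo method to estimate the value of π by generating random points within the square [-1, 1] × [-1, 1] and checking how many fall inside the unit circle. Because the circle has area π and the square has area 4, the fraction of points inside the circle approximates π/4, so multiplying that fraction by 4 gives the estimate (3.16 for the 1,000 points used here).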

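Additionally, the simulation estimates the probability that a point falls within the smaller sub-square [-0.5, 0.5] × [-0.5, 0.5], demonstrating probability estimation through random sampling. The theoretical value is the ratio of the two areas, 1/4 = 0.25, which the simulated 0.252 matches closely.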

The scatter plot visualizes the distribution of points, clearly distinguishing those inside the circle from those outside it. As the number of points increases, the estimate of π tends to become more accurate, as expected from the law of large numbers.
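
The effect of sample size can be checked directly. The sketch below is a vectorized variant written for illustration: `estimate_pi` and the chosen sample sizes are assumptions made here and do not appear in the original code. It draws all points at once and reports the absolute error for each sample size.

```r
# Vectorized estimate: the share of points inside the unit circle, times 4
estimate_pi <- function(n_points) {
  x <- runif(n_points, -1, 1)
  y <- runif(n_points, -1, 1)
  4 * mean(x^2 + y^2 <= 1)
}

set.seed(123)
sizes <- c(100, 1000, 10000, 100000)
estimates <- sapply(sizes, estimate_pi)

data.frame(
  n_points    = sizes,
  pi_estimate = estimates,
  abs_error   = abs(estimates - pi)
)
```

The error does not have to shrink at every step, since each estimate is random, but its typical size falls roughly in proportion to 1 / sqrt(n_points).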

6 Advanced Data Transformation & Feature Engineering

6.1 Implementation

library(dplyr)
library(plotly)
library(knitr)

# =========================
# SAMPLE DATA
# =========================
set.seed(123)
df <- data.frame(
  salary = sample(3000:10000, 30),
  performance_score = sample(60:100, 30)
)

# =========================
# FUNCTION: NORMALIZATION (Min-Max)
# =========================
normalize_columns <- function(df) {
  
  df_norm <- df
  
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      min_val <- min(df[[col]])
      max_val <- max(df[[col]])
      
      df_norm[[col]] <- (df[[col]] - min_val) / (max_val - min_val)
    }
  }
  
  return(df_norm)
}

# =========================
# FUNCTION: Z-SCORE
# =========================
z_score <- function(df) {
  
  df_z <- df
  
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      mean_val <- mean(df[[col]])
      sd_val <- sd(df[[col]])
      
      df_z[[col]] <- (df[[col]] - mean_val) / sd_val
    }
  }
  
  return(df_z)
}

# =========================
# APPLY TRANSFORMATION
# =========================
df_norm <- normalize_columns(df)
df_z <- z_score(df)

# =========================
# FEATURE ENGINEERING
# =========================
df_feat <- df %>%
  mutate(
    performance_category = case_when(
      performance_score >= 90 ~ "Excellent",
      performance_score >= 80 ~ "Very Good",
      performance_score >= 70 ~ "Good",
      performance_score >= 65 ~ "Average",
      TRUE ~ "Poor"
    ),
    
    salary_bracket = case_when(
      salary >= 8000 ~ "High",
      salary >= 5000 ~ "Medium",
      TRUE ~ "Low"
    )
  )

# =========================
# 📋 TABLE
# =========================
cat("### Table: Original Data with New Features\n")

## ### Table: Original Data with New Features

kable(df_feat)

salary	performance_score	performance_category	salary_bracket
5462	78	Good	Medium
5510	95	Excellent	Medium
5226	73	Good	Medium
3525	76	Good	Low
7290	71	Good	Medium
5985	74	Good	Medium
4841	91	Excellent	Low
4141	66	Average	Low
6370	68	Average	Medium
8348	92	Excellent	High
8363	69	Average	High
8133	82	Very Good	High
6445	86	Very Good	Medium
7760	87	Very Good	Medium
9745	80	Very Good	High
4626	93	Excellent	Low
5756	88	Very Good	Medium
8106	65	Average	High
8210	61	Poor	High
3952	64	Poor	Low
7443	67	Average	Medium
4016	96	Excellent	Low
5012	72	Good	Medium
8474	77	Good	High
5887	60	Poor	Medium
9169	94	Excellent	High
5566	70	Good	Medium
4449	75	Good	Low
8768	83	Very Good	High
4789	81	Very Good	Low

# =========================
# 📊 COMPARISON DATA
# =========================
compare_df <- data.frame(
  original_salary = df$salary,
  normalized_salary = df_norm$salary,
  zscore_salary = df_z$salary
)

# =========================
# 📈 HISTOGRAM (Plotly)
# =========================
hist_plot <- plot_ly(compare_df, x = ~original_salary, type = "histogram", name = "Original") %>%
  add_trace(x = ~normalized_salary, name = "Normalized") %>%
  add_trace(x = ~zscore_salary, name = "Z-Score") %>%
  layout(title = "Salary Distribution Comparison")

hist_plot

# =========================
#  BOXPLOT
# =========================
library(tidyr)

compare_long <- compare_df %>%
  pivot_longer(cols = everything(),
               names_to = "type",
               values_to = "value")

box_plot <- plot_ly(compare_long,
                    x = ~type,
                    y = ~value,
                    type = "box")

box_plot <- box_plot %>%
  layout(title = "Boxplot Comparison (Original vs Normalized vs Z-Score)",
         xaxis = list(title = "Data Type"),
         yaxis = list(title = "Value"))

box_plot

6.2 Interpretation

This implementation applies advanced data transformation techniques, including normalization and z-score standardization, using loop-based functions. These methods rescale the data to make features comparable and suitable for analysis.

Additionally, new features are created to categorize performance and salary levels, enhancing the dataset with meaningful groupings. This reflects real-world feature engineering practices used in data science.

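The visualizations compare the distributions of the original and transformed data. Both transformations are linear, so they change only the location and scale of the values, not the shape of the distribution or which points stand out as outliers. Because the three series are drawn on a shared axis while differing in scale by several orders of magnitude, the normalized and z-score series appear compressed near zero next to the original salaries. Overall, the transformations put features on comparable scales and prepare the data for further analysis or modeling.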
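
The properties of the two transformations can be verified numerically. The sketch below assumes a single numeric vector drawn from the same salary range as the sample data; `min_max` and `z_std` are hypothetical helper names introduced here, not functions from the report.

```r
# Min-max normalization: rescales values to the interval [0, 1]
min_max <- function(v) (v - min(v)) / (max(v) - min(v))

# Z-score standardization: mean 0 and (sample) standard deviation 1
z_std <- function(v) (v - mean(v)) / sd(v)

set.seed(123)
salary <- sample(3000:10000, 30)

mm <- min_max(salary)
z  <- z_std(salary)

range(mm)        # endpoints of the normalized values
mean(z); sd(z)   # location and spread after standardization

# Both transformations are linear, so the correlation with the original is 1
cor(salary, mm)
cor(salary, z)

# Base R's scale() computes the same z-scores
all.equal(as.numeric(scale(salary)), z)
```

Both helpers divide by zero when a column is constant, so a guard for `max(v) == min(v)` or `sd(v) == 0` would be needed before applying them to arbitrary data.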

7 Mini Project – Company KPI Dashboard & Simulation

7.1 Implementation

library(dplyr)
library(plotly)
library(knitr)

# =========================
# FUNCTION: GENERATE DATA
# =========================
generate_kpi_data <- function(n_company = 5, min_emp = 50, max_emp = 100) {
  
  all_data <- data.frame()
  departments <- c("HR", "Finance", "IT", "Marketing", "Operations")
  
  for (c in 1:n_company) {
    
    n_employees <- sample(min_emp:max_emp, 1)
    
    for (e in 1:n_employees) {
      
      salary <- sample(3000:12000, 1)
      performance_score <- sample(60:100, 1)
      KPI_score <- sample(70:100, 1)
      department <- sample(departments, 1)
      
      temp <- data.frame(
        employee_id = paste0("E", c, "_", e),
        company_id = paste0("C", c),
        salary = salary,
        performance_score = performance_score,
        KPI_score = KPI_score,
        department = department
      )
      
      all_data <- rbind(all_data, temp)
    }
  }
  
  return(all_data)
}

# =========================
# GENERATE DATA
# =========================
set.seed(123)
df <- generate_kpi_data(5, 50, 100)

# =========================
# KPI TIER (LOOP)
# =========================
kpi_tier <- c()

for (k in df$KPI_score) {
  if (k >= 90) {
    kpi_tier <- c(kpi_tier, "Top Performer")
  } else if (k >= 80) {
    kpi_tier <- c(kpi_tier, "High")
  } else if (k >= 70) {
    kpi_tier <- c(kpi_tier, "Medium")
  } else {
    kpi_tier <- c(kpi_tier, "Low")
  }
}

df$kpi_tier <- kpi_tier

# =========================
# 📋 TABLE: SAMPLE DATA
# =========================
cat(" Table 1: Sample Employee Data\n")

##  Table 1: Sample Employee Data

kable(head(df, 20))

employee_id	company_id	salary	performance_score	KPI_score	department	kpi_tier
E1_1	C1	5510	73	72	Finance	Medium
E1_2	C1	4841	96	89	HR	High
E1_3	C1	9745	86	74	IT	Medium
E1_4	C1	5887	85	76	Finance	Medium
E1_5	C1	5979	73	86	IT	High
E1_6	C1	7468	71	84	Finance	High
E1_7	C1	10788	66	78	HR	Medium
E1_8	C1	4046	86	97	Operations	Top Performer
E1_9	C1	6206	86	75	HR	Medium
E1_10	C1	11156	64	77	Marketing	Medium
E1_11	C1	4598	72	87	HR	High
E1_12	C1	7088	86	94	Operations	Top Performer
E1_13	C1	3040	85	97	Marketing	Top Performer
E1_14	C1	5503	81	91	HR	Top Performer
E1_15	C1	11565	93	73	Operations	Medium
E1_16	C1	5116	84	89	HR	High
E1_17	C1	10126	94	77	Marketing	Medium
E1_18	C1	6229	94	83	Operations	High
E1_19	C1	7575	66	72	Finance	Medium
E1_20	C1	8966	80	74	IT	Medium

# =========================
# 📊 SUMMARY PER COMPANY
# =========================
summary_company <- df %>%
  group_by(company_id) %>%
  summarise(
    avg_salary = mean(salary),
    avg_KPI = mean(KPI_score),
    top_performers = sum(kpi_tier == "Top Performer")
  )

cat(" Table 2: Company Summary\n")

##  Table 2: Company Summary

kable(summary_company)

company_id	avg_salary	avg_KPI	top_performers
C1	7480.175	83.65000	19
C2	7101.016	84.63934	22
C3	7890.163	84.62245	34
C4	7193.750	86.32292	38
C5	7903.556	86.95556	40

# =========================
# 📊 DEPARTMENT ANALYSIS
# =========================
dept_analysis <- df %>%
  group_by(company_id, department) %>%
  summarise(count = n(), .groups = "drop")

cat("Table 3: Department Distribution\n")

## Table 3: Department Distribution

kable(dept_analysis)

company_id	department	count
C1	Finance	18
C1	HR	14
C1	IT	11
C1	Marketing	16
C1	Operations	21
C2	Finance	11
C2	HR	9
C2	IT	14
C2	Marketing	19
C2	Operations	8
C3	Finance	19
C3	HR	19
C3	IT	17
C3	Marketing	24
C3	Operations	19
C4	Finance	13
C4	HR	26
C4	IT	21
C4	Marketing	19
C4	Operations	17
C5	Finance	20
C5	HR	19
C5	IT	21
C5	Marketing	14
C5	Operations	16

# =========================
# 📈 GROUPED BAR (DEPARTMENT)
# =========================
bar_dept <- plot_ly(dept_analysis,
                    x = ~department,
                    y = ~count,
                    color = ~company_id,
                    type = "bar")

bar_dept <- bar_dept %>%
  layout(title = "Department Distribution per Company",
         barmode = "group")

bar_dept

# =========================
# 📈 SCATTER + REGRESSION
# =========================
scatter <- plot_ly(df,
                   x = ~salary,
                   y = ~KPI_score,
                   color = ~company_id,
                   type = "scatter",
                   mode = "markers")

scatter <- scatter %>%
  layout(title = "Salary vs KPI Score")

scatter

# Add a simple regression line
model <- lm(KPI_score ~ salary, data = df)

df$pred <- predict(model)

scatter_reg <- plot_ly(df,
                       x = ~salary,
                       y = ~KPI_score,
                       color = ~company_id,
                       type = "scatter",
                       mode = "markers") %>%
  add_lines(x = ~salary, y = ~pred, name = "Regression Line")

scatter_reg

# =========================
# 📈 SALARY DISTRIBUTION
# =========================
hist_salary <- plot_ly(df,
                       x = ~salary,
                       color = ~company_id,
                       type = "histogram")

hist_salary <- hist_salary %>%
  layout(title = "Salary Distribution")

hist_salary

7.2 Interpretation

This mini project simulates a company KPI dashboard by generating employee data across multiple companies. Each employee is assigned attributes such as salary, performance score, KPI score, and department.

A loop-based categorization is used to classify employees into KPI tiers, highlighting top performers and performance distribution. The summary table provides key metrics per company, including average salary, average KPI, and the number of top performers.

The visualizations offer deeper insights:

Grouped bar charts show department distribution across companies.
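Scatter plots with a regression line show the relationship between salary and KPI score; because the two are sampled independently in this simulation, the fitted line is expected to be nearly flat.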
Histograms illustrate salary distribution patterns.

Overall, this simulation reflects real-world data analysis workflows, combining data generation, transformation, and visualization into a comprehensive KPI dashboard.
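
Since `generate_kpi_data` draws salary and KPI score independently, a regression of one on the other should show almost no relationship. The sketch below illustrates this on freshly simulated values: the sample size of 400 is an arbitrary assumption, and the vectors are generated here rather than taken from the dashboard data.

```r
set.seed(123)
n <- 400  # hypothetical sample size, close to the total employee count above

# Same ranges as generate_kpi_data(), drawn independently of each other
salary    <- sample(3000:12000, n, replace = TRUE)
KPI_score <- sample(70:100, n, replace = TRUE)

model <- lm(KPI_score ~ salary)

slope     <- coef(model)[["salary"]]
r_squared <- summary(model)$r.squared

# A slope near zero and a small R-squared indicate no linear relationship
c(slope = slope, r_squared = r_squared)
```

Inspecting the slope and R-squared alongside the plot guards against reading a pattern into what is random scatter.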

8 Automated Report Generation (Bonus)

library(ggplot2)
library(dplyr)
library(knitr)

# =========================
# SAMPLE DATA (if not already available)
# =========================
set.seed(123)
df_company <- data.frame(
  company_id = sample(1:3, 150, replace = TRUE),
  salary = runif(150, 3000, 10000),
  KPI_score = runif(150, 50, 100),
  performance_score = runif(150, 50, 100),
  department = sample(c("IT","HR","Finance","Marketing"), 150, replace = TRUE)
)

# =========================
# KPI TIER
# =========================
df_company$kpi_tier <- ifelse(df_company$KPI_score >= 90, "Top Performer",
                              ifelse(df_company$KPI_score >= 80, "High",
                                     ifelse(df_company$KPI_score >= 70, "Medium", "Low")))

# =========================
# FUNCTION: AUTOMATED REPORT
# =========================
generate_report <- function(data) {
  
  for(c in unique(data$company_id)){
    
    cat("\n====================================\n")
    cat("Company ID:", c, "\n")
    cat("====================================\n")
    
    data_subset <- data %>% filter(company_id == c)
    
    # =========================
    # TABLE 1: SUMMARY
    # =========================
    summary_table <- data_subset %>%
      summarise(
        avg_salary = round(mean(salary),2),
        avg_KPI = round(mean(KPI_score),2),
        total_employee = n(),
        top_performer = sum(kpi_tier == "Top Performer")
      )
    
    cat("\nTable 1: Summary\n")
    print(kable(summary_table))
    
    # =========================
    # TABLE 2: TOP PERFORMERS
    # =========================
    top_data <- data_subset %>%
      filter(kpi_tier == "Top Performer") %>%
      arrange(desc(KPI_score)) %>%
      head(5)
    
    cat("\nTable 2: Top Performers\n")
    print(kable(top_data))
    
    # =========================
    # PLOT 1: DEPARTMENT DISTRIBUTION
    # =========================
    p1 <- ggplot(data_subset, aes(x = department, fill = department)) +
      geom_bar() +
      labs(title = paste("Department Distribution - Company", c),
           x = "Department", y = "Number of Employees") +
      theme_minimal() +
      theme(legend.position = "none")
    
    print(p1)
    
    # =========================
    # PLOT 2: SALARY vs KPI (IMPROVED)
    # =========================
    p2 <- ggplot(data_subset, aes(x = salary, y = KPI_score, color = department)) +
      geom_point(alpha = 0.7) +
      geom_smooth(method = "lm", se = FALSE, color = "black") +
      labs(title = paste("Salary vs KPI - Company", c),
           x = "Salary", y = "KPI Score") +
      theme_minimal()
    
    print(p2)
    
    # =========================
    # PLOT 3: SALARY DISTRIBUTION (overlaid histograms by department)
    # =========================
    p3 <- ggplot(data_subset, aes(x = salary, fill = department)) +
      geom_histogram(bins = 15, alpha = 0.6, position = "identity") +
      labs(title = paste("Salary Distribution - Company", c),
           x = "Salary", y = "Frequency") +
      theme_minimal()
    
    print(p3)
    
    # =========================
    # EXPORT CSV (one file per company, written to the working directory)
    # =========================
    write.csv(data_subset,
              paste0("company_", c, ".csv"),
              row.names = FALSE)
    
    cat("\n\n")
  }
}

# =========================
# RUN
# =========================
generate_report(df_company)

## 
## ====================================
## Company ID: 3 
## ====================================
## 
## Table 1: Summary
## 
## 
## | avg_salary| avg_KPI| total_employee| top_performer|
## |----------:|-------:|--------------:|-------------:|
## |    6640.58|   75.04|             54|            10|
## 
## Table 2: Top Performers
## 
## 
## | company_id|   salary| KPI_score| performance_score|department |kpi_tier      |
## |----------:|--------:|---------:|-----------------:|:----------|:-------------|
## |          3| 4800.517|  99.30271|          66.37987|HR         |Top Performer |
## |          3| 9313.121|  98.89267|          91.72005|Marketing  |Top Performer |
## |          3| 5727.110|  98.73629|          81.48727|Marketing  |Top Performer |
## |          3| 9409.785|  98.56712|          65.14438|HR         |Top Performer |
## |          3| 9736.513|  98.37347|          68.32207|IT         |Top Performer |

## 
## 
## 
## ====================================
## Company ID: 2 
## ====================================
## 
## Table 1: Summary
## 
## 
## | avg_salary| avg_KPI| total_employee| top_performer|
## |----------:|-------:|--------------:|-------------:|
## |    6260.05|    71.2|             54|             8|
## 
## Table 2: Top Performers
## 
## 
## | company_id|   salary| KPI_score| performance_score|department |kpi_tier      |
## |----------:|--------:|---------:|-----------------:|:----------|:-------------|
## |          2| 4574.897|  99.83086|          68.39480|Finance    |Top Performer |
## |          2| 8310.152|  97.65506|          93.93370|Finance    |Top Performer |
## |          2| 4513.784|  94.83693|          93.00534|HR         |Top Performer |
## |          2| 6366.376|  94.50390|          90.07148|HR         |Top Performer |
## |          2| 8098.761|  93.32417|          65.60564|Finance    |Top Performer |

## 
## 
## 
## ====================================
## Company ID: 1 
## ====================================
## 
## Table 1: Summary
## 
## 
## | avg_salary| avg_KPI| total_employee| top_performer|
## |----------:|-------:|--------------:|-------------:|
## |    6523.54|   77.18|             42|            10|
## 
## Table 2: Top Performers
## 
## 
## | company_id|   salary| KPI_score| performance_score|department |kpi_tier      |
## |----------:|--------:|---------:|-----------------:|:----------|:-------------|
## |          1| 8086.918|  99.56183|          69.25868|Marketing  |Top Performer |
## |          1| 9161.726|  99.29771|          58.90069|HR         |Top Performer |
## |          1| 5847.828|  99.16751|          62.65495|Finance    |Top Performer |
## |          1| 6827.783|  98.85495|          90.04741|Marketing  |Top Performer |
## |          1| 5766.541|  98.19217|          83.95067|Marketing  |Top Performer |
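
The tiering and per-company summary above can be cross-checked on a small example. The following is a minimal base-R sketch, not the report's own method: the `toy` data frame and its values are hypothetical, and `cut()` with `split()`/`lapply()` is swapped in for the nested `ifelse()` and the dplyr pipeline, assuming the same thresholds (70, 80, 90).

```r
# Hypothetical toy data: column names mirror df_company, values are invented.
toy <- data.frame(
  company_id = c(1, 1, 2, 2, 3, 3),
  salary     = c(5000, 7000, 4000, 6000, 8000, 9000),
  KPI_score  = c(95, 72, 80, 64.5, 90, 70)
)

# right = FALSE makes the intervals left-closed: [70, 80), [80, 90), [90, Inf),
# which matches the ">=" comparisons in the nested ifelse().
toy$kpi_tier <- cut(
  toy$KPI_score,
  breaks = c(-Inf, 70, 80, 90, Inf),
  labels = c("Low", "Medium", "High", "Top Performer"),
  right  = FALSE
)

# Same tiers computed with the nested ifelse() from the report, for comparison.
nested <- ifelse(toy$KPI_score >= 90, "Top Performer",
          ifelse(toy$KPI_score >= 80, "High",
          ifelse(toy$KPI_score >= 70, "Medium", "Low")))
tiers_match <- identical(as.character(toy$kpi_tier), nested)

# Per-company summary: split the data by company, summarise each piece,
# then bind the one-row results back together.
company_summary <- do.call(rbind, lapply(split(toy, toy$company_id), function(d) {
  data.frame(
    company_id     = d$company_id[1],
    avg_salary     = round(mean(d$salary), 2),
    avg_KPI        = round(mean(d$KPI_score), 2),
    total_employee = nrow(d),
    top_performer  = sum(d$kpi_tier == "Top Performer")
  )
}))

print(tiers_match)
print(company_summary)
```

Because `cut()` returns an ordered set of factor levels, tiers also sort from "Low" to "Top Performer" in tables and plots without extra work, which the character output of `ifelse()` does not do.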

9 Conclusion & Reference

This practicum demonstrates the application of advanced programming concepts in data science using R, particularly through the integration of functions, loops, and conditional logic. Each task simulates real-world analytical scenarios, enabling a deeper understanding of how structured programming supports data-driven decision-making.

The Dynamic Multi-Formula function shows how a single function can evaluate several formulas (linear, quadratic, cubic, and exponential) over the same inputs and return the results side by side for comparison. The Nested Simulation and Performance Categorization tasks show how nested loops and conditional logic can simulate business operations and assign performance categories.

Furthermore, the Monte Carlo Simulation shows how random sampling can be used to estimate mathematical constants and to analyze uncertainty. The Advanced Data Transformation and Feature Engineering task emphasizes the importance of preparing and transforming data to improve interpretability and analytical quality.

The Mini Project and Automated Report Generation tasks cover a complete data science workflow: data generation, transformation, visualization, and reporting. In the automated report, a single function loops over each company and produces a summary table, a top-performer table, three plots, and a CSV export, showing how the same analysis can be repeated across multiple entities such as companies or departments.

Overall, this practicum reinforces the importance of combining programming logic with analytical thinking. It shows that well-structured code can be used not only to process data but also to generate meaningful insights, build interactive visualizations, and automate reporting processes in real-world data science applications.

No	Author	Year	Title	Publisher
1	Wickham, H.	2016	ggplot2: Elegant Graphics for Data Analysis	Springer
2	Wickham, H. et al.	2023	dplyr: A Grammar of Data Manipulation	R Package Documentation
3	Sievert, C.	2020	Interactive Web-Based Data Visualization with R, plotly, and shiny	CRC Press
4	R Core Team	2023	R: A Language and Environment for Statistical Computing	R Foundation
5	James, G. et al.	2021	An Introduction to Statistical Learning	Springer
6	Ross, S.	2014	Introduction to Probability Models	Academic Press