Practicum Advance
Functions & Loops
Anindya Kristianingputri
1 Dynamic Multi Formula Function
A dynamic multi-formula function such as compute_formula(x, formulas) lets a single routine evaluate different mathematical expressions by selecting each formula by name. It supports linear (y = 5x + 10), quadratic (y = 0.3x²), cubic (y = 0.02x³), and exponential (y = 1.2ˣ) formulas. When invoked, the function checks each formula identifier and applies the matching expression to every value of x. This approach keeps the code flexible, reusable, and clean: only the formulas argument changes, with no need to rewrite the logic, which suits simulations, data modeling, and educational tools.
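As a minimal sketch of the dispatch idea, the selection of a formula by name can also be written with switch(). The function name compute_one is hypothetical and is not part of the practicum code; it handles one formula name at a time and relies on R's vectorised arithmetic instead of an inner loop.

```r
# Hypothetical single-formula variant (illustrative name, not the practicum function)
compute_one <- function(x, formula) {
  switch(formula,
    linear      = 5 * x + 10,
    quadratic   = 0.3 * x^2,
    cubic       = 0.02 * x^3,
    exponential = 1.2^x,
    # Unnamed last argument is the default branch: reached for unknown names
    stop("Unknown formula: ", formula)
  )
}

compute_one(2, "linear")      # 5 * 2 + 10 = 20
compute_one(1:3, "cubic")     # 0.02 0.16 0.54
```

Stopping with an error is one possible design; the practicum function instead warns and skips unknown names so that the remaining formulas are still computed.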
1.1 Nested loops to compute multiple formulas at once
library(DT)
compute_formula <- function(x, formulas) {
  results <- list()
  valid_formulas <- c("linear", "quadratic", "cubic", "exponential")
  # Outer loop → formula type
  for (f in formulas) {
    # Input validation: warn about and skip unknown formula names
    if (!(f %in% valid_formulas)) {
      warning(paste("Formula", f, "is not valid and will be skipped"))
      next
    }
    y_values <- numeric(length(x))
    # Inner loop → x values
    for (i in seq_along(x)) {
      if (f == "linear") {
        y_values[i] <- 5 * x[i] + 10
      } else if (f == "quadratic") {
        y_values[i] <- 0.3 * (x[i]^2)
      } else if (f == "cubic") {
        y_values[i] <- 0.02 * (x[i]^3)
      } else if (f == "exponential") {
        y_values[i] <- 1.2^x[i]
      }
    }
    results[[f]] <- y_values
  }
  return(results)
}
# Input values of x
x <- 1:20
# Run the function
hasil <- compute_formula(x, c("linear", "quadratic", "cubic", "exponential"))
# Combine into one large table
all_data <- data.frame()
for (name in names(hasil)) {
  temp <- data.frame(
    x = x,
    y = hasil[[name]],
    formula = name
  )
  all_data <- rbind(all_data, temp)
}
# Display interactive table
datatable(
  all_data,
  options = list(
    pageLength = 10,
    autoWidth = TRUE
  ),
  caption = "Table of Calculation Results for All Formulas"
)
Interpretation
The nested loop generates 80 data points by computing four formulas (linear, quadratic, cubic, and exponential) for x from 1 to 20. The results show clear differences in growth patterns: the linear function increases by a constant 5 per step, while the quadratic and cubic functions grow at an accelerating rate and overtake the linear function near the end of the range (at x = 19 and x = 17, respectively). With a base of 1.2, the exponential function is still small on this range (1.2^20 ≈ 38.3, compared with 160 for the cubic at x = 20), although its growth keeps compounding and it would eventually exceed the polynomial functions for sufficiently large x. By organizing all results into a single dataset, the program allows easy comparison between formulas and demonstrates how nested loops can handle repetitive calculations and produce structured output.
1.2 Validate Formula Input
Input validation is already implemented in the function above. Each requested formula name is compared against a predefined vector of allowed names (linear, quadratic, cubic, and exponential) through the check if (!(f %in% valid_formulas)). When a name is not in that vector, the function raises a message with warning() and skips it with next, so the remaining valid formulas are still processed.
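The warn-and-skip behaviour can be checked in isolation. The sketch below is a reduced stand-in for the validation step only; validate_formulas and the invalid name "logarithmic" are illustrative assumptions, and withCallingHandlers() is used simply to collect the warning text instead of printing it.

```r
# Allowed names, as in the practicum function
valid_formulas <- c("linear", "quadratic", "cubic", "exponential")

# Hypothetical helper: returns only the valid names, warning about the rest
validate_formulas <- function(formulas) {
  kept <- character(0)
  for (f in formulas) {
    if (!(f %in% valid_formulas)) {
      warning(paste("Formula", f, "is not valid and will be skipped"))
      next
    }
    kept <- c(kept, f)
  }
  kept
}

# Collect warning messages while letting the loop continue
messages <- character(0)
kept <- withCallingHandlers(
  validate_formulas(c("linear", "logarithmic", "cubic")),
  warning = function(w) {
    messages <<- c(messages, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

kept      # "linear" "cubic"
messages  # "Formula logarithmic is not valid and will be skipped"
```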
1.3 Plot all formulas on same graph for x = 1:20
library(ggplot2)
library(plotly)
# Plot ggplot (dark mode)
p <- ggplot(all_data, aes(x = x, y = y, color = formula)) +
geom_line(size = 1.3) +
geom_point(size = 2) +
scale_color_manual(values = c(
"linear" = "#00F5FF", # cyan neon
"quadratic" = "#39FF14", # neon green
"cubic" = "#FF6EC7", # neon pink
"exponential" = "#FFD700" # neon yellow
)) +
labs(
title = "Comparison of Mathematical Formulas",
x = "Value of x",
y = "Result (y)",
color = "Formula Type"
) +
theme_minimal(base_size = 14) +
theme(
plot.background = element_rect(fill = "#0D0D0D", color = NA),
panel.background = element_rect(fill = "#0D0D0D", color = NA),
panel.grid.major = element_line(color = "#2A2A2A"),
panel.grid.minor = element_line(color = "#1A1A1A"),
text = element_text(color = "white"),
axis.text = element_text(color = "#CCCCCC"),
legend.background = element_rect(fill = "#0D0D0D"),
legend.key = element_rect(fill = "#0D0D0D"),
plot.title = element_text(face = "bold", size = 16)
)
# Interactive version
ggplotly(p)
Interpretation
The visualization compares linear, quadratic, cubic, and exponential functions over the range x = 1 to 20, highlighting distinct growth patterns. The linear function increases at a constant rate, while the quadratic and cubic functions accelerate, with the cubic function rising most steeply at higher values of x and reaching the largest value on the plot (160 at x = 20, versus 120 for the quadratic and 110 for the linear function). The exponential curve, with base 1.2, is the lowest of the four from about x = 5 onward and reaches only about 38.3 at x = 20. Its growth keeps compounding, so it would dominate for much larger x, but that behaviour is not visible within x ≤ 20.
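The comparison of growth patterns can be checked numerically. The following sketch recomputes the four formulas with vectorised arithmetic and reports, for each non-linear curve, the first x in 1:20 at which it exceeds the linear one; the object names values and first_above_linear are illustrative.

```r
x <- 1:20
values <- data.frame(
  x = x,
  linear = 5 * x + 10,
  quadratic = 0.3 * x^2,
  cubic = 0.02 * x^3,
  exponential = 1.2^x
)

# Values at the right edge of the plotted range
values[values$x == 20, ]

# First x at which each curve exceeds the linear one (NA if it never does)
first_above_linear <- sapply(
  c("quadratic", "cubic", "exponential"),
  function(f) {
    idx <- which(values[[f]] > values$linear)
    if (length(idx) == 0) NA_integer_ else values$x[idx[1]]
  }
)
first_above_linear
# quadratic: 19, cubic: 17, exponential: NA
```

On this range the exponential curve with base 1.2 never crosses the linear one, which is why it appears flat next to the other curves in the plot.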
2 Nested Simulation: Multi-Sales & Discounts
In this segment, we discuss the simulate_sales(n_salesperson, days) function, which generates four columns: sales_id, day, sales_amount, and discount_rate. This function contains a nested function responsible for calculating cumulative sales per salesperson. Subsequently, the function applies conditional discounts, where the discount rate is determined based on the value of sales_amount. The process is executed by iterating through each salesperson using a loop. Finally, the function also provides summary statistics and a cumulative sales plot to facilitate the analysis of sales trends over time.
2.1 Nested function to calculate cumulative sales per salesperson
A nested function is used to calculate cumulative sales per salesperson because it allows the cumulative sum logic to be encapsulated inside the main function, keeping the code organized and reusable. By applying this nested function to each salesperson individually through a loop, the program can track the running total of sales day by day for every salesperson separately, without mixing values between different salespersons. This approach is useful for identifying performance trends, monitoring progress toward sales targets, and comparing which salespersons are consistently growing their sales over time. Additionally, the nested structure makes the code easier to debug and maintain, as the cumulative calculation logic is isolated from the main data processing workflow.
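A minimal sketch of this idea is shown below. It assumes a variant of the simulator, here called simulate_sales_cum, with a nested helper cumulative_sales and an optional seed argument; all three names are hypothetical. The code later in this report computes the running total with dplyr instead.

```r
simulate_sales_cum <- function(n_salesperson, days, seed = 123) {
  set.seed(seed)

  # Nested helper: running total of one salesperson's daily amounts
  cumulative_sales <- function(amounts) {
    total <- numeric(length(amounts))
    running <- 0
    for (i in seq_along(amounts)) {
      running <- running + amounts[i]
      total[i] <- running
    }
    total
  }

  rows <- vector("list", n_salesperson)
  # Loop per salesperson so running totals never mix between people
  for (sp in seq_len(n_salesperson)) {
    amounts <- round(runif(days, min = 100, max = 800), 2)
    rows[[sp]] <- data.frame(
      sales_id = paste0("SP_", sp),
      day = seq_len(days),
      sales_amount = amounts,
      cum_sales = cumulative_sales(amounts)
    )
  }
  do.call(rbind, rows)
}

head(simulate_sales_cum(2, 3))
```

Because the helper is defined inside the outer function, it is not visible from the global environment, which keeps the cumulative logic local to the simulation.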
2.2 Apply conditional discounts based on sales amount
This design reflects a stepwise incentive system in which higher sales values correspond to higher discount rates, maintaining a consistent and scalable progression. Such an approach demonstrates the application of conditional logic to implement a structured pricing strategy that rewards increased sales performance.
Discount Tier Structure
| Sales Amount (Units/Value) | Discount Rate | Incentive Type |
|---|---|---|
| ≤ 250 | 12% | Basic Incentive |
| > 250 & ≤ 500 | 24% | Mid-Level Reward |
| > 500 | 36% | High Performer Bonus |
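The tier structure above can be sketched as a small vectorised helper. The function name discount_rate_for is hypothetical; it uses cut() with right-closed intervals so that 250 and 500 fall in the lower tier, matching the ≤ boundaries in the table.

```r
# Hypothetical helper mapping sales amounts to the three discount tiers
discount_rate_for <- function(sales_amount) {
  rates <- c(0.12, 0.24, 0.36)
  # Intervals: (-Inf, 250], (250, 500], (500, Inf)
  tier <- cut(
    sales_amount,
    breaks = c(-Inf, 250, 500, Inf),
    labels = FALSE,
    right = TRUE
  )
  rates[tier]
}

discount_rate_for(c(100, 250, 250.01, 500, 800))
# 0.12 0.12 0.24 0.24 0.36
```

An if / else if chain inside a loop, as used in the simulation code, gives the same result one value at a time.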
2.3 Loop per salesperson
# Load library
library(DT)
# Function to simulate sales data (decimal values)
simulate_sales <- function(n_salesperson, days) {
  set.seed(123)
  data_list <- list()
  counter <- 1
  # Loop per salesperson
  for (sp in 1:n_salesperson) {
    # Loop per day
    for (d in 1:days) {
      # Generate decimal sales amount (100 - 800)
      sales_amount <- round(runif(1, min = 100, max = 800), 2)
      data_list[[counter]] <- data.frame(
        sales_id = paste0("SP_", sp),
        day = d,
        sales_amount = sales_amount
      )
      counter <- counter + 1
    }
  }
  # Combine all data
  sales_data <- do.call(rbind, data_list)
  return(sales_data)
}
# Generate data (8 salespersons, 7 days)
sales_data <- simulate_sales(8, 7)
# Show interactive table
datatable(sales_data)
2.3.1 Apply conditional discounts
# Load library
library(DT)
# Function to simulate sales data (with discount)
simulate_sales <- function(n_salesperson, days) {
  set.seed(123)
  data_list <- list()
  counter <- 1
  # Loop per salesperson
  for (sp in 1:n_salesperson) {
    # Loop per day
    for (d in 1:days) {
      # Generate decimal sales amount (100 - 800)
      sales_amount <- round(runif(1, min = 100, max = 800), 2)
      # Apply discount rules
      if (sales_amount <= 250) {
        discount_rate <- 0.12
      } else if (sales_amount <= 500) {
        discount_rate <- 0.24
      } else {
        discount_rate <- 0.36
      }
      # Convert to percentage format
      discount_label <- paste0(discount_rate * 100, "%")
      data_list[[counter]] <- data.frame(
        sales_id = paste0("SP_", sp),
        day = d,
        sales_amount = sales_amount,
        discount_rate = discount_label
      )
      counter <- counter + 1
    }
  }
  # Combine all data
  sales_data <- do.call(rbind, data_list)
  return(sales_data)
}
# Generate data
sales_data <- simulate_sales(8, 7)
# Show interactive table
datatable(sales_data)
2.4 Summary stats and cumulative sales plot
2.4.1 Summary stats
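One possible way to produce per-salesperson summary statistics is sketched below with base R's aggregate(). The data frame demo_sales is a small stand-in generated only for this example (3 salespersons, 7 days), and the choice of statistics (total, mean, minimum, maximum) is an assumption rather than the practicum's required output.

```r
set.seed(123)
# Stand-in data with the same columns as the simulated sales data
demo_sales <- data.frame(
  sales_id = rep(paste0("SP_", 1:3), each = 7),
  day = rep(1:7, times = 3),
  sales_amount = round(runif(21, min = 100, max = 800), 2)
)

# One row per salesperson with several statistics of sales_amount
summary_stats <- aggregate(
  sales_amount ~ sales_id,
  data = demo_sales,
  FUN = function(v) c(total = sum(v), mean = mean(v), min = min(v), max = max(v))
)
# aggregate() returns a matrix column here; flatten it into regular columns
summary_stats <- do.call(data.frame, summary_stats)
summary_stats
```

The same pattern should apply to the full sales_data object, since it has the same sales_id and sales_amount columns.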
2.4.2 Cumulative Sales Plot
2.4.2.1 Example
library(dplyr)
library(plotly)
# count cumulative sales
sales_cum <- sales_data %>%
arrange(sales_id, day) %>%
group_by(sales_id) %>%
mutate(cum_sales = cumsum(sales_amount)) %>%
ungroup()
# color
neon_colors <- c(
"#00FFFF", "#FF00FF", "#39FF14", "#FF073A",
"#FFD300", "#FF6EC7", "#00FFEF", "#B026FF"
)
# Interactive plot
fig <- plot_ly(
data = sales_cum,
x = ~day,
y = ~cum_sales,
color = ~sales_id,
colors = neon_colors,
type = "scatter",
mode = "lines+markers",
text = ~paste(
"Sales:", sales_id,
"<br>Day:", day,
"<br>Cumulative:", round(cum_sales, 2)
),
hoverinfo = "text"
) %>%
layout(
title = list(
text = "Cumulative Sales per Salesperson",
font = list(color = "#FFFFFF")
),
paper_bgcolor = "#000000",
plot_bgcolor = "#000000",
font = list(color = "#FFFFFF"),
xaxis = list(
title = "Day",
color = "#FFFFFF",
gridcolor = "#333333"
),
yaxis = list(
title = "Cumulative Sales",
color = "#FFFFFF",
gridcolor = "#333333"
),
legend = list(font = list(color = "#FFFFFF"))
)
fig
Interpretation
From the visualization, salespersons whose lines consistently remain above others can be identified as top performers, as they achieve higher total sales throughout the period, while those with lower cumulative curves indicate relatively weaker performance. Additionally, fluctuations in the steepness of the lines reveal variations in daily sales contributions—sharp increases suggest high sales on particular days, whereas smoother, gradual increases indicate stable but moderate performance. Overall, this plot is useful for comparing sales performance across individuals, identifying consistent high performers, and detecting patterns such as rapid growth phases or stagnation periods. It provides a clearer picture of long-term performance trends compared to daily sales data, making it valuable for performance evaluation and decision-making.
3 Multi-Level Performance Categorization
3.1 Loop Through Vector
3.1.1 Category
| Category | Range | Description |
|---|---|---|
| Excellent | ≥ 700 | Outstanding performance, far above target |
| Very Good | 500 – 699 | Very good, above expectations |
| Good | 300 – 499 | Good, meeting standard target |
| Average | 200 – 299 | Sufficient, within reasonable limits |
| Poor | < 200 | Below target, needs serious evaluation |
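The category boundaries in the table can also be applied without an explicit loop. The sketch below uses cut() with left-closed intervals so that values such as 200, 300, 500, and 700 land in the higher category, in line with the ≥ rules; the name categorize_cut is hypothetical.

```r
# Hypothetical vectorised counterpart of the loop-based categorization
categorize_cut <- function(sales_amount) {
  cut(
    sales_amount,
    breaks = c(-Inf, 200, 300, 500, 700, Inf),
    labels = c("Poor", "Average", "Good", "Very Good", "Excellent"),
    right = FALSE  # intervals are [lower, upper)
  )
}

cats <- categorize_cut(c(150, 200, 499.99, 500, 700))
as.character(cats)
# "Poor" "Average" "Good" "Very Good" "Excellent"

# Percentage per category, including categories with zero observations
round(100 * prop.table(table(cats)), 2)
```

Because cut() returns a factor with all five levels, empty categories still appear in the percentage table.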
3.2 Calculate Percentage per Category
library(dplyr)
library(DT)
# Categorization function
categorize_performance <- function(sales_amount) {
  categories <- c()
  for (i in sales_amount) {
    if (i >= 700) {
      categories <- c(categories, "Excellent")
    } else if (i >= 500) {
      categories <- c(categories, "Very Good")
    } else if (i >= 300) {
      categories <- c(categories, "Good")
    } else if (i >= 200) {
      categories <- c(categories, "Average")
    } else {
      categories <- c(categories, "Poor")
    }
  }
  return(categories)
}
# Apply to the data first (required before converting to a factor)
sales_data$performance_category <- categorize_performance(sales_data$sales_amount)
# Then convert to a factor so the categories keep a fixed order
sales_data$performance_category <- factor(
  sales_data$performance_category,
  levels = c("Excellent", "Very Good", "Good", "Average", "Poor")
)
# Summary table (.drop = FALSE keeps categories that have zero rows)
summary_table <- sales_data %>%
  count(performance_category, .drop = FALSE) %>%
  mutate(percentage = round(n / sum(n) * 100, 2))
# Display the interactive table
# (the export buttons need the DT "Buttons" extension to be enabled)
datatable(
  summary_table,
  extensions = 'Buttons',
  options = list(
    pageLength = 5,
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print')
  ),
  rownames = FALSE,
  caption = "Sales Performance Distribution"
)
3.3 Bar plot and pie chart of distribution
3.3.1 Bar plot
library(ggplot2)
library(plotly)
# Base ggplot
p <- ggplot(summary_table, aes(x = performance_category, y = n, fill = performance_category)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c(
"Excellent" = "#39FF14",
"Very Good" = "#00FFFF",
"Good" = "#FFD300",
"Average" = "#FF6EC7",
"Poor" = "#FF073A"
)) +
labs(
title = "Sales Performance Distribution",
x = "Performance Category",
y = "Count"
) +
theme_minimal() +
theme(
plot.background = element_rect(fill = "black", color = NA),
panel.background = element_rect(fill = "black"),
text = element_text(color = "white"),
axis.text = element_text(color = "white"),
legend.text = element_text(color = "white"),
legend.title = element_blank()
)
# Convert to interactive plot
ggplotly(p)
Interpretation
From the chart, the Good category has the tallest bar, meaning that most of the sales records fall in this range, followed by Excellent in second position. The Very Good and Average categories have bars of medium height, and the Poor category is the shortest, so only a few records show low sales. Note that each observation is one daily sales amount for one salesperson (8 salespersons × 7 days = 56 records), so the bars count daily results rather than individual salespeople. Overall, the distribution is concentrated in the Good-to-Excellent range, which points to healthy performance, although the days that fall in the Average category show room to move up to Good or Very Good.
3.3.2 Pie Chart
library(plotly)
# Interactive pie chart
plot_ly(
data = summary_table,
labels = ~performance_category,
values = ~n,
type = "pie",
marker = list(colors = c(
"#39FF14", # Excellent
"#00FFFF", # Very Good
"#FFD300", # Good
"#FF6EC7", # Average
"#FF073A" # Poor
))
) %>%
layout(
title = list(
text = "Sales Performance Proportion",
font = list(color = "#FFFFFF")
),
paper_bgcolor = "#000000",
plot_bgcolor = "#000000",
font = list(color = "#FFFFFF")
)
Interpretation
The pie chart illustrates the proportion of each sales performance category relative to the total dataset. Each slice represents a category, and its size reflects the percentage of records in that category. Larger slices indicate more common performance categories, while smaller slices show less frequent ones. Because the slices are based on record counts (n), the chart shows which categories account for the largest share of daily sales records, not which categories contribute the most sales value.
4 Multi Company Data Set Simulation
In this segment, we discuss a data generator with the signature generate_company_data(n_company, n_employees), implemented below as generate_hotel_data() because the simulated companies are hotels. It produces a dataset with the columns company_id, employee_id, salary, department, performance_score, and KPI_score. The function uses nested loops: an outer loop iterates over each company, and an inner loop iterates over each employee within that company. Inside the inner loop, conditional logic flags top performers with the rule KPI_score > 90 and stores the result in a top_performer column. After the data are generated, a summary per company is calculated, including average salary, average performance score, and maximum KPI score. Finally, a summary table and a bar chart visualize the comparison across companies.
4.1 Company Data
library(DT)
set.seed(123)
generate_hotel_data <- function(n_company = 3, n_employees = 20) {
  data_list <- list()
  counter <- 1
  departments <- c(
    "Front Office", "Housekeeping", "Food & Beverage",
    "Kitchen", "Maintenance", "HR", "Finance"
  )
  # Outer loop per company ("comp" avoids masking base::c())
  for (comp in 1:n_company) {
    # Inner loop per employee
    for (e in 1:n_employees) {
      salary <- round(runif(1, 3000, 10000), 0)
      performance_score <- round(runif(1, 60, 100), 1)
      KPI_score <- round(runif(1, 65, 100), 1)
      # Conditional logic: flag top performers
      if (KPI_score > 90) {
        top_status <- "Top Performer"
      } else {
        top_status <- "Regular"
      }
      data_list[[counter]] <- data.frame(
        company_id = paste0("Hotel_", comp),
        employee_id = paste0("H", comp, "_E", sprintf("%02d", e)),
        salary = salary,
        department = sample(departments, 1),
        performance_score = performance_score,
        KPI_score = KPI_score,
        top_performer = top_status # new column
      )
      counter <- counter + 1
    }
  }
  hotel_data <- do.call(rbind, data_list)
  return(hotel_data)
}
hotel_data <- generate_hotel_data(3, 20)
# Display the table
datatable(hotel_data)
4.2 Summary per company
library(dplyr)
library(DT)
# Summary per company
company_summary <- hotel_data %>%
  group_by(company_id) %>%
  summarise(
    avg_salary = round(mean(salary), 2),
    avg_performance = round(mean(performance_score), 2),
    max_KPI = max(KPI_score),
    total_employee = n(),
    top_performers = sum(top_performer == "Top Performer")
  )
# Display with DT
# (the export buttons need the DT "Buttons" extension to be enabled)
datatable(
  company_summary,
  extensions = 'Buttons',
  options = list(
    pageLength = 5,
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print')
  ),
  rownames = FALSE,
  caption = "Company Performance Summary"
)
4.3 Visualization
library(plotly)
plot_ly(
data = company_summary,
x = ~company_id,
y = ~avg_performance,
type = "bar",
marker = list(color = c("#00FFFF", "#FF00FF", "#39FF14")),
name = ~company_id
) %>%
layout(
title = "Average Performance Score by Company",
xaxis = list(title = "Company", color = "white"),
yaxis = list(title = "Average Performance Score", color = "white"),
plot_bgcolor = "black",
paper_bgcolor = "black",
font = list(color = "white")
)
Interpretation
The bar chart compares the average performance scores across the three companies. Each bar represents a company, and its height is the mean performance_score of that company's 20 employees. The company with the highest bar has the strongest average performance in this sample. Because every score is drawn from the same uniform distribution (60 to 100), the differences in bar height here reflect random variation rather than real differences in workforce quality. With real data, the same chart would make it easy to identify which company performs best overall and would support performance evaluation and strategic decision-making.
5 Monte Carlo Simulation: Pi & Probability
In this segment, we discuss the monte_carlo_pi(n_points) function, which performs a Monte Carlo simulation to estimate the value of π (pi) along with additional probability analysis. The function uses a loop that runs for a specified number of iterations (n_points). Within each iteration, random points are generated within a square bounding a unit circle. The function then counts how many points fall inside the circle and uses the ratio of points inside the circle to total points to compute an approximation of π. Additionally, the function performs probability analysis by calculating the probability of random points falling inside a specific sub-square region. Finally, the function produces a plot that visualizes the generated points, distinguishing between points that fall inside the circle and those outside, providing an intuitive understanding of the Monte Carlo method.
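The steps described above can be collected into a single function. This is a sketch only: the seed argument, the returned list, and the choice of the sub-square from 0.5 to 1 on both axes are assumptions made for the example, while the implementation sections below run the same logic as separate scripts.

```r
monte_carlo_pi <- function(n_points, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  inside_count <- 0
  sub_square_count <- 0
  for (i in seq_len(n_points)) {
    # Random point in the square [-1, 1] x [-1, 1]
    x <- runif(1, -1, 1)
    y <- runif(1, -1, 1)
    # Inside the unit circle?
    if (x^2 + y^2 <= 1) inside_count <- inside_count + 1
    # Inside the sub-square [0.5, 1] x [0.5, 1]?
    if (x >= 0.5 && x <= 1 && y >= 0.5 && y <= 1) {
      sub_square_count <- sub_square_count + 1
    }
  }
  list(
    pi_estimate = 4 * inside_count / n_points,
    inside_count = inside_count,
    sub_square_prob = sub_square_count / n_points
  )
}

res <- monte_carlo_pi(2750, seed = 123)
res$pi_estimate      # close to 3.14; exact value depends on the seed
res$sub_square_prob  # close to 0.0625
```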
5.1 Loop for iteration
5.2 Implementation
# Load DT library
library(DT)
# Monte Carlo simulation to estimate pi
set.seed(123) # for reproducibility
n_points <- 2750
inside_count <- 0
for (i in 1:n_points) {
x <- runif(1, -1, 1)
y <- runif(1, -1, 1)
if (x^2 + y^2 <= 1) {
inside_count <- inside_count + 1
}
}
pi_estimate <- 4 * inside_count / n_points
# Create a summary data frame
results_df <- data.frame(
Metric = c("Total points (n_points)",
"Points inside circle",
"Estimated π",
"Actual π",
"Error"),
Value = c(n_points,
inside_count,
round(pi_estimate, 6),
pi,
abs(round(pi_estimate - pi, 6)))
)
# Display as interactive table
datatable(results_df,
options = list(dom = 't',
pageLength = 5),
rownames = FALSE,
caption = "Monte Carlo Simulation Results for π Estimation")
5.3 Compute probability of random points falling in sub-square
library(DT)
# PROBABILITY OF RANDOM POINTS IN SUB-SQUARE
# Parameter
n_points <- 2750
# Sub-square boundaries (0.5 to 1)
x_min <- 0.5
x_max <- 1
y_min <- 0.5
y_max <- 1
# Initialize counter
sub_square_count <- 0
# Loop through iterations
for (i in 1:n_points) {
# Generate random point (x, y) between -1 and 1
x <- runif(1, -1, 1)
y <- runif(1, -1, 1)
# Check if point is inside the sub-square
if (x >= x_min && x <= x_max && y >= y_min && y <= y_max) {
sub_square_count <- sub_square_count + 1
}
}
# Calculate probability
probability <- sub_square_count / n_points
percentage <- probability * 100
# Create result table
result <- data.frame(
Statistic = c("Total Points", "Points Inside Sub-square", "Probability", "Percentage"),
Value = c(
format(n_points, big.mark = ","),
format(sub_square_count, big.mark = ","),
paste0(round(probability, 6)),
paste0(round(percentage, 4), "%")
)
)
# Display interactive table
datatable(
result,
options = list(
dom = 't',
paging = FALSE,
ordering = FALSE
),
rownames = FALSE,
caption = htmltools::tags$caption(
style = "caption-side: top; text-align: center; font-size: 16px; font-weight: bold;",
"Probability of Random Points in Sub-Square (0.5 to 1)"
)
)
5.4 Plot points inside vs outside circle
# PLOT POINTS INSIDE VS OUTSIDE CIRCLE
library(plotly)
# Parameter
n_points <- 2750
# Initialize vectors to store coordinates
x_coords <- numeric(n_points)
y_coords <- numeric(n_points)
inside_flag <- logical(n_points)
# Loop to generate random points
for (i in 1:n_points) {
x_coords[i] <- runif(1, -1, 1)
y_coords[i] <- runif(1, -1, 1)
# Check if point is inside the circle
if (x_coords[i]^2 + y_coords[i]^2 <= 1) {
inside_flag[i] <- TRUE
} else {
inside_flag[i] <- FALSE
}
}
# Calculate pi estimation
inside_count <- sum(inside_flag)
pi_estimate <- 4 * inside_count / n_points
# Create data frame
df <- data.frame(
x = x_coords,
y = y_coords,
status = ifelse(inside_flag, "Inside Circle", "Outside Circle")
)
# Create circle boundary points
theta <- seq(0, 2 * pi, length.out = 500)
circle <- data.frame(x = cos(theta), y = sin(theta))
# Create square boundary
square <- data.frame(
x = c(-1, 1, 1, -1, -1),
y = c(-1, -1, 1, 1, -1)
)
# GLOW IN THE DARK PLOT with Plotly
plot_ly() %>%
# Points inside circle (neon red)
add_trace(
data = df[df$status == "Inside Circle", ],
x = ~x, y = ~y,
type = "scatter",
mode = "markers",
marker = list(
size = 5,
color = "#ff2a2a",
opacity = 0.8,
line = list(color = "#ff0000", width = 1)
),
name = "Inside Circle",
hoverinfo = "text",
text = ~paste("Inside Circle\nx:", round(x, 4), "\ny:", round(y, 4))
) %>%
# Points outside circle (neon blue)
add_trace(
data = df[df$status == "Outside Circle", ],
x = ~x, y = ~y,
type = "scatter",
mode = "markers",
marker = list(
size = 3,
color = "#00bfff",
opacity = 0.5,
line = list(color = "#0088ff", width = 0.5)
),
name = "Outside Circle",
hoverinfo = "text",
text = ~paste("Outside Circle\nx:", round(x, 4), "\ny:", round(y, 4))
) %>%
# Circle boundary (neon green glow)
add_trace(
data = circle,
x = ~x, y = ~y,
type = "scatter",
mode = "lines",
line = list(
color = "#39ff14",
width = 3,
dash = "solid"
),
name = "Circle Boundary (x² + y² = 1)",
hoverinfo = "none"
) %>%
# Square boundary (neon purple)
add_trace(
data = square,
x = ~x, y = ~y,
type = "scatter",
mode = "lines",
line = list(
color = "#bf00ff",
width = 2,
dash = "dash"
),
name = "Square Boundary (-1 to 1)",
hoverinfo = "none"
) %>%
# Layout with dark theme (glow in the dark)
layout(
title = list(
text = paste("MONTE CARLO PI ESTIMATION\nn =", n_points, "| Estimated Pi =", round(pi_estimate, 6), "| Actual Pi =", round(pi, 6)),
font = list(color = "#ffffff", size = 16, family = "Arial"),
x = 0.5,
xanchor = "center"
),
paper_bgcolor = "#0a0a0a",
plot_bgcolor = "#050505",
xaxis = list(
title = list(text = "X AXIS", font = list(color = "#ffffff")),
range = c(-1.15, 1.15),
gridcolor = "#1a1a1a",
zerolinecolor = "#333333",
tickfont = list(color = "#cccccc"),
showgrid = TRUE,
gridwidth = 0.5
),
yaxis = list(
title = list(text = "Y AXIS", font = list(color = "#ffffff")),
range = c(-1.15, 1.15),
gridcolor = "#1a1a1a",
zerolinecolor = "#333333",
tickfont = list(color = "#cccccc"),
showgrid = TRUE,
gridwidth = 0.5,
scaleanchor = "x",
scaleratio = 1
),
legend = list(
title = list(text = "LEGEND", font = list(color = "#ffffff")),
font = list(color = "#ffffff"),
bgcolor = "rgba(0,0,0,0.7)",
bordercolor = "#39ff14",
borderwidth = 1,
x = 0.02,
y = 0.98
),
hovermode = "closest",
annotations = list(
list(
text = paste("INSIDE:", inside_count, "| OUTSIDE:", n_points - inside_count),
x = 0.5,
y = -0.08,
xref = "paper",
yref = "paper",
showarrow = FALSE,
font = list(color = "#39ff14", size = 11, family = "monospace"),
bgcolor = "rgba(0,0,0,0.6)",
borderpad = 4
),
list(
text = "GLOW IN THE DARK MODE",
x = 0.5,
y = 1.05,
xref = "paper",
yref = "paper",
showarrow = FALSE,
font = list(color = "#bf00ff", size = 9, family = "monospace")
)
)
) %>%
config(
displayModeBar = TRUE,
modeBarButtonsToRemove = c("lasso2d", "select2d"),
displaylogo = FALSE,
toImageButtonOptions = list(format = "png", filename = "monte_carlo_glow")
)
Interpretation
Imagine throwing 2,750 darts at a square target that has a circle inscribed in it. Because the darts land uniformly, the share that lands inside the circle approaches the ratio of the circle's area (π) to the square's area (4), that is π/4 ≈ 78.5%, or roughly 2,160 of the 2,750 darts; the exact count varies with the random draw. Multiplying the observed share by 4 therefore gives an estimate of π close to 3.14. The 6.25% figure (about 172 darts) belongs to the sub-square from 0.5 to 1 in Section 5.3, whose area of 0.25 is 1/16 of the square's area. The more darts we throw, the more accurate the estimate tends to become, as the law of large numbers reduces random variation and pushes the estimate toward the true constant. This Monte Carlo method demonstrates how random sampling can be used to approximate a quantity such as π.
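As a rough check on the expected shares and on how the estimate behaves as the number of points grows, the sketch below uses a vectorised estimator. The function name estimate_pi and the chosen sample sizes are illustrative assumptions; individual errors depend on the random draw, so the error does not have to shrink at every step.

```r
set.seed(123)

# Vectorised estimator: share of points inside the unit circle, times 4
estimate_pi <- function(n) {
  x <- runif(n, -1, 1)
  y <- runif(n, -1, 1)
  4 * mean(x^2 + y^2 <= 1)
}

n_values <- c(100, 1000, 10000, 100000)
estimates <- sapply(n_values, estimate_pi)
data.frame(n = n_values, estimate = estimates, abs_error = abs(estimates - pi))

# Expected shares under uniform sampling on the square [-1, 1] x [-1, 1]
expected_inside_circle <- pi / 4        # about 0.785
expected_sub_square <- (0.5 * 0.5) / 4  # 0.0625
c(expected_inside_circle * 2750, expected_sub_square * 2750)
# about 2160 and 172 points out of 2,750
```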
6 Advanced Data Transformation & Feature Engineering
In this segment, we discuss Advanced Data Transformation & Feature Engineering, covering two key functions: normalize_columns(df, cols), which scales the selected numeric columns to a 0–1 range using (x - min)/(max - min), and z_score(df, cols), which standardizes them to a mean of 0 and a standard deviation of 1. New features are also created from the raw values through conditional logic: performance_category (Excellent, Very Good, Good, Average, Poor) from performance_score, and salary_bracket (Low, Medium, High) from salary, enabling more intuitive grouping and analysis. To assess the impact of the transformations, the section compares distributions before and after using boxplots and histograms, which reveal changes in scale, spread, outliers, and central tendency. This comparison checks that the transformations work as intended and prepares the dataset for machine learning or statistical modeling.
6.1 Loop-based normalization
In this segment, loop-based normalization is implemented within the normalize_columns(df) function to scale numeric columns to a uniform range, typically 0 to 1, using the formula (x - min)/(max - min). The function uses a loop to iterate through each column of the dataframe, checks if the column is numeric, and then applies the normalization formula to every value in that column. This approach provides full control over the transformation process, allowing for conditional handling of different data types and making the logic transparent and educational. By scaling all numeric features to the same range, loop-based normalization ensures that no single variable dominates analyses or machine learning models due to larger magnitude, while also preserving the relative relationships and distribution shape of the original data.
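A loop-based version of both transformations might look like the sketch below. The function names normalize_columns_loop and z_score_loop, the handling of constant columns, and the small demo data frame are assumptions for illustration; the code in the next section uses dplyr's across() on selected columns instead.

```r
# Min-max normalization: loop over columns, transform only numeric ones
normalize_columns_loop <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      rng <- range(df[[col]], na.rm = TRUE)
      span <- rng[2] - rng[1]
      if (span == 0) {
        # A constant column has zero span; map it to 0 to avoid dividing by zero
        df[[col]] <- rep(0, nrow(df))
      } else {
        df[[col]] <- (df[[col]] - rng[1]) / span
      }
    }
  }
  df
}

# Z-score standardization with the same column loop
z_score_loop <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      df[[col]] <- (df[[col]] - mean(df[[col]], na.rm = TRUE)) /
        sd(df[[col]], na.rm = TRUE)
    }
  }
  df
}

demo <- data.frame(
  id = c("a", "b", "c"),
  salary = c(3000, 6000, 9000),
  score = c(60, 75, 90)
)
normalize_columns_loop(demo)$salary  # 0.0 0.5 1.0
z_score_loop(demo)$score             # -1 0 1
```

Non-numeric columns such as id pass through unchanged, which is the conditional handling of data types mentioned above.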
6.2 New Features
library(DT)
set.seed(123)
generate_hotel_data <- function(n_employees = 120) {
departments <- c(
"Front Office", "Housekeeping", "Food & Beverage",
"Kitchen", "Maintenance", "HR", "Finance"
)
data_list <- vector("list", n_employees)
for (e in seq_len(n_employees)) {
# Salary
salary <- round(rlnorm(1, meanlog = 8.7, sdlog = 0.35))
# Mild outlier
if (runif(1) < 0.03) {
salary <- salary * runif(1, 1.3, 1.8)
}
# Clip the salary so extreme values do not distort the Z-score
salary <- min(max(salary, 2500), 18000)
# Scores
performance_score <- round(runif(1, 55, 100), 1)
KPI_score <- round(runif(1, 60, 100), 1)
# Performance Category
perf_cat <- if (performance_score >= 90) {
"Excellent"
} else if (performance_score >= 80) {
"Very Good"
} else if (performance_score >= 70) {
"Good"
} else if (performance_score >= 60) {
"Average"
} else {
"Poor"
}
# Salary Bracket
sal_bracket <- if (salary >= 12000) {
"High"
} else if (salary >= 7000) {
"Medium"
} else {
"Low"
}
data_list[[e]] <- data.frame(
employee_id = sprintf("E%03d", e),
department = sample(departments, 1),
salary = salary,
performance_score = performance_score,
KPI_score = KPI_score,
performance_category = perf_cat,
salary_bracket = sal_bracket,
stringsAsFactors = FALSE
)
}
hotel_data <- do.call(rbind, data_list)
return(hotel_data)
}
hotel_data <- generate_hotel_data(120)
datatable(
hotel_data,
options = list(pageLength = 10, autoWidth = TRUE),
rownames = FALSE,
caption = "Hotel Employee Dataset (Realistic & Stat-Safe)"
)
library(DT)
library(dplyr)
set.seed(123)
# Function to generate the hotel data
generate_hotel_data <- function(n_employees = 120) {
departments <- c(
"Front Office", "Housekeeping", "Food & Beverage",
"Kitchen", "Maintenance", "HR", "Finance"
)
data_list <- vector("list", n_employees)
for (e in seq_len(n_employees)) {
# Realistic Salary (controlled skew)
salary <- round(rlnorm(1, meanlog = 8.7, sdlog = 0.35))
if (runif(1) < 0.03) salary <- salary * runif(1, 1.3, 1.8)
salary <- min(max(salary, 2500), 18000)
# Scores
performance_score <- round(runif(1, 55, 100), 1)
KPI_score <- round(runif(1, 60, 100), 1)
# Performance Category
perf_cat <- case_when(
performance_score >= 90 ~ "Excellent",
performance_score >= 80 ~ "Very Good",
performance_score >= 70 ~ "Good",
performance_score >= 60 ~ "Average",
TRUE ~ "Poor"
)
# Salary Bracket
sal_bracket <- case_when(
salary >= 12000 ~ "High",
salary >= 7000 ~ "Medium",
TRUE ~ "Low"
)
data_list[[e]] <- data.frame(
employee_id = sprintf("E%03d", e),
department = sample(departments, 1),
salary = salary,
performance_score = performance_score,
KPI_score = KPI_score,
performance_category = perf_cat,
salary_bracket = sal_bracket,
stringsAsFactors = FALSE
)
}
hotel_data <- do.call(rbind, data_list)
return(hotel_data)
}
# Min-max normalization function
normalize_columns <- function(df, cols) {
  df %>%
    mutate(across(all_of(cols), ~ (. - min(.)) / (max(.) - min(.)), .names = "{.col}_norm"))
}
# Z-score function
z_score <- function(df, cols) {
  df %>%
    mutate(across(all_of(cols), ~ as.numeric(scale(.)), .names = "{.col}_z"))
}
# Generate the data and apply the transformations
hotel_data <- generate_hotel_data(120)
# Normalize the salary and performance_score columns
hotel_data_norm <- normalize_columns(hotel_data, c("salary", "performance_score"))
# Z-score the salary and performance_score columns
hotel_data_final <- z_score(hotel_data_norm, c("salary", "performance_score"))
# Display the data and a summary
datatable(
  hotel_data_final,
  options = list(pageLength = 10, autoWidth = TRUE),
  rownames = FALSE,
  caption = "Hotel Employee Dataset with Normalization & Z-Score"
)
summary(hotel_data_final[, c("salary_norm", "performance_score_norm", "salary_z", "performance_score_z")])
##   salary_norm     performance_score_norm    salary_z       performance_score_z
##  Min.   :0.0000   Min.   :0.0000         Min.   :-1.8084   Min.   :-1.7187
##  1st Qu.:0.2350   1st Qu.:0.2709         1st Qu.:-0.6789   1st Qu.:-0.7895
##  Median :0.3496   Median :0.4707         Median :-0.1282   Median :-0.1043
##  Mean   :0.3763   Mean   :0.5011         Mean   : 0.0000   Mean   : 0.0000
##  3rd Qu.:0.4977   3rd Qu.:0.7619         3rd Qu.: 0.5834   3rd Qu.: 0.8946
##  Max.   :1.0000   Max.   :1.0000         Max.   : 2.9978   Max.   : 1.7114
6.3 Compare Distribution Before and After
6.3.1 Performance Score
library(plotly)
fig_box_perf <- plot_ly(data = hotel_data_final) %>%
add_trace(
y = ~performance_score,
type = "box",
name = "Before",
boxpoints = "outliers",
marker = list(color = "#00FFFF"),
line = list(color = "#00FFFF")
) %>%
add_trace(
y = ~performance_score_norm,
type = "box",
name = "After Normalization",
boxpoints = "outliers",
marker = list(color = "#FF00FF"),
line = list(color = "#FF00FF")
) %>%
layout(
title = list(
text = "Performance Score: Before vs After Normalization",
font = list(color = "#FFFFFF")
),
# dark background
paper_bgcolor = "#0A0A0A",
plot_bgcolor = "#0A0A0A",
# axis
yaxis = list(
title = "Value",
color = "#FFFFFF",
gridcolor = "#333333"
),
xaxis = list(
color = "#FFFFFF"
),
# legend
legend = list(
font = list(color = "#FFFFFF")
),
boxmode = "group"
)
fig_box_perf
Interpretation
The boxplot comparison shows original performance scores ranging from 55 to 100, with median, IQR, and outliers reflecting natural variation. After min-max normalization, all values are rescaled to 0–1 while preserving the exact distribution shape, including median position and data spread. This confirms that normalization only changes the scale, not the underlying distribution or relationships within the dataset. Outliers remain visible after normalization, now expressed within the 0–1 scale, making the data ready for cross-variable comparison or machine learning applications.
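The claim that min-max normalization changes only the scale can be checked numerically. The sketch below is a minimal illustration in base R, not the code used for the figures: the simulated `scores` vector, the seed, and the helper `min_max()` are assumptions introduced here, and the helper adds a guard for constant columns that `normalize_columns()` does not have.

```r
# Minimal sketch: min-max scaling is a linear map, so ranks and shape are preserved.
set.seed(42)                                  # hypothetical seed, for reproducibility only
scores <- round(runif(120, 55, 100), 1)       # simulated stand-in for performance_score

# Hypothetical helper (not from the practicum code)
min_max <- function(v) {
  rng <- range(v, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0, length(v)))  # guard: avoid division by zero
  (v - rng[1]) / (rng[2] - rng[1])
}

scores_norm <- min_max(scores)

range(scores_norm)                            # lower and upper bound after rescaling
cor(scores, scores_norm)                      # correlation with the original values
identical(rank(scores), rank(scores_norm))    # ordering of observations unchanged
```

A correlation of 1 (up to floating-point error) and identical ranks are what a pure change of scale should produce.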
6.3.2 Salary
library(plotly)
fig_salary_z <- plot_ly(hotel_data_final) %>%
add_trace(
x = ~salary,
type = "histogram",
name = "Before (Original)",
opacity = 0.7,
marker = list(color = "cyan", line = list(color = "white", width = 1)),
hovertemplate = "Salary: %{x}<br>Frequency: %{y}<extra></extra>"
) %>%
add_trace(
x = ~salary_z,
type = "histogram",
name = "After Z-Score",
opacity = 0.7,
marker = list(color = "magenta", line = list(color = "white", width = 1)),
hovertemplate = "Z-Score: %{x}<br>Frequency: %{y}<extra></extra>"
) %>%
layout(
title = list(
text = "Salary Distribution: Before vs After Z-Score Normalization",
font = list(color = "white", size = 18, family = "Arial Black")
),
barmode = "overlay",
plot_bgcolor = "black",
paper_bgcolor = "black",
xaxis = list(
title = list(text = "Value", font = list(color = "white")),
tickfont = list(color = "white"),
gridcolor = "gray30",
zerolinecolor = "gray50"
),
yaxis = list(
title = list(text = "Frequency", font = list(color = "white")),
tickfont = list(color = "white"),
gridcolor = "gray30",
zerolinecolor = "gray50"
),
legend = list(
font = list(color = "white", size = 12),
bgcolor = "rgba(0,0,0,0.6)",
bordercolor = "white",
borderwidth = 1
),
hoverlabel = list(bgcolor = "white", font = list(color = "black"))
) %>%
config(displayModeBar = TRUE)
fig_salary_z
Interpretation
The original salary histogram shows a right-skewed distribution concentrated between IDR 3–7 million, with a peak at IDR 4.5 million and a few extreme values above IDR 20 million acting as clear outliers. After Z-score normalization, the same distribution shifts to center at zero with a standard deviation of one, preserving the exact right-skewed shape and the relative position of each salary value. The outliers originally above IDR 20 million now appear as Z-scores above +2.8, making them instantly recognizable as statistical outliers without any loss of information. Thus, Z-score normalization does not alter the data’s structure or hide extreme values but merely rescales the salary into standardized units, enabling direct comparison with other normalized variables like performance or tenure.
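The same kind of check applies to the z-score. The following sketch uses simulated right-skewed salaries (an assumption; the parameters are borrowed from the `rlnorm()` call used later in the mini project) and a hypothetical moment-based `skewness()` helper to show that standardization leaves the skew unchanged while setting the mean to 0 and the standard deviation to 1.

```r
# Minimal sketch: z-scores recenter and rescale, but keep the distribution's shape.
set.seed(42)                                          # hypothetical seed
salary <- rlnorm(120, meanlog = 8.7, sdlog = 0.35)    # simulated right-skewed salaries
salary_z <- as.numeric(scale(salary))                 # same call as in z_score()

# Hypothetical helper: moment-based sample skewness
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

c(mean = mean(salary_z), sd = sd(salary_z))           # approximately 0 and 1
c(before = skewness(salary), after = skewness(salary_z))  # the two values agree

# Observations far from the mean are easy to flag on the standardized scale
which(abs(salary_z) > 2.5)
```

The cutoff of 2.5 is only an illustrative threshold; the appropriate value depends on the analysis.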
7 Mini Project: Company KPI Dashboard & Simulation
This mini project generates a synthetic dataset for 5–10 companies with 50–200 employees each; the code below uses 7 hotel companies with 100 employees each. Columns include employee_id, company_id, salary, performance_score, KPI_score, and department. A loop builds the data company by company, and each employee is then assigned a KPI tier based on KPI_score (labeled Excellent, Good, Average, and Low in the code of section 7.3).
The project summarizes each company by calculating average salary, average KPI score, and identifying top performers. Output includes tables for top performers and department analysis, plus salary distribution plots. Advanced visualizations feature grouped bar charts and scatter plots with regression lines to explore relationships between salary and KPI.
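As a small illustration of tiering with an explicit loop, the sketch below assigns a tier to each score one element at a time and compares the result with a vectorized `cut()` call. The cutoffs (90, 80, 70) and the labels follow the `fifelse()` code in section 7.3; the helper name `assign_kpi_tier()` and the ten simulated scores are assumptions made for this example.

```r
# Hypothetical helper: assign a KPI tier to each score with an explicit loop.
assign_kpi_tier <- function(scores) {
  tiers <- character(length(scores))
  for (i in seq_along(scores)) {
    s <- scores[i]
    if (s >= 90) {
      tiers[i] <- "Excellent"
    } else if (s >= 80) {
      tiers[i] <- "Good"
    } else if (s >= 70) {
      tiers[i] <- "Average"
    } else {
      tiers[i] <- "Low"
    }
  }
  tiers
}

set.seed(123)
kpi_score <- round(runif(10, 60, 100), 1)   # small simulated sample
kpi_tier  <- assign_kpi_tier(kpi_score)

# Vectorized equivalent; right = FALSE gives intervals [70, 80), [80, 90), [90, Inf)
kpi_tier_cut <- as.character(cut(
  kpi_score,
  breaks = c(-Inf, 70, 80, 90, Inf),
  labels = c("Low", "Average", "Good", "Excellent"),
  right  = FALSE
))

data.frame(kpi_score, kpi_tier, kpi_tier_cut)
```

The loop makes the decision rule explicit, while the vectorized form is usually preferable on larger tables.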
7.1 Generate Data
library(data.table)
set.seed(123)
# Parameters
n_companies <- 7
n_employees <- 100
departments <- c(
"Front Office", "Housekeeping", "Food & Beverage",
"Kitchen", "Maintenance", "HR", "Finance"
)
# Container list, one element per company
all_data <- vector("list", n_companies)
# Loop over companies
for (c in 1:n_companies) {
company_id <- paste0("HOTEL_", sprintf("%02d", c))
# Build all employees of this company at once as a data.table
dt <- data.table(
employee_id = paste0(company_id, "_E", sprintf("%03d", 1:n_employees)),
company_id = company_id,
# Salary (skewed)
salary = pmin(
pmax(round(rlnorm(n_employees, meanlog = 8.7, sdlog = 0.35)), 2500),
20000
),
performance_score = round(runif(n_employees, 55, 100), 1),
KPI_score = round(runif(n_employees, 60, 100), 1),
department = sample(departments, n_employees, replace = TRUE)
)
all_data[[c]] <- dt
}
# Combine all companies into one table
hotel_multi_dt <- rbindlist(all_data)
# View the result
hotel_multi_dt
7.2 Summarize per company
library(data.table)
summary_company <- hotel_multi_dt[, .(
avg_salary = round(mean(salary), 2),
avg_KPI = round(mean(KPI_score), 2),
top_performers = sum(performance_score >= 90)
), by = company_id][order(company_id)]
summary_company
7.3 Loop to categorize employees into KPI tiers
library(data.table)
set.seed(123)
# Parameters
n_companies <- 7
n_employees <- 100
departments <- c(
"Front Office", "Housekeeping", "Food & Beverage",
"Kitchen", "Maintenance", "HR", "Finance"
)
# Container list, one element per company
all_data <- vector("list", n_companies)
# Loop over companies
for (c in 1:n_companies) {
company_id <- paste0("HOTEL_", sprintf("%02d", c))
# Build all employees of this company at once as a data.table
dt <- data.table(
employee_id = paste0(company_id, "_E", sprintf("%03d", 1:n_employees)),
company_id = company_id,
# Salary (skewed)
salary = pmin(
pmax(round(rlnorm(n_employees, meanlog = 8.7, sdlog = 0.35)), 2500),
20000
),
performance_score = round(runif(n_employees, 55, 100), 1),
KPI_score = round(runif(n_employees, 60, 100), 1),
department = sample(departments, n_employees, replace = TRUE)
)
all_data[[c]] <- dt
}
# Combine all companies into one table
hotel_multi_dt <- rbindlist(all_data)
# Add the KPI_tier column (cutoffs at 90, 80, and 70)
hotel_multi_dt[, KPI_tier := fifelse(
KPI_score >= 90, "Excellent",
fifelse(
KPI_score >= 80, "Good",
fifelse(
KPI_score >= 70, "Average",
"Low"
)
)
)]
# View the result
hotel_multi_dt
7.4 Visualization
7.4.1 Barchart Top Performance per Company
library(plotly)
library(dplyr)
top_perf <- hotel_multi_dt %>%
group_by(company_id) %>%
summarise(top_performers = sum(performance_score >= 90))
plot_ly(
top_perf,
x = ~company_id,
y = ~top_performers,
type = 'bar',
marker = list(
color = "#00FFFF",
line = list(color = "#FF00FF", width = 2) # glow outline
)
) %>%
layout(
title = list(
text = "Top Performers per Company",
font = list(color = "#FF00FF", size = 20)
),
# Black background
plot_bgcolor = "#000000",
paper_bgcolor = "#000000",
# Neon font color
font = list(color = "#00FFFF"),
# Grid and axis styling
xaxis = list(
title = "Company",
gridcolor = "#222222",
zerolinecolor = "#444444",
tickfont = list(color = "#00FFFF")
),
yaxis = list(
title = "Top Performers",
gridcolor = "#222222",
zerolinecolor = "#444444",
tickfont = list(color = "#00FFFF")
)
)
Interpretation
Through the bar chart, differences in the distribution of top performers across companies can be observed. Companies with taller bars indicate that they have more high-performing employees, which may reflect effective human resource management, a good performance evaluation system, or a work environment that supports productivity. Conversely, companies with fewer top performers may indicate potential areas for improvement in employee performance management, such as training, work motivation, or a suboptimal assessment system.
7.4.2 Histogram Salary Distribution
library(plotly)
p <- plot_ly(
data = hotel_multi_dt,
x = ~salary,
type = "histogram",
marker = list(color = "#39FF14")
)
p <- p %>%
layout(
title = list(text = "Salary Distribution"),
plot_bgcolor = "#000000",
paper_bgcolor = "#000000",
font = list(color = "#39FF14"),
xaxis = list(title = "Salary"),
yaxis = list(title = "Count")
)
p
Interpretation
The histogram indicates that most employees are concentrated within a certain salary range, suggesting that the company has a dominant pay level for the majority of its workforce. There are fewer employees in the higher salary ranges, which implies that high-paying positions are limited. If the distribution is skewed to the right, it shows that most employees earn lower to mid-level salaries, with only a small number receiving significantly higher pay. This pattern reflects a typical organizational structure where top-level positions are fewer compared to entry- and mid-level roles.
7.4.3 Scatter Plot Salary vs KPI Score with Regression Line
library(plotly)
# Neon colors for the departments
neon_colors <- c(
"#39FF14", # Neon green
"#FF0730", # Neon red
"#00FFFF", # Neon cyan
"#FF00FF", # Neon magenta
"#FFCC00", # Neon yellow
"#FF6600" # Neon orange
)
# Fit a linear regression model on the full dataset
model <- lm(KPI_score ~ salary, data = hotel_multi_dt)
hotel_multi_dt$fitted_kpi <- fitted(model)
p <- plot_ly(
data = hotel_multi_dt,
x = ~salary,
y = ~KPI_score,
color = ~department,
colors = neon_colors,
type = "scatter",
mode = "markers",
marker = list(
size = 8,
opacity = 0.8
)
) %>%
add_lines(
x = ~salary,
y = ~fitted_kpi,
line = list(color = "#FF0730", width = 3),
name = "Regression Line",
inherit = FALSE
) %>%
layout(
title = list(
text = "Salary vs KPI Score by Department with Regression Line",
font = list(color = "#00FFFF", size = 18)
),
plot_bgcolor = "#000000",
paper_bgcolor = "#000000",
font = list(color = "#39FF14"),
xaxis = list(
title = "Salary",
gridcolor = "#333333",
tickfont = list(color = "#FFCC00")
),
yaxis = list(
title = "KPI Score",
gridcolor = "#333333",
tickfont = list(color = "#FFCC00")
),
legend = list(
font = list(color = "#00FFFF"),
bgcolor = "rgba(0,0,0,0.7)",
bordercolor = "#39FF14",
borderwidth = 1
)
)
p
Interpretation
The scatter plot shows KPI score against salary, colored by department, with a single least-squares line fitted to all employees. Because the generation step draws KPI_score from a uniform distribution between 60 and 100 independently of salary, no systematic relationship is built into the data, so the regression line should be close to flat and the points should spread across the full KPI range at every salary level. Any small slope that appears reflects sampling noise rather than a real effect of pay on KPI. With real data, a clearly positive or negative slope would instead suggest that compensation and KPI outcomes move together and would merit a closer look by department.
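To go beyond reading the regression line by eye, the fitted model's slope and R-squared can be inspected directly. The sketch below is a self-contained illustration on simulated data: the data frame `sim` and its size of 700 rows are assumptions chosen to mirror the generation parameters in section 7.1, not the actual `hotel_multi_dt` table.

```r
# Minimal sketch: quantify the salary-KPI relationship behind the regression line.
set.seed(123)
n <- 700                                      # 7 companies x 100 employees (assumed)
sim <- data.frame(
  salary    = pmin(pmax(round(rlnorm(n, meanlog = 8.7, sdlog = 0.35)), 2500), 20000),
  KPI_score = round(runif(n, 60, 100), 1)     # drawn independently of salary
)

model <- lm(KPI_score ~ salary, data = sim)

slope     <- coef(model)[["salary"]]          # change in KPI per unit of salary
r_squared <- summary(model)$r.squared         # share of KPI variance explained
p_value   <- cor.test(sim$salary, sim$KPI_score)$p.value

c(slope = slope, r_squared = r_squared, p_value = p_value)
```

An R-squared near zero indicates that salary explains almost none of the variation in KPI score, which is the expected outcome when the two columns are generated independently.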