Student ID (NIM) = 52250040
Lecturer = Mr. Bakti Siregar, M.Sc., CDS.
This task develops a dynamic function to compute multiple mathematical formulas using nested loops and conditional statements. The function evaluates linear, quadratic, cubic, and exponential formulas for a range of x values. The results are generated iteratively, validated for correct input, and visualized to compare the behavior of each formula.
# keep the random results reproducible
set.seed(123)
# function
compute_formula <- function(x, formula) {
  # choose the formula type based on the input
  if (formula == "linear") {
    result <- x # linear formula
  } else if (formula == "quadratic") {
    result <- x^2 # quadratic formula
  } else if (formula == "cubic") {
    result <- x^3 # cubic formula
  } else if (formula == "exponential") {
    result <- 2^x # exponential formula
  } else {
    stop("Invalid formula input") # validation for an unknown formula name
  }
  return(result) # return the computed value
}
# nested loop
library(knitr)
library(kableExtra)
# list of formulas to compute
formulas <- c("linear", "quadratic", "cubic", "exponential")
# x values from 1 to 20
x_values <- 1:20
# empty data frame to store the results
results <- data.frame(
  x = numeric(),
  y = numeric(),
  formula = character()
)
# outer loop over formulas, inner loop over x values
for (f in formulas) {
  for (x in x_values) {
    y <- compute_formula(x, f)
    results <- rbind(results, data.frame(
      x = x,
      y = y,
      formula = f
    ))
  }
}
results %>%
head() %>%
knitr::kable(caption = "Preview of Results") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| x | y | formula |
|---|---|---|
| 1 | 1 | linear |
| 2 | 2 | linear |
| 3 | 3 | linear |
| 4 | 4 | linear |
| 5 | 5 | linear |
| 6 | 6 | linear |
# Plot
library(ggplot2)
library(plotly)
# build the plot
p <- ggplot(results, aes(x = x, y = y, color = formula)) +
  geom_line(linewidth = 1) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(size = 1.5) +
  labs(
    title = "Plot of Multiple Formulas",
    x = "x",
    y = "y"
  )
# make it interactive
ggplotly(p)
Interpretation
The plot compares four mathematical functions across values of x from 1 to 20.
The linear function increases at a constant rate. The quadratic and cubic functions grow faster than the linear function, with the cubic function increasing more rapidly than the quadratic.
The exponential function exhibits the fastest growth at higher x values. For x between 2 and 9 the cubic function is still larger, but from x = 10 onward the exponential overtakes it and reaches 1,048,576 at x = 20, compared with 8,000 for the cubic, producing a sharp upward curve that dwarfs the others.
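The comparison between cubic and exponential growth can be checked numerically. The following is a minimal base R sketch on the same grid of x values; the names `growth`, `crossover`, and `ratio_at_20` are illustrative and do not appear in the original code.

```r
# Compare cubic and exponential growth on the grid used in the task.
x <- 1:20
growth <- data.frame(x = x, cubic = x^3, exponential = 2^x)

# Smallest x >= 2 at which 2^x exceeds x^3.
# x = 1 is excluded because 2^1 > 1^3 holds before the cubic takes the lead.
crossover <- min(growth$x[growth$x >= 2 & growth$exponential > growth$cubic])
print(crossover)    # 10

# How many times larger the exponential is at the right edge of the range.
ratio_at_20 <- growth$exponential[20] / growth$cubic[20]
print(ratio_at_20)  # 131.072
```

Because the exponential is more than 100 times larger than the cubic at x = 20, the other three curves look almost flat on a linear y-axis.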
This task simulates sales data using nested loops and functions. Each salesperson generates daily sales values, with discounts applied based on sales amount. A nested function is used to compute cumulative sales, and results are summarized and visualized.
# sales simulation function
simulate_sales <- function(n_salesperson, days) {
  # nested function for cumulative sales
  cumulative_sales <- function(sales_vector) {
    return(cumsum(sales_vector))
  }
  # empty data frame to collect the results
  results <- data.frame(
    sales_id = integer(),
    day = integer(),
    sales_amount = numeric(),
    discount_rate = numeric(),
    salesperson = integer(),
    cumulative = numeric()
  )
  sales_id_counter <- 1
  # nested loop: salespeople (outer) and days (inner)
  for (sp in 1:n_salesperson) {
    sales_vec <- c()
    for (d in 1:days) {
      # generate a random sales amount
      sales_amount <- round(runif(1, 100, 1000), 2)
      # conditional discount
      if (sales_amount > 800) {
        discount <- 0.5
      } else if (sales_amount > 500) {
        discount <- 0.25
      } else {
        discount <- 0.1
      }
      sales_vec <- c(sales_vec, sales_amount)
      # store the row
      results <- rbind(results, data.frame(
        sales_id = sales_id_counter,
        day = d,
        sales_amount = sales_amount,
        discount_rate = discount,
        salesperson = sp,
        cumulative = NA # placeholder, filled in below
      ))
      sales_id_counter <- sales_id_counter + 1
    }
    # fill in the cumulative sales for this salesperson
    results$cumulative[results$salesperson == sp] <- cumulative_sales(sales_vec)
  }
  return(results)
}
# run the simulation
results_sales <- simulate_sales(4, 8)
results_sales %>%
head() %>%
knitr::kable(caption = "Preview Sales Data") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| sales_id | day | sales_amount | discount_rate | salesperson | cumulative |
|---|---|---|---|---|---|
| 1 | 1 | 358.82 | 0.1 | 1 | 358.82 |
| 2 | 2 | 809.47 | 0.5 | 1 | 1168.29 |
| 3 | 3 | 468.08 | 0.1 | 1 | 1636.37 |
| 4 | 4 | 894.72 | 0.5 | 1 | 2531.09 |
| 5 | 5 | 946.42 | 0.5 | 1 | 3477.51 |
| 6 | 6 | 141.00 | 0.1 | 1 | 3618.51 |
# summary
library(dplyr)
library(knitr)
library(kableExtra)
summary_table <- results_sales %>%
group_by(salesperson) %>%
summarise(
avg_sales = mean(sales_amount),
total_sales = sum(sales_amount),
min_sales = min(sales_amount),
max_sales = max(sales_amount)
)
summary_table %>%
knitr::kable(caption = "Summary Statistics per Salesperson") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| salesperson | avg_sales | total_sales | min_sales | max_sales |
|---|---|---|---|---|
| 1 | 637.1225 | 5096.98 | 141.00 | 946.42 |
| 2 | 625.5050 | 5004.04 | 192.63 | 961.15 |
| 3 | 638.6150 | 5108.92 | 137.85 | 994.84 |
| 4 | 640.4550 | 5123.64 | 232.40 | 966.72 |
Interpretation
The summary statistics table presents the overall sales performance for each salesperson. Average daily sales are close across salespeople (about 625.5 to 640.5), and totals differ by less than 120 (5,004.04 to 5,123.64), so individual performance levels are similar. The minimum and maximum values show the range of sales achieved: every salesperson ranges from below 250 to above 940, reflecting high day-to-day variability in each salesperson’s performance.
# plot
library(ggplot2)
library(plotly)
p <- ggplot(results_sales, aes(x = day, y = cumulative, color = as.factor(salesperson))) +
geom_line(linewidth = 1.2) +
geom_point(size = 2) +
labs(
title = "Cumulative Sales per Salesperson",
x = "Day",
y = "Cumulative Sales",
color = "Salesperson"
)
ggplotly(p)
Interpretation
The graph tracks cumulative sales over 8 days for four salespeople. All lines trend upward because every day adds a positive sales amount.
Salesperson 4 leads early (1,427.82 after Day 2), with Salesperson 1 and 2 close behind.
Salesperson 3 starts slowest (459.33 after Day 2) but catches up through strong sales from Day 4 to Day 8.
By the end, totals converge (5,004.04 to 5,123.64), meaning overall performance is fairly balanced.
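The nested loop above grows a data frame one row at a time. As an alternative, the same kind of dataset can be built without explicit loops. This is a sketch in base R under the same assumptions (uniform sales between 100 and 1000, the same discount tiers); the object name `sales` is illustrative, and this is not the approach used in the task itself.

```r
set.seed(123)
n_salesperson <- 4
days <- 8

# One row per (salesperson, day); `day` varies fastest, so rows are
# ordered by salesperson and then by day, as in the loop version.
sales <- expand.grid(day = seq_len(days), salesperson = seq_len(n_salesperson))
sales$sales_id <- seq_len(nrow(sales))
sales$sales_amount <- round(runif(nrow(sales), 100, 1000), 2)

# Tiered discount: > 800 gives 0.5, > 500 gives 0.25, otherwise 0.1.
sales$discount_rate <- ifelse(sales$sales_amount > 800, 0.5,
                       ifelse(sales$sales_amount > 500, 0.25, 0.1))

# Running total within each salesperson.
sales$cumulative <- ave(sales$sales_amount, sales$salesperson, FUN = cumsum)

head(sales)
```

The design trade-off is that the loop version makes the nested structure explicit, which is the point of the exercise, while the vectorized version avoids repeated `rbind()` calls, which become slow as the number of rows grows.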
This task categorizes sales performance into multiple levels using a function and loops. Each sales amount is assigned to a category based on predefined thresholds, and the distribution of categories is analyzed using percentages and visualized with bar and pie charts.
library(knitr)
library(dplyr)
library(kableExtra)
# function categorization
categorize_performance <- function(sales_amount) {
categories <- character(length(sales_amount))
# loop through the vector
for (i in seq_along(sales_amount)) {
value <- sales_amount[i]
# multi-level categorization
if (value >= 800) {
categories[i] <- "Excellent"
} else if (value >= 650) {
categories[i] <- "Very Good"
} else if (value >= 500) {
categories[i] <- "Good"
} else if (value >= 350) {
categories[i] <- "Average"
} else {
categories[i] <- "Poor"
}
}
return(categories)
}
# apply function
results_sales$performance <- categorize_performance(results_sales$sales_amount)
results_sales %>%
kable(caption = "Data After Categorization") %>%
kable_styling()
| sales_id | day | sales_amount | discount_rate | salesperson | cumulative | performance |
|---|---|---|---|---|---|---|
| 1 | 1 | 358.82 | 0.10 | 1 | 358.82 | Average |
| 2 | 2 | 809.47 | 0.50 | 1 | 1168.29 | Excellent |
| 3 | 3 | 468.08 | 0.10 | 1 | 1636.37 | Average |
| 4 | 4 | 894.72 | 0.50 | 1 | 2531.09 | Excellent |
| 5 | 5 | 946.42 | 0.50 | 1 | 3477.51 | Excellent |
| 6 | 6 | 141.00 | 0.10 | 1 | 3618.51 | Poor |
| 7 | 7 | 575.29 | 0.25 | 1 | 4193.80 | Good |
| 8 | 8 | 903.18 | 0.50 | 1 | 5096.98 | Excellent |
| 9 | 1 | 596.29 | 0.25 | 2 | 596.29 | Good |
| 10 | 2 | 510.95 | 0.25 | 2 | 1107.24 | Good |
| 11 | 3 | 961.15 | 0.50 | 2 | 2068.39 | Excellent |
| 12 | 4 | 508.00 | 0.25 | 2 | 2576.39 | Good |
| 13 | 5 | 709.81 | 0.25 | 2 | 3286.20 | Very Good |
| 14 | 6 | 615.37 | 0.25 | 2 | 3901.57 | Good |
| 15 | 7 | 192.63 | 0.10 | 2 | 4094.20 | Poor |
| 16 | 8 | 909.84 | 0.50 | 2 | 5004.04 | Excellent |
| 17 | 1 | 321.48 | 0.10 | 3 | 321.48 | Poor |
| 18 | 2 | 137.85 | 0.10 | 3 | 459.33 | Poor |
| 19 | 3 | 395.13 | 0.10 | 3 | 854.46 | Average |
| 20 | 4 | 959.05 | 0.50 | 3 | 1813.51 | Excellent |
| 21 | 5 | 900.59 | 0.50 | 3 | 2714.10 | Excellent |
| 22 | 6 | 723.52 | 0.25 | 3 | 3437.62 | Very Good |
| 23 | 7 | 676.46 | 0.25 | 3 | 4114.08 | Very Good |
| 24 | 8 | 994.84 | 0.50 | 3 | 5108.92 | Excellent |
| 25 | 1 | 690.14 | 0.25 | 4 | 690.14 | Very Good |
| 26 | 2 | 737.68 | 0.25 | 4 | 1427.82 | Very Good |
| 27 | 3 | 589.66 | 0.25 | 4 | 2017.48 | Good |
| 28 | 4 | 634.73 | 0.25 | 4 | 2652.21 | Good |
| 29 | 5 | 360.24 | 0.10 | 4 | 3012.45 | Average |
| 30 | 6 | 232.40 | 0.10 | 4 | 3244.85 | Poor |
| 31 | 7 | 966.72 | 0.50 | 4 | 4211.57 | Excellent |
| 32 | 8 | 912.07 | 0.50 | 4 | 5123.64 | Excellent |
Interpretation
The categorization process transforms raw sales values into five meaningful performance levels, making it easier to interpret, analyze, and compare sales outcomes across observations.
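The same thresholds can also be applied without a loop. The sketch below uses base R's `cut()`; the function name `categorize_with_cut` is hypothetical, and unlike the loop version it returns an ordered set of factor levels rather than a character vector.

```r
# Loop-free categorization with the same thresholds as the task.
categorize_with_cut <- function(sales_amount) {
  cut(
    sales_amount,
    breaks = c(-Inf, 350, 500, 650, 800, Inf),
    labels = c("Poor", "Average", "Good", "Very Good", "Excellent"),
    right = FALSE # intervals are [lower, upper), so 800 counts as "Excellent"
  )
}

# Values taken from the sales table, plus the boundary cases 800 and 350.
example <- c(358.82, 809.47, 141.00, 575.29, 709.81, 800, 350)
as.character(categorize_with_cut(example))
# "Average" "Excellent" "Poor" "Good" "Very Good" "Excellent" "Average"
```

Setting `right = FALSE` matters here: the loop uses `>=` comparisons, so each threshold must belong to the higher category.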
library(knitr)
library(kableExtra)
# compute frequency and percentage
freq_table <- table(results_sales$performance)
percent_table <- round(100 * freq_table / sum(freq_table), 2)
summary_performance <- data.frame(
Category = names(freq_table),
Count = as.vector(freq_table),
Percentage = as.vector(percent_table)
)
summary_performance %>%
knitr::kable(caption = "Performance Distribution (%)") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| Category | Count | Percentage |
|---|---|---|
| Average | 4 | 12.50 |
| Excellent | 11 | 34.38 |
| Good | 7 | 21.88 |
| Poor | 5 | 15.62 |
| Very Good | 5 | 15.62 |
Interpretation
This performance distribution table shows that:
The Excellent category has the highest count and percentage (11 sales, 34.38%), making it the largest single category, although it covers only about a third of all sales.
The Good category comes in second place (21.88%), indicating that a significant number of sales are at a solid but not yet optimal level.
The Poor and Very Good categories each account for 15.62%, indicating a balance between low and moderately high performance.
The Average category has the lowest percentage (12.50%), indicating that only a small portion of sales fall in the 350 to 500 range.
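To make the difference between "largest category" and "majority" concrete, the shares can be recomputed from the counts in the table. This is a small base R sketch; the vector `counts` is typed in by hand from the table above, and the other names are illustrative.

```r
# Counts copied from the performance distribution table.
counts <- c(Average = 4, Excellent = 11, Good = 7, Poor = 5, `Very Good` = 5)

# Percentage share of each category.
shares <- 100 * prop.table(counts)
round(shares, 2)

# The largest category, and whether it exceeds half of all sales.
largest <- names(which.max(shares))
is_majority <- max(shares) > 50
print(largest)      # "Excellent"
print(is_majority)  # FALSE
```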
# bar plot
summary_performance <- summary_performance %>%
arrange(desc(Count))
plot_ly(
summary_performance,
x = ~reorder(Category, -Count),
y = ~Count,
type = "bar",
text = ~Count,
textposition = "auto",
marker = list(
color = c("#FF9AA2", "#A0E7E5", "#B4F8C8", "#FFB7B2", "#B5EAD7")
)
) %>%
layout(
title = "Bar Chart Performance Distribution",
xaxis = list(title = "Category"),
yaxis = list(title = "Count")
)
Interpretation
This bar chart clearly shows the distribution of performance: the Excellent category stands out with the highest number of sales (11), while Average has the lowest (4). The other categories (Good, Poor, and Very Good) fall in the middle with more balanced counts. This chart emphasizes that Excellent is the largest category.
# pie chart
plot_ly(
summary_performance,
labels = ~Category,
values = ~Percentage,
type = "pie",
textinfo = "label+percent",
showlegend = FALSE
) %>%
layout(
title = "Pie Chart Performance Distribution"
)
Interpretation
This pie chart shows the proportions of each performance category. The Excellent category has the largest percentage, while Average has the smallest. Meanwhile, the Good, Poor, and Very Good categories fall in the middle with more balanced portions. This visualization emphasizes that the largest share of sales (about one third) is at the highest level, while the remainder is spread fairly evenly across the other categories.
This task simulates a multi-company dataset using a function with nested loops. Each company consists of multiple employees with randomly generated salary, department, performance score, and KPI score. Conditional logic is applied to identify top performers based on KPI scores above 90.
The dataset is then summarized by company, including average salary, average performance, and maximum KPI. Finally, the results are presented in a summary table and visualized using bar charts to compare performance across companies.
# function to simulate a multi-company dataset
generate_company_data <- function(n_company, n_employees) {
  # vectors that collect the data
  company_id <- c()
  employee_id <- c()
  salary <- c()
  department <- c()
  performance_score <- c()
  KPI_score <- c()
  departments <- c("HR", "Finance", "IT", "Marketing")
  # nested loop: companies (outer) and employees (inner)
  for (i in 1:n_company) {
    for (j in 1:n_employees) {
      company_id <- c(company_id, paste0("Company_", i))
      employee_id <- c(employee_id, paste0("Emp_", i, "_", j))
      salary <- c(salary, round(runif(1, 4000, 10000)))
      department <- c(department, sample(departments, 1))
      performance_score <- c(performance_score, round(runif(1, 60, 100), 1))
      KPI_score <- c(KPI_score, round(runif(1, 50, 100), 1))
    }
  }
  data.frame(
    Company = company_id,
    Employee = employee_id,
    Salary = salary,
    Department = department,
    Performance = performance_score,
    KPI = KPI_score
  )
}
library(dplyr)
library(knitr)
library(kableExtra)
# generate data
data_company <- generate_company_data(n_company = 3, n_employees = 50)
# summary per company
summary_company <- data_company %>%
group_by(Company) %>%
summarise(
Avg_Salary = round(mean(Salary), 2),
Avg_Performance = round(mean(Performance), 2),
Max_KPI = max(KPI)
)
# display the table
summary_company %>%
kable(
caption = "Summary Per Company",
align = "c"
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE,
position = "center"
) %>%
column_spec(1, bold = TRUE) %>%
row_spec(0, bold = TRUE)
| Company | Avg_Salary | Avg_Performance | Max_KPI |
|---|---|---|---|
| Company_1 | 6932.46 | 80.40 | 97.7 |
| Company_2 | 7056.90 | 78.07 | 99.6 |
| Company_3 | 7213.20 | 79.16 | 99.6 |
Interpretation
This table shows a comparison of three companies based on average salary, average performance, and maximum KPI.
Company_3 has the highest average salary (7,213.20), while Company_1 shows the highest average performance (80.40), so in this dataset a higher salary does not go together with better employee performance.
Because salary and performance are generated independently in the simulation, these differences reflect random variation; in real data, factors such as management quality, work environment, or employee engagement would have to be examined before drawing conclusions about what drives performance.
In terms of KPI, Company_2 and Company_3 achieve similarly high maximum scores, showing that top-performing individuals exist in multiple companies regardless of differences in average performance.
library(dplyr)
# flag top performers (KPI above 90)
data_company$Top_Performer <- ifelse(data_company$KPI > 90, "Yes", "No")
top_summary <- data_company %>%
group_by(Company, Top_Performer) %>%
summarise(Count = n(), .groups = "drop")
top_summary %>%
knitr::kable(caption = "Top Performer Count per Company") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| Company | Top_Performer | Count |
|---|---|---|
| Company_1 | No | 45 |
| Company_1 | Yes | 5 |
| Company_2 | No | 35 |
| Company_2 | Yes | 15 |
| Company_3 | No | 39 |
| Company_3 | Yes | 11 |
Interpretation
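Company_2 has the highest number of top performers (15 of 50), followed by Company_3 (11) and Company_1 (5), indicating a stronger concentration of high-KPI employees in Company_2. Because KPI scores are drawn at random from the same distribution for every company, this gap is best read as sampling variation rather than as evidence of more effective performance evaluation or KPI alignment.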
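How large a gap between companies is plausible by chance can be estimated with a binomial calculation. The sketch below assumes KPI follows Uniform(50, 100) as in `generate_company_data()` and ignores the small effect of rounding to one decimal; the variable names are illustrative.

```r
n_employees <- 50

# P(KPI > 90) under Uniform(50, 100), ignoring rounding to one decimal.
p_top <- (100 - 90) / (100 - 50)

# Expected count and standard deviation of top performers per company.
expected_top <- n_employees * p_top
sd_top <- sqrt(n_employees * p_top * (1 - p_top))

# Observed counts copied from the table above.
observed <- c(Company_1 = 5, Company_2 = 15, Company_3 = 11)

# Distance from the expectation in standard deviation units.
z <- (observed - expected_top) / sd_top
round(z, 2)
```

With an expected count of 10 and a standard deviation of about 2.8, all three observed counts lie within two standard deviations of the expectation.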
library(dplyr)
library(ggplot2)
library(plotly)
# make sure the summary exists and is sorted
summary_company <- summary_company %>%
arrange(desc(Avg_Salary))
# bar chart
p_salary <- ggplot(summary_company, aes(
x = reorder(Company, Avg_Salary),
y = Avg_Salary,
fill = Company
)) +
geom_bar(stat = "identity", show.legend = FALSE, width = 0.6) +
coord_flip() +
labs(
title = "Average Salary per Company",
x = "Company",
y = "Average Salary"
) +
theme_minimal() +
theme(
legend.position = "none",
plot.title = element_text(size = 14, face = "bold"),
axis.title = element_text(size = 11)
)
# make it interactive
ggplotly(p_salary)
Interpretation
The visualization clearly shows that Company_3 leads in compensation. However, this advantage does not directly translate into superior performance, reinforcing the idea that salary alone is not the primary driver of employee effectiveness.
library(dplyr)
library(ggplot2)
library(plotly)
# summary data
summary_company <- data_company %>%
group_by(Company) %>%
summarise(
Avg_Performance = round(mean(Performance), 2),
Max_KPI = max(KPI),
.groups = "drop"
)
# bar chart of average performance
p_perf <- ggplot(summary_company, aes(
x = Company,
y = Avg_Performance,
fill = Company
)) +
geom_bar(stat = "identity", width = 0.6) +
geom_text(aes(label = Avg_Performance),
vjust = -0.5, size = 3.5) +
labs(
title = "Average Performance per Company",
x = "Company",
y = "Average Performance"
) +
theme_minimal() +
scale_fill_manual(values = c(
"Company_1" = "#FF69B4",
"Company_2" = "#FFD700",
"Company_3" = "#00CED1"
)) +
theme(legend.position = "none")
ggplotly(p_perf)
Interpretation
This graph shows the average employee performance at each company. Company_1 has the highest average performance score (80.40, versus 78.07 and 79.16), indicating slightly higher average work quality. The differences between the companies are small, but the bar heights and labels make it easy to identify which company has the best average performance.
library(dplyr)
library(ggplot2)
library(plotly)
# summary data
summary_company <- data_company %>%
group_by(Company) %>%
summarise(
Avg_Performance = round(mean(Performance), 2),
Max_KPI = max(KPI),
.groups = "drop"
)
# Bar chart Max KPI
p_kpi <- ggplot(summary_company, aes(
x = Company,
y = Max_KPI,
fill = Company
)) +
geom_bar(stat = "identity", width = 0.6) +
geom_text(aes(label = Max_KPI),
vjust = -0.5, size = 3.5) +
labs(
title = "Maximum KPI per Company",
x = "Company",
y = "Max KPI"
) +
theme_minimal() +
scale_fill_manual(values = c(
"Company_1" = "#8A2BE2", # purple
"Company_2" = "#32CD32", # lime green
"Company_3" = "#FF4500" # bright orange
)) +
theme(legend.position = "none")
ggplotly(p_kpi)
Interpretation
This graph displays the highest KPI achievement for each company. Company_2 and Company_3 both achieved a higher maximum score (99.6) than Company_1 (97.7), indicating the presence of exceptional individuals in both companies. This visualization emphasizes that while average performance differs, peak KPI achievement can be similar across multiple companies.
Monte Carlo simulation is a random sampling-based computational method for estimating values or probabilities. Points are generated uniformly in the square \([-1,1] \times [-1,1]\), which has area 4, while the unit circle inside it has area \(\pi\). The ratio of points falling within the circle to the total number of points therefore approximates \(\pi/4\), and multiplying it by 4 approximates \(\pi\). The same method is used to calculate the probability of a point falling within a specific area, and the more random trials are performed, the closer the results tend to be to the theoretical value.
library(dplyr)
library(knitr)
library(kableExtra)
# Monte Carlo Simulation function
monte_carlo_pi <- function(n_points) {
# generate random points
x <- runif(n_points, -1, 1)
y <- runif(n_points, -1, 1)
# check if inside circle
inside <- (x^2 + y^2) <= 1
# estimate pi
pi_est <- 4 * sum(inside) / n_points
# probability of falling in sub-square (0 ≤ x ≤ 0.25, 0 ≤ y ≤ 0.25)
sub_square <- (x >= 0 & x <= 0.25 & y >= 0 & y <= 0.25)
prob_sub <- sum(sub_square) / n_points
# return list
return(list(
pi_estimate = pi_est,
prob_sub_square = prob_sub,
data = data.frame(x = x, y = y, inside = inside)
))
}
# run simulation
set.seed(123)
sim_results <- monte_carlo_pi(10000)
# build the summary table
summary_table <- data.frame(
Estimasi_Pi = sim_results$pi_estimate,
Prob_SubSquare = sim_results$prob_sub_square,
Jumlah_Titik = nrow(sim_results$data)
)
summary_table %>%
knitr::kable(caption = "Monte Carlo Simulation Results") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| Estimasi_Pi | Prob_SubSquare | Jumlah_Titik |
|---|---|---|
| 3.1576 | 0.0168 | 10000 |
Interpretation
The table shows the main results of the Monte Carlo simulation with 10,000 random points. The estimated value of π is 3.1576, which differs from the theoretical value of 3.14159 by about 0.016.
This demonstrates that Monte Carlo simulation is effective for approximating mathematical constants and probabilities, especially when analytical solutions are difficult to obtain.
The estimated probability of a point falling in the sub-square is 0.0168, close to the theoretical value of 0.0156, which is the ratio of the sub-square area (0.0625) to the total square area (4).
Both the \(\pi\) estimation and probability calculation rely on the same principle of random sampling, showing how Monte Carlo methods can be applied to different types of problems.
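Whether the reported estimates are "close enough" can be judged against their standard errors. The sketch below treats each estimate as a binomial proportion, which is an assumption layered on top of the original analysis; the estimates 3.1576 and 0.0168 are copied from the table, and the variable names are illustrative.

```r
n_points <- 10000

# Standard error of the pi estimate: 4 times the SE of a proportion with p = pi / 4.
p_circle <- pi / 4
se_pi <- 4 * sqrt(p_circle * (1 - p_circle) / n_points)

# Standard error of the sub-square probability, with p = 0.25^2 / 4.
p_sub <- 0.25^2 / 4
se_sub <- sqrt(p_sub * (1 - p_sub) / n_points)

# Errors of the estimates reported in the table.
error_pi <- 3.1576 - pi
error_sub <- 0.0168 - p_sub

round(c(se_pi = se_pi, error_pi = error_pi, se_sub = se_sub, error_sub = error_sub), 5)
```

Both errors are about one standard error in size, which is what random sampling with 10,000 points would be expected to produce.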
library(ggplot2)
library(plotly)
# plot visualize
p <- ggplot(sim_results$data, aes(x = x, y = y, color = inside)) +
geom_point(alpha = 0.6, size = 1.2) +
labs(
title = "Monte Carlo Simulation: Points Inside vs Outside Circle",
x = "X",
y = "Y",
color = "Inside Circle"
) +
theme_minimal()
ggplotly(p) %>% layout(showlegend = FALSE)
Interpretation
This interactive Monte Carlo graph displays random points in a \([-1,1] \times [-1,1]\) coordinate grid. The color of the points indicates whether they are inside the circle (inside = TRUE) or outside the circle (inside = FALSE). The even spread of points shows why the ratio of points inside the circle to total points approaches \(\pi/4\), so that four times this ratio approximates \(\pi\). This visualization reinforces the intuition that the more random points used, the more accurate and stable the estimate of \(\pi\) tends to be.
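The effect of the number of points can be explored by repeating the estimate for increasing sample sizes. This is a sketch in base R; the helper `estimate_pi` and the chosen sizes are illustrative and are not part of the original function `monte_carlo_pi()`.

```r
# Estimate pi from n_points uniform draws in the square [-1, 1] x [-1, 1].
estimate_pi <- function(n_points) {
  x <- runif(n_points, -1, 1)
  y <- runif(n_points, -1, 1)
  4 * mean(x^2 + y^2 <= 1) # mean of a logical vector is the fraction inside
}

set.seed(123)
sizes <- 10^(2:6)
estimates <- vapply(sizes, estimate_pi, numeric(1))

convergence <- data.frame(
  n_points = sizes,
  pi_estimate = estimates,
  abs_error = abs(estimates - pi)
)
print(convergence)
```

The error shrinks roughly in proportion to one over the square root of the number of points, so each additional digit of accuracy costs about 100 times more points. Individual runs can still move in the wrong direction by chance.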
This task performs advanced data transformations using min-max column normalization and z-score standardization, and creates new features for analysis. The goal is to see how the data distribution changes after the transformation and how additional features can aid classification or segmentation.
library(dplyr)
library(knitr)
library(kableExtra)
# example dummy data
df <- data.frame(
  performance = c(60, 75, 90, 55, 80, 95),
  salary = c(3.5, 4.2, 5.0, 3.0, 4.8, 6.0)
)
# normalization function (min-max)
normalize_columns <- function(df) {
  as.data.frame(lapply(df, function(x) (x - min(x)) / (max(x) - min(x))))
}
# z-score function
z_score <- function(df) {
  as.data.frame(lapply(df, function(x) (x - mean(x)) / sd(x)))
}
# apply the transformations
df_norm <- normalize_columns(df)
df_z <- z_score(df)
# create new features
df$performance_category <- ifelse(df$performance >= 80, "High",
  ifelse(df$performance >= 60, "Medium", "Low"))
df$salary_bracket <- cut(df$salary,
  breaks = c(0, 4, 5, 10),
  labels = c("Low", "Medium", "High"))
# combine all results into one data frame
output_table <- data.frame(
Performance_Original = df$performance,
Salary_Original = df$salary,
Performance_Normalized = df_norm$performance,
Salary_Normalized = df_norm$salary,
Performance_Zscore = df_z$performance,
Salary_Zscore = df_z$salary,
Performance_Category = df$performance_category,
Salary_Bracket = df$salary_bracket
)
# display a tidy table
output_table %>%
knitr::kable(caption = "Data Transformation & Feature Engineering Results") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "400px")
| Performance_Original | Salary_Original | Performance_Normalized | Salary_Normalized | Performance_Zscore | Salary_Zscore | Performance_Category | Salary_Bracket |
|---|---|---|---|---|---|---|---|
| 60 | 3.5 | 0.125 | 0.1666667 | -0.9931459 | -0.8446956 | Medium | Low |
| 75 | 4.2 | 0.500 | 0.4000000 | -0.0522708 | -0.1996553 | Medium | Medium |
| 90 | 5.0 | 0.875 | 0.6666667 | 0.8886042 | 0.5375336 | High | Medium |
| 55 | 3.0 | 0.000 | 0.0000000 | -1.3067709 | -1.3054387 | Low | Low |
| 80 | 4.8 | 0.625 | 0.6000000 | 0.2613542 | 0.3532364 | High | Medium |
| 95 | 6.0 | 1.000 | 1.0000000 | 1.2022293 | 1.4590197 | High | High |
Interpretation
The table displays the original data (performance, salary), the normalized results (values converted to the range [0,1]), the z-score results (mean 0, standard deviation 1), and new features in the form of performance categories (performance_category) and salary brackets (salary_bracket). The table shows:
Original Performance & Salary are still on a raw scale, so performance (55 to 95) is numerically much larger than salary (3.0 to 6.0).
Normalization puts all variables on the same scale (0–1), facilitating comparisons between variables.
Z-scores standardize the distribution so that the mean is close to 0 and variation is measured in standard deviation units.
New categories (High, Medium, Low) provide a more understandable qualitative context; for example, a salary of 3.5 is in the “Low” category, while 6.0 is in the “High” category.
This table visualization shows how preprocessing rescales the data without changing the relative patterns between values, while also adding a categorical dimension for segmentation analysis. This way, the data is better prepared for statistical analysis and machine learning.
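The claim that rescaling leaves the relative pattern unchanged can be verified directly. The sketch below also adds a guard for constant columns, where both formulas would otherwise divide by zero; the helpers `min_max` and `z_std` are hypothetical variants written for illustration, not the functions used in the task.

```r
# Min-max scaling with a guard for constant input (max == min).
min_max <- function(x) {
  rng <- range(x)
  if (rng[1] == rng[2]) return(rep(0, length(x))) # avoid 0 / 0
  (x - rng[1]) / (rng[2] - rng[1])
}

# Z-score standardization with a guard for zero standard deviation.
z_std <- function(x) {
  s <- sd(x)
  if (is.na(s) || s == 0) return(rep(0, length(x)))
  (x - mean(x)) / s
}

performance <- c(60, 75, 90, 55, 80, 95)
perf_norm <- min_max(performance)
perf_z <- z_std(performance)

print(perf_norm)                                        # 0.125 0.500 0.875 0.000 0.625 1.000
print(identical(order(performance), order(perf_norm)))  # TRUE: ranking is preserved
print(cor(performance, perf_z))                         # 1 up to floating point: a linear rescaling
print(min_max(c(5, 5, 5)))                              # 0 0 0 instead of NaN
```

Returning zeros for a constant column is one possible convention; depending on the analysis, returning `NA` or dropping the column may be more appropriate.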
library(ggplot2)
library(plotly)
# histogram performance original
p1 <- ggplot(df, aes(x = performance)) +
geom_histogram(fill = "skyblue", color = "black", bins = 5) +
labs(title = "Performance Distribution (Original)") +
theme_minimal()
ggplotly(p1)
Interpretation
# histogram performance normalized
p2 <- ggplot(df_norm, aes(x = performance)) +
geom_histogram(fill = "orange", color = "black", bins = 5) +
labs(title = "Performance Distribution (Normalized)") +
theme_minimal()
ggplotly(p2)
Interpretation
After normalization, the shape of the distribution remains the same, but the scale changes to [0,1]. This facilitates comparison between variables.
# boxplot salary original
p3 <- ggplot(df, aes(y = salary)) +
geom_boxplot(fill = "lightgreen") +
labs(title = "Salary (Original)") +
theme_minimal()
ggplotly(p3)
Interpretation
The original salary boxplot shows a salary range of 3.0–6.0, with a median of around 4.5. The values are still on the raw, unstandardized scale.
# boxplot salary z-score
p4 <- ggplot(df_z, aes(y = salary)) +
geom_boxplot(fill = "pink") +
labs(title = "Salary (Z-score)") +
theme_minimal()
ggplotly(p4)
Interpretation
After the z-score transformation, the mean shifts to 0 (the median lands slightly above it, at about 0.08), and the scale is measured in standard deviations. The shape of the distribution remains the same, but it is better suited for statistical analysis.
Summary
# summary before transformation
summary_original <- data.frame(
Variable = names(df[,1:2]),
Mean = sapply(df[,1:2], mean),
SD = sapply(df[,1:2], sd),
Min = sapply(df[,1:2], min),
Max = sapply(df[,1:2], max)
)
# summary after normalization
summary_norm <- data.frame(
Variable = names(df_norm),
Mean = sapply(df_norm, mean),
SD = sapply(df_norm, sd),
Min = sapply(df_norm, min),
Max = sapply(df_norm, max)
)
# summary after z-score
summary_z <- data.frame(
Variable = names(df_z),
Mean = sapply(df_z, mean),
SD = sapply(df_z, sd),
Min = sapply(df_z, min),
Max = sapply(df_z, max)
)
# combine into one table
summary_all <- rbind(
cbind(Method = "Original", summary_original),
cbind(Method = "Normalized", summary_norm),
cbind(Method = "Z-score", summary_z)
)
summary_all %>%
knitr::kable(caption = "Summary Statistics Before & After Transformation") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| | Method | Variable | Mean | SD | Min | Max |
|---|---|---|---|---|---|---|
| performance | Original | performance | 75.8333333 | 15.9426054 | 55.000000 | 95.000000 |
| salary | Original | salary | 4.4166667 | 1.0852035 | 3.000000 | 6.000000 |
| performance1 | Normalized | performance | 0.5208333 | 0.3985651 | 0.000000 | 1.000000 |
| salary1 | Normalized | salary | 0.4722222 | 0.3617345 | 0.000000 | 1.000000 |
| performance2 | Z-score | performance | 0.0000000 | 1.0000000 | -1.306771 | 1.202229 |
| salary2 | Z-score | salary | 0.0000000 | 1.0000000 | -1.305439 | 1.459020 |
Interpretation
This table shows how basic statistics change after transformation. The original data had different scales across variables (e.g., performance values were far larger than salary values). Normalization mapped all values into the range \([0,1]\), so \(Min = 0\) and \(Max = 1\), making it easier to compare variables with different scales. Z-scores standardized the data to a mean of \(0\) and a standard deviation of \(1\), putting both variables in standard deviation units for statistical analysis. This comparison confirms that transformation helps equalize the variables’ scales, facilitating multivariate analysis, and allowing new features to be added alongside the standardized data.
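For reference, the two transformations compared in the table can be written out as formulas. The symbols \(x'_i\), \(z_i\), \(\bar{x}\), and \(s\) are notation introduced here for illustration; \(s\) is the sample standard deviation with \(n-1\) in the denominator, which is what R's `sd()` computes.

```latex
\[
x'_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}, \qquad
z_i = \frac{x_i - \bar{x}}{s}, \qquad
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
\]

% Worked example for performance = 60, using the values from the summary table
% (min = 55, max = 95, mean = 75.8333, sd = 15.9426):
\[
x' = \frac{60 - 55}{95 - 55} = 0.125, \qquad
z = \frac{60 - 75.8333}{15.9426} \approx -0.9931
\]
```

Both results agree with the first row of the transformation table (0.125 and -0.9931459).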
This task builds simulated datasets for several companies, then creates KPI summaries, departmental analyses, salary distributions, and interactive visualizations. The goal is to practice data wrangling, feature engineering, and dashboarding skills.
library(dplyr)
library(ggplot2)
library(plotly)
library(DT)
library(htmltools)
set.seed(123)
# generate dataset: 5 companies, 100 employees each
generate_kpi_data <- function(n_company, n_employees) {
departments <- c("HR", "Finance", "IT", "Sales", "Marketing")
data <- data.frame()
for (i in 1:n_company) {
for (j in 1:n_employees) {
new_row <- data.frame(
employee_id = paste0("E", i, "_", j),
company_id = paste0("C", i),
salary = round(runif(1, 3000, 10000), 2),
performance_score = sample(50:100, 1),
KPI_score = sample(60:100, 1),
department = sample(departments, 1)
)
data <- rbind(data, new_row)
}
}
return(data)
}
df_kpi <- generate_kpi_data(5, 100)
# categorize KPI scores into tiers
categorize_kpi <- function(kpi_scores) {
tiers <- character(length(kpi_scores))
for (i in seq_along(kpi_scores)) {
if (kpi_scores[i] >= 90) {
tiers[i] <- "Excellent"
} else if (kpi_scores[i] >= 75) {
tiers[i] <- "Good"
} else {
tiers[i] <- "Average"
}
}
tiers <- factor(tiers, levels = c("Excellent", "Good", "Average"))
return(tiers)
}
df_kpi$KPI_tier <- categorize_kpi(df_kpi$KPI_score)
# summary per company (rounded to 2 decimals)
summary_df <- df_kpi %>%
group_by(company_id) %>%
summarise(
avg_salary = round(mean(salary), 2),
avg_KPI = round(mean(KPI_score), 2),
top_performers = sum(KPI_score > 90, na.rm = TRUE)
)
# employee data table
datatable(
df_kpi %>%
select(company_id, employee_id, salary, KPI_score, KPI_tier, performance_score, department),
rownames = FALSE,
options = list(
scrollX = FALSE,
autoWidth = FALSE,
dom = 'ftp',
columnDefs = list(
list(className = 'dt-center', targets = "_all"),
list(width = '12%', targets = 0),
list(width = '18%', targets = 1),
list(width = '14%', targets = 2),
list(width = '13%', targets = 3),
list(width = '13%', targets = 4),
list(width = '16%', targets = 5),
list(width = '14%', targets = 6)
)
),
class = "stripe hover compact",
width = "100%",
caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
'Employee Dataset with KPI Tiers'
)
)
# summary table
datatable(
summary_df,
rownames = FALSE,
options = list(
scrollX = FALSE,
autoWidth = FALSE,
dom = 't',
columnDefs = list(
list(className = 'dt-center', width = '25%', targets = "_all")
)
),
class = "stripe hover compact",
width = "100%",
caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
'Company KPI Summary'
)
)
Interpretation
The Company KPI Summary table compares five companies (C1–C5) across three metrics: average salary, average KPI score, and number of top performers.
Average Salary: All companies have similar salary levels, ranging between about 6,370 and 6,615. Company C4 shows the highest average salary (≈ 6,614), while C3 has the lowest (≈ 6,371).
Average KPI: The KPI averages are fairly close, between 78.9 and 82.0. Company C1 leads with the highest KPI average (≈ 82.04), while C4 has the lowest (≈ 78.86).
Top Performers: The number of employees with very high KPI scores varies. Company C1 has the most top performers (29), while C5 has the fewest (19).
Key insight:
Company C1 stands out as strong overall, with the highest KPI average and the largest pool of top performers.
Company C4 pays the highest average salary but has the lowest KPI average and fewer top performers, suggesting compensation alone doesn’t guarantee higher performance.
Company C5 has relatively modest salary and KPI averages, and the fewest top performers, indicating potential areas for improvement.
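The observed numbers can be compared with what the sampling scheme implies. The sketch below enumerates the possible KPI values `60:100` from `generate_kpi_data()` and applies the same tier thresholds as `categorize_kpi()`; the variable names are illustrative. Note that the summary counts top performers with `KPI_score > 90`, while the Excellent tier uses `>= 90`, so the two are not the same group.

```r
# All KPI values that sample(60:100, 1) can produce, each equally likely.
kpi_values <- 60:100

# Same thresholds as categorize_kpi(): >= 90 Excellent, >= 75 Good, else Average.
tier <- cut(
  kpi_values,
  breaks = c(-Inf, 75, 90, Inf),
  labels = c("Average", "Good", "Excellent"),
  right = FALSE
)

# Expected share of employees in each tier, in percent.
expected_share <- 100 * prop.table(table(tier))
round(expected_share, 1)

# Expected number of top performers (KPI strictly above 90) per 100 employees.
p_top <- mean(kpi_values > 90)
expected_top_per_company <- 100 * p_top
print(round(expected_top_per_company, 2)) # 24.39
```

Under these assumptions about 24 top performers per company are expected, so the observed range of 19 to 29 is in line with ordinary sampling variation.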
library(dplyr)
# average KPI per company and department
dept_summary <- df_kpi %>%
group_by(company_id, department) %>%
summarise(
avg_KPI = mean(KPI_score),
.groups = "drop"
)
datatable(dept_summary)
Interpretation
Company-to-Company Performance: C1 leads on average KPI and on the number of top performers, while C4 pays the highest average salary but records the lowest average KPI. Higher pay is therefore not matched by better results in this data.
Employee Rating Distribution: Most employees fall in the Average and Very Good categories, while Excellent is rare. Overall quality is fairly good, but there is still considerable room for improvement.
Department Comparison: Sales and IT frequently appear among the departments with high average KPIs, while HR tends to score lower. This may point to areas that need strengthening.
π Simulation: The estimate is close to the true value of π, confirming that probabilistic methods can be effective with large sample sizes.
Overall, the data depict a stable organization with fairly good average performance but few truly excellent employees. Companies with more top performers (such as C1) tend to have higher average KPIs, whereas higher compensation alone does not. Focusing on talent development and on quality improvement in weaker departments can therefore drive more Excellent results.
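The department comparison can be made explicit by extracting the strongest and weakest department for each company. The following is a minimal sketch, assuming a table shaped like `dept_summary` (columns `company_id`, `department`, `avg_KPI`). The `dept_demo` values are made up for illustration and are not the report's results.

```r
# Hypothetical stand-in for dept_summary (illustrative values only)
dept_demo <- data.frame(
  company_id = rep(c("C1", "C2"), each = 3),
  department = rep(c("HR", "IT", "Sales"), times = 2),
  avg_KPI    = c(74.2, 81.5, 83.0, 79.1, 77.4, 80.6),
  stringsAsFactors = FALSE
)

# Split by company, then pick the departments with the highest and lowest average KPI
by_company <- split(dept_demo, dept_demo$company_id)
dept_extremes <- do.call(rbind, lapply(by_company, function(d) {
  data.frame(
    company_id = d$company_id[1],
    best_dept  = d$department[which.max(d$avg_KPI)],
    worst_dept = d$department[which.min(d$avg_KPI)],
    kpi_gap    = max(d$avg_KPI) - min(d$avg_KPI),
    stringsAsFactors = FALSE
  )
}))
rownames(dept_extremes) <- NULL

print(dept_extremes)
```

A large `kpi_gap` flags a company whose departments differ strongly, which is where a department-level review is likely to be most useful.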
# scatter plot of performance score vs KPI score, one regression line per company
p_scatter <- ggplot(df_kpi, aes(
x = performance_score,
y = KPI_score,
color = company_id
)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(
title = "Performance vs KPI with Regression Line",
x = "Performance Score",
y = "KPI Score"
)
ggplotly(p_scatter)
Interpretation
The chart compares Performance Score and KPI Score for five companies. Each company's points are shown in a different color, with a separate regression line fitted per company.
Most regression lines are slightly positive, meaning higher performance scores generally align with higher KPI scores.
Some lines are nearly flat or slightly negative, showing weaker or inconsistent relationships in certain companies.
Overall, the plot highlights that the strength of the link between performance and KPI varies across companies.
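The direction and strength of each regression line can also be checked numerically instead of by eye. The following is a minimal sketch, assuming a data frame with the same column names as `df_kpi` (`company_id`, `performance_score`, `KPI_score`). The `demo` data and the relationship built into it are simulated for illustration only.

```r
set.seed(1)

# Hypothetical stand-in for df_kpi
demo <- data.frame(
  company_id        = rep(c("C1", "C2", "C3"), each = 50),
  performance_score = runif(150, min = 50, max = 100),
  stringsAsFactors  = FALSE
)
# Assumed weak positive relationship plus noise (illustrative, not the real data)
demo$KPI_score <- 60 + 0.2 * demo$performance_score + rnorm(150, sd = 8)

groups <- split(demo, demo$company_id)

# Slope of KPI_score on performance_score, fitted separately per company
slopes <- sapply(groups, function(d) {
  coef(lm(KPI_score ~ performance_score, data = d))[["performance_score"]]
})

# Pearson correlation per company as a measure of strength
correlations <- sapply(groups, function(d) {
  cor(d$performance_score, d$KPI_score)
})

slope_table <- data.frame(
  company_id  = names(slopes),
  slope       = round(slopes, 3),
  correlation = round(correlations, 3),
  row.names   = NULL
)
print(slope_table)
```

A slope near zero together with a correlation near zero corresponds to a nearly flat line in the plot, while the sign of the slope gives the direction of the relationship.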
# histogram of salary, stacked by company
p_salary <- ggplot(df_kpi, aes(
x = salary,
fill = company_id
)) +
geom_histogram(bins = 30, alpha = 0.6) +
labs(
title = "Salary Distribution",
x = "Salary",
y = "Frequency"
)
ggplotly(p_salary)
Interpretation
The chart shows how salaries are spread across five companies. Most salaries cluster between 4,000–8,000. The stacked colors reveal overlaps, but we can see differences:
C1 and C4 appear more concentrated in the mid‑range.
C2 and C5 have wider spreads, reaching higher values.
Overall, the distributions are fairly similar, with no company standing out as extreme.
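In a stacked histogram only the bottom group sits on a common baseline, so differences in spread are hard to judge visually. A per-company summary of the median, interquartile range, and range gives a more direct comparison. The following is a minimal sketch, assuming columns named `company_id` and `salary` as in `df_kpi`. The `salary_demo` data are simulated for illustration only.

```r
set.seed(2)

# Hypothetical stand-in for df_kpi: same center, different spread
salary_demo <- data.frame(
  company_id = rep(c("C1", "C2"), each = 100),
  salary     = c(rnorm(100, mean = 6500, sd = 800),
                 rnorm(100, mean = 6500, sd = 1500)),
  stringsAsFactors = FALSE
)

grp <- salary_demo$company_id

# tapply() returns results ordered by the sorted group labels,
# which matches sort(unique(grp))
salary_spread <- data.frame(
  company_id    = sort(unique(grp)),
  salary_median = as.numeric(tapply(salary_demo$salary, grp, median)),
  salary_iqr    = as.numeric(tapply(salary_demo$salary, grp, IQR)),
  salary_min    = as.numeric(tapply(salary_demo$salary, grp, min)),
  salary_max    = as.numeric(tapply(salary_demo$salary, grp, max))
)
print(salary_spread)
```

Companies described as having a "wider spread" should show a larger `salary_iqr` and a larger gap between `salary_min` and `salary_max`.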
# keep only top performers (KPI score above 90)
top_data <- df_kpi %>%
filter(KPI_score > 90)
# visualize top performers per company & department
p_top <- ggplot(top_data, aes(
x = company_id,
fill = department
)) +
geom_bar(position = "dodge") +
labs(
title = "Top Performers Distribution",
x = "Company",
y = "Count"
)
ggplotly(p_top)
Interpretation
The chart shows how top performers are spread across departments in each company.
Sales consistently has the highest count of top performers, especially in C1 and C2.
IT also shows strong numbers in some companies.
HR and Marketing generally have fewer top performers compared to other departments.
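Raw counts depend on how many employees each department has, so it can help to look at the share of top performers alongside the counts. The following is a minimal sketch, assuming the same threshold as above (`KPI_score > 90`) and the column names of `df_kpi`. The `demo` rows are made up for illustration only.

```r
# Hypothetical stand-in for df_kpi (illustrative values only)
demo <- data.frame(
  company_id = c("C1", "C1", "C1", "C2", "C2", "C2", "C3", "C3"),
  department = c("Sales", "Sales", "IT", "Sales", "HR", "HR", "IT", "Sales"),
  KPI_score  = c(95, 85, 91, 93, 88, 92, 97, 70),
  stringsAsFactors = FALSE
)

# Strict inequality: a score of exactly 90 is not counted as a top performer
is_top   <- demo$KPI_score > 90
top_demo <- demo[is_top, ]

# Counts of top performers per company and department
top_counts <- table(top_demo$company_id, top_demo$department)
print(top_counts)

# Share of each department's employees who are top performers
top_rate <- tapply(is_top, demo$department, mean)
print(round(top_rate, 2))
```

A department can have the highest count simply because it is the largest, so a high `top_rate` is the stronger signal of consistently high performance.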
This dashboard presents a comprehensive analysis of employee data across multiple companies, including KPI performance, salary distribution, and departmental contributions.
The results show that KPI scores are, on the whole, positively related to performance scores, so higher-performing employees tend to achieve better KPI outcomes. The relationship is weak, however, and in some companies it is nearly flat or slightly negative.
Salary distributions across companies are relatively similar, with most values concentrated in a consistent range. Any observed differences are likely due to random variation in the simulated data.
Department-level analysis reveals that performance varies within companies, highlighting the importance of evaluating specific departments rather than relying solely on overall company averages.
Additionally, the identification of top performers shows that certain companies and departments contribute more significantly to high-achieving employees.
Overall, this analysis demonstrates how data visualization and simulation can be used to generate meaningful insights and support data-driven decision making.
This task focuses on building an automated reporting system using functions and loops. The goal is to generate summary reports for each company, including key metrics, tables, and visualizations in a structured format.
By automating the reporting process, this task demonstrates how repetitive analysis can be efficiently handled, allowing scalable and consistent insights across multiple companies. Additionally, it introduces the concept of exporting results into formats such as HTML, CSV, or PDF for practical business use.
# function report per company
generate_company_report <- function(data, company_name) {
library(dplyr)
library(ggplot2)
library(plotly)
library(DT)
library(htmltools)
# keep only the rows for the requested company
df <- data %>% filter(company_id == company_name)
# summary metrics (named summary_tbl to avoid masking base::summary)
summary_tbl <- df %>%
summarise(
avg_salary = round(mean(salary), 2),
avg_KPI = round(mean(KPI_score), 2),
total_employee = n()
)
# summary table
table_out <- datatable(
summary_tbl,
options = list(dom = 't', autoWidth = TRUE),
rownames = FALSE
)
# plot
p <- ggplot(df, aes(x = KPI_score)) +
geom_histogram(bins = 20, fill = "skyblue") +
labs(
title = paste("KPI Distribution -", company_name),
x = "KPI Score",
y = "Count"
)
plot_out <- ggplotly(p)
# return all outputs (heading, table, plot) as one HTML tag list
tagList(
h3(paste("Report for", company_name)),
table_out,
plot_out
)
}
# example: generate the report for company C1
generate_company_report(df_kpi, "C1")
library(htmltools)
# get all companies
companies <- unique(df_kpi$company_id)
# generate a report for every company automatically
all_reports <- lapply(companies, function(comp) {
tagList(
generate_company_report(df_kpi, comp),
br(), br()
)
})
tagList(all_reports)
library(dplyr)
library(knitr)
library(kableExtra)
library(DT)
# export data per company
companies <- unique(df_kpi$company_id)
# store the exported file names
file_list <- data.frame(
Company = character(),
File_Name = character()
)
for (comp in companies) {
df_temp <- df_kpi %>% filter(company_id == comp)
file_name <- paste0("report_", comp, ".csv")
write.csv(df_temp, file_name, row.names = FALSE)
file_list <- rbind(file_list, data.frame(
Company = comp,
File_Name = file_name
))
}
# display the table
file_list %>%
kable(caption = "Exported Report Files") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| Company | File_Name |
|---|---|
| C1 | report_C1.csv |
| C2 | report_C2.csv |
| C3 | report_C3.csv |
| C4 | report_C4.csv |
| C5 | report_C5.csv |
datatable(summary_df,
extensions = 'Buttons',
options = list(
dom = 'Bfrtip',
buttons = c('csv'),
autoWidth = TRUE
),
rownames = FALSE
)
Interpretation
The table displays the list of automatically generated report files for each company. Each company has a corresponding CSV file containing its respective dataset, which has been created using a loop-based approach.
This demonstrates how repetitive tasks, such as generating and exporting reports for multiple companies, can be efficiently automated using programming techniques.
By applying loops and functions, the reporting process becomes scalable, consistent, and less prone to manual errors. Overall, this approach reflects practical data workflows where automation is essential for handling large and repetitive datasets.
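The export loop can also be wrapped in a reusable function that writes into a dedicated folder and returns the file list. The following is a minimal sketch in base R. The function name `export_company_csv`, the `out_dir` argument, and the `demo` data are hypothetical and not part of the original report.

```r
# Hypothetical helper: write one CSV per company into out_dir
# and return a table of the files that were written
export_company_csv <- function(data, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  companies <- as.character(unique(data$company_id))

  rows_written <- vapply(companies, function(comp) {
    subset_df <- data[data$company_id == comp, , drop = FALSE]
    path <- file.path(out_dir, paste0("report_", comp, ".csv"))
    write.csv(subset_df, path, row.names = FALSE)
    nrow(subset_df)
  }, integer(1))

  data.frame(
    Company   = companies,
    File_Name = paste0("report_", companies, ".csv"),
    Rows      = as.integer(rows_written),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Illustrative data; a temporary folder keeps the working directory clean
demo <- data.frame(
  company_id = c("C1", "C1", "C2"),
  salary     = c(6100, 6400, 6600),
  stringsAsFactors = FALSE
)
out_dir  <- file.path(tempdir(), "reports")
exported <- export_company_csv(demo, out_dir)
print(exported)
```

Returning the row count next to each file name makes it easy to confirm that every company's data was exported completely.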
This project demonstrates the use of programming concepts such as functions, loops, and simulation in analyzing structured datasets.
The workflow includes generating synthetic data, performing data transformations, and creating interactive visualizations to explore patterns in performance and salary distributions. Additionally, automated report generation highlights how repetitive analytical tasks can be efficiently scaled across multiple entities.
Overall, the project shows how combining data processing, visualization, and automation can support efficient and reproducible data analysis.
[1] Data Science Labs. (n.d.). Functions and Loops. Retrieved from https://bookdown.org/dsciencelabs/data_science_programming/03-Functions-and-Loops.html
[2] StatQuest with Josh Starmer. (n.d.). Statistics Fundamentals Playlist. Retrieved from https://www.youtube.com/playlist?list=PL9dABXznEOVLa2K-KTuV9OH78zeQI7991