Functions & Loops + Data Science
Assignment ~ Week 5
Adinda Maiza Ishfahani
Data Science Undergraduate at ITSB
NIM : 52250074
1 Dynamic Multi-Formula Function
This task develops a dynamic function, compute_formula, that evaluates any combination of four equation types (linear, quadratic, cubic, and exponential) over the same input vector and stops with an error if an unknown formula name is requested.
library(ggplot2)
library(tidyr)
library(plotly)
compute_formula <- function(x, formulas){
results <- data.frame(x=x)
for(f in formulas){
if(f=="linear"){
results[[f]] <- 2*x + 3
} else if(f=="quadratic"){
results[[f]] <- x^2 + 2*x + 1
} else if(f=="cubic"){
results[[f]] <- x^3
} else if(f=="exponential"){
results[[f]] <- exp(x)
} else {
stop("Invalid formula")
}
}
return(results)
}
x <- 1:20
df <- compute_formula(x, c("linear","quadratic","cubic"))
df_long <- tidyr::pivot_longer(df, -x)
p <- ggplot(df_long, aes(x, value, color=name)) +
geom_line() +
ggtitle("Multi Formula Plot")
# Convert to an interactive plot
ggplotly(p)
1.1 Interpretation
The resulting visualization compares the linear, quadratic, and cubic functions over x = 1 to 20. The linear function grows at a constant rate, while the quadratic and cubic functions grow at an accelerating rate. The exponential option, which the function supports but the plot does not include, grows faster still at the upper end of this range. These differences are relevant in applications such as modeling economic growth, population dynamics, and trend analysis in data.
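To make the growth differences concrete, the sketch below evaluates all four formulas at a few points. It is an illustrative alternative to the if/else chain, not the function used above: the names formula_table and compute_formula_tbl are hypothetical, and the coefficients are copied from compute_formula.

```r
# Hypothetical lookup table: each formula name maps to a function of x.
# Coefficients are the same as in compute_formula.
formula_table <- list(
  linear      = function(x) 2 * x + 3,
  quadratic   = function(x) x^2 + 2 * x + 1,
  cubic       = function(x) x^3,
  exponential = function(x) exp(x)
)

compute_formula_tbl <- function(x, formulas) {
  # Fail early and name the entries that are not in the table
  unknown <- setdiff(formulas, names(formula_table))
  if (length(unknown) > 0) {
    stop("Invalid formula: ", paste(unknown, collapse = ", "))
  }
  # Apply each requested function to x and bind the results as columns
  values <- lapply(formula_table[formulas], function(f) f(x))
  cbind(data.frame(x = x), as.data.frame(values))
}

growth <- compute_formula_tbl(
  1:20, c("linear", "quadratic", "cubic", "exponential")
)
growth[c(1, 10, 20), ]  # values at x = 1, 10, and 20
```

With a lookup table, adding a new formula means adding one list entry rather than another else-if branch.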
2 Nested Simulation: Multi-Sales & Discounts
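Sales data are simulated with nested loops: the outer loop iterates over salespeople and the inner loop over days. Each daily sale is drawn uniformly between 100 and 1000, and a discount of 20% applies when the sale exceeds 700 (10% otherwise).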
library(ggplot2)
library(dplyr)
library(plotly)
simulate_sales <- function(n_salesperson, days){
data <- data.frame()
for(i in 1:n_salesperson){
for(d in 1:days){
sales <- runif(1, 100, 1000)
discount <- ifelse(sales > 700, 0.2, 0.1)
data <- rbind(data, data.frame(
salesperson=i, day=d,
sales=sales, discount=discount
))
}
}
data <- data %>%
group_by(salesperson) %>%
mutate(cumulative = cumsum(sales))
return(data)
}
sales_data <- simulate_sales(3,10)
p <- ggplot(sales_data, aes(day, cumulative, color=factor(salesperson))) +
geom_line() +
ggtitle("Cumulative Sales")
# Convert to an interactive plot
ggplotly(p)
2.1 Interpretation
The simulation results show that high-performing salespeople exhibit a steeper increase in their cumulative curves. Furthermore, the implementation of conditional discounts reflects a business strategy that is adaptive to sales volume.
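The nested loop grows the data frame with rbind() one row at a time, which becomes slow for large simulations. The sketch below builds the same kind of table in one pass using base R only. The name simulate_sales_vec is hypothetical, the seed is arbitrary, and the random draws will not match the loop version value for value.

```r
simulate_sales_vec <- function(n_salesperson, days) {
  # One row per (salesperson, day) pair; day varies fastest
  grid <- expand.grid(
    day = seq_len(days),
    salesperson = seq_len(n_salesperson)
  )
  grid$sales <- runif(nrow(grid), 100, 1000)
  # Same rule as the loop version: 20% discount above 700, otherwise 10%
  grid$discount <- ifelse(grid$sales > 700, 0.2, 0.1)
  # Running total within each salesperson, in day order
  grid$cumulative <- ave(grid$sales, grid$salesperson, FUN = cumsum)
  grid[, c("salesperson", "day", "sales", "discount", "cumulative")]
}

set.seed(42)  # arbitrary seed, only for reproducibility
sales_vec <- simulate_sales_vec(3, 10)
head(sales_vec)
```

The result has the same columns as the output of simulate_sales, so the cumulative plot could be drawn from it unchanged.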
3 Multi-Level Performance Categorization
library(plotly)
categorize_performance <- function(sales){
categories <- c()
for(s in sales){
if(s > 800) categories <- c(categories,"Excellent")
else if(s > 600) categories <- c(categories,"Very Good")
else if(s > 400) categories <- c(categories,"Good")
else if(s > 200) categories <- c(categories,"Average")
else categories <- c(categories,"Poor")
}
return(categories)
}
cats <- categorize_performance(sales_data$sales)
perf_table <- as.data.frame(table(cats))
# Interactive plot
plot_ly(perf_table,
x = ~cats,
y = ~Freq,
type = "bar") %>%
layout(
title = "Performance Distribution",
width = 500, # width (px)
height = 350 # height (px)
)
library(plotly)
# Performance categorization function
categorize_performance <- function(sales){
categories <- c()
for(s in sales){
if(s > 800){
categories <- c(categories, "Excellent")
} else if(s > 600){
categories <- c(categories, "Very Good")
} else if(s > 400){
categories <- c(categories, "Good")
} else if(s > 200){
categories <- c(categories, "Average")
} else {
categories <- c(categories, "Poor")
}
}
return(categories)
}
# Data
sales_data <- runif(100, 100, 1000)
# Categorize
categories <- categorize_performance(sales_data)
# Count frequencies
counts <- as.data.frame(table(categories))
# Interactive pie chart
plot_ly(counts,
labels = ~categories,
values = ~Freq,
type = 'pie',
textinfo = 'label+percent') %>%
layout(
title = "Sales Performance Category Distribution",
width = 500,
height = 400
)
3.1 Interpretation
The performance category distribution provides an overview of sales quality. The bar chart counts the categories for the 30 simulated sales from Section 2, and the pie chart shows the category shares for a fresh sample of 100 values. The proportion in each category can serve as an indicator for evaluating organizational performance, and the two chart types make the counts and the shares easy to read at a glance.
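The threshold rules used by categorize_performance can also be written without a loop. The sketch below uses base R's cut() with the same cut points. The name categorize_performance_cut is hypothetical, and the example values are made up to exercise the boundaries.

```r
categorize_performance_cut <- function(sales) {
  cut(
    sales,
    breaks = c(-Inf, 200, 400, 600, 800, Inf),
    labels = c("Poor", "Average", "Good", "Very Good", "Excellent"),
    right = TRUE  # intervals are (lower, upper], so 800 is "Very Good"
  )
}

example_sales <- c(150, 200, 450, 800, 801)
as.character(categorize_performance_cut(example_sales))
# "Poor" "Poor" "Good" "Very Good" "Excellent"
```

Because cut() returns a factor whose levels run from Poor to Excellent, table() and bar charts keep the categories in that order instead of sorting them alphabetically.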
4 Multi-Company Dataset Simulation
knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
library(knitr)
generate_company_data <- function(n_company, n_employees){
data <- data.frame()
for(c in 1:n_company){
for(e in 1:n_employees){
salary <- runif(1, 3000, 10000)
perf <- runif(1, 50, 100)
kpi <- runif(1, 50, 100)
data <- rbind(data, data.frame(
company=c, employee=e,
salary=salary,
performance=perf,
KPI=kpi
))
}
}
return(data)
}
company_data <- generate_company_data(3, 50)
summary_table <- company_data %>%
summarise(
across(
c(salary, performance, KPI),
list(
Min = min,
Q1 = ~quantile(. , 0.25),
Median = median,
Mean = mean,
Q3 = ~quantile(. , 0.75),
Max = max
)
)
)
kable(summary_table, caption = "Summary Statistics of Company Data")
| salary_Min | salary_Q1 | salary_Median | salary_Mean | salary_Q3 | salary_Max | performance_Min | performance_Q1 | performance_Median | performance_Mean | performance_Q3 | performance_Max | KPI_Min | KPI_Q1 | KPI_Median | KPI_Mean | KPI_Q3 | KPI_Max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3113.713 | 4888.721 | 6741.415 | 6631.755 | 8369.591 | 9991.177 | 50.23825 | 64.14734 | 77.66848 | 75.9003 | 88.35374 | 99.94609 | 50.19821 | 60.44406 | 72.90977 | 74.16575 | 87.98709 | 99.91977 |
4.1 Interpretation
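The summary table reports the minimum, quartiles, mean, and maximum of salary, performance, and KPI across all 150 simulated employees. Grouping the same data by company would allow top-performing companies to be identified from indicators such as average KPI and salary. Furthermore, flagging employees above a specific KPI threshold would reflect performance evaluation practices in modern organizations.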
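A per-company comparison and a KPI threshold rule could be sketched as follows. This is an illustration under stated assumptions: company_demo is a simulated stand-in with the same columns as company_data, the seed is arbitrary, and the threshold of 90 is a hypothetical choice. Base R is used so the example runs without extra packages.

```r
set.seed(1)  # arbitrary seed
# Stand-in for company_data with the same columns (values are simulated)
company_demo <- data.frame(
  company     = rep(1:3, each = 50),
  employee    = rep(1:50, times = 3),
  salary      = runif(150, 3000, 10000),
  performance = runif(150, 50, 100),
  KPI         = runif(150, 50, 100)
)

# Average salary and KPI per company
company_means <- aggregate(
  cbind(salary, KPI) ~ company, data = company_demo, FUN = mean
)

# Hypothetical rule: an employee with KPI above 90 is a top performer
kpi_threshold <- 90
top_performers <- company_demo[company_demo$KPI > kpi_threshold, ]
top_counts <- table(factor(top_performers$company, levels = 1:3))

company_means  # one row per company
top_counts     # number of top performers per company
```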
5 Monte Carlo Simulation: π Estimation and Probability
library(plotly)
monte_carlo_pi <- function(n){
x <- runif(n)
y <- runif(n)
inside <- (x^2 + y^2) <= 1
pi_est <- 4 * mean(inside)
data <- data.frame(
x = x,
y = y,
inside = ifelse(inside, "Inside", "Outside")
)
plot_ly(data,
x = ~x,
y = ~y,
color = ~inside,
colors = c("blue", "red"),
type = "scatter",
mode = "markers") %>%
layout(
title = paste("Pi Estimate (Monte Carlo):", round(pi_est,4)),
xaxis = list(title = "X"),
yaxis = list(title = "Y")
)
}
monte_carlo_pi(1000)
5.1 Interpretation
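Points are drawn uniformly in the unit square, and the share that falls inside the quarter circle of radius 1 approximates its area, π/4. Multiplying that share by 4 therefore estimates π. This approach demonstrates that mathematical problems can be solved through probability-based simulations. This concept is widely used in various fields, such as finance, artificial intelligence, and risk analysis.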
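The accuracy of the estimate depends on the number of simulated points. The sketch below repeats the estimate for increasing sample sizes and reports the absolute error. The name estimate_pi, the seed, and the chosen sample sizes are assumptions made for illustration.

```r
# Same estimator as monte_carlo_pi, without the plot
estimate_pi <- function(n) {
  x <- runif(n)
  y <- runif(n)
  4 * mean(x^2 + y^2 <= 1)
}

set.seed(2024)  # arbitrary seed
sizes <- c(100, 1000, 10000, 100000)
estimates <- sapply(sizes, estimate_pi)

data.frame(
  n = sizes,
  estimate = estimates,
  abs_error = abs(estimates - pi)
)
```

The error tends to shrink roughly in proportion to 1/sqrt(n), so each extra digit of accuracy costs about 100 times more points. Individual runs can deviate from that trend.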
6 Advanced Data Transformation & Feature Engineering
normalize_columns <- function(df){
for(col in names(df)){
if(is.numeric(df[[col]])){
df[[col]] <- (df[[col]] - min(df[[col]])) /
(max(df[[col]]) - min(df[[col]]))
}
}
return(df)
}
norm_data <- normalize_columns(company_data)
hist(norm_data$salary, main="Normalized Salary")
6.1 Interpretation
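Min-max normalization rescales every numeric column to the range [0, 1]. The shape of each distribution is unchanged and only the scale changes, so the histogram of normalized salary has the same form as the original with values between 0 and 1. Note that the function also rescales the numeric ID columns (company and employee). Putting features on a common scale is an essential preprocessing step before applying machine learning models that are sensitive to feature magnitude.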
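The effect of the transformation can be checked numerically. The sketch below applies the same min-max formula to a simulated salary column and leaves an ID column untouched. The names min_max and normalize_columns_skip, the skip argument, and the demo data are hypothetical additions for illustration.

```r
# Min-max scaling of one numeric vector (same formula as normalize_columns).
# A constant column would give NaN here because max(v) - min(v) is 0.
min_max <- function(v) (v - min(v)) / (max(v) - min(v))

# Hypothetical variant: normalize numeric columns except those in `skip`
normalize_columns_skip <- function(df, skip = character()) {
  numeric_cols <- names(df)[sapply(df, is.numeric)]
  cols <- setdiff(numeric_cols, skip)
  df[cols] <- lapply(df[cols], min_max)
  df
}

set.seed(7)  # arbitrary seed
demo <- data.frame(
  company = rep(1:3, each = 50),
  salary  = runif(150, 3000, 10000)
)
norm_demo <- normalize_columns_skip(demo, skip = "company")

range(norm_demo$salary)             # endpoints are 0 and 1
cor(demo$salary, norm_demo$salary)  # linear rescaling, so correlation is 1
```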
7 Mini Project: Company KPI Dashboard & Simulation
summary_company <- company_data %>%
group_by(company) %>%
summarise(avg_salary=mean(salary),
avg_KPI=mean(KPI))
ggplot(summary_company, aes(factor(company), avg_salary)) +
geom_bar(stat="identity") +
ggtitle("Avg Salary per Company")
# Load libraries
library(ggplot2)
library(plotly)
# 1. Prepare the data
set.seed(123)
df_company <- data.frame(
salary = runif(100, 3000, 10000),
KPI_score = runif(100, 50, 100),
company_id = sample(1:3, 100, replace = TRUE)
)
# 2. Build the static ggplot and store it in the variable 'p'
p <- ggplot(df_company, aes(x = salary, y = KPI_score, color = factor(company_id))) +
geom_point(size = 2, alpha = 0.7) +
geom_smooth(method = "lm", se = FALSE) +
labs(
title = "Relationship Between Salary and KPI Score (Interactive)",
x = "Salary",
y = "KPI Score",
color = "Company ID"
) +
theme_minimal()
# 3. Convert to an interactive plot
ggplotly(p)
## `geom_smooth()` using formula = 'y ~ x'
7.1 Interpretation
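The resulting dashboard provides an overview of company performance. The bar chart compares average salary across companies, and the scatter plot with fitted lines shows the relationship between salary and KPI score within each company. Because salary and KPI score are drawn independently in this simulation, any slope in the fitted lines reflects sampling noise rather than a real relationship.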
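The relationship between salary and KPI score can also be quantified rather than read off the plot. The sketch below recreates df_company with the same seed and columns as above and computes the Pearson correlation within each company. The name cor_by_company is hypothetical.

```r
set.seed(123)
df_company <- data.frame(
  salary     = runif(100, 3000, 10000),
  KPI_score  = runif(100, 50, 100),
  company_id = sample(1:3, 100, replace = TRUE)
)

# Pearson correlation between salary and KPI score within each company
cor_by_company <- sapply(
  split(df_company, df_company$company_id),
  function(d) cor(d$salary, d$KPI_score)
)
round(cor_by_company, 3)
```

Values close to zero would be consistent with the two variables being simulated independently.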
8 Automated Report Generation
library(ggplot2)
library(dplyr)
# Make sure the data from Task 4 already exist
# If not, regenerate them:
set.seed(123)
df_company <- data.frame(
company_id = sample(1:3, 150, replace = TRUE),
salary = runif(150, 3000, 10000),
KPI_score = runif(150, 50, 100),
performance_score = runif(150, 50, 100),
department = sample(c("IT","HR","Finance","Marketing"), 150, replace = TRUE)
)
# Loop otomatis per company
for(c in unique(df_company$company_id)){
cat("## 📌 Company", c, "\n")
data_subset <- df_company %>% filter(company_id == c)
# 1. Summary Table
print(summary(data_subset))
# 2. Bar Plot: Department Distribution
p1 <- ggplot(data_subset, aes(x = department, fill = department)) +
geom_bar() +
labs(title = paste("Department Distribution - Company", c),
x = "Department", y = "Count") +
theme_minimal()
print(p1)
# 3. Scatter Plot: Salary vs KPI
p2 <- ggplot(data_subset, aes(x = salary, y = KPI_score)) +
geom_point(color = "blue", alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = paste("Salary vs KPI - Company", c),
x = "Salary", y = "KPI Score") +
theme_minimal()
print(p2)
# 4. Histogram Salary
p3 <- ggplot(data_subset, aes(x = salary)) +
geom_histogram(bins = 15, fill = "skyblue", color = "black") +
labs(title = paste("Salary Distribution - Company", c),
x = "Salary", y = "Frequency") +
theme_minimal()
print(p3)
}
## ## 📌 Company 3
## company_id salary KPI_score performance_score
## Min. :3 Min. :3140 Min. :51.40 Min. :50.81
## 1st Qu.:3 1st Qu.:5074 1st Qu.:62.17 1st Qu.:64.35
## Median :3 Median :6283 Median :74.04 Median :74.19
## Mean :3 Mean :6641 Mean :75.04 Mean :74.41
## 3rd Qu.:3 3rd Qu.:8592 3rd Qu.:87.16 3rd Qu.:87.10
## Max. :3 Max. :9803 Max. :99.30 Max. :99.65
## department
## Length:54
## Class :character
## Mode :character
##
##
##
## ## 📌 Company 2
## company_id salary KPI_score performance_score
## Min. :2 Min. :3044 Min. :52.91 Min. :50.92
## 1st Qu.:2 1st Qu.:4523 1st Qu.:59.86 1st Qu.:61.67
## Median :2 Median :6370 Median :67.70 Median :75.95
## Mean :2 Mean :6260 Mean :71.20 Mean :75.71
## 3rd Qu.:2 3rd Qu.:7841 3rd Qu.:82.89 3rd Qu.:90.64
## Max. :2 Max. :9996 Max. :99.83 Max. :98.96
## department
## Length:54
## Class :character
## Mode :character
##
##
##
## ## 📌 Company 1
## company_id salary KPI_score performance_score
## Min. :1 Min. :3236 Min. :50.02 Min. :51.87
## 1st Qu.:1 1st Qu.:5007 1st Qu.:65.52 1st Qu.:65.85
## Median :1 Median :6811 Median :76.56 Median :78.52
## Mean :1 Mean :6524 Mean :77.18 Mean :78.40
## 3rd Qu.:1 3rd Qu.:8083 3rd Qu.:89.15 3rd Qu.:91.57
## Max. :1 Max. :9785 Max. :99.56 Max. :98.45
## department
## Length:42
## Class :character
## Mode :character
##
##
##
8.1 Interpretation
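The loop produces the same report for every company: a summary table, followed by a bar chart of department counts, a scatter plot of salary against KPI score with a fitted line, and a salary histogram. Because the report is generated by iterating over unique(df_company$company_id), adding a company to the data extends the report without any change to the code.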