Functions & Loops + Data Science
Assignment ~ Week 5
1 Introduction
This assignment explores the use of functions, loops, and conditional logic in R for building structured and automated data science workflows. The project integrates simulation, mathematical modeling, and visualization to demonstrate scalable analysis techniques.
1.1 Objectives
- Build multi-layer functions with nested loops and conditional logic.
- Handle multi-dataset simulations.
- Perform advanced statistics, data transformation, and visualization.
- Develop an automated data science workflow.
2 Task 1 – Dynamic Multi-Formula Function
2.1 Function Definition
compute_formula <- function(x, formula) {
  # Takes an input x and a formula type; returns the value of that formula at x
  if (formula == "linear") {
    return(2*x + 3) # linear
  } else if (formula == "quadratic") {
    return(0.5*x^2 + 2*x + 1) # quadratic
  } else if (formula == "cubic") {
    return(0.05*x^3 - 0.5*x^2 + x) # cubic
  } else if (formula == "exponential") {
    return(exp(x/6)) # exponential
  } else {
    stop("Invalid formula input!")
  }
}
2.2 Nested Loop Computation
x_values <- 1:20
formula_list <- c("linear","quadratic","cubic","exponential")
results <- data.frame()
for (f in formula_list) {
  for (x in x_values) {
    results <- rbind(results,
                     data.frame(x=x,
                                y=compute_formula(x,f),
                                formula=f))
  }
}
head(results)
2.3 Visualization (Revised)
2.4 Interpretation
- Multiple mathematical formulas (linear, quadratic, cubic, exponential) are computed dynamically using a single function.
- Nested loops allow calculation across a range of input values for each formula.
- Results are organized into a structured data frame for easy analysis.
- Line plots with points visualize the output of all formulas, showing trends and differences clearly.
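The nested loop grows `results` one row at a time with `rbind()`, which is fine for 80 rows but slows down as the grid gets larger. A minimal sketch of an alternative, assuming the same four formulas as `compute_formula()`: build every combination with `expand.grid()` and fill `y` in one pass with `mapply()`. The names `compute_formula_switch` and `results_grid` are illustrative and not part of the original assignment.

```r
# Illustrative variant of compute_formula() using switch(); same four formulas
compute_formula_switch <- function(x, formula) {
  switch(formula,
         linear      = 2 * x + 3,
         quadratic   = 0.5 * x^2 + 2 * x + 1,
         cubic       = 0.05 * x^3 - 0.5 * x^2 + x,
         exponential = exp(x / 6),
         stop("Invalid formula input!"))
}

# Every (x, formula) combination as one data frame: 20 x 4 = 80 rows
results_grid <- expand.grid(
  x = 1:20,
  formula = c("linear", "quadratic", "cubic", "exponential"),
  stringsAsFactors = FALSE
)

# Evaluate the formula once per row instead of calling rbind() inside two loops
results_grid$y <- mapply(compute_formula_switch,
                         results_grid$x, results_grid$formula)

head(results_grid)
```

The nested-loop version remains the clearer demonstration of loop mechanics; this variant only trades that explicitness for fewer intermediate copies.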
3 Task 2 – Multi-Sales & Discounts
3.1 Function Definition
simulate_sales <- function(n_salesperson, days) {
  # Simulate sales data per salesperson per day
  sales_data <- data.frame()
  for (s in 1:n_salesperson) {
    for (d in 1:days) {
      sales_amount <- sample(200:1200,1) # simulated sales value
      if (sales_amount > 900) {
        discount <- 0.25
      } else if (sales_amount > 600) {
        discount <- 0.15
      } else {
        discount <- 0.05
      }
      sales_data <- rbind(sales_data,
                          data.frame(
                            salesperson_id=s,
                            day=d,
                            sales_amount=sales_amount,
                            discount_rate=discount))
    }
  }
  sales_data
}
3.3 Nested Function (Cumulative)
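A minimal sketch of a cumulative-sales step, assuming the data frame has the `salesperson_id`, `day`, and `sales_amount` columns produced by `simulate_sales()`. The helper name `add_cumulative_sales` and the toy data are hypothetical; the running total per salesperson comes from `ave()` combined with `cumsum()`.

```r
# Hypothetical helper: adds a running total of sales per salesperson
add_cumulative_sales <- function(sales_data) {
  # Sort by salesperson and day so the running total follows day order
  sales_data <- sales_data[order(sales_data$salesperson_id, sales_data$day), ]
  # ave() applies cumsum() within each salesperson_id group
  sales_data$cumulative_sales <- ave(sales_data$sales_amount,
                                     sales_data$salesperson_id,
                                     FUN = cumsum)
  sales_data
}

# Toy input with the same columns as the simulate_sales() output
toy_sales <- data.frame(
  salesperson_id = c(1, 1, 1, 2, 2, 2),
  day            = c(1, 2, 3, 1, 2, 3),
  sales_amount   = c(300, 700, 950, 1000, 250, 400)
)

toy_cumulative <- add_cumulative_sales(toy_sales)
toy_cumulative
```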
3.5 Visualization
3.6 Interpretation
- Daily sales data is simulated for multiple salespeople over a defined period.
- Discounts are automatically assigned based on sales thresholds.
- Cumulative sales per salesperson highlight top performers over time.
- Line plots visualize cumulative sales trajectories, allowing performance comparison.
4 Task 3 – Performance Categorization
4.1 Function Definition
4.2 Loop Implementation
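A minimal sketch of how a categorization function and the loop that applies it might look. The five labels come from the interpretation of this task; the numeric thresholds, the function name `categorize_performance`, and the toy data frame `demo_sales` are assumptions chosen only for illustration.

```r
# Hypothetical thresholds; the assignment's actual cut-offs are not shown
categorize_performance <- function(sales_amount) {
  if (sales_amount >= 1000) {
    "Excellent"
  } else if (sales_amount >= 800) {
    "Very Good"
  } else if (sales_amount >= 600) {
    "Good"
  } else if (sales_amount >= 400) {
    "Average"
  } else {
    "Poor"
  }
}

# Toy stand-in for the simulated sales data
demo_sales <- data.frame(sales_amount = c(250, 450, 650, 850, 1150))

# Pre-allocate the column, then fill it row by row
demo_sales$category <- character(nrow(demo_sales))
for (i in seq_len(nrow(demo_sales))) {
  demo_sales$category[i] <- categorize_performance(demo_sales$sales_amount[i])
}

demo_sales
```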
4.3 Result Table
category_df <- as.data.frame(prop.table(table(sales_data$category))*100)
colnames(category_df) <- c("Category","Percentage")
category_df
4.4 Visualization (Revised – Interactive Pie Chart)
4.5 Interpretation
- Sales amounts are categorized into performance levels: Poor, Average, Good, Very Good, Excellent.
- This categorization identifies individual and overall team performance.
- Pie charts illustrate the distribution of performance categories.
5 Task 4 – Company Dataset Simulation
5.1 Function Definition
generate_company_data <- function(n_company,n_employees){
  # Generate employee data per company
  data <- data.frame()
  for(c in 1:n_company){
    for(e in 1:n_employees){
      data <- rbind(data,
                    data.frame(
                      company_id=c,
                      employee_id=e,
                      salary=sample(3000:10000,1),
                      department=sample(c("HR","IT","Finance","Marketing"),1),
                      performance_score=runif(1,60,100),
                      KPI_score=runif(1,50,100)
                    ))
    }
  }
  data
}
5.2 Implementation
5.3 Summary
summary_company <- aggregate(cbind(salary,performance_score,KPI_score)~company_id,
                             company_data,
                             function(x) c(mean=mean(x), max=max(x)))
summary_company
5.4 Visualization
5.5 Interpretation
- Synthetic company data includes employee salary, department, performance score, and KPI score.
- Summary statistics provide a snapshot of workforce characteristics per company.
- Boxplots show salary distributions, revealing variability within and across companies.
- Top performers are identified based on KPI scores.
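One detail of the `aggregate()` call in the summary step: when the summary function returns a named vector, each summarized variable comes back as a matrix column rather than as separate `mean` and `max` columns. A minimal sketch of flattening that result with `do.call(data.frame, ...)`, using hypothetical toy data in place of `company_data`:

```r
# Toy stand-in for company_data (values are made up for illustration)
toy_company <- data.frame(
  company_id        = c(1, 1, 2, 2),
  salary            = c(4000, 6000, 5000, 9000),
  performance_score = c(70, 90, 65, 85),
  KPI_score         = c(60, 80, 75, 95)
)

summary_toy <- aggregate(cbind(salary, performance_score, KPI_score) ~ company_id,
                         toy_company,
                         function(x) c(mean = mean(x), max = max(x)))

# Each summarized variable is a two-column matrix, so this counts 4 columns
ncol(summary_toy)

# Flatten the matrix columns into ordinary columns such as salary.mean
summary_flat <- do.call(data.frame, summary_toy)
names(summary_flat)
```

Flattening matters mainly when the summary is exported or joined to other tables, since matrix columns do not map cleanly onto CSV headers.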
6 Task 5 – Monte Carlo Simulation
6.1 Function Definition
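A minimal sketch of a Monte Carlo estimator for π, assuming points are drawn uniformly from the unit square [0, 1] × [0, 1] and tested against the quarter circle of radius 1. The function name `estimate_pi`, the returned list structure, and the example region (the lower-left quadrant of the square) are assumptions for illustration.

```r
# Hypothetical helper: estimates pi from n_points uniform points in [0, 1]^2
estimate_pi <- function(n_points) {
  x <- runif(n_points)
  y <- runif(n_points)
  # A point is inside the quarter circle of radius 1 when x^2 + y^2 <= 1
  inside <- x^2 + y^2 <= 1
  list(
    # Quarter-circle area is pi / 4, so pi is about 4 * (share of points inside)
    pi_estimate = 4 * mean(inside),
    # Example region (an assumption): the lower-left quadrant of the square
    prob_lower_left = mean(x < 0.5 & y < 0.5),
    points = data.frame(x = x, y = y, inside = inside)
  )
}

set.seed(123)  # fixed seed so the run is reproducible
sim <- estimate_pi(100000)
sim$pi_estimate
sim$prob_lower_left
```

The `points` data frame carries the `inside` flag, which is what a scatter plot of inside versus outside points would be colored by.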
6.3 Visualization
6.4 Interpretation
- Monte Carlo simulation estimates π by sampling random points in a unit square.
- The simulation also calculates probabilities of points falling in specific regions.
- Scatter plots visualize points inside vs. outside the circle.
- This demonstrates statistical estimation and visual validation.
7 Task 6 – Data Transformation
7.1 Functions
normalize_columns <- function(df){
  # Min-max normalize every numeric column
  num <- sapply(df,is.numeric)
  for(col in names(df)[num]){
    df[[col]] <- (df[[col]]-min(df[[col]]))/
      (max(df[[col]])-min(df[[col]]))
  }
  df
}
z_score <- function(df){
  # Convert every numeric column to z-scores
  num <- sapply(df,is.numeric)
  for(col in names(df)[num]){
    df[[col]] <- (df[[col]]-mean(df[[col]]))/sd(df[[col]])
  }
  df
}
7.2 Apply Transformation
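Both transformation functions divide by a measure of spread (the range or the standard deviation), so a constant column yields `NaN`. A minimal sketch of a guarded min-max variant, where the helper name `safe_min_max` and the choice to map constant columns to 0 are assumptions rather than part of the assignment:

```r
# Hypothetical variant of normalize_columns() that tolerates constant columns
safe_min_max <- function(df) {
  num <- vapply(df, is.numeric, logical(1))
  for (col in names(df)[num]) {
    rng <- range(df[[col]], na.rm = TRUE)
    if (rng[2] == rng[1]) {
      # Constant column: avoid 0 / 0 and map every value to 0 (an assumption)
      df[[col]] <- rep(0, nrow(df))
    } else {
      df[[col]] <- (df[[col]] - rng[1]) / (rng[2] - rng[1])
    }
  }
  df
}

toy_df <- data.frame(
  salary     = c(3000, 6500, 10000),
  bonus      = c(500, 500, 500),         # constant column
  department = c("HR", "IT", "Finance")  # non-numeric, left untouched
)

scaled_df <- safe_min_max(toy_df)
scaled_df
```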
7.3 Visualization (All Plotly)
7.4 Interpretation
- Numeric columns are normalized (min-max) and standardized (z-score).
- Feature engineering, such as salary brackets, enables meaningful categorization.
- Histograms and boxplots highlight patterns and variability in salaries.
- Data transformations facilitate comparison and downstream analysis.
8 Task 7 – KPI Dashboard Mini Project
8.2 KPI Summary per Company
summary_kpi <- company_big %>%
  group_by(company_id) %>%
  summarise(avg_salary=mean(salary),
            avg_KPI=mean(KPI_score),
            top_performers=sum(KPI_score>90))
summary_kpi
8.3 Categorize KPI Tiers
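A minimal sketch of assigning KPI tiers with `cut()`. The tier labels (Low, Medium, High, Top) come from the interpretation of this task; the break points are hypothetical, except that the boundary for Top is set at 90 to match the `KPI_score>90` rule used for `top_performers`. The data frame `toy_kpi` is a made-up stand-in for `company_big`.

```r
# Toy stand-in for company_big (values are made up for illustration)
toy_kpi <- data.frame(
  employee_id = 1:5,
  KPI_score   = c(52, 65, 79.5, 90, 96)
)

# Hypothetical cut-offs; intervals are right-closed, so a score of exactly 90
# falls in "High" and only scores above 90 fall in "Top"
toy_kpi$KPI_tier <- cut(toy_kpi$KPI_score,
                        breaks = c(-Inf, 65, 80, 90, Inf),
                        labels = c("Low", "Medium", "High", "Top"))

table(toy_kpi$KPI_tier)
```

Because `cut()` returns a factor with the labels in break order, plots and tables built from `KPI_tier` keep the tiers sorted from Low to Top.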
8.5 Visualizations
# Scatter KPI vs Salary with Regression
p7_scatter <- plot_ly(company_big, x=~KPI_score, y=~salary,
                      color=~factor(company_id), type='scatter', mode='markers') %>%
  layout(title='KPI vs Salary per Company') %>%
  add_lines(x=~KPI_score, y=~predict(lm(salary~KPI_score, data=company_big)), line=list(color='black'))
p7_scatter
# Grouped Bar Chart Department Count per Company
dept_summary <- company_big %>%
  group_by(company_id, department) %>%
  summarise(count=n())
p7_bar <- plot_ly(dept_summary, x=~factor(company_id), y=~count, color=~department,
                  type='bar') %>%
  layout(barmode='group', title='Employees per Department per Company')
p7_bar
8.6 Interpretation
- KPI dashboards summarize average salary, average KPI, and top performer counts per company.
- Employees are categorized into KPI tiers (Low, Medium, High, Top).
- Scatter plots show KPI vs. Salary relationships, with regression lines for trends.
- Bar charts display department-wise employee counts per company.
- Histograms display salary distributions across companies.
9 Task 8 – Automated Company Report
library(plotly)
library(dplyr)
library(kableExtra)
library(htmltools)
library(ggplot2)
# Function to generate report per company
generate_company_report <- function(company_df, company_id){
  df <- company_df %>% filter(company_id == !!company_id)
  # Summary table
  summary_tbl <- df %>%
    summarise(
      Company = unique(company_id),
      Avg_Salary = round(mean(salary),2),
      Avg_KPI = round(mean(KPI_score),2),
      Top_Performers = sum(KPI_score > 90)
    )
  # Scatter plot: KPI vs Salary
  p_scatter <- ggplot(df, aes(x=KPI_score, y=salary, color=department)) +
    geom_point(alpha=0.7) +
    geom_smooth(method="lm", se=FALSE, color="black") +
    labs(title=paste("KPI vs Salary - Company", company_id),
         x="KPI Score", y="Salary") +
    theme_minimal()
  # Histogram: Salary
  p_hist <- ggplot(df, aes(x=salary)) +
    geom_histogram(fill="#7C3AED", bins=15, alpha=0.7) +
    labs(title=paste("Salary Distribution - Company", company_id),
         x="Salary", y="Count") +
    theme_minimal()
  # Bar chart: Department count
  dept_summary <- df %>% group_by(department) %>% summarise(count=n())
  p_dept <- plot_ly(dept_summary, x=~department, y=~count, type='bar', color=~department) %>%
    layout(title=paste("Employees per Department - Company", company_id),
           xaxis=list(title="Department"),
           yaxis=list(title="Count"))
  list(
    summary = summary_tbl,
    scatter = plotly::ggplotly(p_scatter),
    histogram = plotly::ggplotly(p_hist),
    bar_chart = p_dept
  )
}
# Loop over all companies
company_ids <- unique(company_big$company_id)
reports <- lapply(company_ids, function(cid){
  generate_company_report(company_big, cid)
})
names(reports) <- paste0("Company_", company_ids)
# Render reports
report_html <- lapply(company_ids, function(cid){
  rep <- reports[[paste0("Company_", cid)]]
  tagList(
    h2(paste("Company", cid)),
    HTML(kable(rep$summary, "html") %>%
           kable_styling(full_width = FALSE, position = "center", bootstrap_options = "striped")),
    rep$scatter,
    rep$histogram,
    rep$bar_chart
  )
})
browsable(tagList(report_html))
Company 1
| Company | Avg_Salary | Avg_KPI | Top_Performers |
|---|---|---|---|
| 1 | 6399.8 | 72.3 | 17 |
Company 2
| Company | Avg_Salary | Avg_KPI | Top_Performers |
|---|---|---|---|
| 2 | 6630.13 | 77.11 | 21 |
Company 3
| Company | Avg_Salary | Avg_KPI | Top_Performers |
|---|---|---|---|
| 3 | 6637.58 | 74.61 | 17 |
Company 4
| Company | Avg_Salary | Avg_KPI | Top_Performers |
|---|---|---|---|
| 4 | 6451.07 | 75.57 | 18 |
Company 5
| Company | Avg_Salary | Avg_KPI | Top_Performers |
|---|---|---|---|
| 5 | 6484.17 | 77.89 | 29 |
# Export combined summary CSV
dashboard_data <- bind_rows(lapply(reports, function(r) r$summary))
write.csv(dashboard_data, "dashboard_company_data.csv", row.names = FALSE)
9.1 Interpretation
- A report is generated automatically for each company.
- Scatter plots show the relationship between KPI score and salary, with a fitted regression line.
- Histograms display the salary distribution for each company.
- Bar charts display the number of employees per department.
- Every step combines a function with a loop (`lapply`), which keeps the workflow scalable as companies are added.
- CSV export allows data to be used for further analysis or additional reporting.
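The function-plus-loop pattern does not depend on plotly or dplyr. As a minimal base-R sketch, assuming a data frame with `company_id`, `salary`, and `KPI_score` columns: `split()` produces one data frame per company, `lapply()` applies a summary function to each, and the combined table is written to a temporary CSV. The helper name `summarise_company` and the toy values are hypothetical.

```r
# Hypothetical per-company summary, mirroring the columns of the summary table
summarise_company <- function(df) {
  data.frame(
    Company        = df$company_id[1],
    Avg_Salary     = round(mean(df$salary), 2),
    Avg_KPI        = round(mean(df$KPI_score), 2),
    Top_Performers = sum(df$KPI_score > 90)
  )
}

# Toy stand-in for company_big (values are made up for illustration)
toy_big <- data.frame(
  company_id = c(1, 1, 2, 2, 2),
  salary     = c(4000, 6000, 5000, 7000, 9000),
  KPI_score  = c(95, 70, 91, 92, 60)
)

# One data frame per company, one summary row per data frame
per_company <- lapply(split(toy_big, toy_big$company_id), summarise_company)
combined <- do.call(rbind, per_company)

# Write to a temporary file and read it back to confirm the round trip
csv_path <- tempfile(fileext = ".csv")
write.csv(combined, csv_path, row.names = FALSE)
read.csv(csv_path)
```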
10 Conclusion
This assignment shows how functions and loops can automate data workflows in R. Dynamic formula evaluation, nested-loop simulations, and a Monte Carlo estimate of π were used to analyze formulas, sales, and employee performance in a reproducible way. Line plots, scatter plots, histograms, and bar charts revealed trends, distributions, and top performers. Data transformations and feature engineering made variables comparable and enabled meaningful categorization. Automated dashboards and company reports summarized key metrics, and the CSV export makes the results available for further analysis and reporting. Overall, the project demonstrates how a structured R workflow can integrate computation, visualization, and reporting.
References
- Siregar, B. (n.d.). Data Science Programming: Study Case Using R and Python. Retrieved from https://bookdown.org/dsciencelabs/data_science_programming/