Functions & Loops + Data Science
Functions, Loops, and Conditional Logic in R
Assignment Week 5 — Institut Teknologi Sains Bandung
Name : Hirose Kawarin Sirait
ID Number : 52250012
Study Program : Data Science
Lecturer : Mr. Bakti Siregar, M.Sc., CDS.
Course : Data Science Programming
Format : RPubs (R)
Introduction
In this practicum, we explore various concepts in data science
including function development, data simulation, statistical
analysis, and visualization.
The objective of this project is to build structured and reusable
functions using loops and conditional logic, simulate real-world
datasets, and analyze the results through meaningful visualizations.
Each task is designed to reflect practical data science scenarios,
such as sales analysis, performance evaluation, and company-level data
processing. By completing this practicum, we aim to enhance our
understanding of how data can be generated, transformed, and interpreted
effectively.
Furthermore, this project emphasizes clean coding
practices, clear data presentation, and
insightful interpretation to support
data-driven decision making.
Objectives
This project aims to:
Build multi-layer functions using loops and conditional
logic
Perform data simulation and transformation
Create data visualizations
Develop automated workflows in R
TASK 2 – Nested Simulation: Multi-Sales & Discounts
MAIN FUNCTION
simulate_sales <- function(n_salesperson, days) {
  data <- data.frame()
  # Outer loop over salespeople, inner loop over days
  for (s in seq_len(n_salesperson)) {
    for (d in seq_len(days)) {
      # Random daily sales amount between 100 and 1000
      sales_amount <- sample(100:1000, 1)
      # Conditional discount
      if (sales_amount > 800) {
        discount_rate <- 0.2
      } else if (sales_amount > 500) {
        discount_rate <- 0.1
      } else {
        discount_rate <- 0.05
      }
      # Append one row per salesperson-day
      data <- rbind(data, data.frame(
        salesperson_id = s,
        day = d,
        sales_amount = sales_amount,
        discount_rate = discount_rate
      ))
    }
  }
  return(data)
}
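Growing a data frame with rbind() inside a nested loop copies the whole frame on every iteration, which becomes slow for large inputs. The following is a minimal vectorized sketch of the same simulation, assuming the same sales range and discount tiers. The name simulate_sales_vec and the seed are hypothetical and not part of the original assignment.

```r
# Hypothetical vectorized variant of simulate_sales (the name is an assumption).
simulate_sales_vec <- function(n_salesperson, days) {
  # One row per (salesperson, day); day varies fastest, as in the nested loop.
  grid <- expand.grid(day = seq_len(days),
                      salesperson_id = seq_len(n_salesperson))
  n <- nrow(grid)
  # Draw all sales amounts at once instead of one per iteration.
  sales_amount <- sample(100:1000, n, replace = TRUE)
  # Same tiers as the if/else chain: > 800 -> 0.2, > 500 -> 0.1, else 0.05.
  discount_rate <- ifelse(sales_amount > 800, 0.2,
                          ifelse(sales_amount > 500, 0.1, 0.05))
  data.frame(salesperson_id = grid$salesperson_id,
             day = grid$day,
             sales_amount = sales_amount,
             discount_rate = discount_rate)
}

set.seed(1)  # arbitrary seed, used only to make this sketch repeatable
demo_sales <- simulate_sales_vec(5, 10)
head(demo_sales)
```

The output has the same columns as simulate_sales, so it can be passed to the later steps unchanged.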
RUN SIMULATION
sales_data <- simulate_sales(5, 10)
head(sales_data)
## salesperson_id day sales_amount discount_rate
## 1 1 1 118 0.05
## 2 1 2 559 0.10
## 3 1 3 841 0.20
## 4 1 4 137 0.05
## 5 1 5 238 0.05
## 6 1 6 541 0.10
FUNCTION CUMULATIVE SALES
calculate_cumulative <- function(df) {
  df$cumulative_sales <- ave(df$sales_amount, df$salesperson_id, FUN = cumsum)
  return(df)
}
sales_data <- calculate_cumulative(sales_data)
head(sales_data)
## salesperson_id day sales_amount discount_rate cumulative_sales
## 1 1 1 118 0.05 118
## 2 1 2 559 0.10 677
## 3 1 3 841 0.20 1518
## 4 1 4 137 0.05 1655
## 5 1 5 238 0.05 1893
## 6 1 6 541 0.10 2434
TABLE
📊 Sales Simulation Data
| salesperson_id | day | sales_amount | discount_rate | cumulative_sales |
|---|---|---|---|---|
| 1 | 1 | 118 | 0.05 | 118 |
| 1 | 2 | 559 | 0.10 | 677 |
| 1 | 3 | 841 | 0.20 | 1518 |
| 1 | 4 | 137 | 0.05 | 1655 |
| 1 | 5 | 238 | 0.05 | 1893 |
| 1 | 6 | 541 | 0.10 | 2434 |
| 1 | 7 | 434 | 0.05 | 2868 |
| 1 | 8 | 961 | 0.20 | 3829 |
| 1 | 9 | 100 | 0.05 | 3929 |
| 1 | 10 | 706 | 0.10 | 4635 |
SUMMARY STATS
aggregate(sales_amount ~ salesperson_id, data = sales_data, sum)
## salesperson_id sales_amount
## 1 1 4635
## 2 2 4963
## 3 3 4201
## 4 4 4662
## 5 5 6552
VISUALIZATION
library(ggplot2)
library(plotly)
p <- ggplot(sales_data, aes(x = day, y = cumulative_sales, color = factor(salesperson_id))) +
  geom_line(linewidth = 1.2) +
  geom_point() +
  labs(
    title = "Cumulative Sales per Salesperson",
    x = "Day",
    y = "Cumulative Sales",
    color = "Salesperson"
  ) +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "#F3E8FF", color = NA),
    legend.position = "top"
  )
ggplotly(p)
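INTERPRETATION
The simulation shows how each salesperson accumulates sales over time.
Salespersons with higher daily sales accumulate faster; in this run, salesperson 5 finishes with the highest total (6552) and salesperson 3 with the lowest (4201).
Because daily sales are drawn at random from the same range for everyone, these differences reflect sampling variation rather than skill.
Discount rates are assigned in three tiers (5%, 10%, 20%) based on the sale amount. They do not enter cumulative_sales, which sums gross amounts, so they affect revenue only if the discount is applied to each sale.
The cumulative trend helps identify top performers.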
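The effect of the discount on revenue can be made explicit by computing a net amount per sale. The sketch below uses the first three rows of the output shown earlier and assumes that the discount is deducted from the gross sale amount. The column names net_revenue and cumulative_net are hypothetical and do not appear in the original code.

```r
# First three rows of the simulation output shown earlier (salesperson 1).
example_sales <- data.frame(
  salesperson_id = c(1, 1, 1),
  day            = 1:3,
  sales_amount   = c(118, 559, 841),
  discount_rate  = c(0.05, 0.10, 0.20)
)

# Assumption: the discount is deducted from the gross sale amount.
example_sales$net_revenue <-
  example_sales$sales_amount * (1 - example_sales$discount_rate)

# Running total of net revenue per salesperson, mirroring calculate_cumulative.
example_sales$cumulative_net <-
  ave(example_sales$net_revenue, example_sales$salesperson_id, FUN = cumsum)

example_sales
```

Under this assumption the three sales net 112.1, 503.1, and 672.8, so the largest sale loses the most to its 20% discount.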
TASK 4 – Multi-Company Dataset Simulation
In this task, we simulate a dataset representing multiple companies
and their employees. Each company contains several employees with
attributes such as salary, department, performance score, and KPI
score.
The objective is to generate structured data using nested loops and
apply conditional logic to identify top performers. This simulation
reflects real-world organizational data, where companies analyze
employee performance and salary distribution.
We will also summarize the dataset at the company level, including
average salary, average performance score, and maximum KPI score,
followed by visualizations to better understand the patterns within the
data.
FUNCTION GENERATE DATA
generate_company_data <- function(n_company, n_employees) {
  data <- data.frame()
  departments <- c("HR", "Finance", "IT", "Marketing")
  # Loop variable named comp to avoid shadowing base::c()
  for (comp in seq_len(n_company)) {
    for (e in seq_len(n_employees)) {
      # Each attribute is drawn independently of the others
      salary <- sample(3000:10000, 1)
      performance_score <- sample(60:100, 1)
      KPI_score <- sample(50:100, 1)
      department <- sample(departments, 1)
      # Conditional: top performer
      if (KPI_score > 90) {
        performer <- "Top Performer"
      } else {
        performer <- "Regular"
      }
      data <- rbind(data, data.frame(
        company_id = comp,
        employee_id = paste0("C", comp, "_E", e),
        salary = salary,
        department = department,
        performance_score = performance_score,
        KPI_score = KPI_score,
        performer_status = performer
      ))
    }
  }
  return(data)
}
GENERATE DATA
company_data <- generate_company_data(5, 20)
head(company_data)
## company_id employee_id salary department performance_score KPI_score
## 1 1 C1_E1 9996 IT 75 75
## 2 1 C1_E2 6692 HR 83 91
## 3 1 C1_E3 6726 HR 61 70
## 4 1 C1_E4 3310 Marketing 70 56
## 5 1 C1_E5 6916 HR 63 86
## 6 1 C1_E6 7234 Finance 72 57
## performer_status
## 1 Regular
## 2 Top Performer
## 3 Regular
## 4 Regular
## 5 Regular
## 6 Regular
SUMMARY PER COMPANY
summary_data <- aggregate(cbind(salary, performance_score, KPI_score) ~ company_id,
                          data = company_data,
                          FUN = mean)
max_kpi <- aggregate(KPI_score ~ company_id, data = company_data, max)
summary_data$max_KPI <- max_kpi$KPI_score
summary_data
## company_id salary performance_score KPI_score max_KPI
## 1 1 6572.90 78.45 73.55 94
## 2 2 6788.55 72.45 76.85 100
## 3 3 6782.00 76.95 75.50 99
## 4 4 6455.90 81.55 70.80 92
## 5 5 6498.65 82.40 80.15 98
TABLE
Company Summary
| company_id | salary | performance_score | KPI_score | max_KPI |
|---|---|---|---|---|
| 1 | 6572.90 | 78.45 | 73.55 | 94 |
| 2 | 6788.55 | 72.45 | 76.85 | 100 |
| 3 | 6782.00 | 76.95 | 75.50 | 99 |
| 4 | 6455.90 | 81.55 | 70.80 | 92 |
| 5 | 6498.65 | 82.40 | 80.15 | 98 |
VISUALIZATION
Average Salary per Company
library(ggplot2)
ggplot(summary_data, aes(x = factor(company_id), y = salary, fill = factor(company_id))) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = round(salary, 0)), vjust = -0.5) +
  labs(
    title = "Average Salary per Company",
    x = "Company",
    y = "Average Salary"
  ) +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "#F3E8FF"),
    legend.position = "none"
  )

KPI Distribution (Scatter)
ggplot(company_data, aes(x = performance_score, y = KPI_score, color = factor(company_id))) +
  geom_point() +
  labs(
    title = "Performance vs KPI",
    x = "Performance Score",
    y = "KPI Score",
    color = "Company"
  ) +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "#F3E8FF")
  )

INTERPRETATION
The generated dataset represents multiple companies with varying employee attributes.
From the summary table and bar chart:
Average salaries differ only modestly across companies, from 6455.90 (company 4) to 6788.55 (company 2).
Every salary is drawn uniformly from 3000 to 10000 regardless of company, so these differences reflect sampling variation with 20 employees per company, not different compensation structures or roles.
From the scatter plot:
Performance score and KPI score are sampled independently in generate_company_data, so no systematic relationship between them is expected.
Any apparent trend in the scatter plot is due to chance and should not be read as higher performance scores leading to higher KPI values.
Every company has at least one top performer (the maximum KPI is above 90 in all five), so high KPI scores are not limited to a single company.
Overall, the simulation demonstrates how organizational data can be summarized and visualized at the company level. Meaningful performance trends would require real data or a simulation that builds such relationships in.
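Whether performance score and KPI score are related can be checked numerically with cor(). The sketch below draws the two scores the same way generate_company_data does (independently, from 60:100 and 50:100) but at a larger sample size. The seed and the sample size are assumptions chosen only for illustration.

```r
set.seed(42)  # arbitrary seed, only for repeatability of this sketch
n <- 10000    # larger than the assignment's 100 rows, to reduce noise

# Independent draws, mirroring the sampling inside generate_company_data.
performance_score <- sample(60:100, n, replace = TRUE)
KPI_score         <- sample(50:100, n, replace = TRUE)

# Pearson correlation; values near 0 indicate no linear relationship.
r <- cor(performance_score, KPI_score)
r
```

On the dataset from this task, the same check would be cor(company_data$performance_score, company_data$KPI_score). With only 100 rows the result can drift further from zero by chance.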
TASK 5 – Monte Carlo Simulation: Pi & Probability
In this task, we use Monte Carlo simulation to estimate the value of
π (pi) and analyze probability through random point generation.
By generating random points within a square and checking whether they
fall inside a circle, we can approximate π mathematically. Additionally,
we compute the probability of points falling within a defined
sub-region.
This method demonstrates how randomness and probability can be used
to solve mathematical problems and simulate real-world uncertainty.
FUNCTION
monte_carlo_pi <- function(n_points) {
  # Random points in the square [-1, 1] x [-1, 1]
  x <- runif(n_points, -1, 1)
  y <- runif(n_points, -1, 1)
  inside_circle <- x^2 + y^2 <= 1
  pi_estimate <- 4 * sum(inside_circle) / n_points
  # Probability of landing in a sub-square (e.g., a small area)
  inside_square <- (x > 0 & x < 0.5 & y > 0 & y < 0.5)
  probability <- sum(inside_square) / n_points
  data <- data.frame(x, y, inside_circle)
  return(list(
    pi_estimate = pi_estimate,
    probability = probability,
    data = data
  ))
}
RUN
result <- monte_carlo_pi(5000)
result$pi_estimate
## [1] 3.0968
result$probability
## [1] 0.0612
VISUALIZATION
ggplot(result$data, aes(x = x, y = y, color = inside_circle)) +
  geom_point(alpha = 0.6) +
  labs(
    title = "Monte Carlo Simulation for Pi",
    subtitle = paste("Estimated Pi =", round(result$pi_estimate, 4)),
    x = "X",
    y = "Y"
  ) +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "#F3E8FF")
  )

INTERPRETATION
The simulation estimates π by comparing the number of points inside the unit circle to the total number of random points.
The ratio of points inside the circle to total points approximates the ratio of the circle's area (π) to the square's area (4), so multiplying the ratio by 4 gives the estimate.
With 5000 points the estimate is 3.0968, about 0.045 below the true value (≈ 3.1416). The estimate tends to get closer to π as the number of points increases.
The visualization shows points inside the circle forming a disc, while points outside it remain within the square but do not satisfy x^2 + y^2 <= 1.
The probability result (0.0612) is the share of points that fall in the sub-square 0 < x < 0.5, 0 < y < 0.5. It is close to the exact value 0.25 / 4 = 0.0625, the ratio of the sub-square's area to the square's area.
This demonstrates how randomness can approximate mathematical constants and analyze probability.
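The dependence of the estimate on the number of points can be illustrated by repeating the simulation at several sample sizes. The sketch below is a minimal example: the helper estimate_pi, the chosen sizes, and the seed are assumptions for illustration. The standard error column uses the binomial formula 4 * sqrt(p * (1 - p) / n) with p = π / 4.

```r
set.seed(123)  # arbitrary seed, only for repeatability of this sketch

# Hypothetical helper: same estimator as monte_carlo_pi, returning only the estimate.
estimate_pi <- function(n_points) {
  x <- runif(n_points, -1, 1)
  y <- runif(n_points, -1, 1)
  4 * mean(x^2 + y^2 <= 1)
}

sizes <- c(100, 1000, 10000, 100000)
estimates <- sapply(sizes, estimate_pi)

# Theoretical standard error of the estimate at each sample size.
p <- pi / 4
std_error <- 4 * sqrt(p * (1 - p) / sizes)

convergence <- data.frame(
  n_points  = sizes,
  estimate  = estimates,
  std_error = std_error,
  abs_error = abs(estimates - pi)
)
convergence
```

The standard error shrinks in proportion to 1 / sqrt(n), so a tenfold increase in points reduces the typical error by a factor of about 3.2. For 5000 points it is roughly 0.023.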
TASK 7 – Mini Project: Company KPI Dashboard
In this mini project, a comprehensive dataset was built representing
several companies and their employees, including salaries, performance
scores, KPI scores, and departments.
The goal is to simulate real-world company data and create a
dashboard that summarizes key performance indicators (KPIs). We
categorize employees, analyze company-level metrics, and visualize
patterns using advanced plots.
This task integrates all previous concepts, including data
simulation, loops, feature engineering, and visualization, to produce a
complete data analysis workflow.
GENERATE DATA
dashboard_data <- generate_company_data(6, 50)
head(dashboard_data)
## company_id employee_id salary department performance_score KPI_score
## 1 1 C1_E1 8493 Finance 63 64
## 2 1 C1_E2 3503 Marketing 70 100
## 3 1 C1_E3 3077 HR 100 64
## 4 1 C1_E4 4300 Marketing 99 61
## 5 1 C1_E5 8764 Marketing 87 68
## 6 1 C1_E6 5182 HR 91 77
## performer_status
## 1 Regular
## 2 Top Performer
## 3 Regular
## 4 Regular
## 5 Regular
## 6 Regular
SUMMARY KPI
library(dplyr)
summary_kpi <- dashboard_data %>%
  group_by(company_id) %>%
  summarise(
    avg_salary = mean(salary),
    avg_KPI = mean(KPI_score),
    top_performers = sum(KPI_score > 90)
  )
summary_kpi
## # A tibble: 6 × 4
## company_id avg_salary avg_KPI top_performers
## <int> <dbl> <dbl> <int>
## 1 1 6252. 73.2 5
## 2 2 6663. 74.0 6
## 3 3 6311. 74.6 5
## 4 4 6994. 71.4 7
## 5 5 6478. 74.2 12
## 6 6 7006. 72.7 8
KPI CATEGORY
dashboard_data$KPI_category <- ifelse(
  dashboard_data$KPI_score > 90, "High",
  ifelse(dashboard_data$KPI_score > 75, "Medium", "Low")
)
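Nested ifelse() calls become hard to read as the number of categories grows. As an alternative sketch, base R's cut() can express the same thresholds as intervals. The test vector kpi_scores is made up to cover the boundary values and is not taken from the dataset.

```r
# Boundary values chosen for illustration only.
kpi_scores <- c(50, 75, 76, 90, 91, 100)

# Original approach: nested ifelse with thresholds > 90 and > 75.
nested <- ifelse(kpi_scores > 90, "High",
                 ifelse(kpi_scores > 75, "Medium", "Low"))

# Equivalent binning: intervals (-Inf, 75], (75, 90], (90, Inf).
binned <- cut(kpi_scores,
              breaks = c(-Inf, 75, 90, Inf),
              labels = c("Low", "Medium", "High"))

data.frame(kpi_scores, nested, binned)
```

Note that cut() returns an ordered set of factor levels (Low, Medium, High), which keeps the categories in a sensible order in plots and tables, whereas ifelse() returns plain character strings.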
DASHBOARD VISUALIZATION
Bar Chart KPI
p1 <- ggplot(summary_kpi, aes(x = factor(company_id), y = avg_KPI, fill = factor(company_id))) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = round(avg_KPI, 1)), vjust = -0.5) +
  theme_minimal() +
  labs(title = "Average KPI per Company")
ggplotly(p1)
Scatter + Regression
p2 <- ggplot(dashboard_data, aes(x = salary, y = KPI_score, color = department)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal() +
  labs(title = "Salary vs KPI")
ggplotly(p2)
Distribution Salary
p3 <- ggplot(dashboard_data, aes(x = salary)) +
  geom_histogram(fill = "#7C3AED", bins = 20) +
  theme_minimal() +
  labs(title = "Salary Distribution")
ggplotly(p3)
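INTERPRETATION
The dashboard provides an overview of company performance.
The bar chart shows average KPI per company. The averages span a narrow range (71.4 to 74.6), so no company clearly stands out on this measure, although company 5 has the most top performers (12 of 50).
The scatter plot compares salary and KPI by department. Because salary and KPI score are sampled independently, the fitted lines are expected to be nearly flat, and any slope reflects chance rather than higher-paid employees performing better.
The histogram illustrates the distribution of salaries across all companies, which is roughly uniform between 3000 and 10000 by construction.
Overall, the dashboard shows how key metrics can be summarized to support data-driven decision making at the company level.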
TASK 8 – Automated Report Generation
In this final task, we develop an automated reporting system that
generates summaries for each company using functions and loops.
The goal is to create a scalable workflow where reports, including
tables and visualizations, are automatically produced without manual
repetition.
This approach reflects real-world data science practices, where
automation is essential for handling large datasets efficiently and
consistently.
FUNCTION REPORT
generate_report <- function(data, company_id) {
  df <- data[data$company_id == company_id, ]
  summary <- data.frame(
    Company = company_id,
    Avg_Salary = mean(df$salary),
    Avg_KPI = mean(df$KPI_score),
    Top_Performers = sum(df$KPI_score > 90)
  )
  return(list(data = df, summary = summary))
}
LOOP ALL COMPANY
company_ids <- unique(dashboard_data$company_id)
reports <- list()
for (cid in company_ids) {
  reports[[as.character(cid)]] <- generate_report(dashboard_data, cid)
}
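Once each company has its own report, the one-row summaries can be stacked into a single comparison table. The sketch below shows one way to do this with lapply() and do.call(rbind, ...). The data frame toy and its values are made up for illustration, and generate_report is repeated so that the example runs on its own.

```r
# Toy data standing in for dashboard_data (values are made up for illustration).
toy <- data.frame(
  company_id = c(1, 1, 2, 2),
  salary     = c(4000, 6000, 5000, 9000),
  KPI_score  = c(95, 70, 91, 92)
)

# Same logic as generate_report above, repeated so the sketch is self-contained.
generate_report <- function(data, company_id) {
  df <- data[data$company_id == company_id, ]
  summary <- data.frame(
    Company = company_id,
    Avg_Salary = mean(df$salary),
    Avg_KPI = mean(df$KPI_score),
    Top_Performers = sum(df$KPI_score > 90)
  )
  list(data = df, summary = summary)
}

# One report per company, then stack the one-row summaries into one table.
toy_reports <- lapply(unique(toy$company_id),
                      function(cid) generate_report(toy, cid))
all_summaries <- do.call(rbind, lapply(toy_reports, function(r) r$summary))
all_summaries
```

With the objects from this task, the equivalent call would be do.call(rbind, lapply(reports, function(r) r$summary)).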
DISPLAY AUTOMATIC REPORT
| Company | Avg_Salary | Avg_KPI | Top_Performers |
|---|---|---|---|
| 1 | 6251.88 | 73.18 | 5 |
| 2 | 6662.68 | 74.02 | 6 |
| 3 | 6310.7 | 74.58 | 5 |
| 4 | 6994.08 | 71.36 | 7 |
| 5 | 6477.78 | 74.24 | 12 |
| 6 | 7006.44 | 72.66 | 8 |
VISUALIZATION
SCATTER + REGRESSION
library(ggplot2)
library(plotly)
library(htmltools)
plots <- list()
for (cid in company_ids) {
  df <- reports[[as.character(cid)]]$data
  p <- ggplot(df, aes(x = salary, y = KPI_score, color = department)) +
    geom_point(size = 2, alpha = 0.7) +
    geom_smooth(method = "lm", se = FALSE) +
    labs(
      title = paste("Company", cid, "- Salary vs KPI"),
      x = "Salary",
      y = "KPI Score"
    ) +
    theme_minimal()
  plots[[cid]] <- ggplotly(p)
}
tagList(plots)
HISTOGRAM
plots_hist <- list()
for (cid in company_ids) {
  df <- reports[[as.character(cid)]]$data
  p <- ggplot(df, aes(x = salary)) +
    geom_histogram(fill = "#7C3AED", bins = 15) +
    labs(
      title = paste("Salary Distribution - Company", cid),
      x = "Salary",
      y = "Count"
    ) +
    theme_minimal() +
    theme(plot.background = element_rect(fill = "#F3E8FF"))
  plots_hist[[cid]] <- ggplotly(p)
}
tagList(plots_hist)
EXPORT
write.csv(dashboard_data, "company_data.csv", row.names = FALSE)
INTERPRETATION
The automated reporting system successfully generates summaries and visualizations for each company using loops and functions.
The scatter plots show salary against KPI score for each company, with one regression line per department.
Because salary and KPI score are sampled independently, no systematic relationship is expected. Individual regression lines may slope up or down, but with only a handful of employees per department these slopes reflect chance.
Departments are distinguished by color, which makes it easy to compare them within a company.
The salary distribution plots reveal how employee compensation varies within each company.
The interactive features allow deeper exploration of the data, making it easier to identify patterns and outliers. Overall, automation improves efficiency and demonstrates how data science workflows can scale to handle many groups with the same code.