Data Science Programming
Assignment ~ Week 4 ~
R Programming Data Science Statistics
INTRODUCTION
This practicum practices implementing control flow logic in the Python and R programming languages; all examples in this report use R. The main focus is the ability of programs to make decisions automatically and perform repetitive tasks on data. This session develops two main competencies:
- Mastering conditional statements (if, if-else, if-else if-else)
- Applying various looping structures (for, while, break, next)
Conditional statement (if): Logic used for decision making based on certain conditions
for loop: Iteration over a collection of data with a known number of repetitions
while loop: A loop that runs while a specified condition remains true
break: Stops the loop immediately when a certain condition is met
next: Skips the current iteration and continues with the next iteration of the loop
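As a minimal sketch of the five constructs described above, the following R snippet applies each one to a small numeric vector. The vector `values` and all variable names are illustrative assumptions and are not part of the practicum dataset.

```r
# Illustrative vector (an assumption, not part of the employee dataset)
values <- c(3, 8, 5, 12, 7)

# if / else if / else: choose one branch based on a condition
first <- values[1]
if (first > 10) {
  size <- "large"
} else if (first > 4) {
  size <- "medium"
} else {
  size <- "small"
}

# for: iterate over a collection with a known number of repetitions
total <- 0
for (v in values) {
  total <- total + v
}

# while: keep looping as long as the condition remains TRUE
i <- 1
while (i <= length(values) && values[i] < 10) {
  i <- i + 1
}

# break and next: stop at the first value above 10, skip odd values
evens <- c()
for (v in values) {
  if (v > 10) break      # leave the loop entirely
  if (v %% 2 != 0) next  # skip to the next iteration
  evens <- c(evens, v)
}

print(size)
print(total)
print(i)
print(evens)
```

Here `i` ends at the position of the first value that is not below 10, and `evens` holds only the even values seen before the loop was stopped.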
Employee Dataset
| name | age | salary | position | performance |
|---|---|---|---|---|
| Bagas | 25 | 5000 | Staff | Good |
| Joan | 30 | 7000 | Supervisor | Very Good |
| Alya | 27 | 6500 | Staff | Average |
| Dwi | 35 | 10000 | Manager | Good |
| Nabil | 40 | 12000 | Director | Very Good |
1 Conditional Statement
According to the module, conditional statements determine which action a program takes based on whether certain conditions hold.
In this practicum, employee bonuses are determined based on performance:
Very Good → 20% of the salary
Good → 10% of the salary
Average → 5% of the salary
Any other rating → no bonus (0)
library(knitr)
library(kableExtra)

# Build the employee dataset
name <- c("Bagas", "Joan", "Alya", "Dwi", "Nabil")
age <- c(25, 30, 27, 35, 40)
salary <- c(5000, 7000, 6500, 10000, 12000)
position <- c("Staff", "Supervisor", "Staff", "Manager", "Director")
performance <- c("Good", "Very Good", "Average", "Good", "Very Good")
employee_data <- data.frame(name, age, salary, position, performance)

# Collect one result row per employee
bonus_result <- data.frame()
for (i in 1:nrow(employee_data)) {
  emp_name <- employee_data$name[i]
  emp_salary <- employee_data$salary[i]
  emp_perf <- employee_data$performance[i]

  # The bonus rate depends on the performance rating
  if (emp_perf == "Very Good") {
    bonus <- 0.20 * emp_salary
  } else if (emp_perf == "Good") {
    bonus <- 0.10 * emp_salary
  } else if (emp_perf == "Average") {
    bonus <- 0.05 * emp_salary
  } else {
    bonus <- 0
  }

  bonus_result <- rbind(
    bonus_result,
    data.frame(
      Name = emp_name,
      Salary = emp_salary,
      Performance = emp_perf,
      Bonus = bonus
    )
  )
}

# Render the result as a styled HTML table
knitr::kable(
  bonus_result,
  format = "html",
  caption = "Employee Bonus Calculation",
  align = "l"
) %>%
  kable_styling(full_width = TRUE) %>%
  row_spec(0, background = "#2e7d32", color = "white")
| Name | Salary | Performance | Bonus |
|---|---|---|---|
| Bagas | 5000 | Good | 500 |
| Joan | 7000 | Very Good | 1400 |
| Alya | 6500 | Average | 325 |
| Dwi | 10000 | Good | 1000 |
| Nabil | 12000 | Very Good | 2400 |
1.1 Visualization
library(ggplot2)

# Bar chart of salary per employee, ordered by salary and colored by performance
ggplot(bonus_result, aes(x = reorder(Name, Salary), y = Salary, fill = Performance)) +
  geom_col(width = 0.65, color = "white") +
  geom_text(aes(label = Salary),
            vjust = -0.5,
            size = 4,
            fontface = "bold") +
  scale_fill_manual(values = c(
    "Very Good" = "#2E8B57",
    "Good" = "#66CDAA",
    "Average" = "#B2DF8A"
  )) +
  labs(
    title = "Employee Salary Distribution",
    subtitle = "Salary Based on Employee Performance",
    x = "Employee Name",
    y = "Salary",
    fill = "Performance Level"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 12, color = "gray40", hjust = 0.5),
    axis.text = element_text(size = 11),
    legend.position = "top"
  )
1.2 Interpretation
The table lists each employee's salary, performance level, and resulting bonus, while the bar chart shows salaries only, ordered from lowest to highest and colored by performance level. Bonuses are calculated from performance: Very Good employees receive 20% of their salary, Good receive 10%, and Average receive 5%. Nabil receives the highest bonus (2400) because he combines the highest salary with a Very Good rating. Joan, also rated Very Good, receives 1400. Dwi and Bagas, both rated Good, receive 1000 and 500 respectively, and Alya receives the lowest bonus (325) because of her Average rating.
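The bonus rule described above can also be sketched with `switch()`, another conditional construct in R. This is only an illustrative alternative to the if-else chain, not the method used in the practicum; the helper name `bonus_rate` is hypothetical, and the data are copied from the Employee Dataset table.

```r
# Data copied from the Employee Dataset table
salary      <- c(5000, 7000, 6500, 10000, 12000)
performance <- c("Good", "Very Good", "Average", "Good", "Very Good")

# Hypothetical helper: map a performance label to its bonus rate.
# The final unnamed argument of switch() is the fallback for other labels.
bonus_rate <- function(perf) {
  switch(perf,
    "Very Good" = 0.20,
    "Good"      = 0.10,
    "Average"   = 0.05,
    0
  )
}

# Apply the helper to every employee, then multiply by salary
rates <- vapply(performance, bonus_rate, numeric(1), USE.NAMES = FALSE)
bonus <- rates * salary
print(bonus)
```

The resulting values should match the Bonus column of the table, for example 0.20 × 12000 = 2400 for Nabil.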
2 For Loop (Salary > 6000)
library(knitr)
library(kableExtra)

result_table <- data.frame()
for (i in 1:nrow(employee_data)) {
  emp_name <- employee_data$name[i]
  emp_salary <- employee_data$salary[i]

  # Keep only employees whose salary exceeds 6000
  if (emp_salary > 6000) {
    result_table <- rbind(
      result_table,
      data.frame(
        Name = emp_name,
        Salary = emp_salary
      )
    )
  }
}

kable(
  result_table,
  format = "html",
  caption = "Employees with Salary Greater Than 6000",
  align = "l"
) %>%
  kable_styling(full_width = TRUE) %>%
  row_spec(0, background = "#2e7d32", color = "white")
| Name | Salary |
|---|---|
| Joan | 7000 |
| Alya | 6500 |
| Dwi | 10000 |
| Nabil | 12000 |
2.1 Interpretation
The table shows employees whose salaries are greater than 6000. Four of the five employees in the dataset meet this condition: Joan, Alya, Dwi, and Nabil. Only Bagas, with a salary of 5000, is excluded.
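The count behind this interpretation can be checked with a short for loop. The sketch below assumes the salary values from the Employee Dataset table; the names `threshold`, `count_above`, and `share_above` are illustrative.

```r
# Salaries copied from the Employee Dataset table
salary <- c(5000, 7000, 6500, 10000, 12000)
threshold <- 6000

# Count how many salaries exceed the threshold
count_above <- 0
for (s in salary) {
  if (s > threshold) {
    count_above <- count_above + 1
  }
}

# Share of employees above the threshold
share_above <- count_above / length(salary)
print(count_above)
print(share_above)
```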
3 While Loop
library(knitr)
library(kableExtra)

i <- 1
result_table <- data.frame()
while (i <= nrow(employee_data)) {
  emp_name <- employee_data$name[i]
  emp_position <- employee_data$position[i]

  # Record the current employee before checking the stop condition
  result_table <- rbind(
    result_table,
    data.frame(
      Name = emp_name,
      Position = emp_position
    )
  )

  # Stop as soon as a Manager has been recorded
  if (emp_position == "Manager") {
    break
  }
  i <- i + 1
}

kable(
  result_table,
  format = "html",
  caption = "Employees Until Manager Found",
  align = c("l", "l")
) %>%
  kable_styling(full_width = TRUE) %>%
  row_spec(0, background = "#2e7d32", color = "white")
| Name | Position |
|---|---|
| Bagas | Staff |
| Joan | Supervisor |
| Alya | Staff |
| Dwi | Manager |
3.1 Interpretation
The table shows employee data from the beginning of the dataset until a Manager is encountered. Each row is recorded before the position is checked, so the Manager (Dwi) still appears in the table. Once that row has been recorded, the loop stops and the remaining row (Nabil) is never processed.
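The same stopping behavior can be expressed in the while condition itself instead of with break. The following is a minimal sketch under that assumption; the flag `found` and the vector `visited` are hypothetical names, and the data are copied from the Employee Dataset table.

```r
# Data copied from the Employee Dataset table
name     <- c("Bagas", "Joan", "Alya", "Dwi", "Nabil")
position <- c("Staff", "Supervisor", "Staff", "Manager", "Director")

i <- 1
found <- FALSE            # becomes TRUE once a Manager has been recorded
visited <- character(0)   # names processed so far

# The loop runs while rows remain AND no Manager has been found yet
while (i <= length(position) && !found) {
  visited <- c(visited, name[i])
  found <- position[i] == "Manager"
  i <- i + 1
}

print(visited)
```

Both forms visit the same rows; using break keeps the stop condition next to the code that triggers it, while the flag keeps every exit condition visible in the loop header.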
4 Break (Salary > 10000)
library(knitr)
library(kableExtra)

result_table <- data.frame()
for (i in 1:nrow(employee_data)) {
  emp_name <- employee_data$name[i]
  emp_salary <- employee_data$salary[i]

  # Record the triggering row, then leave the loop
  if (emp_salary > 10000) {
    result_table <- rbind(
      result_table,
      data.frame(
        Name = emp_name,
        Salary = emp_salary,
        Status = "Salary above 10000 - Loop Stopped"
      )
    )
    break
  }

  # Otherwise mark the row as checked and continue
  result_table <- rbind(
    result_table,
    data.frame(
      Name = emp_name,
      Salary = emp_salary,
      Status = "Checked"
    )
  )
}

kable(
  result_table,
  caption = "Break Statement Example",
  align = c("l", "l", "l")
) %>%
  kable_styling(full_width = TRUE) %>%
  row_spec(0, background = "#2e7d32", color = "white")
| Name | Salary | Status |
|---|---|---|
| Bagas | 5000 | Checked |
| Joan | 7000 | Checked |
| Alya | 6500 | Checked |
| Dwi | 10000 | Checked |
| Nabil | 12000 | Salary above 10000 - Loop Stopped |
4.1 Interpretation
The table illustrates the break statement: the loop processes employees in order until a condition is met. The first four employees have salaries of 10,000 or less, so each is marked “Checked.” When the loop reaches Nabil, his salary of 12,000 satisfies the condition salary > 10,000, so the program records the row and executes break, which terminates the loop immediately. Because Nabil is the last row in this dataset, no later rows are actually skipped here; in a longer dataset, any rows after the first match would not be processed.
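To see how break affects rows that come after the first match, the sketch below counts how many rows the loop visits for the original row order and for a reversed order. The reversed order and the helper name `count_visited` are assumptions introduced only for illustration; the salary values come from the Employee Dataset table.

```r
# Salaries copied from the Employee Dataset table
salary <- c(5000, 7000, 6500, 10000, 12000)

# Hypothetical helper: count how many rows are visited before the loop ends
count_visited <- function(salaries, limit = 10000) {
  visited <- 0
  for (s in salaries) {
    visited <- visited + 1
    if (s > limit) break  # stop at the first salary above the limit
  }
  visited
}

original_order <- count_visited(salary)       # match is in the last row
reversed_order <- count_visited(rev(salary))  # match is in the first row
print(original_order)
print(reversed_order)
```

With the original order every row is visited, whereas with the reversed order the loop ends after the first row.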
5 Continue / Next
library(knitr)
library(kableExtra)

result_table <- data.frame()
for (i in 1:nrow(employee_data)) {
  emp_name <- employee_data$name[i]
  emp_perf <- employee_data$performance[i]

  # Skip employees rated Average and move on to the next row
  if (emp_perf == "Average") {
    next
  }

  result_table <- rbind(
    result_table,
    data.frame(
      Name = emp_name,
      Performance = emp_perf
    )
  )
}

kable(
  result_table,
  format = "html",
  caption = "Employees with Performance Not Equal to Average",
  align = "l"
) %>%
  kable_styling(full_width = TRUE) %>%
  row_spec(0, background = "#2e7d32", color = "white")
| Name | Performance |
|---|---|
| Bagas | Good |
| Joan | Very Good |
| Dwi | Good |
| Nabil | Very Good |
5.1 Interpretation
The table shows employees whose performance is not categorized as Average. The program uses the next statement to skip employees with Average performance during the loop iteration. As a result, the employee named Alya is excluded from the table, while the remaining employees with Good and Very Good performance are displayed.
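A small sketch can make the effect of next explicit by counting kept and skipped rows. The counters `kept` and `skipped` are illustrative names, and the performance values are copied from the Employee Dataset table.

```r
# Performance ratings copied from the Employee Dataset table
performance <- c("Good", "Very Good", "Average", "Good", "Very Good")

kept <- 0
skipped <- 0
for (p in performance) {
  if (p == "Average") {
    skipped <- skipped + 1
    next  # jump to the next iteration; the line below is not reached
  }
  kept <- kept + 1
}

print(kept)
print(skipped)
```

Unlike break, next does not end the loop: the rows after Alya are still processed.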
6 Conclusion
Overall, this practicum helps improve the understanding of how program control flow structures can be used to process data efficiently, make automatic decisions, and manage looping operations in data analysis using R.
7 References
[1] Siregar, B. (n.d.). Data Science Programming: Chapter 02: Syntax and Control Flow. dsciencelabs. https://bookdown.org/dsciencelabs/data_science_programming/02-Syntax-and-Control-Flow.html?authuser=0#practicum