Data Science Programming

Assignment ~ Week 4 ~

Jihan Ramadhani Deandri

Data Science undergraduate student

R Programming Data Science Statistics

INTRODUCTION

This practicum provides practice in implementing control flow logic in the Python and R programming languages; the examples in this report are written in R. The material focuses on how programs make decisions automatically and perform repetitive tasks on data. Two main competencies are developed:

Mastering conditional statements (if, if-else, if-else if-else)
Applying various looping structures (for, while, break, next)

Conditional Statement
Logic used for decision making based on certain conditions

For Loop
Iteration over a collection of data with a known number of repetitions

While Loop
A loop that runs while a specified condition remains true

Break
Stops the loop immediately when a certain condition is met

Continue / Next
Skips the current iteration and continues with the next iteration of the loop
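
The five constructs above can be sketched together in a few lines of R. This is only an illustrative sketch: the vector `values` and the names `kept`, `n_below`, and `label` are hypothetical and are not part of the employee dataset.

```r
# Hypothetical input vector, used only for illustration
values <- c(3, 8, 5, 12, 7)

# for loop with next and break
kept <- c()
for (v in values) {
  if (v == 5) next     # next: skip this value and continue with the following one
  if (v > 10) break    # break: leave the loop entirely
  kept <- c(kept, v)
}

# while loop: count how many leading values are below 10
n_below <- 0
i <- 1
while (i <= length(values) && values[i] < 10) {
  n_below <- n_below + 1
  i <- i + 1
}

# conditional statement: choose a label based on a condition
label <- if (length(kept) > 1) "several" else "one or none"

print(kept)     # 3 8
print(n_below)  # 3
print(label)    # "several"
```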

Employee Dataset

name	age	salary	position	performance
Bagas	25	5000	Staff	Good
Joan	30	7000	Supervisor	Very Good
Alya	27	6500	Staff	Average
Dwi	35	10000	Manager	Good
Nabil	40	12000	Director	Very Good

1 Conditional Statement

According to the module, conditional statements determine the program’s actions based on whether certain conditions are met.

In this practicum, employee bonuses are determined based on performance:

Very Good → 20% of the salary
Good → 10% of the salary
Average → 5% of the salary

library(knitr)
library(kableExtra)

name <- c("Bagas","Joan","Alya","Dwi","Nabil")
age <- c(25,30,27,35,40)
salary <- c(5000,7000,6500,10000,12000)
position <- c("Staff","Supervisor","Staff","Manager","Director")
performance <- c("Good","Very Good","Average","Good","Very Good")

employee_data <- data.frame(name, age, salary, position, performance)

bonus_result <- data.frame()

for (i in seq_len(nrow(employee_data))) {
  
  emp_name <- employee_data$name[i]
  emp_salary <- employee_data$salary[i]
  emp_perf <- employee_data$performance[i]
  
  if (emp_perf == "Very Good") {
    bonus <- 0.20 * emp_salary
    
  } else if (emp_perf == "Good") {
    bonus <- 0.10 * emp_salary
    
  } else if (emp_perf == "Average") {
    bonus <- 0.05 * emp_salary
    
  } else {
    bonus <- 0
  }
  
  bonus_result <- rbind(
    bonus_result,
    data.frame(
      Name = emp_name,
      Salary = emp_salary,
      Performance = emp_perf,
      Bonus = bonus
    )
  )
}

knitr::kable(
  bonus_result,
  format = "html",
  caption = "Employee Bonus Calculation",
  align = "l"
) %>%
  kable_styling(full_width = TRUE) %>%
  row_spec(0, background = "#2e7d32", color = "white")

Employee Bonus Calculation
Name	Salary	Performance	Bonus
Bagas	5000	Good	500
Joan	7000	Very Good	1400
Alya	6500	Average	325
Dwi	10000	Good	1000
Nabil	12000	Very Good	2400

1.1 Visualization

library(ggplot2)

ggplot(bonus_result, aes(x = reorder(Name, Salary), y = Salary, fill = Performance)) +

  geom_col(width = 0.65, color = "white") +

  geom_text(aes(label = Salary),
            vjust = -0.5,
            size = 4,
            fontface = "bold") +

  scale_fill_manual(values = c(
    "Very Good" = "#2E8B57",
    "Good" = "#66CDAA",
    "Average" = "#B2DF8A"
  )) +

  labs(
    title = "Employee Salary Distribution",
    subtitle = "Salary Based on Employee Performance",
    x = "Employee Name",
    y = "Salary",
    fill = "Performance Level"
  ) +

  theme_minimal() +

  theme(
    plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 12, color = "gray40", hjust = 0.5),
    axis.text = element_text(size = 11),
    legend.position = "top"
  )

1.2 Interpretation

The table lists each employee's salary, performance level, and bonus; the bar chart plots salary only, ordered from lowest to highest and colored by performance level. Bonuses follow the performance rule: Very Good earns 20% of the salary, Good 10%, and Average 5%. Nabil receives the highest bonus (2400) because he combines the highest salary with a Very Good rating. Joan, also rated Very Good, receives 1400. Dwi and Bagas, both rated Good, receive 1000 and 500. Alya receives the lowest bonus (325): her rating is Average, so the 5% rate is applied to her salary of 6500.
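
As a cross-check of the figures above, the same bonus rule can be written without an explicit loop by using a named vector of rates as a lookup table. This is an alternative sketch, not the approach used in the practicum; the names `bonus_rate` and `rate` are introduced here for illustration only.

```r
# Salaries and performance ratings copied from the employee dataset
salary <- c(Bagas = 5000, Joan = 7000, Alya = 6500, Dwi = 10000, Nabil = 12000)
performance <- c("Good", "Very Good", "Average", "Good", "Very Good")

# Lookup table: performance rating -> bonus rate (hypothetical helper name)
bonus_rate <- c("Very Good" = 0.20, "Good" = 0.10, "Average" = 0.05)

# Index the lookup table by rating; unmatched ratings yield NA
rate <- unname(bonus_rate[performance])
rate[is.na(rate)] <- 0   # mirrors the final else branch: no bonus

bonus <- salary * rate
print(bonus)   # 500, 1400, 325, 1000, 2400, matching the table
```

The lookup keeps the rates in one place, so changing a rate does not require touching the control flow.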

2 For Loop (Salary > 6000)

library(knitr)
library(kableExtra)

result_table <- data.frame()

for (i in seq_len(nrow(employee_data))) {
  
  emp_name <- employee_data$name[i]
  emp_salary <- employee_data$salary[i]
  
  if (emp_salary > 6000) {
    
    result_table <- rbind(
      result_table,
      data.frame(
        Name = emp_name,
        Salary = emp_salary
      )
    )
    
  }
}

kable(
  result_table,
  format = "html",
  caption = "Employees with Salary Greater Than 6000",
  align = "l"
) %>%
  kable_styling(full_width = TRUE) %>%
  row_spec(0, background = "#2e7d32", color = "white")

Employees with Salary Greater Than 6000
Name	Salary
Joan	7000
Alya	6500
Dwi	10000
Nabil	12000

2.1 Interpretation

The table shows employees whose salaries are greater than 6000. Four of the five employees meet this condition: Joan, Alya, Dwi, and Nabil. Only Bagas, with a salary of 5000, is excluded.
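
The result above can be verified with logical subsetting, which applies the same condition to the whole salary column at once. This is a sketch for comparison with the loop, not a replacement for the exercise; `high_earners` is a name introduced here.

```r
# Only the columns needed for this check, copied from the employee dataset
employee_data <- data.frame(
  name = c("Bagas", "Joan", "Alya", "Dwi", "Nabil"),
  salary = c(5000, 7000, 6500, 10000, 12000),
  stringsAsFactors = FALSE
)

# Keep the rows where the condition is TRUE
high_earners <- employee_data[employee_data$salary > 6000, ]

print(high_earners$name)                          # "Joan" "Alya" "Dwi" "Nabil"
print(nrow(high_earners) / nrow(employee_data))   # 0.8
```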

3 While Loop

library(knitr)
library(kableExtra)

i <- 1
result_table <- data.frame()

while (i <= nrow(employee_data)) {
  
  emp_name <- employee_data$name[i]
  emp_position <- employee_data$position[i]
  
  result_table <- rbind(
    result_table,
    data.frame(
      Name = emp_name,
      Position = emp_position
    )
  )
  
  if (emp_position == "Manager") {
    break
  }
  
  i <- i + 1
}

kable(
  result_table,
  format = "html",
  caption = "Employees Until Manager Found",
  align = c("l","l")
) %>%
  kable_styling(full_width = TRUE) %>%
  row_spec(0, background = "#2e7d32", color = "white")

Employees Until Manager Found
Name	Position
Bagas	Staff
Joan	Supervisor
Alya	Staff
Dwi	Manager

3.1 Interpretation

The table shows employee data from the beginning of the dataset until a Manager is encountered. Once the program finds an employee with the position Manager, the loop stops and no further data is processed.
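
The stopping point of the loop can also be located directly with `match()`, which returns the index of the first matching element. The sketch below is an alternative formulation under that assumption; `first_manager`, `last_row`, and `until_manager` are names introduced here.

```r
# Names and positions copied from the employee dataset
name <- c("Bagas", "Joan", "Alya", "Dwi", "Nabil")
position <- c("Staff", "Supervisor", "Staff", "Manager", "Director")

# Index of the first "Manager"; NA if the position never occurs
first_manager <- match("Manager", position)

# Without a Manager, the while loop would run through every row
last_row <- if (is.na(first_manager)) length(position) else first_manager

until_manager <- data.frame(
  Name = name[seq_len(last_row)],
  Position = position[seq_len(last_row)],
  stringsAsFactors = FALSE
)
print(until_manager)   # four rows, ending with Dwi (Manager)
```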

4 Break (Salary > 10000)

library(knitr)
library(kableExtra)

result_table <- data.frame()

for (i in seq_len(nrow(employee_data))) {
  
  emp_name <- employee_data$name[i]
  emp_salary <- employee_data$salary[i]
  
  if (emp_salary > 10000) {
    
    result_table <- rbind(
      result_table,
      data.frame(
        Name = emp_name,
        Salary = emp_salary,
        Status = "Salary above 10000 - Loop Stopped"
      )
    )
    
    break
  }
  
  result_table <- rbind(
    result_table,
    data.frame(
      Name = emp_name,
      Salary = emp_salary,
      Status = "Checked"
    )
  )
}

kable(
  result_table,
  format = "html",
  caption = "Break Statement Example",
  align = c("l","l","l")
) %>%
  kable_styling(full_width = TRUE) %>%
  row_spec(0, background = "#2e7d32", color = "white")

Break Statement Example
Name	Salary	Status
Bagas	5000	Checked
Joan	7000	Checked
Alya	6500	Checked
Dwi	10000	Checked
Nabil	12000	Salary above 10000 - Loop Stopped

4.1 Interpretation

The table shows the break statement in action. The loop marks the first four employees as “Checked” because none of their salaries exceeds 10,000 (Dwi's salary of exactly 10,000 does not satisfy the strict comparison salary > 10,000). At Nabil, the salary of 12,000 satisfies the condition, so the program records the row and executes break, which terminates the loop immediately. Because Nabil is the last row, no employees are left unprocessed here; in a longer dataset, every row after the first match would be skipped, which avoids unnecessary work once the target has been found.
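
To make the effect of break easier to see, the sketch below counts how many rows a loop visits before it stops. The helper `rows_visited()` and the lower threshold of 6000 are hypothetical and used only for illustration; the practicum itself uses 10000.

```r
# Salaries copied from the employee dataset
salary <- c(5000, 7000, 6500, 10000, 12000)

# Hypothetical helper: number of rows visited before the loop ends
rows_visited <- function(salary, threshold) {
  visited <- 0
  for (s in salary) {
    visited <- visited + 1
    if (s > threshold) break   # stop at the first salary above the threshold
  }
  visited
}

print(rows_visited(salary, 10000))  # 5: the match is the last row, nothing is skipped
print(rows_visited(salary, 6000))   # 2: the loop stops at Joan, three rows are skipped
```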

5 Continue / Next

library(knitr)
library(kableExtra)

result_table <- data.frame()

for (i in seq_len(nrow(employee_data))) {
  
  emp_name <- employee_data$name[i]
  emp_perf <- employee_data$performance[i]
  
  if (emp_perf == "Average") {
    next
  }
  
  result_table <- rbind(
    result_table,
    data.frame(
      Name = emp_name,
      Performance = emp_perf
    )
  )
}

kable(
  result_table,
  format = "html",
  caption = "Employees with Performance Not Equal to Average",
  align = "l"
) %>%
  kable_styling(full_width = TRUE) %>%
  row_spec(0, background = "#2e7d32", color = "white")

Employees with Performance Not Equal to Average
Name	Performance
Bagas	Good
Joan	Very Good
Dwi	Good
Nabil	Very Good

5.1 Interpretation

The table shows employees whose performance is not categorized as Average. The program uses the next statement to skip employees with Average performance during the loop iteration. As a result, the employee named Alya is excluded from the table, while the remaining employees with Good and Very Good performance are displayed.
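
The difference between next and break can be seen by running the same loop with each statement. In this sketch the helper `collect()` and its argument `skip_with` are hypothetical names introduced for the comparison.

```r
# Names and performance ratings copied from the employee dataset
name <- c("Bagas", "Joan", "Alya", "Dwi", "Nabil")
performance <- c("Good", "Very Good", "Average", "Good", "Very Good")

# Hypothetical helper: collect names, reacting to "Average" with next or break
collect <- function(skip_with) {
  out <- character(0)
  for (i in seq_along(name)) {
    if (performance[i] == "Average") {
      if (skip_with == "next") next else break
    }
    out <- c(out, name[i])
  }
  out
}

print(collect("next"))   # "Bagas" "Joan" "Dwi" "Nabil": only Alya is skipped
print(collect("break"))  # "Bagas" "Joan": the loop ends at Alya
```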

6 Conclusion

Overall, this practicum shows how control flow structures in R support data processing: if-else chains assign bonuses by performance, for and while loops traverse the employee records, break stops a loop once a target row is found, and next skips rows that should be excluded.

7 References

[1] Siregar, B. (n.d.). Data Science Programming: Chapter 02: Syntax and Control Flow. dsciencelabs. https://bookdown.org/dsciencelabs/data_science_programming/02-Syntax-and-Control-Flow.html?authuser=0#practicum