Data Science Programming

Assignment ~ Week 4 ~

Jihan Ramadhani Deandri
Data Science undergraduate student


INTRODUCTION

This practicum provides practice in implementing control flow logic in the Python and R programming languages. The focus is on making a program take decisions automatically and repeat tasks over data. The session develops two main competencies:

  • Mastering conditional statements (if, if-else, if-else if-else)
  • Applying various looping structures (for, while, break, next)

  • Conditional Statement: logic used for decision making based on certain conditions
  • For Loop: iteration over a collection of data with a known number of repetitions
  • While Loop: a loop that runs while a specified condition remains true
  • Break: stops the loop immediately when a certain condition is met
  • Continue / Next: skips the current iteration and continues with the next iteration of the loop
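The five constructs listed above can be sketched in isolation as follows. This is only an illustrative sketch: the `scores` vector and the 75 and 50 cut-offs are hypothetical values that do not come from the practicum. Note that R spells the skip statement `next`, where Python uses `continue`.

```r
# Hypothetical values, used only to show each construct on its own.
scores <- c(72, 88, 45, 93)

# Conditional statement: choose a label for a single value.
label <- if (scores[1] >= 75) "pass" else "fail"

# for loop: the number of repetitions is known in advance.
total <- 0
for (s in scores) {
  total <- total + s
}

# while loop: repeat as long as the condition stays true.
n <- 0
while (n < 3) {
  n <- n + 1
}

# break: leave the loop at the first score below 50.
first_low <- NA
for (s in scores) {
  if (s < 50) {
    first_low <- s
    break
  }
}

# next: skip scores below 50 and count the remaining ones.
kept <- 0
for (s in scores) {
  if (s < 50) next
  kept <- kept + 1
}
```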

Employee Dataset

name   age  salary  position    performance
Bagas  25   5000    Staff       Good
Joan   30   7000    Supervisor  Very Good
Alya   27   6500    Staff       Average
Dwi    35   10000   Manager     Good
Nabil  40   12000   Director    Very Good

1 Conditional Statement

The module explains that conditional statements determine a program’s actions based on whether certain conditions hold.

In this practicum, each employee's bonus is determined by the performance rating:

  • Very Good → 20% of the salary
  • Good → 10% of the salary
  • Average → 5% of the salary
  • Any other rating → no bonus (the final else branch in the code)

library(knitr)
library(kableExtra)

name <- c("Bagas","Joan","Alya","Dwi","Nabil")
age <- c(25,30,27,35,40)
salary <- c(5000,7000,6500,10000,12000)
position <- c("Staff","Supervisor","Staff","Manager","Director")
performance <- c("Good","Very Good","Average","Good","Very Good")

employee_data <- data.frame(name, age, salary, position, performance)

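# Collect one result row per employee.
bonus_result <- data.frame()

# seq_len() is safe for an empty data frame (1:0 would iterate over 1 and 0).
for (i in seq_len(nrow(employee_data))) {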
  
  emp_name <- employee_data$name[i]
  emp_salary <- employee_data$salary[i]
  emp_perf <- employee_data$performance[i]
  
  if (emp_perf == "Very Good") {
    bonus <- 0.20 * emp_salary
    
  } else if (emp_perf == "Good") {
    bonus <- 0.10 * emp_salary
    
  } else if (emp_perf == "Average") {
    bonus <- 0.05 * emp_salary
    
  } else {
    bonus <- 0
  }
  
  bonus_result <- rbind(
    bonus_result,
    data.frame(
      Name = emp_name,
      Salary = emp_salary,
      Performance = emp_perf,
      Bonus = bonus
    )
  )
}

knitr::kable(
  bonus_result,
  format = "html",
  caption = "Employee Bonus Calculation",
  align = "l"
) %>%
  kable_styling(full_width = TRUE) %>%
  row_spec(0, background = "#2e7d32", color = "white")
Employee Bonus Calculation
Name   Salary  Performance  Bonus
Bagas  5000    Good         500
Joan   7000    Very Good    1400
Alya   6500    Average      325
Dwi    10000   Good         1000
Nabil  12000   Very Good    2400

1.1 Visualization

library(ggplot2)
ggplot(bonus_result, aes(x = reorder(Name, Salary), y = Salary, fill = Performance)) +

  geom_col(width = 0.65, color = "white") +

  geom_text(aes(label = Salary),
            vjust = -0.5,
            size = 4,
            fontface = "bold") +

  scale_fill_manual(values = c(
    "Very Good" = "#2E8B57",
    "Good" = "#66CDAA",
    "Average" = "#B2DF8A"
  )) +

  labs(
    title = "Employee Salary Distribution",
    subtitle = "Salary Based on Employee Performance",
    x = "Employee Name",
    y = "Salary",
    fill = "Performance Level"
  ) +

  theme_minimal() +

  theme(
    plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 12, color = "gray40", hjust = 0.5),
    axis.text = element_text(size = 11),
    legend.position = "top"
  )

1.2 Interpretation

The table and visualization show employee salaries, performance levels, and the resulting bonuses. Bonuses follow the performance rating: Very Good employees receive 20% of their salary, Good employees 10%, and Average employees 5%. Nabil receives the highest bonus (2400) because he has both the highest salary and a Very Good rating. Joan, also rated Very Good, receives 1400. Dwi and Bagas, both rated Good, receive 1000 and 500 respectively, and Alya receives the lowest bonus (325) because of her Average rating. The bar chart plots each employee's salary (not the bonus), ordered from lowest to highest, with bar color indicating the performance level.
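The bonus figures quoted above can be cross-checked without a loop. The sketch below is one possible way to do this, not the method used in the practicum: it stores the three rates in a named vector (the name `bonus_rate` is an assumption introduced here) and looks up one rate per employee.

```r
# Salaries and ratings copied from the employee dataset.
salary <- c(Bagas = 5000, Joan = 7000, Alya = 6500, Dwi = 10000, Nabil = 12000)
performance <- c("Good", "Very Good", "Average", "Good", "Very Good")

# Bonus rate per rating, as stated in the practicum.
bonus_rate <- c("Very Good" = 0.20, "Good" = 0.10, "Average" = 0.05)

# Indexing the named vector by rating returns one rate per employee.
# Unlike the loop's final else branch, an unknown rating would give NA here, not 0.
bonus <- salary * unname(bonus_rate[performance])

print(bonus)
```

The printed values should match the Bonus column of the table: 500, 1400, 325, 1000, and 2400.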

2 For Loop (Salary > 6000)

library(knitr)
library(kableExtra)

result_table <- data.frame()

for (i in seq_len(nrow(employee_data))) {
  
  emp_name <- employee_data$name[i]
  emp_salary <- employee_data$salary[i]
  
  if (emp_salary > 6000) {
    
    result_table <- rbind(
      result_table,
      data.frame(
        Name = emp_name,
        Salary = emp_salary
      )
    )
    
  }
}

kable(
  result_table,
  format = "html",
  caption = "Employees with Salary Greater Than 6000",
  align = "l"
) %>%
  kable_styling(full_width = TRUE) %>%
  row_spec(0, background = "#2e7d32", color = "white")
Employees with Salary Greater Than 6000
Name   Salary
Joan   7000
Alya   6500
Dwi    10000
Nabil  12000

2.1 Interpretation

The table lists the employees whose salaries exceed 6000. Four of the five employees meet this condition: Joan, Alya, Dwi, and Nabil. Only Bagas (5000) is excluded.
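For comparison, the same selection can be written without an explicit loop by using logical indexing. This is a minimal sketch that rebuilds only the two columns it needs; the names `high_earners` and `share` are introduced here for illustration.

```r
# Only the columns needed for the salary filter, copied from the dataset.
employee_data <- data.frame(
  name = c("Bagas", "Joan", "Alya", "Dwi", "Nabil"),
  salary = c(5000, 7000, 6500, 10000, 12000)
)

# The comparison yields TRUE/FALSE per row; TRUE rows are kept.
high_earners <- employee_data[employee_data$salary > 6000, ]

# Fraction of employees above the threshold.
share <- nrow(high_earners) / nrow(employee_data)

print(high_earners)
print(share)
```

The loop in the practicum makes each step explicit, which suits the exercise; the indexed form is the more common way to filter rows in everyday R code.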

3 While Loop

library(knitr)
library(kableExtra)

i <- 1
result_table <- data.frame()

while (i <= nrow(employee_data)) {
  
  emp_name <- employee_data$name[i]
  emp_position <- employee_data$position[i]
  
  result_table <- rbind(
    result_table,
    data.frame(
      Name = emp_name,
      Position = emp_position
    )
  )
  
  if (emp_position == "Manager") {
    break
  }
  
  i <- i + 1
}

kable(
  result_table,
  format = "html",
  caption = "Employees Until Manager Found",
  align = c("l","l")
) %>%
  kable_styling(full_width = TRUE) %>%
  row_spec(0, background = "#2e7d32", color = "white")
Employees Until Manager Found
Name   Position
Bagas  Staff
Joan   Supervisor
Alya   Staff
Dwi    Manager

3.1 Interpretation

The table lists employees from the start of the dataset up to and including the first Manager. Each row is appended before the position is checked, so Dwi (Manager) appears in the table; break then ends the loop, and Nabil (Director) is never processed.
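The stopping point can also be computed directly, which makes it easy to confirm which rows the while loop visits. The following sketch assumes base R's `match()` as an alternative to the loop; the variable names `stop_at` and `visited` are introduced here for illustration.

```r
# Names and positions copied from the employee dataset.
name <- c("Bagas", "Joan", "Alya", "Dwi", "Nabil")
position <- c("Staff", "Supervisor", "Staff", "Manager", "Director")

# match() returns the index of the first "Manager", or NA if there is none.
stop_at <- match("Manager", position)

# With no Manager, the while loop would run to the end, so mirror that here.
if (is.na(stop_at)) {
  stop_at <- length(position)
}

# Rows 1..stop_at are the ones the loop appends before it stops.
visited <- name[seq_len(stop_at)]

print(visited)
```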

4 Break (Salary > 10000)

library(knitr)
library(kableExtra)

result_table <- data.frame()

for (i in seq_len(nrow(employee_data))) {
  
  emp_name <- employee_data$name[i]
  emp_salary <- employee_data$salary[i]
  
  if (emp_salary > 10000) {
    
    result_table <- rbind(
      result_table,
      data.frame(
        Name = emp_name,
        Salary = emp_salary,
        Status = "Salary above 10000 - Loop Stopped"
      )
    )
    
    break
  }
  
  result_table <- rbind(
    result_table,
    data.frame(
      Name = emp_name,
      Salary = emp_salary,
      Status = "Checked"
    )
  )
}

kable(
  result_table,
  caption = "Break Statement Example",
  align = c("l","l","l")
) %>%
  kable_styling(full_width = TRUE) %>%
  row_spec(0, background = "#2e7d32", color = "white")
Break Statement Example
Name   Salary  Status
Bagas  5000    Checked
Joan   7000    Checked
Alya   6500    Checked
Dwi    10000   Checked
Nabil  12000   Salary above 10000 - Loop Stopped

4.1 Interpretation

The table illustrates the break statement. The loop marks the first four employees as “Checked” because none of their salaries exceeds 10,000; Dwi's salary of exactly 10,000 does not satisfy salary > 10,000. At Nabil, the salary of 12,000 satisfies the condition, so the row is recorded with the status “Salary above 10000 - Loop Stopped” and break ends the loop immediately. Because Nabil is the last row here, nothing is left unprocessed; in a longer dataset, any rows after the first match would never be visited.
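To see break actually skip rows, the threshold has to be met before the last row. The sketch below reuses the practicum's salaries but lowers the threshold to 6000, a hypothetical value chosen only for this illustration (the practicum itself uses 10000).

```r
# Names and salaries copied from the employee dataset.
name <- c("Bagas", "Joan", "Alya", "Dwi", "Nabil")
salary <- c(5000, 7000, 6500, 10000, 12000)

# Hypothetical threshold for illustration; the practicum uses 10000.
threshold <- 6000

visited <- character(0)
for (i in seq_along(salary)) {
  # Record the employee first, then decide whether to stop.
  visited <- c(visited, name[i])
  if (salary[i] > threshold) {
    break
  }
}

# Employees the loop never reached because break ended it early.
skipped <- setdiff(name, visited)

print(visited)
print(skipped)
```

With this threshold the loop stops at Joan, so Alya, Dwi, and Nabil are never examined, even though all three also earn more than 6000.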

5 Continue / Next

library(knitr)
library(kableExtra)

result_table <- data.frame()

for (i in seq_len(nrow(employee_data))) {
  
  emp_name <- employee_data$name[i]
  emp_perf <- employee_data$performance[i]
  
  if (emp_perf == "Average") {
    next
  }
  
  result_table <- rbind(
    result_table,
    data.frame(
      Name = emp_name,
      Performance = emp_perf
    )
  )
}

kable(
  result_table,
  format = "html",
  caption = "Employees with Performance Not Equal to Average",
  align = "l"
) %>%
  kable_styling(full_width = TRUE) %>%
  row_spec(0, background = "#2e7d32", color = "white")
Employees with Performance Not Equal to Average
Name   Performance
Bagas  Good
Joan   Very Good
Dwi    Good
Nabil  Very Good

5.1 Interpretation

The table shows employees whose performance is not categorized as Average. The program uses the next statement to skip employees with Average performance during the loop iteration. As a result, the employee named Alya is excluded from the table, while the remaining employees with Good and Very Good performance are displayed.
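The effect of next can be checked against a plain filter on the same data. This sketch keeps only the names, and the variables `kept` and `filtered` are introduced here for illustration; both approaches should select the same employees.

```r
# Names and ratings copied from the employee dataset.
name <- c("Bagas", "Joan", "Alya", "Dwi", "Nabil")
performance <- c("Good", "Very Good", "Average", "Good", "Very Good")

# Loop version: next jumps to the following iteration,
# so the append below is skipped for "Average" rows.
kept <- character(0)
for (i in seq_along(name)) {
  if (performance[i] == "Average") next
  kept <- c(kept, name[i])
}

# Filter version: keep every name whose rating is not "Average".
filtered <- name[performance != "Average"]

print(identical(kept, filtered))
```

Unlike break, next does not end the loop: the rows after Alya are still processed.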

6 Conclusion

Overall, this practicum builds an understanding of how control flow structures (conditional statements, for and while loops, break, and next) can be used in R to make automatic decisions and to manage loops when processing data.

7 References

[1] Siregar, B. (n.d.). Data Science Programming: Chapter 02: Syntax and Control Flow. dsciencelabs. https://bookdown.org/dsciencelabs/data_science_programming/02-Syntax-and-Control-Flow.html?authuser=0#practicum