Syntax and Control Flow - Programming Data Science


Introduction

In programming and data analysis, writing code doesn’t always mean executing commands sequentially from top to bottom. Sometimes, we need a program that’s “intelligent” enough to make decisions or perform repetitive tasks automatically. This is where the basic concepts of Syntax and Control Flow play a crucial role.

Syntax: These are the grammatical rules in a programming language (such as R and Python) that determine how a command should be written so that it can be understood and executed by a computer.
Control Flow: These are the mechanisms that govern the order in which commands are executed within a program.

In this data analysis exercise, we explore two key pillars of control flow:

Conditional Statements: Using the logical statements if, elif (or else if in R), and else to evaluate data based on certain conditions. For example, determining the amount of an employee’s bonus based on their performance metrics.
Loops: Using for and while to iterate efficiently over the rows of a dataset.
- The break statement forcibly terminates a loop when a condition is met (e.g., on finding the Manager job title).
- The continue (in Python) or next (in R) statement skips certain iterations without stopping the entire loop (e.g., ignoring average-performing employees).
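As a small illustration of these keywords working together, here is a minimal Python sketch. The function name scan, the thresholds, and the sample values are hypothetical and not part of the practicum dataset.

```python
def scan(values):
    """Label each value, skipping negatives and stopping at the first zero."""
    labels = []
    for v in values:
        if v == 0:
            break        # stop the whole loop at the sentinel value
        if v < 0:
            continue     # skip this iteration only, then keep looping
        # if / elif / else picks exactly one branch per value
        if v >= 100:
            labels.append("high")
        elif v >= 50:
            labels.append("medium")
        else:
            labels.append("low")
    return labels


print(scan([120, -5, 60, 10, 0, 999]))
```

The negative value is skipped by continue, and the value after the zero is never examined because break has already ended the loop.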

By applying this syntax and control flow, we can transform raw data into structured information quickly, efficiently, and automatically.

Practicum

Independent Practicum: Conditional Statements and Loops in Python and R

1.1.1 Objectives

Understand and apply conditional statements (if, if-else, if-elif-else).
Apply loops (for loop, while loop, break, continue) to analyze a dataset.

Dataset

ID	Name	Age	Salary	Position	Performance
1	Bagas	25	5000	Staff	Good
2	Joan	30	7000	Supervisor	Very Good
3	Alya	27	6500	Staff	Average
4	Dwi	35	10000	Manager	Good
5	Nabil	40	12000	Director	Very Good

1.1.2 Conditional Statements

Determine the bonus level based on employee performance:

Very Good \(\rightarrow\) 20% of salary
Good \(\rightarrow\) 10% of salary
Average \(\rightarrow\) 5% of salary

Task:

Write a program in Python and R to calculate each employee’s bonus
Display the output in this format:

"Name: Bagas, Bonus: 500"

Answer

# 1. Dataset
dummy_dataset <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Bagas", "Joan", "Alya", "Dwi", "Nabil"),
  Age = c(25, 30, 27, 35, 40),
  Salary = c(5000, 7000, 6500, 10000, 12000),
  Position = c("Staff", "Supervisor", "Staff", "Manager", "Director"),
  Performance = c("Good", "Very Good", "Average", "Good", "Very Good")
)

# 2. Loop and calculation
for (i in 1:nrow(dummy_dataset)) {
  name <- dummy_dataset$Name[i]
  salary <- dummy_dataset$Salary[i]
  performance <- dummy_dataset$Performance[i]
  
  if (performance == "Very Good") {
    bonus <- salary * 0.20
  } else if (performance == "Good") { 
    bonus <- salary * 0.10
  } else if (performance == "Average") {
    bonus <- salary * 0.05
  } else {
    bonus <- 0
  }
  
  # 3. Print the output
  cat(sprintf('"Name: %s, Bonus: %g"\n', name, bonus))
}

"Name: Bagas, Bonus: 500"
"Name: Joan, Bonus: 1400"
"Name: Alya, Bonus: 325"
"Name: Dwi, Bonus: 1000"
"Name: Nabil, Bonus: 2400"

Interpretation

The conditional statements in this analysis act as an automated decision rule that assigns each employee's bonus objectively from the performance rating. With the business rules defined up front (20% of salary for “Very Good” performance, 10% for “Good,” and 5% for “Average,” with any unrecognized rating falling through to the final else and receiving no bonus), the for loop evaluates the rows one at a time and applies the same if / else if chain to each, without manual intervention. This reduces the risk of human error in manual calculation and scales to larger datasets without any change to the code. For very large datasets, a vectorized approach that evaluates all rows in a single expression is usually faster in R than an explicit loop.
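The task asks for Python as well as R. A minimal Python sketch of the same bonus rule follows, assuming the dataset is held as a plain list of dictionaries. The helper names bonus_rate and bonus_lines are illustrative choices, not part of the original answer.

```python
# Employee records mirroring the practicum dataset (only the fields needed here).
employees = [
    {"Name": "Bagas", "Salary": 5000, "Performance": "Good"},
    {"Name": "Joan", "Salary": 7000, "Performance": "Very Good"},
    {"Name": "Alya", "Salary": 6500, "Performance": "Average"},
    {"Name": "Dwi", "Salary": 10000, "Performance": "Good"},
    {"Name": "Nabil", "Salary": 12000, "Performance": "Very Good"},
]


def bonus_rate(performance):
    """Return the bonus rate for a performance label (0 if unrecognized)."""
    if performance == "Very Good":
        return 0.20
    elif performance == "Good":
        return 0.10
    elif performance == "Average":
        return 0.05
    else:
        return 0.0


def bonus_lines(records):
    """Build one formatted output line per employee."""
    lines = []
    for emp in records:
        bonus = emp["Salary"] * bonus_rate(emp["Performance"])
        # :g drops the trailing ".0", matching %g in the R answer
        lines.append(f'"Name: {emp["Name"]}, Bonus: {bonus:g}"')
    return lines


for line in bonus_lines(employees):
    print(line)
```

Python spells the middle branch elif where R uses else if; the logic is otherwise the same as in the R answer.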

1.1.3 Loops (For & While)

Use a for loop to list employees with salaries greater than 6000.

Answer

# 1. Ensure the dummy dataset exists
dummy_dataset <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Bagas", "Joan", "Alya", "Dwi", "Nabil"),
  Age = c(25, 30, 27, 35, 40),
  Salary = c(5000, 7000, 6500, 10000, 12000),
  Position = c("Staff", "Supervisor", "Staff", "Manager", "Director"),
  Performance = c("Good", "Very Good", "Average", "Good", "Very Good")
)

# 2. For loop to check each employee
for (i in 1:nrow(dummy_dataset)) {
  name <- dummy_dataset$Name[i]
  salary <- dummy_dataset$Salary[i]
  
  # 3. If condition to filter salaries strictly greater than 6000
  if (salary > 6000) {
    # Print the output without quotes around it
    cat(sprintf("Name: %s, Salary: %g\n", name, salary))
  }
}

Name: Joan, Salary: 7000
Name: Alya, Salary: 6500
Name: Dwi, Salary: 10000
Name: Nabil, Salary: 12000

Interpretation

The for loop in this analysis is a structured iteration mechanism that executes the same set of instructions on each row of the data. Instead of inspecting the records manually one by one, the loop traverses the dataset from the first index to the last and, in this task, prints only the employees whose salary is strictly greater than 6000. Bagas (5000) is therefore the only employee left out. This approach saves time, reduces coding errors, and ensures that the same logic is applied consistently, regardless of how many rows are processed.
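The salary filter can be sketched in Python in the same way. This is a minimal version that assumes the data is reduced to (name, salary) tuples; the function name high_earners and its threshold parameter are hypothetical.

```python
# (name, salary) pairs taken from the practicum dataset.
employees = [
    ("Bagas", 5000),
    ("Joan", 7000),
    ("Alya", 6500),
    ("Dwi", 10000),
    ("Nabil", 12000),
]


def high_earners(records, threshold=6000):
    """Return a line for every employee whose salary is strictly above threshold."""
    lines = []
    for name, salary in records:
        if salary > threshold:
            lines.append(f"Name: {name}, Salary: {salary}")
    return lines


for line in high_earners(employees):
    print(line)
```

Iterating directly over the records avoids manual indexing, which is the usual Python idiom for a for loop.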

Use a while loop to display the employees until a “Manager” is found.

Answer

# 1. Ensure the dummy dataset exists
dummy_dataset <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Bagas", "Joan", "Alya", "Dwi", "Nabil"),
  Age = c(25, 30, 27, 35, 40),
  Salary = c(5000, 7000, 6500, 10000, 12000),
  Position = c("Staff", "Supervisor", "Staff", "Manager", "Director"),
  Performance = c("Good", "Very Good", "Average", "Good", "Very Good")
)

# 2. Initialize the index variable for the while loop
i <- 1

# 3. Use a while loop to iterate through the dataset
while (i <= nrow(dummy_dataset)) {
  name <- dummy_dataset$Name[i]
  position <- dummy_dataset$Position[i]
  
  # 4. Check if the position is "Manager"
  if (position == "Manager") {
    cat(sprintf("Name: %s, Position: %s (Stop here)\n", name, position))
    break # Forcefully stop the loop because the Manager has been found
  } else {
    cat(sprintf("Name: %s, Position: %s\n", name, position))
  }
  
  # 5. Don't forget to increment the index to prevent an infinite loop
  i <- i + 1
}

Name: Bagas, Position: Staff
Name: Joan, Position: Supervisor
Name: Alya, Position: Staff
Name: Dwi, Position: Manager (Stop here)

Interpretation

The while loop in this analysis is a condition-controlled iteration structure: the body keeps executing as long as the test condition evaluates to TRUE. Unlike a for loop, which runs over a sequence fixed in advance, a while loop suits cases where the stopping point is not known beforehand. Here the loop scans the dataset row by row and, through break, stops as soon as the “Manager” position is found, so Nabil (Director) is never examined. Stopping early makes the search more efficient because rows after the target are not processed.
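A Python counterpart of the while loop might look like the sketch below. It assumes (name, position) tuples and uses a hypothetical function name, list_until_manager. Note that Python indexes from 0, whereas the R answer starts at 1.

```python
# (name, position) pairs taken from the practicum dataset.
employees = [
    ("Bagas", "Staff"),
    ("Joan", "Supervisor"),
    ("Alya", "Staff"),
    ("Dwi", "Manager"),
    ("Nabil", "Director"),
]


def list_until_manager(records):
    """Collect employees in order, stopping once a Manager has been listed."""
    lines = []
    i = 0  # Python lists are zero-indexed
    while i < len(records):
        name, position = records[i]
        if position == "Manager":
            lines.append(f"Name: {name}, Position: {position} (Stop here)")
            break  # leave the loop; later rows are never examined
        lines.append(f"Name: {name}, Position: {position}")
        i += 1  # advance the index to avoid an infinite loop
    return lines


for line in list_until_manager(employees):
    print(line)
```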

Use break to stop the loop when an employee with a salary above 10,000 is found.

Answer

# 1. Ensure the dummy dataset is in memory
dummy_dataset <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Bagas", "Joan", "Alya", "Dwi", "Nabil"),
  Age = c(25, 30, 27, 35, 40),
  Salary = c(5000, 7000, 6500, 10000, 12000),
  Position = c("Staff", "Supervisor", "Staff", "Manager", "Director"),
  Performance = c("Good", "Very Good", "Average", "Good", "Very Good")
)

# 2. Use a for loop to iterate through the dataset
for (i in 1:nrow(dummy_dataset)) {
  name <- dummy_dataset$Name[i]
  salary <- dummy_dataset$Salary[i]
  
  # 3. Check if the salary is strictly greater than 10,000
  if (salary > 10000) {
    # If yes, print the stop message and use break to exit the loop
    cat(sprintf("(Stopped because %s has a salary above 10,000)\n", name))
    break
  } else {
    # If not (salary 10,000 or below), print the name and salary
    cat(sprintf("Name: %s, Salary: %g\n", name, salary))
  }
}

Name: Bagas, Salary: 5000
Name: Joan, Salary: 7000
Name: Alya, Salary: 6500
Name: Dwi, Salary: 10000
(Stopped because Nabil has a salary above 10,000)

Interpretation

The break statement in this analysis is a control flow interrupt that stops the loop immediately once a specific condition is met. Instead of letting a for or while loop run blindly to the end of the dataset, break exits the loop body as soon as the search criterion is satisfied; here, that is the first employee whose salary is above 10,000. Dwi's salary of exactly 10,000 does not trigger the stop because the comparison is strict (>). This avoids processing rows that are no longer needed and saves computation time, especially on large datasets.
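The same early exit can be sketched in Python. This version assumes (name, salary) tuples; the function name list_until_salary_above and the limit parameter are hypothetical.

```python
# (name, salary) pairs taken from the practicum dataset.
employees = [
    ("Bagas", 5000),
    ("Joan", 7000),
    ("Alya", 6500),
    ("Dwi", 10000),
    ("Nabil", 12000),
]


def list_until_salary_above(records, limit=10000):
    """List employees until one earns strictly more than limit, then stop."""
    lines = []
    for name, salary in records:
        if salary > limit:
            # {limit:,} formats 10000 as "10,000"
            lines.append(f"(Stopped because {name} has a salary above {limit:,})")
            break
        lines.append(f"Name: {name}, Salary: {salary}")
    return lines


for line in list_until_salary_above(employees):
    print(line)
```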

Use continue (next in R) to skip employees with “Average” performance.

Answer

# 1. Ensure the dummy dataset exists
dummy_dataset <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Bagas", "Joan", "Alya", "Dwi", "Nabil"),
  Age = c(25, 30, 27, 35, 40),
  Salary = c(5000, 7000, 6500, 10000, 12000),
  Position = c("Staff", "Supervisor", "Staff", "Manager", "Director"),
  Performance = c("Good", "Very Good", "Average", "Good", "Very Good")
)

# 2. Prepare containers: one for the printed rows, one for the skip messages
output_text <- ""
skip_messages <- character(0)

# 3. Use a for loop to iterate through the dataset
for (i in 1:nrow(dummy_dataset)) {
  name <- dummy_dataset$Name[i]
  performance <- dummy_dataset$Performance[i]
  
  # 4. Check for the "Average" condition
  if (performance == "Average") {
    # Record a message for every skipped employee (not only the last one)
    skip_messages <- c(skip_messages, sprintf('(%s is skipped because the performance is "Average")', name))
    
    # Use 'next' to skip to the next iteration (equivalent to 'continue' in Python)
    next 
  }
  
  # Append the employee data to the output_text container
  output_text <- paste0(output_text, sprintf("Name: %s, Performance: %s\n", name, performance))
}

# 5. Combine the main text with the skip messages, then print ONLY ONCE at the end
final_result <- paste0(output_text, paste(skip_messages, collapse = "\n"))
cat(final_result)

Name: Bagas, Performance: Good
Name: Joan, Performance: Very Good
Name: Dwi, Performance: Good
Name: Nabil, Performance: Very Good
(Alya is skipped because the performance is "Average")

Interpretation

The next statement in R (continue in Python) is a selective control mechanism that skips the current iteration without stopping the loop. Unlike break, which terminates the loop entirely, next tells the program to ignore the rest of the loop body for the row being evaluated and jump straight to the next row. This is useful in data preprocessing, for example to skip rows containing missing values, anomalies, or data that does not meet the prerequisites of a calculation. The code becomes more robust against calculation errors while all remaining valid rows are still processed to the end.
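In Python the keyword is continue rather than next. A minimal sketch follows, assuming (name, performance) tuples and a hypothetical function name, report_skipping_average. As in the R answer, the skip messages are gathered separately and placed after the listed employees.

```python
# (name, performance) pairs taken from the practicum dataset.
employees = [
    ("Bagas", "Good"),
    ("Joan", "Very Good"),
    ("Alya", "Average"),
    ("Dwi", "Good"),
    ("Nabil", "Very Good"),
]


def report_skipping_average(records):
    """List non-Average employees first, then one skip message per skipped employee."""
    kept = []
    skipped = []
    for name, performance in records:
        if performance == "Average":
            skipped.append(f'({name} is skipped because the performance is "Average")')
            continue  # jump to the next record; the append below is not reached
        kept.append(f"Name: {name}, Performance: {performance}")
    return kept + skipped


for line in report_skipping_average(employees):
    print(line)
```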

Conclusion

In conclusion, a solid understanding of syntax and control flow is a crucial foundation for data processing using Python or R.

Based on the code implementation, we can conclude several key points:

Decision Automation: The if-else code block allows us to map complex business logic (such as bonus hierarchies) directly into the program, reducing the risk of human error in manual calculations.
Data Processing Efficiency: The use of for and while loops simplifies mass data extraction and manipulation. We don’t need to write code repeatedly for each row of employee data.
Data Navigation Flexibility: Control statements like break and continue / next give us fine-grained control over how the data is traversed. We can stop the search as soon as a target is found, or skip anomalous rows mid-evaluation.

Mastering control flow not only makes code more concise and readable, but also helps programs perform well when handling much larger datasets.

References

[1] Siregar, B. (n.d.). Syntax and Control Flow. In Data Science Programming. Data Science Labs. https://bookdown.org/dsciencelabs/data_science_programming/02-Syntax-and-Control-Flow.html
