Data Science Programming

Probability Distribution

1 Objective and Conditional Statements

1.1 Objective

  • Understand and implement conditional statements (if, if-else, if-elif-else).

  • Apply loops (for loop, while loop, break, continue) to analyze a dataset.

Use the following dummy dataset:

ID  Name   Age  Salary  Position    Performance
1   Bagas  25   5000    Staff       Good
2   Joan   30   7000    Supervisor  Very Good
3   Alya   27   6500    Staff       Average
4   Dwi    35   10000   Manager     Good
5   Nabil  40   12000   Director    Very Good

1.2 Conditional Statements

Determine bonus levels based on employee performance:

  • Very Good → 20% of salary

  • Good → 10% of salary

  • Average → 5% of salary


Your Task:

  • Write a program in Python and R to calculate each employee’s bonus.

  • Display the output in this format: “Name: Bagas, Bonus: 500”

# Dataset
data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Bagas", "Joan", "Alya", "Dwi", "Nabil"),
  Age = c(25, 30, 27, 35, 40),
  Salary = c(5000, 7000, 6500, 10000, 12000),
  Position = c("Staff", "Supervisor", "Staff", "Manager", "Director"),
  Performance = c("Good", "Very Good", "Average", "Good", "Very Good"),
  # Keep text columns as character vectors (the default since R 4.0.0)
  stringsAsFactors = FALSE
)

# Function to calculate bonus
calculate_bonus <- function(performance, salary) {
  if (performance == "Very Good") {
    return(salary * 0.2)
  } else if (performance == "Good") {
    return(salary * 0.1)
  } else if (performance == "Average") {
    return(salary * 0.05)
  } else {
    return(0)
  }
}

# Apply function to calculate bonus for each employee
data$Bonus <- mapply(calculate_bonus, data$Performance, data$Salary)

# Display the result in the required format, e.g. "Name: Bagas, Bonus: 500"
for (i in 1:nrow(data)) {
  cat(paste0("Name: ", data$Name[i], ", Bonus: ", as.integer(data$Bonus[i]), "\n"))
}
## Name: Bagas, Bonus: 500
## Name: Joan, Bonus: 1400
## Name: Alya, Bonus: 325
## Name: Dwi, Bonus: 1000
## Name: Nabil, Bonus: 2400

Detailed Explanation: The code calculates each employee’s bonus from the Performance column, using conditional statements to pick the bonus rate and a loop to print the result.

1.2.1 Creating Dataset: The dataset is stored in a data frame, created here with R’s data.frame() function (the Python version uses a DataFrame from the pandas library), to make data analysis easier and more structured.

# Dataset
df <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Bagas", "Joan", "Alya", "Dwi", "Nabil"),
  Age = c(25, 30, 27, 35, 40),
  Salary = c(5000, 7000, 6500, 10000, 12000),
  Position = c("Staff", "Supervisor", "Staff", "Manager", "Director"),
  Performance = c("Good", "Very Good", "Average", "Good", "Very Good"),
  stringsAsFactors = FALSE
)

# Display Dataset
print(df)
##   ID  Name Age Salary   Position Performance
## 1  1 Bagas  25   5000      Staff        Good
## 2  2  Joan  30   7000 Supervisor   Very Good
## 3  3  Alya  27   6500      Staff     Average
## 4  4   Dwi  35  10000    Manager        Good
## 5  5 Nabil  40  12000   Director   Very Good
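
For the Python half of the task, the same dataset could be built as a pandas DataFrame. This is a minimal sketch, assuming pandas is installed; the variable name df simply mirrors the R code.

```python
import pandas as pd

# Same dummy dataset as above, built column by column
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Name": ["Bagas", "Joan", "Alya", "Dwi", "Nabil"],
    "Age": [25, 30, 27, 35, 40],
    "Salary": [5000, 7000, 6500, 10000, 12000],
    "Position": ["Staff", "Supervisor", "Staff", "Manager", "Director"],
    "Performance": ["Good", "Very Good", "Average", "Good", "Very Good"],
})

# Display the dataset
print(df)
```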

1.2.2 Conditional Statements (if / else if / else in R, if-elif-else in Python):

This part is used to determine the bonus based on the employee’s Performance.

# Function to calculate bonus
calculate_bonus <- function(performance, salary) {
  if (performance == "Very Good") {
    return(salary * 0.2)
  } else if (performance == "Good") {
    return(salary * 0.1)
  } else if (performance == "Average") {
    return(salary * 0.05)
  } else {
    return(0)
  }
}

Explanation:

  • if: Checks whether the performance is “Very Good”; if true, the bonus is 20% of the salary.
  • else if (elif in Python): Checks whether the performance is “Good”; if true, the bonus is 10% of the salary.
  • else if (elif in Python): Checks whether the performance is “Average”; if true, the bonus is 5% of the salary.
  • else: If none of the three categories match, the bonus is 0.
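
The same branching could be written in Python with if-elif-else. This is a minimal sketch that mirrors the R function above; the name calculate_bonus is reused from the R code, and the "Unknown" label in the last call is a made-up value used only to exercise the else branch.

```python
def calculate_bonus(performance, salary):
    """Return the bonus for one employee based on the performance label."""
    if performance == "Very Good":
        return salary * 0.20
    elif performance == "Good":
        return salary * 0.10
    elif performance == "Average":
        return salary * 0.05
    else:
        # Any label outside the three categories earns no bonus
        return 0


print(calculate_bonus("Very Good", 7000))
print(calculate_bonus("Good", 5000))
print(calculate_bonus("Average", 6500))
print(calculate_bonus("Unknown", 1000))
```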

1.2.3 Looping over rows:

  • for loop: Visits each row of the data frame by index (the Python version uses iterrows()).
  • cat(): Formats and prints the output (the Python version uses an f-string).
  • as.integer(): Converts the bonus to an integer to remove decimals (int() in Python).

# Dataset
df <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Bagas", "Joan", "Alya", "Dwi", "Nabil"),
  Age = c(25, 30, 27, 35, 40),
  Salary = c(5000, 7000, 6500, 10000, 12000),
  Position = c("Staff", "Supervisor", "Staff", "Manager", "Director"),
  Performance = c("Good", "Very Good", "Average", "Good", "Very Good"),
  stringsAsFactors = FALSE
)

# Function to calculate bonus
calculate_bonus <- function(performance, salary) {
  if (performance == "Very Good") {
    return(salary * 0.2)
  } else if (performance == "Good") {
    return(salary * 0.1)
  } else if (performance == "Average") {
    return(salary * 0.05)
  } else {
    return(0)
  }
}

# Apply bonus calculation
df$Bonus <- mapply(calculate_bonus, df$Performance, df$Salary)

# Looping to display Name and Bonus in the required format
for (i in 1:nrow(df)) {
  cat("Name: ", df$Name[i], ", Bonus: ", as.integer(df$Bonus[i]), "\n", sep = "")
}
## Name: Bagas, Bonus: 500
## Name: Joan, Bonus: 1400
## Name: Alya, Bonus: 325
## Name: Dwi, Bonus: 1000
## Name: Nabil, Bonus: 2400
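
A Python counterpart of this loop might use iterrows(), an f-string, and int(). The sketch below assumes pandas is installed and keeps only the columns the loop needs; the list named lines is a hypothetical helper added so the printed text can also be inspected afterwards.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bagas", "Joan", "Alya", "Dwi", "Nabil"],
    "Salary": [5000, 7000, 6500, 10000, 12000],
    "Performance": ["Good", "Very Good", "Average", "Good", "Very Good"],
})


def calculate_bonus(performance, salary):
    """Return the bonus for one employee based on the performance label."""
    if performance == "Very Good":
        return salary * 0.20
    elif performance == "Good":
        return salary * 0.10
    elif performance == "Average":
        return salary * 0.05
    else:
        return 0


lines = []  # hypothetical helper: collects each printed line
for index, row in df.iterrows():
    bonus = calculate_bonus(row["Performance"], row["Salary"])
    # int() drops the decimal part; the f-string builds the required format
    line = f"Name: {row['Name']}, Bonus: {int(bonus)}"
    lines.append(line)
    print(line)
```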

Applied Concepts:

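Concept                  Explanation
Conditional Statements   if / else if / else in R (if-elif-else in Python) to determine the bonus
Loops                    for loop over row indices in R (iterrows() in Python)
Anonymous Function       lambda (Python version only; the R code uses the named function calculate_bonus)
Vectorization            mapply() in R (apply() in pandas) to apply the bonus function to every row without an explicit loop
Data Structure           Data frame (DataFrame in pandas)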
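
The lambda and apply() entries refer to the Python version. One way they might fit together is sketched below, assuming pandas is installed; the rates dictionary is a hypothetical replacement for the if-elif-else chain, not something taken from the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bagas", "Joan", "Alya", "Dwi", "Nabil"],
    "Salary": [5000, 7000, 6500, 10000, 12000],
    "Performance": ["Good", "Very Good", "Average", "Good", "Very Good"],
})

# Hypothetical lookup table: bonus rate per performance label
rates = {"Very Good": 0.20, "Good": 0.10, "Average": 0.05}

# apply() with axis=1 calls the lambda once per row;
# labels missing from the table fall back to a rate of 0
df["Bonus"] = df.apply(
    lambda row: int(row["Salary"] * rates.get(row["Performance"], 0)),
    axis=1,
)

print(df[["Name", "Bonus"]])
```

Note that apply() with axis=1 still calls the function once per row, so it is mainly a more compact notation than an explicit loop rather than a guaranteed speed-up.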

2 Loops (for & while)

  1. Use a for loop to list employees with a salary greater than 6000.

Expected Output:

  • Name: Joan, Salary: 7000

  • Name: Alya, Salary: 6500

  • Name: Dwi, Salary: 10000

  • Name: Nabil, Salary: 12000

  2. Use a while loop to display employees until a “Manager” is found.

Expected Output:

  • Name: Bagas, Position: Staff

  • Name: Joan, Position: Supervisor

  • Name: Alya, Position: Staff

  • Name: Dwi, Position: Manager (Stop here)

  3. Use break to stop the loop when an employee with a salary above 10,000 is found.

Expected Output:

  • Name: Bagas, Salary: 5000

  • Name: Joan, Salary: 7000

  • Name: Alya, Salary: 6500

  • Name: Dwi, Salary: 10000

(Stopped because Nabil has a salary above 10,000)

  4. Use continue to skip employees with “Average” performance.

Expected Output:

  • Name: Bagas, Performance: Good

  • Name: Joan, Performance: Very Good

  • Name: Dwi, Performance: Good

  • Name: Nabil, Performance: Very Good

(Alya is skipped because the performance is “Average”)

Submission Guidelines:

  1. Submit your Python code via Google Colab and your R code via RPubs.

  2. Ensure the output is displayed correctly.

  3. Add comments in the code to explain your logic.


# Dataset
df <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Bagas", "Joan", "Alya", "Dwi", "Nabil"),
  Age = c(25, 30, 27, 35, 40),
  Salary = c(5000, 7000, 6500, 10000, 12000),
  Position = c("Staff", "Supervisor", "Staff", "Manager", "Director"),
  Performance = c("Good", "Very Good", "Average", "Good", "Very Good"),
  stringsAsFactors = FALSE
)

# For Loop - Salary greater than 6000
cat("\n--- For Loop Salary > 6000 ---\n")
## 
## --- For Loop Salary > 6000 ---
for (i in 1:nrow(df)) {
  # Print only employees whose salary exceeds 6000
  if (df$Salary[i] > 6000) {
    cat("Name: ", df$Name[i], ", Salary: ", df$Salary[i], "\n", sep = "")
  }
}
## Name: Joan, Salary: 7000
## Name: Alya, Salary: 6500
## Name: Dwi, Salary: 10000
## Name: Nabil, Salary: 12000
# While Loop - Display employees until "Manager" is found
cat("\n--- While Loop Until Manager ---\n")
## 
## --- While Loop Until Manager ---
i <- 1
while (i <= nrow(df)) {
  cat("Name: ", df$Name[i], ", Position: ", df$Position[i], "\n", sep = "")
  # The Manager is printed first, then the loop stops
  if (df$Position[i] == "Manager") {
    cat("(Stop here)\n")
    break
  }
  i <- i + 1
}
## Name: Bagas, Position: Staff
## Name: Joan, Position: Supervisor
## Name: Alya, Position: Staff
## Name: Dwi, Position: Manager
## (Stop here)
# Break Loop - Stop when salary above 10000
cat("\n--- Break Loop Salary > 10000 ---\n")
## 
## --- Break Loop Salary > 10000 ---
for (i in 1:nrow(df)) {
  # Check first, so the employee who triggers the stop is not printed
  if (df$Salary[i] > 10000) {
    cat("(Stopped because ", df$Name[i], " has a salary above 10,000)\n", sep = "")
    break
  }
  cat("Name: ", df$Name[i], ", Salary: ", df$Salary[i], "\n", sep = "")
}
## Name: Bagas, Salary: 5000
## Name: Joan, Salary: 7000
## Name: Alya, Salary: 6500
## Name: Dwi, Salary: 10000
## (Stopped because Nabil has a salary above 10,000)
# Continue Loop - Skip "Average" Performance
cat("\n--- Continue to Skip Average Performance ---\n")
## 
## --- Continue to Skip Average Performance ---
for (i in 1:nrow(df)) {
  # next is R's equivalent of continue: skip to the following row
  if (df$Performance[i] == "Average") {
    next
  }
  cat("Name: ", df$Name[i], ", Performance: ", df$Performance[i], "\n", sep = "")
}
## Name: Bagas, Performance: Good
## Name: Joan, Performance: Very Good
## Name: Dwi, Performance: Good
## Name: Nabil, Performance: Very Good

2.0.1 Additional Explanation:

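Explanation of Loop Implementation

The code lines quoted below come from the Python (pandas) version of each task. The R code above follows the same steps, using a for loop over row indices instead of iterrows(), cat() instead of print(), and next instead of continue.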

  1. For Loop to List Employees with Salary Greater Than 6000

In the Python version, the for loop iterates over each row in the DataFrame using the iterrows() function (the R code loops over the row indices 1:nrow(df) instead). The if condition checks whether the employee’s salary is greater than 6000. If the condition is true, the employee’s name and salary are printed.

Code Explanation:

  • for index, row in df.iterrows(): Iterates through each row of the DataFrame.

  • if row["Salary"] > 6000: Condition to filter employees with a salary greater than 6000.

  • print(): Displays the employee’s name and salary.
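
Put together, the lines explained above might look like the following sketch. It assumes pandas is installed and keeps only the two columns the loop uses; the list named shown is a hypothetical helper that records which employees were printed.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bagas", "Joan", "Alya", "Dwi", "Nabil"],
    "Salary": [5000, 7000, 6500, 10000, 12000],
})

shown = []  # hypothetical helper: names that passed the filter
for index, row in df.iterrows():
    # Keep only employees earning more than 6000
    if row["Salary"] > 6000:
        print(f"Name: {row['Name']}, Salary: {row['Salary']}")
        shown.append(row["Name"])
```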

  2. While Loop to Display Employees Until “Manager” is Found

The while loop iterates through the DataFrame until an employee with the position “Manager” is found. The loop stops when the condition inside the if statement is satisfied, and the break statement is executed.

Code Explanation:

  • i = 0: Initializes the loop counter.

  • while i < len(df): Loops through the DataFrame until the index reaches the number of rows.

  • df.loc[i, 'Position']: Accesses the position of the current employee.

  • if df.loc[i, "Position"] == "Manager": Condition to stop the loop.

  • break: Terminates the loop when the condition is met.
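
A minimal sketch of the while loop described above, assuming pandas is installed and the DataFrame keeps its default 0-based index so that df.loc[i, ...] addresses row i. The list named shown is a hypothetical helper for checking the result.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bagas", "Joan", "Alya", "Dwi", "Nabil"],
    "Position": ["Staff", "Supervisor", "Staff", "Manager", "Director"],
})

shown = []  # hypothetical helper: names printed before the loop stopped
i = 0       # loop counter, starting at the first row
while i < len(df):
    print(f"Name: {df.loc[i, 'Name']}, Position: {df.loc[i, 'Position']}")
    shown.append(df.loc[i, "Name"])
    # The Manager is printed first, then the loop ends
    if df.loc[i, "Position"] == "Manager":
        print("(Stop here)")
        break
    i += 1
```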

  3. Break Loop When Salary Above 10,000 is Found

The for loop iterates through each row in the DataFrame. The if condition checks whether the employee’s salary is above 10,000. When the condition is satisfied, the loop terminates using the break statement.

Code Explanation:

  • if row["Salary"] > 10000: Condition to find the first employee with a salary greater than 10,000.

  • break: Stops the loop execution when the condition is met.
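
The break logic could be sketched as follows, assuming pandas is installed. The salary check is placed before the print so that the employee who triggers the stop is not listed, which matches the expected output of the task; the list named shown is a hypothetical helper.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bagas", "Joan", "Alya", "Dwi", "Nabil"],
    "Salary": [5000, 7000, 6500, 10000, 12000],
})

shown = []  # hypothetical helper: names printed before the break
for index, row in df.iterrows():
    # Stop at the first salary strictly above 10,000
    if row["Salary"] > 10000:
        print(f"(Stopped because {row['Name']} has a salary above 10,000)")
        break
    print(f"Name: {row['Name']}, Salary: {row['Salary']}")
    shown.append(row["Name"])
```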

  4. Continue to Skip Employees with “Average” Performance

The continue statement (next in the R code) is used inside the for loop to skip employees whose performance is marked as “Average”. The loop moves on to the next iteration without executing the remaining code in the current one.

Code Explanation:

  • if row["Performance"] == "Average": Condition to check for “Average” performance.

  • continue: Skips the current iteration and moves to the next row.

  • print(): Displays employee details if their performance is not “Average”.
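
A minimal sketch of the continue logic, assuming pandas is installed; the list named shown is a hypothetical helper that records which employees were printed.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bagas", "Joan", "Alya", "Dwi", "Nabil"],
    "Performance": ["Good", "Very Good", "Average", "Good", "Very Good"],
})

shown = []  # hypothetical helper: names that were not skipped
for index, row in df.iterrows():
    # Skip the rest of this iteration for "Average" performers
    if row["Performance"] == "Average":
        continue
    print(f"Name: {row['Name']}, Performance: {row['Performance']}")
    shown.append(row["Name"])
```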