Data Science Programming
Probability Distribution
1 Objective and Conditional Statements
1.1 Objective
Understand and implement conditional statements (if, if-else, if-elif-else).
Apply loops (for and while, together with break and continue) to analyze a dataset. Use the following dummy dataset:
ID | Name | Age | Salary | Position | Performance |
---|---|---|---|---|---|
1 | Bagas | 25 | 5000 | Staff | Good |
2 | Joan | 30 | 7000 | Supervisor | Very Good |
3 | Alya | 27 | 6500 | Staff | Average |
4 | Dwi | 35 | 10000 | Manager | Good |
5 | Nabil | 40 | 12000 | Director | Very Good |
1.2 Conditional Statements
Determine bonus levels based on employee performance:
Very Good → 20% of salary
Good → 10% of salary
Average → 5% of salary
Your Task:
Write a program in Python and R to calculate each employee’s bonus.
Display the output in this format: “Name: Bagas, Bonus: 500”
# Dataset
data <- data.frame(
ID = c(1, 2, 3, 4, 5),
Name = c("Bagas", "Joan", "Alya", "Dwi", "Nabil"),
Age = c(25, 30, 27, 35, 40),
Salary = c(5000, 7000, 6500, 10000, 12000),
Position = c("Staff", "Supervisor", "Staff", "Manager", "Director"),
Performance = c("Good", "Very Good", "Average", "Good", "Very Good")
)
# Function to calculate bonus
calculate_bonus <- function(performance, salary) {
if (performance == "Very Good") {
return(salary * 0.2)
} else if (performance == "Good") {
return(salary * 0.1)
} else if (performance == "Average") {
return(salary * 0.05)
} else {
return(0)
}
}
# Apply function to calculate bonus for each employee
data$Bonus <- mapply(calculate_bonus, data$Performance, data$Salary)
# Display the result in the required format, e.g. "Name: Bagas, Bonus: 500"
for (i in 1:nrow(data)) {
cat(sprintf("Name: %s, Bonus: %d\n", data$Name[i], as.integer(data$Bonus[i])))
}
## Name: Bagas, Bonus: 500
## Name: Joan, Bonus: 1400
## Name: Alya, Bonus: 325
## Name: Dwi, Bonus: 1000
## Name: Nabil, Bonus: 2400
Detailed Explanation: The code calculates each employee’s bonus from their Performance rating using conditional statements and a loop.
1.2.1 Creating the Dataset:
In R, the dataset is stored in a data frame built with data.frame(); in Python, the same table would be a pandas DataFrame. Holding the data in a table makes the analysis easier and more structured.
# Dataset
df <- data.frame(
ID = c(1, 2, 3, 4, 5),
Name = c("Bagas", "Joan", "Alya", "Dwi", "Nabil"),
Age = c(25, 30, 27, 35, 40),
Salary = c(5000, 7000, 6500, 10000, 12000),
Position = c("Staff", "Supervisor", "Staff", "Manager", "Director"),
Performance = c("Good", "Very Good", "Average", "Good", "Very Good"),
stringsAsFactors = FALSE
)
# Display Dataset
print(df)
##   ID  Name Age Salary   Position Performance
## 1  1 Bagas  25   5000      Staff        Good
## 2  2  Joan  30   7000 Supervisor   Very Good
## 3  3  Alya  27   6500      Staff     Average
## 4  4   Dwi  35  10000    Manager        Good
## 5  5 Nabil  40  12000   Director   Very Good
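For the Python half of the task, the same table can be built as a pandas DataFrame. This is a minimal sketch assuming pandas is installed; note that pandas labels rows from 0, whereas R numbers them from 1.

```python
import pandas as pd

# Same dummy dataset as the R data frame, as a pandas DataFrame
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Name": ["Bagas", "Joan", "Alya", "Dwi", "Nabil"],
    "Age": [25, 30, 27, 35, 40],
    "Salary": [5000, 7000, 6500, 10000, 12000],
    "Position": ["Staff", "Supervisor", "Staff", "Manager", "Director"],
    "Performance": ["Good", "Very Good", "Average", "Good", "Very Good"],
})

# Display the dataset (row labels 0 to 4)
print(df)
```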
1.2.2 Conditional Statements (if / else if / else, or if-elif-else in Python):
This part is used to determine the bonus based on the employee’s Performance.
# Function to calculate bonus
calculate_bonus <- function(performance, salary) {
if (performance == "Very Good") {
return(salary * 0.2)
} else if (performance == "Good") {
return(salary * 0.1)
} else if (performance == "Average") {
return(salary * 0.05)
} else {
return(0)
}
}
Explanation:
- if: Checks whether the performance is “Very Good”; if so, the bonus is 20% of the salary.
- else if (elif in Python): Checks whether the performance is “Good”; if so, the bonus is 10% of the salary.
- else if (elif in Python): Checks whether the performance is “Average”; if so, the bonus is 5% of the salary.
- else: If none of the three categories matches, the bonus is 0.
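Since the explanation uses Python's elif keyword, here is a minimal sketch of the same rules as a Python function. It mirrors the R calculate_bonus; the name and signature are carried over from the R code and are an assumption, because the author's Python submission is not shown here.

```python
def calculate_bonus(performance: str, salary: float) -> float:
    """Return one employee's bonus using an if-elif-else chain."""
    if performance == "Very Good":
        return salary * 0.20  # 20% of salary
    elif performance == "Good":
        return salary * 0.10  # 10% of salary
    elif performance == "Average":
        return salary * 0.05  # 5% of salary
    else:
        return 0  # any other rating earns no bonus


print(calculate_bonus("Good", 5000))     # → 500.0
print(calculate_bonus("Average", 6500))  # → 325.0
print(calculate_bonus("Poor", 8000))     # → 0
```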
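1.2.3 Looping Through Rows:
- for (i in 1:nrow(df)): Loops through each row of the data frame by its index (Python uses DataFrame.iterrows() for this).
- cat(): Prints each formatted output line (Python uses an f-string for the formatting).
- as.integer(): Converts the bonus to an integer to remove decimals (int() in Python).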
# Dataset
df <- data.frame(
ID = c(1, 2, 3, 4, 5),
Name = c("Bagas", "Joan", "Alya", "Dwi", "Nabil"),
Age = c(25, 30, 27, 35, 40),
Salary = c(5000, 7000, 6500, 10000, 12000),
Position = c("Staff", "Supervisor", "Staff", "Manager", "Director"),
Performance = c("Good", "Very Good", "Average", "Good", "Very Good"),
stringsAsFactors = FALSE
)
# Function to calculate bonus
calculate_bonus <- function(performance, salary) {
if (performance == "Very Good") {
return(salary * 0.2)
} else if (performance == "Good") {
return(salary * 0.1)
} else if (performance == "Average") {
return(salary * 0.05)
} else {
return(0)
}
}
# Apply bonus calculation
df$Bonus <- mapply(calculate_bonus, df$Performance, df$Salary)
# Looping to display Name and Bonus in the required format
for (i in 1:nrow(df)) {
cat(sprintf("Name: %s, Bonus: %d\n", df$Name[i], as.integer(df$Bonus[i])))
}
## Name: Bagas, Bonus: 500
## Name: Joan, Bonus: 1400
## Name: Alya, Bonus: 325
## Name: Dwi, Bonus: 1000
## Name: Nabil, Bonus: 2400
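Section 1.2.3 names Python's iterrows(), f-strings, and int(). A minimal pandas sketch of that loop follows, assuming pandas is installed; output_lines is a hypothetical list added only so the result can be checked.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bagas", "Joan", "Alya", "Dwi", "Nabil"],
    "Salary": [5000, 7000, 6500, 10000, 12000],
    "Performance": ["Good", "Very Good", "Average", "Good", "Very Good"],
})


def calculate_bonus(performance, salary):
    # Same if-elif-else rules as the R function
    if performance == "Very Good":
        return salary * 0.20
    elif performance == "Good":
        return salary * 0.10
    elif performance == "Average":
        return salary * 0.05
    else:
        return 0


output_lines = []  # hypothetical: collects the printed lines for checking
for index, row in df.iterrows():  # one (index, row Series) pair per row
    bonus = calculate_bonus(row["Performance"], row["Salary"])
    # f-string builds the exact output format; int() drops the decimals
    line = f"Name: {row['Name']}, Bonus: {int(bonus)}"
    output_lines.append(line)
    print(line)
```

Because the f-string controls every character, no stray space appears before the comma, which is what R's cat() produces when it is given several separate arguments.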
Applied Concepts:
Concept | Explanation |
---|---|
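Conditional Statements | if / else if / else (if-elif-else in Python) to determine the bonus |
Loops | for loop over row indices (iterrows() in Python) |
Function Application | mapply() applies calculate_bonus to every Performance/Salary pair (apply() with a lambda in Python) |
Data Structure | data frame (pandas DataFrame in Python) |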
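The table refers to apply() and a lambda for the Python version. A minimal sketch follows, assuming pandas and a hypothetical RATES dictionary that encodes the bonus rules. DataFrame.apply with axis=1 calls the anonymous function once per row (still a row-by-row loop), whereas multiplying whole columns is the genuinely vectorized form; both give the same bonuses.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bagas", "Joan", "Alya", "Dwi", "Nabil"],
    "Salary": [5000, 7000, 6500, 10000, 12000],
    "Performance": ["Good", "Very Good", "Average", "Good", "Very Good"],
})

# Hypothetical lookup table for the bonus rules; unknown ratings get 0
RATES = {"Very Good": 0.20, "Good": 0.10, "Average": 0.05}

# apply() with axis=1 passes each row to the lambda
df["Bonus"] = df.apply(
    lambda row: row["Salary"] * RATES.get(row["Performance"], 0), axis=1
)

# Vectorized alternative: map ratings to rates, then multiply whole columns
df["Bonus_vec"] = df["Salary"] * df["Performance"].map(RATES).fillna(0)

print(df)
```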
2 Loops (for & while)
- Use a for loop to list employees with a salary greater than 6000.
Expected Output:
Name: Joan, Salary: 7000
Name: Alya, Salary: 6500
Name: Dwi, Salary: 10000
Name: Nabil, Salary: 12000
- Use a while loop to display employees until a “Manager” is found.
Expected Output:
Name: Bagas, Position: Staff
Name: Joan, Position: Supervisor
Name: Alya, Position: Staff
Name: Dwi, Position: Manager (Stop here)
- Use break to stop the loop when an employee with a salary above 10,000 is found.
Expected Output:
Name: Bagas, Salary: 5000
Name: Joan, Salary: 7000
Name: Alya, Salary: 6500
Name: Dwi, Salary: 10000
(Stopped because Nabil has a salary above 10,000)
- Use continue to skip employees with “Average” performance.
Expected Output:
Name: Bagas, Performance: Good
Name: Joan, Performance: Very Good
Name: Dwi, Performance: Good
Name: Nabil, Performance: Very Good
(Alya is skipped because the performance is “Average”)
Submission Guidelines:
Submit your Python code via Google Colab and your R code via RPubs.
Ensure the output is displayed correctly.
Add comments in the code to explain your logic.
# Dataset
df <- data.frame(
ID = c(1, 2, 3, 4, 5),
Name = c("Bagas", "Joan", "Alya", "Dwi", "Nabil"),
Age = c(25, 30, 27, 35, 40),
Salary = c(5000, 7000, 6500, 10000, 12000),
Position = c("Staff", "Supervisor", "Staff", "Manager", "Director"),
Performance = c("Good", "Very Good", "Average", "Good", "Very Good"),
stringsAsFactors = FALSE
)
# For Loop - Salary greater than 6000
cat("\n--- For Loop Salary > 6000 ---\n")
##
## --- For Loop Salary > 6000 ---
for (i in 1:nrow(df)) {
if (df$Salary[i] > 6000) {
cat(sprintf("Name: %s, Salary: %d\n", df$Name[i], df$Salary[i]))
}
}
## Name: Joan, Salary: 7000
## Name: Alya, Salary: 6500
## Name: Dwi, Salary: 10000
## Name: Nabil, Salary: 12000
# While Loop - Display employees until "Manager" is found
cat("\n--- While Loop Until Manager ---\n")
##
## --- While Loop Until Manager ---
i <- 1
while (i <= nrow(df)) {
if (df$Position[i] == "Manager") {
# Mark the Manager's line, then stop the loop
cat(sprintf("Name: %s, Position: %s (Stop here)\n", df$Name[i], df$Position[i]))
break
}
cat(sprintf("Name: %s, Position: %s\n", df$Name[i], df$Position[i]))
i <- i + 1
}
## Name: Bagas, Position: Staff
## Name: Joan, Position: Supervisor
## Name: Alya, Position: Staff
## Name: Dwi, Position: Manager (Stop here)
# Break Loop - Stop when a salary above 10,000 is found
cat("\n--- Break Loop Salary > 10000 ---\n")
##
## --- Break Loop Salary > 10000 ---
for (i in 1:nrow(df)) {
# Check before printing, so the employee above 10,000 is not listed
if (df$Salary[i] > 10000) {
cat(sprintf("(Stopped because %s has a salary above 10,000)\n", df$Name[i]))
break
}
cat(sprintf("Name: %s, Salary: %d\n", df$Name[i], df$Salary[i]))
}
## Name: Bagas, Salary: 5000
## Name: Joan, Salary: 7000
## Name: Alya, Salary: 6500
## Name: Dwi, Salary: 10000
## (Stopped because Nabil has a salary above 10,000)
# Continue Loop - Skip "Average" Performance
cat("\n--- Continue to Skip Average Performance ---\n")
##
## --- Continue to Skip Average Performance ---
for (i in 1:nrow(df)) {
if (df$Performance[i] == "Average") {
next  # R's equivalent of Python's continue
}
cat(sprintf("Name: %s, Performance: %s\n", df$Name[i], df$Performance[i]))
}
## Name: Bagas, Performance: Good
## Name: Joan, Performance: Very Good
## Name: Dwi, Performance: Good
## Name: Nabil, Performance: Very Good
2.0.1 Additional Explanation:
Explanation of Loop Implementation
- For Loop to List Employees with Salary Greater Than 6000
In the Python version, the for loop iterates over each row of the DataFrame using iterrows() (the R version iterates over row indices with for (i in 1:nrow(df))). The if condition checks whether the employee’s salary is greater than 6000; if it is, the employee’s name and salary are printed.
Code Explanation:
- for index, row in df.iterrows(): Iterates through each row of the DataFrame.
- if row["Salary"] > 6000: Filters employees with a salary greater than 6000.
- print(): Displays the employee’s name and salary.
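A minimal pandas sketch of the for loop described above, assuming pandas is installed; the selected list is a hypothetical addition used only to check the result.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bagas", "Joan", "Alya", "Dwi", "Nabil"],
    "Salary": [5000, 7000, 6500, 10000, 12000],
})

selected = []  # hypothetical: names that pass the filter
for index, row in df.iterrows():  # visit every row of the DataFrame
    if row["Salary"] > 6000:      # keep only salaries above 6000
        selected.append(row["Name"])
        print(f"Name: {row['Name']}, Salary: {row['Salary']}")
```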
- While Loop to Display Employees Until “Manager” is Found
The while loop iterates through the DataFrame until it reaches an employee whose position is “Manager”. When the if condition is satisfied, the break statement terminates the loop.
Code Explanation:
- i = 0: Initializes the loop counter.
- while i < len(df): Keeps looping until the index reaches the number of rows.
- df.loc[i, "Position"]: Accesses the position of the current employee.
- if df.loc[i, "Position"] == "Manager": Condition to stop the loop.
- break: Terminates the loop when the condition is met.
- i += 1: Moves to the next row; without it the loop would never advance.
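A minimal pandas sketch of the while loop, including the step that moves to the next row. It assumes pandas and its default 0-based row labels; visited is a hypothetical list used only for checking.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bagas", "Joan", "Alya", "Dwi", "Nabil"],
    "Position": ["Staff", "Supervisor", "Staff", "Manager", "Director"],
})

visited = []  # hypothetical: names printed before the loop stops
i = 0         # loop counter (pandas row labels start at 0)
while i < len(df):  # never run past the last row
    name, position = df.loc[i, "Name"], df.loc[i, "Position"]
    visited.append(name)
    if position == "Manager":
        print(f"Name: {name}, Position: {position} (Stop here)")
        break  # leave the loop once a Manager is found
    print(f"Name: {name}, Position: {position}")
    i += 1  # move to the next row
```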
- Break Loop When Salary Above 10,000 is Found
The for loop iterates through each row in the DataFrame. The if condition checks whether the employee’s salary is above 10,000; as soon as it is, the break statement terminates the loop. The check must come before the print so that the employee who triggers the stop (Nabil) is not listed.
Code Explanation:
- if row["Salary"] > 10000: Condition to find the first employee with a salary greater than 10,000.
- break: Stops the loop execution when the condition is met.
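A minimal pandas sketch of the break example, assuming pandas is installed; printed is a hypothetical list for checking. The salary test runs before the print, so the loop stops without listing Nabil, which matches the expected output in the task.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bagas", "Joan", "Alya", "Dwi", "Nabil"],
    "Salary": [5000, 7000, 6500, 10000, 12000],
})

printed = []  # hypothetical: names listed before the break
for index, row in df.iterrows():
    if row["Salary"] > 10000:  # check BEFORE printing
        print(f"(Stopped because {row['Name']} has a salary above 10,000)")
        break
    printed.append(row["Name"])
    print(f"Name: {row['Name']}, Salary: {row['Salary']}")
```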
- Continue to Skip Employees with “Average” Performance
The continue statement (next in R) is used inside the for loop to skip employees whose performance is “Average”: the loop moves straight to the next iteration without executing the rest of the current one.
Code Explanation:
- if row["Performance"] == "Average": Checks for “Average” performance.
- continue: Skips the current iteration and moves to the next row.
- print(): Displays employee details if their performance is not “Average”.
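A minimal pandas sketch of the continue example, assuming pandas is installed; shown is a hypothetical list for checking.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bagas", "Joan", "Alya", "Dwi", "Nabil"],
    "Performance": ["Good", "Very Good", "Average", "Good", "Very Good"],
})

shown = []  # hypothetical: employees that were not skipped
for index, row in df.iterrows():
    if row["Performance"] == "Average":  # skip this employee...
        continue                         # ...and jump to the next row
    shown.append(row["Name"])
    print(f"Name: {row['Name']}, Performance: {row['Performance']}")
```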