Advanced Practicum: Functions & Loops + Data Science
Functions & Loops + Data Science ~ Week 5
NURUL IFFAH
Study Program
Data Science
University
INSTITUT TEKNOLOGI SAINS BANDUNG
Lecturer
Bakti Siregar, M.Sc., CSD
1 Dynamic Multi Formula Function
1.1 Introduction
In this task, a dynamic function is created to compute several types of mathematical formulas, including linear, quadratic, cubic, and exponential functions. The purpose of this task is to understand the use of functions, nested loops, and conditional logic in R programming, as well as to visualize the results using graphs.
1.2 Function Implementation
| Formula | Expression | Description |
|---|---|---|
| Linear | 2x + 3 | Increases at a constant rate |
| Quadratic | x^2 + 2x + 1 | Forms a curved parabola |
| Cubic | x^3 | Grows faster than quadratic |
| Exponential | 2^x | Grows very rapidly |
Explanation: This function calculates values based on the selected formula type. The if-else structure is used to choose the formula, while stop() is used to validate the input and prevent errors.
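The function body is not reproduced in this report. A minimal sketch of what it might look like is given here, using the four formulas from the table above; the argument names (`x`, `formula_type`) and the error messages are assumptions, not necessarily the original code.

```r
# Hypothetical reconstruction of compute_formula(); formulas come from the
# table above, argument names and error messages are assumptions.
compute_formula <- function(x, formula_type) {
  # Validate the input before computing anything
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }

  # Select the formula with an if-else chain
  if (formula_type == "linear") {
    2 * x + 3
  } else if (formula_type == "quadratic") {
    x^2 + 2 * x + 1
  } else if (formula_type == "cubic") {
    x^3
  } else if (formula_type == "exponential") {
    2^x
  } else {
    # Reject anything that is not one of the four supported types
    stop("Unknown formula type: ", formula_type)
  }
}

compute_formula(3, "linear")     # 2 * 3 + 3 = 9
compute_formula(3, "quadratic")  # 9 + 6 + 1 = 16
```

Calling the function with an unsupported type, for example `compute_formula(3, "log")`, stops with an error instead of silently returning a wrong value.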
1.3 Calculation Using Nested Loops
x_values <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")
results <- list()
# Outer loop: one pass per formula type
for (f in formulas) {
  # Preallocate the result vector instead of growing it with c()
  y_values <- numeric(length(x_values))
  # Inner loop: evaluate the formula at each x
  for (i in seq_along(x_values)) {
    y_values[i] <- compute_formula(x_values[i], f)
  }
  results[[f]] <- y_values
}
Explanation: Nested loops calculate every formula for every x value. The outer loop goes through each formula, while the inner loop computes values for x from 1 to 20. The results are stored in a list, with one numeric vector per formula.
1.4 Visualization
1.5 Interpretation
Based on the visualization, each function shows a different growth pattern. The linear function increases at a constant rate. The quadratic function grows faster and forms a curved shape. The cubic function increases even more sharply, especially for larger x values. Meanwhile, the exponential function grows the fastest, showing a very steep increase as x becomes larger.
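This shows that growth becomes faster as the degree of the polynomial increases, and that the exponential function eventually outpaces every polynomial: for x between 2 and 9 the cubic x^3 is still larger than 2^x, but from x = 10 onward the exponential is larger.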
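The growth patterns described above can be checked numerically. The following is a small sketch that uses the four formulas from Section 1.2; the data frame `growth` and the variable `last_cubic_lead` are illustrative names, not part of the original code.

```r
# Evaluate all four formulas on the same x range used in the report
x <- 1:20
growth <- data.frame(
  x           = x,
  linear      = 2 * x + 3,
  quadratic   = x^2 + 2 * x + 1,
  cubic       = x^3,
  exponential = 2^x
)

# Compare the functions at a few selected points
growth[c(5, 10, 15, 20), ]

# Largest x at which the cubic is still above the exponential
last_cubic_lead <- max(growth$x[growth$cubic > growth$exponential])
last_cubic_lead
```

On this range the cubic stays ahead of the exponential only up to x = 9; after that 2^x is larger and the gap widens quickly.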
1.6 Conclusion
The function successfully computes different types of mathematical formulas using a dynamic approach. Nested loops help process multiple calculations efficiently. The visualization also makes it easier to understand the differences between each function.
2 Nested Simulation: Multi Sales & Discounts
2.1 Introduction
In this task, we simulate sales data for multiple salespersons over several days. The goal is to apply nested loops, conditional logic, and functions to generate realistic sales data, calculate cumulative sales, and apply discounts based on sales performance.
2.2 Function Implementation
| Component | Description |
|---|---|
| Nested Function | Calculates cumulative sales per salesperson |
| Outer Loop | Loops through each salesperson |
| Inner Loop | Loops through each day |
| Discount Logic | Applies discount based on sales amount |
| Cumulative Calculation | Tracks total sales over time |
Explanation: This function generates sales data using nested loops. The outer loop iterates through each salesperson, while the inner loop represents daily sales. Conditional logic is used to assign discount rates based on the sales amount. Cumulative sales are calculated by adding daily sales over time.
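The steps described above can be sketched as follows. This is an illustration only: the function name `simulate_sales`, the sales range (100 to 1000), the discount thresholds, and the seed are all assumptions, since the original values are not shown in this report.

```r
# Sketch of a nested-loop sales simulation (all numeric settings are assumed)
simulate_sales <- function(n_salesperson = 5, days = 5, seed = 123) {
  set.seed(seed)

  # Nested function: running total of a vector of daily sales
  running_total <- function(amounts) {
    total <- numeric(length(amounts))
    acc <- 0
    for (i in seq_along(amounts)) {
      acc <- acc + amounts[i]
      total[i] <- acc
    }
    total
  }

  result <- vector("list", n_salesperson)
  for (s in seq_len(n_salesperson)) {        # outer loop: salespersons
    amounts <- numeric(days)
    discounts <- numeric(days)
    for (d in seq_len(days)) {               # inner loop: days
      amounts[d] <- round(runif(1, min = 100, max = 1000))
      # Discount logic: assumed thresholds, higher sales earn a higher rate
      if (amounts[d] >= 800) {
        discounts[d] <- 0.20
      } else if (amounts[d] >= 500) {
        discounts[d] <- 0.10
      } else {
        discounts[d] <- 0.05
      }
    }
    result[[s]] <- data.frame(
      sales_id         = s,
      day              = seq_len(days),
      sales_amount     = amounts,
      discount_rate    = discounts,
      cumulative_sales = running_total(amounts)
    )
  }
  # Stack the per-salesperson data frames into one table
  do.call(rbind, result)
}

sales_data <- simulate_sales()
head(sales_data)
```

Building one data frame per salesperson and binding them once at the end avoids growing a data frame row by row inside the inner loop.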
2.3 Data Generation
| sales_id | sales_amount | cumulative_sales |
|---|---|---|
| 1 | 490.5714 | 2535.429 |
| 2 | 632.1429 | 2409.714 |
| 3 | 433.2857 | 1249.286 |
| 4 | 555.4286 | 1885.143 |
| 5 | 727.7143 | 3107.143 |
Explanation: This step generates data for 5 salespersons over 5 days. The full output includes the sales amount, discount rate, and cumulative sales; the summary table above reports sales_amount and cumulative_sales for each sales_id.
2.4 Visualization
2.5 Interpretation
The line chart shows the cumulative sales of each salesperson over time. Each line represents one salesperson, and the upward trend indicates that total sales increase day by day.
Some salespersons show a steeper increase, which means they achieve higher sales in a shorter period. Others have a more gradual growth, indicating lower or more consistent daily sales.
Overall, the differences in the lines reflect variations in performance among salespersons. The chart clearly shows how each individual progresses over time and allows easy comparison of their total sales.
2.6 Conclusion
The simulation successfully models real-world sales data using nested loops and conditional logic. It also demonstrates how cumulative performance can be tracked and visualized effectively. This approach helps in understanding sales trends and comparing performance across individuals.
3 Multi Level Performance Categorization
3.1 Introduction
In this task, sales performance is categorized into multiple levels based on sales amounts. The purpose is to transform numerical data into meaningful categories and analyze the distribution using percentages and visualizations.
3.2 Function Implementation
| Category | Sales Range | Description |
|---|---|---|
| Excellent | >= 900 | Very high sales performance |
| Very Good | 700–899 | Above average sales |
| Good | 500–699 | Moderate sales performance |
| Average | 300–499 | Below average performance |
| Poor | < 300 | Low sales performance |
Explanation: This function categorizes each sales amount into five levels. A loop is used to process each value, and conditional statements determine the appropriate category.
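A minimal sketch of such a categorization function is shown here. The thresholds are taken from the table above; the function name `categorize_performance` and the example values are assumptions made for illustration.

```r
# Assign one of five categories to each value (thresholds from the table)
categorize_performance <- function(values) {
  categories <- character(length(values))
  for (i in seq_along(values)) {
    if (values[i] >= 900) {
      categories[i] <- "Excellent"
    } else if (values[i] >= 700) {
      categories[i] <- "Very Good"
    } else if (values[i] >= 500) {
      categories[i] <- "Good"
    } else if (values[i] >= 300) {
      categories[i] <- "Average"
    } else {
      categories[i] <- "Poor"
    }
  }
  categories
}

# Hypothetical example values, one per category
example_values <- c(950, 720, 510, 305, 120)
example_categories <- categorize_performance(example_values)

# Percentage of observations in each category
round(prop.table(table(example_categories)) * 100, 1)
```

Because each branch tests only a lower bound, non-integer values such as 899.5 fall into "Very Good" rather than slipping between the ranges listed in the table.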
3.3 Visualization
3.4 Interpretation
The bar chart presents the distribution of sales performance categories in descending order. The category with the highest frequency appears first, indicating that most sales fall within this performance level.
The following categories show lower frequencies, which means fewer sales are classified in those groups. This suggests that while some performance levels are dominant, others occur less frequently.
Overall, the visualization highlights the imbalance in sales distribution, where certain performance categories are more common than others. The sorted arrangement makes it easier to identify which categories contribute the most to overall sales performance.
4 Multi-Company Dataset Simulation
4.1 Introduction
In this task, a dataset is generated to simulate multiple companies and their employees. Each employee has attributes such as salary, department, performance score, and KPI score. The goal is to analyze company performance using summary statistics and visualize the results.
4.2 Function Implementation
| Component | Description |
|---|---|
| Nested Loop | Loops through companies and employees |
| Company ID | Identifies each company |
| Employee Data | Generates salary and employee ID |
| Department Assignment | Assigns department randomly |
| Performance Metrics | Generates performance and KPI scores |
Explanation:
The function generates a dataset using nested loops. The outer loop represents companies, while the inner loop represents employees within each company. Each employee is assigned a salary, department, and performance scores. This simulates a real-world company dataset.
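The nested-loop generation described above might look like the following sketch. The function name `generate_company_data`, the number of employees per company, the value ranges, the department list, and the seed are assumptions; only the column names follow the tables in this report.

```r
# Sketch: simulate employees for several companies with nested loops
generate_company_data <- function(n_company = 5, n_employees = 10, seed = 42) {
  set.seed(seed)
  departments <- c("HR", "Finance", "Marketing")   # assumed department list
  rows <- vector("list", n_company * n_employees)
  k <- 1
  for (c_id in seq_len(n_company)) {               # outer loop: companies
    for (e in seq_len(n_employees)) {              # inner loop: employees
      rows[[k]] <- data.frame(
        company_id        = c_id,
        employee_id       = paste0("E", c_id, "_", e),
        salary            = round(runif(1, 3000, 10000)),  # assumed range
        department        = sample(departments, 1),
        performance_score = round(runif(1, 50, 100)),      # assumed range
        KPI_score         = round(runif(1, 50, 100))       # assumed range
      )
      k <- k + 1
    }
  }
  do.call(rbind, rows)
}

company_data <- generate_company_data()

# Average salary and scores per company
company_summary <- aggregate(
  cbind(salary, performance_score, KPI_score) ~ company_id,
  data = company_data, FUN = mean
)
company_summary
```

The `aggregate()` call produces one row per company, which is the same shape as a per-company summary table.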
4.3 Data Generation
| company_id | salary | performance_score | KPI_score |
|---|---|---|---|
| 1 | 5952.6 | 80.9 | 73.9 |
| 2 | 5879.6 | 80.7 | 75.4 |
| 3 | 6787.8 | 82.3 | 79.3 |
| 4 | 5576.9 | 81.5 | 72.4 |
| 5 | 6503.7 | 68.3 | 70.7 |
4.4 Visualization
4.5 Interpretation
The bar chart shows the average KPI scores for each company in descending order. The company with the highest KPI score appears first, indicating better overall performance. Other companies show lower KPI values, reflecting differences in employee performance and productivity.
This visualization makes it easy to compare company performance and identify which company performs the best. The variation in KPI scores suggests that performance is not evenly distributed across companies.
4.6 Conclusion
The dataset successfully simulates multiple companies and employees using nested loops. The summary analysis and visualization provide insights into company performance. The bar chart helps identify which company has the highest average KPI score.
5 Monte Carlo Simulation: Pi & Probability
5.1 Introduction
In this task, a Monte Carlo simulation is used to estimate the value of π (pi) and analyze probability. Random points are generated within a square, and points that fall inside a circle are counted. Additionally, the probability of points falling within a smaller sub-square is calculated.
5.2 Function Implementation
| Component | Description |
|---|---|
| Loop | Generates points iteratively |
| Random Points | Random (x,y) between -1 and 1 |
| Circle Check | Checks if point is inside circle |
| Sub-square Check | Checks if point is inside smaller square |
| Pi Calculation | Estimates pi using Monte Carlo method |
Explanation:
The function generates random points using a loop. Each point is checked whether it lies inside the unit circle or not. The number of points inside the circle is used to estimate π. Additionally, the function calculates the probability of points falling within a smaller square region.
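A minimal sketch of this simulation follows. The function name `estimate_pi`, the sub-square definition (|x| <= 0.5 and |y| <= 0.5), the number of points, and the seed are assumptions chosen for illustration.

```r
# Monte Carlo estimate of pi and of a sub-square probability
estimate_pi <- function(n_points = 10000, sub_half_width = 0.5, seed = 1) {
  set.seed(seed)
  inside_circle <- 0
  inside_sub <- 0
  for (i in seq_len(n_points)) {
    # Random point in the square [-1, 1] x [-1, 1]
    x <- runif(1, -1, 1)
    y <- runif(1, -1, 1)
    # Circle check: inside the unit circle if x^2 + y^2 <= 1
    if (x^2 + y^2 <= 1) {
      inside_circle <- inside_circle + 1
    }
    # Sub-square check (assumed region centred at the origin)
    if (abs(x) <= sub_half_width && abs(y) <= sub_half_width) {
      inside_sub <- inside_sub + 1
    }
  }
  list(
    # Circle area / square area = pi / 4, so pi is about 4 * proportion inside
    pi_estimate     = 4 * inside_circle / n_points,
    sub_square_prob = inside_sub / n_points
  )
}

mc_result <- estimate_pi(n_points = 100000)
mc_result
```

With the assumed sub-square of side 1 inside a square of side 2, the theoretical probability is 1/4, which gives a simple check on the simulated value.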
5.3 Visualization
5.4 Interpretation
The scatter plot shows randomly generated points within a square. Points inside the circle are displayed in one color, while points outside are shown in another. The circular boundary represents the unit circle.
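The proportion of points inside the circle is used to estimate the value of π. The unit circle has area π and the enclosing square has area 4, so the proportion approximates π/4 and the estimate is 4 × (points inside / total points). As the number of points n increases, the estimate tends to become more accurate, with the typical error shrinking roughly in proportion to 1/√n.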
The probability of points falling within the smaller square represents the likelihood of a point being located in that region. This demonstrates how Monte Carlo simulation can be used to approximate both mathematical constants and probabilities.
5.5 Conclusion
The Monte Carlo simulation successfully estimates the value of π and calculates probability using random sampling. The visualization clearly distinguishes points inside and outside the circle. This approach demonstrates how randomness can be used to solve mathematical problems.
6 Advanced Data Transformation & Feature Engineering
6.1 Introduction
In this task, data transformation techniques are applied to improve data quality and prepare it for analysis. The process includes normalization, z-score standardization, and feature engineering. New features such as performance category and salary bracket are also created. The results are compared before and after transformation using visualizations.
6.2 Function Implementation
Normalize Function, Z-Score Function
| Function | Description | Method |
|---|---|---|
| normalize_columns | Scales numeric values to range 0–1 | (x - min) / (max - min) |
| z_score | Standardizes values based on mean and standard deviation | (x - mean) / sd |
Explanation:
The normalization function rescales numeric values between 0 and 1, making them comparable across variables. The z-score function standardizes values based on their mean and standard deviation. Both functions use loops to process each numeric column.
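The two functions can be sketched as follows, using the formulas from the table above. The function names match the table; the loop structure and the toy data frame are assumptions made for illustration.

```r
# Min-max normalization of every numeric column: (x - min) / (max - min)
normalize_columns <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      rng <- range(df[[col]], na.rm = TRUE)
      # Note: a constant column (max == min) would divide by zero
      df[[col]] <- (df[[col]] - rng[1]) / (rng[2] - rng[1])
    }
  }
  df
}

# Z-score standardization of every numeric column: (x - mean) / sd
z_score <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      m <- mean(df[[col]], na.rm = TRUE)
      s <- sd(df[[col]], na.rm = TRUE)
      df[[col]] <- (df[[col]] - m) / s
    }
  }
  df
}

# Hypothetical toy data to show the effect
toy <- data.frame(
  salary            = c(4000, 6000, 8000),
  performance_score = c(60, 75, 90),
  department        = c("HR", "Finance", "Marketing")
)

toy_normalized <- normalize_columns(toy)
toy_standardized <- z_score(toy)
toy_normalized
toy_standardized
```

Non-numeric columns such as `department` pass through unchanged because of the `is.numeric()` check.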
6.2.1 Feature Engineering
# Performance category
company_data$performance_category <- ifelse(company_data$performance_score >= 85, "High",
  ifelse(company_data$performance_score >= 70, "Medium", "Low"))
# Salary bracket
company_data$salary_bracket <- ifelse(company_data$salary >= 8000, "High",
  ifelse(company_data$salary >= 5000, "Medium", "Low"))
6.3 Visualization
6.3.1 Histogram (Before vs After Normalization)
6.3.2 Boxplot (Before vs After Z-Score)
6.4 Interpretation
The histogram shows the distribution of salary before and after normalization. After normalization, the values are scaled between 0 and 1, making them easier to compare across variables.
The boxplot shows the distribution of performance scores before and after applying the z-score transformation. After transformation, the data is centered around zero with a standard deviation of one, so each value expresses its distance from the mean in standard deviations, which helps in identifying deviations from the mean.
The new features, such as performance category and salary bracket, simplify the interpretation of numerical data by converting them into meaningful groups.
6.5 Conclusion
The transformation process successfully improves the data structure through normalization and standardization. Feature engineering adds new meaningful variables that enhance data analysis. The visualizations clearly show the differences before and after transformation.
7 Mini Project: Company KPI Dashboard & Simulation
7.1 Introduction
In this mini project, a dataset is generated to simulate multiple companies and their employees. Each company contains a number of employees with attributes such as salary, performance score, KPI score, and department. The goal is to analyze company performance, identify top performers, and visualize key insights using various plots.
7.2 Function Implementation
| Component | Description |
|---|---|
| Company Loop | Loops through multiple companies |
| Employee Loop | Generates employees per company |
| Employee Data | Creates employee ID and salary |
| Department | Assigns department randomly |
| Performance Metrics | Generates performance and KPI scores |
Explanation:
The function generates a dataset using nested loops. Each company has a random number of employees between 50 and 200. Each employee is assigned attributes such as salary, performance score, KPI score, and department. This simulates a real-world company dataset.
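The generation step and the later analysis steps might be sketched as follows. The random head count between 50 and 200 follows the explanation above; the function name `generate_kpi_data`, the value ranges, the department list, and the seed are assumptions.

```r
# Sketch: companies with a random number of employees (50 to 200 each)
generate_kpi_data <- function(n_company = 5, seed = 2024) {
  set.seed(seed)
  departments <- c("HR", "Finance", "Marketing")   # assumed department list
  companies <- vector("list", n_company)
  for (c_id in seq_len(n_company)) {               # company loop
    n_emp <- sample(50:200, 1)                     # random head count
    employee_id <- character(n_emp)
    salary <- numeric(n_emp)
    performance_score <- numeric(n_emp)
    KPI_score <- numeric(n_emp)
    department <- character(n_emp)
    for (e in seq_len(n_emp)) {                    # employee loop
      employee_id[e] <- paste0("E", c_id, "_", e)
      salary[e] <- round(runif(1, 3000, 10000))    # assumed range
      performance_score[e] <- round(runif(1, 50, 100))
      KPI_score[e] <- round(runif(1, 50, 100))
      department[e] <- sample(departments, 1)
    }
    companies[[c_id]] <- data.frame(
      employee_id = employee_id, company_id = c_id, salary = salary,
      performance_score = performance_score, KPI_score = KPI_score,
      department = department
    )
  }
  do.call(rbind, companies)
}

kpi_data <- generate_kpi_data()

# Top performers: employees with a KPI score above 90
top_performers <- kpi_data[kpi_data$KPI_score > 90, ]

# Average KPI score per company
avg_kpi <- aggregate(KPI_score ~ company_id, data = kpi_data, FUN = mean)
avg_kpi
```

Filling preallocated vectors in the inner loop and building one data frame per company keeps the nested-loop structure while avoiding repeated row binding.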
7.3 Generate Data
| company_id | salary | KPI_score |
|---|---|---|
| 1 | 6420.546 | 75.15464 |
| 2 | 6796.938 | 73.93750 |
| 3 | 6712.113 | 75.41935 |
| 4 | 6604.392 | 73.37975 |
| 5 | 6090.982 | 74.61404 |
7.4 Top Performance (KPI > 90)
| | employee_id | company_id | salary | performance_score | KPI_score | department |
|---|---|---|---|---|---|---|
| 3 | E1_3 | 1 | 7001 | 82 | 99 | HR |
| 6 | E1_6 | 1 | 4846 | 95 | 91 | Marketing |
| 7 | E1_7 | 1 | 6372 | 72 | 97 | Marketing |
| 11 | E1_11 | 1 | 6528 | 77 | 99 | Marketing |
| 23 | E1_23 | 1 | 9969 | 64 | 94 | Finance |
| 27 | E1_27 | 1 | 6282 | 68 | 98 | Finance |
7.5 Visualization
7.5.1 Bar Chart: Average KPI per Company
7.5.2 Grouped Bar Chart (Department Analysis)
7.5.3 Salary Distribution
7.5.4 Relationship Between Performance Score and KPI Score
7.6 Interpretation
The bar chart shows the average KPI score for each company, allowing comparison of overall performance. Some companies achieve higher KPI values, indicating better productivity.
The grouped bar chart presents the distribution of departments within each company, showing how employees are distributed across different roles.
The salary distribution highlights the spread of employee salaries, indicating variability in compensation.
The scatter plot with a regression line shows a positive relationship between performance score and KPI score, suggesting that higher performance is associated with higher KPI values.
7.7 Conclusion
The dataset successfully simulates multiple companies with realistic employee data. The analysis identifies key insights such as company performance, department distribution, and relationships between variables. The visualizations help present the data clearly and support decision-making.