Advanced Practicum: Functions & Loops + Data Science

Functions & Loops + Data Science ~ Week 5


NURUL IFFAH

Study Program
Data Science

University
INSTITUT TEKNOLOGI SAINS BANDUNG

Lecturer
Bakti Siregar, M.Sc., CSD

1 Dynamic Multi-Formula Function

1.1 Introduction

In this task, a dynamic function is created to compute several types of mathematical formulas, including linear, quadratic, cubic, and exponential functions. The purpose of this task is to understand the use of functions, nested loops, and conditional logic in R programming, as well as to visualize the results using graphs.

1.2 Function Implementation

Formula      Expression      Description
Linear       2x + 3          Increases at a constant rate
Quadratic    x^2 + 2x + 1    Forms a curved parabola
Cubic        x^3             Grows faster than quadratic
Exponential  2^x             Grows very rapidly

Explanation: This function calculates values based on the selected formula type. The if-else structure is used to choose the formula, while stop() is used to validate the input and prevent errors.
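The function described above could look roughly like the following. This is a minimal sketch rather than the original code: the name compute_formula and its two arguments are inferred from the call compute_formula(x, f) in section 1.3, and the error messages are assumptions.

```r
# Sketch only: signature inferred from the call compute_formula(x, f);
# the exact validation and messages in the original are unknown.
compute_formula <- function(x, type) {
  # Validate the input before computing anything
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }

  # if-else chain selects the formula listed in the table
  if (type == "linear") {
    2 * x + 3
  } else if (type == "quadratic") {
    x^2 + 2 * x + 1
  } else if (type == "cubic") {
    x^3
  } else if (type == "exponential") {
    2^x
  } else {
    stop("Unknown formula type: ", type)
  }
}

compute_formula(3, "linear")     # 9
compute_formula(3, "quadratic")  # 16
```

Calling it with an unsupported type, for example compute_formula(3, "sine"), stops with an error instead of silently returning a wrong value.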

1.3 Calculation Using Nested Loops

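# Inputs: x from 1 to 20 and the four supported formula types
x_values <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")

# One numeric vector of y values per formula, keyed by formula name
results <- list()

for (f in formulas) {
  # Preallocate instead of growing the vector on every iteration
  y_values <- numeric(length(x_values))

  for (i in seq_along(x_values)) {
    y_values[i] <- compute_formula(x_values[i], f)
  }

  results[[f]] <- y_values
}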

Explanation: Nested loops evaluate every formula at every x value. The outer loop iterates over the formulas, while the inner loop computes y for x from 1 to 20. The results are stored in a list with one vector of values per formula.

1.4 Visualization

1.5 Interpretation

Based on the visualization, each function shows a different growth pattern. The linear function increases at a constant rate. The quadratic function grows faster and forms a curved shape. The cubic function increases even more sharply, especially for larger x values. Meanwhile, the exponential function grows the fastest, showing a very steep increase as x becomes larger.

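This shows that higher-degree polynomials grow faster, and that the exponential eventually outgrows all of them: 2^x is still below x^3 at x = 9 (512 vs. 729) but exceeds it from x = 10 onward (1024 vs. 1000).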

1.6 Conclusion

The function successfully computes different types of mathematical formulas using a dynamic approach. Nested loops help process multiple calculations efficiently. The visualization also makes it easier to understand the differences between each function.


2 Nested Simulation: Multi Sales & Discounts

2.1 Introduction

In this task, we simulate sales data for multiple salespersons over several days. The goal is to apply nested loops, conditional logic, and functions to generate realistic sales data, calculate cumulative sales, and apply discounts based on sales performance.

2.2 Function Implementation

Component               Description
Nested Function         Calculates cumulative sales per salesperson
Outer Loop              Loops through each salesperson
Inner Loop              Loops through each day
Discount Logic          Applies discount based on sales amount
Cumulative Calculation  Tracks total sales over time

Explanation: This function generates sales data using nested loops. The outer loop iterates through each salesperson, while the inner loop represents daily sales. Conditional logic is used to assign discount rates based on the sales amount. Cumulative sales are calculated by adding daily sales over time.
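A minimal sketch of such a simulation is shown below. The function name simulate_sales, its arguments, the sales range (100 to 1000), and the discount thresholds are all assumptions made for illustration; the original values are not given in this report.

```r
# Hypothetical sketch: names, sales range, and discount tiers are assumed.
simulate_sales <- function(n_salesperson, days, seed = 123) {
  set.seed(seed)

  # Nested helper: running total of one salesperson's daily sales
  running_total <- function(amounts) cumsum(amounts)

  per_person <- vector("list", n_salesperson)

  for (s in seq_len(n_salesperson)) {      # outer loop: salespersons
    amount <- numeric(days)
    discount <- numeric(days)

    for (d in seq_len(days)) {             # inner loop: days
      amount[d] <- round(runif(1, min = 100, max = 1000))

      # Assumed discount tiers based on the daily sales amount
      if (amount[d] >= 800) {
        discount[d] <- 0.20
      } else if (amount[d] >= 500) {
        discount[d] <- 0.10
      } else {
        discount[d] <- 0.05
      }
    }

    per_person[[s]] <- data.frame(
      sales_id = s,
      day = seq_len(days),
      sales_amount = amount,
      discount_rate = discount,
      cumulative_sales = running_total(amount)
    )
  }

  # Stack the per-salesperson data frames into one table
  do.call(rbind, per_person)
}

sales_data <- simulate_sales(n_salesperson = 5, days = 5)
head(sales_data)
```

Building one data frame per salesperson and binding them once at the end avoids growing a data frame row by row inside the inner loop.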

2.3 Data Generation

sales_id  sales_amount  cumulative_sales
1         490.5714      2535.429
2         632.1429      2409.714
3         433.2857      1249.286
4         555.4286      1885.143
5         727.7143      3107.143

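Explanation: This step generates data for 5 salespersons over 5 days. The generated data include the sales amount, discount rate, and cumulative sales; the table above shows one row per salesperson with the sales_amount and cumulative_sales columns.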

2.4 Visualization

2.5 Interpretation

The line chart shows the cumulative sales of each salesperson over time. Each line represents one salesperson, and the upward trend indicates that total sales increase day by day.

Some salespersons show a steeper increase, which means they achieve higher sales in a shorter period. Others have a more gradual growth, indicating lower or more consistent daily sales.

Overall, the differences in the lines reflect variations in performance among salespersons. The chart clearly shows how each individual progresses over time and allows easy comparison of their total sales.

2.6 Conclusion

The simulation successfully models real-world sales data using nested loops and conditional logic. It also demonstrates how cumulative performance can be tracked and visualized effectively. This approach helps in understanding sales trends and comparing performance across individuals.


3 Multi-Level Performance Categorization

3.1 Introduction

In this task, sales performance is categorized into multiple levels based on the sales amount. The purpose is to transform numerical data into meaningful categories and analyze the distribution using percentages and visualizations.

3.2 Function Implementation

Category   Sales_Range  Description
Excellent  >= 900       Very high sales performance
Very Good  700–899      Above average sales
Good       500–699      Moderate sales performance
Average    300–499      Below average performance
Poor       < 300        Low sales performance

Explanation: This function assigns each value to one of five levels. A loop processes the values one at a time, and an if-else chain checks the thresholds from highest to lowest, so the first condition that holds determines the category.
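The categorization can be sketched as follows. The thresholds come from the table above; the function name categorize_performance and the sample inputs are assumptions for illustration.

```r
# Thresholds follow the table; the function name is hypothetical.
categorize_performance <- function(values) {
  categories <- character(length(values))

  for (i in seq_along(values)) {
    v <- values[i]
    # Check from the highest threshold down; the first match wins
    if (v >= 900) {
      categories[i] <- "Excellent"
    } else if (v >= 700) {
      categories[i] <- "Very Good"
    } else if (v >= 500) {
      categories[i] <- "Good"
    } else if (v >= 300) {
      categories[i] <- "Average"
    } else {
      categories[i] <- "Poor"
    }
  }

  categories
}

amounts <- c(950, 720, 510, 300, 120)   # illustrative inputs only
cats <- categorize_performance(amounts)

# Percentage of values in each category
round(100 * prop.table(table(cats)), 1)
```

For larger vectors, base R's cut() would do the same binning without an explicit loop; the loop is kept here because the task is about loops and conditionals.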

3.3 Visualization

3.4 Interpretation

The bar chart presents the distribution of sales performance categories in descending order. The category with the highest frequency appears first, indicating that most sales fall within this performance level.

The following categories show lower frequencies, which means fewer sales are classified in those groups. This suggests that while some performance levels are dominant, others occur less frequently.

Overall, the visualization highlights the imbalance in sales distribution, where certain performance categories are more common than others. The sorted arrangement makes it easier to identify which categories contribute the most to overall sales performance.


4 Multi-Company Dataset Simulation

4.1 Introduction

In this task, a dataset is generated to simulate multiple companies and their employees. Each employee has attributes such as salary, department, performance score, and KPI score. The goal is to analyze company performance using summary statistics and visualize the results.

4.2 Function Implementation

Component              Description
Nested Loop            Loops through companies and employees
Company ID             Identifies each company
Employee Data          Generates salary and employee ID
Department Assignment  Assigns department randomly
Performance Metrics    Generates performance and KPI scores

Explanation:

The function generates a dataset using nested loops. The outer loop represents companies, while the inner loop represents employees within each company. Each employee is assigned a salary, department, and performance scores. This simulates a real-world company dataset.
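A rough sketch of such a generator follows. The function name generate_company_data, the value ranges, and the seed are assumptions; the employee ID pattern (for example E1_3) and the three department names are taken from the output shown later in this report, though the full department list may be longer.

```r
# Hypothetical sketch: value ranges and the department list are assumed.
generate_company_data <- function(n_company, n_employees, seed = 42) {
  set.seed(seed)
  departments <- c("HR", "Marketing", "Finance")

  rows <- vector("list", n_company * n_employees)
  k <- 1

  for (c_id in seq_len(n_company)) {        # outer loop: companies
    for (e in seq_len(n_employees)) {       # inner loop: employees
      rows[[k]] <- data.frame(
        company_id = c_id,
        employee_id = paste0("E", c_id, "_", e),
        salary = round(runif(1, 3000, 10000)),
        department = sample(departments, 1),
        performance_score = round(runif(1, 50, 100)),
        KPI_score = round(runif(1, 50, 100)),
        stringsAsFactors = FALSE
      )
      k <- k + 1
    }
  }

  # Combine the one-row data frames into a single dataset
  do.call(rbind, rows)
}

company_data <- generate_company_data(n_company = 5, n_employees = 20)

# Per-company averages of the numeric attributes
aggregate(cbind(salary, performance_score, KPI_score) ~ company_id,
          data = company_data, FUN = mean)
```

The aggregate() call produces one row per company with mean salary, performance score, and KPI score, which is the shape of a per-company summary table.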

4.3 Data Generation

company_id  salary  performance_score  KPI_score
1           5952.6  80.9               73.9
2           5879.6  80.7               75.4
3           6787.8  82.3               79.3
4           5576.9  81.5               72.4
5           6503.7  68.3               70.7

4.4 Visualization

4.5 Interpretation

The bar chart shows the average KPI scores for each company in descending order. The company with the highest KPI score appears first, indicating better overall performance. Other companies show lower KPI values, reflecting differences in employee performance and productivity.

This visualization makes it easy to compare company performance and identify which company performs the best. The variation in KPI scores suggests that performance is not evenly distributed across companies.

4.6 Conclusion

The dataset successfully simulates multiple companies and employees using nested loops. The summary analysis and visualization provide insights into company performance. The bar chart helps identify which company has the highest average KPI score.


5 Monte Carlo Simulation: Pi & Probability

5.1 Introduction

In this task, a Monte Carlo simulation is used to estimate the value of π (pi) and analyze probability. Random points are generated within a square, and points that fall inside a circle are counted. Additionally, the probability of points falling within a smaller sub-square is calculated.

5.2 Function Implementation

Component         Description
Loop              Generates points iteratively
Random Points     Random (x,y) between -1 and 1
Circle Check      Checks if point is inside circle
Sub-square Check  Checks if point is inside smaller square
Pi Calculation    Estimates pi using Monte Carlo method

Explanation:

The function generates random points using a loop. Each point is checked to determine whether it lies inside the unit circle, that is, whether x^2 + y^2 <= 1. The number of points inside the circle is used to estimate π. Additionally, the function calculates the probability of points falling within a smaller square region.
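The procedure can be sketched as below. The function name monte_carlo_pi, the seed, and the sub-square (assumed here to be centered at the origin with half-width 0.5) are illustrative assumptions; the report does not state the size or position of the smaller square.

```r
# Hypothetical sketch: sub-square size and position are assumed.
monte_carlo_pi <- function(n_points, sub_half_width = 0.5, seed = 1) {
  set.seed(seed)
  inside_circle <- 0
  inside_sub <- 0

  for (i in seq_len(n_points)) {
    # Random point in the square [-1, 1] x [-1, 1]
    x <- runif(1, -1, 1)
    y <- runif(1, -1, 1)

    # Circle check: inside the unit circle when x^2 + y^2 <= 1
    if (x^2 + y^2 <= 1) {
      inside_circle <- inside_circle + 1
    }

    # Sub-square check: both coordinates within the assumed half-width
    if (abs(x) <= sub_half_width && abs(y) <= sub_half_width) {
      inside_sub <- inside_sub + 1
    }
  }

  list(
    # Area ratio circle/square is pi/4, so pi is about 4 * proportion
    pi_estimate = 4 * inside_circle / n_points,
    sub_square_prob = inside_sub / n_points
  )
}

result <- monte_carlo_pi(10000)
result$pi_estimate
result$sub_square_prob
```

With the assumed half-width of 0.5, the sub-square has area 1 out of the enclosing square's area 4, so the estimated probability should settle near 0.25.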

5.3 Visualization

5.4 Interpretation

The scatter plot shows randomly generated points within a square. Points inside the circle are displayed in one color, while points outside are shown in another. The circular boundary represents the unit circle.

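The proportion of points inside the circle is used to estimate the value of π: the circle has area π and the enclosing square has area 4, so π is approximately 4 times that proportion. As the number of points N increases, the estimate becomes more accurate on average, with the typical error shrinking roughly in proportion to 1/sqrt(N).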

The probability of points falling within the smaller square represents the likelihood of a point being located in that region. This demonstrates how Monte Carlo simulation can be used to approximate both mathematical constants and probabilities.

5.5 Conclusion

The Monte Carlo simulation successfully estimates the value of π and calculates probability using random sampling. The visualization clearly distinguishes points inside and outside the circle. This approach demonstrates how randomness can be used to solve mathematical problems.


6 Advanced Data Transformation & Feature Engineering

6.1 Introduction

In this task, data transformation techniques are applied to improve data quality and prepare it for analysis. The process includes normalization, z-score standardization, and feature engineering. New features such as performance category and salary bracket are also created. The results are compared before and after transformation using visualizations.

6.2 Function Implementation

Normalize Function, Z-Score Function

Function           Description                                                Method
normalize_columns  Scales numeric values to range 0–1                         (x - min) / (max - min)
z_score            Standardizes values based on mean and standard deviation  (x - mean) / sd

Explanation:

The normalization function rescales numeric values between 0 and 1, making them comparable across variables. The z-score function standardizes values based on their mean and standard deviation. Both functions use loops to process each numeric column.
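Both functions can be sketched from the formulas in the table. The names normalize_columns and z_score are from the table; taking a data frame and returning a transformed copy is an assumption about their interface, and the toy data frame is purely illustrative.

```r
# Sketch: the data-frame-in, data-frame-out interface is assumed.
normalize_columns <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      rng <- range(df[[col]], na.rm = TRUE)
      # (x - min) / (max - min)
      df[[col]] <- (df[[col]] - rng[1]) / (rng[2] - rng[1])
    }
  }
  df
}

z_score <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      # (x - mean) / sd
      df[[col]] <- (df[[col]] - mean(df[[col]], na.rm = TRUE)) /
        sd(df[[col]], na.rm = TRUE)
    }
  }
  df
}

# Illustrative data only
toy <- data.frame(
  salary = c(4000, 6000, 8000),
  department = c("HR", "Finance", "HR")
)

normalize_columns(toy)$salary   # 0.0 0.5 1.0
z_score(toy)$salary             # -1 0 1
```

Non-numeric columns such as department pass through unchanged. A constant numeric column would divide by zero in both functions, so real data may need a guard for that case.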

6.2.1 Feature Engineering

# Performance category
company_data$performance_category <- ifelse(company_data$performance_score >= 85, "High",
                                     ifelse(company_data$performance_score >= 70, "Medium", "Low"))

# Salary bracket
company_data$salary_bracket <- ifelse(company_data$salary >= 8000, "High",
                               ifelse(company_data$salary >= 5000, "Medium", "Low"))

6.3 Visualization

6.3.1 Histogram (Before vs After Normalization)

6.3.2 Boxplot (Before vs After Z-Score)

6.4 Interpretation

The histogram shows the distribution of salary before and after normalization. After normalization, the values are scaled between 0 and 1, making them easier to compare across variables.

The boxplot shows the distribution of performance scores before and after applying the z-score transformation. After transformation, the data is centered around zero, which helps in identifying deviations from the mean.

The new features, such as performance category and salary bracket, simplify the interpretation of numerical data by converting them into meaningful groups.

6.5 Conclusion

The transformation process successfully improves the data structure through normalization and standardization. Feature engineering adds new meaningful variables that enhance data analysis. The visualizations clearly show the differences before and after transformation.


7 Mini Project: Company KPI Dashboard & Simulation

7.1 Introduction

In this mini project, a dataset is generated to simulate multiple companies and their employees. Each company contains a number of employees with attributes such as salary, performance score, KPI score, and department. The goal is to analyze company performance, identify top performers, and visualize key insights using various plots.

7.2 Function Implementation

Component            Description
Company Loop         Loops through multiple companies
Employee Loop        Generates employees per company
Employee Data        Creates employee ID and salary
Department           Assigns department randomly
Performance Metrics  Generates performance and KPI scores

Explanation:

The function generates a dataset using nested loops. Each company has a random number of employees between 50 and 200. Each employee is assigned attributes such as salary, performance score, KPI score, and department. This simulates a real-world company dataset.
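The random headcount per company can be drawn with sample(). The sketch below is a vectorized alternative to the nested loops described above, not the original implementation; the seed, the number of companies, and the variable names are assumptions, while the ID pattern follows the E1_3 style seen in the output.

```r
# Vectorized alternative (assumed names and seed), not the original loop code.
set.seed(2024)
n_company <- 5

# One random headcount between 50 and 200 for each company
employees_per_company <- sample(50:200, n_company, replace = TRUE)

# Repeat each company ID once per employee in that company
company_id <- rep(seq_len(n_company), times = employees_per_company)

# sequence() restarts the counter at 1 for every company
employee_id <- paste0("E", company_id, "_", sequence(employees_per_company))

head(employee_id)
table(company_id)
```

table(company_id) reports how many employees each company received, which is a quick check that every count falls in the 50 to 200 range.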

7.3 Data Generation

company_id  salary    KPI_score
1           6420.546  75.15464
2           6796.938  73.93750
3           6712.113  75.41935
4           6604.392  73.37975
5           6090.982  74.61404

7.4 Top Performers (KPI > 90)

    employee_id  company_id  salary  performance_score  KPI_score  department
3   E1_3         1           7001    82                 99         HR
6   E1_6         1           4846    95                 91         Marketing
7   E1_7         1           6372    72                 97         Marketing
11  E1_11        1           6528    77                 99         Marketing
23  E1_23        1           9969    64                 94         Finance
27  E1_27        1           6282    68                 98         Finance
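A filter of this kind can be written with logical indexing. The sketch below builds a small stand-in dataset, since the generated data is not reproduced here; the column names match the table above, while the values, ranges, and seed are assumptions.

```r
# Stand-in data for illustration; values are random, not the report's data.
set.seed(7)
n <- 30
company_data <- data.frame(
  employee_id = paste0("E1_", seq_len(n)),
  company_id = 1,
  salary = round(runif(n, 3000, 10000)),
  performance_score = round(runif(n, 50, 100)),
  KPI_score = round(runif(n, 50, 100)),
  department = sample(c("HR", "Marketing", "Finance"), n, replace = TRUE)
)

# Keep only employees whose KPI score is strictly above 90
top_performers <- company_data[company_data$KPI_score > 90, ]

head(top_performers)
```

Row names are preserved by the filter, which is why a table of top performers can show non-consecutive row numbers in its first column.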

7.5 Visualization

7.5.1 Bar Chart: Average KPI per Company

7.5.2 Grouped Bar Chart (Department Analysis)

7.5.3 Salary Distribution

7.5.4 Relationship Between Performance Score and KPI Score

7.6 Interpretation

The bar chart shows the average KPI score for each company, allowing comparison of overall performance. Some companies achieve higher KPI values, indicating better productivity.

The grouped bar chart presents the distribution of departments within each company, showing how employees are distributed across different roles.

The salary distribution highlights the spread of employee salaries, indicating variability in compensation.

The scatter plot with a regression line shows a positive relationship between performance score and KPI score, suggesting that higher performance is associated with higher KPI values.

7.7 Conclusion

The dataset successfully simulates multiple companies with realistic employee data. The analysis identifies key insights such as company performance, department distribution, and relationships between variables. The visualizations help present the data clearly and support decision-making.