Advanced Practicum: Functions & Loops + Data Science

Functions & Loops + Data Science ~ Week 5

NURUL IFFAH

Study Program
Data Science

University
INSTITUT TEKNOLOGI SAINS BANDUNG

Lecturer
Bakti Siregar, M.Sc., CSD

1 Dynamic Multi Formula Function

1.1 Introduction

In this task, a dynamic function is created to compute several types of mathematical formulas, including linear, quadratic, cubic, and exponential functions. The purpose of this task is to understand the use of functions, nested loops, and conditional logic in R programming, as well as to visualize the results using graphs.

1.2 Function Implementation

Formula	Expression	Description
Linear	2x + 3	Increases at a constant rate
Quadratic	x^2 + 2x + 1	Forms a curved parabola
Cubic	x^3	Grows faster than quadratic
Exponential	2^x	Grows very rapidly

Explanation: This function calculates values based on the selected formula type. An if-else structure selects the formula, while stop() validates the input by raising an error when an unsupported formula type is supplied.
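A minimal sketch of such a function is given here, using the four expressions from the table. The name compute_formula and its two-argument form are taken from the call in Section 1.3; the argument name formula_type and the error messages are assumptions.

```r
# Sketch of a dynamic formula function. The argument name `formula_type`
# and the error messages are assumptions, not taken from the report.
compute_formula <- function(x, formula_type) {
  # Validate the input before doing any arithmetic
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }

  # Choose the formula with an if-else chain
  if (formula_type == "linear") {
    2 * x + 3
  } else if (formula_type == "quadratic") {
    x^2 + 2 * x + 1
  } else if (formula_type == "cubic") {
    x^3
  } else if (formula_type == "exponential") {
    2^x
  } else {
    stop("Unknown formula type: ", formula_type)
  }
}

compute_formula(3, "linear")       # 9
compute_formula(3, "quadratic")    # 16
compute_formula(3, "cubic")        # 27
compute_formula(3, "exponential")  # 8
```

Calling it with an unsupported type, for example compute_formula(3, "sine"), stops with an error instead of silently returning a wrong value.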

1.3 Calculation Using Nested Loops

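# compute_formula() is the function described in Section 1.2
x_values <- 1:20
formulas <- c("linear", "quadratic", "cubic", "exponential")

results <- list()

# Outer loop: one pass per formula type
for (f in formulas) {
  y_values <- numeric(length(x_values))  # preallocate instead of growing

  # Inner loop: evaluate the current formula at each x
  for (i in seq_along(x_values)) {
    y_values[i] <- compute_formula(x_values[i], f)
  }

  results[[f]] <- y_values  # store the series under the formula name
}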

Explanation: Nested loops compute every formula for every x value. The outer loop goes through each formula, while the inner loop computes values for x from 1 to 20. The results are stored in a list, with one numeric vector per formula name.

1.4 Visualization

1.5 Interpretation

Based on the visualization, each function shows a different growth pattern. The linear function increases at a constant rate. The quadratic function grows faster and forms a curved shape. The cubic function increases even more sharply, especially for larger x values. The exponential function grows the fastest overall: 2^x overtakes x^3 at x = 10 and reaches 1,048,576 at x = 20, against 8,000 for the cubic.

This shows that higher-degree polynomials grow faster than lower-degree ones, and that the exponential function eventually outgrows all of them.

1.6 Conclusion

The function successfully computes different types of mathematical formulas using a dynamic approach. Nested loops process every formula and every x value in a single pass. The visualization also makes it easier to understand the differences between each function.

2 Nested Simulation: Multi Sales & Discounts

2.1 Introduction

In this task, we simulate sales data for multiple salespersons over several days. The goal is to apply nested loops, conditional logic, and functions to generate realistic sales data, calculate cumulative sales, and apply discounts based on sales performance.

2.2 Function Implementation

Component	Description
Nested Function	Calculates cumulative sales per salesperson
Outer Loop	Loops through each salesperson
Inner Loop	Loops through each day
Discount Logic	Applies discount based on sales amount
Cumulative Calculation	Tracks total sales over time

Explanation: This function generates sales data using nested loops. The outer loop iterates through each salesperson, while the inner loop represents daily sales. Conditional logic is used to assign discount rates based on the sales amount. Cumulative sales are calculated by adding daily sales over time.
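The structure described above can be sketched as follows. The function names simulate_sales and get_discount, the range of daily sales, and the discount thresholds are all illustrative assumptions rather than the values used in the report.

```r
# Hypothetical helper: maps a daily sales amount to a discount rate.
# The thresholds are illustrative assumptions.
get_discount <- function(amount) {
  if (amount >= 800) {
    0.20
  } else if (amount >= 500) {
    0.10
  } else {
    0.05
  }
}

simulate_sales <- function(n_salesperson, n_days, seed = 123) {
  set.seed(seed)  # make the random draws reproducible
  n_rows <- n_salesperson * n_days
  sales_id <- integer(n_rows)
  day <- integer(n_rows)
  sales_amount <- numeric(n_rows)
  discount_rate <- numeric(n_rows)
  cumulative_sales <- numeric(n_rows)

  # Nested function: adds one day's sales to the running total
  add_to_total <- function(total, amount) total + amount

  row <- 1
  for (s in seq_len(n_salesperson)) {   # outer loop: salespersons
    running_total <- 0                  # reset for each salesperson
    for (d in seq_len(n_days)) {        # inner loop: days
      amount <- round(runif(1, min = 100, max = 1000))  # assumed range
      running_total <- add_to_total(running_total, amount)
      sales_id[row] <- s
      day[row] <- d
      sales_amount[row] <- amount
      discount_rate[row] <- get_discount(amount)
      cumulative_sales[row] <- running_total
      row <- row + 1
    }
  }
  data.frame(sales_id, day, sales_amount, discount_rate, cumulative_sales)
}

sales_data <- simulate_sales(n_salesperson = 5, n_days = 5)
head(sales_data)
```

Resetting running_total inside the outer loop is what keeps each salesperson's cumulative series separate from the others.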

2.3 Data Generation

sales_id	sales_amount	cumulative_sales
1	490.5714	2535.429
2	632.1429	2409.714
3	433.2857	1249.286
4	555.4286	1885.143
5	727.7143	3107.143

Explanation: This step generates data for 5 salespersons over 5 days. The generated data include the sales amount, discount rate, and cumulative sales; the table above shows the sales amount and cumulative sales for each salesperson.

2.4 Visualization

2.5 Interpretation

The line chart shows the cumulative sales of each salesperson over time. Each line represents one salesperson, and the upward trend indicates that total sales increase day by day.

Some salespersons show a steeper increase, which means they achieve higher sales in a shorter period. Others show more gradual growth, indicating lower daily sales.

Overall, the differences in the lines reflect variations in performance among salespersons. The chart clearly shows how each individual progresses over time and allows easy comparison of their total sales.

2.6 Conclusion

The simulation successfully models real-world sales data using nested loops and conditional logic. It also demonstrates how cumulative performance can be tracked and visualized effectively. This approach helps in understanding sales trends and comparing performance across individuals.

3 Multi Level Performance Categorization

3.1 Introduction

In this task, sales performance is categorized into multiple levels based on the sales amount. The purpose is to transform numerical data into meaningful categories and analyze the distribution using percentages and visualizations.

3.2 Function Implementation

Category	Sales_Range	Description
Excellent	>= 900	Very high sales performance
Very Good	700 to < 900	Above average sales
Good	500 to < 700	Moderate sales performance
Average	300 to < 500	Below average performance
Poor	< 300	Low sales performance

Explanation: This function categorizes each sales amount into five levels. A loop is used to process each value, and conditional statements determine the appropriate category.
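A minimal sketch of such a categorization function is shown here, using the thresholds from the table above. The function name categorize_sales and the sample amounts are assumptions made for illustration.

```r
# Sketch of a five-level categorizer; the function name is an assumption.
categorize_sales <- function(amounts) {
  categories <- character(length(amounts))
  for (i in seq_along(amounts)) {   # process each value in turn
    a <- amounts[i]
    if (a >= 900) {
      categories[i] <- "Excellent"
    } else if (a >= 700) {
      categories[i] <- "Very Good"
    } else if (a >= 500) {
      categories[i] <- "Good"
    } else if (a >= 300) {
      categories[i] <- "Average"
    } else {
      categories[i] <- "Poor"
    }
  }
  categories
}

# Illustrative amounts, one per category
amounts <- c(950, 720, 510, 305, 120)
sales_category <- categorize_sales(amounts)

# Percentage of values in each category
round(100 * prop.table(table(sales_category)), 1)
```

Because each branch is only reached when the previous test failed, a non-integer amount such as 899.5 falls into "Very Good" rather than into a gap between ranges.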

3.3 Visualization

3.4 Interpretation

The bar chart presents the distribution of sales performance categories in descending order of frequency. The category with the highest frequency appears first, indicating that more sales fall within this performance level than in any other.

The following categories show lower frequencies, which means fewer sales are classified in those groups. This suggests that while some performance levels are dominant, others occur less frequently.

Overall, the visualization highlights the imbalance in sales distribution, where certain performance categories are more common than others. The sorted arrangement makes it easier to identify which categories occur most often.

4 Multi-Company Dataset Simulation

4.1 Introduction

In this task, a dataset is generated to simulate multiple companies and their employees. Each employee has attributes such as salary, department, performance score, and KPI score. The goal is to analyze company performance using summary statistics and visualize the results.

4.2 Function Implementation

Component	Description
Nested Loop	Loops through companies and employees
Company ID	Identifies each company
Employee Data	Generates salary and employee ID
Department Assignment	Assigns department randomly
Performance Metrics	Generates performance and KPI scores

Explanation:

The function generates a dataset using nested loops. The outer loop represents companies, while the inner loop represents employees within each company. Each employee is assigned a salary, department, and performance scores. This simulates a real-world company dataset.
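The generator described above might look like the following sketch. The function name generate_company_data, the value ranges, and the department list are assumptions; only the column names and the employee ID pattern (such as E1_3) follow the tables shown in this report.

```r
# Sketch of a nested-loop dataset generator. Value ranges and the
# department list are illustrative assumptions.
generate_company_data <- function(n_company, n_employees, seed = 42) {
  set.seed(seed)
  departments <- c("HR", "Finance", "Marketing", "IT")
  rows <- vector("list", n_company * n_employees)
  k <- 1
  for (c_id in seq_len(n_company)) {        # outer loop: companies
    for (e_id in seq_len(n_employees)) {    # inner loop: employees
      rows[[k]] <- data.frame(
        company_id = c_id,
        employee_id = paste0("E", c_id, "_", e_id),
        salary = round(runif(1, 3000, 10000)),
        department = sample(departments, 1),
        performance_score = round(runif(1, 50, 100)),
        KPI_score = round(runif(1, 50, 100))
      )
      k <- k + 1
    }
  }
  do.call(rbind, rows)  # stack the one-row data frames
}

company_data <- generate_company_data(n_company = 5, n_employees = 20)
head(company_data)
```

Collecting the rows in a preallocated list and binding them once at the end avoids repeatedly copying a growing data frame inside the loops.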

4.3 Data Generation

company_id	salary	performance_score	KPI_score
1	5952.6	80.9	73.9
2	5879.6	80.7	75.4
3	6787.8	82.3	79.3
4	5576.9	81.5	72.4
5	6503.7	68.3	70.7
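A per-company summary of this kind can be produced with aggregate(), as in the sketch below. It assumes the table reports the mean of each column per company; the four-row data frame is a toy example, not the simulated data.

```r
# Toy data standing in for the simulated employees (illustrative values)
toy_company_data <- data.frame(
  company_id = c(1, 1, 2, 2),
  salary = c(5000, 6000, 7000, 8000),
  performance_score = c(80, 90, 70, 60),
  KPI_score = c(70, 80, 90, 70)
)

# Mean of each numeric column within each company
company_summary <- aggregate(
  cbind(salary, performance_score, KPI_score) ~ company_id,
  data = toy_company_data,
  FUN = mean
)

# Rank companies from highest to lowest average KPI
company_summary[order(-company_summary$KPI_score), ]
```

Here company 1 averages a KPI of 75 and company 2 averages 80, so company 2 is listed first after sorting.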

4.4 Visualization

4.5 Interpretation

The bar chart shows the average KPI scores for each company in descending order. The company with the highest KPI score appears first, indicating better overall performance. According to the summary table in Section 4.3, company 3 ranks first (79.3) and company 5 last (70.7). The remaining companies fall in between, reflecting differences in employee performance and productivity.

This visualization makes it easy to compare company performance and identify which company performs the best. The variation in KPI scores suggests that performance is not evenly distributed across companies.

4.6 Conclusion

The dataset successfully simulates multiple companies and employees using nested loops. The summary analysis and visualization provide insights into company performance. The bar chart helps identify which company has the highest average KPI score.

5 Monte Carlo Simulation: Pi & Probability

5.1 Introduction

In this task, a Monte Carlo simulation is used to estimate the value of π (pi) and analyze probability. Random points are generated within a square, and points that fall inside a circle are counted. Additionally, the probability of points falling within a smaller sub-square is calculated.

5.2 Function Implementation

Component	Description
Loop	Generates points iteratively
Random Points	Random (x,y) between -1 and 1
Circle Check	Checks if point is inside circle
Sub-square Check	Checks if point is inside smaller square
Pi Calculation	Estimates pi using Monte Carlo method

Explanation:

The function generates random points using a loop. Each point is checked to see whether it lies inside the unit circle, that is, whether x^2 + y^2 <= 1. Because the circle has area π and the enclosing square has area 4, the fraction of points inside the circle approximates π/4, so π is estimated as 4 times that fraction. Additionally, the function calculates the probability of points falling within a smaller square region.
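The procedure can be sketched as below. The function name monte_carlo_pi is an assumption, and so is the sub-square: it is taken here to be centered at the origin with half-width 0.5, since the report does not state its position or size.

```r
# Sketch of a loop-based Monte Carlo estimator. The sub-square is ASSUMED
# to be [-0.5, 0.5] x [-0.5, 0.5].
monte_carlo_pi <- function(n_points, sub_half_width = 0.5, seed = 1) {
  set.seed(seed)
  inside_circle <- 0
  inside_sub <- 0
  for (i in seq_len(n_points)) {
    # Draw one point uniformly in the square [-1, 1] x [-1, 1]
    x <- runif(1, -1, 1)
    y <- runif(1, -1, 1)
    # Circle check: distance from the origin at most 1
    if (x^2 + y^2 <= 1) {
      inside_circle <- inside_circle + 1
    }
    # Sub-square check: both coordinates within the half-width
    if (abs(x) <= sub_half_width && abs(y) <= sub_half_width) {
      inside_sub <- inside_sub + 1
    }
  }
  list(
    pi_estimate = 4 * inside_circle / n_points,
    prob_sub_square = inside_sub / n_points
  )
}

result <- monte_carlo_pi(10000)
result$pi_estimate      # close to 3.14
result$prob_sub_square  # close to 0.25
```

Under this assumption the sub-square has area 1 out of the square's area 4, so the exact probability is 0.25, which gives a simple check on the simulated value.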

5.3 Visualization

5.4 Interpretation

The scatter plot shows randomly generated points within a square. Points inside the circle are displayed in one color, while points outside are shown in another. The circular boundary represents the unit circle.

The proportion of points inside the circle, multiplied by 4, gives the estimate of π. As the number of points n increases, the estimate tends to become more accurate, with the typical error shrinking in proportion to 1/sqrt(n).

The probability of a point falling within the smaller square is estimated in the same way, as the fraction of points that land in that region; its exact value is the area of the smaller square divided by 4, the area of the full square. This demonstrates how Monte Carlo simulation can be used to approximate both mathematical constants and probabilities.

5.5 Conclusion

The Monte Carlo simulation successfully estimates the value of π and calculates probability using random sampling. The visualization clearly distinguishes points inside and outside the circle. This approach demonstrates how randomness can be used to solve mathematical problems.

6 Advanced Data Transformation & Feature Engineering

6.1 Introduction

In this task, data transformation techniques are applied to improve data quality and prepare it for analysis. The process includes normalization, z-score standardization, and feature engineering. New features such as performance category and salary bracket are also created. The results are compared before and after transformation using visualizations.

6.2 Function Implementation

Normalize Function, Z-Score Function

Function	Description	Method
normalize_columns	Scales numeric values to range 0–1	(x - min) / (max - min)
z_score	Standardizes values based on mean and standard deviation	(x - mean) / sd

Explanation:

The normalization function rescales numeric values between 0 and 1, making them comparable across variables. The z-score function standardizes values based on their mean and standard deviation. Both functions use loops to process each numeric column.
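Both functions can be sketched as follows, using the formulas from the table above. The function names come from that table; the data-frame-in, data-frame-out interface and the toy data are assumptions.

```r
# Min-max normalization of every numeric column: (x - min) / (max - min)
normalize_columns <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      rng <- range(df[[col]], na.rm = TRUE)
      df[[col]] <- (df[[col]] - rng[1]) / (rng[2] - rng[1])
    }
  }
  df
}

# Z-score standardization of every numeric column: (x - mean) / sd
z_score <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      m <- mean(df[[col]], na.rm = TRUE)
      s <- sd(df[[col]], na.rm = TRUE)
      df[[col]] <- (df[[col]] - m) / s
    }
  }
  df
}

# Toy data (illustrative values); the text column is left untouched
toy <- data.frame(
  salary = c(4000, 6000, 10000),
  performance_score = c(60, 75, 90),
  department = c("HR", "Finance", "Marketing")
)

normalize_columns(toy)$salary        # 0, 0.333..., 1
z_score(toy)$performance_score       # -1, 0, 1
```

A column whose values are all identical has max - min = 0 and sd = 0, so both functions would return NaN for it; a fuller version would need to handle that case explicitly.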

6.2.1 Feature Engineering

# Performance category
company_data$performance_category <- ifelse(company_data$performance_score >= 85, "High",
                                     ifelse(company_data$performance_score >= 70, "Medium", "Low"))

# Salary bracket
company_data$salary_bracket <- ifelse(company_data$salary >= 8000, "High",
                               ifelse(company_data$salary >= 5000, "Medium", "Low"))

6.3 Visualization

6.3.1 Histogram (Before vs After Normalization)

6.3.2 Boxplot (Before vs After Z-Score)

6.4 Interpretation

The histogram shows the distribution of salary before and after normalization. After normalization, the values are scaled between 0 and 1, making them easier to compare across variables. The shape of the distribution is unchanged, because min-max normalization is a linear rescaling.

The boxplot shows the distribution of performance scores before and after applying the z-score transformation. After transformation, the data are centered at zero with a standard deviation of one, so each value reads directly as the number of standard deviations from the mean.

The new features, such as performance category and salary bracket, simplify the interpretation of numerical data by converting them into meaningful groups.

6.5 Conclusion

The transformation process puts the numeric variables on comparable scales through normalization and standardization. Feature engineering adds new meaningful variables that enhance data analysis. The visualizations clearly show the differences before and after transformation.

7 Mini Project: Company KPI Dashboard & Simulation

7.1 Introduction

In this mini project, a dataset is generated to simulate multiple companies and their employees. Each company contains a number of employees with attributes such as salary, performance score, KPI score, and department. The goal is to analyze company performance, identify top performers, and visualize key insights using various plots.

7.2 Function Implementation

Component	Description
Company Loop	Loops through multiple companies
Employee Loop	Generates employees per company
Employee Data	Creates employee ID and salary
Department	Assigns department randomly
Performance Metrics	Generates performance and KPI scores

Explanation:

The function generates a dataset using nested loops. Each company has a random number of employees between 50 and 200. Each employee is assigned attributes such as salary, performance score, KPI score, and department. This simulates a real-world company dataset.
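A sketch of this generator, with a random head count per company, is given below. The function name generate_kpi_data, the value ranges, and the department list are assumptions; the 50 to 200 employee range, the column names, and the employee ID pattern follow the report.

```r
# Sketch of the mini-project generator. Value ranges and departments are
# illustrative assumptions.
generate_kpi_data <- function(n_company, min_emp = 50, max_emp = 200, seed = 7) {
  set.seed(seed)
  departments <- c("HR", "Finance", "Marketing")
  company_frames <- vector("list", n_company)

  for (c_id in seq_len(n_company)) {                       # company loop
    # Random number of employees between min_emp and max_emp (inclusive)
    n_emp <- min_emp + sample.int(max_emp - min_emp + 1, 1) - 1

    employee_id <- character(n_emp)
    salary <- numeric(n_emp)
    performance_score <- numeric(n_emp)
    KPI_score <- numeric(n_emp)
    department <- character(n_emp)

    for (e in seq_len(n_emp)) {                            # employee loop
      employee_id[e] <- paste0("E", c_id, "_", e)
      salary[e] <- round(runif(1, 3000, 10000))
      performance_score[e] <- round(runif(1, 50, 100))
      KPI_score[e] <- round(runif(1, 50, 100))
      department[e] <- sample(departments, 1)
    }

    company_frames[[c_id]] <- data.frame(
      employee_id, company_id = c_id, salary,
      performance_score, KPI_score, department
    )
  }
  do.call(rbind, company_frames)  # one data frame for all companies
}

kpi_data <- generate_kpi_data(n_company = 5)
table(kpi_data$company_id)  # employees per company
```

The head count is drawn with sample.int() rather than sample(min_emp:max_emp, 1), because sample() treats a single number as an upper bound when min_emp equals max_emp.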

7.3 Generate Data

company_id	salary	KPI_score
1	6420.546	75.15464
2	6796.938	73.93750
3	6712.113	75.41935
4	6604.392	73.37975
5	6090.982	74.61404

7.4 Top Performers (KPI > 90)

	employee_id	company_id	salary	performance_score	KPI_score	department
3	E1_3	1	7001	82	99	HR
6	E1_6	1	4846	95	91	Marketing
7	E1_7	1	6372	72	97	Marketing
11	E1_11	1	6528	77	99	Marketing
23	E1_23	1	9969	64	94	Finance
27	E1_27	1	6282	68	98	Finance
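A table like this can be obtained with a logical subset, as in the sketch below. The three-row data frame is a toy example with illustrative values, and the variable names are assumptions.

```r
# Toy employee data (illustrative values)
employees <- data.frame(
  employee_id = c("E1_1", "E1_2", "E2_1"),
  company_id = c(1, 1, 2),
  KPI_score = c(99, 85, 91)
)

# Keep only the rows whose KPI score is strictly above 90
top_performers <- employees[employees$KPI_score > 90, ]

head(top_performers)
```

Subsetting keeps the original row names, which is why the first, unnamed column of the table shows non-consecutive numbers such as 3, 6, and 7.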

7.5 Visualization

7.5.1 Bar Chart: Average KPI per Company

7.5.2 Grouped Bar Chart (Department Analysis)

7.5.3 Salary Distribution

7.5.4 Relationship Between Performance Score and KPI Score

7.6 Interpretation

The bar chart shows the average KPI score for each company, allowing comparison of overall performance. Some companies achieve higher KPI values, indicating better productivity.

The grouped bar chart presents the distribution of departments within each company, showing how employees are spread across departments.

The salary distribution highlights the spread of employee salaries, indicating variability in compensation.

The scatter plot with a regression line shows a positive relationship between performance score and KPI score, suggesting that higher performance is associated with higher KPI values.

7.7 Conclusion

The dataset successfully simulates multiple companies with realistic employee data. The analysis identifies key insights such as company performance, department distribution, and relationships between variables. The visualizations help present the data clearly and support decision-making.