Functions & Loops + Data Science

Assignment ~ Week 5

Adinda Maiza Ishfahani

Data Science Undergraduate at ITSB

NIM : 52250074

1 Dynamic Multi-Formula Function

In this task, a dynamic function was developed that is capable of handling various types of mathematical equations, specifically linear, quadratic, cubic, and exponential equations.

library(ggplot2)
library(tidyr)
library(plotly)

compute_formula <- function(x, formulas){
  results <- data.frame(x=x)
  
  for(f in formulas){
    if(f=="linear"){
      results[[f]] <- 2*x + 3
    } else if(f=="quadratic"){
      results[[f]] <- x^2 + 2*x + 1
    } else if(f=="cubic"){
      results[[f]] <- x^3
    } else if(f=="exponential"){
      results[[f]] <- exp(x)
    } else {
      stop("Invalid formula")
    }
  }
  return(results)
}

x <- 1:20
df <- compute_formula(x, c("linear","quadratic","cubic"))

df_long <- tidyr::pivot_longer(df, -x)

p <- ggplot(df_long, aes(x, value, color=name)) +
  geom_line() +
  ggtitle("Multi Formula Plot")

# Convert to an interactive plot
ggplotly(p)

1.1 Interpretation

The resulting visualization illustrates how growth differs among the plotted functions. The linear function grows at a constant rate, while the quadratic and cubic functions grow at an accelerating rate. The exponential formula is supported by compute_formula but is not included in this plot. These growth patterns are relevant in applications such as modeling economic growth, population dynamics, and trend analysis in data.
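The growth behavior described above can also be checked numerically. The following is a minimal sketch, not the method used in this report: it assumes a lookup-table variant of the function (the names `formula_table` and `compute_formula_tbl` are hypothetical) and compares the step-to-step increase of each curve.

```r
# Hypothetical lookup table: formula name -> function of x.
# The formulas themselves are the ones used in compute_formula.
formula_table <- list(
  linear      = function(x) 2 * x + 3,
  quadratic   = function(x) x^2 + 2 * x + 1,
  cubic       = function(x) x^3,
  exponential = function(x) exp(x)
)

compute_formula_tbl <- function(x, formulas) {
  # Reject any name that is not in the table
  unknown <- setdiff(formulas, names(formula_table))
  if (length(unknown) > 0) {
    stop("Invalid formula: ", paste(unknown, collapse = ", "))
  }
  results <- data.frame(x = x)
  for (f in formulas) {
    results[[f]] <- formula_table[[f]](x)
  }
  results
}

growth <- compute_formula_tbl(1:20, names(formula_table))

# Increase from one x to the next, per formula (one column per formula)
increments <- sapply(growth[-1], diff)
head(increments, 3)
```

In this sketch the `linear` column of `increments` stays constant, while the other columns keep growing. Adding a new formula only requires a new entry in the table.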

2 Nested Simulation: Multi-Sales & Discounts

Sales data is simulated with nested loops: the outer loop iterates over salespeople and the inner loop iterates over days.

library(ggplot2)
library(dplyr)
library(plotly)

simulate_sales <- function(n_salesperson, days){
  data <- data.frame()
  
  for(i in 1:n_salesperson){
    for(d in 1:days){
      sales <- runif(1, 100, 1000)
      discount <- ifelse(sales > 700, 0.2, 0.1)
      
      data <- rbind(data, data.frame(
        salesperson=i, day=d,
        sales=sales, discount=discount
      ))
    }
  }
  
  data <- data %>%
    group_by(salesperson) %>%
    mutate(cumulative = cumsum(sales))
  
  return(data)
}

sales_data <- simulate_sales(3,10)

p <- ggplot(sales_data, aes(day, cumulative, color=factor(salesperson))) +
  geom_line() +
  ggtitle("Cumulative Sales")

# Convert to an interactive plot
ggplotly(p)

2.1 Interpretation

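The simulation results show that salespeople with higher daily sales have steeper cumulative curves. Because every sale is drawn from the same uniform distribution, the differences between the curves here arise from random variation. The conditional discount (20% for sales above 700, otherwise 10%) reflects a business strategy that adapts to sales volume.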
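Growing a data frame with rbind inside a nested loop becomes slow as the number of rows increases. As an alternative, here is a minimal vectorized sketch of the same simulation. The function name `simulate_sales_vec` and the `net_sales` column are hypothetical, and applying the discount as a fraction off the sale amount is an assumption, since the report does not use the discount further.

```r
simulate_sales_vec <- function(n_salesperson, days) {
  # One row per (salesperson, day); day varies fastest within each salesperson
  data <- expand.grid(day = seq_len(days),
                      salesperson = seq_len(n_salesperson))
  data$sales <- runif(nrow(data), 100, 1000)
  data$discount <- ifelse(data$sales > 700, 0.2, 0.1)
  # Assumption: the discount is a fraction taken off the sale amount
  data$net_sales <- data$sales * (1 - data$discount)
  # Running total of sales within each salesperson, in day order
  data$cumulative <- ave(data$sales, data$salesperson, FUN = cumsum)
  data[, c("salesperson", "day", "sales", "discount", "net_sales", "cumulative")]
}

set.seed(42)
sales_vec <- simulate_sales_vec(3, 10)
head(sales_vec)
```

The result has the same columns as the loop version plus `net_sales`, so it can be passed to the same ggplot call.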

3 Multi-Level Performance Categorization

library(plotly)

categorize_performance <- function(sales){
  categories <- c()
  
  for(s in sales){
    if(s > 800) categories <- c(categories,"Excellent")
    else if(s > 600) categories <- c(categories,"Very Good")
    else if(s > 400) categories <- c(categories,"Good")
    else if(s > 200) categories <- c(categories,"Average")
    else categories <- c(categories,"Poor")
  }
  
  return(categories)
}

cats <- categorize_performance(sales_data$sales)
perf_table <- as.data.frame(table(cats))

# Interactive plot
plot_ly(perf_table,
        x = ~cats,
        y = ~Freq,
        type = "bar") %>%
  
  layout(
    title = "Performance Distribution",
    width = 500,   # width (px)
    height = 350   # height (px)
  )

library(plotly)

# Performance categorization function
categorize_performance <- function(sales){
  categories <- c()
  
  for(s in sales){
    if(s > 800){
      categories <- c(categories, "Excellent")
    } else if(s > 600){
      categories <- c(categories, "Very Good")
    } else if(s > 400){
      categories <- c(categories, "Good")
    } else if(s > 200){
      categories <- c(categories, "Average")
    } else {
      categories <- c(categories, "Poor")
    }
  }
  
  return(categories)
}

# Data (note: this overwrites the sales_data data frame from Task 2 with a numeric vector)
sales_data <- runif(100, 100, 1000)

# Categorize
categories <- categorize_performance(sales_data)

# Count frequencies
counts <- as.data.frame(table(categories))

# Interactive pie chart
plot_ly(counts,
        labels = ~categories,
        values = ~Freq,
        type = 'pie',
        textinfo = 'label+percent') %>%
  layout(
    title = "Sales Performance Category Distribution",
    width = 500,
    height = 400
  )

3.1 Interpretation

The performance category distribution provides an overview of sales quality. The proportion of specific categories can serve as an indicator for evaluating organizational performance. Visualizations in the form of bar charts and pie charts significantly enhance the readability of the information.
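The same five-level categorization can be written without an explicit loop. The following is a minimal sketch using base R's `cut()`; the function name `categorize_performance_cut` is hypothetical, and the break points are the thresholds used above (intervals are closed on the right, so a value of exactly 800 is "Very Good", matching the `s > 800` test).

```r
categorize_performance_cut <- function(sales) {
  # Intervals: (-Inf,200], (200,400], (400,600], (600,800], (800,Inf)
  cut(sales,
      breaks = c(-Inf, 200, 400, 600, 800, Inf),
      labels = c("Poor", "Average", "Good", "Very Good", "Excellent"))
}

example_cats <- categorize_performance_cut(c(150, 200, 450, 800, 801))
example_cats
table(example_cats)
```

Because `cut()` returns a factor with levels in a fixed order, `table()` also reports empty categories, and a bar chart built from it lists the categories from "Poor" to "Excellent" rather than alphabetically.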

4 Multi-Company Dataset Simulation

knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
library(knitr)

generate_company_data <- function(n_company, n_employees){
  data <- data.frame()
  
  for(c in 1:n_company){
    for(e in 1:n_employees){
      salary <- runif(1, 3000, 10000)
      perf <- runif(1, 50, 100)
      kpi <- runif(1, 50, 100)
      
      data <- rbind(data, data.frame(
        company=c, employee=e,
        salary=salary,
        performance=perf,
        KPI=kpi
      ))
    }
  }
  return(data)
}

company_data <- generate_company_data(3, 50)

summary_table <- company_data %>%
  summarise(
    across(
      c(salary, performance, KPI),
      list(
        Min = min,
        Q1 = ~quantile(. , 0.25),
        Median = median,
        Mean = mean,
        Q3 = ~quantile(. , 0.75),
        Max = max
      )
    )
  )

kable(summary_table, caption = "Summary Statistics of Company Data")

Summary Statistics of Company Data
salary_Min	salary_Q1	salary_Median	salary_Mean	salary_Q3	salary_Max	performance_Min	performance_Q1	performance_Median	performance_Mean	performance_Q3	performance_Max	KPI_Min	KPI_Q1	KPI_Median	KPI_Mean	KPI_Q3	KPI_Max
3113.713	4888.721	6741.415	6631.755	8369.591	9991.177	50.23825	64.14734	77.66848	75.9003	88.35374	99.94609	50.19821	60.44406	72.90977	74.16575	87.98709	99.91977

4.1 Interpretation

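The summary table reports the minimum, quartiles, mean, and maximum of salary, performance, and KPI across all simulated employees. Grouping the same statistics by company would enable the identification of top-performing companies based on indicators such as average KPI and salary. Furthermore, determining top performers based on specific KPI thresholds reflects performance evaluation practices in modern organizations.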
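A per-company comparison and a threshold-based selection of top performers could look like the following. This is a minimal sketch in base R under stated assumptions: `company_demo` is a stand-in generated with the same ranges as `company_data`, and the threshold of 90 (`kpi_threshold`) is a hypothetical value chosen only for illustration.

```r
set.seed(1)
# Stand-in for company_data: 3 companies x 50 employees, same value ranges
company_demo <- data.frame(
  company     = rep(1:3, each = 50),
  employee    = rep(1:50, times = 3),
  salary      = runif(150, 3000, 10000),
  performance = runif(150, 50, 100),
  KPI         = runif(150, 50, 100)
)

# Average salary and KPI per company
company_means <- aggregate(cbind(salary, KPI) ~ company,
                           data = company_demo, FUN = mean)

# Hypothetical cutoff: employees with KPI above 90 count as top performers
kpi_threshold <- 90
top_performers <- subset(company_demo, KPI > kpi_threshold)

# Number of top performers per company (companies with none show 0)
top_counts <- table(factor(top_performers$company, levels = 1:3))

company_means
top_counts
```

Since all values are drawn from the same uniform distributions, differences between companies in this sketch reflect sampling noise only.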

5 Monte Carlo Simulation: Estimating π and Probability

library(plotly)

monte_carlo_pi <- function(n){
  x <- runif(n)
  y <- runif(n)
  
  inside <- (x^2 + y^2) <= 1
  pi_est <- 4 * mean(inside)
  
  data <- data.frame(
    x = x,
    y = y,
    inside = ifelse(inside, "Inside", "Outside")
  )
  
  plot_ly(data,
          x = ~x,
          y = ~y,
          color = ~inside,
          colors = c("blue", "red"),
          type = "scatter",
          mode = "markers") %>%
    layout(
      title = paste("Pi Estimate (Monte Carlo):", round(pi_est,4)),
      xaxis = list(title = "X"),
      yaxis = list(title = "Y")
    )
}

monte_carlo_pi(1000)

5.1 Interpretation

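This approach demonstrates that mathematical problems can be solved through probability-based simulations. Points are drawn uniformly in the unit square, and the fraction that falls inside the quarter circle of radius 1 approximates π/4, so multiplying that fraction by 4 gives an estimate of π. This concept is widely used in various fields, such as finance, artificial intelligence, and risk analysis.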
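The accuracy of the estimate depends on the number of random points. The following is a minimal sketch that repeats the estimate for several sample sizes without plotting; the helper name `estimate_pi` and the chosen sample sizes are assumptions for illustration.

```r
estimate_pi <- function(n) {
  x <- runif(n)
  y <- runif(n)
  # Share of points inside the quarter circle, scaled by 4
  4 * mean(x^2 + y^2 <= 1)
}

set.seed(2024)
sample_sizes <- c(100, 1000, 10000, 100000)
estimates <- sapply(sample_sizes, estimate_pi)

convergence <- data.frame(
  n         = sample_sizes,
  estimate  = estimates,
  abs_error = abs(estimates - pi)
)
convergence
```

The standard error of this estimator is 4 * sqrt(p * (1 - p) / n) with p = π/4, so the error shrinks roughly with the square root of n: about 100 times more points are needed for one additional correct decimal digit.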

6 Advanced Data Transformation & Feature Engineering

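normalize_columns <- function(df){
  # Min-max scale every numeric column to [0, 1]
  # (this includes ID columns such as company and employee)
  for(col in names(df)){
    if(is.numeric(df[[col]])){
      rng <- max(df[[col]]) - min(df[[col]])
      # Skip constant columns to avoid division by zero
      if(rng > 0){
        df[[col]] <- (df[[col]] - min(df[[col]])) / rng
      }
    }
  }
  return(df)
}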

norm_data <- normalize_columns(company_data)
hist(norm_data$salary, main="Normalized Salary")

6.1 Interpretation

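Min-max normalization rescales each numeric column to the range [0, 1]. The histogram of normalized salary therefore has the same shape as the original salary distribution, and only the scale of the axis changes. Putting features on a common scale is a useful preprocessing step before applying machine learning models that are sensitive to feature magnitude.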
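The effect of min-max scaling on a single column can be verified directly. The following is a minimal sketch, assuming a hypothetical helper `min_max` and a stand-in salary vector drawn from the same range as above; mapping a constant column to 0 is an assumption, not something specified in this report.

```r
min_max <- function(v) {
  rng <- range(v, na.rm = TRUE)
  # Assumption: a constant column carries no information and maps to 0
  if (rng[1] == rng[2]) return(rep(0, length(v)))
  (v - rng[1]) / (rng[2] - rng[1])
}

set.seed(7)
salary <- runif(150, 3000, 10000)   # stand-in for company_data$salary
salary_norm <- min_max(salary)

range(salary_norm)        # new scale of the data
cor(salary, salary_norm)  # linear relationship to the original values
```

Because the transformation is linear, the normalized values are perfectly correlated with the originals, which is why the histogram keeps its shape.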

7 Mini Project: Company KPI Dashboard & Simulation

summary_company <- company_data %>%
  group_by(company) %>%
  summarise(avg_salary=mean(salary),
            avg_KPI=mean(KPI))

ggplot(summary_company, aes(factor(company), avg_salary)) +
  geom_bar(stat="identity") +
  ggtitle("Avg Salary per Company")

# Load library
library(ggplot2)
library(plotly)

# 1. Prepare the data
set.seed(123)
df_company <- data.frame(
  salary = runif(100, 3000, 10000),
  KPI_score = runif(100, 50, 100),
  company_id = sample(1:3, 100, replace = TRUE)
)

# 2. Build the static ggplot and store it in the variable 'p'
p <- ggplot(df_company, aes(x = salary, y = KPI_score, color = factor(company_id))) +
  geom_point(size = 2, alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Salary vs. KPI Score (Interactive)",
    x = "Salary",
    y = "KPI Score",
    color = "Company ID"
  ) +
  theme_minimal()

# 3. Convert to an interactive plot
ggplotly(p)

## `geom_smooth()` using formula = 'y ~ x'

7.1 Interpretation

The resulting dashboard provides an overview of company performance. The bar chart compares average salary across companies, and the scatter plot with one regression line per company shows the relationship between salary and KPI score.
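The relationship shown in the scatter plot can be quantified with a correlation coefficient per company. The following is a minimal sketch that regenerates the same kind of simulated data; the object names `cor_by_company` and `cor_overall` are hypothetical.

```r
set.seed(123)
df_company <- data.frame(
  salary     = runif(100, 3000, 10000),
  KPI_score  = runif(100, 50, 100),
  company_id = sample(1:3, 100, replace = TRUE)
)

# Pearson correlation between salary and KPI score within each company
cor_by_company <- sapply(
  split(df_company, df_company$company_id),
  function(d) cor(d$salary, d$KPI_score)
)

# Correlation across all companies together
cor_overall <- cor(df_company$salary, df_company$KPI_score)

round(cor_by_company, 3)
round(cor_overall, 3)
```

Salary and KPI score are drawn independently here, so the true correlation is zero, and any slope in the fitted lines reflects sampling noise rather than a real relationship.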

8 Automated Report Generation

library(ggplot2)
library(dplyr)

# Make sure the data from Task 4 already exists
# If not, regenerate it:
set.seed(123)
df_company <- data.frame(
  company_id = sample(1:3, 150, replace = TRUE),
  salary = runif(150, 3000, 10000),
  KPI_score = runif(150, 50, 100),
  performance_score = runif(150, 50, 100),
  department = sample(c("IT","HR","Finance","Marketing"), 150, replace = TRUE)
)

# Automated loop over each company
for(c in unique(df_company$company_id)){
  
  cat("## 📌 Company", c, "\n")
  
  data_subset <- df_company %>% filter(company_id == c)
  
  # 1. Summary Table
  print(summary(data_subset))
  
  # 2. Bar Plot: Department Distribution
  p1 <- ggplot(data_subset, aes(x = department, fill = department)) +
    geom_bar() +
    labs(title = paste("Department Distribution - Company", c),
         x = "Department", y = "Count") +
    theme_minimal()
  
  print(p1)
  
  # 3. Scatter Plot: Salary vs KPI
  p2 <- ggplot(data_subset, aes(x = salary, y = KPI_score)) +
    geom_point(color = "blue", alpha = 0.6) +
    geom_smooth(method = "lm", se = FALSE, color = "red") +
    labs(title = paste("Salary vs KPI - Company", c),
         x = "Salary", y = "KPI Score") +
    theme_minimal()
  
  print(p2)
  
  # 4. Histogram Salary
  p3 <- ggplot(data_subset, aes(x = salary)) +
    geom_histogram(bins = 15, fill = "skyblue", color = "black") +
    labs(title = paste("Salary Distribution - Company", c),
         x = "Salary", y = "Frequency") +
    theme_minimal()
  
  print(p3)
  
}

## ## 📌 Company 3 
##    company_id     salary       KPI_score     performance_score
##  Min.   :3    Min.   :3140   Min.   :51.40   Min.   :50.81    
##  1st Qu.:3    1st Qu.:5074   1st Qu.:62.17   1st Qu.:64.35    
##  Median :3    Median :6283   Median :74.04   Median :74.19    
##  Mean   :3    Mean   :6641   Mean   :75.04   Mean   :74.41    
##  3rd Qu.:3    3rd Qu.:8592   3rd Qu.:87.16   3rd Qu.:87.10    
##  Max.   :3    Max.   :9803   Max.   :99.30   Max.   :99.65    
##   department       
##  Length:54         
##  Class :character  
##  Mode  :character  
##                    
##                    
##

## ## 📌 Company 2 
##    company_id     salary       KPI_score     performance_score
##  Min.   :2    Min.   :3044   Min.   :52.91   Min.   :50.92    
##  1st Qu.:2    1st Qu.:4523   1st Qu.:59.86   1st Qu.:61.67    
##  Median :2    Median :6370   Median :67.70   Median :75.95    
##  Mean   :2    Mean   :6260   Mean   :71.20   Mean   :75.71    
##  3rd Qu.:2    3rd Qu.:7841   3rd Qu.:82.89   3rd Qu.:90.64    
##  Max.   :2    Max.   :9996   Max.   :99.83   Max.   :98.96    
##   department       
##  Length:54         
##  Class :character  
##  Mode  :character  
##                    
##                    
##

## ## 📌 Company 1 
##    company_id     salary       KPI_score     performance_score
##  Min.   :1    Min.   :3236   Min.   :50.02   Min.   :51.87    
##  1st Qu.:1    1st Qu.:5007   1st Qu.:65.52   1st Qu.:65.85    
##  Median :1    Median :6811   Median :76.56   Median :78.52    
##  Mean   :1    Mean   :6524   Mean   :77.18   Mean   :78.40    
##  3rd Qu.:1    3rd Qu.:8083   3rd Qu.:89.15   3rd Qu.:91.57    
##  Max.   :1    Max.   :9785   Max.   :99.56   Max.   :98.45    
##   department       
##  Length:42         
##  Class :character  
##  Mode  :character  
##                    
##                    
##

8.1 Interpretation

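The loop generates the same set of outputs for every company without manual repetition: a summary table, a bar chart of the department distribution, a scatter plot of salary against KPI score, and a salary histogram. Producing identical outputs for each company makes it straightforward to compare companies and to spot relationship patterns between variables, for example, between salary and KPI.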
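The per-company printouts can also be condensed into a single comparison table. The following is a minimal sketch, assuming a hypothetical helper `summarise_company` that returns one row per company; the data is regenerated with the same call as in the report loop.

```r
set.seed(123)
df_company <- data.frame(
  company_id = sample(1:3, 150, replace = TRUE),
  salary = runif(150, 3000, 10000),
  KPI_score = runif(150, 50, 100),
  performance_score = runif(150, 50, 100),
  department = sample(c("IT", "HR", "Finance", "Marketing"), 150, replace = TRUE)
)

# Hypothetical helper: one summary row for the rows of a single company
summarise_company <- function(d) {
  data.frame(
    company_id     = d$company_id[1],
    n_employees    = nrow(d),
    mean_salary    = mean(d$salary),
    mean_KPI       = mean(d$KPI_score),
    # Most frequent department (first one in case of a tie)
    top_department = names(which.max(table(d$department)))
  )
}

# Split by company, summarise each part, and stack the rows
report <- do.call(
  rbind,
  lapply(split(df_company, df_company$company_id), summarise_company)
)
report
```

A table like this complements the plots: the loop gives the detail for each company, while the stacked rows allow a side-by-side comparison.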
