Functions & Loops + Data Science

Assignment ~ Week 5

Adinda Maiza Ishfahani

Data Science Undergraduate at ITSB

NIM : 52250074

1 Dynamic Multi-Formula Function

This task develops a dynamic function, compute_formula(), that evaluates several types of mathematical formulas over the same input vector: linear, quadratic, cubic, and exponential.

library(ggplot2)
library(tidyr)
library(plotly)

# Evaluate each requested formula over x and return one column per formula
compute_formula <- function(x, formulas){
  results <- data.frame(x=x)
  
  for(f in formulas){
    if(f=="linear"){
      results[[f]] <- 2*x + 3          # 2x + 3
    } else if(f=="quadratic"){
      results[[f]] <- x^2 + 2*x + 1    # x^2 + 2x + 1
    } else if(f=="cubic"){
      results[[f]] <- x^3              # x^3
    } else if(f=="exponential"){
      results[[f]] <- exp(x)           # e^x
    } else {
      # Fail early and name the unsupported formula
      stop("Invalid formula: ", f)
    }
  }
  return(results)
}

x <- 1:20
df <- compute_formula(x, c("linear","quadratic","cubic"))

df_long <- tidyr::pivot_longer(df, -x)

p <- ggplot(df_long, aes(x, value, color=name)) +
  geom_line() +
  ggtitle("Multi Formula Plot")

# Convert to an interactive plot
ggplotly(p)

1.1 Interpretation

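The visualization illustrates how growth differs across the plotted functions. The linear function grows at a constant rate, while the quadratic and cubic functions accelerate: at x = 20 the cubic reaches 8000, against 441 for the quadratic and 43 for the linear function. The exponential formula is supported by compute_formula() but was not included in this plot; over the same range it would grow faster still. These contrasts are relevant to applications such as modeling economic growth, population dynamics, and trend analysis in data.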
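
The if/else chain in compute_formula() can also be written as a lookup table of functions. The sketch below is one possible alternative, not the method used in this assignment: the names formula_table and compute_formula_tbl are hypothetical, and the formulas are copied from the function above. It prints first differences to make the contrast between constant and accelerating growth concrete.

```r
# Hypothetical lookup table; the formulas mirror compute_formula() above
formula_table <- list(
  linear      = function(x) 2 * x + 3,
  quadratic   = function(x) x^2 + 2 * x + 1,
  cubic       = function(x) x^3,
  exponential = function(x) exp(x)
)

compute_formula_tbl <- function(x, formulas) {
  # Reject any name that is not in the table
  unknown <- setdiff(formulas, names(formula_table))
  if (length(unknown) > 0) {
    stop("Invalid formula: ", paste(unknown, collapse = ", "))
  }
  # Apply each requested function to x; the named list becomes columns
  values <- lapply(formula_table[formulas], function(f) f(x))
  data.frame(x = x, values, check.names = FALSE)
}

growth <- compute_formula_tbl(1:20, c("linear", "quadratic", "cubic", "exponential"))

# First differences: constant for the linear formula, increasing for the others
print(head(diff(growth$linear)))
print(head(diff(growth$quadratic)))
print(head(diff(growth$cubic)))
```

Adding a new formula then only requires a new entry in the table, with no change to the function body.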

2 Nested Simulation: Multi-Sales & Discounts

Sales data is simulated using a nested loop, representing the interaction between multiple salespeople and specific time periods.

library(ggplot2)
library(dplyr)
library(plotly)

simulate_sales <- function(n_salesperson, days){
  data <- data.frame()
  
  # Outer loop: salespeople; inner loop: days
  for(i in 1:n_salesperson){
    for(d in 1:days){
      sales <- runif(1, 100, 1000)               # daily sales, uniform on [100, 1000]
      discount <- ifelse(sales > 700, 0.2, 0.1)  # 20% above 700, otherwise 10%
      
      data <- rbind(data, data.frame(
        salesperson=i, day=d,
        sales=sales, discount=discount
      ))
    }
  }
  
  # Running total of sales per salesperson
  data <- data %>%
    group_by(salesperson) %>%
    mutate(cumulative = cumsum(sales)) %>%
    ungroup()
  
  return(data)
}

sales_data <- simulate_sales(3,10)

p <- ggplot(sales_data, aes(day, cumulative, color=factor(salesperson))) +
  geom_line() +
  ggtitle("Cumulative Sales")

# Convert to an interactive plot
ggplotly(p)

2.1 Interpretation

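The simulation shows that salespeople with higher daily sales have steeper cumulative curves. Because each day's sales are drawn independently from the same uniform distribution (100 to 1000), the differences between salespeople here come from random variation only. The conditional discount (20% for sales above 700, otherwise 10%) reflects a business strategy that adapts to sales volume.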
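
The nested loop grows the data frame one row at a time, which becomes slow for large inputs. As a rough sketch of a vectorized alternative in base R, the version below builds the whole salesperson-by-day grid first. The function name simulate_sales_vec, the seed argument, and the net_sales column are assumptions added for illustration and are not part of the original task.

```r
simulate_sales_vec <- function(n_salesperson, days, seed = 42) {
  set.seed(seed)  # assumed seed, only so the run is reproducible

  # day varies fastest, so rows are ordered by salesperson, then by day
  grid <- expand.grid(day = seq_len(days), salesperson = seq_len(n_salesperson))

  grid$sales    <- runif(nrow(grid), 100, 1000)
  grid$discount <- ifelse(grid$sales > 700, 0.2, 0.1)

  # Illustrative extra column: sales after applying the discount
  grid$net_sales <- grid$sales * (1 - grid$discount)

  # Running total per salesperson, in day order
  grid$cumulative <- ave(grid$sales, grid$salesperson, FUN = cumsum)
  grid
}

sim <- simulate_sales_vec(3, 10)
print(head(sim))
```

The rule for the discount is the same as in simulate_sales(); only the way the rows are generated differs.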

3 Multi-Level Performance Categorization

library(plotly)

categorize_performance <- function(sales){
  categories <- c()
  
  for(s in sales){
    if(s > 800) categories <- c(categories,"Excellent")
    else if(s > 600) categories <- c(categories,"Very Good")
    else if(s > 400) categories <- c(categories,"Good")
    else if(s > 200) categories <- c(categories,"Average")
    else categories <- c(categories,"Poor")
  }
  
  return(categories)
}

cats <- categorize_performance(sales_data$sales)
perf_table <- as.data.frame(table(cats))

# Interactive bar chart
plot_ly(perf_table,
        x = ~cats,
        y = ~Freq,
        type = "bar") %>%
  layout(
    title = "Performance Distribution",
    width = 500,   # width (px)
    height = 350   # height (px)
  )
library(plotly)

# The pie chart reuses categorize_performance() defined above

# Data: 100 simulated sales values, kept separate from the sales_data data frame
sales_values <- runif(100, 100, 1000)

# Categorize
categories <- categorize_performance(sales_values)

# Count frequencies
counts <- as.data.frame(table(categories))

# Interactive pie chart
plot_ly(counts,
        labels = ~categories,
        values = ~Freq,
        type = 'pie',
        textinfo = 'label+percent') %>%
  layout(
    title = "Sales Performance Category Distribution",
    width = 500,
    height = 400
  )

3.1 Interpretation

The performance category distribution provides an overview of sales quality. The proportion of specific categories can serve as an indicator for evaluating organizational performance. Visualizations in the form of bar charts and pie charts significantly enhance the readability of the information.
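
The five thresholds can also be applied in one vectorized call. The sketch below uses base R's cut() with right-closed intervals, which matches the strict `>` comparisons in categorize_performance() (a value of exactly 800 is "Very Good", not "Excellent"). The function name categorize_performance_cut is hypothetical.

```r
categorize_performance_cut <- function(sales) {
  cut(
    sales,
    breaks = c(-Inf, 200, 400, 600, 800, Inf),
    labels = c("Poor", "Average", "Good", "Very Good", "Excellent"),
    right  = TRUE   # intervals are (lower, upper], so 800 falls in "Very Good"
  )
}

# Boundary values show which side of each threshold is included
boundary_check <- categorize_performance_cut(c(50, 200, 200.1, 600, 800, 800.5))
print(boundary_check)

# The result is a factor, so table() reports categories from Poor to Excellent
print(table(boundary_check))
```

Because the result is a factor with ordered levels, bar and pie charts built from table() keep the categories in a meaningful order rather than alphabetical order.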

4 Multi-Company Dataset Simulation

knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
library(knitr)

generate_company_data <- function(n_company, n_employees){
  data <- data.frame()
  
  for(c in 1:n_company){
    for(e in 1:n_employees){
      salary <- runif(1, 3000, 10000)
      perf <- runif(1, 50, 100)
      kpi <- runif(1, 50, 100)
      
      data <- rbind(data, data.frame(
        company=c, employee=e,
        salary=salary,
        performance=perf,
        KPI=kpi
      ))
    }
  }
  return(data)
}

company_data <- generate_company_data(3, 50)

summary_table <- company_data %>%
  summarise(
    across(
      c(salary, performance, KPI),
      list(
        Min = min,
        Q1 = ~quantile(. , 0.25),
        Median = median,
        Mean = mean,
        Q3 = ~quantile(. , 0.75),
        Max = max
      )
    )
  )

kable(summary_table, caption = "Summary Statistics of Company Data")
Summary Statistics of Company Data
Variable      Min        Q1         Median     Mean       Q3         Max
salary        3113.713   4888.721   6741.415   6631.755   8369.591   9991.177
performance   50.23825   64.14734   77.66848   75.9003    88.35374   99.94609
KPI           50.19821   60.44406   72.90977   74.16575   87.98709   99.91977

4.1 Interpretation

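The summary table pools all 150 simulated employees, so it describes the overall distribution of salary, performance, and KPI rather than any single company. Salaries range from about 3114 to 9991 with a mean near 6632, and both performance and KPI have means close to 75, as expected for values drawn uniformly from 3000 to 10000 and from 50 to 100. Identifying top-performing companies requires grouping by company, as done in Section 7, and selecting top performers requires a specific KPI threshold, which reflects performance evaluation practices in modern organizations.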
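
A per-company comparison and a threshold-based selection of top performers could look like the sketch below. It uses base R only and a small stand-in dataset generated the same way as company_data. The name toy_company_data, the seed, and the KPI threshold of 90 are assumptions chosen for illustration.

```r
set.seed(1)  # assumed seed for a reproducible illustration

# Stand-in for company_data: 3 companies with 50 employees each
toy_company_data <- data.frame(
  company  = rep(1:3, each = 50),
  employee = rep(1:50, times = 3),
  salary   = runif(150, 3000, 10000),
  KPI      = runif(150, 50, 100)
)

# Average salary and KPI per company
per_company <- aggregate(cbind(salary, KPI) ~ company,
                         data = toy_company_data, FUN = mean)
print(per_company)

# Top performers: employees above a hypothetical KPI threshold
kpi_threshold  <- 90
top_performers <- toy_company_data[toy_company_data$KPI > kpi_threshold, ]

# Count of top performers per company (companies with none are kept as 0)
top_counts <- table(factor(top_performers$company, levels = 1:3))
print(top_counts)
```

With simulated uniform data the companies are expected to look alike, so any ranking here reflects random variation rather than a real difference.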

5 Monte Carlo Simulation: Estimating π and Probability

library(plotly)

monte_carlo_pi <- function(n){
  x <- runif(n)
  y <- runif(n)
  
  inside <- (x^2 + y^2) <= 1
  pi_est <- 4 * mean(inside)
  
  data <- data.frame(
    x = x,
    y = y,
    inside = ifelse(inside, "Inside", "Outside")
  )
  
  plot_ly(data,
          x = ~x,
          y = ~y,
          color = ~inside,
          colors = c("blue", "red"),
          type = "scatter",
          mode = "markers") %>%
    layout(
      title = paste("Pi Estimate (Monte Carlo):", round(pi_est,4)),
      xaxis = list(title = "X"),
      yaxis = list(title = "Y")
    )
}

monte_carlo_pi(1000)

5.1 Interpretation

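This approach shows that a mathematical problem can be solved through probability-based simulation. Points are drawn uniformly in the unit square, and the share that falls inside the quarter circle of radius 1 approximates its area, π/4; multiplying that share by 4 gives the estimate of π. The estimate becomes more accurate as n grows. The same idea is widely used in fields such as finance, artificial intelligence, and risk analysis.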
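
To see how the accuracy depends on the number of points, the estimate can be repeated for several sample sizes. The sketch below is a plotting-free variant of monte_carlo_pi(). The function name estimate_pi, the seed, and the chosen sample sizes are assumptions. The standard error uses the binomial formula for a proportion, scaled by 4.

```r
estimate_pi <- function(n) {
  x <- runif(n)
  y <- runif(n)
  inside <- (x^2 + y^2) <= 1
  p_hat  <- mean(inside)   # share of points inside the quarter circle

  c(n         = n,
    estimate  = 4 * p_hat,
    # Standard error of 4 * p_hat, from the binomial variance p(1 - p) / n
    std_error = 4 * sqrt(p_hat * (1 - p_hat) / n))
}

set.seed(2024)  # assumed seed for reproducibility
sizes <- c(100, 1000, 10000, 100000)

# One row per sample size
convergence <- as.data.frame(t(sapply(sizes, estimate_pi)))
convergence$abs_error <- abs(convergence$estimate - pi)
print(convergence)
```

The standard error shrinks in proportion to 1 / sqrt(n), so each additional digit of accuracy requires roughly 100 times more points.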

6 Advanced Data Transformation & Feature Engineering

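# Min-max normalize numeric columns to [0, 1].
# Columns named in `exclude` (an added argument, defaulting to the ID columns) are left unchanged.
normalize_columns <- function(df, exclude = c("company", "employee")){
  for(col in setdiff(names(df), exclude)){
    if(is.numeric(df[[col]])){
      rng <- max(df[[col]]) - min(df[[col]])
      # Skip constant columns to avoid division by zero
      if(rng > 0){
        df[[col]] <- (df[[col]] - min(df[[col]])) / rng
      }
    }
  }
  return(df)
}

norm_data <- normalize_columns(company_data)
hist(norm_data$salary, main="Normalized Salary")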

6.1 Interpretation

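Min-max normalization rescales each numeric column to the range 0 to 1. The transformation is linear, so the shape of the salary distribution is unchanged and only its scale differs, which is why the histogram of the normalized salary looks like the original with a new axis. Putting features on a common scale is an important preprocessing step before applying many machine learning models.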
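
A quick numerical check of what min-max scaling does and does not change is sketched below. The helper min_max, the seed, and the simulated salary vector are assumptions standing in for company_data$salary.

```r
# Min-max scaling of a single numeric vector
min_max <- function(v) (v - min(v)) / (max(v) - min(v))

set.seed(7)  # assumed seed
salary      <- runif(150, 3000, 10000)
salary_norm <- min_max(salary)

# The scale changes: values now span exactly 0 to 1
print(range(salary))
print(range(salary_norm))

# The shape does not: the transformation is linear, so correlation with the
# original is 1 and the ordering of observations is preserved
print(cor(salary, salary_norm))
print(identical(order(salary), order(salary_norm)))
```

Plotting hist(salary) next to hist(salary_norm) would show the same comparison visually.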

7 Mini Project: Company KPI Dashboard & Simulation

summary_company <- company_data %>%
  group_by(company) %>%
  summarise(avg_salary=mean(salary),
            avg_KPI=mean(KPI))

ggplot(summary_company, aes(factor(company), avg_salary)) +
  geom_bar(stat="identity") +
  ggtitle("Avg Salary per Company")

# Load library
library(ggplot2)
library(plotly)

# 1. Prepare the data
set.seed(123)
df_company <- data.frame(
  salary = runif(100, 3000, 10000),
  KPI_score = runif(100, 50, 100),
  company_id = sample(1:3, 100, replace = TRUE)
)

# 2. Build the static ggplot and store it in the variable 'p'
p <- ggplot(df_company, aes(x = salary, y = KPI_score, color = factor(company_id))) +
  geom_point(size = 2, alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Relationship Between Salary and KPI Score (Interactive)",
    x = "Salary",
    y = "KPI Score",
    color = "Company ID"
  ) +
  theme_minimal()

# 3. Convert to an interactive plot
ggplotly(p)
## `geom_smooth()` using formula = 'y ~ x'

7.1 Interpretation

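The resulting dashboard provides an overview of company performance. The bar chart compares average salary across companies, and the scatter plot with fitted lines shows the relationship between salary and KPI within each company. Because salary and KPI are simulated independently here, no systematic correlation is expected, and any slope in the fitted lines reflects random variation.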
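
The relationship shown in the scatter plot can be quantified per company. The sketch below regenerates the same df_company as above (same seed and calls) and computes the correlation and the fitted slope for each company. The names cor_by_company and slope_by_company are hypothetical.

```r
set.seed(123)
df_company <- data.frame(
  salary     = runif(100, 3000, 10000),
  KPI_score  = runif(100, 50, 100),
  company_id = sample(1:3, 100, replace = TRUE)
)

# One data frame per company
by_company <- split(df_company, df_company$company_id)

# Pearson correlation between salary and KPI within each company
cor_by_company <- sapply(by_company, function(d) cor(d$salary, d$KPI_score))

# Slope of the least-squares line KPI_score ~ salary, as drawn by geom_smooth(method = "lm")
slope_by_company <- sapply(by_company, function(d) {
  coef(lm(KPI_score ~ salary, data = d))[["salary"]]
})

print(round(cor_by_company, 3))
print(signif(slope_by_company, 3))
```

Correlations close to zero would confirm that the fitted lines carry little information for this simulated data.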

8 Automated Report Generation

library(ggplot2)
library(dplyr)

# Make sure the data from Task 4 already exists
# If not, regenerate it:
set.seed(123)
df_company <- data.frame(
  company_id = sample(1:3, 150, replace = TRUE),
  salary = runif(150, 3000, 10000),
  KPI_score = runif(150, 50, 100),
  performance_score = runif(150, 50, 100),
  department = sample(c("IT","HR","Finance","Marketing"), 150, replace = TRUE)
)

# Automated loop over companies
for(c in unique(df_company$company_id)){
  
  cat("## 📌 Company", c, "\n")
  
  data_subset <- df_company %>% filter(company_id == c)
  
  # 1. Summary Table
  print(summary(data_subset))
  
  # 2. Bar Plot: Department Distribution
  p1 <- ggplot(data_subset, aes(x = department, fill = department)) +
    geom_bar() +
    labs(title = paste("Department Distribution - Company", c),
         x = "Department", y = "Count") +
    theme_minimal()
  
  print(p1)
  
  # 3. Scatter Plot: Salary vs KPI
  p2 <- ggplot(data_subset, aes(x = salary, y = KPI_score)) +
    geom_point(color = "blue", alpha = 0.6) +
    geom_smooth(method = "lm", se = FALSE, color = "red") +
    labs(title = paste("Salary vs KPI - Company", c),
         x = "Salary", y = "KPI Score") +
    theme_minimal()
  
  print(p2)
  
  # 4. Histogram Salary
  p3 <- ggplot(data_subset, aes(x = salary)) +
    geom_histogram(bins = 15, fill = "skyblue", color = "black") +
    labs(title = paste("Salary Distribution - Company", c),
         x = "Salary", y = "Frequency") +
    theme_minimal()
  
  print(p3)
  
}
## ## 📌 Company 3 
##    company_id     salary       KPI_score     performance_score
##  Min.   :3    Min.   :3140   Min.   :51.40   Min.   :50.81    
##  1st Qu.:3    1st Qu.:5074   1st Qu.:62.17   1st Qu.:64.35    
##  Median :3    Median :6283   Median :74.04   Median :74.19    
##  Mean   :3    Mean   :6641   Mean   :75.04   Mean   :74.41    
##  3rd Qu.:3    3rd Qu.:8592   3rd Qu.:87.16   3rd Qu.:87.10    
##  Max.   :3    Max.   :9803   Max.   :99.30   Max.   :99.65    
##   department       
##  Length:54         
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

## ## 📌 Company 2 
##    company_id     salary       KPI_score     performance_score
##  Min.   :2    Min.   :3044   Min.   :52.91   Min.   :50.92    
##  1st Qu.:2    1st Qu.:4523   1st Qu.:59.86   1st Qu.:61.67    
##  Median :2    Median :6370   Median :67.70   Median :75.95    
##  Mean   :2    Mean   :6260   Mean   :71.20   Mean   :75.71    
##  3rd Qu.:2    3rd Qu.:7841   3rd Qu.:82.89   3rd Qu.:90.64    
##  Max.   :2    Max.   :9996   Max.   :99.83   Max.   :98.96    
##   department       
##  Length:54         
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

## ## 📌 Company 1 
##    company_id     salary       KPI_score     performance_score
##  Min.   :1    Min.   :3236   Min.   :50.02   Min.   :51.87    
##  1st Qu.:1    1st Qu.:5007   1st Qu.:65.52   1st Qu.:65.85    
##  Median :1    Median :6811   Median :76.56   Median :78.52    
##  Mean   :1    Mean   :6524   Mean   :77.18   Mean   :78.40    
##  3rd Qu.:1    3rd Qu.:8083   3rd Qu.:89.15   3rd Qu.:91.57    
##  Max.   :1    Max.   :9785   Max.   :99.56   Max.   :98.45    
##   department       
##  Length:42         
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

8.1 Interpretation

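The loop generates the same report for every company without repeating code: a summary table, a department bar plot, a salary versus KPI scatter plot, and a salary histogram. The summaries show that Companies 3, 2, and 1 contain 54, 54, and 42 of the 150 simulated employees, with mean salaries between about 6260 and 6641. Because the code depends only on company_id, the same loop extends to any number of companies.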
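
Printed summaries are hard to compare across companies. One option is to collect the key figures into a single table, as in the sketch below. It regenerates df_company with the same seed and calls as above. The helper summarise_company and the chosen columns are assumptions for illustration.

```r
set.seed(123)
df_company <- data.frame(
  company_id        = sample(1:3, 150, replace = TRUE),
  salary            = runif(150, 3000, 10000),
  KPI_score         = runif(150, 50, 100),
  performance_score = runif(150, 50, 100),
  department        = sample(c("IT", "HR", "Finance", "Marketing"), 150, replace = TRUE)
)

# Hypothetical helper: one summary row for one company's subset
summarise_company <- function(d) {
  data.frame(
    company_id         = d$company_id[1],
    n_employees        = nrow(d),
    mean_salary        = mean(d$salary),
    mean_KPI           = mean(d$KPI_score),
    largest_department = names(which.max(table(d$department)))
  )
}

# Apply the helper to each company and stack the rows
report_table <- do.call(
  rbind,
  lapply(split(df_company, df_company$company_id), summarise_company)
)
rownames(report_table) <- NULL
print(report_table)
```

In an R Markdown document, a table like this could be passed to knitr::kable() so that it renders alongside the per-company plots.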
