Week 5 Assignment ~ Functions & Loops

Fifi Muthia Pitaloka

Student ID (NIM): 52250038

Lecturer: Bakti Siregar, M.Sc., CDS.

Course: Data Science Programming I

Study Program: Data Science

Institut Teknologi Sains Bandung

Introduction

In data science, functions and loops are key for automating calculations and data processing efficiently. This practicum aims to apply concepts like functions, nested loops, and conditional logic in various data simulations, statistical analysis, and interactive visualizations using R. Through the given tasks, students will learn to build structured and systematic data analysis workflows.

Task 1

Dynamic Multi-Formula Function

In this task, create a function to calculate various mathematical formulas such as linear, quadratic, cubic, and exponential. Use nested loops and conditional logic for the calculations, then visualize the results in a single interactive plot.

library(ggplot2)
library(plotly)
library(dplyr)

compute_formula <- function(x, formulas) {
  
  results <- data.frame()
  
  for (f in formulas) {
    
    if (!(f %in% c("linear", "quadratic", "cubic", "exponential"))) {
      next
    }
    
    for (val in x) {
      
      if (f == "linear") {
        y <- 2*val + 3
      }
      
      else if (f == "quadratic") {
        y <- val^2 + 2*val + 1
      }
      
      else if (f == "cubic") {
        y <- val^3 - 3*val^2 + val
      }
      
      else if (f == "exponential") {
        y <- 2^val
      }
      
      results <- rbind(results,
                       data.frame(
                         x = val,
                         y = y,
                         formula = f
                       ))
    }
  }
  
  return(results)
}

x_values <- 1:20

formula_list <- c("linear", "quadratic", "cubic", "exponential")

formula_data <- compute_formula(x_values, formula_list)

p <- ggplot(formula_data,
            aes(x = x,
                y = y,
                color = formula,
                group = formula)) +
  
  geom_line() +        
  geom_point() +       
  
  labs(
    title = "Multiple Mathematical Formulas Plot",
    x = "X Values",
    y = "Y Values"
  ) +
  
  theme_minimal()

interactive_plot <- ggplotly(p)

interactive_plot

Interpretation

The graph compares the growth of four formulas (linear, quadratic, cubic, and exponential) for x values from 1 to 20. The linear formula grows at a constant rate, while the quadratic and cubic formulas increase more rapidly as x gets larger. The exponential formula has the sharpest rise: at x = 20 it reaches 2^20 = 1,048,576, while the cubic formula reaches only 6,820, which shows that exponential growth eventually far outpaces polynomial growth.
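As a cross-check on the values behind the plot, the same four formulas can be evaluated without growing a data frame row by row. The sketch below is an alternative, not the function used above: `formula_fns` and `compute_formula_vec` are hypothetical names introduced here for illustration, and only base R is assumed.

```r
# Hypothetical lookup table: the same four formulas as compute_formula(),
# stored as functions so each one can be applied to a whole vector at once
formula_fns <- list(
  linear      = function(x) 2 * x + 3,
  quadratic   = function(x) x^2 + 2 * x + 1,
  cubic       = function(x) x^3 - 3 * x^2 + x,
  exponential = function(x) 2^x
)

compute_formula_vec <- function(x, formulas) {
  # Keep only known formula names (unknown names are skipped, as in the loop version)
  valid <- intersect(formulas, names(formula_fns))
  parts <- lapply(valid, function(f) {
    data.frame(x = x, y = formula_fns[[f]](x), formula = f)
  })
  # Bind all pieces once instead of calling rbind() inside a loop
  do.call(rbind, parts)
}

check <- compute_formula_vec(c(1, 10, 20),
                             c("linear", "quadratic", "cubic", "exponential"))
print(check)
```

At x = 20 this gives 43, 441, 6820, and 1048576 for the linear, quadratic, cubic, and exponential formulas respectively, which matches the ordering seen in the plot.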

Task 2

Nested Simulation: Multi-Sales & Discounts

In this task, create a function to simulate sales data for multiple salespeople over several days. Use nested loops and conditional logic to apply discounts based on sales volume, calculate cumulative sales, and display an interactive visualization of the results.

library(dplyr)
library(ggplot2)
library(plotly)
library(DT)

df <- read.csv("sales_simulation_fixed.csv")

datatable(df,
          caption  = "Dataset: Sales Simulation",
          options  = list(pageLength = 10, scrollX = TRUE),
          rownames = FALSE)
PASTEL <- c(
  SP1 = '#FFB3C1',
  SP2 = '#FFD6A5',
  SP3 = '#CAFFBF',
  SP4 = '#9BF6FF',
  SP5 = '#BDB2FF'
)

simulate_sales <- function(data) {

  apply_discount <- function(x) {
    if      (x > 900) return(0.20)
    else if (x > 700) return(0.15)
    else if (x > 500) return(0.10)
    else if (x > 300) return(0.05)
    else               return(0.00)
  }

  cumulative_sales_func <- function(sales_vec) {
    total  <- 0
    result <- c()
    for (s in sales_vec) {
      total  <- total + s
      result <- c(result, total)
    }
    return(result)
  }

  sales_ids  <- unique(data$salesperson)
  final_data <- data.frame()

  for (sp in sales_ids) {

    # Sort by day so the running total accumulates in chronological order
    temp <- data %>% filter(salesperson == sp) %>% arrange(day)

    temp$discount_rate <- sapply(temp$sales_amount, apply_discount)
    temp$net_sales     <- temp$sales_amount * (1 - temp$discount_rate)

    temp$cumulative_sales <- cumulative_sales_func(temp$net_sales)

    final_data <- rbind(final_data, temp)
  }

  return(final_data)
}

result <- simulate_sales(df)

summary_sales <- result %>%
  group_by(salesperson) %>%
  summarise(
    total_net_sales = round(sum(net_sales),     2),
    avg_sales       = round(mean(sales_amount), 2),
    max_sales       = round(max(sales_amount),  2),
    avg_discount    = scales::percent(mean(discount_rate), accuracy = 0.1),
    .groups = "drop"
  )

datatable(
  summary_sales,
  caption  = "Summary Statistics per Salesperson",
  options  = list(pageLength = 10, scrollX = TRUE),
  rownames = FALSE
)
p <- ggplot(result,
            aes(x      = day,
                y      = cumulative_sales,
                color  = salesperson,
                group  = salesperson,
                # extra info shown in tooltip
                text   = paste0(
                  "Salesperson: ", salesperson,
                  "<br>Day: ",     day,
                  "<br>Net Sales: $", round(net_sales, 2),
                  "<br>Cumulative: $", round(cumulative_sales, 2),
                  "<br>Discount: ", scales::percent(discount_rate)
                ))) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2.5) +
  scale_color_manual(values = PASTEL) +
  scale_x_continuous(breaks = 1:10) +
  labs(
    title  = "Cumulative Net Sales per Salesperson",
    x      = "Day",
    y      = "Cumulative Net Sales ($)",
    color  = "Salesperson"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.background  = element_rect(fill = "#FFF9F9", color = NA),
    panel.background = element_rect(fill = "#FFF9F9", color = NA),
    panel.grid.major = element_line(color = "#F0E6EE"),
    axis.text        = element_text(color = "#9B7BB8"),
    axis.title       = element_text(color = "#7B5EA7"),
    plot.title       = element_text(color = "#7B5EA7", face = "bold"),
    legend.title     = element_text(color = "#7B5EA7"),
    legend.text      = element_text(color = "#9B7BB8")
  )

ggplotly(p, tooltip = "text") %>%
  layout(
    hovermode = "x unified",
    legend    = list(title = list(text = "<b>Salesperson</b>"))
  )

Interpretation

The graph shows the progression of cumulative sales for each salesperson over several days. Cumulative sales steadily increase day by day due to the accumulation of daily sales. Differences in line heights indicate that some salespeople have higher sales performance than others.
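The discount tiers and the running total can also be written without explicit loops. The following is a minimal sketch under the same thresholds as `apply_discount()` above; `discount_rate_vec` is a hypothetical helper name, the four sales amounts are made-up illustration values, and the rows are assumed to already be in day order.

```r
# Hypothetical vectorized version of the discount tiers (> 900, > 700, > 500, > 300)
discount_rate_vec <- function(sales_amount) {
  # right = TRUE makes each interval (lower, upper], so exactly 900 stays in the 15% tier
  tier <- cut(sales_amount,
              breaks = c(-Inf, 300, 500, 700, 900, Inf),
              labels = FALSE, right = TRUE)
  c(0.00, 0.05, 0.10, 0.15, 0.20)[tier]
}

# Made-up daily sales for one salesperson (illustration only)
sales <- c(250, 450, 900, 950)

rate       <- discount_rate_vec(sales)
net_sales  <- sales * (1 - rate)
cumulative <- cumsum(net_sales)   # built-in running total

print(data.frame(day = seq_along(sales), sales, rate, net_sales, cumulative))
```

Note that a sale of exactly 900 receives 15%, not 20%, because the tiers use strict inequalities. A larger gross sale can therefore produce a smaller net sale: 950 nets 760, while 900 nets 765.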

Task 3

Multi-Level Performance Categorization

In this task, create a function to group sales data into five performance categories: Excellent, Very Good, Good, Average, and Poor. Use loops to calculate the count and percentage for each category, then visualize the results with a bar plot and a pie chart.

library(dplyr)
library(ggplot2)
library(plotly)

sales_dataset <- read.csv("sales_simulation_fixed.csv")

categorize_performance <- function(sales_amount) {
  
  categories <- c()
  
  for (val in sales_amount) {
    # Use `category` rather than `cat` to avoid masking base::cat()
    if      (val >= 800) category <- "Excellent"
    else if (val >= 600) category <- "Very Good"
    else if (val >= 400) category <- "Good"
    else if (val >= 200) category <- "Average"
    else                 category <- "Poor"
    
    categories <- c(categories, category)
  }
  
  return(categories)
}

sales_dataset$performance_category <- categorize_performance(
  sales_dataset$sales_amount
)

category_levels <- c("Excellent", "Very Good", "Good", "Average", "Poor")

sales_dataset$performance_category <- factor(
  sales_dataset$performance_category,
  levels = category_levels
)

category_summary <- sales_dataset %>%
  group_by(performance_category) %>%
  summarise(count = n(), .groups = "drop") %>%
  tidyr::complete(
    performance_category = category_levels,
    fill = list(count = 0)
  ) %>%
  mutate(
    percentage = round(count / sum(count) * 100, 2),
    label      = paste0(percentage, "%")
  ) %>%
  arrange(desc(count))

knitr::kable(
  category_summary %>% select(-label),
  caption = "Performance Category Distribution"
)

Performance Category Distribution

performance_category   count   percentage
--------------------   -----   ----------
Average                   12           24
Poor                      11           22
Very Good                 11           22
Excellent                  9           18
Good                       7           14

PASTEL <- c(
  "Excellent" = "#BDB2FF",
  "Very Good" = "#9BF6FF",
  "Good"      = "#CAFFBF",
  "Average"   = "#FFD6A5",
  "Poor"      = "#FFB3C1"
)

p_bar <- ggplot(category_summary,
                aes(x    = performance_category,
                    y    = count,
                    fill = performance_category,
                    text = paste0("Category: ", performance_category,
                                  "<br>Count: ", count,
                                  "<br>Percentage: ", label))) +
  geom_col(width = 0.6, show.legend = FALSE) +
  geom_text(aes(label = label), vjust = -0.5,
            color = "#7B5EA7", fontface = "bold", size = 3.5) +
  scale_fill_manual(values = PASTEL) +
  scale_x_discrete(limits = category_levels) +
  labs(
    title = "Sales Performance Category Distribution",
    x     = "Performance Category",
    y     = "Count"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.background  = element_rect(fill = "#FFF9F9", color = NA),
    panel.background = element_rect(fill = "#FFF9F9", color = NA),
    panel.grid.major = element_line(color = "#F0E6EE"),
    axis.text        = element_text(color = "#9B7BB8"),
    axis.title       = element_text(color = "#7B5EA7"),
    plot.title       = element_text(color = "#7B5EA7", face = "bold")
  )

ggplotly(p_bar, tooltip = "text")
plot_ly(
  category_summary,
  labels  = ~performance_category,
  values  = ~count,
  type    = "pie",
  marker  = list(colors = unname(PASTEL[category_levels])),
  textinfo      = "label+percent",
  hovertemplate = "<b>%{label}</b><br>Count: %{value}<br>Percentage: %{percent}<extra></extra>"
) %>%
  layout(
    title = list(
      text = "Sales Performance Category Distribution",
      font = list(color = "#7B5EA7", size = 16)
    ),
    paper_bgcolor = "#FFF9F9",
    showlegend    = TRUE
  )

Interpretation

Based on the data, the Average category holds the largest single share of sales records at 12 entries (24% of 50), although it is not a majority. The Poor and Very Good categories each have 11 entries (22%), so low and high performance are roughly balanced. Excellent has 9 entries (18%), meaning nearly one in five sales reaches the top tier. Good has the fewest at 7 entries (14%). Overall, sales performance is spread fairly evenly across the five categories, with no category dominating.
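The categorization and the percentage step can be sketched compactly as follows. `categorize_vec` is a hypothetical vectorized counterpart of `categorize_performance()` with the same cut-offs, and the counts are copied from the distribution table in this task; the loop then reproduces the percentage column.

```r
# Hypothetical vectorized categorizer with the same cut-offs (>= 800, 600, 400, 200)
categorize_vec <- function(sales_amount) {
  # right = FALSE makes each interval [lower, upper), so exactly 800 is "Excellent"
  as.character(cut(sales_amount,
                   breaks = c(-Inf, 200, 400, 600, 800, Inf),
                   labels = c("Poor", "Average", "Good", "Very Good", "Excellent"),
                   right = FALSE))
}

print(categorize_vec(c(150, 200, 399, 400, 600, 800)))

# Counts copied from the Performance Category Distribution table
counts <- c(Average = 12, Poor = 11, "Very Good" = 11, Excellent = 9, Good = 7)

# Loop over the categories to compute each percentage of the total
percentages <- numeric(length(counts))
names(percentages) <- names(counts)
for (category in names(counts)) {
  percentages[category] <- round(counts[[category]] / sum(counts) * 100, 2)
}
print(percentages)
```

The boundary values in the first call show how the `>=` comparisons behave: 200 is already "Average" and 800 is already "Excellent".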

Task 4

Multi-Company Dataset Simulation

In this task, create a company dataset simulation using a function with nested loops, including company_id, employee_id, salary, department, performance_score, and KPI_score. Use conditional logic to identify top performers (KPI_score > 90), compute per-company summaries like average salary, average performance, and max KPI, and visualize them interactively.
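The code in this task loads the dataset from a CSV file, so the generating function itself is not shown. One possible way to build such a dataset with nested loops is sketched below. Everything specific in it is an assumption made for illustration: the function name `simulate_companies`, the department names, the value ranges, the ID format, and the seed.

```r
# Hypothetical generator: outer loop over companies, inner loop over employees
simulate_companies <- function(n_company, n_employees, seed = 42) {
  set.seed(seed)                                         # reproducible draws
  departments <- c("Finance", "HR", "IT", "Marketing")   # assumed department names

  # Preallocate one list slot per employee, then bind once at the end
  rows <- vector("list", n_company * n_employees)
  k <- 1

  for (i in seq_len(n_company)) {
    for (j in seq_len(n_employees)) {
      kpi <- round(runif(1, 50, 100), 1)                 # assumed KPI range
      rows[[k]] <- data.frame(
        company_id        = paste0("C", i),
        employee_id       = paste0("C", i, "_E", j),
        salary            = round(runif(1, 3000, 10000)),  # assumed salary range
        department        = sample(departments, 1),
        performance_score = round(runif(1, 40, 100), 1),   # assumed score range
        KPI_score         = kpi,
        # Conditional logic from the task description: top performer if KPI > 90
        top_performer     = if (kpi > 90) "Yes" else "No"
      )
      k <- k + 1
    }
  }
  do.call(rbind, rows)
}

sim <- simulate_companies(n_company = 3, n_employees = 5)
print(head(sim))
```

Collecting the rows in a preallocated list and binding them once avoids repeatedly copying the data frame, which matters as the number of companies and employees grows.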

library(ggplot2)
library(dplyr)
library(plotly)
library(DT)
library(htmltools)
library(readr)

company_data <- read_csv("company_dataset.csv")
## Rows: 60 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): company_id, employee_id, department, top_performer
## dbl (3): salary, performance_score, KPI_score
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
datatable(
  company_data,
  caption = "Table: Company Employee Dataset",
  options = list(
    pageLength = 10,      
    lengthMenu = c(5,10,15,20),
    autoWidth = TRUE
  )
)
summary_company <- company_data %>%
  group_by(company_id) %>%
  summarise(
    avg_salary = round(mean(salary),0),
    avg_performance = round(mean(performance_score),2),
    max_KPI = max(KPI_score),
    total_top_performer = sum(top_performer == "Yes")
  )

datatable(
  summary_company,
  caption = "Table: Company Summary Statistics",
  options = list(
    pageLength = 10,
    lengthMenu = c(5,10,15,20),
    autoWidth = TRUE
  )
)
summary_df <- summary_company %>%
  arrange(desc(avg_salary))

p1 <- ggplot(summary_df,
             aes(x=reorder(factor(company_id), -avg_salary),
                 y=avg_salary)) +
geom_bar(stat="identity", fill="pink") +

geom_text(aes(label=round(avg_salary,0)),
          nudge_y=200) +

labs(
title="Average Salary per Company",
x="Company ID",
y="Average Salary"
) +
theme_minimal() +
theme(
plot.title=element_text(
hjust=0.5,
face="bold",
size=18)
)

ggplotly(p1)
summary_df2 <- summary_company %>%
  arrange(desc(avg_performance))

p2 <- ggplot(summary_df2,
             aes(x=reorder(factor(company_id), -avg_performance),
                 y=avg_performance)) +
geom_bar(stat="identity", fill="purple") +

geom_text(aes(label=round(avg_performance,1)),
          nudge_y=1) +

labs(
title="Average Performance per Company",
x="Company ID",
y="Average Performance Score"
) +
theme_minimal() +
theme(
plot.title=element_text(
hjust=0.5,
face="bold",
size=18)
)

ggplotly(p2)
summary_df3 <- summary_company %>%
  arrange(desc(max_KPI))

# Plot the maximum KPI (not the average performance) so the chart matches its title
p3 <- ggplot(summary_df3,
             aes(x=reorder(factor(company_id), -max_KPI),
                 y=max_KPI)) +

geom_bar(stat="identity", fill="lightblue") +

geom_text(aes(label=round(max_KPI,1)),
          nudge_y=1) +

labs(
title="Maximum KPI per Company",
x="Company ID",
y="Maximum KPI Score"
) +

theme_minimal() +

theme(
plot.title=element_text(
hjust=0.5,
face="bold",
size=18)
)

ggplotly(p3)

Interpretation

Based on the company data simulation, average salaries vary across companies, and the company with the highest average offers the best compensation on average. Average performance scores are solid overall and differ only slightly between companies. The maximum KPI chart shows the best single KPI score in each company; employees with a KPI score above 90 are flagged as top performers. Overall, salary, performance, and KPI distributions are balanced, with some companies excelling in specific areas.

Task 5

Monte Carlo Simulation: Pi & Probability

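In this task, a Monte Carlo simulation approximates π by generating random points in the unit square, checking whether each point falls inside the unit circle (x² + y² ≤ 1), and multiplying the share of points inside by 4, because the quarter circle covers π/4 of the square's area. The simulation also estimates the probability of a point landing in a specific sub-square (here x ≤ 0.5 and y ≤ 0.5, with a theoretical probability of 0.25). The results are visualized as a scatter plot that distinguishes points inside and outside the circle.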

library(dplyr)
library(ggplot2)
library(plotly)
library(knitr)

monte_carlo_pi <- function(n_points){

  x <- runif(n_points, 0, 1)
  y <- runif(n_points, 0, 1)

  inside_circle <- (x^2 + y^2) <= 1

  inside_subsquare <- (x <= 0.5 & y <= 0.5)

  count_inside_circle <- sum(inside_circle)
  count_subsquare <- sum(inside_subsquare)

  pi_estimate <- 4 * (count_inside_circle / n_points)

  prob_subsquare <- count_subsquare / n_points

  result_table <- data.frame(
    Total_Points = n_points,
    Points_Inside_Circle = count_inside_circle,
    Pi_Estimate = round(pi_estimate,4),
    Subsquare_Points = count_subsquare,
    Probability_Subsquare = round(prob_subsquare,4)
  )

  point_data <- data.frame(
    x = x,
    y = y,
    inside_circle = inside_circle
  )

  return(list(result_table, point_data))

}

set.seed(123)

simulation <- monte_carlo_pi(1000)

result <- simulation[[1]]
points_data <- simulation[[2]]

kable(
  result,
  caption = "Monte Carlo Simulation Results (Pi & Probability)"
)

Monte Carlo Simulation Results (Pi & Probability)

Total_Points   Points_Inside_Circle   Pi_Estimate   Subsquare_Points   Probability_Subsquare
------------   --------------------   -----------   ----------------   ---------------------
        1000                    805          3.22                241                   0.241

points_data$position <- ifelse(
  points_data$inside_circle == TRUE,
  "Inside Circle",
  "Outside Circle"
)

p_mc <- ggplot(points_data,
               aes(x=x, y=y, color=position)) +

geom_point(alpha=0.6, size=1.5) +

labs(
title="Monte Carlo Simulation: Points Inside vs Outside Circle",
x="X Coordinate",
y="Y Coordinate",
color="Point Position"
) +

theme_minimal() +

theme(
plot.title=element_text(
hjust=0.5,
face="bold",
size=16)
)

ggplotly(p_mc)

Interpretation

The Monte Carlo simulation with 1,000 points yields a π estimate of 3.22 (true value about 3.1416, an error of roughly 2.5%) and a sub-square probability of 0.241 (theoretical value 0.25). The differences are due to random sampling; using more points moves both estimates closer to their theoretical values.

The scatter plot shows inside-circle points forming a quarter-circle pattern and outside points scattered, demonstrating Monte Carlo’s effectiveness for estimating π and probabilities despite random variations.
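How quickly the estimate improves with more points can be checked directly. The sketch below is an illustration, not part of the original analysis: `estimate_pi` is a hypothetical helper, and the sample sizes are arbitrary choices. It also computes the theoretical standard error 4·sqrt(p(1 − p)/n) with p = π/4, which is about 0.052 for n = 1,000, so the estimate of 3.22 reported above lies roughly 1.5 standard errors from π.

```r
# Hypothetical helper: estimate pi from n_points uniform points in the unit square
estimate_pi <- function(n_points) {
  x <- runif(n_points)
  y <- runif(n_points)
  4 * mean(x^2 + y^2 <= 1)   # share of points inside the quarter circle, times 4
}

set.seed(123)
sizes     <- c(100, 1000, 10000, 100000)   # arbitrary sample sizes for illustration
estimates <- sapply(sizes, estimate_pi)

# Theoretical standard error of the estimator: each point is a Bernoulli trial
# with success probability p = pi / 4
p         <- pi / 4
std_error <- 4 * sqrt(p * (1 - p) / sizes)

convergence <- data.frame(
  n_points    = sizes,
  pi_estimate = estimates,
  abs_error   = abs(estimates - pi),
  std_error   = round(std_error, 4)
)
print(convergence)
```

Because the standard error shrinks with the square root of n, gaining one more correct decimal digit takes roughly 100 times as many points.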

Task 6

Advanced Data Transformation & Feature Engineering

This task focuses on data transformation and feature engineering: min-max normalization (0-1) and z-score standardization are applied to salary, performance_score, and KPI_score, and new features such as performance_category and salary_bracket are added. Distributions before and after transformation are compared with histograms and boxplots to understand how the data structure changes.

library(dplyr)
library(readr)
library(DT)

df <- read_csv("company_dataset.csv")
df <- df %>%
  mutate(
    salary_norm = (salary - min(salary, na.rm=TRUE)) / (max(salary, na.rm=TRUE) - min(salary, na.rm=TRUE)),
    performance_norm = (performance_score - min(performance_score, na.rm=TRUE)) / (max(performance_score, na.rm=TRUE) - min(performance_score, na.rm=TRUE)),
    KPI_norm = (KPI_score - min(KPI_score, na.rm=TRUE)) / (max(KPI_score, na.rm=TRUE) - min(KPI_score, na.rm=TRUE)),
    
    salary_z = (salary - mean(salary, na.rm=TRUE)) / sd(salary, na.rm=TRUE),
    performance_z = (performance_score - mean(performance_score, na.rm=TRUE)) / sd(performance_score, na.rm=TRUE),
    KPI_z = (KPI_score - mean(KPI_score, na.rm=TRUE)) / sd(KPI_score, na.rm=TRUE),
    
    performance_category = case_when(
      performance_score >= 85 ~ "Excellent",
      performance_score >= 70 ~ "Good",
      performance_score >= 55 ~ "Average",
      performance_score >= 40 ~ "Below Average",
      TRUE ~ "Poor"
    ),
    
    salary_bracket = cut(salary, breaks = 3, labels = c("Low", "Medium", "High"))
  )

datatable(
  df,
  caption = "Table: Employee Dataset with Transformation & Feature Engineering",
  options = list(
    pageLength = 10,
    lengthMenu = c(5,10,20,50),
    scrollX = TRUE,
    autoWidth = TRUE
  )
)
library(dplyr)
library(DT)

summary_dept <- df %>%
  group_by(department) %>%
  summarise(
    avg_salary = round(mean(salary, na.rm=TRUE),0),
    avg_salary_norm = round(mean(salary_norm, na.rm=TRUE),2),
    
    avg_performance = round(mean(performance_score, na.rm=TRUE),2),
    avg_performance_norm = round(mean(performance_norm, na.rm=TRUE),2),
    
    avg_KPI = round(mean(KPI_score, na.rm=TRUE),2),
    avg_KPI_norm = round(mean(KPI_norm, na.rm=TRUE),2),
    
    total_employee = n()
  )

datatable(
  summary_dept,
  caption = "Table: Department Summary After Transformation",
  options = list(
    pageLength = 10,
    scrollX = TRUE,      
    autoWidth = TRUE
  )
)
library(ggplot2)
library(plotly)

company <- read.csv("company_dataset.csv")

normalize <- function(x){
  (x - min(x, na.rm=TRUE)) / (max(x, na.rm=TRUE) - min(x, na.rm=TRUE))
}

company$salary_norm <- normalize(company$salary)

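# The normalized values (0 to 1) are multiplied by the maximum salary only so that
# both distributions can share one x-axis; the true normalized scale is 0 to 1.
df_plot <- data.frame(
  value = c(company$salary, company$salary_norm * max(company$salary)),
  type = c(rep("Before", nrow(company)),
           rep("After", nrow(company)))
)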

p <- ggplot(df_plot, aes(x = value, fill = type)) +
  geom_histogram(alpha = 0.5, bins = 30, position = "identity") +
  
  scale_fill_manual(values = c(
    "Before" = "purple",  
    "After"  = "pink"   
  )) +
  
  labs(
    title = "Salary Distribution: Before vs After Normalization",
    x = "Salary",
    y = "Count",
    fill = "Condition"
  ) +
  
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.position = "right"
  )

ggplotly(p)

df_box <- data.frame(
  value = c(company$salary, company$salary_norm * max(company$salary)),
  type = c(rep("Before", nrow(company)),
           rep("After", nrow(company)))
)

p <- ggplot(df_box, aes(x = type, y = value, fill = type)) +
  geom_boxplot(alpha = 0.7) +
  
  scale_fill_manual(values = c(
    "Before" = "purple",  
    "After"  = "pink"   
  )) +
  
  labs(
    title = "Salary Distribution: Before vs After Normalization (Boxplot)",
    x = "Condition",
    y = "Salary",
    fill = "Condition"
  ) +
  
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.position = "none"
  )

ggplotly(p)

Interpretation

Min-max normalization and z-score standardization put variables on a common scale so that they can be compared fairly. Because both are linear transformations, the histograms show that the salary distribution keeps its shape after normalization and is only shifted and rescaled. The boxplots likewise show the median and quartiles in the same relative positions within the range, even though their absolute values change. New features such as performance_category and salary_bracket simplify grouping for performance analysis without discarding the original values.
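A small worked example makes the effect of both transformations concrete. The four salary values below are made up for illustration, and `standardize` is a hypothetical helper name; the formulas are the same ones used in the `mutate()` call in this task.

```r
# Min-max normalization: maps the minimum to 0 and the maximum to 1
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# Z-score standardization: result has mean 0 and standard deviation 1
standardize <- function(x) (x - mean(x)) / sd(x)

salary <- c(2000, 4000, 6000, 10000)   # made-up values for illustration

salary_norm <- normalize(salary)
salary_z    <- standardize(salary)

print(salary_norm)
print(round(salary_z, 3))

# Both are linear maps, so the ordering and relative spacing of the values are
# unchanged and the correlation with the original values is 1
print(cor(salary, salary_norm))
print(cor(salary, salary_z))

# Equal-width brackets, as in the salary_bracket feature
salary_bracket <- cut(salary, breaks = 3, labels = c("Low", "Medium", "High"))
print(as.character(salary_bracket))
```

Here the normalized values are 0, 0.25, 0.5, and 1: the gap between 6000 and 10000 stays twice as large as the gap between 4000 and 6000, which is why the histogram shape is preserved.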

Task 7

Mini Project: Company KPI Dashboard & Simulation

Introduction

Task 7 is a mini project focused on building a Company KPI Dashboard & Simulation using employee data from multiple companies. The data is analyzed to generate performance summaries like average salary, average KPI, and top performers per company. Employees are grouped by KPI levels and analyzed by department for deeper insights. Results are presented in tables and interactive visualizations to simplify interpretation and support data-driven decisions.
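The KPI grouping and the per-company count of top performers can be sketched on a toy data frame as follows. This is an illustration under stated assumptions: the cut-offs of 90 and 75 are the ones the loop in this task uses, `kpi_tier_vec` is a hypothetical vectorized helper, and the five rows are made-up values rather than records from the dataset.

```r
# Hypothetical vectorized tiering: High (>= 90), Medium (>= 75), otherwise Low
kpi_tier_vec <- function(kpi_score) {
  # right = FALSE makes each interval [lower, upper), so exactly 90 is "High"
  as.character(cut(kpi_score,
                   breaks = c(-Inf, 75, 90, Inf),
                   labels = c("Low", "Medium", "High"),
                   right = FALSE))
}

# Made-up rows for illustration only
toy <- data.frame(
  company_id = c("A", "A", "A", "B", "B"),
  KPI_score  = c(60, 75, 92, 90, 89.9)
)

toy$KPI_tier <- kpi_tier_vec(toy$KPI_score)

# Count top performers (KPI >= 90) per company
top_per_company <- tapply(toy$KPI_score >= 90, toy$company_id, sum)

print(toy)
print(top_per_company)
```

The boundary rows show the tier edges: 75 is already "Medium", 90 is already "High", and 89.9 is still "Medium".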

library(ggplot2)
library(dplyr)
library(plotly)
library(DT)
library(htmltools)
library(readr)

company_data <- read_csv("company_dataset7.csv")
## Rows: 411 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): company_id, department
## dbl (4): employee_id, salary, performance_score, KPI_score
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
datatable(
  company_data,
  caption = "Table: Employee Data by Company",
  options = list(
    pageLength = 10,      
    lengthMenu = c(5,10,15,20),
    autoWidth = TRUE
  )
)
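# Preallocate the result instead of growing the vector inside the loop
KPI_tier <- character(nrow(company_data))

# seq_len() also handles an empty data frame, where 1:nrow() would yield c(1, 0)
for(i in seq_len(nrow(company_data))){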
  
  if(company_data$KPI_score[i] >= 90){
    KPI_tier[i] <- "High"
    
  } else if(company_data$KPI_score[i] >= 75){
    KPI_tier[i] <- "Medium"
    
  } else {
    KPI_tier[i] <- "Low"
  }
}

company_data$KPI_tier <- KPI_tier

datatable(company_data)
library(DT)
library(htmltools)

summary_kpi <- company_data %>%
  group_by(company_id) %>%
  summarise(
    avg_salary = round(mean(salary, na.rm=TRUE),0),
    avg_KPI = round(mean(KPI_score, na.rm=TRUE),2),
    total_employee = n(),
    total_top_performer = sum(KPI_score >= 90)
  )

datatable(
  summary_kpi,
  caption = tags$caption(
    style = 'caption-side: top; text-align: center; font-weight: bold;',
    'Table: Company KPI Summary (With Loop)'
  ),
  options = list(scrollX = TRUE),
  rownames = FALSE
)
library(ggplot2)
library(plotly)
library(dplyr)

top_perf <- company_data %>%
  filter(KPI_score >= 90) %>%
  count(company_id) %>%
  arrange(desc(n))

p1 <- ggplot(top_perf, aes(x = reorder(company_id, -n), y = n, fill = company_id)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Top Performers per Company",
    x = "Company",
    y = "Number of Top Performers"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

ggplotly(p1)
avg_salary_dept <- company_data %>%
  group_by(department) %>%
  summarise(avg_salary = mean(salary, na.rm=TRUE)) %>%
  arrange(desc(avg_salary))

p2 <- ggplot(avg_salary_dept, aes(x = reorder(department, -avg_salary), y = avg_salary, fill = department)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Average Salary per Department",
    x = "Department",
    y = "Average Salary"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

ggplotly(p2)
company_data$salary_bracket <- cut(
  company_data$salary,
  breaks = 3,
  labels = c("Low", "Medium", "High")
)

salary_dist <- company_data %>%
  count(salary_bracket) %>%
  arrange(desc(n))

p3 <- ggplot(salary_dist, aes(x = reorder(salary_bracket, -n), y = n, fill = salary_bracket)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Salary Distribution",
    x = "Salary Bracket",
    y = "Number of Employees"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

ggplotly(p3)
p4 <- ggplot(company_data, aes(x = salary, y = performance_score)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Salary vs Performance Score",
    x = "Salary",
    y = "Performance Score"
  ) +
  theme_minimal()

ggplotly(p4)
## `geom_smooth()` using formula = 'y ~ x'

Interpretation

The analysis shows that the number of top performers differs between companies, which points to gaps in overall KPI achievement. The bar chart of average salary by department shows that some departments are paid more than others, possibly because of more demanding roles or greater responsibility.

Most employees fall into the middle salary bracket, which suggests a fairly even pay structure. The scatter plot of salary against performance score suggests a positive relationship, but not a strong one: higher pay does not always come with better results.

Overall, pay and performance are related to some degree, but other factors clearly play a role as well.
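The strength of a relationship like the one between salary and performance can be quantified with the Pearson correlation coefficient instead of being judged by eye. The sketch below uses six made-up rows, not the actual dataset, so its numbers say nothing about the real result; on the real data the same two calls would be applied to `company_data$salary` and `company_data$performance_score`.

```r
# Made-up rows for illustration only (not taken from company_dataset7.csv)
toy <- data.frame(
  salary            = c(3000, 4500, 5200, 6100, 7800, 9000),
  performance_score = c(62, 70, 58, 81, 74, 79)
)

# Pearson correlation: +1 is a perfect positive linear link, 0 is no linear link
r <- cor(toy$salary, toy$performance_score)

# r squared: share of the variance in performance explained by a straight-line fit
r_squared <- r^2

print(round(r, 3))
print(round(r_squared, 3))
```

A positive r well below 1 corresponds to the pattern described above: the trend line slopes upward, but many points lie far from it.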

Conclusion

This analysis shows that functions, loops, and nested loops are effective for processing sales data in a structured, efficient way. Applying discounts through conditional logic makes the net sales calculation more realistic, while cumulative sales make it easy to track each salesperson's performance day by day. Statistical summaries also quickly highlight performance differences between individuals.

Overall, the visualizations show cumulative sales rising steadily once the data is properly processed. This underscores how important preprocessing, such as sorting and data type adjustments, is for accurate, actionable insights that support data-driven decisions.

Reference

[1] Siregar, B. (2025). Data Science Programming: Study Case Using R and Python. Online module. bookdown.org. Retrieved from https://bookdown.org/dsciencelabs/data_science_programming/03-Functions-and-Loops.html