Chapter 1: From Visualization to Numerical Summaries

What You Will Master Today

Building on our visualization skills from Lecture 1, today we dive deep into the mathematical heart of statistics:

Measures of Central Tendency: Mean, median, and mode - finding the “typical” value
Mathematical Formulas: Understanding the precise calculations behind each measure
Data Type Applications: When to use which measure for different variable types
Practical Calculations: Hand calculations and R implementations
Business Decision Making: Using statistical measures for strategic insights
Distribution Analysis: Understanding what the numbers reveal about data shape

Chapter 2: The Foundation - Understanding Our Data Context

Connecting to Real Business Scenarios

Imagine you’re the lead analyst for a European automotive consortium. Your dataset contains critical intelligence about 190 car models that will inform billion-dollar investment decisions. Every statistical measure we calculate today has direct implications for:

Product Development Strategy: Which price points to target
Market Positioning: Understanding competitive landscapes
Performance Benchmarking: Setting engineering targets
Regional Expansion: Geographic market opportunities

# Load our analytical environment
library(UBStats)

## Package UBStats (0.2.2) loaded.
## To cite, type citation("UBStats")

## Please report improvements and bugs to: https://github.com/raffaellapiccarreta/UBStats/issues

# Create the cars dataset for statistical analysis.
# This code reproduces the dataset from Lecture 1 (UBStats is already loaded above).

# Set seed for reproducible results
set.seed(123)  
n <- 190

# Generate sales data with realistic distribution
low_sales <- sample(500:3000, round(n*0.6), replace = TRUE)
mid_sales <- sample(3000:8000, round(n*0.25), replace = TRUE) 
high_sales <- sample(8000:50000, n - length(low_sales) - length(mid_sales), replace = TRUE)
all_sales <- c(low_sales, mid_sales, high_sales)

# Create the complete cars dataset
cars <- data.frame(
  model = paste("Model", 1:n),
  sales = sample(all_sales),  # Shuffle the sales values
  bestselling = sample(0:1, n, replace = TRUE, prob = c(0.9, 0.1)),
  price_num = round(rnorm(n, 25000, 15000)),
  price_classes = sample(c("low", "mid", "high"), n, replace = TRUE, prob = c(0.27, 0.55, 0.18)),
  maxspeed = round(rnorm(n, 180, 30)),
  acceleration = round(rnorm(n, 11, 3), 1),
  urban_fuelcons = round(rnorm(n, 8, 2), 1),
  fueltank = round(rnorm(n, 60, 15)),
  weight = round(rnorm(n, 1400, 300)),
  n_doors_min = sample(c(2,3,4,5,7), n, replace = TRUE, prob = c(0.09, 0.14, 0.05, 0.71, 0.01)),
  country = sample(c("Germany", "Japan", "France", "Italy", "United States", "Europe - others", "Asia - others"), 
                   n, replace = TRUE, prob = c(0.26, 0.19, 0.15, 0.11, 0.09, 0.14, 0.06))
)

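# Clean up unrealistic values by shifting them upward.
# Note: a fixed shift does not guarantee a plausible result; the summary
# further down still reports a minimum price of -1860.
cars$price_num[cars$price_num < 5000] <- cars$price_num[cars$price_num < 5000] + 10000
cars$maxspeed[cars$maxspeed < 100] <- cars$maxspeed[cars$maxspeed < 100] + 50
cars$acceleration[cars$acceleration < 3] <- abs(cars$acceleration[cars$acceleration < 3]) + 5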

# Check the data structure
str(cars)

## 'data.frame':    190 obs. of  12 variables:
##  $ model         : chr  "Model 1" "Model 2" "Model 3" "Model 4" ...
##  $ sales         : int  12712 873 1528 23023 2956 1346 3711 2726 37652 6123 ...
##  $ bestselling   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ price_num     : num  32656 42939 8741 7823 27328 ...
##  $ price_classes : chr  "low" "mid" "low" "mid" ...
##  $ maxspeed      : num  169 137 153 214 158 194 171 205 177 159 ...
##  $ acceleration  : num  4.7 11 18.2 9.8 15 12.1 11.8 10.7 9 10.8 ...
##  $ urban_fuelcons: num  6.2 4.7 5.4 7.4 9.1 11.6 6 7 9.9 7.5 ...
##  $ fueltank      : num  65 46 87 39 85 62 60 40 72 79 ...
##  $ weight        : num  1445 1228 1499 1321 1585 ...
##  $ n_doors_min   : num  5 5 5 4 5 2 5 2 5 5 ...
##  $ country       : chr  "Japan" "Italy" "Italy" "Japan" ...

head(cars)

# Save the dataset
save(cars, file = "stat_datasets_cl17.Rdata")

# Confirm the file was created
cat("✅ Dataset created successfully!\n")

## ✅ Dataset created successfully!

cat("📊 Dataset contains", nrow(cars), "car models with", ncol(cars), "variables\n")

## 📊 Dataset contains 190 car models with 12 variables

cat("💾 Saved as: stat_datasets_cl17.Rdata\n")

## 💾 Saved as: stat_datasets_cl17.Rdata

cat("📁 Location:", getwd(), "\n")

## 📁 Location: C:/Users/ENDRI/Desktop/Virtus

# Quick preview of the data
cat("\n📋 Dataset Summary:\n")

## 
## 📋 Dataset Summary:

summary(cars[c("price_num", "maxspeed", "acceleration")])

##    price_num        maxspeed      acceleration   
##  Min.   :-1860   Min.   :103.0   Min.   : 3.200  
##  1st Qu.:13574   1st Qu.:159.2   1st Qu.: 8.825  
##  Median :24177   Median :178.0   Median :10.900  
##  Mean   :24267   Mean   :180.6   Mean   :10.837  
##  3rd Qu.:33195   3rd Qu.:203.0   3rd Qu.:12.975  
##  Max.   :61570   Max.   :279.0   Max.   :18.200
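The minimum price of -1860 in the summary above shows that adding a fixed amount to low values does not guarantee a plausible result. The following is a minimal sketch of the difference between shifting and clamping. The three raw prices are hypothetical (the first is chosen so that a +10000 shift gives -1860), and the floor of 5000 simply mirrors the threshold used in the clean-up code; `pmax()` is base R.

```r
# Hypothetical raw prices; -11860 is the value that a +10000 shift turns into -1860
raw_prices <- c(-11860, 3000, 25000)
floor_price <- 5000

# Approach used in the clean-up code: shift low values up by a fixed amount
shifted <- raw_prices
shifted[shifted < floor_price] <- shifted[shifted < floor_price] + 10000

# Alternative: clamp every value to the floor, which guarantees the minimum
clamped <- pmax(raw_prices, floor_price)

print(shifted)  # -1860, 13000, 25000
print(clamped)  # 5000, 5000, 25000
```

Clamping piles all low values onto the floor and would change the statistics reported in this lecture, so it is shown only as an alternative to consider, not as the method used here.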

head(cars, 5)

cat("📊 Dataset Overview:\n")

## 📊 Dataset Overview:

cat("   Total Models:", nrow(cars), "\n")

##    Total Models: 190

cat("   Variables:", ncol(cars), "\n")

##    Variables: 12

cat("   Geographic Coverage:", length(unique(cars$country)), "countries\n")

##    Geographic Coverage: 7 countries

Chapter 3: Measures of Central Tendency - Finding the “Typical” Value

The concept of central tendency answers the fundamental question: “What is the typical value in our dataset?” However, “typical” can mean different things depending on context and data characteristics.

3.1 The Arithmetic Mean (x̄) - The Mathematical Center

📐 Mathematical Foundation

The arithmetic mean represents the mathematical center of gravity for your data.

For Raw Data (Ungrouped): \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + ... + x_n}{n}\]

For Frequency Data (Grouped): \[\bar{x} = \frac{\sum_{i=1}^{k} f_i \cdot x_i}{\sum_{i=1}^{k} f_i} = \frac{\sum_{i=1}^{k} f_i \cdot x_i}{n}\]

Where:

\(x_i\) = individual data values or class midpoints
\(f_i\) = frequency of each value/class
\(n\) = total number of observations
\(k\) = number of distinct values/classes

🔍 Business Example: Car Price Analysis

# Example 1: Mean calculation for a subset of car prices
sample_prices <- c(18000, 22000, 25000, 28000, 35000)

# Manual calculation following the formula
sum_prices <- sum(sample_prices)
n_cars <- length(sample_prices)
mean_manual <- sum_prices / n_cars

cat("🚗 Sample Car Prices: $", paste(sample_prices, collapse = ", $"), "\n")

## 🚗 Sample Car Prices: $ 18000, $22000, $25000, $28000, $35000

cat("📊 Manual Calculation:\n")

## 📊 Manual Calculation:

cat("   Sum of prices: $", sum_prices, "\n")

##    Sum of prices: $ 128000

cat("   Number of cars: ", n_cars, "\n")

##    Number of cars:  5

cat("   Mean = ", sum_prices, " ÷ ", n_cars, " = $", round(mean_manual, 2), "\n")

##    Mean =  128000  ÷  5  = $ 25600

# Verification with R function
mean_r <- mean(sample_prices)
cat("✅ R function verification: $", round(mean_r, 2), "\n")

## ✅ R function verification: $ 25600
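The "center of gravity" description of the mean can be checked directly: deviations from the mean always sum to zero, so the values below the mean exactly balance the values above it. A short sketch using the same five sample prices (the variable name `deviations` is introduced here for illustration):

```r
# Same sample prices as in the example above
sample_prices <- c(18000, 22000, 25000, 28000, 35000)

# Deviation of each price from the mean
deviations <- sample_prices - mean(sample_prices)

print(deviations)       # -7600, -3600, -600, 2400, 9400
print(sum(deviations))  # 0
```

With non-integer data the sum may differ from zero by a tiny floating-point error, so comparing with `all.equal()` is safer than `==` in general.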

🌟 Advanced Example: Weighted Mean for Grouped Data

Let’s calculate the mean using frequency data from our car ownership example:

# Car ownership data from the lecture slides
cars_owned <- c(1, 2, 3, 5)
frequencies <- c(32, 48, 16, 4)
n_families <- sum(frequencies)

cat("🏠 Family Car Ownership Analysis:\n")

## 🏠 Family Car Ownership Analysis:

cat("Cars Owned: ", paste(cars_owned, collapse = ", "), "\n")

## Cars Owned:  1, 2, 3, 5

cat("Frequencies: ", paste(frequencies, collapse = ", "), "\n")

## Frequencies:  32, 48, 16, 4

# Manual weighted mean calculation
weighted_sum <- sum(cars_owned * frequencies)
weighted_mean <- weighted_sum / n_families

cat("\n📊 Weighted Mean Calculation:\n")

## 
## 📊 Weighted Mean Calculation:

cat("   Σ(xi × fi) = ", paste(cars_owned, "×", frequencies, collapse = " + "), "\n")

##    Σ(xi × fi) =  1 × 32 + 2 × 48 + 3 × 16 + 5 × 4

cat("             = ", paste(cars_owned * frequencies, collapse = " + "), "\n")

##              =  32 + 96 + 48 + 20

cat("             = ", weighted_sum, "\n")

##              =  196

cat("   Mean = ", weighted_sum, " ÷ ", n_families, " = ", round(weighted_mean, 3), " cars per family\n")

##    Mean =  196  ÷  100  =  1.96  cars per family

# Business interpretation
cat("\n💡 Business Insight: The average family owns ", round(weighted_mean, 2), " cars\n")

## 
## 💡 Business Insight: The average family owns  1.96  cars
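The hand calculation can be cross-checked in two ways with base R: `weighted.mean()` applies the frequency formula directly, and `rep()` expands the frequency table back into raw data. The sketch below reuses the car ownership values; the names `by_formula`, `by_weighted_mean`, and `by_expansion` are illustrative.

```r
# Car ownership frequency table from the example above
cars_owned <- c(1, 2, 3, 5)
frequencies <- c(32, 48, 16, 4)

# 1. Frequency formula: sum(f_i * x_i) / sum(f_i)
by_formula <- sum(cars_owned * frequencies) / sum(frequencies)

# 2. Base R weighted mean, using frequencies as weights
by_weighted_mean <- weighted.mean(cars_owned, w = frequencies)

# 3. Expand the table into 100 raw observations and take the ordinary mean
by_expansion <- mean(rep(cars_owned, times = frequencies))

print(c(by_formula, by_weighted_mean, by_expansion))  # all 1.96
```

All three routes agree because the grouped formula is just the ordinary mean with repeated values collected together.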

🎯 Real Dataset Application

# Calculate mean for key variables in our cars dataset
price_mean <- mean(cars$price_num, na.rm = TRUE)
speed_mean <- mean(cars$maxspeed, na.rm = TRUE)
accel_mean <- mean(cars$acceleration, na.rm = TRUE)

cat("🚗 AUTOMOTIVE MARKET AVERAGES:\n")

## 🚗 AUTOMOTIVE MARKET AVERAGES:

cat("💰 Average Price: $", round(price_mean, 0), "\n")

## 💰 Average Price: $ 24267

cat("⚡ Average Max Speed: ", round(speed_mean, 1), " km/h\n")

## ⚡ Average Max Speed:  180.6  km/h

cat("🏁 Average Acceleration: ", round(accel_mean, 2), " seconds (0-100 km/h)\n")

## 🏁 Average Acceleration:  10.84  seconds (0-100 km/h)

# Strategic implications
cat("\n🎯 STRATEGIC IMPLICATIONS:\n")

## 
## 🎯 STRATEGIC IMPLICATIONS:

cat("   • New models should target price point around $", round(price_mean, 0), "\n")

##    • New models should target price point around $ 24267

cat("   • Performance benchmark: ", round(speed_mean, 0), " km/h max speed\n")

##    • Performance benchmark:  181  km/h max speed

cat("   • Acceleration target: Under ", round(accel_mean, 1), " seconds for competitiveness\n")

##    • Acceleration target: Under  10.8  seconds for competitiveness

3.2 The Median (Me) - The Positional Center

📐 Mathematical Foundation

The median is the value in the middle position once the data are sorted. At least half of the observations are less than or equal to it, and at least half are greater than or equal to it.

For Odd n: \(Me = x_{(\frac{n+1}{2})}\)

For Even n: \(Me = \frac{x_{(\frac{n}{2})} + x_{(\frac{n}{2}+1)}}{2}\)

For Grouped Data: \(Me = L + \frac{\frac{n}{2} - CF_{before}}{f_{median}} \times h\)

Where:

\(L\) = lower boundary of median class
\(CF_{before}\) = cumulative frequency before median class
\(f_{median}\) = frequency of median class
\(h\) = class width

🔍 Step-by-Step Median Calculation

# Example with airline ticket prices
ticket_prices <- c(80, 120, 150, 90)

cat("✈️ Airline Ticket Prices to London:\n")

## ✈️ Airline Ticket Prices to London:

cat("Original data: $", paste(ticket_prices, collapse = ", $"), "\n")

## Original data: $ 80, $120, $150, $90

# Step 1: Sort the data
sorted_prices <- sort(ticket_prices)
cat("Sorted data: $", paste(sorted_prices, collapse = ", $"), "\n")

## Sorted data: $ 80, $90, $120, $150

# Step 2: Find the median position
n <- length(sorted_prices)
cat("n =", n, "(even number)\n")

## n = 4 (even number)

# Step 3: Calculate median
if (n %% 2 == 0) {
  # Even number of observations
  pos1 <- n / 2
  pos2 <- (n / 2) + 1
  median_manual <- (sorted_prices[pos1] + sorted_prices[pos2]) / 2
  
  cat("Median position: (", pos1, " + ", pos2, ") ÷ 2\n")
  cat("Median = ($", sorted_prices[pos1], " + $", sorted_prices[pos2], ") ÷ 2 = $", median_manual, "\n")
} else {
  # Odd number of observations
  pos <- (n + 1) / 2
  median_manual <- sorted_prices[pos]
  cat("Median position:", pos, "\n")
  cat("Median = $", median_manual, "\n")
}

## Median position: ( 2  +  3 ) ÷ 2
## Median = ($ 90  + $ 120 ) ÷ 2 = $ 105

# Verification
median_r <- median(ticket_prices)
cat("✅ R verification: $", median_r, "\n")

## ✅ R verification: $ 105

🏆 Advanced Example: Median for Car Speed Data

# Using the speed distribution from lecture slides
speed_intervals <- c("[0,30)", "[30,50)", "[50,100)")
frequencies_speed <- c(4, 12, 8)
n_total <- sum(frequencies_speed)

# Calculate cumulative frequencies
cumulative_freq <- cumsum(frequencies_speed)
cat("🏎️ Car Speed Distribution Analysis:\n")

## 🏎️ Car Speed Distribution Analysis:

cat("Intervals: ", paste(speed_intervals, collapse = ", "), "\n")

## Intervals:  [0,30), [30,50), [50,100)

cat("Frequencies: ", paste(frequencies_speed, collapse = ", "), "\n")

## Frequencies:  4, 12, 8

cat("Cumulative frequencies: ", paste(cumulative_freq, collapse = ", "), "\n")

## Cumulative frequencies:  4, 16, 24

# Find median class
median_position <- n_total / 2
cat("\nMedian position: n/2 =", n_total, "÷ 2 =", median_position, "\n")

## 
## Median position: n/2 = 24 ÷ 2 = 12

# Identify median class
median_class_index <- which(cumulative_freq >= median_position)[1]
cat("Median class:", speed_intervals[median_class_index], "\n")

## Median class: [30,50)

# For this example, median class is [30,50)
# Using the median formula for grouped data
L <- 30  # Lower boundary of median class
CF_before <- 4  # Cumulative frequency before median class
f_median <- 12  # Frequency of median class
h <- 20  # Class width

median_grouped <- L + ((median_position - CF_before) / f_median) * h

cat("\n📊 Median Calculation for Grouped Data:\n")

## 
## 📊 Median Calculation for Grouped Data:

cat("   L (lower boundary) =", L, "\n")

##    L (lower boundary) = 30

cat("   n/2 =", median_position, "\n")

##    n/2 = 12

cat("   CF_before =", CF_before, "\n")

##    CF_before = 4

cat("   f_median =", f_median, "\n")

##    f_median = 12

cat("   h (class width) =", h, "\n")

##    h (class width) = 20

cat("   Median = ", L, " + ((", median_position, " - ", CF_before, ") ÷ ", f_median, ") × ", h, "\n")

##    Median =  30  + (( 12  -  4 ) ÷  12 ) ×  20

cat("          = ", L, " + (", (median_position - CF_before), " ÷ ", f_median, ") × ", h, "\n")

##           =  30  + ( 8  ÷  12 ) ×  20

cat("          = ", L, " + ", round((median_position - CF_before) / f_median, 3), " × ", h, "\n")

##           =  30  +  0.667  ×  20

cat("          = ", round(median_grouped, 2), " km/h\n")

##           =  43.33  km/h
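The worked example types in the values of L, CF_before, f_median, and h by hand. As a sketch, the same formula can be wrapped in a small function that derives those quantities from the class table. The function name `grouped_median` and its arguments are hypothetical and not part of UBStats.

```r
# Median for grouped data: L + ((n/2 - CF_before) / f_median) * h
# lower, upper: class boundaries; freq: class frequencies (all same length)
grouped_median <- function(lower, upper, freq) {
  stopifnot(length(lower) == length(upper), length(lower) == length(freq))
  half_n <- sum(freq) / 2
  cum_freq <- cumsum(freq)
  idx <- which(cum_freq >= half_n)[1]               # median class
  cf_before <- if (idx > 1) cum_freq[idx - 1] else 0
  class_width <- upper[idx] - lower[idx]
  lower[idx] + (half_n - cf_before) / freq[idx] * class_width
}

# Speed classes from the example: [0,30), [30,50), [50,100)
speed_median_grouped <- grouped_median(
  lower = c(0, 30, 50),
  upper = c(30, 50, 100),
  freq  = c(4, 12, 8)
)
print(round(speed_median_grouped, 2))  # 43.33
```

The result matches the step-by-step calculation, and the function handles the case where the median falls in the first class by treating the cumulative frequency before it as zero.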

🔍 Real Dataset Median Analysis

# Calculate medians for our car dataset
price_median <- median(cars$price_num, na.rm = TRUE)
speed_median <- median(cars$maxspeed, na.rm = TRUE)
accel_median <- median(cars$acceleration, na.rm = TRUE)

cat("🚗 AUTOMOTIVE MARKET MEDIANS:\n")

## 🚗 AUTOMOTIVE MARKET MEDIANS:

cat("💰 Median Price: $", round(price_median, 0), "\n")

## 💰 Median Price: $ 24177

cat("⚡ Median Max Speed: ", round(speed_median, 1), " km/h\n")

## ⚡ Median Max Speed:  178  km/h

cat("🏁 Median Acceleration: ", round(accel_median, 2), " seconds\n")

## 🏁 Median Acceleration:  10.9  seconds

# Compare with means calculated earlier
cat("\n📊 MEAN vs MEDIAN COMPARISON:\n")

## 
## 📊 MEAN vs MEDIAN COMPARISON:

cat("Price: Mean $", round(price_mean, 0), " vs Median $", round(price_median, 0), "\n")

## Price: Mean $ 24267  vs Median $ 24177

cat("Speed: Mean ", round(speed_mean, 1), " vs Median ", round(speed_median, 1), " km/h\n")

## Speed: Mean  180.6  vs Median  178  km/h

cat("Acceleration: Mean ", round(accel_mean, 2), " vs Median ", round(accel_median, 2), " seconds\n")

## Acceleration: Mean  10.84  vs Median  10.9  seconds

# Skewness interpretation (a mean within 1% of the median is treated as equal)
price_gap <- price_mean - price_median
if (abs(price_gap) < 0.01 * abs(price_median)) {
  cat("\n🔍 Price distribution: SYMMETRIC (mean ≈ median)\n")
} else if (price_gap > 0) {
  cat("\n🔍 Price distribution: RIGHT-SKEWED (mean > median)\n")
  cat("   Interpretation: Some very expensive luxury cars pull the average up\n")
} else {
  cat("\n🔍 Price distribution: LEFT-SKEWED (mean < median)\n")
}

## 
## 🔍 Price distribution: SYMMETRIC (mean ≈ median)

3.3 The Mode (Mo) - The Most Popular Value

📐 Mathematical Foundation

The mode is the value that appears most frequently in the dataset.

For Discrete Data: The value(s) with highest frequency

For Grouped Data (Modal Class Method): \[Mo = L + \frac{f_1}{f_1 + f_2} \times h\]

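Where:

\(L\) = lower boundary of modal class
\(f_1\) = difference between modal class frequency and previous class frequency
\(f_2\) = difference between modal class frequency and next class frequency
\(h\) = class width

This formula assumes classes of equal width. When class widths differ, the modal class should be identified by frequency density (frequency ÷ class width) rather than by raw frequency.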

🎯 Mode Identification Examples

# Example 1: Simple mode identification
car_doors <- c(2, 3, 4, 5, 5, 5, 5, 4, 3, 2, 5, 5)

cat("🚪 Car Door Configuration Analysis:\n")

## 🚪 Car Door Configuration Analysis:

cat("Data: ", paste(car_doors, collapse = ", "), "\n")

## Data:  2, 3, 4, 5, 5, 5, 5, 4, 3, 2, 5, 5

# Create frequency table
door_freq <- table(car_doors)
print(door_freq)

## car_doors
## 2 3 4 5 
## 2 2 2 6

# Find mode
mode_value <- as.numeric(names(door_freq)[which.max(door_freq)])
mode_frequency <- max(door_freq)

cat("Mode: ", mode_value, " doors (appears ", mode_frequency, " times)\n")

## Mode:  5  doors (appears  6  times)

cat("💡 Business insight: ", mode_value, "-door configuration is most popular\n")

## 💡 Business insight:  5 -door configuration is most popular

# Example 2: Mode in our real dataset
country_freq <- table(cars$country)
modal_country <- names(country_freq)[which.max(country_freq)]
modal_freq <- max(country_freq)

cat("\n🌍 Manufacturing Country Mode:\n")

## 
## 🌍 Manufacturing Country Mode:

cat("Modal country: ", modal_country, " (", modal_freq, " models)\n")

## Modal country:  Germany  ( 40  models)

cat("💡 Strategic insight: ", modal_country, " is the most common country of origin\n")

## 💡 Strategic insight:  Germany  is the most common country of origin
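`which.max()` returns only the first maximum, so it reports a single mode even when several values tie for the highest frequency. The sketch below shows one way to return every mode; the helper name `all_modes` is hypothetical, and it returns the modes as character strings because they are taken from the names of the frequency table.

```r
# Return every value that shares the highest frequency
all_modes <- function(x) {
  freq <- table(x)
  names(freq)[as.vector(freq) == max(freq)]
}

# Two values tie with frequency 2, so both are modes
tied_doors <- c(2, 2, 3, 3, 4)
print(all_modes(tied_doors))    # "2" "3"

# A single mode is returned as a length-one vector
colors <- c("Blue", "Red", "Blue")
print(all_modes(colors))        # "Blue"
```

For numeric data the result can be converted back with `as.numeric()` if it is needed in further calculations.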

🏆 Advanced Mode Calculation for Grouped Data

# Using our speed data example
cat("🏎️ Modal Speed Class Calculation:\n")

## 🏎️ Modal Speed Class Calculation:

cat("Speed intervals and frequencies:\n")

## Speed intervals and frequencies:

speed_data <- data.frame(
  Interval = c("[0,30)", "[30,50)", "[50,100)"),
  Frequency = c(4, 12, 8),
  Lower_Bound = c(0, 30, 50),
  Upper_Bound = c(30, 50, 100),
  Class_Width = c(30, 20, 50)
)
print(speed_data)

##   Interval Frequency Lower_Bound Upper_Bound Class_Width
## 1   [0,30)         4           0          30          30
## 2  [30,50)        12          30          50          20
## 3 [50,100)         8          50         100          50

# Identify modal class
modal_class_index <- which.max(speed_data$Frequency)
modal_class <- speed_data$Interval[modal_class_index]
modal_freq <- speed_data$Frequency[modal_class_index]

cat("\nModal class: ", modal_class, " with frequency ", modal_freq, "\n")

## 
## Modal class:  [30,50)  with frequency  12

# Calculate mode using the formula
L <- speed_data$Lower_Bound[modal_class_index]
f1 <- modal_freq - speed_data$Frequency[modal_class_index - 1]
f2 <- modal_freq - speed_data$Frequency[modal_class_index + 1]
h <- speed_data$Class_Width[modal_class_index]

mode_grouped <- L + (f1 / (f1 + f2)) * h

cat("\n📊 Mode Calculation:\n")

## 
## 📊 Mode Calculation:

cat("   L (lower boundary) =", L, "\n")

##    L (lower boundary) = 30

cat("   f1 (", modal_freq, " - ", speed_data$Frequency[modal_class_index - 1], ") =", f1, "\n")

##    f1 ( 12  -  4 ) = 8

cat("   f2 (", modal_freq, " - ", speed_data$Frequency[modal_class_index + 1], ") =", f2, "\n")

##    f2 ( 12  -  8 ) = 4

cat("   h (class width) =", h, "\n")

##    h (class width) = 20

cat("   Mode = ", L, " + (", f1, " ÷ (", f1, " + ", f2, ")) × ", h, "\n")

##    Mode =  30  + ( 8  ÷ ( 8  +  4 )) ×  20

cat("        = ", L, " + ", round(f1/(f1+f2), 3), " × ", h, "\n")

##         =  30  +  0.667  ×  20

cat("        = ", round(mode_grouped, 2), " km/h\n")

##         =  43.33  km/h
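The speed classes have unequal widths (30, 20, and 50), while the modal class formula is normally stated for equal-width classes. One common adjustment replaces frequencies with frequency densities (frequency ÷ width) both when choosing the modal class and when computing the two differences. The sketch below is that variant, not the calculation used in the lecture, and the function name `grouped_mode_density` is hypothetical.

```r
# Grouped mode using frequency densities instead of raw frequencies
grouped_mode_density <- function(lower, upper, freq) {
  width <- upper - lower
  density <- freq / width
  idx <- which.max(density)                     # modal class by density
  k <- length(density)
  d_prev <- if (idx > 1) density[idx - 1] else 0
  d_next <- if (idx < k) density[idx + 1] else 0
  delta1 <- density[idx] - d_prev               # gap to previous class
  delta2 <- density[idx] - d_next               # gap to next class
  lower[idx] + delta1 / (delta1 + delta2) * width[idx]
}

mode_density <- grouped_mode_density(
  lower = c(0, 30, 50),
  upper = c(30, 50, 100),
  freq  = c(4, 12, 8)
)
print(round(mode_density, 2))  # 40.29
```

Here the modal class is still [30,50), but the estimate moves from 43.33 to about 40.29 km/h because the wide [50,100) class has a lower density than its raw frequency of 8 suggests.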

Chapter 4: When to Use Each Measure - The Decision Framework

4.1 The Statistical Decision Tree

Understanding when to use mean, median, or mode is crucial for accurate analysis:

cat("📋 CENTRAL TENDENCY DECISION FRAMEWORK:\n")

## 📋 CENTRAL TENDENCY DECISION FRAMEWORK:

cat("=" , rep("=", 50), "\n", sep="")

## ===================================================

cat("\n🎯 USE MEAN when:\n")

## 
## 🎯 USE MEAN when:

cat("   ✓ Data is approximately symmetric\n")

##    ✓ Data is approximately symmetric

cat("   ✓ No extreme outliers present\n")

##    ✓ No extreme outliers present

cat("   ✓ Working with interval/ratio data\n")

##    ✓ Working with interval/ratio data

cat("   ✓ Need mathematical precision\n")

##    ✓ Need mathematical precision

cat("   ✓ Planning to use in further calculations\n")

##    ✓ Planning to use in further calculations

cat("\n🎯 USE MEDIAN when:\n")

## 
## 🎯 USE MEDIAN when:

cat("   ✓ Data is skewed (left or right)\n")

##    ✓ Data is skewed (left or right)

cat("   ✓ Outliers are present\n")

##    ✓ Outliers are present

cat("   ✓ Working with ordinal data\n")

##    ✓ Working with ordinal data

cat("   ✓ Need robust measure (resistant to extremes)\n")

##    ✓ Need robust measure (resistant to extremes)

cat("   ✓ Income, house prices, or similar economic data\n")

##    ✓ Income, house prices, or similar economic data

cat("\n🎯 USE MODE when:\n")

## 
## 🎯 USE MODE when:

cat("   ✓ Working with nominal (categorical) data\n")

##    ✓ Working with nominal (categorical) data

cat("   ✓ Need the most frequent category\n")

##    ✓ Need the most frequent category

cat("   ✓ Business decisions based on popularity\n")

##    ✓ Business decisions based on popularity

cat("   ✓ Quality control (most common defect)\n")

##    ✓ Quality control (most common defect)

cat("   ✓ Market research (most preferred option)\n")

##    ✓ Market research (most preferred option)
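The framework lists the median for ordinal data, but base R's `median()` stops with an error when given a factor. One workaround, sketched here, is `quantile()` with `type = 1`, which base R supports for ordered factors. The price classes below are hypothetical values in the style of the `price_classes` variable.

```r
# Hypothetical ordinal data with an explicit level order
price_class <- factor(
  c("low", "mid", "mid", "high", "low", "mid", "high"),
  levels = c("low", "mid", "high"),
  ordered = TRUE
)

# type = 1 returns an observed category rather than interpolating
ordinal_median <- quantile(price_class, probs = 0.5, type = 1, names = FALSE)
print(as.character(ordinal_median))  # "mid"

# The mode works for any categorical variable, ordered or not
class_freq <- table(price_class)
print(names(class_freq)[which.max(class_freq)])  # "mid"
```

The level order must be set explicitly; without `levels = ...`, R would sort the categories alphabetically ("high", "low", "mid") and the ordinal median would be meaningless.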

🔍 Practical Application Examples

# Example 1: Symmetric data - use mean
fuel_efficiency <- c(7.2, 7.8, 8.1, 8.3, 8.5, 8.7, 9.1, 9.3)
cat("⛽ Fuel Efficiency Data (L/100km): ", paste(fuel_efficiency, collapse = ", "), "\n")

## ⛽ Fuel Efficiency Data (L/100km):  7.2, 7.8, 8.1, 8.3, 8.5, 8.7, 9.1, 9.3

cat("   Distribution: Approximately symmetric\n")

##    Distribution: Approximately symmetric

cat("   Best measure: MEAN = ", round(mean(fuel_efficiency), 2), " L/100km\n")

##    Best measure: MEAN =  8.38  L/100km

# Example 2: Skewed data - use median
executive_salaries <- c(45000, 48000, 52000, 55000, 58000, 62000, 350000)
cat("\n💼 Executive Salaries: $", paste(executive_salaries, collapse = ", $"), "\n")

## 
## 💼 Executive Salaries: $ 45000, $48000, $52000, $55000, $58000, $62000, $350000

cat("   Distribution: Right-skewed (one very high salary)\n")

##    Distribution: Right-skewed (one very high salary)

cat("   Mean: $", round(mean(executive_salaries), 0), " (pulled up by outlier)\n")

##    Mean: $ 95714  (pulled up by outlier)

cat("   Median: $", round(median(executive_salaries), 0), " (more representative)\n")

##    Median: $ 55000  (more representative)

cat("   Best measure: MEDIAN\n")

##    Best measure: MEDIAN

# Example 3: Categorical data - use mode
preferred_colors <- c("Blue", "Red", "Blue", "Green", "Blue", "Red", "Blue", "White")
color_freq <- table(preferred_colors)
modal_color <- names(color_freq)[which.max(color_freq)]
cat("\n🎨 Preferred Car Colors: ", paste(preferred_colors, collapse = ", "), "\n")

## 
## 🎨 Preferred Car Colors:  Blue, Red, Blue, Green, Blue, Red, Blue, White

cat("   Best measure: MODE = ", modal_color, " (most frequent choice)\n")

##    Best measure: MODE =  Blue  (most frequent choice)
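The salary example shows why the median is called robust. As a small sketch, replacing the single extreme salary with a hypothetical value of 65000 changes the mean substantially while leaving the median untouched:

```r
# Salaries from Example 2, including the extreme value
executive_salaries <- c(45000, 48000, 52000, 55000, 58000, 62000, 350000)

# Hypothetical variant: the top salary replaced by 65000
salaries_no_outlier <- replace(executive_salaries, 7, 65000)

cat("Mean with outlier:     ", round(mean(executive_salaries), 0), "\n")   # 95714
cat("Mean without outlier:  ", round(mean(salaries_no_outlier), 0), "\n")  # 55000
cat("Median with outlier:   ", median(executive_salaries), "\n")           # 55000
cat("Median without outlier:", median(salaries_no_outlier), "\n")          # 55000
```

One observation out of seven moves the mean by more than 40000, whereas the median depends only on the middle position and stays at 55000.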

Chapter 5: Comprehensive Analysis with R - UBStats Functions

5.1 Professional Statistical Summaries

cat("🔧 PROFESSIONAL STATISTICAL ANALYSIS USING UBStats:\n")

## 🔧 PROFESSIONAL STATISTICAL ANALYSIS USING UBStats:

cat("=" , rep("=", 60), "\n", sep="")

## =============================================================

# Central tendency analysis for price
cat("\n💰 PRICE ANALYSIS:\n")

## 
## 💰 PRICE ANALYSIS:

price_central <- distr.summary.x(cars$price_num, stats="central")

##    n n.a  mode n.modes  mode% median     mean
##  190   0 26809       1 0.0105  24177 24267.47

print(price_central)

## $`Central tendency measures`
##     n n.a  mode n.modes      mode% median     mean
## 1 190   0 26809       1 0.01052632  24177 24267.47

# Central tendency analysis for performance
cat("\n🏁 ACCELERATION ANALYSIS:\n")

## 
## 🏁 ACCELERATION ANALYSIS:

accel_central <- distr.summary.x(cars$acceleration, stats="central")

##    n n.a mode n.modes  mode% median  mean
##  190   0 12.3       2 0.0263   10.9 10.84

print(accel_central)

## $`Central tendency measures`
##     n n.a mode n.modes      mode% median     mean
## 1 190   0 12.3       2 0.02631579   10.9 10.83684

# Central tendency analysis for speed
cat("\n⚡ MAX SPEED ANALYSIS:\n")

## 
## ⚡ MAX SPEED ANALYSIS:

speed_central <- distr.summary.x(cars$maxspeed, stats="central")

##    n n.a mode n.modes  mode% median   mean
##  190   0  172       1 0.0368    178 180.63

print(speed_central)

## $`Central tendency measures`
##     n n.a mode n.modes      mode% median     mean
## 1 190   0  172       1 0.03684211    178 180.6316

🎯 Business Intelligence Dashboard

cat("\n📊 AUTOMOTIVE MARKET INTELLIGENCE DASHBOARD:\n")

## 
## 📊 AUTOMOTIVE MARKET INTELLIGENCE DASHBOARD:

cat("=" , rep("=", 55), "\n", sep="")

## ========================================================

# Price intelligence
cat("\n💰 PRICE INTELLIGENCE:\n")

## 
## 💰 PRICE INTELLIGENCE:

cat("   Mean Price: $", round(mean(cars$price_num, na.rm = TRUE), 0), "\n")

##    Mean Price: $ 24267

cat("   Median Price: $", round(median(cars$price_num, na.rm = TRUE), 0), "\n")

##    Median Price: $ 24177

cat("   Price Range: $", round(min(cars$price_num, na.rm = TRUE), 0), 
    " - $", round(max(cars$price_num, na.rm = TRUE), 0), "\n")

##    Price Range: $ -1860  - $ 61570

# Performance benchmarks
cat("\n🏎️ PERFORMANCE BENCHMARKS:\n")

## 
## 🏎️ PERFORMANCE BENCHMARKS:

cat("   Average Top Speed: ", round(mean(cars$maxspeed, na.rm = TRUE), 1), " km/h\n")

##    Average Top Speed:  180.6  km/h

cat("   Median Acceleration: ", round(median(cars$acceleration, na.rm = TRUE), 2), " seconds\n")

##    Median Acceleration:  10.9  seconds

# Market segmentation insights
cat("\n🎯 MARKET SEGMENTATION INSIGHTS:\n")

## 
## 🎯 MARKET SEGMENTATION INSIGHTS:

price_q1 <- quantile(cars$price_num, 0.25, na.rm = TRUE)
price_q3 <- quantile(cars$price_num, 0.75, na.rm = TRUE)

cat("   Budget Segment (bottom 25%): Under $", round(price_q1, 0), "\n")

##    Budget Segment (bottom 25%): Under $ 13574

cat("   Mid-Market (25%-75%): $", round(price_q1, 0), " - $", round(price_q3, 0), "\n")

##    Mid-Market (25%-75%): $ 13574  - $ 33195

cat("   Premium Segment (top 25%): Above $", round(price_q3, 0), "\n")

##    Premium Segment (top 25%): Above $ 33195

# Country analysis
country_mode_freq <- table(cars$country)
top_country <- names(country_mode_freq)[which.max(country_mode_freq)]
cat("\n🌍 GEOGRAPHIC INTELLIGENCE:\n")

## 
## 🌍 GEOGRAPHIC INTELLIGENCE:

cat("   Top Manufacturing Country: ", top_country, "\n")

##    Top Manufacturing Country:  Germany

cat("   Share of Models: ", round(max(country_mode_freq)/nrow(cars)*100, 1), "%\n")

##    Share of Models:  21.1 %
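The quartile cut-offs in the dashboard can also be used to label each model with its segment. The sketch below does this with base R's `cut()` on eight hypothetical prices; the segment labels follow the dashboard wording, and the variable names are illustrative.

```r
# Hypothetical prices for eight models
prices <- c(9000, 14000, 18000, 22000, 25000, 30000, 36000, 48000)

# Quartile cut-offs (default quantile type, as in the dashboard)
q1 <- unname(quantile(prices, 0.25))
q3 <- unname(quantile(prices, 0.75))

# Assign each price to a segment; intervals are closed on the right
segment <- cut(
  prices,
  breaks = c(-Inf, q1, q3, Inf),
  labels = c("Budget", "Mid-Market", "Premium")
)

print(data.frame(price = prices, segment = segment))
print(table(segment))  # Budget 2, Mid-Market 4, Premium 2
```

Because `cut()` closes intervals on the right by default, a price exactly equal to the first quartile is labeled "Budget"; pass `right = FALSE` if the boundary should belong to the higher segment instead.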

Chapter 6: Advanced Topics - Distribution Shape and Skewness

6.1 Mathematical Relationship Between Mean, Median, and Mode

The relationship between these three measures reveals crucial information about data distribution shape:

cat("📈 DISTRIBUTION SHAPE ANALYSIS:\n")

## 📈 DISTRIBUTION SHAPE ANALYSIS:

cat("=" , rep("=", 45), "\n", sep="")

## ==============================================

# Calculate measures for different variables
variables <- c("price_num", "maxspeed", "acceleration", "weight")
variable_names <- c("Price", "Max Speed", "Acceleration", "Weight")

for (i in seq_along(variables)) {
  var_data <- cars[[variables[i]]]
  var_mean <- mean(var_data, na.rm = TRUE)
  var_median <- median(var_data, na.rm = TRUE)
  
  cat("\n", variable_names[i], ":\n")
  cat("   Mean: ", round(var_mean, 2), "\n")
  cat("   Median: ", round(var_median, 2), "\n")
  cat("   Difference (Mean - Median): ", round(var_mean - var_median, 2), "\n")
  
  if (abs(var_mean - var_median) < 0.01 * abs(var_median)) {
    cat("   Shape: SYMMETRIC (mean ≈ median)\n")
  } else if (var_mean > var_median) {
    cat("   Shape: RIGHT-SKEWED (mean > median)\n")
    cat("   Interpretation: Tail extends toward higher values\n")
  } else {
    cat("   Shape: LEFT-SKEWED (mean < median)\n")
    cat("   Interpretation: Tail extends toward lower values\n")
  }
}

## 
##  Price :
##    Mean:  24267.47 
##    Median:  24177 
##    Difference (Mean - Median):  90.47 
##    Shape: SYMMETRIC (mean ≈ median)
## 
##  Max Speed :
##    Mean:  180.63 
##    Median:  178 
##    Difference (Mean - Median):  2.63 
##    Shape: RIGHT-SKEWED (mean > median)
##    Interpretation: Tail extends toward higher values
## 
##  Acceleration :
##    Mean:  10.84 
##    Median:  10.9 
##    Difference (Mean - Median):  -0.06 
##    Shape: SYMMETRIC (mean ≈ median)
## 
##  Weight :
##    Mean:  1373.85 
##    Median:  1369.5 
##    Difference (Mean - Median):  4.35 
##    Shape: SYMMETRIC (mean ≈ median)
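The mean-versus-median comparison in the loop can be packaged as a reusable helper. This is a sketch only: the function name `classify_shape` and the `tol` argument are hypothetical, and the 1% relative tolerance is the same rule of thumb used in the loop. Comparing mean and median is a heuristic for shape, not a formal test.

```r
# Classify distribution shape from the gap between mean and median
classify_shape <- function(x, tol = 0.01) {
  x_mean <- mean(x, na.rm = TRUE)
  x_median <- median(x, na.rm = TRUE)
  gap <- x_mean - x_median
  if (abs(gap) <= tol * abs(x_median)) {
    "symmetric"
  } else if (gap > 0) {
    "right-skewed"
  } else {
    "left-skewed"
  }
}

print(classify_shape(c(1, 2, 3, 4, 5)))     # "symmetric"
print(classify_shape(c(1, 2, 3, 4, 100)))   # "right-skewed"
print(classify_shape(c(-100, 1, 2, 3, 4)))  # "left-skewed"
```

Using `abs()` on the median keeps the tolerance positive for variables whose median is negative, and `<=` ensures that an exact tie is reported as symmetric even when the median is zero.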

6.2 Coefficient of Skewness

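The coefficient of skewness used here is Pearson's second (median-based) skewness coefficient. It gives a numerical measure of distribution asymmetry: positive values indicate a right skew, negative values a left skew.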

\[SK = \frac{3(\bar{x} - Me)}{s}\]

Where \(s\) is the standard deviation.

cat("\n📐 COEFFICIENT OF SKEWNESS ANALYSIS:\n")

## 
## 📐 COEFFICIENT OF SKEWNESS ANALYSIS:

cat("=" , rep("=", 45), "\n", sep="")

## ==============================================

# Calculate skewness coefficient for price
price_mean <- mean(cars$price_num, na.rm = TRUE)
price_median <- median(cars$price_num, na.rm = TRUE)
price_sd <- sd(cars$price_num, na.rm = TRUE)

skewness_coeff <- 3 * (price_mean - price_median) / price_sd

cat("Price Distribution Skewness:\n")

## Price Distribution Skewness:

cat("   Mean: $", round(price_mean, 0), "\n")

##    Mean: $ 24267

cat("   Median: $", round(price_median, 0), "\n")

##    Median: $ 24177

cat("   Standard Deviation: $", round(price_sd, 0), "\n")

##    Standard Deviation: $ 12473

cat("   Skewness Coefficient: ", round(skewness_coeff, 3), "\n")

##    Skewness Coefficient:  0.022

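if (abs(skewness_coeff) < 0.5) {
  cat("   Interpretation: APPROXIMATELY SYMMETRIC\n")
} else {
  # Degree and direction follow the scale printed below
  degree <- if (abs(skewness_coeff) < 1) "MODERATELY" else "HIGHLY"
  direction <- if (skewness_coeff > 0) "RIGHT" else "LEFT"
  cat("   Interpretation: ", degree, " ", direction, "-SKEWED\n", sep = "")
}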

##    Interpretation: APPROXIMATELY SYMMETRIC

cat("\n📚 Skewness Coefficient Scale:\n")

## 
## 📚 Skewness Coefficient Scale:

cat("   |SK| < 0.5: Approximately symmetric\n")

##    |SK| < 0.5: Approximately symmetric

cat("   0.5 ≤ |SK| < 1: Moderately skewed\n")

##    0.5 ≤ |SK| < 1: Moderately skewed

cat("   |SK| ≥ 1: Highly skewed\n")

##    |SK| ≥ 1: Highly skewed
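The same coefficient can be wrapped in a small function so that it can be applied to any numeric variable. This is a minimal sketch; the function name `pearson_median_skew` is hypothetical, and the salary vector is reused from the decision framework examples.

```r
# SK = 3 * (mean - median) / standard deviation
pearson_median_skew <- function(x) {
  x <- x[!is.na(x)]
  3 * (mean(x) - median(x)) / sd(x)
}

# Perfectly symmetric data: mean equals median, so SK is 0
sk_symmetric <- pearson_median_skew(c(1, 2, 3, 4, 5))

# Salaries with one very high value: mean exceeds median, so SK is positive
sk_right <- pearson_median_skew(c(45000, 48000, 52000, 55000, 58000, 62000, 350000))

print(sk_symmetric)  # 0
print(sk_right > 0)  # TRUE
```

This coefficient always lies between -3 and 3, so its values are not directly comparable with moment-based skewness statistics reported by other packages.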

Chapter 7: Hands-On Problem Solving Workshop

7.1 Complete Problem Solution: Car Manufacturer Strategic Analysis

# STRATEGIC ANALYSIS: NEW ELECTRIC CAR DEVELOPMENT
cat("🚗 STRATEGIC ANALYSIS: NEW ELECTRIC CAR DEVELOPMENT\n")

## 🚗 STRATEGIC ANALYSIS: NEW ELECTRIC CAR DEVELOPMENT

cat("=============================================================\n")

## =============================================================

cat("SCENARIO: Your company is developing a new electric car. Use statistical analysis\n")

## SCENARIO: Your company is developing a new electric car. Use statistical analysis

cat("to determine optimal specifications that will be competitive in the market.\n\n")

## to determine optimal specifications that will be competitive in the market.

# Check if required dataset and columns exist
if (!exists("cars") || !all(c("price_num", "maxspeed", "acceleration", "country") %in% colnames(cars))) {
  stop("Error: 'cars' dataset is missing or does not contain required columns (price_num, maxspeed, acceleration, country).")
}

# Problem 1: Optimal Price Positioning
cat("📊 PROBLEM 1: OPTIMAL PRICE POSITIONING\n")

## 📊 PROBLEM 1: OPTIMAL PRICE POSITIONING

cat("----------------------------------------------\n")

## ----------------------------------------------

# Calculate central tendency and quartiles
price_stats <- summary(cars$price_num)
price_quartiles <- quantile(cars$price_num, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)

cat("Central Tendency Analysis:\n")

## Central Tendency Analysis:

print(price_stats)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -1860   13574   24177   24267   33195   61570

cat("\nQuartile Analysis:\n")

## 
## Quartile Analysis:

cat("  25th Percentile: $", round(price_quartiles[1], 0), "\n")

##   25th Percentile: $ 13574

cat("  Median: $", round(price_quartiles[2], 0), "\n")

##   Median: $ 24177

cat("  75th Percentile: $", round(price_quartiles[3], 0), "\n")

##   75th Percentile: $ 33195

# Strategic recommendation
price_25 <- price_quartiles[1]
price_75 <- price_quartiles[3]
target_price <- price_quartiles[2]

cat("\n🎯 STRATEGIC RECOMMENDATION:\n")

## 
## 🎯 STRATEGIC RECOMMENDATION:

cat("   Target Price Range: $", round(price_25, 0), " - $", round(price_75, 0), "\n")

##    Target Price Range: $ 13574  - $ 33195

cat("   Optimal Price Point: $", round(target_price, 0), " (median)\n")

##    Optimal Price Point: $ 24177  (median)

cat("   Rationale: Median-based pricing captures the mainstream market.\n")

##    Rationale: Median-based pricing captures the mainstream market.
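
The summary above reports a minimum of -1860, which cannot be a real price, so a plausibility check before computing quartiles may be worthwhile. The sketch below uses a hypothetical vector demo_price rather than cars$price_num. Whether such values should be set to NA, removed, or corrected at the source is a judgment call that this lecture does not settle; setting them to NA is only one option.

```r
# Hypothetical prices, including one negative value and one missing value
demo_price <- c(-1860, 9500, 13600, 24200, 33200, 61600, NA)

# Flag values that cannot be valid prices
implausible <- !is.na(demo_price) & demo_price <= 0
cat("Implausible prices found:", sum(implausible), "\n")

# One possible treatment (an assumption): treat them as missing
clean_price <- demo_price
clean_price[implausible] <- NA

# Compare quartiles before and after the treatment
q_raw <- quantile(demo_price, c(0.25, 0.5, 0.75), na.rm = TRUE)
q_clean <- quantile(clean_price, c(0.25, 0.5, 0.75), na.rm = TRUE)
print(rbind(raw = q_raw, cleaned = q_clean))
```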

# Problem 2: Performance Benchmarking
cat("\n\n🏁 PROBLEM 2: PERFORMANCE BENCHMARKING\n")

## 
## 
## 🏁 PROBLEM 2: PERFORMANCE BENCHMARKING

cat("----------------------------------------------\n")

## ----------------------------------------------

# Top 10% performance thresholds
speed_p90 <- quantile(cars$maxspeed, 0.9, na.rm = TRUE)
accel_p10 <- quantile(cars$acceleration, 0.1, na.rm = TRUE)  # Lower is better for acceleration

cat("Performance Targets for Top 10% Market:\n")

## Performance Targets for Top 10% Market:

cat("   Minimum Speed: ", round(speed_p90, 0), " km/h\n")

##    Minimum Speed:  218  km/h

cat("   Maximum Acceleration: ", round(accel_p10, 2), " seconds (0-100 km/h)\n")

##    Maximum Acceleration:  7.28  seconds (0-100 km/h)
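
Which tail counts as the "top 10%" depends on whether larger or smaller values are better: the 90th percentile for top speed, the 10th percentile for acceleration time. A minimal sketch with hypothetical vectors (demo_speed and demo_accel are not taken from the cars dataset) shows that both thresholds select the same share of models:

```r
demo_speed <- c(150, 160, 170, 175, 180, 185, 190, 200, 210, 230)       # higher is better
demo_accel <- c(6.5, 7.5, 8.5, 9.5, 10.0, 10.5, 11.0, 12.0, 13.0, 14.0) # lower is better

speed_cut <- quantile(demo_speed, 0.90)  # top 10%: at or above the 90th percentile
accel_cut <- quantile(demo_accel, 0.10)  # top 10%: at or below the 10th percentile

# Share of models that reach each threshold
share_fast <- mean(demo_speed >= speed_cut)
share_quick <- mean(demo_accel <= accel_cut)

cat("Speed threshold:", speed_cut, "km/h; share reaching it:", share_fast, "\n")
cat("Acceleration threshold:", accel_cut, "s; share reaching it:", share_quick, "\n")
```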

# Problem 3: Geographic Market Analysis
cat("\n\n🌍 PROBLEM 3: GEOGRAPHIC MARKET ANALYSIS\n")

## 
## 
## 🌍 PROBLEM 3: GEOGRAPHIC MARKET ANALYSIS

cat("----------------------------------------------\n")

## ----------------------------------------------

# Country frequency table
# Build the table once and extract plain vectors, so each column appears only once
country_counts <- table(cars$country)
country_analysis <- data.frame(
  Country = names(country_counts),
  Count = as.vector(country_counts),
  Proportion = as.vector(prop.table(country_counts)),
  Percentage = as.vector(prop.table(country_counts)) * 100
)

cat("Country Distribution:\n")

## Country Distribution:

print(country_analysis)

##           Country Count Proportion Percentage
## 1   Asia - others    14 0.07368421   7.368421
## 2 Europe - others    38 0.20000000  20.000000
## 3          France    33 0.17368421  17.368421
## 4         Germany    40 0.21052632  21.052632
## 5           Italy    19 0.10000000  10.000000
## 6           Japan    33 0.17368421  17.368421
## 7   United States    13 0.06842105   6.842105

# Find modal country and European share
modal_country <- names(table(cars$country))[which.max(table(cars$country))]
european_countries <- c("Germany", "France", "Italy", "Europe - others")
european_share <- sum(cars$country %in% european_countries, na.rm = TRUE) / nrow(cars) * 100

cat("\n📈 Geographic Insights:\n")

## 
## 📈 Geographic Insights:

cat("   Modal Manufacturing Country: ", modal_country, "\n")

##    Modal Manufacturing Country:  Germany

cat("   European Market Share: ", round(european_share, 1), "%\n")

##    European Market Share:  68.4 %

cat("   Strategic Implication: Focus on European manufacturing partnerships.\n")

##    Strategic Implication: Focus on European manufacturing partnerships.
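
The which.max() approach used for modal_country returns only the first category with the highest count, so a tie would go unnoticed. A hedged sketch of a helper that returns every mode follows. The function name all_modes and the vector demo_country are hypothetical and introduced only for illustration.

```r
# Return all categories that share the highest frequency
all_modes <- function(x) {
  counts <- table(x)                    # NA values are excluded by default
  names(counts)[counts == max(counts)]
}

# Hypothetical data with a two-way tie
demo_country <- c("Germany", "Japan", "Germany", "France", "Japan", "Italy")

first_only <- names(table(demo_country))[which.max(table(demo_country))]
every_mode <- all_modes(demo_country)

cat("which.max reports:", first_only, "\n")
cat("all_modes reports:", paste(every_mode, collapse = ", "), "\n")
```

In the country table above Germany is the unique mode, so both approaches agree there; the helper matters only when counts tie.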

7.2 Interactive Exercise: Student Practice Problems

cat("\n🎓 STUDENT PRACTICE EXERCISES\n")

## 
## 🎓 STUDENT PRACTICE EXERCISES

cat("=" , rep("=", 40), "\n", sep="")

## =========================================

cat("Exercise 1: Manual Calculation Challenge\n")

## Exercise 1: Manual Calculation Challenge

cat("Given airline ticket prices: $80, $120, $150, $90\n")

## Given airline ticket prices: $80, $120, $150, $90

cat("Tasks:\n")

## Tasks:

cat("a) Calculate mean, median, and mode manually\n")

## a) Calculate mean, median, and mode manually

cat("b) Arrange data in ascending order\n")

## b) Arrange data in ascending order

cat("c) Determine distribution shape\n")

## c) Determine distribution shape

# Solution for verification
tickets <- c(80, 120, 150, 90)
cat("\n✅ SOLUTION:\n")

## 
## ✅ SOLUTION:

cat("Sorted data: $", paste(sort(tickets), collapse = ", $"), "\n")

## Sorted data: $ 80, $90, $120, $150

cat("Mean: $", round(mean(tickets), 2), "\n")

## Mean: $ 110

cat("Median: $", median(tickets), "\n")

## Median: $ 105

cat("Mode: No mode (all values appear once)\n")

## Mode: No mode (all values appear once)

cat("Shape: Mean (", round(mean(tickets), 2), ") > Median (", median(tickets), ") → RIGHT-SKEWED\n")

## Shape: Mean ( 110 ) > Median ( 105 ) → RIGHT-SKEWED

cat("\n" , rep("-", 50), "\n", sep="")

## 
## --------------------------------------------------

cat("Exercise 2: Grouped Data Challenge\n")

## Exercise 2: Grouped Data Challenge

cat("Car ownership frequency table:\n")

## Car ownership frequency table:

cat("Cars Owned: 1, 2, 3, 5\n")

## Cars Owned: 1, 2, 3, 5

cat("Frequencies: 32, 48, 16, 4\n")

## Frequencies: 32, 48, 16, 4

cat("Task: Calculate weighted mean\n")

## Task: Calculate weighted mean

# Solution
cars_owned_ex <- c(1, 2, 3, 5)
freq_ex <- c(32, 48, 16, 4)
weighted_mean_ex <- sum(cars_owned_ex * freq_ex) / sum(freq_ex)

cat("\n✅ SOLUTION:\n")

## 
## ✅ SOLUTION:

cat("Weighted Mean = Σ(xi × fi) / Σfi\n")

## Weighted Mean = Σ(xi × fi) / Σfi

cat("             = (1×32 + 2×48 + 3×16 + 5×4) / (32+48+16+4)\n")

##              = (1×32 + 2×48 + 3×16 + 5×4) / (32+48+16+4)

cat("             = (32 + 96 + 48 + 20) / 100\n")

##              = (32 + 96 + 48 + 20) / 100

cat("             = 196 / 100 = ", weighted_mean_ex, " cars per family\n")

##              = 196 / 100 =  1.96  cars per family
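
As a cross-check, the same grouped-data mean can be obtained in two other ways in base R: with weighted.mean(), or by expanding the frequency table back into raw observations with rep(). The sketch below reuses the numbers from Exercise 2; the variable names are new and chosen only for this illustration.

```r
cars_owned_demo <- c(1, 2, 3, 5)
freq_demo <- c(32, 48, 16, 4)

# 1) Formula from the exercise: sum(xi * fi) / sum(fi)
by_formula <- sum(cars_owned_demo * freq_demo) / sum(freq_demo)

# 2) Base R helper with the frequencies as weights
by_builtin <- weighted.mean(cars_owned_demo, w = freq_demo)

# 3) Expand to 100 raw observations, then take the ordinary mean
raw_obs <- rep(cars_owned_demo, times = freq_demo)
by_expansion <- mean(raw_obs)

cat("Formula:", by_formula, " weighted.mean:", by_builtin,
    " expanded:", by_expansion, "\n")

# The expanded vector also gives the median of the grouped data
cat("Median cars owned:", median(raw_obs), "\n")
```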

Chapter 8: Professional Reporting and Communication

8.1 Executive Summary Template

cat("\n📊 EXECUTIVE SUMMARY: AUTOMOTIVE MARKET ANALYSIS\n")

## 
## 📊 EXECUTIVE SUMMARY: AUTOMOTIVE MARKET ANALYSIS

cat("=" , rep("=", 60), "\n", sep="")

## =============================================================

# Calculate all key statistics
price_summary <- list(
  mean = mean(cars$price_num, na.rm = TRUE),
  median = median(cars$price_num, na.rm = TRUE),
  q1 = quantile(cars$price_num, 0.25, na.rm = TRUE),
  q3 = quantile(cars$price_num, 0.75, na.rm = TRUE)
)

performance_summary <- list(
  speed_mean = mean(cars$maxspeed, na.rm = TRUE),
  speed_median = median(cars$maxspeed, na.rm = TRUE),
  accel_mean = mean(cars$acceleration, na.rm = TRUE),
  accel_median = median(cars$acceleration, na.rm = TRUE)
)

cat("\n🎯 KEY FINDINGS:\n")

## 
## 🎯 KEY FINDINGS:

cat("\n1. PRICE POSITIONING:\n")

## 
## 1. PRICE POSITIONING:

cat("   • Average market price: $", format(round(price_summary$mean, 0), big.mark = ","), "\n")

##    • Average market price: $ 24,267

cat("   • Median market price: $", format(round(price_summary$median, 0), big.mark = ","), "\n")

##    • Median market price: $ 24,177

cat("   • Price distribution: APPROXIMATELY SYMMETRIC (mean and median differ by less than $100)\n")

##    • Price distribution: APPROXIMATELY SYMMETRIC (mean and median differ by less than $100)

cat("   • Recommended target: $", format(round(price_summary$median, 0), big.mark = ","), " (median-based pricing)\n")

##    • Recommended target: $ 24,177  (median-based pricing)

cat("\n2. PERFORMANCE BENCHMARKS:\n")

## 
## 2. PERFORMANCE BENCHMARKS:

cat("   • Average top speed: ", round(performance_summary$speed_mean, 0), " km/h\n")

##    • Average top speed:  181  km/h

cat("   • Median acceleration: ", round(performance_summary$accel_median, 2), " seconds (0-100 km/h)\n")

##    • Median acceleration:  10.9  seconds (0-100 km/h)

cat("   • Competitive threshold: ", round(performance_summary$speed_median, 0), " km/h minimum\n")

##    • Competitive threshold:  178  km/h minimum

cat("\n3. MARKET SEGMENTATION:\n")

## 
## 3. MARKET SEGMENTATION:

cat("   • Budget segment (Q1): Under $", format(round(price_summary$q1, 0), big.mark = ","), "\n")

##    • Budget segment (Q1): Under $ 13,574

cat("   • Premium segment (Q3): Above $", format(round(price_summary$q3, 0), big.mark = ","), "\n")

##    • Premium segment (Q3): Above $ 33,195

cat("   • Target segment: Mid-market ($", format(round(price_summary$q1, 0), big.mark = ","), 
    " - $", format(round(price_summary$q3, 0), big.mark = ","), ")\n")

##    • Target segment: Mid-market ($ 13,574  - $ 33,195 )

# Geographic analysis
top_countries <- names(sort(table(cars$country), decreasing = TRUE))[1:3]
cat("\n4. GEOGRAPHIC OPPORTUNITIES:\n")

## 
## 4. GEOGRAPHIC OPPORTUNITIES:

cat("   • Leading manufacturers: ", paste(top_countries, collapse = ", "), "\n")

##    • Leading manufacturers:  Germany, Europe - others, France

cat("   • European dominance: ", round(european_share, 1), "% market share\n")

##    • European dominance:  68.4 % market share

cat("   • Strategic focus: European partnerships and manufacturing\n")

##    • Strategic focus: European partnerships and manufacturing

8.2 Technical Methodology Report

cat("\n\n📋 TECHNICAL METHODOLOGY REPORT\n")

## 
## 
## 📋 TECHNICAL METHODOLOGY REPORT

cat("=" , rep("=", 50), "\n", sep="")

## ===================================================

cat("\n🔬 STATISTICAL METHODS EMPLOYED:\n")

## 
## 🔬 STATISTICAL METHODS EMPLOYED:

cat("\n1. MEASURES OF CENTRAL TENDENCY:\n")

## 
## 1. MEASURES OF CENTRAL TENDENCY:

cat("   • Arithmetic Mean: Σxi/n for symmetric distributions\n")

##    • Arithmetic Mean: Σxi/n for symmetric distributions

cat("   • Median: Middle value for skewed distributions\n")

##    • Median: Middle value for skewed distributions

cat("   • Mode: Most frequent value for categorical analysis\n")

##    • Mode: Most frequent value for categorical analysis

cat("\n2. DATA QUALITY ASSESSMENT:\n")

## 
## 2. DATA QUALITY ASSESSMENT:

cat("   • Sample size: ", nrow(cars), " car models\n")

##    • Sample size:  190  car models

cat("   • Geographic coverage: ", length(unique(cars$country)), " countries/regions\n")

##    • Geographic coverage:  7  countries/regions

cat("   • Missing values: Handled using na.rm = TRUE\n")

##    • Missing values: Handled using na.rm = TRUE

cat("   • Outlier detection: Visual inspection via boxplots\n")

##    • Outlier detection: Visual inspection via boxplots

cat("\n3. DISTRIBUTION ANALYSIS:\n")

## 
## 3. DISTRIBUTION ANALYSIS:

cat("   • Skewness assessment: Mean vs. Median comparison\n")

##    • Skewness assessment: Mean vs. Median comparison

cat("   • Shape determination: Visual and numerical methods\n")

##    • Shape determination: Visual and numerical methods

cat("   • Quartile analysis: Market segmentation insights\n")

##    • Quartile analysis: Market segmentation insights

cat("\n4. BUSINESS APPLICATIONS:\n")

## 
## 4. BUSINESS APPLICATIONS:

cat("   • Price strategy: Median-based positioning\n")

##    • Price strategy: Median-based positioning

cat("   • Performance targets: Percentile benchmarking\n")

##    • Performance targets: Percentile benchmarking

cat("   • Market analysis: Frequency-based insights\n")

##    • Market analysis: Frequency-based insights
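
The methodology above lists outlier detection by visual inspection of boxplots. A numerical counterpart is the 1.5 × IQR rule that boxplot whiskers are based on. The sketch below is an illustration on a hypothetical vector demo_values, not the procedure applied to the cars data. It uses boxplot.stats() from the grDevices package that ships with R, next to a manual version of the fences.

```r
# Hypothetical values with one unusually large observation
demo_values <- c(12, 14, 15, 15, 16, 17, 18, 19, 20, 55)

# Points beyond the whiskers, as a boxplot would draw them
flagged <- boxplot.stats(demo_values)$out

# Manual fences from quartiles (quantile() quartiles can differ slightly
# from the hinges that boxplot.stats() uses)
q1 <- quantile(demo_values, 0.25)
q3 <- quantile(demo_values, 0.75)
iqr <- q3 - q1
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
flagged_manual <- demo_values[demo_values < lower_fence | demo_values > upper_fence]

cat("boxplot.stats flags:", flagged, "\n")
cat("Manual fences flag:", flagged_manual, "\n")
```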

Chapter 9: Advanced Applications and Extensions

9.1 Comparative Analysis Framework

cat("\n🔍 COMPARATIVE ANALYSIS: GERMAN vs JAPANESE MANUFACTURERS\n")

## 
## 🔍 COMPARATIVE ANALYSIS: GERMAN vs JAPANESE MANUFACTURERS

cat("=" , rep("=", 65), "\n", sep="")

## ==================================================================

# Filter data by country
german_cars <- cars[cars$country == "Germany", ]
japanese_cars <- cars[cars$country == "Japan", ]

cat("Sample sizes:\n")

## Sample sizes:

cat("   German manufacturers: ", nrow(german_cars), " models\n")

##    German manufacturers:  40  models

cat("   Japanese manufacturers: ", nrow(japanese_cars), " models\n")

##    Japanese manufacturers:  33  models

# Price comparison
cat("\n💰 PRICE COMPARISON:\n")

## 
## 💰 PRICE COMPARISON:

german_price_stats <- c(
  mean = mean(german_cars$price_num, na.rm = TRUE),
  median = median(german_cars$price_num, na.rm = TRUE)
)
japanese_price_stats <- c(
  mean = mean(japanese_cars$price_num, na.rm = TRUE),
  median = median(japanese_cars$price_num, na.rm = TRUE)
)

cat("German cars:\n")

## German cars:

cat("   Mean: $", round(german_price_stats["mean"], 0), "\n")

##    Mean: $ 24729

cat("   Median: $", round(german_price_stats["median"], 0), "\n")

##    Median: $ 26708

cat("Japanese cars:\n")

## Japanese cars:

cat("   Mean: $", round(japanese_price_stats["mean"], 0), "\n")

##    Mean: $ 23138

cat("   Median: $", round(japanese_price_stats["median"], 0), "\n")

##    Median: $ 21206

# Performance comparison
cat("\n🏁 PERFORMANCE COMPARISON:\n")

## 
## 🏁 PERFORMANCE COMPARISON:

german_speed <- mean(german_cars$maxspeed, na.rm = TRUE)
japanese_speed <- mean(japanese_cars$maxspeed, na.rm = TRUE)
german_accel <- mean(german_cars$acceleration, na.rm = TRUE)
japanese_accel <- mean(japanese_cars$acceleration, na.rm = TRUE)

cat("Average Top Speed:\n")

## Average Top Speed:

cat("   German: ", round(german_speed, 1), " km/h\n")

##    German:  185.4  km/h

cat("   Japanese: ", round(japanese_speed, 1), " km/h\n")

##    Japanese:  184.3  km/h

cat("Average Acceleration:\n")

## Average Acceleration:

cat("   German: ", round(german_accel, 2), " seconds\n")

##    German:  10.57  seconds

cat("   Japanese: ", round(japanese_accel, 2), " seconds\n")

##    Japanese:  10.22  seconds

# Strategic insights
cat("\n🎯 STRATEGIC INSIGHTS:\n")

## 
## 🎯 STRATEGIC INSIGHTS:

if (german_price_stats["median"] > japanese_price_stats["median"]) {
  cat("   • German cars positioned as premium (higher median price)\n")
} else {
  cat("   • Japanese cars positioned as premium (higher median price)\n")
}

##    • German cars positioned as premium (higher median price)

if (german_speed > japanese_speed) {
  cat("   • German manufacturers focus on performance (higher average speed)\n")
} else {
  cat("   • Japanese manufacturers focus on performance (higher average speed)\n")
}

##    • German manufacturers focus on performance (higher average speed)
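
The group comparison above filters one country at a time. When more groups are involved, base R's aggregate() can compute the same statistic for every group in one call. A minimal sketch on a hypothetical data frame demo_cars (the values are invented for illustration and are not taken from the cars dataset):

```r
demo_cars <- data.frame(
  country = c("Germany", "Germany", "Germany", "Japan", "Japan", "Japan"),
  price_num = c(20000, 27000, 31000, 18000, 21000, 29000),
  maxspeed = c(180, 190, 200, 175, 185, 195)
)

# Median price and median top speed for each country in one step
by_country <- aggregate(cbind(price_num, maxspeed) ~ country,
                        data = demo_cars, FUN = median)
print(by_country)

# Mean top speed per country, for comparison with the medians
speed_means <- tapply(demo_cars$maxspeed, demo_cars$country, mean)
print(speed_means)
```

A gap of about 1 km/h between group means, as in the output above (185.4 vs 184.3), may be small relative to the spread within each group, so it is worth reading such comparisons cautiously until measures of variability are available.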

9.2 Market Segmentation Analysis

cat("\n\n📊 ADVANCED MARKET SEGMENTATION ANALYSIS\n")

## 
## 
## 📊 ADVANCED MARKET SEGMENTATION ANALYSIS

cat("=" , rep("=", 55), "\n", sep="")

## ========================================================

# Create price segments based on quartiles
price_q1 <- quantile(cars$price_num, 0.25, na.rm = TRUE)
price_q2 <- quantile(cars$price_num, 0.50, na.rm = TRUE)
price_q3 <- quantile(cars$price_num, 0.75, na.rm = TRUE)

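# Segment the market
# Note: the lowest break is 0, so any price_num below 0 (the minimum reported
# earlier is -1860) falls outside every interval and becomes NA. This appears
# to be why the segment counts printed below sum to 188 rather than 190.
cars$price_segment <- cut(cars$price_num, 
                         breaks = c(0, price_q1, price_q2, price_q3, Inf),
                         labels = c("Budget", "Economy", "Mid-Market", "Premium"),
                         include.lowest = TRUE)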

# Analyze each segment
segment_analysis <- table(cars$price_segment)
segment_props <- prop.table(segment_analysis) * 100

cat("Market Segmentation by Price Quartiles:\n")

## Market Segmentation by Price Quartiles:

for (i in seq_along(segment_analysis)) {
  segment_name <- names(segment_analysis)[i]
  count <- segment_analysis[i]
  percentage <- segment_props[i]
  
  cat("   ", segment_name, ": ", count, " models (", round(percentage, 1), "%)\n")
}

##     Budget :  46  models ( 24.5 %)
##     Economy :  47  models ( 25 %)
##     Mid-Market :  47  models ( 25 %)
##     Premium :  48  models ( 25.5 %)

# Performance characteristics by segment
cat("\n🏎️ PERFORMANCE BY SEGMENT:\n")

## 
## 🏎️ PERFORMANCE BY SEGMENT:

for(segment in names(segment_analysis)) {
  segment_cars <- cars[cars$price_segment == segment & !is.na(cars$price_segment), ]
  avg_speed <- mean(segment_cars$maxspeed, na.rm = TRUE)
  avg_accel <- mean(segment_cars$acceleration, na.rm = TRUE)
  
  cat("   ", segment, " segment:\n")
  cat("     Average speed: ", round(avg_speed, 1), " km/h\n")
  cat("     Average acceleration: ", round(avg_accel, 2), " seconds\n")
}

##     Budget  segment:
##      Average speed:  180.3  km/h
##      Average acceleration:  11.72  seconds
##     Economy  segment:
##      Average speed:  181.3  km/h
##      Average acceleration:  10.96  seconds
##     Mid-Market  segment:
##      Average speed:  178.9  km/h
##      Average acceleration:  10.31  seconds
##     Premium  segment:
##      Average speed:  182.5  km/h
##      Average acceleration:  10.36  seconds
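
The segment counts above add up to 188 of the 190 models. One likely cause is the lower break of 0 in cut(): values below the first break are assigned NA. The sketch below reproduces the effect on a hypothetical vector demo_price and shows one possible alternative, an open lower break of -Inf. Whether negative prices should be segmented at all, rather than corrected first, is left open here.

```r
# Hypothetical prices, including one negative value
demo_price <- c(-1860, 5000, 12000, 20000, 26000, 31000, 40000, 61570)
q <- quantile(demo_price, c(0.25, 0.5, 0.75))
segment_labels <- c("Budget", "Economy", "Mid-Market", "Premium")

# Lower break at 0: the negative price falls outside all intervals
seg_zero <- cut(demo_price, breaks = c(0, q, Inf),
                labels = segment_labels, include.lowest = TRUE)

# Lower break at -Inf: every observation receives a segment
seg_open <- cut(demo_price, breaks = c(-Inf, q, Inf),
                labels = segment_labels)

cat("Unassigned with break at 0:   ", sum(is.na(seg_zero)), "\n")
cat("Unassigned with break at -Inf:", sum(is.na(seg_open)), "\n")
print(table(seg_open))
```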

Chapter 10: Practical Exercises and Case Studies

10.1 Complete Case Study: Electric Vehicle Market Entry

# CASE STUDY: ELECTRIC VEHICLE MARKET ENTRY STRATEGY
cat("\n🔋 CASE STUDY: ELECTRIC VEHICLE MARKET ENTRY STRATEGY\n")

## 
## 🔋 CASE STUDY: ELECTRIC VEHICLE MARKET ENTRY STRATEGY

cat("============================================================\n")

## ============================================================

cat("SCENARIO: A new electric vehicle startup needs to position their first model\n")

## SCENARIO: A new electric vehicle startup needs to position their first model

cat("in the European market. Use statistical analysis to recommend specifications.\n\n")

## in the European market. Use statistical analysis to recommend specifications.

# Check if required dataset and columns exist
if (!exists("cars") || !all(c("price_num", "maxspeed", "acceleration") %in% colnames(cars))) {
  stop("Error: 'cars' dataset is missing or does not contain required columns (price_num, maxspeed, acceleration).")
}

# Step 1: Market Positioning Analysis
cat("STEP 1: MARKET POSITIONING ANALYSIS\n")

## STEP 1: MARKET POSITIONING ANALYSIS

cat("----------------------------------------\n")

## ----------------------------------------

# Calculate key statistics for decision making
price_stats_complete <- summary(cars$price_num)
speed_percentiles <- quantile(cars$maxspeed, c(0.25, 0.5, 0.75, 0.9), na.rm = TRUE)
accel_percentiles <- quantile(cars$acceleration, c(0.1, 0.25, 0.5, 0.75), na.rm = TRUE)

cat("Current Market Statistics:\n")

## Current Market Statistics:

print(price_stats_complete)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -1860   13574   24177   24267   33195   61570

cat("\nPerformance Benchmarks:\n")

## 
## Performance Benchmarks:

cat("Speed Percentiles (km/h):\n")

## Speed Percentiles (km/h):

cat("   25th percentile: ", round(speed_percentiles[1], 0), "\n")

##    25th percentile:  159

cat("   50th percentile: ", round(speed_percentiles[2], 0), "\n")

##    50th percentile:  178

cat("   75th percentile: ", round(speed_percentiles[3], 0), "\n")

##    75th percentile:  203

cat("   90th percentile: ", round(speed_percentiles[4], 0), "\n")

##    90th percentile:  218

cat("\nAcceleration Percentiles (seconds):\n")

## 
## Acceleration Percentiles (seconds):

cat("   10th percentile: ", round(accel_percentiles[1], 2), " (top 10% performance)\n")

##    10th percentile:  7.28  (top 10% performance)

cat("   25th percentile: ", round(accel_percentiles[2], 2), "\n")

##    25th percentile:  8.83

cat("   50th percentile: ", round(accel_percentiles[3], 2), "\n")

##    50th percentile:  10.9

cat("   75th percentile: ", round(accel_percentiles[4], 2), "\n")

##    75th percentile:  12.97

# Step 2: Competitive Positioning Recommendations
cat("\n\nSTEP 2: STRATEGIC RECOMMENDATIONS\n")

## 
## 
## STEP 2: STRATEGIC RECOMMENDATIONS

cat("----------------------------------------\n")

## ----------------------------------------

target_price <- median(cars$price_num, na.rm = TRUE)
target_speed <- speed_percentiles[3]  # 75th percentile for competitive advantage
target_accel <- accel_percentiles[2]  # 25th percentile (lower is better)

cat("🎯 RECOMMENDED SPECIFICATIONS:\n")

## 🎯 RECOMMENDED SPECIFICATIONS:

cat("   Target Price: $", format(round(target_price, 0), big.mark = ","), "\n")

##    Target Price: $ 24,177

cat("   Rationale: Median pricing captures mainstream market\n\n")

##    Rationale: Median pricing captures mainstream market

cat("   Target Max Speed: ", round(target_speed, 0), " km/h\n")

##    Target Max Speed:  203  km/h

cat("   Rationale: 75th percentile ensures competitive performance\n\n")

##    Rationale: 75th percentile ensures competitive performance

cat("   Target Acceleration: ", round(target_accel, 2), " seconds (0-100 km/h)\n")

##    Target Acceleration:  8.83  seconds (0-100 km/h)

cat("   Rationale: 25th percentile provides above-average performance\n")

##    Rationale: 25th percentile provides above-average performance

# Step 3: Market Opportunity Analysis
cat("\n\nSTEP 3: MARKET OPPORTUNITY ANALYSIS\n")

## 
## 
## STEP 3: MARKET OPPORTUNITY ANALYSIS

cat("----------------------------------------\n")

## ----------------------------------------

# Identify underserved segments
affordable_performance <- sum(cars$price_num <= target_price & 
                             cars$maxspeed >= target_speed & 
                             cars$acceleration <= target_accel, na.rm = TRUE)
total_models <- nrow(cars)
opportunity_gap <- (total_models - affordable_performance) / total_models * 100

cat("Market Gap Analysis:\n")

## Market Gap Analysis:

cat("   Models meeting all criteria: ", affordable_performance, " out of ", total_models, "\n")

##    Models meeting all criteria:  3  out of  190

cat("   Market opportunity: ", round(opportunity_gap, 1), "% of models miss at least one criterion\n")

##    Market opportunity:  98.4 % of models miss at least one criterion

cat("   Strategic implication: Significant opportunity for well-positioned EV\n")

##    Strategic implication: Significant opportunity for well-positioned EV

10.2 Student Assessment Problems

cat("\n\n🎓 STUDENT ASSESSMENT PROBLEMS\n")

## 
## 
## 🎓 STUDENT ASSESSMENT PROBLEMS

cat("=" , rep("=", 45), "\n", sep="")

## ==============================================

cat("PROBLEM SET A: CALCULATION MASTERY\n")

## PROBLEM SET A: CALCULATION MASTERY

cat("-" , rep("-", 35), "\n", sep="")

## ------------------------------------

cat("Problem A1: Given the following car prices (in thousands):\n")

## Problem A1: Given the following car prices (in thousands):

cat("$22, $18, $35, $28, $25, $45, $30, $22\n")

## $22, $18, $35, $28, $25, $45, $30, $22

cat("Tasks:\n")

## Tasks:

cat("a) Calculate the arithmetic mean\n")

## a) Calculate the arithmetic mean

cat("b) Find the median\n")

## b) Find the median

cat("c) Identify the mode\n")

## c) Identify the mode

cat("d) Determine if the distribution is skewed\n")

## d) Determine if the distribution is skewed

# Solution for instructors
sample_prices_prob <- c(22, 18, 35, 28, 25, 45, 30, 22)
cat("\n✅ INSTRUCTOR SOLUTIONS:\n")

## 
## ✅ INSTRUCTOR SOLUTIONS:

cat("a) Mean: $", round(mean(sample_prices_prob), 2), "k\n")

## a) Mean: $ 28.12 k

cat("b) Median: $", median(sample_prices_prob), "k\n")

## b) Median: $ 26.5 k

mode_freq <- table(sample_prices_prob)
mode_val <- as.numeric(names(mode_freq)[which.max(mode_freq)])
cat("c) Mode: $", mode_val, "k (appears ", max(mode_freq), " times)\n")

## c) Mode: $ 22 k (appears  2  times)

if (mean(sample_prices_prob) > median(sample_prices_prob)) {
  cat("d) RIGHT-SKEWED (mean > median)\n")
} else if (mean(sample_prices_prob) < median(sample_prices_prob)) {
  cat("d) LEFT-SKEWED (mean < median)\n")
} else {
  cat("d) SYMMETRIC (mean ≈ median)\n")
}

## d) RIGHT-SKEWED (mean > median)

cat("\n" , rep("-", 50), "\n", sep="")

## 
## --------------------------------------------------

cat("PROBLEM SET B: APPLIED ANALYSIS\n")

## PROBLEM SET B: APPLIED ANALYSIS

cat("-" , rep("-", 35), "\n", sep="")

## ------------------------------------

cat("Problem B1: Market Research Scenario\n")

## Problem B1: Market Research Scenario

cat("A car dealership surveys customer satisfaction ratings (1-10 scale):\n")

## A car dealership surveys customer satisfaction ratings (1-10 scale):

cat("Ratings: 8, 9, 7, 10, 8, 6, 9, 8, 7, 9, 8, 10, 7, 8, 9\n")

## Ratings: 8, 9, 7, 10, 8, 6, 9, 8, 7, 9, 8, 10, 7, 8, 9

cat("Questions:\n")

## Questions:

cat("1) What is the most appropriate measure of central tendency?\n")

## 1) What is the most appropriate measure of central tendency?

cat("2) Calculate that measure\n")

## 2) Calculate that measure

cat("3) What does this tell us about customer satisfaction?\n")

## 3) What does this tell us about customer satisfaction?

# Solution
satisfaction <- c(8, 9, 7, 10, 8, 6, 9, 8, 7, 9, 8, 10, 7, 8, 9)
cat("\n✅ SOLUTIONS:\n")

## 
## ✅ SOLUTIONS:

cat("1) MODE (most common rating) or MEDIAN fits an ordinal scale; the mode is reported here\n")

## 1) MODE (most common rating) or MEDIAN fits an ordinal scale; the mode is reported here

satisfaction_mode <- as.numeric(names(table(satisfaction))[which.max(table(satisfaction))])
cat("2) Mode = ", satisfaction_mode, " (appears ", max(table(satisfaction)), " times)\n")

## 2) Mode =  8  (appears  5  times)

cat("3) Most customers rate satisfaction as ", satisfaction_mode, "/10 - good performance\n")

## 3) Most customers rate satisfaction as  8 /10 - good performance

cat("   Mean = ", round(mean(satisfaction), 2), " confirms positive satisfaction\n")

##    Mean =  8.2  confirms positive satisfaction
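
For ordinal ratings such as these, a frequency table together with the median gives a fuller picture than the mode alone. The sketch below reuses the ratings from Problem B1. The variable names and the "8 or higher" cut-off are assumptions made for illustration.

```r
ratings <- c(8, 9, 7, 10, 8, 6, 9, 8, 7, 9, 8, 10, 7, 8, 9)

# Frequency table: how often each rating occurs
rating_counts <- table(ratings)
print(rating_counts)

# Mode (all values with the highest count) and median
rating_mode <- as.numeric(names(rating_counts)[rating_counts == max(rating_counts)])
rating_median <- median(ratings)

# Share of customers giving a rating of 8 or higher
share_8_plus <- mean(ratings >= 8)

cat("Mode:", rating_mode, " Median:", rating_median,
    " Share rating 8+:", round(share_8_plus * 100, 1), "%\n")
```

Here the mode and the median coincide, so both support the same reading of the survey.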

Chapter 11: Summary and Key Takeaways

11.1 Conceptual Framework Summary

cat("\n📚 LECTURE 2 CONCEPTUAL FRAMEWORK SUMMARY\n")

## 
## 📚 LECTURE 2 CONCEPTUAL FRAMEWORK SUMMARY

cat("=" , rep("=", 55), "\n", sep="")

## ========================================================

cat("\n🎯 CORE CONCEPTS MASTERED:\n")

## 
## 🎯 CORE CONCEPTS MASTERED:

cat("\n1. ARITHMETIC MEAN (x̄):\n")

## 
## 1. ARITHMETIC MEAN (x̄):

cat("   • Formula: Σxi/n (raw data) or Σ(fi×xi)/Σfi (grouped data)\n")

##    • Formula: Σxi/n (raw data) or Σ(fi×xi)/Σfi (grouped data)

cat("   • Use: Symmetric distributions, mathematical precision\n")

##    • Use: Symmetric distributions, mathematical precision

cat("   • Properties: Sensitive to outliers, algebraically manipulable\n")

##    • Properties: Sensitive to outliers, algebraically manipulable

cat("\n2. MEDIAN (Me):\n")

## 
## 2. MEDIAN (Me):

cat("   • Definition: Middle value in ordered dataset\n")

##    • Definition: Middle value in ordered dataset

cat("   • Use: Skewed distributions, robust measure\n")

##    • Use: Skewed distributions, robust measure

cat("   • Properties: Resistant to outliers, positional measure\n")

##    • Properties: Resistant to outliers, positional measure

cat("\n3. MODE (Mo):\n")

## 
## 3. MODE (Mo):

cat("   • Definition: Most frequently occurring value\n")

##    • Definition: Most frequently occurring value

cat("   • Use: Categorical data, popularity analysis\n")

##    • Use: Categorical data, popularity analysis

cat("   • Properties: Can have multiple modes or no mode\n")

##    • Properties: Can have multiple modes or no mode

cat("\n4. DISTRIBUTION SHAPE:\n")

## 
## 4. DISTRIBUTION SHAPE:

cat("   • Symmetric: Mean ≈ Median ≈ Mode\n")

##    • Symmetric: Mean ≈ Median ≈ Mode

cat("   • Right-skewed: Mean > Median > Mode\n")

##    • Right-skewed: Mean > Median > Mode

cat("   • Left-skewed: Mode > Median > Mean\n")

##    • Left-skewed: Mode > Median > Mean

cat("\n🔧 TECHNICAL SKILLS DEVELOPED:\n")

## 
## 🔧 TECHNICAL SKILLS DEVELOPED:

cat("   ✓ Manual calculation of all central tendency measures\n")

##    ✓ Manual calculation of all central tendency measures

cat("   ✓ Application of formulas to grouped and ungrouped data\n")

##    ✓ Application of formulas to grouped and ungrouped data

cat("   ✓ UBStats package proficiency\n")

##    ✓ UBStats package proficiency

cat("   ✓ Distribution shape analysis\n")

##    ✓ Distribution shape analysis

cat("   ✓ Business interpretation of statistical results\n")

##    ✓ Business interpretation of statistical results
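
The summary above states that the mean is sensitive to outliers while the median is resistant. A small worked example makes the difference concrete. The prices below (in thousands) are hypothetical and serve only as a sketch.

```r
# Five hypothetical prices, then the same prices plus one extreme value
base_prices <- c(20, 22, 25, 28, 30)
with_outlier <- c(base_prices, 300)

mean_shift <- mean(with_outlier) - mean(base_prices)
median_shift <- median(with_outlier) - median(base_prices)

cat("Mean:   ", mean(base_prices), "->", round(mean(with_outlier), 2),
    " (shift:", round(mean_shift, 2), ")\n")
cat("Median: ", median(base_prices), "->", median(with_outlier),
    " (shift:", median_shift, ")\n")
```

A single extreme observation moves the mean by far more than the median, which is why the median is preferred for skewed data.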

11.2 Next Lecture Preview

cat("\n\n🔮 PREVIEW: LECTURE 3 - MEASURES OF VARIABILITY\n")

## 
## 
## 🔮 PREVIEW: LECTURE 3 - MEASURES OF VARIABILITY

cat("=" , rep("=", 55), "\n", sep="")

## ========================================================

cat("Coming up in our next session:\n")

## Coming up in our next session:

cat("\n📊 MEASURES OF SPREAD:\n")

## 
## 📊 MEASURES OF SPREAD:

cat("   • Range and Interquartile Range (IQR)\n")

##    • Range and Interquartile Range (IQR)

cat("   • Variance and Standard Deviation\n")

##    • Variance and Standard Deviation

cat("   • Coefficient of Variation\n")

##    • Coefficient of Variation

cat("   • Outlier detection methods\n")

##    • Outlier detection methods

cat("\n🎯 ADVANCED TOPICS:\n")

## 
## 🎯 ADVANCED TOPICS:

cat("   • Risk assessment in financial contexts\n")

##    • Risk assessment in financial contexts

cat("   • Quality control applications\n")

##    • Quality control applications

cat("   • Comparative variability analysis\n")

##    • Comparative variability analysis

cat("   • Five-number summary and boxplots\n")

##    • Five-number summary and boxplots

cat("\n💼 BUSINESS APPLICATIONS:\n")

## 
## 💼 BUSINESS APPLICATIONS:

cat("   • Investment risk analysis\n")

##    • Investment risk analysis

cat("   • Manufacturing quality control\n")

##    • Manufacturing quality control

cat("   • Market volatility assessment\n")

##    • Market volatility assessment

cat("   • Performance consistency evaluation\n")

##    • Performance consistency evaluation

cat("\n📋 PREPARATION TASKS:\n")

## 
## 📋 PREPARATION TASKS:

cat("   1. Review central tendency concepts from today\n")

##    1. Review central tendency concepts from today

cat("   2. Practice manual calculations with small datasets\n")

##    2. Practice manual calculations with small datasets

cat("   3. Familiarize yourself with variance formula\n")

##    3. Familiarize yourself with variance formula

cat("   4. Think about real-world examples of variability\n")

##    4. Think about real-world examples of variability

11.3 Final Practical Exercise

cat("\n\n🏆 CAPSTONE EXERCISE: COMPREHENSIVE ANALYSIS\n")

## 
## 
## 🏆 CAPSTONE EXERCISE: COMPREHENSIVE ANALYSIS

cat("=" , rep("=", 55), "\n", sep="")

## ========================================================

cat("SCENARIO: You are presenting to the board of directors about market positioning\n")

## SCENARIO: You are presenting to the board of directors about market positioning

cat("for a new luxury car model. Prepare a complete statistical brief.\n\n")

## for a new luxury car model. Prepare a complete statistical brief.

# Select luxury cars (top 25% by price)
luxury_threshold <- quantile(cars$price_num, 0.75, na.rm = TRUE)
luxury_cars <- cars[cars$price_num >= luxury_threshold, ]

cat("LUXURY MARKET ANALYSIS (Top 25% by Price):\n")

## LUXURY MARKET ANALYSIS (Top 25% by Price):

cat("Threshold: $", format(round(luxury_threshold, 0), big.mark = ","), "\n")

## Threshold: $ 33,195

cat("Sample size: ", nrow(luxury_cars), " models\n")

## Sample size:  48  models
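The quantile-based selection above can be sketched on a small example. The vector below is hypothetical toy data with one missing value, used only to show how the 75th-percentile threshold is computed and how a logical filter behaves when the price is NA.

```r
# Hypothetical toy prices (in $1,000s) with one missing value
toy_price <- c(10, 20, 30, 40, NA, 50, 60, 70, 80)

# 75th percentile, ignoring the missing value (R's default quantile type 7)
threshold <- quantile(toy_price, 0.75, na.rm = TRUE)

# A plain logical filter carries the NA through as an extra element
naive_top <- toy_price[toy_price >= threshold]

# Wrapping the condition in which() keeps only positions that are TRUE
safe_top <- toy_price[which(toy_price >= threshold)]

cat("Threshold:", threshold, "\n")
cat("Naive filter length:", length(naive_top), "\n")
cat("NA-safe filter length:", length(safe_top), "\n")
```

If the price column has no missing values, both filters return the same rows. The which() form only matters when NAs are present.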

# Complete analysis
luxury_stats <- list(
  price_mean = mean(luxury_cars$price_num, na.rm = TRUE),
  price_median = median(luxury_cars$price_num, na.rm = TRUE),
  speed_mean = mean(luxury_cars$maxspeed, na.rm = TRUE),
  speed_median = median(luxury_cars$maxspeed, na.rm = TRUE),
  accel_mean = mean(luxury_cars$acceleration, na.rm = TRUE),
  accel_median = median(luxury_cars$acceleration, na.rm = TRUE)
)

cat("\n📊 LUXURY SEGMENT CHARACTERISTICS:\n")

## 
## 📊 LUXURY SEGMENT CHARACTERISTICS:

cat("Price Statistics:\n")

## Price Statistics:

cat("   Mean: $", format(round(luxury_stats$price_mean, 0), big.mark = ","), "\n")

##    Mean: $ 40,717

cat("   Median: $", format(round(luxury_stats$price_median, 0), big.mark = ","), "\n")

##    Median: $ 38,904

cat("Performance Statistics:\n")

## Performance Statistics:

cat("   Average Speed: ", round(luxury_stats$speed_mean, 1), " km/h\n")

##    Average Speed:  182.5  km/h

cat("   Median Acceleration: ", round(luxury_stats$accel_median, 2), " seconds\n")

##    Median Acceleration:  10.1  seconds
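The mean price reported above ($40,717) is higher than the median ($38,904), which is usually read as a sign of right skew: a few expensive models pull the mean upward. The sketch below quantifies that gap from the reported figures. The helper skew_direction() is a hypothetical name introduced here for illustration, and the mean-versus-median comparison is a rule of thumb rather than a formal test.

```r
# Hypothetical helper: describe skew direction from the mean-median gap
skew_direction <- function(mean_value, median_value) {
  if (mean_value > median_value) {
    "right-skewed (mean pulled up by high values)"
  } else if (mean_value < median_value) {
    "left-skewed (mean pulled down by low values)"
  } else {
    "roughly symmetric"
  }
}

# Figures reported above for the luxury segment
reported_mean <- 40717
reported_median <- 38904

gap <- reported_mean - reported_median
gap_pct <- gap / reported_median * 100

cat("Gap (mean - median): $", format(gap, big.mark = ","), "\n")
cat("Gap as % of median:", round(gap_pct, 1), "%\n")
cat("Suggested shape:", skew_direction(reported_mean, reported_median), "\n")
```

Because the mean is sensitive to a handful of very expensive models, the median is the safer basis for a "typical" luxury price here.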

# Country analysis for luxury segment
luxury_countries <- table(luxury_cars$country)
top_luxury_country <- names(luxury_countries)[which.max(luxury_countries)]

cat("Geographic Distribution:\n")

## Geographic Distribution:

cat("   Leading country by luxury model count: ", top_luxury_country, "\n")

##    Leading country by luxury model count:  Germany

cat("   Share of luxury models: ", round(max(luxury_countries)/nrow(luxury_cars)*100, 1), "%\n")

##    Share of luxury models:  27.1 %
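The country calculation above is the mode of a categorical variable: it counts models per country and reports the most frequent one, so the percentage is a share of models, not of sales. The sketch below reproduces the idea on a hypothetical toy vector of countries. Note that which.max() returns only the first maximum, so the example also shows one way to list all tied modes.

```r
# Hypothetical toy data: country of origin for eight luxury models
toy_country <- c("Germany", "Germany", "Germany", "Italy",
                 "Italy", "France", "Japan", "Japan")

counts <- table(toy_country)            # frequency of each country
shares <- prop.table(counts) * 100      # relative frequencies in percent

# which.max() returns the first country with the highest count
top_country <- names(counts)[which.max(counts)]

# All countries tied for the highest count (handles multimodal data)
all_modes <- names(counts)[counts == max(counts)]

cat("Mode (first maximum):", top_country, "\n")
cat("All modes:", all_modes, "\n")
cat("Share of models:", round(shares[top_country], 1), "%\n")
```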

cat("\n🎯 STRATEGIC RECOMMENDATIONS:\n")

## 
## 🎯 STRATEGIC RECOMMENDATIONS:

cat("   • Target price: $", format(round(luxury_stats$price_median, -3), big.mark = ","), "\n")

##    • Target price: $ 39,000

cat("   • Minimum speed: ", round(luxury_stats$speed_median, 0), " km/h\n")

##    • Minimum speed:  185  km/h

cat("   • Maximum acceleration time: ", round(luxury_stats$accel_median, 1), " seconds\n")

##    • Maximum acceleration time:  10.1  seconds

cat("   • Benchmark against: ", top_luxury_country, " manufacturers\n")

##    • Benchmark against:  Germany  manufacturers

cat("\n📈 SUCCESS METRICS:\n")

## 
## 📈 SUCCESS METRICS:

cat("   Price positioning within luxury segment median ±10%\n")

##    Price positioning within luxury segment median ±10%

cat("   Performance specs matching or exceeding segment medians\n")

##    Performance specs matching or exceeding segment medians

cat("   Quality standards aligned with ", top_luxury_country, " benchmarks\n")

##    Quality standards aligned with  Germany  benchmarks
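The first success metric, a price within ±10% of the segment median, can be turned into a simple check. The sketch below uses the median reported above ($38,904). The helper within_band() and the $45,000 comparison price are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: is a candidate price within +/- tol of a reference value?
within_band <- function(price, reference, tol = 0.10) {
  price >= reference * (1 - tol) & price <= reference * (1 + tol)
}

segment_median <- 38904                      # median reported above
target_price <- round(segment_median, -3)    # round to the nearest thousand

band <- segment_median * c(0.90, 1.10)       # lower and upper limits of the band
cat("Band: $", format(round(band), big.mark = ","), "\n")
cat("Target price in band:", within_band(target_price, segment_median), "\n")
cat("Hypothetical $45,000 price in band:", within_band(45000, segment_median), "\n")
```

The negative digits argument in round(x, -3) rounds to the nearest thousand, which is how the $39,000 target price is derived from the median.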

Conclusion

Learning Outcomes Achieved

Today’s comprehensive journey through descriptive statistics has equipped you with:

Mathematical Foundation: Precise understanding of central tendency formulas and calculations
Practical Application: Real-world business problem-solving using statistical measures
Technical Proficiency: Hands-on experience with R and the UBStats package
Strategic Thinking: Converting statistical insights into actionable business recommendations
Professional Communication: Presenting statistical findings to diverse audiences

The Statistical Mindset

Remember that statistics is not just about numbers. It is about understanding patterns, making informed decisions, and communicating insights effectively. Every measure we calculate tells a story about our data, and every story guides important business decisions.

As we continue our statistical journey, you’ll discover that today’s foundation in central tendency naturally leads to questions about variability, relationships between variables, and ultimately, to the powerful world of inferential statistics.

Next Session: We’ll explore how spread and variability around these central measures reveal even deeper insights about market dynamics, risk assessment, and competitive positioning.

“In God we trust. All others must bring data.” - W. Edwards Deming

Lecture 2: Descriptive Statistics - Measures of Central Tendency, Variability, and Distribution Shape

Endri Raço
