Chapter 1: From Visualization to Numerical Summaries

What You Will Master Today

Building on our visualization skills from Lecture 1, today we dive deep into the mathematical heart of statistics:

  1. Measures of Central Tendency: Mean, median, and mode - finding the “typical” value
  2. Mathematical Formulas: Understanding the precise calculations behind each measure
  3. Data Type Applications: When to use which measure for different variable types
  4. Practical Calculations: Hand calculations and R implementations
  5. Business Decision Making: Using statistical measures for strategic insights
  6. Distribution Analysis: Understanding what the numbers reveal about data shape

Chapter 2: The Foundation - Understanding Our Data Context

Connecting to Real Business Scenarios

Imagine you’re the lead analyst for a European automotive consortium. Your dataset contains critical intelligence about 190 car models that will inform billion-dollar investment decisions. Every statistical measure we calculate today has direct implications for:

  • Product Development Strategy: Which price points to target
  • Market Positioning: Understanding competitive landscapes
  • Performance Benchmarking: Setting engineering targets
  • Regional Expansion: Geographic market opportunities

# Load our analytical environment
library(UBStats)
## Package UBStats (0.2.2) loaded.
## To cite, type citation("UBStats")
## Please report improvements and bugs to: https://github.com/raffaellapiccarreta/UBStats/issues
# Create the cars dataset for statistical analysis
# This code reproduces the dataset from Lecture 1

# Set seed for reproducible results
set.seed(123)  
n <- 190

# Generate sales data with realistic distribution
low_sales <- sample(500:3000, round(n*0.6), replace = TRUE)
mid_sales <- sample(3000:8000, round(n*0.25), replace = TRUE) 
high_sales <- sample(8000:50000, n - length(low_sales) - length(mid_sales), replace = TRUE)
all_sales <- c(low_sales, mid_sales, high_sales)

# Create the complete cars dataset
cars <- data.frame(
  model = paste("Model", 1:n),
  sales = sample(all_sales),  # Shuffle the sales values
  bestselling = sample(0:1, n, replace = TRUE, prob = c(0.9, 0.1)),
  price_num = round(rnorm(n, 25000, 15000)),
  price_classes = sample(c("low", "mid", "high"), n, replace = TRUE, prob = c(0.27, 0.55, 0.18)),
  maxspeed = round(rnorm(n, 180, 30)),
  acceleration = round(rnorm(n, 11, 3), 1),
  urban_fuelcons = round(rnorm(n, 8, 2), 1),
  fueltank = round(rnorm(n, 60, 15)),
  weight = round(rnorm(n, 1400, 300)),
  n_doors_min = sample(c(2,3,4,5,7), n, replace = TRUE, prob = c(0.09, 0.14, 0.05, 0.71, 0.01)),
  country = sample(c("Germany", "Japan", "France", "Italy", "United States", "Europe - others", "Asia - others"), 
                   n, replace = TRUE, prob = c(0.26, 0.19, 0.15, 0.11, 0.09, 0.14, 0.06))
)

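# Shift implausibly low values upward
# (a fixed shift cannot guarantee a plausible result: the summary below still shows a negative minimum price)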
cars$price_num[cars$price_num < 5000] <- cars$price_num[cars$price_num < 5000] + 10000
cars$maxspeed[cars$maxspeed < 100] <- cars$maxspeed[cars$maxspeed < 100] + 50
cars$acceleration[cars$acceleration < 3] <- abs(cars$acceleration[cars$acceleration < 3]) + 5

# Check the data structure
str(cars)
## 'data.frame':    190 obs. of  12 variables:
##  $ model         : chr  "Model 1" "Model 2" "Model 3" "Model 4" ...
##  $ sales         : int  12712 873 1528 23023 2956 1346 3711 2726 37652 6123 ...
##  $ bestselling   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ price_num     : num  32656 42939 8741 7823 27328 ...
##  $ price_classes : chr  "low" "mid" "low" "mid" ...
##  $ maxspeed      : num  169 137 153 214 158 194 171 205 177 159 ...
##  $ acceleration  : num  4.7 11 18.2 9.8 15 12.1 11.8 10.7 9 10.8 ...
##  $ urban_fuelcons: num  6.2 4.7 5.4 7.4 9.1 11.6 6 7 9.9 7.5 ...
##  $ fueltank      : num  65 46 87 39 85 62 60 40 72 79 ...
##  $ weight        : num  1445 1228 1499 1321 1585 ...
##  $ n_doors_min   : num  5 5 5 4 5 2 5 2 5 5 ...
##  $ country       : chr  "Japan" "Italy" "Italy" "Japan" ...
head(cars)
# Save the dataset
save(cars, file = "stat_datasets_cl17.Rdata")

# Confirm the file was created
cat("✅ Dataset created successfully!\n")
## ✅ Dataset created successfully!
cat("📊 Dataset contains", nrow(cars), "car models with", ncol(cars), "variables\n")
## 📊 Dataset contains 190 car models with 12 variables
cat("💾 Saved as: stat_datasets_cl17.Rdata\n")
## 💾 Saved as: stat_datasets_cl17.Rdata
cat("📁 Location:", getwd(), "\n")
## 📁 Location: C:/Users/ENDRI/Desktop/Virtus
# Quick preview of the data
cat("\n📋 Dataset Summary:\n")
## 
## 📋 Dataset Summary:
summary(cars[c("price_num", "maxspeed", "acceleration")])
##    price_num        maxspeed      acceleration   
##  Min.   :-1860   Min.   :103.0   Min.   : 3.200  
##  1st Qu.:13574   1st Qu.:159.2   1st Qu.: 8.825  
##  Median :24177   Median :178.0   Median :10.900  
##  Mean   :24267   Mean   :180.6   Mean   :10.837  
##  3rd Qu.:33195   3rd Qu.:203.0   3rd Qu.:12.975  
##  Max.   :61570   Max.   :279.0   Max.   :18.200
cat("📊 Dataset Overview:\n")
## 📊 Dataset Overview:
cat("   Total Models:", nrow(cars), "\n")
##    Total Models: 190
cat("   Variables:", ncol(cars), "\n")
##    Variables: 12
cat("   Geographic Coverage:", length(unique(cars$country)), "countries\n")
##    Geographic Coverage: 7 countries
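
The clean-up step in the dataset code adds a fixed amount to low values, which is why the summary still reports a minimum price of -1860. The sketch below contrasts that shift with a simple floor using `pmax()`. The three raw prices and the 5000 floor are hypothetical values chosen for illustration, and applying a floor to the real dataset would change every summary printed in this lecture, so treat this as an aside rather than a replacement.

```r
# Hypothetical raw draws, for illustration only (not taken from the dataset)
raw_prices <- c(-11860, 3000, 25000)

# Approach used in the dataset code: add 10000 to anything below 5000
shifted <- raw_prices
shifted[shifted < 5000] <- shifted[shifted < 5000] + 10000

# Alternative sketch: enforce a minimum of 5000 (assumed floor)
floored <- pmax(raw_prices, 5000)

cat("Shifted:", shifted, "\n")   # a very low draw can remain negative
cat("Floored:", floored, "\n")   # every value is at least 5000
```

A floor piles all low draws onto a single value, so redrawing the offending values from the original distribution may be preferable when the shape of the lower tail matters.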

Chapter 3: Measures of Central Tendency - Finding the “Typical” Value

The concept of central tendency answers the fundamental question: “What is the typical value in our dataset?” However, “typical” can mean different things depending on context and data characteristics.

3.1 The Arithmetic Mean (x̄) - The Mathematical Center

📐 Mathematical Foundation

The arithmetic mean represents the mathematical center of gravity for your data.

For Raw Data (Ungrouped): \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}\]

For Frequency Data (Grouped): \[\bar{x} = \frac{\sum_{i=1}^{k} f_i \cdot x_i}{\sum_{i=1}^{k} f_i} = \frac{\sum_{i=1}^{k} f_i \cdot x_i}{n}\]

Where:

  • \(x_i\) = individual data values or class midpoints
  • \(f_i\) = frequency of each value/class
  • \(n\) = total number of observations
  • \(k\) = number of distinct values/classes

🔍 Business Example: Car Price Analysis

# Example 1: Mean calculation for a subset of car prices
sample_prices <- c(18000, 22000, 25000, 28000, 35000)

# Manual calculation following the formula
sum_prices <- sum(sample_prices)
n_cars <- length(sample_prices)
mean_manual <- sum_prices / n_cars

cat("🚗 Sample Car Prices: $", paste(sample_prices, collapse = ", $"), "\n")
## 🚗 Sample Car Prices: $ 18000, $22000, $25000, $28000, $35000
cat("📊 Manual Calculation:\n")
## 📊 Manual Calculation:
cat("   Sum of prices: $", sum_prices, "\n")
##    Sum of prices: $ 128000
cat("   Number of cars: ", n_cars, "\n")
##    Number of cars:  5
cat("   Mean = ", sum_prices, " ÷ ", n_cars, " = $", round(mean_manual, 2), "\n")
##    Mean =  128000  ÷  5  = $ 25600
# Verification with R function
mean_r <- mean(sample_prices)
cat("✅ R function verification: $", round(mean_r, 2), "\n")
## ✅ R function verification: $ 25600
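
The "center of gravity" description can be checked numerically: deviations above and below the mean cancel out, so they sum to zero. The sketch below reuses the five sample prices; the variable names `sample_mean`, `deviations`, and `sum_dev` are introduced here for illustration.

```r
# Same five prices as in Example 1
sample_prices <- c(18000, 22000, 25000, 28000, 35000)
sample_mean <- mean(sample_prices)

# Signed distance of each price from the mean
deviations <- sample_prices - sample_mean
sum_dev <- sum(deviations)

cat("Deviations from the mean:", deviations, "\n")
cat("Sum of deviations:", sum_dev, "\n")
```

This balancing property holds for any dataset, and it is also why a single extreme value can move the mean noticeably.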

🌟 Advanced Example: Weighted Mean for Grouped Data

Let’s calculate the mean from frequency data, using the car ownership example from the lecture slides:

# Car ownership data from the lecture slides
cars_owned <- c(1, 2, 3, 5)
frequencies <- c(32, 48, 16, 4)
n_families <- sum(frequencies)

cat("🏠 Family Car Ownership Analysis:\n")
## 🏠 Family Car Ownership Analysis:
cat("Cars Owned: ", paste(cars_owned, collapse = ", "), "\n")
## Cars Owned:  1, 2, 3, 5
cat("Frequencies: ", paste(frequencies, collapse = ", "), "\n")
## Frequencies:  32, 48, 16, 4
# Manual weighted mean calculation
weighted_sum <- sum(cars_owned * frequencies)
weighted_mean <- weighted_sum / n_families

cat("\n📊 Weighted Mean Calculation:\n")
## 
## 📊 Weighted Mean Calculation:
cat("   Σ(xi × fi) = ", paste(cars_owned, "×", frequencies, collapse = " + "), "\n")
##    Σ(xi × fi) =  1 × 32 + 2 × 48 + 3 × 16 + 5 × 4
cat("             = ", paste(cars_owned * frequencies, collapse = " + "), "\n")
##              =  32 + 96 + 48 + 20
cat("             = ", weighted_sum, "\n")
##              =  196
cat("   Mean = ", weighted_sum, " ÷ ", n_families, " = ", round(weighted_mean, 3), " cars per family\n")
##    Mean =  196  ÷  100  =  1.96  cars per family
# Business interpretation
cat("\n💡 Business Insight: The average family owns ", round(weighted_mean, 2), " cars\n")
## 
## 💡 Business Insight: The average family owns  1.96  cars
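
As a cross-check on the hand calculation, the same result can be obtained in two other ways: with base R's `weighted.mean()`, and by expanding the frequency table back into raw observations with `rep()`. This is a minimal sketch; the names `mean_weighted` and `mean_expanded` are introduced here.

```r
cars_owned <- c(1, 2, 3, 5)
frequencies <- c(32, 48, 16, 4)

# Frequencies act as weights
mean_weighted <- weighted.mean(cars_owned, w = frequencies)

# Rebuild the 100 individual observations, then take the ordinary mean
mean_expanded <- mean(rep(cars_owned, times = frequencies))

cat("weighted.mean():", mean_weighted, "\n")
cat("mean of expanded data:", mean_expanded, "\n")
```

Both routes agree with the manual value of 1.96, which shows that the grouped formula is simply the raw-data formula applied to a compressed table.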

🎯 Real Dataset Application

# Calculate mean for key variables in our cars dataset
price_mean <- mean(cars$price_num, na.rm = TRUE)
speed_mean <- mean(cars$maxspeed, na.rm = TRUE)
accel_mean <- mean(cars$acceleration, na.rm = TRUE)

cat("🚗 AUTOMOTIVE MARKET AVERAGES:\n")
## 🚗 AUTOMOTIVE MARKET AVERAGES:
cat("💰 Average Price: $", round(price_mean, 0), "\n")
## 💰 Average Price: $ 24267
cat("⚡ Average Max Speed: ", round(speed_mean, 1), " km/h\n")
## ⚡ Average Max Speed:  180.6  km/h
cat("🏁 Average Acceleration: ", round(accel_mean, 2), " seconds (0-100 km/h)\n")
## 🏁 Average Acceleration:  10.84  seconds (0-100 km/h)
# Strategic implications
cat("\n🎯 STRATEGIC IMPLICATIONS:\n")
## 
## 🎯 STRATEGIC IMPLICATIONS:
cat("   • New models should target price point around $", round(price_mean, 0), "\n")
##    • New models should target price point around $ 24267
cat("   • Performance benchmark: ", round(speed_mean, 0), " km/h max speed\n")
##    • Performance benchmark:  181  km/h max speed
cat("   • Acceleration target: Under ", round(accel_mean, 1), " seconds for competitiveness\n")
##    • Acceleration target: Under  10.8  seconds for competitiveness

3.2 The Median (Me) - The Positional Center

📐 Mathematical Foundation

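The median is the value in the middle position once the data are sorted: at least half of the observations are less than or equal to it, and at least half are greater than or equal to it. In the formulas below, \(x_{(i)}\) denotes the \(i\)-th smallest observation.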

For Odd n: \(Me = x_{(\frac{n+1}{2})}\)

For Even n: \(Me = \frac{x_{(\frac{n}{2})} + x_{(\frac{n}{2}+1)}}{2}\)

For Grouped Data: \(Me = L + \frac{\frac{n}{2} - CF_{before}}{f_{median}} \times h\)

Where:

  • \(L\) = lower boundary of median class
  • \(CF_{before}\) = cumulative frequency before median class
  • \(f_{median}\) = frequency of median class
  • \(h\) = class width

🔍 Step-by-Step Median Calculation

# Example with airline ticket prices
ticket_prices <- c(80, 120, 150, 90)

cat("✈️ Airline Ticket Prices to London:\n")
## ✈️ Airline Ticket Prices to London:
cat("Original data: $", paste(ticket_prices, collapse = ", $"), "\n")
## Original data: $ 80, $120, $150, $90
# Step 1: Sort the data
sorted_prices <- sort(ticket_prices)
cat("Sorted data: $", paste(sorted_prices, collapse = ", $"), "\n")
## Sorted data: $ 80, $90, $120, $150
# Step 2: Find the median position
n <- length(sorted_prices)
cat("n =", n, "(even number)\n")
## n = 4 (even number)
# Step 3: Calculate median
if (n %% 2 == 0) {
  # Even number of observations
  pos1 <- n / 2
  pos2 <- (n / 2) + 1
  median_manual <- (sorted_prices[pos1] + sorted_prices[pos2]) / 2
  
  cat("Median position: (", pos1, " + ", pos2, ") ÷ 2\n")
  cat("Median = ($", sorted_prices[pos1], " + $", sorted_prices[pos2], ") ÷ 2 = $", median_manual, "\n")
} else {
  # Odd number of observations
  pos <- (n + 1) / 2
  median_manual <- sorted_prices[pos]
  cat("Median position:", pos, "\n")
  cat("Median = $", median_manual, "\n")
}
## Median position: ( 2  +  3 ) ÷ 2
## Median = ($ 90  + $ 120 ) ÷ 2 = $ 105
# Verification
median_r <- median(ticket_prices)
cat("✅ R verification: $", median_r, "\n")
## ✅ R verification: $ 105
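
Because the median depends only on position, it barely reacts to extreme values. The sketch below makes this concrete by replacing one ticket price with a hypothetical data-entry error (150 typed as 1500); the vector `with_outlier` is an assumption made for illustration.

```r
ticket_prices <- c(80, 120, 150, 90)
with_outlier <- c(80, 120, 1500, 90)  # hypothetical typo: 150 entered as 1500

cat("Mean:   ", mean(ticket_prices), "->", mean(with_outlier), "\n")
cat("Median: ", median(ticket_prices), "->", median(with_outlier), "\n")
```

The mean jumps from 110 to 447.5, while the median stays at 105, since the two middle values of the sorted data are unchanged.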

🏆 Advanced Example: Median for Car Speed Data

# Using the speed distribution from lecture slides
speed_intervals <- c("[0,30)", "[30,50)", "[50,100)")
frequencies_speed <- c(4, 12, 8)
n_total <- sum(frequencies_speed)

# Calculate cumulative frequencies
cumulative_freq <- cumsum(frequencies_speed)
cat("🏎️ Car Speed Distribution Analysis:\n")
## 🏎️ Car Speed Distribution Analysis:
cat("Intervals: ", paste(speed_intervals, collapse = ", "), "\n")
## Intervals:  [0,30), [30,50), [50,100)
cat("Frequencies: ", paste(frequencies_speed, collapse = ", "), "\n")
## Frequencies:  4, 12, 8
cat("Cumulative frequencies: ", paste(cumulative_freq, collapse = ", "), "\n")
## Cumulative frequencies:  4, 16, 24
# Find median class
median_position <- n_total / 2
cat("\nMedian position: n/2 =", n_total, "÷ 2 =", median_position, "\n")
## 
## Median position: n/2 = 24 ÷ 2 = 12
# Identify median class
median_class_index <- which(cumulative_freq >= median_position)[1]
cat("Median class:", speed_intervals[median_class_index], "\n")
## Median class: [30,50)
# For this example, the median class is [30,50)
# Apply the median formula for grouped data
L <- 30  # Lower boundary of the median class
CF_before <- cumulative_freq[median_class_index - 1]  # Cumulative frequency before the median class (4)
f_median <- frequencies_speed[median_class_index]  # Frequency of the median class (12)
h <- 20  # Class width (50 - 30)

median_grouped <- L + ((median_position - CF_before) / f_median) * h

cat("\n📊 Median Calculation for Grouped Data:\n")
## 
## 📊 Median Calculation for Grouped Data:
cat("   L (lower boundary) =", L, "\n")
##    L (lower boundary) = 30
cat("   n/2 =", median_position, "\n")
##    n/2 = 12
cat("   CF_before =", CF_before, "\n")
##    CF_before = 4
cat("   f_median =", f_median, "\n")
##    f_median = 12
cat("   h (class width) =", h, "\n")
##    h (class width) = 20
cat("   Median = ", L, " + ((", median_position, " - ", CF_before, ") ÷ ", f_median, ") × ", h, "\n")
##    Median =  30  + (( 12  -  4 ) ÷  12 ) ×  20
cat("          = ", L, " + (", (median_position - CF_before), " ÷ ", f_median, ") × ", h, "\n")
##           =  30  + ( 8  ÷  12 ) ×  20
cat("          = ", L, " + ", round((median_position - CF_before) / f_median, 3), " × ", h, "\n")
##           =  30  +  0.667  ×  20
cat("          = ", round(median_grouped, 2), " km/h\n")
##           =  43.33  km/h
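
The steps above can be wrapped in a small reusable function. This is a sketch under stated assumptions: `grouped_median()` and its arguments `lower`, `upper`, and `freq` are hypothetical names introduced here, classes are assumed to be contiguous and sorted, and values are assumed to be spread evenly within the median class (the linear interpolation the formula relies on).

```r
# Median for grouped data via linear interpolation within the median class
grouped_median <- function(lower, upper, freq) {
  n <- sum(freq)
  cum_freq <- cumsum(freq)
  k <- which(cum_freq >= n / 2)[1]                  # index of the median class
  cf_before <- if (k == 1) 0 else cum_freq[k - 1]   # cumulative frequency before it
  width <- upper[k] - lower[k]                      # class width h
  lower[k] + (n / 2 - cf_before) / freq[k] * width
}

# Speed classes [0,30), [30,50), [50,100) with frequencies 4, 12, 8
speed_median <- grouped_median(
  lower = c(0, 30, 50),
  upper = c(30, 50, 100),
  freq  = c(4, 12, 8)
)
cat("Grouped median:", round(speed_median, 2), "km/h\n")
```

Passing the class boundaries as numeric vectors, rather than parsing interval labels, keeps the function independent of how the classes are displayed.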

🔍 Real Dataset Median Analysis

# Calculate medians for our car dataset
price_median <- median(cars$price_num, na.rm = TRUE)
speed_median <- median(cars$maxspeed, na.rm = TRUE)
accel_median <- median(cars$acceleration, na.rm = TRUE)

cat("🚗 AUTOMOTIVE MARKET MEDIANS:\n")
## 🚗 AUTOMOTIVE MARKET MEDIANS:
cat("💰 Median Price: $", round(price_median, 0), "\n")
## 💰 Median Price: $ 24177
cat("⚡ Median Max Speed: ", round(speed_median, 1), " km/h\n")
## ⚡ Median Max Speed:  178  km/h
cat("🏁 Median Acceleration: ", round(accel_median, 2), " seconds\n")
## 🏁 Median Acceleration:  10.9  seconds
# Compare with means calculated earlier
cat("\n📊 MEAN vs MEDIAN COMPARISON:\n")
## 
## 📊 MEAN vs MEDIAN COMPARISON:
cat("Price: Mean $", round(price_mean, 0), " vs Median $", round(price_median, 0), "\n")
## Price: Mean $ 24267  vs Median $ 24177
cat("Speed: Mean ", round(speed_mean, 1), " vs Median ", round(speed_median, 1), " km/h\n")
## Speed: Mean  180.6  vs Median  178  km/h
cat("Acceleration: Mean ", round(accel_mean, 2), " vs Median ", round(accel_median, 2), " seconds\n")
## Acceleration: Mean  10.84  vs Median  10.9  seconds
# Skewness interpretation (differences under 1% of the median are treated as negligible)
if (abs(price_mean - price_median) < 0.01 * abs(price_median)) {
  cat("\n🔍 Price distribution: APPROXIMATELY SYMMETRIC (mean ≈ median)\n")
} else if (price_mean > price_median) {
  cat("\n🔍 Price distribution: RIGHT-SKEWED (mean > median)\n")
  cat("   Interpretation: Some very expensive luxury cars pull the average up\n")
} else {
  cat("\n🔍 Price distribution: LEFT-SKEWED (mean < median)\n")
}
## 
## 🔍 Price distribution: APPROXIMATELY SYMMETRIC (mean ≈ median)

Chapter 4: When to Use Each Measure - The Decision Framework

4.1 The Statistical Decision Tree

Understanding when to use mean, median, or mode is crucial for accurate analysis:

cat("📋 CENTRAL TENDENCY DECISION FRAMEWORK:\n")
## 📋 CENTRAL TENDENCY DECISION FRAMEWORK:
cat("=" , rep("=", 50), "\n", sep="")
## ===================================================
cat("\n🎯 USE MEAN when:\n")
## 
## 🎯 USE MEAN when:
cat("   ✓ Data is approximately symmetric\n")
##    ✓ Data is approximately symmetric
cat("   ✓ No extreme outliers present\n")
##    ✓ No extreme outliers present
cat("   ✓ Working with interval/ratio data\n")
##    ✓ Working with interval/ratio data
cat("   ✓ Need mathematical precision\n")
##    ✓ Need mathematical precision
cat("   ✓ Planning to use in further calculations\n")
##    ✓ Planning to use in further calculations
cat("\n🎯 USE MEDIAN when:\n")
## 
## 🎯 USE MEDIAN when:
cat("   ✓ Data is skewed (left or right)\n")
##    ✓ Data is skewed (left or right)
cat("   ✓ Outliers are present\n")
##    ✓ Outliers are present
cat("   ✓ Working with ordinal data\n")
##    ✓ Working with ordinal data
cat("   ✓ Need robust measure (resistant to extremes)\n")
##    ✓ Need robust measure (resistant to extremes)
cat("   ✓ Income, house prices, or similar economic data\n")
##    ✓ Income, house prices, or similar economic data
cat("\n🎯 USE MODE when:\n")
## 
## 🎯 USE MODE when:
cat("   ✓ Working with nominal (categorical) data\n")
##    ✓ Working with nominal (categorical) data
cat("   ✓ Need the most frequent category\n")
##    ✓ Need the most frequent category
cat("   ✓ Business decisions based on popularity\n")
##    ✓ Business decisions based on popularity
cat("   ✓ Quality control (most common defect)\n")
##    ✓ Quality control (most common defect)
cat("   ✓ Market research (most preferred option)\n")
##    ✓ Market research (most preferred option)

🔍 Practical Application Examples

# Example 1: Symmetric data - use mean
fuel_efficiency <- c(7.2, 7.8, 8.1, 8.3, 8.5, 8.7, 9.1, 9.3)
cat("⛽ Fuel Efficiency Data (L/100km): ", paste(fuel_efficiency, collapse = ", "), "\n")
## ⛽ Fuel Efficiency Data (L/100km):  7.2, 7.8, 8.1, 8.3, 8.5, 8.7, 9.1, 9.3
cat("   Distribution: Approximately symmetric\n")
##    Distribution: Approximately symmetric
cat("   Best measure: MEAN = ", round(mean(fuel_efficiency), 2), " L/100km\n")
##    Best measure: MEAN =  8.38  L/100km
# Example 2: Skewed data - use median
executive_salaries <- c(45000, 48000, 52000, 55000, 58000, 62000, 350000)
cat("\n💼 Executive Salaries: $", paste(executive_salaries, collapse = ", $"), "\n")
## 
## 💼 Executive Salaries: $ 45000, $48000, $52000, $55000, $58000, $62000, $350000
cat("   Distribution: Right-skewed (one very high salary)\n")
##    Distribution: Right-skewed (one very high salary)
cat("   Mean: $", round(mean(executive_salaries), 0), " (pulled up by outlier)\n")
##    Mean: $ 95714  (pulled up by outlier)
cat("   Median: $", round(median(executive_salaries), 0), " (more representative)\n")
##    Median: $ 55000  (more representative)
cat("   Best measure: MEDIAN\n")
##    Best measure: MEDIAN
# Example 3: Categorical data - use mode
preferred_colors <- c("Blue", "Red", "Blue", "Green", "Blue", "Red", "Blue", "White")
color_freq <- table(preferred_colors)
modal_color <- names(color_freq)[which.max(color_freq)]
cat("\n🎨 Preferred Car Colors: ", paste(preferred_colors, collapse = ", "), "\n")
## 
## 🎨 Preferred Car Colors:  Blue, Red, Blue, Green, Blue, Red, Blue, White
cat("   Best measure: MODE = ", modal_color, " (most frequent choice)\n")
##    Best measure: MODE =  Blue  (most frequent choice)
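
The decision framework can be condensed into a rough helper. This is only a sketch: `suggest_measure()` is a hypothetical function, the 0.5 cutoff on the median-based skewness coefficient is a rule-of-thumb assumption, and ordinal data (where the median may still apply) is not handled separately.

```r
# Rough rule of thumb: mode for non-numeric data, median for clearly
# skewed numeric data, mean otherwise. Assumes at least two distinct values.
suggest_measure <- function(x, threshold = 0.5) {
  if (!is.numeric(x)) {
    return("mode")
  }
  x <- x[!is.na(x)]
  sk <- 3 * (mean(x) - median(x)) / sd(x)  # median-based skewness coefficient
  if (abs(sk) >= threshold) "median" else "mean"
}

fuel_efficiency <- c(7.2, 7.8, 8.1, 8.3, 8.5, 8.7, 9.1, 9.3)
executive_salaries <- c(45000, 48000, 52000, 55000, 58000, 62000, 350000)
preferred_colors <- c("Blue", "Red", "Blue", "Green", "Blue", "Red", "Blue", "White")

cat("Fuel efficiency:", suggest_measure(fuel_efficiency), "\n")
cat("Salaries:       ", suggest_measure(executive_salaries), "\n")
cat("Colors:         ", suggest_measure(preferred_colors), "\n")
```

A helper like this is a starting point for discussion, not a substitute for plotting the data and considering the business question.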

Chapter 5: Comprehensive Analysis with R - UBStats Functions

5.1 Professional Statistical Summaries

cat("🔧 PROFESSIONAL STATISTICAL ANALYSIS USING UBStats:\n")
## 🔧 PROFESSIONAL STATISTICAL ANALYSIS USING UBStats:
cat("=" , rep("=", 60), "\n", sep="")
## =============================================================
# Central tendency analysis for price
cat("\n💰 PRICE ANALYSIS:\n")
## 
## 💰 PRICE ANALYSIS:
price_central <- distr.summary.x(cars$price_num, stats="central")
##    n n.a  mode n.modes  mode% median     mean
##  190   0 26809       1 0.0105  24177 24267.47
print(price_central)
## $`Central tendency measures`
##     n n.a  mode n.modes      mode% median     mean
## 1 190   0 26809       1 0.01052632  24177 24267.47
# Central tendency analysis for performance
cat("\n🏁 ACCELERATION ANALYSIS:\n")
## 
## 🏁 ACCELERATION ANALYSIS:
accel_central <- distr.summary.x(cars$acceleration, stats="central")
##    n n.a mode n.modes  mode% median  mean
##  190   0 12.3       2 0.0263   10.9 10.84
print(accel_central)
## $`Central tendency measures`
##     n n.a mode n.modes      mode% median     mean
## 1 190   0 12.3       2 0.02631579   10.9 10.83684
# Central tendency analysis for speed
cat("\n⚡ MAX SPEED ANALYSIS:\n")
## 
## ⚡ MAX SPEED ANALYSIS:
speed_central <- distr.summary.x(cars$maxspeed, stats="central")
##    n n.a mode n.modes  mode% median   mean
##  190   0  172       1 0.0368    178 180.63
print(speed_central)
## $`Central tendency measures`
##     n n.a mode n.modes      mode% median     mean
## 1 190   0  172       1 0.03684211    178 180.6316
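
The `n.modes` column above reports two modes for acceleration, whereas the `which.max()` approach used earlier returns only the first of several tied values. The sketch below lists every tied mode; `all_modes()` is a hypothetical helper, and the acceleration values are made up for illustration.

```r
# Return every value that attains the maximum frequency
# (values are returned as character strings, because they come from table names)
all_modes <- function(x) {
  freq <- table(x)
  names(freq)[freq == max(freq)]
}

colors <- c("Blue", "Red", "Blue", "Green", "Blue", "Red", "Blue", "White")
accel_sample <- c(12.3, 9.8, 12.3, 9.8, 11)  # hypothetical values with a tie

print(all_modes(colors))        # a single mode
print(all_modes(accel_sample))  # two tied modes
```

When every value occurs exactly once, the function returns all of them, which is a signal that the mode is not informative for that variable.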

🎯 Business Intelligence Dashboard

cat("\n📊 AUTOMOTIVE MARKET INTELLIGENCE DASHBOARD:\n")
## 
## 📊 AUTOMOTIVE MARKET INTELLIGENCE DASHBOARD:
cat("=" , rep("=", 55), "\n", sep="")
## ========================================================
# Price intelligence
cat("\n💰 PRICE INTELLIGENCE:\n")
## 
## 💰 PRICE INTELLIGENCE:
cat("   Mean Price: $", round(mean(cars$price_num, na.rm = TRUE), 0), "\n")
##    Mean Price: $ 24267
cat("   Median Price: $", round(median(cars$price_num, na.rm = TRUE), 0), "\n")
##    Median Price: $ 24177
cat("   Price Range: $", round(min(cars$price_num, na.rm = TRUE), 0), 
    " - $", round(max(cars$price_num, na.rm = TRUE), 0), "\n")
##    Price Range: $ -1860  - $ 61570
# Performance benchmarks
cat("\n🏎️ PERFORMANCE BENCHMARKS:\n")
## 
## 🏎️ PERFORMANCE BENCHMARKS:
cat("   Average Top Speed: ", round(mean(cars$maxspeed, na.rm = TRUE), 1), " km/h\n")
##    Average Top Speed:  180.6  km/h
cat("   Median Acceleration: ", round(median(cars$acceleration, na.rm = TRUE), 2), " seconds\n")
##    Median Acceleration:  10.9  seconds
# Market segmentation insights
cat("\n🎯 MARKET SEGMENTATION INSIGHTS:\n")
## 
## 🎯 MARKET SEGMENTATION INSIGHTS:
price_q1 <- quantile(cars$price_num, 0.25, na.rm = TRUE)
price_q3 <- quantile(cars$price_num, 0.75, na.rm = TRUE)

cat("   Budget Segment (bottom 25%): Under $", round(price_q1, 0), "\n")
##    Budget Segment (bottom 25%): Under $ 13574
cat("   Mid-Market (25%-75%): $", round(price_q1, 0), " - $", round(price_q3, 0), "\n")
##    Mid-Market (25%-75%): $ 13574  - $ 33195
cat("   Premium Segment (top 25%): Above $", round(price_q3, 0), "\n")
##    Premium Segment (top 25%): Above $ 33195
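
The quartile cut-offs can also be used to label each model with its segment. The sketch below does this with `cut()` on a small hypothetical price vector (not the lecture dataset); the segment labels mirror the dashboard, and the names `prices`, `q1`, `q3`, and `segment` are introduced here.

```r
# Hypothetical prices, for illustration only
prices <- c(12000, 15000, 18000, 22000, 25000, 28000, 35000, 48000)

q1 <- quantile(prices, 0.25)
q3 <- quantile(prices, 0.75)

# Intervals are right-closed by default: (-Inf, Q1], (Q1, Q3], (Q3, Inf)
segment <- cut(prices,
               breaks = c(-Inf, q1, q3, Inf),
               labels = c("Budget", "Mid-Market", "Premium"))

print(table(segment))
```

Note that `quantile()` interpolates between observations by default, so the cut-offs need not coincide with actual prices in the data.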
# Country analysis
country_mode_freq <- table(cars$country)
top_country <- names(country_mode_freq)[which.max(country_mode_freq)]
cat("\n🌍 GEOGRAPHIC INTELLIGENCE:\n")
## 
## 🌍 GEOGRAPHIC INTELLIGENCE:
cat("   Top Manufacturing Country: ", top_country, "\n")
##    Top Manufacturing Country:  Germany
cat("   Share of Models in Dataset: ", round(max(country_mode_freq)/nrow(cars)*100, 1), "%\n")
##    Share of Models in Dataset:  21.1 %

Chapter 6: Advanced Topics - Distribution Shape and Skewness

6.1 Mathematical Relationship Between Mean, Median, and Mode

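For a unimodal distribution, the relative positions of the mean, median, and mode indicate its shape: in a symmetric distribution the three roughly coincide, while in a right-skewed distribution the mean is typically pulled above the median (and in a left-skewed one, below it). The loop below compares mean and median for four variables, treating differences smaller than 1% of the median as negligible: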

cat("📈 DISTRIBUTION SHAPE ANALYSIS:\n")
## 📈 DISTRIBUTION SHAPE ANALYSIS:
cat("=" , rep("=", 45), "\n", sep="")
## ==============================================
# Calculate measures for different variables
variables <- c("price_num", "maxspeed", "acceleration", "weight")
variable_names <- c("Price", "Max Speed", "Acceleration", "Weight")

for (i in seq_along(variables)) {
  var_data <- cars[[variables[i]]]
  var_mean <- mean(var_data, na.rm = TRUE)
  var_median <- median(var_data, na.rm = TRUE)
  
  cat("\n", variable_names[i], ":\n")
  cat("   Mean: ", round(var_mean, 2), "\n")
  cat("   Median: ", round(var_median, 2), "\n")
  cat("   Difference (Mean - Median): ", round(var_mean - var_median, 2), "\n")
  
  if (abs(var_mean - var_median) < 0.01 * abs(var_median)) {
    cat("   Shape: SYMMETRIC (mean ≈ median)\n")
  } else if (var_mean > var_median) {
    cat("   Shape: RIGHT-SKEWED (mean > median)\n")
    cat("   Interpretation: Tail extends toward higher values\n")
  } else {
    cat("   Shape: LEFT-SKEWED (mean < median)\n")
    cat("   Interpretation: Tail extends toward lower values\n")
  }
}
## 
##  Price :
##    Mean:  24267.47 
##    Median:  24177 
##    Difference (Mean - Median):  90.47 
##    Shape: SYMMETRIC (mean ≈ median)
## 
##  Max Speed :
##    Mean:  180.63 
##    Median:  178 
##    Difference (Mean - Median):  2.63 
##    Shape: RIGHT-SKEWED (mean > median)
##    Interpretation: Tail extends toward higher values
## 
##  Acceleration :
##    Mean:  10.84 
##    Median:  10.9 
##    Difference (Mean - Median):  -0.06 
##    Shape: SYMMETRIC (mean ≈ median)
## 
##  Weight :
##    Mean:  1373.85 
##    Median:  1369.5 
##    Difference (Mean - Median):  4.35 
##    Shape: SYMMETRIC (mean ≈ median)

6.2 Coefficient of Skewness

Pearson's second (median-based) coefficient of skewness provides a numerical measure of distribution asymmetry:

\[SK = \frac{3(\bar{x} - Me)}{s}\]

Where \(s\) is the standard deviation.

cat("\n📐 COEFFICIENT OF SKEWNESS ANALYSIS:\n")
## 
## 📐 COEFFICIENT OF SKEWNESS ANALYSIS:
cat("=" , rep("=", 45), "\n", sep="")
## ==============================================
# Calculate skewness coefficient for price
price_mean <- mean(cars$price_num, na.rm = TRUE)
price_median <- median(cars$price_num, na.rm = TRUE)
price_sd <- sd(cars$price_num, na.rm = TRUE)

skewness_coeff <- 3 * (price_mean - price_median) / price_sd

cat("Price Distribution Skewness:\n")
## Price Distribution Skewness:
cat("   Mean: $", round(price_mean, 0), "\n")
##    Mean: $ 24267
cat("   Median: $", round(price_median, 0), "\n")
##    Median: $ 24177
cat("   Standard Deviation: $", round(price_sd, 0), "\n")
##    Standard Deviation: $ 12473
cat("   Skewness Coefficient: ", round(skewness_coeff, 3), "\n")
##    Skewness Coefficient:  0.022
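if (abs(skewness_coeff) < 0.5) {
  cat("   Interpretation: APPROXIMATELY SYMMETRIC\n")
} else {
  # Strength follows the scale printed below; direction follows the sign
  strength <- if (abs(skewness_coeff) < 1) "MODERATELY" else "HIGHLY"
  direction <- if (skewness_coeff > 0) "RIGHT-SKEWED" else "LEFT-SKEWED"
  cat("   Interpretation:", strength, direction, "\n")
}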
##    Interpretation: APPROXIMATELY SYMMETRIC
cat("\n📚 Skewness Coefficient Scale:\n")
## 
## 📚 Skewness Coefficient Scale:
cat("   |SK| < 0.5: Approximately symmetric\n")
##    |SK| < 0.5: Approximately symmetric
cat("   0.5 ≤ |SK| < 1: Moderately skewed\n")
##    0.5 ≤ |SK| < 1: Moderately skewed
cat("   |SK| ≥ 1: Highly skewed\n")
##    |SK| ≥ 1: Highly skewed
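
To see the scale in action on clearly skewed data, the coefficient can be applied to the executive salary example from Chapter 4. This is a sketch: `pearson_skew()` and `describe_skew()` are hypothetical helper names, and the cut-offs are the ones printed in the scale above.

```r
# Median-based skewness coefficient: 3 * (mean - median) / sd
pearson_skew <- function(x) {
  x <- x[!is.na(x)]
  3 * (mean(x) - median(x)) / sd(x)
}

# Translate the coefficient into the verbal scale used above
describe_skew <- function(sk) {
  if (abs(sk) < 0.5) {
    return("approximately symmetric")
  }
  strength <- if (abs(sk) < 1) "moderately" else "highly"
  direction <- if (sk > 0) "right-skewed" else "left-skewed"
  paste(strength, direction)
}

executive_salaries <- c(45000, 48000, 52000, 55000, 58000, 62000, 350000)
sk_salaries <- pearson_skew(executive_salaries)

cat("Skewness coefficient:", round(sk_salaries, 3), "\n")
cat("Interpretation:", describe_skew(sk_salaries), "\n")
```

The single 350000 salary is enough to push the coefficient above 1, in contrast to the near-zero value obtained for car prices.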

Chapter 7: Hands-On Problem Solving Workshop

7.1 Complete Problem Solution: Car Manufacturer Strategic Analysis

# STRATEGIC ANALYSIS: NEW ELECTRIC CAR DEVELOPMENT
cat("🚗 STRATEGIC ANALYSIS: NEW ELECTRIC CAR DEVELOPMENT\n")
## 🚗 STRATEGIC ANALYSIS: NEW ELECTRIC CAR DEVELOPMENT
cat("=============================================================\n")
## =============================================================
cat("SCENARIO: Your company is developing a new electric car. Use statistical analysis\n")
## SCENARIO: Your company is developing a new electric car. Use statistical analysis
cat("to determine optimal specifications that will be competitive in the market.\n\n")
## to determine optimal specifications that will be competitive in the market.
# Check if required dataset and columns exist
if (!exists("cars") || !all(c("price_num", "maxspeed", "acceleration", "country") %in% colnames(cars))) {
  stop("Error: 'cars' dataset is missing or does not contain required columns (price_num, maxspeed, acceleration, country).")
}

# Problem 1: Optimal Price Positioning
cat("📊 PROBLEM 1: OPTIMAL PRICE POSITIONING\n")
## 📊 PROBLEM 1: OPTIMAL PRICE POSITIONING
cat("----------------------------------------------\n")
## ----------------------------------------------
# Calculate central tendency and quartiles
price_stats <- summary(cars$price_num)
price_quartiles <- quantile(cars$price_num, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)

cat("Central Tendency Analysis:\n")
## Central Tendency Analysis:
print(price_stats)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -1860   13574   24177   24267   33195   61570
cat("\nQuartile Analysis:\n")
## 
## Quartile Analysis:
cat("  25th Percentile: $", round(price_quartiles[1], 0), "\n")
##   25th Percentile: $ 13574
cat("  Median: $", round(price_quartiles[2], 0), "\n")
##   Median: $ 24177
cat("  75th Percentile: $", round(price_quartiles[3], 0), "\n")
##   75th Percentile: $ 33195
# Strategic recommendation
price_25 <- price_quartiles[1]
price_75 <- price_quartiles[3]
target_price <- price_quartiles[2]

cat("\n🎯 STRATEGIC RECOMMENDATION:\n")
## 
## 🎯 STRATEGIC RECOMMENDATION:
cat("   Target Price Range: $", round(price_25, 0), " - $", round(price_75, 0), "\n")
##    Target Price Range: $ 13574  - $ 33195
cat("   Optimal Price Point: $", round(target_price, 0), " (median)\n")
##    Optimal Price Point: $ 24177  (median)
cat("   Rationale: Median-based pricing captures the mainstream market.\n")
##    Rationale: Median-based pricing captures the mainstream market.
# Problem 2: Performance Benchmarking
cat("\n\n🏁 PROBLEM 2: PERFORMANCE BENCHMARKING\n")
## 
## 
## 🏁 PROBLEM 2: PERFORMANCE BENCHMARKING
cat("----------------------------------------------\n")
## ----------------------------------------------
# Top 10% performance thresholds
speed_p90 <- quantile(cars$maxspeed, 0.9, na.rm = TRUE)
accel_p10 <- quantile(cars$acceleration, 0.1, na.rm = TRUE)  # Lower is better for acceleration

cat("Performance Targets for Top 10% Market:\n")
## Performance Targets for Top 10% Market:
cat("   Minimum Speed: ", round(speed_p90, 0), " km/h\n")
##    Minimum Speed:  218  km/h
cat("   Maximum 0-100 km/h Time: ", round(accel_p10, 2), " seconds\n")
##    Maximum 0-100 km/h Time:  7.28  seconds
# Problem 3: Geographic Market Analysis
cat("\n\n🌍 PROBLEM 3: GEOGRAPHIC MARKET ANALYSIS\n")
## 
## 
## 🌍 PROBLEM 3: GEOGRAPHIC MARKET ANALYSIS
cat("----------------------------------------------\n")
## ----------------------------------------------
# Country frequency table (one column per statistic, no duplicated label columns)
country_counts <- table(cars$country)
country_analysis <- data.frame(
  Country = names(country_counts),
  Count = as.vector(country_counts),
  Proportion = as.vector(prop.table(country_counts)),
  Percentage = as.vector(prop.table(country_counts)) * 100
)

cat("Country Distribution:\n")
## Country Distribution:
print(country_analysis)
##           Country Count Proportion Percentage
## 1   Asia - others    14 0.07368421   7.368421
## 2 Europe - others    38 0.20000000  20.000000
## 3          France    33 0.17368421  17.368421
## 4         Germany    40 0.21052632  21.052632
## 5           Italy    19 0.10000000  10.000000
## 6           Japan    33 0.17368421  17.368421
## 7   United States    13 0.06842105   6.842105
# Find modal country and European share
modal_country <- names(table(cars$country))[which.max(table(cars$country))]
european_countries <- c("Germany", "France", "Italy", "Europe - others")
european_share <- sum(cars$country %in% european_countries, na.rm = TRUE) / nrow(cars) * 100

cat("\n📈 Geographic Insights:\n")
## 
## 📈 Geographic Insights:
cat("   Modal Manufacturing Country: ", modal_country, "\n")
##    Modal Manufacturing Country:  Germany
cat("   European Market Share: ", round(european_share, 1), "%\n")
##    European Market Share:  68.4 %
cat("   Strategic Implication: Focus on European manufacturing partnerships.\n")
##    Strategic Implication: Focus on European manufacturing partnerships.
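
The which.max() idiom used for the modal country returns only the first category that reaches the highest count, so a tie would go unnoticed. A minimal sketch of a tie-aware alternative follows; all_modes is a hypothetical helper name (it is not part of base R or UBStats) and the regions vector is toy data.

```r
# Hypothetical helper (not part of base R or UBStats): return every value
# that reaches the highest frequency, so ties are not silently dropped.
all_modes <- function(x) {
  counts <- table(x)                               # frequency of each distinct value
  names(counts)[as.vector(counts) == max(counts)]  # keep all values at the maximum
}

# Toy data, for illustration only
regions <- c("Germany", "Japan", "Germany", "Japan", "Italy")
all_modes(regions)
```

When a single category dominates, as Germany does in the table above, this returns the same answer as the which.max() version.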

7.2 Interactive Exercise: Student Practice Problems

cat("\n🎓 STUDENT PRACTICE EXERCISES\n")
## 
## 🎓 STUDENT PRACTICE EXERCISES
cat("=" , rep("=", 40), "\n", sep="")
## =========================================
cat("Exercise 1: Manual Calculation Challenge\n")
## Exercise 1: Manual Calculation Challenge
cat("Given airline ticket prices: $80, $120, $150, $90\n")
## Given airline ticket prices: $80, $120, $150, $90
cat("Tasks:\n")
## Tasks:
cat("a) Calculate mean, median, and mode manually\n")
## a) Calculate mean, median, and mode manually
cat("b) Arrange data in ascending order\n")
## b) Arrange data in ascending order
cat("c) Determine distribution shape\n")
## c) Determine distribution shape
# Solution for verification
tickets <- c(80, 120, 150, 90)
cat("\n✅ SOLUTION:\n")
## 
## ✅ SOLUTION:
cat("Sorted data: $", paste(sort(tickets), collapse = ", $"), "\n")
## Sorted data: $ 80, $90, $120, $150
cat("Mean: $", round(mean(tickets), 2), "\n")
## Mean: $ 110
cat("Median: $", median(tickets), "\n")
## Median: $ 105
cat("Mode: No mode (all values appear once)\n")
## Mode: No mode (all values appear once)
cat("Shape: Mean (", round(mean(tickets), 2), ") > Median (", median(tickets), ") → RIGHT-SKEWED\n")
## Shape: Mean ( 110 ) > Median ( 105 ) → RIGHT-SKEWED
cat("\n" , rep("-", 50), "\n", sep="")
## 
## --------------------------------------------------
cat("Exercise 2: Grouped Data Challenge\n")
## Exercise 2: Grouped Data Challenge
cat("Car ownership frequency table:\n")
## Car ownership frequency table:
cat("Cars Owned: 1, 2, 3, 5\n")
## Cars Owned: 1, 2, 3, 5
cat("Frequencies: 32, 48, 16, 4\n")
## Frequencies: 32, 48, 16, 4
cat("Task: Calculate weighted mean\n")
## Task: Calculate weighted mean
# Solution
cars_owned_ex <- c(1, 2, 3, 5)
freq_ex <- c(32, 48, 16, 4)
weighted_mean_ex <- sum(cars_owned_ex * freq_ex) / sum(freq_ex)

cat("\n✅ SOLUTION:\n")
## 
## ✅ SOLUTION:
cat("Weighted Mean = Σ(xi × fi) / Σfi\n")
## Weighted Mean = Σ(xi × fi) / Σfi
cat("             = (1×32 + 2×48 + 3×16 + 5×4) / (32+48+16+4)\n")
##              = (1×32 + 2×48 + 3×16 + 5×4) / (32+48+16+4)
cat("             = (32 + 96 + 48 + 20) / 100\n")
##              = (32 + 96 + 48 + 20) / 100
cat("             = 196 / 100 = ", weighted_mean_ex, " cars per family\n")
##              = 196 / 100 =  1.96  cars per family
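
The hand calculation in Exercise 2 can be cross-checked with base R's weighted.mean(). The sketch below also expands the frequency table back into raw observations with rep(), which makes the median available; the variable names are chosen for this illustration only.

```r
# Frequency table from Exercise 2
cars_owned <- c(1, 2, 3, 5)
freq <- c(32, 48, 16, 4)

# Built-in equivalent of sum(x * f) / sum(f)
weighted.mean(cars_owned, w = freq)

# Expanding the table back to 100 raw observations gives the same mean
# and also makes the median available
raw <- rep(cars_owned, times = freq)
mean(raw)
median(raw)
```

Both routes should agree with the 1.96 cars per family obtained by hand.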

Chapter 8: Professional Reporting and Communication

8.1 Executive Summary Template

cat("\n📊 EXECUTIVE SUMMARY: AUTOMOTIVE MARKET ANALYSIS\n")
## 
## 📊 EXECUTIVE SUMMARY: AUTOMOTIVE MARKET ANALYSIS
cat("=" , rep("=", 60), "\n", sep="")
## =============================================================
# Calculate all key statistics
price_summary <- list(
  mean = mean(cars$price_num, na.rm = TRUE),
  median = median(cars$price_num, na.rm = TRUE),
  q1 = quantile(cars$price_num, 0.25, na.rm = TRUE),
  q3 = quantile(cars$price_num, 0.75, na.rm = TRUE)
)

performance_summary <- list(
  speed_mean = mean(cars$maxspeed, na.rm = TRUE),
  speed_median = median(cars$maxspeed, na.rm = TRUE),
  accel_mean = mean(cars$acceleration, na.rm = TRUE),
  accel_median = median(cars$acceleration, na.rm = TRUE)
)

cat("\n🎯 KEY FINDINGS:\n")
## 
## 🎯 KEY FINDINGS:
cat("\n1. PRICE POSITIONING:\n")
## 
## 1. PRICE POSITIONING:
cat("   • Average market price: $", format(round(price_summary$mean, 0), big.mark = ","), "\n")
##    • Average market price: $ 24,267
cat("   • Median market price: $", format(round(price_summary$median, 0), big.mark = ","), "\n")
##    • Median market price: $ 24,177
cat("   • Price distribution: NEARLY SYMMETRIC (mean exceeds median by less than 1%)\n")
##    • Price distribution: NEARLY SYMMETRIC (mean exceeds median by less than 1%)
cat("   • Recommended target: $", format(round(price_summary$median, 0), big.mark = ","), " (median-based pricing)\n")
##    • Recommended target: $ 24,177  (median-based pricing)
cat("\n2. PERFORMANCE BENCHMARKS:\n")
## 
## 2. PERFORMANCE BENCHMARKS:
cat("   • Average top speed: ", round(performance_summary$speed_mean, 0), " km/h\n")
##    • Average top speed:  181  km/h
cat("   • Median acceleration: ", round(performance_summary$accel_median, 2), " seconds (0-100 km/h)\n")
##    • Median acceleration:  10.9  seconds (0-100 km/h)
cat("   • Competitive threshold: ", round(performance_summary$speed_median, 0), " km/h minimum\n")
##    • Competitive threshold:  178  km/h minimum
cat("\n3. MARKET SEGMENTATION:\n")
## 
## 3. MARKET SEGMENTATION:
cat("   • Budget segment (Q1): Under $", format(round(price_summary$q1, 0), big.mark = ","), "\n")
##    • Budget segment (Q1): Under $ 13,574
cat("   • Premium segment (Q3): Above $", format(round(price_summary$q3, 0), big.mark = ","), "\n")
##    • Premium segment (Q3): Above $ 33,195
cat("   • Target segment: Mid-market ($", format(round(price_summary$q1, 0), big.mark = ","), 
    " - $", format(round(price_summary$q3, 0), big.mark = ","), ")\n")
##    • Target segment: Mid-market ($ 13,574  - $ 33,195 )
# Geographic analysis
top_countries <- names(sort(table(cars$country), decreasing = TRUE))[1:3]
cat("\n4. GEOGRAPHIC OPPORTUNITIES:\n")
## 
## 4. GEOGRAPHIC OPPORTUNITIES:
cat("   • Leading manufacturers: ", paste(top_countries, collapse = ", "), "\n")
##    • Leading manufacturers:  Germany, Europe - others, France
cat("   • European dominance: ", round(european_share, 1), "% market share\n")
##    • European dominance:  68.4 % market share
cat("   • Strategic focus: European partnerships and manufacturing\n")
##    • Strategic focus: European partnerships and manufacturing
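
One way to gauge how strongly the mean and the median disagree is to express their difference as a percentage of the median. The sketch below uses the rounded figures printed in the price positioning findings; the variable names are assumptions made for this illustration.

```r
# Rounded values printed in the executive summary above
price_mean <- 24267
price_median <- 24177

# Gap between mean and median as a percentage of the median
relative_gap <- (price_mean - price_median) / price_median * 100
round(relative_gap, 2)
```

A gap well under 1% points to only slight asymmetry, so the mean and the median tell nearly the same story for these prices.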

8.2 Technical Methodology Report

cat("\n\n📋 TECHNICAL METHODOLOGY REPORT\n")
## 
## 
## 📋 TECHNICAL METHODOLOGY REPORT
cat("=" , rep("=", 50), "\n", sep="")
## ===================================================
cat("\n🔬 STATISTICAL METHODS EMPLOYED:\n")
## 
## 🔬 STATISTICAL METHODS EMPLOYED:
cat("\n1. MEASURES OF CENTRAL TENDENCY:\n")
## 
## 1. MEASURES OF CENTRAL TENDENCY:
cat("   • Arithmetic Mean: Σxi/n for symmetric distributions\n")
##    • Arithmetic Mean: Σxi/n for symmetric distributions
cat("   • Median: Middle value for skewed distributions\n")
##    • Median: Middle value for skewed distributions
cat("   • Mode: Most frequent value for categorical analysis\n")
##    • Mode: Most frequent value for categorical analysis
cat("\n2. DATA QUALITY ASSESSMENT:\n")
## 
## 2. DATA QUALITY ASSESSMENT:
cat("   • Sample size: ", nrow(cars), " car models\n")
##    • Sample size:  190  car models
cat("   • Geographic coverage: ", length(unique(cars$country)), " countries/regions\n")
##    • Geographic coverage:  7  countries/regions
cat("   • Missing values: Handled using na.rm = TRUE\n")
##    • Missing values: Handled using na.rm = TRUE
cat("   • Outlier detection: Visual inspection via boxplots\n")
##    • Outlier detection: Visual inspection via boxplots
cat("\n3. DISTRIBUTION ANALYSIS:\n")
## 
## 3. DISTRIBUTION ANALYSIS:
cat("   • Skewness assessment: Mean vs. Median comparison\n")
##    • Skewness assessment: Mean vs. Median comparison
cat("   • Shape determination: Visual and numerical methods\n")
##    • Shape determination: Visual and numerical methods
cat("   • Quartile analysis: Market segmentation insights\n")
##    • Quartile analysis: Market segmentation insights
cat("\n4. BUSINESS APPLICATIONS:\n")
## 
## 4. BUSINESS APPLICATIONS:
cat("   • Price strategy: Median-based positioning\n")
##    • Price strategy: Median-based positioning
cat("   • Performance targets: Percentile benchmarking\n")
##    • Performance targets: Percentile benchmarking
cat("   • Market analysis: Frequency-based insights\n")
##    • Market analysis: Frequency-based insights
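
The data quality items above rely on visual inspection. As a rough numerical counterpart, base R's boxplot.stats() returns the points that a boxplot would draw beyond its whiskers. The sketch uses toy prices, not the cars data.

```r
# Toy prices in thousands, for illustration only (not from the cars data)
prices_k <- c(18, 22, 22, 25, 28, 30, 35, 45, 120)

# boxplot.stats() applies the same whisker rule that boxplot() draws:
# points more than 1.5 times the box length beyond the hinges are flagged
flagged <- boxplot.stats(prices_k)$out
flagged

# Missing values can be counted explicitly before relying on na.rm = TRUE
sum(is.na(prices_k))
```

Counting missing values first shows how many observations na.rm = TRUE will silently drop from each statistic.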

Chapter 9: Advanced Applications and Extensions

9.1 Comparative Analysis Framework

cat("\n🔍 COMPARATIVE ANALYSIS: GERMAN vs JAPANESE MANUFACTURERS\n")
## 
## 🔍 COMPARATIVE ANALYSIS: GERMAN vs JAPANESE MANUFACTURERS
cat("=" , rep("=", 65), "\n", sep="")
## ==================================================================
# Filter data by country
german_cars <- cars[cars$country == "Germany", ]
japanese_cars <- cars[cars$country == "Japan", ]

cat("Sample sizes:\n")
## Sample sizes:
cat("   German manufacturers: ", nrow(german_cars), " models\n")
##    German manufacturers:  40  models
cat("   Japanese manufacturers: ", nrow(japanese_cars), " models\n")
##    Japanese manufacturers:  33  models
# Price comparison
cat("\n💰 PRICE COMPARISON:\n")
## 
## 💰 PRICE COMPARISON:
german_price_stats <- c(
  mean = mean(german_cars$price_num, na.rm = TRUE),
  median = median(german_cars$price_num, na.rm = TRUE)
)
japanese_price_stats <- c(
  mean = mean(japanese_cars$price_num, na.rm = TRUE),
  median = median(japanese_cars$price_num, na.rm = TRUE)
)

cat("German cars:\n")
## German cars:
cat("   Mean: $", round(german_price_stats["mean"], 0), "\n")
##    Mean: $ 24729
cat("   Median: $", round(german_price_stats["median"], 0), "\n")
##    Median: $ 26708
cat("Japanese cars:\n")
## Japanese cars:
cat("   Mean: $", round(japanese_price_stats["mean"], 0), "\n")
##    Mean: $ 23138
cat("   Median: $", round(japanese_price_stats["median"], 0), "\n")
##    Median: $ 21206
# Performance comparison
cat("\n🏁 PERFORMANCE COMPARISON:\n")
## 
## 🏁 PERFORMANCE COMPARISON:
german_speed <- mean(german_cars$maxspeed, na.rm = TRUE)
japanese_speed <- mean(japanese_cars$maxspeed, na.rm = TRUE)
german_accel <- mean(german_cars$acceleration, na.rm = TRUE)
japanese_accel <- mean(japanese_cars$acceleration, na.rm = TRUE)

cat("Average Top Speed:\n")
## Average Top Speed:
cat("   German: ", round(german_speed, 1), " km/h\n")
##    German:  185.4  km/h
cat("   Japanese: ", round(japanese_speed, 1), " km/h\n")
##    Japanese:  184.3  km/h
cat("Average Acceleration:\n")
## Average Acceleration:
cat("   German: ", round(german_accel, 2), " seconds\n")
##    German:  10.57  seconds
cat("   Japanese: ", round(japanese_accel, 2), " seconds\n")
##    Japanese:  10.22  seconds
# Strategic insights
cat("\n🎯 STRATEGIC INSIGHTS:\n")
## 
## 🎯 STRATEGIC INSIGHTS:
if (german_price_stats["median"] > japanese_price_stats["median"]) {
  cat("   • German cars positioned as premium (higher median price)\n")
} else {
  cat("   • Japanese cars positioned as premium (higher median price)\n")
}
##    • German cars positioned as premium (higher median price)
if (german_speed > japanese_speed) {
  cat("   • German models have the higher average top speed\n")
} else {
  cat("   • Japanese models have the higher average top speed\n")
}
##    • German models have the higher average top speed
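
The country comparison above repeats the same mean and median calls for each subset. A compact alternative, sketched here on toy data, splits the variable by group and computes both measures in one pass. The toy data frame and the name price_by_country are assumptions for illustration; only the column names mirror the cars dataset.

```r
# Toy data, for illustration only (column names follow the cars dataset)
toy <- data.frame(
  country = c("Germany", "Germany", "Germany", "Japan", "Japan", "Japan"),
  price_num = c(20000, 26000, 30000, 18000, 21000, 27000)
)

# Split prices by country, then compute both measures for each group
price_by_country <- sapply(
  split(toy$price_num, toy$country),
  function(v) c(mean = mean(v, na.rm = TRUE), median = median(v, na.rm = TRUE))
)
round(price_by_country, 0)
```

The result is a small matrix with one column per country, which extends to more than two groups without extra code.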

9.2 Market Segmentation Analysis

cat("\n\n📊 ADVANCED MARKET SEGMENTATION ANALYSIS\n")
## 
## 
## 📊 ADVANCED MARKET SEGMENTATION ANALYSIS
cat("=" , rep("=", 55), "\n", sep="")
## ========================================================
# Create price segments based on quartiles
price_q1 <- quantile(cars$price_num, 0.25, na.rm = TRUE)
price_q2 <- quantile(cars$price_num, 0.50, na.rm = TRUE)
price_q3 <- quantile(cars$price_num, 0.75, na.rm = TRUE)

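# Segment the market. Prices below the first break of 0 (the minimum price
# is -1860) become NA, so the counts below sum to 188 rather than 190.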
cars$price_segment <- cut(cars$price_num, 
                         breaks = c(0, price_q1, price_q2, price_q3, Inf),
                         labels = c("Budget", "Economy", "Mid-Market", "Premium"),
                         include.lowest = TRUE)

# Analyze each segment
segment_analysis <- table(cars$price_segment)
segment_props <- prop.table(segment_analysis) * 100

cat("Market Segmentation by Price Quartiles:\n")
## Market Segmentation by Price Quartiles:
for (i in seq_along(segment_analysis)) {
  segment_name <- names(segment_analysis)[i]
  count <- segment_analysis[i]
  percentage <- segment_props[i]
  
  cat("   ", segment_name, ": ", count, " models (", round(percentage, 1), "%)\n")
}
##     Budget :  46  models ( 24.5 %)
##     Economy :  47  models ( 25 %)
##     Mid-Market :  47  models ( 25 %)
##     Premium :  48  models ( 25.5 %)
# Performance characteristics by segment
cat("\n🏎️ PERFORMANCE BY SEGMENT:\n")
## 
## 🏎️ PERFORMANCE BY SEGMENT:
for(segment in names(segment_analysis)) {
  segment_cars <- cars[cars$price_segment == segment & !is.na(cars$price_segment), ]
  avg_speed <- mean(segment_cars$maxspeed, na.rm = TRUE)
  avg_accel <- mean(segment_cars$acceleration, na.rm = TRUE)
  
  cat("   ", segment, " segment:\n")
  cat("     Average speed: ", round(avg_speed, 1), " km/h\n")
  cat("     Average acceleration: ", round(avg_accel, 2), " seconds\n")
}
##     Budget  segment:
##      Average speed:  180.3  km/h
##      Average acceleration:  11.72  seconds
##     Economy  segment:
##      Average speed:  181.3  km/h
##      Average acceleration:  10.96  seconds
##     Mid-Market  segment:
##      Average speed:  178.9  km/h
##      Average acceleration:  10.31  seconds
##     Premium  segment:
##      Average speed:  182.5  km/h
##      Average acceleration:  10.36  seconds
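
The segment counts above add up to 188 of the 190 models, because cut() assigns NA to any value outside its breaks. The sketch below reproduces that behaviour on toy prices and shows one possible remedy, a lower break of -Inf. Whether invalid prices should be kept, corrected, or removed is a data-cleaning decision that this sketch does not settle.

```r
# Toy prices, for illustration only; the first value mimics the negative
# minimum reported by summary(cars$price_num)
prices <- c(-1860, 5000, 12000, 20000, 28000, 36000, 45000, 61570)
q <- quantile(prices, c(0.25, 0.50, 0.75), na.rm = TRUE)
segment_labels <- c("Budget", "Economy", "Mid-Market", "Premium")

# Lower break at 0: the negative price falls outside every interval
seg_from_zero <- cut(prices, breaks = c(0, q, Inf),
                     labels = segment_labels, include.lowest = TRUE)

# Lower break at -Inf: every observation is assigned a segment
seg_from_neg_inf <- cut(prices, breaks = c(-Inf, q, Inf),
                        labels = segment_labels, include.lowest = TRUE)

sum(is.na(seg_from_zero))
sum(is.na(seg_from_neg_inf))
```

Comparing the two NA counts is a quick check that no rows are lost during segmentation.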

Chapter 10: Practical Exercises and Case Studies

10.1 Complete Case Study: Electric Vehicle Market Entry

# CASE STUDY: ELECTRIC VEHICLE MARKET ENTRY STRATEGY
cat("\n🔋 CASE STUDY: ELECTRIC VEHICLE MARKET ENTRY STRATEGY\n")
## 
## 🔋 CASE STUDY: ELECTRIC VEHICLE MARKET ENTRY STRATEGY
cat("============================================================\n")
## ============================================================
cat("SCENARIO: A new electric vehicle startup needs to position their first model\n")
## SCENARIO: A new electric vehicle startup needs to position their first model
cat("in the European market. Use statistical analysis to recommend specifications.\n\n")
## in the European market. Use statistical analysis to recommend specifications.
# Check if required dataset and columns exist
if (!exists("cars") || !all(c("price_num", "maxspeed", "acceleration") %in% colnames(cars))) {
  stop("'cars' dataset is missing or does not contain required columns (price_num, maxspeed, acceleration).")
}

# Step 1: Market Positioning Analysis
cat("STEP 1: MARKET POSITIONING ANALYSIS\n")
## STEP 1: MARKET POSITIONING ANALYSIS
cat("----------------------------------------\n")
## ----------------------------------------
# Calculate key statistics for decision making
price_stats_complete <- summary(cars$price_num)
speed_percentiles <- quantile(cars$maxspeed, c(0.25, 0.5, 0.75, 0.9), na.rm = TRUE)
accel_percentiles <- quantile(cars$acceleration, c(0.1, 0.25, 0.5, 0.75), na.rm = TRUE)

cat("Current Market Statistics:\n")
## Current Market Statistics:
print(price_stats_complete)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -1860   13574   24177   24267   33195   61570
cat("\nPerformance Benchmarks:\n")
## 
## Performance Benchmarks:
cat("Speed Percentiles (km/h):\n")
## Speed Percentiles (km/h):
cat("   25th percentile: ", round(speed_percentiles[1], 0), "\n")
##    25th percentile:  159
cat("   50th percentile: ", round(speed_percentiles[2], 0), "\n")
##    50th percentile:  178
cat("   75th percentile: ", round(speed_percentiles[3], 0), "\n")
##    75th percentile:  203
cat("   90th percentile: ", round(speed_percentiles[4], 0), "\n")
##    90th percentile:  218
cat("\nAcceleration Percentiles (seconds):\n")
## 
## Acceleration Percentiles (seconds):
cat("   10th percentile: ", round(accel_percentiles[1], 2), " (top 10% performance)\n")
##    10th percentile:  7.28  (top 10% performance)
cat("   25th percentile: ", round(accel_percentiles[2], 2), "\n")
##    25th percentile:  8.83
cat("   50th percentile: ", round(accel_percentiles[3], 2), "\n")
##    50th percentile:  10.9
cat("   75th percentile: ", round(accel_percentiles[4], 2), "\n")
##    75th percentile:  12.97
# Step 2: Competitive Positioning Recommendations
cat("\n\nSTEP 2: STRATEGIC RECOMMENDATIONS\n")
## 
## 
## STEP 2: STRATEGIC RECOMMENDATIONS
cat("----------------------------------------\n")
## ----------------------------------------
target_price <- median(cars$price_num, na.rm = TRUE)
target_speed <- speed_percentiles[3]  # 75th percentile for competitive advantage
target_accel <- accel_percentiles[2]  # 25th percentile (lower is better)

cat("🎯 RECOMMENDED SPECIFICATIONS:\n")
## 🎯 RECOMMENDED SPECIFICATIONS:
cat("   Target Price: $", format(round(target_price, 0), big.mark = ","), "\n")
##    Target Price: $ 24,177
cat("   Rationale: Median pricing captures mainstream market\n\n")
##    Rationale: Median pricing captures mainstream market
cat("   Target Max Speed: ", round(target_speed, 0), " km/h\n")
##    Target Max Speed:  203  km/h
cat("   Rationale: 75th percentile ensures competitive performance\n\n")
##    Rationale: 75th percentile ensures competitive performance
cat("   Target Acceleration: ", round(target_accel, 2), " seconds (0-100 km/h)\n")
##    Target Acceleration:  8.83  seconds (0-100 km/h)
cat("   Rationale: 25th percentile provides above-average performance\n")
##    Rationale: 25th percentile provides above-average performance
# Step 3: Market Opportunity Analysis
cat("\n\nSTEP 3: MARKET OPPORTUNITY ANALYSIS\n")
## 
## 
## STEP 3: MARKET OPPORTUNITY ANALYSIS
cat("----------------------------------------\n")
## ----------------------------------------
# Identify underserved segments
affordable_performance <- sum(cars$price_num <= target_price & 
                             cars$maxspeed >= target_speed & 
                             cars$acceleration <= target_accel, na.rm = TRUE)
total_models <- nrow(cars)
opportunity_gap <- (total_models - affordable_performance) / total_models * 100

cat("Market Gap Analysis:\n")
## Market Gap Analysis:
cat("   Models meeting all criteria: ", affordable_performance, " out of ", total_models, "\n")
##    Models meeting all criteria:  3  out of  190
cat("   Models not meeting all criteria: ", round(opportunity_gap, 1), "% of models\n")
##    Models not meeting all criteria:  98.4 % of models
cat("   Strategic implication: Significant opportunity for well-positioned EV\n")
##    Strategic implication: Significant opportunity for well-positioned EV
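
The gap analysis in Step 3 counts the models that satisfy three conditions at once. The sketch below isolates that logic on toy data. The thresholds are the rounded targets printed in Step 2; the toy data frame is an assumption for illustration.

```r
# Toy models, for illustration only (column names follow the cars dataset)
toy <- data.frame(
  price_num = c(20000, 24000, 30000, 22000, 35000),
  maxspeed = c(210, 205, 220, 170, 200),
  acceleration = c(8.5, 9.5, 7.0, 8.0, 8.8)
)

# Thresholds taken from the rounded values printed in Step 2
target_price <- 24177
target_speed <- 203
target_accel <- 8.83

# TRUE only when a model satisfies all three conditions
meets_all <- toy$price_num <= target_price &
  toy$maxspeed >= target_speed &
  toy$acceleration <= target_accel

sum(meets_all)          # number of qualifying models
mean(meets_all) * 100   # the same count as a percentage of all models
```

Because a logical vector averages to the proportion of TRUE values, mean() gives the share directly. The figure describes how many existing models meet the specification, not how much customer demand is unmet.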

10.2 Student Assessment Problems

cat("\n\n🎓 STUDENT ASSESSMENT PROBLEMS\n")
## 
## 
## 🎓 STUDENT ASSESSMENT PROBLEMS
cat("=" , rep("=", 45), "\n", sep="")
## ==============================================
cat("PROBLEM SET A: CALCULATION MASTERY\n")
## PROBLEM SET A: CALCULATION MASTERY
cat("-" , rep("-", 35), "\n", sep="")
## ------------------------------------
cat("Problem A1: Given the following car prices (in thousands):\n")
## Problem A1: Given the following car prices (in thousands):
cat("$22, $18, $35, $28, $25, $45, $30, $22\n")
## $22, $18, $35, $28, $25, $45, $30, $22
cat("Tasks:\n")
## Tasks:
cat("a) Calculate the arithmetic mean\n")
## a) Calculate the arithmetic mean
cat("b) Find the median\n")
## b) Find the median
cat("c) Identify the mode\n")
## c) Identify the mode
cat("d) Determine if the distribution is skewed\n")
## d) Determine if the distribution is skewed
# Solution for instructors
sample_prices_prob <- c(22, 18, 35, 28, 25, 45, 30, 22)
cat("\n✅ INSTRUCTOR SOLUTIONS:\n")
## 
## ✅ INSTRUCTOR SOLUTIONS:
cat("a) Mean: $", round(mean(sample_prices_prob), 2), "k\n")
## a) Mean: $ 28.12 k
cat("b) Median: $", median(sample_prices_prob), "k\n")
## b) Median: $ 26.5 k
mode_freq <- table(sample_prices_prob)
mode_val <- as.numeric(names(mode_freq)[which.max(mode_freq)])
cat("c) Mode: $", mode_val, "k (appears ", max(mode_freq), " times)\n")
## c) Mode: $ 22 k (appears  2  times)
if (mean(sample_prices_prob) > median(sample_prices_prob)) {
  cat("d) RIGHT-SKEWED (mean > median)\n")
} else if (mean(sample_prices_prob) < median(sample_prices_prob)) {
  cat("d) LEFT-SKEWED (mean < median)\n")
} else {
  cat("d) SYMMETRIC (mean ≈ median)\n")
}
## d) RIGHT-SKEWED (mean > median)
cat("\n" , rep("-", 50), "\n", sep="")
## 
## --------------------------------------------------
cat("PROBLEM SET B: APPLIED ANALYSIS\n")
## PROBLEM SET B: APPLIED ANALYSIS
cat("-" , rep("-", 35), "\n", sep="")
## ------------------------------------
cat("Problem B1: Market Research Scenario\n")
## Problem B1: Market Research Scenario
cat("A car dealership surveys customer satisfaction ratings (1-10 scale):\n")
## A car dealership surveys customer satisfaction ratings (1-10 scale):
cat("Ratings: 8, 9, 7, 10, 8, 6, 9, 8, 7, 9, 8, 10, 7, 8, 9\n")
## Ratings: 8, 9, 7, 10, 8, 6, 9, 8, 7, 9, 8, 10, 7, 8, 9
cat("Questions:\n")
## Questions:
cat("1) What is the most appropriate measure of central tendency?\n")
## 1) What is the most appropriate measure of central tendency?
cat("2) Calculate that measure\n")
## 2) Calculate that measure
cat("3) What does this tell us about customer satisfaction?\n")
## 3) What does this tell us about customer satisfaction?
# Solution
satisfaction <- c(8, 9, 7, 10, 8, 6, 9, 8, 7, 9, 8, 10, 7, 8, 9)
cat("\n✅ SOLUTIONS:\n")
## 
## ✅ SOLUTIONS:
cat("1) MODE or MEDIAN suits an ordinal rating scale; the mode (most common rating) is used here\n")
## 1) MODE or MEDIAN suits an ordinal rating scale; the mode (most common rating) is used here
satisfaction_mode <- as.numeric(names(table(satisfaction))[which.max(table(satisfaction))])
cat("2) Mode = ", satisfaction_mode, " (appears ", max(table(satisfaction)), " times)\n")
## 2) Mode =  8  (appears  5  times)
cat("3) Most customers rate satisfaction as ", satisfaction_mode, "/10 - good performance\n")
## 3) Most customers rate satisfaction as  8 /10 - good performance
cat("   Mean = ", round(mean(satisfaction), 2), " confirms positive satisfaction\n")
##    Mean =  8.2  confirms positive satisfaction
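
For rating scales it can help to put all three measures side by side with the full frequency table. A short sketch using the Problem B1 ratings:

```r
# Ratings from Problem B1
satisfaction <- c(8, 9, 7, 10, 8, 6, 9, 8, 7, 9, 8, 10, 7, 8, 9)

table(satisfaction)     # full frequency distribution of the ratings
median(satisfaction)    # middle rating once the 15 values are ordered
mean(satisfaction)      # treats the 1-10 scale as numeric
```

The mode and the median both equal 8 here and the mean is 8.2, so the choice of measure does not change the conclusion for this sample.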

Chapter 11: Summary and Key Takeaways

11.1 Conceptual Framework Summary

cat("\n📚 LECTURE 2 CONCEPTUAL FRAMEWORK SUMMARY\n")
## 
## 📚 LECTURE 2 CONCEPTUAL FRAMEWORK SUMMARY
cat("=" , rep("=", 55), "\n", sep="")
## ========================================================
cat("\n🎯 CORE CONCEPTS MASTERED:\n")
## 
## 🎯 CORE CONCEPTS MASTERED:
cat("\n1. ARITHMETIC MEAN (x̄):\n")
## 
## 1. ARITHMETIC MEAN (x̄):
cat("   • Formula: Σxi/n (raw data) or Σ(fi×xi)/Σfi (grouped data)\n")
##    • Formula: Σxi/n (raw data) or Σ(fi×xi)/Σfi (grouped data)
cat("   • Use: Symmetric distributions, mathematical precision\n")
##    • Use: Symmetric distributions, mathematical precision
cat("   • Properties: Sensitive to outliers, algebraically manipulable\n")
##    • Properties: Sensitive to outliers, algebraically manipulable
cat("\n2. MEDIAN (Me):\n")
## 
## 2. MEDIAN (Me):
cat("   • Definition: Middle value in ordered dataset\n")
##    • Definition: Middle value in ordered dataset
cat("   • Use: Skewed distributions, robust measure\n")
##    • Use: Skewed distributions, robust measure
cat("   • Properties: Resistant to outliers, positional measure\n")
##    • Properties: Resistant to outliers, positional measure
cat("\n3. MODE (Mo):\n")
## 
## 3. MODE (Mo):
cat("   • Definition: Most frequently occurring value\n")
##    • Definition: Most frequently occurring value
cat("   • Use: Categorical data, popularity analysis\n")
##    • Use: Categorical data, popularity analysis
cat("   • Properties: Can have multiple modes or no mode\n")
##    • Properties: Can have multiple modes or no mode
cat("\n4. DISTRIBUTION SHAPE:\n")
## 
## 4. DISTRIBUTION SHAPE:
cat("   • Symmetric: Mean ≈ Median ≈ Mode\n")
##    • Symmetric: Mean ≈ Median ≈ Mode
cat("   • Right-skewed: typically Mean > Median > Mode\n")
##    • Right-skewed: typically Mean > Median > Mode
cat("   • Left-skewed: typically Mode > Median > Mean\n")
##    • Left-skewed: typically Mode > Median > Mean
cat("\n🔧 TECHNICAL SKILLS DEVELOPED:\n")
## 
## 🔧 TECHNICAL SKILLS DEVELOPED:
cat("   ✓ Manual calculation of all central tendency measures\n")
##    ✓ Manual calculation of all central tendency measures
cat("   ✓ Application of formulas to grouped and ungrouped data\n")
##    ✓ Application of formulas to grouped and ungrouped data
cat("   ✓ UBStats package proficiency\n")
##    ✓ UBStats package proficiency
cat("   ✓ Distribution shape analysis\n")
##    ✓ Distribution shape analysis
cat("   ✓ Business interpretation of statistical results\n")
##    ✓ Business interpretation of statistical results
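
The contrast between a mean that is sensitive to outliers and a median that resists them can be seen by adding one extreme value to a small sample. The sketch reuses the Problem A1 prices; the added value of 200 is hypothetical.

```r
# Prices (in thousands) from Problem A1, plus one hypothetical extreme value
prices <- c(22, 18, 35, 28, 25, 45, 30, 22)
with_outlier <- c(prices, 200)

# Shift in each measure caused by the single added value
mean_shift <- mean(with_outlier) - mean(prices)
median_shift <- median(with_outlier) - median(prices)

round(c(mean_shift = mean_shift, median_shift = median_shift), 2)
```

The mean moves by roughly 19 thousand while the median moves by 1.5 thousand, which is the practical meaning of robustness.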

11.2 Next Lecture Preview

cat("\n\n🔮 PREVIEW: LECTURE 3 - MEASURES OF VARIABILITY\n")
## 
## 
## 🔮 PREVIEW: LECTURE 3 - MEASURES OF VARIABILITY
cat("=" , rep("=", 55), "\n", sep="")
## ========================================================
cat("Coming up in our next session:\n")
## Coming up in our next session:
cat("\n📊 MEASURES OF SPREAD:\n")
## 
## 📊 MEASURES OF SPREAD:
cat("   • Range and Interquartile Range (IQR)\n")
##    • Range and Interquartile Range (IQR)
cat("   • Variance and Standard Deviation\n")
##    • Variance and Standard Deviation
cat("   • Coefficient of Variation\n")
##    • Coefficient of Variation
cat("   • Outlier detection methods\n")
##    • Outlier detection methods
cat("\n🎯 ADVANCED TOPICS:\n")
## 
## 🎯 ADVANCED TOPICS:
cat("   • Risk assessment in financial contexts\n")
##    • Risk assessment in financial contexts
cat("   • Quality control applications\n")
##    • Quality control applications
cat("   • Comparative variability analysis\n")
##    • Comparative variability analysis
cat("   • Five-number summary and boxplots\n")
##    • Five-number summary and boxplots
cat("\n💼 BUSINESS APPLICATIONS:\n")
## 
## 💼 BUSINESS APPLICATIONS:
cat("   • Investment risk analysis\n")
##    • Investment risk analysis
cat("   • Manufacturing quality control\n")
##    • Manufacturing quality control
cat("   • Market volatility assessment\n")
##    • Market volatility assessment
cat("   • Performance consistency evaluation\n")
##    • Performance consistency evaluation
cat("\n📋 PREPARATION TASKS:\n")
## 
## 📋 PREPARATION TASKS:
cat("   1. Review central tendency concepts from today\n")
##    1. Review central tendency concepts from today
cat("   2. Practice manual calculations with small datasets\n")
##    2. Practice manual calculations with small datasets
cat("   3. Familiarize yourself with variance formula\n")
##    3. Familiarize yourself with variance formula
cat("   4. Think about real-world examples of variability\n")
##    4. Think about real-world examples of variability

11.3 Final Practical Exercise

cat("\n\n🏆 CAPSTONE EXERCISE: COMPREHENSIVE ANALYSIS\n")
## 
## 
## 🏆 CAPSTONE EXERCISE: COMPREHENSIVE ANALYSIS
cat("=" , rep("=", 55), "\n", sep="")
## ========================================================
cat("SCENARIO: You are presenting to the board of directors about market positioning\n")
## SCENARIO: You are presenting to the board of directors about market positioning
cat("for a new luxury car model. Prepare a complete statistical brief.\n\n")
## for a new luxury car model. Prepare a complete statistical brief.
# Select luxury cars (top 25% by price)
luxury_threshold <- quantile(cars$price_num, 0.75, na.rm = TRUE)
luxury_cars <- cars[cars$price_num >= luxury_threshold, ]

cat("LUXURY MARKET ANALYSIS (Top 25% by Price):\n")
## LUXURY MARKET ANALYSIS (Top 25% by Price):
cat("Threshold: $", format(round(luxury_threshold, 0), big.mark = ","), "\n")
## Threshold: $ 33,195
cat("Sample size: ", nrow(luxury_cars), " models\n")
## Sample size:  48  models
# Complete analysis
luxury_stats <- list(
  price_mean = mean(luxury_cars$price_num, na.rm = TRUE),
  price_median = median(luxury_cars$price_num, na.rm = TRUE),
  speed_mean = mean(luxury_cars$maxspeed, na.rm = TRUE),
  speed_median = median(luxury_cars$maxspeed, na.rm = TRUE),
  accel_mean = mean(luxury_cars$acceleration, na.rm = TRUE),
  accel_median = median(luxury_cars$acceleration, na.rm = TRUE)
)

cat("\n📊 LUXURY SEGMENT CHARACTERISTICS:\n")
## 
## 📊 LUXURY SEGMENT CHARACTERISTICS:
cat("Price Statistics:\n")
## Price Statistics:
cat("   Mean: $", format(round(luxury_stats$price_mean, 0), big.mark = ","), "\n")
##    Mean: $ 40,717
cat("   Median: $", format(round(luxury_stats$price_median, 0), big.mark = ","), "\n")
##    Median: $ 38,904
cat("Performance Statistics:\n")
## Performance Statistics:
cat("   Average Speed: ", round(luxury_stats$speed_mean, 1), " km/h\n")
##    Average Speed:  182.5  km/h
cat("   Median Acceleration: ", round(luxury_stats$accel_median, 2), " seconds\n")
##    Median Acceleration:  10.1  seconds
# Country analysis for luxury segment
luxury_countries <- table(luxury_cars$country)
top_luxury_country <- names(luxury_countries)[which.max(luxury_countries)]

cat("Geographic Distribution:\n")
## Geographic Distribution:
cat("   Leading luxury manufacturer: ", top_luxury_country, "\n")
##    Leading luxury manufacturer:  Germany
cat("   Market share: ", round(max(luxury_countries)/nrow(luxury_cars)*100, 1), "%\n")
##    Market share:  27.1 %
cat("\n🎯 STRATEGIC RECOMMENDATIONS:\n")
## 
## 🎯 STRATEGIC RECOMMENDATIONS:
cat("   • Target price: $", format(round(luxury_stats$price_median, -3), big.mark = ","), "\n")
##    • Target price: $ 39,000
cat("   • Minimum speed: ", round(luxury_stats$speed_median, 0), " km/h\n")
##    • Minimum speed:  185  km/h
cat("   • Maximum acceleration: ", round(luxury_stats$accel_median, 1), " seconds\n")
##    • Maximum acceleration:  10.1  seconds
cat("   • Benchmark against: ", top_luxury_country, " manufacturers\n")
##    • Benchmark against:  Germany  manufacturers
cat("\n📈 SUCCESS METRICS:\n")
## 
## 📈 SUCCESS METRICS:
cat("   Price positioning within luxury segment median ±10%\n")
##    Price positioning within luxury segment median ±10%
cat("   Performance specs matching or exceeding segment averages\n")
##    Performance specs matching or exceeding segment averages
cat("   Quality standards aligned with ", top_luxury_country, " benchmarks\n")
##    Quality standards aligned with  Germany  benchmarks
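
Two small calculations sit behind the recommendations above: rounding the median to the nearest thousand with a negative digits argument, and the price band of plus or minus 10% named in the success metrics. The sketch starts from the rounded luxury median printed earlier; the variable names are assumptions for illustration.

```r
# Rounded luxury-segment median printed above
price_median <- 38904

# A negative digits argument rounds to the left of the decimal point:
# digits = -3 rounds to the nearest thousand
target_price <- round(price_median, -3)

# Price band of plus or minus 10% around the segment median
price_band <- price_median * c(lower = 0.9, upper = 1.1)

target_price
round(price_band, 0)
```

The rounded target of 39,000 falls inside the band, so the recommendation is consistent with its own success metric.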

Conclusion

Learning Outcomes Achieved

Today’s comprehensive journey through descriptive statistics has equipped you with:

  • Mathematical Foundation: Precise understanding of central tendency formulas and calculations
  • Practical Application: Real-world business problem-solving using statistical measures
  • Technical Proficiency: Hands-on experience with R and the UBStats package
  • Strategic Thinking: Converting statistical insights into actionable business recommendations
  • Professional Communication: Presenting statistical findings to diverse audiences

The Statistical Mindset

Remember that statistics is not just about numbers: it is about understanding patterns, making informed decisions, and communicating insights effectively. Every measure we calculate tells a story about our data, and every story guides important business decisions.

As we continue our statistical journey, you’ll discover that today’s foundation in central tendency naturally leads to questions about variability, relationships between variables, and ultimately, to the powerful world of inferential statistics.

Next Session: We’ll explore how spread and variability around these central measures reveal even deeper insights about market dynamics, risk assessment, and competitive positioning.


“In God we trust. All others must bring data.” - W. Edwards Deming