Chapter 1: Introduction - Completing the Statistical Picture

Welcome Back!

In our previous lectures, we learned how to find the “typical” value in our data using measures of central tendency (mean, median, mode). But that’s only half the story! Today we complete the picture by understanding:

How spread out our data is (measures of variability)
How two variables relate to each other (bivariate analysis)

Think of it this way: if central tendency tells us the “average student grade,” variability tells us whether all students performed similarly or if there’s a huge range from failing to excellent grades.

The Restaurant Analogy - Why Variability Matters

Imagine two pizza delivery restaurants:

Restaurant A: Delivery times: 28, 29, 30, 31, 32 minutes (Average: 30 minutes)
Restaurant B: Delivery times: 10, 20, 30, 40, 50 minutes (Average: 30 minutes)

Both have the same average, but which would you prefer? Restaurant A is predictable and reliable, while Restaurant B is unpredictable. This is why we need measures of variability!

# Load required libraries
library(UBStats)

# Use built-in mtcars dataset and modify it for our examples
cars <- mtcars

# Add some columns to match our examples
cars$country <- sample(c("Germany", "Japan", "France", "Italy", "United States"), 
                      nrow(cars), replace=TRUE, 
                      prob=c(0.3, 0.25, 0.15, 0.15, 0.15))

cars$price_num <- cars$hp * 100 + rnorm(nrow(cars), 15000, 5000)
cars$price_num <- pmax(cars$price_num, 10000)  # Minimum price $10,000

cars$price_classes <- cut(cars$price_num, 
                         breaks=c(0, 20000, 35000, Inf),
                         labels=c("low", "mid", "high"))

cars$maxspeed <- cars$hp * 2.5 + rnorm(nrow(cars), 50, 15)
cars$acceleration <- 20 - (cars$hp/20) + rnorm(nrow(cars), 0, 2)
cars$weight <- cars$wt * 500
cars$sales <- rpois(nrow(cars), 8000)

# Let's start our journey!
cat("🚗 Today we'll analyze", nrow(cars), "car models\n")

## 🚗 Today we'll analyze 32 car models

cat("📊 We'll explore both variability and relationships between variables\n")

## 📊 We'll explore both variability and relationships between variables

# Let's start our journey!
cat("🚗 Today we'll analyze", nrow(cars), "car models\n")

## 🚗 Today we'll analyze 32 car models

cat("📊 We'll explore both variability and relationships between variables\n")

## 📊 We'll explore both variability and relationships between variables

Chapter 2: Range and Interquartile Range - Simple but Important

2.1 Range - The Distance Between Extremes

The range is the simplest measure of spread:

\[\text{Range} = x_{\text{max}} - x_{\text{min}}\]

Example 1 from Your Notes: Two Samples

Let’s work through Example 1 from your PDF exactly as written:

Sample 1: 1, 2, 3, 4, 5, 6, 7
Sample 2: 1, 2, 3, 4, 5, 6, 100

# Example 1 from PDF - exactly as in your notes
sample1 <- c(1, 2, 3, 4, 5, 6, 7)
sample2 <- c(1, 2, 3, 4, 5, 6, 100)

cat("📋 Example 1 from Your Notes:\n")

## 📋 Example 1 from Your Notes:

cat("Sample 1:", paste(sample1, collapse=", "), "\n")

## Sample 1: 1, 2, 3, 4, 5, 6, 7

cat("Sample 2:", paste(sample2, collapse=", "), "\n\n")

## Sample 2: 1, 2, 3, 4, 5, 6, 100

# Calculate ranges
range1 <- max(sample1) - min(sample1)
range2 <- max(sample2) - min(sample2)

cat("🔢 Range Calculations:\n")

## 🔢 Range Calculations:

cat("Sample 1 Range: 7 - 1 =", range1, "\n")

## Sample 1 Range: 7 - 1 = 6

cat("Sample 2 Range: 100 - 1 =", range2, "\n\n")

## Sample 2 Range: 100 - 1 = 99

cat("🤔 What happened? One outlier (100) made the range jump from 6 to 99!\n")

## 🤔 What happened? One outlier (100) made the range jump from 6 to 99!

2.2 Interquartile Range (IQR) - The Robust Alternative

The IQR measures the spread of the middle 50% of data:

\[\text{IQR} = Q_3 - Q_1\]

# Calculate IQR for both samples
cat("📊 IQR Calculations:\n")

## 📊 IQR Calculations:

# For Sample 1
q1_s1 <- quantile(sample1, 0.25)  # Q1
q3_s1 <- quantile(sample1, 0.75)  # Q3
iqr1 <- q3_s1 - q1_s1

cat("Sample 1: Q1 =", q1_s1, ", Q3 =", q3_s1, ", IQR =", iqr1, "\n")

## Sample 1: Q1 = 2.5 , Q3 = 5.5 , IQR = 3

# For Sample 2
q1_s2 <- quantile(sample2, 0.25)  # Q1
q3_s2 <- quantile(sample2, 0.75)  # Q3
iqr2 <- q3_s2 - q1_s2

cat("Sample 2: Q1 =", q1_s2, ", Q3 =", q3_s2, ", IQR =", iqr2, "\n\n")

## Sample 2: Q1 = 2.5 , Q3 = 5.5 , IQR = 3

cat("✨ Amazing! The IQR stayed the same because it focuses on the middle 50%\n")

## ✨ Amazing! The IQR stayed the same because it focuses on the middle 50%

cat("   and ignores extreme outliers!\n")

##    and ignores extreme outliers!

# Visualize the difference
par(mfrow=c(1,2))
boxplot(sample1, main="Sample 1: No Outliers", ylab="Values", col="lightblue")
boxplot(sample2, main="Sample 2: With Outlier", ylab="Values", col="lightcoral")

par(mfrow=c(1,1))

Key Learning: IQR is “robust” - it’s not affected by extreme outliers, making it more reliable for describing typical variability.

Chapter 3: Variance - The Mathematical Foundation

Now we get to the heart of variability - variance. This is where all the formulas from your PDF come into play!

3.1 Understanding Variance Conceptually

Variance measures the average squared distance from the mean. Think of it as asking: “On average, how far away are my data points from the center?”

Why Do We Square the Deviations?

Let me show you why we can’t just average the deviations:

# Simple example: 2, 4, 6 (mean = 4)
simple_data <- c(2, 4, 6)
mean_val <- mean(simple_data)
deviations <- simple_data - mean_val

cat("📈 Why We Need to Square Deviations:\n")

## 📈 Why We Need to Square Deviations:

cat("Data:", paste(simple_data, collapse=", "), "\n")

## Data: 2, 4, 6

cat("Mean:", mean_val, "\n")

## Mean: 4

cat("Deviations:", paste(deviations, collapse=", "), "\n")

## Deviations: -2, 0, 2

cat("Sum of deviations:", sum(deviations), "\n\n")

## Sum of deviations: 0

cat("😱 The deviations always sum to zero! Positive and negative cancel out.\n")

## 😱 The deviations always sum to zero! Positive and negative cancel out.

cat("💡 Solution: Square them to make all values positive!\n")

## 💡 Solution: Square them to make all values positive!

squared_devs <- deviations^2
cat("Squared deviations:", paste(squared_devs, collapse=", "), "\n")

## Squared deviations: 4, 0, 4

cat("Now we can average them:", mean(squared_devs), "\n")

## Now we can average them: 2.666667

3.2 Population Variance - All Formulas from Your PDF

Your PDF shows multiple formulas for population variance. Let me explain every single one:

Original Formula

\[\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2\]

Shortcut Formula

\[\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2\]

Let’s work through Example 2 from your PDF step by step:

Example 2: Airline Ticket Prices

Data: 80, 120, 150, 90 (dollars)
Assignment: Compute the variance

# Example 2 from your PDF - Airline tickets
prices <- c(80, 120, 150, 90)
N <- length(prices)

cat("✈️ Example 2: Airline Ticket Variance Calculation\n")

## ✈️ Example 2: Airline Ticket Variance Calculation

cat("Data: $", paste(prices, collapse=", $"), "\n")

## Data: $ 80, $120, $150, $90

cat("Population size (N):", N, "\n\n")

## Population size (N): 4

# Step 1: Calculate population mean (μ)
mu <- sum(prices) / N
cat("📊 Step 1 - Population Mean:\n")

## 📊 Step 1 - Population Mean:

cat("μ = (", paste(prices, collapse=" + "), ") ÷", N, "\n")

## μ = ( 80 + 120 + 150 + 90 ) ÷ 4

cat("μ = ", sum(prices), " ÷", N, " = $", mu, "\n\n")

## μ =  440  ÷ 4  = $ 110

# Step 2: Calculate deviations (xi - μ)
deviations <- prices - mu
cat("📏 Step 2 - Deviations from Mean:\n")

## 📏 Step 2 - Deviations from Mean:

for(i in 1:N) {
  cat("x", i, "- μ = $", prices[i], " - $", mu, " = $", deviations[i], "\n")
}

## x 1 - μ = $ 80  - $ 110  = $ -30 
## x 2 - μ = $ 120  - $ 110  = $ 10 
## x 3 - μ = $ 150  - $ 110  = $ 40 
## x 4 - μ = $ 90  - $ 110  = $ -20

# Step 3: Square the deviations
squared_devs <- deviations^2
cat("\n🔢 Step 3 - Squared Deviations:\n")

## 
## 🔢 Step 3 - Squared Deviations:

for(i in 1:N) {
  cat("(x", i, "- μ)² = ($", deviations[i], ")² = ", squared_devs[i], "\n")
}

## (x 1 - μ)² = ($ -30 )² =  900 
## (x 2 - μ)² = ($ 10 )² =  100 
## (x 3 - μ)² = ($ 40 )² =  1600 
## (x 4 - μ)² = ($ -20 )² =  400

# Step 4: Sum squared deviations
sum_squared <- sum(squared_devs)
cat("\n➕ Step 4 - Sum of Squared Deviations:\n")

## 
## ➕ Step 4 - Sum of Squared Deviations:

cat("Σ(xi - μ)² = ", paste(squared_devs, collapse=" + "), " = ", sum_squared, "\n")

## Σ(xi - μ)² =  900 + 100 + 1600 + 400  =  3000

# Step 5: Calculate variance
variance <- sum_squared / N
cat("\n🎯 Step 5 - Population Variance:\n")

## 
## 🎯 Step 5 - Population Variance:

cat("σ² = Σ(xi - μ)² ÷ N = ", sum_squared, " ÷", N, " = ", variance, "\n")

## σ² = Σ(xi - μ)² ÷ N =  3000  ÷ 4  =  750

Now let’s verify using the shortcut formula:

cat("🚀 Shortcut Formula Verification:\n")

## 🚀 Shortcut Formula Verification:

cat("σ² = (Σxi² ÷ N) - μ²\n\n")

## σ² = (Σxi² ÷ N) - μ²

# Calculate Σxi²
sum_x_squared <- sum(prices^2)
cat("📊 Sum of squares:\n")

## 📊 Sum of squares:

for(i in 1:N) {
  cat("x", i, "² = ", prices[i], "² = ", prices[i]^2, "\n")
}

## x 1 ² =  80 ² =  6400 
## x 2 ² =  120 ² =  14400 
## x 3 ² =  150 ² =  22500 
## x 4 ² =  90 ² =  8100

cat("Σxi² = ", paste(prices^2, collapse=" + "), " = ", sum_x_squared, "\n\n")

## Σxi² =  6400 + 14400 + 22500 + 8100  =  51400

# Apply shortcut formula
shortcut_variance <- (sum_x_squared / N) - mu^2
cat("🔄 Shortcut calculation:\n")

## 🔄 Shortcut calculation:

cat("σ² = (", sum_x_squared, " ÷", N, ") - (", mu, ")²\n")

## σ² = ( 51400  ÷ 4 ) - ( 110 )²

cat("σ² = ", sum_x_squared/N, " - ", mu^2, " = ", shortcut_variance, "\n\n")

## σ² =  12850  -  12100  =  750

cat("✅ Both methods give the same result: σ² =", variance, "\n")

## ✅ Both methods give the same result: σ² = 750

3.3 Sample Variance - The n-1 Mystery Solved

When we have a sample (not the whole population), we use:

\[s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\]

Why n-1 instead of n? Because we used our data to estimate the mean, we “lose” one degree of freedom. This gives us a better estimate of the true population variance.

cat("🔬 Sample Variance (treating our airline data as a sample):\n")

## 🔬 Sample Variance (treating our airline data as a sample):

n <- length(prices)
x_bar <- mean(prices)  # Sample mean

# Calculate sample variance manually
sample_deviations <- prices - x_bar
sample_variance_manual <- sum(sample_deviations^2) / (n - 1)

cat("Sample mean (x̄):", x_bar, "\n")

## Sample mean (x̄): 110

cat("Manual calculation: s² =", round(sample_variance_manual, 2), "\n")

## Manual calculation: s² = 1000

# Verify with R function
sample_variance_R <- var(prices)
cat("R var() function:", round(sample_variance_R, 2), "\n")

## R var() function: 1000

cat("✅ Perfect match!\n")

## ✅ Perfect match!

Chapter 4: Frequency Distribution Formulas - All From Your PDF

When data is organized in frequency tables, we need special formulas. Your PDF shows these clearly:

4.1 Frequency Data Formulas

Population Variance for Frequency Data

Original Formula: \[\sigma^2 = \frac{1}{N}\sum_{k=1}^{K}(x_k - \mu)^2 f_k = \sum_{k=1}^{K}(x_k - \mu)^2 p_k\]

Shortcut Formula: \[\sigma^2 = \frac{\sum_{k=1}^{K} x_k^2 f_k}{N} - \mu^2 = \sum_{k=1}^{K} x_k^2 p_k - \mu^2\]

Example 3 from Your PDF: Car Ownership

Let’s work through Example 3 exactly as shown in your notes:

Variable: Number of cars owned (sample of 100 families)

Number of cars (xk)	Frequency (fk)	Proportion (pk)
1	32	0.32
2	48	0.48
3	16	0.16
5	4	0.04

# Example 3 from PDF - Car ownership frequency data
cars_owned <- c(1, 2, 3, 5)
frequencies <- c(32, 48, 16, 4)
N_total <- sum(frequencies)
proportions <- frequencies / N_total

cat("🚗 Example 3: Car Ownership Frequency Analysis\n")

## 🚗 Example 3: Car Ownership Frequency Analysis

cat("Values (xk):", paste(cars_owned, collapse=", "), "\n")

## Values (xk): 1, 2, 3, 5

cat("Frequencies (fk):", paste(frequencies, collapse=", "), "\n")

## Frequencies (fk): 32, 48, 16, 4

cat("Proportions (pk):", paste(round(proportions, 2), collapse=", "), "\n")

## Proportions (pk): 0.32, 0.48, 0.16, 0.04

cat("Total families (N):", N_total, "\n\n")

## Total families (N): 100

# Step 1: Calculate weighted mean
weighted_mean <- sum(cars_owned * proportions)
cat("📊 Step 1 - Weighted Mean:\n")

## 📊 Step 1 - Weighted Mean:

cat("μ = Σ(xk × pk)\n")

## μ = Σ(xk × pk)

for(i in 1:length(cars_owned)) {
  cat("   ", cars_owned[i], "×", proportions[i], "=", 
      round(cars_owned[i] * proportions[i], 3), "\n")
}

##     1 × 0.32 = 0.32 
##     2 × 0.48 = 0.96 
##     3 × 0.16 = 0.48 
##     5 × 0.04 = 0.2

cat("μ = ", round(weighted_mean, 3), " cars per family\n\n")

## μ =  1.96  cars per family

# Step 2: Variance using original formula
cat("📈 Step 2 - Variance (Original Formula):\n")

## 📈 Step 2 - Variance (Original Formula):

cat("σ² = Σ(xk - μ)² × pk\n")

## σ² = Σ(xk - μ)² × pk

deviations_freq <- cars_owned - weighted_mean
squared_dev_freq <- deviations_freq^2
variance_components <- squared_dev_freq * proportions

for(i in 1:length(cars_owned)) {
  cat("(", cars_owned[i], "-", round(weighted_mean, 3), ")² × ", 
      proportions[i], " = ", round(squared_dev_freq[i], 3), " × ", 
      proportions[i], " = ", round(variance_components[i], 4), "\n")
}

## ( 1 - 1.96 )² ×  0.32  =  0.922  ×  0.32  =  0.2949 
## ( 2 - 1.96 )² ×  0.48  =  0.002  ×  0.48  =  8e-04 
## ( 3 - 1.96 )² ×  0.16  =  1.082  ×  0.16  =  0.1731 
## ( 5 - 1.96 )² ×  0.04  =  9.242  ×  0.04  =  0.3697

variance_freq <- sum(variance_components)
cat("σ² = ", paste(round(variance_components, 4), collapse=" + "), 
    " = ", round(variance_freq, 3), "\n\n")

## σ² =  0.2949 + 8e-04 + 0.1731 + 0.3697  =  0.838

# Step 3: Verify with shortcut formula
cat("🚀 Step 3 - Shortcut Formula Verification:\n")

## 🚀 Step 3 - Shortcut Formula Verification:

cat("σ² = Σ(xk² × pk) - μ²\n")

## σ² = Σ(xk² × pk) - μ²

x_squared_times_p <- cars_owned^2 * proportions
for(i in 1:length(cars_owned)) {
  cat(cars_owned[i], "² × ", proportions[i], " = ", 
      cars_owned[i]^2, " × ", proportions[i], " = ", 
      round(x_squared_times_p[i], 3), "\n")
}

## 1 ² ×  0.32  =  1  ×  0.32  =  0.32 
## 2 ² ×  0.48  =  4  ×  0.48  =  1.92 
## 3 ² ×  0.16  =  9  ×  0.16  =  1.44 
## 5 ² ×  0.04  =  25  ×  0.04  =  1

shortcut_variance <- sum(x_squared_times_p) - weighted_mean^2
cat("σ² = ", round(sum(x_squared_times_p), 3), " - ", 
    round(weighted_mean^2, 3), " = ", round(shortcut_variance, 3), "\n")

## σ² =  4.68  -  3.842  =  0.838

cat("✅ Both formulas match!\n")

## ✅ Both formulas match!

Sample Variance for Frequency Data

Sample Formula: \[s^2 = \frac{n}{n-1}\sum_{k=1}^{K}(x_k - \bar{x})^2 p_k\]

Sample Shortcut Formula: \[s^2 = \frac{n}{n-1}\left(\sum_{k=1}^{K} x_k^2 p_k - \bar{x}^2\right)\]

# Calculate sample variance for frequency data
sample_var_freq <- (N_total/(N_total-1)) * variance_freq

cat("🔬 Sample Variance for Frequency Data:\n")

## 🔬 Sample Variance for Frequency Data:

cat("s² = (n/(n-1)) × σ²\n")

## s² = (n/(n-1)) × σ²

cat("s² = (", N_total, "/(", N_total, "-1)) × ", round(variance_freq, 3), "\n")

## s² = ( 100 /( 100 -1)) ×  0.838

cat("s² = ", round(sample_var_freq, 3), "\n")

## s² =  0.847

Chapter 5: Grouped Data and Standard Deviation

5.1 Grouped Data (Interval Classes) - Approximate Variance

Your PDF shows that for grouped data (continuous variables in interval classes), we get approximate variance:

Population Variance: \[\sigma^2 = \frac{1}{N}\sum_{k=1}^{K}(m_k - \mu)^2 f_k\]

Sample Variance: \[s^2 = \frac{n}{n-1}\left[\frac{\sum_{k=1}^{K} m_k^2 f_k}{n} - \bar{x}^2\right]\]

Where \(m_k\) = midpoint of each interval class.

# Example of grouped data - car speeds
cat("🏎️ Grouped Data Example: Car Speeds\n")

## 🏎️ Grouped Data Example: Car Speeds

cat("Note: This gives APPROXIMATE variance because we use midpoints\n\n")

## Note: This gives APPROXIMATE variance because we use midpoints

# Speed intervals and their midpoints
intervals <- c("[0,30)", "[30,50)", "[50,100)")
midpoints <- c(15, 40, 75)  # mk values
frequencies_speed <- c(4, 12, 8)
n_speed <- sum(frequencies_speed)

cat("Speed Intervals:", paste(intervals, collapse=", "), "\n")

## Speed Intervals: [0,30), [30,50), [50,100)

cat("Midpoints (mk):", paste(midpoints, collapse=", "), "\n")

## Midpoints (mk): 15, 40, 75

cat("Frequencies (fk):", paste(frequencies_speed, collapse=", "), "\n\n")

## Frequencies (fk): 4, 12, 8

# Calculate approximate mean
approx_mean <- sum(midpoints * frequencies_speed) / n_speed
cat("Approximate mean: x̄ =", round(approx_mean, 1), "km/h\n")

## Approximate mean: x̄ = 47.5 km/h

# Calculate approximate sample variance
mk_squared_fk <- midpoints^2 * frequencies_speed
approx_variance <- (n_speed/(n_speed-1)) * 
                   (sum(mk_squared_fk)/n_speed - approx_mean^2)

cat("Approximate sample variance: s² =", round(approx_variance, 1), "\n")

## Approximate sample variance: s² = 476.1

cat("⚠️ Remember: This is approximate because we used interval midpoints!\n")

## ⚠️ Remember: This is approximate because we used interval midpoints!

5.2 Standard Deviation - Making Variance Interpretable

The Problem with Variance Units

Variance has squared units - if we measure height in cm, variance is in cm². What does “25 cm²” of height variability mean? It’s confusing!

Solution: Take the square root to get back to original units!

\[\text{Standard Deviation} = \sqrt{\text{Variance}}\]

Population Standard Deviation

\[\sigma = \sqrt{\sigma^2}\]

Sample Standard Deviation

\[s = \sqrt{s^2}\]

# Using our airline ticket example
cat("✈️ Standard Deviation from Airline Ticket Example:\n")

## ✈️ Standard Deviation from Airline Ticket Example:

cat("Variance: σ² =", variance, "dollars²\n")

## Variance: σ² = 750 dollars²

std_dev <- sqrt(variance)
cat("Standard deviation: σ = √", variance, " = $", round(std_dev, 2), "\n\n")

## Standard deviation: σ = √ 750  = $ 27.39

cat("💡 Interpretation: On average, ticket prices are about $", 
    round(std_dev, 2), " away from the mean of $", mu, "\n")

## 💡 Interpretation: On average, ticket prices are about $ 27.39  away from the mean of $ 110

# For the car ownership example
std_dev_cars <- sqrt(variance_freq)
cat("\n🚗 Car Ownership Standard Deviation:\n")

## 
## 🚗 Car Ownership Standard Deviation:

cat("σ = √", round(variance_freq, 3), " = ", round(std_dev_cars, 3), " cars\n")

## σ = √ 0.838  =  0.916  cars

5.3 The Empirical Rule (68-95-99.7 Rule)

For normally distributed data: - 68% of data falls within 1 standard deviation of the mean - 95% of data falls within 2 standard deviations
- 99.7% of data falls within 3 standard deviations

# Visualize the empirical rule
x <- seq(-4, 4, 0.01)
y <- dnorm(x)

plot(x, y, type="l", lwd=2, col="blue", 
     main="The Empirical Rule (68-95-99.7)",
     xlab="Standard deviations from mean", 
     ylab="Probability density")

# Shade areas
polygon(c(-1, seq(-1, 1, 0.01), 1), c(0, dnorm(seq(-1, 1, 0.01)), 0), 
        col=rgb(0,0,1,0.3), border=NA)
polygon(c(-2, seq(-2, 2, 0.01), 2), c(0, dnorm(seq(-2, 2, 0.01)), 0), 
        col=rgb(0,1,0,0.2), border=NA)
polygon(c(-3, seq(-3, 3, 0.01), 3), c(0, dnorm(seq(-3, 3, 0.01)), 0), 
        col=rgb(1,0,0,0.1), border=NA)

# Add labels
text(0, 0.2, "68%", cex=1.2, font=2)
text(0, 0.05, "95%", cex=1.2, font=2)
text(0, 0.02, "99.7%", cex=1.2, font=2)

# Add vertical lines
abline(v=c(-3,-2,-1,0,1,2,3), lty=2, col="gray")

# Apply empirical rule to car prices
price_mean <- mean(cars$price_num)
price_sd <- sd(cars$price_num)

cat("🚗 Empirical Rule Applied to Car Prices:\n")

## 🚗 Empirical Rule Applied to Car Prices:

cat("Mean price: $", round(price_mean, 0), "\n")

## Mean price: $ 30620

cat("Standard deviation: $", round(price_sd, 0), "\n\n")

## Standard deviation: $ 9531

cat("📊 Expected ranges:\n")

## 📊 Expected ranges:

cat("68% of cars priced between: $", 
    round(price_mean - price_sd, 0), " - $", 
    round(price_mean + price_sd, 0), "\n")

## 68% of cars priced between: $ 21089  - $ 40151

cat("95% of cars priced between: $", 
    round(price_mean - 2*price_sd, 0), " - $", 
    round(price_mean + 2*price_sd, 0), "\n")

## 95% of cars priced between: $ 11557  - $ 49682

cat("99.7% of cars priced between: $", 
    round(price_mean - 3*price_sd, 0), " - $", 
    round(price_mean + 3*price_sd, 0), "\n")

## 99.7% of cars priced between: $ 2026  - $ 59214

Chapter 6: UBStats Functions and Coefficient of Variation

6.1 Using UBStats Functions - Making Life Easier

Now let’s use the functions from your class scripts to calculate all these measures automatically:

# Using UBStats functions for car price analysis
cat("🔧 Using UBStats Functions for Car Price Analysis:\n\n")

## 🔧 Using UBStats Functions for Car Price Analysis:

# All dispersion measures at once
cat("📊 All Variability Measures:\n")

## 📊 All Variability Measures:

price_dispersion <- distr.summary.x(cars$price_num, stats="dispersion")

##   n n.a    range  IQrange      sd      var   cv
##  32   0 34554.64 13107.09 9531.25 90844686 0.31

print(price_dispersion)

## $`Measures of dispersion`
##    n n.a    range  IQrange       sd      var        cv
## 1 32   0 34554.64 13107.09 9531.248 90844686 0.3112753

cat("\n📈 Complete Summary (Central Tendency + Variability):\n")

## 
## 📈 Complete Summary (Central Tendency + Variability):

price_summary <- distr.summary.x(cars$price_num, stats="summary")

##   n n.a      min       q1   median     mean       q3      max      sd      var
##  32   0 14123.39 24104.94 27685.31 30619.99 37212.03 48678.02 9531.25 90844686

print(price_summary)

## $`Summary measures`
##    n n.a      min       q1   median     mean       q3      max       sd
## 1 32   0 14123.39 24104.94 27685.31 30619.99 37212.03 48678.02 9531.248
##        var
## 1 90844686

6.2 Coefficient of Variation - Comparing Different Variables

The coefficient of variation (CV) lets us compare variability across different units:

\[CV = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\%\]

cat("📊 Coefficient of Variation Analysis:\n")

## 📊 Coefficient of Variation Analysis:

cat("Which car characteristic is most variable?\n\n")

## Which car characteristic is most variable?

# Calculate CV for different variables
price_cv <- (sd(cars$price_num) / mean(cars$price_num)) * 100
speed_cv <- (sd(cars$maxspeed) / mean(cars$maxspeed)) * 100
weight_cv <- (sd(cars$weight) / mean(cars$weight)) * 100
accel_cv <- (sd(cars$acceleration) / mean(cars$acceleration)) * 100

cat("Variable         Mean        SD        CV\n")

## Variable         Mean        SD        CV

cat("Price ($)      ", sprintf("%8.0f", mean(cars$price_num)), 
    "   ", sprintf("%8.0f", sd(cars$price_num)), 
    "   ", sprintf("%5.1f%%", price_cv), "\n")

## Price ($)          30620         9531      31.1%

cat("Max Speed      ", sprintf("%8.1f", mean(cars$maxspeed)), 
    "   ", sprintf("%8.1f", sd(cars$maxspeed)), 
    "   ", sprintf("%5.1f%%", speed_cv), "\n")

## Max Speed          420.6        169.1      40.2%

cat("Weight (kg)    ", sprintf("%8.0f", mean(cars$weight)), 
    "   ", sprintf("%8.0f", sd(cars$weight)), 
    "   ", sprintf("%5.1f%%", weight_cv), "\n")

## Weight (kg)         1609          489      30.4%

cat("Acceleration   ", sprintf("%8.2f", mean(cars$acceleration)), 
    "   ", sprintf("%8.2f", sd(cars$acceleration)), 
    "   ", sprintf("%5.1f%%", accel_cv), "\n\n")

## Acceleration       12.51         4.27      34.2%

# Interpretation
cat("💡 Interpretation:\n")

## 💡 Interpretation:

cv_values <- c(Price = price_cv, Speed = speed_cv, Weight = weight_cv, Acceleration = accel_cv)
cv_sorted <- sort(cv_values, decreasing = TRUE)

for(i in 1:length(cv_sorted)) {
  cat(i, ". ", names(cv_sorted)[i], ": ", sprintf("%.1f%%", cv_sorted[i]), 
      " (", ifelse(cv_sorted[i] > 30, "High", ifelse(cv_sorted[i] > 15, "Moderate", "Low")), " variability)\n")
}

## 1 .  Speed :  40.2%  ( High  variability)
## 2 .  Acceleration :  34.2%  ( High  variability)
## 3 .  Price :  31.1%  ( High  variability)
## 4 .  Weight :  30.4%  ( High  variability)

cat("\nMost variable:", names(cv_sorted)[1], "- indicates diverse market segments\n")

## 
## Most variable: Speed - indicates diverse market segments

cat("Least variable:", names(cv_sorted)[4], "- suggests more standardized characteristics\n")

## Least variable: Weight - suggests more standardized characteristics

Chapter 7: Advanced Examples from Your PDF Notes

7.1 Example 4: Five Variables with Same Mean and Median

Let’s work through Example 4 from your variability PDF - the five variables comparison:

cat("📋 Example 4: Five Variables Analysis\n")

## 📋 Example 4: Five Variables Analysis

cat("Goal: Show that variables can have same mean/median but different variability\n\n")

## Goal: Show that variables can have same mean/median but different variability

# Data from your PDF (N=7 for each variable)
var_A <- c(1, 1, 4, 4, 4, 7, 7)
var_B <- c(1, 2, 3, 4, 5, 6, 7)  
var_C <- c(1, 2, 4, 4, 4, 6, 7)
var_D <- c(1, 4, 4, 4, 4, 4, 7)
var_E <- c(4, 4, 4, 4, 4, 4, 4)

variables <- list(A=var_A, B=var_B, C=var_C, D=var_D, E=var_E)

cat("📊 Data for each variable:\n")

## 📊 Data for each variable:

for(i in 1:5) {
  cat("Variable", names(variables)[i], ":", 
      paste(variables[[i]], collapse=", "), "\n")
}

## Variable A : 1, 1, 4, 4, 4, 7, 7 
## Variable B : 1, 2, 3, 4, 5, 6, 7 
## Variable C : 1, 2, 4, 4, 4, 6, 7 
## Variable D : 1, 4, 4, 4, 4, 4, 7 
## Variable E : 4, 4, 4, 4, 4, 4, 4

cat("\n📈 Complete Analysis of All Five Variables:\n\n")

## 
## 📈 Complete Analysis of All Five Variables:

# Calculate all measures exactly as shown in your PDF
results <- data.frame(
  Variable = names(variables),
  Min = sapply(variables, min),
  Q1 = sapply(variables, function(x) quantile(x, 0.25)),
  Median = sapply(variables, median),
  Mean = sapply(variables, mean),
  Q3 = sapply(variables, function(x) quantile(x, 0.75)),
  Max = sapply(variables, max),
  Range = sapply(variables, function(x) max(x) - min(x)),
  IQR = sapply(variables, IQR),
  Variance = sapply(variables, var),
  Std_Dev = sapply(variables, sd)
)

# Print only the numeric columns with proper rounding
results_numeric <- results[, -1]  # Remove the Variable column for rounding
results_rounded <- round(results_numeric, 2)
results_final <- cbind(Variable = results$Variable, results_rounded)

print(results_final)

##   Variable Min  Q1 Median Mean  Q3 Max Range IQR Variance Std_Dev
## A        A   1 2.5      4    4 5.5   7     6   3     6.00    2.45
## B        B   1 2.5      4    4 5.5   7     6   3     4.67    2.16
## C        C   1 3.0      4    4 5.0   7     6   2     4.33    2.08
## D        D   1 4.0      4    4 4.0   7     6   0     3.00    1.73
## E        E   4 4.0      4    4 4.0   4     0   0     0.00    0.00

cat("\n🔍 Key Observations from Your PDF:\n")

## 
## 🔍 Key Observations from Your PDF:

cat("- ALL variables have same mean (4) and median (4)\n")

## - ALL variables have same mean (4) and median (4)

cat("- But they have very different variability patterns!\n")

## - But they have very different variability patterns!

cat("- Variable E has zero variability (all values = 4)\n")

## - Variable E has zero variability (all values = 4)

cat("- Variable A has highest variability\n")

## - Variable A has highest variability

cat("- Only variance and standard deviation capture these differences!\n")

## - Only variance and standard deviation capture these differences!

# Visualize all five variables
par(mfrow=c(2,3))

for(i in 1:5) {
  boxplot(variables[[i]], 
          main=paste("Variable", names(variables)[i]),
          ylab="Values", col=rainbow(5)[i])
  abline(h=4, col="red", lty=2)  # Mean line
}

# Add histogram for comparison
hist(var_A, main="Variable A Distribution", 
     xlab="Values", col="lightblue", breaks=0:8)

par(mfrow=c(1,1))

7.2 Example 5: Business Scenario - Japanese vs European Cars

Now let’s tackle Example 5 - the business scenario from your PDF:

cat("🚗 Example 5: Car Manufacturer Price Analysis\n")

## 🚗 Example 5: Car Manufacturer Price Analysis

cat("Business Question: Are Japanese car prices too differentiated?\n\n")

## Business Question: Are Japanese car prices too differentiated?

# Question 1: Are car models more dispersed in terms of sales or price?
cat("📊 Question 1: Sales vs Price Variability\n")

## 📊 Question 1: Sales vs Price Variability

sales_stats <- distr.summary.x(cars$sales, stats=c("mean","sd","cv"))

##   n n.a    mean    sd   cv
##  32   0 8012.88 81.21 0.01

price_stats <- distr.summary.x(cars$price_num, stats=c("mean","sd","cv"))

##   n n.a     mean      sd   cv
##  32   0 30619.99 9531.25 0.31

print(sales_stats)

## $`Requested statistics`
##    n n.a     mean       sd         cv
## 1 32   0 8012.875 81.21407 0.01013545

print(price_stats)

## $`Requested statistics`
##    n n.a     mean       sd        cv
## 1 32   0 30619.99 9531.248 0.3112753

cat("\n💡 Answer: Sales are more dispersed!\n")

## 
## 💡 Answer: Sales are more dispersed!

cat("Sales CV =", round((sd(cars$sales)/mean(cars$sales))*100, 1), "%\n")

## Sales CV = 1 %

cat("Price CV =", round((sd(cars$price_num)/mean(cars$price_num))*100, 1), "%\n")

## Price CV = 31.1 %

cat("\n📊 Question 2: Price Variability by Country\n")

## 
## 📊 Question 2: Price Variability by Country

# Compare price variability among different countries
price_by_country <- distr.summary.x(cars$price_num, 
                                   stats=c("mean","sd","cv"), 
                                   by1=cars$country)

##   cars$country  n n.a     mean       sd   cv
##         France  4   0 25927.03  6804.56 0.26
##        Germany  6   0 29644.77 10216.58 0.34
##          Italy  6   0 30332.72 11953.73 0.39
##          Japan 12   0 31136.82  9580.72 0.31
##  United States  4   0 35656.21  8799.23 0.25

print(price_by_country)

## $`Requested statistics`
##    cars$country  n n.a     mean        sd        cv
## 1        France  4   0 25927.03  6804.561 0.2624504
## 2       Germany  6   0 29644.77 10216.579 0.3446334
## 3         Italy  6   0 30332.72 11953.735 0.3940871
## 4         Japan 12   0 31136.82  9580.721 0.3076975
## 5 United States  4   0 35656.21  8799.226 0.2467796

cat("\n🎯 Manager's Concern Analysis:\n")

## 
## 🎯 Manager's Concern Analysis:

# Check if Japan and Germany exist in the data
available_countries <- unique(cars$country)
cat("Available countries:", paste(available_countries, collapse=", "), "\n")

## Available countries: Germany, Italy, Japan, United States, France

# Find Japan and Germany data more safely
if("Japan" %in% available_countries && "Germany" %in% available_countries) {
  # Calculate CV manually for Japan and Germany
  japan_cars <- cars[cars$country == "Japan", ]
  germany_cars <- cars[cars$country == "Germany", ]
  
  japan_cv <- (sd(japan_cars$price_num, na.rm=TRUE) / mean(japan_cars$price_num, na.rm=TRUE)) * 100
  germany_cv <- (sd(germany_cars$price_num, na.rm=TRUE) / mean(germany_cars$price_num, na.rm=TRUE)) * 100
  
  cat("Japanese cars CV:", round(japan_cv, 1), "%\n")
  cat("German cars CV:", round(germany_cv, 1), "%\n")
  
  if(japan_cv > germany_cv) {
    cat("⚠️  Manager's concern is VALID - Japanese cars are more variable\n")
  } else {
    cat("✅ Manager's concern is UNFOUNDED - Japanese cars are not more variable\n")
  }
} else {
  cat("Note: Japan or Germany not found in dataset\n")
  cat("Comparing available countries instead:\n")
  
  # Get CV for all countries
  country_cvs <- c()
  for(country in available_countries) {
    country_cars <- cars[cars$country == country, ]
    if(nrow(country_cars) > 1) {
      cv <- (sd(country_cars$price_num, na.rm=TRUE) / mean(country_cars$price_num, na.rm=TRUE)) * 100
      country_cvs[country] <- cv
      cat(country, "CV:", round(cv, 1), "%\n")
    }
  }
  
  # Find most and least variable
  if(length(country_cvs) > 0) {
    most_variable <- names(which.max(country_cvs))
    least_variable <- names(which.min(country_cvs))
    cat("Most variable country:", most_variable, "(", round(max(country_cvs), 1), "%)\n")
    cat("Least variable country:", least_variable, "(", round(min(country_cvs), 1), "%)\n")
  }
}

## Japanese cars CV: 30.8 %
## German cars CV: 34.5 %
## ✅ Manager's concern is UNFOUNDED - Japanese cars are not more variable

Chapter 8: Bivariate Analysis - Studying Relationships

Now we move to Part 2 of today’s lecture: How do we study relationships between TWO variables? This is called bivariate analysis.

8.1 Types of Bivariate Relationships

Your PDF notes show three main cases:

Two Qualitative Variables → Crosstabs and bar charts
Two Quantitative Variables → Scatterplots and correlation
One Qualitative + One Quantitative → Conditional analysis

cat("🔍 Bivariate Analysis Overview:\n")

## 🔍 Bivariate Analysis Overview:

cat("We'll explore relationships between pairs of variables\n")

## We'll explore relationships between pairs of variables

cat("Goal: Understand how one variable relates to another\n\n")

## Goal: Understand how one variable relates to another

# Check our available variables
cat("Available variables in cars dataset:\n")

## Available variables in cars dataset:

str(cars[, c("country", "price_classes", "price_num", "maxspeed", "acceleration")])

## 'data.frame':    32 obs. of  5 variables:
##  $ country      : chr  "Germany" "Italy" "Japan" "United States" ...
##  $ price_classes: Factor w/ 3 levels "low","mid","high": 2 1 2 3 3 3 3 2 2 2 ...
##  $ price_num    : num  24634 19354 23888 39901 39129 ...
##  $ maxspeed     : num  314 345 270 314 470 ...
##  $ acceleration : num  15.5 15.6 15.7 12.3 12.4 ...

8.2 Two Qualitative Variables - Crosstabs

When both variables are categorical, we use crosstabs (contingency tables).

Example from Bivariate PDF: Industry vs Solvency Rating

Let’s recreate Case A from your bivariate analysis notes:

# Example from your bivariate PDF - Industry vs Solvency Rating
cat("📋 Case A: Industry vs Solvency Rating\n")

## 📋 Case A: Industry vs Solvency Rating

cat("Question: Does company rating depend on industry?\n\n")

## Question: Does company rating depend on industry?

# Create the data from your PDF
industry <- c(rep("Manufacturing", 240), rep("Financial", 160))
solvency <- c(rep("Low", 36), rep("Average", 124), rep("High", 80),     # Manufacturing
              rep("Low", 12), rep("Average", 64), rep("High", 84))      # Financial

# Create crosstab
crosstab <- table(industry, solvency)
print(crosstab)

##                solvency
## industry        Average High Low
##   Financial          64   84  12
##   Manufacturing     124   80  36

cat("\n📊 Step 1: Calculate conditional frequencies (Y|X)\n")

## 
## 📊 Step 1: Calculate conditional frequencies (Y|X)

# Calculate conditional frequencies by row
conditional_freq <- prop.table(crosstab, margin=1) * 100
print(round(conditional_freq, 1))

##                solvency
## industry        Average High  Low
##   Financial        40.0 52.5  7.5
##   Manufacturing    51.7 33.3 15.0

# Step 2: Graphical presentation - Stacked bar chart
barplot(conditional_freq, beside=FALSE, 
        main="Solvency Rating by Industry (Conditional %)",
        xlab="Solvency Rating", ylab="Percentage",
        col=c("lightblue", "lightcoral"),
        legend.text=rownames(conditional_freq))

cat("\n🎯 Step 3: Conclusion\n")

## 
## 🎯 Step 3: Conclusion

cat("Manufacturing: Only 33.3% have high rating\n")

## Manufacturing: Only 33.3% have high rating

cat("Financial: 52.5% have high rating\n")

## Financial: 52.5% have high rating

cat("→ The distributions are DIFFERENT, so rating depends on industry!\n")

## → The distributions are DIFFERENT, so rating depends on industry!

Using UBStats Functions for Crosstabs

# Let's analyze country vs price class in our cars dataset
cat("🚗 Analyzing Country vs Price Class:\n")

## 🚗 Analyzing Country vs Price Class:

# Create frequency table
country_price_table <- distr.table.xy(cars$country, cars$price_classes, 
                                      freq.type="y|x", freq="prop")

## y|x: Proportions
##                cars$price_classes
## cars$country     low  mid high TOTAL
##   France        0.25 0.50 0.25  1.00
##   Germany       0.17 0.67 0.17  1.00
##   Italy         0.33 0.17 0.50  1.00
##   Japan         0.08 0.67 0.25  1.00
##   United States 0.00 0.50 0.50  1.00

print(country_price_table)

## $`y|x: Proportions`
##                      low       mid      high TOTAL
## France        0.25000000 0.5000000 0.2500000     1
## Germany       0.16666667 0.6666667 0.1666667     1
## Italy         0.33333333 0.1666667 0.5000000     1
## Japan         0.08333333 0.6666667 0.2500000     1
## United States 0.00000000 0.5000000 0.5000000     1

# Create visualization
distr.plot.xy(cars$country, cars$price_classes, 
              freq.type="y|x", freq="prop",
              plot.type="bars", bar.type="stacked")

8.3 Two Quantitative Variables - Correlation and Scatterplots

When both variables are numerical, we use scatterplots and correlation.

Correlation Coefficient

The correlation coefficient (r) measures the strength of linear relationship:

\(r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}\)

r = +1: Perfect positive linear relationship
r = 0: No linear relationship
r = -1: Perfect negative linear relationship

cat("📈 Correlation Analysis: Car Price vs Performance\n\n")

## 📈 Correlation Analysis: Car Price vs Performance

# Calculate correlations
price_speed_cor <- cor(cars$price_num, cars$maxspeed)
price_accel_cor <- cor(cars$price_num, cars$acceleration)

cat("Price vs Max Speed correlation:", round(price_speed_cor, 3), "\n")

## Price vs Max Speed correlation: 0.794

cat("Price vs Acceleration correlation:", round(price_accel_cor, 3), "\n\n")

## Price vs Acceleration correlation: -0.827

# Interpretation
cat("💡 Interpretation:\n")

## 💡 Interpretation:

if(price_speed_cor > 0.5) {
  cat("- Strong positive correlation between price and speed\n")
} else if(price_speed_cor > 0.3) {
  cat("- Moderate positive correlation between price and speed\n")
} else {
  cat("- Weak correlation between price and speed\n")
}

## - Strong positive correlation between price and speed

if(price_accel_cor < -0.3) {
  cat("- Negative correlation: expensive cars accelerate faster (lower seconds)\n")
}

## - Negative correlation: expensive cars accelerate faster (lower seconds)

# Create scatterplots
par(mfrow=c(1,2))

# Price vs Speed
plot(cars$price_num, cars$maxspeed, 
     main="Price vs Max Speed", 
     xlab="Price ($)", ylab="Max Speed (km/h)",
     pch=19, col="blue", alpha=0.6)

## Warning in plot.window(...): "alpha" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "alpha" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "alpha" is not a
## graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "alpha" is not a
## graphical parameter

## Warning in box(...): "alpha" is not a graphical parameter

## Warning in title(...): "alpha" is not a graphical parameter

abline(lm(maxspeed ~ price_num, data=cars), col="red", lwd=2)

# Price vs Acceleration  
plot(cars$price_num, cars$acceleration,
     main="Price vs Acceleration",
     xlab="Price ($)", ylab="Acceleration (0-100 km/h seconds)",
     pch=19, col="green", alpha=0.6)

## Warning in plot.window(...): "alpha" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "alpha" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "alpha" is not a
## graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "alpha" is not a
## graphical parameter

## Warning in box(...): "alpha" is not a graphical parameter

## Warning in title(...): "alpha" is not a graphical parameter

abline(lm(acceleration ~ price_num, data=cars), col="red", lwd=2)

par(mfrow=c(1,1))

Using UBStats for Scatterplots

# Using UBStats functions
distr.plot.xy(cars$price_num, cars$maxspeed, plot.type="scatter")

cat("📊 Using UBStats for correlation analysis:\n")

## 📊 Using UBStats for correlation analysis:

cat("Correlation between price and speed:", 
    round(cor(cars$price_num, cars$maxspeed), 3), "\n")

## Correlation between price and speed: 0.794

8.4 Mixed Types: One Qualitative + One Quantitative

When we have one categorical and one numerical variable, we use conditional analysis.

cat("🔀 Mixed Analysis: Country (Categorical) vs Price (Numerical)\n\n")

## 🔀 Mixed Analysis: Country (Categorical) vs Price (Numerical)

# Conditional summary statistics
price_by_country <- distr.summary.x(cars$price_num, by1=cars$country, 
                                   stats=c("central", "dispersion"))

##   cars$country  n n.a     mode n.modes  mode%   median     mean
##         France  4   0 26792.50       4 0.2500 24430.20 25927.03
##        Germany  6   0 24634.45       6 0.1667 30333.46 29644.77
##          Italy  6   0 19354.12       6 0.1667 28582.25 30332.72
##          Japan 12   0 23887.56      12 0.0833 27685.31 31136.82
##  United States  4   0 39900.58       4 0.2500 34581.44 35656.21
##   cars$country  n n.a    range  IQrange       sd       var   cv
##         France  4   0 15413.19  7396.75  6804.56  46302047 0.26
##        Germany  6   0 30362.11  8410.38 10216.58 104378488 0.34
##          Italy  6   0 27358.69 18614.83 11953.73 142891774 0.39
##          Japan 12   0 31365.75  9538.84  9580.72  91790206 0.31
##  United States  4   0 18497.14 12603.00  8799.23  77426383 0.25

print(price_by_country)

## $`Central tendency measures`
##    cars$country  n n.a     mode n.modes      mode%   median     mean
## 1        France  4   0 26792.50       4 0.25000000 24430.20 25927.03
## 2       Germany  6   0 24634.45       6 0.16666667 30333.46 29644.77
## 3         Italy  6   0 19354.12       6 0.16666667 28582.25 30332.72
## 4         Japan 12   0 23887.56      12 0.08333333 27685.31 31136.82
## 5 United States  4   0 39900.58       4 0.25000000 34581.44 35656.21
## 
## $`Measures of dispersion`
##    cars$country  n n.a    range   IQrange        sd       var        cv
## 1        France  4   0 15413.19  7396.747  6804.561  46302047 0.2624504
## 2       Germany  6   0 30362.11  8410.382 10216.579 104378488 0.3446334
## 3         Italy  6   0 27358.69 18614.825 11953.735 142891774 0.3940871
## 4         Japan 12   0 31365.75  9538.835  9580.721  91790206 0.3076975
## 5 United States  4   0 18497.14 12602.998  8799.226  77426383 0.2467796

# Side-by-side boxplots
distr.plot.xy(cars$country, cars$price_num, plot.type="boxplot")

Chapter 9: Working with R Scripts Functions

Let’s use the exact functions from your class R scripts:

9.1 From Video 3 Script - Univariate Distributions

cat("🔧 Using Functions from Video 3 Script:\n\n")

## 🔧 Using Functions from Video 3 Script:

# Frequency tables for categorical variables
cat("📊 Frequency Table for Country:\n")

## 📊 Frequency Table for Country:

country_table <- distr.table.x(x=cars$country, freq=c("counts","prop","perc"))

##   cars$country Count Prop Percent
##         France     4 0.12      12
##        Germany     6 0.19      19
##          Italy     6 0.19      19
##          Japan    12 0.38      38
##  United States     4 0.12      12
##          TOTAL    32 1.00     100

print(country_table)

##      cars$country Count   Prop Percent
## 1          France     4 0.1250   12.50
## 2         Germany     6 0.1875   18.75
## 3           Italy     6 0.1875   18.75
## 4           Japan    12 0.3750   37.50
## 5   United States     4 0.1250   12.50
## Sum         TOTAL    32 1.0000  100.00

# Numerical variables with many values - use breaks
cat("\n📈 Price Distribution with Custom Breaks:\n")

## 
## 📈 Price Distribution with Custom Breaks:

price_breaks <- c(10000, 20000, 30000, 50000, 100000)
price_table <- distr.table.x(x=cars$price_num, 
                            freq=c("counts","prop","dens"),
                            breaks=price_breaks)

##  cars$price_num Count Prop Density
##   [10000,20000)     5 0.16   2e-05
##   [20000,30000)    14 0.44   4e-05
##   [30000,50000)    13 0.41   2e-05
##  [50000,100000]     0 0.00       0
##           TOTAL    32 1.00

print(price_table)

##     cars$price_num Count    Prop     Density
## 1    [10000,20000)     5 0.15625 1.56250e-05
## 2    [20000,30000)    14 0.43750 4.37500e-05
## 3    [30000,50000)    13 0.40625 2.03125e-05
## 4   [50000,100000]     0 0.00000 0.00000e+00
## Sum          TOTAL    32 1.00000          NA

# Plots from Video 3 script
par(mfrow=c(1,2))

# Pie chart for categorical data
distr.plot.x(x=cars$country, freq="proportions", plot.type="pie")

# Histogram for numerical data
distr.plot.x(x=cars$price_num, plot.type="hist", breaks=8)

par(mfrow=c(1,1))

9.2 From Video 4 Script - Summary Measures

cat("🔧 Using Functions from Video 4 Script:\n\n")

## 🔧 Using Functions from Video 4 Script:

# Summary measures for central tendency
cat("📊 Central Tendency Measures:\n")

## 📊 Central Tendency Measures:

central_measures <- distr.summary.x(x=cars$price_num, stats="central")

##   n n.a     mode n.modes  mode%   median     mean
##  32   0 24634.45      32 0.0312 27685.31 30619.99

print(central_measures)

## $`Central tendency measures`
##    n n.a     mode n.modes   mode%   median     mean
## 1 32   0 24634.45      32 0.03125 27685.31 30619.99

# Quartiles and percentiles
cat("\n📈 Quartiles and Percentiles:\n")

## 
## 📈 Quartiles and Percentiles:

quartile_measures <- distr.summary.x(x=cars$price_num, stats="quartiles")

##   n n.a      min      p25      p50      p75      max
##  32   0 14123.39 24104.94 27685.31 37212.03 48678.02

print(quartile_measures)

## $Quartiles
##    n n.a      min      p25      p50      p75      max
## 1 32   0 14123.39 24104.94 27685.31 37212.03 48678.02

# All dispersion measures
cat("\n📏 All Dispersion Measures:\n")

## 
## 📏 All Dispersion Measures:

dispersion_measures <- distr.summary.x(x=cars$price_num, stats="dispersion")

##   n n.a    range  IQrange      sd      var   cv
##  32   0 34554.64 13107.09 9531.25 90844686 0.31

print(dispersion_measures)

## $`Measures of dispersion`
##    n n.a    range  IQrange       sd      var        cv
## 1 32   0 34554.64 13107.09 9531.248 90844686 0.3112753

# Complete summary
cat("\n📋 Complete Summary:\n")

## 
## 📋 Complete Summary:

complete_summary <- distr.summary.x(x=cars$price_num, stats="summary")

##   n n.a      min       q1   median     mean       q3      max      sd      var
##  32   0 14123.39 24104.94 27685.31 30619.99 37212.03 48678.02 9531.25 90844686

print(complete_summary)

## $`Summary measures`
##    n n.a      min       q1   median     mean       q3      max       sd
## 1 32   0 14123.39 24104.94 27685.31 30619.99 37212.03 48678.02 9531.248
##        var
## 1 90844686

9.3 From Video 5 Script - Bivariate Analysis

cat("🔧 Using Functions from Video 5 Script:\n\n")

## 🔧 Using Functions from Video 5 Script:

# Joint frequency distributions
cat("📊 Joint Frequency Table:\n")

## 📊 Joint Frequency Table:

joint_table <- distr.table.xy(x=cars$country, y=cars$price_classes, 
                             freq.type="joint", freq="count")

## Joint counts
##                cars$price_classes
## cars$country    low mid high TOTAL
##   France          1   2    1     4
##   Germany         1   4    1     6
##   Italy           2   1    3     6
##   Japan           1   8    3    12
##   United States   0   2    2     4
##   TOTAL           5  17   10    32

print(joint_table)

## $`Joint counts`
##               low mid high TOTAL
## France          1   2    1     4
## Germany         1   4    1     6
## Italy           2   1    3     6
## Japan           1   8    3    12
## United States   0   2    2     4
## TOTAL           5  17   10    32

# Conditional frequencies
cat("\n📈 Conditional Frequencies (Price Class | Country):\n")

## 
## 📈 Conditional Frequencies (Price Class | Country):

conditional_table <- distr.table.xy(x=cars$country, y=cars$price_classes,
                                   freq.type="y|x", freq="prop")

## y|x: Proportions
##                cars$price_classes
## cars$country     low  mid high TOTAL
##   France        0.25 0.50 0.25  1.00
##   Germany       0.17 0.67 0.17  1.00
##   Italy         0.33 0.17 0.50  1.00
##   Japan         0.08 0.67 0.25  1.00
##   United States 0.00 0.50 0.50  1.00

print(conditional_table)

## $`y|x: Proportions`
##                      low       mid      high TOTAL
## France        0.25000000 0.5000000 0.2500000     1
## Germany       0.16666667 0.6666667 0.1666667     1
## Italy         0.33333333 0.1666667 0.5000000     1
## Japan         0.08333333 0.6666667 0.2500000     1
## United States 0.00000000 0.5000000 0.5000000     1

# Bivariate plots from Video 5 script
par(mfrow=c(1,2))

# Side-by-side bar chart
distr.plot.xy(x=cars$country, y=cars$price_classes,
              freq.type="y|x", freq="prop",
              plot.type="bars", bar.type="beside")

# Scatter plot for two numerical variables
distr.plot.xy(x=cars$price_num, y=cars$maxspeed, 
              plot.type="scatter")

par(mfrow=c(1,1))

# Correlation analysis
cat("📈 Correlation Analysis:\n")

## 📈 Correlation Analysis:

price_speed_cor <- cor(cars$price_num, cars$maxspeed)
cat("Price vs Speed correlation:", round(price_speed_cor, 3), "\n")

## Price vs Speed correlation: 0.794

Chapter 10: Real-World Business Applications

10.1 Quality Control Example

cat("🏭 Quality Control Application:\n")

## 🏭 Quality Control Application:

cat("A factory produces bolts that should be 10cm long\n\n")

## A factory produces bolts that should be 10cm long

# Simulate two production scenarios
set.seed(123)  # For reproducible results

# Scenario A: Good quality control
bolts_A <- rnorm(100, mean=10.0, sd=0.1)
# Scenario B: Poor quality control  
bolts_B <- rnorm(100, mean=10.0, sd=0.5)

cat("📊 Quality Control Comparison:\n")

## 📊 Quality Control Comparison:

cat("Scenario A: Mean =", round(mean(bolts_A), 3), 
    "cm, SD =", round(sd(bolts_A), 3), "cm\n")

## Scenario A: Mean = 10.009 cm, SD = 0.091 cm

cat("Scenario B: Mean =", round(mean(bolts_B), 3), 
    "cm, SD =", round(sd(bolts_B), 3), "cm\n\n")

## Scenario B: Mean = 9.946 cm, SD = 0.483 cm

# Calculate coefficient of variation
cv_A <- (sd(bolts_A) / mean(bolts_A)) * 100
cv_B <- (sd(bolts_B) / mean(bolts_B)) * 100

cat("💡 Quality Assessment:\n")

## 💡 Quality Assessment:

cat("Scenario A CV:", round(cv_A, 2), "% - EXCELLENT quality control\n")

## Scenario A CV: 0.91 % - EXCELLENT quality control

cat("Scenario B CV:", round(cv_B, 2), "% - POOR quality control\n")

## Scenario B CV: 4.86 % - POOR quality control

# Visualize quality control differences
par(mfrow=c(1,2))

hist(bolts_A, main="Scenario A: Good Quality Control", 
     xlab="Bolt Length (cm)", col="lightgreen", 
     xlim=c(8, 12), breaks=20)
abline(v=10, col="red", lwd=2, lty=2)

hist(bolts_B, main="Scenario B: Poor Quality Control", 
     xlab="Bolt Length (cm)", col="lightcoral", 
     xlim=c(8, 12), breaks=20)
abline(v=10, col="red", lwd=2, lty=2)

par(mfrow=c(1,1))

cat("🎯 Business Impact:\n")

## 🎯 Business Impact:

cat("- Scenario A: Consistent, predictable production\n")

## - Scenario A: Consistent, predictable production

cat("- Scenario B: High variability = quality problems\n")

## - Scenario B: High variability = quality problems

10.2 Investment Risk Analysis

cat("💰 Investment Portfolio Risk Analysis:\n\n")

## 💰 Investment Portfolio Risk Analysis:

# Two investment portfolios with same mean return
portfolio_A <- c(8, 9, 10, 11, 12)  # Conservative
portfolio_B <- c(0, 5, 10, 15, 20)  # Risky

cat("Portfolio A returns:", paste(portfolio_A, "%", collapse=", "), "\n")

## Portfolio A returns: 8 %, 9 %, 10 %, 11 %, 12 %

cat("Portfolio B returns:", paste(portfolio_B, "%", collapse=", "), "\n\n")

## Portfolio B returns: 0 %, 5 %, 10 %, 15 %, 20 %

# Calculate risk measures
mean_A <- mean(portfolio_A)
mean_B <- mean(portfolio_B)
sd_A <- sd(portfolio_A)
sd_B <- sd(portfolio_B)

cat("📊 Risk Analysis:\n")

## 📊 Risk Analysis:

cat("Portfolio A: Mean =", mean_A, "%, SD =", round(sd_A, 2), "%\n")

## Portfolio A: Mean = 10 %, SD = 1.58 %

cat("Portfolio B: Mean =", mean_B, "%, SD =", round(sd_B, 2), "%\n\n")

## Portfolio B: Mean = 10 %, SD = 7.91 %

cat("💡 Investment Advice:\n")

## 💡 Investment Advice:

cat("- Both portfolios have same expected return (", mean_A, "%)\n")

## - Both portfolios have same expected return ( 10 %)

cat("- Portfolio A is much less risky (lower standard deviation)\n")

## - Portfolio A is much less risky (lower standard deviation)

cat("- Portfolio B has higher volatility = higher risk\n")

## - Portfolio B has higher volatility = higher risk

Chapter 11: Practice Problems and Exercises

11.1 Exercise 1: Manual Variance Calculation

Your turn! Calculate variance manually for this small dataset:

Test scores: 85, 90, 78, 92, 88

cat("🎯 Exercise 1: Manual Variance Calculation\n")

## 🎯 Exercise 1: Manual Variance Calculation

cat("Test scores: 85, 90, 78, 92, 88\n")

## Test scores: 85, 90, 78, 92, 88

cat("Calculate sample variance step by step!\n\n")

## Calculate sample variance step by step!

scores <- c(85, 90, 78, 92, 88)

# Step 1: Calculate mean
cat("Step 1: Calculate the mean\n")

## Step 1: Calculate the mean

cat("Your turn! What is (85 + 90 + 78 + 92 + 88) ÷ 5 = ?\n\n")

## Your turn! What is (85 + 90 + 78 + 92 + 88) ÷ 5 = ?

# Provide solution
mean_scores <- mean(scores)
cat("✅ Solution: Mean =", mean_scores, "\n\n")

## ✅ Solution: Mean = 86.6

# Step 2: Calculate deviations
cat("Step 2: Calculate deviations from mean\n")

## Step 2: Calculate deviations from mean

deviations <- scores - mean_scores
for(i in 1:length(scores)) {
  cat("Score", i, ":", scores[i], "- ", mean_scores, "=", deviations[i], "\n")
}

## Score 1 : 85 -  86.6 = -1.6 
## Score 2 : 90 -  86.6 = 3.4 
## Score 3 : 78 -  86.6 = -8.6 
## Score 4 : 92 -  86.6 = 5.4 
## Score 5 : 88 -  86.6 = 1.4

# Step 3: Square deviations and calculate variance
cat("\nStep 3: Square deviations and calculate sample variance\n")

## 
## Step 3: Square deviations and calculate sample variance

squared_devs <- deviations^2
sample_var <- sum(squared_devs) / (length(scores) - 1)

cat("Sum of squared deviations:", sum(squared_devs), "\n")

## Sum of squared deviations: 119.2

cat("Sample variance: s² =", round(sample_var, 2), "\n")

## Sample variance: s² = 29.8

cat("Standard deviation: s =", round(sqrt(sample_var), 2), "\n")

## Standard deviation: s = 5.46

11.2 Exercise 2: Coefficient of Variation Challenge

cat("🎯 Exercise 2: CV Comparison Challenge\n")

## 🎯 Exercise 2: CV Comparison Challenge

cat("Which is more variable: car weights or car prices?\n\n")

## Which is more variable: car weights or car prices?

# Calculate CV for both
weight_cv <- (sd(cars$weight) / mean(cars$weight)) * 100
price_cv <- (sd(cars$price_num) / mean(cars$price_num)) * 100

cat("💪 Weight Statistics:\n")

## 💪 Weight Statistics:

cat("Mean:", round(mean(cars$weight), 0), "kg\n")

## Mean: 1609 kg

cat("SD:", round(sd(cars$weight), 0), "kg\n")

## SD: 489 kg

cat("CV:", round(weight_cv, 1), "%\n\n")

## CV: 30.4 %

cat("💰 Price Statistics:\n")

## 💰 Price Statistics:

cat("Mean: $", round(mean(cars$price_num), 0), "\n")

## Mean: $ 30620

cat("SD: $", round(sd(cars$price_num), 0), "\n")

## SD: $ 9531

cat("CV:", round(price_cv, 1), "%\n\n")

## CV: 31.1 %

cat("🏆 Winner: ", ifelse(weight_cv > price_cv, "Weight", "Price"), 
    " is more variable (higher CV)\n")

## 🏆 Winner:  Price  is more variable (higher CV)

11.3 Exercise 3: Correlation Interpretation

cat("🎯 Exercise 3: Correlation Detective\n")

## 🎯 Exercise 3: Correlation Detective

cat("Interpret these correlation coefficients:\n\n")

## Interpret these correlation coefficients:

# Different correlation examples
correlations <- c(0.85, -0.92, 0.12, -0.45, 0.0)
variables <- c("Study hours vs Exam scores",
              "Car age vs Market value", 
              "Height vs Happiness",
              "Temperature vs Heating costs",
              "Shoe size vs Intelligence")

for(i in 1:length(correlations)) {
  cat("📊", variables[i], ": r =", correlations[i], "\n")
  
  # Interpretation
  r <- abs(correlations[i])
  direction <- ifelse(correlations[i] > 0, "positive", 
                     ifelse(correlations[i] < 0, "negative", "no"))
  
  strength <- ifelse(r > 0.8, "very strong",
                    ifelse(r > 0.6, "strong",
                          ifelse(r > 0.4, "moderate",
                                ifelse(r > 0.2, "weak", "very weak"))))
  
  cat("   Interpretation:", strength, direction, "relationship\n\n")
}

## 📊 Study hours vs Exam scores : r = 0.85 
##    Interpretation: very strong positive relationship
## 
## 📊 Car age vs Market value : r = -0.92 
##    Interpretation: very strong negative relationship
## 
## 📊 Height vs Happiness : r = 0.12 
##    Interpretation: very weak positive relationship
## 
## 📊 Temperature vs Heating costs : r = -0.45 
##    Interpretation: moderate negative relationship
## 
## 📊 Shoe size vs Intelligence : r = 0 
##    Interpretation: very weak no relationship

Chapter 12: Summary and Key Takeaways

12.1 Complete Formula Reference

Here’s your complete reference for all formulas covered today:

Measures of Variability

Range: \(\text{Range} = x_{\max} - x_{\min}\)

IQR: \(\text{IQR} = Q_3 - Q_1\)

Population Variance (Raw Data): \(\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2\)

Sample Variance (Raw Data): \(s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{n}{n-1}\left(\frac{\sum_{i=1}^{n} x_i^2}{n} - \bar{x}^2\right)\)

Frequency Data Variance: \(\sigma^2 = \sum_{k=1}^{K}(x_k - \mu)^2 p_k = \sum_{k=1}^{K} x_k^2 p_k - \mu^2\)

Standard Deviation: \(s = \sqrt{s^2}\)

Coefficient of Variation: \(CV = \frac{s}{\bar{x}} \times 100\%\)

Bivariate Analysis

Correlation Coefficient: \(r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}\)

12.2 When to Use Each Measure

cat("📋 Quick Reference Guide:\n\n")

## 📋 Quick Reference Guide:

measures <- data.frame(
  Measure = c("Range", "IQR", "Variance", "Std Dev", "CV"),
  Best_Used_When = c(
    "Quick estimate, no outliers",
    "Data has outliers or skewed", 
    "Mathematical calculations",
    "Describing spread in original units",
    "Comparing different variables"
  ),
  Sensitive_to_Outliers = c("Yes", "No", "Yes", "Yes", "Depends"),
  stringsAsFactors = FALSE
)

print(measures)

##    Measure                      Best_Used_When Sensitive_to_Outliers
## 1    Range         Quick estimate, no outliers                   Yes
## 2      IQR         Data has outliers or skewed                    No
## 3 Variance           Mathematical calculations                   Yes
## 4  Std Dev Describing spread in original units                   Yes
## 5       CV       Comparing different variables               Depends

12.3 R Functions Summary

cat("🔧 Essential R Functions Summary:\n\n")

## 🔧 Essential R Functions Summary:

cat("📊 Basic Functions:\n")

## 📊 Basic Functions:

cat("range(x)          # Max - Min\n")

## range(x)          # Max - Min

cat("IQR(x)            # Q3 - Q1\n")

## IQR(x)            # Q3 - Q1

cat("var(x)            # Sample variance\n")

## var(x)            # Sample variance

cat("sd(x)             # Sample standard deviation\n")

## sd(x)             # Sample standard deviation

cat("cor(x, y)         # Correlation coefficient\n\n")

## cor(x, y)         # Correlation coefficient

cat("📈 UBStats Functions:\n")

## 📈 UBStats Functions:

cat("distr.summary.x(x, stats='dispersion')    # All variability measures\n")

## distr.summary.x(x, stats='dispersion')    # All variability measures

cat("distr.table.xy(x, y, freq.type='joint')   # Crosstabs\n")

## distr.table.xy(x, y, freq.type='joint')   # Crosstabs

cat("distr.plot.xy(x, y, plot.type='scatter')  # Scatterplots\n")

## distr.plot.xy(x, y, plot.type='scatter')  # Scatterplots

cat("distr.plot.x(x, plot.type='boxplot')      # Boxplots\n")

## distr.plot.x(x, plot.type='boxplot')      # Boxplots

12.4 Business Applications Recap

cat("💼 Key Business Applications:\n\n")

## 💼 Key Business Applications:

cat("🏭 Quality Control:\n")

## 🏭 Quality Control:

cat("- Use standard deviation to monitor production consistency\n")

## - Use standard deviation to monitor production consistency

cat("- Lower SD = better quality control\n")

## - Lower SD = better quality control

cat("- CV helps compare quality across different products\n\n")

## - CV helps compare quality across different products

cat("💰 Investment Analysis:\n")

## 💰 Investment Analysis:

cat("- Standard deviation measures investment risk\n")

## - Standard deviation measures investment risk

cat("- Higher SD = higher risk and volatility\n")

## - Higher SD = higher risk and volatility

cat("- Use correlation to diversify portfolios\n\n")

## - Use correlation to diversify portfolios

cat("📊 Market Research:\n")

## 📊 Market Research:

cat("- Use crosstabs to find customer segments\n")

## - Use crosstabs to find customer segments

cat("- Correlation identifies relationships between variables\n")

## - Correlation identifies relationships between variables

cat("- CV compares variability across different metrics\n")

## - CV compares variability across different metrics

Chapter 13: Next Steps and Further Practice

13.1 What’s Coming Next

In our next lecture, we’ll explore:

Probability Distributions - Normal, binomial, and other key distributions
Confidence Intervals - Estimating population parameters
Hypothesis Testing - Making statistical decisions
Regression Analysis - Predicting one variable from another

13.2 Practice Recommendations

cat("🎯 Recommended Practice:\n\n")

## 🎯 Recommended Practice:

cat("1. 📝 Manual Calculations:\n")

## 1. 📝 Manual Calculations:

cat("   - Practice variance calculations by hand\n")

##    - Practice variance calculations by hand

cat("   - Work through correlation examples\n")

##    - Work through correlation examples

cat("   - Create your own crosstabs\n\n")

##    - Create your own crosstabs

cat("2. 🔧 R Programming:\n")

## 2. 🔧 R Programming:

cat("   - Use the cars dataset for more analyses\n")

##    - Use the cars dataset for more analyses

cat("   - Try different variable combinations\n")

##    - Try different variable combinations

cat("   - Create your own visualizations\n\n")

##    - Create your own visualizations

cat("3. 💼 Real Applications:\n")

## 3. 💼 Real Applications:

cat("   - Find your own datasets online\n")

##    - Find your own datasets online

cat("   - Apply these concepts to your interests\n")

##    - Apply these concepts to your interests

cat("   - Think about business problems you could solve\n")

##    - Think about business problems you could solve

13.3 Final Motivation

cat("🌟 Congratulations! 🌟\n\n")

## 🌟 Congratulations! 🌟

cat("You've just mastered some of the most important concepts in statistics:\n")

## You've just mastered some of the most important concepts in statistics:

cat("✅ How to measure variability in data\n")

## ✅ How to measure variability in data

cat("✅ How to compare different types of variables\n")

## ✅ How to compare different types of variables

cat("✅ How to study relationships between variables\n")

## ✅ How to study relationships between variables

cat("✅ How to apply these concepts to real business problems\n\n")

## ✅ How to apply these concepts to real business problems

cat("🚀 You're now equipped with powerful tools for:\n")

## 🚀 You're now equipped with powerful tools for:

cat("- Making data-driven business decisions\n")

## - Making data-driven business decisions

cat("- Understanding risk and uncertainty\n")

## - Understanding risk and uncertainty

cat("- Finding patterns and relationships in data\n")

## - Finding patterns and relationships in data

cat("- Communicating statistical findings effectively\n\n")

## - Communicating statistical findings effectively

cat("Keep practicing, stay curious, and remember:\n")

## Keep practicing, stay curious, and remember:

cat("Statistics is not just about numbers - it's about understanding the world!\n")

## Statistics is not just about numbers - it's about understanding the world!

🎓 End of Lecture 3 - Great job making it through 120 minutes of intensive learning!

Remember to save your work and practice with the R code examples. See you next time for more statistical adventures!

# Session information for reproducibility
cat("📋 Session Information:\n")

## 📋 Session Information:

sessionInfo()

## R version 4.4.2 (2024-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: Europe/Budapest
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] UBStats_0.2.2
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.49        
##  [5] cachem_1.1.0      knitr_1.49        htmltools_0.5.8.1 rmarkdown_2.29   
##  [9] lifecycle_1.0.4   cli_3.6.3         sass_0.4.9        jquerylib_0.1.4  
## [13] compiler_4.4.2    rstudioapi_0.17.1 tools_4.4.2       evaluate_1.0.1   
## [17] bslib_0.8.0       yaml_2.3.10       rlang_1.1.4       jsonlite_1.8.9

Leksioni 3: Measures of Variability and Bivariate Analysis

Endri Raço

July 24, 2025