Chapter 1: Introduction to Statistical Data Analysis

Welcome to the fascinating world of statistical data analysis! In this comprehensive lecture, we will explore how to analyze and visualize data using R, focusing on understanding the characteristics of individual variables (univariate analysis).

Think of statistics as a powerful lens that helps us see patterns hidden in data. Just as a microscope reveals the invisible world of cells, statistical analysis reveals the invisible patterns in numbers that tell compelling stories about our world.

What You Will Learn Today

By the end of this lecture, you will master:

Data Type Recognition: How to identify and classify different types of variables
Visual Storytelling: Creating appropriate graphs that reveal data patterns
Distribution Analysis: Understanding the shape and characteristics of data distributions
Summary Statistics: Calculating and interpreting measures that describe data
Outlier Detection: Identifying unusual observations that require attention
Professional Reporting: Presenting statistical findings clearly and accurately

Chapter 2: Understanding Data Types - The Foundation of Analysis

Before we can analyze data, we must understand what type of data we’re working with. This is like a chef understanding ingredients before cooking - the technique depends entirely on what you’re working with.

The Data Type Hierarchy

Statistical Variables
├── Qualitative (Categorical)
│   ├── Nominal (No natural order)
│   │   └── Examples: Color, Brand, Country
│   └── Ordinal (Natural order exists)
│       └── Examples: Grade (A,B,C), Size (S,M,L)
└── Quantitative (Numerical)
    ├── Discrete (Countable)
    │   └── Examples: Number of children, Cars owned
    └── Continuous (Measurable)
        └── Examples: Height, Weight, Temperature
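
In R, these four types usually map onto different classes. The sketch below is only an illustration on invented toy values (they are not taken from the cars dataset, and the variable names are hypothetical):

```r
# Toy values invented for illustration; not taken from the cars dataset
color  <- factor(c("red", "blue", "red"))          # nominal: unordered factor
size   <- factor(c("S", "L", "M"),
                 levels = c("S", "M", "L"),
                 ordered = TRUE)                   # ordinal: ordered factor
doors  <- c(3L, 5L, 5L)                            # discrete: integer counts
height <- c(171.5, 180.2, 165.0)                   # continuous: numeric

is.ordered(color)   # no natural order among colors
is.ordered(size)    # S < M < L

# Order comparisons are only meaningful for ordered factors
smaller_than_L <- size < "L"
print(smaller_than_L)
```

Declaring the order explicitly matters later: tables and plots list the categories in the order of the factor levels.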

Setting Up Our Analytical Environment

Let’s begin our journey by preparing our tools:

```{r setup}
# Load the UBStats package - our Swiss Army knife for statistical analysis
library(UBStats)

# Load our dataset - a comprehensive collection of car information
load("stat_datasets_cl17.Rdata")

# First glimpse at our data
str(cars)
```

```{r data_exploration}
# Get a feel for our dataset
head(cars, 10)  # First 10 rows
dim(cars)       # Dimensions: rows and columns
names(cars)     # Variable names
```

Understanding Our Car Dataset

Imagine you’re a business analyst for a major European car manufacturer. Your dataset contains information about 190 different car models, and each variable tells a different part of the story:

Our Variables Explained:

model: Car model name (Qualitative Nominal)
sales: Annual sales in units (Quantitative Continuous)
bestselling: Best-selling status (1=Yes, 0=No) (Qualitative Nominal)
price_num: Price in dollars (Quantitative Continuous)
price_classes: Price category (low/mid/high) (Qualitative Ordinal)
maxspeed: Maximum speed in km/h (Quantitative Continuous)
acceleration: Time to reach 100 km/h (Quantitative Continuous)
urban_fuelcons: Urban fuel consumption (Quantitative Continuous)
fueltank: Fuel tank capacity in liters (Quantitative Continuous)
weight: Weight in kilograms (Quantitative Continuous)
n_doors_min: Number of doors (Quantitative Discrete)
country: Manufacturing country (Qualitative Nominal)

Chapter 3: Analyzing Qualitative Data - Finding Patterns in Categories

3.1 Nominal Variables: The Country Analysis Story

Let’s start our analysis by exploring where these cars are manufactured. This is our first detective work - uncovering geographical patterns in car production.

```{r country_analysis}
# Create a frequency distribution table
country_freq <- distr.table.x(cars$country)
print(country_freq)
```

💡 What This Table Tells Us:

- Count: How many car models each country produces
- Prop: The proportion (decimal form) of total production
- Each row: Represents a manufacturing region’s contribution

```{r country_detailed}
# Add percentage for easier interpretation
country_detailed <- distr.table.x(cars$country, freq=c("counts","prop", "perc"))
print(country_detailed)
```
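
To see what sits behind the three columns, here is a minimal base-R sketch of the same idea. It uses a small invented vector (not the cars data) and the standard `table()` and `prop.table()` functions rather than UBStats:

```r
# Hypothetical toy vector, invented for illustration only
country <- c("Germany", "Italy", "Germany", "France", "Germany")

counts <- table(country)          # absolute frequencies
props  <- prop.table(counts)      # proportions: counts / total
percs  <- round(100 * props, 1)   # percentages

print(cbind(Count = counts, Prop = props, Perc = percs))
```

Proportions always sum to 1 and percentages to 100 (up to rounding), which is a quick sanity check for any frequency table.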

Visual Storytelling: Pie Charts vs Bar Charts

```{r country_pie}
# Pie chart - showing the "whole pie" of car production
distr.plot.x(cars$country, plot.type = "pie")
```

When to Use Pie Charts:

- When showing parts of a whole
- When proportions are the main message
- Maximum 5-7 categories for clarity

```{r country_bars}
# Bar chart - better for comparing specific values
distr.plot.x(cars$country, plot.type = "bars")
```

When to Use Bar Charts:

- When comparing frequencies between categories
- When you have many categories
- When exact values matter more than proportions

🔍 Business Insight Discovery

```{r european_analysis}
# Calculate percentage of European car production
# European countries: Europe - others, France, Germany, Italy
european_countries <- c("Europe - others", "France", "Germany", "Italy")

# From our table: 14% + 15% + 26% + 11% = 66%
european_percentage <- 14 + 15 + 26 + 11
cat("🇪🇺 European Car Models:", european_percentage, "%\n")
cat("🥇 Top Manufacturing Country: Germany (26%)\n")
cat("🌏 Non-European Production:", 100 - european_percentage, "%\n")
```


## 3.2 Ordinal Variables: The Price Class Hierarchy

Now let's examine price categories - here, order matters! Low < Mid < High.

```{r price_classes_setup}
# Convert to factor with correct logical order
cars$price_classes <- factor(cars$price_classes, 
                            levels=c("low","mid","high"))

# Verify the ordering worked
levels(cars$price_classes)
```

```{r price_classes_analysis}
# Frequency distribution
price_class_freq <- distr.table.x(cars$price_classes)
print(price_class_freq)

# Visual representation
distr.plot.x(cars$price_classes, plot.type = "bars")
```


### 📊 Key Business Question: Market Accessibility

```{r price_accessibility}
# What percentage of cars are priced at or below 'mid' class?
# This tells us about market accessibility
affordable_percentage <- 27 + 55  # low + mid
cat("💰 Affordable Cars (Low + Mid price):", affordable_percentage, "%\n")
cat("💎 Luxury Cars (High price):", 100 - affordable_percentage, "%\n")
```

Business Interpretation:

- 82% of car models are priced in low or mid categories
- Only 18% are in the luxury segment
- This suggests a market focus on accessibility rather than exclusivity
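
"At or below mid" is a cumulative frequency, which only makes sense because the categories are ordered. The following sketch shows one way to compute it instead of typing the percentages by hand. It uses an invented toy vector (not the cars data) and base R only:

```r
# Hypothetical toy data, invented for illustration only
price_class <- factor(
  c("low", "mid", "mid", "high", "mid", "low", "mid", "high", "mid", "low"),
  levels = c("low", "mid", "high")   # the level order drives the cumulation
)

props     <- prop.table(table(price_class))  # share of each class
cum_props <- cumsum(props)                   # running total along the order

at_or_below_mid <- cum_props[["mid"]]        # share priced low or mid
cat("Share at or below 'mid':", round(100 * at_or_below_mid, 1), "%\n")
```

If the levels were left in alphabetical order (high, low, mid), the running total would no longer answer the question, which is why the factor conversion comes first.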

Chapter 4: Analyzing Discrete Quantitative Data - Counting What Matters

Discrete variables are like counting objects - you can have 2 doors or 3 doors, but not 2.5 doors.

4.1 The Door Configuration Analysis

```{r doors_analysis}
# Frequency distribution for number of doors
doors_freq <- distr.table.x(cars$n_doors_min)
print(doors_freq)
```

Visualizing Discrete Data: The Spike Plot

```{r doors_spike}
# Spike plot - perfect for discrete data
distr.plot.x(cars$n_doors_min, plot.type="spike", freq="prop")
```

Why Spike Plots for Discrete Data?

- Each spike represents an exact value
- Height shows frequency/proportion
- No artificial binning needed
- Clear visual separation between values

Cumulative Analysis: The Running Total Story

```{r doors_cumulative}
# Cumulative frequency plot
distr.plot.x(cars$n_doors_min, plot.type="cumulative", freq="prop")
```

Reading Cumulative Plots:

- X-axis: Number of doors
- Y-axis: Cumulative proportion (running total)
- Shows “what percentage have X doors or fewer”

🚗 Practical Business Question

```{r doors_business_question}
# How many car models have more than 4 doors?
# From our frequency table: 5-door (134) + 7-door (1) = 135
cars_more_than_4_doors <- 134 + 1
cat("🚪 Car models with more than 4 doors:", cars_more_than_4_doors, "models\n")
cat("📊 This represents:", round(135/190*100, 1), "% of all models\n")
```
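
The same question can be answered directly from the raw values, without copying counts out of the table. A minimal sketch, assuming a small invented vector of door counts (not the cars data):

```r
# Hypothetical toy vector, invented for illustration only
doors <- c(3, 5, 5, 4, 5, 2, 5, 7, 4, 5)

n_more_than_4     <- sum(doors > 4)    # TRUE counts as 1, so this counts models
share_more_than_4 <- mean(doors > 4)   # the mean of TRUE/FALSE is a proportion

cat("Models with more than 4 doors:", n_more_than_4, "\n")
cat("Share of all models:", round(100 * share_more_than_4, 1), "%\n")
```

With the real data, the same two lines applied to `cars$n_doors_min` should reproduce the figures read from the frequency table.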

Chapter 5: Analyzing Continuous Data - The Art of Histograms

Continuous data requires binning - grouping similar values together to reveal patterns. This is like organizing a library: individual books (data points) are grouped into sections (bins) to see the overall collection structure.

5.1 Fuel Tank Capacity: A Distribution Story

```{r fuel_tank_histogram}
# Create histogram with 5 equal-width classes
distr.plot.x(cars$fueltank, plot.type = "hist", breaks = 5)
```

```{r fuel_tank_table}
# Corresponding frequency table
fuel_table <- distr.table.x(cars$fueltank, breaks = 5)
print(fuel_table)
```

🔍 Understanding Histogram Components

Interval Notation Explained:

- [24.9,40): Includes 24.9, excludes 40
- [40,55): Includes 40, excludes 55
- [85,100]: Includes both 85 and 100 (last interval)

Distribution Characteristics:

- Peak: The largest share of cars (43%) have fuel tanks between 55 and 70 liters
- Shape: Slightly right-skewed (tail extends right)
- Range: From ~25 liters to 100 liters
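
The interval notation can be reproduced with base R's `cut()`. This is a sketch on invented values chosen to sit exactly on the bin edges; the break points are the ones quoted above, and UBStats may construct its classes differently internally:

```r
# Hypothetical capacities in liters, chosen to sit on the bin boundaries
tank   <- c(25, 40, 54.9, 55, 70, 85, 100)
breaks <- c(24.9, 40, 55, 70, 85, 100)

# right = FALSE        -> intervals are closed on the left: [a, b)
# include.lowest = TRUE -> the last interval also includes its upper end
bins <- cut(tank, breaks = breaks, right = FALSE, include.lowest = TRUE)

print(data.frame(tank, bin = bins))
print(table(bins))
```

Note where the boundary values land: 40 falls in [40,55), not in [24.9,40), and 100 is kept only because the last interval is closed on both sides.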

📈 Practical Business Calculation

```{r fuel_tank_business}
# What percentage of cars have fuel tanks between 40 and 85 liters?
# [40,55) + [55,70) + [70,85) = 32% + 43% + 13% = 88%
fuel_40_85_percent <- 32 + 43 + 13
cat("⛽ Cars with fuel tank 40-85 liters:", fuel_40_85_percent, "%\n")
cat("💡 This covers the vast majority of the market!\n")
```

5.2 The Sales Distribution Challenge

Sometimes equal-width bins don’t tell the full story. Let’s see why:

```{r sales_equal_bins}
# Sales distribution with 8 equal-width classes
distr.plot.x(cars$sales, plot.type = "hist", breaks=8)

# The corresponding table reveals the problem
distr.table.x(cars$sales, breaks=8)
```


**❗ The Problem with Equal-Width Bins:**
- 90% of cars fall in the first bin [44.2, 19400)
- Remaining bins have very few observations
- We lose detail about the majority of data
- The visualization is not informative

### 💡 The Solution: Custom Bin Widths

```{r sales_custom_bins}
# Use custom breaks that make business sense
custom_breaks <- c(0, 2000, 5000, 20000, 160000)
distr.plot.x(cars$sales, plot.type = "hist", breaks = custom_breaks)

# Much more informative frequency table
sales_custom <- distr.table.x(cars$sales, breaks = custom_breaks, freq=c("count","prop","dens"))
print(sales_custom)
```

🎯 Business Insights from Custom Bins:

- Low volume [0-2000): 28% of models
- Medium volume [2000-5000): 30% of models
- High volume [5000-20000): 32% of models
- Very high volume [20000-160000]: 9% of models
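
With unequal bin widths, bar heights must show density rather than raw proportion, otherwise wide bins look more important than they are. The sketch below assumes the usual definition (density = proportion ÷ bin width) and uses the rounded proportions listed above; whether the "dens" column of UBStats is scaled exactly this way is an assumption to check against its output:

```r
# Break points and rounded proportions as quoted in the text above
custom_breaks <- c(0, 2000, 5000, 20000, 160000)
props         <- c(0.28, 0.30, 0.32, 0.09)

widths <- diff(custom_breaks)   # 2000, 3000, 15000, 140000
dens   <- props / widths        # assumed definition: proportion per unit of sales

print(data.frame(
  lower = custom_breaks[-length(custom_breaks)],
  upper = custom_breaks[-1],
  width = widths,
  prop  = props,
  dens  = signif(dens, 3)
))

# Area of each bar (height x width) recovers the proportion
stopifnot(isTRUE(all.equal(dens * widths, props)))
```

The [5000-20000) class holds the largest share of models, yet the densest class is [0-2000): models are packed most tightly at low sales volumes.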

📊 Advanced Question: Interpolation

```{r sales_interpolation}
# Approximate percentage of cars with sales between 1000 and 3000 units
# This requires interpolation within bins

# [0,2000) contains 54 cars (28%)
# We need roughly from 1000 to 2000 (half the bin) = 14%

# [2000,5000) contains 57 cars (30%)
# We need roughly from 2000 to 3000 (1/3 of bin) = 10%

# Total approximation: 14% + 10% = 24%
cat("📈 Estimated cars with sales 1000-3000 units: ~24%\n")
cat("🔍 This is an approximation using linear interpolation\n")
```
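
The hand calculation can be wrapped in a small helper. This is a sketch, not part of UBStats: `share_between()` is a hypothetical function name, and it assumes observations are spread evenly inside each bin (the same assumption the manual interpolation makes). The proportions are the rounded ones from the custom-bin table:

```r
# Hypothetical helper: estimated share of observations between lo and hi,
# assuming values are uniformly spread within each bin
share_between <- function(lo, hi, breaks, props) {
  lower <- breaks[-length(breaks)]
  upper <- breaks[-1]
  # Length of the overlap between [lo, hi] and each bin (0 if they do not meet)
  overlap <- pmax(0, pmin(hi, upper) - pmax(lo, lower))
  # Each bin contributes its proportion times the fraction of the bin covered
  sum(props * overlap / (upper - lower))
}

custom_breaks <- c(0, 2000, 5000, 20000, 160000)
props         <- c(0.28, 0.30, 0.32, 0.09)   # rounded values from the table

est <- share_between(1000, 3000, custom_breaks, props)
cat("Estimated share with sales 1000-3000:", round(100 * est, 1), "%\n")
```

The result matches the manual 14% + 10% because the helper performs the same two partial-bin multiplications.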


---

# Chapter 6: Understanding Distribution Shapes - The Three Personalities

Every dataset has a personality revealed through its shape. Learning to read these personalities is crucial for proper analysis.

## 6.1 The Three Distribution Personalities

### 🎯 Symmetric Distribution: The Balanced Personality

  /\
 /  \
/    \

- Data balanced around center
- Mean ≈ Median ≈ Mode
- Bell-shaped appearance
- Most observations near center

### ➡️ Right-Skewed (Positively Skewed): The Long Right Tail

 /\
/  \___
       \___

- Tail extends to the right
- Mean > Median > Mode
- Most values concentrated on left
- Few extreme high values

### ⬅️ Left-Skewed (Negatively Skewed): The Long Left Tail

        /\
    ___/  \
___/

- Tail extends to the left  
- Mode > Median > Mean
- Most values concentrated on right
- Few extreme low values
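
The mean-versus-median comparison is easy to try on small samples. The three vectors below are invented purely to illustrate the pattern; with real data the ordering is a useful rule of thumb rather than a guarantee:

```r
# Hypothetical toy samples, invented to illustrate each shape
symmetric    <- c(10, 12, 14, 16, 18)
right_skewed <- c(10, 12, 12, 13, 15, 60)   # one extreme high value
left_skewed  <- c(1, 50, 52, 53, 55)        # one extreme low value

compare <- function(x) c(mean = mean(x), median = median(x))

print(rbind(
  symmetric    = compare(symmetric),
  right_skewed = compare(right_skewed),
  left_skewed  = compare(left_skewed)
))
```

In each skewed sample a single extreme value drags the mean toward the tail while the median barely moves, which is why the median is the more robust "typical value".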

## 6.2 Real Examples from Our Data

```{r distribution_examples}
# Example 1: Fuel tank (approximately symmetric)
distr.plot.x(cars$fueltank, plot.type = "hist", breaks = 6)

# Example 2: Sales (heavily right-skewed)
distr.plot.x(cars$sales, plot.type = "hist", breaks = custom_breaks)

# Example 3: Acceleration (slightly right-skewed)
distr.plot.x(cars$acceleration, plot.type = "hist", breaks = 6)
```

Chapter 7: Cumulative Distributions - The Running Total Story

Cumulative distributions answer “what percentage of observations are at or below a certain value?”

7.1 The Price Ogive Analysis

```{r price_ogive}
# Create ogive (cumulative frequency curve) for price
distr.plot.x(cars$price_num, plot.type="cumulative", breaks = 10, freq = "prop")
```

🎯 Reading the Ogive: A Critical Business Question

Question: “Is the minimum price of the top 20% most expensive car models greater than $40,000?”

How to Read the Ogive:

1. Top 20% most expensive = 80th percentile
2. Find 0.8 on Y-axis (80% cumulative)
3. Draw horizontal line to curve
4. Drop vertical line to X-axis
5. Read the price value
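
The graphical reading has a numeric counterpart. The sketch below uses ten invented prices (not the cars data) and base R's `ecdf()`, which works on the raw values rather than on the grouped classes behind the ogive, so it illustrates the logic rather than reproducing the plot:

```r
# Hypothetical prices in dollars, invented for illustration only
price <- c(12, 15, 18, 20, 22, 25, 30, 38, 45, 80) * 1000

F_price <- ecdf(price)   # F_price(x) = share of models priced at or below x

# Steps 2-5 in numbers: smallest price whose cumulative share reaches 0.8
# (a tiny tolerance guards against floating-point rounding)
p80 <- min(price[F_price(price) >= 0.8 - 1e-9])

cat("80th percentile price: $", p80, "\n")
cat("Share of models above it:", mean(price > p80), "\n")
```

Base R's `quantile(price, 0.8)` would interpolate between neighbouring values and can return a slightly different number; both are accepted conventions.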

```{r ogive_interpretation}
cat("🔍 Ogive Reading Exercise:\n")
cat("📊 At 80% cumulative frequency (80th percentile):\n")
cat("💰 Price is approximately $40,000\n")
cat("✅ Therefore, the statement is approximately CORRECT\n")
cat("📈 The minimum price for top 20% expensive cars ≈ $40,000\n")
```

Chapter 8: Descriptive Statistics - Summarizing Data with Numbers

Numbers tell stories too. Let’s learn to calculate and interpret the key statistics that describe our data.

8.1 Central Tendency: Finding the “Typical” Value

🎯 Mean, Median, and Mode for Price

```{r central_tendency}
# Calculate central tendency measures for price
price_central <- distr.summary.x(cars$price_num, stats="central")
print(price_central)
```

Interpreting the Results:

- Mean: $24,837 (arithmetic average - sum ÷ count)
- Median: $19,714 (middle value when sorted)
- Mode: $16,951 (most frequent value, appears 14 times)
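
For comparison, here is how the three measures can be computed in base R. The values are invented (not the cars prices); note that base R has no built-in function for the statistical mode, so the sketch derives it from a frequency table:

```r
# Hypothetical prices in thousands of dollars, invented for illustration only
x <- c(17, 20, 17, 25, 60, 17, 20)

mean_x   <- mean(x)     # sum divided by count
median_x <- median(x)   # middle value of the sorted data

# Mode: the value with the highest frequency
# (if several values tie, which.max() keeps only the first one)
freq   <- table(x)
mode_x <- as.numeric(names(freq)[which.max(freq)])

cat("Mean:", round(mean_x, 2), " Median:", median_x, " Mode:", mode_x, "\n")
```

Here the single value of 60 lifts the mean above the median and the mode, the same ordering seen in the price results above.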

```{r skewness_analysis}
# Analyze the relationship between mean and median
mean_price <- 24837.48
median_price <- 19713.5

cat("📊 Distribution Shape Analysis:\n")
cat("💰 Mean Price: $", round(mean_price, 2), "\n")
cat("🎯 Median Price: $", round(median_price, 2), "\n")
cat("📈 Mean - Median = $", round(mean_price - median_price, 2), "\n")
cat("🔍 Since Mean > Median: RIGHT-SKEWED distribution\n")
cat("💡 A few very expensive cars pull the average up!\n")
```


## 8.2 Percentiles and Quartiles: Dividing the Data

Quartiles divide data into four equal parts, like cutting a pizza into quarters.

```{r quartiles_analysis}
# Get quartiles for price analysis
price_quartiles <- distr.summary.x(cars$price_num, stats="quartiles")
print(price_quartiles)
```

🏆 Business Intelligence from Quartiles

```{r quartile_interpretation}
cat("🎯 Price Market Segmentation:\n")
cat("💎 Luxury Segment (Top 25%): Above $", round(price_quartiles$p75, 0), "\n")
cat("🔶 Premium Segment (50-75%): $", round(price_quartiles$p50, 0), " - $", round(price_quartiles$p75, 0), "\n")
cat("🔸 Mid-market (25-50%): $", round(price_quartiles$p25, 0), " - $", round(price_quartiles$p50, 0), "\n")
cat("💚 Budget Segment (Bottom 25%): Below $", round(price_quartiles$p25, 0), "\n")
```

🏁 Performance Analysis: The Need for Speed

```{r performance_percentiles}
# 90th percentiles of maximum speed and acceleration time
speed_p90 <- distr.summary.x(cars$maxspeed, stats="p90")
accel_p90 <- distr.summary.x(cars$acceleration, stats="p90")

cat("🏎️ 90TH PERCENTILE PERFORMANCE THRESHOLDS:\n")
cat("⚡ Minimum speed for top 10%:", speed_p90$p90, " km/h\n")
# Lower acceleration times are better, so the 90th percentile bounds the
# slowest 10% of models, not the quickest 10%
cat("🚀 Acceleration time met or beaten by 90% of models: ", accel_p90$p90, " seconds\n")
```


## 8.3 The Five-Number Summary: Complete Picture

```{r five_number_summary}
# Five-number summary for acceleration
accel_summary <- distr.summary.x(cars$acceleration, stats="fivenumber")
print(accel_summary)

cat("\n🏁 ACCELERATION PERFORMANCE BREAKDOWN:\n")
cat("🥇 Fastest car: ", accel_summary$min, " seconds (0-100 km/h)\n")
cat("📊 Q1 (25th percentile): ", accel_summary$q1, " seconds\n") 
cat("🎯 Median (50th percentile): ", accel_summary$median, " seconds\n")
cat("📊 Q3 (75th percentile): ", accel_summary$q3, " seconds\n")
cat("🐌 Slowest car: ", accel_summary$max, " seconds\n")
```

Chapter 9: Boxplots - The Swiss Army Knife of Data Visualization

Boxplots pack an incredible amount of information into a simple graphic. They’re like a data summary in visual form.

9.1 Anatomy of a Boxplot

```{r boxplot_maxspeed}
# Create boxplot for maximum speed
distr.plot.x(cars$maxspeed, plot.type = "boxplot")
```

📦 Boxplot Components Explained

    outlier  •
             |
    whisker  |---- Maximum within 1.5×IQR of Q3
             |
       Q3    ┌────┐
             │    │  ← IQR (Interquartile Range)
   Median    ├────┤  ← Dark line inside box
             │    │
       Q1    └────┘
             |
    whisker  |---- Minimum within 1.5×IQR of Q1
             |
    outlier  •

🔍 Outlier Detection

```{r outlier_analysis}
# Check for outliers in maximum speed
cat("🚨 OUTLIER ANALYSIS FOR MAXIMUM SPEED:\n")
cat("📊 Any points beyond the whiskers are outliers\n")
cat("⚡ These represent cars with unusually high speeds\n")
cat("🏎️ Could be supercars or sports cars\n")
cat("🔍 Outliers require special attention in analysis\n")
```
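
The whisker rule from the diagram can also be applied numerically to list the outliers instead of only spotting them on the plot. A minimal sketch on invented speeds (not the cars data), assuming quartiles from base R's `quantile()`; plotting functions may compute quartiles with a slightly different convention, so borderline points can differ:

```r
# Hypothetical maximum speeds in km/h, invented for illustration only
speed <- c(150, 160, 165, 170, 175, 180, 185, 190, 200, 320)

q   <- unname(quantile(speed, c(0.25, 0.75)))   # Q1 and Q3
iqr <- q[2] - q[1]                              # interquartile range

lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

# Observations beyond either fence are flagged as outliers
outliers <- speed[speed < lower_fence | speed > upper_fence]

cat("Fences:", lower_fence, "to", upper_fence, "\n")
cat("Outliers:", outliers, "\n")
```

Flagging is only the first step: each outlier should then be checked to decide whether it is a data error or a genuine extreme, such as a supercar.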

9.2 Reading Distribution Shape from Boxplots

```{r multiple_boxplots}
# Compare different variables
par(mfrow=c(2,2))  # 2x2 grid of plots
distr.plot.x(cars$acceleration, plot.type = "boxplot", main="Acceleration")
distr.plot.x(cars$price_num, plot.type = "boxplot", main="Price")
distr.plot.x(cars$maxspeed, plot.type = "boxplot", main="Max Speed")
distr.plot.x(cars$weight, plot.type = "boxplot", main="Weight")
par(mfrow=c(1,1))  # Reset to single plot
```

📊 Shape Recognition Guide

Symmetric Distribution:

- Median line centered in box
- Equal whisker lengths
- No skewness visible

Right-Skewed Distribution:

- Median closer to Q1 (left side of box)
- Right whisker longer than left
- Outliers on right side

Left-Skewed Distribution:

- Median closer to Q3 (right side of box)
- Left whisker longer than right
- Outliers on left side

Chapter 10: Case Study - Complete Market Analysis

Let’s put everything together in a comprehensive business analysis.

10.1 Executive Summary Generation

```{r executive_summary}
cat("🚗 AUTOMOTIVE MARKET ANALYSIS REPORT\n")
cat("=" , rep("=", 45), "\n", sep="")

# Basic dataset info
cat("📊 Dataset:", nrow(cars), " car models analyzed\n\n")

# Geographic distribution
cat("🌍 GEOGRAPHIC DISTRIBUTION:\n")
cat("🇩🇪 Germany leads with 26% market share\n")
cat("🇪🇺 European brands dominate: 66% of models\n")
cat("🌏 Asia (Japan + others): 25% of models\n")
cat("🇺🇸 US brands: 9% of models\n\n")

# Price analysis
price_stats <- distr.summary.x(cars$price_num, stats="central")
cat("💰 PRICE ANALYSIS:\n")
cat("📈 Average price: $", round(price_stats$mean, 0), "\n")
cat("🎯 Median price: $", round(price_stats$median, 0), "\n")
cat("📊 Distribution: Right-skewed (luxury cars drive average up)\n")
cat("💚 Budget-friendly focus: 82% priced low-to-mid range\n\n")

# Performance insights
cat("🏎️ PERFORMANCE INSIGHTS:\n")
cat("⚡ Top 10% speed threshold: 226 km/h\n")
# 15.02 s is the 90th percentile of acceleration time (lower is better)
cat("🚀 90% of models reach 100 km/h in 15.02 seconds or less\n")
cat("🚨 Speed outliers detected (supercars)\n")
```


## 10.2 Detailed Statistical Profile

```{r detailed_profile}
# Create comprehensive statistical summary
variables_to_analyze <- c("price_num", "maxspeed", "acceleration", "weight", "fueltank")

cat("\n📋 DETAILED STATISTICAL PROFILES:\n")
cat("=" , rep("=", 50), "\n", sep="")

for(var in variables_to_analyze) {
  cat("\n📊", toupper(gsub("_", " ", var)), ":\n")
  summary_stats <- distr.summary.x(cars[[var]], stats="fivenumber")
  central_stats <- distr.summary.x(cars[[var]], stats="central")
  
  cat("   Range: ", summary_stats$min, " to ", summary_stats$max, "\n")
  cat("   Mean: ", round(central_stats$mean, 2), "\n")
  cat("   Median: ", round(central_stats$median, 2), "\n")
  cat("   Q1-Q3: ", summary_stats$q1, " to ", summary_stats$q3, "\n")
  
  # Determine skewness
  if(central_stats$mean > central_stats$median) {
    cat("   Shape: Right-skewed\n")
  } else if(central_stats$mean < central_stats$median) {
    cat("   Shape: Left-skewed\n")
  } else {
    cat("   Shape: Approximately symmetric\n")
  }
}
```

Chapter 11: Advanced Exercises and Practice

11.1 Guided Practice Problems

🎯 Exercise 1: Distribution Detective

```{r exercise_1}
cat("🔍 EXERCISE 1: DISTRIBUTION DETECTIVE\n")
cat("Analyze the fuel consumption distribution:\n\n")

# Create histogram
distr.plot.x(cars$urban_fuelcons, plot.type = "hist", breaks = 6)

# Calculate statistics
fuel_stats <- distr.summary.x(cars$urban_fuelcons, stats="central")
cat("Mean fuel consumption: ", round(fuel_stats$mean, 2), " L/100km\n")
cat("Median fuel consumption: ", round(fuel_stats$median, 2), " L/100km\n")

# Student task: Determine the shape
if(fuel_stats$mean > fuel_stats$median) {
  cat("✅ ANSWER: Right-skewed distribution\n")
  cat("💡 Interpretation: Most cars are fuel-efficient, but some gas-guzzlers pull the average up\n")
}
```


### 🎯 Exercise 2: Percentile Mastery

```{r exercise_2}
cat("\n🎯 EXERCISE 2: PERCENTILE MASTERY\n")
cat("Find the weight thresholds for different car categories:\n\n")

weight_summary <- distr.summary.x(cars$weight, stats="quartiles")
print(weight_summary)

cat("\n🚗 CAR WEIGHT CATEGORIES:\n")
cat("🪶 Lightweight (bottom 25%): Under ", weight_summary$p25, " kg\n")
cat("⚖️ Standard weight (25-75%): ", weight_summary$p25, "-", weight_summary$p75, " kg\n") 
cat("🏋️ Heavyweight (top 25%): Over ", weight_summary$p75, " kg\n")
```

🎯 Exercise 3: Outlier Investigation

```{r exercise_3}
cat("🚨 EXERCISE 3: OUTLIER INVESTIGATION\n")

# Check for outliers in different variables
distr.plot.x(cars$acceleration, plot.type = "boxplot")

cat("Investigation questions:\n")
cat("1. Are there any acceleration outliers?\n")
cat("2. What might cause extremely slow acceleration?\n")
cat("3. How would outliers affect the mean vs median?\n")

accel_stats <- distr.summary.x(cars$acceleration, stats="fivenumber")
cat("\n📊 Acceleration Analysis:\n")
cat("🚀 Fastest: ", accel_stats$min, " seconds\n")
cat("🐌 Slowest: ", accel_stats$max, " seconds\n")
cat("📈 Range: ", accel_stats$max - accel_stats$min, " seconds\n")
```


## 11.2 Real-World Application Scenarios

### 🏢 Scenario 1: Product Development Strategy

```{r scenario_1}
cat("🏢 BUSINESS SCENARIO 1: PRODUCT DEVELOPMENT\n")
cat("=" , rep("=", 50), "\n", sep="")
cat("You're developing a new car model. Use data to inform decisions:\n\n")

# Market positioning analysis
doors_popular <- distr.table.x(cars$n_doors_min)
cat("🚪 DOOR CONFIGURATION STRATEGY:\n")
print(doors_popular)
cat("💡 Recommendation: Focus on 5-door models (71% market preference)\n\n")

# Price positioning
price_classes_dist <- distr.table.x(cars$price_classes)
cat("💰 PRICE POSITIONING STRATEGY:\n")
print(price_classes_dist)
cat("💡 Recommendation: Target mid-price segment (55% of market)\n")
```

🎯 Scenario 2: Competitive Analysis

```{r scenario_2}
cat("🎯 BUSINESS SCENARIO 2: COMPETITIVE ANALYSIS\n")
cat("=" , rep("=", 50), "\n", sep="")

# Performance benchmarking
speed_benchmark <- distr.summary.x(cars$maxspeed, stats="quartiles")
cat("⚡ SPEED BENCHMARKING:\n")
cat("🥉 Entry level (25th percentile): ", speed_benchmark$p25, " km/h\n")
cat("🥈 Competitive (50th percentile): ", speed_benchmark$p50, " km/h\n")
cat("🥇 Premium (75th percentile): ", speed_benchmark$p75, " km/h\n")
cat("💡 To be competitive, aim for at least ", speed_benchmark$p50, " km/h\n")
```

Chapter 12: Professional Reporting and Communication

12.1 Creating Executive Dashboards

```{r executive_dashboard}
cat("📊 EXECUTIVE DASHBOARD: KEY METRICS\n")
cat("=" , rep("=", 60), "\n", sep="")

# Key Performance Indicators (KPIs)
total_models <- nrow(cars)
german_models <- sum(cars$country == "Germany")
luxury_models <- sum(cars$price_classes == "high", na.rm = TRUE)
high_performance <- sum(cars$maxspeed > 200, na.rm = TRUE)

cat("🎯 MARKET OVERVIEW:\n")
cat("   Total Models Analyzed: ", total_models, "\n")
cat("   German Market Share: ", round(german_models/total_models*100, 1), "%\n")
cat("   Luxury Segment: ", round(luxury_models/total_models*100, 1), "%\n")
cat("   High-Performance Cars (>200 km/h): ", round(high_performance/total_models*100, 1), "%\n")

# Price insights
price_q <- distr.summary.x(cars$price_num, stats="quartiles")
cat("💰 PRICE INTELLIGENCE:\n")
cat("   Market Entry Price: $", round(price_q$min, 0), "\n")
cat("   Budget Threshold (25%): $", round(price_q$p25, 0), "\n")
```

Lecture 1: Univariate Data Visualization and Descriptive Statistics

Endri Raço