1. Introduction

1.1 Dataset Selection

Objective: Introduce the “mtcars” dataset and explain the reason for its selection.

The “mtcars” dataset is a well-known dataset built into R. It was extracted from the 1974 Motor Trend US magazine and covers 32 cars (1973-74 models). For each car it records fuel consumption in miles per gallon (mpg) along with ten aspects of design and performance, such as number of cylinders (cyl), displacement (disp), and horsepower (hp).

I chose the “mtcars” dataset because it is widely used in introductory data analysis and statistical modeling. It is well suited to examining how car features relate to fuel efficiency and performance, and its small size and real-world subject make it a good fit for practicing data manipulation, visualization, and modeling in R.

# Load the mtcars dataset
data(mtcars)

# Display the first few rows of the dataset
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

2. Description of the Dataset

2.1 Variables

Objective: Describe the variables present in the “mtcars” dataset.

# Display the column (variable) names
names(mtcars)
##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"

Interpretation of these results:

The “mpg” column represents fuel efficiency in miles per (US) gallon (continuous variable).

The “cyl” column represents the number of cylinders (categorical variable).

The “disp” column represents engine displacement in cubic inches (continuous variable).

The “hp” column represents gross horsepower (continuous variable).

The “drat” column represents rear axle ratio (continuous variable).

The “wt” column represents weight in units of 1000 lbs (continuous variable).

The “qsec” column represents quarter mile time in seconds (continuous variable).

The “vs” column represents engine shape (0 = V-shaped, 1 = straight; categorical variable).

The “am” column represents transmission (0 = automatic, 1 = manual; categorical variable).

The “gear” column represents number of forward gears (categorical variable).

The “carb” column represents number of carburetors (categorical variable).
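
The variables described as categorical above are stored as plain numbers in “mtcars”. The sketch below shows one possible way to recode them as factors on a copy of the data. The object name `cars` and the label text are illustrative assumptions, not part of the original analysis.

```r
# Work on a copy so the built-in data frame stays untouched
# (`cars` is a hypothetical name chosen for this sketch)
cars <- mtcars

# Recode the grouping variables as factors; labels follow the coding described above
cars$cyl  <- factor(cars$cyl)
cars$vs   <- factor(cars$vs, levels = c(0, 1), labels = c("V-shaped", "Straight"))
cars$am   <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cars$gear <- factor(cars$gear)
cars$carb <- factor(cars$carb)

# Inspect the resulting column types
str(cars)
```

With factors in place, functions such as summary() report counts per level instead of numeric quartiles for these columns.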


2.2 Observations

Objective: Discuss the number of observations in the “mtcars” dataset.

# Check the number of rows/observations in the dataset
nrow(mtcars)
## [1] 32

Interpretation: Each row in the “mtcars” dataset represents a different car model. There are a total of 32 rows/observations in the dataset.


2.3 Missing Values

Objective: Check whether there are any missing values in the dataset.

# Check for missing values
any(is.na(mtcars))
## [1] FALSE

Interpretation: There are no missing values in the “mtcars” dataset.
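
A single TRUE/FALSE answer does not show where missing values would sit if there were any. As a minimal sketch, the same check can be broken down by column and by row; `na_per_column` is a name introduced here for illustration.

```r
# Count missing values column by column
na_per_column <- colSums(is.na(mtcars))
na_per_column

# Count the rows that are complete across all variables
sum(complete.cases(mtcars))
```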


3. Exploratory Data Analysis

3.1 Summary Statistics for Numeric Variables

summary(mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")])
##       mpg             disp             hp             drat      
##  Min.   :10.40   Min.   : 71.1   Min.   : 52.0   Min.   :2.760  
##  1st Qu.:15.43   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:3.080  
##  Median :19.20   Median :196.3   Median :123.0   Median :3.695  
##  Mean   :20.09   Mean   :230.7   Mean   :146.7   Mean   :3.597  
##  3rd Qu.:22.80   3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.920  
##  Max.   :33.90   Max.   :472.0   Max.   :335.0   Max.   :4.930  
##        wt             qsec      
##  Min.   :1.513   Min.   :14.50  
##  1st Qu.:2.581   1st Qu.:16.89  
##  Median :3.325   Median :17.71  
##  Mean   :3.217   Mean   :17.85  
##  3rd Qu.:3.610   3rd Qu.:18.90  
##  Max.   :5.424   Max.   :22.90

Interpretation: The summary statistics describe the central tendency and spread of the numeric variables through the minimum, quartiles, median, mean, and maximum. For example, mpg ranges from 10.40 to 33.90 with a median of 19.20 and a mean of 20.09, while hp ranges from 52 to 335 with a median of 123.
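
summary() reports quartiles but no direct measure of spread. A possible complement, sketched here with the illustrative names `numeric_vars` and `spread`, computes the standard deviation, interquartile range, and full range for the same six columns.

```r
# Same numeric columns as in the summary() call above
numeric_vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

# One column per variable, one row per spread measure
spread <- sapply(mtcars[, numeric_vars], function(x) {
  c(sd = sd(x), iqr = IQR(x), range = diff(range(x)))
})
round(spread, 2)
```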


3.2 Frequency Tables for Categorical Variables

table(mtcars$cyl)
## 
##  4  6  8 
## 11  7 14
table(mtcars$vs)
## 
##  0  1 
## 18 14
table(mtcars$am)
## 
##  0  1 
## 19 13
table(mtcars$gear)
## 
##  3  4  5 
## 15 12  5
table(mtcars$carb)
## 
##  1  2  3  4  6  8 
##  7 10  3 10  1  1

Interpretation: The frequency tables display the count of observations in each category of the categorical variables. For example, 8-cylinder cars are the largest cylinder group (14 of 32), and automatic cars (19) outnumber manual cars (13). We run these descriptive analytics to better understand the characteristics of the “mtcars” dataset and the distribution of its variables.
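
Raw counts can be easier to compare as percentages, and two categorical variables can be counted jointly. The following is a small sketch of both ideas; `cyl_share` and `cyl_by_am` are names made up for this example.

```r
# Share of cars in each cylinder group, as percentages
cyl_share <- round(100 * prop.table(table(mtcars$cyl)), 1)
cyl_share

# Cross-tabulate cylinders against transmission type (0 = automatic, 1 = manual)
cyl_by_am <- table(cyl = mtcars$cyl, am = mtcars$am)
cyl_by_am
```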


4. Questions

4.1 What is the distribution of miles per gallon (mpg) across different car models?

Objective: To analyze the variation in fuel efficiency (mpg) among different car models, identifying common trends and the range of mpg values across the dataset.

hist(mtcars$mpg, main = "Distribution of Miles Per Gallon", xlab = "Miles Per Gallon", ylab = "Frequency")

Interpretation: The histogram of miles per gallon (mpg) shows a roughly unimodal, mildly right-skewed distribution rather than a symmetric bell shape, with a peak in the 15 to 20 mpg range. Many car models in the dataset cluster in this range, and the mean (20.09) sitting above the median (19.20) is consistent with the right skew. There is notable variability, from 10.4 mpg at the low end to a small group of cars above 30 mpg at the high end.
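
To back the reading of the histogram with numbers, the bin counts can be extracted without drawing the plot. This is a minimal sketch using the default binning of hist(); `mpg_hist` is an illustrative name.

```r
# Compute the histogram bins without drawing the plot
mpg_hist <- hist(mtcars$mpg, plot = FALSE)

# One row per bin: lower edge, upper edge, number of cars
data.frame(
  lower = head(mpg_hist$breaks, -1),
  upper = tail(mpg_hist$breaks, -1),
  count = mpg_hist$counts
)

# A mean above the median is a rough sign of right skew
mean(mtcars$mpg) - median(mtcars$mpg)
```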


4.2 How does the number of cylinders (cyl) affect the miles per gallon (mpg) of cars?

Objective: To investigate the relationship between the number of cylinders (cyl) and the miles per gallon (mpg) of cars, aiming to understand how changes in the number of cylinders impact fuel efficiency.

boxplot(mpg ~ cyl, data = mtcars, main = "Miles Per Gallon by Number of Cylinders", xlab = "Number of Cylinders", ylab = "Miles Per Gallon")

Interpretation: The boxplot shows that mpg tends to decrease as the number of cylinders increases from 4 to 6 to 8, so cars with more cylinders tend to have lower fuel efficiency. The number of cylinders is therefore strongly associated with fuel economy in this dataset. Because the data are observational and cars with more cylinders also tend to be heavier, the plot alone does not establish that cylinder count is the cause.
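
The trend read from the boxplot can be checked numerically. The sketch below computes the median mpg per cylinder group and a rank correlation; `mpg_median_by_cyl` is a name introduced for this example, and Spearman correlation is one reasonable choice here, not the only one.

```r
# Median mpg within each cylinder group (the center line of each box)
mpg_median_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, median)
mpg_median_by_cyl

# Rank correlation between cylinder count and mpg; negative values mean
# that more cylinders go together with lower mpg
cor(mtcars$cyl, mtcars$mpg, method = "spearman")
```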


4.3 How does the transmission type (am) relate to the miles per gallon (mpg) and horsepower (hp) of cars?

Objective: To explore the relationship between transmission type (am) and both miles per gallon (mpg) and horsepower (hp) of cars, aiming to understand how different transmission types affect fuel efficiency and engine power.

plot(mtcars$mpg, mtcars$hp,
     col = ifelse(mtcars$am == 0, "red", "blue"),
     pch = ifelse(mtcars$am == 0, 16, 17),
     main = "Relationship between MPG, HP, and Transmission Type",
     xlab = "Miles Per Gallon (mpg)", ylab = "Horsepower (hp)",
     xlim = c(0, 40), ylim = c(0, 400))
legend("topright", legend = c("Automatic", "Manual"),
       col = c("red", "blue"), pch = c(16, 17), title = "Transmission Type")

Interpretation: The scatter plot visualizes the relationship between miles per gallon (mpg), horsepower (hp), and transmission type (am) of cars. Each point represents a car model, with red points indicating automatic transmissions and blue points indicating manual transmissions. The plot shows a clear negative relationship between mpg and hp: cars with higher horsepower tend to have lower mpg. Manual cars (blue points) are concentrated toward the higher-mpg, lower-hp region, while automatic cars (red points) sit mostly at lower mpg and higher hp, although a few manual cars are among the most powerful in the dataset. This suggests that manual transmissions are associated with better fuel efficiency in this dataset, but not with higher horsepower on average. Because the manual cars here also tend to be lighter, the difference cannot be attributed to transmission type alone.
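
The visual impression from the scatter plot can be checked against simple summaries. As a minimal sketch, the code below computes the correlation between mpg and hp and the group means by transmission type; `means_by_am` is an illustrative name.

```r
# Pearson correlation between fuel efficiency and horsepower
cor(mtcars$mpg, mtcars$hp)

# Mean mpg and hp for each transmission type (0 = automatic, 1 = manual)
means_by_am <- aggregate(cbind(mpg, hp) ~ am, data = mtcars, FUN = mean)
means_by_am
```

Group means are a coarse summary, so they are best read alongside the plot, which shows the overlap between the two groups.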