Objective: Introduce the “mtcars” dataset and explain the reason for its selection.
The “mtcars” dataset is a well-known built-in dataset in R. It was extracted from the 1974 Motor Trend US magazine and records 11 variables for 32 cars, including miles per gallon (mpg), number of cylinders (cyl), displacement (disp), and horsepower (hp). Together the variables cover fuel consumption and several aspects of automobile design and performance.
I chose the “mtcars” dataset because it is widely used in introductory data analysis and statistical modeling. It is well suited to examining how car features relate to fuel efficiency and performance, and its small size and real-world subject matter make it a good fit for practicing data manipulation, visualization, and modeling in R.
# Load the mtcars dataset
data(mtcars)
# Display the first few rows of the dataset
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Objective: Describe the variables present in the “mtcars” dataset.
# Display the column names of the dataset
names(mtcars)
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
Interpretation of these results:
The “mpg” column represents fuel efficiency in miles per (US) gallon (continuous variable).
The “cyl” column represents the number of cylinders (discrete count, treated here as a categorical variable).
The “disp” column represents engine displacement in cubic inches (continuous variable).
The “hp” column represents gross horsepower (continuous variable).
The “drat” column represents the rear axle ratio (continuous variable).
The “wt” column represents weight in units of 1000 lbs (continuous variable).
The “qsec” column represents quarter-mile time in seconds (continuous variable).
The “vs” column represents the engine shape (0 = V-shaped, 1 = straight; categorical variable).
The “am” column represents the transmission type (0 = automatic, 1 = manual; categorical variable).
The “gear” column represents the number of forward gears (discrete count, treated here as a categorical variable).
The “carb” column represents the number of carburetors (discrete count, treated here as a categorical variable).
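R stores all eleven columns as plain numeric vectors, so the variables described above as categorical are not treated as categories automatically. The following is a minimal sketch of one way to recode them as factors. The working copy `cars`, the helper vector `cat_vars`, and the level labels are illustrative names chosen here, not part of the original analysis.

```r
# Work on a copy so the built-in dataset stays unchanged
cars <- mtcars

# Columns described above as categorical
cat_vars <- c("cyl", "vs", "am", "gear", "carb")

# Convert each of these columns from numeric to factor
cars[cat_vars] <- lapply(cars[cat_vars], factor)

# Attach readable labels to the two binary indicators (0 first, then 1)
levels(cars$vs) <- c("V-shaped", "straight")
levels(cars$am) <- c("automatic", "manual")

# Inspect the resulting column types
str(cars)
```

Whether “cyl”, “gear”, and “carb” should be factors or numeric counts depends on the later analysis; a regression model, for example, treats the two encodings differently.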
Objective: Discuss the number of observations in the “mtcars” dataset.
# Check the number of rows/observations in the dataset
nrow(mtcars)
## [1] 32
Each row in the “mtcars” dataset represents a different car model, so the dataset contains 32 observations in total.
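The car model names are not stored in a column; they are the row names of the data frame. As a small sketch using only base R, the dimensions and the row names can be inspected like this:

```r
# Number of rows and columns in a single call
dim(mtcars)

# Car model names are stored as row names, not as a column
head(rownames(mtcars))

# Confirm that no model name appears more than once
anyDuplicated(rownames(mtcars)) == 0
```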
Objective: Analyze if there are any missing values in the dataset.
# Check for missing values
any(is.na(mtcars))
## [1] FALSE
There are no missing values in the “mtcars” dataset.
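The single TRUE/FALSE check above does not say where missing values would be if there were any. A minimal sketch of a per-column check follows; the variable name `na_counts` is an illustrative choice.

```r
# Count missing values in each column
na_counts <- colSums(is.na(mtcars))
na_counts

# Confirm that every row is complete (no NA in any column)
all(complete.cases(mtcars))
```

On a dataset that does contain missing values, the same two calls show which columns are affected and whether any rows would be dropped by a complete-case analysis.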
summary(mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")])
##       mpg             disp             hp             drat      
##  Min.   :10.40   Min.   : 71.1   Min.   : 52.0   Min.   :2.760  
##  1st Qu.:15.43   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:3.080  
##  Median :19.20   Median :196.3   Median :123.0   Median :3.695  
##  Mean   :20.09   Mean   :230.7   Mean   :146.7   Mean   :3.597  
##  3rd Qu.:22.80   3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.920  
##  Max.   :33.90   Max.   :472.0   Max.   :335.0   Max.   :4.930  
##        wt             qsec      
##  Min.   :1.513   Min.   :14.50  
##  1st Qu.:2.581   1st Qu.:16.89  
##  Median :3.325   Median :17.71  
##  Mean   :3.217   Mean   :17.85  
##  3rd Qu.:3.610   3rd Qu.:18.90  
##  Max.   :5.424   Max.   :22.90
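The summary statistics report the minimum, quartiles, median, mean, and maximum of each numeric variable, which together describe its central tendency and spread. For example, “mpg” ranges from 10.40 to 33.90 with a mean of 20.09, and the mean of “hp” (146.7) exceeds its median (123.0), which suggests a right-skewed distribution.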
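The output of summary() does not include the standard deviation or the interquartile range. As a sketch of how these spread measures could be added with base R (the names `num_vars` and `spread` are illustrative):

```r
# Numeric variables summarized above
num_vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

# Standard deviation and interquartile range for each numeric variable;
# the result is a matrix with one column per variable
spread <- sapply(mtcars[num_vars], function(x) c(sd = sd(x), IQR = IQR(x)))
round(spread, 2)
```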
table(mtcars$cyl)
## 
##  4  6  8 
## 11  7 14
table(mtcars$vs)
## 
##  0  1 
## 18 14
table(mtcars$am)
## 
##  0  1 
## 19 13
table(mtcars$gear)
## 
##  3  4  5 
## 15 12  5
table(mtcars$carb)
## 
##  1  2  3  4  6  8 
##  7 10  3 10  1  1
The frequency tables display the count of observations in each category of the categorical variables, showing how the cars are distributed across those categories.
We carry out these descriptive analyses to gain a better understanding of the characteristics of the “mtcars” dataset and the distribution of its variables.
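Raw counts can be easier to compare as percentages, and two categorical variables can be tabulated against each other. The following is a minimal sketch using base R; the object names `cyl_share` and `cyl_by_am` are illustrative.

```r
# Share of cars in each cylinder category, as percentages
cyl_share <- round(100 * prop.table(table(mtcars$cyl)), 1)
cyl_share

# Cross-tabulate number of cylinders against transmission type
cyl_by_am <- table(cyl = mtcars$cyl, am = mtcars$am)
cyl_by_am
```

The row totals of the cross-tabulation match the “cyl” counts above, and the column totals match the “am” counts.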