This student practice uses RStudio and an R Notebook to explore data frames and basic data visualization.

Data Understanding

Displaying the First Few Rows

df <- read.csv("mtcars-3.csv")
head(df)
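Beyond head(), it can help to check the size and column types of the data frame before going further. The sketch below uses R's built-in mtcars as a stand-in, on the assumption that mtcars-3.csv mirrors it with the car names stored in a model column. The name df_demo is only for illustration.

```r
# Stand-in for the CSV: built-in mtcars with row names moved into a 'model' column
# (assumption: mtcars-3.csv has this layout)
df_demo <- data.frame(model = rownames(mtcars), mtcars,
                      row.names = NULL, stringsAsFactors = FALSE)

dim(df_demo)      # number of rows and columns
str(df_demo)      # class and first few values of each column
head(df_demo, 3)  # first three rows instead of the default six
```

str() is a quick way to spot columns that were read in with an unexpected class.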

Use the summary() function to get summary statistics for selected columns of this dataset

summary(df[, c("model", "mpg", "hp", "am")])
##     model                mpg              hp              am        
##  Length:32          Min.   :10.40   Min.   : 52.0   Min.   :0.0000  
##  Class :character   1st Qu.:15.43   1st Qu.: 96.5   1st Qu.:0.0000  
##  Mode  :character   Median :19.20   Median :123.0   Median :0.0000  
##                     Mean   :20.09   Mean   :146.7   Mean   :0.4062  
##                     3rd Qu.:22.80   3rd Qu.:180.0   3rd Qu.:1.0000  
##                     Max.   :33.90   Max.   :335.0   Max.   :1.0000
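Because am is coded 0/1, its mean in the summary above (0.4062) can be read as the share of cars with am equal to 1. A small check of that reading, using built-in mtcars as a stand-in on the assumption that the CSV holds the same values:

```r
# Assumption: built-in mtcars matches the values in mtcars-3.csv
am <- mtcars$am

table(am)   # counts of 0s and 1s
mean(am)    # proportion of 1s, since the column is coded 0/1
```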

Change the class of the variable ‘am’ from integer to logical (boolean)

# Convert the 'am' column to logical
df$am <- df$am == 1

print(class(df$am))
## [1] "logical"
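The comparison df$am == 1 is one way to do the conversion. As a rough sketch of two alternatives, as.logical() gives the same result for a 0/1 column, and a labelled factor keeps the meaning of the codes visible. The labels below follow the documentation of the built-in mtcars dataset (0 = automatic, 1 = manual); whether mtcars-3.csv uses the same coding is an assumption, and the variable names are illustrative.

```r
# Assumption: built-in mtcars stands in for mtcars-3.csv
am_raw <- mtcars$am

# Two equivalent ways to get a logical vector from a 0/1 column
am_logical <- am_raw == 1
identical(am_logical, as.logical(am_raw))

# Alternative: a factor with readable labels instead of TRUE/FALSE
am_factor <- factor(am_raw, levels = c(0, 1),
                    labels = c("automatic", "manual"))
table(am_factor)
```

A factor is often easier to read in plot legends, while a logical vector is convenient for filtering rows.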

Data Visualization

Scatter Plot

The scatter plot displays the relationship between horsepower (hp) and miles per gallon (mpg).

# Create a scatter plot of 'mpg' (y-axis) against 'hp' (x-axis)
plot(df$hp, df$mpg, main="Scatter Plot of mpg vs hp",
     xlab="Horsepower (hp)", ylab="Miles per Gallon (mpg)",
     pch=19, col=rgb(0,0,0,0.5))
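To put a number on the relationship shown in the scatter plot, one option is to compute the correlation and overlay a fitted straight line. This is a minimal sketch, not part of the original exercise. It uses built-in mtcars as a stand-in for the CSV, and the names d and fit are illustrative.

```r
# Assumption: built-in mtcars stands in for mtcars-3.csv
d <- mtcars

# Pearson correlation between horsepower and fuel economy
cor(d$hp, d$mpg)

# Simple linear regression of mpg on hp
fit <- lm(mpg ~ hp, data = d)
coef(fit)

# Same scatter plot as before, with the fitted line added
plot(d$hp, d$mpg, main = "Scatter Plot of mpg vs hp",
     xlab = "Horsepower (hp)", ylab = "Miles per Gallon (mpg)",
     pch = 19, col = rgb(0, 0, 0, 0.5))
abline(fit, col = "red", lwd = 2)
```

A negative correlation and a downward-sloping line would both indicate that cars with more horsepower tend to have lower mpg.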

Bar Chart

This bar chart displays the distribution of cars based on the number of cylinders they have. Each bar represents a category (4, 6, or 8 cylinders), and the height of each bar indicates the number of cars in that category.

# Calculate the frequency of each number of cylinders
cyl_frequency <- table(df$cyl)

# Create a bar chart
barplot(cyl_frequency, main="Distribution of Cars by Number of Cylinders",
        xlab="Number of Cylinders", ylab="Frequency",
        col="blue")
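The counts behind the bars can also be expressed as shares of the whole dataset. The following sketch uses prop.table() on the frequency table, again with built-in mtcars standing in for the CSV. The names cyl_counts and cyl_share are illustrative.

```r
# Assumption: built-in mtcars stands in for mtcars-3.csv
cyl_counts <- table(mtcars$cyl)

# Convert counts to proportions, then to rounded percentages
cyl_share <- prop.table(cyl_counts)
round(100 * cyl_share, 1)

# Bar chart of the percentages instead of the raw counts
barplot(100 * cyl_share,
        main = "Share of Cars by Number of Cylinders",
        xlab = "Number of Cylinders", ylab = "Percent of Cars",
        col = "blue")
```

Percentages make it easier to compare this distribution with another dataset of a different size.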

Histogram

This histogram shows the distribution of miles per gallon (mpg) for the cars in the “mtcars” dataset. Each bar represents a bin, and its height is the number of cars whose mpg falls within that bin’s range.

# Create a histogram for 'mpg'
hist(df$mpg, main="Histogram of Miles per Gallon (mpg)",
     xlab="Miles per Gallon (mpg)", ylab="Frequency",
     col="green", breaks=10)
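When breaks is given as a single number, hist() treats it as a suggestion and picks rounded breakpoints, so the plot may not show exactly 10 bars. One way to see the bins that were actually used is to call hist() with plot=FALSE and inspect the result. This sketch uses built-in mtcars as a stand-in for the CSV, and the name h is illustrative.

```r
# Assumption: built-in mtcars stands in for mtcars-3.csv
h <- hist(mtcars$mpg, breaks = 10, plot = FALSE)

h$breaks          # bin edges chosen by R
h$counts          # number of cars in each bin
length(h$counts)  # actual number of bins drawn
```

To force specific bins, breaks can instead be given as a vector of edges, for example seq(10, 35, by = 5).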