Questions

Find the mpg data in R. This is the dataset that you will use for the first three questions.

  1. Create a box plot using ggplot showing engine displacement displ for each transmission type trans from the mpg data set. Hint: Can you figure out how to rotate the x-axis categories so they are all readable?

#Use geom_boxplot and add coord_flip() to swap the x-axis and y-axis so that the transmission labels are readable.

library(ggplot2)
data("mpg")
ggplot(mpg, aes(x = trans, y = displ)) +
    geom_boxplot() +
    labs(title = "Engine displacement for each transmission type",
         x = "Transmission type",
         y = "Engine displacement") +
    coord_flip()
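
#The hint asks about rotating the x-axis categories. As an alternative to coord_flip(), the sketch below keeps trans on the x-axis and rotates its tick labels with theme(). The 45-degree angle and the name p_rotated are my own choices for illustration, not part of the question.

```r
library(ggplot2)

# Keep trans on the x-axis and rotate its tick labels by 45 degrees
# instead of flipping the coordinate system.
p_rotated <- ggplot(mpg, aes(x = trans, y = displ)) +
    geom_boxplot() +
    labs(title = "Engine displacement for each transmission type",
         x = "Transmission type",
         y = "Engine displacement") +
    # hjust = 1 right-aligns each label so it ends under its tick mark
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
p_rotated
```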

  2. Create a histogram or bar graph using ggplot that shows the frequency of each class type in mpg.

#Use geom_bar to count the cars in each class, and add geom_text to label each bar with its frequency.

ggplot(mpg, aes(x = class)) +
    geom_bar() +
    geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.25) +
    labs(title = "The frequency of each `class` type",
         x = "Class",
         y = "Frequency")

  3. Next, show a stacked bar graph using ggplot that shows the frequency of each cyl type within class. Hint: You might have to use (group) or convert cyl to a factor (as.factor).

#Building on the class frequency chart, use as.factor to convert cyl into a factor and map it to fill. Then set position = "stack" to show the count of each cyl type within class.

ggplot(mpg, aes(x = class)) +
    geom_bar(aes(fill = as.factor(cyl)), position = "stack") +
    labs(title = "The frequency of each `cyl` type within `class`",
         x = "Class",
         y = "Frequency")
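
#To check the stacked bars against the raw counts, a cross-tabulation of class and cyl can be printed next to the plot. This is only a quick sanity check using base R's table(); the name cyl_by_class is my own.

```r
library(ggplot2)

# Cross-tabulate class against cyl; each cell is the height of one
# coloured segment in the stacked bar chart.
cyl_by_class <- table(mpg$class, mpg$cyl)
cyl_by_class

# Row sums give the total bar heights, i.e. the frequency of each class.
rowSums(cyl_by_class)
```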

  4. Draw a scatter plot using ggplot showing the relationship between cty and hwy. Explain the utility or lack of utility of this graphic.

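#Use geom_point to create a scatter plot. The plot is useful for showing that hwy rises with cty, but both variables are recorded as whole numbers, so many cars share the same (cty, hwy) pair and their points land on top of each other. This overplotting hides how many observations sit at each location, which limits the plot's utility.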

ggplot(mpg, aes(cty,hwy)) +
    geom_point()
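
#One possible way to make the overplotting visible is geom_count(), which sizes each point by the number of cars at that location. This is a sketch of one remedy, not the only one; the axis labels and the name p_count are my own choices.

```r
library(ggplot2)

# geom_count() draws one point per distinct (cty, hwy) pair and scales
# its size by the number of cars at that location.
p_count <- ggplot(mpg, aes(x = cty, y = hwy)) +
    geom_count() +
    labs(x = "City miles per gallon",
         y = "Highway miles per gallon",
         size = "Number of cars")
p_count
```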

  5. Design a visualization of your choice using ggplot and the mpg data, and write a brief summary about why you chose that visualization.

#I want to explore the relationship between hwy and displ with a scatter plot. To reduce overplotting, I use geom_jitter. I also map color to class (the type of car) to add another level of information, and I use geom_smooth to add a linear trend line for each class.

ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
    # geom_jitter already draws the points, so a separate geom_point is not needed
    geom_jitter(width = 0.5, height = 0.5) +
    geom_smooth(se = FALSE, method = "lm") +
    labs(title = "The relationship between highway miles per gallon and engine displacement in different types of car",
         x = "Engine displacement, in liters",
         y = "Highway miles per gallon") +
    theme(title = element_text(size = 8))