3.2.4 Exercises

library(ggplot2)
ggplot(data=mpg)

  1. Run ggplot(data = mpg). What do you see?

  2. How many rows are in mpg? How many columns?

  3. What does the drv variable describe? Read the help for ?mpg to find out.

  4. Make a scatterplot of hwy vs cyl.

  5. What happens if you make a scatterplot of class vs drv? Why is the plot not useful?

Ans:

  1. The plot is empty: ggplot(data = mpg) only creates a blank canvas, because no geom layer has been added yet.

  2. 234 rows and 11 columns.

  3. drv is the type of drive train, where f = front-wheel drive, r = rear-wheel drive, 4 = 4wd.

  4. scatterplot of hwy vs cyl

ggplot(data=mpg)+
  geom_point(aes(x=cyl,y=hwy))

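  5. scatterplot of class vs drv

ggplot(mpg)+
  geom_point(aes(x=class,y=drv))

Ans: The plot is not useful because both class and drv are categorical, so the points can only fall on a small grid of class/drv combinations. Cars with the same combination are drawn on top of each other, so the plot shows which combinations exist but not how many cars each one contains.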
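The answers to questions 2 and 5 can be checked directly. The sketch below, using only ggplot2 and base R, prints the dimensions of mpg, counts the cars in each class/drv combination with table(), and redraws the plot with geom_count(), which scales each point by the number of observations it represents. The names combo_counts and count_plot are illustrative and not part of the original exercise.

```r
library(ggplot2)

# Dimensions of the dataset: rows first, then columns
dim(mpg)

# Number of cars in each class/drv combination
combo_counts <- table(mpg$class, mpg$drv)
combo_counts

# Same axes as the scatterplot, but point size shows how many cars overlap
count_plot <- ggplot(mpg) +
  geom_count(aes(x = class, y = drv))
count_plot
```

geom_jitter() would be another way to reveal the overlapping points, at the cost of moving them slightly off their true positions.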

3.3.1 Exercises

  1. What’s gone wrong with this code? Why are the points not blue?

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

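Ans: color = "blue" is placed inside aes(), so ggplot2 treats it as a mapping rather than a setting: the constant string "blue" becomes a one-level variable and is given the first default color. To set the color manually, move the argument outside aes():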

ggplot(mpg)+
  geom_point(mapping=aes(x=displ,y=hwy),color="blue")

  2. Which variables in mpg are categorical? Which variables are continuous? (Hint: type ?mpg to read the documentation for the dataset). How can you see this information when you run mpg?

Ans: categorical: manufacturer, model, trans, drv, fl, class. continuous: displ, cty, hwy, cyl, year (cyl and year take only a few distinct values, so they can also be treated as categorical).

We can see this by printing mpg: the type of each column appears under its name. Categorical variables are marked as <chr>; continuous variables are marked as <int> or <dbl>.

print(mpg)
## # A tibble: 234 x 11
##    manufacturer model    displ  year   cyl trans   drv     cty   hwy fl    class
##    <chr>        <chr>    <dbl> <int> <int> <chr>   <chr> <int> <int> <chr> <chr>
##  1 audi         a4         1.8  1999     4 auto(l… f        18    29 p     comp…
##  2 audi         a4         1.8  1999     4 manual… f        21    29 p     comp…
##  3 audi         a4         2    2008     4 manual… f        20    31 p     comp…
##  4 audi         a4         2    2008     4 auto(a… f        21    30 p     comp…
##  5 audi         a4         2.8  1999     6 auto(l… f        16    26 p     comp…
##  6 audi         a4         2.8  1999     6 manual… f        18    26 p     comp…
##  7 audi         a4         3.1  2008     6 auto(a… f        18    27 p     comp…
##  8 audi         a4 quat…   1.8  1999     4 manual… 4        18    26 p     comp…
##  9 audi         a4 quat…   1.8  1999     4 auto(l… 4        16    25 p     comp…
## 10 audi         a4 quat…   2    2008     4 manual… 4        20    28 p     comp…
## # … with 224 more rows
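
The column types can also be listed without printing the whole tibble. This is a minimal sketch using base R only; column_types is an illustrative name, not something from the exercise.

```r
library(ggplot2)

# First class of every column in mpg, as a named character vector
column_types <- vapply(mpg, function(column) class(column)[1], character(1))
column_types

# Columns stored as text (categorical) and as numbers (continuous)
names(column_types)[column_types == "character"]
names(column_types)[column_types %in% c("integer", "numeric")]
```

Note that R reports <dbl> columns such as displ as class "numeric".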
  3. Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs. continuous variables?

#Map a continuous variable to color

ggplot(mpg)+
  geom_point(mapping=aes(x=displ,y=hwy,color=cyl))

#Map a continuous variable to size.

ggplot(mpg)+
  geom_point(mapping=aes(x=displ,y=hwy,size=cyl))

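#Map a continuous variable to shape: this fails with an error, because a continuous variable cannot be mapped to shape.

Ans: With a continuous variable, color becomes a gradient and point size grows with the value. With a categorical variable, each level gets its own distinct color, and shape can be used (one shape per level).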
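
The difference can be seen side by side. The sketch below maps color first to a continuous variable (cty) and then to a categorical one (drv), and finally captures the error raised when a continuous variable is mapped to shape. The choice of cty and the names shape_plot and shape_result are assumptions made for illustration.

```r
library(ggplot2)

# Continuous variable: color is drawn as a gradient with a colorbar legend
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, color = cty))

# Categorical variable: each level gets its own distinct color
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, color = drv))

# Continuous variable mapped to shape: building the plot raises an error,
# which is captured here as a message so the script keeps running
shape_plot <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, shape = cty))
shape_result <- tryCatch(
  ggplot_build(shape_plot),
  error = function(e) conditionMessage(e)
)
shape_result
```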

  4. What happens if you map the same variable to multiple aesthetics?
ggplot(mpg,aes(x=hwy,y=displ, color=cyl, size=cyl))+
  geom_point()

Ans: Here cyl (number of cylinders) is mapped to color and size at the same time. The lighter the color, the more cylinders; the larger the point, the more cylinders. This is redundant: both aesthetics convey the same information, so mapping the variable to a single aesthetic is enough.

  5. What does the stroke aesthetic do? What shapes does it work with? (Hint: use ?geom_point)

Ans: For shapes that have a border (like 21), you can colour the inside and outside separately. The stroke aesthetic modifies the width of the border.

ggplot(mtcars, aes(wt, mpg)) +
  geom_point(shape = 21, colour = "black", fill = "white", size = 5, stroke = 5)

  6. What happens if you map an aesthetic to something other than a variable name, like aes(colour = displ < 5)? Note, you’ll also need to specify x and y.
ggplot(mpg)+
  geom_point(mapping=aes(x=displ,y=hwy, color=displ<5))

Ans: ggplot2 evaluates the expression displ < 5 for every car and maps the resulting logical variable to color. In the scatter plot, cars with displ (engine displacement) less than 5 are TRUE and drawn in blue; cars with displ of 5 or more are FALSE and drawn in red.
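
The logical variable that ggplot2 builds from the expression can be inspected on its own. This is a small base R sketch; small_engine is an illustrative name.

```r
library(ggplot2)

# The same expression used inside aes(), evaluated once per row of mpg
small_engine <- mpg$displ < 5

# Number of cars in the FALSE and TRUE groups shown in the legend
table(small_engine)
```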

3.5.1 Exercises

  1. What happens if you facet on a continuous variable?
ggplot(mpg)+
  geom_point(mapping=aes(x=displ,y=hwy))+
  facet_grid(drv~cty)

Ans: The continuous variable cty is treated as a categorical variable: facet_grid() creates one column of panels for each distinct value of cty and one row for each type of drive train. Within each panel we can still see the relation between highway miles per gallon and engine displacement, but with so many columns each panel is narrow and holds only a few points.

  2. What do the empty cells in plot with facet_grid(drv ~ cyl) mean? How do they relate to this plot?
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = drv, y = cyl))

Ans: An empty cell means that no car in mpg has that combination of drv and cyl. These are the same combinations that have no point in this scatter plot of drv against cyl.
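
The empty combinations can be listed by cross-tabulating the two variables. This is a minimal base R sketch; drv_cyl_counts and empty_cells are illustrative names.

```r
library(ggplot2)

# Rows are drv values, columns are cyl values; a 0 marks an empty facet cell
drv_cyl_counts <- table(drv = mpg$drv, cyl = mpg$cyl)
drv_cyl_counts

# Keep only the combinations that contain no cars
empty_cells <- subset(as.data.frame(drv_cyl_counts), Freq == 0)
empty_cells
```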

  3. What plots does the following code make? What does . do?

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .)

Ans1: The “.” is a placeholder meaning that no variable is faceted on that dimension. With facet_grid(drv ~ .), the plot is split into one row of panels for each value of drv, and no variable is used to split it into columns.

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)

Ans2: Compared with the first plot, facet_grid(. ~ cyl) splits the plot into one column of panels for each value of cyl, and no variable is used to split it into rows.

  4. Take the first faceted plot in this section: What are the advantages to using faceting instead of the colour aesthetic? What are the disadvantages? How might the balance change if you had a larger dataset?

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)

Ans: 1) Advantages of facets: when there are many categories, facets show the relationship between x and y for each category in its own panel, which makes the trend within a category easier to see. Also, when the number of observations is very large, points of different colors overlap in a single plot, making it difficult to tell the pattern of one category from another.

2) Disadvantages of facets compared to color: when the number of categories and observations is moderate, plotting everything in the same panel lets us compare the patterns of different categories directly, simply by looking at the colors. With facets, the categories sit in separate panels, so such comparisons are harder.

3) As the dataset grows larger, the advantages of facets become more pronounced.

  5. Read ?facet_wrap. What does nrow do? What does ncol do? What other options control the layout of the individual panels? Why doesn’t facet_grid() have nrow and ncol arguments?

1) In facet_wrap(), nrow and ncol set the number of rows and columns used to lay out the panels. Other arguments that control the layout include as.table and dir.

2) In facet_grid(), the number of rows and columns is determined by the number of distinct values of the variables given in the formula, so nrow and ncol are not needed. For example, in mpg, drv has three types of drive train, so facet_grid(drv ~ .) produces a plot with three rows.

  6. When using facet_grid() you should usually put the variable with more unique levels in the columns. Why?

Ans: Plots are usually wider than they are tall, so there is more room for panels in the horizontal direction. If the variable with more levels is placed in the rows, each panel is squeezed vertically and the trend within it becomes hard to read.

3.6.1 Exercises

  1. What geom would you use to draw a line chart? A boxplot? A histogram? An area chart?

Ans: line chart: geom_line(); boxplot: geom_boxplot(); histogram: geom_histogram(); area chart: geom_area()

  2. Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
  geom_point() + 
  geom_smooth(se = FALSE) 
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

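Ans: The plot shows hwy against displ as points, with one smooth line per drive train and no confidence band (se = FALSE). Because color = drv is set in ggplot(), both the points and the lines are colored by drive train.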

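  3. What does show.legend = FALSE do? What happens if you remove it? Why do you think I used it earlier in the chapter?

Ans: The argument show.legend = FALSE hides the legend for that layer. If we remove it, the legend appears again, as in the plot below. It was presumably used earlier in the chapter so that plots shown side by side keep the same size, without a legend taking up space in one of them.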

ggplot(mpg, mapping=aes(x=displ,y=hwy))+
  geom_point(mapping=aes(color=drv))+
  geom_smooth(mapping=aes(color=drv))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

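  4. What does the se argument to geom_smooth() do?

Ans: se controls whether a confidence interval is displayed around the smooth line. With se = TRUE (the default), a shaded confidence band (95% by default) is drawn around each line; with se = FALSE the band is hidden.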

ggplot(mpg,mapping=aes(x=displ, y=hwy, color=drv))+
  geom_point()+
  geom_smooth(se=TRUE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

  5. Will these two graphs look different? Why/why not?

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot() + 
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

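Ans: No. The two graphs look the same. In the first, data and mapping are set once in ggplot() and inherited by both geoms; in the second, the same data and mapping are repeated in each geom. Both layers end up with identical inputs.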
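
One way to check this without comparing the pictures by eye is to compare the data each layer computes. The sketch below assumes that layer_data() from ggplot2 returns the computed data frame for a given layer; the names inherited, repeated, same_points, and same_smooth are illustrative.

```r
library(ggplot2)

# Data and mapping set once in ggplot() and inherited by both geoms
inherited <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()

# Same data and mapping repeated in every geom
repeated <- ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))

# Compare the computed data of the point layer (1) and the smooth layer (2)
same_points <- isTRUE(all.equal(
  suppressMessages(layer_data(inherited, 1)),
  suppressMessages(layer_data(repeated, 1))
))
same_smooth <- isTRUE(all.equal(
  suppressMessages(layer_data(inherited, 2)),
  suppressMessages(layer_data(repeated, 2))
))
c(points = same_points, smooth = same_smooth)
```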

  6. Recreate the R code necessary to generate the following graphs.
#1
ggplot(mpg,mapping=aes(x=displ,y=hwy))+
  geom_point()+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#2.
ggplot(mpg,mapping=aes(x=displ,y=hwy, group=drv))+
  geom_point()+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#3.
ggplot(mpg,mapping=aes(x=displ,y=hwy, color=drv))+
  geom_point()+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#4.
ggplot(mpg,mapping=aes(x=displ, y=hwy))+
  geom_point(mapping=aes(color=drv))+
  geom_smooth(se=FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#5.
ggplot(mpg,mapping=aes(x=displ,y=hwy))+
  geom_point(mapping=aes(color=drv))+
  geom_smooth(mapping=aes(linetype=drv),se=FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#6.
ggplot(mpg,aes(x=displ,y=hwy))+
  geom_point(size=6,color="white")+
  geom_point(aes(color=drv))