1 + 2
## [1] 3
library(tidyverse)
mpg
## # A tibble: 234 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
## 3 audi a4 2 2008 4 manu… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
## 8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
## 9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
## 10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
## # ℹ 224 more rows
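The second header row of the printed tibble (`<chr>`, `<int>`, `<dbl>`) shows how each column is stored. The sketch below collects those storage types with base R, so they can be compared with the categorical/continuous split in question 2 below. The names `col_types`, `categorical` and `numeric_cols` are only illustrative. Storage type is a starting point rather than the full answer, because a numeric column can still behave like a category.

```r
library(ggplot2)  # provides the mpg dataset

# Storage class of every column: "character", "integer" or "numeric" (double).
col_types <- vapply(mpg, function(col) class(col)[1], character(1))
col_types

# Character columns hold categories; integer/double columns hold numbers.
categorical  <- names(col_types)[col_types == "character"]
numeric_cols <- names(col_types)[col_types %in% c("integer", "numeric")]
categorical
numeric_cols
```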
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
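A minimal sketch, assuming ggplot2 is installed, that contrasts placing the colour inside and outside `aes()`. It uses `layer_data()` to inspect the colours ggplot2 actually computes; `p_wrong` and `p_right` are illustrative names.

```r
library(ggplot2)

# Inside aes(): "blue" is treated as data, a one-level categorical variable,
# so it goes through the default discrete colour scale (and gets a legend).
p_wrong <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, colour = "blue"))

# Outside aes(): colour is a fixed setting for every point in the layer.
p_right <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), colour = "blue")

# The colours ggplot2 computed for each version.
unique(layer_data(p_wrong)$colour)
unique(layer_data(p_right)$colour)
```

Only `p_right` draws blue points; `p_wrong` draws every point in the single colour the default palette assigns to its one category.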
1. What’s gone wrong with this code? Why are the points not blue? The
points aren’t blue because color = "blue" is written inside aes(). Inside
aes(), "blue" is treated as a data value: a categorical variable that
takes only a single value. ggplot2 maps that one category to the first
color of its default palette (a reddish color) and adds a legend for it.
To make the points blue, set the color outside aes() instead.

2. Which variables in mpg are categorical? Which variables are
continuous? The categorical variables in mpg are manufacturer, model,
trans, drv, fl, and class. The continuous variables are displ, year,
cyl, cty, and hwy, although year and cyl take only a handful of distinct
values and could reasonably be treated as categorical too.

3. Map a continuous variable to color, size, and shape. How do these
aesthetics behave differently for categorical vs. continuous variables?
ggplot(mpg, aes(x = displ, y = hwy, color = cty)) +
  geom_point()
ggplot(mpg, aes(x = displ, y = hwy, size = cty)) +
  geom_point()
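The third mapping from question 3, shape, fails, so it is not part of the code above. The sketch below shows the failure without stopping a script. The error is raised when the plot is built (for example when it is printed), not when it is created, so `ggplot_build()` is wrapped in `tryCatch()`. The exact wording of the message depends on the ggplot2 version; `p_shape` and `shape_error` are illustrative names.

```r
library(ggplot2)

# Creating the plot object succeeds: nothing is evaluated yet.
p_shape <- ggplot(mpg, aes(x = displ, y = hwy, shape = cty)) +
  geom_point()

# Building it looks up a continuous shape scale, which raises an error.
shape_error <- tryCatch(
  {
    ggplot_build(p_shape)
    NULL
  },
  error = function(e) conditionMessage(e)
)
shape_error
```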
For question 3, a continuous variable cannot be mapped to shape: ggplot2 raises an error when the plot is drawn, because shapes have no natural order that could represent numeric values. Mapped to color, a continuous variable gets a gradient that runs from dark blue (low values) to light blue (high values), while a categorical variable gets a distinct hue for each category. Mapped to size, a continuous variable gets a smooth range of point sizes.

4. What happens if you map the same variable to multiple aesthetics?
ggplot(mpg, aes(x = displ, y = hwy, color = hwy, size = displ)) +
  geom_point()
For question 4, ggplot2 draws the plot without complaint, and each
aesthetic simply gets its own scale and legend. In the code above, hwy is
mapped to both y and color, and displ to both x and size. The result is
redundant, because the same information is encoded twice, and the larger
points make the plot look crowded and cluttered, so this is usually best
avoided.

5. What does the stroke aesthetic do? What shapes does it work with?
The stroke aesthetic changes the width of a shape's border. It is most
useful with the filled shapes (21 to 25), whose border and interior can
be colored separately with color and fill.
ggplot(mtcars, aes(wt, mpg)) +
  geom_point(shape = 21, color = "black", fill = "white", size = 5, stroke = 5)
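This sketch draws the filled shapes 21 to 25 side by side with the same border colour, fill, and stroke, so you can compare how stroke thickens each outline. The data frame `shapes` is made up for illustration. `scale_shape_identity()` tells ggplot2 to use the numbers in the `shape` column directly as shape codes.

```r
library(ggplot2)

# Illustrative data: one point per filled shape code.
shapes <- data.frame(x = 1:5, y = 1, shape = 21:25)

# colour = border, fill = interior, stroke = border width (all fixed settings).
p_stroke <- ggplot(shapes, aes(x = x, y = y, shape = shape)) +
  geom_point(colour = "black", fill = "white", size = 5, stroke = 2) +
  scale_shape_identity()  # use 21-25 as shape codes as-is

stroke_data <- layer_data(p_stroke)
stroke_data[, c("shape", "stroke")]
```

Try changing `stroke = 2` to a larger value to see the borders grow while the white interiors shrink.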
6. What happens if you map an aesthetic to something other than a
variable name, like aes(colour = displ < 5)? Note, you’ll also need
to specify x and y. ggplot2 behaves as if a temporary variable had been
added to the data, with values equal to the result of the expression.
Here displ < 5 is TRUE or FALSE for each car, so the points are colored
by those two categories.
ggplot(mpg, aes(x = displ, y = hwy, color = displ < 5)) +
  geom_point()
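One way to check the "temporary variable" explanation is to add that variable explicitly and compare the colours ggplot2 computes. This is a sketch; `small_engine` is a hypothetical helper column created only for the comparison.

```r
library(ggplot2)

# Mapping the expression directly ...
p_expr <- ggplot(mpg, aes(x = displ, y = hwy, colour = displ < 5)) +
  geom_point()

# ... matches mapping a real column that stores the expression's result.
# `small_engine` is a hypothetical helper column used only for this check.
mpg_flag <- transform(mpg, small_engine = displ < 5)
p_col <- ggplot(mpg_flag, aes(x = displ, y = hwy, colour = small_engine)) +
  geom_point()

same_colours <- identical(layer_data(p_expr)$colour, layer_data(p_col)$colour)
same_colours
```

Only the legend title differs between the two plots.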
How to get help
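1. What happens if you facet on a continuous variable? The continuous variable is converted to a categorical one, and the plot gets a separate facet for every distinct value. Faceting on cty, as in the graph below, produces many narrow panels.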
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(. ~ cty)
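A quick check, sketched with `layer_data()`: the number of panels matches the number of distinct cty values. `n_panels` and `n_cty_values` are illustrative names.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(. ~ cty)

# PANEL records which facet each point was drawn in.
n_panels     <- length(unique(layer_data(p)$PANEL))
n_cty_values <- length(unique(mpg$cty))
c(panels = n_panels, distinct_cty = n_cty_values)
```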
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class)) +
  facet_wrap(~ class, nrow = 2)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class)) +
  facet_grid(drv ~ cyl)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = drv, y = cyl))
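The plot above draws one point for each drv/cyl combination that occurs in the data. A missing point there corresponds to an empty cell in the facet_grid(drv ~ cyl) plot. A contingency table lists those combinations directly. This is a base R sketch; `combo_counts` is an illustrative name, and its zero cells are the empty facets.

```r
library(ggplot2)  # provides the mpg dataset

# Number of cars for every drv/cyl combination.
combo_counts <- table(drv = mpg$drv, cyl = mpg$cyl)
combo_counts

# How many combinations have no cars at all (empty facet cells).
sum(combo_counts == 0)
```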
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)
4. What are the advantages to using faceting instead of the colour aesthetic? What are the disadvantages? How might the balance change if you had a larger dataset?
The advantage of encoding class with facets instead of color is the ability to encode more distinct categories: with many categories, colors become hard to tell apart, while each facet gets its own panel. The disadvantage is that comparing observations between categories is harder, because they are spread across separate panels instead of being displayed on the same plot. With a larger dataset the balance shifts toward faceting, since many overlapping points in many different colors are hard to read on a single plot.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
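Geometric objects: the same data can be represented by different visual objects, called geoms. The two plots below draw the same variables, first with points and then with a smooth line.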
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy))
Not every aesthetic works with every geom.
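For example, geom_smooth() understands linetype but not shape, because a line has no point shape to change. In the ggplot2 versions I am aware of, an unsupported aesthetic triggers an "Ignoring unknown aesthetics" warning when the layer is created. The sketch below captures that warning. `shape_warning` and `linetype_layer` are illustrative names.

```r
library(ggplot2)

# shape is not an aesthetic of geom_smooth(): creating the layer warns.
shape_warning <- tryCatch(
  geom_smooth(aes(x = displ, y = hwy, shape = drv)),
  warning = function(w) conditionMessage(w)
)
shape_warning

# linetype is supported, so this layer is created without a warning.
linetype_layer <- geom_smooth(aes(x = displ, y = hwy, linetype = drv),
                              method = "loess", formula = y ~ x)
```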
Two geoms in the same graph:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()
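Local vs. global mappings: mappings placed inside a geom function apply only to that layer, and they extend or overwrite the global mappings set in ggplot(). This makes it possible to display different aesthetics in different layers.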
The same idea lets you specify different data for each layer.
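The following sketch combines both ideas. The global x/y mapping is inherited by both layers, the colour mapping is local to the first layer, and the second layer gets its own data: only the subcompact cars, circled in black. `subcompacts` is an illustrative name; base R's `subset()` stands in for dplyr's `filter()`.

```r
library(ggplot2)

# Illustrative subset used only by the second layer.
subcompacts <- subset(mpg, class == "subcompact")

p <- ggplot(mpg, aes(x = displ, y = hwy)) +      # global mapping
  geom_point(aes(colour = class)) +              # local colour mapping
  geom_point(data = subcompacts,                 # layer-specific data
             shape = 1, size = 4, colour = "black")

# Rows drawn by each layer.
c(all_cars = nrow(layer_data(p, 1)), subcompact = nrow(layer_data(p, 2)))
```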
3.6 Exercises
1. What geom would you use to draw a line chart? A boxplot? A histogram? An area chart?
* line chart: geom_line()
* boxplot: geom_boxplot()
* histogram: geom_histogram()
* area chart: geom_area()
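Each geom is paired with a stat that prepares the data. The sketch below inspects the layer objects: a histogram is drawn with the bar geom after binning with stat_bin(), and a boxplot uses its own geom. The `$geom` and `$stat` fields are read from ggplot2's layer objects; the plot names are illustrative.

```r
library(ggplot2)

p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)

p_box <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot()

# The geom and stat objects behind each layer.
class(p_hist$layers[[1]]$geom)[1]
class(p_hist$layers[[1]]$stat)[1]
class(p_box$layers[[1]]$geom)[1]
```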
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(se = FALSE)
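3. What does show.legend = FALSE do? What happens if you remove it? Why do you think I used it earlier in the chapter? show.legend = FALSE hides the legend for that layer. If you remove it, ggplot2 falls back to its default and draws a legend for any mapped aesthetic such as color; the legend takes up space, so the plotting area becomes narrower. It was probably used earlier in the chapter so that the side-by-side smoothing plots stayed the same size and could be compared directly.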
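The sketch below builds the same smooth layer twice, with and without `show.legend = FALSE`, and reads the setting back from the layer object. NA, the default, means "include this layer in the legend if any aesthetic is mapped". The plot names are illustrative.

```r
library(ggplot2)

# Legend suppressed for this layer.
p_no_legend <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(aes(colour = drv), method = "loess", formula = y ~ x,
              show.legend = FALSE)

# Default: a colour legend is drawn because colour is mapped.
p_legend <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(aes(colour = drv), method = "loess", formula = y ~ x)

p_no_legend$layers[[1]]$show.legend
p_legend$layers[[1]]$show.legend
```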
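4. What does the se argument to geom_smooth() do? It controls whether a confidence interval band is drawn around the smoothed line. The default, se = TRUE, draws the band; se = FALSE hides it.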
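The following sketch compares the data stat_smooth() computes in each case. It uses `method = "lm"` to keep the fit simple; any method would show the same thing. The band limits `ymin` and `ymax` are only computed when se = TRUE.

```r
library(ggplot2)

p_se <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(method = "lm", formula = y ~ x)              # se = TRUE (default)

p_no_se <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Computed columns for each layer: ymin/ymax appear only with se = TRUE.
names(layer_data(p_se))
names(layer_data(p_no_se))
```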
5. Will these two graphs look different? Why/why not? No, they will look the same, because geom_point() and geom_smooth() use the same data and mappings in both. In the first graph, the layers inherit the data and mappings from ggplot(), so they don't need to be specified again; in the second, each layer specifies the identical data and mappings itself.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()
ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
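To check the answer rather than eyeball it, compare the data ggplot2 computes for each layer of the two versions. This sketch fixes `method` and `formula` only to silence the informational message geom_smooth() prints; the plot names are illustrative.

```r
library(ggplot2)

# Layers inherit data and mappings from ggplot() ...
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# ... or set them individually in each layer.
p_local <- ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy),
              method = "loess", formula = y ~ x)

same_points <- isTRUE(all.equal(layer_data(p_global, 1), layer_data(p_local, 1)))
same_smooth <- isTRUE(all.equal(layer_data(p_global, 2), layer_data(p_local, 2)))
c(points = same_points, smooth = same_smooth)
```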
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(se = FALSE)
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(group = drv), se = FALSE) +
  geom_point()
ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  geom_smooth(se = FALSE)
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = drv)) +
  geom_smooth(se = FALSE)
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = drv)) +
  geom_smooth(aes(linetype = drv), se = FALSE)
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(size = 4, color = "white") +
  geom_point(aes(colour = drv))
## statistical transformation
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
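The bar heights in this plot are not a column of diamonds. geom_bar() uses stat_count(), which counts the rows for each value of x before anything is drawn. This sketch exposes the computed `count` column with `layer_data()`; `bar_data` is an illustrative name.

```r
library(ggplot2)

p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

# One row per cut level, with the count computed by stat_count().
bar_data <- layer_data(p)
bar_data[, c("x", "count")]

# The same counts, tabulated directly from the data.
table(diamonds$cut)
```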
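Position adjustments for bar charts: with position = "dodge", bars that share an x value are placed beside one another instead of being stacked.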
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
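For comparison, here is a sketch of another bar-chart adjustment. position = "fill" stacks the bars like the default "stack", but rescales every stack to the same height of 1, which makes proportions across cuts easier to compare. The check reads the top of each stack from the computed `ymax` column.

```r
library(ggplot2)

p_fill <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")

fill_data <- layer_data(p_fill)

# Top of the stack for each cut: 1 after rescaling.
stack_tops <- tapply(fill_data$ymax, as.numeric(fill_data$x), max)
stack_tops
```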
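Position adjustments for scatterplots: position = "jitter" adds a small amount of random noise to each point, spreading out points that would otherwise overlap.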
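This sketch first counts how many cars share an exact (displ, hwy) pair with an earlier row, so their points are hidden behind another point. It then applies the jitter adjustment. `set.seed()` is called before the plot is built so the random noise is reproducible; `n_hidden` and `jittered` are illustrative names.

```r
library(ggplot2)

# Rows whose (displ, hwy) pair already occurred: drawn on top of another point.
n_hidden <- sum(duplicated(mpg[, c("displ", "hwy")]))
n_hidden

# position = "jitter" nudges every point by a small random amount.
set.seed(42)
p_jitter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(position = "jitter")
jittered <- layer_data(p_jitter)

# The drawn x positions no longer equal the recorded displ values.
head(cbind(displ = mpg$displ, drawn_x = jittered$x))
```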
Coordinate systems: coord_flip() switches the x and y axes, and coord_quickmap() sets the aspect ratio correctly for maps.
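A short sketch of coord_flip(). The boxplots are computed exactly as usual, but the class labels move to the vertical axis, where long category names stay readable.

```r
library(ggplot2)

p_flip <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot() +
  coord_flip()  # swap the x and y axes when drawing

class(p_flip$coordinates)[1]
```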
Polar coordinates reveal an interesting connection between a bar chart and a Coxcomb chart.
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut)) +
  coord_polar()
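3.8 Exercises
1. What is the problem with this plot? How could you improve it? The plot suffers from overplotting: cty and hwy are recorded as whole numbers, so many cars share the same (cty, hwy) location and their points are drawn on top of each other. I would improve the plot by using the jitter position adjustment to spread the overlapping points out.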
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point()
2. What parameters to geom_jitter() control the amount of jittering?
width controls the amount of horizontal displacement, and height
controls the amount of vertical displacement.
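Applied to the overplotted cty/hwy plot from question 1, a sketch might look like this. Each point moves by at most `width` horizontally, and `height = 0` keeps the vertical positions exact. The values 0.3 and 0 are arbitrary choices for illustration; `set.seed()` makes the noise reproducible.

```r
library(ggplot2)

set.seed(1)
p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter(width = 0.3, height = 0)  # horizontal jitter only

jitter_data <- layer_data(p)

# Largest horizontal move (at most 0.3) and largest vertical move (0).
max(abs(jitter_data$x - mpg$cty))
max(abs(jitter_data$y - mpg$hwy))
```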
3. Compare and contrast geom_jitter() with geom_count(). geom_jitter() adds random variation to the locations of the points, so overlapping points become visible but are no longer drawn at their exact values. geom_count() keeps the points in place and sizes each point by the number of observations at that location: combinations of (x, y) values with more observations are drawn larger than those with fewer observations.
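The following sketch shows what geom_count() computes. Its stat, stat_sum(), collapses points that share a location into one row and records how many observations it stands for in `n`. Point size is then mapped to `n`. `count_data` is an illustrative name.

```r
library(ggplot2)

p_count <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_count()

# One row per distinct (cty, hwy) location, with its number of cars in `n`.
count_data <- layer_data(p_count)
head(count_data[, c("x", "y", "n")])
```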
4. What’s the default position adjustment for geom_boxplot()? Create a visualisation of the mpg dataset that demonstrates it.
The default position for geom_boxplot() is "dodge2", which is a shortcut for position_dodge2(). This position adjustment does not change the vertical position of a geom but moves it horizontally to avoid overlapping other geoms.
ggplot(data = mpg, aes(x = drv, y = hwy, colour = class)) +
  geom_boxplot()
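To confirm the default without reading the source, you can inspect the position object stored in the layer, as sketched here.

```r
library(ggplot2)

p <- ggplot(data = mpg, aes(x = drv, y = hwy, colour = class)) +
  geom_boxplot()

# The position adjustment ggplot2 attached to the boxplot layer.
class(p$layers[[1]]$position)[1]
```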
The grammar of graphics is based on the insight that you can uniquely describe any plot as a combination of:
* a dataset,
* a geom,
* a set of mappings,
* a stat,
* a position adjustment,
* a coordinate system, and
* a faceting scheme.
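As a sketch, here is one plot with every component in that list spelled out explicitly, including the defaults that are usually left implicit. The choices of stat = "identity", position = "identity" and coord_cartesian() reproduce geom_point()'s usual behaviour; facet_wrap(~ class) is just an example faceting scheme.

```r
library(ggplot2)

p <- ggplot(data = mpg) +                              # dataset
  geom_point(                                          # geom
    mapping = aes(x = displ, y = hwy, colour = drv),   # mappings
    stat = "identity",                                 # stat
    position = "identity"                              # position adjustment
  ) +
  coord_cartesian() +                                  # coordinate system
  facet_wrap(~ class)                                  # faceting scheme
```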
3.9.1 Exercises
2. What does labs() do? Read the documentation. The labs() function sets a plot's labels: the axis titles, the plot title and subtitle, the caption, and the legend titles.
3. What’s the difference between coord_quickmap() and coord_map()? coord_map() uses a map projection to project the three-dimensional Earth onto a two-dimensional plane, while coord_quickmap() uses a faster approximation that preserves straight lines.
4. What does the plot below tell you about the relationship between city and highway mpg? Why is coord_fixed() important? What does geom_abline() do?
p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() +
  geom_abline()
p + coord_fixed()
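A sketch that inspects the two pieces this question asks about. Called with no arguments, geom_abline() draws the reference line y = x (intercept 0, slope 1). coord_fixed() defaults to a ratio of 1, so one unit of cty has the same on-screen length as one unit of hwy, and the y = x line is drawn at 45 degrees. Points above that line are cars with better highway than city mileage, and the last line computes their share. `p_fixed` and `abline_data` are illustrative names.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() +
  geom_abline()
p_fixed <- p + coord_fixed()

# The reference line's parameters as computed for the second layer.
abline_data <- layer_data(p_fixed, 2)
abline_data[, c("intercept", "slope")]

# Aspect ratio enforced by coord_fixed(): units on y per unit on x.
p_fixed$coordinates$ratio

# Share of cars above the y = x line (hwy greater than cty).
mean(mpg$hwy > mpg$cty)
```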