library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(ggplot2)

Introduction

The purpose of this notebook is to <explain how to format and turn in an assignment. This example assumes you have been asked to do a set of problems; we will use R4DS, “R for Data Science” (https://r4ds.had.co.nz/index.html).

To turn in your work, you will create an RPubs file and send me the link. We’ll talk about that at the end of this presentation.>

<Text contained between <> is instructions from me and would not be required in your submission.>

Doing an Exercise from a Book

Exercise 3.3.1

What’s gone wrong with this code? Why are the points not blue?¹

ggplot(data = mpg) +
  geom_point(mapping=aes(x = displ,y = hwy, color = "blue"))

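Answer

The color argument is inside the aes() function, so ggplot2 treats "blue" as a data value (a variable with a single level) and assigns it a color from the default scale, instead of using it as a literal color. To fix it we must move the color assignment out of aes() but keep it as an argument of geom_point().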

ggplot(data = mpg) +
  geom_point(mapping=aes(x = displ,y = hwy), color = "blue")
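The difference between mapping and setting an aesthetic can be checked directly. This is a minimal sketch, assuming ggplot2's layer_data() helper is used to inspect what each layer draws; the object names p_mapped and p_set are illustrative and not part of the exercise.

```r
library(ggplot2)

# color inside aes(): "blue" is treated as a one-level variable and
# is passed through the default discrete colour scale
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# color outside aes(): "blue" is used as a literal colour
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

mapped_colours <- unique(layer_data(p_mapped)$colour)
set_colours <- unique(layer_data(p_set)$colour)

print(mapped_colours)  # a colour picked by the scale, not "blue"
print(set_colours)
```

Only the second plot stores the literal value "blue" for its points.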

Exercise 3.3.1

2. Which variables in mpg are categorical? Which variables are continuous? (Hint: type ?mpg to read the documentation for the dataset). How can you see this information when you run mpg? ²

Answer:

variable name	categorical or continuous
manufacturer	categorical
model	categorical
displ	continuous
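The rest of the table can be filled in by asking R for the type of each column. The sketch below assumes that character columns are the categorical ones; integer columns such as cyl or year are numeric in storage but could reasonably be treated as categorical, so the split is only a starting point. The names column_types, categorical_columns, and numeric_columns are illustrative.

```r
library(ggplot2)

# Report the storage class of every column in mpg
column_types <- vapply(mpg, function(column) class(column)[1], character(1))
print(column_types)

# Character columns are categorical; the rest are stored as numbers
categorical_columns <- names(column_types)[column_types == "character"]
numeric_columns <- names(column_types)[column_types != "character"]

print(categorical_columns)
print(numeric_columns)
```

The same information appears in angle brackets (<chr>, <int>, <dbl>) under each column name when mpg is printed.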

3.2.4 Exercises

Run ggplot(data = mpg). What do you see?

Answer

ggplot(data = mpg) creates the background of the plot, but because no layer is specified with a geom function, nothing is drawn. We see an empty plot.

2. How many rows are in mpg? How many columns?

Answer

dim(mpg)

## [1] 234  11

To get the dimensions of a data frame, we can use the function dim(). mpg has 234 rows and 11 columns.

3. What does the drv variable describe? Read the help for ?mpg to find out.

Answer

The drv variable is a categorical variable that classifies cars as front-wheel, rear-wheel, or four-wheel drive: f = front-wheel drive, r = rear-wheel drive, 4 = four-wheel drive.

Make a scatterplot of hwy vs cyl.

Answer

This code creates a scatterplot of hwy and cyl.

ggplot(mpg, aes(x = hwy, y = cyl)) + geom_point()

What happens if you make a scatterplot of class vs drv? Why is the plot not useful?

Answer

When we make a scatterplot of class vs drv, the resulting scatterplot has only a few points.

ggplot(mpg, aes(x = class, y = drv)) + geom_point()

Since drv and class are both categorical variables, they take a small number of values, so there is a limited number of unique combinations of x and y values that can be displayed. Many cars share each position and are drawn on top of one another, so a scatterplot is not a useful way to display these variables.
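To see how much information the scatterplot hides, we can count the cars at each position. This sketch uses base R's table() for the counts and geom_count() as one possible alternative plot; the names combo_counts and p_count are illustrative.

```r
library(ggplot2)

# Count how many cars fall on each (class, drv) position
combo_counts <- as.data.frame(table(class = mpg$class, drv = mpg$drv))
combo_counts <- combo_counts[combo_counts$Freq > 0, ]
print(combo_counts)

# geom_count() sizes each point by the number of overlapping observations
p_count <- ggplot(mpg, aes(x = class, y = drv)) +
  geom_count()
print(p_count)
```

Every row of combo_counts is a single point in the original scatterplot, no matter how many cars it stands for.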

3.5.1 Exercises

What happens if you facet on a continuous variable?

Answer

If we facet on a continuous variable, it is converted to a categorical variable and the plot contains a facet for each distinct value. See the example below.

ggplot(mpg, aes(x = displ, y = hwy)) + geom_point() + facet_grid(~ cty)
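The number of panels can be predicted before plotting. The sketch below counts the distinct values of cty and then, as a hypothetical alternative that is not part of the exercise, bins cty into three ranges with ggplot2's cut_number() so that the facets stay readable. The names n_panels, mpg_binned, cty_bin, and p_binned are illustrative.

```r
library(ggplot2)

# One panel is drawn for every distinct value of the faceting variable
n_panels <- length(unique(mpg$cty))
print(n_panels)

# Alternative: bin cty into three groups of roughly equal size first
mpg_binned <- transform(mpg, cty_bin = cut_number(cty, 3))

p_binned <- ggplot(mpg_binned, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(~ cty_bin)
print(p_binned)
```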

What do the empty cells in plot with facet_grid(drv ~ cyl) mean? How do they relate to this plot?

Answer

Empty cells in the facets mean that there is nothing that fits in that combination of variables

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = drv, y = cyl))

For instance, the plot above shows that there are no cars with four-wheel drive that have five cylinders, so facet_grid(drv ~ cyl) has an empty cell for that combination.

What plots does the following code make? What does . do?

Answer

As seen below, the first plot has its facets arranged in rows and the second plot has its facets arranged in columns. The dot (.) is a placeholder in facet_grid() for the dimension that is not faceted, so a single variable can be faceted along the rows (drv ~ .) or along the columns (. ~ cyl).

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .)

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)

Take the first faceted plot in this section: What are the advantages to using faceting instead of the colour aesthetic? What are the disadvantages? How might the balance change if you had a larger dataset?

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)

Answer

One advantage of faceting over the colour aesthetic is that the data is separated, so the trend for each value of the faceted variable can be examined individually. Faceting also reduces overlapping points. The disadvantage is that direct comparisons between values of the faceted variable are harder to make, because the points sit in different panels. With a larger dataset, faceting would be much more useful than colour because it keeps the graph from becoming too crowded.

Read ?facet_wrap. What does nrow do? What does ncol do? What other options control the layout of the individual panels? Why doesn’t facet_grid() have nrow and ncol arguments?

Answer

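In facet_wrap(), nrow and ncol set the number of rows and columns of panels. Other options that control the layout of the panels include dir and as.table. facet_grid() does not have nrow and ncol arguments because the number of unique values of its two faceting variables already determines how many rows and columns there are. For instance: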

ggplot(mpg) + geom_point(aes(displ, hwy)) + facet_wrap(~ cyl, dir = "v", as.table = FALSE)
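The effect of nrow and ncol is easiest to see side by side. This is a small sketch using the class variable, which has seven levels; the object names are illustrative.

```r
library(ggplot2)

base_plot <- ggplot(mpg) +
  geom_point(aes(displ, hwy))

# nrow = 2 spreads the seven class panels over two rows
p_two_rows <- base_plot + facet_wrap(~ class, nrow = 2)

# ncol = 2 fixes two columns instead; the number of rows follows from that
p_two_cols <- base_plot + facet_wrap(~ class, ncol = 2)

# facet_grid() needs neither: rows come from drv, columns from cyl
p_grid <- base_plot + facet_grid(drv ~ cyl)

print(p_two_rows)
print(p_two_cols)
print(p_grid)
```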

When using facet_grid() you should usually put the variable with more unique levels in the columns. Why?

Answer

When using facet_grid(), you should usually put the variable with more unique levels in the columns. See the plot below.

ggplot(mpg) + geom_point(aes(displ, hwy)) + facet_grid(drv ~ class)

As the plot shows, the default plot area is wider than it is tall. Therefore, the panels are easier to read when the variable with more unique levels is in the columns instead of the rows.

3.6.1 Exercises

What geom would you use to draw a line chart? A boxplot? A histogram? An area chart?

Answer

geom_line() = line chart
geom_boxplot() = boxplot
geom_histogram() = histogram
geom_area() = area chart

Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
  geom_point() + 
  geom_smooth(se = FALSE)

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Answer

The graph shows engine displacement on the x axis and highway miles per gallon on the y axis, with everything colored by the type of drive, so there are three colors in total. Each car is drawn as a point (colored by drive type), and there are three fitted trend lines, one per drive type and colored to match, without shading for the standard error.

What does show.legend = FALSE do? What happens if you remove it? Why do you think I used it earlier in the chapter?

Answer

show.legend = FALSE hides the legend for that layer. If you remove it, the legend is displayed with the graph. The legend was likely hidden earlier in the chapter because including it changes the size of the plotting area, which makes the plot harder to compare with the similar plots beside it. For instance, the following plot leaves out show.legend = FALSE, so the legend is drawn.

ggplot(mpg) + geom_smooth(aes(displ, hwy, color = drv))

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

What does the se argument to geom_smooth() do?

Answer

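se controls whether geom_smooth() displays the confidence interval around the trend line. The interval is drawn as a shaded band when se = TRUE (the default) and is hidden when se = FALSE.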

Will these two graphs look different? Why/why not?

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth()

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot() + 
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Answer

These two graphs will look the same because they contain the same specifications. In the first graph the data and aesthetics are specified once for the entire graph, while in the second graph the same data and aesthetics are repeated in each layer.
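The claim can also be checked without looking at the plots. This sketch assumes that comparing the data each layer draws, as returned by ggplot2's layer_data(), is a fair test of whether two plots look the same; the object names are illustrative.

```r
library(ggplot2)

# Data and aesthetics given once, inherited by both layers
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()

# Data and aesthetics repeated in each layer
p_per_layer <- ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))

# Compare what is drawn, layer by layer (1 = points, 2 = smooth)
same_points <- isTRUE(all.equal(layer_data(p_global, 1),
                                layer_data(p_per_layer, 1)))
same_smooth <- isTRUE(all.equal(layer_data(p_global, 2),
                                layer_data(p_per_layer, 2)))
print(c(points = same_points, smooth = same_smooth))
```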

Recreate the R code necessary to generate the following graphs.

Graph I.

ggplot(mpg, aes(displ, hwy)) + geom_point() + geom_smooth(se = FALSE)

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Graph II.

ggplot(mpg, aes(displ, hwy)) + geom_point() + 
  geom_smooth(aes(group = drv), se = FALSE)

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Graph III.

ggplot(mpg, aes(displ, hwy, color = drv)) +
  geom_point() + geom_smooth(se = FALSE)

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Graph IV.

ggplot(mpg, aes(displ, hwy)) + geom_point(aes(color = drv)) + geom_smooth(se = FALSE)

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Graph V.

ggplot(mpg, aes(displ, hwy)) + 
  geom_point(aes(color = drv)) + 
  geom_smooth(aes(linetype = drv), se = FALSE)

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Graph VI.

ggplot(mpg, aes(displ, hwy)) + geom_point(color = "white", size = 4) + geom_point(aes(color = drv))

3.7.1 Exercises

I. What is the default geom associated with stat_summary()? How could you rewrite the previous plot to use that geom function instead of the stat function?

Answer

The default geom for stat_summary() is geom_pointrange(). See the plots below.

ggplot(data = diamonds) +
  geom_pointrange(
    mapping = aes(x = cut, y = depth),
    stat = "summary"
  )

## No summary function supplied, defaulting to `mean_se()`

The resulting message says that stat_summary() defaults to mean_se(), which uses the mean and the standard error to calculate the middle point and the endpoints of the line. However, in the original plot the min and max values were used for the endpoints and the median for the middle point. To recreate the original plot we need to specify values for fun.min, fun.max, and fun.

ggplot(data = diamonds) +
  geom_pointrange(
    mapping = aes(x = cut, y = depth),
    stat = "summary",
    fun.min = min,
    fun.max = max,
    fun = median
  )
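To confirm that the geom version reproduces the stat version, the summaries computed by both can be compared. This sketch assumes the original plot was built with stat_summary() using fun.min = min, fun.max = max, and fun = median, as described above; the object names are illustrative.

```r
library(ggplot2)

# geom function with the summary stat
p_geom <- ggplot(data = diamonds) +
  geom_pointrange(
    mapping = aes(x = cut, y = depth),
    stat = "summary",
    fun.min = min,
    fun.max = max,
    fun = median
  )

# stat function with its default geom (pointrange)
p_stat <- ggplot(data = diamonds) +
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun.min = min,
    fun.max = max,
    fun = median
  )

summary_columns <- c("ymin", "y", "ymax")
summary_geom <- layer_data(p_geom)[, summary_columns]
summary_stat <- layer_data(p_stat)[, summary_columns]
print(summary_stat)
```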

What does geom_col() do? How is it different to geom_bar()?

Answer

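The geom_col() function has a different default stat than geom_bar(). The default stat of geom_col() is stat_identity(), which leaves the data as is, so geom_col() expects the data to contain x values and y values that represent the bar heights. The default stat of geom_bar() is stat_count(), so geom_bar() expects only an x variable and computes the bar heights by counting the observations in each group.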
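A short sketch of the difference: geom_bar() can be given the raw diamonds data, while geom_col() needs the counts to be computed first. Here the counts come from base R's table(); the names cut_counts, p_bar, and p_col are illustrative.

```r
library(ggplot2)

# geom_bar() counts the rows in each cut for us (stat_count)
p_bar <- ggplot(diamonds) +
  geom_bar(aes(x = cut))

# geom_col() draws the heights it is given (stat_identity),
# so the counts have to be computed beforehand
cut_counts <- as.data.frame(table(cut = diamonds$cut))
p_col <- ggplot(cut_counts) +
  geom_col(aes(x = cut, y = Freq))

bar_heights <- layer_data(p_bar)$y
col_heights <- layer_data(p_col)$y
print(bar_heights)
print(col_heights)
```

Both plots end up with the same bar heights; they differ only in where the counting happens.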

Most geoms and stats come in pairs that are almost always used in concert. Read through the documentation and make a list of all the pairs. What do they have in common?

Answer

The following table lists the pairs of geoms and stats that are almost always used in concert.

Complementary geoms and stats
geom	stat
`geom_bar()`	`stat_count()`
`geom_bin2d()`	`stat_bin_2d()`
`geom_boxplot()`	`stat_boxplot()`
`geom_contour_filled()`	`stat_contour_filled()`
`geom_contour()`	`stat_contour()`
`geom_count()`	`stat_sum()`
`geom_density_2d()`	`stat_density_2d()`
`geom_density()`	`stat_density()`
`geom_dotplot()`	`stat_bindot()`
`geom_function()`	`stat_function()`
`geom_sf()`	`stat_sf()`
`geom_smooth()`	`stat_smooth()`
`geom_violin()`	`stat_ydensity()`
`geom_hex()`	`stat_bin_hex()`
`geom_qq_line()`	`stat_qq_line()`
`geom_qq()`	`stat_qq()`
`geom_quantile()`	`stat_quantile()`

These pairs of geoms and stats tend to have their names in common, such as stat_smooth() and geom_smooth(), and to be documented on the same help page. The pairs of geoms and stats that are used in concert often have each other as the default stat (for a geom) or geom (for a stat).

The following tables contain the geoms and stats in ggplot2 and their defaults as of version 3.3.0. Many geoms have stat_identity() as the default stat.

ggplot2 geom layers and their default stats.
geom	default stat	shared docs
`geom_abline()`	`stat_identity()`
`geom_area()`	`stat_identity()`
`geom_bar()`	`stat_count()`	x
`geom_bin2d()`	`stat_bin_2d()`	x
`geom_blank()`	None
`geom_boxplot()`	`stat_boxplot()`	x
`geom_col()`	`stat_identity()`
`geom_count()`	`stat_sum()`	x
`geom_contour_filled()`	`stat_contour_filled()`	x
`geom_contour()`	`stat_contour()`	x
`geom_crossbar()`	`stat_identity()`
`geom_curve()`	`stat_identity()`
`geom_density_2d_filled()`	`stat_density_2d_filled()`	x
`geom_density_2d()`	`stat_density_2d()`	x
`geom_density()`	`stat_density()`	x
`geom_dotplot()`	`stat_bindot()`	x
`geom_errorbar()`	`stat_identity()`
`geom_errorbarh()`	`stat_identity()`
`geom_freqpoly()`	`stat_bin()`	x
`geom_function()`	`stat_function()`	x
`geom_hex()`	`stat_bin_hex()`	x
`geom_histogram()`	`stat_bin()`	x
`geom_hline()`	`stat_identity()`
`geom_jitter()`	`stat_identity()`
`geom_label()`	`stat_identity()`
`geom_line()`	`stat_identity()`
`geom_linerange()`	`stat_identity()`
`geom_map()`	`stat_identity()`
`geom_path()`	`stat_identity()`
`geom_point()`	`stat_identity()`
`geom_pointrange()`	`stat_identity()`
`geom_polygon()`	`stat_identity()`
`geom_qq_line()`	`stat_qq_line()`	x
`geom_qq()`	`stat_qq()`	x
`geom_quantile()`	`stat_quantile()`	x
`geom_raster()`	`stat_identity()`
`geom_rect()`	`stat_identity()`
`geom_ribbon()`	`stat_identity()`
`geom_rug()`	`stat_identity()`
`geom_segment()`	`stat_identity()`
`geom_sf_label()`	`stat_sf_coordinates()`	x
`geom_sf_text()`	`stat_sf_coordinates()`	x
`geom_sf()`	`stat_sf()`	x
`geom_smooth()`	`stat_smooth()`	x
`geom_spoke()`	`stat_identity()`
`geom_step()`	`stat_identity()`
`geom_text()`	`stat_identity()`
`geom_tile()`	`stat_identity()`
`geom_violin()`	`stat_ydensity()`	x
`geom_vline()`	`stat_identity()`

ggplot2 stat layers and their default geoms.
stat	default geom	shared docs
`stat_bin_2d()`	`geom_tile()`
`stat_bin_hex()`	`geom_hex()`	x
`stat_bin()`	`geom_bar()`	x
`stat_boxplot()`	`geom_boxplot()`	x
`stat_count()`	`geom_bar()`	x
`stat_contour_filled()`	`geom_contour_filled()`	x
`stat_contour()`	`geom_contour()`	x
`stat_density_2d_filled()`	`geom_density_2d_filled()`	x
`stat_density_2d()`	`geom_density_2d()`	x
`stat_density()`	`geom_area()`
`stat_ecdf()`	`geom_step()`
`stat_ellipse()`	`geom_path()`
`stat_function()`	`geom_function()`	x
`stat_function()`	`geom_path()`
`stat_identity()`	`geom_point()`
`stat_qq_line()`	`geom_path()`
`stat_qq()`	`geom_point()`
`stat_quantile()`	`geom_quantile()`	x
`stat_sf_coordinates()`	`geom_point()`
`stat_sf()`	`geom_rect()`
`stat_smooth()`	`geom_smooth()`	x
`stat_sum()`	`geom_point()`
`stat_summary_2d()`	`geom_tile()`
`stat_summary_bin()`	`geom_pointrange()`
`stat_summary_hex()`	`geom_hex()`
`stat_summary()`	`geom_pointrange()`
`stat_unique()`	`geom_point()`

What variables does stat_smooth() compute? What parameters control its behaviour?

Answer

The function stat_smooth() calculates the following variables:

y: predicted value
ymin: lower value of the confidence interval
ymax: upper value of the confidence interval
se: standard error

The parameters that control the behavior of stat_smooth() include:

method: This is the method used to compute the smoothing line. If NULL, a default method is used based on the sample size: stats::loess() when there are fewer than 1,000 observations in a group, and mgcv::gam() with formula = y ~ s(x, bs = "cs") otherwise. Alternatively, the user can provide a character vector with a function name, e.g. "lm", "loess", or a function, e.g. MASS::rlm.
formula: When providing a custom method argument, the formula to use. The default is y ~ x. For example, to use the line implied by lm(y ~ x + I(x ^ 2) + I(x ^ 3)), use method = "lm" or method = lm and formula = y ~ x + I(x ^ 2) + I(x ^ 3).
method.args: Arguments other than the formula, which is already specified in the formula argument, to pass to the function in method.
se: If TRUE, display standard error bands; if FALSE, only display the line.
na.rm: If FALSE, missing values are removed with a warning; if TRUE, they are silently removed. The default is FALSE in order to make debugging easier. If missing values are known to be in the data, the warning can be ignored, but if missing values are not anticipated, this warning can help with debugging.
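The parameters and computed variables above can be seen together in one call. This is a sketch, not the chapter's own example: it fits the cubic polynomial mentioned under formula, and it also sets level, a geom_smooth() argument not listed above that controls the width of the confidence interval. The names p_cubic and smooth_data are illustrative.

```r
library(ggplot2)

# Cubic polynomial fitted with lm(), with a 90% confidence band
p_cubic <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(
    method = "lm",
    formula = y ~ x + I(x^2) + I(x^3),
    se = TRUE,
    level = 0.90
  )

# Layer 2 is the smooth; its data holds the variables
# computed by stat_smooth()
smooth_data <- layer_data(p_cubic, 2)
print(head(smooth_data[, c("x", "y", "ymin", "ymax", "se")]))
```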

In our proportion bar chart, we need to set group = 1. Why? In other words what is the problem with these two graphs?

Answer

If group = 1 is not included, then all the bars in the plot will have the same height, a height of 1. The function geom_bar() assumes that the groups are equal to the x values since the stat computes the counts within the group.

See examples of charts below

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = ..prop..))

The problem with these plots is that the proportions are calculated within the groups.

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = ..prop..))

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = color, y = ..prop..))

The following code will produce the intended stacked bar charts for the case with no fill aesthetic.

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))

With the fill aesthetic, the heights of the bars need to be normalized.

ggplot(data = diamonds) + 
  geom_bar(aes(x = cut, y = ..count.. / sum(..count..), fill = color))
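The effect of group = 1 can be read off the computed proportions. This sketch uses after_stat(prop), which is the newer spelling of ..prop.. in ggplot2 3.3.0 and later, and inspects the values with layer_data(); the object names are illustrative.

```r
library(ggplot2)

# Without group = 1, each bar is its own group
p_no_group <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop)))

# With group = 1, all bars belong to one group
p_group <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

prop_no_group <- layer_data(p_no_group)$prop
prop_group <- layer_data(p_group)$prop

print(prop_no_group)        # every bar is 100% of its own group
print(round(prop_group, 3)) # shares of the whole dataset
```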

Exercise 3.8.1

What is the problem with this plot? How could you improve it?

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
  geom_point()

Answer

There is overplotting because there are multiple observations for each combination of cty and hwy values.

I would improve the plot by using a jitter position adjustment to decrease overplotting. For instance:

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(position = "jitter")

As you can see, the relationship between cty and hwy is clear even without jittering the points, but jittering shows the locations where there are more observations.

What parameters to geom_jitter() control the amount of jittering?

From the geom_jitter() documentation, there are two arguments to jitter:

width controls the amount of horizontal displacement, and
height controls the amount of vertical displacement.

The default values of width and height introduce noise in both directions.

However, we can change these parameters. Here are a few examples to understand how these parameters affect the amount of jittering. When width = 0 there is no horizontal jitter.

Example one

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter(width = 0)

Example two

When width = 20, there is too much horizontal jitter.

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter(width = 20)

Example three

When height = 0, there is no vertical jitter

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter(height = 0)

Example four

When height = 15, there is too much vertical jitter.

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter(height = 15)

Example five

When width = 0 and height = 0, there is neither horizontal nor vertical jitter, and the plot produced is identical to the one produced with geom_point().

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter(height = 0, width = 0)

In summary

Note that the height and width arguments are in the units of the data. Thus height = 1 (width = 1) corresponds to different relative amounts of jittering depending on the scale of the y (x) variable. The default values of height and width are 40% of the resolution() of the data, which is the smallest non-zero distance between adjacent values of a variable. Since the jitter moves points in both positive and negative directions, the points spread over 80% of the resolution in total. When x and y are discrete variables, their resolutions are both equal to 1, so height = 0.4 and width = 0.4.

Also, the default values of height and width in geom_jitter() are non-zero, so unless both height and width are explicitly set to 0, there will be some jitter.

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter()
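The default amounts can be worked out by hand with ggplot2's resolution() function. In this sketch the defaults are then written out explicitly through position_jitter(); the seed value is an arbitrary choice that only makes the random noise repeatable, and the object names are illustrative.

```r
library(ggplot2)

# cty and hwy are integers, so adjacent values are 1 apart
res_x <- resolution(mpg$cty)
res_y <- resolution(mpg$hwy)

# Default jitter: 40% of the resolution in each direction
default_width <- 0.4 * res_x
default_height <- 0.4 * res_y
print(c(width = default_width, height = default_height))

# The same plot as geom_jitter(), with the defaults spelled out
p_jitter <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(
    position = position_jitter(
      width = default_width,
      height = default_height,
      seed = 1
    )
  )

# No point moves further than the chosen width or height
jittered <- layer_data(p_jitter)
max_shift_x <- max(abs(jittered$x - mpg$cty))
max_shift_y <- max(abs(jittered$y - mpg$hwy))
```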

Compare and contrast geom_jitter() with geom_count().

Answer

The geom geom_jitter() adds random variation to the locations of the points in the graph. In other words, it “jitters” the locations of points slightly. This method reduces overplotting since two points with the same location are unlikely to have the same random variation.

Examples

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter()

However, the reduction in overlapping comes at the cost of slightly changing the x and y values of the points.

The geom geom_count() sizes the points relative to the number of observations. Combinations of (x, y) values with more observations will be larger than those with fewer observations.

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_count()

The geom_count() geom does not change x and y coordinates of the points. However, if the points are close together and counts are large, the size of some points can itself create overplotting. For example, in the following example, a third variable mapped to color is added to the plot. In this case, geom_count() is less readable than geom_jitter() when adding a third variable as a color aesthetic.

ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = class)) +
  geom_jitter()

ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = class)) +
  geom_count()

ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = class)) +
  geom_count(position = "jitter")

From the charts above, combining geom_count() with jitter, which is specified with the position argument to geom_count() rather than with its own geom, helps with overplotting a little.

But as this example shows, unfortunately, there is no universal solution to overplotting. The costs and benefits of different approaches will depend on the structure of the data and the goal of the data scientist.

What’s the default position adjustment for geom_boxplot()? Create a visualisation of the mpg dataset that demonstrates it.

Answer

The default position for geom_boxplot() is "dodge2", which is a shortcut for position_dodge2. This position adjustment does not change the vertical position of a geom but moves the geom horizontally to avoid overlapping other geoms. See the documentation for position_dodge2() for additional discussion on how it works.

For example, when we add colour = class to the box plot, the boxplots for the different classes within each level of the drv variable are placed side by side, i.e., dodged.

ggplot(data = mpg, aes(x = drv, y = hwy, colour = class)) +
  geom_boxplot()

If position_identity() is used the boxplots overlap.

Example below

ggplot(data = mpg, aes(x = drv, y = hwy, colour = class)) +
  geom_boxplot(position = "identity")

Exercise 3.9.1

Turn a stacked bar chart into a pie chart using coord_polar().

Answer

A pie chart is a stacked bar chart with the addition of polar coordinates. Take this stacked bar chart with a single category.

ggplot(mpg, aes(x = factor(1), fill = drv)) +
  geom_bar()

Now add coord_polar(theta = "y") to create a pie chart.

ggplot(mpg, aes(x = factor(1), fill = drv)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")

The argument theta = "y" maps y to the angle of each section.

If coord_polar() is specified without theta = "y", then the resulting plot is called a bulls-eye chart.

ggplot(mpg, aes(x = factor(1), fill = drv)) +
  geom_bar(width = 1) +
  coord_polar()

What does labs() do? Read the documentation.

Answer

The labs function adds axis titles, plot titles, and a caption to the plot.

For instance

ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot() +
  coord_flip() +
  labs(y = "Highway MPG",
       x = "Class",
       title = "Highway MPG by car class",
       subtitle = "1999-2008",
       caption = "Source: http://fueleconomy.gov")

The arguments to labs() are optional, so you can add as many or as few of these as are needed.

Another example plot

ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot() +
  coord_flip() +
  labs(y = "Highway MPG",
       x = "Class",
       title = "Highway MPG by car class")

The labs() function is not the only function that adds titles to plots. The xlab(), ylab(), and x- and y-scale functions can add axis titles. The ggtitle() function adds plot titles.
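As a small illustration of that last point, the two plots below should carry the same labels. The first uses labs() and the second uses the single-purpose helpers; the object names are illustrative.

```r
library(ggplot2)

base_plot <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot()

# All labels in one call
p_labs <- base_plot +
  labs(x = "Class", y = "Highway MPG", title = "Highway MPG by car class")

# The same labels, one helper function per label
p_helpers <- base_plot +
  xlab("Class") +
  ylab("Highway MPG") +
  ggtitle("Highway MPG by car class")

print(p_labs)
print(p_helpers)
```

labs() is usually the more convenient choice when several labels are set at once.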

What’s the difference between coord_quickmap() and coord_map()?

Answer

The coord_map() function uses map projections to project the three-dimensional Earth onto a two-dimensional plane. By default, coord_map() uses the Mercator projection. This projection is applied to all the geoms in the plot. The coord_quickmap() function uses an approximate but faster map projection. This approximation ignores the curvature of Earth and adjusts the map for the latitude/longitude ratio. The coord_quickmap() projection is faster than coord_map() both because the projection is computationally easier and because, unlike coord_map(), the coordinates of the individual geoms do not need to be transformed.

See the coord_map() documentation for more information on these functions and some examples.

What does the plot below tell you about the relationship between city and highway mpg? Why is coord_fixed() important? What does geom_abline() do?

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() + 
  geom_abline() +
  coord_fixed()

Answer

The function coord_fixed() ensures that the line produced by geom_abline() is at a 45-degree angle. A 45-degree line makes it easy to compare the highway and city mileage to the case in which city and highway MPG were equal.

for example

p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() +
  geom_abline()
p + coord_fixed()

If we didn’t include coord_fixed(), then the line would no longer have an angle of 45 degrees.

Exercise 4.1-3

1. Why does this code not work?

Answer

The variable being printed is my_varıable, not my_variable: the seventh character is “ı” (“LATIN SMALL LETTER DOTLESS I”), not “i”.

While it wouldn’t have helped much in this case, the importance of distinguishing characters in code is one reason why fonts which clearly distinguish similar characters are preferred in programming. It is especially important to distinguish between two sets of similar looking characters:

the numeral zero (0), the Latin small letter O (o), and the Latin capital letter O (O),
the numeral one (1), the Latin small letter I (i), the Latin capital letter I (I), and the Latin small letter L (l).

In these fonts, zero and the Latin letter O are often distinguished by using a glyph for zero that uses either a dot in the interior or a slash through it. Some examples of fonts with dotted or slashed zero glyphs are Consolas, Deja Vu Sans Mono, Monaco, Menlo, Source Sans Pro, and FiraCode.

Error messages of the form “object ‘…’ not found” mean exactly what they say. R cannot find an object with that name. Unfortunately, the error does not tell you why that object cannot be found, because R does not know the reason that the object does not exist. The most common scenarios in which I encounter this error message are

I forgot to create the object, or an error prevented the object from being created.

I made a typo in the object’s name, either when using it or when I created it (as in the example above), or I forgot what I had originally named it. If you find yourself often writing the wrong name for an object, it is a good indication that the original name was not a good one.

I forgot to load the package that contains the object using library().

my_variable <- 10
my_varıable
#> Error in eval(expr, envir, enclos): object 'my_varıable' not found
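When two names look identical on screen, R can point to the odd character. This is a minimal sketch using base R only; the strings stand in for the two variable names, and the helper names are illustrative.

```r
typed_name <- "my_var\u0131able"  # what was typed, with a dotless i
wanted_name <- "my_variable"      # what was meant

# The two names look alike but are different strings
print(identical(typed_name, wanted_name))

# Find every character outside the ASCII range
code_points <- utf8ToInt(typed_name)
non_ascii_position <- which(code_points > 127)
non_ascii_character <- intToUtf8(code_points[non_ascii_position])

print(non_ascii_position)
print(non_ascii_character)
```

The position reported matches the seventh character mentioned in the answer above.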

2. Tweak each of the following R commands so that they run correctly:

The error message is argument “data” is missing, with no default. This error is a result of a typo, dota instead of data.

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

R could not find the function fliter() because we made a typo: fliter instead of filter.

We aren’t done yet: the command also used cyl = 8 where a comparison was intended. The error message suggests using == instead, so let’s follow it.

filter(mpg, cyl == 8)

## # A tibble: 70 × 11
##    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a6 quattro   4.2  2008     8 auto… 4        16    23 p     mids…
##  2 chevrolet    c1500 sub…   5.3  2008     8 auto… r        14    20 r     suv  
##  3 chevrolet    c1500 sub…   5.3  2008     8 auto… r        11    15 e     suv  
##  4 chevrolet    c1500 sub…   5.3  2008     8 auto… r        14    20 r     suv  
##  5 chevrolet    c1500 sub…   5.7  1999     8 auto… r        13    17 r     suv  
##  6 chevrolet    c1500 sub…   6    2008     8 auto… r        12    17 r     suv  
##  7 chevrolet    corvette     5.7  1999     8 manu… r        16    26 p     2sea…
##  8 chevrolet    corvette     5.7  1999     8 auto… r        15    23 p     2sea…
##  9 chevrolet    corvette     6.2  2008     8 manu… r        16    26 p     2sea…
## 10 chevrolet    corvette     6.2  2008     8 auto… r        15    25 p     2sea…
## # … with 60 more rows

filter(diamonds, carat > 3)

## # A tibble: 32 × 10
##    carat cut     color clarity depth table price     x     y     z
##    <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1  3.01 Premium I     I1       62.7    58  8040  9.1   8.97  5.67
##  2  3.11 Fair    J     I1       65.9    57  9823  9.15  9.02  5.98
##  3  3.01 Premium F     I1       62.2    56  9925  9.24  9.13  5.73
##  4  3.05 Premium E     I1       60.9    58 10453  9.26  9.25  5.66
##  5  3.02 Fair    I     I1       65.2    56 10577  9.11  9.02  5.91
##  6  3.01 Fair    H     I1       56.1    62 10761  9.54  9.38  5.31
##  7  3.65 Fair    H     I1       67.1    53 11668  9.53  9.48  6.38
##  8  3.24 Premium H     I1       62.1    58 12300  9.44  9.4   5.85
##  9  3.22 Ideal   I     I1       62.6    55 12545  9.49  9.42  5.92
## 10  3.5  Ideal   H     I1       62.8    57 12587  9.65  9.59  6.03
## # … with 22 more rows

The original command failed with the error object 'diamond' not found because the dataset is named diamonds, not diamond.

Press Alt + Shift + K. What happens? How can you get to the same place using the menus?

Answer

This opens a menu of keyboard shortcuts. The same menu can be found under Tools -> Keyboard Shortcuts Help.

¹ R4DS
² R4DS

How to Do and Turn in an Assignment

Bai Sesay DATA101 Summer 2022

07/21/2022
