Problem 1.
Data visualization is an important tool for exploring and understanding our datasets. Humans are outstanding at capturing visual patterns, and data visualization tries to capitalize on this ability.

Problem 2.
The standard graphics
The grid graphics

Problem 3.
Those provided by the standard plotting functions, which build upon the facilities of the graphics system, and those available in the ggplot2 package, which builds upon the grid graphics system.

Problem 4.
Facets – divide a plot into subplots based on the values of one or more discrete variables.

Problem 5.
(a) 1. The labels of the values of the nominal variable may be too long.
2. Sometimes the differences between a few of the bars are almost indistinguishable.
(b) 1. It may be easier to read the graph if you plot the bars horizontally.
2. The differences are not noticeable because they really are small. Do not use graphs to lie about your data.
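
The horizontal-bar suggestion in (b) can be sketched as follows. This is only an illustration, assuming ggplot2 is installed and using the built-in iris data as a stand-in; coord_flip() is one possible way to turn the bars, not necessarily the one the course intended.

```r
library(ggplot2)

# Count the rows per species and draw the bars horizontally,
# which leaves more room for long category labels.
p <- ggplot(iris, aes(x = Species)) +
  geom_bar() +
  coord_flip()
print(p)
```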

Problem 6.
False

Problem 7.
It deals with the visualization of the values of a nominal variable. A graph shows as many bars as there are different values of the variable, with the height of the bars corresponding to the frequency of the values.
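
A minimal sketch of this idea with the standard graphics system, assuming the built-in iris data as an example: table() computes the frequency of each value and barplot() draws one bar per value.

```r
# Frequency of each value of the nominal variable Species
counts <- table(iris$Species)
print(counts)

# One bar per distinct value, bar height = frequency
barplot(counts, xlab = "Species", ylab = "Frequency")
```

In iris every species has the same number of rows, so the three bars have equal height.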

Problem 8.
(a) It provides functions such as ggpairs() and ggparcoord().
(b) The function automatically takes advantage of the symmetry of the matrix of graphs and uses the upper part of the matrix to show the correlation values between each pair of variables. Moreover, on the diagonal we get a continuous approximation of the distribution of the respective variable.
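
A short sketch of such a call, assuming the GGally package is installed and using the four numeric columns of iris purely as an example:

```r
library(ggplot2)
library(GGally)  # assumption: GGally is installed

# Scatterplot matrix of the four numeric iris columns.
# With the default settings the upper triangle shows correlations
# and the diagonal shows density estimates.
p <- ggpairs(iris, columns = 1:4)
print(p)
```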

Problem 9.
(a) This function allows you to indicate a nominal variable whose values will create a set of subplots that will be presented sequentially with reasonable wrapping around the screen space.

(b) This function allows us to set up a matrix of plots, with each dimension of the matrix getting as many plots as there are values of the respective variable. For each cell of this matrix, the graph specified before the facet is shown using only the subset of rows that have the respective values on the variables defining the grid.
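
The two behaviours described in (a) and (b) can be sketched as below, assuming the functions in question are facet_wrap() and facet_grid(). The mpg dataset shipped with ggplot2 and the chosen variables are only an illustration.

```r
library(ggplot2)

# The base plot that every facet will repeat on its own subset of rows
base <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# facet_wrap(): one subplot per value of class, wrapped across rows
p_wrap <- base + facet_wrap(~ class)

# facet_grid(): a matrix of plots, rows defined by drv and columns by cyl
p_grid <- base + facet_grid(drv ~ cyl)

print(p_wrap)
print(p_grid)
```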

Problem 10.
The aes function maps Sepal.Length to the x-axis and Sepal.Width to the y-axis of a ggplot plot.
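
As an illustration of this mapping, here is a small sketch assuming the iris data and a scatter plot; the choice of geom_point() is an assumption, since the answer only describes the mapping.

```r
library(ggplot2)

# aes() ties Sepal.Length to the x-axis and Sepal.Width to the y-axis;
# geom_point() is added here only so that something is drawn.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()
print(p)
```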

Problem 11.
It is used to relate two variables. For example, y ~ x means y is regressed on x in the context of regression.
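
A minimal sketch of the regression case, assuming the iris columns as example variables:

```r
# y ~ x: Sepal.Width is regressed on Sepal.Length
f <- Sepal.Width ~ Sepal.Length
fit <- lm(f, data = iris)

# The fitted model has an intercept and one slope for Sepal.Length
print(coef(fit))
```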

Problem 12.
Each is a mapping of the data to the visual. For example, mapping the value of student ages to the y-axis of a graph.

Problem 13.
(a) They are what you see on the plots, whatever you see in your visualization.
(b) Data, aesthetic mappings, a statistical transformation, a geometric object, and a positional adjustment.
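
The five components in (b) can be spelled out explicitly with ggplot2's layer() function. This is only a sketch on the iris data; in everyday code a shortcut such as geom_point() fills in the same pieces with defaults.

```r
library(ggplot2)

p <- ggplot() +
  layer(
    data = iris,                                         # data
    mapping = aes(x = Sepal.Length, y = Sepal.Width),    # aesthetic mappings
    stat = "identity",                                   # statistical transformation
    geom = "point",                                      # geometric object
    position = "identity"                                # positional adjustment
  )
print(p)
```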

Problem 14.
Scaling allows you to standardize or normalize your data.
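
A small sketch of both operations in base R, assuming Sepal.Width from iris as the example variable. Standardizing here means centering to mean 0 with standard deviation 1, and normalizing means rescaling to the range [0, 1]; both readings are assumptions about what the answer intends.

```r
x <- iris$Sepal.Width

# Standardize: subtract the mean and divide by the standard deviation
z <- as.numeric(scale(x))

# Normalize: rescale to the interval [0, 1] (min-max scaling)
rng <- range(x)
mm <- (x - rng[1]) / (rng[2] - rng[1])
```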

Problem 15.
Layers are what you see on the plots (e.g., points, lines, …), whatever you see in your visualization (or your viz). Layers are responsible for creating the objects that we perceive on the plot.

Problem 16.
Themes are a powerful way to customize the non-data components of your plots: i.e. titles, labels, fonts, background, gridlines, and legends. Themes can be used to give plots a consistent customized look.
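
For instance, a sketch that applies a built-in theme and then adjusts a few non-data elements. The plot, the title text, and the specific settings are illustrative assumptions.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point() +
  labs(title = "Sepal dimensions in iris") +
  theme_minimal() +                               # a complete built-in theme
  theme(
    plot.title = element_text(face = "bold"),     # title font
    legend.position = "bottom",                   # legend placement
    panel.grid.minor = element_blank()            # drop minor gridlines
  )
print(p)
```

Storing the theme() call in a variable and adding it to every plot is one way to get the consistent look mentioned above.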

Problem 17.
APIs, or Application Programming Interfaces, are a set of tools, guidelines, and protocols that help different software components interact with each other. They simplify software development by providing a standardized way for different systems to share information. For example, a weather app on a phone can use APIs to communicate with the weather bureau’s software system to display daily weather updates.

Problem 18.
geom_histogram() and geom_boxplot(). They are used to add a histogram and a boxplot, respectively, to a ggplot graph.
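
A brief sketch of both geoms on the iris data; the variable and the bin count are arbitrary choices made for illustration.

```r
library(ggplot2)

# Histogram of one continuous variable (bins = 20 is an arbitrary choice)
p_hist <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(bins = 20)

# Boxplot of the same variable
p_box <- ggplot(iris, aes(y = Sepal.Length)) +
  geom_boxplot()

print(p_hist)
print(p_box)
```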

Problem 19.
The ggplot function begins a plot that you finish by adding layers to it. The function takes two arguments here. The first is the dataset, “iris”. The second is the aes function, which maps the values of Species to the x-axis and the values of sepal length to the y-axis. Lastly, we need to specify the shape of the plot: geom_boxplot gives us a boxplot.
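
The step-by-step construction described above might look like the following sketch. It assumes the iris column Sepal.Length is the sepal length variable in question.

```r
library(ggplot2)

# Step 1: data and aesthetic mapping only; no layer has been added yet
p0 <- ggplot(iris, aes(x = Species, y = Sepal.Length))

# Step 2: adding geom_boxplot() supplies the layer that actually draws
p <- p0 + geom_boxplot()

length(p0$layers)  # number of layers before adding the geom
length(p$layers)   # number of layers after adding the geom
print(p)
```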

Problem 20.
The layered architecture allows users to almost completely ignore the output devices: to obtain a plot on the screen or as a PDF file, the process is the same except for the one instruction where you “tell” R where to “show” the plot.

On one end of the architecture, we have concrete graphics devices where the plots will be shown. On the other end of the architecture, we have the graphics functions we will use to produce concrete statistical plots.
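
A sketch of that device independence: the plotting code stays the same and only the device-opening call changes. The file name below is hypothetical and the plot is just an example.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_boxplot()

# Hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "iris_boxplot.pdf")

pdf(out_file, width = 6, height = 4)  # the one instruction that picks the device
print(p)                              # same drawing call as for the screen
dev.off()                             # close the device so the file is written
```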

Problem 21.
It stands for grammar of graphics.

Problem 22.
(a) Color, shape, size, etc.
(b) Points, lines, bars, etc.
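
To illustrate how the items in (a) and (b) combine, here is a sketch on the iris data: colour and shape are mapped to a variable, size is set to a constant, and the geometric object is a point.

```r
library(ggplot2)

# colour and shape vary with Species (mapped inside aes());
# size is fixed for all points (set outside aes())
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                      colour = Species, shape = Species)) +
  geom_point(size = 2)
print(p)
```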

Problem 23.
A barplot

Problem 24.

library(ggplot2)
# factor(0) is a dummy x value, so a single box is drawn for Sepal.Width
ggplot(iris, aes(x = factor(0), y = Sepal.Width)) + geom_boxplot() + xlab("") + ylab("Sepal Width") + theme(axis.text.x = element_blank())


  1. ggplot2 is very practical for producing a plot with code that is relatively easy to understand, so I chose it.

  2. The aes() function maps the relevant values to the x-axis and the y-axis. The xlab() and ylab() functions set the axis labels. theme() hides the x-axis tick label, which carries no information here.

  3. A boxplot conveys information about the centrality of a variable, its spread, and the possible existence of outliers, so I picked geom_boxplot().

Problem 25.

library(ggplot2)
ggplot(iris, aes(y = Sepal.Width)) + geom_histogram()

  1. Because it allows me to map the Sepal.Width values to the y-axis so that the plot is displayed correctly.
  2. Because histograms are a popular graphing tool for summarizing datasets and showing their general distributional features, which makes them a good choice for this problem.