Problem 1.
Data visualization is an important tool for
exploring and understanding our datasets. Humans are outstanding at
capturing visual patterns, and data visualization tries to capitalize on
these abilities.
Problem 2.
The standard graphics
The grid graphics
Problem 3.
One is the set of standard plotting functions, which
build upon the facilities provided by the graphics system. The other is the set of tools
available in the ggplot2 package, which build upon the grid graphics
system.
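As a rough illustration of the two approaches, the sketch below draws the same scatter plot both ways. The choice of the built-in iris dataset and of these two columns is an assumption made for the example only.

```r
library(ggplot2)

# Standard (base) graphics: the plot is drawn directly on the current device.
plot(iris$Sepal.Length, iris$Sepal.Width,
     xlab = "Sepal length", ylab = "Sepal width")

# ggplot2 (built on grid): a plot object is built first and drawn when printed.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()
print(p)
```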
Problem 4.
Facets – divide a plot into subplots based on the
values of one or more discrete variables.
Problem 5.
(a) 1. When the labels of the values of the
nominal variable are too long.
2. One sometimes faces the problem of a
few of the bar differences being almost indistinguishable.
(b) 1.
It may be easier to read the graph if you plot the bars horizontally.
2. The differences are not noticeable because the values really are
that close. Do not use graphs to lie about your data.
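One way to plot the bars horizontally in ggplot2 is coord_flip(). The sketch below assumes the mpg dataset that ships with ggplot2, chosen only because its manufacturer names are long enough to crowd the x-axis.

```r
library(ggplot2)

# Vertical bars: the manufacturer names crowd the x-axis.
vertical <- ggplot(mpg, aes(x = manufacturer)) + geom_bar()

# Same plot with the axes flipped, so the labels sit on the vertical axis
# and read horizontally.
horizontal <- vertical + coord_flip()

print(horizontal)
```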
Problem 6.
False
Problem 7.
It deals with the visualization of the values of a
nominal variable. A graph shows as many bars as there are different
values of the variable, with the height of the bars corresponding to the
frequency of the values.
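A minimal sketch of such a graph, assuming the mpg dataset from ggplot2 and its nominal variable class (both picked for illustration):

```r
library(ggplot2)

# Frequency of each value of the nominal variable `class`.
counts <- table(mpg$class)
print(counts)

# geom_bar() counts the rows for each value by default,
# so there is one bar per value and its height is the frequency.
bars <- ggplot(mpg, aes(x = class)) + geom_bar()
print(bars)
```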
Problem 8.
(a) It provides functions such as ggpairs() and
ggparcoord().
(b) The function will automatically take advantage
of the symmetry of the graphs and use the upper part of the matrix to
show the correlation values between each pair of variables. Moreover, on
the diagonal we get a continuous approximation of the distribution of the
respective variable.
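Assuming the package in question is GGally (the package that provides ggpairs() and ggparcoord()), a call on the four numeric columns of iris might look like the sketch below. The guard only skips the example when GGally is not installed.

```r
# Assumption: GGally is installed; otherwise the block does nothing.
if (requireNamespace("GGally", quietly = TRUE)) {
  # Scatter plot matrix: scatter plots below the diagonal, correlations
  # above it, and density estimates on the diagonal (the defaults for
  # continuous variables).
  pairs_plot <- GGally::ggpairs(iris, columns = 1:4)
  print(pairs_plot)

  # Parallel coordinates plot, with lines coloured by column 5 (Species).
  par_plot <- GGally::ggparcoord(iris, columns = 1:4, groupColumn = 5)
  print(par_plot)
}
```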
Problem 9.
(a) This function allows you to indicate a nominal
variable whose values will create a set of subplots that will be
presented sequentially with reasonable wrapping around the screen
space.
(b) This function allows us to set up a matrix of plots, with each dimension of the matrix getting as many plots as there are values of the respective variable. For each cell of this matrix, the graph specified before the facet is shown using only the subset of rows that have the respective values of the variables defining the grid.
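Assuming parts (a) and (b) refer to facet_wrap() and facet_grid() respectively, the difference can be sketched as follows. The mpg dataset and the variables class, drv, and cyl are choices made for this example.

```r
library(ggplot2)

# The graph specified before the facet.
base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# facet_wrap(): one subplot per value of a single nominal variable,
# laid out sequentially and wrapped to fit the available space.
wrapped <- base_plot + facet_wrap(~ class)

# facet_grid(): a matrix of subplots, with rows defined by drv
# and columns defined by cyl.
gridded <- base_plot + facet_grid(drv ~ cyl)

print(wrapped)
print(gridded)
```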
Problem 10.
The function aes() maps Sepal.Length to the x-axis and
Sepal.Width to the y-axis in a ggplot plot.
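As a small sketch, the mapping can be created on its own and then handed to ggplot(). Adding geom_point() is an assumption here, since the answer does not say which geom the original plot uses.

```r
library(ggplot2)

# aes() only describes which variable goes to which aesthetic.
mapping <- aes(x = Sepal.Length, y = Sepal.Width)
print(names(mapping))  # the aesthetics being mapped

# The mapping takes effect once it is attached to data and a geom.
p <- ggplot(iris, mapping) + geom_point()
print(p)
```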
Problem 11.
It is used to relate two variables. For example, y
~ x means y is regressed on x in the context of regression.
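For instance, in a linear regression the formula tells lm() which variable is the response and which is the predictor. The iris columns used below are chosen only for illustration.

```r
# Sepal.Width (y) is regressed on Sepal.Length (x).
f <- Sepal.Width ~ Sepal.Length
fit <- lm(f, data = iris)

# One intercept and one slope.
print(coef(fit))
```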
Problem 12.
Each is a mapping of the data to a visual property. For
example, mapping the value of student ages to the y-axis of a graph.
Problem 13.
(a) They are what you see on the plots, whatever
you see in your visualization.
(b) Data, aesthetic mappings, a
statistical transformation, a geometric object, a positional adjustment.
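The five components in (b) can be made explicit with ggplot2's low-level layer() function. This is only a sketch: in everyday code a geom_*() function such as geom_point() fills in most of these arguments for you, and the iris columns are an arbitrary choice.

```r
library(ggplot2)

p <- ggplot() +
  layer(
    data = iris,                                       # data
    mapping = aes(x = Sepal.Length, y = Sepal.Width),  # aesthetic mappings
    stat = "identity",                                 # statistical transformation
    geom = "point",                                    # geometric object
    position = "identity"                              # positional adjustment
  )
print(p)
```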
Problem 14.
Scaling allows you to standardize or normalize
your data.
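A brief sketch of both operations on a numeric vector follows. The column used, and reading "normalize" as min-max rescaling to [0, 1], are assumptions made for this example.

```r
x <- iris$Sepal.Length

# Standardize: subtract the mean and divide by the standard deviation.
z <- as.numeric(scale(x))

# Normalize (min-max): rescale the values to the interval [0, 1].
x01 <- (x - min(x)) / (max(x) - min(x))

print(summary(z))
print(range(x01))
```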
Problem 15.
Layers are what you see on the plots (e.g.,
points, lines, …), whatever you see in your visualization (or your viz).
Layers are responsible for creating the objects that we perceive on the
plot.
Problem 16.
Themes are a powerful way to customize the non-data
components of your plots: i.e. titles, labels, fonts, background,
gridlines, and legends. Themes can be used to give plots a consistent
customized look.
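As a sketch of that idea, a theme can be stored once and reused across plots. The name my_theme and the specific settings below are illustrative choices, not part of the original answer.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point() +
  labs(title = "Iris sepal measurements")

# A reusable theme: only non-data elements are changed.
my_theme <- theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),  # title font
    legend.position = "bottom",                # legend placement
    panel.grid.minor = element_blank()         # remove minor gridlines
  )

print(p + my_theme)
```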
Problem 17.
APIs, or Application Programming Interfaces, are a
set of tools, guidelines, and protocols that help different software
components interact with each other. They simplify software development
by providing a standardized way for different systems to share
information. For example, a weather app on a phone can use APIs to
communicate with the weather bureau’s software system to display daily
weather updates.
Problem 18.
geom_histogram() and geom_boxplot(); they are used to
obtain a histogram and a boxplot, respectively, in a ggplot graph.
Problem 19.
The ggplot() function begins a plot that you finish
by adding layers to it. There are two arguments to the function. The first
is the dataset, iris. The second is the aes() function, which maps the
values of Species to the x-axis and the sepal length values to the y-axis.
Lastly, we need to include the shape of the plot: geom_boxplot() gives us
a boxplot as the shape of the plot.
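The call described above presumably has the following form. The exact code from the problem statement is not reproduced here, so treat this as a reconstruction that assumes the y variable is the column Sepal.Length.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +  # data and mapping
  geom_boxplot()                                         # shape of the plot
print(p)
```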
Problem 20.
The layered architecture allows users to almost
completely ignore the output devices, because the process for obtaining
a plot on the screen (or as a PDF file) is the same except for one
specific instruction where you “tell” R where to “show” the plot.
On one end of the architecture, we have concrete graphics devices
where the plots will be shown. On the other end of the architecture, we
have the graphics functions we will use to produce concrete statistical
plots.
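The sketch below illustrates this: the plotting code stays the same, and only the instruction that opens the device changes. Writing to a temporary file is a choice made for the example.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()

# Send the plot to a PDF device instead of the screen.
out <- tempfile(fileext = ".pdf")
pdf(out)              # "tell" R where to show the plot
print(p)              # same plotting code as for the screen
invisible(dev.off())  # close the device so the file is written

print(file.exists(out))
```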
Problem 21.
It stands for grammar of graphics.
Problem 22.
(a) Color, shape, size, etc.
(b) Points,
lines, bars, etc.
Problem 23.
A barplot
Problem 24.
library(ggplot2)
ggplot(iris, aes(x = factor(0), y = Sepal.Width)) + geom_boxplot() +
  xlab("") + ylab("Sepal Width") + theme(axis.text.x = element_blank())
ggplot2 is very practical for displaying a plot with code that is
relatively easy to understand. Therefore I chose it.
The aes() function allows us to map the relevant values to the x-axis and
y-axis respectively. The xlab() and ylab() functions provide labels for the
x-axis and y-axis. theme() lets us display the x-axis correctly by hiding
the meaningless tick label of the dummy factor.
Since a boxplot tries to convey information on the centrality of a variable, its spread, and the possible existence of outliers, I picked geom_boxplot().
Problem 25.
library(ggplot2)
ggplot(iris, aes(x = Sepal.Width)) + geom_histogram()