Instructions: Answer to the best of your knowledge, and write code when it is required.

1. What is data visualization? This is a big area, so try and give an overview.

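Humans detect visual patterns far more readily than patterns in tables of numbers, so it helps to represent data graphically. Data visualization is the practice of representing data as plots and graphics in order to explore a dataset, understand its structure (distributions, relationships, outliers), and communicate what we find.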

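2. List the two main graphics systems of R.

The base (traditional) graphics system, provided by the graphics package, and the grid graphics system, provided by the grid package; lattice and ggplot2 are built on top of grid.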

3. List the tools for visualizing

More packages are available.

4. Explain faceting/facets

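Faceting means creating several versions (panels) of the same plot, each drawn from a different subset of the dataset, usually one subset per level of a categorical variable, and showing them side by side so they can be compared.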

5.

  a. What are the two basic problems of barplots or barcharts?

     1. If the labels of the bars are too long, they cannot be displayed legibly under vertical bars. Drawing the bars horizontally gives each name more space.

     2. Sometimes bars cannot be told apart because their values are very close to each other. Starting the y-axis above zero would make the difference more visible, but it is misleading, because the length of a bar encodes its value. If the bars are indistinguishable on an axis that starts at zero, the difference is genuinely small relative to the values.

     3. A related point: it is better to use a barplot than a pie chart, because the human eye compares lengths more accurately than angles or areas.

  b. What are the solutions of each of the problems in (a) above?

The solutions are given with each problem in part (a): draw the bars horizontally when the labels are long, and keep the y-axis starting at zero rather than truncating it to exaggerate small differences.
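Both fixes can be sketched in ggplot2 as follows. This is a minimal example that assumes ggplot2 is installed. The built-in mtcars data is used only because its row names are long car names; the question does not mention it. geom_col() bars start at zero by default, and coord_flip() turns them horizontal.

```r
library(ggplot2)

# Example data (chosen for illustration): mtcars row names are long car names
cars <- data.frame(car = rownames(mtcars), mpg = mtcars$mpg)

p_bar <- ggplot(cars, aes(x = reorder(car, mpg), y = mpg)) +
  geom_col() +     # bar length encodes mpg; the value axis starts at zero
  coord_flip() +   # horizontal bars, so each long name gets its own line
  labs(x = NULL, y = "Miles per gallon")

print(p_bar)
```

In ggplot2 3.3.0 and later, writing aes(x = mpg, y = reorder(car, mpg)) without coord_flip() also gives horizontal bars. coord_flip() is used here because it works in older versions too.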

6. It is generally a good idea to use the information provided by barcharts to create, or show, as a pie chart. TRUE or FALSE

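FALSE. A pie chart encodes values as angles and areas, which are harder to compare accurately than the lengths of bars.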

7. When analyzing data, explain what barcharts would be used for.

8.

  a. What does the package, GGally, provide? (e.g., what functions does it provide, etc.)

It provides a series of extensions to the graphs available in the ggplot2 package. Its functions include a pairwise plot matrix (ggpairs()), a scatterplot matrix (ggscatmat()), a parallel coordinates plot (ggparcoord()), a survival plot (ggsurv()), and several functions to plot networks (e.g., ggnet2()).

  b. What does the function, ggpairs, do?

ggpairs() takes the columns (variables) of a data frame and creates a plot matrix. The matrix has one row and one column for each variable, so each cell shows one pair of variables. The type of plot in a cell depends on the types of the two variables. Two continuous variables get a scatterplot, a continuous and a categorical variable get boxplots or histograms, and two categorical variables get a bar chart. The cells on the diagonal show the distribution of each single variable.
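A minimal ggpairs() sketch on the iris data follows, assuming the GGally package is installed. Columns 1 to 5 are the four measurements plus Species. The matrix therefore mixes scatterplots (continuous pairs) with boxplots and histograms (continuous/categorical pairs). Colouring by Species is an optional extra.

```r
library(ggplot2)
library(GGally)  # assumption: GGally is installed

# One row and one column per selected variable: the four iris measurements
# (columns 1-4) plus Species (column 5); colouring by Species is optional
p_pairs <- ggpairs(iris, columns = 1:5, mapping = aes(colour = Species))
print(p_pairs)
```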

9.

  a. Explain the function, facet_wrap().

It is used for faceting, i.e., creating subplots. Given a categorical variable, for example facet_wrap(~ Species), it creates one panel per level of that variable. It lays the panels out in sequence, wrapping them into rows so they fit the available screen space. The nrow and ncol arguments control the layout.

  b. Explain the function, facet_grid().

This function creates a matrix (grid) of plots. Each dimension of the grid corresponds to one of the variables we pass to the function. In facet_grid(a ~ b), the levels of a define the rows and the levels of b define the columns. If a has two values and b has three values, facet_grid(a ~ b) produces a grid with 2 rows and 3 columns. That gives 6 panels, one for each combination of levels.
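Both faceting functions are sketched below, assuming ggplot2 is installed. iris has only one categorical variable, so the facet_grid() part uses the built-in mtcars data instead. In mtcars, am has two values and cyl has three. The result has 2 rows (one per value of am) and 3 columns (one per value of cyl).

```r
library(ggplot2)

# facet_wrap(): one panel per Species level, wrapped into the given columns
p_wrap <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  facet_wrap(~ Species, ncol = 3)

# facet_grid(rows ~ cols): am (2 values) gives the rows,
# cyl (3 values) gives the columns, so 2 x 3 = 6 panels
p_grid <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl)

print(p_wrap)
print(p_grid)
```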

10. Explain the following argument:

aes(x = Sepal.Length, y = Sepal.Width)

aes() sets the aesthetic mappings of the plot, i.e., which variables of the data are mapped to which visual properties. Here Sepal.Length is mapped to the horizontal (x) axis and Sepal.Width to the vertical (y) axis.
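The mapping can be checked directly in R. The sketch below assumes ggplot2 is installed. aes() on its own only records which variable goes to which aesthetic, and nothing is drawn until a geom uses the mapping.

```r
library(ggplot2)

# aes() only records which variable goes to which aesthetic
m <- aes(x = Sepal.Length, y = Sepal.Width)
names(m)  # "x" "y"

# Passed to ggplot(), the mapping is inherited by every layer:
# Sepal.Length on the horizontal axis, Sepal.Width on the vertical axis
p_aes <- ggplot(iris, m) + geom_point()
print(p_aes)
```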

11. Give an interpretation of the symbol, “ ~ ”

It is R's formula operator, used to describe a relationship between the variables on its left and right. In a model such as lm(y ~ x), it reads "y is modelled as a function of x". In faceting, facet_grid(size ~ speed) creates a grid of facets. The levels of size define the rows (stacked vertically) and the levels of speed define the columns (arranged horizontally).
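The following short sketch shows how base R treats ~. It creates a formula object that stores the variable names without evaluating them. Functions such as lm() (and ggplot2's faceting functions) then decide how to interpret it. Only base R is used here.

```r
# "~" creates a formula: the variable names are stored, not evaluated
f <- Sepal.Width ~ Sepal.Length
class(f)     # "formula"
all.vars(f)  # "Sepal.Width" "Sepal.Length"

# lm() reads it as "Sepal.Width modelled as a function of Sepal.Length"
fit <- lm(f, data = iris)
coef(fit)

# One-sided formulas, as in facet_wrap(~ Species), have no left-hand side
g <- ~ Species
all.vars(g)  # "Species"
```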

12. What are the aesthetics in a plot?

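The visual properties of the plotted objects that data values can be mapped to: position (x and y), colour, fill, size, shape, transparency (alpha), line type, and so on. The axes and legends are the guides that show the reader how these mappings work.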
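The sketch below maps more aesthetics than position, assuming ggplot2 is installed. Species is mapped to both colour and shape. Size and alpha are set to fixed values outside aes(), so they are the same for every point rather than driven by the data.

```r
library(ggplot2)

# Mapped aesthetics (inside aes()): vary with the data, get a legend
# Set aesthetics (outside aes()): one fixed value for every point
p_aes2 <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width,
                           colour = Species, shape = Species)) +
  geom_point(size = 2, alpha = 0.7)

print(p_aes2)
```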

13.

  a. What are layers?

Layers are what we see in our graph (points, bars, lines, etc.). In ggplot2 we build a plot by adding layers with +, each one drawn on top of the previous ones. Unlike R's base graphics, a ggplot2 plot is an object that can be stored and extended before it is drawn. ggplot() sets up the data and the default aesthetic mappings, and each geom_*() (or stat_*()) call adds a layer. Labels, titles, scales, and themes are also added with +, but they modify the plot rather than adding layers.

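  b. What are the five components of layers?

Data, aesthetic mappings, a statistical transformation (stat), a geometric object (geom), and a position adjustment.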
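The layering idea can be sketched as follows, assuming ggplot2 is installed. The plot is built step by step. Counting the layers shows that neither ggplot() nor labs() adds one. The last part writes a point layer with the low-level layer() function, which takes the geom, stat, position adjustment, mapping and (optionally) data as explicit arguments.

```r
library(ggplot2)

# ggplot() sets up the data and default mapping but adds no layer
p0 <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))

# Each geom_*() adds one layer, drawn on top of the previous ones
p_layers <- p0 +
  geom_point() +                               # layer 1: the raw points
  geom_smooth(method = "lm", formula = y ~ x)  # layer 2: a fitted line

# Labels are also added with +, but they are not layers
p_layers <- p_layers + labs(title = "Sepal width vs. sepal length")

length(p0$layers)        # 0
length(p_layers$layers)  # 2

# The same point layer via the low-level layer() function, with the
# geom, stat, position adjustment and mapping given explicitly
# (the data are inherited from ggplot())
p_explicit <- ggplot(iris) +
  layer(geom = "point", stat = "identity", position = "identity",
        mapping = aes(x = Sepal.Length, y = Sepal.Width),
        params = list(na.rm = FALSE))
print(p_explicit)
```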

14. What is scaling?

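Scaling is the mapping from data values to visual values. For example, numbers are mapped to positions on the axes, or the categories of a variable to colours. Each scale also draws a guide (an axis or a legend) so the reader can map the visual values back to the data. A scale can also apply a transformation, such as a logarithm, before plotting.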
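A minimal sketch of scales follows, assuming ggplot2 is installed. scale_x_log10() transforms the horizontal position before plotting, while the axis keeps the original units. scale_colour_manual() chooses which colour each Species level is mapped to. The particular colours are arbitrary.

```r
library(ggplot2)

p_scale <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width, colour = Species)) +
  geom_point() +
  # position scale: values are log10-transformed, axis labels stay in original units
  scale_x_log10() +
  # colour scale: which colour each Species level is mapped to (arbitrary picks)
  scale_colour_manual(values = c(setosa = "darkorange",
                                 versicolor = "purple",
                                 virginica = "steelblue"))
print(p_scale)
```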

15. Explain layers and what they are used for.

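Layers are the visible elements of the graph (points, bars, lines, etc.). In ggplot2 we add them one on top of the other. This lets us build a plot incrementally and combine several views of the same data in one plot, for example the raw data points plus a fitted regression line.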

16. What are themes?

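Themes control the non-data parts of the display, for example fonts, background colour, grid lines, and legend position. They change how the plot looks, not what data it shows.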
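A sketch of themes follows, assuming ggplot2 is installed. theme_minimal() is one of the complete themes that ship with ggplot2, and theme() then adjusts individual elements. The data drawn are exactly the same as without the theme.

```r
library(ggplot2)

p_theme <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  theme_minimal() +                          # a complete theme shipped with ggplot2
  theme(text = element_text(size = 14),      # larger font for all text
        panel.grid.minor = element_blank())  # drop the minor grid lines

print(p_theme)
```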

17. What are APIs?

API stands for Application Programming Interface. It is a set of rules and protocols that allows different software applications to communicate with each other. APIs let different systems, services, or platforms work together.

18. Give two examples of “geom,” and explain what they do.

geom_histogram(): draws a histogram of the variable mapped to x in aes(); the counts on the vertical axis are computed automatically.

geom_point(): draws a scatterplot of the two variables mapped to x (horizontal axis) and y (vertical axis) in aes().
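Both geoms are shown below on the iris data, as a sketch assuming ggplot2 is installed. The binwidth of 0.2 is an arbitrary choice made to avoid ggplot2's default of 30 bins.

```r
library(ggplot2)

# geom_histogram(): one variable on x; stat_bin() computes the counts on y
p_hist <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.2)

# geom_point(): one point per row, x and y taken from the two mapped variables
p_point <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

print(p_hist)
print(p_point)
```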

19. Explain the following function and all of its arguments, etc.

library(ggplot2)
ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_boxplot( )

We create the plot from several components combined with +. ggplot() sets up the plot: it specifies iris as the data and, through aes(), maps Species to the horizontal axis and Sepal.Length to the vertical axis. The geom_boxplot() layer then specifies the type of plot, a boxplot. The code therefore draws one boxplot of Sepal.Length for each level (species) of Species.
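To see what the boxes summarize, the statistics that ggplot2 computes for each box can be pulled out with layer_data() and cross-checked against base R. The sketch below assumes ggplot2 is installed. middle is the median, lower and upper are the first and third quartiles, and ymin and ymax are the ends of the whiskers.

```r
library(ggplot2)

p_box <- ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_boxplot()

# Statistics computed by stat_boxplot(), one row per box
box_stats <- layer_data(p_box)[, c("ymin", "lower", "middle", "upper", "ymax")]
box_stats

# Cross-check: the median line of each box against base R
tapply(iris$Sepal.Length, iris$Species, median)
# setosa 5.0, versicolor 5.9, virginica 6.5
```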

20. Explain the R graphics layered architecture.

The idea is to build a plot from multiple layers, each contributing a different aspect of the final visualization. The layered architecture works the same way regardless of the output device (screen, PDF file, PNG file, etc.). Components of this layered architecture are:

To create a plot we put each of these layers on top of the previous layers. Some layers are optional though.

21. In “ggplot” what does “gg” stand for?

Grammar of graphics

22.

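  a. List some aesthetic attributes.

Position (x and y), colour, fill, size, shape, transparency (alpha), and line type.

  b. List some geometric objects that are defined by the grammar of graphics.

Points, lines, bars, areas, polygons, text, and boxplots, drawn in ggplot2 by geom_point(), geom_line(), geom_bar(), geom_area(), geom_polygon(), geom_text(), and geom_boxplot().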

23. What statistical plot can we use to explore the distribution of the values of a nominal variable?

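A bar chart (geom_bar() in ggplot2), which shows the number of observations in each category. A histogram is the analogous plot for a continuous variable, whose values first have to be grouped into bins.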
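The following sketch draws a bar chart for a nominal variable, assuming ggplot2 is installed. It uses the mpg data that ships with ggplot2, whose class column (type of car) is nominal. stat_count(), the default stat of geom_bar(), counts the rows in each category.

```r
library(ggplot2)

# class (type of car) is a nominal variable in ggplot2's mpg data
p_nom <- ggplot(mpg, aes(x = class)) +
  geom_bar() +                     # default stat_count(): rows per category
  labs(x = "Type of car", y = "Number of cars")

print(p_nom)
layer_data(p_nom)$count  # one count per category
```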

24. Use the ggplot2 package to write an algorithm, or a chunk of code, that will create a plot of the distribution of the values of a continuous variable (use any geom except the histogram). Choose the correct geom. Use the iris dataset.

library(ggplot2)
data(iris)
ggplot(iris, aes(x = Sepal.Length)) + geom_density() + xlab('Sepal Length')

  a. Then, explain why you chose your algorithm,

I built the plot from ggplot2 components combined with +: ggplot(), geom_density(), and xlab().

  b. explain why you chose the functions you used,

I first need to specify the data (ggplot(iris, ...)). Next comes the variable whose distribution I want to examine (aes(x = Sepal.Length)). Then comes the geom that draws the distribution (geom_density()), and finally the label of the horizontal axis (xlab()).

  c. explain why you chose your geom, and

geom_density() draws a kernel density estimate, a smoothed version of the histogram, which suits a continuous variable such as Sepal.Length. geom_bar() is meant for discrete variables. On a continuous variable it counts each distinct value separately instead of showing the shape of the distribution.

  d. show your plot.

Please check the above plot.

25. Use the ggplot2 package to write an algorithm, or a chunk of code, that will create a plot of the distribution of the values of a continuous variable using a histogram. Use the iris dataset.

library(ggplot2)
data(iris)
ggplot(iris, aes(x = Sepal.Length)) + geom_histogram(binwidth = 0.2) + xlab('Sepal Length')

  a. Then, explain why you chose to use your aes( ) function,

A histogram needs only one variable. I mapped Sepal.Length to the horizontal axis, and geom_histogram() computes the counts shown on the vertical axis.

  b. explain why you chose to use your particular geom, and

The question asks for a histogram, and geom_histogram() is the ggplot2 geom that draws one. It groups the values into bins and draws a bar for the number of observations in each bin. I set binwidth explicitly instead of relying on the default of 30 bins, which ggplot2 itself flags as a value worth changing.

  c. show your plot.

Please check the above plot.