When completed, name your final output .html file as: YourName_ANLY512-2018.html and upload it to the “In-class Visualization Coding Exercise #4” assignment in Week #7 on Moodle.

These exercises will use the mtcars data frame that you worked with before. The mtcars dataset contains information about 32 cars (1973-74 models) taken from the 1974 Motor Trend magazine. This dataset is small, intuitive, and contains a variety of continuous and categorical variables.

Scatterplots and Jittering Part 1

In some of your previous homework assignments, you already saw a few examples where geom_point() produced something other than a scatter plot.

For example, run the code in the code chunk below. In the resulting plot, a continuous variable, wt, is mapped to the y aesthetic, and cyl, which we treat as a categorical variable, is mapped to the x aesthetic. Because cyl is stored as a number in mtcars, you need to convert it (for example with factor(cyl)) to make it a true categorical variable.

This also leads to overplotting, since the points are arranged on a single x position for each value of cyl. You previously dealt with overplotting by setting position = "jitter" inside geom_point(). Let’s look at some other solutions here.

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.4
#run the code below
#overplotting in which the plot is not a scatterplot
ggplot(mtcars, aes(x = cyl, y = wt)) +
  geom_point()
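
As a side note on making cyl categorical: the sketch below shows one possible way, wrapping cyl in factor() inside aes(). It is not necessarily the intended solution, and the object name p_factor is only illustrative.

```r
library(ggplot2)

# cyl is stored as a number in mtcars; factor() turns it into a discrete variable
class(mtcars$cyl)
levels(factor(mtcars$cyl))

# Same plot as above, but with cyl mapped as a factor,
# so the x axis shows only the cylinder categories that occur in the data
p_factor <- ggplot(mtcars, aes(x = factor(cyl), y = wt)) +
  geom_point()
p_factor
```

The overplotting is still there after the conversion; only the x axis changes from a continuous scale to a discrete one.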

Beginning with the code for the plot in the viewer (given), make these modifications.

  1. Use a shortcut geom, geom_jitter(), instead of geom_point().

  2. Unfortunately, the width of the jitter is a bit too wide to be useful. Adjust this by setting the argument width = 0.1 inside geom_jitter().

  3. Finally, return to geom_point() and set the position argument here to position_jitter(0.1), which will set the jittering width directly inside a points layer.

library(ggplot2)

#part a

ggplot(mtcars, aes(x = cyl, y = wt)) +
  geom_jitter()

#part b

ggplot(mtcars, aes(x = cyl, y = wt)) +
  geom_jitter(width = 0.1)

#part c

ggplot(mtcars, aes(x = cyl, y = wt)) +
  geom_point(position = position_jitter(0.1))
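
Parts b and c should describe the same kind of layer. The sketch below checks this by inspecting the layer objects stored in each plot. Reading p$layers relies on how ggplot2 stores plots internally, so treat it as an assumption rather than a documented workflow; the names p_b and p_c are illustrative.

```r
library(ggplot2)

# Part b: shortcut geom with a narrower jitter width
p_b <- ggplot(mtcars, aes(x = cyl, y = wt)) +
  geom_jitter(width = 0.1)

# Part c: points layer with an explicit jitter position
p_c <- ggplot(mtcars, aes(x = cyl, y = wt)) +
  geom_point(position = position_jitter(width = 0.1))

# Both layers should report a jitter position and a point geom
class(p_b$layers[[1]]$position)[1]
class(p_c$layers[[1]]$position)[1]
class(p_b$layers[[1]]$geom)[1]
class(p_c$layers[[1]]$geom)[1]
```

Because jittering is random, the two plots will not look identical point for point, even though the layers are built the same way.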

Scatterplots and Jittering Part 2

When we studied aesthetics, you saw different ways to compensate for overplotting. In class, you saw a dataset that suffered from overplotting because of the precision of the dataset.

Another example you saw is when you have integer data. This can be continuous data measured on an integer scale (e.g. 1, 2, 3, …), as opposed to a numeric scale (e.g. 1.1, 1.4, 1.5, …), or two categorical (e.g. factor) variables, which are just type integer under-the-hood.

In such a case you’ll have a small, defined number of intersections between the two variables.
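
To make the idea of a small number of intersections concrete, here is a minimal sketch using two whole-number columns of mtcars, cyl and gear. The choice of columns and the name combos are illustrative and not part of the exercise.

```r
# cyl and gear in mtcars take only a few whole-number values
combos <- as.data.frame(table(cyl = mtcars$cyl, gear = mtcars$gear))

# Keep only the combinations that actually occur in the data
combos <- combos[combos$Freq > 0, ]
combos

# Far fewer distinct (cyl, gear) positions than rows,
# so many points would sit exactly on top of each other
nrow(combos)
nrow(mtcars)
```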

You will be using the Vocab dataset. The Vocab dataset contains information about the years of education and integer score on a vocabulary test for respondents to the US General Social Surveys. The figures usually quoted for it (over 21,000 individuals, 1972-2004) appear to match an older release of the data; the version loaded below reports 30,351 observations in its str() output. You have to install the car R package, which provides the data through its companion package carData. We will do this in-class together as we work through these problems.

Important Note: once you have installed the car package, to refer to the Vocab data set, you have to specify carData::Vocab (case-sensitive!!).
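
The note above can be sketched as follows, assuming carData is already installed. The local name vocab is illustrative.

```r
# Assumes carData is installed, e.g. via install.packages("carData")
if (requireNamespace("carData", quietly = TRUE)) {
  # The :: operator reads Vocab straight from the carData namespace,
  # so no library() call is needed
  vocab <- carData::Vocab
  print(dim(vocab))
  print(names(vocab))
}
```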

Perform the following. Steps 1 to 5 correspond to parts a to e in the code.

  1. In the Vocab data frame, both the education and vocabulary variables take only integer values (str() may still report them as num). You can imagine these as factor variables, but here, integers are more convenient to work with. First, get familiar with the dataset by looking at its structure with str().

  2. Make a basic scatter plot of vocabulary (y) vs. education (x). Here it becomes apparent that you have issues with overplotting because of the integer scales.

  3. Create the same scatterplot as in part b, but use geom_jitter() instead of geom_point().

  4. Using the jittered plot (plot from part c), set alpha to 0.2 (very low).

  5. Using the plot from part d, set shape to 1.

# Vocab ships in the carData package; the unrelated CARS package is not needed
if(!require(carData)){install.packages("carData",repos="http://cran.rstudio.com/")}
## Loading required package: carData
## Warning: package 'carData' was built under R version 3.4.4
library(ggplot2)
library(carData)

#part a
library(car)
## Warning: package 'car' was built under R version 3.4.4
str(Vocab)
## 'data.frame':    30351 obs. of  4 variables:
##  $ year      : num  1974 1974 1974 1974 1974 ...
##  $ sex       : Factor w/ 2 levels "Female","Male": 2 2 1 1 1 2 2 2 1 1 ...
##  $ education : num  14 16 10 10 12 16 17 10 12 11 ...
##  $ vocabulary: num  9 9 9 5 8 8 9 5 3 5 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:32115] 1 2 3 4 5 6 7 8 9 10 ...
##   .. ..- attr(*, "names")= chr [1:32115] "19720001" "19720002" "19720003" "19720004" ...
#part b
# Basic scatter plot of vocabulary (y) against education (x). Use geom_point()
ggplot(Vocab, aes(education, vocabulary)) + 
  geom_point()

#part c
# using same plot as in part b, use geom_jitter() instead of geom_point()
ggplot(Vocab, aes(education, vocabulary)) + 
  geom_jitter()

#part d
#Using the above plotting command, set alpha to a very low 0.2
ggplot(Vocab, aes(education, vocabulary)) + 
  geom_jitter(alpha = 0.2)

#part e
#Using the above plotting command, set the shape to 1
ggplot(Vocab, aes(education, vocabulary)) + 
  geom_jitter(alpha = 0.2, shape = 1)
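
To see why a low alpha helps here, the sketch below counts how many respondents share each (education, vocabulary) position. It assumes carData is installed; the names vocab, pair_counts, n_positions and n_rows are illustrative.

```r
if (requireNamespace("carData", quietly = TRUE)) {
  vocab <- carData::Vocab

  # Number of respondents at each (education, vocabulary) pair
  pair_counts <- table(vocab$education, vocab$vocabulary)

  # Distinct plotting positions versus total rows
  n_positions <- sum(pair_counts > 0)
  n_rows <- nrow(vocab)
  print(c(positions = n_positions, rows = n_rows))

  # Largest number of points stacked on a single position
  print(max(pair_counts))
}
```

With many rows sharing each position, jittering alone still leaves dense clusters, which is where transparency and hollow shapes make the density visible.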