Instructor: Edgar Franco

Outline

Preliminaries
Setting the working directory
Flow Control and Loops
Analyzing data and reporting results
  4.1. Frequency and cross tables
  4.2. Summary statistics
  4.3. Summary statistics by groups
Graphics in R

1. Preliminaries

# Let's start with an empty workspace
rm(list = ls())   ### Remove all objects in the working environment
                  ### Similar to "clear all" in Stata

gc()    ### Garbage collection: calling gc() after removing a large object can be useful, as it may prompt R to return memory to the operating system.

** PLEASE CHANGE YOUR WORKING DIRECTORY NOW **

setwd("C://path")

To display the current working directory use getwd()

getwd()

** PLEASE CREATE THE FOLLOWING SUBFOLDERS IN YOUR WORKING DIRECTORY **

* Tables
* Graphs
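
If you prefer to create the subfolders from R instead of by hand, the following is a minimal sketch using base R's dir.create(). The helper name make_subfolders and the temporary demo folder are assumptions made for illustration only; in practice you would call the helper with its default, base = getwd().

```r
# 'make_subfolders' is a hypothetical helper (not part of base R).
# It creates the given subfolders inside 'base' if they do not exist yet.
make_subfolders <- function(base = getwd(), folders = c("Tables", "Graphs")) {
  paths <- file.path(base, folders)
  for (p in paths) {
    if (!dir.exists(p)) dir.create(p)
  }
  invisible(paths)
}

# Demo in a temporary folder so nothing is written to your real project
demo_dir <- file.path(tempdir(), "workshop_demo")
dir.create(demo_dir, showWarnings = FALSE)
created <- make_subfolders(demo_dir)
dir.exists(created)   # one TRUE/FALSE per subfolder
```

Calling make_subfolders() with no arguments would create both folders in the current working directory.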

Please install the following packages:

install.packages("dplyr")
install.packages("foreign")
install.packages("xtable")
install.packages("texreg")
install.packages("stargazer")
install.packages("ggplot2")

You can do this in a loop

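pkgs <- c("foreign", "dplyr", "ggplot2", "xtable", "texreg", "stargazer")
for (pkg in pkgs) {
  ### Install the package only if it is not installed yet
  if (!(pkg %in% rownames(installed.packages()))) {
    install.packages(pkg)
  }
  ### Load the package; character.only = TRUE because pkg holds the name as a string
  library(pkg, character.only = TRUE)
}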

1. Flow control and Loops

1.1 If and else

This is the simplest form of flow control. if takes a logical value and executes the next statement only if that value is TRUE.

### Let's pass a random uniform number.

if(runif(1) >0.5) message("This message appears with 50% chance")

### It is also possible to execute several statements at the same time using {}

x <- 4

if(x>1){
  y <- 2 *x
  z <-3 * y
  print(z)  ### To print the result (different than return)
}


x <- 0
## Nothing happens if:
if(x>1){
  y <- 2 *x
  z <-3 * y
  print(z)  
}

The else statement is a complement to the if statement. It is executed if the ‘if’ condition was false:

x <- 0
if(x>1){
  y <- 2 *x
  z <-3 * y
  print(z)  
} else{
  message("x is not greater than 1, so the else branch runs")
}

### Note that the else statement should be on the same line as the closing brace of the if block. At the top level of a script, starting a new line with else is an error.
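
When there are more than two cases, if and else can be chained. The sketch below is only an illustration; the function name sign_label is made up for this example. Note that every else sits on the same line as the closing brace before it.

```r
# 'sign_label' is a hypothetical helper used only for this illustration.
# It classifies a single number as positive, negative, or zero.
sign_label <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

sign_label(4)
sign_label(-2)
sign_label(0)
```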

Another useful control flow command is ifelse, which is vectorized: it evaluates the condition for every element of a vector.

## Here, rbinom generates random numbers from a binomial distribution
## Syntax: rbinom(n, size, prob)

ifelse(rbinom(10,1,0.5), "Head", "Tail")
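
To see the difference between if and ifelse without random numbers, here is a small sketch with a fixed vector. The vector x and the labels are arbitrary values chosen for illustration.

```r
x <- c(-2, 0, 3, 7)

# ifelse() checks the condition element by element and returns a vector
size_label <- ifelse(x > 1, "greater than 1", "not greater than 1")
print(size_label)

# if() needs a single TRUE/FALSE, so a vector condition is first
# reduced to one value with any() or all()
if (any(x > 1)) message("At least one element of x is greater than 1")
```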

1.2 Repeat loops

This is the simplest loop. It is rarely the best tool, but it helps to understand what a loop does: it repeats a block of code until a break statement is reached.

repeat{
  message("Welcome to R!!") #message to print
  n <- sample(c(1,2,3,4),1)  ## we sample one of these numbers
  message("number = ", n)
  if(n == 3) break
}

1.3 While loops

Similar to repeat loops, but instead of executing the body and then checking whether the loop should end, they check the condition first and execute the body only if it is true.

Syntax: while(condition is true){do something}

## The following will execute while x is less than 10
x <- 0

while(x<10){
  message("The current number is ", x)
  x <- x+1
}
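
A while loop is handy when you do not know in advance how many iterations you need. As a small sketch (the starting value and the threshold of 1000 are arbitrary choices for illustration), the loop below counts how many times a number must be doubled before it exceeds 1000.

```r
x <- 1          # starting value
n_doublings <- 0

# The condition is checked BEFORE each pass through the body
while (x <= 1000) {
  x <- x * 2
  n_doublings <- n_doublings + 1
}

message("Doublings needed: ", n_doublings, " (final value: ", x, ")")
# 2^10 = 1024 is the first power of two above 1000, so n_doublings is 10
```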

1.4 For loops

This loop is used when you know how many times you want the code to repeat. The loop accepts an iterator variable.

Syntax: for(local.variable in range){do something}

# In the simplest case, the vector contains integers:
# The following two loops show the same text
# (message() writes a message to the console, print() prints the string in quotes)

for(i in 1:5) message("Current number = " , i)
for(i in 1:5) print(paste("Current number = " ,i))



# For multiple expressions braces are useful
for(i in 1:5){
  j <- i ^2
  message("j = ", j)
}



# For loops can also receive character vectors
## month.name is a built-in vector with the names of the months
month.name

for(month in month.name){
  message("This month is ", month)
}
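
A common pattern is to loop over the positions of a vector and store one result per position. The sketch below uses made-up values; it also shows why seq_along(x) is a safer range than 1:length(x) when the vector might be empty.

```r
values <- c(3, 5, 8)

# Create the result vector first, then fill it inside the loop
squares <- numeric(length(values))
for (i in seq_along(values)) {
  squares[i] <- values[i]^2
}
squares

# With an empty vector, seq_along() gives zero iterations,
# whereas 1:length(empty) would be c(1, 0) and loop twice
empty <- numeric(0)
n_iter <- 0
for (i in seq_along(empty)) n_iter <- n_iter + 1
n_iter
```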

EXERCISE 6 (Loops)

Create a sequence from 1 to 10. Use ifelse statement to evaluate if the numbers are less than 6. If they are less than 6 return “Y”, otherwise return “N”
Create a function that receives an integer. The function should multiply by 2 all numbers from 1 to the integer you pass to this function. The syntax of your function should be simply my_function(integer). Hint: This is a for loop.
One problem with this function is that the results with values equal or less than zero are confusing. Add an if statement that returns a message if the number is zero or negative

2. Analyzing Data and Reporting Results in R

2.1. Frequency and cross tables

Let’s work with the full ‘turnout’ dataset.

rm(list=ls())      # Start with an empty workspace
library("Zelig")   # Load package 'Zelig'
data("turnout")    # Load dataset 'turnout'

## For the next exercises, we will create and use a variable for gender
turnout$gender <- sample(c("Male","Female"),size = nrow(turnout),replace = TRUE)
head(turnout)
table(turnout$gender)

3.1.1. Function ‘table()’

The function ‘table()’ creates frequency tables

##  Frequency table for 'race'
table(race=turnout$race)

##  Cross table for 'race' and 'vote'
table(race=turnout$race,vote=turnout$vote)

## NOTE: This table can be summarized
summary(table(race=turnout$race,vote=turnout$vote)) 
## It can also be plotted
plot(table(race=turnout$race,vote=turnout$vote))

## Cross table for 'race', 'vote', and 'gender'
## A table can be defined in more than two dimensions.
## The format of this table will be three-dimensional: [race, vote, gender]
cross_table_1 <- table(turnout$race,                        # Dimensions
                       turnout$vote,
                       turnout$gender,
                       dnn = c("race","vote","gender"))     # Names of dimensions

cross_table_1

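### Cross tables make subsetting easier.
### Let's select some subtables.
### Index by the level label (a character string): a bare number is a position,
### so [ , 1, ] would select the FIRST level of 'vote', which is vote = 0.
cross_table_1["others",    ,       ]   # Cross table for 'race' = "others"
cross_table_1[        , "1",       ]   # Cross table for 'vote' = 1
cross_table_1[        ,    , "Male"]   # Cross table for 'gender' = "Male"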
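
Counts are often easier to read as proportions. The sketch below uses a tiny made-up data frame (not the turnout data) to show prop.table() and addmargins() on a two-way table, and how indexing by label differs from indexing by position.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  race = c("white", "white", "others", "white", "others", "white"),
  vote = c(1, 0, 1, 1, 0, 1)
)

tab <- table(race = toy$race, vote = toy$vote)
tab

row_props <- prop.table(tab, margin = 1)   # proportions within each row (race)
row_props

addmargins(tab)                            # adds row and column totals

# Label vs position: "1" is the column for vote = 1,
# while the number 1 is the first column (vote = 0)
tab["white", "1"]
tab["white", 1]
```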

3.1.2. Function ‘xtabs()’

# Let's take a look at the dataset
head(turnout)
# 'xtabs()' uses a formula instead of vectors
# Syntax: 'xtabs(formula, dataset)'

##  Frequency table for 'race'
xtabs( ~ race, data = turnout)

## Cross table for 'race' and 'vote'
xtabs( ~ race + vote, data = turnout)

##  Cross table for 'race', 'vote', and 'gender'
cross_table_2 <- xtabs( ~ race + vote + gender, data = turnout)
cross_table_2

## ADVANTAGE: xtabs() allows for sums of other variables (not only frequencies)
xtabs(income ~ race + vote + gender, data=turnout)
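
Because xtabs() can return both counts and sums, dividing one table by the other gives group means. This is a sketch on a small invented data frame, not on the turnout data.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  race   = c("white", "white", "others", "white", "others", "white"),
  vote   = c(1, 0, 1, 1, 0, 1),
  income = c(2, 4, 1, 3, 5, 6)
)

counts <- xtabs(~ race + vote, data = toy)        # number of observations per cell
sums   <- xtabs(income ~ race + vote, data = toy) # total income per cell

cell_means <- sums / counts                       # element-by-element division
cell_means
```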

3.2. Summary statistics

Summary statistics are used to summarize data using a few key statistics.

Let’s work with the full ‘turnout’ dataset.

rm(list=ls())      # Start with an empty workspace
library("Zelig")   # Load package 'Zelig'
data("turnout")    # Load dataset 'turnout'

3.2.1. Function ‘summary()’

# The simplest command to report summary statistics in R is 'summary()'.
# It accepts a data frame (one summary per column) or a single vector.
# Syntax: 'summary(data.frame.name)'

# Summary statistics for all variables
summary(turnout)

# Summary statistics  for 'income'
summary(turnout$income)

# Summary statistics for 'income' and 'educate'
# Equivalent notations: 'data.frame()' vs 'cbind()'
summary(data.frame(turnout$educate, turnout$income))

3.2.2. Function ‘apply()’

The apply() command applies a function to every row, every column, or every element of a matrix.

Syntax: ‘apply(matrix, MARGIN, FUNCTION)’, where

* MARGIN = 1 applies the function to every row,
* MARGIN = 2 applies the function to every column,
* MARGIN = c(1,2) applies the function to every element (both rows and columns).

# Let's select the columns for schooling and income
head(turnout[,c("educate","income")])
# Mean of 'educate' and 'income'
apply(turnout[,c("educate","income")], MARGIN=2, FUN = mean)    # Mean


# There is a more general way to cast ANY function or set of functions, including user-defined functions and more complex ones.
# Syntax: 
#       'apply(data.frame, 
#                   MARGIN = c(1,2), 
#                   FUN = function(x) c(function.name1  = function(x,arguments), 
#                                       function.name2  = function(x,arguments),
#                                       function.name3  = function(x,arguments))
#             )

# Example: Mean of 'educate' and 'income'
apply(turnout[,c("educate","income")], MARGIN = 2, FUN = function(x) Mean = mean(x))

# Let's estimate the full set of summary statistics.
apply(turnout[,c("educate","income")], 
      MARGIN = 2, 
      FUN = function(x) c( Min = min(x) , FirstQu = quantile(x,probs=0.25) , Median = median(x), 
                           Mean = mean(x) , ThirdQu = quantile(x,probs=0.75) , Max = max(x))
      )

# Compare with
summary(data.frame(turnout$educate, turnout$income))

# NOTE: Keep in mind the following functions too 
colSums(turnout[,c("educate","income")])    # Column sums
colMeans(turnout[,c("educate","income")])   # Column means
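
To make the role of MARGIN concrete, here is a sketch on a small made-up matrix where the results can be checked by hand.

```r
# A 2 x 3 matrix filled column by column: a = (1,2), b = (3,4), c = (5,6)
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
m

row_sums <- apply(m, MARGIN = 1, FUN = sum)   # one value per row:    r1 = 9, r2 = 12
col_sums <- apply(m, MARGIN = 2, FUN = sum)   # one value per column: a = 3, b = 7, c = 11

# MARGIN = c(1,2) visits every cell, so the result has the shape of m
scaled <- apply(m, MARGIN = c(1, 2), FUN = function(x) x * 10)

row_sums
col_sums
scaled
```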

3.3. Summary statistics by groups

3.3.1. Function ‘tapply()’

The function ‘tapply()’ applies a function “FUN” to a vector “X” by group “INDEX”.

Syntax: tapply(vector, grouping variable, function)

#  Mean of income by 'race'
tapply(turnout$income, INDEX = turnout$race, FUN = mean)  



# or using the more general notation
tapply(turnout$income, INDEX = turnout$race, FUN = function(x) Mean = mean(x))  

#  Mean of 'income' by groups of 'race' and 'vote'
# The grouping variables must be provided as a LIST.
tapply(turnout$income, INDEX = list(turnout$race,turnout$vote),  FUN = function(x) Mean = mean(x))


#  Mean of 'income' and 'educate' by groups of 'race' and 'vote'.
# DISADVANTAGE: 'tapply()' only works with vectors, we have to estimate the mean of each variable separately.

# NOTE: The function 'by()' has the same functionality and structure as 'tapply()'
#  Mean of income by 'race'
by(turnout$income, turnout$race, FUN = function(x) Mean = mean(x))  
#  Mean of 'income' by groups of 'race' and 'vote'
by(turnout$income, list(turnout$race,turnout$vote),  FUN = function(x) Mean = mean(x))

# Note that the output is a list
# To show the results as a vector, you can use 'as.vector()'
as.vector(by(turnout$income, turnout$race, FUN = function(x) Mean = mean(x)))
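
The grouping logic of tapply() can be checked by hand on a small example. The sketch below uses invented vectors; the second part shows one possible way (an assumption, not the only option) to get several statistics per group by combining split() and sapply().

```r
# Toy vectors, invented for illustration only
income <- c(2, 4, 1, 3, 5, 6)
race   <- c("white", "white", "others", "white", "others", "white")

# One statistic per group: others = (1 + 5) / 2, white = (2 + 4 + 3 + 6) / 4
group_means <- tapply(income, INDEX = race, FUN = mean)
group_means

# Several statistics per group: split() cuts the vector into groups,
# sapply() applies the function to each group and binds the results
group_stats <- sapply(split(income, race),
                      function(x) c(N = length(x), Mean = mean(x)))
group_stats
```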

3.3.2. Function ‘aggregate()’

More flexible than ‘tapply()’

It uses a formula notation, example x ~ y.

The operator ‘~’ means “with respect to”

The output is a data frame.

Syntax: ‘aggregate(formula, data, function)’

head(turnout)


#  Mean of income by 'race'
aggregate(income~race, data=turnout, FUN = function(x) Mean = mean(x))  

#  Mean of 'income' by groups of 'race' and 'vote'
# Use the '+' operator to add variable to the formula
aggregate(income ~ race + vote, data=turnout,  FUN = function(x) Mean = mean(x))

#  Mean of 'income' and 'educate' by groups of 'race' and 'vote'.
# 'aggregate()' can estimate summary statistics for different variables at the same time.
# The input must be provided as a matrix.
# Use 'cbind()' to bind all relevant variables together.
aggregate(cbind(income,educate) ~ race + vote, data=turnout,  FUN = function(x) Mean = mean(x))
         
# DISADVANTAGE: The same functions must be applied to all variables.
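
Since aggregate() returns a data frame, its columns can be used like those of any other data frame. A sketch on a small invented dataset:

```r
# Toy data, invented for illustration only
toy <- data.frame(
  race    = c("white", "white", "others", "white", "others", "white"),
  income  = c(2, 4, 1, 3, 5, 6),
  educate = c(12, 16, 10, 12, 8, 14)
)

# Mean of income and educate by race: one row per group, one column per variable
res <- aggregate(cbind(income, educate) ~ race, data = toy, FUN = mean)
res

class(res)      # "data.frame"
res$income      # group means of income, in the order of res$race
```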

** NOTE: From the previous section, remember that we can also use the dplyr package and its group_by function to compute summary statistics by group **

EXERCISE 7 (Summary Statistics)

Using tapply, calculate the median income and the standard deviation of income by race
Using the apply function calculate the mean age and the mean income
The lapply function is the equivalent function for a list of elements. Create the following list:

4. Graphs in R

In this session we will learn how to create and export plots in R.

In particular, we will use two different sets of functions for plotting:

The baseline plotting functions.
The plotting functions available in the package “ggplot2”

7.1. Built-in functions in R for Plots

7.1.1. Histograms

To plot a histogram, you can use the function ‘hist()’

?hist

Histogram for income

library(Zelig)
data(turnout)
summary(turnout$income)
hist(turnout$income,                  ## Vector of data 
     main = "Income distribution",    ## Title
     breaks = 6,                      ## Number of breaks
     freq = TRUE,                     ## Density (FALSE) or frequency (TRUE)
     xlim = c(0,15),                  ## X-axis limits
   # ylim = c()                       ## Y-axis limits 
     xlab = "Income",                 ## Label for X-axis
     ylab = "Frequencies",            ## Label for Y-axis
     col = "red",                     ## Color of bars
     border = "blue")                 ## Color of the border

# Note that different parameters can be set for different elements in a graph.
hist(turnout$income,                  ## Vector of data 
     main = "Income distribution",    ## Title
     breaks = 6,                      ## Number of breaks
     freq = TRUE,                     ## Density (FALSE) or frequency (TRUE)
     xlim = c(0,15),                  ## X-axis limits
     xlab = "Income",                 ## Label for X-axis
     ylab = "Frequencies",            ## Label for Y-axis
     col = c("red","blue","red","red","red", "red", "red"),      ## Color of bars
     border = c("black","green","black","black","black","black","black")
     )
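
Keep in mind that a single number passed to 'breaks' is only a suggestion: R picks "pretty" break points near that number. One way to check what R actually used is to ask hist() for its results without drawing. The sketch below uses simulated values (an assumption for illustration, not the turnout data).

```r
set.seed(1)                       # make the simulated data reproducible
x <- runif(200, min = 0, max = 15)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(x, breaks = 6, plot = FALSE)

h$breaks        # break points R actually chose
h$counts        # number of observations in each bin
sum(h$counts)   # every observation falls in exactly one bin
```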

#par(ask=F)

7.1.2. Barplots

?barplot

# Bar plot of income and education means by race
# This should be provided as a matrix
educate_means <-tapply(turnout$educate,turnout$race,mean)
educate_means

##   others    white 
## 11.03767 12.24268

income_means <-tapply(turnout$income,turnout$race,mean)
income_means

##   others    white 
## 2.926734 4.050746

cbind(income_means,educate_means)

##        income_means educate_means
## others     2.926734      11.03767
## white      4.050746      12.24268

barplot(cbind(income_means,educate_means),             ## Matrix of values
        horiz=FALSE,                                   ## Horizontal (TRUE) or vertical (FALSE) bars
        beside=TRUE,                                   ## Grouped (TRUE) or stacked (FALSE) bars
        main = "Income and education means, by race",  ## Title
        sub = "Turnout database",                      ## Subtitle
        xlab = "Income and education means",           ## Label for X-axis
        ylab = "Values",                               ## Label for Y-axis
        col=c("darkblue","red"),                       ## Bar colors
        names.arg = c("Income","Education"),           ## Bar labels
        legend.text=c("Other","White"),                ## Legend labels
        args.legend = list(x=2.5,y=12,cex = 0.9))      ## (x,y) = position of the legend
                                                       ## cex = font size (with respect to default size 1)

7.1.3. Boxplots

The function ‘boxplot()’ produces box-and-whisker plot(s) of the given (grouped) values.

?boxplot

# Note that it uses a formula notation, example y ~ x.
# Remember, the operator '~' means "with respect to"

boxplot(turnout$income~turnout$race,
        main="Income distribution by race", 
        col="green",
        xlab="Race")

### Adding points to a graph ###

### Imagine you also want to add a point for the mean of income by race.
### How can we do this?
### A graph is also a two dimensional object in R
### This means that we can add any object if we know how to provide the relevant coordinates.

### And plot our boxplot again
boxplot(turnout$income~turnout$race,
        main="Income distribution by race", 
        xlab="Race")
### Note: we can add a second "layer" to this graph.
### In particular, we will add a layer of points using the function 'points()'.
?points()


### What should the coordinates be?
### Since we estimate a vector of size two (mean of income for each race), we need to provide two sets of coordinates.
### What are the coordinates for y? What are the coordinates for x?
points(x=c(1,2), y=income_means)
### But we can do better than that
points(c(1,2), income_means, pch = 8 , col = "red")
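
Why x = c(1,2)? boxplot() draws the groups at positions 1, 2, ... in the order of the factor levels, which is the same order tapply() uses. The sketch below checks this on invented data; pdf(NULL) opens a device that draws nothing, so the example can run without a graphics window (drop that line and dev.off() to see the plot).

```r
# Toy data, invented for illustration only
toy <- data.frame(
  race   = factor(c("white", "white", "others", "white", "others", "white")),
  income = c(2, 4, 1, 3, 5, 6)
)
grp_means <- tapply(toy$income, toy$race, mean)

pdf(NULL)                                   # null device: nothing is shown or saved
bp <- boxplot(income ~ race, data = toy, main = "Income by race (toy data)")

# Box i sits at x = i, so the means go at x = 1, 2, ... in the same order
points(x = seq_along(grp_means), y = grp_means, pch = 8, col = "red")
dev.off()

bp$names            # group order used by boxplot()
names(grp_means)    # group order used by tapply()
```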

#dev.off()

7.1.4. Density plots

The function ‘density()’ computes kernel density estimates.

?density

# Plot the densities for income by race group.
# Remember, we need to subsample data using conditionals

# Vector of income for race == "white"
turnout[turnout$race=="white","income"]
# Vector of income for race == "others"
turnout[turnout$race=="others","income"]

# Now we can estimate both densities
density(turnout[turnout$race=="white","income"])
density(turnout[turnout$race=="others","income"])

# We will save these densities in objects 'd1' and 'd2'
d1 <- density(turnout[turnout$race=="white","income"])
d2 <- density(turnout[turnout$race=="others","income"])

# Note that a density object can be plotted
plot(d1, main="density of income for race == white")

plot(d2, main="density of income for race == others")

## Can we plot them together?
## Note: Another way to make graphs in R, is to start with an empty plot and add elements to it.
## To do so, you need to set the limits of the x and y axis and tell R that you are plotting an empty graph.
?plot

## How can we set these limits?
## You can look at the graphs or use the function 'range()'
plot(
    x = c(0,16),                        # Range of values for x
    y = c(0,0.23),                      # Range of values for y
    type = "n",                         # Plot type
    xlab = "Income",                        # Label for X-axis
    ylab = "Density",                   # Label for Y-axis
    main = "Density of income by race"  # Main title
    )
    
### Adding lines to a graph ###
    
## In addition to points, you can also add lines to a plot using the function 'lines()'

?lines
## Note the parameters for color and line width

lines(d1, col = "blue", lwd=2)
lines(d2, col = "red" , lwd=2)

### Adding legends to a graph ###

## In addition to points and lines, you can also add a legend using the function 'legend()'
## Note that the legend has a string vector of size 2
legend(
    "topright",                     # Position: can be denoted with a keyword **OR** x/y coordinates
    col=c("blue","red"),            # Colors
    lwd=c(2,2),                     # Line width
    legend=c("white", "others"))    # Legend labels
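
Instead of reading the axis limits off the individual graphs, they can be computed with range() from the density objects themselves. This is a sketch on simulated data (the two samples are assumptions for illustration); pdf(NULL) only keeps the example from opening a window.

```r
set.seed(42)
group_a <- rnorm(100, mean = 4, sd = 2)     # simulated values, for illustration
group_b <- rnorm(100, mean = 3, sd = 1.5)

dens_a <- density(group_a)
dens_b <- density(group_b)

# A density object stores the evaluation points in $x and the heights in $y
x_limits <- range(dens_a$x, dens_b$x)
y_limits <- range(dens_a$y, dens_b$y)

pdf(NULL)                                   # null device: nothing is shown or saved
plot(x_limits, y_limits, type = "n",
     xlab = "Value", ylab = "Density", main = "Two densities (simulated data)")
lines(dens_a, col = "blue", lwd = 2)
lines(dens_b, col = "red",  lwd = 2)
legend("topright", col = c("blue", "red"), lwd = c(2, 2),
       legend = c("group a", "group b"))
dev.off()
```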

7.1.5. Scatter plots

Scatter plots are some of the most flexible graphs in R. To draw a scatter plot, set the plot type to points (type = "p").

?plot

Remember: once you open a graph window using plot(), you can add multiple points or lines to that graph.

R will keep adding elements to the graph until a new plot is called or the graph window is closed

###

plot(income~educate,                       ### Formula to plot, you can also give coordinates
     data=turnout,                         ### Database
     type = "p",                           ### Type: "p" = point; "l" = lines
     col = "blue",                         ### Color
     pch = 20,                             ### Symbol type.
     main = "Scatter plot income vs education"  ### title
     )

### Adding reference lines to a graph ###
     
## In addition to 'points', 'lines', and 'legends', you can add reference lines using the function 'abline()'
## You can add vertical or horizontal lines, fitted lines, or lines with different slopes and intercepts.
## Note the different options for line types.
?abline
abline(v=10.5, lwd=1.5, lty=3, col="brown")
abline(h=10, lwd=2, lty=2 , col= "black")
abline(lm(income~educate,data=turnout), lty=1, lwd=2.5,col="orange")

## We can also add a legend for the 3 reference lines.
## Note, now our legend is a string vector of size 3.
legend("topleft",                                         # Position
       col=c("brown","black","orange"),                   # Color
       lty=c(3,2,1), lwd=c(1.5,2,2.5),                    # Line type and width
       legend=c("Reference line 1", "Reference line 2", "Fitted line"),    # Labels
       cex = 0.9)                                         # Font size (proportion wrt 1)

### Adding text to a graph ###

# Now suppose you want to change the color of a particular observation
points(x=12, y=15, pch=20, col="red", cex = 1.5)

# You can also add 'text' to a plot using the function 'text()'
# Note the options for position.
?text
text(x=12, y=15, labels = "This is an outlier", pos = 4)
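
When abline() receives a fitted model, it simply uses the model's intercept and slope. The sketch below makes this explicit on made-up data that lie exactly on the line y = 1 + 2x, so the coefficients are known in advance. As before, pdf(NULL) is only there so the example runs without a graphics window.

```r
x <- 1:10
y <- 1 + 2 * x                 # toy data lying exactly on a line

fit <- lm(y ~ x)
b <- unname(coef(fit))         # b[1] = intercept, b[2] = slope
b

pdf(NULL)                      # null device: nothing is shown or saved
plot(x, y, pch = 20, col = "blue", main = "Fitted line (toy data)")
abline(a = b[1], b = b[2], col = "orange", lwd = 2.5)   # same line as abline(fit)
dev.off()
```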

7.1.6. Parameters

These are all the possible parameters you can set for a graph:

names(par())
?par

## Parameter pch
plot(1:25, pch = 1:25, xlab = "Symbol Number", ylab="")
lines(1:25, type="h", lty="dashed")

*** Source: Maindonald, John, and W. John Braun (2010). Data Analysis and Graphics Using R. Cambridge University Press.***

## Parameters lwd and lty
plot(c(1,7), c(0,1), type="n", axes=FALSE, xlab="Line Type (lty)", ylab = "", frame.plot=TRUE, main="Line Type and Width")
axis(1, at = 1:6)
abline(v=c(1:6), lwd=c(1:6), lty=c(1:6) , col= "black")

## Parameter color
# There is a set of colors you can refer to by number
palette()

# For example
plot(income~educate, data=turnout, type = "p", pch = 19, col = "red")

plot(income~educate, data=turnout, type = "p", pch = 19, col = 2)

## Nevertheless, R has more than 600 colors you can call by their names
## For example, to see all the colors available, type
colors()

### Can we plot all of them?

# We start with an empty plot
plot(0,0, xlim = c(0,26), ylim= c(0,26), type = "n",main="Main colors")
# Add the colors
points(x=rep(1:26,26),y=rep(c(26:1),rep(26,26)), col=c(colors()), pch=19)
# And add some numbers
text(x=rep(1:26,26),y=rep(c(26:1),rep(26,26)), label=paste(1:676), pos=3, cex = 0.3)

## Now we should be able to refer to any color by number
## For example
colors()[84]
plot(income~educate, data=turnout, col = colors()[84], pch = 20)

Another useful reference: run the following code to visualize different parameters for symbols and text.

##################################################################################
############ Plot symbols and text ###############################################
##################################################################################
plot(0,0, xlim = c(0,13), ylim= c(0,19), type = "n",main="Plot symbols and text")
xpos <- rep((0:12) + 0.5, 2)
ypos <- rep(c(14.15,12.75),c(13,13))
points(xpos, ypos, cex = 2.5, col = 1:26, pch = 0:25)
text(xpos, ypos, labels = paste(0:25), cex = 0.75)
text(6, 16, labels = c("Parameters 'pch' and 'col'"), cex = 1.4)

## Plot characters, vary cex (expansion)
text((0:4) + 0.5, rep(9,5), letters[1:5], cex = c(1 ,1.5, 2, 2.5, 3))
text((0:4) + 0.5, rep(8,5), paste(c(1,1.5, 2, 2.5, 3)))
text(3, 6.5, labels = c("Parameter 'cex'"), cex = 1.4)

## Position label with respect to point
xmid <- 10.5
xoff <- c(0,-0.5, 0 , 0.5)
ymid <- 5.8
yoff <- c(-1, 0 , 1, 0)
col4 <- colors()[ c(52, 116, 547, 610) ]
points(xmid+xoff, ymid+yoff, pch=16, cex=1.5, col=col4)
posText <- c("below (pos = 1)", "left (2)", "above (3)", "right (4)")
text(xmid+xoff, ymid+yoff, posText, pos = 1:4) 
rect(xmid - 2.3, ymid - 2.3, xmid + 2.3, ymid + 2.3 , border="red")
text(10.5, 2.5, labels = c("Parameter 'pos'"), cex = 1.4)

Source: Maindonald, John, and W. John Braun (2010). Data Analysis and Graphics Using R. Cambridge University Press.

Note that you can save this graph by selecting it and using the drop down menu File -> Save As

7.1.7. Saving graphs

PLEASE MAKE SURE YOU CHANGED THE WORKING DIRECTORY

PLEASE CREATE THE FOLLOWING SUBFOLDERS IN YOUR WORKING DIRECTORY

tables
graphs

You can save your graphs as .jpeg or .pdf by using the functions jpeg() and pdf(). There are other formats also available for saving graphs.

### Saving graphs as jpeg
### The function 'jpeg()' opens an empty .jpeg file in a given location.
#?jpeg
#jpeg("graphs/graph1.jpeg") 

### Everything plotted while the device is open is written to that file.
plot(income~educate, data=turnout, type = "p", col = "blue", pch = 20,   
     main = "Scatter plot income vs education")    
abline(lm(income~educate,data=turnout), lwd=3,col="red")

### The .jpeg file is written to disk when the device is closed with dev.off().
dev.off()                                      


### Saving graphs as pdf
### The function 'pdf()' opens an empty .pdf file in a given location.
#?pdf
#pdf("Graphs/graph1.pdf")                     

### Everything plotted while the device is open is written to that file.
plot(income~educate, data=turnout, type = "p", col = "blue", pch = 20,   
     main = "Scatter plot income vs education")    
abline(lm(income~educate,data=turnout), lwd=3,col="red")

### The .pdf file is written to disk when the device is closed with dev.off().
dev.off()
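
The full open, plot, close cycle can be tried safely with a temporary file. In this sketch the file name and the toy data are assumptions for illustration; in your own work you would use a path inside your graphs subfolder.

```r
# Write to R's temporary folder so nothing in your project is overwritten
out_file <- file.path(tempdir(), "graph_demo.pdf")

pdf(out_file, width = 6, height = 4)     # 1. open the file device (size in inches)
plot(1:10, (1:10)^2, type = "p", pch = 20, col = "blue",
     main = "Toy scatter plot")          # 2. draw: output goes to the file
dev.off()                                # 3. close the device: the file is finished

file.exists(out_file)                    # check that the file was created
```

If the file looks empty or cannot be opened, the usual cause is a missing dev.off().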

7.1.8. Multiple graphs

Set the parameters for ‘mfrow’ or ‘mfcol’ to plot multiple figures in a single array.

Syntax: par(mfrow=c(nrow,ncol)) to plot multiple graphs in an nrow-by-ncol array

head(turnout)

# par(mfrow=c(2,2))
#   plot(income~educate, data=turnout, main = "Income vs schooling")
#   plot(income~age, data=turnout, main= "Income vs age")
#   barplot(tapply(turnout$income,turnout$race,mean),names.arg = c("Other","White"),main = "Income by race")
#   barplot(tapply(turnout$income,turnout$gender,mean),names.arg = c("Female","Male"), main = "Income by gender")
# 
# head(turnout)

# Note that now all graphs are plotted in a two by two matrix. You can reset this by
#par(mfrow=c(1,1))
#or 
##dev.off()
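
A tidy way to reset the layout is to save the old settings when you change them: par() returns the previous values of whatever it modifies. The sketch below uses toy data and a null device (pdf(NULL)) so it runs without opening a window.

```r
pdf(NULL)                               # null device: nothing is shown or saved

old_par <- par(mfrow = c(1, 2))         # switch to 1 row x 2 columns, keep old settings
layout_during <- par("mfrow")

plot(1:10, main = "Left panel")
hist(c(1, 2, 2, 3, 3, 3, 4, 4, 5), main = "Right panel", xlab = "Toy values")

par(old_par)                            # restore the layout saved above
layout_after <- par("mfrow")

dev.off()

layout_during   # 1 2
layout_after    # 1 1
```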

7.2. ggplot2 Package

Intro to ggplot2

ggplot2 is a plotting system for R.

The ggplot2 package offers a powerful graphics language for creating complex plots.

The grouping variables can define the color, symbol, size, and transparency of observations.

?ggplot2

library(ggplot2)

# For this session we'll only show the function 'qplot()', which stands for quickplot.
# Quickplot provides most of the functionality of more complex functions in ggplot.
# Syntax:

# qplot(
#       x,              # Variable x
#       y,              # Variable y
#       data =,         # Database
#       color =,        # It can be defined with respect to a grouping variable
#       shape =,        # It can be defined with respect to a grouping variable
#       size =,         # It can be defined with respect to a grouping variable
#       fill =,         # It can be defined with respect to a grouping variable
#       alpha =,        # Degree of transparency
#       geom =,         # Type of plot or 'geometry'.
#       method =,       # If adding a linear model or a lowess smooth
#       formula =,      # Formula for linear model (if adding one)
#       facets =,       # Conditioning variables, in formula notation: x~y, .~y, x~.
#       )
# Note: To set a numeric color, shape, size, or fill, use the function I(), which stands for "as is".

# # Histogram for income
# qplot(income, data= turnout, geom="histogram", main="Histogram for income (ggplot2)")
# # Histogram for income conditioning on race
# qplot(income, data= turnout, geom="histogram", facets = .~race, main="Histogram for income (ggplot2)")
# # Histogram for income conditioning on race and gender
# qplot(income, data= turnout, geom="histogram", facets = race~gender, main="Histogram for income (ggplot2)")
# 
# # Density for income
# qplot(income, data= turnout, geom="density", main="Density for income (ggplot2)")
# # Density of income with filling according to race
# qplot(income, data= turnout, geom="density", fill=race, alpha=I(0.5), main="Density for income (ggplot2)")
# # Density of income conditioning on gender with filling colors according to race
# qplot(income, data= turnout, geom="density", facets = .~gender, fill=race, alpha=I(0.5), main="Density for income (ggplot2)")
# 
# # Boxplot for income with respect to race
# qplot(race, income, data= turnout, geom = "boxplot", main="Boxplot for income by race")
# # Boxplot for income with respect to race with filling colors by race
# qplot(race, income, data= turnout, geom = "boxplot", fill=race, main="Boxplot for income by race" )
# 
# # Bar plot of the number of observations by race
# qplot(race, data= turnout, geom = "bar", main="Bar plot of counts by race", fill=race)
# # Bar plot of the number of observations by race conditioning on gender 
# qplot(race, data= turnout, geom = "bar", main="Bar plot of counts by race", fill=race, facets = .~gender)
# dev.off()
# 
# # Scatter plot of income with respect to education
# qplot(educate, income, data= turnout, geom = "point", main="Income ~ schooling")
# # Scatter plot of income with respect to education, color defined by race
# qplot(educate, income, data= turnout, color = race, geom = "point", main="Income ~ schooling")
# # Scatter plot of income with respect to education, color and shape defined by race
# qplot(educate, income, data= turnout, color = race, shape = race, geom = "point", main="Income ~ schooling")
# # Scatter plot of income with respect to education, color, shape, and size defined by race
# qplot(educate, income, data= turnout, color = race, shape = race, size = race, geom = "point", main="Income ~ schooling")
# 
# ### Including linear fits and lowess curves
# # Scatter plot of income with respect to education, linear fit included
# qplot(educate, income, data= turnout, geom = c("point","smooth"), method="lm", formula=y~x, main="Income ~ schooling")
# # Scatter plot of income with respect to education, color defined by race, linear fit included
# qplot(educate, income, data= turnout, color = race, geom = c("point","smooth"), method="lm", formula=y~x, main="Income ~ schooling")
# # Scatter plot of income with respect to education, lowess curve included
# qplot(educate, income, data= turnout, geom = c("point","smooth"),main="Income ~ schooling")
# # Scatter plot of income with respect to education, color defined by race, lowess curve included
# qplot(educate, income, data= turnout, color = race, geom = c("point","smooth"), main="Income ~ schooling")

*** Source: Quick R. http://www.statmethods.net ***

*** Additional references: http://docs.ggplot2.org/current/ and "ggplot2: Elegant Graphics for Data Analysis" (Use R!) by Hadley Wickham ***

*** Useful link: https://www.rstudio.com/wp-content/uploads/2015/12/ggplot2-cheatsheet-2.0.pdf ***

Intro to R workshop, Session 2

Center for Computational Social Science (Stanford University)

August 26, 2016