Yunkyu Sohn (Slides by Ethan Fosse)
February 15, 2017
Research Associate, Department of Politics
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
| Date | Topic |
|---|---|
| February 15 | Introduction to R and RStudio |
| February 22 | Data Wrangling in R |
| TBD | Base R Graphics |
| TBD | Data Visualization in R with ggplot2 |
| TBD | Programming Loops in R |
| TBD | Probability and Simulations in R |
| TBD | Monte Carlo Simulations in R |
| TBD | Text Analysis in R |
| TBD | Hypothesis Testing in R |
| TBD | Regression Analysis in R |
| TBD | Social Network Analysis in R |
Connect with Us:
Teaching Staff
Faculty Sponsors
MyFirstScript.R
States.RData
StatesHealth.dta
MyFirstMarkdown.Rmd
Go to: https://github.com/compass-workshops/IntroRWorkshop/blob/master/Data.zip
Did Obama or McCain win Ohio in 2008?
We'll use data to answer this question!
Load States.RData into our workspace
Using RStudio's user-friendly interface:
C:/Folder/)
You can also try this R Code:
load("C:/Folder/States.RData")
load() is a function
Environment tab
Use class() to see what we have!
class(States)
View()
View(States)
R functions: print(), head(), tail()
print(States)
head(States)
tail(States)
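As a small illustration of head() and tail(), the sketch below uses a made-up data frame (`toy` and its values are hypothetical, not the workshop's States data). Both functions accept an `n` argument that controls how many rows are returned.

```r
# Hypothetical toy data; not the workshop's States data set
toy <- data.frame(
  State  = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Income = c(40, 55, 48, 62, 51, 45, 58, 49)
)

first_rows <- head(toy, n = 3)  # the first 3 rows
last_rows  <- tail(toy, n = 3)  # the last 3 rows

print(first_rows)
print(last_rows)
```

Without `n`, both functions return 6 rows by default.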
print() function?
head() and tail()?
str()
str(States)
dim(), nrow(), ncol()
dim(States)
nrow(States)
ncol(States)
rownames() and colnames()
rownames(States)
colnames(States)
View(States)
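To see how the dimension and name functions relate to each other, here is a minimal sketch on a made-up data frame (the object `toy`, its row names, and its values are assumptions for illustration only).

```r
# Hypothetical toy data with row names, loosely mimicking a state-level data set
toy <- data.frame(
  Income  = c(40, 55, 48),
  College = c(22, 30, 27),
  row.names = c("StateA", "StateB", "StateC")
)

dims <- dim(toy)   # number of rows, then number of columns
nrow(toy)          # same as dims[1]
ncol(toy)          # same as dims[2]
rownames(toy)      # labels of the rows
colnames(toy)      # names of the variables
str(toy)           # compact overview of the structure
```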
The 1936 Election was a landslide (46 versus 2 states won):
Was 2008 a landslide? How many states did Obama win versus McCain?
Summarizing a data set using summary()
Try this R Code:
summary(States)
NE)?
$) notation:
Dataset$Variable
States$HouseholdIncome
States$Region
Use the summary() function on single variables!
summary(States$HouseholdIncome)
summary(States$Region)
class() and str()
class(States$HouseholdIncome)
class(States$Region)
str(States$HouseholdIncome)
str(States$Region)
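The sketch below shows why summary() behaves differently for the two kinds of variables: a numeric variable gets a five-number summary plus the mean, while a factor gets a count per level. The data frame `toy` and its values are hypothetical.

```r
# Hypothetical toy data: one numeric variable and one factor
toy <- data.frame(
  Income = c(40, 55, 48, 62),
  Region = factor(c("NE", "S", "S", "W"))
)

income <- toy$Income   # extract a single variable with $
region <- toy$Region

class(income)          # a numeric vector
class(region)          # a factor
summary(income)        # min, quartiles, median, mean, max
region_counts <- summary(region)  # number of rows in each level
region_counts
```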
Use the assignment operator (<-) to copy a data set
NewDataset <- Dataset
StatesCopy <- States
str(StatesCopy)
class(StatesCopy)
Is StatesCopy the same as States?
Use <- to create a standalone variable
NewVariable <- Dataset$Variable
HouseholdIncome <- States$HouseholdIncome
str(HouseholdIncome)
class(HouseholdIncome)
Is HouseholdIncome the same as States$HouseholdIncome?
Use the Environment tab to see all the variables and data sets in the R workspace
ls()
rm()
ls()
Population <- States$Population
summary(Population)
rm(Population)
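One way to check what copying with <- does is sketched below on made-up data (the names `toy`, `toy_copy`, and `tmp` are hypothetical). The copy starts out identical to the original, but changing the copy leaves the original untouched, and rm() removes an object from the workspace.

```r
# Hypothetical toy data
toy <- data.frame(Income = c(40, 55, 48))

toy_copy <- toy                            # copy the data set
same_at_first <- identical(toy_copy, toy)  # the copy matches the original

toy_copy$Income[1] <- 99                   # change the copy only
toy$Income[1]                              # the original still holds 40

tmp <- toy$Income                          # a standalone variable
before_rm <- "tmp" %in% ls()               # tmp is listed in the workspace
rm(tmp)                                    # remove it
after_rm <- "tmp" %in% ls()                # tmp is no longer listed
```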
Was 2008 a landslide? How many states did Obama win versus McCain?
R Code Hint:
Winner <- States$ObamaMcCain
summary(Winner)
Or you can just do:
summary(States$ObamaMcCain)
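To show what the output of summary() looks like for a winner variable, here is a small sketch with invented values (the vector `winner` is hypothetical and does not reflect the actual 2008 results). For a factor, summary() returns one count per level, and table() gives the same counts.

```r
# Hypothetical factor of winners in five made-up states
winner <- factor(c("Obama", "McCain", "Obama", "Obama", "McCain"))

counts <- summary(winner)  # count of states per candidate
counts
table(winner)              # the same counts as a table
```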
Winner?
MyFirstScript.R
nrow(States)
ncol(States)
Add comments with the # symbol
nrow(States) # number of rows
ncol(States) # number of columns
mean()
median()
mean(States$HouseholdIncome)
median(States$HouseholdIncome)
range()
sd()
IQR()
range(States$HouseholdIncome)
sd(States$HouseholdIncome)
IQR(States$HouseholdIncome)
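The measures of center and spread above can be tried on a short made-up vector where the answers are easy to check by hand (the vector `income` is hypothetical).

```r
# Hypothetical incomes (in thousands) for five made-up states
income <- c(40, 55, 48, 62, 51)

avg    <- mean(income)    # arithmetic mean: 256 / 5
mid    <- median(income)  # middle value after sorting
span   <- range(income)   # smallest and largest value
spread <- sd(income)      # sample standard deviation
iqr    <- IQR(income)     # distance between the 1st and 3rd quartiles
```

Note that range() returns two numbers (the minimum and the maximum), not their difference.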
hist()
breaks
hist(States$HouseholdIncome)
hist(States$HouseholdIncome, breaks=3)
hist(States$HouseholdIncome, breaks=15)
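When `breaks` is a single number, R treats it as a suggestion and picks "pretty" cut points, so the number of bars may not match exactly. The sketch below inspects the bins without drawing anything, using simulated values (the vector `income` is an assumption, not the workshop data).

```r
# Simulated, hypothetical incomes
set.seed(1)
income <- rnorm(50, mean = 50, sd = 10)

# plot = FALSE returns the histogram object instead of drawing it
h_few  <- hist(income, breaks = 3,  plot = FALSE)
h_many <- hist(income, breaks = 15, plot = FALSE)

h_few$breaks           # cut points R actually chose
length(h_few$counts)   # number of bars with few breaks
length(h_many$counts)  # number of bars with many breaks
```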
plot()
plot(States$College, States$ObamaVote)
Using text() immediately after plot(), we can add labels to the scatter plot
labels is used to specify the variable with the labels
plot(States$College, States$ObamaVote)
text(States$College, States$ObamaVote, labels=States$State)
levels()
To create a table of counts: table()
Try this R Code:
levels(States$Region)
table(States$Region)
W)?
table()
table(States$Region, States$ObamaMcCain)
W) and voted for McCain?
plot()
The height of the bars equals the number of observational units in each category (or level)
Try this R Code:
plot(States$ObamaMcCain)
mean(States$ObamaVote)
table(States$Region, States$ObamaMcCain)
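A two-way table can be read like a grid: rows are the levels of the first variable and columns are the levels of the second. The sketch below uses invented values (`region` and `winner` are hypothetical) and pulls out a single cell by its row and column labels.

```r
# Hypothetical regions and winners for six made-up states
region <- factor(c("NE", "S", "S", "W", "W", "W"))
winner <- factor(c("Obama", "McCain", "McCain", "Obama", "McCain", "Obama"))

levels(region)               # the categories of the factor
tab <- table(region, winner) # counts for each region/winner combination
tab

west_mccain <- tab["W", "McCain"]  # one cell: region W, winner McCain
west_mccain
```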
[ , ]
General format:
data[row, ]
data[ , column]
data[row, column]
Ask yourself:
States.RData?
data[row, ]
View(States) # look for the 1st row
States[1, ]
States["Alabama", ]
data[ , column]
View(States) # look for the 2nd column
States[ , 2]
States[ , "HouseholdIncome"]
data[row, column]
View(States) # look for the 1st row and 2nd column
States[1, 2]
States["Alabama", "HouseholdIncome"]
Use the c() function, which combines (or concatenates) a set of elements
States[c(1, 5), ]
States[c("Alabama", "California"), ]
States[c(1, 5), c(3, 9)]
States[c("Alabama", "California"), c("McCainVote","College")]
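The indexing patterns above can be checked on a small made-up data frame (the object `toy`, its row names, and its values are hypothetical). Positions and names are interchangeable as long as the data set has row and column names.

```r
# Hypothetical toy data with row names
toy <- data.frame(
  Income  = c(40, 55, 48),
  College = c(22, 30, 27),
  row.names = c("StateA", "StateB", "StateC")
)

first_row  <- toy[1, ]                   # one row, all columns
second_col <- toy[ , 2]                  # one column, returned as a plain vector
one_cell   <- toy["StateB", "College"]   # one cell, selected by names
subset_df  <- toy[c(1, 3), c("Income", "College")]  # several rows and columns
```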
What percentage voted for Obama in Mississippi compared to Massachusetts?
R Code Hint:
States[c("Mississippi", "Massachusetts"), c("ObamaVote")]
StatesHealth.dta
.dta extension)
foreign R package
tm package) to data visualization (ggplot2 package) to data wrangling (dplyr package)
Packages tab and click Install
install.packages("package_name")
package_name is just the name of the R package in quotes
install.packages("foreign")
library(foreign)
getwd()
setwd()
dir()
read.dta()
getwd()
setwd("C:/Folder/")
dir()
StatesHealth <- read.dta("StatesHealth.dta")
"C:/Folder/" should be changed to the location of the Stata data set on your computer
View(StatesHealth)
head(StatesHealth)
tail(StatesHealth)
States.RData?
To save a data set as an .RData file, we can use save()
dir()
save(StatesHealth, file="StatesHealth.RData")
dir()
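A quick way to convince yourself that save() and load() are mirror images is a round trip through a temporary file, sketched below. The object `toy` and the file name `toy.RData` are hypothetical, and tempdir() is used so that nothing is written to your working directory.

```r
# Hypothetical toy data
toy <- data.frame(Income = c(40, 55, 48))

path <- file.path(tempdir(), "toy.RData")  # a throwaway file location
save(toy, file = path)                     # write the object to disk
rm(toy)                                    # remove it from the workspace

loaded_names <- load(path)  # restores the object; returns the names it loaded
loaded_names
toy                         # the data set is back under its original name
```

Unlike read.dta(), load() does not need an assignment with <- because the object comes back with the name it was saved under.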
save.image()
save.image("Everything.RData")
.RData extension
But we can save the States data set as a Stata data set
Try this R Code:
write.dta(States, file="States.dta")
dir()
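The same kind of round trip works for Stata files, sketched here under the assumption that the foreign package is installed (it ships with most R installations). The object `toy` and the file name `toy.dta` are hypothetical.

```r
library(foreign)  # provides write.dta() and read.dta()

# Hypothetical toy data
toy <- data.frame(income = c(40, 55, 48), college = c(22, 30, 27))

path <- file.path(tempdir(), "toy.dta")  # a throwaway file location
write.dta(toy, file = path)              # export as a Stata data set
back <- read.dta(path)                   # read it back into R

back
```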
Open States.dta with Stata!
Excel (.xlsx): xlsx package, read.xlsx()
SPSS (.sav): foreign package, read.spss()
SAS transport (.xpt): foreign package, read.xport()
Did “healthier” states vote for McCain or Obama in 2008?
R Code Hint:
plot(StatesHealth$Obese, StatesHealth$ObamaVote)
text(StatesHealth$Obese, StatesHealth$ObamaVote,
labels=StatesHealth$State)
URL: https://compass-workshops.github.io/info/
Email List: Send an email to listserv@lists.princeton.edu with “Subscribe COMPASSWORKSHOPS” in the body and all other lines blank, including the subject
Please fill out this survey so we know how we can improve the workshop
MyFirstMarkdown.Rmd
The initial chunk of text contains instructions for R
title: "My First Markdown"
author: "Ethan Fosse (COMPASS Workshops)"
date: "September 20, 2016"
output: html_document
MyFirstMarkdown.Rmd:
# loading the R data set
# load("C:/Folder/States.RData")
# examining the data set
head(States)
nrow(States)
ncol(States)
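An R code chunk starts with 3 single back quotes followed by {r} and ends with 3 single back quotes
You make things bold using double asterisks, like this: **bold**, and you make things italic by using single asterisks, like this: *italics*.
mean(States$ObamaVote)
{r}
ObamaMcCain <- States$ObamaMcCain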
plot(ObamaMcCain)
# load("C:/Folder/States.RData")
Remove the # so that R will run this line of code
Change load("C:/Folder/States.RData") to reflect the correct location of States.RData on your computer
Make sure load() specifies the appropriate location