Why use R?
- R is free and open source, and can be installed in almost any environment
- Incredibly flexible
- Actively developed

- Amazing Graphics
- Reproducible Research
Yphtach Lelkes

library(apsrtable)  # provides apsrtable()
a <- lm(obama ~ age, data = egss)
b <- lm(obama ~ age + pid, data = egss)
apsrtable(a, b, caption = "OLS Predicting Obama Favorability")


You can work directly in the console.
But working with source files lets you save your work.
The History window shows the list of commands you have run.
Cmd-Enter on a Mac (Ctrl-Enter on Windows or Linux) runs the current line.
Try some simple math.
+,- #addition, subtraction
*,/ #multiplication, division
Normal rules of arithmetic apply.
2 + 2
## [1] 4
2 + 2 * 3
## [1] 8
(2 + 2) * 3
## [1] 12
log(10) # Natural logarithm with base e=2.7182
log10(5) # Common logarithm with base 10
5^2 # 5 raised to the second power
sqrt(16) # Square root
abs(3-7) # Absolute value
pi # 3.14
exp(2) # Exponential function
floor(15.9) # Rounds down
ceiling(15.1) # Rounds up
cos(.5) # Cosine Function
sin(.5) # Sine Function
tan(.5) # Tangent Function
acos(0.8775826) # Inverse Cosine
asin(0.4794255) # Inverse Sine
atan(0.5463025) # Inverse Tangent
Can integrate functions.
Create random samples from distributions (e.g., normal, binomial, Poisson). Useful for writing examples and testing new tools.
rnorm(10, 0, 1)
rbinom(5, 1, 0.5)
rpois(3, 2)
hist(rnorm(10, 0, 1))
hist(rnorm(1e+05, 0, 1))
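As a small sketch of both ideas, the snippet below uses base R's `integrate()` on the standard normal density and fixes the seed with `set.seed()` so the random draws can be repeated. The seed value 42, the integration limits, and the names `area`, `draws1`, and `draws2` are arbitrary choices for illustration, not part of the original slides.

```r
# Integrate the standard normal density between -1.96 and 1.96
area <- integrate(dnorm, lower = -1.96, upper = 1.96)
area$value                  # close to 0.95

# Fixing the seed makes random draws reproducible
set.seed(42)                # 42 is an arbitrary seed
draws1 <- rnorm(10, mean = 0, sd = 1)
set.seed(42)
draws2 <- rnorm(10, mean = 0, sd = 1)
identical(draws1, draws2)   # TRUE: same seed, same draws
```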
John Chambers on S, the precursor to R:
[W]e wanted users to be able to begin in an interactive environment, where they did not consciously think of themselves as programming. Then as their needs became clearer and their sophistication increased, they should be able to slide gradually into programming, when the language and system aspects would become more important.
Lots of things are objects: data sets, plots, variables.
Let's start with variables.
<- is the assignment operator: it stores the value on the right under the name on the left.
x <- 2
x
## [1] 2
<- is preferable to =
x * x
## [1] 4
y = x + x
y
## [1] 4
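One reason to prefer `<-` shows up inside function calls, where `=` names an argument instead of assigning. A minimal sketch, with `sd_a`, `sd_b`, and `vals` as hypothetical names chosen for illustration:

```r
# Inside a call, `=` only says which argument the value belongs to
sd_a <- sd(x = c(2, 4, 6))       # passes c(2, 4, 6) as the argument x of sd()

# `<-` inside a call assigns the value and then passes it on
sd_b <- sd(vals <- c(2, 4, 6))   # also creates a variable called vals
vals
sd_a == sd_b                     # both calls compute the same standard deviation
```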
o1 <- c(1, 2, 3, 4)
o2 <- c(4:1)
o1
o2
o1 * o2
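Arithmetic on vectors works element by element. The sketch below redefines o1 and o2 so it runs on its own; the multiplier 10 is an arbitrary illustration of how a single number is recycled across a vector.

```r
o1 <- c(1, 2, 3, 4)
o2 <- c(4:1)
o1 * o2    # 1*4, 2*3, 3*2, 4*1
## [1] 4 6 6 4
o1 * 10    # the single number 10 is reused for every element
## [1] 10 20 30 40
```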
weight <- c(77, 62, 54, 82, 72, 76, 75.34)
height <- c(1.75, 1.65, 1.56, 1.77, 1.82, 1.69, 1.72)
bmi <- weight/height^2
bmi
sum(weight)
length(weight)
sum(weight)/length(weight)
mean(weight)
data(cars)
cars
cars$speed
speedhist <- hist(cars$speed, breaks = 10)
plot(speedhist)
usefulfunction <- function(x) {
rep("ASCOR", x)
}
usefulfunction(5)
ls()
usefulfunction
rm(usefulfunction)
usefulfunction
ls.str() #list and describe
a <- c(1, 2, 3, 4)
b <- c(1, 3, 2, 5)
a > b
a == b
a != b
time1 <- sample(1:100, 25)
time2 <- sample(1:100, 25)
time1
time2
time1[time1 > 75]
time1[time1 > mean(time2)]
length(time1[time1 < time2])
time1[time1 < 10 | time1 > 90]
time1[time1 > 10 & time1 < 20]
time1[time1 >= time2] = NA
time1
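Because time1 and time2 are random, the results above differ on every run. The same indexing steps can be followed on a small fixed vector; `scores` and `big` are hypothetical names and values used only for this sketch.

```r
scores <- c(12, 95, 40, 78, 5)   # hypothetical data
big <- scores > 75               # logical vector: FALSE TRUE FALSE TRUE FALSE
which(big)                       # positions of the matches: 2 4
scores[big]                      # the matching values: 95 78
scores[big] <- NA                # replace the matches with missing values
mean(scores, na.rm = TRUE)       # mean of the remaining values: 19
```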
numericvector <- c(1:10)
is.numeric(numericvector)
charactervector <- c("ascor", "r", "political communication")
is.character(charactervector)
logicalvector <- c(TRUE, FALSE, T, F, T, T)
is.logical(logicalvector)
is.numeric(logicalvector)
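A logical vector is not numeric, but R converts TRUE to 1 and FALSE to 0 whenever arithmetic is applied to it. A short sketch using the same vector:

```r
logicalvector <- c(TRUE, FALSE, T, F, T, T)
is.numeric(logicalvector)    # FALSE: the type is logical
as.numeric(logicalvector)    # 1 0 1 0 1 1
sum(logicalvector)           # counts the TRUE values: 4
mean(logicalvector)          # share of TRUE values
```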
factorvector <- as.factor(sample(c("High Knowledge", "Low Knowledge"), 10, replace = TRUE))
factorvector
levels(factorvector)
dv <- sample(1:10, 10)
lm(dv ~ factorvector)
factorvectornew <- relevel(factorvector, ref = "Low Knowledge")
levels(factorvectornew)
lm(dv ~ factorvectornew)
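Factors are stored as integer codes plus a set of labels, which is why the order of the levels decides the reference category in lm(). It also means that converting a factor straight to numeric returns the codes, not the labels. A minimal sketch with a hypothetical factor `f`:

```r
f <- factor(c("10", "20", "10", "30"))  # hypothetical factor with number-like labels
levels(f)                     # "10" "20" "30"
as.numeric(f)                 # the level codes: 1 2 1 3
as.numeric(as.character(f))   # the labels as numbers: 10 20 10 30
```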
day1 <- as.Date("10/2/2012", format = "%m/%d/%Y")
day1 + 10
day2 <- as.Date("10/2/2013", format = "%m/%d/%Y")
day2 - day1
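Date objects support a few more operations worth knowing. The sketch below repeats the two dates so it runs on its own; the output of `format()` and `weekdays()` depends on the system locale, so no output is shown for them.

```r
day1 <- as.Date("10/2/2012", format = "%m/%d/%Y")
day2 <- as.Date("10/2/2013", format = "%m/%d/%Y")
as.numeric(day2 - day1)                   # difference as a plain number of days: 365
difftime(day2, day1, units = "weeks")     # same difference in weeks
format(day1, "%d %B %Y")                  # print the date in another layout
weekdays(day1)                            # day of the week
seq(day1, by = "month", length.out = 3)   # three dates, one month apart
```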
LETTERS
# What is the 16th element of the vector LETTERS
LETTERS[16]
# What is the 1ST through 3RD and the 10th element of the vector LETTERS
LETTERS[c(1:3, 10)]
# How long is the vector LETTERS
length(LETTERS)
A matrix is a combination of vectors of the same length and type.
## I have a vector of numbers 1:36, I want to put them in a square 6 x 6
## matrix.
mymat <- matrix(1:36, nrow = 6, ncol = 6)
rownames(mymat) <- LETTERS[1:6]
colnames(mymat) <- LETTERS[7:12]
mymat
X[n, k]: n indexes rows, k indexes columns.
What is the number in the 4th row, 2nd column of the matrix mymat?
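One way to check the answer is sketched below. The matrix is rebuilt so the block runs on its own; R fills matrices column by column, so the second column holds 7 through 12.

```r
mymat <- matrix(1:36, nrow = 6, ncol = 6)
rownames(mymat) <- LETTERS[1:6]
colnames(mymat) <- LETTERS[7:12]
mymat[4, 2]       # row 4, column 2, by position
## [1] 10
mymat["D", "H"]   # the same cell, by row and column name
## [1] 10
mymat[4, ]        # leave the column index empty to get the whole 4th row
mymat[, 2]        # leave the row index empty to get the whole 2nd column
```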
i <- array(c(1:10), dim = c(3, 3, 2))  # 18 cells: the 10 values are recycled to fill them
i[, , 1]
i[, , 2]
list(LETTERS, 1:10, 12)
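Unlike a vector or matrix, a list can hold elements of different types and lengths. A sketch of how its elements are pulled out, with `mylist` and the element names chosen here for illustration:

```r
mylist <- list(letters = LETTERS, numbers = 1:10, single = 12)  # names are illustrative
mylist[[2]]          # the second element itself
mylist$numbers       # the same element, by name
mylist[2]            # a list of length 1 that contains the element
class(mylist[2])     # "list"
class(mylist[[2]])   # "integer"
```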
A data frame is a combination of vectors of the same length, which can be of different types.
You'll most likely spend most of your time working with data frames.
data(ChickWeight)
## Look at the top N (default=6) rows of the dataframe
head(ChickWeight, 10)
## Look at the bottom 6 rows of the dataframe
tail(ChickWeight)
## Names of the variables in the dataframe
names(ChickWeight)
## There are several ways to call a variable in a dataframe.
ChickWeight$Time
ChickWeight[, 2]
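Two more forms select the same column by name, which is often safer than a position that can shift when columns are added. A quick check, assuming the built-in ChickWeight data, that all forms return the same values:

```r
data(ChickWeight)
names(ChickWeight)                                    # "weight" "Time" "Chick" "Diet"
identical(ChickWeight$Time, ChickWeight[, 2])         # TRUE
identical(ChickWeight$Time, ChickWeight[["Time"]])    # TRUE
identical(ChickWeight$Time, ChickWeight[, "Time"])    # TRUE
```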
Some descriptive statistics.
mean(data$variable)
mean(data[,N])
with(data, mean(variable))
median()
summary()
max()
min()
range()
var()
var(ChickWeight$weight)
var(ChickWeight[, 1])
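The templates above can be filled in with the ChickWeight data. A runnable sketch; `sd()` is not in the list above and is added here only to show that it equals the square root of `var()`.

```r
data(ChickWeight)
mean(ChickWeight$weight)
with(ChickWeight, median(weight))
range(ChickWeight$weight)          # smallest and largest value
summary(ChickWeight$weight)        # minimum, quartiles, mean, maximum
sd(ChickWeight$weight)             # same as sqrt(var(ChickWeight$weight))
```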
aggregate(measuredvariable ~ clusteringvariable, FUN = function, data = dataframe)
Let's say you want the mean for each level of a variable.
aggregate(weight ~ Diet, FUN = mean, ChickWeight)
##   Diet weight
## 1    1  102.6
## 2    2  122.6
## 3    3  142.9
## 4    4  135.3
In a bit we'll use the plyr package, which does this better.
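Base R offers another route to the same group means: `tapply()` returns a named vector and not a data frame. A minimal sketch, with `diet_means` as an illustrative name:

```r
data(ChickWeight)
# Apply mean() to weight separately within each level of Diet
diet_means <- tapply(ChickWeight$weight, ChickWeight$Diet, mean)
diet_means
names(diet_means)    # the Diet levels: "1" "2" "3" "4"
```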
table(data$x)
table(data$x,data$y)
table(data[,N])
table(data[,N1],data[,N2])
with(data, table(x))
prop.table(table(x))
prop.table(table(x,y),1) # Row Proportions
prop.table(table(x,y),2) # Column Proportions
data(ToothGrowth)
names(ToothGrowth)
tab <- table(ToothGrowth$dose, ToothGrowth$supp)
write.csv(tab, "toothtable.csv")
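The proportion templates can be tried on the same ToothGrowth table. A sketch, rebuilding `tab` so the block runs on its own; `addmargins()` is an extra base R helper not mentioned above.

```r
data(ToothGrowth)
tab <- table(ToothGrowth$dose, ToothGrowth$supp)
tab                      # 10 observations in every dose-by-supplement cell
prop.table(tab)          # share of all 60 observations
prop.table(tab, 1)       # row proportions: each row sums to 1
prop.table(tab, 2)       # column proportions: each column sums to 1
addmargins(tab)          # add row and column totals
```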
There are packages that print SPSS-, SAS-, and Stata-style tables.
install.packages("plyr")
library(plyr)
library(help = "plyr")
?ddply
ddply(ToothGrowth, .(supp, dose), summarise, mean = mean(len))
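For comparison, the same group means can be computed in base R with `aggregate()`, which is handy when plyr is not installed. This is a sketch of an equivalent summary, not a replacement for ddply; `agg` is an illustrative name.

```r
data(ToothGrowth)
# Mean tooth length for every combination of supplement and dose
agg <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
names(agg)[3] <- "mean"   # match the column name used in the ddply call
agg
```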
Let's put a dataset into the workspace: the ANES 2006 data (see also the ANES 2006 codebook).
install.packages("foreign")
library(foreign)
## Type read. and hit the tab key to see the types of data that can be
## read into R using the foreign package
nes06 <- read.spss("NESPIL06.por", to.data.frame = T)
t.test
chisq.test or summary(table())
wilcox.test
Are men significantly more conservative than women?
Gender: V06P005 Political Interest: V06P630
levels(nes06$V06P630)
Hint: Political Interest must be made numeric. Have to remove levels 8 and 9.
nes06$interest <- as.numeric(nes06$V06P630)
nes06$interest[nes06$interest > 5] <- NA
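Before running the tests on nes06, their call patterns can be tried on a built-in dataset. The sketch below uses ToothGrowth, which is not part of this exercise; `exact = FALSE` is set only to avoid a warning about tied values.

```r
data(ToothGrowth)
# Two-sample t-test: does mean tooth length differ by supplement?
tt <- t.test(len ~ supp, data = ToothGrowth)
tt$p.value
# Rank-based alternative to the t-test
wt <- wilcox.test(len ~ supp, data = ToothGrowth, exact = FALSE)
wt$p.value
# Chi-squared test of independence on a two-way table
chisq.test(table(ToothGrowth$supp, ToothGrowth$dose))
```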
lmmodel <- lm(y~iv1+iv2,data)
anova(lmmodel)
What is the effect of gender on political interest, controlling for age (V06P006)?
Want to see significance?
summary(lmmodel)
or
install.packages("arm")
library(arm)
display(lmmodel,detail=T)
plot(lmmodel)
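The same workflow can be run end to end on a built-in dataset. A sketch using ChickWeight, with `fit` as an illustrative name; `coef(summary())` and `confint()` are base R extras not mentioned above.

```r
data(ChickWeight)
fit <- lm(weight ~ Time + Diet, data = ChickWeight)
anova(fit)             # sequential F-tests for each term
coef(summary(fit))     # estimates, standard errors, t values, p values
confint(fit)           # 95% confidence intervals for the coefficients
```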
lmmodel <- lm(interest~age*gender)
library(effects)
effect("age*gender", lmmodel)
plot(effect("age*gender", lmmodel))
?glm #Generalized Linear Models
e.g., glm(formula, family = )
*family can be: binomial, Gamma, gaussian, inverse.gaussian, poisson, quasi, quasibinomial, quasipoisson.
*link can be: "logit", "probit", "cauchit", "cloglog", "identity", "log", "sqrt", "1/mu^2", "inverse"
polr() in the MASS package for ordered logit or probit
multinom() in the nnet package for multinomial logit
Lots of other stuff out there for count models, censored outcomes, survival models, hazard models, time series
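As a concrete instance of the family and link arguments, the sketch below fits a logit and a probit model to the built-in mtcars data. The dataset and the names `logit_fit` and `probit_fit` are choices made for this illustration.

```r
data(mtcars)
# am is 0/1 (automatic/manual), so a binomial family fits
logit_fit <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)
summary(logit_fit)
# Same model with a probit link
probit_fit <- glm(am ~ wt, family = binomial(link = "probit"), data = mtcars)
# Predicted probabilities on the response scale
head(predict(logit_fit, type = "response"))
```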
schwartz <- nes06[,154:163]
levels(schwartz[,1])
schwartz <- data.matrix(schwartz)
schwartz[schwartz>5]=NA
#OR
nes06[,154:163] <- data.matrix(nes06[,154:163])
nes06[,154:163][nes06[,154:163]>5]=NA
##try
factanal(schwartz)
out <- factanal(na.omit(schwartz),factors=4,scores="regression")
names(out)
out$scores[,1]
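For readers without the nes06 file, `factanal()` can also be tried on the `ability.cov` covariance matrix that ships with R. This is a sketch on different data, with `fa` as an illustrative name. Factor scores are not available here, because scores need the raw data and not only a covariance matrix.

```r
# ability.cov: covariance matrix of six ability tests (built-in dataset)
fa <- factanal(factors = 2, covmat = ability.cov)
fa$loadings        # loadings of the six tests on the two factors
fa$uniquenesses    # variance of each test not explained by the factors
```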
Three popular choices.

