Robert Norberg
Wednesday, Sep 02, 2015

Pros
Cons

An Integrated Development Environment (IDE) for R

model <- lm(mpg~hp, data=mtcars)
summary(model)
Call:
lm(formula = mpg ~ hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-5.7121 -2.1122 -0.8854  1.5819  8.2360

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
hp          -0.06823    0.01012  -6.742 1.79e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.863 on 30 degrees of freedom
Multiple R-squared:  0.6024,  Adjusted R-squared:  0.5892
F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07
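The fitted object can also be queried directly instead of reading numbers off the printed summary. A minimal sketch using base R's `coef()`, `predict()` and `summary()`; the 150 hp value is an arbitrary illustration, not something from the talk.

```r
# Refit the same model so this block runs on its own
model <- lm(mpg ~ hp, data = mtcars)

# Named vector of estimates: (Intercept) and hp
coefs <- coef(model)
coefs

# Predicted mpg for a hypothetical car with 150 horsepower
pred <- predict(model, newdata = data.frame(hp = 150))
pred

# R-squared as a plain number, matching the "Multiple R-squared" line
r2 <- summary(model)$r.squared
r2
```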
library(ggplot2)
ggplot(mtcars, aes(x=hp, y=mpg, color=gear))+
  facet_grid(.~cyl, labeller=label_both)+
  geom_point(size=8)+
  theme_bw(base_size=30)
Take a quick look at my blog for an example.
Making slideshows is very similar to making dynamic reports.
library(RCurl)
library(data.table)
myfile <- getURL("http://statistics.cos.ucf.edu/mjohnson/wp-content/uploads/2013/08/CH06PR09.txt", ssl.verifyhost=FALSE, ssl.verifypeer=FALSE)
mydat <- fread(myfile)
summary(mydat)
       V1             V2               V3              V4
 Min.   :3998   Min.   :211944   Min.   :4.610   Min.   :0.0000
 1st Qu.:4193   1st Qu.:268759   1st Qu.:6.805   1st Qu.:0.0000
 Median :4316   Median :291271   Median :7.325   Median :0.0000
 Mean   :4363   Mean   :302693   Mean   :7.371   Mean   :0.1154
 3rd Qu.:4472   3rd Qu.:321906   3rd Qu.:7.938   3rd Qu.:0.0000
 Max.   :5045   Max.   :472476   Max.   :9.650   Max.   :1.0000
The Shiny package allows easy app creation.
Browse the CRAN Task Views to find out which R packages are available for your next project.
We can create an object, x, via assignment.
x <- 5
Then, when we ask for the value of x:
x
[1] 5
Logical operators return Boolean values.
x > 3
[1] TRUE
Conditional logic - a fancy term for if/else
if(x > 3){print('Yusssss')}
[1] "Yusssss"
x is a scalar. Objects can be lots of things, including vectors.
y <- c(1:10) # `c` is for concatenate
y
[1] 1 2 3 4 5 6 7 8 9 10
R is particularly adept at operating on vectors.
y > 5 # operates on each element of the vector
[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
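Because `TRUE` counts as 1 and `FALSE` as 0 in arithmetic, a logical vector like the one above can be summed or averaged. A small sketch of that idiom (the variable names are made up for illustration):

```r
y <- 1:10

above <- y > 5             # logical vector, one element per entry of y
n_above <- sum(above)      # number of TRUE values
prop_above <- mean(above)  # proportion of TRUE values

n_above
prop_above
```

The same trick turns a vector of win/lose outcomes into a win rate.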
Indexing allows us to retrieve individual elements from a vector.
y[1] # returns the first element of y
[1] 1
y[y > 5] # returns all elements of y where y > 5
[1] 6 7 8 9 10
Akin to macros in SAS, functions take input and return output.
sum(y) # sum() is a function
[1] 55
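Functions can also be written by hand. A minimal sketch; the name `square_sum` is hypothetical and chosen only for illustration.

```r
# A hypothetical function: sum of the squared elements of a numeric vector
square_sum <- function(v) {
  sum(v^2)  # the last evaluated expression is the return value
}

y <- 1:10
total <- square_sum(y)
total
```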
There is a function for just about everything in R. The sample() function randomly samples from a vector.
sample(y, 2) # draw two elements from `y` (without replacement)
[1] 5 1
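A draw like this changes on every run. Here is a sketch of two common variations: `set.seed()` makes a draw repeatable, and `replace = TRUE` samples with replacement. The seed value 42 is arbitrary.

```r
y <- 1:10

set.seed(42)              # arbitrary seed, fixes the random number stream
first <- sample(y, 2)

set.seed(42)              # same seed again
second <- sample(y, 2)

identical(first, second)  # the two draws match

# With replacement, the same element may be drawn more than once
rolls <- sample(y, 20, replace = TRUE)
length(rolls)
```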
for(i in 1:5){
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
The Monty Hall Problem
Allow Kevin Spacey to explain.
(Famously solved by Marilyn vos Savant in her Parade magazine column)
"There are only two hard things in Computer Science: cache invalidation and naming things."
- Phil Karlton
doors <- c(1:3)
correctDoor <- sample(doors, 1)
guess1 <- sample(doors, 1)
Let's peek:
correctDoor; guess1
[1] 1
[1] 2
“Stubborn”
You make a first guess at random and no matter what the host does, you stick to your guns. You're sure he's only trying to bait you!
“Switch”
You pick a door at random, then the host reveals a door, eliminating a wrong answer. Then you switch from your original guess to the remaining door and hope for the best!
(Recall: correct door = 1, first guess = 2)
if(guess1==correctDoor){
  result <- "Winner!"
}else{
  result <- "Sorry :("
}
result
[1] "Sorry :("
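A single game says little about a strategy. As a rough sketch, the "Stubborn" strategy can be repeated many times in a vectorized way. The variable names and the count of 10,000 games are assumptions made for illustration.

```r
set.seed(1)       # arbitrary seed so the run is repeatable
n_games <- 10000
doors <- 1:3

# One correct door and one first guess per game, drawn independently
correct <- sample(doors, n_games, replace = TRUE)
guess   <- sample(doors, n_games, replace = TRUE)

# A stubborn player wins exactly when the first guess is correct
stubborn_rate <- mean(guess == correct)
stubborn_rate     # expected to land near 1/3
```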
if(guess1!=correctDoor){
  doorToReveal <- doors[(doors!=correctDoor & doors!=guess1)]
}
(Recall: correct door = 1, first guess = 2)
If your first guess is correct, the host may reveal either of the two remaining doors.
if(guess1==correctDoor){
  canReveal <- doors[doors!=guess1]
  doorToReveal <- sample(canReveal, 1)
}
And the host reveals…
doorToReveal
[1] 3
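The host's two cases can be combined into a single helper. This is a sketch, not the talk's code: `reveal_door` is a hypothetical name. It picks by position with `sample.int()` because `sample(x, 1)` treats a single number `x` as the range `1:x`, which would matter whenever only one door can be revealed.

```r
# Hypothetical helper: which door does the host open?
reveal_door <- function(doors, correctDoor, guess1) {
  # The host never opens the prize door or the contestant's door
  canReveal <- doors[doors != correctDoor & doors != guess1]
  # Pick one candidate by position; safe even when length(canReveal) == 1
  canReveal[sample.int(length(canReveal), 1)]
}

doors <- 1:3
reveal_door(doors, correctDoor = 1, guess1 = 2)  # only door 3 qualifies
reveal_door(doors, correctDoor = 1, guess1 = 1)  # door 2 or door 3
```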
(Recall: correct door = 1, first guess = 2)
Finally, you switch from your first guess to the one remaining door that has not been revealed.
remaining <- doors[(doors!=guess1 & doors!=doorToReveal)]
guess2 <- remaining # the remaining door is itself the new guess
if(guess2==correctDoor){
  result <- "Winner!"
}else{
  result <- "Sorry :("
}
result
[1] "Winner!"
wins <- 0
for(i in 1:1e6){
  ... # insert "switch" strategy here
  if(result=="Winner!"){
    wins <- wins + 1
  }
}
wins/1e6
[1] 0.666485
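The loop above leaves the strategy body elided. One way it might be filled in, as a sketch only: the function name `play_switch` is hypothetical, and the run uses 10,000 games rather than 1e6 to keep it quick.

```r
# Hypothetical helper: play one three-door game with the "switch" strategy,
# returning TRUE on a win
play_switch <- function(doors = 1:3) {
  correctDoor <- sample(doors, 1)
  guess1 <- sample(doors, 1)
  # Host opens a door that is neither the prize nor the first guess
  canReveal <- doors[doors != correctDoor & doors != guess1]
  doorToReveal <- canReveal[sample.int(length(canReveal), 1)]
  # With three doors, exactly one door is neither guessed nor revealed
  guess2 <- doors[doors != guess1 & doors != doorToReveal]
  guess2 == correctDoor
}

set.seed(2015)  # arbitrary seed for repeatability
n_games <- 10000
wins <- 0
for (i in seq_len(n_games)) {
  if (play_switch()) {
    wins <- wins + 1
  }
}
switch_rate <- wins / n_games
switch_rate     # expected to land near 2/3
```

Returning a logical from the helper sidesteps comparing result strings, where a mismatch such as "Winner" versus "Winner!" would silently count zero wins.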
Does this prove our hypothesis? Are you convinced?
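One way to judge how convincing a simulated rate is: compute its standard error. A sketch using the normal approximation for a proportion, plugging in the win rate and game count reported above.

```r
p_hat <- 0.666485   # simulated win rate reported above
n <- 1e6            # number of simulated games

# Standard error of a proportion under the normal approximation
se <- sqrt(p_hat * (1 - p_hat) / n)

# Approximate 95% interval around the simulated rate
ci <- p_hat + c(-1, 1) * 1.96 * se
ci

# Is 2/3 inside the interval? Is 1/2?
inside <- c(two_thirds = 2/3 >= ci[1] & 2/3 <= ci[2],
            one_half   = 1/2 >= ci[1] & 1/2 <= ci[2])
inside
```

A simulation supports a hypothesis within sampling error; it does not prove it in the mathematical sense.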
What if there were 5 doors? (All the other rules remain the same)
Prove your answers via simulation and send me your code! First one to get it wins… something.