UCF Statistical Computing Club

Robert Norberg, Edgard Maboudou
February 14, 2014

What is R?

  • A programming language
  • Evolved from S, which came from Bell Labs
  • Free
  • Open Source
  • Object oriented

Pros/Cons of R

Pros

  • Free
  • Packages = Flexibility
  • Statistical language of choice for academics
  • Code means reproducibility
  • Incredible user community

Cons

  • Difficult to learn at first
  • Contributed packages are not always double- and triple-checked
  • Constantly evolving
  • No help line

What is RStudio?

An Integrated Development Environment (IDE) for R

What can R/RStudio do?

Who am I and what do I know about R/RStudio?

  • I am a graduate student (see Slide 1 - Pros - Free)
  • I do consulting work on the side for RTI International
  • I have a blog about R programming

Statistics

model <- lm(mpg~hp, data=mtcars)
summary(model)

Call:
lm(formula = mpg ~ hp, data = mtcars)

Residuals:
   Min     1Q Median     3Q    Max 
-5.712 -2.112 -0.885  1.582  8.236 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  30.0989     1.6339   18.42  < 2e-16 ***
hp           -0.0682     0.0101   -6.74  1.8e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.86 on 30 degrees of freedom
Multiple R-squared:  0.602, Adjusted R-squared:  0.589 
F-statistic: 45.5 on 1 and 30 DF,  p-value: 1.79e-07

Graphics

library(ggplot2)
ggplot(mtcars, aes(x=hp, y=mpg, color=gear))+
  facet_grid(.~cyl, labeller=label_both)+
  geom_point(size=8)+
  theme_bw(base_size=30)

[Figure: scatterplot of mpg against hp, colored by gear, one panel per value of cyl]

3D Graphics

library(lattice)
cloud(mpg~wt*qsec|factor(cyl), data=mtcars, layout=c(3,1,1), main="3D Scatterplot by Cylinders") 

[Figure: "3D Scatterplot by Cylinders", mpg against wt and qsec, one panel per value of cyl]

Dynamic Report Generation

  • Switch between plain text and code
  • Automatic citations
  • Compile the report to PDF, DOCX, HTML, or just about any other format
  • More than one person working on the report? Use GitHub for version control!

A quick look at my blog for an example.

Slide Shows

This slideshow was made in R!

Making slideshows is very similar to making dynamic reports.

Interface With the Web

library(RCurl)
library(data.table)
myfile <- getURL("http://statistics.cos.ucf.edu/mjohnson/wp-content/uploads/2013/08/CH06PR09.txt",
                 ssl.verifyhost=FALSE, ssl.verifypeer=FALSE)
mydat <- fread(myfile)
summary(mydat)
       V1             V2               V3             V4       
 Min.   :3998   Min.   :211944   Min.   :4.61   Min.   :0.000  
 1st Qu.:4193   1st Qu.:268759   1st Qu.:6.80   1st Qu.:0.000  
 Median :4316   Median :291271   Median :7.33   Median :0.000  
 Mean   :4363   Mean   :302693   Mean   :7.37   Mean   :0.115  
 3rd Qu.:4472   3rd Qu.:321906   3rd Qu.:7.94   3rd Qu.:0.000  
 Max.   :5045   Max.   :472476   Max.   :9.65   Max.   :1.000  

Apps

The Shiny package allows easy app creation.

Example

More

Browse the CRAN Task Views to see which R packages are available for your next project.

An Example of Problem Solving in R

The Monty Hall Problem

Allow Kevin Spacey to explain.

Verify by Simulation

doors <- 1:3
# the prize is placed behind a random door
correctDoor <- sample(doors, 1)
# your first guess is also random
guess1 <- sample(doors, 1)

Two Strategies

“Stubborn”

You make a first guess at random and no matter what the host does, you stick to your guns. You're sure he's only trying to bait you!

“Switch”

You pick a door at random, then the host reveals a door, eliminating a wrong answer. Then you switch from your original guess to the remaining door and hope for the best!
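
Before simulating, the two strategies can be compared exactly by listing every equally likely combination of prize door and first guess. This is only a minimal sketch: the names cases, stubbornWins, and switchWins are illustrative and not part of the original slides. It relies on the fact that, with three doors, switching wins exactly when the first guess was wrong.

```r
# Enumerate all 9 equally likely (prize door, first guess) pairs.
doors <- 1:3
cases <- expand.grid(correctDoor = doors, guess1 = doors)

# "Stubborn" wins only when the first guess was already right.
stubbornWins <- cases$guess1 == cases$correctDoor

# With 3 doors the host removes the only other wrong door,
# so "Switch" wins exactly when the first guess was wrong.
switchWins <- !stubbornWins

pStubborn <- mean(stubbornWins)  # 3 of the 9 cases
pSwitch <- mean(switchWins)      # 6 of the 9 cases
```

Under these assumptions the stubborn player wins one game in three and the switcher wins two in three.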

"Stubborn" Strategy

# you never change your mind, so only the first guess matters
if(guess1==correctDoor){
  result <- 'Winner!'
}else{
  result <- 'Sorry :('
}

"Switch" Strategy

  • The host must reveal what is behind one door after your first guess.
  • He cannot reveal the prize.
  • If your first guess is wrong, he must reveal the only remaining incorrect door.

if(guess1!=correctDoor){
  # keep only the door that is neither the prize nor your guess
  doorToReveal <- doors[doors!=correctDoor & doors!=guess1]
}

"Switch" Strategy

  • If your first guess is correct, the host may reveal either of the two remaining doors.

if(guess1==correctDoor){
  # pick one of the two doors you did not choose
  doorToReveal <- sample(doors[doors!=guess1], 1)
}

"Switch" Strategy

  • Finally, you switch from your first guess to whichever remaining door has not been revealed.

# the door that is neither revealed nor your first guess
guess2 <- doors[doors!=doorToReveal & doors!=guess1]
if(guess2==correctDoor){
  result <- 'Winner!'
}else{
  result <- 'Sorry :('
}
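
To actually verify by simulation, the single-game steps above can be wrapped in a function and repeated many times. The sketch below is one way to do that, not necessarily how the original talk did it: playGame, nGames, and the seed are hypothetical choices made for illustration.

```r
# Play one game; returns TRUE if the chosen strategy wins.
playGame <- function(strategy = c("stubborn", "switch")) {
  strategy <- match.arg(strategy)
  doors <- 1:3
  correctDoor <- sample(doors, 1)
  guess1 <- sample(doors, 1)
  if (strategy == "stubborn") {
    return(guess1 == correctDoor)
  }
  # Doors the host may open: not the prize and not your guess.
  canReveal <- doors[doors != correctDoor & doors != guess1]
  # Sample a position rather than a value, because sample(x, 1)
  # treats a single number x as the range 1:x.
  doorToReveal <- canReveal[sample(length(canReveal), 1)]
  guess2 <- doors[doors != doorToReveal & doors != guess1]
  guess2 == correctDoor
}

set.seed(2014)   # arbitrary seed, only so the run is repeatable
nGames <- 10000  # assumed number of simulated games
stubbornRate <- mean(replicate(nGames, playGame("stubborn")))
switchRate <- mean(replicate(nGames, playGame("switch")))
c(stubborn = stubbornRate, switch = switchRate)
```

With this many games the two proportions should land close to 1/3 and 2/3 respectively.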

Do It Yourself!

I created a web app that generalizes this logic to n doors (up to 15), where the host may reveal any number of doors (up to n-2). See how the odds play out.
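
The generalized odds can also be worked out directly. The sketch below is not the app's code: montyOdds is a hypothetical helper, and it assumes the host opens k losing doors (never your pick) and that a switcher then chooses uniformly at random among the other doors still closed.

```r
# Hypothetical helper: exact win probabilities with n doors when the host
# opens k losing doors (0 <= k <= n - 2).
montyOdds <- function(n, k) {
  stopifnot(n >= 3, k >= 0, k <= n - 2)
  c(
    stubborn = 1 / n,
    # The first guess is wrong with probability (n - 1) / n; the prize is
    # then behind one of the n - k - 1 closed doors you can switch to.
    switch = ((n - 1) / n) * (1 / (n - k - 1))
  )
}

montyOdds(3, 1)    # the classic three-door game
montyOdds(15, 13)  # host opens every door except yours and one other
montyOdds(10, 0)   # host opens nothing, so switching gains nothing
```

Under these assumptions, switching never does worse than staying, and the advantage grows with each door the host opens.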

I'm sold! How do I learn to use R like a pro?

  • Take a class with Dr. Maboudou
  • Take a class on Coursera
  • Buy a book on statistical analysis using R
  • Hang out with me!

Sources

R Core Team (2013). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Dowle M, Short T and Lianoglou S, with contributions from Srinivasan A and Saporta R (2013). data.table: Extension of data.frame for fast indexing, fast ordered joins, fast assignment, fast grouping and list columns. R package version 1.8.10.

Andrew A, Zvoleff A, Diggs B, Pereira C, Wickham H, Jeon H, Arnold J, Stephens J, Hester J, Cheng J, Keane J, Allaire J, Toloe J, Takahashi K, Kuhlmann M, Caballero N, Salkowski N, Ross N, Vaidyanathan R, Cotton R, Francois R, Brouwer S, Bernard Sd, Wei T, Lamadon T, Torsney-Weir T, Davis T, Zhu W, Wu W and Xie Y (2013). knitr: A general-purpose package for dynamic report generation in R. R package version 1.5.

Xie Y (2013). Dynamic Documents with R and knitr. Chapman and Hall/CRC. ISBN 978-1482203530.

Xie Y (2013). “knitr: A Comprehensive Tool for Reproducible Research in R.” In Stodden V, Leisch F and Peng RD (eds.), Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595.

Boettiger C (2014). knitcitations: Citations for knitr markdown files. R package version 0.5-0.

Francois R (2013). bibtex: bibtex parser. R package version 0.3-6.

Wickham H (2009). ggplot2: elegant graphics for data analysis. Springer New York. ISBN 978-0-387-98140-6.

Lang DT (2013). RCurl: General network (HTTP/FTP/…) client interface for R. R package version 1.95-4.1.

Sarkar D (2008). Lattice: Multivariate Data Visualization with R. Springer, New York. ISBN 978-0-387-75968-5.

RStudio, Inc. (2013). shiny: Web Application Framework for R. R package version 0.8.0.