RonR! First seminar

Introduction to R and its strengths

Jorge Cimentada
5th of March of 2016

What is R?

  • R is a language and environment for statistical computing and graphics

  • An effective data handling and storage facility;

  • A large, coherent, integrated collection of intermediate tools for data analysis;

  • Graphical facilities for data analysis and display either on-screen or on hardcopy;

  • A simple and effective programming language, which includes conditionals, loops, user-defined recursive functions and input and output facilities.
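
The language features in the last bullet can be sketched in a few lines. This is a minimal illustration only: the function name `factorial_rec` and the temporary file are made up for the example and are not part of the seminar material.

```r
# Conditional inside a user-defined recursive function
# (factorial_rec is a hypothetical name chosen for this sketch)
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# Loop: collect the factorials of 1 to 5
results <- numeric(5)
for (i in 1:5) {
  results[i] <- factorial_rec(i)
}

# Output: print to the console and write to a temporary CSV file
print(results)
out_file <- tempfile(fileext = ".csv")
write.csv(data.frame(n = 1:5, factorial = results), out_file, row.names = FALSE)

# Input: read the file back in
read_back <- read.csv(out_file)
print(read_back)
```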

Do you know what the GNU General Public License (GPL) is?

  • You have the freedom to run the program as you wish, for any purpose (freedom 0).

  • You have the freedom to study how the program works, and change it so it does your computing as you wish (freedom 1).

  • You have the freedom to redistribute copies so you can help your neighbor (freedom 2).

  • You have the freedom to distribute copies of your modified versions to others (freedom 3).

How does R work?

Getting started

  • Go to www.r-project.org
  • Downloads: CRAN
  • Set your Mirror: Anywhere in Spain is fine.
  • Select 'Download R for Windows'.
  • Select base.
  • Select R 3.2.3 for Windows
    • The other options are for developers who wish to change the source code.
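
Once R is installed, a quick check in the console might look like the sketch below. The package list is an assumption: it only collects the packages loaded in the code on these slides, and the install line is left commented out because it needs an internet connection.

```r
# Confirm which version of R is running
print(R.version.string)

# Packages loaded in the code on these slides (assumed list)
needed <- c("ggplot2", "openintro", "ggvis", "plotly", "stargazer", "swirl")

# Which of them are not installed yet?
missing_pkgs <- setdiff(needed, rownames(installed.packages()))
print(missing_pkgs)

# Uncomment to install whatever is missing (requires an internet connection)
# install.packages(missing_pkgs)
```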

Let's go to R!

R interface

Downloading RStudio

How does RStudio look?

RStudio interface

What are R's strengths?

Adapted from 'R for Stata users' by Muenchen and Hilbe

  • Comprehensive repertoire of statistical techniques
  • R is directly accessible from many statistics packages like SAS, SPSS and Stata.
  • Every aspect of R is open for anyone to modify in any way they like.
  • R also offers the very flexible and powerful Grammar of Graphics approach. Developers have even gone so far as replacing R’s core graphical system.
  • R is free. This helps to attract developers and is a major reason that there are so many add-on packages for it.

R graphics - Teaching

Problem = Explanatory graphs for teaching

R graphics - Teaching

Problem = Showing overlapping variation using distributions

R graphics - Teaching

Problem = Showing overlapping variation using Venn diagrams

R graphics - Exploratory analysis

Problem = Nicely formatted barplot with minimum code

## Run this in your console
library(ggplot2)
ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar()

R graphics - Exploratory analysis

Problem = Visualizing a bivariate relationship with two more variables mapped to point size and color

## Run this in your console
library(ggplot2)
qplot(wt, mpg, data=mtcars, size=qsec, color=factor(carb))
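
Recent releases of ggplot2 mark qplot() as deprecated, so it may help to see the same plot written with ggplot(). The following is a sketch of an equivalent call, not the code used to build the original slide.

```r
library(ggplot2)

# Same mapping as the qplot() call: weight vs. miles per gallon,
# with point size taken from qsec and color from the number of carburetors
p <- ggplot(mtcars, aes(x = wt, y = mpg, size = qsec, colour = factor(carb))) +
  geom_point()
print(p)
```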

R graphics - Exploratory analysis

Problem = Visualizing a bivariate relationship by a categorical variable

## Run this in your console
library(ggplot2)
qplot(mpg, wt, data = mtcars, facets = cyl ~ ., geom = c("point", "smooth")) + 
    coord_flip()

R graphics - Exploratory analysis

Problem = Combine existing graphs in a nice format

## Do not run this. It's incomplete code.
multiplot(p1, p2, p3, p4, cols = 2)
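
The slide does not show where multiplot() or the plots p1 to p4 come from; multiplot() is not a ggplot2 function. As a runnable alternative, the sketch below swaps in grid.arrange() from the gridExtra package, and the four plots are hypothetical stand-ins built from mtcars.

```r
library(ggplot2)
library(gridExtra)

# Four illustrative plots (hypothetical stand-ins for p1 to p4)
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()
p3 <- ggplot(mtcars, aes(mpg)) + geom_histogram(bins = 10)
p4 <- ggplot(mtcars, aes(factor(gear), hp)) + geom_boxplot()

# Arrange them in a grid with two columns (two rows of two plots)
combined <- grid.arrange(p1, p2, p3, p4, ncol = 2)
```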

R graphics - Exploratory analysis

Problem = Visualizing Likert-type questions quickly

R graphics - Advanced

Problem = Visualizing statistical techniques

R graphics - Advanced

Problem = Plotting distributions next to scatterplots

R graphics - Advanced

Problem = Making graphical summaries

R graphics - Interactive graphs

Problem = Exploring your data visually in a fast way

## Run this in your console
library(openintro)
edaPlot(mtcars)

R graphics - Interactive graphs

Problem = Getting QUALITY visualizations with almost no work.

  • Let's go to the console to get to know ggvis

Advantages of ggvis over other graph packages:

  • HTML and JavaScript are what actually render the graphics behind ggvis.

For more graphs, go here

R graphics - Interactive graphs

Let's play around!

## Run this in your console
library(ggvis)
## Keep only ONE of the two 'span' lines: as written, the second overwrites the first
span <- waggle(0.2, 1)
span <- left_right(0.2, 1, step = 0.1)

mtcars %>%
    ggvis(~mpg, ~disp) %>%
    layer_lines() %>%
    layer_smooths(span = span)

  • The first 'span' definition adjusts the smoothness automatically.
  • The second 'span' definition lets you adjust the smoothness with the left and right arrow keys.

R graphics - Interactive graphs

Problem = Getting QUALITY visualizations exportable to websites

## Run this in your console
library(plotly)
set.seed(100)
d <- diamonds[sample(nrow(diamonds), 1000), ]
## plotly 4.0 and later expect column names to be given as formulas (~)
plot_ly(d, x = ~carat, y = ~price, text = ~paste("Clarity: ", clarity),
        type = "scatter", mode = "markers", color = ~carat, size = ~carat)

For more, check out plotly

R graphics - Maps

Crime density in Baltimore

R graphics - Maps

Problem = Showing relationships between data points

R statistics - Summary statistics output

You can get summary statistics in text or in HTML format

## Run this in your console
library(stargazer)
stargazer(mtcars[c("mpg","hp","drat")], type = "text",
 title="Descriptive statistics/selected variables", digits=1, out="table2.txt", flip=TRUE,
 covariate.labels=c("Miles/(US)gallon","Gross horsepower","Rear axle ratio"))
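
For the HTML format mentioned above, a sketch of the same table might look as follows. Writing to a temporary file is an assumption made so the example leaves nothing behind; in practice you would give `out` a file name of your own.

```r
library(stargazer)

# Same descriptive table, rendered as HTML and saved to a temporary file
html_file <- tempfile(fileext = ".html")
stargazer(mtcars[c("mpg", "hp", "drat")], type = "html",
          title = "Descriptive statistics/selected variables",
          digits = 1, out = html_file)

# Read the generated HTML back in to inspect it
html_lines <- readLines(html_file)
```

Open the resulting file in a browser, or paste the HTML into a web page, to see the formatted table.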

R statistics - Regression output

## Run this in your console
library(stargazer)
mtcars$fast <- as.numeric((mtcars$mpg > 20.1))

m1 <- lm(mpg ~ hp + drat + factor(gear), data=mtcars)
m2 <- glm(fast ~ hp + drat + am, family=binomial(link="logit"), data=mtcars)

stargazer(m1, m2, type = "text",
          dep.var.labels = c("Miles/(US) gallon", "Fast car (=1)"),
          covariate.labels = c("Gross horsepower", "Rear axle ratio",
                               "Four forward gears", "Five forward gears",
                               "Type of transmission (manual=1)"))

Where do you think this presentation was made?

R can also produce written reports, PDFs and PowerPoint presentations.

Check out the fantastic R Markdown here and here

Of course, it is not built for writing research papers; it is better suited to:

  • Blog posts;
  • Materials for training workshops;
  • Short consulting reports;
  • Exploratory analyses as part of a larger project;
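
The R Markdown workflow behind documents like these can be sketched from the console. The report title and contents below are invented for illustration, and the rendering step only runs if the rmarkdown package and pandoc are available on your machine.

```r
# Three backticks, built with strrep() so they can be written into the file
fence <- strrep("`", 3)

# Write a minimal R Markdown document to a temporary file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"A short report\"",
  "output: html_document",
  "---",
  "",
  "Average miles per gallon in the mtcars data:",
  "",
  paste0(fence, "{r}"),
  "mean(mtcars$mpg)",
  fence
), rmd_file)

# Render it to HTML if rmarkdown and pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  report <- rmarkdown::render(rmd_file, quiet = TRUE)
  print(report)  # path to the generated HTML file
}
```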

There are also PowerPoint-style presentations with slidify

Exercises for your students

With the swirl package you can program exercises and homework that students can complete anywhere.

Try it out yourself:

## Run this in your console
library("swirl")
swirl()

Thank you!