Introduction to R

Building the R Toolchain For Data Science

Jason Freels

05 October 2016

PRESENTATION OVERVIEW

In this presentation...

The R Toolchain for Data Science

First, Let's Define Some Terms

A data scientist is someone who knows more about computer science than the average statistician AND more about statistics than the average computer scientist.

And unlike proprietary tools, you can use this toolchain anywhere

Components of the R Toolchain

The R Project For Statistical Computing

The RStudio IDE

\(\LaTeX\) - A complete installation is required to build PDF documents

Rtools - A set of compilers for building R packages (Windows only)

Git/GitHub

"It's become so important...that if GitHub goes down, the software development world practically stops."

Why Should You Use This Toolchain (1)?

You can merge ALL of your content into a single file

\[ \begin{array}{lll} \hline
\text{Censoring type} & \text{Range} & \text{Likelihood} \\ \hline
d_{i} \text{ observations interval censored in } t_{i-1} \text{ and } t_{i} & t_{i-1}<T\le t_{i} & [F(t_{i})-F(t_{i-1})]^{d_{i}} \\
l_{i} \text{ observations left censored at } t_{i} & T\le t_{i} & [F(t_{i})]^{l_{i}} \\
r_{i} \text{ observations right censored at } t_{i} & T>t_{i} & [1-F(t_{i})]^{r_{i}} \\ \hline
\end{array} \]
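
The likelihood contributions in the table can be combined into a single log-likelihood. The following is a minimal sketch, assuming a Weibull model for \(F\); the function name `censored_loglik`, its argument names, and the counts and censoring times in the usage example are hypothetical and chosen only for illustration.

```r
# Log-likelihood for censored data, assuming F is a Weibull CDF.
# Each term is the log of one row of the likelihood table:
#   interval censored: d * log(F(t_upper) - F(t_lower))
#   left censored:     l * log(F(t_left))
#   right censored:    r * log(1 - F(t_right))
censored_loglik <- function(shape, scale,
                            t_lower, t_upper, d,  # interval-censored bounds and counts
                            t_left, l,            # left-censoring times and counts
                            t_right, r) {         # right-censoring times and counts
  cdf <- function(t) pweibull(t, shape = shape, scale = scale)
  sum(d * log(cdf(t_upper) - cdf(t_lower))) +
    sum(l * log(cdf(t_left))) +
    sum(r * log(1 - cdf(t_right)))
}

# Hypothetical data: 3 failures in (1, 1.5], 2 failures before 0.5,
# and 4 units still running at time 2
ll <- censored_loglik(shape = 1.7, scale = 1,
                      t_lower = 1, t_upper = 1.5, d = 3,
                      t_left = 0.5, l = 2,
                      t_right = 2, r = 4)
print(ll)
```

Because every argument is vectorized, several censoring intervals or times can be passed at once; in practice the function could be handed to an optimizer such as `optim()` to estimate `shape` and `scale`.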

\[ \begin{array}{rrrrrrrrrrrr} \hline & mpg & cyl & disp & hp & drat & wt & qsec & vs & am & gear & carb \\ \hline Mazda RX4 & 21.0 & 6 & 160 & 110 & 3.90 & 2.62 & 16.46 & 0 & 1 & 4 & 4 \\ Mazda RX4 Wag & 21.0 & 6 & 160 & 110 & 3.90 & 2.88 & 17.02 & 0 & 1 & 4 & 4 \\ Datsun 710 & 22.8 & 4 & 108 & 93 & 3.85 & 2.32 & 18.61 & 1 & 1 & 4 & 1 \\ Hornet 4 Drive & 21.4 & 6 & 258 & 110 & 3.08 & 3.21 & 19.44 & 1 & 0 & 3 & 1 \\ Hornet Sportabout & 18.7 & 8 & 360 & 175 & 3.15 & 3.44 & 17.02 & 0 & 0 & 3 & 2 \\ Valiant & 18.1 & 6 & 225 & 105 & 2.76 & 3.46 & 20.22 & 1 & 0 & 3 & 1 \\ \hline \end{array} \]

Figure 2.6 - Likelihood contributions for different kinds of censoring

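# Weibull density (shape = 1.7, scale = 1) evaluated on a grid of times
x <- seq(0, 2.4, by = 0.01)
y <- dweibull(x, shape = 1.7, scale = 1)
# Draw the density curve
plot(   x = x, 
        y = y, 
     type = 'l', 
      lwd = 1.25, 
     xlab = 't', 
     ylab = 'f(t)', 
      las = 1)
# Shade the left-censored region, T <= 0.5
polygon(  x = c(seq(0, 0.5, 0.01), 0.5),
          y = c(dweibull(seq(0, 0.5, 0.01), shape = 1.7, scale = 1), 0), 
        col = 1)
# Shade the interval-censored region, 1 < T <= 1.5
polygon(  x = c(1, seq(1, 1.5, 0.01), 1.5),
          y = c(0, dweibull(seq(1, 1.5, 0.01), shape = 1.7, scale = 1), 0), 
        col = 1)
# Shade the right-censored region, T > 2 (drawn up to the plot limit 2.4)
polygon(  x = c(2, seq(2, 2.4, 0.01), 2.4),
          y = c(0, dweibull(seq(2, 2.4, 0.01), shape = 1.7, scale = 1), 0), 
        col = 1)
# Label each shaded region
text(x = 0.16, y = 0.75, 'Left Censoring')
text(x = 1.3,  y = 0.65, 'Interval Censoring')
text(x = 2.2,  y = 0.15, 'Right Censoring')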

Why Should You Use This Toolchain (2)?

Compile one input file to many different output formats
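
One way to compile a single source file to several formats is `rmarkdown::render()`. The sketch below assumes the rmarkdown package and pandoc are installed; the file name `example.Rmd` and its contents are hypothetical. The `pdf_document` format additionally needs a \(\LaTeX\) installation, so any format that fails to build is skipped rather than stopping the loop.

```r
# Write a small, hypothetical R Markdown source file to a temporary directory
input <- file.path(tempdir(), "example.Rmd")
writeLines(c(
  "---",
  "title: \"One input, many outputs\"",
  "---",
  "",
  "The mean of `mpg` in `mtcars` is `r round(mean(mtcars$mpg), 2)`.",
  ""
), input)

# Output formats to build from the same input file
formats <- c("html_document", "word_document", "pdf_document")

# Only attempt to render when rmarkdown and pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  for (fmt in formats) {
    # render() returns the path of the file it produced;
    # a format that cannot be built yields NA here instead of an error
    out <- tryCatch(
      rmarkdown::render(input, output_format = fmt, quiet = TRUE),
      error = function(e) NA_character_
    )
    message(fmt, " -> ", out)
  }
}
```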

Why Should You Use This Toolchain (3)?

Add animated plots and interactive apps

Shiny

Shiny is a package for developing web applications in R

Illustrating complex ideas

Presenting content and coding skills at the same time

Replacing lots of static plots
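
As an illustration of replacing many static plots with one interactive one, here is a minimal sketch of a Shiny app, assuming the shiny package is installed. The input and output ids (`shape`, `density`) and the slider range are hypothetical choices; the plot reuses the Weibull density from the censoring figure.

```r
library(shiny)

# User interface: one slider controlling the Weibull shape parameter
ui <- fluidPage(
  sliderInput("shape", "Weibull shape",
              min = 0.5, max = 5, value = 1.7, step = 0.1),
  plotOutput("density")
)

# Server logic: redraw the density whenever the slider moves
server <- function(input, output) {
  output$density <- renderPlot({
    t <- seq(0, 2.4, by = 0.01)
    plot(t, dweibull(t, shape = input$shape, scale = 1),
         type = "l", lwd = 1.25, xlab = "t", ylab = "f(t)", las = 1)
  })
}

# Build the app object; launch it interactively with runApp(app)
app <- shinyApp(ui = ui, server = server)
```

A single slider here stands in for what would otherwise be a separate static plot for each shape value.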

HTML Widgets

HTML widgets are R packages built around JavaScript libraries

HTML widgets can be combined with Shiny

INSTALLING & BUILDING THE R/RSTUDIO TOOLCHAIN

Choose your own adventure

Still Confused?

But, I'm still having issues!


My toolchain IS set - Now what?