Introduction to R

Math 217
Spring 2015

Why R?

Commonly used, increasing in popularity (source)

Why R?

Associated with high salaries (source)

Why R?

  • Open source
  • Arguably has the best graphics engine of any statistical software package
  • Only does what you ask
  • Widely used in industry, academia, and government

RStudio

On your machine

RStudio

On the server

A few keyboard shortcuts

In editor

Description                  Windows/Linux   Mac
Attempt code completion      Tab             Tab
Run current line/selection   Ctrl + Enter    Command + Enter
Move cursor to console       Ctrl + 2        Ctrl + 2

In console

Description                  Windows/Linux   Mac
Attempt code completion      Tab             Tab
Move cursor to editor        Ctrl + 1        Ctrl + 1
Navigate command history     Up/Down         Up/Down
Popup command history        Ctrl + Up       Command + Up

Advice

  • Treat learning R like learning a new language: it takes a while to learn the

    • grammar
    • syntax
    • vocabulary
  • You don't need to memorize all of the commands; make your life easier by using

    • help files
    • cheat sheets
    • Google
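
The help files can be reached without leaving the console. A brief sketch using only functions that ship with R; the search pattern is just an illustration:

```r
# Open the help file for a function (two equivalent forms)
?sqrt
help("sqrt")

# Show a function's arguments without opening the full help page
args(read.table)

# List objects whose names match a pattern, handy when you
# only half-remember a command
read_funs <- apropos("^read\\.")
read_funs
```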

R as a calculator

The console can be used as a calculator

1+1
15.3 * 23.4 
5^5

R as a calculator

There are many built-in math functions available

pi
cos(pi)
exp(1)
abs(-6) 
factorial(4) 
sqrt(16) 

R as a calculator

You can store results as named variables for later use

product <- 15.3 * 23.4 # save result
product                # display the result
[1] 358.02

Now you can use the variable product

0.5 * product
[1] 179.01
log(product) # natural log
[1] 5.880589
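
Since log() is the natural log, other bases need a different call. A short sketch of the base R alternatives, reusing the value of product:

```r
product <- 15.3 * 23.4      # same value as before

log10(product)              # base-10 logarithm
log2(product)               # base-2 logarithm
log(product, base = 10)     # same as log10(product)
exp(log(product))           # exp() undoes the natural log
```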

Entering data by hand

If you have very few observations, you can enter each variable by hand as a vector

a <- c(3, 5, 9, 7, 8, 1)
a
[1] 3 5 9 7 8 1
b <- 1:6
b
[1] 1 2 3 4 5 6
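
Once values are stored in a vector, most functions and operators work on the whole vector at once. A small sketch using the two vectors just defined:

```r
a <- c(3, 5, 9, 7, 8, 1)
b <- 1:6

length(a)   # number of observations: 6
sum(a)      # 33
mean(a)     # 5.5
a + b       # element-wise sum: 4 7 12 11 13 7
a[3]        # third element: 9
a[a > 5]    # elements greater than 5: 9 7 8
```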

Entering data by hand

For plotting and modeling, we will want to work with data.frame objects, which store each variable as a column and each observation as a row

df <- data.frame(a = a, b = b)
df
  a b
1 3 1
2 5 2
3 9 3
4 7 4
5 8 5
6 1 6
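
A few things you can do with a data frame once it exists. This is a sketch only; the column name total is made up for illustration:

```r
a <- c(3, 5, 9, 7, 8, 1)
b <- 1:6
df <- data.frame(a = a, b = b)

nrow(df)                  # 6 rows (observations)
ncol(df)                  # 2 columns (variables)
names(df)                 # "a" "b"

df$total <- df$a + df$b   # add a new column computed from existing ones
big_a <- df[df$a > 5, ]   # keep only the rows where a exceeds 5
big_a
```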

Reading in data sets

  • If you already have a data set saved, then you can simply load the data set into R.
  • Example: to read in the tips.csv data set, run the command

      tips <- read.table(file = file.choose(), sep = ",", header = TRUE)
    
    • When the pop-up window appears, locate and select the tips.csv file
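
file.choose() asks for a click every time, so scripts often supply the file path directly. A minimal sketch, assuming a small placeholder file written to a temporary location so the example runs anywhere; the values are made up and are not the tips data:

```r
# Write a tiny placeholder CSV to a temporary file so the example is
# self-contained; these values are made up and are not the tips data
csv_path <- tempfile(fileext = ".csv")
writeLines(c("totbill,tip", "10.00,1.50", "20.00,3.00"), csv_path)

# Supply the path directly instead of using file.choose()
demo <- read.table(file = csv_path, sep = ",", header = TRUE)

# read.csv() is read.table() with sep = "," and header = TRUE as defaults
demo2 <- read.csv(csv_path)

all.equal(demo, demo2)   # the two calls should agree for this simple file
```

With your own data, replace csv_path with the location of the file on your machine or on the server.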

Reading in data sets

  • read.table is our workhorse function and can read many delimited text formats

  • For different file types you will need to specify different field separator characters:

    Separator    Description
    sep = " "    space separated (the default, sep = "", splits on any white space)
    sep = "\t"   tab separated
    sep = ","    comma separated files (.csv)
  • Use header = TRUE if there are column names
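
To see what sep and header do, here is a small sketch that feeds tab-separated text to read.table through its text argument. The two data rows are placeholders for illustration:

```r
# Tab-separated input supplied inline through the text argument;
# the two data rows are placeholders for illustration
tab_text <- "x\ty\n1\t2\n3\t4"
tab_df <- read.table(text = tab_text, sep = "\t", header = TRUE)
tab_df

# Without header = TRUE the first line is read as data here, and the
# columns get the default names V1 and V2
no_header <- read.table(text = tab_text, sep = "\t")
names(no_header)
nrow(no_header)
```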

Examining data sets

Try out the following commands

dim(tips)
str(tips)
head(tips)
summary(tips)
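
If tips is not loaded yet, the same commands can be tried on any data frame. A sketch using mtcars, a data set that ships with R and stands in for tips here:

```r
# mtcars ships with R; it stands in for tips here
dim(mtcars)       # rows and columns: 32 11
str(mtcars)       # type and first few values of every column
head(mtcars)      # first six rows
head(mtcars, 3)   # first three rows
summary(mtcars)   # numeric summary of every column
names(mtcars)     # column names only
```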

Extracting columns

We can access columns of the data set using a command of the following form

data$col.name

or by using the column number

data[, col.num]
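
A short sketch of both forms, again using the built-in mtcars data as a stand-in for your own data set:

```r
# mtcars ships with R and stands in for a data set such as tips
by_name   <- mtcars$mpg         # column selected by name
by_number <- mtcars[, 1]        # column selected by position (mpg is column 1)
by_string <- mtcars[["mpg"]]    # name given as a character string

identical(by_name, by_number)   # TRUE
mean(mtcars$mpg)                # extracted columns are ordinary vectors
```

Selecting by name is usually safer, since it keeps working if the column order changes.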

Tipping in Restaurants

Variable name   Description
totbill         total bill (in dollars)
tip             tip (in dollars)
sex             sex of the bill payer (M or F)
smoker          whether there were smokers in the party (Yes or No)
day             day of the week
time            time of day (Day or Night)
size            size of the party

Univariate graphics

Review

Variable type   Plot suggestions
Categorical     Bar chart
Quantitative    Histogram or boxplot

Graphics in R

  • We will use the ggplot2 package for graphics
  • If you are using your personal computer, you will need to install this package before you use it the first time

    install.packages("ggplot2")
    
  • You will need to load this package at the beginning of each R session:

    library(ggplot2)
    

Univariate graphics

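With only one variable, qplot guesses the plot type: a histogram for a quantitative variable, a bar chart for a categorical one

qplot(tip, data = tips)   # quantitative: histogram
qplot(day, data = tips)   # categorical: bar chart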

To obtain a boxplot use the following command

qplot(x = factor(0), y = tip, data = tips, geom = "boxplot") + 
  xlab("") +
  scale_x_discrete(breaks = NULL) + 
  coord_flip()
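
In ggplot2 3.4.0 and later, qplot() is deprecated and prints a warning, so it may help to know the ggplot() form of the same plots. A sketch, assuming ggplot2 is installed and using the built-in mtcars data as a stand-in for tips:

```r
library(ggplot2)

# mtcars ships with R and stands in for the tips data here
# Quantitative variable: histogram
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

# Categorical variable: bar chart
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# Single horizontal boxplot
p_box <- ggplot(mtcars, aes(x = factor(0), y = mpg)) +
  geom_boxplot() +
  xlab("") +
  scale_x_discrete(breaks = NULL) +
  coord_flip()

p_hist   # printing a plot object draws it
```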

Histograms

Change the bin width of histograms

qplot(tip, data = tips, binwidth = 1)
qplot(tip, data = tips, binwidth = 0.5)
qplot(tip, data = tips, binwidth = 0.01)

Univariate graphics

Adding other variables

You can use aesthetics or faceting to add additional variables to your plots.

qplot(tip, data = tips, binwidth = .5)
qplot(tip, data = tips, binwidth = .5, fill = sex)
qplot(tip, data = tips, binwidth = .5, fill = factor(size))
qplot(tip, data = tips, binwidth = .5) + 
  facet_wrap(~size)
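
The difference between an aesthetic and a facet is easier to see side by side. A sketch in ggplot() form, assuming ggplot2 is installed and using the built-in mtcars data in place of tips:

```r
library(ggplot2)

# mtcars ships with R and stands in for the tips data here
base_plot <- ggplot(mtcars, aes(x = mpg))

# Aesthetic: colour the bars by a second (categorical) variable
p_fill <- base_plot +
  geom_histogram(aes(fill = factor(cyl)), binwidth = 2)

# Faceting: one panel per level of the second variable
p_facet <- base_plot +
  geom_histogram(binwidth = 2) +
  facet_wrap(~cyl)

p_fill
p_facet
```

An aesthetic keeps everything in one panel, which makes totals easy to read; facets make it easier to compare the shape of each group.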

Bivariate graphics

Review

Variable type       Plot suggestions
2 quantitative      Scatterplot
1 quant. & 1 cat.   Side-by-side boxplots or faceted histogram
2 categorical       Mosaic plots or segmented bar charts

Scatterplots

qplot(x = totbill, y = tip, data = tips, geom = "point")

qplot(x = totbill, y = tip, data = tips, geom = "point") +
  geom_smooth(method = "lm")
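
method = "lm" overlays a least-squares line. A sketch of the same idea in ggplot() form, together with the fitted line itself, assuming ggplot2 is installed and using the built-in mtcars data in place of tips:

```r
library(ggplot2)

# mtcars ships with R and stands in for the tips data here
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm")   # least-squares line with a confidence band

p_scatter

# The same line can be fitted directly to read off its coefficients
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)                      # intercept and slope
```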

Side-by-side Boxplots

Let's look at tipping percentage now. First, we'll have to define the new variable

tips$tip.pct <- 100 * tips$tip / tips$totbill
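
The calculation is done element by element, one bill at a time. A tiny worked sketch with made-up numbers (not taken from the tips data):

```r
# Made-up bills and tips, for illustration only
totbill <- c(20, 50, 12.50)
tip     <- c(3, 10, 2.50)

tip_pct <- 100 * tip / totbill   # computed element by element
tip_pct                          # 15 20 20
```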

Now, let's look at pairs of variables

qplot(x = sex, y = tip.pct, geom = "boxplot", data = tips)

qplot(x = smoker, y = tip.pct, geom = "boxplot", data = tips)

qplot(x = interaction(sex, smoker), y = tip.pct, geom = "boxplot", data = tips)
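
The last plot groups the data by every combination of two categorical variables. A small sketch of how base R's interaction() builds such a combined grouping variable; the labels are made up for illustration:

```r
# Made-up labels, for illustration only
sex    <- c("F", "M", "F", "M")
smoker <- c("No", "No", "Yes", "Yes")

group <- interaction(sex, smoker)
as.character(group)   # "F.No" "M.No" "F.Yes" "M.Yes"
nlevels(group)        # 4 combinations
```

interaction() accepts character vectors as well as factors, which matters because read.table returns character columns by default in R 4.0.0 and later.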