0: Install / Update R and RStudio

Before you do anything else, make sure you’ve got the latest versions of R and RStudio installed!

0.5 Update Language

If you want to change the laguage in RStudio to English, run the following code:

Sys.setenv(LANG = "en")

1: Create and view data objects

Everything in R is an object with a name. Everything you do in R is a function. Let’s create a new object called my.data some experimental data. The data is stored as a text file at the web link https://dl.dropboxusercontent.com/u/7618380/priming.txt

priming <- read.table(file = "https://dl.dropboxusercontent.com/u/7618380/priming.txt",
                      sep = "\t",   # File is tab-delimited
                      header = T    # There is a header row
                      )

You can look at the data using the View() function applied to the priming object

# Show the priming dataset in a 'spreadsheet'
View(priming)

You can get summary information about the data using the summary() function

# Calculate summary statistics of each column
summary(priming)
       id         sex         age            prime         time       
 Min.   :  1.00   f:53   Min.   :14.00   elderly:50   Min.   : 7.200  
 1st Qu.: 25.75   m:47   1st Qu.:21.00   neutral:50   1st Qu.: 9.375  
 Median : 50.50          Median :22.00                Median :10.300  
 Mean   : 50.50          Mean   :22.21                Mean   :10.337  
 3rd Qu.: 75.25          3rd Qu.:23.00                3rd Qu.:11.225  
 Max.   :100.00          Max.   :27.00                Max.   :14.200  

2: Calculate summary statistics

You can calculate simple statistics using functions like mean(), sd(), and table()

# Calculate some basic summary statistics of specific columns

mean(priming$age)   # What was the mean age of the participants?
[1] 22.21
median(priming$time)  # What was the median time?
[1] 10.3
table(priming$sex) # How many people were there of each sex?

 f  m 
53 47 

3: Plotting

Plotting is super easy in R!

# Simple histogram of age
hist(x = priming$age)

You can customize the look of your plot with additional plotting arguments

# Age histogram with additional arguments

hist(x = priming$age,
     border = "white", 
     col = "salmon",
     main = "Age Distribution",
     xlab = "Age",
     yaxt = "n",
     ylab = ""
     )

Here’s a boxplot:

# Boxplot of times by priming condition

boxplot(formula = time ~ prime, 
        data = priming,
        col = c("skyblue1", "tomato"),
        ylab = "Walking Times",
        xlab = "Priming Condition",
        main = "Walking times by priming condition (Study 1)"
        )

Now a scatterplot!

# Scatterplot of age and time

plot(x = priming$age,
     y = priming$time,     
     pch = 16,             # point type
     col = gray(.5, .5),   # point color
     cex = 2,              # point sizes
     bty = "n",            # remove outer box
     xlab = "Age",
     ylab = "Time",
     main = "Age and Walking times\nStudy 1"
     )

# Add a regression line!

abline(lm(time ~ age, data = priming),
       col = "red", 
       lty = 2, 
       lwd = 2
       )

4: Inferential statistics

Let’s do a t-test

# Was there a signficant effect of prime on time?

t.test(formula = time ~ prime,
       data = priming)

    Welch Two Sample t-test

data:  time by prime
t = 4.5241, df = 97.236, p-value = 1.716e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.6028459 1.5451541
sample estimates:
mean in group elderly mean in group neutral 
               10.874                 9.800 

Regression? No problem!

# Which variables affected walking times?

summary(lm(time ~ prime + sex + id + age,
           data = priming
           ))

Call:
lm(formula = time ~ prime + sex + id + age, data = priming)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2206 -0.8448  0.1198  0.8926  3.1407 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)  12.1310136  1.2640597   9.597  1.2e-15 ***
primeneutral -1.0052648  0.4713175  -2.133   0.0355 *  
sexm         -0.3636308  0.2707522  -1.343   0.1825    
id            0.0003651  0.0082321   0.044   0.9647    
age          -0.0512793  0.0579237  -0.885   0.3782    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.177 on 95 degrees of freedom
Multiple R-squared:  0.2111,    Adjusted R-squared:  0.1779 
F-statistic: 6.355 on 4 and 95 DF,  p-value: 0.0001417

5: Installing external packages

One of the best things about R is that anyone can program new functions and share them using packages. Let’s install two packages:

# Install some external packages
install.packages("BayesFactor")  # BayesFactor package: For Bayesian analyses
install.packages("devtools")
devtools::install_github("ndphillips/yarrr")   # yarrr package: For R piratery

Let’s do a Bayesian t-test using the BayesFactor package

library(BayesFactor)

# Bayesian t-test comparing time between priming conditions
ttestBF(formula = time ~ prime, 
                     data = priming)
Bayes factor analysis
--------------
[1] Alt., r=0.707 : 1068.152 ±0%

Against denominator:
  Null, mu1-mu2 = 0 
---
Bayes factor type: BFindepSample, JZS

Now let’s plot the posterior distribution of group differences:

bf.test <- BayesFactor::ttestBF(formula = time ~ prime, 
                                data = priming)

samples <- BayesFactor::posterior(bf.test, iterations = 10000)

hist(x = samples[,2],
     xlim = c(-2, 2),
     xlab = "Difference in walking times between groups",
     yaxt = "n",
     ylab = "",
     main = "Posterior distribution of group differences"
     )

Now let’s load the yarrr package and make a pirateplot!

# Load the yarrr package
library(yarrr)

Here’s a ‘pirateplot’ from the yarrr package

# Pirateplot of time by priming condition
yarrr::pirateplot(formula = time ~ prime,
           data = priming,
           theme.o = 3,
           xlab = "Priming Condition",
           main = "Walking times by Priming Condition\nRectangles are 95% HDIs"
           )

Here’s another pirateplot of a dataset on chickwen weights:

yarrr::pirateplot(formula = weight ~ Time,
           data = ChickWeight,
           theme.o = 3,
           xlab = "Week",
           main = "Weights of chickens by week\nRectangles are 95% HDIs"
           )

Want to learn more?

Now wasn’t that easy? To learn more about R, check out YaRrr! The pirate’s guide to R: www.thepiratesguidetor.com