email: jc3181 AT columbia DOT edu

 

Introduction to R for Behavioral Scientists

This is a general outline/syllabus for the introductory R class that I’ll teach in the Spring 2015 semester. The class meets for 2 hours each week, Mondays 4pm-6pm, in my lab space (352 Schermerhorn). The class is open to any grad students and post-docs in the Psychology Department, plus undergraduates, research assistants and other guests who have contacted me beforehand.

 

What this class is…
The class is aimed at behavioral scientists (psychologists, epidemiologists, neuroscientists, ethologists, other-ists) who are used to handling data, but usually via Excel/SPSS. No prior knowledge of R is assumed and I hope that everyone who takes the class will become fully R-literate by the end of the semester (if not sooner).

The plan is to cover basic R programming, data analysis and visualization in the first half of the semester. After Spring Break, I will go into more detail about specific features of R that will lead to improved workflow/science/creativity/productivity.

 

What this class is not…

This class is not going to cover how to perform specific statistical analyses over and above the basic ones outlined below. There are other classes that cover more advanced statistical analyses (e.g. social network analyses, multi-level modeling, time-series analysis) - or you could read the vast resources that exist online or in books. However, this class will get you to the point where you will not feel intimidated by these online resources, but excited by them. You can always ask me about more advanced methods and I can hopefully answer your questions or tell you whom you should ask instead.

 

What you should bring to class…

 

Your instructor…

This is me (on a good day):

 

 

Schedule:

Here is the general topic of each class plus some general themes/skills that I hope you’ll take away from each class.

The way I teach this class is such that there is a general progression with classes building upon one another, but everyone should be able to follow each class even if they have missed one of the previous ones. This is another way of saying that I will repeat myself a lot.

I would also add that, as we are behavioral scientists, most of us already have a fair bit of experience with data and data analysis. Therefore, I don’t teach this class the way that a programmer would. I’m assuming that most of us already know what we want to do - we just want to know how to do it most efficiently in R. Hopefully, once we have learned these ways, we’ll start to see how R can help us do even more things that we previously didn’t think were possible!

The schedule below is my working hypothesis as to how we will progress - but I’m happy to speed it up, slow it down, drop bits, add other bits - let’s just see how it works out. I know many students like to see a class that progresses from theme to theme, so I am following that pattern. However, my major issue with following too linear a route is that it misses the point of using R - which is that the sum of all of these skills is much greater than the individual parts. Therefore I will always try to add little bits and pieces to each class to show how everything fits together. I may even spend 10-15 minutes right at the end of each class on a very quick worked example that puts together lots of different elements. If you don’t follow these bits, don’t worry - but hopefully they will add something extra for some people.

 

Week 1: 26th Jan 2015 - Installing / Questions
  • Installing R and RStudio
  • What do you guys want to get out of this class?

 

Week 2: 2nd Feb 2015 - General Overview
  • Navigating around RStudio
  • Working directories
  • Importing data to RStudio
  • Brief overview of data formats/types
  • What are functions? What are their inputs/outputs?
  • Exporting R objects and data
  • What’s an R package?

 

Week 3: 9th Feb 2015 - Data formats and datasets
  • Dataframes
  • Matrices
  • Lists
  • Vectors
  • Logical operators
  • Factors
  • Our first taste of the dplyr package!

 

Week 4: 16th Feb 2015 - Basic summary stats
  • How to generate basic summary stats
  • Writing simple custom functions
  • The apply function family
  • Grouping and aggregating datasets by factors
  • More dplyr!
  • Brief introduction to the data.table package
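
As a small taste of the Week 4 topics, here is a minimal sketch in base R only. The `scores` data frame and the `sem` function are made up for illustration; the dplyr and data.table versions that we will cover in class are not shown here.

```r
# Hypothetical toy data: reaction times (ms) for two groups
scores <- data.frame(
  group = factor(c("control", "control", "control", "drug", "drug", "drug")),
  rt    = c(510, 530, 520, 470, 450, 460)
)

# A simple custom function: standard error of the mean
sem <- function(x) {
  sd(x) / sqrt(length(x))
}

# Basic summary stat for one column
overall_mean <- mean(scores$rt)

# apply family: mean of rt within each level of the factor
group_means <- tapply(scores$rt, scores$group, mean)

# Same idea with aggregate(), which returns a data frame
group_sem <- aggregate(rt ~ group, data = scores, FUN = sem)

print(overall_mean)
print(group_means)
print(group_sem)
```

`tapply()` gives back a named vector, whereas `aggregate()` gives back a data frame, which is often easier to pass straight on to plotting.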

 

Week 5: 23rd Feb 2015 - Simple stats and basic plotting
  • Reshaping and restructuring data
  • Learning ggplot2 for plotting data
  • Boxplots - yes please - show those data points!
  • Histograms
  • T-tests, One-Way ANOVAs, Wilcoxon tests, Kruskal-Wallis tests
  • Conducting post-hoc tests
  • Testing for normality and equal variances
  • Correcting for unequal variances
  • Basic randomization techniques

 

Week 6: 2nd March 2015 - A little bit more stats and basic plotting
  • More ggplot2!
  • Scatterplots
  • Parametric and non-parametric correlation tests
  • Simple linear regression
  • A tiny bit on multiple regression

 

Week 7: 9th March 2015 - Writing Functions
  • Writing custom functions to improve data analysis and reproducibility
  • Specifying custom output of functions
  • Optional arguments
  • if else statements
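
To give a flavor of the Week 7 topics, here is a minimal sketch of a custom function with an optional argument, an if/else statement and a custom (list) output. The function name `describe` and its `robust` argument are hypothetical, chosen just for this example.

```r
# Hypothetical helper: summarize a numeric vector.
# `robust` is an optional argument; it defaults to FALSE.
describe <- function(x, robust = FALSE) {
  if (robust) {
    # Median and interquartile range are less sensitive to outliers
    centre <- median(x)
    spread <- IQR(x)
  } else {
    centre <- mean(x)
    spread <- sd(x)
  }
  # Custom output: return several values together as a named list
  list(n = length(x), centre = centre, spread = spread)
}

x <- c(1, 2, 3, 4, 100)
describe(x)                 # uses the default, robust = FALSE
describe(x, robust = TRUE)  # overrides the optional argument
```

Returning a named list means that we can pull out individual pieces later with `$`, e.g. `describe(x)$centre`.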

 

Week 8: 16th March 2015 - Writing (and trying to avoid writing) loops
  • Why and when to write a loop
  • Why and when to avoid writing a loop
  • for loops
  • while loops
  • Using repeat and replicate
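
The Week 8 ideas can be sketched roughly as follows, in base R only. The variable names and numbers are made up for illustration.

```r
values <- c(2, 4, 6, 8)

# A for loop: pre-allocate the result, then fill it in one element at a time
squares_loop <- numeric(length(values))
for (i in seq_along(values)) {
  squares_loop[i] <- values[i]^2
}

# Avoiding the loop: arithmetic in R is vectorized, so this does the same job
squares_vec <- values^2

# A while loop: keep doubling x until it is greater than 100
x <- 1
n_doublings <- 0
while (x <= 100) {
  x <- x * 2
  n_doublings <- n_doublings + 1
}

# replicate(): evaluate an expression many times,
# here the mean of 10 random normal draws, repeated 1000 times
set.seed(1)
sim_means <- replicate(1000, mean(rnorm(10)))
```

The loop and the vectorized version give identical results; the vectorized one is simply shorter and usually faster, which is the main reason to avoid a loop when we can.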

 

After Spring Break

I’m not going to set the schedule for after Spring Break just yet, but rather will suggest various things that we could have an individual class on. I love them all - but we’ll see what people would like to do most. It might also be worth starting with a class or two after Spring Break where we work on example datasets from someone’s lab to put together a lot of the skills that we will have covered in the first half of the semester.

 

Data tidying and cleaning

Whilst by no means a ‘sexy’ topic, I think this is super-critical to good data science. Most people appreciate that sound data analysis and visualization are important, but these can often be the things that take the least time in a study. The data that we collect or are given are often messy, unstructured and full of typos and other formatting errors. Making sure your data are clean, free of errors and in a standard, reproducible format can be painful. I’d love to talk about many of the options in R that will make this process almost fun - and will certainly save you hours and hours - if not days and weeks. Packages such as tidyr and dplyr, as well as the myriad base R and other package functions, can make data reformatting a much smoother process. I promise that you will never have to cut and paste (or type) in Excel again.
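
To make this concrete, here is a minimal sketch of the kind of cleaning I mean, using base R only. The `raw` data frame and its typos are invented for illustration; tidyr and dplyr offer their own ways of doing this that are not shown here.

```r
# Hypothetical messy data: inconsistent labels, stray spaces,
# and numbers stored as text with "n/a" for missing values
raw <- data.frame(
  id  = c("s01", "s02", "s03", "s04"),
  sex = c("Male", " female", "FEMALE ", "male"),
  rt  = c("512", "498", "n/a", "530"),
  stringsAsFactors = FALSE
)

clean <- raw

# Trim whitespace and standardize case, then store as a factor
clean$sex <- factor(tolower(trimws(clean$sex)))

# Recode the missing-value marker as NA, then convert text to numbers
clean$rt[clean$rt == "n/a"] <- NA
clean$rt <- as.numeric(clean$rt)

str(clean)
```

Because every step is written down in code, the same cleaning can be re-run on next month’s data without touching a spreadsheet.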

 

Interactive Data Visualization

Static graphs that we put in journals can be pretty but dull. Let’s learn how to make exciting interactive visualizations that make our data jump out of the internet and become much more informative. More and more options for this are becoming available - two packages that we shall look at are ggvis and rCharts.

 

RMarkdown

Something that should be emphasized more is reproducibility in our work. That doesn’t just mean somebody being able to follow how you performed your data analysis with your data - although that should be (in my opinion) easy. It also means being able to explain to your future self just how you did something! It’s not easy to remember in 6 months’ time what you did - you will save hours and days with better workflow! RMarkdown is a great way of intermixing written content with code and data visualizations to write reports as we go along. It’s a great way to share work with collaborators. Thanks to RStudio’s integration with RMarkdown, it’s also an awesome way of writing quick data summaries, how-to guides or other demonstrations and sharing them immediately with colleagues or the world on RPubs. You can check some of mine out here. You’re also currently reading one - complete with excessive gif usage.

 

Building Shiny apps

To take interactivity to the next level, we can write a Shiny app - an awesome feature of RStudio. With these apps you can let collaborators or the world interact with your data to their hearts’ content. You can check some of mine out here. Or even better, just check out some of the really, really cool stuff at the RStudio Shiny gallery. I personally think that Shiny apps have the potential to be an amazing teaching resource.

 

 

Writing your own package

It’s great that we can write all of these custom functions and scripts to analyze our data, but it can be annoying to have to load them all every time we restart R. And what if we want to share them with colleagues? Emailing things back and forth can get messy. Writing and building our own R package is a great solution. We get to load up all our functions just by calling the package we made. If we store it on GitHub or CRAN, all our friends, colleagues and everybody else can load it too! Through RStudio and packages like roxygen, writing and building a package is almost super-easy (I’ll settle for easyish). Following some simple guidelines, we can write documentation and help guides for our packages, making sharing even easier! You can see some of my packages on my GitHub page.
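
As a rough illustration of what package documentation looks like, here is a function with roxygen-style comments above it. I am assuming the standard roxygen2 tags here (`@param`, `@return`, `@examples`, `@export`); the function `sem` is a made-up example, not part of any of my packages.

```r
#' Standard error of the mean
#'
#' @param x A numeric vector.
#' @param na.rm Should missing values be removed first? Defaults to FALSE.
#' @return A single numeric value.
#' @examples
#' sem(c(1, 2, 3, 4))
#' @export
sem <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  sd(x) / sqrt(length(x))
}
```

The lines starting with `#'` are ordinary comments as far as R is concerned, so the function runs as normal; when the package is built, roxygen turns them into the help page that people see when they type `?sem`.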

 

Web-scraping

It takes us ages and ages to collect our data in our experiments! You can get data from the web in seconds! We could do a class where we learn how to scrape data from the web using R packages such as XML and rvest.

 

Slide presentations in R

R provides alternatives to the traditional PowerPoint slideshow for writing slide presentations. There are pros and cons to using an R-based slideshow compared to PowerPoint. I am most familiar with slidify. It provides very nice HTML5-based slideshows that often look much nicer than PowerPoint and, crucially, enable us to embed our code and figures from R. It’s also incredibly easy to post these slideshows to the web, either to share with others or to have on hand when we give the talk. The downside is that there is a bit of a learning curve and a lot of trial-and-error learning.

 

Writing flowcharts and diagrams with DiagrammeR

A new package called DiagrammeR, written by Rich Iannone, is making it incredibly easy to make diagrams and flowcharts in R using high quality graphics. It’s fantastic for making clean, reproducible figures.

 

Text mining and analysis

This is by no means my ‘bread-and-butter’, but some of the resources now available for analyzing text passages (e.g. for patterns, unique occurrences and other types of frequency analysis) are fantastic. I have followed the work of those who are very active in this field and would love to do a class on it - I know some of you use this method in your analyses. Ever wanted to text mine the complete works of Shakespeare? We can go there!

 

 

I hope that you’ll enjoy the class - and find it useful! I’m excited to talk about R with so many people. Any questions - just email or ask.