email: jc3181 AT columbia DOT edu
This is a general outline/syllabus for the introductory R class that I’ll teach this Spring 2015 semester. The class meets for 2 hours each week, Mondays 4pm-6pm in my lab space - 352 Schermerhorn. The class is open to any grad students and post-docs in the Psychology Department plus undergraduates, research assistants and other guests who have contacted me beforehand.
What this class is…
The class is aimed at behavioral scientists (psychologists, epidemiologists, neuroscientists, ethologists, other-ists) who are used to handling data but usually via Excel/SPSS. No prior knowledge of R is assumed and I hope that everyone who takes the class will become fully R-literate by the end of the semester (if not sooner).
The plan is to cover basic R programming, data analysis and visualization in the first half of the semester. After Spring Break, I will go into more detail about specific features of R that will lead to improved workflow/science/creativity/productivity.
What this class is not…
This class is not going to cover how to perform specific statistical analyses over and above the basic ones outlined below. There are other classes that cover more advanced statistical analyses (e.g. social network analyses, multi-level modeling, time-series analysis) - or you could read the vast resources that exist online or in books. However, this class will get you to the point where you will not feel intimidated by these online resources, but excited by them. You can always ask me about more advanced methods and I can hopefully answer your questions or tell you whom you should ask instead.
What you should bring to class…
Your instructor…
This is me (on a good day):
Here is a link to my GitHub repository for this class. I’ll put .csv and .R files in there that you’ll need to use:
R is so awesome largely thanks to all the great people who give up their time, energy and resources to give back to others. This is the so-called “R community”! The following is a list of some help guides that I have found useful. Use these and you won’t need to show up to my class:
Here is the general topic of each class plus some general themes/skills that I hope you’ll take away from each class.
The way I teach this class is such that there is a general progression with classes building upon one another, but everyone should be able to follow each class even if they have missed one of the previous ones. This is another way of saying that I will repeat myself a lot.
I would also add that, as we are behavioral scientists, most of us already have a fair bit of experience with data and data analysis. Therefore, I don’t teach this class the way that a programmer would. I’m assuming that most of us already know what we want to do - we just want to know how to do it most efficiently in R. Hopefully after we learn these ways, we’ll start to see how R can help us do even more things that we previously didn’t even think were possible!
The below is my working hypothesis as to how we will progress - but I’m happy to speed it up, slow it down, drop bits, add other bits - let’s just see how it works out. I know many students like to see a class that progresses from theme to theme, so I am following that pattern. However, my major issue with following too linear a route is that it misses the point of using R - which is that the sum of all of these skills is much greater than the individual parts. Therefore I will always try to add little bits and pieces to each class to show how everything fits together. I may even spend 10-15 minutes right at the end of each class to do a very quick worked example putting together lots of different elements. If you don’t follow these bits don’t worry - but hopefully this will add something extra for some people.
dplyr package!
apply function family, dplyr!, data.table package
ggplot2 for plotting data
ggplot2!
for loops, while loops, repeat and replicate
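As a taste of the looping and apply-family topics listed above, here is a minimal sketch in base R. The vector `weights` and all values in it are invented for illustration and are not class data. It computes the same squared values with a `for` loop and with `sapply`, counts halvings with a `while` loop, leaves a `repeat` loop with an explicit `break`, and uses `replicate` to run an expression several times.

```r
# Hypothetical example data (invented for illustration)
weights <- c(2, 4, 6)

# for loop: fill a preallocated vector one element at a time
squares_for <- numeric(length(weights))
for (i in seq_along(weights)) {
  squares_for[i] <- weights[i]^2
}

# apply family: sapply gives the same result without an explicit loop
squares_apply <- sapply(weights, function(w) w^2)

# while loop: keep halving until the value drops below 1
x <- 10
halvings <- 0
while (x >= 1) {
  x <- x / 2
  halvings <- halvings + 1
}

# repeat: runs until we break out explicitly
counter <- 0
repeat {
  counter <- counter + 1
  if (counter == 3) break
}

# replicate: evaluate an expression n times and collect the results
set.seed(1)
sample_means <- replicate(5, mean(rnorm(10)))
```

The `for` and `sapply` versions produce identical vectors here; which one reads better is largely a matter of taste for small jobs like this.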
I’m not going to set the schedule for after Spring Break just yet, but rather will suggest various things that we could have an individual class on. I love them all - but perhaps we’ll see what people would like to do most. It might also be worth starting with a class or two after Spring Break where we work on example datasets from someone’s lab to put together a lot of the skills that we will have covered in the first half of the semester.
Whilst by no means a ‘sexy’ topic, I think this is super-critical to good data science. Most people appreciate that sound data analysis and visualization are important, but these can often be the parts of a study that take the least time. The data that we collect or are given are often messy, unstructured and full of typos and other formatting errors. Making sure your data is clean, free of errors and in a standard, reproducible format can be painful. I’d love to talk about many of the options in R that will make this process almost fun - and will certainly save you hours and hours - if not days and weeks. Packages such as tidyr and dplyr, together with the myriad of base R and other package functions, can make data reformatting a much smoother process. I promise that you will never have to cut and paste (or type) in Excel again.
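A minimal sketch of this kind of cleanup, using only base R functions so that it runs without installing anything. The data frame `raw`, its column names and its values are all made up for illustration. In practice tidyr and dplyr offer more convenient tools for the same steps.

```r
# Hypothetical messy data (invented for illustration; not class data)
raw <- data.frame(
  id    = c(1, 2, 3),
  sex   = c(" Male", "female ", "FEMALE"),
  score = c("12", "n/a", "15"),
  stringsAsFactors = FALSE
)

clean <- raw

# Strip stray whitespace and standardize capitalization
clean$sex <- tolower(trimws(clean$sex))

# Recode the text placeholder as a proper missing value, then convert to numeric
clean$score[clean$score == "n/a"] <- NA
clean$score <- as.numeric(clean$score)

# Summarize, ignoring the missing score
mean_score <- mean(clean$score, na.rm = TRUE)
```

Because every step is written down as code, the same cleanup can be rerun unchanged whenever the raw file is updated.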
Static graphs that we put in journals can be pretty but dull. Let’s learn how to make exciting interactive visualizations that make our data jump out of the internet and become much more informative. More and more options for this are becoming available - two packages that we shall look at are ggvis and rCharts.
Something that should be emphasized more is reproducibility in our work. That doesn’t just mean somebody being able to follow how you performed your data analysis with your data - although that should be (in my opinion) easy. It also means being able to explain to your future self just how you did something! It’s not easy to remember in 6 months’ time what you did - you will save hours and days with better workflow! RMarkdown is a great way of intermixing written content with code and data visualizations to write reports as we go along. It’s a great way to share work with collaborators. Thanks to RStudio’s integration with RMarkdown, it’s also an awesome way of writing quick data summaries, how-to guides or other demonstrations and sharing them immediately with colleagues or the world on RPubs. You can check some of mine out here. You’re also currently reading one - complete with excessive gif usage.
The next level of interactivity is to write a Shiny app - an awesome feature of RStudio. With these apps you can let collaborators or the world interact with your data to their hearts’ content. You can check some of mine out here. Or even better, just check out some of the really, really cool stuff at the RStudio Shiny gallery. I personally think that Shiny apps have the potential to be an amazing teaching resource.