R, a statistical programming language, is a great resource for introducing both basic programming skills and elementary data literacy. While the R language itself dates back to the 1990s, it is package-based, which makes it endlessly extensible. As a result, R is enjoying new life as a powerful tool for ‘big data.’
What is big data? Think of data too big to fit on an Excel spreadsheet without making your computer explode. Accessing big data doesn’t require that we download it all onto our machines; instead, we can use APIs to access the data that’s stored on large servers.
R is excellent for such a task, as many publicly available data APIs already have an R package made for them, along with basic documentation on how to use it. In other words, if you teach students how to access one API, they should then be able to apply the same logic to most others; the Spotify API for musical valence is interesting to combine with the Genius API for music lyrics, for example. New APIs go online almost daily, and it’s hard to find a topic that doesn’t have an API with publicly available data on the subject. Interested in art? Behance and Dribbble both have APIs, as do the Rijksmuseum and the Harvard Art Museums; what are you looking for?
While data science is often misunderstood as belonging only to particular fields of research (finance, sports, statistics), data is now a powerful tool in nearly every field, including some that may at first seem unlikely (literature analysis, astronomy). Most R users come to the language from elsewhere in academia out of a necessity to crunch large data sets, which has resulted in a variety of packages from many areas of research. The examples used in this paper focus mostly on the fields of Communications and Media Studies, but the approach can be extended to nearly any field.
Why R? Can’t I do the same things using Tableau? In theory, yes. But practically speaking, Tableau is designed to do most of the work for you once your data is cleaned: basic statistical calculations, visualizations, and presentation. While the convenience is nice, it comes at a cost: Tableau Creator, the only Tableau package that comes close to the power of R, costs $70/month.
How much is R? It’s free, cross-platform, and open source. The best textbooks for introductory R programming are also free. And one more thing: if you can harness the power of R for data science, you will be able to work with Tableau in your sleep. It’s the difference between using software and programming; one limits our creativity, while the other requires it.
Also, learning software is not as useful a long-term approach as learning programming. While Tableau is enjoying popularity now, it will at some point be replaced by a new software package, which you’ll then need to learn from scratch.
Applications come and go – remember Flash? What about Director? – but even if a programming language evolves or becomes obsolete, you can still apply all of the programming concepts you’ve learned to other languages.
Speaking of other languages, what about Python? While Python can do nearly anything that R can, it is not a language solely dedicated to data science (you can make websites, applications, and all kinds of non-data-based content with Python). Most data scientists using Python rely heavily on Pandas, a Python package for data manipulation. While Python is extremely powerful and a great data science tool, I’d argue that starting with R is easier, since all it really does is work with data. Once a person knows R, picking up Python is much easier.
Still not convinced? R is cross-platform, and its most popular IDE, RStudio, is also free, cross-platform, and lightweight. Since R is package-based, the size and functionality of your copy of the R language are customizable. Input and output in R are extremely easy. RStudio can also publish and host content, code, and visualizations, and create interactive apps based on your code – still all free. It’s a one-stop shop. Download R at r-project.org, then install RStudio from rstudio.com, and you’re up and running.
There is a wide variety of free, publicly accessible online databases that cover a very broad range of topics, allowing an instructor to introduce varying forms of data (numerical, character, logical, temporal, etc.) or choose a more specific focus. There is also an active, curated repository of R packages called CRAN (the Comprehensive R Archive Network), from which you can install packages, also free, that extend R and make it more powerful.
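To make the forms of data mentioned above concrete, here is a minimal sketch in base R. The `tracks` data frame and all of its values are hypothetical, invented purely for illustration; they do not come from any real API or database.

```r
# Hypothetical records, invented for illustration only.
tracks <- data.frame(
  title    = c("Song A", "Song B", "Song C"),                        # character
  valence  = c(0.82, 0.35, 0.64),                                    # numerical
  explicit = c(FALSE, TRUE, FALSE),                                  # logical
  released = as.Date(c("2019-03-01", "2020-11-20", "2021-06-04")),   # temporal
  stringsAsFactors = FALSE
)

# Ask R which class it assigned to each column.
column_classes <- sapply(tracks, class)
print(column_classes)

# Each form of data supports its own kind of question.
average_valence <- mean(tracks$valence)          # arithmetic on numbers
explicit_count  <- sum(tracks$explicit)          # counting TRUE values
release_span    <- diff(range(tracks$released))  # elapsed time between dates
```

A single small data frame like this lets students see all four forms side by side before they meet them in a real data set.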
The most established R developers have not only written free textbooks¹ on how to use their ideas, but they also make packages that replace some of the original, or ‘base,’ R programming techniques with cleaner, easier-to-understand approaches (the tidyverse, as it’s called). Put another way, the leaders of modern R proselytize for it, too – and learning R now is easier than it has ever been.
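As a rough illustration of the difference between ‘base’ R and the tidyverse, the sketch below performs the same task both ways: keep the rows above a threshold, then sort them. The `songs` data are hypothetical, and the tidyverse half assumes the dplyr package has been installed from CRAN; it is skipped if dplyr is not available.

```r
# Hypothetical toy data, invented for illustration only.
songs <- data.frame(
  title   = c("Song A", "Song B", "Song C", "Song D"),
  valence = c(0.82, 0.35, 0.64, 0.12),
  stringsAsFactors = FALSE
)

# Base R: filter with a logical index, then reorder with order().
upbeat_base <- songs[songs$valence > 0.5, ]
upbeat_base <- upbeat_base[order(-upbeat_base$valence), ]

# tidyverse (dplyr): the same two steps, written as a pipeline of verbs.
# Assumes dplyr is installed, e.g. via install.packages("dplyr").
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))
  upbeat_tidy <- songs %>%
    filter(valence > 0.5) %>%
    arrange(desc(valence))
}
```

Both versions should select the same songs in the same order; the dplyr version simply reads closer to a plain-English description of the steps.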