R-Workshop Series Plan

P.Adames
July 13, 2020

The vision

Contribute a mini-series of R-centric workshops to the CalgaryR community with the following main goals:

  1. Explain the R programming language concisely to beginners
  2. Introduce R as a tool for computation and data manipulation
  3. Highlight, not hide, what makes native R different and useful
  4. Introduce the abstractions that R makes possible by design:
    1. Environments and scopes
    2. Closures
    3. Non-standard evaluation
  5. Show how modern R libraries allow data manipulation

The audience

Anybody interested in learning an expressive computational language for statistical and data analysis, data science, and data visualization.

Anyone curious about what modern R tools offer for solving old and new problems alike.

Anybody willing to put in some time to answer the question: why does R look so similar to most C-like languages, yet sometimes behave so unexpectedly differently?

The workshop's philosophy

Use the right tool for the job.

Not all [computer] languages are created equal.

Some are meant to look pretty and read almost like natural (English) language. Others are extremely easy for the uninitiated to use but grow progressively more complex when applied to specialized tasks. Yet others are simply expressive and focused: docile in the hands of the expert, but harsh in the hands of the unprepared.

R-Workshop Series Plan

  • The basic data types in R (you need to know these four)
  • Vectorization (did you know that R, like Matlab, is a vectorized language for data?)
  • Environments (what they are and how they work)
  • Lexical scoping and what it can do for you (use cases)
  • Libraries (all you ever wanted to know but never asked)
  • Reproducible R data analysis: knitr with RStudio, and Jupyter (via Anaconda)
  • Did you know R is a functional language? Here is why that can be good news.
  • Object-oriented R (the S3 and S4 object systems: can you code without ever knowing what they really are?)

  • The Tidyverse implementation of data analysis workflows, Part 1 (tidy data)
  • The Tidyverse implementation of data analysis workflows, Part 2 (table transformations, column-based ops)
  • The Tidyverse implementation of data analysis workflows, Part 3 (Mutate, Summarize, Group, Nest)
  • The caret implementation of machine learning workflows, Part 1
  • The caret implementation of machine learning workflows, Part 2 (ML pipelines and recipes)
  • R and DSLs (what's a DSL, anyway?)
  • R vs. Python, Part 1: native R vs. native Python
  • R vs. Python, Part 2: tidyverse/caret/ggplot2 vs. NumPy/pandas/scikit-learn/Matplotlib/Seaborn
  • R vs. Python, Part 3: use cases (research, prototyping, production, packages, communities, documentation)
  • R vs. Python, Part 4: notebooks, Jupyter, Spark, Kaggle, AWS

The following books serve as references for this workshop series:

  • R in Action: Data Analysis and Graphics with R, 2nd ed. Robert I. Kabacoff. Manning, 2015.
  • Advanced R. Hadley Wickham. CRC Press, 2015.
  • ggplot2: Elegant Graphics for Data Analysis. Hadley Wickham. Springer, 2009.
  • Text Mining in Practice with R. Ted Kwartler. Wiley, 2017.
  • Probability, Decisions and Games: A Gentle Introduction Using R. Abel Rodriguez and Bruno Mendes. Wiley, 2018.
  • Statistical Data Cleaning with Applications in R. Mark van der Loo and Edwin de Jonge. Wiley, 2018.
  • Deep Learning with R. Francois Chollet with J. J. Allaire. Manning, 2018.