1 About these Courses

  • Welcome to Introduction to R. This three-day course teaches R by doing. We aim to teach you skills that are useful for statistics and data science. Achieving these goals in three seminars is difficult, and there are many more topics than we can cover. We hope that you will at least find R a useful tool for exploring data after these classes.

2 Installing R and Packages

  • Please download the version of R for Mac OS X or Windows, according to your operating system.
  • We recommend installing RStudio after you install R. It is a powerful IDE, but it also uses a lot of memory.
  • After you install R and RStudio, you may find it useful to create a new project tied to your working directory. From then on, you can work on R scripts and data without worrying about your working directory.
  • Please install the following packages with install.packages("package_name"):
    • dplyr: data manipulation
    • reshape2: reshaping data between wide and long tables
    • ggplot2: visualization
    • lattice: visualization
    • foreign: reading SPSS and Stata data
    • stargazer: formatted output tables
    • interflex: marginal-effect plots, by Yiqing Xu and his colleagues
    • car: John Fox's package, which includes the recode function
    • ISLR: data sets for James, Witten, Hastie, and Tibshirani (2013), An Introduction to Statistical Learning
    • UsingR: John Verzani's package
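
Typing install.packages() ten times is tedious, so here is a minimal sketch that checks which of the packages listed above are still missing. The helper name missing_packages is hypothetical (it is not part of any package), and the sketch assumes that all ten packages are available from your default CRAN mirror. The install line is commented out so that nothing is downloaded until you decide to run it.

```r
# Packages named in the list above
pkgs <- c("dplyr", "reshape2", "ggplot2", "lattice", "foreign",
          "stargazer", "interflex", "car", "ISLR", "UsingR")

# Hypothetical helper: return the packages in `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment the next line to install whatever is missing (needs internet access)
# if (length(to_install) > 0) install.packages(to_install)
```

Running the check again after installation should print an empty character vector.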

3 Data

  • All data and R Markdown files are uploaded to my GitHub. You can either fork the repository to your GitHub account or clone it to your computer. To be fair, I am also new to GitHub, but I think it will help your research a lot, so you may want to get familiar with it.
4 Class html

  • I use R Markdown to write the class material. An HTML file is not easy to read, but it is easy to update. Hopefully it will not change too many times.
5 Resources

  • If you are new to R and struggling to switch from SAS, Stata, or SPSS, UCLA's IDRE provides annotated output, learning modules, and important documentation for free. This website is well known for its convenience and breadth.
  • The 2014 Data Scientist Conference (DSC) provided a series of HTML slides. The idea of ETL (Extract-Transform-Load) is the main theme of these slides. You can install the DSC2014Tutorial package step by step and then open them.
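  • # Read the "Imports" field of the first package listed in the tutorial repository
    deps <- available.packages(contriburl = "http://taiwanrusergroup.github.io/R-2014/src/contrib")[1, "Imports"]
    # Strip whitespace and split the comma-separated dependency names
    pkgs <- strsplit(gsub("\\s", "", deps), ",")[[1]]
    for (i in seq_along(pkgs)) {
      # Skip dependencies that are already installed and load without error
      if (require(pkgs[i], character.only = TRUE)) next
      # You can change this to your favorite repository
      install.packages(pkgs[i], repos = "http://cran.csie.ntu.edu.tw")
    }
    # Install the tutorial package itself from source
    install.packages("DSC2014Tutorial", repos = "http://taiwanrusergroup.github.io/R-2014", type = "source")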
  • If the package installs successfully, you can load it and open the slides.
  • library(DSC2014Tutorial)
    slides("Basic")
    slides("ETL1")
    slides("ETL2")
    slides("DataAnalysis")
    slides("Visualization1")
    slides("Visualization2")
    slides("Visualization3")
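
For readers who have not met ETL before, the following is a minimal base R sketch of the three steps. It is not taken from the DSC slides: the toy data, the column names, and the temporary files are all made up for illustration.

```r
# Extract: read raw data from a CSV file (written to a temporary file here so
# that the example is self-contained)
raw_file <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:4,
                     group = c("a", "b", "a", "b"),
                     score = c(10, 20, NA, 40)),
          raw_file, row.names = FALSE)
raw <- read.csv(raw_file, stringsAsFactors = FALSE)

# Transform: drop rows with a missing score, then average the score by group
clean <- raw[!is.na(raw$score), ]
summary_tbl <- aggregate(score ~ group, data = clean, FUN = mean)

# Load: write the summary to its destination (another temporary file here)
out_file <- tempfile(fileext = ".csv")
write.csv(summary_tbl, out_file, row.names = FALSE)
print(read.csv(out_file))
```

In real work the extract step would read from a database or an external file, and the load step would write to wherever the analysis expects its input.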
  • Cookbook for R and Quick-R are worth visiting if you need some quick help.
  • If you want to know how to apply R to machine learning, classification, cross-validation, and other topics, please download Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2008), The Elements of Statistical Learning: Data Mining, Inference, and Prediction (https://web.stanford.edu/~hastie/Papers/ESLII.pdf).
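
To give a taste of one of these topics, here is a minimal sketch of k-fold cross-validation in base R. It is not taken from the book: the built-in mtcars data, the linear model mpg ~ wt + hp, and the choice of five folds are all assumptions made only for illustration.

```r
set.seed(1)        # make the random fold assignment reproducible
k <- 5             # number of folds (an arbitrary choice)
n <- nrow(mtcars)  # mtcars ships with R in the datasets package

# Randomly assign each row to one of k folds of nearly equal size
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other folds, predict the held-out fold, record MSE
cv_mse <- sapply(seq_len(k), function(j) {
  train <- mtcars[fold != j, ]
  test  <- mtcars[fold == j, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  mean((test$mpg - predict(fit, newdata = test))^2)
})

print(cv_mse)        # one held-out mean squared error per fold
print(mean(cv_mse))  # cross-validated estimate of prediction error
```

Each observation is used for testing exactly once, so the average of the five errors estimates how well the model predicts data it has not seen.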