About these Courses
Welcome to Introduction to R. This three-day course teaches R by having you use it. We aim to teach skills that are useful for statistics and data science. Admittedly, reaching these goals in three seminars is difficult, and there are many more topics than we can cover. We hope that, at the very least, you will find R a useful tool for exploring data after these classes.
Installing R and Packages
Please download the version of R for your operating system (Mac OS X or Windows).
After installing R, we recommend installing RStudio. It is a powerful IDE, but it also uses a lot of memory.
After you install R and RStudio, you may find it useful to create a new RStudio project in your working directory. When you open the project, RStudio sets the working directory to the project folder, so you can work on R scripts and data without worrying about where R is looking for files.
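To check where R is looking for files, you can print the working directory and build paths relative to it. This is a minimal sketch; the file name "data/example.csv" is hypothetical and only illustrates how a relative path is resolved.

```r
# Show the current working directory; inside an RStudio project this is the project folder
wd <- getwd()
print(wd)

# Relative paths are resolved against the working directory.
# "data/example.csv" is a hypothetical file name used only for illustration.
data_path <- file.path("data", "example.csv")
print(data_path)

# TRUE only if such a file actually exists under the working directory
file.exists(data_path)

# The same location written out as a full path
file.path(wd, data_path)
```

Because the project folder is the working directory, scripts that use relative paths like this keep working when the whole project folder is moved to another computer.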
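Please install the following packages with install.packages(), e.g. install.packages("dplyr"):
- dplyr: data manipulation
- reshape2: reshaping data between wide and long formats
- ggplot2: visualization
- lattice: visualization
- foreign: reading SPSS and Stata data
- stargazer: formatted regression and summary tables
- interflex: plots of marginal effects in interaction models, by Yiqing Xu and his colleagues
- car: John Fox’s package; provides the recode function
- ISLR: data sets for An Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani (2013)
- UsingR: John Verzani’s package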
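Instead of calling install.packages() once per package, you can install whatever is missing in one go. This is a minimal sketch: the helper name missing_packages and the CRAN mirror URL are our own choices, not part of the course material, so swap in your preferred mirror.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# The packages used in this course
course_pkgs <- c("dplyr", "reshape2", "ggplot2", "lattice", "foreign",
                 "stargazer", "interflex", "car", "ISLR", "UsingR")

# Install only the ones that are missing (the mirror URL is an assumption)
to_install <- missing_packages(course_pkgs)
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

Running the script a second time does nothing, because every package is already installed.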
Data
All data and R Markdown files are uploaded to my GitHub repository. You can either fork the repository to your own GitHub account or clone it to your computer. To be honest, I am also new to GitHub, but I think it will help your research a lot, so you may want to get familiar with it.
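Once the repository is cloned into your project folder, its data files can be read with relative paths. If a file is in Stata format, the foreign package from the list above can read it. The sketch below does not assume any particular file from the repository: it writes the built-in mtcars data to a temporary .dta file as a stand-in and reads it back.

```r
library(foreign)

# Stand-in for a course data file: write the built-in mtcars data
# to a temporary Stata (.dta) file
dta_file <- tempfile(fileext = ".dta")
write.dta(mtcars, dta_file)

# Read the Stata file back into an R data frame
cars_dta <- read.dta(dta_file)
str(cars_dta)

# For an SPSS file the analogous call would be (hypothetical file name):
# read.spss("your_file.sav", to.data.frame = TRUE)
```

In practice you would replace dta_file with the relative path of the data file in your cloned repository.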
Class HTML
I write the class material in R Markdown and publish it as HTML. An HTML page is not always the easiest format to read, but it is easy to update. Hopefully it won’t need to change too many times.
Resources
If you are new to R and are switching to R from SAS, Stata, or SPSS, UCLA’s IDRE provides annotated output, learning modules, and documentation for free. The website is well known for its convenience and broad coverage.
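As a rough illustration of what switching looks like, here are a few common Stata commands next to their closest base R equivalents, using the built-in mtcars data. This is our own sketch, not material from IDRE, and the R output is laid out differently from Stata’s.

```r
# Stata: summarize mpg wt
summary(mtcars[, c("mpg", "wt")])

# Stata: tabulate cyl
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)

# Stata: regress mpg wt
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)
```

For this model the estimated slope on wt is about -5.34, meaning each additional 1,000 lbs of weight is associated with roughly 5.3 fewer miles per gallon.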
The 2014 Data Scientist Conference (DSC) provided a series of HTML slides whose main theme is ETL (Extract-Transform-Load). You can install the DSC2014Tutorial package and its dependencies step by step, then open the slides.
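# Read the Imports field of DSC2014Tutorial from the Taiwan R User Group repository
deps <- available.packages(contriburl = "http://taiwanrusergroup.github.io/R-2014/src/contrib")[1, "Imports"]
# Split the comma-separated list, dropping whitespace and version requirements
pkgs <- strsplit(gsub("\\s|\\([^)]*\\)", "", deps), ",")[[1]]
for (pkg in pkgs) {
  # Skip dependencies that are already installed
  if (requireNamespace(pkg, quietly = TRUE)) next
  # You can change this to your favorite CRAN mirror
  install.packages(pkg, repos = "http://cran.csie.ntu.edu.tw")
}
install.packages("DSC2014Tutorial", repos = "http://taiwanrusergroup.github.io/R-2014", type = "source")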
Once the package is installed, you can load it and open the slides.
library(DSC2014Tutorial)
slides("Basic")
slides("ETL1")
slides("ETL2")
slides("DataAnalysis")
slides("Visualization1")
slides("Visualization2")
slides("Visualization3")
Cookbook for R and Quick-R are worth visiting if you need quick help.
If you want to learn how to apply R to machine learning, classification, cross-validation, and other topics, please download Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2008), The Elements of Statistical Learning: Data Mining, Inference, and Prediction (https://web.stanford.edu/~hastie/Papers/ESLII.pdf).
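To give a small taste of one of these topics in R, here is a minimal k-fold cross-validation sketch in base R that compares two linear models on the built-in mtcars data. The function name kfold_cv_mse, the choice of k = 5, and the seed are illustrative assumptions of ours, not code from The Elements of Statistical Learning.

```r
# Hypothetical helper: average test-fold mean squared error of an lm() model
kfold_cv_mse <- function(formula, data, k = 5, seed = 1) {
  set.seed(seed)
  # Randomly assign each row to one of k folds of (nearly) equal size
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  fold_mse <- numeric(k)
  for (i in seq_len(k)) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    # Fit on the training folds, evaluate on the held-out fold
    fit  <- lm(formula, data = train)
    pred <- predict(fit, newdata = test)
    fold_mse[i] <- mean((test[[response]] - pred)^2)
  }
  mean(fold_mse)
}

cv_wt    <- kfold_cv_mse(mpg ~ wt, mtcars)
cv_wt_hp <- kfold_cv_mse(mpg ~ wt + hp, mtcars)
c(wt = cv_wt, wt_hp = cv_wt_hp)
```

The model with the lower cross-validated error is expected to predict new cars better. Using the same seed for both models means they are evaluated on identical folds, which keeps the comparison fair.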