October 14, 2015
All source code at https://github.com/mine-cetinkaya-rundel/rworkshop-mem
Slides at http://rpubs.com/minebocek/117428
R: Statistical programming language
RStudio: Integrated development environment (IDE) for R
Both are free and open-source
Traditionally you would install R and RStudio on your computer
We will skip over that step for now for efficiency and use the RStudio server at
(Log in with your Duke Net ID and password)
The version of R appears in the text that pops up in the Console when you start RStudio
To find out the version of RStudio go to Help → About RStudio
It's good practice to keep both R and RStudio up to date
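The R version can also be queried from the Console at any time. A minimal sketch using base R only; the "3.0.0" threshold is an arbitrary value chosen for illustration:

```r
# Version string of the running R installation
r_version_string <- R.version.string
r_version_string

# The same version as an object that can be compared
r_version <- getRversion()
r_version

# TRUE when the installed R is at least version 3.0.0 (illustrative threshold)
is_recent <- r_version >= "3.0.0"
is_recent
```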
Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and (often) sample data. (From: http://r-pkgs.had.co.nz)
We will use the ggplot2 package for plots and dplyr for data wrangling in this workshop
Install these packages by running the following in the Console:
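The simplest form is a single call to install.packages() with both package names. The sketch below is a slightly more defensive variant that installs only what is missing; the variable names and the interactive() guard are choices made for this illustration, not part of the workshop material:

```r
pkgs <- c("ggplot2", "dplyr")

# Check which of the packages can already be loaded
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Install only what is missing. interactive() is TRUE in the Console,
# so nothing is downloaded when this runs in a non-interactive script.
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Installation is needed once per computer (or server account); loading a package with library() is needed in every session that uses it.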
R Markdown is an authoring format that enables easy creation of dynamic documents, presentations, and reports from R.
It combines the core syntax of markdown (an easy-to-write plain text format) with embedded R code chunks that are run so their output can be included in the final document.
R Markdown documents are fully reproducible (they can be automatically regenerated whenever underlying R code or data changes).
Create your first R Markdown document, knit it, and examine the source code and the output.
File → New File → R Markdown…
Enter a title (e.g. "My first R Markdown document") and author info
Choose Document as file type, and HTML as the output
Markdown is a simple formatting language designed to make authoring content easy for everyone.
Rather than writing complex markup code (e.g. HTML or LaTeX), Markdown enables the use of a syntax much more like plain-text email.
Within an R Markdown file, R Code Chunks can be embedded using the native Markdown syntax for fenced code regions.
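To make the structure concrete, here is a sketch that writes a very small R Markdown document from R and counts its code chunks. The file name and the document text are hypothetical, and the backtick fence is assembled with strrep() only so that this example can itself be shown inside a fenced block:

```r
# Three backticks open and close a code chunk
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  'title: "My first R Markdown document"',
  "output: html_document",
  "---",
  "",
  "Some plain Markdown text.",
  "",
  paste0(fence, "{r}"),   # chunk opens here
  "summary(cars)",        # R code that is run when the document is knit
  fence                   # chunk closes here
)

# Write the document to a temporary file (hypothetical file name)
rmd_file <- file.path(tempdir(), "first-document.Rmd")
writeLines(rmd_lines, rmd_file)

# Count the chunks by counting the lines that open one
n_chunks <- sum(grepl(paste0("^", fence, "\\{r"), rmd_lines))
n_chunks
```

Opening this file in RStudio and clicking Knit, or calling rmarkdown::render(rmd_file) where the rmarkdown package and pandoc are available, should produce the HTML output.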
How many code chunks are in your R Markdown document?
What does each code chunk do? You may not understand the R syntax yet, but you should be able to compare the source file and the output to answer this question.
You can also evaluate R expressions inline by enclosing the expression within a single back-tick qualified with ‘r’. For example, the following code:
Results in this output: "I counted 2 red trucks on the highway."
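The original inline expression is not reproduced here, so the following is only an illustration of the mechanism: the expression 1 + 1 is an assumption that happens to give the same sentence. When the document is knit, the inline expression is evaluated and its value replaces it in the text, roughly like this:

```r
# Hypothetical .Rmd source line (the expression 1 + 1 is an assumption):
#   I counted `r 1 + 1` red trucks on the highway.

# Knitting evaluates the expression ...
n_trucks <- 1 + 1

# ... and pastes the value into the surrounding sentence
sentence <- paste("I counted", n_trucks, "red trucks on the highway.")
sentence
```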
Suppose Sammy works on average 8.37 hours per day, 5 days per week. How many hours does Sammy work on average per week?
Add a sentence to your document that includes simple inline R code that answers this question, along the lines of…
"Sammy works 8.37 * 5 hours per week, on average."
R Markdown workspace and Console workspace are independent of each other
If you define a variable in your Console and it shows up in the Environment tab, it is not going to be automatically included in your R Markdown document
If you define a variable in your R Markdown document, it won't automatically be available in your Console
[ Demo ]
Tip: The buttons in each code chunk offer "Run all previous chunks" and "Run current chunk"; use them to help manage workspaces.
The fact that the two workspaces do not automatically have access to the same variables might / will be frustrating at first.
But this is not a bug. In fact, it is a feature that helps reproducibility, as it ensures that all variables, functions, etc. that are used in the R Markdown document are explicitly defined or loaded.
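As a loose analogy only, the separation can be mimicked with two R environments. Knitting uses its own fresh workspace rather than an environment object like the ones below, and the names console_ws and document_ws are made up for this sketch:

```r
console_ws  <- new.env()  # stand-in for the Console workspace
document_ws <- new.env()  # stand-in for the workspace used while knitting

# Like typing x = 2 in the Console
assign("x", 2, envir = console_ws)

# x exists in one workspace but not in the other
in_console  <- exists("x", envir = console_ws,  inherits = FALSE)
in_document <- exists("x", envir = document_ws, inherits = FALSE)
in_console   # TRUE
in_document  # FALSE

# The fix: define x where it is used, i.e. inside the document
assign("x", 2, envir = document_ws)
result <- eval(quote(x * 3), envir = document_ws)
result       # 6
```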
Define x = 2 in the Console. Then, in your Console, run x * 3. Does your code run as expected?
Now, insert a new code chunk in your R Markdown document and in this chunk type x * 3 only. Knit your document. Does the document compile, or do you get an error? If you get an error, what does the error say, and how can you fix it? Implement the fix and Knit your document. Make sure you are able to compile without errors before you move on.
Tip: Insert a new code chunk by clicking Chunks → Insert Chunk.
Next, insert another code chunk in your R Markdown document, define y = 4, and calculate y + 5. Knit your document. Does everything work as expected?
Now run y + 5 in your Console. Does your code run as expected or do you get an error? If you get an error, what does the error say, and how can you fix it? Implement the fix.
You can hide the code, hide the result, hide warnings, messages, etc.
Refer to the handy R Markdown cheatsheet
Another good reference: http://rmarkdown.rstudio.com/authoring_rcodechunks.html
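Chunk options can be written in an individual chunk header, for example {r, echo = FALSE}, or set once for the whole document. A minimal sketch of the second approach, assuming the knitr package is installed; it would typically go in the first chunk of the document:

```r
# Assumes knitr is installed; the guard skips the example otherwise
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(
    echo    = FALSE,  # hide the code, keep the results
    warning = FALSE,  # hide warnings
    message = FALSE   # hide messages
  )

  # Check the current default for echo
  knitr::opts_chunk$get("echo")
}
```

To hide the printed result of a single chunk instead, the option results = "hide" can be placed in that chunk's header.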
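library(dplyr)  # provides %>% and tbl_df()
bike <- read.csv("https://stat.duke.edu/~mc301/data/nc_bike_crash.csv",
                 sep = ";", stringsAsFactors = FALSE,
                 na.strings = c("NA", "", ".")) %>%
  tbl_df()  # newer dplyr versions deprecate tbl_df() in favor of as_tibble()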
View the names of variables via names(bike)
##  [1] "FID"        "OBJECTID"   "AmbulanceR" "BikeAge_Gr" "Bike_Age"
##  [6] "Bike_Alc_D" "Bike_Dir"   "Bike_Injur" "Bike_Pos"   "Bike_Race"
## [11] "Bike_Sex"   "City"       "County"     "CrashAlcoh" "CrashDay"
## [16] "Crash_Date" "Crash_Grp"  "Crash_Hour" "Crash_Loc"  "Crash_Mont"
## [21] "Crash_Time" "Crash_Type" "Crash_Ty_1" "Crash_Year" "Crsh_Sevri"
## [26] "Developmen" "DrvrAge_Gr" "Drvr_Age"   "Drvr_Alc_D" "Drvr_EstSp"
## [31] "Drvr_Injur" "Drvr_Race"  "Drvr_Sex"   "Drvr_VehTy" "ExcsSpdInd"
## [36] "Hit_Run"    "Light_Cond" "Locality"   "Num_Lanes"  "Num_Units"
## [41] "Rd_Charact" "Rd_Class"   "Rd_Conditi" "Rd_Config"  "Rd_Defects"
## [46] "Rd_Feature" "Rd_Surface" "Region"     "Rural_Urba" "Speed_Limi"
## [51] "Traff_Cntr" "Weather"    "Workzone_I" "Location"
and see detailed descriptions at https://stat.duke.edu/~mc301/data/nc_bike_crash.html.
By default, R versions before 4.0.0 convert character vectors into factors when they are included in a data frame (from R 4.0.0 on, they are kept as character vectors).
Sometimes this is useful, sometimes it isn't – either way it is important to know what type/class you are working with.
This behavior can be changed using the stringsAsFactors = FALSE argument when loading a data frame.
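The difference is easy to see on a small made-up data frame. In this sketch the column name and values are invented for illustration, and the argument is set explicitly in both calls so the result does not depend on the R version's default:

```r
sex <- c("Male", "Female", "Male")  # illustrative values only

# Character vector converted to a factor
df_factor <- data.frame(sex = sex, stringsAsFactors = TRUE)

# Character vector kept as character
df_char <- data.frame(sex = sex, stringsAsFactors = FALSE)

class(df_factor$sex)   # "factor"
class(df_char$sex)     # "character"

# A factor stores a fixed set of levels, sorted alphabetically here
levels(df_factor$sex)  # "Female" "Male"
```

Calling class() or str() on a column is a quick way to check which type you are working with.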