library(dplyr)
library(ggplot2)
library(reshape2)
library(gridExtra)
library(maps)

Tasks you will complete to begin accomplishing the requirements of reproducible research.

Script and output figures from the R tutorial
Article 1: What are the main results and conclusions? Describe in two paragraphs.
Blog Site: Set up your account at https://edublogs.org/ and make your site nice!
3a) Include a screenshot of your blog as a .jpg
Writing prompt: Where does your used water go? What is reproducible research?

Making a Project

Step 1:

Make a folder called Homework on the C: or the D: drive.

Step 2:

Within the Homework folder, make a folder for each assignment, named Assignment_1, Assignment_2, and so on up to Assignment_12.
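
If you would rather create the folders from R than by hand, the following is a minimal sketch. It assumes a temporary directory as a stand-in for the Homework folder on your C: or D: drive; the variable names `homework` and `assignment_dirs` are made up for this example.

```r
# Hypothetical location: a temporary directory stands in for "C:/Homework"
# so the sketch can be run safely; change `homework` to your real path.
homework <- file.path(tempdir(), "Homework")

# Build the twelve folder paths: Assignment_1 ... Assignment_12
assignment_dirs <- file.path(homework, paste0("Assignment_", 1:12))

# recursive = TRUE also creates the parent Homework folder if it is missing
for (d in assignment_dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# List what was created
list.files(homework)
```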

Step 3:

Open RStudio. You may have to run RStudio as administrator to install the packages discussed below. To run RStudio as administrator, right-click the RStudio icon and click on ‘Run as administrator’.

Step 4a:

Make an R project. You will want to make a new R project for each assignment. This keeps all of your code and data organized. Click on the drop-down menu at the top right and click ‘New Project’.

Step 4b:

Click on ‘Existing Directory’ (because you already made the folder in Step 2).

Step 4c:

Browse for your assignment 2 folder. Click on Assignment_2 for the folder for this R tutorial. Just click on the folder (don’t go inside the folder). Then click on ‘Select Folder’.

Step 4d:

Click on ‘Create Project’ to create “Assignment_2.Rproj”, which manages all of the files for your Assignment_2.

Step 5:

We are going to open an R markdown document I already made for you. The R markdown document was used to create the pdf file for this tutorial you are reading now!

Open the R markdown document called ‘Assignment_2.Rmd’ by clicking on it on the right pane under files.

Step 6:

Step 6 here is for future reference. You do not need to create a new document or script right now, but you may need to in the future.

To create a new R Markdown document: Click on ‘File’ -> ‘New File’ -> ‘R Markdown’ to create a new markdown document.

To create a new R script: Click on ‘File’ -> ‘New File’ -> ‘R Script’ to create a new R script.

Understanding R Markdown

Here are some important things to note about R Markdown.

The purpose of R Markdown is to make research reproducible. Read about reproducible research in this blog post: https://ropensci.org/blog/2014/06/09/reproducibility/

In one of our class sections we will discuss this blog in detail.

Make sure that these two packages are installed so that you can use R Markdown documents and create pdf and html documents: knitr and rmarkdown.
If you get an error that a library is not installed, then install it using the Packages tab. Click on packages, then click on install. Type in the name of the library you need (for example knitr is required to make pdf documents from R markdown). Then click on install to install it. If it requires you to select a CRAN mirror, choose one in California like Berkeley or UCLA.

R Markdown works by using ‘chunks’. Three sections are required at the beginning. Section 1 is the title, name, and date section. Section 2 is a chunk that sets the options for the document. Section 3 is a chunk that loads the libraries required for the code to work. Everything outside of the chunks is just plain text in paragraph form, or bullets, or numbered sections like you see here.
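
The three opening sections can be sketched as follows. This is only an illustration, not the actual header of Assignment_2.Rmd: the chunk labels `setup` and `libraries` and the file name `skeleton_example.Rmd` are assumptions, and the skeleton is written to a temporary folder so that nothing in your project is overwritten.

```r
# Three backticks open and close a chunk in an R Markdown file
fence <- paste(rep("`", 3), collapse = "")

skeleton <- c(
  "---",                                          # Section 1: title, name, date
  'title: "An R and R Markdown Tutorial"',
  'author: "Michelle Newcomer"',
  'date: "Thursday July 23, 2015"',
  "---",
  "",
  paste0(fence, "{r setup, include=FALSE}"),      # Section 2: document options
  "knitr::opts_chunk$set(echo = TRUE)",
  fence,
  "",
  paste0(fence, "{r libraries, message=FALSE}"),  # Section 3: libraries
  "library(ggplot2)",
  fence,
  "",
  "Plain text, bullets, and numbered sections go outside the chunks."
)

# Write the skeleton to a temporary file and print it back
rmd_file <- file.path(tempdir(), "skeleton_example.Rmd")
writeLines(skeleton, rmd_file)
cat(readLines(rmd_file), sep = "\n")
```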

R Markdown does not do spell checking!

To make a pdf or html document

To make a pdf or an html from your R markdown document:

First make sure that the ‘knitr’ and ‘rmarkdown’ packages are installed.

Notice below that we are looking at actual R code. Notice how nicely it is formatted in this pdf. R markdown takes our R code and makes it easy to read.

Below is some code that will check if a package is installed or not. To run this ‘chunk’ of code, place your cursor within the chunk, and click on ‘Chunks’ -> ‘Run current chunk’. Or click on a specific line within the chunk and click on ‘Run’ to run just one line at a time.

# Here the '#' symbol starts a comment.

# The function below, 'is.installed', checks whether a package is installed.
# A function is code that takes input and does something specific with it.
# A function can be generic because it can work on any input you send to it.

is.installed <- function(mypkg){
    is.element(mypkg, installed.packages()[,1])
} 

# check if package is installed
check <- "knitr" # package of interest
if (!is.installed(check)) {
    install.packages(check)
    out <- "installing now"
} else {
    out <- "package already installed"
}
out

## [1] "package already installed"

check <- "rmarkdown" # package of interest
if (!is.installed(check)) {
    install.packages(check)
    out <- "installing now"
} else {
    out <- "package already installed"
}
out

## [1] "package already installed"
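
The two checks above repeat the same steps, so they can also be written as one loop over the package names. The version below is a sketch of that idea using `requireNamespace()` from base R instead of the tutorial's `is.installed` function; it only reports what is missing and leaves the installing to you. The names `required`, `installed`, and `missing_pkgs` are made up for this example.

```r
required <- c("knitr", "rmarkdown")

# requireNamespace() returns TRUE when a package can be loaded,
# without attaching it the way library() does
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Report any packages that still need to be installed
missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing: ", paste(missing_pkgs, collapse = ", "),
          ". Run install.packages() on these names.")
} else {
  message("All required packages are installed.")
}
```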

Next, check to make sure that Pandoc and MiKTeX are installed. Use the Start menu search in Windows to look for them.

Once all of your packages are loaded, Pandoc and MiKTeX are installed, and you are sure that your code works, it is time to make your pdf or html. Click on the down arrow next to ‘Knit PDF’ at the top of the page to see the options, then click on ‘Knit PDF’. Give it a minute or so to compile the markdown document and create the pdf.
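
You can also look for the two programs from inside R. The sketch below uses `Sys.which()`, which searches the system path for an executable. Treat the result as a hint only: RStudio may ship its own copy of Pandoc that is not on the system path, so FALSE for pandoc does not necessarily mean knitting will fail. `pdflatex` is the LaTeX program that MiKTeX provides. The commented-out `rmarkdown::render()` line is the console equivalent of the Knit PDF button and assumes that Assignment_2.Rmd is in your working directory.

```r
# Look for the external programs on the system path
tools <- Sys.which(c("pandoc", "pdflatex"))
found <- tools != ""   # TRUE where a path was found
print(found)

# Uncomment to knit from the console instead of using the button:
# rmarkdown::render("Assignment_2.Rmd", output_format = "pdf_document")
```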

Sometimes you will receive error and warning messages in the console. Try to debug them yourself first. The cause is usually a mistake in the code or a missing package.

If this is the first time Knit PDF has been run, MiKTeX will ask to install some packages. Make sure to install every package. They may not install if RStudio is not run as administrator. See the troubleshooting section below.

Turning in your assignment

You will turn in 3 final products for each homework assignment: a) .Rmd file b) .pdf file c) .html file

Turn in all three products as a package file called ‘Assignment_2_Newcomer.zip’ or ‘Assignment_1_Thompson.zip’. Just label the zip file with the appropriate assignment # and your own last name.

To make a zip file, ctrl-click the three items you want to send to the zip file, right-click those items, go to ‘send to’ and click on ‘Compressed (zipped) folder’.

What to do if you need help!

FIRST Check online to see if someone else has had the same problem and found a solution.
Google problems like “R chunk options” to figure out what echo=FALSE, cache=FALSE, or eval=FALSE mean.
Notice how the code above for echo=FALSE, cache=FALSE, or eval=FALSE was embedded inline in the text. Google “R markdown inline code” to figure out how to do that.
Check with your TA or fellow classmates and collaborate!
The compiler for Knit PDF will not work if Pandoc and MiKTeX are not installed.

In Case of problems with software

Here is all of the software and versions that should be installed:

RStudio-Desktop 0.98.1103
R-base version 3.2.0
Notepad++ 6.7.8
Pandoc 1.13.2
MiKTeX 2.9.5105

Links to software

RStudio: http://www.rstudio.com/products/rstudio/download/
R: under ‘Download R for Windows’, ‘base’, ‘Download 3.2.0’ at http://cran.rstudio.com/
Notepad++: https://notepad-plus-plus.org/download/v6.7.8.html
Pandoc: https://github.com/jgm/pandoc/releases/tag/1.13.2 (the download is at the bottom of that page)
MiKTeX: http://miktex.org/download

RStudio Issues and Troubleshooting

Issue 1: Cannot compile pdf

*Check that the knitr and rmarkdown packages are installed within RStudio

*Check that Pandoc and MiKTeX are installed on the computer

*Add chunks one by one to find where the problem is

*Sometimes the problem may be in the written paragraphs. As an example, the dollar sign can cause the compile to fail if it is used in regular text, most likely because it is read as the start of a LaTeX math expression. The “tree$C” does not cause it to fail, however. I understand this can be frustrating. Your job is to insert code and text little by little to help you debug.

Issue 2: Cannot install packages in Rstudio

Sometimes you might need to run RStudio as administrator to install the packages. To do this, right-click the RStudio icon and click on ‘Run as administrator’.

Issue 3: Administrator access

We may not be able to get administrator access. In that case, ask the IT person to come in and install the software for you.

Task 1: R Tutorial

I am following the code for the excellent R tutorial created by Kelly Black at the University of Georgia in the Department of Mathematics. http://www.cyclismo.org/tutorial/R/

1. Input

1.1 Assignment to a variable

The most straightforward way to store a list of numbers is through an assignment using the c command. (c stands for “combine.”) The idea is that a list of numbers is stored under a given name, and the name is used to refer to the data. A list is specified with the c command, and assignment is specified with the “<-” symbols. Another term for the list of numbers is a “vector.”

The numbers within the c command are separated by commas. As an example, we can create a new variable, called “bubba” which will contain the numbers 3, 5, 7, and 9:

bubba <- c(3,5,7,9)

When you enter this command you should not see any output except a new command line. The command creates a list of numbers called “bubba.” To see what numbers are included in bubba, type “bubba” and press the enter key:

bubba

## [1] 3 5 7 9

If you wish to work with one of the numbers you can get access to it using the variable and then square brackets indicating which number:

bubba[2]

## [1] 5

bubba[1]

## [1] 3

bubba[0]

## numeric(0)

bubba[4]

## [1] 9
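
The results above follow from R counting positions starting at 1, which is why `bubba[0]` returns an empty vector rather than the first number. A few more indexing patterns are sketched below; they go slightly beyond what this part of the tutorial covers.

```r
bubba <- c(3, 5, 7, 9)

length(bubba)     # number of elements: 4
bubba[c(1, 4)]    # first and last elements: 3 9
bubba[-1]         # everything except the first element: 5 7 9
bubba[5]          # past the end gives NA rather than an error
bubba[0]          # index 0 selects nothing: numeric(0)
```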

1.2 Reading a csv file

The command to read the data file is read.csv. We have to give the command at least one argument, but we will give three different arguments to indicate how the command can be used in different situations.

The first argument is the name of the file.
The second argument indicates whether or not the first row is a set of labels.
The third argument indicates that the values on each line are separated by commas.

The following command will read in the data and assign it to a variable called “heisenberg”. The variable ‘heisenberg’ is actually a data frame. A data frame is like a matrix, but its columns can hold a combination of characters, strings, or numeric values.

# First set your working directory
# On a Windows machine, write the path with forward slashes / (or doubled backslashes \\)
dir<-"D:/Users/Michelle Newcomer/Dropbox/Stanford/Assignments/Assignment_2_RTutorial"
setwd(dir)

# Read in the csv file and assign it to a data frame called heisenberg
heisenberg <- read.csv(file="simple.csv",header=TRUE,sep=",")

To see what is inside of the data.frame called heisenberg, there are three things you can do.

Type the variable name into the console.
Type the variable name into a code chunk and run the code chunk.
Click on the data.frame “heisenberg” under Data to see all of the data in spreadsheet form.

heisenberg #this line here will show you all of the values

##   trial mass velocity
## 1     A 10.0       12
## 2     A 11.0       14
## 3     B  5.0        8
## 4     B  6.0       10
## 5     A 10.5       13
## 6     B  7.0       11
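
If you do not have simple.csv at hand, the sketch below rebuilds it from the six rows printed above and reads it back with the same three arguments. The file name `simple_demo.csv` and the variable names are made up for this example, and the file is written to a temporary folder rather than your working directory.

```r
# The header line plus the six rows shown in the printed output
csv_lines <- c(
  "trial,mass,velocity",
  "A,10.0,12",
  "A,11.0,14",
  "B,5.0,8",
  "B,6.0,10",
  "A,10.5,13",
  "B,7.0,11"
)

# Write the lines to a temporary csv file
demo_file <- file.path(tempdir(), "simple_demo.csv")
writeLines(csv_lines, demo_file)

# file = where to read, header = first row holds labels, sep = comma-separated
demo <- read.csv(file = demo_file, header = TRUE, sep = ",")
names(demo)
nrow(demo)
```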

If the data frame is really large, type head(heisenberg) or tail(heisenberg) to see just the first or last few entries.

head(heisenberg)

##   trial mass velocity
## 1     A 10.0       12
## 2     A 11.0       14
## 3     B  5.0        8
## 4     B  6.0       10
## 5     A 10.5       13
## 6     B  7.0       11

Here is code to look at a summary of the mean, median, etc. for each variable within the data.frame called ‘heisenberg’:

summary(heisenberg)

##  trial      mass          velocity   
##  A:3   Min.   : 5.00   Min.   : 8.0  
##  B:3   1st Qu.: 6.25   1st Qu.:10.2  
##        Median : 8.50   Median :11.5  
##        Mean   : 8.25   Mean   :11.3  
##        3rd Qu.:10.38   3rd Qu.:12.8  
##        Max.   :11.00   Max.   :14.0

To get help with a specific command, use help. Here I set the chunk option eval=FALSE because otherwise it would open the help page in an internet browser. Type help(read.csv) in the console to get the help within RStudio.

help(read.csv)

If R is not finding the file you are trying to read then it may be looking in the wrong folder/directory. If you are using the graphical interface you can change the working directory from the file menu. If you are not sure what files are in the current working directory you can use the dir() command to list the files and the getwd() command to determine the current working directory:

dir()

##  [1] "1.png"                         "2"                            
##  [3] "2.png"                         "2.Rproj"                      
##  [5] "2.zip"                         "3.png"                        
##  [7] "4.png"                         "Assignment_2.pdf"             
##  [9] "Assignment_2_Solutions.pdf"    "Assignment_2_Solutions.Rmd"   
## [11] "Assignment_2a_RTutorial.Rproj" "basic-miktex-2.9.5105.exe"    
## [13] "BeginnersGuide.pdf"            "BeginnersGuide.Rmd"           
## [15] "Final"                         "pandoc-1.14.0.1-windows.msi"  
## [17] "Picture1.png"                  "Picture10.png"                
## [19] "Picture11.png"                 "Picture12.png"                
## [21] "Picture13.png"                 "Picture14.png"                
## [23] "Picture15.png"                 "Picture16.png"                
## [25] "Picture17.png"                 "Picture18.png"                
## [27] "Picture2.png"                  "Picture3.png"                 
## [29] "Picture4.png"                  "Picture5.png"                 
## [31] "Picture6.png"                  "Picture7.png"                 
## [33] "Picture8.png"                  "Picture9.png"                 
## [35] "Presentation1.pptx"            "simple.csv"                   
## [37] "tex2pdf.33064"                 "trees91.csv"                  
## [39] "urls.txt"

getwd()

## [1] "D:/Users/Michelle Newcomer/Dropbox/Stanford/Assignments/Assignment_2_RTutorial"
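
A quick way to combine these two commands is to ask R directly whether the file is where you expect it. This is a small sketch using `file.exists()` from base R; `csv_name` is a made-up variable holding the file name used in this tutorial.

```r
csv_name <- "simple.csv"   # file name used in this tutorial

if (file.exists(csv_name)) {
  message("Found ", csv_name, " in ", getwd())
} else {
  # Show which csv files are actually in the working directory
  message(csv_name, " is not in ", getwd(),
          "; csv files here: ", paste(dir(pattern = "\\.csv$"), collapse = ", "))
}
```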

The data.frame “heisenberg” contains three columns of data, each of which is a variable. Each column is assigned a name based on the header (the first line in the file). You can now access each individual column using a “$” to separate the two names:

heisenberg$trial

## [1] A A B B A B
## Levels: A B

heisenberg$mass

## [1] 10.0 11.0  5.0  6.0 10.5  7.0

heisenberg$velocity

## [1] 12 14  8 10 13 11
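
The “Levels: A B” line in the output for the trial column shows that it was read as a factor, which is R's type for categories. Starting with R 4.0.0, read.csv keeps text columns as plain character strings unless you add the argument stringsAsFactors=TRUE, so your own output may not show the levels. The sketch below, which uses a stand-alone copy of the trial values, shows how to make the conversion yourself with `factor()`.

```r
# Stand-alone copy of the trial column as plain text
trial <- c("A", "A", "B", "B", "A", "B")

# Convert the text values to a factor (a categorical variable)
trial_factor <- factor(trial)

levels(trial_factor)   # the distinct categories: "A" "B"
table(trial_factor)    # how many observations fall in each category
```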

If you are not sure what columns are contained in the variable you can use the names command:

names(heisenberg)

## [1] "trial"    "mass"     "velocity"

We will look at another example which is used throughout this tutorial. We will look at the data found in a spreadsheet located at http://cdiac.ornl.gov/ftp/ndp061a/trees91.wk1 . A description of the data file is located at http://cdiac.ornl.gov/ftp/ndp061a/ndp061a.txt . The original data is given in an Excel spreadsheet. It has been converted into a csv file, trees91.csv , by deleting the top set of rows and saving it as a “csv” file, which is one of the save options within Excel. (You should save the file on your computer.) It is a good idea to open this file in a spreadsheet and look at it. This will help you make sense of how R stores the data.

The data is used to indicate an estimate of biomass of ponderosa pine in a study performed by Dale W. Johnson, J. Timothy Ball, and Roger F. Walker who are associated with the Biological Sciences Center, Desert Research Institute, P.O. Box 60220, Reno, NV 89506 and the Environmental and Resource Sciences College of Agriculture, University of Nevada, Reno, NV 89512. The data consists of 54 lines, and each line represents an observation. Each observation includes 28 different measurements and markers for a given tree. For example, the first number in each row is either 1, 2, 3, or 4, which signifies a different level of exposure to carbon dioxide. The sixth number in every row is an estimate of the biomass of the stems of a tree. Note that the very first line in the file is a list of labels used for the different columns of data.

The data can be read into a variable called “tree” using the read.csv command:

tree <- read.csv(file="trees91.csv",header=TRUE,sep=",")

This will create a new variable called “tree.” If you type in “tree” at the prompt and hit enter, all of the numbers stored in the variable will be printed out. Try this, and you should see that it is difficult to make any sense out of the numbers.

There are many different ways to keep track of data in R. When you use the read.csv command R uses a specific kind of variable called a “data frame.” All of the data are stored within the data frame as separate columns. If you are not sure what kind of variable you have then you can use the attributes command. This will list all of the things that R uses to describe the variable:

attributes(tree)

## $names
##  [1] "C"      "N"      "CHBR"   "REP"    "LFBM"   "STBM"   "RTBM"   "LFNCC" 
##  [9] "STNCC"  "RTNCC"  "LFBCC"  "STBCC"  "RTBCC"  "LFCACC" "STCACC" "RTCACC"
## [17] "LFKCC"  "STKCC"  "RTKCC"  "LFMGCC" "STMGCC" "RTMGCC" "LFPCC"  "STPCC" 
## [25] "RTPCC"  "LFSCC"  "STSCC"  "RTSCC" 
## 
## $class
## [1] "data.frame"
## 
## $row.names
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 54

The first thing that R stores is a list of names which refer to each column of the data. For example, the first column is called “C” and the second column is called “N.” The variable tree is of class data.frame. Finally, the rows are numbered consecutively from 1 to 54. Each column has 54 numbers in it.
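
If you only need one of these facts, there are shorter commands than attributes. The sketch below uses `trees`, a small data set that ships with R, as a stand-in so that it runs without trees91.csv; it is not the same data as the `tree` variable in this tutorial.

```r
# `trees` is a built-in R data set; it stands in for `tree` here
class(trees)   # the kind of variable: "data.frame"
dim(trees)     # number of rows, then number of columns
str(trees)     # one line per column: name, type, and first few values
```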

If you know that a variable is a data frame but are not sure what labels are used to refer to the different columns you can use the names command:

names(tree)

##  [1] "C"      "N"      "CHBR"   "REP"    "LFBM"   "STBM"   "RTBM"   "LFNCC" 
##  [9] "STNCC"  "RTNCC"  "LFBCC"  "STBCC"  "RTBCC"  "LFCACC" "STCACC" "RTCACC"
## [17] "LFKCC"  "STKCC"  "RTKCC"  "LFMGCC" "STMGCC" "RTMGCC" "LFPCC"  "STPCC" 
## [25] "RTPCC"  "LFSCC"  "STSCC"  "RTSCC"

If you want to work with the data in one of the columns you give the name of the data frame, a dollar sign, and the label assigned to the column. For example, the first column in tree can be called using “tree$C:”

tree$C

##  [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
## [39] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4
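
A long column like this is easier to read as counts. The sketch below rebuilds the 54 values from the printed output above (8 ones, 23 twos, 10 threes, 13 fours) so that it runs without trees91.csv; with the real data you would call table(tree$C) directly. The name `co2_level` is made up for this example.

```r
# Rebuild the C column from the counts in the printed output
co2_level <- rep(1:4, times = c(8, 23, 10, 13))

length(co2_level)   # 54 observations
table(co2_level)    # number of trees at each carbon dioxide level
```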

The Rest of Your Assignment

Your task for the rest of this assignment is to go through the rest of the tutorial on http://www.cyclismo.org/tutorial/R/

Tips: 1) Add code chunks one by one and test to make sure that Knit PDF compiles properly. Sometimes even regular text can cause the compiler to fail. If the compiler fails, then close RStudio and re-open it to reset everything. As an example, the dollar sign can cause the compile to fail if it is used in regular text. The “tree$C” does not cause it to fail, however. I understand this can be frustrating. Your job is to insert code and text little by little to help you debug.

2 Basic Data Types

3 Basic Operations

Try to get all the way through the tutorial (sections 4, 5, 6, and so on) up to and including section 18.

18 Case Study II

Task 2:

Article 1: What are the main results and conclusions? Describe in two paragraphs.

Task 3:

Blog Site: Set up your account at https://edublogs.org/ and make your site nice!

Task 3a:

Include a screen shot of your blog as a .jpg. You will need to uncomment the code below in order for R markdown to compile your image. Uncomment the code for the ScreenShotBlog.png.

Here is how you uncomment the code:

Task 4:

Writing prompt: Where does your used water go? What is reproducible research? Write one paragraph.

An R and R Markdown Tutorial

Michelle Newcomer

Thursday July 23, 2015