Homework 1: Introduction to R

This document reviews some preliminaries of R. We will discuss this document in class on January 22nd, 2013. The assignment is due Tuesday, January 29. Submit an R script or an RMarkdown file via e-mail to Seth Spielman. If you are comfortable writing an RMarkdown file, you may publish it to RPubs and send me the URL (in lieu of e-mailing the file).

For help with RMarkdown see this link

A video walkthrough for this assignment is available here. (If the link does not work, log into D2L and go to Content > Homework and in class exercises > Homework 1 Walkthrough, in 3 Parts.)

Intro to R

R is a programming language. To use R you type commands at the command line (the “>” symbol in the console window). In this assignment R code is surrounded by a grey box, and the results of the R code are preceded by two hash symbols, “##”. For example:

1 + 3
## [1] 4

The above example simply uses R like a calculator; the “code” is a simple arithmetic expression. It is possible to type R commands directly into the console, but most people prefer to write a script and then send the commands to the console. In RStudio you can create a new script by going to File>New>R Script. Create a new RStudio script now.

Type “1+3” into your new script window. In the upper right of the script window you should see a button that says “run” (the icon has a green arrow). Click this button, and the code from your script will be sent to the console.

This line of code just uses R as a glorified calculator. R is a calculator, but it also has a memory. In R we instantiate objects in the computer's memory and give these objects names, which allows us to reference them later. Generally speaking, objects are created using the assignment operator <-.

a <- 1 + 3

This creates an object called a that holds the result of the line of code you ran. Objects in R can be incredibly complicated; for example, maps (shapefiles), data tables, and graphs are all types of R objects. The simple object a only holds the result of 1 + 3. To see the contents of an object, simply type its name into the console.

Two important things to note:

  1. Object names are case sensitive.
  2. The equal sign “=” is also an assignment operator. For the purposes of this course it behaves like the “<-” arrow assignment operator, but for clarity we will not use = as an assignment operator.
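
A brief sketch of both notes, assuming a fresh R session. The object name b is introduced here only for illustration:

```r
a <- 1 + 3   # create the object a
a            # typing the name prints its contents
## [1] 4

# Names are case sensitive: "a" exists, but "A" was never created.
exists("a")
## [1] TRUE
exists("A")
## [1] FALSE

# At the top level of a script, "=" assigns just as "<-" does.
b = 1 + 3
identical(a, b)
## [1] TRUE
```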

Using Functions in R

In R, functions accept objects as inputs, manipulate the inputs in some way, and return some output. For example, the function mean(object) returns the mean of an object (assuming the object is a vector, that is, an ordered series of numbers). The function c() is called the Combine Function; it combines a series of numbers (or words) into a new object.

## 'rain' contains actual rainfall data for Boulder, CO (2000-2011)
rain <- c(16, 18, 14, 22, 27, 17, 19, 17, 17, 22, 20, 22)

The object “rain” contains data, so we can calculate some descriptive statistics:

mean(rain) #returns the average rainfall from 2000-2011 in Boulder, CO
sum(rain) #returns the total amount of rainfall during the study period
length(rain) #returns the number of values in rain, i.e. the number of years of data

Try using the functions sum() and length() to calculate the mean amount of rainfall, then check your answer using the mean() function.

We can also calculate deviations from the mean for each year:

rain - mean(rain)  #Deviations from the mean; negative values indicate below average rainfall.
##  [1] -3.25 -1.25 -5.25  2.75  7.75 -2.25 -0.25 -2.25 -2.25  2.75  0.75
## [12]  2.75

We can use the assignment operator to save these deviations from the mean as a new object:

rainDeviations <- rain - mean(rain)

We can square these deviations, or take the square root of the original rainfall values:

rainDeviations^2  #Squared deviations from the mean
##  [1] 10.5625  1.5625 27.5625  7.5625 60.0625  5.0625  0.0625  5.0625
##  [9]  5.0625  7.5625  0.5625  7.5625
sqrt(rain)  #Square root of rainfall values
##  [1] 4.000 4.243 3.742 4.690 5.196 4.123 4.359 4.123 4.123 4.690 4.472
## [12] 4.690
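
One property of the deviations is worth checking before moving on: the positive and negative values cancel, so they sum to zero. A minimal sketch, re-creating the objects from above so that it runs by itself:

```r
# Re-create the data so this sketch runs on its own.
rain <- c(16, 18, 14, 22, 27, 17, 19, 17, 17, 22, 20, 22)
rainDeviations <- rain - mean(rain)

sum(rainDeviations)   # positive and negative deviations cancel
## [1] 0
mean(rainDeviations)  # so their average is zero as well
## [1] 0
```

With other data the result may be a tiny non-zero number (on the order of 1e-16) because of floating-point rounding; all.equal() is the usual way to compare such values to zero.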

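Conceptually, the standard deviation is like the average deviation from the mean. However, the average deviation from the mean is always zero, because the positive and negative deviations cancel. Squaring the deviations before averaging them avoids this cancellation. Thus, we calculate the standard deviation as: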

\[ s_N = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \overline{x})^2} \]

The standard deviation is the Root Mean Square (RMS) of the deviations from the mean. The above formula can be broken down into a series of simple steps:
1. Calculate the deviations from the mean (see above R code).
2. Square the deviations from the mean, save the squared deviations as a new R object (use the “<-” assignment operator).
3. Take the mean of these squared deviations. Again, save the results as an object.
4. Finally, take the square root of the result from the prior step.

**Using the four steps above, compute the standard deviation of the rainfall data.** You have the correct answer if you get something close to 3.4.

Writing functions in R

You have now seen that functions accept objects as inputs, manipulate those inputs, and return a value. You can very easily create your own functions in R:

## A function to calculate the mean of an object.
myMean <- function(someData) {
    return(sum(someData)/length(someData))
}

In the example above we create a function called “myMean”, which calculates the mean of an object using the sum() and length() functions. The function should give results identical to those of the mean() function; these two lines should give equivalent results:

myMean(rain)
## [1] 19.25
mean(rain)
## [1] 19.25

It's important to realize that, in the code used to create our function, “someData” is just a placeholder, not a real object. For example, typing myMean(someData) will produce an error:

myMean(someData)
## Error: object 'someData' not found

R gives an “object not found” error because we never created an object called “someData”, so R cannot find the object. In the function above, “someData” was just a placeholder for the object that the user passes to the myMean function.

someFakeData <- c(1, 2, 3, 4, 5, 6)
myMean(someFakeData)
## [1] 3.5
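
The placeholder name can also be written out when calling the function, which makes explicit which object is being passed in. A small sketch, assuming a fresh session and repeating the definitions from above so that it runs by itself:

```r
# Same function and data as above, repeated so this sketch is self-contained.
myMean <- function(someData) {
    return(sum(someData)/length(someData))
}
someFakeData <- c(1, 2, 3, 4, 5, 6)

# Name the placeholder explicitly: someFakeData is passed in as someData.
myMean(someData = someFakeData)
## [1] 3.5

# The placeholder exists only inside the function while it runs.
exists("someData")
## [1] FALSE
```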

The function mean() already exists, so the myMean function that we just created is not that useful. However, what if we wanted to convert our rainfall data from inches to centimeters? A function to do this conversion might be useful. We can do this conversion simply in R:

rain/0.3937  #convert all rainfall values from inches to cm (1 cm is about 0.3937 in).
##  [1] 40.64 45.72 35.56 55.88 68.58 43.18 48.26 43.18 43.18 55.88 50.80
## [12] 55.88

Now, complete the return statement below to create a function to convert inches to centimeters. Assume the input data are in inches and you want to return the same data converted to cm.

in_to_cm <- function(someDataInInches){
  return(_____) #complete the return statement 
}

Assignment

Write a function to compute the standard deviation using the four steps outlined above. Do not use the internal sd() function to check your work because it uses \( \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \overline{x})^2} \); note the N-1 in the denominator. Your function is correct if you find a standard deviation of around 3.39 for the rainfall data.
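
To get a feel for how much the denominator matters, here is a sketch on made-up numbers. The vector toy and the object n are hypothetical and not part of the assignment. Multiplying the result of sd() by sqrt((N-1)/N) converts it from the N-1 version to the N version:

```r
# Hypothetical toy data, used only to illustrate the two denominators.
toy <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(toy)

sd(toy)                      # built-in version, divides by N - 1 (about 2.14)
sd(toy) * sqrt((n - 1) / n)  # rescaled to divide by N (2, up to rounding)
```

The gap between the two versions shrinks as N grows, but with only a handful of values it is large enough to make a correct answer look wrong.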