Learning Objectives:

1) R and RStudio

2) R as a scientific calculator

Launch RStudio and notice the default position of the panes (or panels):

FYI: you can change the default location of the panes, among many other things (see Customizing RStudio). If you have no experience working with R/RStudio, you don’t have to customize anything right now. It’s better to wait a few days until you get a feel for the working environment. You will probably experiment (trial and error) with the customizing options for a while until you find what works for you.

2.1) First contact with the R console

If you have never used software in which you have to type commands and code, our best suggestion is that you begin typing basic things in the console, using R as a scientific calculator.

For instance, consider the monthly bills of Leia (a fictitious stats undergrad student):

  • cell phone $80
  • transportation $20
  • groceries $527
  • gym $10
  • rent $1500
  • other $83

You can use R to find Leia’s total expenses by typing these commands in the console:

# total expenses
80 + 20 + 527 + 10 + 1500 + 83
## [1] 2220

Often, it will be more convenient to create objects or variables that store one or more values. To do this, type the name of the variable, followed by the assignment operator <-, followed by the assigned value. For example, you can create an object phone for the cell phone bill, and then inspect the object by typing its name:

# objects with assigned values
phone <- 80
phone
## [1] 80

All R statements where you create objects are known as “assignments”, and they have this form:

object <- value

This means you assign a value to a given object; for example, you can read the assignment phone <- 80 as “phone gets 80”.

RStudio has a keyboard shortcut for the arrow operator <-: Alt + - (the minus sign).

Notice that RStudio automagically surrounds <- with spaces, which demonstrates a useful code formatting practice. So do yourself (and others) a favor by ALWAYS surrounding an assignment operator with spaces.

Alternatively, you can also use the equals sign = as an assignment operator:

coffee = 30
coffee
## [1] 30

You will be working with RStudio a lot, and you will have time to learn most of the bells and whistles RStudio provides. Think about RStudio as your “workbench”. Keep in mind that RStudio is NOT R. RStudio is an environment that makes it easier to work with R, while taking care of many of the little tasks that can be a hassle.

2.2) Your Turn

#2.2.1 - Make more assignments to create variables transportation, groceries, gym, rent, and other with their corresponding amounts.

# create variables
transportation <- 20
groceries <- 527
gym <- 10
rent <- 1500
other <- 83

#2.2.2 - Now that you have all the variables, create a total object with the sum of the expenses.

# total expenses
total_expenses <- phone + transportation + groceries + gym + rent + other
total_expenses
## [1] 2220

#2.2.3 - Assuming that Leia has the same expenses every month, how much would she spend during a school “semester”? (assume the semester involves five months).

# semester expenses
semester_expenses <- 5 * total_expenses
semester_expenses
## [1] 11100

#2.2.4 - Maintaining the same assumption about the monthly expenses, how much would Leia spend during a school “year”? (assume the academic year is 10 months).

# year expenses
year_expenses <- total_expenses * 10
year_expenses
## [1] 22200

2.3) Object Names

There are certain rules you have to follow when creating objects and variables. Object names cannot start with a digit and cannot contain certain other characters such as a comma or a space. You will be wise to adopt a convention for demarcating words in names.

i_use_snake_case
other.people.use.periods
evenOthersUseCamelCase

The following are invalid names (and invalid assignments):

# cannot start with a number
5variable <- 5

# cannot start with an underscore
_invalid <- 10

# cannot contain comma
my,variable <- 3

# cannot contain spaces
my variable <- 1

This is fine but a little bit too much:

this_is_a_really_long_name <- 3.5
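
If you are unsure whether a name is valid, one option is to ask R itself. The sketch below uses base R's make.names(), which turns arbitrary strings into syntactically valid names. The vector candidate_names and the comparison trick are illustrative choices, not part of the lab.

```r
# candidate names: the first four mirror the invalid examples above
candidate_names <- c("5variable", "_invalid", "my,variable",
                     "my variable", "i_use_snake_case")

# make.names() repairs each string into a syntactically valid name
repaired_names <- make.names(candidate_names)
repaired_names

# a name is already valid when make.names() leaves it unchanged
is_valid <- candidate_names == repaired_names
is_valid
```

Only the last candidate comes back unchanged, which matches the rules listed in this section.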

2.4) Functions

R has many functions. To use a function, type its name followed by parentheses. Inside the parentheses you pass an input. Most functions will produce some type of output:

# absolute value
abs(10)
abs(-4)

# square root
sqrt(9)

# natural logarithm
log(2)
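
Many functions accept more than one input, and inputs can be passed by name. A small sketch using the base R functions log() and round(); the object names log_base2 and rounded_root are made up for illustration:

```r
# logarithm of 8 in base 2, passing the 'base' argument by name
log_base2 <- log(8, base = 2)
log_base2

# functions can be nested: round the square root of 2 to 3 decimals
rounded_root <- round(sqrt(2), digits = 3)
rounded_root
```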

2.5) Comments in R

All programming languages use a set of characters to indicate that a specific part or line of code is a comment, that is, something that is not to be executed. R uses the hash or pound symbol # to specify comments. Any code to the right of # will not be executed by R.

# this is a comment
# this is another comment
2 * 9

4 + 5  # you can place comments like this

2.6) Case Sensitive

R is case sensitive. This means that phone is not the same as Phone or PHONE:

# case sensitive
phone <- 80
Phone <- -80
PHONE <- 8000

phone + Phone
## [1] 0
PHONE - phone
## [1] 7920

2.7) Your turn

Take your objects (i.e. variables) phone, transportation, groceries, gym, rent, and other and pass them inside the combine function c() to create a vector expenses.

2.7.1

# vector expenses
Vector_expenses <- c(phone, transportation, groceries, gym, rent, other)
Vector_expenses
## [1]   80   20  527   10 1500   83
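
Once the values live in a vector, functions can operate on all of them at once. The sketch below rebuilds the vector with named elements (an optional variation, not required by the exercise) and uses sum() to recover the monthly total computed earlier:

```r
# same amounts as above, with a name attached to each element
expenses <- c(phone = 80, transportation = 20, groceries = 527,
              gym = 10, rent = 1500, other = 83)

# total monthly expenses
monthly_total <- sum(expenses)
monthly_total   # 2220

# arithmetic is applied element by element: five months of each expense
expenses * 5

# elements can be retrieved by name
expenses["rent"]
```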

#2.7.2 Now, use the graphing function barplot() to produce a barchart of expenses:

barplot(Vector_expenses)

#2.7.3 You can see the manual (help) document of barplot() by typing ?barplot; look at the arguments of this function and learn how to: use col to color the bars

# barchart with colored bars
barplot(Vector_expenses, col = "green")

#2.7.4 use horiz to get a barchart in horizontal orientation

# barchart horizontally oriented
barplot(Vector_expenses, horiz = TRUE)

#2.7.5 use names.arg to add labels to the bars (with names of the variables below each of the bars)

# barchart with bar labels
barplot(Vector_expenses,
        names.arg = c("phone", "transportation", "groceries", "gym", "rent", "other"),
        cex.names = 0.5)

#2.7.6 Find out how to use sort() to sort the elements in expenses, in order to produce a barchart with bars in decreasing order. Optionally, see if you can find out how to display the value of each variable at the top of its bar.

# barchart in decreasing order
# attach the names to the values so each label stays with its bar after sorting
names(Vector_expenses) <- c("phone", "transportation", "groceries", "gym", "rent", "other")
descending <- sort(Vector_expenses, decreasing = TRUE)
barplot(descending, cex.names = 0.5)
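
For the optional part of the exercise, one possible approach (a sketch, not the only way) relies on the fact that barplot() returns the horizontal positions of the bars, which text() can then use to place labels. The names expenses, sorted_expenses, and bar_midpoints are chosen here for illustration:

```r
# named expenses, so that labels travel with their values when sorted
expenses <- c(phone = 80, transportation = 20, groceries = 527,
              gym = 10, rent = 1500, other = 83)
sorted_expenses <- sort(expenses, decreasing = TRUE)

# barplot() returns the x-coordinates of the bar midpoints;
# ylim is extended a little to leave room for the labels
bar_midpoints <- barplot(sorted_expenses,
                         cex.names = 0.5,
                         ylim = c(0, max(sorted_expenses) * 1.15))

# pos = 3 places each label just above the given (x, y) point
text(x = bar_midpoints, y = sorted_expenses,
     labels = sorted_expenses, pos = 3, cex = 0.8)
```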

3) Introduction to R Markdown files

Most of the time you won’t be working directly on the console. Instead, you will be typing your commands in some source file. The most basic type of source file is known as an R script file, but there are more flavors. A very convenient type of source file that allows you to mix R code with narrative is an R Markdown file, commonly referred to as an Rmd file.

3.1) Get to know the Rmd files

In the menu bar of RStudio, click on File, then New File, and choose R Markdown. Select the default option (Document), and click Ok.

Rmd files are a special type of file, referred to as a dynamic document, that allows you to combine narrative (text) with R code. Because you will be turning in most homework assignments as Rmd files, it is important that you quickly become familiar with this resource.

Locate the button Knit HTML (the one with a knitting icon) and click on it so you can see how Rmd files are rendered and displayed as HTML documents.

R Markdown files use a special syntax called markdown. To be more precise, Rmd files let you type text using three kinds of syntax: 1) R syntax for code that needs to be executed; 2) markdown syntax to write your narrative; and 3) LaTeX syntax for math equations and symbols.

You will have time to learn the basics of this syntax during the semester, and we expect that you will feel comfortable with markdown by the end of the course.

3.2) Your turn

Open a new Rmd file in the source pane, and include all the previous commands in separate code chunks. Knit the file to get an HTML document. These are the files that you have to submit to bCourses when you finish the lab work.

4) Installing Packages

R comes with a large set of functions and packages. A package is a collection of functions that have been designed for a specific purpose. One of the great advantages of R is that many analysts, scientists, programmers, and users can create their own packages and make them available for everybody to use. R packages can be shared in different ways. The most common way to share a package is to submit it to what is known as CRAN, the Comprehensive R Archive Network.

You can install a package using the install.packages() function. To do this, we recommend that you run this command directly on the console. Do NOT include this command in a code chunk of an Rmd file: you will very likely get an error message when knitting the Rmd file.

To use install.packages(), just give it the name of a package, surrounded by quotes. R will look for the package in CRAN and, if it finds it, download it to your computer.

# installing (run this on the console!)
install.packages("knitr")

You can also install a bunch of packages at once:

# run this command on the console!
install.packages(c("readr", "ggplot2"))

Once you have installed a package, you can start using its functions by loading the package with the function library(). By the way, when working on an Rmd file that uses functions from a given package, you MUST include a code chunk with the library() command.

# (this command can be included in an Rmd file)
library(knitr)
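
Two related base R tools are sometimes handy here: requireNamespace() reports whether a package is installed without attaching it, and the package::function syntax calls a single function without library(). The sketch uses the stats package, which ships with R, so it should run on any installation; the object names are illustrative.

```r
# TRUE if the package is installed; nothing is attached to the session
has_stats <- requireNamespace("stats", quietly = TRUE)
has_stats

# call one function from a package without loading the whole package
median_expense <- stats::median(c(80, 20, 527, 10, 1500, 83))
median_expense
```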

5) Getting Help

Because we work with functions all the time, it’s important to know certain details about how to use them: what inputs are required, and what output is returned.

There are several ways to get help.

If you know the name of a function you want to learn more about, you can use the function help() and pass it the name of that function:

# documentation about the 'abs' function
help(abs)

# documentation about the 'mean' function
help(mean)

Alternatively, you can use a shortcut using the question mark ? followed by the name of the function:

# documentation about the 'abs' function
?abs

# documentation about the 'mean' function
?mean
  • How to read the manual documentation
    • Title
    • Description
    • Usage of function
    • Arguments
    • Details
    • See Also
    • Examples!!!

help() only works if you know the name of the function you are looking for. Sometimes, however, you don’t know the name but you may know some keywords. To look for functions associated with a keyword, use help.search() or simply the double question mark ??:

# search for 'absolute'
help.search("absolute")

# alternatively you can also search like this:
??absolute

Notice the use of quotes surrounding the input name inside help.search().
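
Two more base R helpers complement help() and help.search(): apropos() lists the objects whose names contain a pattern, and args() prints the arguments of a function. A brief sketch (the exact list of matches will vary with the packages you have loaded):

```r
# names of visible objects containing "binom"
binom_matches <- apropos("binom")
binom_matches

# quick look at the arguments of one of the matches
args(dbinom)
```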

5.1) Your Turn: Pythagoras formula

The Pythagoras formula is used to compute the length of the hypotenuse, \(c\), of a right triangle with legs of length \(a\) and \(b\).

\[ c = \sqrt{a^2 + b^2} \]

#5.1.1 Calculate the hypotenuse of a right triangle with legs of length 3 and 4. Use the sqrt() function, and create variables a = 3 and b = 4. If you don’t know which symbol is used to calculate exponents, search the help documentation of the arithmetic operators: ?Arithmetic.

length1 <- 3
length2 <- 4
hypotenuse <- sqrt(length1^2 + length2^2)
hypotenuse
## [1] 5
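
If you need the same calculation for several triangles, one option is to wrap the formula in a function. This goes slightly beyond the exercise; the function name hypotenuse_length is hypothetical and chosen here so that it does not overwrite the hypotenuse object above.

```r
# length of the hypotenuse for legs a and b (works on vectors too)
hypotenuse_length <- function(a, b) {
  sqrt(a^2 + b^2)
}

# single triangle with legs 3 and 4
hypotenuse_length(3, 4)

# several triangles at once: legs (3, 4), (5, 12), and (8, 15)
hypotenuse_length(c(3, 5, 8), c(4, 12, 15))
```

Because sqrt() and ^ operate element by element, the second call returns one hypotenuse per pair of legs.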

5.2) Your Turn: Binomial Formula

The formula for the binomial probability is:

\[ Pr(k; n, p) = Pr(X = k) = {n \choose k} p^k (1-p)^{n-k} \]

where:

  • \(n\) is the number of (fixed) trials
  • \(p\) is the probability of success on each trial
  • \(1 - p\) is the probability of failure on each trial
  • \(k\) is a variable that represents the number of successes out of \(n\) trials
  • the first term in parentheses is not a fraction, it is the number of combinations in which \(k\) successes can occur in \(n\) trials

R provides the choose() function to compute the number of combinations:

\[ {n \choose k} = \frac{n (n-1) \cdots (n - k +1)}{k (k-1) \cdots 1} \]

For instance, the number of combinations in which \(k\) = 2 successes can occur in \(n\) = 5 trials is:

choose(n = 5, k = 2)
## [1] 10

Combinations are typically expressed in terms of factorials as:

\[ \frac{n!}{k! (n - k)!} \]

Conveniently, R also provides the function factorial() to calculate the factorial of an integer:

factorial(4)
## [1] 24

Let’s consider a simple example. A fair coin is tossed 5 times. What is the probability of getting exactly 2 heads?

#5.2.1 - Create the objects n, k, and p for the number of trials, the number of successes, and the probability of success, respectively.

n <- 5
k <- 2
p <- 1/2

#5.2.2 - Use factorial() to compute the number of combinations “n choose k”

combinations <- factorial(n) / (factorial(k) * factorial(n - k))
combinations
## [1] 10

#5.2.3 - Apply the binomial formula, using factorial(), to calculate the probability of getting exactly 2 heads out of 5 tosses.

prob <- combinations * (p^k) * (1 - p)^(n - k)
prob
## [1] 0.3125

#5.2.4 - Recalculate the same probability but now using choose() (instead of factorial())

prob2 <- choose(n, k) * (p^k) * (1 - p)^(n - k)
prob2
## [1] 0.3125

#5.2.5 - Consider rolling a fair die 10 times. What is the probability of getting exactly 3 sixes?

n <- 10
k <- 3
p <- 1/6
prob3 <- choose(n, k) * (p^k) * (1 - p)^(n - k)
prob3
## [1] 0.1550454

#5.2.6 - Now look for help documentation (e.g. help.search() or ??) using the keyword binomial.

help.search('binomial')
## starting httpd help server ... done

#5.2.7 - You should get a list of topics related to the searched term binomial.

  • Choose the one related to the Binomial Distribution, which is part of the R package stats (i.e. stats::Binomial).

  • Read the documentation and figure out how to use the dbinom() function to obtain the above probabilities: 2 heads in 5 coin tosses, and 3 sixes in 10 rolls of a die.

# 2 heads in 5 coin tosses
coin_dbinom <- dbinom(2, 5, 1/2)
coin_dbinom
## [1] 0.3125
# 3 sixes in 10 rolls of a die
die_dbinom <- dbinom(3, 10, 1/6)
die_dbinom
## [1] 0.1550454
  • How would you modify the previous dbinom() call to calculate the same probability (2 heads in 5 tosses) for a biased coin with a 35% chance of heads?

# biased coin
biased_dbinom <- dbinom(2, 5, 0.35)
biased_dbinom
## [1] 0.3364156

  • Finally, obtain the probability of getting more than 3 heads in 5 tosses with a biased coin that has a 35% chance of heads.

# more than 3 heads means 4 or 5 heads
biased_dbinom_2 <- dbinom(4, 5, 0.35) + dbinom(5, 5, 0.35)
biased_dbinom_2
## [1] 0.0540225
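
The last probability can also be obtained in other ways. The sketch below shows two alternatives from the same stats::Binomial help page: summing dbinom() over a vector of values, and using the cumulative function pbinom() with lower.tail = FALSE, which gives P(X > 3). The object names are illustrative.

```r
n_tosses <- 5
p_heads <- 0.35

# dbinom() is vectorized: add the probabilities of 4 and 5 heads
prob_by_sum <- sum(dbinom(4:5, size = n_tosses, prob = p_heads))
prob_by_sum

# upper tail of the distribution: probability of more than 3 heads
prob_by_tail <- pbinom(3, size = n_tosses, prob = p_heads, lower.tail = FALSE)
prob_by_tail
```

Both values should agree with the 0.0540225 computed above.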

5.3) Your turn

This part doesn’t need to be included in your Rmd file. Instead, type commands directly on the console:

#5.3.1

  • On the console, install packages "stringr", "RColorBrewer", and "XML"

install.packages(c("stringr", "RColorBrewer", "XML"), repos = "http://cran.us.r-project.org")
## Installing packages into 'C:/Users/Owner/Documents/R/win-library/4.0'
## (as 'lib' is unspecified)
## package 'stringr' successfully unpacked and MD5 sums checked
## package 'RColorBrewer' successfully unpacked and MD5 sums checked
## package 'XML' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\Owner\AppData\Local\Temp\Rtmpu28V4i\downloaded_packages

#5.3.2 - Calculate: \(3x^2 + 4x + 8\) when \(x = 2\)

x <- 2
3 * x^2 + 4 * x + 8
## [1] 28

#5.3.3 - Calculate: \(3x^2 + 4x + 8\) but now with a numeric sequence for \(x\) using x <- -3:3

x <- -3:3
3 * x^2 + 4 * x + 8
## [1] 23 12  7  8 15 28 47

#5.3.4 - Find out how to look for information about math binary operators like + or ^ (without using ?Arithmetic).

help("Arithmetic")

#5.3.4 - There are several tabs in the pane Files, Plots, Packages, Help, Viewer. Find out what the tab Files is good for.

# The Files tab is a file browser: it lists the files and folders in the current directory and lets you navigate to, open, rename, or delete them.

#5.3.5 - What about the tab Help?

# It displays the documentation pages for functions, data sets, and packages, for example the page opened by ?mean or help(mean).

#5.3.6 - In the tab Help, what happens when you click the button with a House icon?

# It returns to the Help home page, which links to R resources, manuals, reference material, and support pages.

#5.3.7 - Now go to the tab History. What is it good for? And what about the buttons of its associated menu bar?

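# It lists the commands that have already been run in the console, so you can review earlier work. The buttons in its menu bar let you load or save the history, send selected commands back to the console or to a source file, and remove entries.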

#5.3.8 - Likewise, what can you say about the tab Environment?

# It lists the objects (variables, data, and functions) that currently exist in the workspace, by default those in the global environment.