STOR 455 Section 1 (Cunningham)

1. Download the Tools for Using R and Publishing with R Markdown

Assuming you are new to R, you will need to begin by downloading the latest version of the programming language R as well as RStudio, the Integrated Development Environment (IDE) that we will use to do our computing work and some publishing of that work. I strongly recommend getting the free version of RStudio.

2. Open RStudio and Complete the Following Tutorial

(If you are experienced with R, skip whatever you like.)

Copy and paste the lines in the gray box into the “Console” pane in RStudio and hit return.

#Introduction to R tutorial for STOR 455 Section 1
#authored by Robin Cunningham

#This script is based on "R Programming for Data Science" by Roger Peng and the work of Iain Carmichael

You will notice that nothing happens. The ‘#’ symbol marks a comment in R and tells the interpreter to ignore whatever comes after ‘#’ on a given line. This tool allows you to put comments in your programs and scripts, making them easier to read, change and grade.

R is a simple programming language that we will use extensively in STOR 455. One use of a programming language is to allow you to save numbers as variables. Type the following script into the Console:

a <- 2 #assign the number 2 to the variable a
b <- 1 
c = 3 #you can use <- or = 
d <- 4.5

Now type in the following and hit return after each line.

a
## [1] 2
b
## [1] 1
c
## [1] 3
d
## [1] 4.5

Fairly boring, but you should get the point: once you have assigned numbers to variables, R keeps them around. A programming language also lets us do computations with variables. Check the results for each of the following:

a + d
b*c
c/b
c^b
c^2
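
As a check on the arithmetic above, here is a small sketch of what R should return, assuming a, b, c and d hold the values assigned earlier (they are reassigned here so the block runs on its own):

```r
# Same assignments as earlier in the tutorial
a <- 2
b <- 1
c <- 3
d <- 4.5

a + d   # addition
## [1] 6.5
b * c   # multiplication
## [1] 3
c / b   # division
## [1] 3
c^b     # 3 raised to the power 1
## [1] 3
c^2     # 3 squared
## [1] 9
```

If your Console shows different numbers, one of the variables was probably reassigned along the way; rerun the four assignment lines and try again.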

We can also assign text to variables. Note that text data is called a “string” or “characters”.

msg <- "hello world"

You can print a variable by using the print() function:

print(msg)

Or by just running the variable:

msg
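
If you are curious how R itself labels these two kinds of values, the base function class() reports the type of a variable. This is only a quick sketch and is not part of the original script; the variable names are the ones used above, reassigned here so the block runs on its own:

```r
a <- 2                 # a number
msg <- "hello world"   # a piece of text

class(a)
## [1] "numeric"
class(msg)
## [1] "character"
```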

What other types of variables can we store? (There are many)

Logical or Boolean variables

You may remember these from truth tables in STOR 215 if you have taken that course.

A <- TRUE
B <- FALSE

“True or False”. Note “|” means “or” in R.

A | B

“True and False”. Note “&” means “and” in R.

A & B
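
For reference, the two expressions above should evaluate as follows. This is a minimal sketch that reassigns A and B so it can be pasted into the Console on its own:

```r
A <- TRUE
B <- FALSE

A | B   # "or": TRUE if at least one side is TRUE
## [1] TRUE
A & B   # "and": TRUE only if both sides are TRUE
## [1] FALSE
```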

Vectors

Let’s store the populations of the 5 largest cities in NC (from Google) and then run some standard statistical function calls. Note these commands; you will need them! The notation ‘c(3, 4, 6)’ for a vector will be explained in the SWIRL tutorial. For the record, the letter ‘c’ stands for “concatenate”.

pop <- c(792862, 431746, 279639, 245475, 236441)
pop
mean(pop)
median(pop)
min(pop)
max(pop)
sd(pop) #standard deviation
pop[2] #Get the second element of the vector
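
To see what mean() is doing, you can compute the same quantity by hand. The sketch below assumes the pop vector defined above (reassigned here so the block runs on its own) and uses only the base functions sum() and length():

```r
pop <- c(792862, 431746, 279639, 245475, 236441)

length(pop)                # number of elements in the vector
## [1] 5
sum(pop)                   # total population of the five cities
by_hand <- sum(pop) / length(pop)
all.equal(by_hand, mean(pop))   # the hand computation agrees with mean()
## [1] TRUE
```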

If you just want a vector of the integers from 1 to 5, you could use

vec <- 1:5

We might as well have a look at it since this will be an important tool:

vec

You can also store vectors of strings

cities <- c("Charlotte", "Raleigh", "Greensboro", "Durham", "Winston-Salem")

Or a vector of booleans, such as which of these cities Caroline has visited. Note that the order matters: each entry lines up with the city in the same position.

has_visited <- c(FALSE, TRUE, FALSE, T, F) #can use TRUE or T (FALSE or F)
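
One way to see why the order matters is to use the logical vector to pick out elements of cities. This is a small sketch beyond the original script, assuming the two vectors defined above (reassigned here so the block runs on its own):

```r
cities <- c("Charlotte", "Raleigh", "Greensboro", "Durham", "Winston-Salem")
has_visited <- c(FALSE, TRUE, FALSE, TRUE, FALSE)

# Keep only the cities whose matching entry in has_visited is TRUE
cities[has_visited]
## [1] "Raleigh" "Durham"
```

If the entries of has_visited were shuffled, a different set of cities would come back, even though the same number of TRUE values is present.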

Data frames (picture an Excel table)

This will be the last topic for this tutorial, but don’t worry, there is plenty more practice available! (See the document “SWIRL Tutorial” on Sakai.)

df <- data.frame(cities, pop, has_visited)

print(df)
df #these are equivalent

We will do more with data frames later. For now, notice the column names:

names(df)

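We can access the columns of a data frame in several different ways:

df['pop'] #by name; returns a one-column data frame
df$pop #by name; returns the column as a vector
df[,1] #by position; returns the first column (cities)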
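
These access styles do not all return the same kind of object. The sketch below, which rebuilds the data frame so it runs on its own, uses the base function class() to show the difference between single-bracket and $ access:

```r
cities <- c("Charlotte", "Raleigh", "Greensboro", "Durham", "Winston-Salem")
pop <- c(792862, 431746, 279639, 245475, 236441)
has_visited <- c(FALSE, TRUE, FALSE, TRUE, FALSE)
df <- data.frame(cities, pop, has_visited)

class(df['pop'])   # single brackets keep the data frame structure
## [1] "data.frame"
class(df$pop)      # $ pulls the column out as a plain vector
## [1] "numeric"
identical(df[, 'pop'], df$pop)   # [, name] also returns the plain vector
## [1] TRUE
```

Functions such as mean() expect a vector, so df$pop is usually the form you want when computing statistics on a column.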

Or the rows:

df[1,]
df[2,]

Or an individual entry (row first, then column):

df[1,2]
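
The same entry can be reached by column name instead of column number, which is often easier to read. This is a minimal sketch that rebuilds the data frame so it runs on its own:

```r
cities <- c("Charlotte", "Raleigh", "Greensboro", "Durham", "Winston-Salem")
pop <- c(792862, 431746, 279639, 245475, 236441)
has_visited <- c(FALSE, TRUE, FALSE, TRUE, FALSE)
df <- data.frame(cities, pop, has_visited)

df[1, 2]       # row 1, column 2
## [1] 792862
df[1, 'pop']   # row 1, column named "pop"
## [1] 792862
df$pop[1]      # first element of the pop column
## [1] 792862
```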

Simple Plot

Ok, this is REALLY the last topic. Now let’s make a plot (highlight both lines below and run them together). You will get more explicit instruction on this and all of the above topics later.

barplot(df$pop, names.arg = df$cities,
        main = "Five largest cities in NC", ylab = "population", xlab = "city", col = "blue")