Assuming you are new to R, you will need to begin by downloading the latest version of the programming language R as well as RStudio, the Integrated Development Environment (IDE) that we will use to do our computing work and some publishing of that work. I strongly recommend getting the free version of RStudio.
(If you are experienced with R, skip whatever you like.)
Copy and paste the lines in the gray box into the “Console” box in RStudio and hit return.
#Introduction to R tutorial for STOR 455 Section 1
#authored by Robin Cunningham
#This script is based on "R Programming for Data Science" by Roger Peng and work of Iain Carmichael
You will notice that nothing happens. The ‘#’ symbol marks a comment in R: it tells R to ignore whatever comes after ‘#’ on a given line. This lets you put comments in your programs and scripts, making them easier to read, change and grade.
R is a simple programming language that we will use in STOR 151. One use of a programming language is to allow you to save numbers as variables. Type the following script into the Console:
a <- 2 #assign the number 2 to the variable a
b <- 1
c = 3 #you can use <- or =
d <- 4.5
Now type in the following and hit return after each line.
a
## [1] 2
b
## [1] 1
c
## [1] 3
d
## [1] 4.5
Fairly boring, but you should get the point: once you have assigned numbers to variables, R keeps them around. A programming language also lets us do computations with variables. Check the results for each of the following:
a + d
b*c
c/b
c^b
c^2
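If you want to check your answers, here is a short sketch of what the Console should show, assuming a, b, c and d still hold the values assigned above (they are re-assigned here so the block runs on its own). The lines starting with ‘##’ are the expected output, not something you type.

```r
# Re-create the variables from earlier in the tutorial
a <- 2
b <- 1
c <- 3
d <- 4.5

a + d   # addition
## [1] 6.5
b * c   # multiplication
## [1] 3
c / b   # division
## [1] 3
c^b     # c raised to the power b
## [1] 3
c^2     # c squared
## [1] 9
```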
We can also assign text to variables. Note that text data is called a “string” or “characters”.
msg <- "hello world"
You can print a variable by using the print() function:
print(msg)
Or by just running the variable name:
msg
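As a small sketch of what to expect, both ways of printing show the same thing. The calls to class() and nchar() are not part of the original script; they are standard R functions added here only to show that R labels text as “character” data and can count the characters in it.

```r
msg <- "hello world"

print(msg)   # explicit print
## [1] "hello world"
msg          # running the name alone prints it too
## [1] "hello world"

class(msg)   # the type R uses for text
## [1] "character"
nchar(msg)   # number of characters, counting the space
## [1] 11
```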
What other types of variables can we store? (There are many)
#Logical or Boolean variables
You can look at this section for fun if you like, but it is OK to skip it.
A <- TRUE
B <- FALSE
“True or False”. Note that “|” means “or” in R.
A | B
“True and False”. Note that “&” means “and”.
A & B
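Here is a sketch of the results you should see. “Or” is TRUE when at least one side is TRUE, while “and” is TRUE only when both sides are. The ‘!’ (“not”) operator and the class() call are extras that do not appear in the original script; they are included only to round out the picture.

```r
A <- TRUE
B <- FALSE

A | B      # "or": at least one is TRUE
## [1] TRUE
A & B      # "and": both must be TRUE
## [1] FALSE
!A         # "not": flips TRUE to FALSE (extra, not in the original script)
## [1] FALSE
class(A)   # R calls this type "logical"
## [1] "logical"
```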
#Vectors
Let’s store the populations of the 5 largest cities in NC (from Google) and then run some standard statistical function calls. Note these commands; you will need them! The notation ‘c(3, 4, 6)’ for a vector will be explained in the SWIRL tutorial. For the record, the letter ‘c’ stands for “combine” (you will also see it described as “concatenate”).
pop <- c(792862, 431746, 279639, 245475, 236441)
pop
mean(pop) # we will define mean shortly
median(pop) #we will define median shortly
min(pop)
max(pop)
sd(pop) #standard deviation - a term we will learn soon
pop[2] #Get the second element of the vector
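To make these commands a little less mysterious, here is a sketch that stores a few of the results and re-computes the standard deviation by hand. It relies on the fact that sd() in R is the sample standard deviation, which divides by n - 1. The names pop_mean and manual_sd are made up for this illustration.

```r
pop <- c(792862, 431746, 279639, 245475, 236441)

pop_mean <- mean(pop)   # add up the values and divide by how many there are
pop_mean
## [1] 397232.6
median(pop)             # the middle value once the numbers are sorted
## [1] 279639

# Sample standard deviation by hand: square the distances from the mean,
# add them up, divide by n - 1, then take the square root
manual_sd <- sqrt(sum((pop - pop_mean)^2) / (length(pop) - 1))
all.equal(manual_sd, sd(pop))   # matches the built-in sd()
## [1] TRUE

pop[2]                  # the second element
## [1] 431746
```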
If you just want a vector of the integers from 1 to 5, you could use
vec <- 1:5
We might as well have a look at it since this will be an important tool:
vec
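A brief sketch of what the colon notation gives you and why it is handy. The same notation can pick out a run of elements from a vector. The length(), sum() and vec[2:4] lines are additions for illustration and are not part of the original script.

```r
vec <- 1:5

vec           # the integers from 1 to 5
## [1] 1 2 3 4 5
length(vec)   # how many elements
## [1] 5
sum(vec)      # 1 + 2 + 3 + 4 + 5
## [1] 15
vec[2:4]      # elements 2 through 4
## [1] 2 3 4
```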
You can also store vectors of strings
cities <- c("Charlotte", "Raleigh", "Greensboro", "Durham", "Winston-Salem")
Or a vector of booleans, here the cities Caroline has visited. Note that the order matters: each value lines up with the city in the same position of cities.
has_visited <- c(FALSE, TRUE, FALSE, T, F) #can use TRUE or T (FALSE or F)
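One way to see why the order matters is to use the boolean vector to pick out entries of the city vector. This is only a sketch: selecting with a logical vector inside square brackets is not covered in the original script, and TRUE and FALSE are spelled out here rather than abbreviated.

```r
cities <- c("Charlotte", "Raleigh", "Greensboro", "Durham", "Winston-Salem")
has_visited <- c(FALSE, TRUE, FALSE, TRUE, FALSE)

# Keep only the cities whose matching entry in has_visited is TRUE
cities[has_visited]
## [1] "Raleigh" "Durham"

# TRUE counts as 1 and FALSE as 0, so sum() counts the cities visited
sum(has_visited)
## [1] 2
```

If the values in has_visited were shuffled, the same command would report different cities, which is why each TRUE or FALSE has to sit in the same position as its city.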
#Simple Plot
OK, this is the last topic. Let’s make a plot: highlight both lines below and run them. You will get more explicit instruction on this and all of the above topics later. Note that the bar plot that results from this command may look a little ‘wrong’. We will talk about how to make it look nice in class Friday.
barplot(pop, names.arg = cities, #names.arg labels each bar with its city
        main = "Five largest cities in NC", ylab = "population", xlab = "city", col = "blue")