Of course, R can work as a simple calculator for things like
3 - 4 = -1 or 3 ^ 2 = 9. Here are a few
common arithmetic operations you might use in R. Note that there are
many more, but these are commonly used!
# This is a comment (words following the hashtag). I use this to annotate my code so that it is easier for me to look back and understand what I did.
1 + 7
## [1] 8
16 / 2
## [1] 8
2 * 4
## [1] 8
13 - 5
## [1] 8
2 ^ 3 # "2 raised to the 3rd power"
## [1] 8
100 %% 92 # remainder
## [1] 8
65 %/% 8 # number of times 8 goes into 65 cleanly
## [1] 8
log(256, 2) #log, base 2, of 256; default of log function is natural log (base e)
## [1] 8
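As a quick check on the last few operators, the sketch below (my own illustration, not part of the examples above) shows how %/% and %% fit together, and how the base argument of log() compares with the shortcut functions log2() and log10(). The names quotient and remainder are just for illustration.

```r
# Integer division and remainder fit together:
# dividend = divisor * quotient + remainder
quotient  <- 65 %/% 8   # 8 goes into 65 eight whole times
remainder <- 65 %% 8    # ...with 1 left over
8 * quotient + remainder == 65   # TRUE

# log() uses base e unless told otherwise
log(exp(1))          # natural log of e
log(256, base = 2)   # same call as log(256, 2) above, with the argument named
log2(256)            # shortcut for base 2
log10(1000)          # shortcut for base 10
```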
You can also use R to compare objects, even characters! You will get a logical value (TRUE, FALSE, or NA) in return:
5 > 3
## [1] TRUE
6 == 7 #note that for comparisons, "==" is used instead of "="
## [1] FALSE
"cat" == "dog"
## [1] FALSE
4 <= 4
## [1] TRUE
6 != 2 # "!=" denotes "not equal to"
## [1] TRUE
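The examples above only return TRUE or FALSE. Here is a minimal sketch of where the third value, NA, comes from (my own illustration): comparing against a missing value gives NA because R cannot know the answer. Character comparisons with < and > follow alphabetical order, though the exact ordering can depend on your locale.

```r
NA > 3               # NA: the comparison cannot be decided
NA == NA             # also NA, so use is.na() to test for missing values
is.na(NA)            # TRUE
"cat" != "dog"       # TRUE: the two strings differ
"apple" < "banana"   # alphabetical comparison (ordering can vary by locale)
```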
AVERAGE(), the inputs that go into the parentheses are
manipulated by the function machine to produce the output of interest–in
this case, the mean of the set of numeric input values. R functions are
the same, though the function names may differ (often the case for
different programming languages). That’s why learning a programming
language is like learning a new language–the more you use it, the easier
and more intuitive it becomes!
# Assign information to variables (w, x, y, z) to create vectors using "<-" or "=". The `c()` function combines (concatenates) its inputs. Think of it as just linking the individual items together in a series.
w = c(TRUE, FALSE, NA)
x <- c(1, 2, 3, 4)
y <- c("a", "b", "c")
z = 5
# Do you see them in your global environment (to the right)?
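If you prefer the console to the environment pane, the same check can be done in code. This is a small sketch using the base functions ls() and exists(); the vectors are recreated so the block runs on its own.

```r
# Recreate the vectors from above
w <- c(TRUE, FALSE, NA)
x <- c(1, 2, 3, 4)
y <- c("a", "b", "c")
z <- 5

ls()          # lists the names of the objects in the current environment
exists("x")   # TRUE if an object named "x" can be found
length(x)     # number of items in the vector
```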
Now say I want to view them and check what type of vector they are:
# Print vectors using `print()` function to see the values of w, x, y, and z
print(w); print(x) ; print(y); print(z)
## [1] TRUE FALSE NA
## [1] 1 2 3 4
## [1] "a" "b" "c"
## [1] 5
# Print the class (type) of each vector using `class()` function
print(class(w)) ; print(class(x)); print(class(y)); print(class(z))
## [1] "logical"
## [1] "numeric"
## [1] "character"
## [1] "numeric"
Note how I can embed functions within other functions. I can do this when the output of the inner function is an acceptable input for the outer function. Can you think of why I might embed functions in functions?
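To make the nesting idea concrete, here is a small sketch (x_class is a hypothetical intermediate variable). Both versions give the same result; the nested one just skips the extra step of storing the inner output.

```r
x <- c(1, 2, 3, 4)   # same x as above

# Step by step: store the inner result, then pass it on
x_class <- class(x)
print(x_class)

# Nested: the output of class() goes straight into print()
print(class(x))

# Nesting works with any compatible pair of functions
sqrt(sum(x ^ 2))   # square each item, add them up, then take the square root
```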
As I alluded to above, I can also perform operations on objects like vectors. Check out the examples below:
#recall x from above.
x - 1 ; x / 2 ; x ^ 2 ; x * 3
## [1] 0 1 2 3
## [1] 0.5 1.0 1.5 2.0
## [1] 1 4 9 16
## [1] 3 6 9 12
#recall x, y, and z from above
x - z # I can do this because they're both numeric
## [1] -4 -3 -2 -1
Checkpoint: Can I do x - y? Why or why not?
# List 1, `ls1`, is created using vectors x, y, and z
ls1 <- list(x, y, z)
ls2 <- list(x, ls1) #create list using ls1
print(ls1) ; print(ls2)
## [[1]]
## [1] 1 2 3 4
##
## [[2]]
## [1] "a" "b" "c"
##
## [[3]]
## [1] 5
## [[1]]
## [1] 1 2 3 4
##
## [[2]]
## [[2]][[1]]
## [1] 1 2 3 4
##
## [[2]][[2]]
## [1] "a" "b" "c"
##
## [[2]][[3]]
## [1] 5
print(class(ls1)); print(class(ls2)) #class type is list
## [1] "list"
## [1] "list"
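The [[1]] and [[2]][[1]] labels in the printed output are also how items are pulled back out of a list. A minimal sketch, assuming the same ls1 and ls2 as above (recreated here so the block runs on its own):

```r
x <- c(1, 2, 3, 4)
y <- c("a", "b", "c")
z <- 5
ls1 <- list(x, y, z)
ls2 <- list(x, ls1)

ls1[[2]]          # the 2nd item of ls1: the character vector y
ls2[[2]][[1]]     # the 1st item of the list stored inside ls2: the vector x
class(ls1[[2]])   # each item keeps its own class ("character" here)
length(ls2)       # ls2 has 2 items, even though one of them is a list of 3
```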
Checkpoint: how do lists and vectors differ and how are they similar? Can a vector be part of a list, and can a list be part of a vector?
matrix() with the following inputs: data, nrow, ncol,
byrow, and dimnames.
This is a great time to introduce the help tool in R. I like to think of this as R’s dictionary. If you are unsure of how a certain function works, you can write ?[some_function] in your code and some helpful reading should pop up to your bottom right. If you don’t know the specific function name, you can also write ??[some_idea] to search the help pages for a topic. See below:
?matrix
Back to matrices. As you have read in the help section, the default
matrix contains a single missing value (NA) if you leave the
data argument blank. I can define the number of rows
(nrow) and columns (ncol), which are 1 by
default. The byrow argument is FALSE by
default, which tells me that my matrix will be filled with
data column-wise (TRUE for row-wise). Finally,
dimnames allows me to give my rows and columns names
(respectively) and should be a list.
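These defaults can also be checked from code rather than from the help page. The sketch below uses the base functions args() and formals(), which are not used elsewhere in this lesson; defaults is a hypothetical variable name.

```r
args(matrix)   # prints the function signature with its default values

defaults <- formals(matrix)   # the same defaults, stored as a named list
defaults$data    # NA
defaults$nrow    # 1
defaults$ncol    # 1
defaults$byrow   # FALSE
```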
matrix() #default empty matrix
## [,1]
## [1,] NA
#recall x from above
matrix(x) ; matrix(x, nrow = 2) ; matrix(x , nrow = 2, byrow = TRUE)
## [,1]
## [1,] 1
## [2,] 2
## [3,] 3
## [4,] 4
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
mat <- matrix(x, nrow = 2, dimnames = list(c("row1", "row2"), c("column1", "column2")))
print(mat)
## column1 column2
## row1 1 3
## row2 2 4
class(mat)
## [1] "matrix" "array"
Checkpoint: Use the code chunk below to learn about the
dim() function, and apply it to our matrix,
mat.
? and the code chunk below to gain some intuition for
arrays.
?array
arr <- array(x, dim = c(4, 4)) ; print(arr)
## [,1] [,2] [,3] [,4]
## [1,] 1 1 1 1
## [2,] 2 2 2 2
## [3,] 3 3 3 3
## [4,] 4 4 4 4
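The example above has only two dimensions, so it looks just like a matrix. As a hedged illustration of what sets arrays apart, the sketch below builds a three-dimensional array (the name arr3 and its values are made up for this example).

```r
# 24 values arranged as 2 rows x 3 columns x 4 "layers"
arr3 <- array(1:24, dim = c(2, 3, 4))

print(arr3)    # prints four 2 x 3 slices, labelled ", , 1" through ", , 4"
length(arr3)   # 24 values in total
```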
Before we jump into our last data object for the day, we should touch
on the concept of “subsetting” for R objects. First, let’s introduce the
dim() function, which tells us the dimensions of our input
objects.
# recall mat from above
dim(mat) # output returns # of rows and # of columns (respectively)
## [1] 2 2
dim(arr) # another example using array
## [1] 4 4
dim(x) # note that dim() returns NULL for plain vectors; it only works on objects with dimensions (matrices, arrays, and dataframes)
## NULL
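For plain vectors, length() plays the role that dim() plays for 2D objects, and nrow() and ncol() return the two dimensions separately. A short sketch, with x and mat recreated so it runs on its own:

```r
x <- c(1, 2, 3, 4)
mat <- matrix(x, nrow = 2,
              dimnames = list(c("row1", "row2"), c("column1", "column2")))

length(x)         # 4 items in the vector
nrow(mat)         # 2 rows
ncol(mat)         # 2 columns
is.null(dim(x))   # TRUE: a plain vector has no dimensions
```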
Subsetting allows us to pull specific values or subsets of
values from our object of interest. We do this using brackets
[ ].
For 1D objects, the notation [n] pulls out the nth item
in the object:
x[2] # 2nd item in the x vector
## [1] 2
x[5] # Why does this return NA?
## [1] NA
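The brackets accept more than a single position. The sketch below shows a few common variations on the same x vector: several positions at once, a negative position to drop an item, and a logical condition.

```r
x <- c(1, 2, 3, 4)

x[c(1, 3)]   # 1st and 3rd items: 1 3
x[2:4]       # 2nd through 4th items: 2 3 4
x[-1]        # everything except the 1st item: 2 3 4
x[x > 2]     # only the items greater than 2: 3 4
```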
For 2D objects, the notation goes [row, column]. I can pull
out specific values, or values of entire rows/columns by leaving either
the row or column specification blank.
print(mat); mat[1, 2] # value from the 1st row, 2nd column
## column1 column2
## row1 1 3
## row2 2 4
## [1] 3
print(arr); arr[4, ] # all the values from the 4th row
## [,1] [,2] [,3] [,4]
## [1,] 1 1 1 1
## [2,] 2 2 2 2
## [3,] 3 3 3 3
## [4,] 4 4 4 4
## [1] 4 4 4 4
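Because mat has row and column names, those names can be used in place of positions. A minimal sketch, with mat recreated as above so the block runs on its own:

```r
mat <- matrix(c(1, 2, 3, 4), nrow = 2,
              dimnames = list(c("row1", "row2"), c("column1", "column2")))

mat["row1", "column2"]   # same value as mat[1, 2]: 3
mat[ , "column1"]        # all the values from the column named "column1"
mat[2, ]                 # all the values from the 2nd row
```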
Checkpoint: subset the array arr to get all the values
from the 4th column.
a <- 1:5 # another way to create a numeric vector that sequentially goes 1, 2, 3, 4, 5
b <- c(0.67, 0.23, 1.25, 3, .20)
c <- c("apples", "bananas", "cabbage", "dragonfruit", "eggs")
#create dataframe
df <- data.frame(a, b, c) ; print(df)
## a b c
## 1 1 0.67 apples
## 2 2 0.23 bananas
## 3 3 1.25 cabbage
## 4 4 3.00 dragonfruit
## 5 5 0.20 eggs
Now, data frames usually have informative column names because they
are representing things with context. Let’s make our columns (VARIABLES)
more informative using colnames().
colnames(df) <- c("item_id", "cost_per_item", "grocery_item") ; print(df)
## item_id cost_per_item grocery_item
## 1 1 0.67 apples
## 2 2 0.23 bananas
## 3 3 1.25 cabbage
## 4 4 3.00 dragonfruit
## 5 5 0.20 eggs
What if I want to add a column that tells me how much of each item I will be purchasing? There are a couple of ways to do this, but I will define the fourth (new) column using subsetting techniques:
df[ , 4] <- c(4, 5, 1, 1, 12) ; print(df)
## item_id cost_per_item grocery_item V4
## 1 1 0.67 apples 4
## 2 2 0.23 bananas 5
## 3 3 1.25 cabbage 1
## 4 4 3.00 dragonfruit 1
## 5 5 0.20 eggs 12
colnames(df)[4] <- "quantity" ; print(df) #rename to more useful variable name
## item_id cost_per_item grocery_item quantity
## 1 1 0.67 apples 4
## 2 2 0.23 bananas 5
## 3 3 1.25 cabbage 1
## 4 4 3.00 dragonfruit 1
## 5 5 0.20 eggs 12
Finally, if I want to pull out a specific variable from the
dataframe, I can use the $ operator to do so. The general notation
is [dataframe]$[variable name]:
df$grocery_item #extracts the variable named "grocery_item"
## [1] "apples" "bananas" "cabbage" "dragonfruit" "eggs"
df$cost_per_item[1] #extracts the value of "cost_per_item" for the first observation
## [1] 0.67
# CHALLENGE EXAMPLE
df$cost_per_item[df$grocery_item == "apples"] #any idea of what's going on here?
## [1] 0.67
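One way to unpack the challenge example is to run its pieces separately. In this sketch (df is rebuilt so the block runs on its own, and is_apples is a hypothetical helper name), the comparison inside the brackets produces a logical vector, and the brackets keep only the positions where it is TRUE.

```r
df <- data.frame(
  item_id = 1:5,
  cost_per_item = c(0.67, 0.23, 1.25, 3, 0.20),
  grocery_item = c("apples", "bananas", "cabbage", "dragonfruit", "eggs"),
  quantity = c(4, 5, 1, 1, 12),
  stringsAsFactors = FALSE   # keep grocery_item as plain character strings
)

is_apples <- df$grocery_item == "apples"
is_apples                     # TRUE FALSE FALSE FALSE FALSE
df$cost_per_item[is_apples]   # keeps only the cost where is_apples is TRUE: 0.67
df$grocery_item[df$cost_per_item < 1]   # same idea: names of items costing under 1
```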
Checkpoint: Create a 5th variable titled “total_cost” for each item. Then use a function to calculate the overall cost for all my groceries.
df[ , 5] <- df["cost_per_item"] * df["quantity"] #new variable using existing variables
colnames(df)[5] <- "total_cost" #rename
sum(df["total_cost"]) # the overall cost is $10.48
## [1] 10.48
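The same result can be reached with the $ notation introduced earlier, which creates and names the new column in one step. This is an alternative sketch, not a replacement for the solution above; df is rebuilt so the block runs on its own.

```r
df <- data.frame(
  item_id = 1:5,
  cost_per_item = c(0.67, 0.23, 1.25, 3, 0.20),
  grocery_item = c("apples", "bananas", "cabbage", "dragonfruit", "eggs"),
  quantity = c(4, 5, 1, 1, 12),
  stringsAsFactors = FALSE
)

# Multiply the two columns item by item; the new column is named on creation
df$total_cost <- df$cost_per_item * df$quantity

df$total_cost        # cost per grocery item
sum(df$total_cost)   # overall cost for all groceries
```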