Today we will review the basic building blocks of data analysis in R. To start, we will examine:
## Lists
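# We can make lists much the same way we make vectors, using list() instead of c()
test_list = list(1, "one", TRUE)
# Assign new elements with list(), not c(): c() would coerce all three values to character
test_list[4:6] <- list(0, "zero", FALSE)
# And access elements the same way (single brackets return a smaller list)
test_list[c(2,5)] # returns the second and fifth elements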
## [[1]]
## [1] "one"
##
## [[2]]
## [1] "zero"
# Use lists when you need to group elements of different types together.
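A small sketch of why the choice between `c()` and `list()` matters, and of the difference between single and double brackets. The variable names `mixed_vector` and `mixed_list` are illustrative only.

```r
# c() coerces everything to one common type (here, character)
mixed_vector = c(0, "zero", FALSE)
class(mixed_vector)        # "character"

# list() keeps each element's original type
mixed_list = list(0, "zero", FALSE)
sapply(mixed_list, class)  # "numeric" "character" "logical"

# Single brackets return a (shorter) list ...
class(mixed_list[2])       # "list"
# ... while double brackets return the element itself
class(mixed_list[[2]])     # "character"
```

In other words, `[ ]` is for picking out a sub-list and `[[ ]]` is for getting at a single element's value.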
## Matrices
# We can turn vectors into matrices using the matrix() function
# syntax: matrix(data_vector, num_rows, num_columns)
# note that the matrix is filled from the vector column by column
# Ex.
random_sample = rnorm(20, mean = 0, sd = 1) # 20 random draws; rnorm's arguments are (n, mean, sd)
first_matrix = matrix(random_sample, 4, 5)
# We can access elements of the matrix using brackets
# Syntax for accessing matrix elements is matrix[row_indices, col_indices]
# (the values are random, so they change on every run)
first_matrix[1,1] # gets the element in the first row and first column (a single number)
first_matrix[1,] # gets the elements in the first row from ALL columns (a vector of 5 numbers)
first_matrix[,1:2] # gets the elements in ALL rows and the first two columns (a 4 x 2 matrix)
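Random numbers make the fill order hard to see, so here is a sketch with the values 1 to 6 instead. The `byrow = TRUE` argument of `matrix()` switches from the default column-by-column fill to row-by-row. The names `by_col` and `by_row` are illustrative.

```r
# Fill column by column (the default)
by_col = matrix(1:6, nrow = 2, ncol = 3)
by_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# Fill row by row instead
by_row = matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

dim(by_col)    # 2 3  (rows, columns)
by_col[1, 2]   # 3
by_row[1, 2]   # 2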
## Dataframes
# Dataframes are the workhorse of R; they are the most common way to store data.
?data.frame # putting '?' before a function will tell you about it!
# Setting up the data
names = c("Mark", "Mary", "Maddy", "Margret")
grades = c(100, 97, 99, 98)
cheater = c(TRUE, FALSE, FALSE, FALSE)
# The data.frame(vectors) command builds a dataframe out of vectors
gradebook = data.frame(names, grades, cheater) # Here we build the dataframe gradebook. Note the columns are named and inherit the names of the input vectors.
summary(gradebook) # summary looks at each column and tells us about it
##     names               grades        cheater
##  Length:4           Min.   : 97.00   Mode :logical
##  Class :character   1st Qu.: 97.75   FALSE:3
##  Mode  :character   Median : 98.50   TRUE :1
##                     Mean   : 98.50
##                     3rd Qu.: 99.25
##                     Max.   :100.00
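Each number that summary() reports can also be computed on its own. A minimal sketch, re-creating the grades and cheater vectors from above (summary() uses quantile()'s default method, so the quartiles match):

```r
grades = c(100, 97, 99, 98)
cheater = c(TRUE, FALSE, FALSE, FALSE)

mean(grades)                      # 98.5
median(grades)                    # 98.5
quantile(grades, c(0.25, 0.75))   # 25%: 97.75, 75%: 99.25
range(grades)                     # 97 100
sum(cheater)                      # 1  (TRUE counts as 1, FALSE as 0)
```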
gradebook[2,] # We can query dataframes like matrices: this returns the second row
# In dataframes, columns have names corresponding to the vectors they were built from
names(gradebook) # the names(dataframe) command returns all the column names
## [1] "names" "grades" "cheater"
# We can query dataframes by these names
gradebook$names
## [1] "Mark" "Mary" "Maddy" "Margret"
gradebook[gradebook$names == "Mary",] # Logical indexing: keep only the rows where names equals "Mary"
trunc_gradebook = gradebook[c("names", "grades")] # Selecting several columns by name returns another dataframe.
# Many built-in functions in R are designed to be run on dataframes. We will spend a lot of time with dataframes when we start working with real data. For now, it's enough to think of them as tables, like matrices with named columns, except that each column can hold a different type
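Whether you get back a dataframe or a plain vector depends on how you index. A short sketch, rebuilding the gradebook from above so it runs on its own:

```r
# Rebuild the small gradebook used above
names = c("Mark", "Mary", "Maddy", "Margret")
grades = c(100, 97, 99, 98)
cheater = c(TRUE, FALSE, FALSE, FALSE)
gradebook = data.frame(names, grades, cheater)

# Selecting several columns by name keeps the dataframe structure
class(gradebook[c("names", "grades")])      # "data.frame"

# Selecting a single column with $ or [[ ]] returns a plain vector
class(gradebook$grades)                     # "numeric"
class(gradebook[["grades"]])                # "numeric"

# Matrix-style indexing of one column also drops to a vector by default ...
class(gradebook[, "grades"])                # "numeric"
# ... unless drop = FALSE is given
class(gradebook[, "grades", drop = FALSE])  # "data.frame"

# Logical conditions can be combined with & (and) and | (or)
gradebook[gradebook$grades >= 98 & !gradebook$cheater, ]  # rows for Maddy and Margret
```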
## Functions
General info: a user-defined function consists of a name, inputs, and an output.
To write a function, we use the following syntax:
function_name = function(input_1, input_2){
  # run code here
  return(variable)
}
# The shell of a function looks like this
func_name = function(input){
# code
# code
# return(result)
}
# Let's write a function that takes in a number and returns its square
func_square = function(N){
ret = N^2
return(ret)
}
# Let's test our function
func_square(7)
## [1] 49
7^2 # of course we could just use the caret (^) operator; this is a simple example
## [1] 49
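Because `^` works element by element, func_square is not limited to single numbers. A quick sketch (the function is redefined so the block runs on its own):

```r
func_square = function(N){
  ret = N^2
  return(ret)
}

# Squares every element of a vector at once
func_square(1:5)                  # 1 4 9 16 25

# Also works element-wise on a matrix
func_square(matrix(1:4, 2, 2))
#      [,1] [,2]
# [1,]    1    9
# [2,]    4   16
```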
# Let's write a function that takes in two sides of a right triangle and computes the hypotenuse.
func_pythag = function(side_1, side_2){
hypot = side_1^2 + side_2^2
return(sqrt(hypot))
}
# Let's test our function
a = 3
b = 7
func_pythag(a, b)
## [1] 7.615773
func_pythag(b, a) # the order of the sides doesn't matter
## [1] 7.615773
# func_pythag("3", "7") # What will happen when we give it characters?
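With the original func_pythag, `"3"^2` already fails with R's "non-numeric argument to binary operator" error. One option is to check the inputs up front so the error message is easier to understand. The name `func_pythag_checked` and its message text are hypothetical, chosen for this sketch:

```r
# A hypothetical variant of func_pythag that checks its inputs first
func_pythag_checked = function(side_1, side_2){
  # stop with a readable message unless both sides are numbers
  if (!is.numeric(side_1) || !is.numeric(side_2)) {
    stop("both sides must be numeric")
  }
  hypot = side_1^2 + side_2^2
  return(sqrt(hypot))
}

func_pythag_checked(3, 4)   # 5

# tryCatch() lets us capture the error message instead of halting the script
tryCatch(func_pythag_checked("3", "7"),
         error = function(e) conditionMessage(e))
# "both sides must be numeric"
```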
# The nrow() function counts the number of rows in a dataframe. Let's write a function that counts the number of columns
count_columns = function(df){
first_row = df[1,] # a one-row dataframe that still has every column
num_cols = length(first_row) # length() of a dataframe is its number of columns
return(num_cols)
}
# Let's test our function
data = mtcars # This is a dataframe native to R, type ?mtcars to learn more!
count_columns(data)
## [1] 11
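R also has built-in helpers for this: ncol() is the column counterpart of nrow(), and dim() returns both. Comparing them with count_columns is a quick sanity check (the function is redefined so the block runs on its own):

```r
count_columns = function(df){
  first_row = df[1,]
  num_cols = length(first_row)
  return(num_cols)
}

count_columns(mtcars)   # 11
ncol(mtcars)            # 11
length(mtcars)          # 11: a dataframe is stored as a list of columns
nrow(mtcars)            # 32
dim(mtcars)             # 32 11  (rows, columns)
```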