class: center, middle, inverse, title-slide

# .center[.large[An Introduction to R]]
## .emphasize2[.center[Walking before you leaRn to crawl]]
### .center[.emphasize[Shea Fyffe]]
### .center[.font80[George Mason University
Amazon]]
### .center[.font60[08 June, 2019
(Updated: 21 June, 2019)]]

---

<style>
p.caption {
  font-size: 0.6em;
  font-weight: 400;
  position: absolute;
  top: 305px;
  right: 128px;
}
.emphasize {
  font-size: 110%;
  color: #F97B64;
}
.emphasize2 {
  font-size: 105%;
  color: #019191;
}
</style>

# Getting Started

.emphasize[Do you have R? Check:]

**Windows:** `C:\Program Files\R\R-3.5.2\bin\x64\Rgui.exe`

**Mac:** `/Library/Frameworks/R.framework/Resources/library`

.emphasize[If you don't, that's okay:]<br>Install **R** from [CRAN for Windows](https://cran.r-project.org/bin/windows) **OR** [CRAN for Mac](https://cran.r-project.org/bin/macosx).

I recommend using the [RStudio IDE](https://www.rstudio.com/products/rstudio/download). RStudio is software that sits atop R: it adds functionality and is more customizable, but the R underneath is the same.

[RStudio Cheat Sheet](https://resources.rstudio.com/rstudio-cheatsheets/rstudio-ide-cheat-sheet)

---

# Objects

The concept of an *object* is analogous to Piaget's idea of a *schema*. R uses words and symbols to represent data.

Here's an example:

```r
#R has a schema for 1
1
```

```
## [1] 1
```

```r
#R doesn't know what x is
x
```

```
## Error in eval(expr, envir, enclos): object 'x' not found
```

```r
#until you tell it
x <- 2
x
```

```
## [1] 2
```

---

# Objects

Objects are merely data represented by a name consisting of symbols/characters. R *assimilates* and makes sense of objects by assigning *classes* and other *attributes*.

Suppose R learns a new letter, say **"z"**.

```r
#let's assign the value "z" to the name `object`
object <- "z"
#R gives it a class
class(object)
```

```
## [1] "character"
```

```r
#so it can treat it like objects it knows
class("a")
```

```
## [1] "character"
```

```r
class("a") == class(object)
```

```
## [1] TRUE
```

---

# Vector

A **vector** (i.e., a variable or column) is the most basic data object in R; therefore, it's called *atomic*.

.emphasize[All elements of a vector share the same mode.]
```r
#create a numeric vector of length 7
vector_1 <- c(1,2,3,6,7,8,9)
class(vector_1)
```

```
## [1] "numeric"
```

```r
vector_1[7] <- "A"
```

What do you think happens?

```r
class(vector_1)
```

```
## [1] "character"
```

.emphasize[R calls this *coercion*]

---

# Vector

Many other programming languages use the term *scalar* for a single value; in R, a single value is just a vector of length 1. To create a vector with more than one element, use the `c()` function.

**Try this:**

```r
x <- 5
y <- c(4, 7, 2)
```

```r
x > y
```

What do you think happens?

```
## [1]  TRUE FALSE  TRUE
```

.emphasize[R calls this *recycling*]

---

# More recycling and Coercion problems

```r
#1 What happens when you run the code below?
c(4,6) > c(5,18,11)

#2 R also has some weird behaviors
"a" > "b"
"b" > "a"
"c" > "b"

#3 To create a true integer use L
is.integer(4)
is.integer(4L)

#4 Indexing with recycling works differently depending on the class
x <- c(6,7,8,9)
x[1]
x[TRUE]
1 == TRUE
```

---

# Vector

To refer to an *element* within a vector, use `vector[element]`. You can also omit elements from a vector this way: `vector[-element]`.

Let's try:

```r
#what is the class of a?
a <- c("d", 0, "g")
#now let's pick the last element
a[3]
```

```
# [1] "g"
```

```r
#what if we want more than one element?
a[c(1,2)]
```

```
# [1] "d" "0"
```

```r
#let's replace it
a[3] <- "bby"
```

---

```r
#now let's make a new vector "b" using "a"
b <- c(a[1],"o",a[3])
#print b
b
```

```
# [1] "d"   "o"   "bby"
```

```r
#we can add to the vector by using c() and the original object
b <- c(b, "the", "house elf")
#print b
b
```

```
# [1] "d"         "o"         "bby"       "the"       "house elf"
```

---

# Vector exercise

```r
# 1
x <- c(1, 2, 3, 5, "a")
sum(x)

# 2
b <- 1:10
6 > b

# 3
d <- seq(10)
d[-3]

# 4
-3:10
d[-3:10]

# 5
e <- 6 > b
f <- d[e]
```

---

# List

What happens when we want to put multiple vectors together?
```r
#let's put these vectors together
Name <- c("Jack", "Jill", "Donkey", "The One they call Roger")
Has_family <- c(TRUE, TRUE, FALSE)
IQ <- c(rnorm(3, 100, 10), 9000)
data <- c(Name, Has_family, IQ)
```

```r
data
```

```
##  [1] "Jack"                    "Jill"
##  [3] "Donkey"                  "The One they call Roger"
##  [5] "TRUE"                    "TRUE"
##  [7] "FALSE"                   "97.5474616593117"
##  [9] "115.953480959132"        "95.065600603624"
## [11] "9000"
```

What happened: .emphasize[Coercion]

---

# List

A **list** (i.e., a *vector* of objects) can store objects of different classes, including another list; thus, it's *recursive*. The parts of a vector are called **elements**; the parts of a list are called **components**.

--

```r
#let's revisit our data using list instead of c()
data_as_a_list <- list(Name, Has_family, IQ)
#let's take a look
data_as_a_list
```

```
## [[1]]
## [1] "Jack"                    "Jill"
## [3] "Donkey"                  "The One they call Roger"
##
## [[2]]
## [1]  TRUE  TRUE FALSE
##
## [[3]]
## [1]   97.54746  115.95348   95.06560 9000.00000
```

---

# List

```r
#test to see if it's recursive
is.recursive(data_as_a_list)
```

```
## [1] TRUE
```

```r
#test to see if it's atomic
is.atomic(data_as_a_list)
```

```
## [1] FALSE
```

```r
#you can name the parts of a data structure by calling names()
names(data_as_a_list) <- c("name", "family", "iq")
```

---

# List

We can refer to a single component of a list using `[[]]`, and to multiple components with `[]`. However, because we named our list, we can now use the lovely `$` operator (you all will get really close).

```r
#this is the same
data_as_a_list[[3]]
```

```
## [1]   97.54746  115.95348   95.06560 9000.00000
```

```r
data_as_a_list$iq
```

```
## [1]   97.54746  115.95348   95.06560 9000.00000
```

---

# List exercise

```r
# 1
data_as_a_list[[3]] > data_as_a_list[3]

# 2
list_b <- list(1:5, c(T,F,F,T,T))
list_b[1] > list_b[2]

# 3
list_b[[2]] == 1

# 4
lx <- seq(5)
identical(list_b[[1]][-1], lx[-1])
```

---

# Dataframe

A **dataframe** is a list of vectors (i.e., variables) that all have the same length.
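A minimal sketch of that definition, using made-up data (the names `scores` and `scores_df` are hypothetical and not part of the exercises): a list whose components all have the same length can be converted to a dataframe, and the result is still a list underneath.

```r
# Hypothetical example data (not from the slides)
scores <- list(id = 1:3, passed = c(TRUE, FALSE, TRUE))

# Every component has length 3, so the list converts cleanly
scores_df <- as.data.frame(scores)

lengths(scores)      # length of each component in the list
is.list(scores_df)   # a dataframe is still a list underneath
dim(scores_df)       # number of rows, then number of columns
```

If the components had different lengths, `as.data.frame()` would stop with an error instead.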
Notice below: *How long is each of the vectors?*

```r
df <- data.frame(seq(4), c("A","B","C"))
df <- data.frame(seq(4), rep(TRUE, 4))
```

Now with names

```r
df <- data.frame(ID = seq(4), Finished = rep(TRUE, 4))
#how does it look?
str(df)
```

---

# Dataframe

.font90[You can access vectors (i.e., variables) in a dataframe just as you would in a list. However, dataframes are special lists because they have a length and a *width*. We've been using `[]` with just one dimension. We can now use 2: .emphasize2[dataframe[rows, columns]]]

```r
df <- data.frame(Col1 = seq(4), Col2 = c(TRUE,TRUE,FALSE,FALSE))
#to return the 3rd row in a dataframe
df[3,]
```

```
#   Col1  Col2
# 3    3 FALSE
```

```r
#to return the 2nd column in a dataframe
df[,2]
```

```
# [1]  TRUE  TRUE FALSE FALSE
```

```r
#you can also index a dataframe like you did a list...
identical(df[[2]],df[,2])
```

```
# [1] TRUE
```

---

**Try these:**

```r
df_b <- data.frame(A = seq(4), B = rep(TRUE, 4))
#test to see if these are all the same
#(identical() only compares two objects, so chain two calls)
identical(df_b[[1]], df_b$A) && identical(df_b$A, df_b[ ,1])
#test to see if this is a list
is.list(df_b)
#combine the dataframes in one dataframe (to rule them all)
one_df_to_rule <- rbind(df, df_b)
```

--

```r
#rename df so rbind works
names(df) <- c("A","B")
#try again...
one_df_to_rule <- rbind(df, df_b)
```

---

# Dataframe exercise

```r
# 1
one_df_to_rule$A %% 2
evens <- one_df_to_rule$A %% 2 == 0

# 2
one_df_to_rule$A[evens] * 2
one_df_to_rule$A[evens, ] * 2

# 3
one_df_to_rule[evens, ] * 2
one_more_dftr <- one_df_to_rule[evens, ] * 2

# 4
one_more_dftr$B <- NULL
one_more_vec <- rep(seq(2) * 4, 2)
identical(one_more_vec, one_more_dftr$A)

# 5
ls()
rm(list = ls())
```

---

# Matrices and Arrays

There are **2** more data structures that I'm skipping (for the sake of time).

**Matrices** are *atomic* vectors with 2 dimensions

`matrix(c(1,4,7,8), nrow = 2)`

Many of you may use the *matrix* structure because many statistical packages rely on it. Compared with a dataframe, it is also typically less memory intensive.
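A short sketch of what "atomic vector with 2 dimensions" means in practice, reusing the same four values as the example above (`m` and `m_chr` are illustrative names, not part of the deck): the values fill the matrix column by column, indexing uses `[row, column]`, and coercion applies to the whole matrix rather than one column.

```r
# Same four values as the example above, filled column by column
m <- matrix(c(1, 4, 7, 8), nrow = 2)

dim(m)         # 2 rows, 2 columns
is.atomic(m)   # TRUE: a matrix is still an atomic vector
m[2, 1]        # row 2, column 1 -> 4
m[, 2]         # the whole second column -> 7 8

# One character value coerces every cell, unlike a dataframe,
# where each column keeps its own class
m_chr <- matrix(c(1, "a", 7, 8), nrow = 2)
class(m_chr[1, 1])   # "character"
```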
**Arrays** are *atomic* vectors that can have more than 2 dimensions (here, 3)

`array(1:8, dim = c(2,2,2))`

---

# Functions

R functions have 4 major parts:

- **Body:** This is the recipe or set of instructions performed on the inputs.
- **Environment:** The kitchen where the cooking is done; R creates a new kitchen each time you make something.
- **Arguments:** The inputs or ingredients in the recipe.
- **Return/Result:** This is the delicious outcome of all that hard labor.

---

# Functions

When you *use* a function you are *calling* that function (i.e., you are holla'ing atcha function).

You can classify R functions into 3 broad categories:

- Prefix functions
- Infix functions
- Replacement functions

---

# Functions: Prefix

These are functions whose name comes **before** their arguments. R uses the form `function(args)` for *prefix functions*.

What are the prefix functions in the code below?

```r
#a boring regression
fit <- lm(dist ~ 1 + speed, data = cars)
#print weights
fit$coef
```

```
# (Intercept)       speed
#  -17.579095    3.932409
```

```r
#finding words that start with "J" in a vector of words
nsync <- c('Joey', 'Lance', 'JC', 'Chris', 'Justin')
grep('^J', nsync)
```

```
# [1] 1 3 5
```

What are the arguments?

---

# Functions: Infix

These are functions that occur **between** their arguments and are usually represented by symbols such as `+`. User-defined *infix functions* take the form `arg %function% arg`.

---

# Functions: Infix

What are the infix functions in the code below?
```r
#a boring regression
fit <- lm(dist ~ 1 + speed, data = cars)
#print weights
fit$coef
```

```r
#looking for a word in a vector of words
nsync <- c('Joey', 'Lance', 'JC', 'Chris', 'Justin')
"Nick" %in% nsync
```

```
# [1] FALSE
```

```r
#infix functions are also prefix functions shhhhhh
4 != 4L
```

```
# [1] FALSE
```

```r
#notice I'm using backticks ``
`!=`(4,4L)
```

```
# [1] FALSE
```

```r
`!=`(4,11)
```

```
# [1] TRUE
```

---

# Functions: Replacement

These are functions called on the left side of the assignment operator `<-`, so the left side is more than just a name... *What does that even mean, Shea?*

```r
#we've been using replacement functions
vec2 <- vector(length = 2)
#NOT a replacement function
vec2 <- c(4,6)
#a replacement function
names(vec2) <- c("A","B")
```

```r
#make a dataframe
df <- data.frame(ID = seq(10), X = runif(10), Y = sample(1:100, 10))
#is this replacement?
df$Y <- NULL
```

Remember you can rewrite the statement above to look like:

```r
`$`(df, Y) <- NULL
```

---

# Three Dataframes

[3 Dataframes DOWNLOAD](https://drive.google.com/open?id=1NiK97ehce2w8r8OQCC-iQYHeTRAzqLkV)

.font90[
1. Read in the 3 text files
   - (Hint: `read.table(file, sep = '\t', header = TRUE)`)
2. Name the columns in the dataframe `c("First", "Last", "Test1", "Test2", "Test3")`
3. Combine the 3 dataframes into 1 large dataframe
   - (Hint: `rbind()`)
4. Matthew wants to be called "Matt"...
5. Find the average of Test 1
6. What is the correlation between Test 1 and Test 2?
   - (Hint: `cor(x, use = "complete.obs")`)]

---

# Packages

**Packages** (often loosely called *libraries*) are collections of shareable software. These are books, like *SPSS*, written in the R language that you can read.

In simple terms: packages are functions and other data that people have created and that you can use. This is the beauty of R: you're within a community where sharing and collective development is the norm.
Imagine a world where all of you shared your notes, papers, and lectures with one another... God forbid.

```r
#for example, the swirl package, which teaches you R!
install.packages("swirl", dependencies = TRUE)
```

---

# Packages

*So how do I install packages?*

```r
install.packages("wordcloud", dependencies = TRUE)
```

--

Once you've installed a package, you will have to use `library(package)` to load it into your session. I prefer explicit calling because of something called **masking**: when two loaded packages define a function with the same name, the one loaded later hides the other. Here is an example: `wordcloud::textplot()`

--

Challenge yourself to create your own functions that help you with small things.

---

# Calling functions

Keep in mind that when you supply function inputs (i.e., *arguments*), R matches them in particular ways. If you don't name them, R matches them by position. Here is an example using the `seq()` function, which generates a sequence of numbers.

```r
#you can specify an argument by order
seq(2, 20, 2)
```

```
# [1]  2  4  6  8 10 12 14 16 18 20
```

```r
#or by explicitly defining each argument
seq(to = 20, from = 2, by = 2)
```

```
# [1]  2  4  6  8 10 12 14 16 18 20
```

--

.emphasize[Get in the habit of naming your arguments, so you have all your ducks in a row]

Using `help()` will show you a function's arguments.

---

# Head in the clouds

[HeadInTheClouds DOWNLOAD](https://drive.google.com/open?id=1kRiaDO43Xw8PoLsRAN5hsdJoauQj2tvw)

.font90[
1. Install the `tm` and `wordcloud` packages
   - (Hint: `install.packages(c("tm", "wordcloud"), dependencies = TRUE)`)
2. Read in the text file (Hint: `readLines(file)`)
3. Use `x <- tolower(x)` to convert data to lowercase
4. Use `tm::removePunctuation(x)` to remove weird fluff
5. Use the help page `?tm::getTransformations` to find other functions to clean your text
6. Convert the vector of sentences into a vector of words with `strsplit(x, " ")`, then use `unlist(x)` to convert from a list to a vector
7. Make a wordcloud of the top 25 words
   - (Hint: `wordcloud::wordcloud(x, max.words = 25)`)]

---

# Agree to Disagree

[AgreeToDisagree DOWNLOAD](https://drive.google.com/open?id=1ax-CR_cJxUsfpZIS9yhvkFn3QfeJ21MU)

.font90[
1. Install the `rhoR` package (Hint: `install.packages("rhoR", dependencies = TRUE)`)
2. Read the data in using `read.csv()`
3. Calculate the observed baserate (Hint: `rhoR::baserate()`) for Leadership coding in 2019.
4. Calculate the kappa value (Hint: `rhoR::kappa()`) for Leadership coding in 2019.
5. What is the rho statistic (i.e., Type 1 error) given: `rhoR::rho(calculated_kappa, OcSBaserate = firstBaserate, testSetLength = 140, OcSLength = 1746)`
6. Create a ROBOT that codes on keywords...see if you can replace Judge 2
   - `data$Robot_leadership <- ifelse(grepl("some|words|here", data$Title), 1, 0)`
   - Run steps 4 and 5 again.]

---

class: inverse, center, middle

<img src="https://seeklogo.com/images/G/github-logo-7880D80B8D-seeklogo.com.png" style="width:20px; height:20px;position:absolute;top:449px;right:599px;" />
<img src="https://image.flaticon.com/icons/png/512/61/61109.png" style="width:20px; height:20px;position:absolute;top:396px;right:620px;" />

# Thank you!

.emphasize2[Feel free to reach out if you have any questions:
__email:__ shea.fyffe@gmail.com
https://linkedin.com/in/sheafyffe/
https://github.com/Shea-Fyffe]

[A gift to you](https://drive.google.com/file/d/1WHDilXUZq1HyIdkYVX3bmogFWfePK8Oc/view?usp=sharing)

---