Before we start…

Note that you will work with R Markdown documents. An R Markdown document consists of three parts: 1) content; 2) code; 3) output (results). First, the content parts describe what you are learning and what you are asked to work on. Second, the code parts appear in grey boxes and are what you can enter in the source window of RStudio. The single pound sign # marks my annotations, which help you get a sense of what each piece of code does. Third, the output parts, on lines beginning with ##, show the results of running the R code. Check that your code produces the same results. The number in square brackets at the start of an output line, such as [1], is the index of the first element printed on that line.

Working Directory

The working directory is the folder that R reads files from and saves files to by default.

We use projects in RStudio to set the working directory to the folder we are working in.

getwd() # Find the current working directory (where inputs are found and outputs are sent).
## [1] "D:/Dropbox/2021_Class/BigData_Journalism/R"
#setwd("~/...")
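
As a small illustration of how the working directory is used, the sketch below builds a file path relative to it. The folder name data and the file name example.csv are hypothetical placeholders, not files that come with this document.

```r
# Ask R where it currently reads from and writes to
wd <- getwd()

# Build a full path under the working directory
# ("data" and "example.csv" are hypothetical names)
csv_path <- file.path(wd, "data", "example.csv")

basename(csv_path)     # the file name part of the path
file.exists(csv_path)  # TRUE only if such a file is actually there
```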

Package? Library?

R is open-source software, which means that many people all over the world develop and share functions for it. Using such functions, we can carry out tasks such as web scraping easily and effectively.

A package is a collection of such R functions (as well as data and compiled code). The location where packages are stored is called the library.

When we download a package with the function install.packages("package name"), it is stored in the library. To use the package, we then call the function library(package name), which makes the package available in the current session.
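
One common pattern, sketched here as an illustration rather than a required workflow, is to install a package only when it is missing and then load it. The variable name pkg is an assumption made for this example, and "stats" is used only because it ships with R, so the sketch runs without downloading anything.

```r
# "stats" ships with R, so nothing is downloaded here;
# replace it with the package you need, e.g. "tidyverse"
pkg <- "stats"

# Install only if the package is not already in the library
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# character.only = TRUE tells library() that pkg holds the name as a string
library(pkg, character.only = TRUE)
```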

Getting Help

You can access the help files about functions and packages.

?c # Open the help file for a particular function, here c()
## starting httpd help server ... done
help(c) # Does the same as ?c
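
Besides ? and help(), base R offers a few other ways to look things up. The lines below are a brief sketch; the variable name found is introduced only for this example.

```r
# Show a function's arguments without opening the help file
args(sum)

# List objects on the search path whose names start with "mean"
found <- apropos("^mean")
found
```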

1. Vectors

numeric_vector <- c(1:5)
numeric_vector
## [1] 1 2 3 4 5
character_vector <- c("a","b","c","d","e")
character_vector
## [1] "a" "b" "c" "d" "e"
logical_vector <- c(TRUE,FALSE,FALSE,T,F)
logical_vector
## [1]  TRUE FALSE FALSE  TRUE FALSE
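
A vector holds elements of one type only. The following sketch, in which mixed_vector is a name made up for this example, shows what happens when types are mixed and how arithmetic works on a whole vector.

```r
# Mixing types: R converts every element to a common type, here character
mixed_vector <- c(1, "a", TRUE)
mixed_vector
class(mixed_vector)

numeric_vector <- c(1:5)
class(numeric_vector)    # 1:5 creates integers
length(numeric_vector)   # number of elements
numeric_vector * 2       # arithmetic is applied element by element
```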

2. Lists

# A list can hold elements of different types; a list can even contain another list
MyList <- list(numeric_vector, character_vector,logical_vector)
MyList #vector or list?
## [[1]]
## [1] 1 2 3 4 5
## 
## [[2]]
## [1] "a" "b" "c" "d" "e"
## 
## [[3]]
## [1]  TRUE FALSE FALSE  TRUE FALSE
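
To illustrate a list inside another list, here is a minimal sketch. The names inner_list, outer_list, numbers, and nested are assumptions made for this example only.

```r
# A list that will be stored inside another list
inner_list <- list(1:3, "x")

# Elements of a list can be given names
outer_list <- list(numbers = c(1:5), nested = inner_list)

length(outer_list)        # two elements: numbers and nested
class(outer_list$nested)  # the second element is itself a list
outer_list$nested[[2]]    # the second element of the inner list
str(outer_list)           # compact overview of the whole structure
```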

3. Data frames

library(tidyverse) # loads the tibble package, which provides tibble()

a_data_frame <- tibble(first = numeric_vector, second = character_vector, third = logical_vector)
a_data_frame
## # A tibble: 5 x 3
##   first second third
##   <int> <chr>  <lgl>
## 1     1 a      TRUE 
## 2     2 b      FALSE
## 3     3 c      FALSE
## 4     4 d      TRUE 
## 5     5 e      FALSE
a_data_frame$third
## [1]  TRUE FALSE FALSE  TRUE FALSE
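
For comparison, a similar table can be built with base R's data.frame(), without loading any package. This is a sketch; base_data_frame is a name chosen for this example.

```r
numeric_vector <- c(1:5)
character_vector <- c("a", "b", "c", "d", "e")
logical_vector <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

# stringsAsFactors = FALSE keeps the letters as plain character strings
base_data_frame <- data.frame(first = numeric_vector,
                              second = character_vector,
                              third = logical_vector,
                              stringsAsFactors = FALSE)

dim(base_data_frame)      # number of rows, then number of columns
names(base_data_frame)    # column names
base_data_frame$second    # one column, returned as a vector
is.list(base_data_frame)  # a data frame is a list of equal-length columns
```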

4. Indexing: selecting certain elements

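How do you select certain elements of a list? Use double square brackets [[ ]] to extract an element of a list, and single square brackets [ ] to select elements of a vector. [[ ]] returns the element itself (for example, the vector stored in the list), not a list containing it.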

character_vector
## [1] "a" "b" "c" "d" "e"
character_vector[1]
## [1] "a"
class(character_vector[1])
## [1] "character"
length(character_vector[1])
## [1] 1
character_vector[2]
## [1] "b"
MyList[[1]] # returns the vector stored as the first element of the list MyList
## [1] 1 2 3 4 5
class(MyList[[1]])
## [1] "integer"
# [1] selects the first element of a vector; [[1]] extracts the first element of a list

Once you are familiar with list(), you can work out what MyList[[1]][5] is: the fifth element of the vector stored as the first element of MyList.

MyList[[1]][5]
## [1] 5

How do you select the letter “e” in the second element of the list object MyList?

MyList[[2]][5]
## [1] "e"
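
Single brackets also work on a list, but they return a smaller list rather than the element itself. The sketch below rebuilds a two-element version of MyList so that it runs on its own.

```r
MyList <- list(c(1:5), c("a", "b", "c", "d", "e"))

class(MyList[1])      # "list": single brackets keep the list wrapper
class(MyList[[1]])    # "integer": double brackets extract the vector itself
length(MyList[1])     # 1, a list holding one element
length(MyList[[1]])   # 5, the vector's own length
MyList[[2]][c(1, 5)]  # first and fifth letters of the second element
```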

How do you select certain elements in a data frame?

a_data_frame
## # A tibble: 5 x 3
##   first second third
##   <int> <chr>  <lgl>
## 1     1 a      TRUE 
## 2     2 b      FALSE
## 3     3 c      FALSE
## 4     4 d      TRUE 
## 5     5 e      FALSE
a_data_frame[3,3] # [row, column]: row 3 of column 3
## # A tibble: 1 x 1
##   third
##   <lgl>
## 1 FALSE
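
A few more ways to select from a data frame are sketched below, using a base R data.frame so that the example runs without extra packages. The name df is made up for this example. Note that a tibble returns a one-cell tibble for [3,3], as shown above, whereas a base data frame returns the bare value.

```r
df <- data.frame(first = 1:5,
                 second = c("a", "b", "c", "d", "e"),
                 third = c(TRUE, FALSE, FALSE, TRUE, FALSE),
                 stringsAsFactors = FALSE)

df[3, "third"]             # row 3 of the column named third
df[["second"]][5]          # extract a column, then its fifth element
df[df$third, ]             # only the rows where third is TRUE
df[, c("first", "third")]  # all rows, two columns selected by name
```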

5. Functions

Function

  - Ways of manipulating vectors, lists, and data frames

Built-in functions

sqrt # typing a function's name without parentheses prints its definition
## function (x)  .Primitive("sqrt")
mean
## function (x, ...) 
## UseMethod("mean")
## <bytecode: 0x0000000016c0ba20>
## <environment: namespace:base>
sum
## function (..., na.rm = FALSE)  .Primitive("sum")
sqrt(3)
## [1] 1.732051
mean(numeric_vector)
## [1] 3
sum(numeric_vector)
## [1] 15
sqrt(numeric_vector)
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
class
## function (x)  .Primitive("class")
toupper
## function (x) 
## {
##     if (!is.character(x)) 
##         x <- as.character(x)
##     .Internal(toupper(x))
## }
## <bytecode: 0x0000000016a3a1c8>
## <environment: namespace:base>
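
The definition of sum printed above includes the argument na.rm = FALSE. As a brief sketch of what that argument does, consider a vector with a missing value; values_with_na is a name chosen for this example.

```r
values_with_na <- c(1, NA, 3)

sum(values_with_na)                 # NA, because one value is missing
sum(values_with_na, na.rm = TRUE)   # 4, the missing value is removed first
mean(values_with_na, na.rm = TRUE)  # 2, mean() accepts the same argument
```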

User-defined functions

# General form (function_name and function_body are placeholders)
function_name <- function(x){
  function_body
}
adding_two <- function(x){
  x+2
}

adding_two(x=3)
## [1] 5
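my_func <- function(x){
  function_result <- x / 2 # the value of an assignment is returned invisibly
}

my_func(x=4) # prints nothing, because the result of an assignment is invisible

#function_result # function_result exists only inside the function
#Error: object 'function_result' not found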
my_func <- function(x){
  function_result <- x / 2
  function_result # the last expression evaluated is the value the function returns
}

my_func(4)
## [1] 2
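
Two further points about user-defined functions can be sketched as follows: arguments may have default values, and a value that is returned without being printed can still be stored. The names divide_by, divisor, half_silent, and stored are assumptions made for this example.

```r
# An argument with a default value, and an explicit return()
divide_by <- function(x, divisor = 2){
  return(x / divisor)
}

divide_by(4)               # uses the default divisor: 2
divide_by(9, divisor = 3)  # overrides the default: 3
divide_by(c(2, 4, 6))      # works on a whole vector: 1 2 3

# A function that ends with an assignment prints nothing,
# but its value can still be captured
half_silent <- function(x){
  half <- x / 2
}
stored <- half_silent(4)
stored                     # 2
```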

Final two notes

Note 1: R is case-sensitive, so it treats lowercase and uppercase letters as different.

word1 <- "TEXT"
word1
## [1] "TEXT"
word2 <- "Text"
word2
## [1] "Text"
word1 == word2
## [1] FALSE
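
If two strings should count as equal regardless of case, one option is to convert both to the same case before comparing. The sketch below also shows that case matters for object names; WORD1 is a name that is deliberately never defined.

```r
word1 <- "TEXT"
word2 <- "Text"

# Compare after converting both strings to lowercase
tolower(word1) == tolower(word2)  # TRUE

# Object names are case-sensitive too: word1 exists, WORD1 does not
exists("word1")
exists("WORD1")
```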

Note 2: Parentheses ( ) are for calling functions, whereas square brackets [ ] are for selecting elements of a vector.

character_vector[1:5]
## [1] "a" "b" "c" "d" "e"
toupper(character_vector[1:5]) 
## [1] "A" "B" "C" "D" "E"
tolower(toupper(character_vector[1:5]))
## [1] "a" "b" "c" "d" "e"
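
Parentheses and brackets can be combined in either order, as this short sketch suggests. It redefines character_vector so that it runs on its own.

```r
character_vector <- c("a", "b", "c", "d", "e")

# Call a function first, then select from its result
toupper(character_vector)[c(1, 3)]          # "A" "C"

# A negative index drops an element instead of selecting it
character_vector[-1]                        # "b" "c" "d" "e"

# A function call can supply the index
character_vector[length(character_vector)]  # "e"
```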