The CDAR Team R Bible of useful packages and functions with examples


Useful resources

Here is a link to the package cheatsheets available from R: https://rstudio.com/resources/cheatsheets/
Here is a link to Roel Verbelen’s favourite packages: https://roelverbelen.netlify.app/resources/r/packages/


   


ODBC

Package: odbc  
Function: dbConnect()  
Example:  

# This assigns the odbc connection to SQLSERVERNAME to con so we can access an SQL Server from inside RStudio
con <- dbConnect(odbc()
                ,'SQLSERVERNAME')

Package: odbc  
Function: dbSendQuery() & dbFetch()  
Example:  

# This uses dbSendQuery to send an SQL statement to the SQL Server assigned to 'con'
# The data is then pulled in to the dataframe 'datafromSQL' by piping in to dbFetch() which brings back the data from the send query
datafromSQL <- dbSendQuery(con
                          ,"SELECT *
                            FROM [DatabaseName].[SubLevel].[TableName]
                            WHERE X > 3") %>% dbFetch()

Package: odbc  
Function: dbWriteTable()  
Example:  

# This uses dbWriteTable() to write a table of data back to a table in the SQL Server.  It the table does not exist, it will be created when written
# This can be used to create a new table and write data to it, overwrite the data in an existing table or append new data to an existing table
DBI::dbWriteTable(conn = con
                 ,name = "Table_Name_In_SQL_Database"
                 ,value = TheRDataFrameToBeWritten
                 ,overwrite = FALSE
                 ,append = TRUE
                 ,temporary = FALSE)

Package: odbc  
Function: dbListTables()  
Example:  

# This function returns the names of all tables found within the connection 'con'
# Other arguments can be included but are not required; 'table_name =' The name of the table to return, the default returns all tables
dbListTables(con)

Package: odbc  
Function: dbDisconnect()  
Example:  

# Disconnects R from the SQL Server assigned to 'con'
dbDisconnect(con)

 

DPLYR

Package: dplyr  
Function: %>%  
Example:  

# The pipe %>% operator, which dplyr imports from the magrittr package, helps the user write code in a way that is easier to read and understand, as it reads from # left to right instead of nesting.  You can pipe directly into another function or functions

# Assign "2021-01-31" to x
x <- "2021-01-31"

# Check the structure of x (as you can see, it is a chr or character)
str(x)
##  chr "2021-01-31"
# Or, we can pipe x in to the str() function, which gives the same output
x %>% str()
##  chr "2021-01-31"
# Here we do the same, but then pipe in to the as.Date() function
x <- "2021-01-31" %>% as.Date()

# Again, check the structure of x (as you can see, it is now of Date type)
str(x)
##  Date[1:1], format: "2021-01-31"
# Now we take x and pipe in to the lubridate package wday function, with the argument label = TRUE (meaning it will show the name of the day instead of its numerical place in the week 1-7)
x %>% lubridate::wday(label = TRUE)
## [1] Sun
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
# And here it is in one line of code piping in to as.Date(), then in to lubridate::wday, then piping in to print() to show the result
x <- "2021-01-31" %>% as.Date() %>% lubridate::wday(label = TRUE) %>% print()
## [1] Sun
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat

Package: dplyr  
Function: filter()  
Example:  

# Filter does just that; filters results that meet logical criteria
# First, let's create a small dataframe to filter
OurData <- data.frame("Name" = c("Justine", "Christopher", "Mathew")
                     ,"FootballClub" = c("Man U", "Leeds United", "Sheffield Wednesday")
                     ,"LeagueTitlesLastYear" = c(0,1,0)) %>% print()
##          Name        FootballClub LeagueTitlesLastYear
## 1     Justine               Man U                    0
## 2 Christopher        Leeds United                    1
## 3      Mathew Sheffield Wednesday                    0
# Next, we can use the filter() function to filter the results
filter(OurData, Name == "Justine")
##      Name FootballClub LeagueTitlesLastYear
## 1 Justine        Man U                    0
# Or we can use pipe to get the same output
OurData %>% filter(Name == "Justine")
##      Name FootballClub LeagueTitlesLastYear
## 1 Justine        Man U                    0
# We can also use multiple arguments by using the and (&) or or (|) operators
# Here was are saying filter 'OurData' where either Name is Mathew OR LeagueTitlesLastYear > 0
OurData %>% filter(Name == "Mathew" | LeagueTitlesLastYear > 0)
##          Name        FootballClub LeagueTitlesLastYear
## 1 Christopher        Leeds United                    1
## 2      Mathew Sheffield Wednesday                    0

Package: dplyr  
Function: select()  
Example:  

# select() allows you to select which columns to keep and/or remove, simple!
# Let's again use the table we created above but only select the 'Name' column
select(OurData, Name) %>% print()
##          Name
## 1     Justine
## 2 Christopher
## 3      Mathew
# Or...
OurData %>% select(Name) %>% print()
##          Name
## 1     Justine
## 2 Christopher
## 3      Mathew
# Or, select two columns using pipe
OurData %>% select(Name
                  ,FootballClub) %>% print()
##          Name        FootballClub
## 1     Justine               Man U
## 2 Christopher        Leeds United
## 3      Mathew Sheffield Wednesday
# We can also rename columns as we select them - note the order of the rename
OurData %>% select(FirstName = Name
                  ,FootballClub) %>% print()
##     FirstName        FootballClub
## 1     Justine               Man U
## 2 Christopher        Leeds United
## 3      Mathew Sheffield Wednesday

Package: dplyr  
Function: mutate()  
Example:  

# mutate() is a very powerful function. It basically allows you to do any computation or transformation on the values in the data frame. You can change the values in already existing columns or create new columns based on transformation of other columns!!  Let's mutate an existing column...
mutate(OurData
      ,FootballClub = paste(FootballClub, "FC")) %>% print()
##          Name           FootballClub LeagueTitlesLastYear
## 1     Justine               Man U FC                    0
## 2 Christopher        Leeds United FC                    1
## 3      Mathew Sheffield Wednesday FC                    0
# Or, using the trusty pipe...
OurData %>% mutate(FootballClub = paste(FootballClub, "FC")) %>% print()
##          Name           FootballClub LeagueTitlesLastYear
## 1     Justine               Man U FC                    0
## 2 Christopher        Leeds United FC                    1
## 3      Mathew Sheffield Wednesday FC                    0
# Now lets use mutate() to create a new column "ClubColours"...
OurData %>% mutate(ClubColours = c("Red/Black", "White", "Blue/White"))
##          Name        FootballClub LeagueTitlesLastYear ClubColours
## 1     Justine               Man U                    0   Red/Black
## 2 Christopher        Leeds United                    1       White
## 3      Mathew Sheffield Wednesday                    0  Blue/White

LUBRIDATE

Package: lubridate  
Function: year()  
Example:  

# Extract the year from a date
lubridate::year("2021-01-31") %>% print()
## [1] 2021
# Or...
"2021-01-31" %>% lubridate::year() %>% print()
## [1] 2021

Package: lubridate  
Function: isoweek()  
Example:  

# Extract the week from a date.  There are 3 options for week from lubridate, but isoweek returns the week as it would appear in the ISO 8601 system, which uses a reoccurring leap week, unlike week(), which returns the number of complete seven day periods that have occurred between the date and January 1st, plus one
lubridate::isoweek("2021-01-31") %>% print()
## [1] 4
# Or...
"2021-01-31" %>% lubridate::isoweek() %>% print()
## [1] 4

Package: lubridate  
Function: wday()  
Example:  

# Extract the day of the week from a date, either as a number (of it's day in the week 1-7)...
lubridate::wday("2021-01-31") %>% print()
## [1] 1
# Or as the name of the day...
"2021-01-31" %>% lubridate::wday(label = TRUE, abbr = FALSE) %>% print()
## [1] Sunday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
# Or as the name of the day abbreviated to 3 letters...
"2021-01-31" %>% lubridate::wday(label = TRUE, abbr = TRUE) %>% print()
## [1] Sun
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat

Base R

Package: NA (Available in base R)  
Function: setwd()  
Example:  

# Sets the file path of the R session working directory
setwd("C:/Temp/R_Working_Dir")

Package: NA (Available in base R)  
Function: getwd()  
Example:  

# Get the file path of the current R session working directory
getwd()
## [1] "C:/Temp/R_Working_Dir"

Package: NA (Available in base R)  
Function: print()  
Example:  

# Prints the output of the code to the console
print(OurData)
##          Name        FootballClub LeagueTitlesLastYear
## 1     Justine               Man U                    0
## 2 Christopher        Leeds United                    1
## 3      Mathew Sheffield Wednesday                    0
# Or, using the good old pipe...
OurData %>% print()
##          Name        FootballClub LeagueTitlesLastYear
## 1     Justine               Man U                    0
## 2 Christopher        Leeds United                    1
## 3      Mathew Sheffield Wednesday                    0

Package: NA (Available in base R)  
Function: data.frame()  
Example:  

# Create a simple data frame
df <- data.frame(x = 1:5, y = c('Andrew', 'Boris', 'Clive', 'Dom', 'Ethan'))

Package: NA (Available in base R)  
Function: for loop  
Example:  

# Using a for loop allows us to repeat a specific block of code multiple times using this syntax: for (variable in sequence) {Do something}
for (i in 1:5){
j <- i + 10
print(j)
}
## [1] 11
## [1] 12
## [1] 13
## [1] 14
## [1] 15

Lots more to come!!!