2023-07-06

Class Plan

  • Data activity (10 min)
  • Coding! Coding! Coding! (50 min)
  • Break (5 min)
  • Discuss readings (10 min)
  • More coding! (40 min)
  • Introduce Problem Set (Remainder)

Week 2 Groups!

print.data.frame(groups)
##                    group 1             group 2                   group 3
## 1                Su, Barry     Gnanam, Akash Y              Gupta, Umang
## 2             Ng, Michelle Premkrishna, Shrish Saccone, Alexander Connor
## 3 Crawford, John Alexander         Tian, Zerui        Jun, Ernest Ng Wei
## 4                              Knutson, Blue C       Albertini, Federico
##                   group 4              group 5                 group 6
## 1     Andrew Yu Ming Xin, Dotson, Bianca Ciara                        
## 2   Widodo, Ignazio Marco     Wan Rosli, Nadia Spindler, Laine Addison
## 3 Alsayegh, Aisha E H M I                                Cai, Qingyuan
## 4           Ning, Zhi Yan      Tan, Zheng Yang   Leong, Wen Hou Lester
##                    group 7
## 1            Lim, Fang Jan
## 2 Huynh Le Hue Tam, Vivian
## 3             Shah, Jainam
## 4   Cortez, Hugo Alexander

Data Activity

  • Task: communicate a message about the data using a visual approach of your choosing.
  • Be creative! Think about using the numeric and text information in different ways.
  • Visualization does not need to be complete!

Rising Temperatures

Environmental Justice and Law

Problem Set 1 Check-in

  • How are we feeling?
  • Close your eyes!
  • Hold thumb up/middle/down

How Will Problem Sets Be Graded?

  • For each question:
  • Is there an answer? (5 points, usually 1 per question)
  • Is the answer correct or reasonable? If not, is there a serious attempt to resolve the problem and a thorough explanation of any errors? (5 points, usually 1 per question)
  • Is the answer presented clearly, and does it engage with the course materials where appropriate? (5 points, usually for more in-depth questions or the document as a whole)

A Note on Markdown

  • Consider using options in the chunk headers
  • message = FALSE removes messages from display
  • warning = FALSE removes warnings from display
  • chunk header = {r example1} (no options, so the package startup messages below are printed)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

  • chunk header = {r example2, message = FALSE} (same code, messages hidden)
library(dplyr)

  • Lots of other chunk header options as well!

Starting an R Script

#####################################################
## title: a new R script!
## author: you!
## purpose: to try out R
## date: today's date
#####################################################

# you can start coding below

Installing and using packages

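install.packages("dplyr")  # download and install; only needed once per computer
library(dplyr)             # load the package; needed in every new R session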

Working with Dataframes in R

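  • We’ve worked with matrices a little bit, but the problem set introduced something new: the dataframe
  • Dataframes are similar to matrices, but while a matrix holds only one type of data, each column of a dataframe can hold a different type (numbers, text, logical values, and so on)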
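A quick way to see this difference, as a sketch (assuming R 4.0 or later, where text columns stay character rather than becoming factors). The names `m` and `d` are just for illustration:

```r
# a matrix can only hold one type, so mixing numbers and text
# coerces everything (including the numbers) to character
m <- cbind(c(1, 2, 3), c("one", "two", "three"))
typeof(m)  # "character"

# a dataframe keeps a separate type for each column
d <- data.frame(num = c(1, 2, 3), word = c("one", "two", "three"))
sapply(d, class)  # num is "numeric", word is "character"
```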

Working with Dataframes in R

  • Try converting a matrix to a dataframe
# create 3x3 matrix
mat <- matrix(c(1,2,3, 1,2,3, 1,2,3), nrow = 3)

# look at matrix
mat
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]    3    3    3

Working with Dataframes in R

# create dataframe
df <- as.data.frame(mat)

# look at dataframe
df
##   V1 V2 V3
## 1  1  1  1
## 2  2  2  2
## 3  3  3  3

Working with Dataframes in R

  • Benefits of dataframes? We can add other types of information!
# add character variable
df$V4 <- c("one", "two", "three")

# look at our dataframe
df
##   V1 V2 V3    V4
## 1  1  1  1   one
## 2  2  2  2   two
## 3  3  3  3 three

Working with Dataframes in R

  • We’ll now create our own dataframe

# create a dataframe (note that : returns a sequence between the numbers, by 1)
df <- data.frame(year = 2000:2020, temp = 40:60)

# look at the first 6 rows
head(df)
##   year temp
## 1 2000   40
## 2 2001   41
## 3 2002   42
## 4 2003   43
## 5 2004   44
## 6 2005   45

Working with Dataframes in R

  • Try creating the following dataframe, and add a column that signifies the month of the temperature recording
  • We’ll say that each observation takes place in July
# create a dataframe (note that : returns a sequence between the numbers, by 1)
df <- data.frame(year = 2000:2020, temp = 40:60)
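One possible solution, as a sketch using the `$` approach from the earlier slides: assigning a single value to a new column recycles it down every row, so all 21 observations get the same month. The lowercase label "july" is our choice, matching the next slides.

```r
# recreate the dataframe: one temperature reading per year
df <- data.frame(year = 2000:2020, temp = 40:60)

# add a month column; the single value "july" is recycled to all rows
df$month <- "july"

head(df)
```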

Working with Dataframes in R

  • There is another way to manipulate variables in dataframes, which we will mostly use (rather than $)
  • It combines dplyr functions such as mutate() with the pipe operator, %>%
  • How does it work?
  • Instead of df$month <- "july", we would write df %>% mutate(month = "july")
  • But why?

Working with Dataframes in R

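  • As our code becomes more complicated, the pipe will help us simplify it
  • Let’s first understand what %>% does
  • Without it, applying several functions in a row means nesting them, and the code has to be read from the inside out:
fourth_function(third_function(second_function(first_function(x))))

So instead, we opt for the pipe (%>%, which dplyr makes available from the magrittr package), written as follows: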

x %>%
  first_function() %>% 
  second_function() %>% 
  third_function() %>%
  fourth_function()
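To make this concrete, here is a small sketch that uses base R functions in place of the placeholder names: take absolute values, add them up, then take the square root. Both versions compute the same number, but the piped one reads top to bottom in the order the steps happen. It assumes dplyr is installed (dplyr re-exports %>% from magrittr).

```r
library(dplyr)  # provides %>%

x <- c(-4, 3, -9)

# nested: read from the inside out (abs, then sum, then sqrt)
nested <- sqrt(sum(abs(x)))

# piped: read from top to bottom, in the order the steps happen
piped <- x %>%
  abs() %>%
  sum() %>%
  sqrt()

nested  # 4
piped   # 4
```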

Working with Dataframes in R

  • Now let’s return to the previous example (instead of df$month <- "july", we would write df %>% mutate(month = "july"))
  • Given what we know about %>%, what would be another way to write df %>% mutate(month = "july")?

Working with Dataframes in R

  • %>% puts whatever precedes it into the first argument of the function that follows it
  • df %>% mutate(month = "july") is equivalent to mutate(df, month = "july")
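We can check this equivalence directly with identical(). A small sketch, recreating the year/temperature dataframe from earlier so the block runs on its own:

```r
library(dplyr)

df <- data.frame(year = 2000:2020, temp = 40:60)

piped  <- df %>% mutate(month = "july")  # df goes into mutate()'s first argument
direct <- mutate(df, month = "july")     # same call, written out explicitly

identical(piped, direct)  # TRUE
```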

Working with Dataframes in R

  • Another example
# first, let's write a simple function
add_2 <- function(x){
  return(x+2)
}

# check that it is working
add_2(5)
## [1] 7
  • How would we write the above statement using a pipe?

Working with Dataframes in R

# now try the pipe method
5 %>% add_2()
## [1] 7
# try adding 2, twice
5 %>%
  add_2() %>%
  add_2()
## [1] 9

Working with Dataframes in R

  • Returning to the function we saw earlier:
  • mutate() is a function we will use a lot; it creates new variables and modifies existing ones in a dataframe
  • Let’s look at the mutate help page (run ?mutate)

Working with Dataframes in R

And now, returning to the dataframe from earlier:

# look at the first 6 rows
head(df)
##   year temp
## 1 2000   40
## 2 2001   41
## 3 2002   42
## 4 2003   43
## 5 2004   44
## 6 2005   45
  • What if we wanted to change the temperatures from Celsius to Fahrenheit?
  • Discuss in groups how this should be done.

Working with Dataframes in R

library(dplyr)

# function to get from C to F
c_to_f <- function(c){
  f <- 9/5*c+32
  return(f)
}

# run function on temperature variable to create new variable
df %>%
  mutate(temp_f = c_to_f(temp)) %>%
  head() 
##   year temp temp_f
## 1 2000   40  104.0
## 2 2001   41  105.8
## 3 2002   42  107.6
## 4 2003   43  109.4
## 5 2004   44  111.2
## 6 2005   45  113.0
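The formula inside c_to_f() is F = 9/5 × C + 32. A quick sanity check for any conversion function is to try reference points we already know: water freezes at 0 °C (32 °F) and boils at 100 °C (212 °F), and -40 is the same on both scales. A sketch:

```r
# function to get from C to F (same as above)
c_to_f <- function(c){
  f <- 9/5*c+32
  return(f)
}

# known reference points: freezing, boiling, and the -40 crossover
c_to_f(c(0, 100, -40))
## [1]  32 212 -40
```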

Working with Dataframes in R

  • Let’s look at our dataframe now
  • No temp_f! What happened?
# take a look at the first 6 rows, again
head(df)
##   year temp
## 1 2000   40
## 2 2001   41
## 3 2002   42
## 4 2003   43
## 5 2004   44
## 6 2005   45

Working with Dataframes in R

  • To permanently assign a new column to a dataframe, we would need to say df <- df %>% mutate(temp_f = c_to_f(temp))
  • In this case, df$temp_f <- c_to_f(df$temp) would also work
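  • There is another pipe that is useful
  • The magrittr package also provides an assignment pipe, written %<>%, which pipes a dataframe into a function and assigns the result back to it
  • dplyr does not load %<>%, so we need library(magrittr) first
  • With it, df %<>% mutate(temp_f = c_to_f(temp)) is equivalent to the two approaches above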
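A sketch comparing the three approaches on this slide, assuming both dplyr and magrittr are installed (%<>% comes from magrittr and is not re-exported by dplyr). Each version starts from its own copy of the data so the resulting temp_f columns can be compared:

```r
library(dplyr)
library(magrittr)  # needed for the assignment pipe %<>%

# function to get from C to F (same as before)
c_to_f <- function(c){
  f <- 9/5*c+32
  return(f)
}

df <- data.frame(year = 2000:2020, temp = 40:60)

# 1. pipe, then assign the result back explicitly
df1 <- df
df1 <- df1 %>% mutate(temp_f = c_to_f(temp))

# 2. base R, using $
df2 <- df
df2$temp_f <- c_to_f(df2$temp)

# 3. assignment pipe: pipes df3 in and writes the result back to df3
df3 <- df
df3 %<>% mutate(temp_f = c_to_f(temp))

# all three produce the same new column
identical(df1$temp_f, df2$temp_f)  # TRUE
identical(df1$temp_f, df3$temp_f)  # TRUE
```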

Readings

  • In small groups, discuss:
  • What is risk?
  • What is resilience?

Risk and Resilience

  • Risks are probabilities that people or places will be affected by events such as natural disasters.
  • Risks may be codified (flood plains) or not, and may affect specific populations (sea level rise) or not.
  • Resilience is the ability to withstand or avoid the negative consequences of disasters.
  • Resilience may vary across demographics, but may also depend on social networks and neighborhoods.
  • Sometimes resilience is the ability to leave, other times it is the ability to stay put.

Downloading Books

  • We’ll begin our text analysis journey with the gutenbergr R package
  • We can load this package and read in a book from the Project Gutenberg collection

Downloading Books

  • We’ll try a 1913 book called “Our Vanishing Wild Life: Its Extermination and Preservation.”
#install.packages("gutenbergr")
library(gutenbergr)

# download the book; the number (13249) is the book's e-book number on the Project Gutenberg website
vanishing_wl <- gutenberg_download(c(13249), meta_fields = "title")

Downloading Books

  • Activity: pick another book from the gutenbergr list and try downloading it into your R environment!
  • https://www.gutenberg.org/ebooks/
  • Use gutenberg_download() with the E-book number

Downloading Books

  • Try looking at your data (it might be a bit of a mess!)
  • There is information in the course reader about organizing these types of data
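Downloaded books arrive with one line of text per row, including many blank lines. As a sketch of one possible first cleaning step, the block below uses a tiny hand-made stand-in for a download (the text lines are made up for illustration; a real gutenberg_download() result has a gutenberg_id column, a text column, and any meta_fields you requested) and drops the empty lines with dplyr's filter():

```r
library(dplyr)

# hypothetical stand-in for a gutenberg_download() result
book <- data.frame(
  gutenberg_id = 13249,
  text = c("CHAPTER I", "", "Some opening line.", "", "", "Another line."),
  title = "Our Vanishing Wild Life: Its Extermination and Preservation"
)

book_clean <- book %>%
  filter(text != "") %>%          # drop blank lines
  mutate(line = row_number())     # number the remaining lines

book_clean
```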

APIs

  • Application Programming Interfaces (APIs) are another way to gather online data
  • They let us request data from a website’s servers directly from R, in a structured format, rather than copying it from the page by hand

APIs

  • Lucky for us, there are lots of R packages which help us pull data through APIs
  • We’ll look at two today: The Guardian (newspaper) and Google Trends

APIs

  • While we wait on The Guardian, let’s explore the gtrendsR package
library(gtrendsR)

APIs

  • We can look at search trends for specific topics over time
hur_wf <- gtrends(c("wildfire", "hurricane"), 
                  geo = c("US"))

APIs

  • We can also easily plot our results!

APIs

  • Another example

APIs

  • Activity: in groups, pick a couple of keywords and a location to look at Google Trends data (e.g. “smoke”, “fireworks”, “smog”; “US-CA”).
  • Plot your results! (just use plot())
  • Use ?gtrends for more information on the function.

APIs

  • For the Guardian API, we will use guardianapi
  • Install and load the package with library()
library(guardianapi)
  • Next, you will run gu_api_key() and enter your API Key

APIs

  • The Guardian API lets us pull the full text of news articles and blogs within topics and date ranges that we specify.
  • For example, we can pull articles on the recent Canadian wildfires and smoke in New York City with the following code:
ca_wf <- gu_content('"Canada" AND "wildfire" AND "smoke" AND "air quality" AND "New York City"',
                    from_date = "2023-06-01")

APIs

  • Activity: in groups, try pulling recent Guardian articles on topics of your choosing (e.g. “heat”, “smog”, “air quality”)
  • What do we notice about the data?

APIs

  • There are lots of variables!
  • We can keep only the columns we need with dplyr’s select() function
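A sketch of select() on a small made-up table. The column names here are assumptions chosen to resemble article metadata, not a guaranteed list of what gu_content() returns; run names() on your own results to see the real columns before selecting.

```r
library(dplyr)

# hypothetical article data; real Guardian results have many more columns
articles <- data.frame(
  id = c("a1", "a2"),
  web_title = c("Headline one", "Headline two"),
  web_publication_date = c("2023-06-07", "2023-06-08"),
  section_id = c("world", "us-news"),
  body_text = c("Full text one...", "Full text two...")
)

# keep only the columns we plan to analyze, in this order
articles_small <- articles %>%
  select(web_title, web_publication_date, body_text)

names(articles_small)
```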

Web Scraping

  • We can also pull information directly from websites!
  • However, we should be careful: check a site’s terms of service and robots.txt first, since scraping is sometimes prohibited or even illegal

Web Scraping

  • For the most part, scraping Wikipedia shouldn’t be a problem
  • We’ll use the rvest package to help us scrape Wikipedia data
library(rvest)

Web Scraping

  • We can read this page’s HTML
  • We can think of HTML as the language behind a website, which tells your browser the meaning and structure of the site
# read in html
us_disasters <- read_html("https://en.wikipedia.org/wiki/List_of_natural_disasters_in_the_United_States")

# take a look
us_disasters
## {html_document}
## <html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-enabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled vector-feature-zebra-design-disabled" lang="en" dir="ltr">
## [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
## [2] <body class="skin-vector skin-vector-search-vue mediawiki ltr sitedir-ltr ...

Web Scraping

  • For now, we will not clean these data
  • But there is information in the course reader on how to do this
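The code that produced the scraped table that follows is not shown. One common rvest pattern is to select a table element and convert it with html_table(). The sketch below runs offline on a tiny made-up HTML string (the rows are placeholders, not real disaster records); on the Wikipedia page you would start from us_disasters instead and check which table you want, since a page may contain more than one.

```r
library(rvest)  # also provides %>%

# a tiny, made-up HTML table standing in for the Wikipedia page
page <- read_html("
  <table>
    <tr><th>Year</th><th>Disaster</th></tr>
    <tr><td>2001</td><td>Example flood</td></tr>
    <tr><td>2002</td><td>Example storm</td></tr>
  </table>")

# find the first <table> element and convert it to a data frame (a tibble)
tab <- page %>%
  html_element("table") %>%
  html_table()

tab
```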

Web Scraping

##   Year                                   Disaster Death.toll Damage.costUS.
## 1 2023                           Tornado outbreak         33   $4.3 billion
## 2 2023                           Tornado outbreak         25   $1.9 billion
## 3 2023              Flooding and Tornado outbreak         13   $4.5 billion
## 4 2023 Derecho, Tornado outbreak and Winter storm         14               
## 5 2022                               Winter storm        106   $5.4 billion
## 6 2022                                 Earthquake          2               
##                                    Main.article
## 1  Tornado outbreak of March 31 – April 1, 2023
## 2         Tornado outbreak of March 24–27, 2023
## 3 Early-March 2023 North American storm complex
## 4    February 2023 North American storm complex
## 5     December 2022 North American winter storm
## 6                      2022 Ferndale earthquake
##                                                                                                                               Location
## 1                                                                                     Southern United States, Midwestern United States
## 2                                                                                                               Southern United States
## 3                                                                               Southwestern United States, Southeastern United States
## 4                                                           Western United States, Southern United States and Midwestern United States
## 5 Western United States, Midwestern United States, Great Lakes region (especially the Buffalo-Niagara Falls metropolitan area), Canada
## 6                                                                                               North Coast, California, United States
##   Notes
## 1      
## 2      
## 3      
## 4      
## 5      
## 6

Biodiversity

  • Biodiversity loss is the reduction of genetic diversity, species, and traits in an ecosystem.
  • Many scientists argue that we are currently experiencing Earth’s sixth mass extinction event.

Locke et al., 2021

Problem Set 2

  • Due on Monday!
  • Will post grading rubric
  • Use R Markdown or Jupyter notebooks to format nicely!
  • Come to office hours with questions, or schedule time to meet