Helpful tip: Putting a # followed by a space at the start of a line (not within a code block) in an R Markdown document creates a header. Click the “Outline” tab in the top right corner of the source pane to jump between your headers.

set working directory

Remember, the path specified below will look different in your script since you’re not working on my computer!

setwd("~/Users/shanaya/Documents/POL3325G Data Science Winter 2025/Lectures/Lecture 3")

load packages

library(rio)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

import data, save it as an object

You can call your object whatever you’d like! Here, I save my data as an object called “dat”. Just save it as something so that it appears in the global environment.

dat <- import("federal-candidates-2023-subset.dta") 

Question 1:

Using the federal candidates dataset that we imported into R during this lesson, subset the dataframe to include only the following variables: the ID variable, election date, candidate name, and occupation.

Solution:

dat2 <- dat %>%
  select(id, edate, candidate_name, occupation)

Above, I use select() to keep only the specified columns (id, edate, candidate_name, occupation). You may have had to look at the names of the variables using the names() function to figure out the exact names of the ID and election date variables.
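If you are unsure of the exact column names, names() will list them for you. Here is a minimal sketch on a small made-up data frame; the object `toy` and every value in it are hypothetical stand-ins for dat, not rows from the real dataset:

```r
library(dplyr)

# Hypothetical stand-in for dat; all values are made up for illustration
toy <- data.frame(
  id = c(1, 2, 3),
  edate = c("2008-10-14", "2011-05-02", "2011-05-02"),
  candidate_name = c("DOE, Jane", "ROE, Richard", "POE, Alex"),
  occupation = c("teacher", "lawyer", "farmer"),
  province = c("Ontario", "Quebec", "Alberta")
)

# List the column names to find the exact spelling of each variable
names(toy)

# Keep only the four columns asked for in Question 1
toy2 <- toy %>%
  select(id, edate, candidate_name, occupation)

# Confirm which columns are left
names(toy2)
```

Running names() before and after select() is a quick way to confirm that you kept exactly the columns you intended.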

Question 2:

Subset your dataframe to keep only those candidates who ran in the 2011 election. (HINT: edate == "2011-05-02")

Solution:

dat3 <- dat2 %>% filter(edate == "2011-05-02")

Above, I filter the data to keep only the candidates for the 2011 federal election by specifying the date of the election. Notice that I had to put the date in quotation marks, because the edate variable is of the class “character”.
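To see why the quotation marks matter, it can help to check how the column is stored before writing the comparison. The sketch below uses a hypothetical data frame `toy2` with made-up values, and assumes (as described above) that edate is stored as text:

```r
library(dplyr)

# Hypothetical stand-in for dat2; values are made up for illustration
toy2 <- data.frame(
  id = c(1, 2, 3),
  edate = c("2008-10-14", "2011-05-02", "2011-05-02"),
  stringsAsFactors = FALSE
)

# Check how the column is stored before filtering on it
class(toy2$edate)

# With quotes, the date is compared as text against each value of edate
toy3 <- toy2 %>% filter(edate == "2011-05-02")
nrow(toy3)

# Without quotes, R reads the hint as arithmetic: 2011 minus 5 minus 2
unquoted <- 2011-05-02
unquoted
```

Because the unquoted version is just a subtraction, it would be compared to edate as a number rather than as a date, so no rows would match.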

Question 3:

Sort your dataframe by province.

Solution:

dat3 %>% arrange(province) # this won't work! See explanation below

Uh oh! We can’t sort by province because we did not keep the province variable when we first subset the columns of our dataset using select() in Question 1. If we wanted to sort by province, we could add province to that select() call, rerun the filter, and then sort the data.
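One way to make the sort work is to keep province from the start and then chain the steps together. This is only a sketch on a hypothetical data frame (`toy` and its values are made up), and it assumes the real dataset has a column named province:

```r
library(dplyr)

# Hypothetical stand-in for dat; values are made up for illustration
toy <- data.frame(
  id = c(1, 2, 3),
  edate = c("2008-10-14", "2011-05-02", "2011-05-02"),
  candidate_name = c("DOE, Jane", "ROE, Richard", "POE, Alex"),
  occupation = c("teacher", "lawyer", "farmer"),
  province = c("Ontario", "Quebec", "Alberta")
)

toy_sorted <- toy %>%
  # keep province this time so it is available for sorting
  select(id, edate, candidate_name, occupation, province) %>%
  # keep only the 2011 election
  filter(edate == "2011-05-02") %>%
  # sort rows alphabetically by province
  arrange(province)

toy_sorted
```

To sort in reverse alphabetical order instead, you could wrap the column in desc(), as in arrange(desc(province)).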

Question 4:

Rename the occupation variable to ‘job’.

Solution:

dat3 <- rename(dat3, job = occupation)

You can check this worked by running:

head(dat3)
##      id      edate        candidate_name               job
## 1 31383 2011-05-02   WRZESNEWSKYJ, Borys   parliamentarian
## 2 32629 2011-05-02 LAFORESTERIE, Francis    sales director
## 3  3988 2011-05-02          CLEARY, Ryan journalist/writer
## 4 32346 2011-05-02           GRANT, Lisa household manager
## 5  6074 2011-05-02          DUNCAN, John   parliamentarian
## 6  9490 2011-05-02         HOBACK, Randy   parliamentarian

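Alternatively, you could rename the variable by using the rename() function within a pipe. Use this instead of the line above, not in addition to it: once occupation has been renamed to job, running rename() on occupation again will return an error because that column no longer exists.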

dat3 <- dat3 %>%
  rename(job = occupation)
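Whichever version you use, the pattern inside rename() is new_name = old_name. A short sketch on a hypothetical data frame (`toy3` and its values are made up) shows one way to confirm the change with names():

```r
library(dplyr)

# Hypothetical stand-in for dat3; values are made up for illustration
toy3 <- data.frame(
  id = c(1, 2),
  occupation = c("teacher", "lawyer")
)

# New name goes on the left, existing name on the right
toy3 <- toy3 %>%
  rename(job = occupation)

# The column list should now show job and no longer show occupation
names(toy3)
"occupation" %in% names(toy3)
```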