Set working directory

setwd("~/Users/shanaya/Documents/POL3325G Data Science Winter 2025/Lectures/Lecture 4")

Note: it is totally fine if you set the working directory by navigating to Session -> Set Working Directory (and you do not have a line of code in your script/R Markdown for setwd()). The code block above shows the code that executes in the console when you manually set the working directory.
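If you are unsure whether the working directory was set correctly, a quick check along these lines can help. This is a minimal sketch using base R only; the ".dta" pattern is just an assumption about the kind of file you are looking for.

```r
# Print the folder R is currently treating as the working directory
wd <- getwd()
print(wd)

# List any Stata (.dta) files in that folder; an empty result means the
# data file is not there (or the working directory is not what you expect)
dta_files <- list.files(wd, pattern = "\\.dta$")
print(dta_files)
```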

Load packages

library(rio)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Question 1.

Import the federal candidates data. Remember, you can either keep a copy of the data in your working directory and import it in the usual way:

dat <- import("federal-candidates-2023-subset.dta") 

Or you can import the data by specifying the full file path to the data in another folder:

dat <- import("/Users/shanayavanhooren/Documents/POL3325G/Lectures/Lecture 2/federal-candidates-2023-subset.dta")

It is totally normal for your file path to look different from mine because you’re not working on my computer!
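One way to avoid typos in a long file path is to build it from pieces and confirm that the file exists before importing. The sketch below is illustrative only: the folder name "data" is a hypothetical placeholder, not a folder from the course materials, so substitute your own location.

```r
# "data" is a hypothetical folder name; replace it with your own folder
data_dir  <- "data"
data_file <- "federal-candidates-2023-subset.dta"

# file.path() joins the pieces with a forward slash
data_path <- file.path(data_dir, data_file)

# Only attempt the import if the file is really there
if (file.exists(data_path)) {
  dat <- rio::import(data_path)
} else {
  message("File not found: ", data_path, " (check the path or getwd())")
}
```

If the message appears, the usual culprits are a misspelled folder name or a working directory that is not what you expected.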

Question 2.

Create a new variable called “region”. The “region” variable should include the categories: West (all Western Canada provinces), Central (Ontario and Quebec), East (Newfoundland, New Brunswick, Nova Scotia, PEI). The territories can be coded as NA. The variable should be a factor and the order of the categories should be West, Central, East. What functions did you use? How did you check your work? (HINT: table())

table(dat$province) # here, I am just checking how the province variable is coded
## 
##                   Alberta          British Columbia                  Manitoba 
##                       962                      1234                       431 
##             New Brunswick Newfoundland and Labrador     Northwest Territories 
##                       265                       179                        25 
##               Nova Scotia                   Nunavut                   Ontario 
##                       301                        22                      3467 
##      Prince Edward Island                    Quebec              Saskatchewan 
##                        99                      2615                       376 
##                     Yukon 
##                        25
dat <- dat %>%
  mutate(region =
           case_match(
             province,
             c("Alberta", "British Columbia", "Manitoba", "Saskatchewan") ~ "West",
             c("Ontario", "Quebec") ~ "Central",
             c("Newfoundland and Labrador", "New Brunswick", "Nova Scotia", "Prince Edward Island") ~ "East",
             .default = NA
           )) %>%
  mutate(region = factor(region, levels = c("West", "Central", "East")))

table(dat$province, dat$region)
##                            
##                             West Central East
##   Alberta                    962       0    0
##   British Columbia          1234       0    0
##   Manitoba                   431       0    0
##   New Brunswick                0       0  265
##   Newfoundland and Labrador    0       0  179
##   Northwest Territories        0       0    0
##   Nova Scotia                  0       0  301
##   Nunavut                      0       0    0
##   Ontario                      0    3467    0
##   Prince Edward Island         0       0   99
##   Quebec                       0    2615    0
##   Saskatchewan               376       0    0
##   Yukon                        0       0    0

I used the .default argument within case_match() and set it to NA so that any other (unspecified) categories (the territories) are coded as NA. Note that table() drops NA values by default, which is why the territories show zeros in every region column above. To be extra sure that this worked as expected, you might ask R to return the rows of the dat object where province is Yukon (one of the territories) and compare how those rows are coded in the region variable with how they are coded in the province variable.

dat %>% 
  filter(province == "Yukon")
##       id parliament year     candidate_name      edate birth_year country_birth
## 1   3218         38 2004     CAPP, Geoffrey 2004-06-28         NA              
## 2   8432         39 2006      GREETHAM, Sue 2006-01-23         NA              
## 3   2299         39 2006         BOYDE, Pam 2006-01-23         NA              
## 4  20902         39 2006  LEBLOND, Philippe 2006-01-23         NA              
## 5    696         39 2006     BAGNELL, Larry 2006-01-23       1949              
## 6  25318         40 2008  PASLOSKI, Darrell 2008-10-14         NA              
## 7   1938         40 2008        BOLTON, Ken 2008-10-14         NA              
## 8    696         40 2008     BAGNELL, Larry 2008-10-14       1949              
## 9  29048         40 2008    STREICKER, John 2008-10-14         NA              
## 10 29048         41 2011    STREICKER, John 2011-05-02         NA              
## 11 31693         41 2011        BARR, Kevin 2011-05-02         NA              
## 12 32688         41 2011         LEEF, Ryan 2011-05-02       1973              
## 13   696         41 2011     BAGNELL, Larry 2011-05-02         NA              
## 14 32059         42 2015     DE JONG, Frank 2015-10-19         NA              
## 15   696         42 2015     BAGNELL, Larry 2015-10-19       1949              
## 16 31655         42 2015  ATKINSON, Melissa 2015-10-19         NA              
## 17 32688         42 2015         LEEF, Ryan 2015-10-19         NA              
## 18 35165         43 2019    Zelezny, Joseph 2019-10-21         NA              
## 19 34213         43 2019     Morris, Lenore 2019-10-21         NA              
## 20   696         43 2019     Bagnell, Larry 2019-10-21       1949        Canada
## 21 34889         43 2019   Lemphers, Justin 2019-10-21         NA              
## 22 35325         43 2019 Smith, Jonas Jacot 2019-10-21         NA              
## 23 35325         44 2021 Smith, Jonas Jacot 2021-09-20         NA              
## 24 34213         44 2021     Morris, Lenore 2021-09-20         NA              
## 25 36018         44 2021    Hanley, Brendan 2021-09-20         NA              
##                           occupation riding_id riding province votes
## 1    customer service representative        NA  YUKON    Yukon   100
## 2                       entrepreneur     60001  YUKON    Yukon  3341
## 3           businesswoman/consultant     60001  YUKON    Yukon  3366
## 4        bicycle shop owner/sculptor     60001  YUKON    Yukon   559
## 5                    parliamentarian     60001  YUKON    Yukon  6847
## 6           pharmacist - businessman     60001  YUKON    Yukon  4788
## 7                            retired     60001  YUKON    Yukon  1276
## 8                    parliamentarian     60001  YUKON    Yukon  6715
## 9              professional engineer     60001  YUKON    Yukon  1881
## 10                          engineer     60001  YUKON    Yukon  3037
## 11                     self-employed     60001  YUKON    Yukon  2308
## 12                     civil servant     60001  YUKON    Yukon  5422
## 13                   parliamentarian     60001  YUKON    Yukon  5290
## 14                           teacher     60001  YUKON    Yukon   533
## 15                           retired     60001  YUKON    Yukon 10887
## 16                            lawyer     60001  YUKON    Yukon  3943
## 17                   parliamentarian     60001  YUKON    Yukon  4928
## 18 IT Consultant and Product Manager      6001  Yukon    Yukon   284
## 19                            Lawyer      6001  Yukon    Yukon  2201
## 20                   Parliamentarian      6001  Yukon    Yukon  7034
## 21                   Labour Official      6001  Yukon    Yukon  4617
## 22                Executive Director      6001  Yukon    Yukon  6881
## 23                        Consultant     60001  Yukon    Yukon  2639
## 24                            Lawyer     60001  Yukon    Yukon   846
## 25                         Physician     60001  Yukon    Yukon  6471
##    percent_votes                          party_raw  party_minor_group
## 1       0.798212 Christian Heritage Party of Canada Christian_Heritage
## 2      23.673208       Conservative Party of Canada       Conservative
## 3      23.850351               New Democratic Party                NDP
## 4       3.960887              Green Party of Canada              Green
## 5      48.515553                            Liberal            Liberal
## 6      32.660301       Conservative Party of Canada       Conservative
## 7       8.703957               New Democratic Party                NDP
## 8      45.804913                            Liberal            Liberal
## 9      12.830832              Green Party of Canada              Green
## 10     18.913870              Green Party of Canada              Green
## 11     14.373794               New Democratic Party                NDP
## 12     33.767204       Conservative Party of Canada       Conservative
## 13     32.945133                            Liberal            Liberal
## 14      2.626780              Green Party of Canada              Green
## 15     53.654331                            Liberal            Liberal
## 16     19.432261               New Democratic Party                NDP
## 17     24.286629       Conservative Party of Canada       Conservative
## 18      1.351287           People's Party of Canada                PPC
## 19     10.472475              Green Party of Canada              Green
## 20     33.468143            Liberal Party of Canada            Liberal
## 21     21.967930               New Democratic Party                NDP
## 22     32.740162       Conservative Party of Canada       Conservative
## 23     13.598886                        Independent        Independent
## 24      4.359477              Green Party of Canada              Green
## 25     33.345356            Liberal Party of Canada            Liberal
##    party_major_group                gov_party_raw gov_minor_group
## 1        Third_Party      Liberal Party of Canada         Liberal
## 2       Conservative Conservative Party of Canada    Conservative
## 3            CCF_NDP Conservative Party of Canada    Conservative
## 4        Third_Party Conservative Party of Canada    Conservative
## 5            Liberal Conservative Party of Canada    Conservative
## 6       Conservative Conservative Party of Canada    Conservative
## 7            CCF_NDP Conservative Party of Canada    Conservative
## 8            Liberal Conservative Party of Canada    Conservative
## 9        Third_Party Conservative Party of Canada    Conservative
## 10       Third_Party Conservative Party of Canada    Conservative
## 11           CCF_NDP Conservative Party of Canada    Conservative
## 12      Conservative Conservative Party of Canada    Conservative
## 13           Liberal Conservative Party of Canada    Conservative
## 14       Third_Party      Liberal Party of Canada         Liberal
## 15           Liberal      Liberal Party of Canada         Liberal
## 16           CCF_NDP      Liberal Party of Canada         Liberal
## 17      Conservative      Liberal Party of Canada         Liberal
## 18       Third_Party      Liberal Party of Canada         Liberal
## 19       Third_Party      Liberal Party of Canada         Liberal
## 20           Liberal      Liberal Party of Canada         Liberal
## 21           CCF_NDP      Liberal Party of Canada         Liberal
## 22      Conservative      Liberal Party of Canada         Liberal
## 23       Independent      Liberal Party of Canada         Liberal
## 24       Third_Party      Liberal Party of Canada         Liberal
## 25           Liberal      Liberal Party of Canada         Liberal
##    gov_major_group num_candidates type_elxn elected incumbent gender lgbtq2_out
## 1          Liberal              6         1       0         0      0         NA
## 2     Conservative              4         1       0         0      1         NA
## 3     Conservative              4         1       0         0      1         NA
## 4     Conservative              4         1       0         0      0         NA
## 5     Conservative              4         1       1         1      0         NA
## 6     Conservative              4         1       0         0      0         NA
## 7     Conservative              4         1       0         0      0         NA
## 8     Conservative              4         1       1         1      0         NA
## 9     Conservative              4         1       0         0      0         NA
## 10    Conservative              4         1       0         0      0         NA
## 11    Conservative              4         1       0         0      0         NA
## 12    Conservative              4         1       1         0      0         NA
## 13    Conservative              4         1       0         1      0         NA
## 14         Liberal              4         1       0         0      0         NA
## 15         Liberal              4         1       1         0      0         NA
## 16         Liberal              4         1       0         0      1         NA
## 17         Liberal              4         1       0         1      0         NA
## 18         Liberal              5         1       0         0      0         NA
## 19         Liberal              5         1       0         0      1         NA
## 20         Liberal              5         1       1         1      0         NA
## 21         Liberal              5         1       0         0      0         NA
## 22         Liberal              5         1       0         0      0         NA
## 23         Liberal              5         1       0         0      0          0
## 24         Liberal              5         1       0         0      1          0
## 25         Liberal              5         1       1         0      0          0
##    indigenousorigins lawyer censuscategory acclaimed switcher
## 1                  0      0              6         0        0
## 2                  0      0              1         0        0
## 3                  0      0              1         0        0
## 4                  0      0              6         0        0
## 5                  0      0             10         0        0
## 6                  0      0              3         0        0
## 7                  0      0             NA         0        0
## 8                  0      0             10         0        0
## 9                  0      0              2         0        0
## 10                 0      0              2         0        0
## 11                 1      0             NA         0        0
## 12                 0      0              0         0        0
## 13                 0      0             10         0        0
## 14                 0      0              4         0        0
## 15                 0      0             NA         0        0
## 16                 1      1              4         0        0
## 17                 0      0             10         0        0
## 18                 0      0              0         0        0
## 19                 0      1              4         0        0
## 20                 0      0             10         0        0
## 21                 0      0              6         0        0
## 22                 0      0              0         0        0
## 23                 0      0              4         0        0
## 24                 0      1              4         0        0
## 25                 0      0              3         0        0
##    multiple_candidacy region
## 1                   0   <NA>
## 2                   0   <NA>
## 3                   0   <NA>
## 4                   0   <NA>
## 5                   0   <NA>
## 6                   0   <NA>
## 7                   0   <NA>
## 8                   0   <NA>
## 9                   0   <NA>
## 10                  0   <NA>
## 11                  0   <NA>
## 12                  0   <NA>
## 13                  0   <NA>
## 14                  0   <NA>
## 15                  0   <NA>
## 16                  0   <NA>
## 17                  0   <NA>
## 18                  0   <NA>
## 19                  0   <NA>
## 20                  0   <NA>
## 21                  0   <NA>
## 22                  0   <NA>
## 23                  0   <NA>
## 24                  0   <NA>
## 25                  0   <NA>
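Printing every column for the Yukon rows works, but a more compact check is to ask table() to show missing values. The following is a minimal sketch on a small made-up data frame (the five rows are invented for illustration and only three provinces are recoded, to keep it short); it assumes dplyr 1.1.0 or later for case_match().

```r
library(dplyr)

# Toy stand-in for dat: three provinces and two territories
toy <- data.frame(
  province = c("Alberta", "Ontario", "Yukon", "Nova Scotia", "Nunavut")
)

toy <- toy %>%
  mutate(
    region = case_match(
      province,
      "Alberta" ~ "West",
      "Ontario" ~ "Central",
      "Nova Scotia" ~ "East",
      .default = NA
    ),
    region = factor(region, levels = c("West", "Central", "East"))
  )

# useNA = "ifany" adds an <NA> column, so the territories are counted
# there instead of appearing as rows of zeros
check <- table(toy$province, toy$region, useNA = "ifany")
print(check)
```

With the course data, the same idea would be table(dat$province, dat$region, useNA = "ifany").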

I can double-check that my variable was coded as a factor with the proper levels:

class(dat$region) # should return "factor"
## [1] "factor"
levels(dat$region) # should return levels as West, Central, East
## [1] "West"    "Central" "East"
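If you would rather have the script stop with an error when the check fails, instead of reading the printed output yourself, something like the following works. It is a sketch on a made-up vector, not the course data.

```r
# Made-up values; the levels argument fixes the order West, Central, East
region <- factor(c("East", "West", NA, "Central"),
                 levels = c("West", "Central", "East"))

expected_levels <- c("West", "Central", "East")

# stopifnot() is silent when every condition is TRUE and errors otherwise
stopifnot(
  is.factor(region),
  identical(levels(region), expected_levels)
)

# The level order also controls the order in which table() reports counts
region_counts <- table(region)
print(region_counts)
```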

Question 3.

Create a variable called “sex” with the categories “male” and “female” that is based on the existing gender variable in the dataset. You may need to think creatively to try to figure out what the 0s and 1s in the gender column mean (which is female? which is male?).

First, I wonder what values the gender variable can take (the question hints at this: 0s and 1s). One way to check is to use the table() function.

table(dat$gender)
## 
##    0    1 
## 7086 2914

Yup, all zeroes and ones.

Next, I select only the columns of the dataset that I think might be useful for deciphering the gender variable, since there’s no codebook to tell us what 0 and 1 stand for. Is 0 male? Or is 0 female?

check_gender <- dat %>%
  select(id, year, candidate_name, province, incumbent, gender)

head(check_gender, 40)
##       id year         candidate_name         province incumbent gender
## 1   4443 2004       CÔTÉ, Jean-Guy           Quebec         0      0
## 2  24524 2004           NELSON, Erin British Columbia         0      0
## 3  20062 2004            KUNZ, Revel British Columbia         0      1
## 4  20861 2004       LE BEL, Benjamin           Quebec         0      0
## 5  19061 2004           KOSSICK, Don     Saskatchewan         0      0
## 6   2813 2004           BUORS, Chris         Manitoba         0      0
## 7   4318 2004         CORBIERE, Mark          Ontario         0      0
## 8   6320 2004       ELGERSMA, Steven          Ontario         0      0
## 9   3258 2004     CARIGNAN, Jean Guy           Quebec         1      0
## 10  9532 2004        HOFFMAN, Rachel           Quebec         0      1
## 11 27178 2004             ROSE, Phil          Ontario         0      0
## 12 30686 2004           WATSON, Jeff          Ontario         0      0
## 13 21944 2004         MACLEAN, Blair          Ontario         0      0
## 14  5173 2004          DE COSTE, Guy           Quebec         0      0
## 15 26715 2004        RICHARD, Lucien           Quebec         0      0
## 16  4797 2004            CURRIE, Bev     Saskatchewan         0      0
## 17 30225 2004     VAN OOSTEN, Andrew          Ontario         0      0
## 18 30270 2004     VELLACOTT, Maurice     Saskatchewan         1      0
## 19  1924 2004      BOLDUC, Christian           Quebec         0      0
## 20  7732 2004 GENEST, Claude William           Quebec         0      0
## 21  3520 2004  C. SCHERRER, Hélène           Quebec         1      1
## 22 29285 2004           TANGRI, Nina          Ontario         0      1
## 23  3828 2004     CHOUINARD, Normand           Quebec         0      0
## 24 28248 2004        SINCLAIR, Bruce          Alberta         0      0
## 25 27680 2004            SAVOY, Andy    New Brunswick         1      0
## 26  3686 2004         CHATTERS, Dave          Alberta         1      0
## 27 23489 2004           MCVICAR, Bob    New Brunswick         0      0
## 28 24016 2004              MOORE, Ed          Ontario         0      0
## 29  6366 2004           ELLIS, Peter          Ontario         0      0
## 30  3576 2004          CHAN, Shirley British Columbia         0      1
## 31 21023 2004      LÉGARÉ, Patrick           Quebec         0      0
## 32 19081 2004           KOVATCH, Moe     Saskatchewan         0      0
## 33  2894 2004           BURTON, Andy British Columbia         1      0
## 34 28975 2004        STINSON, Darrel British Columbia         1      0
## 35  1694 2004      BISSONNETTE, Lise           Quebec         0      1
## 36  1048 2004     BEAUCHAMP, Benoît           Quebec         0      0
## 37  1565 2004           BEZAN, James         Manitoba         0      0
## 38  6922 2004          FOGAL, Connie British Columbia         0      1
## 39 25673 2004        PETERSEN, Donna British Columbia         0      1
## 40 23600 2004       MERRIFIELD, Pete          Ontario         0      0

When I scroll through this subset of the data, I see that gender is mostly coded as 0. For the cases coded as 1, the candidates’ first names are ones like “Rachel”, “Shirley”, “Lise”, and “Donna”, so I am pretty confident that 1 = female.
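To make this eyeball check a little more systematic, one option is to pull out the given names and group them by gender code. The sketch below uses a handful of rows copied from the output above and assumes names follow the "SURNAME, Given" pattern; the first_name column is a helper introduced only for this illustration.

```r
# A few rows copied from the head() output above
toy <- data.frame(
  candidate_name = c("NELSON, Erin", "KUNZ, Revel", "HOFFMAN, Rachel",
                     "ROSE, Phil", "CHAN, Shirley", "BEZAN, James",
                     "PETERSEN, Donna"),
  gender = c(0, 1, 1, 0, 1, 0, 1)
)

# Drop everything up to and including the last comma (and any spaces after it)
toy$first_name <- sub("^.*,\\s*", "", toy$candidate_name)

# One vector of given names per gender code
names_by_code <- split(toy$first_name, toy$gender)
print(names_by_code)
```

Given names are only a rough signal (some, like "Erin", are used across genders), so this supports the guess rather than proving it.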

Now I’m ready to recode the gender variable using mutate() and case_match().

dat <- dat %>%
  mutate(sex = case_match(gender, 
                          1 ~ "female",
                          0 ~ "male"))

Let’s check our variable recoding:

table(dat$gender, dat$sex)
##    
##     female male
##   0      0 7086
##   1   2914    0

In the table above, we see that all of the observations coded 1 on the gender variable are labelled “female” on the sex variable, and all of the observations coded 0 are labelled “male”. We’re confident that the recoding worked!
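One caveat worth knowing: when no .default is supplied, case_match() returns NA for any value that is not listed. A small sketch with made-up values (the code 2 is hypothetical and does not appear in the course data) shows why counting NAs is a useful extra check after recoding.

```r
library(dplyr)

# Made-up gender codes; 2 is a hypothetical unexpected value
toy <- data.frame(gender = c(0, 1, 1, 0, 2))

toy <- toy %>%
  mutate(sex = case_match(gender,
                          1 ~ "female",
                          0 ~ "male"))

# The unmatched value becomes NA, so this count is greater than zero
n_missing <- sum(is.na(toy$sex))
print(n_missing)
```

With the course data, sum(is.na(dat$sex)) should be 0 as long as gender has no missing values, because table(dat$gender) showed only 0s and 1s (table() itself leaves NAs out unless you ask for them).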