setwd("~/Users/shanaya/Documents/POL3325G Data Science Winter 2025/Lectures/Lecture 4")
Note: it is totally fine if you set the working directory by
navigating to Session -> Set Working Directory and do not have a
setwd() line in your script/R Markdown. The code block above shows
the code that executes in the console when you manually set the
working directory.
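If you set the working directory through the menu, it can help to confirm where R is actually looking before you import anything. Here is a minimal sketch that uses only base R functions (nothing in it is specific to my folders):

```r
# Print the folder R currently treats as the working directory
current_dir <- getwd()
current_dir

# List the Stata (.dta) files R can see in that folder; the data file
# should appear here if it is saved in the working directory
dta_files <- list.files(pattern = "\\.dta$")
dta_files
```

If the data file does not show up in the list, the working directory is probably not the folder you think it is.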
library(rio)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Import the federal candidates data. Remember, you either need a copy of the data in your working directory to import it in the usual way:
dat <- import("federal-candidates-2023-subset.dta")
Or you can import the data by specifying the full path to the file in another folder:
dat <- import("/Users/shanayavanhooren/Documents/POL3325G/Lectures/Lecture 2/federal-candidates-2023-subset.dta")
It is totally normal for your path to look different from mine because you’re not working on my computer!
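If you are not sure whether a path is right, you can build it piece by piece and ask R whether the file exists before importing. This is only a sketch: the folder in data_dir is hypothetical, so swap in the folder where your own copy of the data lives.

```r
# Hypothetical folder: replace with the folder that holds YOUR copy of the data
data_dir <- file.path("~", "Documents", "POL3325G")
data_path <- file.path(data_dir, "federal-candidates-2023-subset.dta")

# Only attempt the import if the file is really there
if (file.exists(data_path)) {
  dat <- rio::import(data_path)
} else {
  message("No file found at: ", data_path)
}
```

file.path() inserts the slashes for you, which avoids typos when a path gets long.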
Create a new variable called “region”. The “region” variable should include the categories: West (all Western Canada provinces), Central (Ontario and Quebec), East (Newfoundland, New Brunswick, Nova Scotia, PEI). The territories can be coded as NA. The variable should be a factor and the order of the categories should be West, Central, East. What functions did you use? How did you check your work? (HINT: table())
table(dat$province) # here, I am just checking how the province variable is coded
##
##                   Alberta          British Columbia                  Manitoba
##                       962                      1234                       431
##             New Brunswick Newfoundland and Labrador     Northwest Territories
##                       265                       179                        25
##               Nova Scotia                   Nunavut                   Ontario
##                       301                        22                      3467
##      Prince Edward Island                    Quebec              Saskatchewan
##                        99                      2615                       376
##                     Yukon
##                        25
dat <- dat %>%
  mutate(region = case_match(
    province,
    c("Alberta", "British Columbia", "Manitoba", "Saskatchewan") ~ "West",
    c("Ontario", "Quebec") ~ "Central",
    c("Newfoundland and Labrador", "New Brunswick", "Nova Scotia", "Prince Edward Island") ~ "East",
    .default = NA
  )) %>%
  mutate(region = factor(region, levels = c("West", "Central", "East")))
table(dat$province, dat$region)
##
##                             West Central East
##   Alberta                    962       0    0
##   British Columbia          1234       0    0
##   Manitoba                   431       0    0
##   New Brunswick                0       0  265
##   Newfoundland and Labrador    0       0  179
##   Northwest Territories        0       0    0
##   Nova Scotia                  0       0  301
##   Nunavut                      0       0    0
##   Ontario                      0    3467    0
##   Prince Edward Island         0       0   99
##   Quebec                       0    2615    0
##   Saskatchewan               376       0    0
##   Yukon                        0       0    0
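Notice that the three territories show 0 in every column. That is because table() leaves NA values out by default, not because the cases disappeared. If you would rather see the missing values counted, one option is the useNA argument of table(). Here is a small sketch on made-up data (toy is a stand-in I created for illustration, not the course dataset):

```r
library(dplyr)

# Toy data standing in for dat: three provinces and two territories
toy <- tibble(province = c("Alberta", "Ontario", "Nova Scotia", "Yukon", "Nunavut"))

toy <- toy %>%
  mutate(region = case_match(
    province,
    "Alberta" ~ "West",
    "Ontario" ~ "Central",
    "Nova Scotia" ~ "East",
    .default = NA
  )) %>%
  mutate(region = factor(region, levels = c("West", "Central", "East")))

# useNA = "ifany" adds an <NA> column whenever region contains missing values
tab <- table(toy$province, toy$region, useNA = "ifany")
tab
```

In the toy table, Yukon and Nunavut each get a 1 in the NA column instead of a row of zeroes.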
I used the .default argument within case_match() and specified NA to
code any other (unspecified) categories (the territories) as NA. To be
extra sure that this worked as expected, you might ask R to return the
rows of the dat object where province is Yukon (one of the territories)
and compare how they are coded in the region variable with how they are
coded in the province variable.
dat %>%
  filter(province == "Yukon")
## id parliament year candidate_name edate birth_year country_birth
## 1 3218 38 2004 CAPP, Geoffrey 2004-06-28 NA
## 2 8432 39 2006 GREETHAM, Sue 2006-01-23 NA
## 3 2299 39 2006 BOYDE, Pam 2006-01-23 NA
## 4 20902 39 2006 LEBLOND, Philippe 2006-01-23 NA
## 5 696 39 2006 BAGNELL, Larry 2006-01-23 1949
## 6 25318 40 2008 PASLOSKI, Darrell 2008-10-14 NA
## 7 1938 40 2008 BOLTON, Ken 2008-10-14 NA
## 8 696 40 2008 BAGNELL, Larry 2008-10-14 1949
## 9 29048 40 2008 STREICKER, John 2008-10-14 NA
## 10 29048 41 2011 STREICKER, John 2011-05-02 NA
## 11 31693 41 2011 BARR, Kevin 2011-05-02 NA
## 12 32688 41 2011 LEEF, Ryan 2011-05-02 1973
## 13 696 41 2011 BAGNELL, Larry 2011-05-02 NA
## 14 32059 42 2015 DE JONG, Frank 2015-10-19 NA
## 15 696 42 2015 BAGNELL, Larry 2015-10-19 1949
## 16 31655 42 2015 ATKINSON, Melissa 2015-10-19 NA
## 17 32688 42 2015 LEEF, Ryan 2015-10-19 NA
## 18 35165 43 2019 Zelezny, Joseph 2019-10-21 NA
## 19 34213 43 2019 Morris, Lenore 2019-10-21 NA
## 20 696 43 2019 Bagnell, Larry 2019-10-21 1949 Canada
## 21 34889 43 2019 Lemphers, Justin 2019-10-21 NA
## 22 35325 43 2019 Smith, Jonas Jacot 2019-10-21 NA
## 23 35325 44 2021 Smith, Jonas Jacot 2021-09-20 NA
## 24 34213 44 2021 Morris, Lenore 2021-09-20 NA
## 25 36018 44 2021 Hanley, Brendan 2021-09-20 NA
## occupation riding_id riding province votes
## 1 customer service representative NA YUKON Yukon 100
## 2 entrepreneur 60001 YUKON Yukon 3341
## 3 businesswoman/consultant 60001 YUKON Yukon 3366
## 4 bicycle shop owner/sculptor 60001 YUKON Yukon 559
## 5 parliamentarian 60001 YUKON Yukon 6847
## 6 pharmacist - businessman 60001 YUKON Yukon 4788
## 7 retired 60001 YUKON Yukon 1276
## 8 parliamentarian 60001 YUKON Yukon 6715
## 9 professional engineer 60001 YUKON Yukon 1881
## 10 engineer 60001 YUKON Yukon 3037
## 11 self-employed 60001 YUKON Yukon 2308
## 12 civil servant 60001 YUKON Yukon 5422
## 13 parliamentarian 60001 YUKON Yukon 5290
## 14 teacher 60001 YUKON Yukon 533
## 15 retired 60001 YUKON Yukon 10887
## 16 lawyer 60001 YUKON Yukon 3943
## 17 parliamentarian 60001 YUKON Yukon 4928
## 18 IT Consultant and Product Manager 6001 Yukon Yukon 284
## 19 Lawyer 6001 Yukon Yukon 2201
## 20 Parliamentarian 6001 Yukon Yukon 7034
## 21 Labour Official 6001 Yukon Yukon 4617
## 22 Executive Director 6001 Yukon Yukon 6881
## 23 Consultant 60001 Yukon Yukon 2639
## 24 Lawyer 60001 Yukon Yukon 846
## 25 Physician 60001 Yukon Yukon 6471
## percent_votes party_raw party_minor_group
## 1 0.798212 Christian Heritage Party of Canada Christian_Heritage
## 2 23.673208 Conservative Party of Canada Conservative
## 3 23.850351 New Democratic Party NDP
## 4 3.960887 Green Party of Canada Green
## 5 48.515553 Liberal Liberal
## 6 32.660301 Conservative Party of Canada Conservative
## 7 8.703957 New Democratic Party NDP
## 8 45.804913 Liberal Liberal
## 9 12.830832 Green Party of Canada Green
## 10 18.913870 Green Party of Canada Green
## 11 14.373794 New Democratic Party NDP
## 12 33.767204 Conservative Party of Canada Conservative
## 13 32.945133 Liberal Liberal
## 14 2.626780 Green Party of Canada Green
## 15 53.654331 Liberal Liberal
## 16 19.432261 New Democratic Party NDP
## 17 24.286629 Conservative Party of Canada Conservative
## 18 1.351287 People's Party of Canada PPC
## 19 10.472475 Green Party of Canada Green
## 20 33.468143 Liberal Party of Canada Liberal
## 21 21.967930 New Democratic Party NDP
## 22 32.740162 Conservative Party of Canada Conservative
## 23 13.598886 Independent Independent
## 24 4.359477 Green Party of Canada Green
## 25 33.345356 Liberal Party of Canada Liberal
## party_major_group gov_party_raw gov_minor_group
## 1 Third_Party Liberal Party of Canada Liberal
## 2 Conservative Conservative Party of Canada Conservative
## 3 CCF_NDP Conservative Party of Canada Conservative
## 4 Third_Party Conservative Party of Canada Conservative
## 5 Liberal Conservative Party of Canada Conservative
## 6 Conservative Conservative Party of Canada Conservative
## 7 CCF_NDP Conservative Party of Canada Conservative
## 8 Liberal Conservative Party of Canada Conservative
## 9 Third_Party Conservative Party of Canada Conservative
## 10 Third_Party Conservative Party of Canada Conservative
## 11 CCF_NDP Conservative Party of Canada Conservative
## 12 Conservative Conservative Party of Canada Conservative
## 13 Liberal Conservative Party of Canada Conservative
## 14 Third_Party Liberal Party of Canada Liberal
## 15 Liberal Liberal Party of Canada Liberal
## 16 CCF_NDP Liberal Party of Canada Liberal
## 17 Conservative Liberal Party of Canada Liberal
## 18 Third_Party Liberal Party of Canada Liberal
## 19 Third_Party Liberal Party of Canada Liberal
## 20 Liberal Liberal Party of Canada Liberal
## 21 CCF_NDP Liberal Party of Canada Liberal
## 22 Conservative Liberal Party of Canada Liberal
## 23 Independent Liberal Party of Canada Liberal
## 24 Third_Party Liberal Party of Canada Liberal
## 25 Liberal Liberal Party of Canada Liberal
## gov_major_group num_candidates type_elxn elected incumbent gender lgbtq2_out
## 1 Liberal 6 1 0 0 0 NA
## 2 Conservative 4 1 0 0 1 NA
## 3 Conservative 4 1 0 0 1 NA
## 4 Conservative 4 1 0 0 0 NA
## 5 Conservative 4 1 1 1 0 NA
## 6 Conservative 4 1 0 0 0 NA
## 7 Conservative 4 1 0 0 0 NA
## 8 Conservative 4 1 1 1 0 NA
## 9 Conservative 4 1 0 0 0 NA
## 10 Conservative 4 1 0 0 0 NA
## 11 Conservative 4 1 0 0 0 NA
## 12 Conservative 4 1 1 0 0 NA
## 13 Conservative 4 1 0 1 0 NA
## 14 Liberal 4 1 0 0 0 NA
## 15 Liberal 4 1 1 0 0 NA
## 16 Liberal 4 1 0 0 1 NA
## 17 Liberal 4 1 0 1 0 NA
## 18 Liberal 5 1 0 0 0 NA
## 19 Liberal 5 1 0 0 1 NA
## 20 Liberal 5 1 1 1 0 NA
## 21 Liberal 5 1 0 0 0 NA
## 22 Liberal 5 1 0 0 0 NA
## 23 Liberal 5 1 0 0 0 0
## 24 Liberal 5 1 0 0 1 0
## 25 Liberal 5 1 1 0 0 0
## indigenousorigins lawyer censuscategory acclaimed switcher
## 1 0 0 6 0 0
## 2 0 0 1 0 0
## 3 0 0 1 0 0
## 4 0 0 6 0 0
## 5 0 0 10 0 0
## 6 0 0 3 0 0
## 7 0 0 NA 0 0
## 8 0 0 10 0 0
## 9 0 0 2 0 0
## 10 0 0 2 0 0
## 11 1 0 NA 0 0
## 12 0 0 0 0 0
## 13 0 0 10 0 0
## 14 0 0 4 0 0
## 15 0 0 NA 0 0
## 16 1 1 4 0 0
## 17 0 0 10 0 0
## 18 0 0 0 0 0
## 19 0 1 4 0 0
## 20 0 0 10 0 0
## 21 0 0 6 0 0
## 22 0 0 0 0 0
## 23 0 0 4 0 0
## 24 0 1 4 0 0
## 25 0 0 3 0 0
## multiple_candidacy region
## 1 0 <NA>
## 2 0 <NA>
## 3 0 <NA>
## 4 0 <NA>
## 5 0 <NA>
## 6 0 <NA>
## 7 0 <NA>
## 8 0 <NA>
## 9 0 <NA>
## 10 0 <NA>
## 11 0 <NA>
## 12 0 <NA>
## 13 0 <NA>
## 14 0 <NA>
## 15 0 <NA>
## 16 0 <NA>
## 17 0 <NA>
## 18 0 <NA>
## 19 0 <NA>
## 20 0 <NA>
## 21 0 <NA>
## 22 0 <NA>
## 23 0 <NA>
## 24 0 <NA>
## 25 0 <NA>
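Printing every column for all 25 rows works, but the region column ends up at the very bottom of the output. A more compact check is to count the combinations of province and region. This sketch runs on made-up data (toy is a stand-in for dat after region has been created):

```r
library(dplyr)

# Toy stand-in for dat: two Yukon cases (region is NA) and one Alberta case
toy <- tibble(
  province = c("Yukon", "Yukon", "Alberta"),
  region = factor(c(NA, NA, "West"), levels = c("West", "Central", "East"))
)

# One row per province/region combination, with the number of cases in n
yukon_check <- toy %>%
  filter(province == "Yukon") %>%
  count(province, region)
yukon_check
```

If the recode worked, the result is a single row where region is NA and n is the number of Yukon cases.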
I can double-check that my variable was coded as a factor with the proper levels:
class(dat$region) # should return "factor"
## [1] "factor"
levels(dat$region) # should return levels as West, Central, East
## [1] "West" "Central" "East"
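The levels argument matters because factor() otherwise sorts the categories alphabetically. Here is a quick illustration with a made-up vector:

```r
x <- c("West", "Central", "East")

# Without levels, R puts the categories in alphabetical order
default_levels <- levels(factor(x))
default_levels   # "Central" "East" "West"

# With levels, the categories keep the order we ask for
ordered_levels <- levels(factor(x, levels = c("West", "Central", "East")))
ordered_levels   # "West" "Central" "East"
```

The order of the levels controls how the categories are ordered in tables and plots, which is why the question asks for a specific order.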
Create a variable called “sex” with the categories “male” and “female” that is based on the existing gender variable in the dataset. You may need to think creatively to try to figure out what the 0s and 1s in the gender column mean (which is female? which is male?).
First, I wonder what values the gender variable can take (the
question hints at this - 0s and 1s). One way to check is to use the
table() function.
table(dat$gender)
##
##    0    1
## 7086 2914
Yup, all zeroes and ones.
Next, I select only the columns of the dataset that I think might be useful for deciphering the gender variable, since there’s no codebook to tell us what 0 and 1 stand for. Is 0 male? Or is 0 female?
check_gender <- dat %>%
  select(id, year, candidate_name, province, incumbent, gender)
head(check_gender, 40)
## id year candidate_name province incumbent gender
## 1 4443 2004 CÔTÉ, Jean-Guy Quebec 0 0
## 2 24524 2004 NELSON, Erin British Columbia 0 0
## 3 20062 2004 KUNZ, Revel British Columbia 0 1
## 4 20861 2004 LE BEL, Benjamin Quebec 0 0
## 5 19061 2004 KOSSICK, Don Saskatchewan 0 0
## 6 2813 2004 BUORS, Chris Manitoba 0 0
## 7 4318 2004 CORBIERE, Mark Ontario 0 0
## 8 6320 2004 ELGERSMA, Steven Ontario 0 0
## 9 3258 2004 CARIGNAN, Jean Guy Quebec 1 0
## 10 9532 2004 HOFFMAN, Rachel Quebec 0 1
## 11 27178 2004 ROSE, Phil Ontario 0 0
## 12 30686 2004 WATSON, Jeff Ontario 0 0
## 13 21944 2004 MACLEAN, Blair Ontario 0 0
## 14 5173 2004 DE COSTE, Guy Quebec 0 0
## 15 26715 2004 RICHARD, Lucien Quebec 0 0
## 16 4797 2004 CURRIE, Bev Saskatchewan 0 0
## 17 30225 2004 VAN OOSTEN, Andrew Ontario 0 0
## 18 30270 2004 VELLACOTT, Maurice Saskatchewan 1 0
## 19 1924 2004 BOLDUC, Christian Quebec 0 0
## 20 7732 2004 GENEST, Claude William Quebec 0 0
## 21 3520 2004 C. SCHERRER, Hélène Quebec 1 1
## 22 29285 2004 TANGRI, Nina Ontario 0 1
## 23 3828 2004 CHOUINARD, Normand Quebec 0 0
## 24 28248 2004 SINCLAIR, Bruce Alberta 0 0
## 25 27680 2004 SAVOY, Andy New Brunswick 1 0
## 26 3686 2004 CHATTERS, Dave Alberta 1 0
## 27 23489 2004 MCVICAR, Bob New Brunswick 0 0
## 28 24016 2004 MOORE, Ed Ontario 0 0
## 29 6366 2004 ELLIS, Peter Ontario 0 0
## 30 3576 2004 CHAN, Shirley British Columbia 0 1
## 31 21023 2004 LÉGARÉ, Patrick Quebec 0 0
## 32 19081 2004 KOVATCH, Moe Saskatchewan 0 0
## 33 2894 2004 BURTON, Andy British Columbia 1 0
## 34 28975 2004 STINSON, Darrel British Columbia 1 0
## 35 1694 2004 BISSONNETTE, Lise Quebec 0 1
## 36 1048 2004 BEAUCHAMP, Benoît Quebec 0 0
## 37 1565 2004 BEZAN, James Manitoba 0 0
## 38 6922 2004 FOGAL, Connie British Columbia 0 1
## 39 25673 2004 PETERSEN, Donna British Columbia 0 1
## 40 23600 2004 MERRIFIELD, Pete Ontario 0 0
When I scroll through this subset of the data, I see that gender is mostly coded as 0. For the cases coded as 1, the candidates’ first names include “Rachel”, “Shirley”, “Lise”, and “Donna”, so I am pretty confident that 1 = female.
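Rather than scrolling, you could also ask R to show only the names in one group. This sketch uses six names taken from the output above (check_toy is a made-up object for illustration):

```r
library(dplyr)

check_toy <- tibble(
  candidate_name = c("NELSON, Erin", "KUNZ, Revel", "HOFFMAN, Rachel",
                     "WATSON, Jeff", "CHAN, Shirley", "BEZAN, James"),
  gender = c(0, 1, 1, 0, 1, 0)
)

# Keep only the candidates coded 1 and pull out their names
names_coded_1 <- check_toy %>%
  filter(gender == 1) %>%
  pull(candidate_name)
names_coded_1
```

With the real data, swapping check_toy for check_gender should give the full list of names coded 1.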
Now I’m ready to recode the gender variable using mutate() and case_match().
dat <- dat %>%
  mutate(sex = case_match(gender,
                          1 ~ "female",
                          0 ~ "male"))
Let’s check our variable recoding:
table(dat$gender, dat$sex)
##
##     female male
##   0      0 7086
##   1   2914    0
In the table above, we see that all of the observations coded 1 on the gender variable are labelled “female” on the sex variable, and all of the observations coded 0 are labelled “male”. We’re confident that the recoding worked!
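One thing worth knowing about case_match(): any value that is not listed on the left-hand side of a formula becomes NA unless you set .default, and table() hides NA values unless you ask for them. The province counts earlier sum to 10,001 while the gender counts sum to 10,000, which suggests one candidate may be missing a value on gender. The sketch below uses a made-up vector (gender_toy is not from the dataset) to show how such cases can be made visible:

```r
library(dplyr)

# Made-up values: NA is a missing case, 2 is a deliberately unexpected code
gender_toy <- c(0, 1, 1, NA, 2)

sex_toy <- case_match(gender_toy,
                      1 ~ "female",
                      0 ~ "male")
sex_toy

# Including NA in the table makes unmatched and missing values visible
table(gender_toy, sex_toy, useNA = "ifany")
```

Both the NA and the 2 end up as NA on the new variable, so they appear in the NA column of the table.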