Assignment 3: Universal Basic Income

Instructor: Umair Durrani

A Case for Universal Basic Income in Canada

Watch a video on the topic of universal basic income in canada here. Think about the possible benefits and potential challenges in the implementation of this measure.

Install Required Libraries

You need to install the following packages for completing this assignment. Please run the following code in your console to install these packages.

install.packages("datapasta")
install.packages("janitor")
install.packages("kableExtra")
install.packages("RedditExtractoR")
install.packages("tidytext")

Loading Libraries

Getting Data

In this assignment, we are going to use the experimental estimates of household income well-being during the COVID-19 pandemic as released by Statistics Canada. The data tables from Statistics Canada contain information about various income and savings aspects of the population. The population is divided into quintiles as defined below:

A quintile is a statistical value of a data set that represents 20% of a given population, so the first quintile represents the lowest fifth of the data (1% to 20%); the second quintile represents the second fifth (21% to 40%) and so on

Source

We can extract the data from the Statistics Canada page using the datapasta package that you have just installed. The following animation shows how to use the datapasta package to copy and paste a table from web into R:

Try it on your own.

The result is shown below. I have named this new dataframe as income_data.

income_data <- data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
Row_Names = c(NA,"Household disposable income","Lowest income quintile",
         "Second income quintile","Third income quintile",
         "Fourth income quintile","Highest income quintile","Compensation of employees",
         "Lowest income quintile","Second income quintile","Third income quintile",
         "Fourth income quintile","Highest income quintile","Net mixed income",
         "Lowest income quintile","Second income quintile",
         "Third income quintile","Fourth income quintile","Highest income quintile",
         "Current transfers received","Lowest income quintile","Second income quintile",
         "Third income quintile","Fourth income quintile",
         "Highest income quintile","Household final consumption expenditure",
         "Lowest income quintile","Second income quintile","Third income quintile",
         "Fourth income quintile","Highest income quintile"),
`2019` = c("dollars","81,351","25,430","52,011","71,153",
             "95,200","162,963","74,834","10,980","33,699","62,253",
             "100,049","167,190","13,081","4,509","9,204","11,613","15,021",
             "25,057","22,348","19,106","24,061","24,142","22,101","22,331",
             "83,043","52,733","63,448","76,738","92,381","129,915"),
Quarter.1.2020 = c("dollars","19,958","6,129",
                     "12,847","17,648","23,188","39,979","18,646","2,644",
                     "8,228","15,418","24,870","42,069","3,112","1,026","2,164",
                     "2,732","3,582","6,056","6,083","4,969","6,449",
                     "6,611","5,904","6,480","19,388","15,050","15,460",
                     "18,285","21,124","27,023"),
Quarter.2.2020 = c("dollars","22,755","8,187",
                     "15,919","20,655","26,178","42,837","17,303","2,338",
                     "7,366","14,059","23,101","39,652","2,918","768","2,022",
                     "2,494","3,437","5,869","9,587","7,234","10,102",
                     "10,646","10,015","9,937","17,903","13,932","14,580",
                     "17,095","19,531","24,377"),
Quarter.3.2020 = c("dollars","23,576","8,386",
                     "15,798","21,109","27,392","45,198","18,555","2,596",
                     "8,167","15,148","24,869","41,994","3,924","1,646","2,805",
                     "3,636","4,419","7,112","7,759","6,252","8,182",
                     "8,355","7,686","8,323","20,488","15,660","16,415",
                     "19,484","22,436","28,446"),
`Q1.2020/2019` = c("% difference","-1.9","-3.6",
                     "-1.2","-0.8","-2.6","-1.9","-0.3","-3.7","-2.3","-0.9",
                     "-0.6","0.6","-4.8","-9.0","-6.0","-5.9","-4.6",
                     "-3.3","8.9","4.0","7.2","9.5","6.9","16.1","-6.6",
                     "14.2","-2.5","-4.7","-8.5","-16.8"),
`Q2.2020/Q1.2020` = c("% difference","14.0",
                        "33.6","23.9","17.0","12.9","7.1","-7.2","-11.6",
                        "-10.5","-8.8","-7.1","-5.7","-6.2","-25.1","-6.6",
                        "-8.7","-4.0","-3.1","57.6","45.6","56.6","61.0",
                        "69.6","53.3","-7.7","-7.4","-5.7","-6.5","-7.5",
                        "-9.8"),
`Q3.2020/Q2.2020` = c("% difference","3.6","2.4",
                        "-0.8","2.2","4.6","5.5","7.2","11.0","10.9",
                        "7.7","7.7","5.9","34.5","114.3","38.7","45.8",
                        "28.6","21.2","-19.1","-13.6","-19.0","-21.5","-23.3",
                        "-16.2","14.4","12.4","12.6","14.0","14.9","16.7")
) %>% 
  slice(-1) %>% 
  select(-c(6:8)) 

income_data  %>% kbl() %>% kable_paper("hover", full_width = F)
Row_Names 2019 Quarter.1.2020 Quarter.2.2020 Quarter.3.2020
Household disposable income 81,351 19,958 22,755 23,576
Lowest income quintile 25,430 6,129 8,187 8,386
Second income quintile 52,011 12,847 15,919 15,798
Third income quintile 71,153 17,648 20,655 21,109
Fourth income quintile 95,200 23,188 26,178 27,392
Highest income quintile 162,963 39,979 42,837 45,198
Compensation of employees 74,834 18,646 17,303 18,555
Lowest income quintile 10,980 2,644 2,338 2,596
Second income quintile 33,699 8,228 7,366 8,167
Third income quintile 62,253 15,418 14,059 15,148
Fourth income quintile 100,049 24,870 23,101 24,869
Highest income quintile 167,190 42,069 39,652 41,994
Net mixed income 13,081 3,112 2,918 3,924
Lowest income quintile 4,509 1,026 768 1,646
Second income quintile 9,204 2,164 2,022 2,805
Third income quintile 11,613 2,732 2,494 3,636
Fourth income quintile 15,021 3,582 3,437 4,419
Highest income quintile 25,057 6,056 5,869 7,112
Current transfers received 22,348 6,083 9,587 7,759
Lowest income quintile 19,106 4,969 7,234 6,252
Second income quintile 24,061 6,449 10,102 8,182
Third income quintile 24,142 6,611 10,646 8,355
Fourth income quintile 22,101 5,904 10,015 7,686
Highest income quintile 22,331 6,480 9,937 8,323
Household final consumption expenditure 83,043 19,388 17,903 20,488
Lowest income quintile 52,733 15,050 13,932 15,660
Second income quintile 63,448 15,460 14,580 16,415
Third income quintile 76,738 18,285 17,095 19,484
Fourth income quintile 92,381 21,124 19,531 22,436
Highest income quintile 129,915 27,023 24,377 28,446

Understanding the Variables in Row_Names

The following table shows the meanings of the variables:

Variable Description Source
Household Disposable Income The amount of money that a household has to spend or save after income taxes have been deducted Investopedia
Compensation of Employees The total gross (pre-tax) wages paid by employers to employees for work done in an accounting period, such as a quarter or a year Wikipedia
Net Mixed Income Mixed income refers to the net income of both farm and non-farm unincorporated businesses, representing income that is generated by these types of businesses in the production of goods and services Statistics Canada
Current Transfers Received Government transfers include all transfer payments from federal and provincial governments programs intended to provide income support to certain groups, such as seniors, families and those injured in the workplace Statistics Canada
Household Final Consumption Expenditure All types of expenditure that a household does. See the source for more info. Statistics Canada

Tidying Data

In the current format the income_data is human readable. This current format is the wide data format. But for data visualization with ggplot2, we need to tidy this dataset and change it into a long format. We are going to use the tidyr package from tidyverse for this task.

Step 1: Separate the quintile categories from other labels

In the Row_Names column, we see a mix of the quintiles and labels like Household Disposable Income, etc. Let’s separate them and clean the columns names:

income_data <- income_data %>% 
  mutate(categories = case_when(
    Row_Names %in% c("Lowest income quintile", "Second income quintile",
                     "Third income quintile", "Fourth income quintile",
                     "Highest income quintile") ~ NA_character_,
    TRUE ~ Row_Names
  )) %>% 
  fill(categories, .direction = "down") %>% 
  filter(Row_Names %in% c("Lowest income quintile", "Second income quintile",
                     "Third income quintile", "Fourth income quintile",
                     "Highest income quintile")) %>% 
  relocate(categories) %>% 
  clean_names()


income_data  %>% kbl() %>% kable_paper("hover", full_width = F)
categories row_names x2019 quarter_1_2020 quarter_2_2020 quarter_3_2020
Household disposable income Lowest income quintile 25,430 6,129 8,187 8,386
Household disposable income Second income quintile 52,011 12,847 15,919 15,798
Household disposable income Third income quintile 71,153 17,648 20,655 21,109
Household disposable income Fourth income quintile 95,200 23,188 26,178 27,392
Household disposable income Highest income quintile 162,963 39,979 42,837 45,198
Compensation of employees Lowest income quintile 10,980 2,644 2,338 2,596
Compensation of employees Second income quintile 33,699 8,228 7,366 8,167
Compensation of employees Third income quintile 62,253 15,418 14,059 15,148
Compensation of employees Fourth income quintile 100,049 24,870 23,101 24,869
Compensation of employees Highest income quintile 167,190 42,069 39,652 41,994
Net mixed income Lowest income quintile 4,509 1,026 768 1,646
Net mixed income Second income quintile 9,204 2,164 2,022 2,805
Net mixed income Third income quintile 11,613 2,732 2,494 3,636
Net mixed income Fourth income quintile 15,021 3,582 3,437 4,419
Net mixed income Highest income quintile 25,057 6,056 5,869 7,112
Current transfers received Lowest income quintile 19,106 4,969 7,234 6,252
Current transfers received Second income quintile 24,061 6,449 10,102 8,182
Current transfers received Third income quintile 24,142 6,611 10,646 8,355
Current transfers received Fourth income quintile 22,101 5,904 10,015 7,686
Current transfers received Highest income quintile 22,331 6,480 9,937 8,323
Household final consumption expenditure Lowest income quintile 52,733 15,050 13,932 15,660
Household final consumption expenditure Second income quintile 63,448 15,460 14,580 16,415
Household final consumption expenditure Third income quintile 76,738 18,285 17,095 19,484
Household final consumption expenditure Fourth income quintile 92,381 21,124 19,531 22,436
Household final consumption expenditure Highest income quintile 129,915 27,023 24,377 28,446

Step 2: Convert the character data into numeric values

The dollar values in each column are stored as character data (notice the commas). We use readr::parse_number() to convert them into numbers:

income_data <- income_data %>% 
  mutate(across(.cols = c(x2019, starts_with("quarter")), .fns = parse_number))

income_data  %>% kbl() %>% kable_paper("hover", full_width = F)
categories row_names x2019 quarter_1_2020 quarter_2_2020 quarter_3_2020
Household disposable income Lowest income quintile 25430 6129 8187 8386
Household disposable income Second income quintile 52011 12847 15919 15798
Household disposable income Third income quintile 71153 17648 20655 21109
Household disposable income Fourth income quintile 95200 23188 26178 27392
Household disposable income Highest income quintile 162963 39979 42837 45198
Compensation of employees Lowest income quintile 10980 2644 2338 2596
Compensation of employees Second income quintile 33699 8228 7366 8167
Compensation of employees Third income quintile 62253 15418 14059 15148
Compensation of employees Fourth income quintile 100049 24870 23101 24869
Compensation of employees Highest income quintile 167190 42069 39652 41994
Net mixed income Lowest income quintile 4509 1026 768 1646
Net mixed income Second income quintile 9204 2164 2022 2805
Net mixed income Third income quintile 11613 2732 2494 3636
Net mixed income Fourth income quintile 15021 3582 3437 4419
Net mixed income Highest income quintile 25057 6056 5869 7112
Current transfers received Lowest income quintile 19106 4969 7234 6252
Current transfers received Second income quintile 24061 6449 10102 8182
Current transfers received Third income quintile 24142 6611 10646 8355
Current transfers received Fourth income quintile 22101 5904 10015 7686
Current transfers received Highest income quintile 22331 6480 9937 8323
Household final consumption expenditure Lowest income quintile 52733 15050 13932 15660
Household final consumption expenditure Second income quintile 63448 15460 14580 16415
Household final consumption expenditure Third income quintile 76738 18285 17095 19484
Household final consumption expenditure Fourth income quintile 92381 21124 19531 22436
Household final consumption expenditure Highest income quintile 129915 27023 24377 28446

Step 3: Pivot longer

We now pivot the table from the wide to long format using the tidyr::pivot_longer():

income_data <- income_data %>% 
  rename(`2019` = x2019) %>% 
  pivot_longer(
    cols = c(`2019`, quarter_1_2020, quarter_2_2020, quarter_3_2020), 
    names_to = "quarter", 
    values_to = "dollars"
  ) 

income_data %>% head() %>% kbl() %>% kable_paper("hover", full_width = F)
categories row_names quarter dollars
Household disposable income Lowest income quintile 2019 25430
Household disposable income Lowest income quintile quarter_1_2020 6129
Household disposable income Lowest income quintile quarter_2_2020 8187
Household disposable income Lowest income quintile quarter_3_2020 8386
Household disposable income Second income quintile 2019 52011
Household disposable income Second income quintile quarter_1_2020 12847

Step 4: Pivot wider

We know that the dollar values correspond to the categories column. So, we can put those values in individual category columns. We also clean the column names and put the quarter column at the beginning of the dataset:

income_data <- income_data %>% 
  pivot_wider(names_from = categories, 
              values_from = dollars) %>% 
  clean_names() %>% 
  relocate(quarter)

income_data %>% head() %>% kbl() %>% kable_paper("hover", full_width = F)
quarter row_names household_disposable_income compensation_of_employees net_mixed_income current_transfers_received household_final_consumption_expenditure
2019 Lowest income quintile 25430 10980 4509 19106 52733
quarter_1_2020 Lowest income quintile 6129 2644 1026 4969 15050
quarter_2_2020 Lowest income quintile 8187 2338 768 7234 13932
quarter_3_2020 Lowest income quintile 8386 2596 1646 6252 15660
2019 Second income quintile 52011 33699 9204 24061 63448
quarter_1_2020 Second income quintile 12847 8228 2164 6449 15460

Step 5: Break the quarter column and make the quintile column a factor vector

Since the quarter column contains both the year and quarter, we can use the tidyr::separate() to break it into two columns.

income_data <- income_data %>% 
  separate(col = quarter,
           into = c("year", "quarter"), sep = "_") 

income_data %>% head() %>% kbl() %>% kable_paper("hover", full_width = F)
year quarter row_names household_disposable_income compensation_of_employees net_mixed_income current_transfers_received household_final_consumption_expenditure
2019 NA Lowest income quintile 25430 10980 4509 19106 52733
quarter 1 Lowest income quintile 6129 2644 1026 4969 15050
quarter 2 Lowest income quintile 8187 2338 768 7234 13932
quarter 3 Lowest income quintile 8386 2596 1646 6252 15660
2019 NA Second income quintile 52011 33699 9204 24061 63448
quarter 1 Second income quintile 12847 8228 2164 6449 15460
income_data <- income_data %>% 
  mutate(year = case_when(
    year == "quarter" ~ 2020,
    TRUE ~ as.numeric(year)
  )) %>% 
  rename(quintile = row_names) %>% 
  mutate(quintile = factor(quintile,
                           levels = c("Lowest income quintile", "Second income quintile",
                     "Third income quintile", "Fourth income quintile",
                     "Highest income quintile")))

income_data %>% head() %>% kbl() %>% kable_paper("hover", full_width = F)
year quarter quintile household_disposable_income compensation_of_employees net_mixed_income current_transfers_received household_final_consumption_expenditure
2019 NA Lowest income quintile 25430 10980 4509 19106 52733
2020 1 Lowest income quintile 6129 2644 1026 4969 15050
2020 2 Lowest income quintile 8187 2338 768 7234 13932
2020 3 Lowest income quintile 8386 2596 1646 6252 15660
2019 NA Second income quintile 52011 33699 9204 24061 63448
2020 1 Second income quintile 12847 8228 2164 6449 15460

Finally, we create a new column to keep both the year and quarter info in one column for the purpose of labelling in visualizations:

income_data <- income_data %>% 
  mutate(q_year = case_when(
    !is.na(quarter) ~ paste0("Q ", quarter, " ", year),
    is.na(quarter) ~ as.character(year)
  )) %>% 
  relocate(q_year)

income_data %>% head() %>% kbl() %>% kable_paper("hover", full_width = F)
q_year year quarter quintile household_disposable_income compensation_of_employees net_mixed_income current_transfers_received household_final_consumption_expenditure
2019 2019 NA Lowest income quintile 25430 10980 4509 19106 52733
Q 1 2020 2020 1 Lowest income quintile 6129 2644 1026 4969 15050
Q 2 2020 2020 2 Lowest income quintile 8187 2338 768 7234 13932
Q 3 2020 2020 3 Lowest income quintile 8386 2596 1646 6252 15660
2019 2019 NA Second income quintile 52011 33699 9204 24061 63448
Q 1 2020 2020 1 Second income quintile 12847 8228 2164 6449 15460

Task 1: Recreate the following plot using the income_data

Hints:

Task 2: Recreate the following plot using the income_data

Hints: - ggrepel package (you need to install it). Use geom_text_repel() instead of geom_text()

Task 3: Determine the change in the income in 2020:

According to the Statistics Canada article:

Over the first three quarters of 2020, disposable income for the lowest-income households increased 36.8%, more than for any other households.

Using the income_data and the formula for percentage change (below) determine the percent increase in the income of lowest-income quintile. This number must be 36.8% as above. Show your code.

\[ \mbox{% change} = \frac{\mbox{ final value} - \mbox{ initial value}}{\mbox{ initial value}} \times 100 \]

Hint:

Task 4: Get the data by age group

Using the steps above, get the table from this page of Statistics Canada in R. This contains the age group instead of population quintiles. You’ll need to use the datapasta addin in RStudio to do this step. Your code must show all the steps in one code chunk.

q_year year quarter age_group household_disposable_income compensation_of_employees net_mixed_income current_transfers_received household_final_consumption_expenditure
2019 2019 NA Younger than 35 years 75337 85308 9744 11584 75494
Q 1 2020 2020 1 Younger than 35 years 18259 20990 2245 3475 17546
Q 2 2020 2020 2 Younger than 35 years 21934 19340 2141 8169 16089
Q 3 2020 2020 3 Younger than 35 years 21953 20993 2773 5362 18674
2019 2019 NA 35 to 44 years 93641 109778 14820 14307 92286
Q 1 2020 2020 1 35 to 44 years 22859 27546 3514 4006 21532

Task 5: Recreate the following plot for income_data

The following shows the increases in household disposable incomes in the second and third quarters of 2020. The main lesson here is that COVID-19 support income from the government of Canada helped Canadians, particularly the youngest age group as shown below.

Modify the following code for income quintiles (instead of age groups) using the income_data.

age_data %>% 
  filter(year == 2020) %>% 
  select(q_year:household_disposable_income) %>% 
  group_by(age_group) %>% 
  mutate(change_in_income = c(NA, diff(household_disposable_income)),
         percent_change_in_income = ((change_in_income/lag(household_disposable_income))*100) %>% round(1)) %>% 
  ungroup() %>% 
  filter(q_year != "Q 1 2020") %>% 
  ggplot(aes(x = age_group, y = percent_change_in_income, fill = q_year)) +
  geom_col(position = position_dodge()) +
  geom_label(aes(label = paste0(percent_change_in_income, "%"))) +
  coord_flip() +
  labs(x=NULL,
       y=NULL)

Which quintile group had the largest increase increase in Q 2 in 2020?

Task 6: Load and Visualize comments about Universal Basic Income on Reddit

Load Data

The reddit thread you want to get data from is here.

library(RedditExtractoR)

rcoms <- reddit_content(URL = "https://old.reddit.com/r/canada/comments/l6lubc/3_in_5_canadians_support_universal_basic_income/")

  |                                                                  
  |                                                            |   0%
  |                                                                  
  |============================================================| 100%

Select a few columns and keep the top 10 comments

Select a few interesting columns and then keep the top 10 comments by the comment_score.

rcoms <- rcoms %>% 
  select(id, structure, comm_date, comment_score,
         controversiality, comment) %>% 
  as_tibble() %>% 
  mutate(num_of_underscores = str_count(structure, "_")) %>% 
  filter(num_of_underscores == 0) %>% 
  slice_max(order_by = comment_score, n = 30)

rcoms %>% head() %>% kbl() %>% kable_paper("hover", full_width = F)
id structure comm_date comment_score controversiality comment num_of_underscores
64 2 28-01-21 1505 0 I believe the idea is to wipe out all other government support programs and only have this one universal basic income. It would stop people from double dipping and get rid of a lot of redundancy in government. No more welfare, ei, disability,- just one program to rule them all. 0
1 1 28-01-21 797 0 for context the currently yearly income for disability is just over 12K. 0
269 10 28-01-21 691 0 2 in 5 Canadians think 3 in 5 Canadians don’t have a clue about economics but 5 in 5 Canadians think this poll sucks goose shit. 0
148 3 28-01-21 459 0 So everyone gets 30k right? 0
230 7 28-01-21 319 0 We wouldnt need to have a basic income of Neoliberalism didnt suppress real wage growth for the last 40+ years. Sadly this horrible economic system that systematically stripped the wealth from the working class has created a need for UBI. Just a random though here maybe, just maybe employers could pay good benefits, actual living wages with annual real pay increases and actual pension plans. Tax corporations the same tax rates as natural persons. Maybe we wouldnt need a UBI 0
256 9 28-01-21 161 0

Unless you change the systems of ownership where the money goes after a purchase is made all this is going to do is simply scale up the problem

What changes if all of that money goes to landlords and amazon?

What makes you think you will be able to afford a house when everyone else can increase their bid by the same amount that you can now? In fact if you are single you actually just lost ground against married couples
0

Tokenize the comments

Break the comment paragraphs into individual words and remove common words aka stopwords

library(tidytext)

rcoms_tokens <- rcoms %>%
  unnest_tokens(word, comment) %>% 
  anti_join(stop_words)

rcoms_tokens %>% 
  head() %>% 
  kbl() %>% 
  kable_paper("hover", full_width = F)
id structure comm_date comment_score controversiality num_of_underscores word
64 2 28-01-21 1505 0 0 idea
64 2 28-01-21 1505 0 0 wipe
64 2 28-01-21 1505 0 0 government
64 2 28-01-21 1505 0 0 support
64 2 28-01-21 1505 0 0 programs
64 2 28-01-21 1505 0 0 universal

Remove Profanity and add sentiments

Remove the profane words by filtering badwords (obtained online). Also, add the positive and negative sentiments in a new column as expressed in the bing dictionary. You can read more about other dictionaries here

# Source: https://rpubs.com/Nikotino/58395
badwords <- readLines("http://www.cs.cmu.edu/~biglou/resources/bad-words.txt")

rcoms_tokens <- rcoms_tokens %>% 
  filter(!word %in% badwords) %>%
  inner_join(get_sentiments("bing"))

rcoms_tokens %>% 
  head() %>% 
  kbl() %>% 
  kable_paper("hover", full_width = F)
id structure comm_date comment_score controversiality num_of_underscores word sentiment
64 2 28-01-21 1505 0 0 support positive
64 2 28-01-21 1505 0 0 redundancy negative
269 10 28-01-21 691 0 0 sucks negative
230 7 28-01-21 319 0 0 suppress negative
230 7 28-01-21 319 0 0 sadly negative
230 7 28-01-21 319 0 0 horrible negative

Visualize the sentiments

Modify the code below to create a word cloud. You need to provide x and y aesthetics if you use geom_text. You can generate x and y from the rnorm function.

However, you are free to use other libraries to generate word clouds. See some examples here

rcoms_tokens  %>%
  count(word, sentiment, sort = TRUE) %>%
  group_by(sentiment) %>%
  top_n(10) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(n, word, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(x = "Contribution to sentiment",
       y = NULL)

Submission Instructions

I am providing you a separate Rmd file named as template_for_assignment3_submission.Rmd. You need to put all your code in it, put your name, student number and section number at the beginning and then change the name of the file as Assignment_3_[Your_full_Name]. Knit this file and submit the HTML result in the link provided on Blackboard.