Assignment 3: Universal Basic Income

A Case for Universal Basic Income in Canada

Watch a video on the topic of universal basic income in canada here. Think about the possible benefits and potential challenges in the implementation of this measure.

Install Required Libraries

You need to install the following packages for completing this assignment. Please run the following code in your console to install these packages.

install.packages("datapasta")
install.packages("janitor")
install.packages("kableExtra")
install.packages("RedditExtractoR")
install.packages("tidytext")

Loading Libraries

library(tidyverse)
library(here)
library(janitor)
library(kableExtra)
library(tidytext)

Getting Data

In this assignment, we are going to use the experimental estimates of household income well-being during the COVID-19 pandemic as released by Statistics Canada. The data tables from Statistics Canada contain information about various income and savings aspects of the population. The population is divided into quintiles as defined below:

A quintile is a statistical value of a data set that represents 20% of a given population, so the first quintile represents the lowest fifth of the data (1% to 20%); the second quintile represents the second fifth (21% to 40%) and so on

Source

We can extract the data from the Statistics Canada page using the datapasta package that you have just installed. The following animation shows how to use the datapasta package to copy and paste a table from web into R:

Try it on your own.

The result is shown below. I have named this new dataframe as income_data.

income_data <- data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
Row_Names = c(NA,"Household disposable income","Lowest income quintile",
         "Second income quintile","Third income quintile",
         "Fourth income quintile","Highest income quintile","Compensation of employees",
         "Lowest income quintile","Second income quintile","Third income quintile",
         "Fourth income quintile","Highest income quintile","Net mixed income",
         "Lowest income quintile","Second income quintile",
         "Third income quintile","Fourth income quintile","Highest income quintile",
         "Current transfers received","Lowest income quintile","Second income quintile",
         "Third income quintile","Fourth income quintile",
         "Highest income quintile","Household final consumption expenditure",
         "Lowest income quintile","Second income quintile","Third income quintile",
         "Fourth income quintile","Highest income quintile"),
`2019` = c("dollars","81,351","25,430","52,011","71,153",
             "95,200","162,963","74,834","10,980","33,699","62,253",
             "100,049","167,190","13,081","4,509","9,204","11,613","15,021",
             "25,057","22,348","19,106","24,061","24,142","22,101","22,331",
             "83,043","52,733","63,448","76,738","92,381","129,915"),
Quarter.1.2020 = c("dollars","19,958","6,129",
                     "12,847","17,648","23,188","39,979","18,646","2,644",
                     "8,228","15,418","24,870","42,069","3,112","1,026","2,164",
                     "2,732","3,582","6,056","6,083","4,969","6,449",
                     "6,611","5,904","6,480","19,388","15,050","15,460",
                     "18,285","21,124","27,023"),
Quarter.2.2020 = c("dollars","22,755","8,187",
                     "15,919","20,655","26,178","42,837","17,303","2,338",
                     "7,366","14,059","23,101","39,652","2,918","768","2,022",
                     "2,494","3,437","5,869","9,587","7,234","10,102",
                     "10,646","10,015","9,937","17,903","13,932","14,580",
                     "17,095","19,531","24,377"),
Quarter.3.2020 = c("dollars","23,576","8,386",
                     "15,798","21,109","27,392","45,198","18,555","2,596",
                     "8,167","15,148","24,869","41,994","3,924","1,646","2,805",
                     "3,636","4,419","7,112","7,759","6,252","8,182",
                     "8,355","7,686","8,323","20,488","15,660","16,415",
                     "19,484","22,436","28,446"),
`Q1.2020/2019` = c("% difference","-1.9","-3.6",
                     "-1.2","-0.8","-2.6","-1.9","-0.3","-3.7","-2.3","-0.9",
                     "-0.6","0.6","-4.8","-9.0","-6.0","-5.9","-4.6",
                     "-3.3","8.9","4.0","7.2","9.5","6.9","16.1","-6.6",
                     "14.2","-2.5","-4.7","-8.5","-16.8"),
`Q2.2020/Q1.2020` = c("% difference","14.0",
                        "33.6","23.9","17.0","12.9","7.1","-7.2","-11.6",
                        "-10.5","-8.8","-7.1","-5.7","-6.2","-25.1","-6.6",
                        "-8.7","-4.0","-3.1","57.6","45.6","56.6","61.0",
                        "69.6","53.3","-7.7","-7.4","-5.7","-6.5","-7.5",
                        "-9.8"),
`Q3.2020/Q2.2020` = c("% difference","3.6","2.4",
                        "-0.8","2.2","4.6","5.5","7.2","11.0","10.9",
                        "7.7","7.7","5.9","34.5","114.3","38.7","45.8",
                        "28.6","21.2","-19.1","-13.6","-19.0","-21.5","-23.3",
                        "-16.2","14.4","12.4","12.6","14.0","14.9","16.7")
) %>% 
  slice(-1) %>% 
  select(-c(6:8)) 

income_data  %>% kbl() %>% kable_paper("hover", full_width = F)

Row_Names	2019	Quarter.1.2020	Quarter.2.2020	Quarter.3.2020
Household disposable income	81,351	19,958	22,755	23,576
Lowest income quintile	25,430	6,129	8,187	8,386
Second income quintile	52,011	12,847	15,919	15,798
Third income quintile	71,153	17,648	20,655	21,109
Fourth income quintile	95,200	23,188	26,178	27,392
Highest income quintile	162,963	39,979	42,837	45,198
Compensation of employees	74,834	18,646	17,303	18,555
Lowest income quintile	10,980	2,644	2,338	2,596
Second income quintile	33,699	8,228	7,366	8,167
Third income quintile	62,253	15,418	14,059	15,148
Fourth income quintile	100,049	24,870	23,101	24,869
Highest income quintile	167,190	42,069	39,652	41,994
Net mixed income	13,081	3,112	2,918	3,924
Lowest income quintile	4,509	1,026	768	1,646
Second income quintile	9,204	2,164	2,022	2,805
Third income quintile	11,613	2,732	2,494	3,636
Fourth income quintile	15,021	3,582	3,437	4,419
Highest income quintile	25,057	6,056	5,869	7,112
Current transfers received	22,348	6,083	9,587	7,759
Lowest income quintile	19,106	4,969	7,234	6,252
Second income quintile	24,061	6,449	10,102	8,182
Third income quintile	24,142	6,611	10,646	8,355
Fourth income quintile	22,101	5,904	10,015	7,686
Highest income quintile	22,331	6,480	9,937	8,323
Household final consumption expenditure	83,043	19,388	17,903	20,488
Lowest income quintile	52,733	15,050	13,932	15,660
Second income quintile	63,448	15,460	14,580	16,415
Third income quintile	76,738	18,285	17,095	19,484
Fourth income quintile	92,381	21,124	19,531	22,436
Highest income quintile	129,915	27,023	24,377	28,446

Understanding the Variables in `Row_Names`

The following table shows the meanings of the variables:

Variable	Description	Source
Household Disposable Income	The amount of money that a household has to spend or save after income taxes have been deducted	Investopedia
Compensation of Employees	The total gross (pre-tax) wages paid by employers to employees for work done in an accounting period, such as a quarter or a year	Wikipedia
Net Mixed Income	Mixed income refers to the net income of both farm and non-farm unincorporated businesses, representing income that is generated by these types of businesses in the production of goods and services	Statistics Canada
Current Transfers Received	Government transfers include all transfer payments from federal and provincial governments programs intended to provide income support to certain groups, such as seniors, families and those injured in the workplace	Statistics Canada
Household Final Consumption Expenditure	All types of expenditure that a household does. See the source for more info.	Statistics Canada

Tidying Data

In the current format the income_data is human readable. This current format is the wide data format. But for data visualization with ggplot2, we need to tidy this dataset and change it into a long format. We are going to use the tidyr package from tidyverse for this task.

Step 1: Separate the quintile categories from other labels

In the Row_Names column, we see a mix of the quintiles and labels like Household Disposable Income, etc. Let’s separate them and clean the columns names:

income_data <- income_data %>% 
  mutate(categories = case_when(
    Row_Names %in% c("Lowest income quintile", "Second income quintile",
                     "Third income quintile", "Fourth income quintile",
                     "Highest income quintile") ~ NA_character_,
    TRUE ~ Row_Names
  )) %>% 
  fill(categories, .direction = "down") %>% 
  filter(Row_Names %in% c("Lowest income quintile", "Second income quintile",
                     "Third income quintile", "Fourth income quintile",
                     "Highest income quintile")) %>% 
  relocate(categories) %>% 
  clean_names()


income_data  %>% kbl() %>% kable_paper("hover", full_width = F)

categories	row_names	x2019	quarter_1_2020	quarter_2_2020	quarter_3_2020
Household disposable income	Lowest income quintile	25,430	6,129	8,187	8,386
Household disposable income	Second income quintile	52,011	12,847	15,919	15,798
Household disposable income	Third income quintile	71,153	17,648	20,655	21,109
Household disposable income	Fourth income quintile	95,200	23,188	26,178	27,392
Household disposable income	Highest income quintile	162,963	39,979	42,837	45,198
Compensation of employees	Lowest income quintile	10,980	2,644	2,338	2,596
Compensation of employees	Second income quintile	33,699	8,228	7,366	8,167
Compensation of employees	Third income quintile	62,253	15,418	14,059	15,148
Compensation of employees	Fourth income quintile	100,049	24,870	23,101	24,869
Compensation of employees	Highest income quintile	167,190	42,069	39,652	41,994
Net mixed income	Lowest income quintile	4,509	1,026	768	1,646
Net mixed income	Second income quintile	9,204	2,164	2,022	2,805
Net mixed income	Third income quintile	11,613	2,732	2,494	3,636
Net mixed income	Fourth income quintile	15,021	3,582	3,437	4,419
Net mixed income	Highest income quintile	25,057	6,056	5,869	7,112
Current transfers received	Lowest income quintile	19,106	4,969	7,234	6,252
Current transfers received	Second income quintile	24,061	6,449	10,102	8,182
Current transfers received	Third income quintile	24,142	6,611	10,646	8,355
Current transfers received	Fourth income quintile	22,101	5,904	10,015	7,686
Current transfers received	Highest income quintile	22,331	6,480	9,937	8,323
Household final consumption expenditure	Lowest income quintile	52,733	15,050	13,932	15,660
Household final consumption expenditure	Second income quintile	63,448	15,460	14,580	16,415
Household final consumption expenditure	Third income quintile	76,738	18,285	17,095	19,484
Household final consumption expenditure	Fourth income quintile	92,381	21,124	19,531	22,436
Household final consumption expenditure	Highest income quintile	129,915	27,023	24,377	28,446

Step 2: Convert the character data into numeric values

The dollar values in each column are stored as character data (notice the commas). We use readr::parse_number() to convert them into numbers:

income_data <- income_data %>% 
  mutate(across(.cols = c(x2019, starts_with("quarter")), .fns = parse_number))

income_data  %>% kbl() %>% kable_paper("hover", full_width = F)

categories	row_names	x2019	quarter_1_2020	quarter_2_2020	quarter_3_2020
Household disposable income	Lowest income quintile	25430	6129	8187	8386
Household disposable income	Second income quintile	52011	12847	15919	15798
Household disposable income	Third income quintile	71153	17648	20655	21109
Household disposable income	Fourth income quintile	95200	23188	26178	27392
Household disposable income	Highest income quintile	162963	39979	42837	45198
Compensation of employees	Lowest income quintile	10980	2644	2338	2596
Compensation of employees	Second income quintile	33699	8228	7366	8167
Compensation of employees	Third income quintile	62253	15418	14059	15148
Compensation of employees	Fourth income quintile	100049	24870	23101	24869
Compensation of employees	Highest income quintile	167190	42069	39652	41994
Net mixed income	Lowest income quintile	4509	1026	768	1646
Net mixed income	Second income quintile	9204	2164	2022	2805
Net mixed income	Third income quintile	11613	2732	2494	3636
Net mixed income	Fourth income quintile	15021	3582	3437	4419
Net mixed income	Highest income quintile	25057	6056	5869	7112
Current transfers received	Lowest income quintile	19106	4969	7234	6252
Current transfers received	Second income quintile	24061	6449	10102	8182
Current transfers received	Third income quintile	24142	6611	10646	8355
Current transfers received	Fourth income quintile	22101	5904	10015	7686
Current transfers received	Highest income quintile	22331	6480	9937	8323
Household final consumption expenditure	Lowest income quintile	52733	15050	13932	15660
Household final consumption expenditure	Second income quintile	63448	15460	14580	16415
Household final consumption expenditure	Third income quintile	76738	18285	17095	19484
Household final consumption expenditure	Fourth income quintile	92381	21124	19531	22436
Household final consumption expenditure	Highest income quintile	129915	27023	24377	28446

Step 3: Pivot longer

We now pivot the table from the wide to long format using the tidyr::pivot_longer():

income_data <- income_data %>% 
  rename(`2019` = x2019) %>% 
  pivot_longer(
    cols = c(`2019`, quarter_1_2020, quarter_2_2020, quarter_3_2020), 
    names_to = "quarter", 
    values_to = "dollars"
  ) 

income_data %>% head() %>% kbl() %>% kable_paper("hover", full_width = F)

categories	row_names	quarter	dollars
Household disposable income	Lowest income quintile	2019	25430
Household disposable income	Lowest income quintile	quarter_1_2020	6129
Household disposable income	Lowest income quintile	quarter_2_2020	8187
Household disposable income	Lowest income quintile	quarter_3_2020	8386
Household disposable income	Second income quintile	2019	52011
Household disposable income	Second income quintile	quarter_1_2020	12847

Step 4: Pivot wider

We know that the dollar values correspond to the categories column. So, we can put those values in individual category columns. We also clean the column names and put the quarter column at the beginning of the dataset:

income_data <- income_data %>% 
  pivot_wider(names_from = categories, 
              values_from = dollars) %>% 
  clean_names() %>% 
  relocate(quarter)

income_data %>% head() %>% kbl() %>% kable_paper("hover", full_width = F)

quarter	row_names	household_disposable_income	compensation_of_employees	net_mixed_income	current_transfers_received	household_final_consumption_expenditure
2019	Lowest income quintile	25430	10980	4509	19106	52733
quarter_1_2020	Lowest income quintile	6129	2644	1026	4969	15050
quarter_2_2020	Lowest income quintile	8187	2338	768	7234	13932
quarter_3_2020	Lowest income quintile	8386	2596	1646	6252	15660
2019	Second income quintile	52011	33699	9204	24061	63448
quarter_1_2020	Second income quintile	12847	8228	2164	6449	15460

Step 5: Break the `quarter` column and make the quintile column a factor vector

Since the quarter column contains both the year and quarter, we can use the tidyr::separate() to break it into two columns.

income_data <- income_data %>% 
  separate(col = quarter,
           into = c("year", "quarter"), sep = "_") 

income_data %>% head() %>% kbl() %>% kable_paper("hover", full_width = F)

year	quarter	row_names	household_disposable_income	compensation_of_employees	net_mixed_income	current_transfers_received	household_final_consumption_expenditure
2019	NA	Lowest income quintile	25430	10980	4509	19106	52733
quarter	1	Lowest income quintile	6129	2644	1026	4969	15050
quarter	2	Lowest income quintile	8187	2338	768	7234	13932
quarter	3	Lowest income quintile	8386	2596	1646	6252	15660
2019	NA	Second income quintile	52011	33699	9204	24061	63448
quarter	1	Second income quintile	12847	8228	2164	6449	15460

income_data <- income_data %>% 
  mutate(year = case_when(
    year == "quarter" ~ 2020,
    TRUE ~ as.numeric(year)
  )) %>% 
  rename(quintile = row_names) %>% 
  mutate(quintile = factor(quintile,
                           levels = c("Lowest income quintile", "Second income quintile",
                     "Third income quintile", "Fourth income quintile",
                     "Highest income quintile")))

income_data %>% head() %>% kbl() %>% kable_paper("hover", full_width = F)

year	quarter	quintile	household_disposable_income	compensation_of_employees	net_mixed_income	current_transfers_received	household_final_consumption_expenditure
2019	NA	Lowest income quintile	25430	10980	4509	19106	52733
2020	1	Lowest income quintile	6129	2644	1026	4969	15050
2020	2	Lowest income quintile	8187	2338	768	7234	13932
2020	3	Lowest income quintile	8386	2596	1646	6252	15660
2019	NA	Second income quintile	52011	33699	9204	24061	63448
2020	1	Second income quintile	12847	8228	2164	6449	15460

Finally, we create a new column to keep both the year and quarter info in one column for the purpose of labelling in visualizations:

income_data <- income_data %>% 
  mutate(q_year = case_when(
    !is.na(quarter) ~ paste0("Q ", quarter, " ", year),
    is.na(quarter) ~ as.character(year)
  )) %>% 
  relocate(q_year)

income_data %>% head() %>% kbl() %>% kable_paper("hover", full_width = F)

q_year	year	quarter	quintile	household_disposable_income	compensation_of_employees	net_mixed_income	current_transfers_received	household_final_consumption_expenditure
2019	2019	NA	Lowest income quintile	25430	10980	4509	19106	52733
Q 1 2020	2020	1	Lowest income quintile	6129	2644	1026	4969	15050
Q 2 2020	2020	2	Lowest income quintile	8187	2338	768	7234	13932
Q 3 2020	2020	3	Lowest income quintile	8386	2596	1646	6252	15660
2019	2019	NA	Second income quintile	52011	33699	9204	24061	63448
Q 1 2020	2020	1	Second income quintile	12847	8228	2164	6449	15460

Task 1: Recreate the following plot using the `income_data`

Hints:

theme() with axis.text and axis.ticks options
coord_flip()
geom_text()

Task 2: Recreate the following plot using the `income_data`

Hints: - ggrepel package (you need to install it). Use geom_text_repel() instead of geom_text()

Task 3: Determine the change in the income in 2020:

According to the Statistics Canada article:

Over the first three quarters of 2020, disposable income for the lowest-income households increased 36.8%, more than for any other households.

Using the income_data and the formula for percentage change (below) determine the percent increase in the income of lowest-income quintile. This number must be 36.8% as above. Show your code.

\[ \mbox{% change} = \frac{\mbox{ final value} - \mbox{ initial value}}{\mbox{ initial value}} \times 100 \]

Hint:

Use the dplyr::pull() function to extract a value

Task 4: Get the data by age group

Using the steps above, get the table from this page of Statistics Canada in R. This contains the age group instead of population quintiles. You’ll need to use the datapasta addin in RStudio to do this step. Your code must show all the steps in one code chunk.

q_year	year	quarter	age_group	household_disposable_income	compensation_of_employees	net_mixed_income	current_transfers_received	household_final_consumption_expenditure
2019	2019	NA	Younger than 35 years	75337	85308	9744	11584	75494
Q 1 2020	2020	1	Younger than 35 years	18259	20990	2245	3475	17546
Q 2 2020	2020	2	Younger than 35 years	21934	19340	2141	8169	16089
Q 3 2020	2020	3	Younger than 35 years	21953	20993	2773	5362	18674
2019	2019	NA	35 to 44 years	93641	109778	14820	14307	92286
Q 1 2020	2020	1	35 to 44 years	22859	27546	3514	4006	21532

Task 5: Recreate the following plot for `income_data`

The following shows the increases in household disposable incomes in the second and third quarters of 2020. The main lesson here is that COVID-19 support income from the government of Canada helped Canadians, particularly the youngest age group as shown below.

Modify the following code for income quintiles (instead of age groups) using the income_data.

age_data %>% 
  filter(year == 2020) %>% 
  select(q_year:household_disposable_income) %>% 
  group_by(age_group) %>% 
  mutate(change_in_income = c(NA, diff(household_disposable_income)),
         percent_change_in_income = ((change_in_income/lag(household_disposable_income))*100) %>% round(1)) %>% 
  ungroup() %>% 
  filter(q_year != "Q 1 2020") %>% 
  ggplot(aes(x = age_group, y = percent_change_in_income, fill = q_year)) +
  geom_col(position = position_dodge()) +
  geom_label(aes(label = paste0(percent_change_in_income, "%"))) +
  coord_flip() +
  labs(x=NULL,
       y=NULL)

Which quintile group had the largest increase increase in Q 2 in 2020?

Task 6: Load and Visualize comments about Universal Basic Income on Reddit

Load Data

The reddit thread you want to get data from is here.

library(RedditExtractoR)

rcoms <- reddit_content(URL = "https://old.reddit.com/r/canada/comments/l6lubc/3_in_5_canadians_support_universal_basic_income/")


  |                                                                  
  |                                                            |   0%
  |                                                                  
  |============================================================| 100%

Select a few columns and keep the top 10 comments

Select a few interesting columns and then keep the top 10 comments by the comment_score.

rcoms <- rcoms %>% 
  select(id, structure, comm_date, comment_score,
         controversiality, comment) %>% 
  as_tibble() %>% 
  mutate(num_of_underscores = str_count(structure, "_")) %>% 
  filter(num_of_underscores == 0) %>% 
  slice_max(order_by = comment_score, n = 30)

rcoms %>% head() %>% kbl() %>% kable_paper("hover", full_width = F)

id	structure	comm_date	comment_score	comment
64	2	28-01-21	1505	I believe the idea is to wipe out all other government support programs and only have this one universal basic income. It would stop people from double dipping and get rid of a lot of redundancy in government. No more welfare, ei, disability,- just one program to rule them all.
1	1	28-01-21	797	for context the currently yearly income for disability is just over 12K.
269	10	28-01-21	691	2 in 5 Canadians think 3 in 5 Canadians don’t have a clue about economics but 5 in 5 Canadians think this poll sucks goose shit.
148	3	28-01-21	459	So everyone gets 30k right?
230	7	28-01-21	319	We wouldnt need to have a basic income of Neoliberalism didnt suppress real wage growth for the last 40+ years. Sadly this horrible economic system that systematically stripped the wealth from the working class has created a need for UBI. Just a random though here maybe, just maybe employers could pay good benefits, actual living wages with annual real pay increases and actual pension plans. Tax corporations the same tax rates as natural persons. Maybe we wouldnt need a UBI
256	9	28-01-21	161	Unless you change the systems of ownership where the money goes after a purchase is made all this is going to do is simply scale up the problem What changes if all of that money goes to landlords and amazon? What makes you think you will be able to afford a house when everyone else can increase their bid by the same amount that you can now? In fact if you are single you actually just lost ground against married couples

Tokenize the comments

Break the comment paragraphs into individual words and remove common words aka stopwords

library(tidytext)

rcoms_tokens <- rcoms %>%
  unnest_tokens(word, comment) %>% 
  anti_join(stop_words)

rcoms_tokens %>% 
  head() %>% 
  kbl() %>% 
  kable_paper("hover", full_width = F)

id	structure	comm_date	comment_score	word
64	2	28-01-21	1505	idea
64	2	28-01-21	1505	wipe
64	2	28-01-21	1505	government
64	2	28-01-21	1505	support
64	2	28-01-21	1505	programs
64	2	28-01-21	1505	universal

Remove Profanity and add sentiments

Remove the profane words by filtering badwords (obtained online). Also, add the positive and negative sentiments in a new column as expressed in the bing dictionary. You can read more about other dictionaries here

# Source: https://rpubs.com/Nikotino/58395
badwords <- readLines("http://www.cs.cmu.edu/~biglou/resources/bad-words.txt")

rcoms_tokens <- rcoms_tokens %>% 
  filter(!word %in% badwords) %>%
  inner_join(get_sentiments("bing"))

rcoms_tokens %>% 
  head() %>% 
  kbl() %>% 
  kable_paper("hover", full_width = F)

id	structure	comm_date	comment_score	word	sentiment
64	2	28-01-21	1505	support	positive
64	2	28-01-21	1505	redundancy	negative
269	10	28-01-21	691	sucks	negative
230	7	28-01-21	319	suppress	negative
230	7	28-01-21	319	sadly	negative
230	7	28-01-21	319	horrible	negative

Visualize the sentiments

Modify the code below to create a word cloud. You need to provide x and y aesthetics if you use geom_text. You can generate x and y from the rnorm function.

However, you are free to use other libraries to generate word clouds. See some examples here

rcoms_tokens  %>%
  count(word, sentiment, sort = TRUE) %>%
  group_by(sentiment) %>%
  top_n(10) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(n, word, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(x = "Contribution to sentiment",
       y = NULL)

Submission Instructions

I am providing you a separate Rmd file named as template_for_assignment3_submission.Rmd. You need to put all your code in it, put your name, student number and section number at the beginning and then change the name of the file as Assignment_3_[Your_full_Name]. Knit this file and submit the HTML result in the link provided on Blackboard.