In this Project, we are going to use the experimental estimates of household income well-being during the COVID-19 pandemic as released by Statistics Canada. The data tables from Statistics Canada contain information about various income and savings aspects of the population. The population is divided into quintiles as defined below:

A quintile is a statistical value of a data set that represents 20% of a given population, so the first quintile represents the lowest fifth of the data (1% to 20%); the second quintile represents the second fifth (21% to 40%) and so on

We can extract the data from the Statistics Canada page using the datapasta package

Load Libraries

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.1     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(here)
## here() starts at C:/Users/Owner/Desktop/Sem1/DAB501
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
library(tidytext)
library(ggrepel)
library(wordcloud)
## Loading required package: RColorBrewer
library(RColorBrewer)
library(RedditExtractoR)

Loading & Tidying Income Quintile Data

Loading the income group dataset below, using the datapasta addin in RStudio.

income_data <- data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
Row_Names = c(NA,"Household disposable income","Lowest income quintile",
         "Second income quintile","Third income quintile",
         "Fourth income quintile","Highest income quintile","Compensation of employees",
         "Lowest income quintile","Second income quintile","Third income quintile",
         "Fourth income quintile","Highest income quintile","Net mixed income",
         "Lowest income quintile","Second income quintile",
         "Third income quintile","Fourth income quintile","Highest income quintile",
         "Current transfers received","Lowest income quintile","Second income quintile",
         "Third income quintile","Fourth income quintile",
         "Highest income quintile","Household final consumption expenditure",
         "Lowest income quintile","Second income quintile","Third income quintile",
         "Fourth income quintile","Highest income quintile"),
`2019` = c("dollars","81,351","25,430","52,011","71,153",
             "95,200","162,963","74,834","10,980","33,699","62,253",
             "100,049","167,190","13,081","4,509","9,204","11,613","15,021",
             "25,057","22,348","19,106","24,061","24,142","22,101","22,331",
             "83,043","52,733","63,448","76,738","92,381","129,915"),
Quarter.1.2020 = c("dollars","19,958","6,129",
                     "12,847","17,648","23,188","39,979","18,646","2,644",
                     "8,228","15,418","24,870","42,069","3,112","1,026","2,164",
                     "2,732","3,582","6,056","6,083","4,969","6,449",
                     "6,611","5,904","6,480","19,388","15,050","15,460",
                     "18,285","21,124","27,023"),
Quarter.2.2020 = c("dollars","22,755","8,187",
                     "15,919","20,655","26,178","42,837","17,303","2,338",
                     "7,366","14,059","23,101","39,652","2,918","768","2,022",
                     "2,494","3,437","5,869","9,587","7,234","10,102",
                     "10,646","10,015","9,937","17,903","13,932","14,580",
                     "17,095","19,531","24,377"),
Quarter.3.2020 = c("dollars","23,576","8,386",
                     "15,798","21,109","27,392","45,198","18,555","2,596",
                     "8,167","15,148","24,869","41,994","3,924","1,646","2,805",
                     "3,636","4,419","7,112","7,759","6,252","8,182",
                     "8,355","7,686","8,323","20,488","15,660","16,415",
                     "19,484","22,436","28,446"),
`Q1.2020/2019` = c("% difference","-1.9","-3.6",
                     "-1.2","-0.8","-2.6","-1.9","-0.3","-3.7","-2.3","-0.9",
                     "-0.6","0.6","-4.8","-9.0","-6.0","-5.9","-4.6",
                     "-3.3","8.9","4.0","7.2","9.5","6.9","16.1","-6.6",
                     "14.2","-2.5","-4.7","-8.5","-16.8"),
`Q2.2020/Q1.2020` = c("% difference","14.0",
                        "33.6","23.9","17.0","12.9","7.1","-7.2","-11.6",
                        "-10.5","-8.8","-7.1","-5.7","-6.2","-25.1","-6.6",
                        "-8.7","-4.0","-3.1","57.6","45.6","56.6","61.0",
                        "69.6","53.3","-7.7","-7.4","-5.7","-6.5","-7.5",
                        "-9.8"),
`Q3.2020/Q2.2020` = c("% difference","3.6","2.4",
                        "-0.8","2.2","4.6","5.5","7.2","11.0","10.9",
                        "7.7","7.7","5.9","34.5","114.3","38.7","45.8",
                        "28.6","21.2","-19.1","-13.6","-19.0","-21.5","-23.3",
                        "-16.2","14.4","12.4","12.6","14.0","14.9","16.7")
) %>% 
  slice(-1) %>% 
  select(-c(6:8)) 




income_data <- income_data %>% 
  mutate(categories = case_when(
    Row_Names %in% c("Lowest income quintile", "Second income quintile",
                     "Third income quintile", "Fourth income quintile",
                     "Highest income quintile") ~ NA_character_,
    TRUE ~ Row_Names
  )) %>%
  fill(categories, .direction = "down")%>% 
  filter(Row_Names %in% c("Lowest income quintile", "Second income quintile",
                          "Third income quintile", "Fourth income quintile",
                          "Highest income quintile")) %>% 
  relocate(categories) %>% 
  clean_names()

income_data <- income_data %>% 
  mutate(across(.cols = c(x2019, starts_with("quarter")), .fns = parse_number))


income_data <- income_data %>% 
  rename(`2019` = x2019) %>% 
  pivot_longer(
    cols = c(`2019`, quarter_1_2020, quarter_2_2020, quarter_3_2020), 
    names_to = "quarter", 
    values_to = "dollars"
  ) 

income_data <- income_data %>% 
  pivot_wider(names_from = categories, 
              values_from = dollars) %>% 
  clean_names() %>% 
  relocate(quarter)

income_data <- income_data %>% 
  separate(col = quarter,
           into = c("year", "quarter"), sep = "_")
## Warning: Expected 2 pieces. Additional pieces discarded in 15 rows [2, 3, 4, 6,
## 7, 8, 10, 11, 12, 14, 15, 16, 18, 19, 20].
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 5 rows [1, 5, 9,
## 13, 17].
income_data <- income_data %>% 
  mutate(year = case_when(
    year == "quarter" ~ 2020,
    TRUE ~ as.numeric(year)
  ))  %>% 
  rename(quintile = row_names) %>% 
  mutate(quintile = factor(quintile,
                           levels = c("Lowest income quintile", "Second income quintile",
                                      "Third income quintile", "Fourth income quintile",
                                      "Highest income quintile")))
## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion
income_data <- income_data %>% 
  mutate(q_year = case_when(
    !is.na(quarter) ~ paste0("Q ", quarter, " ", year),
    is.na(quarter) ~ as.character(year)
  )) %>% 
  relocate(q_year)



income_data  %>% kbl() %>% kable_paper("hover", full_width = F)
q_year year quarter quintile household_disposable_income compensation_of_employees net_mixed_income current_transfers_received household_final_consumption_expenditure
2019 2019 NA Lowest income quintile 25430 10980 4509 19106 52733
Q 1 2020 2020 1 Lowest income quintile 6129 2644 1026 4969 15050
Q 2 2020 2020 2 Lowest income quintile 8187 2338 768 7234 13932
Q 3 2020 2020 3 Lowest income quintile 8386 2596 1646 6252 15660
2019 2019 NA Second income quintile 52011 33699 9204 24061 63448
Q 1 2020 2020 1 Second income quintile 12847 8228 2164 6449 15460
Q 2 2020 2020 2 Second income quintile 15919 7366 2022 10102 14580
Q 3 2020 2020 3 Second income quintile 15798 8167 2805 8182 16415
2019 2019 NA Third income quintile 71153 62253 11613 24142 76738
Q 1 2020 2020 1 Third income quintile 17648 15418 2732 6611 18285
Q 2 2020 2020 2 Third income quintile 20655 14059 2494 10646 17095
Q 3 2020 2020 3 Third income quintile 21109 15148 3636 8355 19484
2019 2019 NA Fourth income quintile 95200 100049 15021 22101 92381
Q 1 2020 2020 1 Fourth income quintile 23188 24870 3582 5904 21124
Q 2 2020 2020 2 Fourth income quintile 26178 23101 3437 10015 19531
Q 3 2020 2020 3 Fourth income quintile 27392 24869 4419 7686 22436
2019 2019 NA Highest income quintile 162963 167190 25057 22331 129915
Q 1 2020 2020 1 Highest income quintile 39979 42069 6056 6480 27023
Q 2 2020 2020 2 Highest income quintile 42837 39652 5869 9937 24377
Q 3 2020 2020 3 Highest income quintile 45198 41994 7112 8323 28446

Household Disposable Income Distribution 2019-2020

income_data_subset<-income_data[income_data$q_year!="2019",] 

ggplot(income_data_subset,aes(x=quintile,y=household_disposable_income))+
  geom_bar(aes(fill=q_year),stat="identity")+
  geom_text(aes(label=household_disposable_income))+
    facet_grid(. ~ q_year )+
  coord_flip()+
  labs(title="Household Disposable Income Distribution 2019-2020")+
  theme(
    legend.position = "none",
    axis.title.x=element_blank(),
    axis.title.y=element_blank(),
    axis.ticks.x = element_blank(),
    axis.text.x = element_blank())

Interpreting the plot.
Household Disposable Income is the amount of money that a household has to spend or save after income taxes have been deducted.
There was an increase in all the Income quintiles in Q2 2020 from Q1 2020.Lowest Income Quintile increased the least in comparison. The increase in the rest of the Quintiles was from two to three thousand.
For Q3 2020, Highest Income Quintile again increased by three thousand, while the rest of the quintiles also increased, but only a little. Second Income Quintile decreased by about 200.

Household Final Consumption Expenditure 2019-2020

ggplot(income_data_subset,aes(q_year,
        household_final_consumption_expenditure,color=quintile,group=quintile))+
  geom_point()+
  geom_line()+
  geom_text_repel(aes(label=household_final_consumption_expenditure))+
  labs(title="Household Final Consumption Expenditure 2019-2020")+
  theme(
  
    axis.title.y=element_blank(),
    axis.title.x=element_blank(),
    axis.ticks.y = element_blank(),
    axis.text.y = element_blank())

Interpreting the plot.
Final Consumption expenditure is the total expenses by a household. There was a decrease in it, in all the quintiles, in Q2 of 2020. The most decrease was in Highest Income Quintile.
In Q3 2020, there was an increase, making the household expenditure greater than that in Q1 of 2020, in all the Quintiles.

Determine the change in the income in 2020

According to the Statistics Canada article:

Over the first three quarters of 2020, disposable income for the lowest-income households increased 36.8%, more than for any other households.

Using the income_dataand the formula for percentage change (below) determine the percent increase in the income of lowest-income quintile. This number must be 36.8% as above. $$

% =

$$

income_data_subset %>%  
  filter(quintile=="Lowest income quintile" & q_year %in% c("Q 1 2020","Q 3 2020")) %>%
  select(q_year,quintile,household_disposable_income) %>% 
  mutate(
percent=((household_disposable_income -lag(household_disposable_income))/lag(household_disposable_income))*100
) %>% drop_na() %>%pull()
## [1] 36.82493

Getting the data by age group

Using the same steps as above, we can get the table from this page of Statistics Canada in R. This contains the age group instead of population quintiles. You’ll need to use the datapasta addin in RStudio to do this step.

age_data<-data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
Row_Names = c(NA,"Household disposable income","Younger than 35 years",
         "35 to 44 years","45 to 54 years","55 to 64 years",
         "65 years and older","Compensation of employees","Younger than 35 years",
         "35 to 44 years","45 to 54 years","55 to 64 years","65 years and older",
         "Net mixed income","Younger than 35 years","35 to 44 years",
         "45 to 54 years","55 to 64 years","65 years and older",
         "Current transfers received","Younger than 35 years","35 to 44 years","45 to 54 years",
         "55 to 64 years","65 years and older",
         "Household final consumption expenditure","Younger than 35 years","35 to 44 years","45 to 54 years",
         "55 to 64 years","65 years and older"),
`2019` = c("dollars","81,351","75,337","93,641","104,868",
             "87,365","60,193","74,834","85,308","109,778","122,610",
             "85,218","11,887","13,081","9,744","14,820","16,664","14,903",
             "10,957","22,348","11,584","14,307","13,097","19,075","41,935",
             "83,043","75,494","92,286","107,072","90,092","63,789"),
Quarter.1.2020 = c("dollars","19,958","18,259",
                     "22,859","25,456","21,283","15,243","18,646","20,990",
                     "27,546","30,663","21,227","2,957","3,112","2,245",
                     "3,514","3,920","3,528","2,700","6,083","3,475","4,006",
                     "3,807","5,279","10,934","19,388","17,546","21,532",
                     "25,094","21,210","14,785"),
Quarter.2.2020 = c("dollars","22,755","21,934",
                     "25,728","29,400","24,251","16,630","17,303","19,340",
                     "25,369","28,392","20,088","2,739","2,918","2,141",
                     "3,324","3,710","3,085","2,615","9,587","8,169","8,246",
                     "9,119","8,948","12,033","17,903","16,089","19,898",
                     "23,187","19,714","13,625"),
Quarter.3.2020 = c("dollars","23,576","21,953",
                     "26,781","30,453","25,721","17,286","18,555","20,993",
                     "26,937","30,486","21,424","2,983","3,924","2,773",
                     "4,384","4,855","4,778","3,308","7,759","5,362","5,839",
                     "6,162","6,928","11,997","20,488","18,674","22,933",
                     "26,516","22,421","15,422"),
`Q1.2020/2019` = c("% difference","-1.9","-3.1",
                     "-2.4","-2.9","-2.6","1.3","-0.3","-1.6","0.4","0.0",
                     "-0.4","-0.5","-4.8","-7.8","-5.2","-5.9","-5.3",
                     "-1.4","8.9","20.0","12.0","16.3","10.7","4.3","-6.6",
                     "-7.0","-6.7","-6.3","-5.8","-7.3"),
`Q2.2020/Q1.2020` = c("% difference","14.0",
                        "20.1","12.6","15.5","13.9","9.1","-7.2","-7.9","-7.9",
                        "-7.4","-5.4","-7.4","-6.2","-4.6","-5.4","-5.4",
                        "-12.6","-3.1","57.6","135.1","105.8","139.5",
                        "69.5","10.1","-7.7","-8.3","-7.6","-7.6","-7.1",
                        "-7.8"),
`Q3.2020/Q2.2020` = c("% difference","3.6","0.1",
                        "4.1","3.6","6.1","3.9","7.2","8.5","6.2","7.4",
                        "6.7","8.9","34.5","29.5","31.9","30.9","54.9",
                        "26.5","-19.1","-34.4","-29.2","-32.4","-22.6",
                        "-0.3","14.4","16.1","15.3","14.4","13.7","13.2")
)%>% 
  slice(-1) %>% 
  select(-c(6:8)) 



age_data <- age_data %>% 
  mutate(categories = case_when(
    Row_Names %in% c("Younger than 35 years", "35 to 44 years",
                     "45 to 54 years", "55 to 64 years",
                     "65 years and older") ~ NA_character_,
    TRUE ~ Row_Names
  )) %>%
  fill(categories, .direction = "down")%>% 
  filter(Row_Names %in% c("Younger than 35 years", " 35 to 44 years",
                          "45 to 54 years", "55 to 64 years",
                          "65 years and older")) %>% 
  relocate(categories) %>% 
  clean_names()


age_data <- age_data %>% 
  mutate(across(.cols = c(x2019, starts_with("quarter")), .fns = parse_number))


age_data <- age_data %>% 
  rename(`2019` = x2019) %>% 
  pivot_longer(
    cols = c(`2019`, quarter_1_2020, quarter_2_2020, quarter_3_2020), 
    names_to = "quarter", 
    values_to = "dollars"
  ) 


age_data <- age_data %>% 
  pivot_wider(names_from = categories, 
              values_from = dollars) %>% 
  clean_names() %>% 
  relocate(quarter)

age_data <- age_data %>% 
  separate(col = quarter,
           into = c("year", "quarter"), sep = "_")
## Warning: Expected 2 pieces. Additional pieces discarded in 12 rows [2, 3, 4, 6,
## 7, 8, 10, 11, 12, 14, 15, 16].
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 4 rows [1, 5, 9,
## 13].
age_data <- age_data %>% 
  mutate(year = case_when(
    year == "quarter" ~ 2020,
    TRUE ~ as.numeric(year)
  ))  %>% 
  rename(age_group = row_names) %>% 
  mutate(age_group = factor(age_group,
                           levels = c("Younger than 35 years", " 35 to 44 years",
                          "45 to 54 years", "55 to 64 years",
                          "65 years and older")))
## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion
age_data <- age_data %>% 
  mutate(q_year = case_when(
    !is.na(quarter) ~ paste0("Q ", quarter, " ", year),
    is.na(quarter) ~ as.character(year)
  )) %>% 
  relocate(q_year)



age_data  %>% kbl() %>% kable_paper("hover", full_width = F)
q_year year quarter age_group household_disposable_income compensation_of_employees net_mixed_income current_transfers_received household_final_consumption_expenditure
2019 2019 NA Younger than 35 years 75337 85308 9744 11584 75494
Q 1 2020 2020 1 Younger than 35 years 18259 20990 2245 3475 17546
Q 2 2020 2020 2 Younger than 35 years 21934 19340 2141 8169 16089
Q 3 2020 2020 3 Younger than 35 years 21953 20993 2773 5362 18674
2019 2019 NA 45 to 54 years 104868 122610 16664 13097 107072
Q 1 2020 2020 1 45 to 54 years 25456 30663 3920 3807 25094
Q 2 2020 2020 2 45 to 54 years 29400 28392 3710 9119 23187
Q 3 2020 2020 3 45 to 54 years 30453 30486 4855 6162 26516
2019 2019 NA 55 to 64 years 87365 85218 14903 19075 90092
Q 1 2020 2020 1 55 to 64 years 21283 21227 3528 5279 21210
Q 2 2020 2020 2 55 to 64 years 24251 20088 3085 8948 19714
Q 3 2020 2020 3 55 to 64 years 25721 21424 4778 6928 22421
2019 2019 NA 65 years and older 60193 11887 10957 41935 63789
Q 1 2020 2020 1 65 years and older 15243 2957 2700 10934 14785
Q 2 2020 2020 2 65 years and older 16630 2739 2615 12033 13625
Q 3 2020 2020 3 65 years and older 17286 2983 3308 11997 15422

Recreating the plot for income_data

income_data %>% 
  filter(year == 2020) %>% 
  select(q_year:household_disposable_income) %>% 
  group_by(quintile) %>% 
  mutate(change_in_income = c(NA, diff(household_disposable_income)),
         percent_change_in_income = ((change_in_income/lag(household_disposable_income))*100) %>% round(1)) %>% 
  ungroup() %>% 
  filter(q_year != "Q 1 2020") %>% 
  ggplot(aes(x = quintile, y = percent_change_in_income, fill = q_year)) +
  geom_col(position = position_dodge()) +
  geom_label(aes(label = paste0(percent_change_in_income, "%"))) +
  coord_flip() +
  labs(x=NULL,
       y=NULL)

Lowest Income quintile had the largest increase in Q 2 in 2020 of about 33.6 %.
Interpreting the plot in context of Universal Basic Income.
The graph shows the increases in household disposable incomes in the second and third quarters of 2020.The percent change was the highest for the lowest income households and the least for high income households in the Quarter 2 of 2020, which was the middle of the pandemic. It shows the government of Canada helped out the households with low income levels. It means that the income that low income households get after tax has been deducted substantially increased

Loading and Visualizing comments about Universal Basic Income on Reddit

Loading Data

The reddit thread you want to get data from is here.

library(RedditExtractoR)

rcoms <- reddit_content(URL = "https://old.reddit.com/r/canada/comments/l6lubc/3_in_5_canadians_support_universal_basic_income/")
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%

Selecting a few columns and keep the top 10 comments

We are selecting a few interesting columns and then keeping the top 10 comments by the comment_score.

#rcoms <- rcoms %>% 
# select(id, structure, comm_date, comment_score,
#         controversiality, comment) %>% 
#  as_tibble() %>% 
#  mutate(num_of_underscores = str_count(structure, "_")) %>% 
#  filter(num_of_underscores == 0) %>% 
#  slice_max(order_by = comment_score, n = 30)

#rcoms %>% head() %>% kbl() %>% kable_paper("hover", full_width = F)

Commenting this chunk to get more rows for better text analysis.

Tokenize the comments

Breaking the comment paragraphs into individual words and removing common words aka stopwords

library(tidytext)

rcoms_tokens <- rcoms %>%
  unnest_tokens(word, comment) %>% 
  anti_join(stop_words)
## Joining, by = "word"
rcoms_tokens %>% 
  head() %>% 
  kbl() %>% 
  kable_paper("hover", full_width = F)
id structure post_date comm_date num_comments subreddit upvote_prop post_score author user comment_score controversiality title post_text link domain URL word
1 1 28-01-21 28-01-21 2792 canada 0.89 22258 [deleted] CanadianJudo 797 0 3 in 5 Canadians support universal basic income as high as $30K/year [deleted] https://cultmtl.com/2021/01/3-in-5-canadians-supports-universal-basic-income-canada-as-high-as-30k-year-quebec/ cultmtl.com https://old.reddit.com/r/canada/comments/l6lubc/3_in_5_canadians_support_universal_basic_income/?ref=search_posts context
1 1 28-01-21 28-01-21 2792 canada 0.89 22258 [deleted] CanadianJudo 797 0 3 in 5 Canadians support universal basic income as high as $30K/year [deleted] https://cultmtl.com/2021/01/3-in-5-canadians-supports-universal-basic-income-canada-as-high-as-30k-year-quebec/ cultmtl.com https://old.reddit.com/r/canada/comments/l6lubc/3_in_5_canadians_support_universal_basic_income/?ref=search_posts yearly
1 1 28-01-21 28-01-21 2792 canada 0.89 22258 [deleted] CanadianJudo 797 0 3 in 5 Canadians support universal basic income as high as $30K/year [deleted] https://cultmtl.com/2021/01/3-in-5-canadians-supports-universal-basic-income-canada-as-high-as-30k-year-quebec/ cultmtl.com https://old.reddit.com/r/canada/comments/l6lubc/3_in_5_canadians_support_universal_basic_income/?ref=search_posts income
1 1 28-01-21 28-01-21 2792 canada 0.89 22258 [deleted] CanadianJudo 797 0 3 in 5 Canadians support universal basic income as high as $30K/year [deleted] https://cultmtl.com/2021/01/3-in-5-canadians-supports-universal-basic-income-canada-as-high-as-30k-year-quebec/ cultmtl.com https://old.reddit.com/r/canada/comments/l6lubc/3_in_5_canadians_support_universal_basic_income/?ref=search_posts disability
1 1 28-01-21 28-01-21 2792 canada 0.89 22258 [deleted] CanadianJudo 797 0 3 in 5 Canadians support universal basic income as high as $30K/year [deleted] https://cultmtl.com/2021/01/3-in-5-canadians-supports-universal-basic-income-canada-as-high-as-30k-year-quebec/ cultmtl.com https://old.reddit.com/r/canada/comments/l6lubc/3_in_5_canadians_support_universal_basic_income/?ref=search_posts 12k
2 1_1 28-01-21 28-01-21 2792 canada 0.89 22258 [deleted] r3dlazer 460 0 3 in 5 Canadians support universal basic income as high as $30K/year [deleted] https://cultmtl.com/2021/01/3-in-5-canadians-supports-universal-basic-income-canada-as-high-as-30k-year-quebec/ cultmtl.com https://old.reddit.com/r/canada/comments/l6lubc/3_in_5_canadians_support_universal_basic_income/?ref=search_posts extremely

Removing Profanity and adding sentiments

We are removing the profane words by filtering badwords (obtained online). Also, adding the positive and negative sentiments in a new column as expressed in the bing dictionary.

# Source: https://rpubs.com/Nikotino/58395
badwords <- readLines("http://www.cs.cmu.edu/~biglou/resources/bad-words.txt")

rcoms_tokens <- rcoms_tokens %>% 
  filter(!word %in% badwords) %>%
  inner_join(get_sentiments("bing"))
## Joining, by = "word"
rcoms_tokens %>% 
  head() %>% 
  kbl() %>% 
  kable_paper("hover", full_width = F)
id structure post_date comm_date num_comments subreddit upvote_prop post_score author user comment_score controversiality title post_text link domain URL word sentiment
2 1_1 28-01-21 28-01-21 2792 canada 0.89 22258 [deleted] r3dlazer 460 0 3 in 5 Canadians support universal basic income as high as $30K/year [deleted] https://cultmtl.com/2021/01/3-in-5-canadians-supports-universal-basic-income-canada-as-high-as-30k-year-quebec/ cultmtl.com https://old.reddit.com/r/canada/comments/l6lubc/3_in_5_canadians_support_universal_basic_income/?ref=search_posts difficult negative
3 1_1_1 28-01-21 28-01-21 2792 canada 0.89 22258 [deleted] GroceryStoreGremlin 319 0 3 in 5 Canadians support universal basic income as high as $30K/year [deleted] https://cultmtl.com/2021/01/3-in-5-canadians-supports-universal-basic-income-canada-as-high-as-30k-year-quebec/ cultmtl.com https://old.reddit.com/r/canada/comments/l6lubc/3_in_5_canadians_support_universal_basic_income/?ref=search_posts grand positive
5 1_1_1_1_1 28-01-21 28-01-21 2792 canada 0.89 22258 [deleted] GroceryStoreGremlin 109 0 3 in 5 Canadians support universal basic income as high as $30K/year [deleted] https://cultmtl.com/2021/01/3-in-5-canadians-supports-universal-basic-income-canada-as-high-as-30k-year-quebec/ cultmtl.com https://old.reddit.com/r/canada/comments/l6lubc/3_in_5_canadians_support_universal_basic_income/?ref=search_posts overrated negative
6 1_1_1_1_1_1 28-01-21 28-01-21 2792 canada 0.89 22258 [deleted] Benjamin988u 156 0 3 in 5 Canadians support universal basic income as high as $30K/year [deleted] https://cultmtl.com/2021/01/3-in-5-canadians-supports-universal-basic-income-canada-as-high-as-30k-year-quebec/ cultmtl.com https://old.reddit.com/r/canada/comments/l6lubc/3_in_5_canadians_support_universal_basic_income/?ref=search_posts afford positive
6 1_1_1_1_1_1 28-01-21 28-01-21 2792 canada 0.89 22258 [deleted] Benjamin988u 156 0 3 in 5 Canadians support universal basic income as high as $30K/year [deleted] https://cultmtl.com/2021/01/3-in-5-canadians-supports-universal-basic-income-canada-as-high-as-30k-year-quebec/ cultmtl.com https://old.reddit.com/r/canada/comments/l6lubc/3_in_5_canadians_support_universal_basic_income/?ref=search_posts brutal negative
6 1_1_1_1_1_1 28-01-21 28-01-21 2792 canada 0.89 22258 [deleted] Benjamin988u 156 0 3 in 5 Canadians support universal basic income as high as $30K/year [deleted] https://cultmtl.com/2021/01/3-in-5-canadians-supports-universal-basic-income-canada-as-high-as-30k-year-quebec/ cultmtl.com https://old.reddit.com/r/canada/comments/l6lubc/3_in_5_canadians_support_universal_basic_income/?ref=search_posts disabled negative

Visualizing the sentiments using word cloud

library(ggwordcloud)
rcoms_tokens  %>%
  count(word, sentiment, sort = TRUE) %>%
  group_by(sentiment) %>%
  top_n(10) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(label=word,size=n, color = sentiment)) +
  geom_text_wordcloud()+
   scale_size_area(max_size = 20)+
  facet_grid(.~sentiment)+
  theme_minimal()
## Selecting by n

Thoughts

As per the analysis, there is a substantial difference between household income and the expenditure of low income families, as opposed to high income families and other quinitiles. While the difference exist for almost all the quintiles for the pandemic year 2020. Due to this, Yes, I think there should be a Universal Basic Income in Canada because even with the government’s help the gap is there,and as also seen by the text analysis,people made more comments that have positive sentiments towards a Basic Income than negative sentiments.