In an ever-evolving economic landscape, insight into the financial situation of individuals and households is paramount. This project delves into the median income of Canadians across the true north and aims to shed light on the economic realities faced by the average Canadian. It uses the Cancensus API to explore patterns, disparities, and trends in income distribution, with a specific focus on the top 10 cities in Canada by population. The analysis suggests that higher median income correlates with larger population centres, with outliers in remote and central parts of Canada.
Given how large Canada is, I want to see how the typical income of Canadians differs across regions. I first want to compare the top 10 cities in Canada and see how they fare on income. I am using Cancensus to take data from the 2021 Canadian census and isolate the vectors for median income and the top 10 cities. The vectors allow me to pull the information I need from the data set.
library(cancensus) # list_census_vectors(), list_census_regions(), get_census()
library(dplyr)     # %>%, filter(), pull(), select(), rename(), mutate()
library(reshape2)  # melt()
data_set <- "CA21" # the 2021 census dataset
label_ <- "CD" # census division level (the cities below use the "CSD" level)
# Pulling the vectors that relate to income
income_vectors <- list_census_vectors(data_set) %>%
  filter(type == "Total", grepl("income", label)) %>%
  pull("vector")
# Finding the top 10 cities (census subdivisions) by population
regions <- list_census_regions(data_set) %>%
  filter(level == "CSD") %>%
  top_n(10, pop) %>%
  as_census_region_list()
# Lookup table of income vector ids and their labels
list_income <- list_census_vectors(data_set) %>%
  filter(type == "Total", grepl("income", label)) %>%
  select(label, vector) %>%
  rename(variable = vector)
Using list_income as a reference, I next pull the data for the top ten cities from the Cancensus API, using the income vectors extracted above. I then select and rearrange the pulled data so that it can be merged with the variable names.
data <- get_census(dataset = data_set,
                   level = "Regions",
                   vectors = income_vectors,
                   regions = regions, # only the top 10 cities
                   geo_format = NA, # NA returns tabular data without geometry
                   labels = "short")
Next, I process the data: I keep only the region names and the income vectors (geometry is not needed here), then melt the table into long format keyed on Region Name so that it can be plotted.
# Ordering city names by population, largest first
Cities <- data %>%
  arrange(desc(Population)) %>%
  distinct(`Region Name`) %>%
  pull("Region Name")
# Selecting the region and income columns, then reshaping to long format
plot_data <- data %>%
  select(`Region Name`, all_of(income_vectors)) %>%
  melt(id.vars = "Region Name") %>%
  mutate(`Region Name` = factor(`Region Name`, levels = Cities, ordered = TRUE))
# Attaching the readable label for each vector id
plot_data <- merge(plot_data, list_income, by = "variable")
First, I wanted to analyze only the median income, so I filtered for the variables that have "Median" in their label.
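The filtering step can be sketched roughly as follows. This is a minimal illustration on a toy stand-in for plot_data, not the code used for the plots: the name plot_data_toy, the region names, and the values are placeholders and do not come from the census.

```r
# Toy stand-in for plot_data in long format.
# Region names and values are placeholders, NOT census figures.
plot_data_toy <- data.frame(
  `Region Name` = c("City A", "City A", "City B", "City B"),
  label = c("Median after-tax income in 2020 among recipients",
            "Average after-tax income in 2020 among recipients",
            "Median after-tax income in 2020 among recipients",
            "Average after-tax income in 2020 among recipients"),
  value = c(100, 120, 90, 115),
  check.names = FALSE,      # keep the space in "Region Name"
  stringsAsFactors = FALSE
)

# Keep only the rows whose label contains the word "Median"
median_data <- plot_data_toy[grepl("Median", plot_data_toy$label, fixed = TRUE), ]
print(median_data)
```

Swapping the pattern for "Average" would select the average-income variables in the same way.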
The statistics look fairly similar, possibly because we are only looking at the big cities in Canada. For the biggest cities, Toronto and Vancouver, the median household incomes are close: $84,000 in Toronto and $82,000 in Vancouver. Ottawa has a significantly higher median household income at $102,000. Since this still only covers the cities, I will map the “Median after-tax income in 2020 among recipients” across Canada to see if there is a difference.
To check whether plotting the average rather than the median makes a difference, I filtered for average income instead and did not find much difference.
I wanted to see how different variables fare across Canada, so I created a function that pulls data from the 2021 census and draws a choropleth map, with the range of the variable shown in the legend at the side. The vectors can be found using the list_income dataframe.
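A function of this kind might look roughly like the sketch below. It is not the function used to produce the map: the name map_census_variable, its arguments, the default of mapping all of Canada (region list(C = "01")) at the census division level, and the styling are all assumptions. It also assumes the cancensus, sf, and ggplot2 packages are installed and that a CensusMapper API key has been configured.

```r
# Hypothetical helper; name, defaults, and styling are assumptions.
map_census_variable <- function(vector, dataset = "CA21", level = "CD",
                                regions = list(C = "01")) {
  # geo_format = "sf" asks cancensus to return region geometry with the data
  geo_data <- cancensus::get_census(
    dataset = dataset,
    regions = regions,
    vectors = vector,
    level = level,
    geo_format = "sf",
    labels = "short" # data columns are named by vector id
  )
  # Fill each region by the value of the requested vector
  ggplot2::ggplot(geo_data) +
    ggplot2::geom_sf(ggplot2::aes(fill = .data[[vector]]), colour = NA) +
    ggplot2::scale_fill_viridis_c(name = vector, na.value = "grey80") +
    ggplot2::theme_void()
}

# Usage (requires an API key): pass a vector id taken from list_income
# map_census_variable(list_income$variable[1])
```

Referring to the packages with `::` keeps the helper self-contained, so it can be defined without attaching anything.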
From this we find that the higher median incomes are found further north and not in one of the big cities. However, it is hard to tell which city is which on a static map. Next I use leaflet, which should let me find specifically which cities have a higher median income.
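An interactive version could be sketched along these lines. Again, this is an illustration under assumptions rather than the code behind the map: the function name interactive_income_map is hypothetical, geo_data is assumed to be an sf object returned by get_census() with geo_format = "sf" and labels = "short", and it is assumed to contain a `Region Name` column. The leaflet and sf packages need to be installed.

```r
# Hypothetical helper; assumes geo_data is an sf object from get_census()
# with a `Region Name` column and a column named after the vector id.
interactive_income_map <- function(geo_data, vector) {
  values <- geo_data[[vector]]
  # Continuous colour scale over the observed values
  pal <- leaflet::colorNumeric("viridis", domain = values)
  map <- leaflet::leaflet(geo_data)
  map <- leaflet::addTiles(map)
  map <- leaflet::addPolygons(
    map,
    fillColor = pal(values),
    fillOpacity = 0.7,
    color = "white",
    weight = 1,
    # Hovering over a region shows its name and value
    label = paste0(geo_data[["Region Name"]], ": ", values)
  )
  leaflet::addLegend(map, pal = pal, values = values, title = vector)
}
```

The hover label is what makes it possible to tell the regions apart, which is the limitation of the static map noted above.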