This project aims to visualise expenditure patterns across different households in Singapore using the Household Expenditure Survey conducted from October 2017 to September 2018. The survey is conducted every 5 years by the Singapore Department of Statistics. A sample of 13,100 households in Singapore was used for this round of the survey.
The scope of this visualisation is therefore to look at the expenditure patterns of Singapore households and to understand which goods and services different households spend more on. This answers the project's main problem statement: “How do Different Households In Singapore Spend Their Money?”
Understanding how different households spend their income is useful for public agencies that want to understand the economic situation of a country. If certain households are not buying certain goods or services, that may be an area worth investigating further.
How households spend their money shows what is important to them and, in turn, highlights gaps. Public agencies can use this information to provide better public assistance and to carry out economic activities that close these gaps.
For the purpose of this project we will look at the following household types, as defined by the Department of Statistics:
Additionally, due to an assignment requirement, all visualisations will be static.
We will be using data from the following sources:
Average Monthly Household Expenditure by Type of Goods and Services and Type of Dwelling - A set of CSV files that provide various expenditure breakdowns by type of goods and type of dwelling.
Table 8: Resident Households by Monthly Household Income and Type of Dwelling - Provides various income breakdowns by type of household.
Here are some expected challenges and how they will be mitigated.
| Challenges | Mitigations |
|---|---|
| The required data are in separate sheets and documents (e.g. income data is separate from expenditure data) | Different sheets can be combined using the inner_join function |
| Visualisations need to be static, so dynamic features cannot be used | Ensure the visualisations are understandable in their static view, so that users can read and understand them easily |
| Unsure what type of visualisation to use and how to create it in ggplot | Research possible visualisations and experiment with them as part of the project |
| A lot of information needs to be showcased in one visualisation | Use multiple visualisations and combine them into one unified visualisation |
| Multiple levels of expenditure definition are available and will confuse the reader if all of them are used | Use only the highest level of expenditure definition; this provides an aggregated view of the expenditures that is easy to visualise and read |
Below is a rough sketch of the expected final visualisation. It will serve as a guide as we prepare the project in the next few sections.
In this section I will go step by step through the visualisation experiments and data manipulations that were carried out before the final visualisation.
For the purpose of this project we will be utilising the following packages:
tidyverse - a set of essential packages for data manipulation and exploration.
waffle - an R package that helps to create waffle chart visualisations in R.
gridExtra - provides a number of user-level functions to work with “grid” graphics, notably to arrange multiple grid-based plots on a page.
grid - a package that provides low-level functions to create graphical objects (grobs) and position them on a page; it will be used together with gridExtra.
The following code chunk checks whether each required package is installed in your environment, installs it if it is not, and then loads it.
packages <- c('tidyverse', 'waffle', 'gridExtra', 'grid')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
# Additional method to install the waffle package.
#install.packages("waffle", repos = "https://cinc.rud.is")
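As a complement to the loop above, it can help to see which packages are missing before anything is installed. The sketch below is one possible way to do that; `missing_packages` is a hypothetical helper name introduced here for illustration and is not part of the original project.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
# requireNamespace() loads a package's namespace if it is available, but does
# not attach it, and returns FALSE (quietly) when the package is missing.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

packages <- c("tidyverse", "waffle", "gridExtra", "grid")
to_install <- missing_packages(packages)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
}
```

Reporting the missing packages first leaves the decision to install them, and from which repository, to the reader.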
We will first need the data set on the average monthly income earned by each household.
### Import data
The following code chunk loads the income data into the variable hes.
hes <- read_csv('data/hes201718.csv')
Let's take a quick look at the structure and data types of the fields in hes.
head(hes)
We will additionally need the data set on the average monthly expenditure of each household for each level of goods and services.
### Import Data
The following code chunk loads the expenditure data into the variable expenditure.
expenditure <- read_csv('data/household_d2.csv')
Let's take a quick look at the structure and data types of the fields in expenditure.
head(expenditure)
The final data set to be imported is the average monthly expenditure by household.
### Import Data
The following code chunk loads the average household expenditure data into the variable hh_expenditure.
hh_expenditure <- read_csv('data/average_household_exp.csv')
Let's take a quick look at the structure and data types of the fields in hh_expenditure.
head(hh_expenditure)
The first visualisation to create is one that lets users see how much of their income households spend, on average.
The first step is to filter the data to 2017/18, which is this project's focus period. The following code chunk does this.
hh_mean_exp <- hh_expenditure %>%
  filter(Year == '2017/18')
hh_mean_exp <- hh_mean_exp[-1,] # remove HDB dwellings
#hh_mean_exp
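Dropping a row by position with `[-1, ]` only works while the unwanted row happens to come first. A minimal sketch of a name-based alternative follows, on made-up toy data. It assumes the row to drop is labelled 'HDB Dwellings' in D1, as it is in the expenditure data used later; the other dwelling labels and all figures are hypothetical.

```r
library(dplyr)

# Toy stand-in for hh_expenditure; labels and values are made up for illustration.
toy_hh <- tibble(
  Year = c("2017/18", "2017/18", "2017/18", "2012/13"),
  D1 = c("HDB Dwellings", "HDB 1- & 2-Room Flats", "Condominiums", "HDB Dwellings"),
  HH_Exp_Mean = c(4000, 1500, 7000, 3800)
)

# Keep the 2017/18 rows and drop the aggregate row by name, not by position.
toy_mean_exp <- toy_hh %>%
  filter(Year == "2017/18", D1 != "HDB Dwellings")

toy_mean_exp
```

Filtering by label keeps working if the rows in the source file are reordered.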
Next, we will join the expenditure data with the income data to create a combined income and expenditure dataset that can be used in our visualisation.
Additionally, we will create a new column called Ave_Unspent_Income that shows the amount of income not spent on expenses. The table is then reshaped with pivot_longer, which stacks the HH_Exp_Mean and Ave_Unspent_Income columns into a name column called type and a value column called amount. The following code chunk does this.
exp_income <- inner_join(hh_mean_exp, hes, by = c('D1' = 'Dwelling Type'))
exp_income <- exp_income %>%
  mutate(`Ave_Unspent_Income` = `Average Monthly Household Income` - HH_Exp_Mean) %>%
  # Columns 5 and 8 are HH_Exp_Mean and Ave_Unspent_Income
  pivot_longer(cols = c(5, 8), names_to = 'type', values_to = 'amount')
exp_income
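To make the join and reshape steps concrete, here is a minimal sketch on toy data. The column names mirror the chunk above, but the dwelling labels and figures are made up, and the columns are selected by name in `pivot_longer` rather than by position. The `anti_join` call at the end is an addition for illustration: it lists the rows that `inner_join` drops because they have no match in the income table.

```r
library(dplyr)
library(tidyr)

# Toy data; the labels "A" to "D" and all figures are hypothetical.
toy_exp <- tibble(D1 = c("A", "B", "C"),
                  HH_Exp_Mean = c(2000, 3000, 5000))
toy_inc <- tibble(`Dwelling Type` = c("A", "B", "D"),
                  `Average Monthly Household Income` = c(2500, 4500, 9000))

toy_long <- inner_join(toy_exp, toy_inc, by = c("D1" = "Dwelling Type")) %>%
  mutate(Ave_Unspent_Income = `Average Monthly Household Income` - HH_Exp_Mean) %>%
  # One row per dwelling type and measure, ready for a stacked bar chart
  pivot_longer(cols = c(HH_Exp_Mean, Ave_Unspent_Income),
               names_to = "type", values_to = "amount")

# Expenditure rows without a matching income row (dropped by inner_join)
unmatched <- anti_join(toy_exp, toy_inc, by = c("D1" = "Dwelling Type"))
```

Because the two stacked amounts for each dwelling type add up to its income, the full bar length in a stacked chart represents the average income.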
The final step is to use ggplot to plot a stacked horizontal bar chart. The following code chunk does this.
exp_income_plot <- ggplot(exp_income, aes(x = reorder(D1, -amount), y = amount, fill = type)) +
  geom_col() +
  coord_flip() +
  labs(title = "How much of their income\ndo households spend on expenses?",
       x = NULL,
       y = "Amount (SGD)",
       fill = "Average Income vs Expense:") +
  scale_fill_discrete(labels = c("Unspent Income", "Expenditure")) +
  theme(panel.background = element_blank(),
        plot.background = element_blank(),
        legend.position = "bottom")
exp_income_plot
This visualisation gives a quick and easy view of how much of its income each household type spends and how much it saves or leaves unspent.
To give a more holistic view of how many households of each type exist in Singapore, we need a visualisation that shows these numbers in an aesthetic and understandable manner.
The quick and easy way to show this breakdown is a bar plot, as can be seen below.
ggplot(hes, aes(x =reorder(`Dwelling Type`, `Number of Resident Households`), y = `Number of Resident Households`))+
geom_bar(stat="identity")+
labs(x="Household Type")
While the chart gets the job done, it is not very aesthetically pleasing. A more attractive visualisation that still shows the distribution of households would be better, and a waffle chart is a good replacement for the bar chart above.
To do so, we first need to calculate the percentage of each household type. The following code chunk does this by summing the column Number of Resident Households and then dividing each household type's count by that total to get its percentage.
total <- colSums(hes['Number of Resident Households'])
hes <- hes %>%
mutate(perc_households = round(`Number of Resident Households`/total * 100, 0))
hes
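One thing to watch when each box stands for 1%: percentages that are rounded independently do not always add up to exactly 100, which would leave the waffle grid one box short or one box over. Whether this affects the survey figures is not checked here. The sketch below uses made-up counts and a hypothetical helper, `round_to_100`, that applies largest-remainder rounding as one possible fix.

```r
# Hypothetical counts for three household types (not the survey figures).
counts <- c(type_a = 100, type_b = 100, type_c = 100)

# Independent rounding, as in the chunk above: 33 + 33 + 33 = 99 boxes.
rounded <- round(counts / sum(counts) * 100, 0)

# Largest-remainder rounding: floor every share, then give the leftover
# percentage points to the shares with the largest fractional parts.
round_to_100 <- function(counts) {
  exact <- counts / sum(counts) * 100
  result <- floor(exact)
  leftover <- 100 - sum(result)
  bump <- order(exact - result, decreasing = TRUE)[seq_len(leftover)]
  result[bump] <- result[bump] + 1
  result
}

adjusted <- round_to_100(counts)
sum(rounded)   # 99
sum(adjusted)  # 100
```

A quick `sum(hes$perc_households)` after the chunk above would show whether any adjustment is needed for the real data.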
After this is done, we will need to plot the waffle chart. The following code chunk does this.
household_plot <- ggplot(hes, aes(fill = `Dwelling Type`, values = perc_households)) +
  geom_waffle(n_rows = 10,
              size = 0.5,
              colour = "#ffffff",
              flip = TRUE) +
  labs(title = "How are the 1,345,227 households in Singapore split?",
       fill = "Household Type (1 box = 1%)") +
  coord_equal() +
  theme_minimal() +
  theme_enhance_waffle()
household_plot
This visualisation lets readers quickly see the proportions of the different household types in Singapore. It is also more aesthetic, so it will be used in the final visualisation.
The final visualisation to create is one that shows how different households spend their money and what they spend it on.
The first step is data preparation: we keep only the year 2017/18 and filter out Total and HDB Dwellings. We also convert any NA values to 0 and shorten some of the expenditure type names, as they are long.
The following code chunk prepares the data, after which we can move on to plotting an appropriate visualisation.
expenditure17 <- expenditure %>%
filter(Year == '2017/18') %>%
filter((D1 != 'Total') & (D1 != 'HDB Dwellings'))%>%
mutate(`D2A (2-d)` = ifelse(as.character(`D2A (2-d)`) == "FURNISHINGS, HOUSEHOLD EQUIPMENT AND ROUTINE HOUSEHOLD MAINTENANCE", "HOUSEHOLD EQUIPMENTS", as.character(`D2A (2-d)`)))%>%
mutate(`D2A (2-d)` = ifelse(as.character(`D2A (2-d)`) == "IMPUTED RENTAL FOR OWNER-OCCUPIED ACCOMMODATION", "RENTAL FOR ACCOMMODATION", as.character(`D2A (2-d)`)))
expenditure17$HH_Exp_Mean <- as.double(expenditure17$HH_Exp_Mean)
expenditure17[is.na(expenditure17)] <- 0
expenditure17
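The renaming and NA handling above can also be written with a lookup vector, which scales better if more category names need shortening. The sketch below works on toy data. The `category` column name, the "-" placeholder for a non-numeric entry, and the figures are assumptions made for illustration; only the long and short category names come from the chunk above.

```r
library(dplyr)
library(tidyr)

# Toy data; "-" stands in for a hypothetical non-numeric entry in the CSV.
toy <- tibble(
  category = c("FOOD",
               "IMPUTED RENTAL FOR OWNER-OCCUPIED ACCOMMODATION",
               "HEALTH"),
  HH_Exp_Mean = c("120.5", "-", NA)
)

# Lookup vector: names are the long labels, values are the short ones.
short_names <- c(
  "FURNISHINGS, HOUSEHOLD EQUIPMENT AND ROUTINE HOUSEHOLD MAINTENANCE" = "HOUSEHOLD EQUIPMENTS",
  "IMPUTED RENTAL FOR OWNER-OCCUPIED ACCOMMODATION" = "RENTAL FOR ACCOMMODATION"
)

toy_clean <- toy %>%
  mutate(
    # Use the short label where one exists, otherwise keep the original
    category = coalesce(unname(short_names[category]), category),
    # Non-numeric strings become NA on conversion; then replace NA with 0
    HH_Exp_Mean = replace_na(suppressWarnings(as.double(HH_Exp_Mean)), 0)
  )
```

Replacing NA only in the numeric column, rather than across the whole data frame, avoids writing 0 into text columns by accident. Note that a 0 is then indistinguishable from a genuinely missing value in the plots.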
There are a few different types of proportional chart that can help with this task; the simplest is a faceted bar chart.
ggplot(expenditure17, aes(x = HH_Exp_Mean, y = reorder(`D2A (2-d)`, HH_Exp_Mean), fill = `D2A (2-d)`)) +
  geom_bar(stat = "identity") +
  facet_wrap(~D1, ncol = 2) +
  labs(y = NULL, x = "Average Monthly Expense") +
  # Hide the long category labels on the axis and keep the legend compact
  theme(axis.text.y = element_blank(),
        legend.title = element_text(size = 5),
        legend.text = element_text(size = 5),
        legend.key.size = unit(0.5, "lines"))
While this visualisation lets us quickly see how each household type spends its money, it is not always easy to compare the household types with one another or to see these proportions.
Another chart we can use instead is a point chart, which shows these differences all in one chart.
exp_plot<-ggplot(expenditure17) +
geom_point(aes(x = HH_Exp_Mean, y = reorder(`D2A (2-d)`,HH_Exp_Mean), color = `D1`),size = 5)+
labs(color = "Household Type",
x = "Average Monthly Expenses",
y = NULL)+
theme(text = element_text(size=10),
axis.text.y = element_text(size=10),
legend.title=element_text(size=12),
legend.text=element_text(size=10)
)
exp_plot
This visualisation gives a better view of the distribution of expenditure across the household types (represented as points). It lets readers see exactly where each household type sits for each expenditure category. As such, it will be used in the final visualisation.
The final visualisation combines the 3 graphs that were created into one unified visualisation. To enable this we will use the R package gridExtra. The following code does this.
grob.title <- textGrob(expression(bold(underline("How do Different Households In Singapore Spend Their Money?"))),gp = gpar(fontsize = 15))
grob.caption <- textGrob("Data source: Household Expenditure Survey 2017/18, Singapore Department Of Statistics",gp = gpar(fontsize = 10))
grid.arrange(exp_plot,
exp_income_plot, household_plot,
ncol = 2,nrow = 2,
layout_matrix = rbind(c(2,3),c(1,1)),
widths = c(40, 40), heights = c(10,30),
top = grob.title,
bottom = grob.caption
)
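The `layout_matrix` argument is the part of this call that decides where each plot goes: every cell holds the index of a plot in the order the plots were passed, and repeating an index makes that plot span several cells. The sketch below shows the same layout with placeholder text grobs instead of the real plots; the placeholder labels are illustrative only.

```r
library(grid)
library(gridExtra)

# Same layout as above: plots 2 and 3 share the top row,
# plot 1 spans both columns of the bottom row.
layout <- rbind(c(2, 3),
                c(1, 1))

# Placeholder panels standing in for exp_plot, exp_income_plot and household_plot.
panels <- list(
  textGrob("1: expenditure by category"),
  textGrob("2: income vs expenditure"),
  textGrob("3: household split")
)

# arrangeGrob() builds the layout without drawing it;
# grid.draw(combined) would render it on the current device.
combined <- arrangeGrob(grobs = panels, layout_matrix = layout,
                        widths = c(40, 40), heights = c(10, 30))
```

The numeric `widths` and `heights` are relative sizes, so `heights = c(10, 30)` gives the bottom row three times the height of the top row.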
This is the final visualisation for the project. We can now analyse it further to answer our problem statement.
Based on the visualisation we can see that:
Apart from these observations, it is interesting to me that those who live in HDB 1- and 2-room flats spend the most on Food Serving Services, indicating that they eat out more.
To some of us this might seem like a luxury purchase, especially considering how small the income of these households is. While it might be cheaper to cook and eat at home, these households may prefer eating out because they might live alone. However, a more likely reason could be that they are too busy with work (which might be shift-based) to have the time and energy to cook and eat at home.
As such, it might be worthwhile for the government to look deeper into this and perhaps provide more aid to this group in the form of food vouchers that can be spent in hawker centres and restaurants, so that this group's expenditure burden is eased and they are given the chance to save up.
A work by Rajiv Abraham Xavier