1 Overview

This project aims to visualise the expenditure patterns of different households in Singapore using the Household Expenditure Survey (HES) conducted from October 2017 to September 2018. The survey is conducted every 5 years by the Singapore Department of Statistics, and this round was based on a sample of 13,100 households in Singapore.

1.1 Problem Statement

The scope of this visualisation is therefore to examine the expenditure patterns of Singapore households and to understand which goods and services different households spend more on. This answers the project's main problem statement: “How do Different Households In Singapore Spend Their Money?”

1.2 Purpose of Visualisation

Understanding how different households spend their income is useful for public agencies seeking to understand the economic situation of the country. If certain households are not buying certain goods or services, that may be an area worth investigating further.

How households spend their money shows what they prioritise and, in turn, highlights gaps. Public agencies can use this information to provide better public assistance and to design appropriate economic measures to close these gaps.

For the purpose of this project, we will look at the following household types as defined by the Department of Statistics:

  • HDB 1-and-2 Room Flats
  • HDB 3 Room Flats
  • HDB 4 Room Flats
  • HDB 5 Room & Executive Flats
  • Condominium & Other Apartments
  • Landed Properties

Additionally, due to an assignment requirement, all visualisations will be static.

1.3 The Data

We will be using data from the following data sources:

2 Expected Data and Design Challenges and Mitigation

Here are some challenges we expect to face and how they will be mitigated.

Challenge | Mitigation
--- | ---
The required data are spread across separate sheets and documents (e.g. income data is separate from expenditure data) | Combine the different sheets using the inner_join function
Visualisations need to be static, so dynamic features cannot be used | Ensure each visualisation is understandable from its static view, so users can read and interpret it easily
Uncertainty about which type of visualisation to use and how to create it in ggplot | Research possible visualisations and experiment with them as part of the project
The full depth of information needs to be shown in one visualisation | Create multiple visualisations and combine them into one unified visualisation
Multiple levels of expenditure definition are available, and using them all would confuse the reader | Use only the highest level of expenditure definition, which provides an aggregated view of expenditures that is easy to visualise and read
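To illustrate the inner_join mitigation listed above, here is a minimal sketch using small hypothetical tibbles (the numbers are made up and are not survey values; the column names mirror those used later in this project). It shows that the key column can have a different name on each side, and that rows without a match in both tables are dropped.

```r
library(dplyr)

# Hypothetical toy tables (made-up numbers, NOT survey values). The dwelling
# type is called D1 in one table and `Dwelling Type` in the other.
exp_toy <- tibble(
  D1 = c("HDB Dwellings", "HDB 3-Room Flats", "Landed Properties"),
  HH_Exp_Mean = c(100, 80, 200)
)
inc_toy <- tibble(
  `Dwelling Type` = c("HDB 3-Room Flats", "Landed Properties"),
  `Average Monthly Household Income` = c(120, 400)
)

# inner_join() keeps only rows whose key appears in BOTH tables, so the
# aggregate "HDB Dwellings" row (absent from inc_toy) is dropped.
combined <- inner_join(exp_toy, inc_toy, by = c("D1" = "Dwelling Type"))
combined
```

Because unmatched rows disappear silently, it is worth checking nrow() after a join; dplyr's anti_join() with the same by argument lists the rows that failed to match.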

3 Visualisation Sketch

Below is a rough sketch of the expected final visualisation. It will serve as a guide as we prepare the project in the next few sections.

4 Data Viz Step by Step

In this section I will go step by step through the various visualisation experiments and data manipulations that were carried out before arriving at the final visualisation.

4.1 Install and load R packages

For the purpose of this project we will be using the following packages:

  • tidyverse: a set of essential packages for data manipulation and exploration.
  • waffle: an R package that helps to create waffle charts in R.
  • gridExtra: provides a number of user-level functions to work with “grid” graphics, notably to arrange multiple grid-based plots on a page.
  • grid: a package that provides low-level functions to create graphical objects (grobs) and position them on a page; it will be used together with gridExtra.

The following code chunk checks whether each required package is installed in your environment, installs it if it is not, and then loads it.

packages <- c('tidyverse', 'waffle', 'gridExtra', 'grid')

for (p in packages) {
  # install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

# Alternative way to install the waffle package.
#install.packages("waffle", repos = "https://cinc.rud.is")
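A slightly more defensive variant of the loop above is sketched here with a hypothetical helper, find_missing(). It checks each package with requireNamespace(), which returns FALSE instead of attaching the package or raising an error, so only the packages that are actually missing would be passed to install.packages().

```r
# Hypothetical helper: return the packages that are not installed, without
# attaching anything. requireNamespace() returns TRUE/FALSE per package.
find_missing <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Usage with the project's packages (installs only what is missing):
# missing_pkgs <- find_missing(c('tidyverse', 'waffle', 'gridExtra', 'grid'))
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

# Quick check with base packages that ship with every R installation
find_missing(c("stats", "grid", "surely.not.a.real.pkg"))
```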

4.2 Income by household data

We will first need the dataset on the average monthly income earned by each household type. The following code chunk imports the income data into the variable hes.

hes <- read_csv('data/hes201718.csv')

4.2.1 Observe data

Let's take a quick look at the structure and data types of the fields in hes.

head(hes)

4.3 Expenditure by household and goods and service data

We will also need the dataset on the average monthly expenditure of each household type for each level of goods and services. The following code chunk imports the expenditure data into the variable expenditure.

expenditure <- read_csv('data/household_d2.csv')

4.3.1 Observe data

Let's take a quick look at the structure and data types of the fields in expenditure.

head(expenditure)

4.4 Average Expenditure by Household

The final dataset to be loaded is the average monthly expenditure by household type. The following code chunk imports the average household expenditure data into the variable hh_expenditure.

hh_expenditure <- read_csv('data/average_household_exp.csv')

4.4.1 Observe data

Let's take a quick look at the structure and data types of the fields in hh_expenditure.

head(hh_expenditure)

4.5 Visualising Expenditure and Income by Household Types

The first visualisation should let users see how much of their income households spend on expenditure, on average.

The first step is to filter the data to the year 2017/18, which is this project's focus. The following code chunk does this.

hh_mean_exp <- hh_expenditure %>%
  filter(Year == '2017/18')

hh_mean_exp <- hh_mean_exp[-1,] # remove HDB dwellings
#hh_mean_exp
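Removing the first row with hh_mean_exp[-1,] relies on the aggregate row always being first. A sketch of a label-based alternative follows, assuming the aggregate row in hh_expenditure is labelled exactly 'HDB Dwellings' (as it is in the expenditure data filtered later) and using made-up toy values rather than the real file.

```r
library(dplyr)

# Hypothetical toy data shaped like hh_expenditure (made-up values)
hh_toy <- tibble(
  Year = c("2017/18", "2017/18", "2017/18", "2012/13"),
  D1 = c("HDB Dwellings", "HDB 3-Room Flats", "Landed Properties", "HDB 3-Room Flats"),
  HH_Exp_Mean = c(100, 80, 200, 70)
)

# Keep the 2017/18 round and drop the aggregate row by its label (assumed to be
# "HDB Dwellings"), so the result does not depend on the file's row order.
hh_mean_toy <- hh_toy %>%
  filter(Year == "2017/18", D1 != "HDB Dwellings")
hh_mean_toy
```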

We then join the expenditure data with the income data to create a combined income and expenditure dataset for our visualisation.

Additionally, we create a new column called Ave_Unspent_Income that shows the amount of income not spent on expenditure. The table is then reshaped into long format using pivot_longer(): the HH_Exp_Mean and Ave_Unspent_Income columns are stacked into a new type column (holding the original column name) and an amount column (holding the value). The following code chunk does this.

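exp_income <- inner_join(hh_mean_exp, hes, by = c('D1' = 'Dwelling Type'))
exp_income <- exp_income %>%
  # income left over after average monthly expenditure
  mutate(`Ave_Unspent_Income` = `Average Monthly Household Income` - HH_Exp_Mean) %>%
  # stack the two columns (selected by name, not position) into type/amount
  pivot_longer(cols = c(HH_Exp_Mean, Ave_Unspent_Income), names_to = 'type', values_to = 'amount')
exp_income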
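The join, mutate and pivot steps can be sketched on a small hypothetical table (made-up numbers), with the columns selected by name rather than by position, so the code keeps working if the column order of the source files changes. Because the two stacked values are expenditure and income minus expenditure, each household's two amounts add back up to its average monthly income, which is why each stacked bar's total length equals that income.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy version of the joined table (made-up numbers)
joined_toy <- tibble(
  D1 = c("HDB 3-Room Flats", "Landed Properties"),
  HH_Exp_Mean = c(80, 200),
  `Average Monthly Household Income` = c(120, 400)
)

long_toy <- joined_toy %>%
  # income left after expenditure
  mutate(Ave_Unspent_Income = `Average Monthly Household Income` - HH_Exp_Mean) %>%
  # `type` holds the original column name, `amount` holds its value
  pivot_longer(cols = c(HH_Exp_Mean, Ave_Unspent_Income),
               names_to = "type", values_to = "amount")
long_toy
```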

The final step is to use ggplot to plot a horizontal stacked bar chart. The following code chunk does this.

exp_income_plot <- ggplot(exp_income, aes(x = reorder(D1, -amount), y = amount, fill = type)) +
  geom_col() +
  coord_flip() +
  labs(title = "How much of their income\ndo households spend on expenses?",
       x = NULL,
       y = "Amount (SGD)",
       fill = "Average Income vs Expense:") +
  # fill levels sort alphabetically: Ave_Unspent_Income first, then HH_Exp_Mean
  scale_fill_discrete(labels = c("Unspent Income", "Expenditure")) +
  theme(panel.background = element_blank(),
        plot.background = element_blank(),
        legend.position = "bottom")

exp_income_plot

This visualisation gives a quick and easy view of how much of its income each household type spends on expenditure and how much is left unspent.
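Since household types differ greatly in income, absolute amounts can hide how much of their budget they spend. A minimal sketch with hypothetical toy values (made-up numbers, not survey data) that converts the same columns into percentages of income:

```r
library(dplyr)

# Hypothetical toy data (made-up numbers)
toy <- tibble(
  D1 = c("HDB 1- and 2-Room Flats", "Landed Properties"),
  HH_Exp_Mean = c(90, 200),
  `Average Monthly Household Income` = c(100, 400)
)

# Share of income spent, as a percentage, and the share left unspent
shares <- toy %>%
  mutate(pct_spent = round(HH_Exp_Mean / `Average Monthly Household Income` * 100, 1),
         pct_unspent = 100 - pct_spent)
shares
```

If a percentage view is preferred, ggplot2's geom_col(position = "fill") draws the stacked bars on this proportional scale directly.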

4.6 Visualising Household Type

To give a more holistic view of the number of households of each type in Singapore, we need a visualisation that presents these numbers in an aesthetic and understandable manner.

The quickest way to show this breakdown is a bar plot, as seen below.

ggplot(hes, aes(x =reorder(`Dwelling Type`, `Number of Resident Households`), y = `Number of Resident Households`))+
  geom_bar(stat="identity")+
  labs(x="Household Type")

While the chart gets the job done, it is not very aesthetically pleasing. A visualisation that is more aesthetic while still showing the distribution of households would be better, and a waffle chart is a good replacement for the bar chart above.

To do so, we first need to calculate the percentage of each household type. The following code chunk does this by first summing the column Number of Resident Households and then dividing each household type's value by this total to get its percentage.

total <- colSums(hes['Number of Resident Households'])
hes <- hes %>%
  mutate(perc_households = round(`Number of Resident Households`/total * 100, 0))
hes
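Rounding each percentage independently with round() does not guarantee that the values sum to 100, which a 10 x 10 waffle needs for every square to represent exactly 1%. One common remedy is the largest-remainder method, sketched below as a hypothetical helper round_to_total() (it is not part of the waffle package).

```r
# Largest-remainder rounding: floor every percentage, then hand the leftover
# points to the entries with the largest fractional parts, so the result
# always sums to `total` (100 squares for a 10 x 10 waffle).
round_to_total <- function(counts, total = 100) {
  raw <- counts / sum(counts) * total
  out <- floor(raw)
  leftover <- total - sum(out)
  if (leftover > 0) {
    # indices of the largest remainders each get one extra point
    idx <- order(raw - out, decreasing = TRUE)[seq_len(leftover)]
    out[idx] <- out[idx] + 1
  }
  out
}

# Hypothetical counts (made-up) where plain round() gives only 99 squares
counts <- c(1, 1, 1)
round(counts / sum(counts) * 100)   # 33 33 33, sums to 99
round_to_total(counts)              # 34 33 33, sums to 100
```

Applied to this project it would read hes %>% mutate(perc_households = round_to_total(`Number of Resident Households`)).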

After this is done, we will need to plot the waffle chart. The following code chunk does this.

household_plot<-ggplot(hes, aes(fill = `Dwelling Type`, values = perc_households)) +
  geom_waffle(n_rows = 10,
              size = 0.5,
              colour = "#ffffff",
              flip = TRUE) +
  labs(title = "How are the 1,345,227 households in Singapore split?",
       fill="Household Type (1 box = 1%)")+
  coord_equal() +
  theme_minimal() +
  theme_enhance_waffle()
  
household_plot

This visualisation allows readers to quickly see the proportion of each household type in Singapore. It is also more aesthetic, so this is the visualisation we will use.

4.7 Visualising Expenditure By Households

The final visualisation needs to show how different households spend their money and what they spend it on.

The first step is data preparation: we keep only the year 2017/18 and filter out the Total and HDB Dwellings rows. We also convert missing expenditure values (NA) into 0 and shorten some expenditure types that have long names.

The following code chunk prepares the data, after which we can move on to plotting an appropriate visualisation.

expenditure17 <- expenditure %>%
  # keep only the 2017/18 survey round
  filter(Year == '2017/18') %>%
  # drop the aggregate rows so only individual dwelling types remain
  filter(D1 != 'Total', D1 != 'HDB Dwellings') %>%
  mutate(
    # shorten two long expenditure category names
    `D2A (2-d)` = case_when(
      `D2A (2-d)` == "FURNISHINGS, HOUSEHOLD EQUIPMENT AND ROUTINE HOUSEHOLD MAINTENANCE" ~ "HOUSEHOLD EQUIPMENTS",
      `D2A (2-d)` == "IMPUTED RENTAL FOR OWNER-OCCUPIED ACCOMMODATION" ~ "RENTAL FOR ACCOMMODATION",
      TRUE ~ as.character(`D2A (2-d)`)
    ),
    # convert expenditure to numeric and treat missing values as 0
    HH_Exp_Mean = replace_na(as.double(HH_Exp_Mean), 0)
  )
expenditure17

Several types of proportional charts could help with this task; the simplest is a faceted bar chart.

ggplot(expenditure17, aes(x = HH_Exp_Mean, y = reorder(`D2A (2-d)`, HH_Exp_Mean), fill = `D2A (2-d)`)) +
  geom_col() +
  facet_wrap(~D1, ncol = 2) +
  labs(y = NULL, x = "Average Monthly Expense") +
  # category names are shown in the legend, so hide the y-axis text
  theme(axis.text.y = element_blank(),
        legend.title = element_text(size = 5),
        legend.text = element_text(size = 5),
        legend.key.size = unit(0.5, "lines"))

While this visualisation lets us quickly see how each household type spends its money, it is not easy to compare the various households or to see their relative proportions.
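One way to make the household types directly comparable is to convert each household type's spending into shares of its own total. A sketch with hypothetical toy rows shaped like expenditure17 (made-up numbers and illustrative category labels), assuming the table contains no separate "total" category that would be double-counted:

```r
library(dplyr)

# Hypothetical toy rows shaped like expenditure17 (made-up numbers)
exp_toy <- tibble(
  D1 = rep(c("HDB 3-Room Flats", "Landed Properties"), each = 2),
  `D2A (2-d)` = rep(c("TRANSPORT", "FOOD SERVING SERVICES"), times = 2),
  HH_Exp_Mean = c(30, 70, 300, 100)
)

# Each category's share of that household type's total spending, so household
# types with very different budgets sit on the same 0-1 scale
exp_shares <- exp_toy %>%
  group_by(D1) %>%
  mutate(share = HH_Exp_Mean / sum(HH_Exp_Mean)) %>%
  ungroup()
exp_shares
```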

Another option is a point chart, which shows these differences in a single chart.

exp_plot <- ggplot(expenditure17) +
  geom_point(aes(x = HH_Exp_Mean, y = reorder(`D2A (2-d)`, HH_Exp_Mean), color = `D1`), size = 5) +
  labs(color = "Household Type", 
       x = "Average Monthly Expenses", 
       y = NULL)+
  theme(text = element_text(size=10),
        axis.text.y = element_text(size=10),
        legend.title=element_text(size=12),
        legend.text=element_text(size=10)
        )
exp_plot

This visualisation gives a better view of how expenditures are distributed across the various households (represented as points): for each expenditure category, readers can see exactly where each household type sits. This visualisation will therefore be used in the final visualisation.

5 Final Visualisation

5.1 Combining Visualisation to Create Final Visualisation

The final visualisation combines the three graphs created above into one unified visualisation. To do this we use the R package gridExtra, as in the following code.

grob.title <- textGrob(expression(bold(underline("How do Different Households In Singapore Spend Their Money?"))),gp = gpar(fontsize = 15))

grob.caption <- textGrob("Data source: Household Expenditure Survey 2017/18, Singapore Department Of Statistics",gp = gpar(fontsize = 10))

grid.arrange(exp_plot,
             exp_income_plot, household_plot,
             ncol = 2,nrow = 2,
             layout_matrix = rbind(c(2,3),c(1,1)),
             widths = c(40, 40), heights = c(10,30),
             top = grob.title, 
             bottom = grob.caption
             )
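The layout_matrix argument is the least obvious part of the call above. The sketch below reproduces the same layout with placeholder text grobs instead of the three plots, to show how each matrix cell refers to a grob by its position in the argument list (the placeholder labels are hypothetical).

```r
library(grid)
library(gridExtra)

# Placeholder grobs standing in for exp_plot (1), exp_income_plot (2)
# and household_plot (3)
p1 <- textGrob("1: expenditure by household")
p2 <- textGrob("2: income vs expenditure")
p3 <- textGrob("3: household waffle")

# Each cell names the grob drawn there: grob 2 top-left, grob 3 top-right,
# grob 1 spanning the whole bottom row. heights = c(10, 30) makes the bottom
# row three times as tall as the top row.
layout <- rbind(c(2, 3),
                c(1, 1))
combined <- arrangeGrob(p1, p2, p3, layout_matrix = layout,
                        widths = c(40, 40), heights = c(10, 30))
grid.newpage()
grid.draw(combined)
```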

This is the final visualisation for the project. We can now analyse it further to answer our problem statement.

5.2 Analysis of Visualisation

Based on the visualisation we can see that:

  • The largest group of Singapore households (32%) live in HDB 4 room flats, and food serving services are the biggest expense for these households. Food serving services are expenses incurred at, for example, restaurants, hawker centres and cafes.
  • For those who live in HDB 5 room & Executive flats, transport is their largest expenditure.
  • Landed property households spend more of their money on housing and utilities, likely because landed properties have higher maintenance and utility costs.
  • HDB 3 room flat households spend the most on rental for accommodation. Note that this category is the imputed rental for owner-occupied accommodation, i.e. a notional rent value assigned to homes that households own and live in, rather than rent actually paid, so it should not be read as a sign that these households rent an additional property.
  • Those who live in condominiums and other apartments spend the most on recreation and culture.
  • HDB 1&2 room flat households spend a larger percentage of their income on expenses than those who live in landed property, who have a larger percentage of their income left over after expenses.
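Several of these observations concern each household type's largest expense. A sketch of how that can be read off the data rather than the chart, using hypothetical toy rows shaped like expenditure17 (made-up numbers, illustrative category labels); on the real data the same pipeline would be applied to expenditure17.

```r
library(dplyr)

# Hypothetical toy rows shaped like expenditure17 (made-up numbers)
exp_toy <- tibble(
  D1 = rep(c("HDB 5-Room & Executive Flats", "Landed Properties"), each = 3),
  `D2A (2-d)` = rep(c("TRANSPORT", "FOOD SERVING SERVICES", "HOUSING AND UTILITIES"), times = 2),
  HH_Exp_Mean = c(900, 700, 400, 1200, 1000, 2500)
)

# For each household type, keep the single row with the highest average expense
top_category <- exp_toy %>%
  group_by(D1) %>%
  slice_max(HH_Exp_Mean, n = 1, with_ties = FALSE) %>%
  ungroup()
top_category
```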

Beyond these observations, what I found most interesting is that those who live in HDB 1&2 room flats spend the most on Food Serving Services, indicating that they eat out more.

To some of us this might seem like a luxury, especially considering how small the income of these households is. While it might be cheaper to cook and eat at home, these households may prefer eating out because they might live alone. A more likely reason, however, could be that their work (which might be shift-based) leaves them without the time and energy to cook and eat at home.

It might therefore be worthwhile for the government to look deeper into this and possibly provide more aid to this group, for example food vouchers that can be spent at hawker centres and restaurants. This would ease the group's expenditure burden and give them a chance to save.

 

A work by Rajiv Abraham Xavier