OBJECTIVE
This study aims to observe and understand the factors behind household income and spending in Region 7 (Central Visayas), Philippines. Does education affect income? If so, which education level is associated with better income? Is it better to have higher education? Is it better to be employed or to own a business?
On spending: which marital status spends more? What is the relationship of income to spending, or to marital status?
These are the questions we will attempt to answer by looking for trends. If you are a business person or simply want to know the economic status of Central Visayas, this is an in-depth narrative of the region's income and spending, so you can make data-driven decisions for whatever purpose this may serve you economically.
We will use data from the Philippine Statistics Authority as uploaded to Kaggle: https://www.kaggle.com/grosvenpaul/family-income-and-expenditure/code
This is a two-part analysis: 1. income vs. spending, and 2. spending habits (who within the sample spends the most, and on what).
DATA STRUCTURE AND RETRIEVAL
The data for this study came from Kaggle, an online repository and community site for data science enthusiasts. https://www.kaggle.com/jackdaoud/marketing-data
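library(dplyr)   # provides %>%, group_by(), summarise(), arrange()
library(ggplot2) # provides ggplot() and the geoms used below
r7househould <- read.csv("R7_IncomeSpending.csv")
head(r7househould)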
TOTAL PARTICIPANTS
length(r7househould$Region)
[1] 2541
The sample has 2,541 households in total. This should be large enough to give a general picture of the economic distribution of the population in the region.
SEX AND AGE DISTRIBUTION
r7househould %>% ggplot(aes(Head_Age, fill=Head_Sex)) + geom_density(alpha=0.2) + ggtitle("Sex and Age Distribution of Participants")

mean(r7househould$Head_Age)
[1] 53.00275
table(r7househould$Head_Sex)
Female Male
665 1876
The average age of the household heads is 53 years. As the table shows, there are more male heads (1,876) than female heads (665). In the density chart, male heads are more concentrated at younger ages and female heads at older ages.
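The impression from the density plot can be cross-checked numerically. The following is a minimal sketch, assuming the `Head_Sex` and `Head_Age` columns used above; the `toy` data frame holds made-up illustration rows, not survey values, and would be swapped for `r7househould` to get the real figures.

```r
library(dplyr)

# Made-up illustration rows (NOT survey data); replace `toy` with r7househould.
toy <- data.frame(
  Head_Sex = c("Male", "Male", "Male", "Female", "Female"),
  Head_Age = c(30, 40, 50, 60, 70)
)

# Count, mean age and median age of household heads by sex
age_by_sex <- toy %>%
  group_by(Head_Sex) %>%
  summarise(
    n = n(),
    mean_age = mean(Head_Age),
    median_age = median(Head_Age)
  )

age_by_sex
```

If the median age of female heads comes out clearly higher than that of male heads on the real data, that would support the reading of the chart.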
PARTICIPANTS BY EDUCATION
r7househould %>% ggplot(aes(Education_Summary)) + geom_bar(fill="blue") + theme(axis.text.x = element_text(angle = 45)) + ggtitle("Participants as Household Head by Education")

r7househould %>% group_by(Education_Summary) %>% summarise(Sum_of_sample=length(Education_Summary), percent_Summary=formatC(Sum_of_sample/length(r7househould$Region)*100)) %>% arrange(desc(Sum_of_sample))
Looking at the chart and numbers above, the education level with the most participants is the elementary level group. This means the largest group of household heads did not even complete elementary education. This is surprising, as the data is recent, from 2018-2019. Will this affect income and source of income? Let's proceed.
INCOME DISTRIBUTION
r7househould %>% ggplot(aes(x= Total_Income, fill=Education_Summary)) + geom_density(alpha=0.2) + scale_x_continuous(trans = "log10", labels=scales::comma) + ggtitle("Income Density by Education")

r7househould %>% ggplot(aes(Education_Summary, Total_Income)) + geom_boxplot() + scale_y_continuous(trans = "log10", labels=scales::comma) + ylab("income in php") + ggtitle("Income by Education of Participants") + theme(axis.text.x = element_text(angle = 45))

mean(r7househould$Total_Income)
[1] 234909.3
all_sample <- length(r7househould$Region)
all_educ <- r7househould %>% group_by(Education_Summary) %>% summarise(sum_per_education=length(Education_Summary), percent_sample = length(Education_Summary)/all_sample*100,avg_income=mean(Total_Income)) %>% arrange(desc(avg_income))
all_educ
The majority of participants fall within an income range of PHP 100,000 to 700,000, with an average of PHP 234,909. Looking at average income per education level, the higher the education level, the higher the income tends to be. However, the upper outliers show that households at every education level can reach the million-peso mark.
Unfortunately, the elementary level and elementary graduate groups sit at the lower end of the distribution.
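Because income is right-skewed (hence the log scale in the charts), a few very large incomes can pull a group's mean upward. A minimal sketch of reporting the median and quartiles next to the mean is shown here, assuming the `Education_Summary` and `Total_Income` columns used above. The `toy` rows and the education labels in them are made up for illustration and are not survey values.

```r
library(dplyr)

# Made-up illustration rows (NOT survey data); replace `toy` with r7househould.
# The education labels are placeholders and may differ from the real levels.
toy <- data.frame(
  Education_Summary = c("Elementary Level", "Elementary Level", "Elementary Level",
                        "College Graduate", "College Graduate", "College Graduate"),
  Total_Income = c(80000, 100000, 1200000, 300000, 400000, 500000)
)

# Mean, median and quartiles of income per education level
income_by_educ <- toy %>%
  group_by(Education_Summary) %>%
  summarise(
    n = n(),
    avg_income = mean(Total_Income),
    median_income = median(Total_Income),
    q1 = quantile(Total_Income, 0.25, names = FALSE),
    q3 = quantile(Total_Income, 0.75, names = FALSE)
  ) %>%
  arrange(desc(median_income))

income_by_educ
```

In the toy rows, one large income lifts the elementary group's mean above the college group's mean even though its median is far lower, which is the kind of distortion the median guards against.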
INCOME BY MAIN SOURCE OF INCOME
r7househould %>% ggplot(aes(x= Total_Income, fill=Main_Income)) + geom_density(alpha=0.2) + scale_x_continuous(trans = "log10", labels=scales::comma) + ggtitle("Income Density by Income source")

r7househould %>% ggplot(aes(Main_Income, Total_Income)) + geom_boxplot() + scale_y_continuous(trans = "log10", labels=scales::comma) + ylab("income in php") + ggtitle("Income by Source of income of Participants") + theme(axis.text.x = element_text(angle = 45))

r7househould %>% ggplot(aes(Head_Age, Total_Income, color=Main_Income)) + geom_jitter()+ scale_y_continuous(trans = "log10", labels=scales::comma) + ggtitle("Source of income through time (age)")

Looking at income by age and income source, we see that between ages 25 and 60 most of the distribution consists of wage earners and entrepreneurial activities, while from age 60 onwards other sources become prevalent. These could come from investments or savings made by the participants.
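The age pattern read off the scatter plot can also be tabulated. The sketch below bins `Head_Age` into bands and computes the share of each `Main_Income` category within a band. It assumes whole-number ages; the band cut-offs, the `toy` rows and the income source labels are illustrative assumptions, not values from the survey.

```r
library(dplyr)

# Made-up illustration rows (NOT survey data); replace `toy` with r7househould.
# Income source labels are placeholders and may differ from the real categories.
toy <- data.frame(
  Head_Age = c(28, 35, 47, 59, 63, 71, 80),
  Main_Income = c("Wage/Salaries", "Wage/Salaries", "Entrepreneurial Activities",
                  "Wage/Salaries", "Other Sources", "Other Sources", "Wage/Salaries")
)

# Share of each income source within an age band
source_by_age <- toy %>%
  mutate(age_band = cut(Head_Age,
                        breaks = c(0, 24, 59, Inf),
                        labels = c("under 25", "25-59", "60 and over"))) %>%
  count(age_band, Main_Income) %>%
  group_by(age_band) %>%
  mutate(pct_of_band = n / sum(n) * 100) %>%
  ungroup()

source_by_age
```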
r7househould %>% group_by(Main_Income) %>% summarise(Sum_per_sample=length(Main_Income), percent_Summary=formatC(Sum_per_sample/length(r7househould$Region)*100),avg_income=mean(Total_Income)) %>% arrange(desc(Sum_per_sample))
There is not much difference in average income across income sources in the sample; however, those in entrepreneurial activities have greater potential for million-peso earnings.
r7househould %>% filter(Total_Income>1000000) %>% group_by(Main_Income) %>% summarise(Sum_per_sample=length(Main_Income), percent_Summary=formatC(Sum_per_sample/length(r7househould$Region)*100),avg_income=mean(Total_Income)) %>% arrange(desc(Sum_per_sample))
Speaking of million-peso earnings, I dug a bit further, just for the kick of it, into the participants with an income above PHP 1,000,000. It is possible to reach that level, but the chances are slim in all income source groups.
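The summary above expresses million-peso earners as a percentage of the whole sample. To compare the chances between income sources, the share within each group may be more telling. This is a minimal sketch, assuming the `Main_Income` and `Total_Income` columns used above; the `toy` rows and their income source labels are made up for illustration.

```r
library(dplyr)

# Made-up illustration rows (NOT survey data); replace `toy` with r7househould.
# Income source labels are placeholders and may differ from the real categories.
toy <- data.frame(
  Main_Income = c("Wage/Salaries", "Wage/Salaries", "Wage/Salaries", "Wage/Salaries",
                  "Entrepreneurial Activities", "Entrepreneurial Activities"),
  Total_Income = c(150000, 250000, 300000, 1500000, 200000, 2000000)
)

# Households above PHP 1,000,000 as a share of their own income source group
million_share <- toy %>%
  group_by(Main_Income) %>%
  summarise(
    households = n(),
    over_million = sum(Total_Income > 1000000),
    pct_of_group = over_million / households * 100
  ) %>%
  arrange(desc(pct_of_group))

million_share
```

A group can hold most of the million-peso earners in absolute numbers simply because it is the largest group, while still having a lower within-group share.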
INCOME VS MARITAL STATUS
r7househould %>% ggplot(aes(Marital_Status)) + geom_bar() + ggtitle("Participants by Marital status")

The majority of our participants are in the married group. That large sample size means the analysis is most reliable for married households, across education, income and the other relationships examined in this study.
r7househould %>% ggplot(aes(Head_Age, Total_Income, color=Marital_Status)) + geom_jitter() + scale_y_continuous(trans = "log10", labels=scales::comma) + ylab("income in php") + ggtitle("Income by Marital Status of Participants")

The married group appears more active in generating higher income, with households surpassing the million-peso mark, although overall, regardless of marital status, we see a wide spread of income. The married group remains active income earners from ages 30 to 60, and the widowed from 60 onwards.
Overall, our takeaways from this analysis of household income in Central Visayas are:
Education plays a huge part in alleviating low income/salary.
The largest share of household heads in the sample, 32%, have not even completed elementary education.
The average income for Central Visayas is PHP 234,909.3. Regardless of income source, it still averages PHP 230,000-245,000.
The active age of participants earning wages/salaries is 30-60 years.
There is a slim chance of reaching the million-peso income mark. Most who have reached it are wage/salary earners, but the higher average belongs to those in entrepreneurial activities.
ANNUAL FOOD SPENDING OF PARTICIPANTS BY INCOME SOURCE
spending <- r7househould %>% group_by(Main_Income) %>% summarise(Household=length(Main_Income), Bread_Spending=sum(Bread_Expenditure), Meat_spending=sum(Meat_Expenditure),Veggie_Spending=sum(Vegetables_Expenditure), Fruit_Spending=sum(Fruit_Expenditure))
spending
AVERAGE ANNUAL SPENDING PER INCOME SOURCE
r7househould %>% group_by(Main_Income) %>% summarise(Household=length(Main_Income), Bread_Spending=sum(Bread_Expenditure)/Household, Meat_spending=sum(Meat_Expenditure)/Household,Veggie_Spending=sum(Vegetables_Expenditure)/Household, Fruit_Spending=sum(Fruit_Expenditure)/Household)
AVERAGE WEEKLY FOOD SPENDING PER INCOME SOURCE
r7househould %>% group_by(Main_Income) %>% summarise(Household=length(Main_Income), Bread_Spending=sum(Bread_Expenditure)/Household/52, Meat_spending=sum(Meat_Expenditure)/Household/52,Veggie_Spending=sum(Vegetables_Expenditure)/Household/52, Fruit_Spending=sum(Fruit_Expenditure)/Household/52)
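Dividing a group sum by the number of households is the same as taking the group mean, so the weekly figures can be written more compactly. The sketch below uses `across()` from dplyr (version 1.0 or later) and assumes the expenditure columns all end in `_Expenditure`, as in the code above. The `toy` rows and income source labels are made up for illustration.

```r
library(dplyr)

# Made-up illustration rows (NOT survey data); replace `toy` with r7househould.
toy <- data.frame(
  Main_Income = c("Wage/Salaries", "Wage/Salaries", "Other Sources"),
  Bread_Expenditure = c(5200, 10400, 2600),
  Meat_Expenditure = c(10400, 20800, 5200)
)

# Average weekly spending per household: annual mean divided by 52 weeks
weekly <- toy %>%
  group_by(Main_Income) %>%
  summarise(
    Household = n(),
    across(ends_with("_Expenditure"), ~ mean(.x) / 52)
  )

weekly
```

Unlike the code above, this keeps the original column names and would pick up every column ending in `_Expenditure`, so on the real data it may return more categories than the four food items shown here.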
