library(readr)
library(dplyr)
library(tidyverse)
library(ggplot2)

Social Levels in Several Major Cities of the Philippines

Background

The world is currently changing rapidly and significantly in almost every aspect of life.

We humans are part of this rapid change. For example, inventions that were new 20 years ago are now gradually becoming obsolete in fields such as transportation, computing, food, and education, all of which are continuously challenged by new understanding and innovation.

We cannot deny that this rapid and significant change in many fields affects our lives. To survive and sustain ourselves in this dynamic world, we need a relevant parameter that compiles information on the status of our society. With such a standard, society can adapt to change, take advantage of it, and avoid the risk that the change instead brings a lot of suffering.

Based on the above, we had the idea to create a small-scale clustering model based on Philippine data covering several primary aspects of human life. We hope that our model can be a useful tool, to be used or developed further, for showing the current standard of society in many areas of the world.

For further improvement, we also encourage other interested parties, whether government or private, to use our model and dashboard to identify the social status of a given area. This might give insight when carrying out special purposes such as logistics distribution, charity, or other social and economic activities.

Variables Used

We found suitable data for our purpose at

https://www.kaggle.com/grosvenpaul/family-income-and-expenditure, which contains information from the Family Income and Expenditure Survey (FIES). The Philippine Statistics Authority (PSA) spearheads the conduct of this survey nationwide.

Below we show the information contained in the dataset.

Phillipines <- read.csv("Family Income and Expenditure.csv")

Phillipines

From the dataset, we can see that it contains a lot of information, such as total household income, main source of income, detailed education information, and the various food expenditures of every household, as well as various household inventory and equipment.

For our model, we use only several primary variables that we consider useful for identifying the social status in each region.

Below we explain our selected variables.

Phillipines_select <- Phillipines %>% 
                      select(Total.Household.Income, Region, Total.Food.Expenditure, Restaurant.and.hotels.Expenditure, Alcoholic.Beverages.Expenditure, Tobacco.Expenditure, Clothing..Footwear.and.Other.Wear.Expenditure, Housing.and.water.Expenditure, Imputed.House.Rental.Value, Communication.Expenditure, Transportation.Expenditure, Education.Expenditure, Medical.Care.Expenditure, Household.Head.Age)

Phillipines_select
  1. Total income : Total income of each household in the Philippines

  2. Region : Region of observation across several provinces of the Philippines

  3. Total food expenditure : Amount of money spent by Philippine households on food

  4. Pleasure/entertainment expenditure (restaurants and hotels, alcohol, and tobacco) : Amount of money spent by Philippine households on pleasure/entertainment

  5. Primary (clothing, housing and water, imputed house rental value, transportation, communication) : Amount of money spent by Philippine households on primary expenditure

  6. Education : Amount of money spent by Philippine households on education

  7. Medical : Amount of money spent by Philippine households on medical care

  8. Household Head Age : Age of the household head in each observation

Merging Columns

Here we merge several variables into two new columns named Pleasure and Primary.

Phillipines_select$Pleasure <- Phillipines_select$Restaurant.and.hotels.Expenditure + Phillipines_select$Alcoholic.Beverages.Expenditure + Phillipines_select$Tobacco.Expenditure

Phillipines_select$Primary <- Phillipines_select$Clothing..Footwear.and.Other.Wear.Expenditure + Phillipines_select$Housing.and.water.Expenditure + Phillipines_select$Imputed.House.Rental.Value + Phillipines_select$Transportation.Expenditure + Phillipines_select$Communication.Expenditure 
# Drop the component columns now that they are merged into Pleasure and Primary
Philipines_ok <- Phillipines_select %>% 
  select(-c(Restaurant.and.hotels.Expenditure,
            Alcoholic.Beverages.Expenditure,
            Tobacco.Expenditure,
            Clothing..Footwear.and.Other.Wear.Expenditure,
            Housing.and.water.Expenditure,
            Imputed.House.Rental.Value,
            Transportation.Expenditure,
            Communication.Expenditure))
Philipines_ok
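
As an alternative to the two steps above, the merge and the column removal can be sketched in a single `mutate()` call. This is a minimal sketch on toy data, not the code we ran: the values in `toy` are made up, only three of the real column names are used, and it assumes dplyr 1.0.0 or later for the `.keep` argument.

```r
library(dplyr)

# Toy data with made-up values; the column names mirror three columns of the real dataset.
toy <- data.frame(
  Restaurant.and.hotels.Expenditure = c(100, 0, 250),
  Alcoholic.Beverages.Expenditure   = c(20, 5, 0),
  Tobacco.Expenditure               = c(10, 0, 30)
)

# Create the merged column; .keep = "unused" drops the columns that were
# used to compute it, so no separate select(-...) step is needed.
toy_merged <- toy %>%
  mutate(Pleasure = Restaurant.and.hotels.Expenditure +
                    Alcoholic.Beverages.Expenditure +
                    Tobacco.Expenditure,
         .keep = "unused")

toy_merged
```

Any column that does not take part in the sum is kept unchanged, so the same pattern could be applied to the Primary column as well.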

Data Exploration

Check for NA Values

colSums(is.na(Philipines_ok))
##   Total.Household.Income                   Region   Total.Food.Expenditure 
##                        0                        0                        0 
##    Education.Expenditure Medical.Care.Expenditure       Household.Head.Age 
##                        0                        0                        0 
##                 Pleasure                  Primary 
##                        0                        0

Convert the Region Variable to a Factor

Philipines_ok <- Philipines_ok %>% 
  mutate(Region = as.factor(Region))
Philipines_ok
hist(Philipines_ok$Total.Household.Income)

hist(Philipines_ok$Total.Food.Expenditure)

hist(Philipines_ok$Education.Expenditure)

hist(Philipines_ok$Medical.Care.Expenditure)

hist(Philipines_ok$Household.Head.Age)

hist(Philipines_ok$Pleasure)

hist(Philipines_ok$Primary)
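
The separate `hist()` calls above could also be drawn as one faceted figure. The following is a minimal sketch under stated assumptions: `toy` holds simulated values rather than the survey data, the number of bins is arbitrary, and tidyr is assumed to be installed (it is part of the tidyverse).

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Simulated stand-in for three numeric columns of the real data (values are made up).
set.seed(1)
toy <- data.frame(
  Total.Household.Income = rlnorm(200, meanlog = 12, sdlog = 0.8),
  Total.Food.Expenditure = rlnorm(200, meanlog = 11, sdlog = 0.5),
  Household.Head.Age     = round(runif(200, 18, 90))
)

# Reshape to long format: one row per (variable, value) pair.
toy_long <- toy %>%
  pivot_longer(everything(), names_to = "variable", values_to = "value")

# One histogram panel per variable, each with its own axis scales.
p <- ggplot(toy_long, aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ variable, scales = "free")

p
```

Free scales matter here because income and age are on very different ranges; with shared axes the age panel would collapse into a single bar.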

### Correlation

GGally::ggcorr(Philipines_ok %>% select_if(is.numeric), label = TRUE)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2

From the above diagram, we can see strong correlations between some variables, such as

  1. Total household income vs total food expenditure : This correlation might show that an increase in household income affects the food expenditure of the family

  2. Total household income vs Primary : The primary expenditure of a household might increase as household income increases

  3. Total food expenditure vs Pleasure : This strong correlation might arise because total food expenditure includes some alcohol and restaurant spending.
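
To read exact coefficients rather than the rounded labels on a plot, a correlation matrix can be computed directly with base R's `cor()`. This is a minimal sketch on simulated data: the variables `income`, `food`, and `age` are hypothetical stand-ins, and `food` is deliberately constructed to depend on `income`, so the numbers say nothing about the real survey.

```r
# Simulated stand-in data (made-up values, not the FIES survey).
set.seed(42)
income <- rlnorm(500, meanlog = 12, sdlog = 0.7)
toy <- data.frame(
  income = income,
  food   = 0.3 * income + rnorm(500, sd = 10000),  # built to depend on income
  age    = round(runif(500, 18, 90))               # independent of income
)

# Pearson correlation matrix of all numeric columns.
cor_mat <- cor(toy, method = "pearson")
round(cor_mat, 2)

# Pull out a single pair of interest by name.
cor_mat["income", "food"]
```

On the real data, the same call on `Philipines_ok %>% select_if(is.numeric)` would return the full matrix for the variables discussed above.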

ML Modeling and Shiny App

From the chosen variables, we will create a score for each variable and then accumulate the scores to see the social level of every household. After we get the scores, we will map the score levels onto the regions of the Philippines to see
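
One possible way to score each variable and accumulate the scores is sketched below. It is only an illustration under our own assumptions: the scoring rule (tertiles via `dplyr::ntile()`), the equal weighting of variables, the region labels, and all values in `toy` are hypothetical, and the final model may use a different rule.

```r
library(dplyr)

# Toy households with made-up values and hypothetical region labels.
toy <- data.frame(
  Region                 = c("A", "A", "B", "B", "C", "C"),
  Total.Household.Income = c(100, 400, 250, 900, 50, 600),
  Education.Expenditure  = c(5, 40, 10, 80, 0, 30)
)

# Score each variable from 1 (lowest third) to 3 (highest third),
# then accumulate the per-variable scores into one total per household.
scored <- toy %>%
  mutate(income_score    = ntile(Total.Household.Income, 3),
         education_score = ntile(Education.Expenditure, 3),
         total_score     = income_score + education_score)

# Average the household totals within each region.
region_scores <- scored %>%
  group_by(Region) %>%
  summarise(mean_score = mean(total_score), .groups = "drop")

region_scores
```

Rank-based scores like these are insensitive to the long right tails seen in the income and expenditure histograms, which is one reason to prefer them over scoring on raw amounts.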