Sectoral Analysis of Employment in Metro Manila Using PSA’s LFS Data
Author
Benedict Bautista
Data cleaning
In this phase of the study, the Labor Force Survey (LFS) dataset provided by the Philippine Statistics Authority (PSA) was utilized as the primary data source. The initial step involved identifying and selecting the key variables necessary for the analysis. Following this, the dataset underwent a data cleaning process, which included handling and removing any missing or incomplete values to ensure the accuracy and reliability of the results.
The following libraries will be used: tidyverse ,pander for APA tables, nortest for the test of normality, and here for easier working directory.
Loading of the necessary libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(pander)library(here)
here() starts at C:/Users/ben/Desktop/portfolio RPUB
library(nortest)
Loading the dataset
The dataset was acquired from psada.gov.ph under Labor Force Survey tab and then the month of March 2024.
data <-read.csv(here("LFS PUF March 2024.csv"))
Filtering variables
In the next step, we wish to filter variables needed for the analysis. The variables needed are PUFC15_PKB coded as Major industry group and Sub industry group, PUFREG coded as Region, PUFC09 coded as Work indicator (1 = employed, 2 = unemployed), and PUFC07 coded as Educational attainment and PUFWGTPRV for the final weight based on projection. Please refer to lfs_metadata (dictionary) uploaded or you can visit the site :https://psada.psa.gov.ph/catalog/LFS/about
# filtering valuesdata1 <- data %>%filter(PUFREG ==13) %>%# 13 for NCRmutate(sectors =case_when( PUFC15_PKB %in%1:3~"Agriculture", PUFC15_PKB %in%5:43~"Industry", PUFC15_PKB >=45~"Services", T ~NA_character_ ),sub.industry =case_when( PUFC15_PKB %in%1:2~"Agriculture and Forestry", PUFC15_PKB ==3~"Fishing and Aquaculture", PUFC15_PKB %in%5:9~"Mining and Quarying", PUFC15_PKB %in%10:33~"Manufacturing", PUFC15_PKB >=35~"Electricity, Gas, Steam, AC", PUFC15_PKB %in%36:39~"Water Supply, Sewerage, Waste Management", PUFC15_PKB %in%41:43~"Construction", PUFC15_PKB %in%45:47~"Wholesale and Retail Trade", PUFC15_PKB %in%49:53~"Transportation and Storage", PUFC15_PKB %in%55:56~"Accommodation and Food Service Activities", PUFC15_PKB %in%58:63~"Information and Communication", PUFC15_PKB %in%64:66~"Financial and Insurance Activities", PUFC15_PKB ==68~"Real Estate Activities", PUFC15_PKB %in%69:75~"Professional, Scientific, and Technical Activities", PUFC15_PKB %in%77:82~"Administrative and Support Service Activities", PUFC15_PKB ==84~"Public Administration and Defense", PUFC15_PKB ==85~"Education", PUFC15_PKB %in%86:88~"Human Health and Social Work", PUFC15_PKB %in%90:93~"Arts, Entertainment and Recreation", PUFC15_PKB %in%94:96~"Other Service Activities", PUFC15_PKB %in%97:98~"Activities of Households as Employers", PUFC15_PKB ==99~"Activities of Extraterritorial Orgs and Bodies", T ~NA_character_ ),employed =case_when( PUFC09_WORK ==1~"Employed", PUFC09_WORK ==2~"Unemployed", T ~NA_character_ ) ) %>%filter(!is.na(sectors), !is.na(sub.industry), !is.na(employed))# select variablesdataset <- data1 %>%select(PUFC15_PKB, PUFREG, PUFC09_WORK, PUFC07_GRADE, PUFPWGTPRV)# renaming column namescolnames(dataset) <-c("Industries", "Region", "Employment Status", "Educational Attainment", "Final weight")head(dataset, n =10)
To gain an initial understanding of the dataset and address the first two research questions, this study begins with an exploratory data analysis. Specifically, it aims to describe how employed Filipino workers are distributed across the country’s major and sub industry sectors, providing foundational insights for further statistical evaluation.
dist.1<- dataset %>%filter(`Employment Status`==1) %>%mutate(major =case_when( Industries %in%1:3~"Agriculture", Industries %in%5:43~"Industry", Industries %in%45:99~"Services", T ~NA_character_)) # aggregate the dist.1major.dist <- dist.1%>%group_by(major) %>%summarise(`total weight`=sum(`Final weight`, na.rm =TRUE), .groups ="drop")major.dist <- major.dist %>%bind_rows(summarise( major.dist,major ="TOTAL",`total weight`=sum(`total weight`)))colnames(major.dist) <-c("Major Industry", "Population")pander(major.dist,caption ="Total Population per Major Industry")
Total Population per Major Industry
Major Industry
Population
Agriculture
5298
Industry
920484
Services
5276199
TOTAL
6201981
We cannot assume that the population is normally distributed without formally testing it. This is where the Shapiro–Francia test comes in. It is a variant of the Shapiro–Wilk test, designed to perform better with larger sample sizes. In our case, we are working with 2,580 samples drawn from a dataset containing 44,603 observations.
# normality test of employed NCR citizen per major sectoremployed <- dist.1%>%filter(`Employment Status`==1)employed %>%group_by(major) %>%summarise(p_value =sf.test(`Final weight`)$p.value )
# A tibble: 3 × 2
major p_value
<chr> <dbl>
1 Agriculture 5.63e- 2
2 Industry 1.28e-21
3 Services 9.31e-41
Plots of Weighted Frequency Distribution in Major Sectors
ggplot(dist.1, aes(x =`Final weight`)) +geom_histogram(fill ="steelblue", bins =28) +facet_wrap(~ major, scales ="free") +labs(title ="Distribution of employed by Major Sector (Weighted)",x ="Major Industry",y ="Population" ) +theme_minimal()
Analysis of Major Sector Distribution
A test of normality was conducted to assess the distribution of employed individuals across the major industry sectors. Results indicated that the Agriculture sector was the only one with a distribution that did not significantly deviate from normality (p = .056), whereas the Industry and Services sectors showed strong evidence against normality. Furthermore, we can notice from the graphs that Industry sector and Services sector is skewed to the right which indicate that we can utilize logarithmic transformation to the weights of Industry and Services sector to achieve normality.
RQ #2
From the code earlier we can tweak it to remove the weights (which represent the number of people in the population that each sampled respondent stands for) to get the actual frequency of each sector.
dist.2<- dataset %>%filter(`Employment Status`==1) %>%mutate(major =case_when( Industries %in%1:3~"Agriculture", Industries %in%5:43~"Industry", Industries %in%45:99~"Services", T ~NA_character_)) # aggregate the dist.2major.dist1 <- dist.2%>%group_by(major) %>%summarise(frequency =n(), .groups ="drop")major.dist1 <- major.dist1 %>%bind_rows(summarise( major.dist1,major ="TOTAL",frequency =sum(frequency)))colnames(major.dist1) <-c("Major Industry", "Population")pander(major.dist1,caption ="Total Frequency per Major Industry")
Total Frequency per Major Industry
Major Industry
Population
Agriculture
9
Industry
427
Services
2133
TOTAL
2569
Plots of Frequency Distribution in Major Sectors
We can graph it to visualize which sector contributes more in the employment in NCR.
ggplot(dist.2, aes(x =fct_rev(fct_infreq(major)))) +geom_bar(fill ="steelblue") +coord_flip() +geom_text(stat ="count", aes(label =after_stat(count)),color ="black",hjust =1.1,size =3.5) +labs(title ="Distribution of Employed by Major Sector",x ="Major Sector",y ="Count" ) +theme_minimal()
Analysis of the population of employed personnel each sector
Upon observing the graph we can notice that the distribution of the frequency of employed personnel across all the major sectors is not normally distributed, because it does not resemble a bell-shaped curve, which also suggest that formal test is not needed. Furthermore, most of the employed personnel in NCR falls under the Services sector placing as the most populated sector in the NCR, Industry sector placing second and Agriculture sector placing last, which indicate that Agriculture sector is the least populated among the three major sectors in NCR alone.
RQ #3
Identifying the Leading Employment Sub-Sector in Services Sector in NCR
Previously, we identified the total number of employed individuals across major sectors. Based on the aggregated and visualized data, it was observed that the majority of the workforce is employed in the Services sector, while the Agriculture sector had the lowest share.
In this section, our goal is to determine which sub-sector within the Services sector accounts for the highest number of employed individuals. To achieve this, we will aggregate relevant data and present visualizations to uncover further insights.
sub <- dist.2%>%filter(major =="Services") %>%mutate(sector2 =case_when( Industries %in%45:47~"Wholesale and Retail Trade", Industries %in%49:53~"Transportation and Storage", Industries %in%55:56~"Accommodation and Food Service Activities", Industries %in%58:63~"Information and Communication", Industries %in%64:66~"Financial and Insurance Activities", Industries ==68~"Real Estate Activities", Industries %in%69:75~"Professional, Scientific, and Technical Activities", Industries %in%77:82~"Administrative and Support Service Activities", Industries ==84~"Public Administration and Defense", Industries ==85~"Education", Industries %in%86:88~"Human Health and Social Work", Industries %in%90:93~"Arts, Entertainment and Recreation", Industries %in%94:96~"Other Service Activities", Industries %in%97:98~"Activies of Households as Employers", Industries %in%99~"Activities of Extraterritorial Orgs and Bodies", T ~NA_character_ ))sub.dist1 <- sub %>%group_by(sector2) %>%summarise(frequency =n(), .groups ="drop")sub.dist1 <- sub.dist1 %>%bind_rows(summarise( sub.dist1,sector2 ="TOTAL",frequency =sum(frequency)))colnames(sub.dist1) <-c("Sub Industry", "Frequency")pander(sub.dist1,caption ="Total Frequency per Sub Industry")
Total Frequency per Sub Industry
Sub Industry
Frequency
Accommodation and Food Service Activities
213
Administrative and Support Service Activities
344
Arts, Entertainment and Recreation
25
Education
60
Financial and Insurance Activities
70
Human Health and Social Work
60
Information and Communication
73
Other Service Activities
196
Professional, Scientific, and Technical Activities
49
Public Administration and Defense
154
Real Estate Activities
32
Transportation and Storage
273
Wholesale and Retail Trade
584
TOTAL
2133
We can graph it to observe the distribution and proportion of each Sub Industry to the overall frequency of the Services Sector.
pie_data <- sub %>%filter(`Employment Status`==1) %>%count(sector2) %>%mutate(percentage = n /sum(n) *100,label =paste0(round(percentage, 1), "%"))colnames(pie_data) <-c("Sub-Industry", "Count", "Percentage", "Rounded")ggplot(pie_data, aes(x =fct_reorder(`Sub-Industry`, Percentage), y = Percentage)) +geom_col(fill ="steelblue") +geom_text(aes(label =paste0(Rounded)), color ="white", hjust =1.1, size =3.5) +coord_flip() +labs(title ="Percentage Distribution of Employed Personnel by Sub-sector",x ="Sub-Industry",y ="Percentage" ) +theme_minimal()
From the graphs presented we can observe that Wholesale and Retail Trade placed first with 584 or 27.4% employed personnel in the most populated sub-industry under Services Sector. Consequently, Administrative and Support Service Activities placed second with 344 or 16.1% employed personnel, while Transportation and Storage placed third with 273 or 12.8% employed personnel, indicating that the top three sub industries under Service sector are Wholesale and Retail Trade, Administrative and Support Service Activities, and Transportation and Storage adding up their population can add up to 1203 employed personnel making about 56.3% over the whole population of employed personnel in Services alone. This can indicate that these 3 sub industries are the most contributors to the overall population of Service sector.
Identifying the Leading Employment Sub-sector Across all Major Sectors.
# using dataset "dist.2" from earlier# Agricultureagri <- dist.2%>%filter(major =="Agriculture") %>%mutate(sub_sector =case_when( Industries %in%1:2~"Agriculture and Forestry", Industries ==3~"Fishing and Aquaculture",TRUE~NA_character_ ))# Industryindustry <- dist.2%>%filter(major =="Industry") %>%mutate(sub_sector =case_when( Industries %in%5:9~"Mining and Quarrying", Industries %in%10:33~"Manufacturing", Industries ==35~"Electricity, Gas, Steam, AC Supply", Industries %in%36:39~"Water Supply, Sewerage, Waste Management", Industries %in%41:43~"Construction",TRUE~NA_character_ ))# Servicesservices <- dist.2%>%filter(major =="Services") %>%mutate(sub_sector =case_when( Industries %in%45:47~"Wholesale and Retail Trade", Industries %in%49:53~"Transportation and Storage", Industries %in%55:56~"Accommodation and Food Services", Industries %in%58:63~"Information and Communication", Industries %in%64:66~"Financial and Insurance Activities", Industries ==68~"Real Estate Activities", Industries %in%69:75~"Professional, Scientific, and Technical Activities", Industries %in%77:82~"Administrative and Support Services", Industries ==84~"Public Administration and Defense", Industries ==85~"Education", Industries %in%86:88~"Human Health and Social Work", Industries %in%90:93~"Arts, Entertainment and Recreation", Industries %in%94:96~"Other Service Activities", Industries %in%97:98~"Household Employers", Industries ==99~"Extraterritorial Orgs and Bodies",TRUE~NA_character_ ))all <-bind_rows(agri, industry, services)freq_table <- all %>%filter(!is.na(sub_sector)) %>%count(major, sub_sector)total_row <-data.frame(major ="Total",sub_sector ="-",n =sum(freq_table$n))freq_table <-bind_rows(freq_table, total_row)pander(freq_table)
major
sub_sector
n
Agriculture
Agriculture and Forestry
2
Agriculture
Fishing and Aquaculture
7
Industry
Construction
197
Industry
Electricity, Gas, Steam, AC Supply
3
Industry
Manufacturing
220
Industry
Mining and Quarrying
1
Industry
Water Supply, Sewerage, Waste Management
6
Services
Accommodation and Food Services
213
Services
Administrative and Support Services
344
Services
Arts, Entertainment and Recreation
25
Services
Education
60
Services
Financial and Insurance Activities
70
Services
Human Health and Social Work
60
Services
Information and Communication
73
Services
Other Service Activities
196
Services
Professional, Scientific, and Technical Activities
49
Services
Public Administration and Defense
154
Services
Real Estate Activities
32
Services
Transportation and Storage
273
Services
Wholesale and Retail Trade
584
Total
-
2569
Now we graph it to visualize the Frequency of Employed Persons by Sub-sector and Major Sector
freq_table2 <- freq_table %>%filter(major !="Total") # removes the row where 'major' is "Total"library(forcats)ggplot( freq_table2, aes(x =fct_rev(fct_reorder(sub_sector, -n)), y = n, fill = sub_sector )) +geom_col(position =position_dodge(width =1)) +facet_wrap(~ major, scales ="free") +coord_flip() +labs(title ="Frequency of Employed Persons by Sub-sector and Major Sector",x ="Sub-sector", y ="Number of Persons",fill ="Sub-sector" ) +theme_minimal() +theme(plot.title =element_text(size =38, face ="bold"),axis.text.x =element_text(angle =45, hjust =1, size =25),axis.text.y =element_text(angle =70, hjust =1, size =25),legend.position ="none" )
Faceted bar graph shows the breakdown of employed individuals by different sub-sectors, classified by main sectors. Different facets depict various major sectors, enabling comparison of sub-sectoral employment both within and between sectors to be seen more clearly.
Under the Agriculture, Forestry, and Fishing sector, the highest employment is in Fishing and Aquaculture. Agriculture and Forestry also participate but to a lesser degree.
The Industry sector indicates high labor in Manufacturing and Construction, with overwhelming dominance of the group’s workforce. Other sub-sectors such as Water Supply and Electricity, Gas, Steam, and AC Supply have comparatively low employment.
In the Services sector, there is much diversification of employment. The Wholesale and Retail Trade sub-sector employs the largest number of people, followed by Administrative and Support Services, Transportation and Storage, and Accommodation and Food Services. Some other smaller sub-sectors also make contributions, including Financial and Insurance Activities, Education, and Public Administration.
In general, the Services sector also seems to be the most populous by employment, signaling a robust trend towards service job employment in the labor market. The distinction between color-coded bars and independent facets facilitates interpretability, allowing for easier observation of which sub-sectors predominate within each of the main economic sectors.
Notes
A logistic regression model was tested to assess the predictive power of educational attainment on employment status. However, due to high insignificance across levels, likely caused by category imbalance and sample constraints, the results were excluded from the main analysis.