The potential spread of Artificial Intelligence has influenced various aspects of our lives. Among them, we point out the possible influence of AI on the labour market. In an interesting study, Yam et al. (2023) proposed a framework to study regional patterns in terms of job anxiety related to technology adoption. They correlate the Job Insecurity measure as “the frequency at which people searched for job-recruiting sites. We collected data for searches on the five most popular job-recruiting sites in the United States: LinkedIn, Glassdoor, ZipRecruiter, Indeed, and Monster.” with data on technological development (Yam et al., 2023). The authors have used the number of installed robots per worker for the technological contribution.
Currently, the world is more bothered by the applications of AI rather than Industrial Robots. Thus, the goal of this project is to explore if the change in labour market conditions was related to two concepts:
Job insecurity, measured as the frequency of searching “Oferty Pracy” in Google Poland
Artificial intelligence interest, measured as the frequency of searching “Sztuczna Inteligencja” in Google Poland
I will use a rich dataset from Google Trends linked to regional data from the Polish Statistics Office to achieve this goal. I will use standard econometric solutions and spatial models with unsupervised learning methods. My primary outcome is the change in the relative position of municipalities in unemployment over 2012 and 2022. I find that whereas job insecurity explains the change in unemployment, AI popularity does not. However, the Geographically Weighted Regression coefficients clustering results identified Southern-Eastern Poland as an area with a negative change in labour market outcomes, accompanied by job insecurity and low interest in Artificial Intelligence. The results are interesting in finding whether there are similarities in technology adoption, technology interest, and active labour market policy effectiveness.
The document will proceed as follows. Firstly, I describe the data collection and data cleaning issues. In chapter three, I introduce Linear and Geographically Weighted Regression. Finally, I compare the clusters in the last part of the project. In conclusion, I point out the possible limitations of the project.
I begin by reading the data for AI and work search. Google Trends scales its search data from 0 to 100, making individual data points non-interpretable, but variations across geographic regions are meaningful. However, there is limited data on searches; users can access information on searches from 200 cities for free.
To address this limitation, I employ data imputation based on geo-coordinates to obtain a complete dataset. Additionally, I use data on the share of employees in the manufacturing industry as a proxy for a routine task indicator. Indeed, in the literature, this measure is strongly correlated with technology adoption (see Acemolgu & Restrepo, 2020; Acemoglu & Restrepo, 2022).
# Clean the environment
rm(list = ls())
# Read the data from Google Trends on Work
setwd("C:/Users/wojci/OneDrive/WSZ/Ekonometria Przestrzenna w R/")
# Read the data drom Google Trends on AI
ai <- read_csv("PROJEKT/AI_Data.csv")
## Rows: 200 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Point
## dbl (3): Long, Lat, AI
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
work <- read_csv("PROJEKT/Work_Data.csv")
## Rows: 200 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Point
## dbl (3): Long, Lat, Work
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Manufacture_data <- read_csv("PROJEKT/Manufacture_data.csv")
## Rows: 380 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): jpt_kod_je
## dbl (4): workers, share_man, Lat, Long
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
unempl_data <- read_csv("PROJEKT/Unempl_data.csv")
## Rows: 380 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): jpt_kod_je, powiat_name
## dbl (3): unempl22, unempl12, unempl_diff
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Read Municipal Geo Data
POW<-st_read("C:/Users/wojci/OneDrive/WSZ/Ekonometria Przestrzenna w R/DANE_DO_ZAJEC/powiaty/powiaty.shp", stringsAsFactors = FALSE)
## Reading layer `powiaty' from data source
## `C:\Users\wojci\OneDrive\WSZ\Ekonometria Przestrzenna w R\DANE_DO_ZAJEC\powiaty\powiaty.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 380 features and 29 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 171677.6 ymin: 133223.7 xmax: 861895.7 ymax: 774923.7
## Projected CRS: ETRS89 / Poland CS92
POW$jpt_nazwa_<-iconv(POW$jpt_nazwa_, "latin1", "UTF-8")
# Transform the projection in the data
POW<-st_transform(POW, 4326)
# checking the properties of object (like summary())
st_geometry(POW)
## Geometry set for 380 features
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 14.12289 ymin: 49.00205 xmax: 24.14578 ymax: 54.83579
## Geodetic CRS: WGS 84
## First 5 geometries:
## MULTIPOLYGON (((21.96675 50.32617, 21.96687 50....
## MULTIPOLYGON (((22.25364 49.70981, 22.25368 49....
## MULTIPOLYGON (((22.23235 49.92066, 22.233 49.92...
## MULTIPOLYGON (((21.46418 49.99318, 21.46426 49....
## MULTIPOLYGON (((21.7467 49.97089, 21.74691 49.9...
# Converse the data from SF to SP
POW.sp<-as_Spatial(POW, cast = TRUE, IDs ="jpt_kod_je")
# Plot the grid to see if it works
plot(POW.sp)
Below, one can see that the scope of the Google Trends Data is limited. In some cases, we obtain information for few cities within a municipality.
# Perfrect! Now we will move to matching the data points to the municipal data
AI.points <- ai %>%
sf::st_as_sf(coords = c("Long","Lat"), crs=4326) # transform to sf object & WGS84 CRS
Work.points <- work %>%
sf::st_as_sf(coords = c("Long","Lat"), crs=4326) # transform to sf object & WGS84 CRS
# AI overview
ggplot() +
geom_sf(data = POW) +
geom_sf(data = AI.points, aes(color = AI))
# Work overview
ggplot() +
geom_sf(data = POW) +
geom_sf(data = Work.points, aes(color = Work))
We can clearly see that there are some data missing for the municipalities (e.g. see Northern Western Poland). Firstly I will calculate averages for the municipalities with accessible data. In the next step, I will data imputation to get rid of missings.
ai_trans <- st_transform(AI.points, crs=4326)
work_trans <- st_transform(Work.points, crs=4326)
pow_trans <- st_transform(POW, crs=4326)
ai_pow <- sf::st_join(pow_trans, ai_trans)
work_pow <- sf::st_join(pow_trans, work_trans)
ai_pow <- ai_pow %>%
group_by(jpt_kod_je) %>%
select(jpt_kod_je, jpt_nazwa_, AI) %>%
summarise(AI = mean(AI, na.rm = T)) %>%
st_drop_geometry()
work_pow <- work_pow %>%
group_by(jpt_kod_je) %>%
select(jpt_kod_je, jpt_nazwa_, Work) %>%
summarise(work = mean(Work,na.rm = T)) %>%
st_drop_geometry()
Manufacture_data <- left_join(Manufacture_data, unempl_data, by = "jpt_kod_je")
work_pow <- left_join(work_pow, Manufacture_data, by = "jpt_kod_je")
work_ai_pow <- left_join(work_pow, ai_pow, by = "jpt_kod_je")
Post-imputation dataset are presented below.
# Plot the graph for WORK
work_search_imputed <- ggplot() +
geom_sf(data = work_pow_post, aes(fill = work)) +
theme_classic()+
scale_fill_viridis()
# Plot the graph for AI
ai_search_imputed <- ggplot() +
geom_sf(data = work_pow_post, aes(fill = AI)) +
theme_classic() +
scale_fill_viridis()
gridExtra::grid.arrange(work_search_imputed, ai_search_imputed, ncol = 2)
To justify the spatial modeling, I run Moran’s test. The null hypothesis in Moran’s Test assumes random spatial distribution of the variable of choice.
# spatial weights matrix – from sf
crds.work <- st_centroid(st_geometry(work_pow_post)) # centroids
knn.work <- knearneigh(crds.work, k=5) # 5 nearest points
work.knn.nb.sf <- knn2nb(knn.work)
work.knn.sym.listw <- nb2listw(make.sym.nb(work.knn.nb.sf))
# Start with Moran test for spatial relationship
moran.test(work_pow_post$AI, work.knn.sym.listw)
##
## Moran I test under randomisation
##
## data: work_pow_post$AI
## weights: work.knn.sym.listw
##
## Moran I statistic standard deviate = 2.9114, p-value = 0.001799
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.0846016255 -0.0026385224 0.0008978883
moran.test(work_pow_post$work, work.knn.sym.listw)
##
## Moran I test under randomisation
##
## data: work_pow_post$work
## weights: work.knn.sym.listw
##
## Moran I statistic standard deviate = -3.676, p-value = 0.9999
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## -0.1131673340 -0.0026385224 0.0009040595
moran.test(work_pow_post$unempl_diff, work.knn.sym.listw)
##
## Moran I test under randomisation
##
## data: work_pow_post$unempl_diff
## weights: work.knn.sym.listw
##
## Moran I statistic standard deviate = 11.119, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.3305120110 -0.0026385224 0.0008976921
moran.test(work_pow_post$share_man, work.knn.sym.listw)
##
## Moran I test under randomisation
##
## data: work_pow_post$share_man
## weights: work.knn.sym.listw
##
## Moran I statistic standard deviate = -1.8776, p-value = 0.9698
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## -0.0591017873 -0.0026385224 0.0009043011
Interestingly, we reject the null hypothesis in the case of AI search and Unemployment long difference. However, there is some evidence on random spatial distribution in the case of Work search and share of employees in manufacturing. Nevertheless we will proceed for the further spatial analysis.
I begin with a linear regression, assuming no spatial relation between the studied objects. My focus is on investigating whether the difference in unemployment between 2012 and 2022 is reflected in Google searches related to work (Pol. Oferty Pracy, Praca), AI (Pol. Sztuczna Inteligencja), and the interaction between these searches.
The inclusion of the interaction term is justified by the possibility that municipalities where only artificial intelligence is frequently searched may experience different labor market changes compared to those where both artificial intelligence and work are often searched.
The dependent variable in the regression is the change in the relative position in terms of the unemployment rate. In other words, a positive value of the dependent variable indicates a decline in the municipality’s unemployment position between 2012 and 2022. For instance, in 2012 in Boleslawiec, the unemployment rate was equal to 103% of the national average, while it decreased to 59% of the national average in 2022. Thus, the outcome is equal to -43.4, which is a proxy for the magnitude of the change.
formula <- unempl_diff ~ share_man + AI + work + AI:work
model_lm <- lm(formula = formula, data= work_pow_post, weights = workers)
summary(model_lm)
##
## Call:
## lm(formula = formula, data = work_pow_post, weights = workers)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -29218 -2443 0 4032 34596
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.830e-01 8.981e+00 -0.043 0.966
## share_man -2.159e+02 3.268e+01 -6.607 1.71e-10 ***
## AI -3.160e-01 2.200e-01 -1.436 0.152
## work 1.078e+00 1.540e-01 6.998 1.61e-11 ***
## AI:work -7.103e-04 2.600e-03 -0.273 0.785
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9002 on 310 degrees of freedom
## Multiple R-squared: 0.1727, Adjusted R-squared: 0.162
## F-statistic: 16.18 on 4 and 310 DF, p-value: 4.813e-12
The results show significant importance of job search in explaining the unemployment difference between years. The job insecurity was positively associated with the increase in unemployment. In contrast, We find no effect for the AI search.
In this part I use GWR, to run local regression and see if there is some spatial heterogeneity.
work.sp <- as(work_pow_post, 'Spatial')
summary(work.sp)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
## min max
## x 14.12289 24.14578
## y 49.00205 54.83579
## Is projected: FALSE
## proj4string : [+proj=longlat +datum=WGS84 +no_defs]
## Data attributes:
## jpt_kod_je.x work workers share_man
## Length:380 Min. : 0.00 Min. : 0 Min. :0.0000
## Class :character 1st Qu.: 42.00 1st Qu.: 14161 1st Qu.:0.1168
## Mode :character Median : 52.00 Median : 27028 Median :0.2011
## Mean : 53.08 Mean : 39918 Mean :0.2029
## 3rd Qu.: 63.00 3rd Qu.: 46062 3rd Qu.:0.2783
## Max. :158.50 Max. :904745 Max. :0.7178
## AI unempl22 unempl12 unempl_diff
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. :-89.50
## 1st Qu.: 0.00 1st Qu.: 64.92 1st Qu.: 73.0 1st Qu.: -7.65
## Median : 0.00 Median :117.30 Median :116.0 Median : 1.35
## Mean : 12.92 Mean :139.69 Mean :122.5 Mean : 17.23
## 3rd Qu.: 23.50 3rd Qu.:201.90 3rd Qu.:173.0 3rd Qu.: 30.73
## Max. :121.00 Max. :526.90 Max. :394.0 Max. :231.50
dMat <- gw.dist(cbind(work_pow$Lat, work_pow$Long)) # coordinates not from sp
bw_gwr <- GWmodel::bw.gwr(formula, data = work.sp, kernel='gaussian', adaptive = TRUE, dMat = dMat)
## Adaptive bandwidth: 242 CV score: 704676.2
## Adaptive bandwidth: 158 CV score: 685359.3
## Adaptive bandwidth: 104 CV score: 664478.7
## Adaptive bandwidth: 73 CV score: 641802.3
## Adaptive bandwidth: 51 CV score: 622865.8
## Adaptive bandwidth: 40 CV score: 608371.9
## Adaptive bandwidth: 30 CV score: 595399.8
## Adaptive bandwidth: 27 CV score: 592799
## Adaptive bandwidth: 22 CV score: 582286.8
## Adaptive bandwidth: 22 CV score: 582286.8
modelGWR <- GWmodel::gwr.basic(formula, data = work.sp, kernel='gaussian', adaptive = TRUE, dMat = dMat, bw = bw_gwr)
modelGWR.df<-as.data.frame(modelGWR$SDF)
work_pow_post$work_coef <- modelGWR.df$work
work_pow_post$ai_coef <- modelGWR.df$AI
work_pow_post$ai_work_coef <- modelGWR.df$`AI:work`
In the second step, we visualize the results
work_coef_plot <- ggplot() +
geom_sf(data = work_pow_post, aes(fill = work_coef)) +
theme_bw() +
theme(text = element_text(color = "grey20", size = 12),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
legend.position = "right",
legend.key.width = unit(1.5, 'cm')) +
scale_fill_viridis() +
labs(fill = "Work Coefficient")
ai_coef_plot <- ggplot() +
geom_sf(data = work_pow_post, aes(fill = ai_coef)) +
theme_classic()+
scale_fill_viridis() +
theme_bw() +
theme(text = element_text(color = "grey20", size = 12),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
legend.position = "right",
legend.key.width = unit(1.5, 'cm')) +
scale_fill_viridis() +
labs(fill = "AI Coefficient")
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
work_ai_coef <- ggplot() +
geom_sf(data = work_pow_post, aes(fill = ai_work_coef)) +
theme_classic()+
scale_fill_viridis() +
theme_bw() +
theme(text = element_text(color = "grey20", size = 12),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
legend.position = "right",
legend.key.width = unit(1.5, 'cm')) +
scale_fill_viridis() +
labs(fill = "Work::AI Coefficient")
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
gridExtra::grid.arrange(gridExtra::arrangeGrob(work_coef_plot, ai_coef_plot), work_ai_coef, ncol = 2 )
We observe that both in terms of work and AI coefficients, Northern-Western Poland is characterized by a higher significance of these factors in the case of increased unemployment. In contrast, in Southern-Eastern Poland, the importance of AI and work search is less associated with a decrease in unemployment.
There are some outlier patterns in the case of work searches; Opole’s neighboring municipalities, Grodzisk Wielkopolski, and Bielsk Podlaski, are characterized by significantly different outcomes. In the former two, the importance of work search was as pronounced as in Southern-Eastern Poland. On the contrary, in Bielsk Podlaski, the association between work search and unemployment change was small and negative.
Since, in general the coefficients are well-beahing in terms of smoothing, I use k-means to cluster the coefficients.
# Select data for clustering
work_cluster <- work_pow_post[c("work_coef", "ai_coef", "ai_work_coef")] %>%
sf::st_drop_geometry()
# we set the maximum number of clusters
k.max <- 20
work_cluster_scaled <- scale(work_cluster)
wss_gwr <- fviz_nbclust(work_cluster_scaled, kmeans, method = "wss")
sill_gwr <-fviz_nbclust(work_cluster_scaled, kmeans, method = "silhouette")
gridExtra::grid.arrange(wss_gwr, sill_gwr, ncol=2)
Based on WSS and Silhouette methods, I would suggest 3 clusters.
clust_gwr <- kmeans(work_cluster_scaled, 3, nstart = 25)
clust_gwr_graph <- fviz_cluster(clust_gwr, work_cluster_scaled, frame = FALSE, geom = "point")
It seems that clusters are fairly different.
work_pow_post$cluster_coeffs <- clust_gwr$cluster
work_ai_cluster <- ggplot() +
geom_sf(data = work_pow_post, aes(fill = as.factor(cluster_coeffs))) +
theme_classic()+
scale_fill_viridis_d() +
theme_bw() +
theme(text = element_text(color = "grey20", size = 12),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
legend.position = "right",
legend.key.width = unit(1.5, 'cm')) +
labs(fill = "Clutser")
work_ai_cluster
work_pow_post %>%
group_by(cluster_coeffs) %>%
summarise(Unempl2022 = weighted.mean(unempl22, w =workers),
Unempl2012 = weighted.mean(unempl12, w =workers),
Work_Search = weighted.mean(work, w =workers),
AI_Search = weighted.mean(AI, w =workers))
## Simple feature collection with 3 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 14.12289 ymin: 49.00205 xmax: 24.14578 ymax: 54.83579
## Geodetic CRS: WGS 84
## # A tibble: 3 × 6
## cluster_coeffs Unempl2022 Unempl2012 Work_Search AI_Search
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 236. 165. 93.2 23.1
## 2 2 168. 145. 69.9 32.7
## 3 3 123. 133. 69.8 27.0
## # ℹ 1 more variable: geometry <MULTIPOLYGON [°]>
The results can be summarized:
Structurally, the order in terms of the Unemployment Structure did not change over the years.
Cluster 3 is characterized by the lowest unemployment rate, which even decreased (in relation to the average) over the years. At the same time, the google search was moderate in terms of work and AI.
In cluster 2, the unemployment position increased over the time. The work search was similar as in cluster 3.
Cluster 1 standouts in terms of unemployment in 2022. On average, in cluster 1, the unemployment rate was more than two times larger than in Poland.
In 2012, the distance between clusters in terms of average unemployment positions was relatively smaller than in 2022.
In cluster 1, individuals fairly often look for jobs online. This may serve as low frequency of using internet for job advertisement in cluster 1.
At the same time, the interest in AI was fairly low in cluster 1.
The results demonstrate, in an exploratory manner, the relationship between job insecurity and changes in unemployment. However, it is still uncertain whether this relationship is causal in nature. Developing spatial econometric methods that allow for quasi-experimental manipulation would be extremely important in assessing the impact of job insecurity on job creation and separation. In particular, labor offices would benefit from insights into the impact of digital infrastructure on the probability of finding employment. Simultaneously, the study should incorporate information on employers’ strategies, especially in terms of the dominant strategy for job advertisements. Currently, further use of methods such as Causal Spatial Forests seems like a good idea.
The datsaset used in the final analysis can be downloaded via clicking on the button.
library(downloadthis)
work_pow_post %>%
sf::st_drop_geometry() %>%
download_this(
output_name = "FinalData_SpatialEconometrics_Szymczak_AI_Work_Unempl",
output_extension = ".csv",
button_label = "Download data",
button_type = "warning",
has_icon = TRUE,
icon = "fa fa-save"
)
Acemoglu, D., & Restrepo, P. (2020). Robots and jobs: Evidence from US labor markets. Journal of political economy, 128(6), 2188-2244.
Acemoglu, D., & Restrepo, P. (2022). Tasks, automation, and the rise in us wage inequality. Econometrica, 90(5), 1973-2016.
Yam, K. C., Tang, P. M., Jackson, J. C., Su, R., & Gray, K. (2023). The rise of robots increases job insecurity and maladaptive workplace behaviors: Multimethod evidence. Journal of Applied Psychology, 108(5), 850.