1. Introduction

The potential spread of Artificial Intelligence has influenced various aspects of our lives. Among them, we point out the possible influence of AI on the labour market. In an interesting study, Yam et al. (2023) proposed a framework to study regional patterns in terms of job anxiety related to technology adoption. They correlate the Job Insecurity measure as “the frequency at which people searched for job-recruiting sites. We collected data for searches on the five most popular job-recruiting sites in the United States: LinkedIn, Glassdoor, ZipRecruiter, Indeed, and Monster.” with data on technological development (Yam et al., 2023). The authors have used the number of installed robots per worker for the technological contribution.

Currently, the world is more bothered by the applications of AI rather than Industrial Robots. Thus, the goal of this project is to explore if the change in labour market conditions was related to two concepts:

  • Job insecurity, measured as the frequency of searching “Oferty Pracy” in Google Poland

  • Artificial intelligence interest, measured as the frequency of searching “Sztuczna Inteligencja” in Google Poland

I will use a rich dataset from Google Trends linked to regional data from the Polish Statistics Office to achieve this goal. I will use standard econometric solutions and spatial models with unsupervised learning methods. My primary outcome is the change in the relative position of municipalities in unemployment over 2012 and 2022. I find that whereas job insecurity explains the change in unemployment, AI popularity does not. However, the Geographically Weighted Regression coefficients clustering results identified Southern-Eastern Poland as an area with a negative change in labour market outcomes, accompanied by job insecurity and low interest in Artificial Intelligence. The results are interesting in finding whether there are similarities in technology adoption, technology interest, and active labour market policy effectiveness.

The document will proceed as follows. Firstly, I describe the data collection and data cleaning issues. In chapter three, I introduce Linear and Geographically Weighted Regression. Finally, I compare the clusters in the last part of the project. In conclusion, I point out the possible limitations of the project.

2. Data Cleaning

I begin by reading the data for AI and work search. Google Trends scales its search data from 0 to 100, making individual data points non-interpretable, but variations across geographic regions are meaningful. However, there is limited data on searches; users can access information on searches from 200 cities for free.

To address this limitation, I employ data imputation based on geo-coordinates to obtain a complete dataset. Additionally, I use data on the share of employees in the manufacturing industry as a proxy for a routine task indicator. Indeed, in the literature, this measure is strongly correlated with technology adoption (see Acemolgu & Restrepo, 2020; Acemoglu & Restrepo, 2022).

# Clean the environment 
rm(list = ls())
# Read the data from Google Trends on Work 
setwd("C:/Users/wojci/OneDrive/WSZ/Ekonometria Przestrzenna w R/")

# Read the data drom Google Trends on AI   
ai <- read_csv("PROJEKT/AI_Data.csv")
## Rows: 200 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Point
## dbl (3): Long, Lat, AI
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
work <- read_csv("PROJEKT/Work_Data.csv")
## Rows: 200 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Point
## dbl (3): Long, Lat, Work
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Manufacture_data <- read_csv("PROJEKT/Manufacture_data.csv")
## Rows: 380 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): jpt_kod_je
## dbl (4): workers, share_man, Lat, Long
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
unempl_data <- read_csv("PROJEKT/Unempl_data.csv")
## Rows: 380 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): jpt_kod_je, powiat_name
## dbl (3): unempl22, unempl12, unempl_diff
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Read Municipal Geo Data  
POW<-st_read("C:/Users/wojci/OneDrive/WSZ/Ekonometria Przestrzenna w R/DANE_DO_ZAJEC/powiaty/powiaty.shp", stringsAsFactors = FALSE)
## Reading layer `powiaty' from data source 
##   `C:\Users\wojci\OneDrive\WSZ\Ekonometria Przestrzenna w R\DANE_DO_ZAJEC\powiaty\powiaty.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 380 features and 29 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 171677.6 ymin: 133223.7 xmax: 861895.7 ymax: 774923.7
## Projected CRS: ETRS89 / Poland CS92
POW$jpt_nazwa_<-iconv(POW$jpt_nazwa_, "latin1", "UTF-8")

# Transform the projection in the data 
POW<-st_transform(POW, 4326) 

# checking the properties of object (like summary())
st_geometry(POW)
## Geometry set for 380 features 
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 14.12289 ymin: 49.00205 xmax: 24.14578 ymax: 54.83579
## Geodetic CRS:  WGS 84
## First 5 geometries:
## MULTIPOLYGON (((21.96675 50.32617, 21.96687 50....
## MULTIPOLYGON (((22.25364 49.70981, 22.25368 49....
## MULTIPOLYGON (((22.23235 49.92066, 22.233 49.92...
## MULTIPOLYGON (((21.46418 49.99318, 21.46426 49....
## MULTIPOLYGON (((21.7467 49.97089, 21.74691 49.9...
# Converse the data from SF to SP
POW.sp<-as_Spatial(POW, cast = TRUE, IDs ="jpt_kod_je")

# Plot the grid to see if it works 
plot(POW.sp)

Below, one can see that the scope of the Google Trends Data is limited. In some cases, we obtain information for few cities within a municipality.

# Perfrect! Now we will move to matching the data points to the municipal data 
AI.points <- ai %>% 
  sf::st_as_sf(coords = c("Long","Lat"), crs=4326) # transform to sf object & WGS84 CRS
Work.points <- work %>% 
  sf::st_as_sf(coords = c("Long","Lat"), crs=4326) # transform to sf object & WGS84 CRS


# AI overview
ggplot() +
  geom_sf(data = POW) +
  geom_sf(data = AI.points, aes(color = AI)) 

# Work overview
ggplot() +
  geom_sf(data = POW) +
  geom_sf(data = Work.points, aes(color = Work)) 

We can clearly see that there are some data missing for the municipalities (e.g. see Northern Western Poland). Firstly I will calculate averages for the municipalities with accessible data. In the next step, I will data imputation to get rid of missings.

ai_trans <- st_transform(AI.points, crs=4326)
work_trans <- st_transform(Work.points, crs=4326)
pow_trans <- st_transform(POW, crs=4326)

ai_pow <- sf::st_join(pow_trans, ai_trans)
work_pow <- sf::st_join(pow_trans, work_trans)

ai_pow <- ai_pow %>%  
  group_by(jpt_kod_je) %>% 
  select(jpt_kod_je, jpt_nazwa_, AI) %>% 
  summarise(AI = mean(AI, na.rm = T)) %>% 
  st_drop_geometry()


work_pow <- work_pow %>% 
  group_by(jpt_kod_je) %>% 
  select(jpt_kod_je, jpt_nazwa_, Work) %>% 
  summarise(work = mean(Work,na.rm = T)) %>% 
  st_drop_geometry()
Manufacture_data <- left_join(Manufacture_data, unempl_data, by = "jpt_kod_je")
work_pow <- left_join(work_pow, Manufacture_data, by =  "jpt_kod_je")
work_ai_pow <- left_join(work_pow, ai_pow, by =  "jpt_kod_je")

Post-imputation dataset are presented below.

# Plot the graph for WORK
work_search_imputed <- ggplot() +
  geom_sf(data = work_pow_post, aes(fill = work)) + 
  theme_classic()+
  scale_fill_viridis()


# Plot the graph for AI
ai_search_imputed <- ggplot() +
  geom_sf(data = work_pow_post, aes(fill = AI)) + 
  theme_classic() +
  scale_fill_viridis()

gridExtra::grid.arrange(work_search_imputed, ai_search_imputed, ncol = 2)

3. Geographical weighted regression

3.1 Spatial Correlation

To justify the spatial modeling, I run Moran’s test. The null hypothesis in Moran’s Test assumes random spatial distribution of the variable of choice.

# spatial weights matrix – from sf
crds.work <- st_centroid(st_geometry(work_pow_post)) # centroids
knn.work <- knearneigh(crds.work, k=5)     # 5 nearest points
work.knn.nb.sf <- knn2nb(knn.work)
work.knn.sym.listw <- nb2listw(make.sym.nb(work.knn.nb.sf))

# Start with Moran test for spatial relationship
moran.test(work_pow_post$AI, work.knn.sym.listw)
## 
##  Moran I test under randomisation
## 
## data:  work_pow_post$AI  
## weights: work.knn.sym.listw    
## 
## Moran I statistic standard deviate = 2.9114, p-value = 0.001799
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.0846016255     -0.0026385224      0.0008978883
moran.test(work_pow_post$work, work.knn.sym.listw)
## 
##  Moran I test under randomisation
## 
## data:  work_pow_post$work  
## weights: work.knn.sym.listw    
## 
## Moran I statistic standard deviate = -3.676, p-value = 0.9999
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##     -0.1131673340     -0.0026385224      0.0009040595
moran.test(work_pow_post$unempl_diff, work.knn.sym.listw)
## 
##  Moran I test under randomisation
## 
## data:  work_pow_post$unempl_diff  
## weights: work.knn.sym.listw    
## 
## Moran I statistic standard deviate = 11.119, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.3305120110     -0.0026385224      0.0008976921
moran.test(work_pow_post$share_man, work.knn.sym.listw)
## 
##  Moran I test under randomisation
## 
## data:  work_pow_post$share_man  
## weights: work.knn.sym.listw    
## 
## Moran I statistic standard deviate = -1.8776, p-value = 0.9698
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##     -0.0591017873     -0.0026385224      0.0009043011

Interestingly, we reject the null hypothesis in the case of AI search and Unemployment long difference. However, there is some evidence on random spatial distribution in the case of Work search and share of employees in manufacturing. Nevertheless we will proceed for the further spatial analysis.

3.2 Linear Regression

I begin with a linear regression, assuming no spatial relation between the studied objects. My focus is on investigating whether the difference in unemployment between 2012 and 2022 is reflected in Google searches related to work (Pol. Oferty Pracy, Praca), AI (Pol. Sztuczna Inteligencja), and the interaction between these searches.

The inclusion of the interaction term is justified by the possibility that municipalities where only artificial intelligence is frequently searched may experience different labor market changes compared to those where both artificial intelligence and work are often searched.

The dependent variable in the regression is the change in the relative position in terms of the unemployment rate. In other words, a positive value of the dependent variable indicates a decline in the municipality’s unemployment position between 2012 and 2022. For instance, in 2012 in Boleslawiec, the unemployment rate was equal to 103% of the national average, while it decreased to 59% of the national average in 2022. Thus, the outcome is equal to -43.4, which is a proxy for the magnitude of the change.

formula <- unempl_diff ~  share_man + AI + work + AI:work
model_lm <- lm(formula = formula, data= work_pow_post, weights = workers)
summary(model_lm)
## 
## Call:
## lm(formula = formula, data = work_pow_post, weights = workers)
## 
## Weighted Residuals:
##    Min     1Q Median     3Q    Max 
## -29218  -2443      0   4032  34596 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.830e-01  8.981e+00  -0.043    0.966    
## share_man   -2.159e+02  3.268e+01  -6.607 1.71e-10 ***
## AI          -3.160e-01  2.200e-01  -1.436    0.152    
## work         1.078e+00  1.540e-01   6.998 1.61e-11 ***
## AI:work     -7.103e-04  2.600e-03  -0.273    0.785    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9002 on 310 degrees of freedom
## Multiple R-squared:  0.1727, Adjusted R-squared:  0.162 
## F-statistic: 16.18 on 4 and 310 DF,  p-value: 4.813e-12

The results show significant importance of job search in explaining the unemployment difference between years. The job insecurity was positively associated with the increase in unemployment. In contrast, We find no effect for the AI search.

3.3 Geographically Weighted Regression

In this part I use GWR, to run local regression and see if there is some spatial heterogeneity.

work.sp <- as(work_pow_post, 'Spatial')
summary(work.sp) 
## Object of class SpatialPolygonsDataFrame
## Coordinates:
##        min      max
## x 14.12289 24.14578
## y 49.00205 54.83579
## Is projected: FALSE 
## proj4string : [+proj=longlat +datum=WGS84 +no_defs]
## Data attributes:
##  jpt_kod_je.x            work           workers         share_man     
##  Length:380         Min.   :  0.00   Min.   :     0   Min.   :0.0000  
##  Class :character   1st Qu.: 42.00   1st Qu.: 14161   1st Qu.:0.1168  
##  Mode  :character   Median : 52.00   Median : 27028   Median :0.2011  
##                     Mean   : 53.08   Mean   : 39918   Mean   :0.2029  
##                     3rd Qu.: 63.00   3rd Qu.: 46062   3rd Qu.:0.2783  
##                     Max.   :158.50   Max.   :904745   Max.   :0.7178  
##        AI            unempl22         unempl12      unempl_diff    
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.0   Min.   :-89.50  
##  1st Qu.:  0.00   1st Qu.: 64.92   1st Qu.: 73.0   1st Qu.: -7.65  
##  Median :  0.00   Median :117.30   Median :116.0   Median :  1.35  
##  Mean   : 12.92   Mean   :139.69   Mean   :122.5   Mean   : 17.23  
##  3rd Qu.: 23.50   3rd Qu.:201.90   3rd Qu.:173.0   3rd Qu.: 30.73  
##  Max.   :121.00   Max.   :526.90   Max.   :394.0   Max.   :231.50
dMat <- gw.dist(cbind(work_pow$Lat, work_pow$Long)) # coordinates not from sp
bw_gwr <- GWmodel::bw.gwr(formula, data = work.sp, kernel='gaussian', adaptive = TRUE, dMat = dMat)
## Adaptive bandwidth: 242 CV score: 704676.2 
## Adaptive bandwidth: 158 CV score: 685359.3 
## Adaptive bandwidth: 104 CV score: 664478.7 
## Adaptive bandwidth: 73 CV score: 641802.3 
## Adaptive bandwidth: 51 CV score: 622865.8 
## Adaptive bandwidth: 40 CV score: 608371.9 
## Adaptive bandwidth: 30 CV score: 595399.8 
## Adaptive bandwidth: 27 CV score: 592799 
## Adaptive bandwidth: 22 CV score: 582286.8 
## Adaptive bandwidth: 22 CV score: 582286.8
modelGWR <- GWmodel::gwr.basic(formula, data = work.sp, kernel='gaussian', adaptive = TRUE, dMat = dMat, bw = bw_gwr)  

modelGWR.df<-as.data.frame(modelGWR$SDF)
work_pow_post$work_coef <- modelGWR.df$work
work_pow_post$ai_coef <- modelGWR.df$AI
work_pow_post$ai_work_coef <- modelGWR.df$`AI:work`

In the second step, we visualize the results

work_coef_plot <- ggplot() +
  geom_sf(data = work_pow_post, aes(fill = work_coef)) +
  theme_bw() +
  theme(text = element_text(color = "grey20", size = 12),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        legend.position = "right",
        legend.key.width = unit(1.5, 'cm')) +
  scale_fill_viridis() +
   labs(fill = "Work Coefficient")


ai_coef_plot <- ggplot() +
  geom_sf(data = work_pow_post, aes(fill = ai_coef)) + 
  theme_classic()+
  scale_fill_viridis() +
  theme_bw() +
  theme(text = element_text(color = "grey20", size = 12),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        legend.position = "right",
        legend.key.width = unit(1.5, 'cm')) +
  scale_fill_viridis() +
   labs(fill = "AI Coefficient")
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
work_ai_coef <- ggplot() +
  geom_sf(data = work_pow_post, aes(fill = ai_work_coef)) + 
  theme_classic()+
  scale_fill_viridis() +
  theme_bw() +
  theme(text = element_text(color = "grey20", size = 12),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        legend.position = "right",
        legend.key.width = unit(1.5, 'cm')) +
  scale_fill_viridis() +
   labs(fill = "Work::AI Coefficient")
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
gridExtra::grid.arrange(gridExtra::arrangeGrob(work_coef_plot, ai_coef_plot), work_ai_coef, ncol = 2 )

We observe that both in terms of work and AI coefficients, Northern-Western Poland is characterized by a higher significance of these factors in the case of increased unemployment. In contrast, in Southern-Eastern Poland, the importance of AI and work search is less associated with a decrease in unemployment.

There are some outlier patterns in the case of work searches; Opole’s neighboring municipalities, Grodzisk Wielkopolski, and Bielsk Podlaski, are characterized by significantly different outcomes. In the former two, the importance of work search was as pronounced as in Southern-Eastern Poland. On the contrary, in Bielsk Podlaski, the association between work search and unemployment change was small and negative.

3.4 GWR Clustering

Since, in general the coefficients are well-beahing in terms of smoothing, I use k-means to cluster the coefficients.

# Select data for clustering
work_cluster <- work_pow_post[c("work_coef", "ai_coef", "ai_work_coef")] %>% 
  sf::st_drop_geometry()


# we set the maximum number of clusters
k.max <- 20
work_cluster_scaled <- scale(work_cluster)


wss_gwr <- fviz_nbclust(work_cluster_scaled, kmeans, method = "wss")
sill_gwr <-fviz_nbclust(work_cluster_scaled, kmeans, method = "silhouette")

gridExtra::grid.arrange(wss_gwr, sill_gwr, ncol=2)

Based on WSS and Silhouette methods, I would suggest 3 clusters.

clust_gwr <- kmeans(work_cluster_scaled, 3, nstart = 25)
clust_gwr_graph <- fviz_cluster(clust_gwr, work_cluster_scaled, frame = FALSE, geom = "point")

It seems that clusters are fairly different.

work_pow_post$cluster_coeffs <- clust_gwr$cluster

work_ai_cluster <- ggplot() +
  geom_sf(data = work_pow_post, aes(fill = as.factor(cluster_coeffs))) + 
  theme_classic()+
  scale_fill_viridis_d() +
  theme_bw() +
  theme(text = element_text(color = "grey20", size = 12),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        legend.position = "right",
        legend.key.width = unit(1.5, 'cm')) +
   labs(fill = "Clutser")

work_ai_cluster

work_pow_post %>% 
  group_by(cluster_coeffs) %>% 
  summarise(Unempl2022 = weighted.mean(unempl22, w =workers), 
            Unempl2012 = weighted.mean(unempl12, w =workers),
            Work_Search = weighted.mean(work, w =workers),
            AI_Search = weighted.mean(AI, w =workers))
## Simple feature collection with 3 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 14.12289 ymin: 49.00205 xmax: 24.14578 ymax: 54.83579
## Geodetic CRS:  WGS 84
## # A tibble: 3 × 6
##   cluster_coeffs Unempl2022 Unempl2012 Work_Search AI_Search
##            <int>      <dbl>      <dbl>       <dbl>     <dbl>
## 1              1       236.       165.        93.2      23.1
## 2              2       168.       145.        69.9      32.7
## 3              3       123.       133.        69.8      27.0
## # ℹ 1 more variable: geometry <MULTIPOLYGON [°]>

The results can be summarized:

  • Structurally, the order in terms of the Unemployment Structure did not change over the years.

  • Cluster 3 is characterized by the lowest unemployment rate, which even decreased (in relation to the average) over the years. At the same time, the google search was moderate in terms of work and AI.

  • In cluster 2, the unemployment position increased over the time. The work search was similar as in cluster 3.

  • Cluster 1 standouts in terms of unemployment in 2022. On average, in cluster 1, the unemployment rate was more than two times larger than in Poland.

  • In 2012, the distance between clusters in terms of average unemployment positions was relatively smaller than in 2022.

  • In cluster 1, individuals fairly often look for jobs online. This may serve as low frequency of using internet for job advertisement in cluster 1.

  • At the same time, the interest in AI was fairly low in cluster 1.

4. Conclusions and Further Research

The results demonstrate, in an exploratory manner, the relationship between job insecurity and changes in unemployment. However, it is still uncertain whether this relationship is causal in nature. Developing spatial econometric methods that allow for quasi-experimental manipulation would be extremely important in assessing the impact of job insecurity on job creation and separation. In particular, labor offices would benefit from insights into the impact of digital infrastructure on the probability of finding employment. Simultaneously, the study should incorporate information on employers’ strategies, especially in terms of the dominant strategy for job advertisements. Currently, further use of methods such as Causal Spatial Forests seems like a good idea.

Appendix: Download the data

The datsaset used in the final analysis can be downloaded via clicking on the button.

library(downloadthis)

work_pow_post %>% 
  sf::st_drop_geometry() %>% 
  download_this(
    output_name = "FinalData_SpatialEconometrics_Szymczak_AI_Work_Unempl",
    output_extension = ".csv",
    button_label = "Download data",
    button_type = "warning",
    has_icon = TRUE,
    icon = "fa fa-save"
  )

References

Acemoglu, D., & Restrepo, P. (2020). Robots and jobs: Evidence from US labor markets. Journal of political economy, 128(6), 2188-2244.

Acemoglu, D., & Restrepo, P. (2022). Tasks, automation, and the rise in us wage inequality. Econometrica, 90(5), 1973-2016.

Yam, K. C., Tang, P. M., Jackson, J. C., Su, R., & Gray, K. (2023). The rise of robots increases job insecurity and maladaptive workplace behaviors: Multimethod evidence. Journal of Applied Psychology, 108(5), 850.