What factors determine uneven suburbanization?

New evidence from Warsaw, Poland


Introduction

Suburbanization is a shift of population from central urban areas into suburbs, resulting in the formation of (sub)urban sprawl. As a consequence of the movement of households and businesses out of city centers, low-density, peripheral urban areas grow (Caves, 2004). Most of the residents of metropolitan cities work within the central urban area, but choose to live in satellite communities called suburbs. These processes are most advanced in more economically developed countries. The United States is believed to be the first country wherein the majority of the population lives in the suburbs, rather than in the cities or rural areas. Urban sprawl, a direct consequence of suburbanization, describes the unrestricted growth in many urban areas of housing, commercial development, and roads over large expanses of land – with little concern for urban planning. The negative impact of urban sprawl include: an increase in residential energy consumption and land use; degradation of air quality, along with increased usage of natural resources and greenhouse gas emissions (Kahn, 2000); increased infrastructure costs (Downs et al., 2005); decline of social capital; residential segregation resulting in class and racial divisions (Duany et al., 2010); growing fiscal deficit (Downs et al., 2005); and health deterioration (Sturm and Cohen, 2004).

During communism, most countries in the Eastern Bloc were characterized by under-urbanization, which meant that industrial growth occurred well in advance of urban growth and was sustained by rural-urban commuting (Murray and Szelenyi, 2009). City growth, residential mobility, and land and housing development were under tight political control. Consequently, suburbanization in post-communist Europe is not only a recent but also a specific phenomenon. Warsaw is a particular example of such circumstances – 80-90% of the city’s buildings were destroyed during the 1944 uprising, and this paved the wave for uncompounded communist city planning. Suburbanization ”in the Western sense” has been a recent phenomenon - it is believed to have begun in the post-communist countries in the 90s, following the political transformation (Nuissl and Rink, 2005; Timar and Varadi, 2001). In 2019, about 43% of the inhabitants of the Warsaw metropolitan area were living in the suburbs.

The causes of metropolitan suburbanization have been heavily discussed in the literature and several theories have been offered (Mieszkowski and Mills, 1993), mostly based on the situation in Western cities. Some papers offering insight on the suburbanization processes in post-communist cities are: Kok (1999); Lisowski (2004); Murray and Szelenyi (2009); Nuissl and Rink (2005). The quantitative measures of suburbanization determinants existing in the literature include works by Jordan et al. (1998); Kok (1999); and Loibl (2004). Jordan et al. (1998) identified several pulling factors of the target suburban areas in 35 US metropolises. Loibl (2004) identified the attractiveness measures of Vienna’s suburban areas. Kok (1999) used micro-level data to investigate the motives of individuals to move out of the city and to the suburbs in Budapest and Warsaw. In all of the three mentioned works, simple regression models were used (logit in the case of Kok (1999)) and only a few possible pulling factors were included. Surprisingly, no application of the gravity model of migration (the most well-known quantitative migration model (Poot et al., 2016)) in context of suburban migration has been offered in the literature.

These three identified gaps indicate the further research potential of the topic of suburbanization determinants. They are all filled in this article – and this makes our approach innovative for several reasons. First of all, we use the gravity model of migration framework to predict the number of migrants choosing different suburban boroughs of Warsaw. On the other hand, we implement a much wider selection of possible pulling factors (30) than hitherto offered in the literature. When including that many regressors, it is reasonable to assume that the relationships between the dependent variable and some of the regressors might be non-linear. In addition, one can expect interactions between various predictors. The wide selection of pulling factors and the possibility of non-linear relationships, interactions, and collinearity call for the use of robust methods for such issues. Hence, we apply various predictive models that take into account potential non-linearities without any prior assumptions of their shape. These models are capable of dealing efficiently with a large number of potential predictors that are also highly correlated. OLS is used as a simple benchmark. In addition, to explain the estimated machine learning models and unhide the identified shape of relationships, the novel approach of eXplainable Artificial Intelligence (XAI) is used.

Our aim is to a) identify the pulling factors of Warsaw suburban boroughs which most significantly contribute to the choice of one borough over another and b) find a machine learning algorithm that predicts the number of migrants most accurately.


Methods

The gravity model of migration is one of the oldest and most popular analytical models of migration flows. According to this model, spatial flows of people depend positively on the size of target areas and negatively on the distance between them. In that sense, it resembles Newton’s law of universal gravitation, as was first noticed by Ravenstein (1885, 1889). Poot et al. (2016) deliver a thorough description of the model and the commonly applied form is:

\[M_{ij}=G \dfrac{P^{\alpha}_iP^{\beta}_j}{D^{\gamma}_{ij}},\]

where \(M_{ij}\) is the migration number of people who previously lived in area j (i) and move to area i (j). \(P_{i}(P_{j})\) is the population of that area at the beginning of the migration and \(D_{ij}\) is the distance between the two areas. \(G\) is the constant that measures the proportion and \(α\), \(β\), \(γ\) are parameters to be estimated. It is customary to logarithm the above equation, in order to express it in a common, econometric framework.

Several extended forms of the gravity model of migration have been proposed (Beine et al., 2016; Fan, 2005; Lowry, 1966). The choice of additional variables is context-specific. In our setting, there are 30 regressors for 70 observations, and this results in an increased dimensionality. Even though in such a setting the OLS estimator remains unbiased, high variance typically makes it perform very poorly (unless the matrix of observations is orthogonal) resulting in increased Mean Squared Error. Hence, we intend to apply the Elastic Net penalized regression technique, including Lasso and Ridge Regression as its special cases, which can be thought as an extension of OLS with an additional constraint on model parameters (Hastie et al., 2009). As a result of this, constraint model parameters are shrunken towards zero; some are even reduced to zero. That is why these methods (except for Ridge Regression) are known for their capability of performing variable selection. However, this approach still assumes a linear relationship between the number of migrants and the features of the target place.

Several extended forms of the gravity model of migration have been proposed (Beine et al., 2016; Fan, 2005; Lowry, 1966). The choice of additional variables is context-specific. In our setting, there are 30 regressors for 70 observations, and this results in an increased dimensionality. Even though in such a setting the OLS estimator remains unbiased, high variance typically makes it perform very poorly (unless the matrix of observations is orthogonal) resulting in increased Mean Squared Error. Hence, we intend to apply the Elastic Net penalized regression technique, including Lasso and Ridge Regression as its special cases, which can be thought as an extension of OLS with an additional constraint on model parameters (Hastie et al., 2009). As a result of this, constraint model parameters are shrunken towards zero; some are even reduced to zero. That is why these methods (except for Ridge Regression) are known for their capability of performing variable selection. However, this approach still assumes a linear relationship between the number of migrants and the features of the target place.

While the foregoing might be true for the standard independent variables of the gravity model of migration (population size and distance), there is no reason to expect this in terms of the additional regressors, such as measures of average income or amenities. If relationships between the dependent variable and the regressors fail to be linear, a linear specification is inappropriate and may lead to incorrect inference. As the shape of the relationship is not known in advance, we use machine learning tools that can flexibly adjust to data and reveal the true relationships. As no machine learning algorithms other than OLS have to date been applied to predict the number of people migrating from the city to the suburbs, it is not known which method yields the best predictions in this framework. As a consequence, we try a variety of methods representing various estimation approaches. Support Vector Regression is included as another enhanced variant of OLS in the sense that, just as OLS, it finds a hyperplane that best fits to the data. In addition, with the use of a selected kernel function, SVR applies an implicit non-linear mapping of the matrix with explanatory variables into a higher dimensional feature space, where it is more probable to find an appropriate hyperplane (Vapnik, 1995).

The other approaches are based on tree models which allow for non-linearities and take into account interactions. As single trees are not useful in predicting a continuous outcome, we use two distinct approaches that are based on multiple models in different ways – bootstrap averaging (bagging) and boosting. Random forest (Breiman, 2001) is an example of the bagging approach. It consists of estimating multiple independent tree models, each trained on a different bootstrap sample of the original dataset. In addition, at each split of each tree, only a random subset of all predictors is considered. Extreme Gradient Boosting (XGB) is an example of the boosting approach, which is also usually based on trees models. It builds the model in an iterative fashion at each step trying to improve the previous model by giving higher weights to observations that were not fitting well to the previous step. In addition to capturing non-linear relationships, XGB is also capable of performing regularization, for example by shrinkage like in the Elastic Net. Each of these models has various hyper-parameters, the values of which have to be assumed before the optimization starts (e.g., number of trees in the random forest). Their optimal values can be found with the use of cross-validation (Hastie et al., 2009). To eliminate randomness from the model validation process, we use the leave-one-out cross-validation. Since machine learning models can flexibly adjust to the data, no functional transformations of predictors are applied.

The high variety of regressors included can be beneficial in terms of finding the pulling factors which contribute to the outflow to boroughs in the widest extent, but it can also result in collinearity problems. However, Lasso and Elastic Net are believed to be successful in selecting the most important of the correlated variables and in leaving the redundant features out of the model. In our dataset, many variables are highly correlated in groups – for example, we have different measures of distance at disposal. The models based on trees (e.g., Random Forests, XGB) are not robust to correlated features, which may disturb their results. The same problem might occur for OLS, SVR, and Ridge. Hence, a pre-selection of variables out of correlated groups for these models is needed. To address this issue, we performed Principal Component Analysis with varimax rotation and chose one feature with the highest loading from each identified group of highly-correlated variables. We decided to keep 15 rotated components that cover 95% of the variance of the original variables. The dropped variables are mentioned in the Empirical Analysis part of the paper. Even though Lasso and Elastic Net are capable of performing variable selection, we decide to use the restricted dataset for all algorithms, due to reasons of reliable comparison.

We intend to compare the performance of all the algorithms by the common benchmarks of Root Mean Squared Error, Mean Absolute Error, and R2. We report them both with regard to the training sample, as well as the validation sample. Many machine learning models automatically optimize fit on the training sample almost perfectly, but this doesn’t necessarily mean that they perform well in general. If a model performs radically better on the training sample than on the new data (validation sample), the issue of overfitting is very likely present. Selecting the best model based solely on the training sample can therefore lead to incorrect inference. Hence, introducing a validation sample is necessary. We choose the model with the most accurate predictions based on the validation sample and interpret its results. While the outcome of linear algorithms is fairly easy to explain, the interpretability of non-linear models poses a challenge. These structures are usually called ”black box models” as the shape of the relationship between variables cannot be easily derived from them in the functional form that allows for interpretation. Statistical models allow us to fit a specific probability model of a defined form and usually require a set of assumptions, such as normal distribution of the variables. On the other hand, machine learning methods find patterns in rich and ponderous data with a minimal set of assumptions.

Explainable Artificial Intelligence (XAI) is a group of methods that allow us to understand the complex structure of the black-box inner working. Multiple methods on a global (whole sample) and local (a single observation) level have been proposed and a review of them can be found in Molnar (2019). Here, we focus on a brief description of the two methods we use: Permutation Feature Importance (PFI) and Accumulated Local Effects (ALE). In our case, we are interested only in interpretability on the global level, not in the performance of the model with respect to individual boroughs.

Permutation-based variable importance was first introduced by Breiman (2001) in a Random Forest algorithm. Further research was done by Fisher et al. (2019), who proposed a model agnostic tool for calculating the contribution of individual features into prediction accuracy. Variable importance is calculated by randomly permuting a variable and computing an increase in prediction error with the newly created learning sample. The loss function, which quantifies the goodness-of-fit of a model for each variable, is plotted for visual inspection of variable importance. Permutation Feature Importance allows for ranking the used regressors in terms of their contribution to prediction accuracy. This ranking is applied to the results of each of our models.

Furthermore, the most commonly used black-box visualization tool is the Partial Dependence Plot (PDP) introduced by Friedman (2001). It depicts the marginal effect of an input variable and the model outcome (ceteris paribus) and is a graphical representation of predictions. For a given variable, PDP averages model predictions keeping Xi feature values constant. However, this method assumes no correlation between predictors, as averaging incorporates the dependence between two features. As we show in the next chapter, that assumption is unrealistic in our case. Recently Apley and Zhu (2019) proposed an extension of PDP that takes the correlation bias into account, called Accumulated Local Effects (ALE). ALE calculates the average predicted outcome with respect to the predictor value with slight modifications. For a given predictor, ALE calculates average changes in prediction for observations in close neighbourhood to the original one. The graphical representation is then analogous to the PDP. In this way, we plot the relationships of the most important predictors (as indicated by Permutation Feature Importance) on the outcome variable for each of the models using ALE plots.

All calculations and visualizations presented in this report were prepared in R software.


Data Description

Our dataset consists of 70 observations corresponding to the boroughs defined as a part of the Warsaw Metropolitan Area according to the EU NUTS2 norm. Localization of each of those borughs can be seen on the map below. The coordinates used to create the map (and all other maps) come from Open Street Map, which means, in fact, Polish Registry of Borders.

The observations are for one year only and we use the latest data available, which means the years 2018–2020 depending on the variable. We intend to measure differences between boroughs, rather than in time, and hence we assume that no serious changes in the variables’ levels happened between these years.

Our dependent variable is the number of migrants who previously lived in Warsaw and moved to one of the suburban boroughs. The source of the information regarding migrants is registration of permanent residence data provided by the Polish Statistical Office on a borough level. We want to predict the number of migrants using 30 regressors. Their description and sources can be found in the table below.



Population density and distance are the standard explanatory variables of the gravity model of migration (population density can be used instead of population total, in order to make the population measure robust to the spatial size of the borough). Following the literature and accounting for data availability, we have chosen 28 supplementary measures. We included relative income as a measure of relative wealthiness in a borough and the unemployment rate as a measure of the job market condition. The percentage of votes for the ruling conservative party (Prawo i Sprawiedliwość – PiS) and the liberal opponent in 2019’s parliamentary election (Koalicja Obywatelska – KO) should depict the conservatism/liberalness of the community. The three levels of borough-type classification (according to Polish administrative law, they can be classified as rural, urban-rural, or urban) are an alternative measure of urbanization. Area is an alternative (to population) borough size characteristic. The total forest area captures the extent to which forest areas hinder habitable spaces. Total greenery area, the number of kindergartens, nurseries, shops, tourist sites, leisure sites, sport sites, restaurants, and places of worship are measures of institutional amenities. The mean of minimum distance between two houses and the number of dwelling units per km2 capture metropolitan density. The mean of minimum distance from a house to a large road, the presence of a suburban train station, driving distance from borough to the city center (separately in meters and minutes), as well as travel time from the borough to the city center with public transport depict the transport system’s condition. The price of m2 of housing, the number of parcels available for sale and their mean price measure the real estate market.

The map below depicts the geographical distribution of our dependent variable and 3 independent ones that are most relevant according the the available literature - population density, distance & income (in our case it’s relative). Logarithm were used for visualization purposes. All variables seem to follow a similar pattern - highest values (lowest for distance) are measured in the boroughs that are bordering Warsaw and the lowest (highest for distance) on the East side of the map. Therefore, we can conclude that the number of migrants seems to be higher in the boroughs of greater population density, smaller distance to Warsaw, and higher relative income.

Three variables were calcualted on our own based on the data collected from OSM - the mean minimum distance between two houses, the mean minimum distance form a house to road and Metropolition Density (no. of dwellings per km2). For the first two metrics, we calculated the minimum distance from each house to another house and to a closest road, and then averaged the reliability by visually insepcting the maps for exmepleary boroughs and comparing them with Google Maps satelitte pictures. On the map below, we can see the distribution of houses and roads for Pruszków boro’ugh. We can see that the houses are distributed algong the roads and that there are some blank areas which account for the forests and green areas which are inhabited (e.g. ‘Park Kultury i Wypoczynku Mazowsze w Pruszkowie’ in the North-West corner).

Summary statistics for all variables are located in the table below. The mean number of migrants was 147.51 in 2019. There were boroughs where no migrants reported residence, while 909 people moved to the most popular one. The average population density is 5.59 people/km2, but it differs greatly between boroughs (0.25 min, 39.93 max). The average distance measured as a straight line is 29.05 km. The closest borough is only 9.07 km away from Warsaw’s city center, while the farthest one is located almost 60 km away.



Figure below presents the correlation matrix. It can be seen that the variables are correlated up to 95%, which justifies the use of methods robust to correlation, such as Accumulated Local Effects. In order to identify groups of correlated variables, we will use the PCA (Prinical Component Analysis). Very high correlation may result in tree-based algorithm selecting different variables (from higly correlated groups) at different splits, and therefore reduces the power of those algorithms.

PCA with 16 components was able to explain about 95% of total variance. The proportaion explained by each componenet can be seen below. The second figure shows the correlation plot - all values round to 0, which proves that there is no correlation between the componenets

Empirical Analysis

The choice of variables used in all models is based on exploratory data analysis with PCA and we include one or two features from each identified group of correlated regressors (groups can be identified on the heat map below). Hence, the following 8 regressors are dropped: driving time and distance from the borough to Warsaw; the number of restaurants and sport sites; metropolitan density; the percentage of votes obtained by the ruling conservative party; and three borough type classifiers. As a result, 21 regressors are used in all models.



We choose the optimal values of hyper-parameters with the use of leave-one-out cross-validation (LOOCV). One thing to note is that the optimal α for Elastic Net is 1, resulting in Lasso. Hence, 6 algorithms are considered and run: OLS, Ridge, Lasso, RF, SVR, and XGB. We then compare RMSE, MAE, and R2 of the all models and choose to interpret the outcome of the most accurate one. The summary of model accuracy measures is presented in Table 1 for both the training and validation sample. The validation errors correspond to the average from 70 models run by LOOCV, while the training errors are for one, final model with optimal hyper-parameters. In addition, the errors and R2 were calculated for an OLS model that was obtained by the general-to-specific approach. It includes only four variables significant at the 10% level: population density, distance, number of places in nurseries, and the presence of a suburban train station.

In terms of the training sample, Extreme Gradient Boosting yields the lowest RMSE, while Support Vector Regression – the lowest MAE, and Random Forest – the highest R2. The errors and R2 for OLS, Ridge and Lasso are significantly worse. This result confirms that the models which catch non-linear relationships between predictors and the outcome explain substantially more variability of the modelled phenomenon than the linear models. However, due to the potential risk of the overfitting of machine learning models, their performance is usually assessed based on the validation sample. This can be viewed as checking the predictive performance of models on the new data, not used in the estimation process. It is clearly seen that Extreme Gradient Boosting yields the most accurate predictions there as well. Next comes the OLS with only a slightly worse performance, followed by RF and SVR. In terms of the validation sample, penalized regression algorithms exhibit the poorest accuracy of predictions.

In order to identify the pulling factors which, contribute to prediction accuracy in the widest extent, we calculate Permutation based Feature Importance (PFI) and report it as a percentage change in RMSE after permuting each variable, for each of the models. We also calculate the average importance of a feature taking into account all models and then we rank the features with respect to this value, from the most to the least important one. On the plot below we can see 6 most significant variables according the each model and according to the mean of all models

The top 6 average-rated features are the number of places in nurseries, distance to Warsaw as a straight line, relative income, percentage of votes obtained by the liberal party in the 2019 election, the number of worship sites, and population density. The number of places in nurseries is a strong pulling factor since migrants are mostly young couples and families with children moving out of flats to a single-family house. Moreover, suburban migrants usually continue to work in the core city and commute to work by their own vehicle or public transport. Distance measured as a straight line is another feature ranked highly – a pushing factor. As can be seen in Figure A2, the boroughs populated most densely are also the ones located closer to Warsaw’s core. Relative income is ranked the third and population density – the sixth most important measure, which is strongly in line with the existing literature on determinants of suburbanization. The percentage of votes obtained by the liberal party is possibly a proxy for age in boroughs or progressivity of its inhabitants. The number of worship sites, which can be summarized as an institutional amenity, was ranked fifth. In Poland, religious beliefs play a significant role in people’s lives – approximately 90% of Poles are baptized Catholics. Hence, it is not surprising that the availability of a church in close proximity plays a role in settlement processes. The other features ranked high on average are total area, total greenery spaces, the ratio of forests to the total area, the number of shops, and the presence of a suburban train station. For reasons of brevity, we plot Accumulated Local Effects for all models to interpret the (non-linear) relationship between the top 6 ranked features and the number of migrants.

Figure below presents ALE plots for these features. It is plain that assuming a linear relationship between the number of migrants and these features is infeasible. The relationship between the number of migrants and the number of places in nurseries is positively linear for all algorithms, except for XGB and RF (where it appears constant after 200 (XGB) and 380 (RF)). The largest positive slope is for OLS and Lasso, but there is a clear positive tendency for Ridge and SVR as well.

The number of nurseries is an institutional measure. Nurseries are desirable especially for families with children and young couples – the parents of both groups usually work in Warsaw and it is comfortable to leave offspring safely in a nursery for a long workday. A nursery can sometimes be replaced by grandparents who take care of infants while parents are at work, but in the case of Warsaw, there has been an ongoing influx of people from other parts of Poland to the whole agglomeration in the last 30 years. Migrants from other parts of Poland rather do not have parents in Warsaw who could take care of their children. Hence, the typical need for nurseries. Furthermore, the relationship between the number of migrants and distance is clearly negative for all algorithms, which conforms with the existing literature on suburbanization, as well as the gravity model of migration. The number of migrants drops in both RF and XGB around 20 km, after which the value remains almost constant. Income is another important and strong pulling factor. The number of migrants rises from 0 to around 80 for income equal 0.75 (which can be interpreted as the average income in a borough being 75% of the average income in all municipalities, weighted by population) in both RF and XGB. It can be seen that the relationship between the number of migrants and the percentage of votes obtained by the largest liberal opponent in 2019’s parliamentary election is clearly positive for all algorithms. In RF, the number of migrants rises delicately as the percentage of votes equals 40%. Approval for the largest liberal party in Poland is likely a proxy for age and progressiveness in boroughs. Both income and preference for the liberal party depict the role of social affiliations in migration in our study. The number of worship sites is another pulling factor. It can be seen that the number of migrants rises at value 5 for both RF and XGB. Surprisingly, the slope is slightly negative for SVR, but positive for other algorithms where this feature was used. It is worth mentioning that the preference to choose liberal rulers does not necessarily contradict a need for religiousness. Although the vast majority of Poles is formally Catholic, not all are devout – especially not in agglomerations. Services provided by the Catholic church (such as Mass, weddings, funerals) are still desirable as religion is strongly related to tradition in Poland. Finally, population density, with respect to the number of migrants, has a constant slope for Ridge Regression, RF, and XGB, while it is negative for Lasso, OLS, and SVR. This result is the most surprising from all those obtained in this study, as it is not in line with the theory of the gravity model of migration. However, it does conform to the ”natural evolution theory” described here in the Review of the Literature. It turns out that migrants prefer less densely populated municipalities offering more living space.

After visual inspection of the plots and taking the average values of PFI measured as percentage change in RMSE into consideration, we may conclude that migrants choose boroughs with more places available in nurseries (PFI = 59%), especially those with more than 200 places. They settle in municipalities located closer to Warsaw (PFI = 40%), in particular those located under 20 km to the center of Warsaw, and they prefer those of higher relative income (PFI = 26%) - primarily above 0.75. Migrants choose more liberal boroughs (PFi = 14%), but at the same time those with a greater number of houses of worship (PFI = 13%). Finally, they prefer sparsely populated municipalities (PFI = 9%) that provide more living space.

Conclusions

In this paper, we investigated the phenomenon of suburbanization in the Warsaw agglomeration. We aimed to identify the features of boroughs which are key pulling factors for migrants and describe the choice of one borough over another. Basing on the extended gravity model of migration, we built several predictive models and assessed their performance by the common benchmarks of RMSE, MAE, and R2. Extreme Gradient Boosting turned out to yield the most accurate predictions. Permutation-based Feature Importance was calculated for each chosen feature, for each model, and Accumulated Local Effects were plotted for the top 6 most important variables as indicated by the average PFI for all models. We identified 4 pulling factors by mean PFI: the number of places in nurseries, relative income, percentage of votes obtained by KO in the 2019 parliamentary election, and the number of worship sites. While income and the percentage of votes are likely proxies for social affiliation preferences, the two latter features are institutional amenities. Our finding with respect to these four measures is especially valuable in terms of spatial planning and can be used by local authorities. We also determined 2 pushing factors: migrants settle in municipalities located closer to Warsaw, in terms of distance in km, and they prefer sparsely populated places. In order to attract migrants to more distant boroughs, attractive means of transport such as suburban trains can be built.

Our findings are (to some extent) in contrast with the previous, quantitative research on the topic. For example, population density turned out to be a pushing, rather than a pulling factor. Nonetheless, previous studies were based solely on OLS, which we proved to be ineffective when non-linear relationships are observed in the data and thus they have a lower predictive power than Extreme Gradient Boosting. It appears that, even though OLS is quite accurate in predicting the number of suburban migrants, it can still be beaten by a more sophisticated algorithm. In addition, other algorithms let us identify more relevant features than OLS, where only four were significant on the 10% level. Finally, the results that conform with the previous findings are: the negative relationship between the number of migrants and the distance from the city center; and the positive influence of the average income on the number of migrants.

In addition to shedding light on the local context of Warsaw, we have also filled gaps not addressed in previous studies, as we have considered a wider variety of possible pulling factors and have identified non-linear relationships between the dependent variable and the regressors. We believe that our work can be valuable for spatial planners not only in Poland, but in other countries of similar suburbanization patterns. Several possible extensions of this paper are possible. In our work, some of the features were scraped from Internet sources, such as Google Maps, where historical data is not available. Yet, taking a 30-year time frame (in the case of Poland) could possibly lead to deeper insight into the outflux from Warsaw to the suburbs – for example, by taking into consideration time effects. An individual-level analysis could also yield valuable conclusions about the mechanisms behind people’s decisions. Finally, a comparable study of different metropolises in Poland and the region should be conducted to verify if the processes are indeed similar in all post-communist countries.

References

Apley D and Zhu J (2019) Visualizing the effects of predictor variables in black box supervised learning models. Journal of the Royal Statistical Society, Statistical Methodology, Series B 82(4): 1059-1086.

Beine M, Bertoli S, and Fernandez-Huertas Moraga, J (2016) A practitioners’ guide to gravity models of international migration. The World Economy 39(4): 496-512.

Belot M and Ederveen S (2012) Cultural barriers in migration between OECD countries. Journal of Population Economics 25(3): 1077-1105. Breiman L (2001) Random Forests. Machine Learning 45: 5-32.

Brueckner J (1987) The structure of urban equilibria: A unified treatment of the Muth-Mills model. In: Mills ES (ed) Handbook of regional and urban economics. Elsevier, pp. 821-845.

Caves R (2004) Encyclopaedia of the city. Taylor and Francis.

Downs A, McCann B and Mukherji S (2005) Sprawl costs: Economic impacts of unchecked development. Island Press.

Duany A, Plater-Zyberk E and Speck J (2010) Suburban nation: the rise of sprawl and the decline of the American dream. Farrar, Straus and Giroux. Fan CC (2005) Modelling interprovincial migration in china, 1985-2000. Eurasian Geography and Economics 46(3): 165-184.

Fisher A, Rudin C, and Dominici F (2019) All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research 20(177): 1-81.

Friedman J (2001) Greedy function approximation: A gradient boosting machine. The Annals of Statistics 29(5): 1189-1232. Google Maps. Available at: maps.google.com (accessed 03 November 2020).

Hastie T, Tibshirani R, and Friedman J (2009) The elements of statistical learning: Data mining, inference, and prediction. Springer.

Jordan S, Ross J, and Usowski K (1998) U.S. suburbanization in the 1980s. Regional Science and Urban Economics 28(5): 611-627.

Kahn ME (2000) The environmental impact of suburbanization. Journal of Policy Analysis and Management 19(4): 569-586.

Kok H (1999) Migration from the city to the countryside in Hungary and Poland. GeoJournal 49(1): 53-62.

Lisowski A (2004) Social aspects of the suburbanisation stage in the agglomeration of Warsaw. Dela 21: 531-541.

Loibl, W. (2004). Simulation of suburban migration: driving forces, socio-economic characteristics, migration behaviour and resulting land-use patterns. Vienna Yearbook of Population Research, Vienna Institute of Demography (VID) of the Austrian Academy of Sciences in Vienna 2(1): 203-226. Lowry IS (1966) Migration and metropolitan growth: two analytical models. Chandler Pub.

Mieszkowski P and Mills E (1993) The causes of metropolitan suburbanization. Journal of Economic Perspectives 7(3): 135-147.

Molnar C (2019) Interpretable machine learning. A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/

Mulder C (1993) Migration dynamics: A life course approach. PhD Thesis, University of Amsterdam: Thesis Publishers.

Murray P and Szelenyi I (2009) The city in the transition to socialism. International Journal of Urban and Regional Research 8(1): 90 - 107.

Nuissl H and Rink D (2005) The “production” of urban sprawl in eastern Germany as a phenomenon of post-socialist transformation. Cities 22(2): 123-134.

Pietrzak M, Wilk J, and Matusik S (2013) Gravity model as the tool for internal migration analysis in Poland in 2004-2010. Institute of Economic Research Working Papers 28/2013.

Poot J, Alimi O, Cameron M, and Mare D (2016) The gravity model of migration: The successful comeback of an ageing superstar in regional science. Investigaciones Regionales – Journal of Regional Research 36: 63-86.

Ravenstein E (1885) The laws of migration, part 1. Journal of the Statistical Society 48(2): 167-235.

Ravenstein E (1889) The laws of migration, part 2. Journal of the Royal Statistical Society 52(2): 241-305.

Real estate service gratka.pl. Available at: gratka.pl (accessed 03 November 2020).

Sturm R and Cohen D (2004) Suburban sprawl and physical and mental health. Public health 118(7): 488-96.

Tiebout C (1956) A pure theory of local expenditures. Journal of Political Economy 64(5): 416-424. Timar J and Varadi M (2001) The uneven development of suburbanization during transition in Hungary. European Urban and Regional Studies 8(4): 349-360.

Transport service e-podroznik.pl. Available at: e-podroznik.pl (accessed 03 November 2020).

Vapnik VN (1995) The nature of statistical learning theory. Berlin, Heidelberg: Springer-Verlag.