1. Introduction

The fitness industry is a growing and evolving market across Poland, especially in a city like Kraków. There, the trend is particularly important because of the large and increasing number of students living in the city, which exceeds 200,000 every year. As a result, gyms are not only a place for training but also a social environment where young people can meet and exercise at the same time. In Poland, this is even more noticeable in winter, when the cold weather limits outdoor activities.

When we talk about the fitness industry, we should focus on the pace at which it is evolving: the shift from small, traditional gyms to large commercial fitness centers is becoming increasingly evident over time.

This leads to our research question: are gyms in Kraków randomly distributed across the city, or do they form clusters? To answer it, we study the industry as a whole while paying particular attention to large commercial gyms, to see whether their placement strategy differs from the one followed by small traditional gyms. Testing whether the location of these businesses depends on the distance to the city center and to the main university campuses is also crucial for our study.

2. Literature Review

  1. Decline in mortality from coronary heart disease in Poland (…) - Piotr Bandosz et al.

The study shows that increasing lifestyle awareness and leisure-time physical activity reduce the risk of coronary heart disease, whose death rate halved between 1991 and 2005. These factors do not fully explain the reduction in deaths; they account for about 54% of it. Even so, the growth of health awareness in Poland may help explain the increase in the number of fitness centres.

  2. The Fitness Market in Poland - Edyta Bartejczuk-Wolak

Bartejczuk-Wolak shows that the fitness market is expanding in Poland and does not seem to be slowing down. There are over 3,300 fitness centres and an estimated 70,000 instructors and trainers. As a result, it is becoming easier for people in Poland to incorporate physical activity into their everyday life.

New trends are attracting more people to these facilities, such as adding yoga or Pilates to the range of activities on offer. Offering services and guidance on top of the existing equipment also makes gyms more attractive to potential customers in Poland. However, some challenges remain, such as the lack of proper training among individual trainers following recent deregulation by the state.

  3. Changes in Anthropometric Measurements and Physical Fitness of Polish Students in 20-Year Period - Jarosław Fugiel et al.

Fugiel et al. analyse whether newer generations maintain the physical fitness levels of previous ones. Participants report the same amount of physical activity as older generations. Despite this, they show lower flexibility and strength levels than students 20 years ago. The authors conclude that recent technological development has led to a more sedentary lifestyle, reducing routine physical stimulation such as walking. This effect intensified with the COVID-19 pandemic, especially among young people such as university students.

These findings point to a need for fitness centres near universities. We may therefore observe some clustering in the areas where students live and move.

3. Data: Description and Graphs

The data are extracted from OpenStreetMap (OSM).

3.1. Loading packages

library(sf)
library(spatstat)
library(osmdata)
library(ggplot2)

3.2. Define Kraków Boundary

krakow_bb <- getbb("Krakow, Poland", format_out = "sf_polygon")
if(inherits(krakow_bb, "list")) krakow_bb <- krakow_bb[[1]] 

krakow_3857 <- st_transform(krakow_bb, 3857)
win <- as.owin(krakow_3857)

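The administrative boundary of Kraków is projected to EPSG:3857 (Web Mercator), a projected coordinate system whose units are meters. However, Web Mercator does not preserve distances or areas. At Kraków's latitude (about 50° N), lengths are stretched by roughly 1/cos(50°) ≈ 1.56 and areas by about 2.4. This is why the window area reported in the summaries below (about 792 km²) is much larger than Kraków's official area (about 327 km²). Distances and densities in this report are therefore expressed in projected units. A local metric projection such as EPSG:2180 (Poland CS92) would give true values.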
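The size of this distortion can be checked with a few lines of base R. This is a minimal sketch that assumes the spherical Web Mercator scale factor k = 1/cos(φ) and takes φ ≈ 50.06° as a representative latitude for the city. The 791.897 km² figure is the window area reported by spatstat later in this report.

```r
# Web Mercator scale factor at a given latitude (spherical approximation):
# projected lengths are multiplied by k = 1 / cos(latitude)
mercator_scale <- function(lat_deg) 1 / cos(lat_deg * pi / 180)

lat_krakow  <- 50.06                 # approximate latitude of Krakow's center
k           <- mercator_scale(lat_krakow)  # length stretch factor
area_factor <- k^2                   # areas are stretched by k squared

window_area_3857 <- 791.897          # km^2, window area in EPSG:3857 units
true_area <- window_area_3857 / area_factor

cat(sprintf("Length factor: %.3f\n", k))
cat(sprintf("Area factor:   %.3f\n", area_factor))
cat(sprintf("Approximate true window area: %.1f km^2\n", true_area))
```

With these inputs, the corrected area comes out close to Kraków's official area of about 327 km². This supports the explanation above.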

3.3. Extract Gym Locations

q_raw <- opq(bbox = st_bbox(krakow_bb)) %>%
  add_osm_feature(key = 'leisure', value = 'fitness_centre')

gyms_raw_sf <- osmdata_sf(q_raw)$osm_points
gyms_raw_3857 <- st_transform(gyms_raw_sf, crs = 3857)

coords_raw <- st_coordinates(gyms_raw_3857)
gyms_raw_ppp <- ppp(x = coords_raw[,1], y = coords_raw[,2], window = win)

plot(win, main="Initial OSM Data (209 points)", col="white", border="darkgrey")
plot(gyms_raw_ppp, add=TRUE, pch=16, cols="red", cex=0.5)

Querying OSM for features tagged leisure = “fitness_centre” returns 209 points. However, these data are noisy. They may include private training studios, non-commercial sports clubs and multiple points that belong to a single building (for example, the vertices of a building outline). Data filtering and brand labeling are therefore necessary to isolate the fitness market and to make the micro-level marked point pattern analysis precise.

3.4. Labeling data

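Gyms with missing names (NA) were removed, on the assumption that named features correspond to real businesses. This step also drops the unnamed vertices of building outlines. Because only osm_points are used, gyms mapped in OSM solely as building polygons are not included. Their name is stored on the polygon rather than on its vertices.

Platinium gyms were absorbed by Well Fitness, so they are included in the “Well Fitness” category.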

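# Keep only named features (drops unnamed building-outline vertices)
gyms_filtered_sf <- gyms_raw_sf[!is.na(gyms_raw_sf$name), ]

# Distinguish franchises; later assignments take precedence over earlier ones.
# Matching is case-insensitive, but spelling variants
# (e.g. "City Fit" with a space) are not caught by the pattern "CityFit".
gyms_filtered_sf$Branch <- "Independent"
gyms_filtered_sf$Branch[grepl("Well Fitness|Platinium", gyms_filtered_sf$name, ignore.case = TRUE)] <- "Well Fitness"
gyms_filtered_sf$Branch[grepl("My Fitness Place", gyms_filtered_sf$name, ignore.case = TRUE)] <- "My Fitness Place"
gyms_filtered_sf$Branch[grepl("CityFit", gyms_filtered_sf$name, ignore.case = TRUE)] <- "CityFit"

# Project to EPSG:3857 and build a marked ppp; points outside the
# Krakow window are rejected by ppp() and stored in attr(, "rejects")
gyms_clean_3857 <- st_transform(gyms_filtered_sf, 3857)
coords_clean <- st_coordinates(gyms_clean_3857)
gyms_ppp <- ppp(x = coords_clean[,1], 
                y = coords_clean[,2], 
                window = win, 
                marks = as.factor(gyms_filtered_sf$Branch))

# Remove duplicated points (same location and same mark)
gyms_ppp <- unique(gyms_ppp)

plot(gyms_ppp,
     main = paste0("Final Labeled Data (n = ", gyms_ppp$n, ")"),
     cols = c("blue", "green", "purple", "orange"),
     pch = 16, cex = 0.7)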

3.5. Comparative maps

par(mfrow = c(1, 2))

# Raw Data (n=209)
plot(unmark(gyms_raw_ppp), main="1. Raw Data (n=209)", 
     cols="firebrick", pch=16, cex=0.5)

# Cleaned Data (n=94)
plot(unmark(gyms_ppp), main=paste0("2. Cleaned Data (n=", gyms_ppp$n, ")"), 
     cols="firebrick", pch=16, cex=0.5)

par(mfrow = c(1, 1))

The comparative maps show how the data shrink from 209 initial observations to 94 after filtering. The filtering removes unnamed features, points falling outside the Kraków boundary, and duplicates.

4. Quantitative Analysis

We analyze how gyms are distributed across Kraków using unmarked and marked point pattern analysis.

4.1. Unmarked Point Pattern Analysis

First, we study the fitness industry as a whole, without distinguishing between franchises and independent gyms.

4.1.1. First-Order Analysis: Spatial Intensity

summary(gyms_ppp)
## Marked planar point pattern:  94 points
## Average intensity 1.187023e-07 points per square unit
## 
## Coordinates are given to 9 decimal places
## 
## Multitype:
##                  frequency proportion    intensity
## CityFit                  1 0.01063830 1.262790e-09
## Independent             69 0.73404260 8.713254e-08
## My Fitness Place        15 0.15957450 1.894186e-08
## Well Fitness             9 0.09574468 1.136511e-08
## 
## Window: polygonal boundary
## single connected closed polygon with 5695 vertices
## enclosing rectangle: [2203261.6, 2250584.6] x [6440678, 6468149] units
##                      (47320 x 27470 units)
## Window area = 791897000 square units
## Fraction of frame area: 0.609
## 
## *** 7 illegal points stored in attr(,"rejects") ***
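# Overall (unmarked) intensity; intensity() on a multitype pattern
# would return one value per type instead of a single number
lambda_raw <- intensity(unmark(gyms_ppp))

# Convert from points per m^2 to points per km^2 (EPSG:3857 units)
lambda_km2 <- lambda_raw * 1000000
cat("Global Gym Density in Kraków:", round(lambda_km2, 2), "gyms per km2\n")
## Global Gym Density in Kraków: 0.12 gyms per km2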
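As a consistency check on these figures, note that the overall intensity of a multitype pattern equals the sum of the per-type intensities. Converting from points per m² to points per km² only requires multiplying by 10⁶. This is a minimal base-R sketch that uses the per-type intensities copied from summary(gyms_ppp) above, which are still in Web Mercator units.

```r
# Per-type intensities (points per square metre, EPSG:3857 units),
# copied from summary(gyms_ppp)
type_intensity_m2 <- c(
  "CityFit"          = 1.262790e-09,
  "Independent"      = 8.713254e-08,
  "My Fitness Place" = 1.894186e-08,
  "Well Fitness"     = 1.136511e-08
)

# Overall intensity of the unmarked pattern = sum over types
total_m2 <- sum(type_intensity_m2)

# 1 km^2 = 1e6 m^2
total_km2 <- total_m2 * 1e6
type_km2  <- type_intensity_m2 * 1e6

# Equivalent direct estimate: number of points / window area (km^2)
direct_km2 <- 94 / 791.897

print(round(type_km2, 4))
cat("Overall density:", round(total_km2, 3), "gyms per km2\n")
cat("Direct estimate:", round(direct_km2, 3), "gyms per km2\n")
```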
gyms_ppp_km <- rescale(gyms_ppp, 1000, "km")

gym_density_map <- density(gyms_ppp_km, edge=TRUE)
summary(gym_density_map)
## real-valued pixel image
## 128 x 128 pixel array (ny, nx)
## enclosing rectangle: [2203.262, 2250.585] x [6440.678, 6468.149] km
## dimensions of each pixel: 0.37 x 0.2146135 km
## Image is defined on a subset of the rectangular grid
## Subset area = 791.149232620458 square km
## Subset area fraction = 0.609
## Pixel values (inside window):
##  range = [7.097487e-09, 0.5184971]
##  integral = 99.01801
##  mean = 0.1251572
plot(gym_density_map, main = "Gym Density Heatmap in Kraków")

The density map shows a strong concentration of gyms in the center of the city. The market appears centralized: as we move away from the historic center, the intensity of the hotspot drops.
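To build intuition for what the heatmap represents, consider that a kernel estimate of intensity places a smooth bump (here a Gaussian with bandwidth σ) on every gym and adds the bumps up. The sketch below is a simplified base-R version on hypothetical toy coordinates. Unlike spatstat's density(), it applies no edge correction and uses a fixed, hand-picked σ, so it is for illustration only.

```r
# Isotropic Gaussian kernel intensity at location (u, v), no edge correction:
# lambda_hat(u) = sum_i exp(-|u - x_i|^2 / (2 sigma^2)) / (2 pi sigma^2)
kernel_intensity <- function(u, v, x, y, sigma) {
  d2 <- (u - x)^2 + (v - y)^2
  sum(exp(-d2 / (2 * sigma^2)) / (2 * pi * sigma^2))
}

# Hypothetical toy pattern (km): a dense "center" and a few scattered points
x <- c(0.0, 0.2, -0.1, 0.1, -0.2, 3.0, -2.5, 4.0)
y <- c(0.0, 0.1, 0.2, -0.2, -0.1, 2.0, 3.0, -3.0)
sigma <- 0.5

center_val    <- kernel_intensity(0, 0, x, y, sigma)
periphery_val <- kernel_intensity(3, -3, x, y, sigma)

cat("Estimated intensity near the center:", round(center_val, 3), "per km2\n")
cat("Estimated intensity in the periphery:", round(periphery_val, 3), "per km2\n")
```

The bandwidth σ controls the trade-off between detail and smoothness. spatstat chooses a default for it, which is why the real map should be read as a smoothed summary rather than exact counts.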

4.1.2. First-Order Analysis: Quadrat Test

qdr_test_clustered <- quadrat.test(gyms_ppp, nx = 5, ny = 5, alternative = "clustered")
print(qdr_test_clustered)
## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  gyms_ppp
## X2 = 178.8, df = 22, p-value < 2.2e-16
## alternative hypothesis: clustered
## 
## Quadrats: 23 tiles (irregular windows)
qdr_test_regular <- quadrat.test(gyms_ppp, nx = 5, ny = 5, alternative = "regular")
print(qdr_test_regular)
## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  gyms_ppp
## X2 = 178.8, df = 22, p-value = 1
## alternative hypothesis: regular
## 
## Quadrats: 23 tiles (irregular windows)
plot(unmark(gyms_ppp), main = "Quadrat Test Grid (5x5 Counts)", pch = 16, cols = "darkgrey", cex = 0.5)
plot(qdr_test_clustered, col = "red", add = TRUE, cex = 0.6)

For each tile, the quadrat plot shows the observed number of gyms (top left), the expected count under complete spatial randomness (top right) and the Pearson residual (bottom). It confirms that the central tiles contain by far the highest counts.

The chi-squared test based on quadrat counts (X² = 178.8, df = 22, p < 2.2e-16) provides strong evidence against the null hypothesis of complete spatial randomness (CSR) in favour of clustering. Consistently, the same statistic tested against the regular alternative gives p = 1.
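The statistic can be reproduced by hand. When tiles have unequal areas, as at the edges of Kraków's window, the expected count in each tile under CSR is n × (tile area / total area). X² is then the sum of the squared Pearson residuals. This is a minimal base-R sketch on hypothetical counts and areas, not the actual Kraków tiles.

```r
# Hypothetical quadrat counts and tile areas (km^2)
counts <- c(2, 1, 30, 5, 0, 3)
areas  <- c(20, 15, 25, 25, 10, 30)

n <- sum(counts)
expected <- n * areas / sum(areas)          # expected counts under CSR

pearson_resid <- (counts - expected) / sqrt(expected)
X2 <- sum(pearson_resid^2)
df <- length(counts) - 1

# Upper tail: evidence for clustering (counts more variable than expected)
p_clustered <- pchisq(X2, df, lower.tail = FALSE)
# Lower tail: evidence for regularity (counts less variable than expected)
p_regular <- pchisq(X2, df, lower.tail = TRUE)

cat("X2 =", round(X2, 2), " df =", df, "\n")
cat("p (clustered) =", signif(p_clustered, 3),
    " p (regular) =", signif(p_regular, 3), "\n")
```

The two one-sided p-values add up to one. This is why a p-value near 0 for the clustered alternative goes together with p = 1 for the regular alternative in the output above.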

4.1.3. Second-Order Analysis: Global Spatial Interaction

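Second-order functions describe how gyms are positioned relative to each other at different distances r. The nearest-neighbour distance function G(r) gives the probability that the nearest other gym lies within distance r of a typical gym. Ripley’s K-function, multiplied by the intensity, gives the expected number of other gyms within distance r of a typical gym. Under complete spatial randomness (CSR), K(r) = πr².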

gym_G <- Gest(gyms_ppp)
plot(gym_G, main = "Nearest Neighbor Distance G Function")

gym_K <- Kest(gyms_ppp)
plot(gym_K, main = "Ripley's K-Function for Kraków Gyms")

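The observed K-function and G-function both indicate that gyms in Kraków follow a strong clustering pattern. In both plots, the estimated curves (our real data) lie above the blue dashed line, which represents the theoretical value under complete spatial randomness. This indicates that fitness centers are spatially concentrated and located near each other. Note, however, that Gest() and Kest() assume a homogeneous intensity. Because the density map shows a strong center-periphery gradient, part of this apparent clustering may reflect inhomogeneity rather than interaction between gyms. An inhomogeneous version such as Kinhom() would help separate the two.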
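To make the comparison with the CSR benchmark concrete, the sketch below computes a naive estimate of Ripley's K on a unit square:

K̂(r) = A / (n(n−1)) × #{ordered pairs i ≠ j with d_ij ≤ r}

It also computes the transformed L(r) = sqrt(K(r)/π), which equals r under CSR. The sketch omits the edge corrections that Kest() applies and uses hypothetical, deterministically generated points. It is an illustration, not a replacement for spatstat.

```r
# Naive Ripley's K (no edge correction) for points in a window of area A
naive_K <- function(x, y, r, A) {
  n <- length(x)
  d <- as.matrix(dist(cbind(x, y)))   # pairwise distances
  diag(d) <- Inf                      # exclude self-pairs (i == j)
  sapply(r, function(rr) A * sum(d <= rr) / (n * (n - 1)))
}

# Hypothetical clustered pattern: 5 tight clusters of 10 points each
centers <- cbind(c(0.2, 0.8, 0.5, 0.2, 0.8), c(0.2, 0.2, 0.5, 0.8, 0.8))
offsets <- 0.01 * cbind(cos(2 * pi * (1:10) / 10), sin(2 * pi * (1:10) / 10))
pts <- do.call(rbind, lapply(1:5, function(i) sweep(offsets, 2, centers[i, ], "+")))

# Hypothetical regular pattern: 7 x 7 grid
g <- seq(0.1, 0.9, length.out = 7)
grid_pts <- as.matrix(expand.grid(g, g))

r <- c(0.02, 0.05, 0.10)
K_clust <- naive_K(pts[, 1], pts[, 2], r, A = 1)
K_grid  <- naive_K(grid_pts[, 1], grid_pts[, 2], r, A = 1)

# L(r) - r: positive suggests clustering, negative suggests regularity
L_minus_r <- function(K, r) sqrt(K / pi) - r
print(round(rbind(r = r, CSR = pi * r^2, clustered = K_clust, grid = K_grid), 4))
print(round(rbind(clustered = L_minus_r(K_clust, r), grid = L_minus_r(K_grid, r)), 4))
```

The clustered pattern lies far above πr² at short distances, while the grid lies below it. This mirrors the reading of the plots above.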

4.2. Marked Point Pattern Analysis

We now incorporate the brand categories as marks to analyze whether location behavior depends on the brand.

4.2.1. First-Order Analysis of Marked Patterns

plot(split(gyms_ppp)) 

par(mfrow = c(2,2))
plot(density(split(gyms_ppp))) 

par(mfrow = c(1,1))


lambda <- intensity(gyms_ppp)
probs <- lambda/sum(lambda)
print(probs)
##          CityFit      Independent My Fitness Place     Well Fitness 
##       0.01063830       0.73404255       0.15957447       0.09574468
probs_rr <- relrisk(gyms_ppp) 
par(mfrow = c(2, 2))
plot(probs_rr, main = "Relative Risk Probability")

par(mfrow = c(1, 1))

The split maps and density heatmaps suggest that gyms follow different location strategies. Independent gyms are widely distributed and closer to where people live. The large corporate brands (CityFit, My Fitness Place and Well Fitness) are more targeted and centralized.

The relative risk maps estimate, at each location, the probability that a gym belongs to each category, and they support these differences. Independent gyms have a high probability of dominance in most residential districts and neighborhoods. Large corporate gyms concentrate on major hubs, probably in places such as shopping centers where the daily movement of people is higher.
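Conceptually, the relative risk surface estimates, at each location u, the probability that a gym found there belongs to type j:

p_j(u) = λ_j(u) / Σ_k λ_k(u)

Here each λ_j is a kernel-smoothed intensity. The base-R sketch below computes this ratio at two locations for a hypothetical two-type toy pattern. relrisk() additionally handles bandwidth selection and edge effects, which are omitted here.

```r
# Gaussian kernel weights of points (x, y) relative to location (u, v);
# the normalising constant cancels in the ratio below, so it is omitted
kernel_weights <- function(u, v, x, y, sigma) {
  exp(-((u - x)^2 + (v - y)^2) / (2 * sigma^2))
}

# Spatially varying type probabilities at (u, v):
# p_j(u) = kernel weight of type j / total kernel weight
type_probs <- function(u, v, x, y, type, sigma) {
  w <- kernel_weights(u, v, x, y, sigma)
  tapply(w, type, sum) / sum(w)
}

# Hypothetical toy data (km): chains concentrated near (0, 0),
# independents spread over the whole area
x    <- c(0.1, -0.1, 0.0, 0.2, 0.0, 2.0, -2.0, 3.0, -3.0, 2.5)
y    <- c(0.0, 0.1, -0.2, 0.1, 0.1, 2.0, 1.5, -2.0, -1.0, 2.5)
type <- factor(c(rep("Chain", 3), rep("Independent", 7)))
sigma <- 0.7

p_center    <- type_probs(0, 0, x, y, type, sigma)
p_periphery <- type_probs(2.5, 2.5, x, y, type, sigma)
print(round(p_center, 3))
print(round(p_periphery, 3))
```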

4.2.2. Second-Order Analysis of Marked Patterns

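The multitype cross-L function, L(r) = sqrt(K(r)/π), computed from the cross-type K-function, summarizes how many gyms of a given brand are found within distance r of a typical independent gym. We plot L(r) - r. Values above zero indicate attraction (co-location) and values below zero indicate segregation, relative to independent random placement.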

cross_L1 <- Lcross(gyms_ppp, from = "Independent", to = "Well Fitness")
plot(cross_L1, . - r ~ r, main = "Cross-L Function: Independent vs Well Fitness", legend = TRUE)

cross_L2 <- Lcross(gyms_ppp, from = "Independent", to = "CityFit")
plot(cross_L2, . - r ~ r, main = "Cross-L Function: Independent vs CityFit", legend = TRUE)

cross_L3 <- Lcross(gyms_ppp, from = "Independent", to = "My Fitness Place")
plot(cross_L3, . - r ~ r, main = "Cross-L Function: Independent vs My Fitness Place", legend = TRUE)

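For Well Fitness, the curve stays below zero at short distances but rises above zero at longer distances. This suggests that Well Fitness clubs avoid the immediate vicinity of independent gyms while sharing the same broader commercial areas.

For CityFit, the curve is a step function because the brand has only one location in the cleaned data. The line stays well above zero, which is consistent with this single gym being located in the busiest part of the city center. However, an estimate based on a single point is very unstable and should be read as descriptive only.

For My Fitness Place, the curve lies above zero, indicating that its clubs tend to be located in the same high-traffic areas as independent gyms. No simulation envelopes (e.g. from envelope()) are plotted, so all three curves describe the observed patterns rather than provide formal significance tests.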
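The step-like CityFit curve follows from how the cross-function is estimated. A naive version without edge correction is:

K̂_ij(r) = A / (n_i n_j) × #{pairs (a of type i, b of type j) with d_ab ≤ r}

When type j has a single point, the count only increases when r passes the distance from that point to another type-i gym, so the estimate jumps at those distances. This is a minimal base-R sketch with hypothetical coordinates.

```r
# Naive cross-K (no edge correction) from type-i points to type-j points
naive_cross_K <- function(xi, yi, xj, yj, r, A) {
  # distances between every type-i point and every type-j point
  d <- sqrt(outer(xi, xj, "-")^2 + outer(yi, yj, "-")^2)
  sapply(r, function(rr) A * sum(d <= rr) / (length(xi) * length(xj)))
}

# Hypothetical window of area 100 km^2 with 6 "independent" gyms
xi <- c(0.5, 1.0, 2.0, 4.0, 6.0, 8.0)
yi <- c(0.5, 1.5, 1.0, 4.0, 7.0, 2.0)
# A single "chain" gym
xj <- 1.0
yj <- 1.0

r <- seq(0, 3, by = 0.25)
K_ij <- naive_cross_K(xi, yi, xj, yj, r, A = 100)
L_minus_r <- sqrt(K_ij / pi) - r

# K_ij is a step function: it only changes where r crosses a distance
# from the single chain gym to one of the independent gyms
print(round(cbind(r, K_ij, L_minus_r), 3))
```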

4.3. Point Process Models (PPM)

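We fit a Poisson point process model (PPM) to analyze how the distance to the city center (Rynek Główny) and to the main university campus affect the location of gyms. The campus is represented here by a single reference point.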

gyms_ppp_km <- rescale(gyms_ppp, 1000, "km")

miasto <- c('Krakow_Center')
xc <- c(19.937357) 
yc <- c(50.061691) 
capital <- data.frame(xc, yc, miasto)

pwt.sf <- st_as_sf(capital, coords = c("xc", "yc"), crs = 4326)
pwt.sf <- st_transform(st_as_sf(pwt.sf), crs = st_crs(3857)) 

center_ppp <- ppp(x=st_coordinates(pwt.sf)[,1], 
                  y=st_coordinates(pwt.sf)[,2],
                  window = Window(gyms_ppp))

center_ppp_km <- rescale(center_ppp, 1000, "km")
dist_to_center <- distfun(center_ppp_km)
dist.im.center <- as.im(dist_to_center)

university <- c('Main_Campus')
xu <- c(19.923100) 
yu <- c(50.066400) 
uni_data <- data.frame(xu, yu, university)

uni.sf <- st_as_sf(uni_data, coords = c("xu", "yu"), crs = 4326)
uni.sf <- st_transform(st_as_sf(uni.sf), crs = st_crs(3857))

uni_ppp <- ppp(x=st_coordinates(uni.sf)[,1], 
                y=st_coordinates(uni.sf)[,2],
                window = Window(gyms_ppp))

uni_ppp_km <- rescale(uni_ppp, 1000, "km")
dist_to_universities <- distfun(uni_ppp_km)
dist.im.uni <- as.im(dist_to_universities)

# Plot the distance covariates
par(mfrow = c(1, 2))
plot(dist.im.center, main = "Distance to City Center (km)", las = 1)
plot(dist.im.uni, main = "Distance to Universities (km)", las = 1)

par(mfrow = c(1, 1))
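Before interpreting the model, it is worth checking how close the two reference locations are, since distances to nearby points are strongly correlated. This is a minimal base-R sketch that uses the haversine formula on a sphere of radius 6371 km (an approximation) and the coordinates defined in the code above.

```r
# Great-circle distance (km) between two lon/lat points, haversine formula
haversine_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * R * asin(sqrt(a))
}

# City center (Main Market Square) and main campus, as defined above
d_true <- haversine_km(19.937357, 50.061691, 19.923100, 50.066400)

# Approximate separation in EPSG:3857 units: true distance times 1/cos(latitude)
d_mercator <- d_true / cos(50.064 * pi / 180)

cat("True separation:", round(d_true, 2), "km\n")
cat("Separation in Web Mercator units:", round(d_mercator, 2), "km\n")
```

The two points turn out to be only about a kilometre apart, so the two distance covariates carry largely overlapping information.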

# Fitted model with brand marks and urban distances
gyms_model <- ppm(gyms_ppp_km ~ marks + dist.im.center + dist.im.uni)
summary(gyms_model)
## Point process model
## Fitted to data: gyms_ppp_km
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.formula(Q = gyms_ppp_km ~ marks + dist.im.center + dist.im.uni)
## Edge correction: "border"
##  [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
## 
## Data pattern:
## Marked planar point pattern:  94 points
## Average intensity 0.119 points per square km
## Multitype:
##                  frequency proportion intensity
## CityFit                  1     0.0106   0.00126
## Independent             69     0.7340   0.08710
## My Fitness Place        15     0.1600   0.01890
## Well Fitness             9     0.0957   0.01140
## 
## Window: polygonal boundary
## single connected closed polygon with 5695 vertices
## enclosing rectangle: [2203.2616, 2250.5846] x [6440.678, 6468.149] km
##                      (47.32 x 27.47 km)
## Window area = 791.897 square km
## Unit of length: 1 km
## Fraction of frame area: 0.609
## 
## Dummy quadrature points:
##      32 x 32 grid of dummy points, plus 4 corner points
##      dummy spacing: 1.4788446 x 0.8584542 km
## 
## Original dummy parameters: =
## Marked planar point pattern:  3054 points
## Average intensity 3.86 points per square km
## Multitype:
##                  frequency proportion intensity
## CityFit                786      0.257     0.993
## Independent            718      0.235     0.907
## My Fitness Place       772      0.253     0.975
## Well Fitness           778      0.255     0.982
## 
## Window: polygonal boundary
## single connected closed polygon with 5695 vertices
## enclosing rectangle: [2203.2616, 2250.5846] x [6440.678, 6468.149] km
##                      (47.32 x 27.47 km)
## Window area = 791.897 square km
## Unit of length: 1 km
## Fraction of frame area: 0.609
## Quadrature weights:
##      (counting weights based on 32 x 32 array of rectangular tiles)
## All weights:
##  range: [0.0428, 1.27]   total: 3170
## Weights on data points:
##  range: [0.212, 0.635]   total: 47.2
## Weights on dummy points:
##  range: [0.0428, 1.27]   total: 3120
## --------------------------------------------------------------------------------
## FITTED :
## 
## Nonstationary multitype Poisson process
## Possible marks:
## CityFit Independent My Fitness Place Well Fitness
## ---- Intensity: ----
## 
## Log intensity: ~marks + dist.im.center + dist.im.uni
## Model depends on external covariates 'dist.im.center' and 'dist.im.uni'
## Covariates provided:
##  dist.im.center: im
##  dist.im.uni: im
## 
## Fitted trend coefficients:
##           (Intercept)      marksIndependent marksMy Fitness Place 
##            -4.2445917             4.2341052             2.7080489 
##     marksWell Fitness        dist.im.center           dist.im.uni 
##             2.1972233            -0.5281212             0.2281020 
## 
##                         Estimate       S.E.    CI95.lo    CI95.hi Ztest
## (Intercept)           -4.2445917 1.01467409 -6.2333164 -2.2558670   ***
## marksIndependent       4.2341052 1.00721968  2.2599909  6.2082195   ***
## marksMy Fitness Place  2.7080489 1.03279494  0.6838080  4.7322898    **
## marksWell Fitness      2.1972233 1.05409195  0.1312410  4.2632055     *
## dist.im.center        -0.5281212 0.09539180 -0.7150857 -0.3411567   ***
## dist.im.uni            0.2281020 0.09129248  0.0491720  0.4070320     *
##                            Zval
## (Intercept)           -4.183207
## marksIndependent       4.203755
## marksMy Fitness Place  2.622059
## marksWell Fitness      2.084470
## dist.im.center        -5.536337
## dist.im.uni            2.498584
## 
## ----------- gory details -----
## 
## Fitted regular parameters (theta):
##           (Intercept)      marksIndependent marksMy Fitness Place 
##            -4.2445917             4.2341052             2.7080489 
##     marksWell Fitness        dist.im.center           dist.im.uni 
##             2.1972233            -0.5281212             0.2281020 
## 
## Fitted exp(theta):
##           (Intercept)      marksIndependent marksMy Fitness Place 
##            0.01434159           68.99991174           14.99998081 
##     marksWell Fitness        dist.im.center           dist.im.uni 
##            8.99998849            0.58971187            1.25621342

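The fitted model indicates that brand type and the distances to the university campus and to the city center are all significant predictors of gym intensity. CityFit serves as the reference category, so the positive mark coefficients mean that the other categories have higher overall intensities. Their exponentials (about 69, 15 and 9) reproduce the count ratios relative to the single CityFit gym, so they describe relative abundance rather than location. The distance terms enter the model additively, which means their effects are assumed to be the same for every brand. Testing whether franchises respond differently to distance would require interaction terms such as marks:dist.im.center.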
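The mark coefficients have a simple interpretation in this model. The mark effects enter additively alongside covariates that are shared by all types. As a result, exponentiating them (approximately) gives each category's count relative to the reference category, CityFit. This is a minimal base-R check using the coefficients and frequencies printed in the model summary.

```r
# Mark coefficients from summary(gyms_model); CityFit is the reference level
mark_coef <- c("Independent"      = 4.2341052,
               "My Fitness Place" = 2.7080489,
               "Well Fitness"     = 2.1972233)

# Observed counts per category, from the same summary
counts <- c("CityFit" = 1, "Independent" = 69,
            "My Fitness Place" = 15, "Well Fitness" = 9)

rate_ratios  <- exp(mark_coef)
count_ratios <- counts[names(mark_coef)] / counts[["CityFit"]]

print(round(cbind(rate_ratios, count_ratios), 3))
```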

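The negative coefficient for “dist.im.center” indicates that, holding distance to the campus constant, gym intensity decreases as distance from Rynek Główny increases. This suggests a strong centralizing effect around the city center. Its exponential is approximately 0.59, meaning that intensity drops by about 41% for each additional 1 km from the center. These kilometres are Web Mercator (EPSG:3857) units, which at Kraków's latitude stretch true distances by a factor of about 1.56. Per true kilometre, the drop is therefore larger, roughly 56%.

On the other hand, the positive coefficient for “dist.im.uni” shows that, holding distance to the center constant, gym intensity increases as one moves away from the university campus. Its exponential is approximately 1.26, so intensity grows by about 26% per projected kilometre, or roughly 43% per true kilometre. The campus reference point lies only about 1.1 km from Rynek Główny, so the two distance covariates are likely to be strongly correlated. Their separate effects should therefore be interpreted with caution.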
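The percentage changes discussed above follow from exp(β) − 1. The sketch below computes them both per projected kilometre and per true kilometre. It assumes that the spherical Web Mercator scale factor 1/cos(φ), with φ ≈ 50.06°, is an adequate approximation across the city. Projected lengths equal k times true lengths, so 1 true km corresponds to k projected km. The coefficients are copied from the model summary.

```r
# Distance coefficients from summary(gyms_model), per km in EPSG:3857 units
beta <- c(center = -0.5281212, university = 0.2281020)

# Web Mercator length stretch at Krakow's latitude (spherical approximation)
k <- 1 / cos(50.06 * pi / 180)

# Multiplicative change in intensity per 1 projected km and per 1 true km
ratio_projected <- exp(beta)
ratio_true      <- exp(beta * k)   # 1 true km = k projected km

pct <- function(ratio) round(100 * (ratio - 1), 1)
print(rbind(per_projected_km = pct(ratio_projected),
            per_true_km      = pct(ratio_true)))
```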

This suggests that fitness centers prefer to cluster in Kraków's main commercial core, where the movement of people is higher, rather than specifically in student residential areas.

5. Conclusions

First, we studied the industry as a whole. The quadrat test rejected complete spatial randomness in favour of clustering, and Ripley’s K-function and the G-function pointed in the same direction. The density heatmap showed a strong concentration of gyms in the center of Kraków. From an economic perspective, this suggests that the fitness industry is highly centralized. Gyms appear willing to accept high competition in exchange for being close to the busiest areas of the city.

Secondly, when we incorporated brands as marks, we saw that large commercial gyms and independent gyms follow clearly different placement strategies. Independent gyms are widely distributed, dominating residential neighborhoods and areas further from the center, which keeps them close to where people live. Corporate franchises (CityFit, Well Fitness and My Fitness Place) follow a much more centralized and targeted strategy. They focus on major commercial hubs, shopping centers, and high-traffic areas.

Finally, the Poisson point process model (PPM) provided statistical support for the centralizing pattern. Gym intensity decreases significantly with distance from the city center, while it does not increase near the main university campus. This offers an interesting economic insight: fitness centers in Kraków, including the large commercial chains, do not specifically target student residential zones. Instead, they appear to choose central locations, prioritizing the areas with the highest daily movement of people.