Introduction

1.1 Background

Short-term rental platforms such as Airbnb have transformed urban hospitality markets by enabling flexible, peer-to-peer accommodation (Guttentag, 2015). Listing prices vary widely within and across cities, shaped by location, room type, proximity to amenities, temporal demand cycles (weekdays/weekends), and perceived quality signals such as cleanliness and guest ratings (Teubner, Hawlitschek, & Dann, 2017; Dogru & Pekin, 2017). Understanding these determinants supports fair pricing, urban tourism policy, and platform governance (Zervas, Proserpio, & Byers, 2017).

1.2 Problem Statement

Despite abundant city-level studies, systematic multi-city comparisons using a harmonized dataset remain limited (Wachsmuth & Weisler, 2018). This project quantifies how room type, spatial attributes (latitude/longitude, metro access, attractiveness/restaurant indices), cleanliness, and temporal factors jointly explain price differences across major European cities.

1.3 Objectives

Describe the distribution of key variables (price, ratings, spatial features) across cities and between weekdays/weekends.

Estimate the association between price, room type, cleanliness rating, metro distance, and urban amenity indices.

Diagnose model assumptions and assess multicollinearity.

Derive policy and managerial implications for city stakeholders and hosts.

1.4 Research Questions

RQ1: How do price levels and dispersion differ across European cities and by day-type?

RQ2: What is the marginal association between price and room type, cleanliness rating, metro distance, and urban amenity indices?

RQ3: Do spatial attributes (lat/lng) and city context remain important after controlling for service quality?

1.5 Hypotheses

H1: Private rooms and shared rooms are priced lower than entire homes, ceteris paribus.

H2: Higher cleanliness ratings and attractiveness/restaurant indices are positively associated with price.

H3: Greater distance to metro is negatively associated with price.

H4: Price patterns differ between weekdays and weekends.

1.6 Significance of the Study

Findings inform hosts’ pricing strategies, platform search design, and city tourism policy (e.g., transit-oriented planning). A multi-city lens improves generalizability beyond single-city case studies (Li, Moreno, & Zhang, 2019).

1.7 Scope and Delimitations

The analysis covers multiple European cities with weekday/weekend snapshots. Results reflect listing supply and platform dynamics at the time of data collection and may not capture regulatory changes or seasonality beyond the sampled periods.

1.8 Operational Definitions

Price (realSum): Listing price in local currency units.

Room type: Categorical type of accommodation (entire home/apt, private room, shared room).

Cleanliness rating: Numeric cleanliness score reported on the platform (observed range 2 to 10 in this sample).

Attractiveness/Restaurant indices: City grid indices proxying proximity/abundance of attractions and restaurants.

Metro distance: Distance to the nearest metro station.

Business day: Weekday vs. weekend segmentation.

Literature Review

2.1 Pricing Determinants in Peer-to-Peer Accommodation

Prior work links room type to price premiums (entire homes > private rooms > shared rooms). Quality signals (cleanliness, reviews) often command higher willingness to pay. Spatial accessibility (centrality, transit proximity) and urban amenity density (cultural and culinary clusters) also influence pricing. Multi-city comparisons highlight heterogeneous urban structures and tourism flows.

2.2 Spatial Heterogeneity and Urban Form

Urban cores concentrate demand; peripheries price lower unless compensated by unique attributes. Transit access and walkability increase accessibility and perceived value. City-specific zoning and regulatory environments further shape supply elasticity and price dispersion.

2.3 Temporal Dynamics

Weekend and holiday periods typically command higher prices because of leisure demand, whereas weekday demand may reflect business travel and local events. Models should therefore distinguish weekdays from weekends.

2.4 Gaps Addressed

This project contributes a harmonized multi-city dataset, side-by-side visualizations of city and day-type patterns, and a unified regression framework for price determinants spanning spatial, temporal, and quality dimensions.

Methodology

3.1 Research Design

The study uses a quantitative, cross-sectional, multi-city design that combines exploratory visualization, correlation assessment, and linear regression with diagnostic checks.

3.2 Data and Variables

Data include listings from Amsterdam, Athens, Barcelona, Berlin, Budapest, Lisbon, London, Paris, Rome, and Vienna, each by weekdays and weekends. Core variables include price, room type, guest satisfaction, cleanliness rating, bedrooms, amenity indices (attr_index, rest_index and normalized variants), metro distance, latitude/longitude.

3.3 Data Processing and Models

Original data were imported and harmonized across cities. Data wrangling steps included combining the datasets, creating identifiers for city and business day, checking for missing values, and feature engineering (dummy variables, transformations, clustering, normalization). An ordinary least squares (OLS) regression model was then used to estimate the determinants of price.

Load dataset

amsterdam_weekdays <- read.csv("amsterdam_weekdays.csv")
amsterdam_weekends <- read.csv("amsterdam_weekends.csv")
athens_weekdays <- read.csv("athens_weekdays.csv")
athens_weekends <- read.csv("athens_weekends.csv")
barcelona_weekdays <- read.csv("barcelona_weekdays.csv")
barcelona_weekends <- read.csv("barcelona_weekends.csv")
berlin_weekdays <- read.csv("berlin_weekdays.csv")
berlin_weekends <- read.csv("berlin_weekends.csv")
budapest_weekdays <- read.csv("budapest_weekdays.csv")
budapest_weekends <- read.csv("budapest_weekends.csv")
lisbon_weekdays <- read.csv("lisbon_weekdays.csv")
lisbon_weekends <- read.csv("lisbon_weekends.csv")
london_weekdays <- read.csv("london_weekdays.csv")
london_weekends <- read.csv("london_weekends.csv")
paris_weekends <- read.csv("paris_weekends.csv")
paris_weekdays <- read.csv("paris_weekdays.csv")
rome_weekdays <- read.csv("rome_weekdays.csv")
rome_weekends <- read.csv("rome_weekends.csv")
vienna_weekdays <- read.csv("vienna_weekdays.csv")
vienna_weekends <- read.csv("vienna_weekends.csv")

A source column tags each observation with its origin (e.g., berlin_weekends), preserving provenance for stratified analysis.
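The step that stacks the twenty imported tables is not shown above. A minimal base-R sketch, assuming the data frames are gathered into a named list whose names follow the city_daytype pattern of the imports (e.g., berlin_weekends); the helper combine_listings is hypothetical, and the Country and business_day columns simply mirror the names used later in this document.

```r
# Hypothetical helper: stack per-file data frames and tag each row with its origin.
# Assumes list names look like "<city>_<weekdays|weekends>".
combine_listings <- function(frames) {
  tagged <- lapply(names(frames), function(nm) {
    df <- frames[[nm]]
    parts <- strsplit(nm, "_", fixed = TRUE)[[1]]
    df$source <- nm                               # provenance, e.g. "berlin_weekends"
    df$Country <- tools::toTitleCase(parts[1])    # city name, stored in 'Country'
    df$business_day <- parts[2]                   # "weekdays" or "weekends"
    df
  })
  combined <- do.call(rbind, tagged)              # requires identical column names
  rownames(combined) <- NULL
  combined
}

# Toy frames standing in for the real CSV imports
frames <- list(
  berlin_weekends = data.frame(realSum = c(120, 180)),
  rome_weekdays   = data.frame(realSum = c(95, 140, 210))
)
combined_data <- combine_listings(frames)
table(combined_data$Country, combined_data$business_day)
```

With the real imports, the list could be built with frames <- mget(ls(pattern = "_week(days|ends)$")); rbind() only works if every file shares the same column names.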

Visualize categorical variables

Categorical Distributions

Count of rows per Country

The bar plot illustrates the distribution of Airbnb listings across the sampled cities. The variation in bar height reflects differences in sample size, with larger bars indicating cities that contributed more listings to the dataset. A greater number of listings enhances the precision and reliability of city-level estimates, while smaller sample sizes may limit the stability of results and increase sampling variability. This distribution is therefore an important consideration when comparing determinants of pricing across cities, as uneven representation may influence the strength of statistical inferences.
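The plotting code for this chart is not shown. As a minimal base-R sketch on a hypothetical toy table, the counts behind such a bar plot, and each city's share of all listings, can be computed as follows.

```r
# Hypothetical stand-in for the combined listings table
combined_data <- data.frame(
  Country = c("Rome", "Rome", "Rome", "Paris", "Paris", "Athens")
)
city_counts <- sort(table(combined_data$Country), decreasing = TRUE)
city_share  <- round(100 * prop.table(city_counts), 1)   # percent of all listings
city_counts
city_share
# barplot(city_counts, las = 2)   # base-R version of the bar plot
```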

Count by Business Day (Weekdays vs Weekends)

The temporal balance analysis compares the number of weekday and weekend observations within the dataset. The relative distribution between these two categories is important for assessing the reliability of temporal effects on pricing. A reasonably balanced distribution ensures that comparisons between weekday and weekend patterns are stable and less prone to bias. However, a substantial imbalance in observations could weaken the validity of temporal comparisons, as estimates for the underrepresented category may be less precise and more sensitive to sampling variability.
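One way to put a number on this balance is to report the weekday share and test it against an even 50/50 split. The sketch below applies an exact binomial test to hypothetical labels. With the roughly 51,700 listings in the full data, even a small imbalance is likely to be statistically significant, so the share itself is the more informative figure.

```r
# Hypothetical day-type labels standing in for combined_data$business_day
business_day <- c(rep("weekdays", 520), rep("weekends", 480))
n_weekdays <- sum(business_day == "weekdays")
n_total    <- length(business_day)
share_weekdays <- n_weekdays / n_total

# Exact binomial test of H0: weekdays account for half of all observations
bt <- binom.test(n_weekdays, n_total, p = 0.5)
c(share_weekdays = share_weekdays, p_value = bt$p.value)
```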

Grouped bar chart (Country by business_day)

The chart evaluates whether both weekdays and weekends are adequately represented within each city. Ensuring balanced representation across temporal categories is critical for making unbiased comparisons of weekday versus weekend pricing patterns. If both time periods are well captured, the resulting contrasts more accurately reflect true temporal effects. In contrast, underrepresentation of either weekdays or weekends in certain cities could introduce bias, limiting the validity of temporal inferences at the city level.
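A cross-tabulation of city by day-type gives the numbers behind this chart, and a chi-squared test of independence asks whether the weekday/weekend split differs across cities. A minimal sketch on simulated data; the city sizes are arbitrary.

```r
set.seed(1)
# Simulated listings: three cities with randomly assigned day-types
toy <- data.frame(
  Country = rep(c("Amsterdam", "Rome", "Vienna"), times = c(300, 500, 400)),
  business_day = sample(c("weekdays", "weekends"), 1200, replace = TRUE)
)
tab <- table(toy$Country, toy$business_day)   # rows: cities, columns: day-types
tab

# Chi-squared test of independence between city and day-type
ct <- chisq.test(tab)
ct
```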

Component bar chart (stacked)

The stacked proportion plot illustrates each city’s relative composition of weekday and weekend listings. This visualization highlights how temporal representation varies across cities, providing insights into the balance of observations within each location. Cities with more even proportions enable more reliable contrasts between weekday and weekend pricing, whereas cities with skewed distributions may produce biased temporal comparisons. Thus, the stacked proportions not only reveal temporal patterns across locations but also serve as a diagnostic tool for assessing the robustness of city-level temporal analyses.
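The stacked proportions correspond to row-wise shares of the city-by-day-type table. A minimal base-R sketch on a hypothetical six-listing example:

```r
# Hypothetical listings for two cities
tab <- table(
  Country = c("Rome", "Rome", "Rome", "Rome", "Paris", "Paris"),
  business_day = c("weekdays", "weekdays", "weekdays", "weekends", "weekdays", "weekends")
)
row_props <- prop.table(tab, margin = 1)   # each city's shares sum to 1
round(row_props, 2)
# barplot(t(row_props), legend.text = TRUE)   # base-R stacked proportion chart
```

Here Rome's split is 0.75/0.25 and Paris's is 0.50/0.50, so Rome would show up as the more skewed city in a stacked chart.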

Pie Chart of Business Days

library(dplyr)
library(ggplot2)

combined_data %>%
  count(business_day) %>%                      # one row per day-type, with count n
  ggplot(aes(x = "", y = n, fill = business_day)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  labs(title = "Business Day Distribution") +
  theme_void()

Overall, the results suggest a potential small to moderate imbalance between weekday and weekend listings across the dataset. While both categories are represented, the uneven distribution indicates that one temporal category may be slightly more dominant. Such imbalance does not invalidate temporal comparisons but may influence the precision of estimates, particularly for the underrepresented category. Recognizing this imbalance is therefore important when interpreting weekday–weekend contrasts in pricing.

Numerical Variables Overview

Summary Statistics

##        X           realSum         person_capacity     multi       
##  Min.   :   0   Min.   :   34.78   Min.   :2.000   Min.   :0.0000  
##  1st Qu.: 646   1st Qu.:  148.75   1st Qu.:2.000   1st Qu.:0.0000  
##  Median :1334   Median :  211.34   Median :3.000   Median :0.0000  
##  Mean   :1621   Mean   :  279.88   Mean   :3.162   Mean   :0.2914  
##  3rd Qu.:2382   3rd Qu.:  319.69   3rd Qu.:4.000   3rd Qu.:1.0000  
##  Max.   :5378   Max.   :18545.45   Max.   :6.000   Max.   :1.0000  
##       biz         cleanliness_rating guest_satisfaction_overall
##  Min.   :0.0000   Min.   : 2.000     Min.   : 20.00            
##  1st Qu.:0.0000   1st Qu.: 9.000     1st Qu.: 90.00            
##  Median :0.0000   Median :10.000     Median : 95.00            
##  Mean   :0.3502   Mean   : 9.391     Mean   : 92.63            
##  3rd Qu.:1.0000   3rd Qu.:10.000     3rd Qu.: 99.00            
##  Max.   :1.0000   Max.   :10.000     Max.   :100.00            
##     bedrooms           dist            metro_dist          attr_index     
##  Min.   : 0.000   Min.   : 0.01504   Min.   : 0.002301   Min.   :  15.15  
##  1st Qu.: 1.000   1st Qu.: 1.45314   1st Qu.: 0.248480   1st Qu.: 136.80  
##  Median : 1.000   Median : 2.61354   Median : 0.413269   Median : 234.33  
##  Mean   : 1.159   Mean   : 3.19129   Mean   : 0.681540   Mean   : 294.20  
##  3rd Qu.: 1.000   3rd Qu.: 4.26308   3rd Qu.: 0.737840   3rd Qu.: 385.76  
##  Max.   :10.000   Max.   :25.28456   Max.   :14.273577   Max.   :4513.56  
##  attr_index_norm      rest_index      rest_index_norm         lng         
##  Min.   :  0.9263   Min.   :  19.58   Min.   :  0.5928   Min.   :-9.2263  
##  1st Qu.:  6.3809   1st Qu.: 250.85   1st Qu.:  8.7515   1st Qu.:-0.0725  
##  Median : 11.4683   Median : 522.05   Median : 17.5422   Median : 4.8730  
##  Mean   : 13.4238   Mean   : 626.86   Mean   : 22.7862   Mean   : 7.4261  
##  3rd Qu.: 17.4151   3rd Qu.: 832.63   3rd Qu.: 32.9646   3rd Qu.:13.5188  
##  Max.   :100.0000   Max.   :6696.16   Max.   :100.0000   Max.   :23.7860  
##       lat       
##  Min.   :37.95  
##  1st Qu.:41.40  
##  Median :47.51  
##  Mean   :45.67  
##  3rd Qu.:51.47  
##  Max.   :52.64

Distributions and Bivariate Plots

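Ratings are concentrated at high scores: cleanliness_rating has a median of 10 (its observed maximum) and a first quartile of 9, and guest_satisfaction_overall has a median of 95 and a first quartile of 90 (observed maximum 100). This suggests generally positive guest experiences, but the limited variation may also weaken ratings as predictors of price.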

Prices show a strong right skew (mean 279.88 versus median 211.34, maximum 18,545.45), indicating many budget and mid-range listings and relatively few premium properties.
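The skew can be quantified with a moment-based skewness coefficient and compared before and after a log transformation. The sketch below uses simulated log-normal prices as a stand-in for realSum; the function skewness is defined here and is not taken from any package.

```r
# Moment-based sample skewness: mean((x - m)^3) / (mean((x - m)^2))^(3/2)
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Simulated right-skewed prices (log-normal), standing in for realSum
set.seed(42)
price <- exp(rnorm(5000, mean = log(210), sd = 0.5))
c(mean = mean(price), median = median(price))
c(raw = skewness(price), logged = skewness(log(price)))
```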

# Boxplot of realSum by Country
ggplot(combined_data, aes(x = Country, y = realSum, fill = Country)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Price (realSum) by Country", y = "Price", x = "Country") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

There appear to be moderate to large differences in median price and dispersion across cities. These visual differences suggest heterogeneity, but establishing statistical significance would require formal tests.
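As one possible formal test, the Kruskal-Wallis rank-sum test compares price distributions across cities without assuming normality, which suits the strong right skew. A sketch on simulated prices for three cities; the labels and price levels are invented.

```r
set.seed(7)
# Simulated log-normal prices with different typical levels per city
toy <- data.frame(
  Country = rep(c("Athens", "Paris", "Amsterdam"), each = 200),
  realSum = c(exp(rnorm(200, log(150), 0.4)),
              exp(rnorm(200, log(300), 0.4)),
              exp(rnorm(200, log(450), 0.4)))
)
kw <- kruskal.test(realSum ~ Country, data = toy)
kw
tapply(toy$realSum, toy$Country, median)   # report medians alongside the test
```

With tens of thousands of listings, almost any difference will be significant, so city medians or an effect size should be reported together with the p-value.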

# Boxplot of cleanliness_rating by business_day
ggplot(combined_data, aes(x = business_day, y = cleanliness_rating, fill = business_day)) +
  geom_boxplot() +
  labs(title = "Cleanliness Rating by Business Day", y = "Rating", x = "Business Day") +
  theme_minimal()

There appears to be little difference in cleanliness ratings between weekdays and weekends.
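This visual impression could be checked with a Wilcoxon rank-sum test, which treats the ratings as ordinal rather than interval data. A sketch on simulated ratings; exact = FALSE requests the normal approximation, because tied ratings rule out an exact p-value.

```r
set.seed(3)
# Simulated ratings for both day-types, drawn from the same distribution
toy <- data.frame(
  business_day = rep(c("weekdays", "weekends"), each = 300),
  cleanliness_rating = sample(c(8, 9, 10), 600, replace = TRUE,
                              prob = c(0.1, 0.3, 0.6))
)
wt <- wilcox.test(cleanliness_rating ~ business_day, data = toy, exact = FALSE)
wt$p.value
tapply(toy$cleanliness_rating, toy$business_day, median)
```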

Scatterplots to assess relationships

Guest satisfaction vs Price

ggplot(combined_data, aes(x = guest_satisfaction_overall, y = realSum)) +
  geom_point(alpha = 0.5, color = "#d62728") +
  labs(title = "Guest Satisfaction vs Price", x = "Satisfaction", y = "Price") +
  theme_minimal()

A positive trend would suggest that higher-priced listings also earn higher satisfaction, possibly because of better amenities or locations; a weak or absent trend would imply that satisfaction is largely unrelated to price.
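Which of these readings applies can be summarized with Spearman's rank correlation, which is robust to the skew in price. A sketch on simulated data in which satisfaction has only a weak effect on price:

```r
set.seed(11)
# Simulated scores capped at 100, with prices only weakly linked to them
satisfaction <- pmin(100, round(rnorm(1000, mean = 93, sd = 5)))
price <- exp(log(200) + 0.01 * (satisfaction - 93) + rnorm(1000, sd = 0.5))
sp <- cor.test(satisfaction, price, method = "spearman", exact = FALSE)
sp$estimate   # Spearman's rho
sp$p.value
```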

Distance to metro vs Price

ggplot(combined_data, aes(x = metro_dist, y = realSum)) +
  geom_point(alpha = 0.5, color = "#9467bd") +
  labs(title = "Metro Distance vs Price", x = "Metro Distance", y = "Price") +
  theme_minimal()

There appears to be a negative relationship between distance to metro and price: listings farther from metro stations tend to be cheaper.
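Because the scatter is dominated by a few very expensive listings, a binned summary can make the gradient easier to see: split metro distance into quintiles and compare median prices. A sketch on simulated data with a built-in negative effect:

```r
set.seed(5)
# Simulated distances and prices that fall with distance
metro_dist <- rexp(2000, rate = 1 / 0.6)
realSum <- exp(log(260) - 0.3 * metro_dist + rnorm(2000, sd = 0.4))

# Quintile bins of metro distance, then the median price within each bin
bins <- cut(metro_dist,
            breaks = quantile(metro_dist, probs = seq(0, 1, by = 0.2)),
            include.lowest = TRUE)
med_by_bin <- tapply(realSum, bins, median)
round(med_by_bin)
```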

Bar chart of average values

Average realSum by Country

combined_data %>%
  group_by(Country) %>%
  summarise(avg_price = mean(realSum, na.rm = TRUE)) %>%
  ggplot(aes(x = reorder(Country, avg_price), y = avg_price)) +
  geom_col(fill = "#FF7F0E") +
  coord_flip() +
  labs(title = "Average Price by Country", x = "Country", y = "Average Price") +
  theme_minimal()

The analysis indicates moderate to large differences in average prices across cities, suggesting that location plays an important role in determining Airbnb pricing. These differences may reflect variations in demand, cost of living, tourism activity, and local market conditions. Cities with higher average prices likely capture premium markets or stronger demand, while lower-priced cities may reflect more affordable or less competitive markets. Such cross-city variation highlights the importance of including location-specific factors in the analysis to avoid biased or oversimplified conclusions about pricing determinants.
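Because a handful of very expensive listings can pull a city's mean far above its typical price, it may be worth reporting medians alongside the means in this chart. A minimal sketch with a deliberately extreme toy value:

```r
# Hypothetical listings; the 4000 value plays the role of an extreme outlier
toy <- data.frame(
  Country = c("Amsterdam", "Amsterdam", "Amsterdam", "Athens", "Athens", "Athens"),
  realSum = c(300, 350, 4000, 90, 110, 130)
)
means   <- aggregate(realSum ~ Country, data = toy, FUN = mean)
medians <- aggregate(realSum ~ Country, data = toy, FUN = median)
price_by_city <- merge(means, medians, by = "Country",
                       suffixes = c("_mean", "_median"))
price_by_city
```

Here Amsterdam's mean (1,550) is more than four times its median (350), while Athens's mean and median coincide at 110.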

Spatial Visualizations

The geospatial scatter plot reveals clusters of moderately higher attractiveness, indicating that certain neighborhoods or areas within the cities draw relatively stronger demand. These clusters suggest localized effects where factors such as proximity to tourist attractions, accessibility, or neighborhood quality contribute to elevated listing appeal. The moderate strength of clustering implies that while location is influential, it interacts with other determinants, such as room type, amenities, and pricing strategy, in shaping overall demand.

The comparison of amenity distribution across day-types (weekday versus weekend) indicates that the spread of amenities appears broadly similar, with only minor observable differences. These variations are small and, since no formal statistical tests were conducted, cannot be interpreted as significant. The findings therefore suggest that amenity availability is generally consistent across temporal categories, implying that temporal price differences are more likely to be driven by other factors rather than variation in amenity provision.
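A two-sample Kolmogorov-Smirnov test is one formal check of whether an amenity index has the same distribution on weekdays and weekends. A sketch on simulated attr_index-like values for the two day-types:

```r
set.seed(9)
# Simulated right-skewed index values for each day-type
attr_weekdays <- exp(rnorm(500, mean = log(230), sd = 0.6))
attr_weekends <- exp(rnorm(500, mean = log(235), sd = 0.6))
ks <- ks.test(attr_weekdays, attr_weekends)
ks$statistic   # D: largest gap between the two empirical CDFs
ks$p.value
```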

The comparison of metro accessibility across cities shows moderate differences in how listings are distributed relative to transit, contrasting denser transit grids with more car-oriented peripheries. These patterns suggest that accessibility may contribute to pricing and demand differences across locations. Because this conclusion rests on visual inspection, statistical significance has not been established, and further testing would be needed to determine whether transit accessibility is a meaningful predictor of price.
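Transit accessibility by city can be summarized numerically with the median metro distance and the share of listings within a chosen distance of a station. A sketch on a hypothetical eight-listing table; the 0.5 cut-off is an arbitrary illustration.

```r
# Hypothetical listings for two cities
toy <- data.frame(
  Country    = rep(c("Paris", "Rome"), each = 4),
  metro_dist = c(0.1, 0.2, 0.3, 0.9, 0.4, 0.8, 1.5, 2.2)
)
threshold <- 0.5                                  # illustrative cut-off, in metro_dist units
toy$near_metro <- toy$metro_dist <= threshold
median_dist <- aggregate(metro_dist ~ Country, data = toy, FUN = median)
share_near  <- aggregate(near_metro ~ Country, data = toy, FUN = mean)
access <- merge(median_dist, share_near, by = "Country")
access
```

In this toy example Paris has a median distance of 0.25 with 75% of listings within the cut-off, against 1.15 and 25% for Rome.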

The spatial patterns reveal co-located hotspots of cultural and culinary amenities with moderate strength, suggesting that certain neighborhoods concentrate multiple attractive features for visitors. Such clustering indicates that areas offering both cultural and dining experiences may hold a competitive advantage in shaping demand and pricing. However, these findings are based on visual analysis, and their statistical significance requires confirmation through regression modeling to establish whether amenity concentration is a meaningful predictor of listing performance.
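Co-location of cultural and culinary amenities can be quantified as the rank correlation between attr_index and rest_index within each city, so that differences between cities do not drive the result. A sketch on simulated indices constructed to co-vary:

```r
set.seed(21)
n <- 300
# Simulated indices for two cities; rest_index is built to track attr_index
toy <- data.frame(Country = rep(c("Lisbon", "Vienna"), each = n / 2))
toy$attr_index <- exp(rnorm(n, mean = log(250), sd = 0.5))
toy$rest_index <- toy$attr_index * exp(rnorm(n, mean = log(2), sd = 0.3))

# Spearman correlation computed separately within each city
rho_by_city <- sapply(split(toy, toy$Country), function(d)
  cor(d$attr_index, d$rest_index, method = "spearman"))
round(rho_by_city, 2)
```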

# Use real basemaps with leaflet (interactive)

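library(leaflet)

# Build the palette once so the markers and the legend share one color scale
pal <- colorNumeric("YlOrRd", domain = combined_data$attr_index_norm)

leaflet(combined_data) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lng, lat = ~lat,
                   # attr_index_norm runs from about 1 to 100; scale down so markers stay legible
                   radius = ~2 + attr_index_norm / 10,
                   stroke = FALSE, fillOpacity = 0.6,
                   color = ~pal(attr_index_norm),
                   popup = ~paste("City:", Country,
                                  "<br>Attr Index:", round(attr_index, 1),
                                  "<br>Rest Index:", round(rest_index, 1))) %>%
  addLegend("bottomright", pal = pal,
            values = ~attr_index_norm, title = "Attr Index (Norm)")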

The interactive map and its color legend highlight localized hotspots of high normalized attractiveness that appear visually moderate to strong in intensity. These hotspots suggest that certain neighborhoods or districts consistently draw higher demand, likely reflecting proximity to cultural, commercial, or leisure amenities, centrality, or neighborhood quality, which reinforces the importance of location in shaping Airbnb performance. Because these observations come from visual inspection, they remain descriptive; regression analysis is needed to confirm whether, and how strongly, these hotspots influence listing performance.

Feature Engineering

The target variable realSum is renamed to price for readability.

The correlation matrix reveals notable positive and negative associations among the numeric variables, with strengths ranging from small to strong depending on the variable pairs.

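# Correlation matrix of the numeric variables (corrplot package)
library(corrplot)
num_vars <- combined_data[sapply(combined_data, is.numeric)]
num_vars$X <- NULL   # drop the row-index column X seen in the summary output
cor_matrix <- cor(num_vars, use = "pairwise.complete.obs")
corrplot(cor_matrix, method = "color", type = "upper",
         tl.col = "black", addCoef.col = "black", number.cex = 0.7,
         title = "Correlation Matrix", mar = c(0,0,1,0))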


Transformations (log price, clustering, scaling, and dummy variables) were created for potential extensions of the analysis.
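The transformation code is not shown. A minimal base-R sketch of the listed steps on a hypothetical six-listing table; the new column names (log_price, price_z, geo_cluster) and the choice of three clusters are illustrative assumptions, not the study's settings.

```r
set.seed(123)
# Hypothetical toy listings (values invented for illustration)
toy <- data.frame(
  realSum   = c(150, 220, 90, 400, 310, 130),
  room_type = c("Entire home/apt", "Private room", "Shared room",
                "Entire home/apt", "Entire home/apt", "Private room"),
  lat = c(52.37, 52.36, 41.89, 41.90, 48.86, 48.85),
  lng = c(4.90, 4.89, 12.49, 12.50, 2.35, 2.34)
)

# 1. Rename the target for readability
names(toy)[names(toy) == "realSum"] <- "price"

# 2. Log transform to damp the right skew
toy$log_price <- log(toy$price)

# 3. z-score scaling (scale() returns a one-column matrix, so convert back)
toy$price_z <- as.numeric(scale(toy$price))

# 4. Dummy variables with "Entire home/apt" as the reference level
toy$room_type <- relevel(factor(toy$room_type), ref = "Entire home/apt")
dummies <- model.matrix(~ room_type, data = toy)[, -1]   # drop the intercept column

# 5. k-means clustering on coordinates (k = 3 is an illustrative choice)
km <- kmeans(toy[, c("lat", "lng")], centers = 3, nstart = 25)
toy$geo_cluster <- factor(km$cluster)

toy
dummies
```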

Regression and Diagnostics

The OLS specification relates price to room type, spatial position (latitude and longitude), cleanliness rating, amenity indices, and metro distance.
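Written out, the specification implied by the regression output is the following, with entire home/apartment as the omitted room-type category; the dummy-variable names PrivateRoom and SharedRoom are introduced here for notation only.

```latex
\text{price}_i = \beta_0
  + \beta_1\,\text{PrivateRoom}_i + \beta_2\,\text{SharedRoom}_i
  + \beta_3\,\text{lat}_i + \beta_4\,\text{lng}_i
  + \beta_5\,\text{cleanliness\_rating}_i
  + \beta_6\,\text{attr\_index}_i + \beta_7\,\text{rest\_index}_i
  + \beta_8\,\text{metro\_dist}_i + \varepsilon_i
```

Here PrivateRoom and SharedRoom are 0/1 indicators and ε is the error term. With n = 51,707 and nine estimated parameters, the residual degrees of freedom are 51,707 - 9 = 51,698, matching the regression output.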

The hypotheses imply specific coefficient signs: negative for the private- and shared-room dummies relative to the baseline category, entire home/apartment (H1); positive for cleanliness rating, attractiveness index, and restaurant index (H2); and negative for metro distance (H3). The estimates largely match these expectations. Holding the other predictors constant, private rooms are priced about 160 currency units below entire homes and shared rooms about 213 units below; each additional cleanliness point is associated with a price about 10.6 units higher; attr_index has a positive coefficient (0.287); and each additional unit of metro distance is associated with a price about 16.6 units lower. The exception is rest_index, whose coefficient is negative (-0.033), contrary to H2. Because attr_index and rest_index share variance (VIFs of about 3.7 and 3.8), this sign may partly reflect their overlap rather than a genuinely negative restaurant effect. All coefficients are significant at p < 0.01. The model explains about 15% of the variance in price (R² and adjusted R² both 0.152), so most price variation is left unexplained by these predictors.
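The code that creates model is not shown. A minimal sketch of how such an OLS model can be fitted with lm(), using simulated data with known effects so that the estimates can be checked; the comment gives the formula implied by the regression output for the real data.

```r
set.seed(2024)
n <- 2000
# Simulated listings with known effects (hypothetical values)
toy <- data.frame(
  room_type = sample(c("Entire home/apt", "Private room", "Shared room"), n,
                     replace = TRUE, prob = c(0.60, 0.35, 0.05)),
  cleanliness_rating = sample(7:10, n, replace = TRUE),
  metro_dist = rexp(n, rate = 1.5)
)
toy$room_type <- relevel(factor(toy$room_type), ref = "Entire home/apt")
toy$price <- 200 - 150 * (toy$room_type == "Private room") -
  200 * (toy$room_type == "Shared room") +
  10 * toy$cleanliness_rating - 15 * toy$metro_dist + rnorm(n, sd = 40)

# Real-data analogue implied by the output table:
# model <- lm(price ~ room_type + lat + lng + cleanliness_rating +
#               attr_index + rest_index + metro_dist, data = combined_data)
model <- lm(price ~ room_type + cleanliness_rating + metro_dist, data = toy)
round(coef(model), 1)
summary(model)$adj.r.squared
```

The layout of the regression table resembles the text output of the stargazer package; if that is how it was produced, a call such as stargazer(model, type = "text") would generate it.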

## 
## Regression Results: Determinants of Price
## ==================================================
##                           Dependent variable:     
##                       ----------------------------
##                                  price            
## --------------------------------------------------
## room_typePrivate room         -160.185***         
##                                 (2.859)           
##                                                   
## room_typeShared room          -213.302***         
##                                 (15.914)          
##                                                   
## lat                            14.323***          
##                                 (0.263)           
##                                                   
## lng                            -5.915***          
##                                 (0.144)           
##                                                   
## cleanliness_rating             10.575***          
##                                 (1.406)           
##                                                   
## attr_index                      0.287***          
##                                 (0.011)           
##                                                   
## rest_index                     -0.033***          
##                                 (0.005)           
##                                                   
## metro_dist                     -16.643***         
##                                 (1.585)           
##                                                   
## Constant                      -422.746***         
##                                 (18.841)          
##                                                   
## --------------------------------------------------
## Observations                     51,707           
## R2                               0.152            
## Adjusted R2                      0.152            
## Residual Std. Error       301.957 (df = 51698)    
## F Statistic           1,161.575*** (df = 8; 51698)
## ==================================================
## Note:                  *p<0.1; **p<0.05; ***p<0.01

The formatted output provides coefficient estimates with standard errors and fit statistics suitable for inclusion in manuscripts.

## Regression Diagnostics
# Residual Plot
plot(model, which = 1)  # Residuals vs Fitted

The residuals-versus-fitted plot is a visual check for heteroskedasticity and nonlinearity. On visual inspection, the residuals appear to scatter around zero without a pronounced funnel shape, which is consistent with roughly constant error variance (homoskedasticity). A visual check cannot establish statistical significance, however, and with a strongly right-skewed outcome it is worth confirming with a formal test. Systematic curvature or a fan-shaped spread, had it appeared, would indicate nonlinearity or heteroskedasticity, respectively.
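A formal counterpart to this visual check is the Breusch-Pagan test. The sketch below implements Koenker's studentized version (n times the R² from regressing squared residuals on the model's regressors), which should correspond to lmtest::bptest() with its default studentize = TRUE. The helper bp_test is hypothetical and is demonstrated on simulated data whose error spread grows with x.

```r
# Hypothetical helper: Koenker's studentized Breusch-Pagan test for an lm fit
bp_test <- function(fit) {
  X  <- model.matrix(fit)                  # regressors, including the intercept
  u2 <- residuals(fit)^2                   # squared OLS residuals
  aux <- lm.fit(X, u2)                     # auxiliary regression of u^2 on X
  r2  <- 1 - sum(aux$residuals^2) / sum((u2 - mean(u2))^2)
  stat <- length(u2) * r2                  # LM statistic = n * R^2
  df   <- ncol(X) - 1                      # number of non-intercept regressors
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Simulated example: the error standard deviation grows with x
set.seed(99)
n <- 1000
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = x)
fit <- lm(y ~ x)
bp_het <- bp_test(fit)
round(bp_het, 4)
# For the fitted price model: bp_test(model)
```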

# Normality of residuals
plot(model, which = 2)  # Normal Q-Q plot

The Q-Q plot assesses whether the residuals follow a normal distribution. On visual inspection, the points lie close to the reference line apart from small deviations, which suggests approximate normality. Heavy tails or extreme outliers would instead indicate a moderate to strong departure from normality that could affect hypothesis tests and confidence intervals. Given the very high maximum price in the summary statistics (18,545.45), the tails of the plot deserve particular attention, although large-sample inference on OLS coefficients is generally less sensitive to non-normal errors than small-sample inference.
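shapiro.test() in base R accepts at most 5,000 observations, so it cannot be applied directly to 51,707 residuals. The Jarque-Bera statistic, built from sample skewness and kurtosis, is one alternative. A sketch with a hypothetical helper jarque_bera, applied to simulated normal and heavy-tailed samples:

```r
# Hypothetical helper: Jarque-Bera test from moment-based skewness and kurtosis
jarque_bera <- function(x) {
  n  <- length(x)
  d  <- x - mean(x)
  s2 <- mean(d^2)
  skew <- mean(d^3) / s2^1.5
  kurt <- mean(d^4) / s2^2                 # equals 3 for a normal distribution
  stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  c(skewness = skew, kurtosis = kurt, statistic = stat,
    p.value = pchisq(stat, df = 2, lower.tail = FALSE))
}

set.seed(8)
jb_normal <- jarque_bera(rnorm(5000))       # close to normal
jb_heavy  <- jarque_bera(rt(5000, df = 3))  # heavy-tailed
round(rbind(normal = jb_normal, heavy = jb_heavy), 3)
# For the fitted price model: jarque_bera(residuals(model))
```

At this sample size the test flags even minor departures, so the skewness and kurtosis values are more informative than the p-value.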

# Multicollinearity check (vif() comes from the car package)
library(car)
vif(model)
##                        GVIF Df GVIF^(1/(2*Df))
## room_type          1.069815  2        1.017015
## lat                1.079686  1        1.039079
## lng                1.129019  1        1.062553
## cleanliness_rating 1.022819  1        1.011345
## attr_index         3.683698  1        1.919296
## rest_index         3.767038  1        1.940886
## metro_dist         1.049048  1        1.024230

Variance Inflation Factor (VIF) values assess potential multicollinearity among predictors. By common rules of thumb, a VIF of 10 or more signals serious multicollinearity, values between 5 and 10 warrant closer inspection without necessarily invalidating the model, and values below 5 are generally considered acceptable. All values in this model fall below 5. The largest are for attr_index (3.68) and rest_index (3.77), indicating that the two amenity indices share a moderate amount of variance; the remaining predictors are close to 1, including room_type, whose adjusted value GVIF^(1/(2·Df)) is 1.017. No corrective action such as dropping or combining variables appears necessary on these grounds, although the overlap between the two amenity indices should be kept in mind when interpreting their individual coefficients.
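For a predictor with one degree of freedom, the VIF equals 1/(1 - R²_j), where R²_j comes from regressing that predictor on all the others. A minimal sketch with a hypothetical helper vif_manual, on simulated data where two indices are correlated at about 0.85, roughly mimicking attr_index and rest_index:

```r
# Hypothetical helper: VIF_j = 1 / (1 - R^2_j), regressing each column on the others
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

set.seed(4)
n <- 1000
attr_index <- rnorm(n)
rest_index <- 0.85 * attr_index + sqrt(1 - 0.85^2) * rnorm(n)  # correlation about 0.85
metro_dist <- rnorm(n)                                          # unrelated predictor
vifs <- vif_manual(data.frame(attr_index, rest_index, metro_dist))
round(vifs, 2)
```

For multi-level factors such as room_type, car::vif() reports a generalized VIF instead; squaring GVIF^(1/(2·Df)) gives a value comparable to the usual VIF thresholds.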

4.2 Synthesis of Findings

Descriptive patterns show substantive cross‑city heterogeneity in average prices and dispersion.

Bivariate insights indicate plausible relationships: higher amenity indices and better accessibility are associated with higher prices, and greater metro distance with lower prices.

Regression results:

Room type: Private rooms (-160.2) and shared rooms (-213.3) are priced below entire homes, supporting H1.

Quality and amenities: Cleanliness rating (10.6) and attr_index (0.287) have positive coefficients, but rest_index is negative (-0.033), so H2 is only partially supported.

Transit: The negative coefficient for metro_dist (-16.6) supports H3.

Day-type: business_day is not included in the regression, so the regression does not test H4.

Diagnostics: All VIFs are below 5. Given the strong right skew in price and the modest R² (0.152), a log-price specification is a natural robustness check.

Conclusion and Implications

5.1 Policy and Managerial Implications

Hosts: Leverage cleanliness and amenity proximity in listing descriptions; price premiums align with centrality and quality signals.

Platforms: Improve search relevance by incorporating accessibility and amenity scores.

Cities: Transit‑oriented development sustains tourism value; monitor spatial concentration to balance resident and visitor needs.

5.2 Limitations

Cross‑sectional snapshots may not capture seasonality or policy shifts.

Potential measurement error in amenity indices and distances.

Unobserved host behaviors (min‑stay, dynamic pricing) not modeled.

5.3 Recommendations for Future Research

Model log(price) with interaction terms (e.g., room type × city).

Incorporate temporal panels (monthly/seasonal) and policy dummies.

Explore mixed‑effects or spatial econometric models to better capture city‑ and neighborhood‑level dependence.
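As a sketch of the first recommendation, a log-price model with a room type × city interaction lets the room-type discount vary by city, and exp(b) - 1 converts log-point coefficients into approximate percentage differences. The example uses simulated data for two hypothetical cities.

```r
set.seed(10)
n <- 1500
# Hypothetical listings in two cities with a built-in room-type x city interaction
toy <- data.frame(
  Country   = sample(c("Athens", "Paris"), n, replace = TRUE),
  room_type = sample(c("Entire home/apt", "Private room"), n, replace = TRUE)
)
paris   <- toy$Country == "Paris"
private <- toy$room_type == "Private room"
toy$price <- exp(log(120) + 0.8 * paris - 0.5 * private - 0.2 * paris * private +
                 rnorm(n, sd = 0.3))

fit <- lm(log(price) ~ room_type * Country, data = toy)
b <- coef(fit)

# Approximate % price difference of a private room relative to an entire home
pct_athens <- unname(100 * (exp(b["room_typePrivate room"]) - 1))
pct_paris  <- unname(100 * (exp(b["room_typePrivate room"] +
                                b["room_typePrivate room:CountryParis"]) - 1))
round(c(Athens = pct_athens, Paris = pct_paris), 1)
```

The simulated discounts are about 39% in Athens and about 50% in Paris, which the estimates should recover approximately.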

Ethical Considerations and Reproducibility

All analyses use publicly obtained, non‑identifiable listing‑level summaries.

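Code for the principal steps is included for transparency and replication; some intermediate steps, such as combining the city files and fitting the regression model, are described in prose only.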

Where variable naming mismatches may exist (e.g., day_type vs business_day, city vs Country), users should align fields before execution.

5.4 Conclusion

This multi-city analysis suggests that room type, service quality, urban amenities, and transit accessibility are significantly associated with Airbnb prices, although the estimated model explains only about 15% of price variation. Spatial heterogeneity across European cities underscores the role of urban form and localized demand.

References

Dogru, T., & Pekin, O. (2017). What do guests value in Airbnb? An examination of online reviews. Journal of Hospitality Marketing & Management, 26(6), 665–686. https://doi.org/10.1080/19368623.2017.1306976

Guttentag, D. (2015). Airbnb: Disruptive innovation and the rise of an informal tourism accommodation sector. Current Issues in Tourism, 18(12), 1192–1217. https://doi.org/10.1080/13683500.2013.827159

Li, J., Moreno, A., & Zhang, D. J. (2019). Agent behavior in the sharing economy: Evidence from Airbnb. Management Science, 65(12), 5441–5468. https://doi.org/10.1287/mnsc.2018.3353

Teubner, T., Hawlitschek, F., & Dann, D. (2017). Price determinants on Airbnb: How reputation pays off in the sharing economy. Journal of Self-Governance and Management Economics, 5(4), 53–80. https://doi.org/10.22381/JSME5420173

Wachsmuth, D., & Weisler, A. (2018). Airbnb and the rent gap: Gentrification through the sharing economy. Environment and Planning A: Economy and Space, 50(6), 1147–1170. https://doi.org/10.1177/0308518X18778038

Zervas, G., Proserpio, D., & Byers, J. W. (2017). The rise of the sharing economy: Estimating the impact of Airbnb on the hotel industry. Journal of Marketing Research, 54(5), 687–705. https://doi.org/10.1509/jmr.15.0204