First we load the libraries and datasets that we are going to use. On the one hand we have the map of neighborhoods (**Barrios**); on the other, a dataframe of AMBA rental prices, which we filter so that it only contains units located in CABA (**PreciosAlqCap**). Then we load the shapefiles with the stations, taking care to convert them all to the same coordinate system.

Then we use the **ggplot2** package to make our first graph. With the **labs** function we add the title, subtitle and caption, and with **scale_colour_manual** we build the legend, which identifies the railway stations (blue), public bicycle stations (green), subway stations (red) and rental listings (dark grey).

ggplot() + 
  geom_sf(data = Barrios) +
  geom_sf(data = PreciosAlqCap, size=0.5,aes(color="darkgrey")) +
  geom_sf(data = LineasSubte, colour= "red") +
  geom_sf(data = EstSubtes, size = 0.8, aes(color="red")) +
  geom_sf(data = EstacionTren, aes(color="blue"))+
  geom_sf(data = EstacionBici, size = 0.8,aes(color="green"))+
  labs(title = "Transport Stations in Buenos Aires and Property Offer",
         subtitle = "Subway, Public Bikes and Train Stations", caption = "Source: BA Data")+
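  scale_colour_manual(name = 'Type',
         values = c('red'='red','blue'='blue','green'='green','darkgrey'='darkgrey'),
         # breaks fixes the legend order so each label lines up with its colour
         breaks = c('blue','green','red','darkgrey'),
         labels = c("Train Station","Public Bike","Subway","Property Offer"))+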
  theme_void()

Before proceeding with the proximity calculations, we do a final check to make sure that all our data is expressed in the same coordinate system (and in meters).

With the **st_distance** function we calculate the distance from each apartment to each of the subway stations, and keep the smallest value per apartment as the distance to its nearest station. Then, using the **mutate** function, we generate a new dataset called **datosPreciosAlqCap** that contains a new column (**ClosetoSubway**) telling us whether the apartment is within 500 meters of a subway station.

distanciasS <- st_distance(PreciosAlqCap, EstSubtes)
str(distanciasS)
##  Units: [m] num [1:15099, 1:90] 4253 4094 5231 5246 8155 ...
dim(distanciasS)
## [1] 15099    90
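# Distance (m) from each apartment to its nearest subway station
PreciosAlqCap <- PreciosAlqCap %>%
  mutate(distanciaSubte = apply(distanciasS, 1, min))

# Drop listings without a neighborhood (l3) and flag those within 500 m
datosPreciosAlqCap <- PreciosAlqCap %>%
  filter(!is.na(l3)) %>%
  mutate(ClosetoSubway = ifelse(distanciaSubte <= 500, TRUE, FALSE))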
ggplot() + 
  geom_sf(data = Barrios) +
  geom_sf(data = datosPreciosAlqCap, size=0.5, aes(color=ClosetoSubway))  +
  geom_sf(data = LineasSubte) +
  geom_sf(data = EstSubtes, size = 0.8, colour = "red")
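
The nearest-station step can be illustrated on a tiny hand-made matrix. This is only a sketch: the numbers are invented, and `dist_m` is a hypothetical stand-in for the matrix that **st_distance** returns (one row per apartment, one column per station).

```r
# Toy stand-in for the st_distance() result: 3 apartments x 2 stations, in metres.
# The values are invented for illustration only.
dist_m <- matrix(c(4253,  320,
                    480, 1500,
                    900,  650),
                 nrow = 3, byrow = TRUE)

# Row-wise minimum: distance from each apartment to its nearest station
nearest <- apply(dist_m, 1, min)

# Flag apartments within 500 m of a station
close_to_station <- nearest <= 500

print(nearest)           # 320 480 650
print(close_to_station)  # TRUE TRUE FALSE
```

Taking the minimum per row is what turns the full apartment-by-station matrix into a single "distance to the closest station" column.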

We do the same for the distance to the railway stations …

distanciasT <- st_distance(datosPreciosAlqCap, EstacionTren)
str(distanciasT)
##  Units: [m] num [1:15099, 1:47] 3511 3795 2221 2208 973 ...
dim(distanciasT)
## [1] 15099    47
# Distance (m) from each apartment to its nearest train station
datosPreciosAlqCap <- datosPreciosAlqCap %>%
  mutate(distanciaTren = apply(distanciasT, 1, min))

# Flag apartments within 500 m of a train station
datosPreciosAlqCap <- datosPreciosAlqCap %>%
  filter(!is.na(l3)) %>%
  mutate(ClosetoTrain = ifelse(distanciaTren <= 500, TRUE, FALSE))
ggplot() + 
  geom_sf(data = Barrios) +
  geom_sf(data = datosPreciosAlqCap, size=0.5, aes(color=ClosetoTrain))  +
  geom_sf(data = LineasSubte) +
  geom_sf(data = EstacionTren, size = 0.8, colour = "blue")

… and the same for the distance to the public bicycle stations, that is, the stations of the official EcoBici program. In this case, we consider that walking 500 meters just to unlock a bicycle hardly counts as being close, so here our definition of “near” becomes 200 meters, and this is the threshold we set in our **ifelse**.

distanciasB <- st_distance(datosPreciosAlqCap, EstacionBici)
str(distanciasB)
##  Units: [m] num [1:15099, 1:199] 3671 3946 2387 2469 4174 ...
dim(distanciasB)
## [1] 15099   199
# Distance (m) from each apartment to its nearest EcoBici station
datosPreciosAlqCap <- datosPreciosAlqCap %>%
  mutate(distanciaBici = apply(distanciasB, 1, min))

# Flag apartments within 200 m of a public bicycle station
datosPreciosAlqCap <- datosPreciosAlqCap %>%
  filter(!is.na(l3)) %>%
  mutate(ClosetoBike = ifelse(distanciaBici <= 200, TRUE, FALSE))
ggplot() + 
  geom_sf(data = Barrios) +
  geom_sf(data = datosPreciosAlqCap, size=0.5, aes(color=ClosetoBike))  +
  geom_sf(data = LineasSubte) +
  geom_sf(data = EstacionBici, size = 0.8, colour = "green")

Next, we filter the data to keep only apartments renting for less than 60,000 pesos a month and thus remove outliers: we assume that residents of many of the higher-income areas tend not to live near transit stations because they do not usually use public transport.

mean(datosPreciosAlqCap$price)
## [1] 25351.83
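
The filtering step described above is not shown in the code. Later blocks refer to a data frame named `datosPreciosAlqCapfiltrados`, which is presumably its result. As a minimal sketch of such a filter, on invented data and with hypothetical names (`rentals`, `rentals_filtered`), one could write:

```r
# Toy data frame; the prices are invented for illustration only.
rentals <- data.frame(id = 1:5,
                      price = c(18000, 25000, 59999, 60000, 150000))

# Keep only listings strictly below the 60,000-peso threshold described in the text.
# (In a dplyr pipeline the equivalent would be filter(price < 60000).)
rentals_filtered <- subset(rentals, price < 60000)

nrow(rentals)           # 5
nrow(rentals_filtered)  # 3
```

Note that a strict `<` drops a listing priced at exactly 60,000; whether the original analysis used `<` or `<=` is not stated.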

Now we do a linear regression to understand if there is a relationship between prices and the distance to a subway station.

From this regression we see that the two variables have a negative relationship: for each additional meter of distance to the subway, the rental price falls by 1.07 pesos. The intercept is 22,937 pesos, which is the rent the model predicts for an apartment 0 meters away from a subway station. If we graph it, we observe that the fitted line (the one that minimizes the sum of squared residuals, with that intercept and slope) slopes downward, and that there are many more apartments a short distance from the subway, but with a lot of variability in their prices. Based on R², the model explains less than 1 percent of the variance in price.
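
The regression call itself is not shown, so the following is only a sketch of how a simple regression of this kind is fitted and read in R. The data are invented and the names (`toy`, `dist_subway`, `fit`) are hypothetical; the numbers do not reproduce the results above.

```r
# Invented toy data: price falls by about 1 peso per metre, plus an alternating
# +/-500 peso perturbation so that the fit is not perfect.
dist_subway <- seq(0, 3000, by = 100)
price <- 23000 - 1 * dist_subway +
  rep(c(-500, 500), length.out = length(dist_subway))
toy <- data.frame(dist_subway, price)

# Simple linear regression of price on distance
fit <- lm(price ~ dist_subway, data = toy)

intercept <- coef(fit)[["(Intercept)"]]  # predicted price at 0 m
slope     <- coef(fit)[["dist_subway"]]  # change in price per extra metre
r2        <- summary(fit)$r.squared      # share of price variance explained

print(c(intercept = intercept, slope = slope, r2 = r2))
```

The three quantities extracted here are the ones discussed in the text: the intercept, the slope and R².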

We run a second regression that attempts to measure the relationship between the rental price of the units and their distance to a train station.

We see that they also have a negative relationship, with a somewhat steeper slope than for the subway: for each additional meter of distance to the train station, the rental price falls by 1.43 pesos. The intercept is 23,542 pesos, which is the rent the model predicts for an apartment 0 meters away from a train station. Based on R², the model explains only 0.5 percent of the variance in price.

Following the order of our development, we now make a third regression, between the rental price of apartments and their distance to a station in the EcoBici network.

We see that they have a negative relationship, with a flatter slope than for the subway and the train: for every additional meter of distance to the public bicycle station, the rental price falls by 0.71 pesos. The intercept is 23,072 pesos, which is the rent the model predicts for an apartment located 0 meters from a public bicycle station; it lies between the intercepts of models 1 and 2. Based on R², the model explains only 1.7 percent of the variance in price. We graph it as follows:

ggplot(datosPreciosAlqCapfiltrados) +
  geom_abline(slope = coef(regresion3)[2],intercept = coef(regresion3)[1]) +
  geom_point(aes(x=distanciaBici, y=price), size = 0.1) +
  labs(x='Distance to public bike station (m)', y = 'Price ($)') +
  theme(axis.title = element_text(size=10))

Now we do a multiple regression with the price as the dependent variable and, as independent variables, the distances to the nearest subway, train and public bicycle stations, the number of rooms, the covered surface area and the amenities indicator:

regresion_multiple1 <- lm(formula = price ~ distanciaSubte + distanciaTren + distanciaBici + rooms + surface_co + amenities,
                          data = datosPreciosAlqCapfiltrados)
summary(regresion_multiple1)
## 
## Call:
## lm(formula = price ~ distanciaSubte + distanciaTren + distanciaBici + 
##     rooms + surface_co + amenities, data = datosPreciosAlqCapfiltrados)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -85387  -3939   -957   2640  36809 
## 
## Coefficients:
##                  Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)    6948.72531  217.08175  32.010 < 0.0000000000000002 ***
## distanciaSubte    0.52573    0.10602   4.959      0.0000007176615 ***
## distanciaTren    -1.60028    0.12314 -12.996 < 0.0000000000000002 ***
## distanciaBici    -0.77352    0.05113 -15.128 < 0.0000000000000002 ***
## rooms           633.91677   93.85106   6.754      0.0000000000149 ***
## surface_co      265.09441    3.59063  73.830 < 0.0000000000000002 ***
## amenities1     5993.87725  135.89789  44.106 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6748 on 13886 degrees of freedom
## Multiple R-squared:  0.5419, Adjusted R-squared:  0.5417 
## F-statistic:  2738 on 6 and 13886 DF,  p-value: < 0.00000000000000022

R² increases, which indicates that this model fits better: it explains 54 percent of the variance in price. The number of rooms, the amenities indicator and the covered area (**surface_co**) are positively related to the rental price of the apartment. For example, one more room increases the rent by about 634 pesos, holding the other variables constant (note that without the covered-area variable in the model, the “influence” of each room would be greater). The coefficient on distance to the subway also went from -1.07 to 0.53 once these new variables were incorporated.
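
A coefficient can change sign like this when an omitted variable is correlated with the one being measured. The sketch below shows the mechanism on invented data, in which apartments far from the subway are assumed to be smaller on average; the names and numbers are hypothetical, and this does not establish that the same mechanism drives the rental data.

```r
# Invented toy data: surface shrinks with distance, and price depends
# positively on BOTH surface and distance.
dist_subway <- seq(0, 2000, by = 100)
surface <- 100 - 0.03 * dist_subway +
  rep(c(-5, 5, 0), length.out = length(dist_subway))
price <- 5000 + 250 * surface + 0.5 * dist_subway
toy <- data.frame(dist_subway, surface, price)

# Distance alone: its coefficient also absorbs the effect of the omitted surface
simple <- lm(price ~ dist_subway, data = toy)

# Distance and surface together: distance is now estimated holding surface fixed
multiple <- lm(price ~ dist_subway + surface, data = toy)

coef(simple)[["dist_subway"]]    # negative
coef(multiple)[["dist_subway"]]  # positive (0.5 by construction)
```

In the simple model the distance coefficient is a mix of the direct effect and the effect of everything correlated with distance. Adding the correlated variable separates the two.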

To account for spatial correlation, we add the neighborhood (**l3**) to the regression as a categorical variable, in order to reduce the bias from unobserved differences between neighborhoods.

regresion_conbarrios <- lm(formula = price ~  distanciaSubte + distanciaBici + distanciaTren + rooms + surface_co + amenities + l3,
data = datosPreciosAlqCapfiltrados, na.action=na.exclude)
summary(regresion_conbarrios)
## 
## Call:
## lm(formula = price ~ distanciaSubte + distanciaBici + distanciaTren + 
##     rooms + surface_co + amenities + l3, data = datosPreciosAlqCapfiltrados, 
##     na.action = na.exclude)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -74900  -3622   -765   2427  39456 
## 
## Coefficients:
##                          Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)             3964.7401   657.4749   6.030 0.000000001678737051 ***
## distanciaSubte             1.1438     0.1710   6.689 0.000000000023343895 ***
## distanciaBici             -0.3644     0.1231  -2.959             0.003092 ** 
## distanciaTren             -0.8358     0.1413  -5.913 0.000000003435115134 ***
## rooms                    900.4950    85.9490  10.477 < 0.0000000000000002 ***
## surface_co               241.6544     3.3251  72.676 < 0.0000000000000002 ***
## amenities1              4911.9321   126.8406  38.725 < 0.0000000000000002 ***
## l3Agronomía            -2110.4873  1971.5228  -1.070             0.284419    
## l3Almagro               1211.2771   681.1462   1.778             0.075378 .  
## l3Balvanera            -1597.8682   714.6627  -2.236             0.025378 *  
## l3Barracas             -2037.6794   908.4811  -2.243             0.024916 *  
## l3Barrio Norte          3501.0789   677.1635   5.170 0.000000237089956110 ***
## l3Belgrano              4464.7078   697.3694   6.402 0.000000000158084555 ***
## l3Boca                 -2916.7347  1065.8474  -2.737             0.006217 ** 
## l3Boedo                -1233.3600   841.7982  -1.465             0.142903    
## l3Caballito              153.6862   669.7270   0.229             0.818502    
## l3Catalinas              623.1325  4368.3214   0.143             0.886570    
## l3Centro / Microcentro  -632.8901   777.8190  -0.814             0.415846    
## l3Chacarita             -937.7134   924.9516  -1.014             0.310697    
## l3Coghlan               1131.9353   979.7258   1.155             0.247963    
## l3Colegiales            1116.1759   753.1215   1.482             0.138345    
## l3Congreso             -1321.3782   825.9250  -1.600             0.109649    
## l3Constitución         -3678.6226   952.1518  -3.863             0.000112 ***
## l3Flores               -1465.5035   784.1731  -1.869             0.061665 .  
## l3Floresta             -2849.3219   956.5638  -2.979             0.002900 ** 
## l3Las Cañitas           6279.2594   815.1815   7.703 0.000000000000014203 ***
## l3Liniers              -4735.8188  1322.2923  -3.582             0.000343 ***
## l3Mataderos            -2724.9118  1427.0103  -1.910             0.056215 .  
## l3Monserrat            -1564.5334   830.9302  -1.883             0.059739 .  
## l3Monte Castro         -3329.0195  1285.2488  -2.590             0.009603 ** 
## l3Nuñez                 4648.3380   828.6202   5.610 0.000000020652522533 ***
## l3Once                 -2320.5952   767.7696  -3.023             0.002511 ** 
## l3Palermo               5217.6118   648.9377   8.040 0.000000000000000969 ***
## l3Parque Avellaneda    -2206.0264  2323.7927  -0.949             0.342474    
## l3Parque Centenario      810.8586   977.9682   0.829             0.407048    
## l3Parque Chacabuco      -158.6450   920.9527  -0.172             0.863234    
## l3Parque Chas            180.8649  1461.3693   0.124             0.901504    
## l3Parque Patricios     -1459.1679  1048.6757  -1.391             0.164115    
## l3Paternal             -2443.3241   957.3157  -2.552             0.010713 *  
## l3Pompeya              -2331.5624  1578.7942  -1.477             0.139752    
## l3Puerto Madero        15618.4805   829.4538  18.830 < 0.0000000000000002 ***
## l3Recoleta              4866.5846   664.2877   7.326 0.000000000000250209 ***
## l3Retiro                3913.7513   736.4548   5.314 0.000000108722331346 ***
## l3Saavedra              1385.3172   972.9967   1.424             0.154537    
## l3San Cristobal         -328.7790   831.6930  -0.395             0.692618    
## l3San Nicolás           -163.8802   808.7793  -0.203             0.839430    
## l3San Telmo             -414.4537   748.7827  -0.554             0.579928    
## l3Tribunales           -1589.1469  1122.5900  -1.416             0.156913    
## l3Velez Sarsfield      -1438.3380  2041.7394  -0.704             0.481154    
## l3Versalles            -5261.0013  1709.9004  -3.077             0.002097 ** 
## l3Villa Crespo          1011.2204   678.2993   1.491             0.136032    
## l3Villa del Parque     -2925.5255   917.0362  -3.190             0.001425 ** 
## l3Villa Devoto         -1566.8516  1049.9173  -1.492             0.135628    
## l3Villa General Mitre  -2583.8236  1307.7199  -1.976             0.048195 *  
## l3Villa Lugano         -7503.3245  1356.2287  -5.532 0.000000032144775184 ***
## l3Villa Luro           -3447.9683  1238.7307  -2.783             0.005385 ** 
## l3Villa Ortuzar           -8.2323  1118.8689  -0.007             0.994130    
## l3Villa Pueyrredón      -866.4443  1035.6129  -0.837             0.402804    
## l3Villa Real           -1995.2748  2955.2285  -0.675             0.499581    
## l3Villa Riachuelo      -8279.2829  4466.7537  -1.854             0.063827 .  
## l3Villa Santa Rita      -925.7407  1483.3156  -0.624             0.532571    
## l3Villa Soldati        -6494.5015  3178.2953  -2.043             0.041033 *  
## l3Villa Urquiza         2145.2355   763.0355   2.811             0.004939 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6110 on 13830 degrees of freedom
## Multiple R-squared:  0.626,  Adjusted R-squared:  0.6243 
## F-statistic: 373.4 on 62 and 13830 DF,  p-value: < 0.00000000000000022
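
To read the **l3** rows above, it helps to recall how `lm` handles a categorical variable. With R's default treatment contrasts, one level (the first one) is used as the baseline and does not appear in the output, and every other coefficient is a difference from that baseline. The sketch below uses invented prices and only three neighborhoods; it assumes `l3` is treated as a factor, as the output suggests.

```r
# Invented toy data with a three-level neighborhood factor
toy <- data.frame(
  price = c(100, 110, 150, 160, 90, 80),
  l3    = factor(c("Almagro", "Almagro", "Palermo", "Palermo", "Boedo", "Boedo"))
)

fit <- lm(price ~ l3, data = toy)

levels(toy$l3)[1]  # baseline level: "Almagro" (first in sort order)
coef(fit)
# (Intercept) is the mean price in the baseline neighborhood (105);
# l3Boedo (-20) and l3Palermo (50) are differences from that baseline.
```

So a coefficient such as the one for Puerto Madero is the estimated price difference relative to the omitted baseline neighborhood, holding the other variables constant, and not an absolute price level.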