First we load the libraries and datasets that we are going to use. On the one hand, we have the map of neighborhoods (**Barrios**) and a dataframe of AMBA rental prices, which we filter so that it only contains units located in CABA (**PreciosAlqCap**). On the other, we load the shapefiles with the stations, taking care to convert them all to the same coordinate system.
Then we use the **ggplot2** package to make our first map. With the **labs** function we add the title, subtitle and caption, and with **scale_colour_manual** we build the legend, which distinguishes railway stations (blue), public bicycle stations (green), subway stations (red) and property listings (dark grey).
ggplot() +
  geom_sf(data = Barrios) +
  geom_sf(data = PreciosAlqCap, size = 0.5, aes(color = "darkgrey")) +
  geom_sf(data = LineasSubte, colour = "red") +
  geom_sf(data = EstSubtes, size = 0.8, aes(color = "red")) +
  geom_sf(data = EstacionTren, aes(color = "blue")) +
  geom_sf(data = EstacionBici, size = 0.8, aes(color = "green")) +
  labs(title = "Transport Stations in Buenos Aires and Property Offer",
       subtitle = "Subway, Public Bikes and Train Stations",
       caption = "Source: BA Data") +
  # breaks fixes the legend order so that each label matches its colour
  scale_colour_manual(name = "Type",
                      values = c(blue = "blue", green = "green", red = "red", darkgrey = "darkgrey"),
                      breaks = c("blue", "green", "red", "darkgrey"),
                      labels = c("Train Station", "Public Bike", "Subway", "Property Offer")) +
  theme_void()
Before proceeding with the proximity calculations, we do a final check to make sure that all our data is expressed in the same coordinate system (and in meters).
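A minimal sketch of such a check, assuming the **sf** package and two toy layers (`apartments` and `stations` are hypothetical stand-ins for the real datasets). The target EPSG:32721 (WGS 84 / UTM zone 21S) is our assumption of a metric projection that covers Buenos Aires; the original analysis may well use a different one.

```r
library(sf)

# Toy layers standing in for the real datasets (hypothetical data)
apartments <- st_as_sf(data.frame(id = 1:2,
                                  lon = c(-58.40, -58.45),
                                  lat = c(-34.60, -34.58)),
                       coords = c("lon", "lat"), crs = 4326)
stations <- st_as_sf(data.frame(id = 1, lon = -58.38, lat = -34.61),
                     coords = c("lon", "lat"), crs = 4326)

# Reproject both layers to the same metric CRS (assumed: UTM zone 21S)
apartments_m <- st_transform(apartments, 32721)
stations_m   <- st_transform(stations, 32721)

# Checks: both layers share a CRS, and that CRS is not geographic (degrees)
same_crs   <- st_crs(apartments_m) == st_crs(stations_m)
is_degrees <- st_is_longlat(apartments_m)
```

If `same_crs` is `TRUE` and `is_degrees` is `FALSE`, the layers are comparable and their coordinates are projected rather than expressed in degrees.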
With the help of the **st_distance** function we calculate the distance from each apartment to every subway station, and keep the smallest value per apartment as the distance to the nearest station. Then, using the **mutate** function, we generate a new dataset called **datosPreciosAlqCap** that contains a new column (**ClosetoSubway**) telling us whether the unit is within 500 meters of a subway station.
distanciasS <- st_distance(PreciosAlqCap, EstSubtes)
str(distanciasS)
## Units: [m] num [1:15099, 1:90] 4253 4094 5231 5246 8155 ...
dim(distanciasS)
## [1] 15099 90
# Distance from each apartment to its nearest subway station (row-wise minimum)
PreciosAlqCap <- PreciosAlqCap %>%
  mutate(distanciaSubte = apply(distanciasS, 1, min))
# Drop units without a neighborhood and flag those within 500 m of the subway
datosPreciosAlqCap <- PreciosAlqCap %>%
  filter(!is.na(l3)) %>%
  mutate(ClosetoSubway = ifelse(distanciaSubte <= 500, TRUE, FALSE))
ggplot() +
  geom_sf(data = Barrios) +
  geom_sf(data = datosPreciosAlqCap, size = 0.5, aes(color = ClosetoSubway)) +
  geom_sf(data = LineasSubte) +
  geom_sf(data = EstSubtes, size = 0.8, colour = "red")
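As an alternative to building the full distance matrix, the nearest-station distance can be sketched with `st_nearest_feature`, which returns, for each apartment, the index of the closest station. The toy coordinates below are hypothetical and are assumed to be in a projected CRS in meters (EPSG:32721 is our assumption, not necessarily the one used in the analysis).

```r
library(sf)

# Hypothetical points in a projected CRS (coordinates in meters)
apts <- st_as_sf(data.frame(x = c(370000, 372000), y = c(6170000, 6170000)),
                 coords = c("x", "y"), crs = 32721)
stns <- st_as_sf(data.frame(x = c(370300, 371000), y = c(6170300, 6170100)),
                 coords = c("x", "y"), crs = 32721)

# Index of the nearest station for each apartment
nearest_idx <- st_nearest_feature(apts, stns)

# Distance to that station only (one value per apartment, units dropped)
apts$dist_station <- as.numeric(
  st_distance(apts, stns[nearest_idx, ], by_element = TRUE)
)

# Same 500 m rule as in the text
apts$near_station <- apts$dist_station <= 500
```

With many apartments and stations this avoids storing one distance per apartment-station pair, at the cost of not keeping the distances to the other stations.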
We do the same for the distance to the railway stations …
distanciasT <- st_distance(datosPreciosAlqCap, EstacionTren)
str(distanciasT)
## Units: [m] num [1:15099, 1:47] 3511 3795 2221 2208 973 ...
dim(distanciasT)
## [1] 15099 47
# Distance from each apartment to its nearest train station (row-wise minimum)
datosPreciosAlqCap <- datosPreciosAlqCap %>%
  mutate(distanciaTren = apply(distanciasT, 1, min))
# Flag the units within 500 m of a train station
datosPreciosAlqCap <- datosPreciosAlqCap %>%
  filter(!is.na(l3)) %>%
  mutate(ClosetoTrain = ifelse(distanciaTren <= 500, TRUE, FALSE))
ggplot() +
  geom_sf(data = Barrios) +
  geom_sf(data = datosPreciosAlqCap, size = 0.5, aes(color = ClosetoTrain)) +
  geom_sf(data = LineasSubte) +
  geom_sf(data = EstacionTren, size = 0.8, colour = "blue")
… and the same for the distance to the public bicycle stations, that is, the stations of the official EcoBici program. In this case, we consider that having to walk 500 meters just to unlock a bicycle hardly counts as being close, so here our definition of “near” becomes 200 meters, and that is the threshold we set in our **ifelse**.
distanciasB <- st_distance(datosPreciosAlqCap, EstacionBici)
str(distanciasB)
## Units: [m] num [1:15099, 1:199] 3671 3946 2387 2469 4174 ...
dim(distanciasB)
## [1] 15099 199
# Distance from each apartment to its nearest EcoBici station (row-wise minimum)
datosPreciosAlqCap <- datosPreciosAlqCap %>%
  mutate(distanciaBici = apply(distanciasB, 1, min))
# Flag the units within 200 m of a public bicycle station
datosPreciosAlqCap <- datosPreciosAlqCap %>%
  filter(!is.na(l3)) %>%
  mutate(ClosetoBike = ifelse(distanciaBici <= 200, TRUE, FALSE))
ggplot() +
  geom_sf(data = Barrios) +
  geom_sf(data = datosPreciosAlqCap, size = 0.5, aes(color = ClosetoBike)) +
  geom_sf(data = LineasSubte) +
  geom_sf(data = EstacionBici, size = 0.8, colour = "green")
Next, we filter the data to keep only those apartments renting for less than 60,000 pesos a month and thus eliminate outliers. Our reasoning is that residents of many of the higher-income areas tend not to live near transfer stations, since they do not usually use public transport.
mean(datosPreciosAlqCap$price)
## [1] 25351.83
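The listing does not show the filtering step itself, so here is a minimal sketch of what it could look like, assuming **dplyr** and a `price` column in pesos. The toy data frame is hypothetical; with the real data, the result would presumably be stored as `datosPreciosAlqCapfiltrados`, the name used in the later regressions.

```r
library(dplyr)

# Hypothetical listings with monthly rent in pesos
listings <- data.frame(id = 1:5,
                       price = c(18000, 25000, 59999, 60000, 150000))

# Keep only units renting for less than 60,000 pesos a month
listings_filtered <- listings %>%
  filter(price < 60000)

# Mean rent after removing the expensive outliers
mean(listings_filtered$price)
```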
Now we run a linear regression to understand whether there is a relationship between prices and the distance to a subway station.
The fitted model shows that both variables have a negative relationship. For each additional meter of distance to the subway, the rental price falls by 1.07 pesos. The intercept is 22,937 pesos: this is what the model predicts for an apartment located 0 meters from a subway station. If we plot it, the least-squares line (the line that minimizes the sum of squared vertical distances to the points), defined by that intercept and slope, slopes downward, and we see that there are many more apartments a short distance from the subway, but with a lot of variability in their prices. Based on R², the model explains less than 1 percent of the variance in price.
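The regression call is not shown in the listing. Below is a minimal sketch of how such a model could be fitted and read with base R's `lm`, using simulated data (the data frame `sim` and its parameters are made up for illustration and do not reproduce the figures above).

```r
set.seed(1)

# Simulated data: price falls about 1 peso per meter, with a lot of noise
n <- 2000
sim <- data.frame(distanciaSubte = runif(n, 0, 3000))
sim$price <- 23000 - 1 * sim$distanciaSubte + rnorm(n, sd = 8000)

fit <- lm(price ~ distanciaSubte, data = sim)

intercept <- coef(fit)[["(Intercept)"]]     # predicted price at 0 meters
slope     <- coef(fit)[["distanciaSubte"]]  # change in price per extra meter
r2        <- summary(fit)$r.squared         # share of price variance explained

# Predicted price for a unit 500 m from a station
pred_500 <- predict(fit, newdata = data.frame(distanciaSubte = 500))
```

A small R² next to a negative slope, as in this simulation, means the downward trend holds on average but says little about the price of any individual listing.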
We run a second regression that attempts to measure the relationship between the rental price of the units and their distance to a train station.
We see that they also have a negative relationship. For each additional meter of distance to the train station, the rental price falls by 1.43 pesos, a steeper slope than the 1.07 pesos per meter estimated for the subway. The intercept is 23,542 pesos: this is what the model predicts for an apartment 0 meters from a train station. Based on R², the model explains only 0.5 percent of the variance in price.
Following the same order, we now run a third regression, between the rental price of apartments and their distance to a station in the EcoBici network.
We see that they have a negative relationship, with a flatter slope than in the subway and train models. For every additional meter of distance to a public bicycle station, the rental price falls by 0.71 pesos. The intercept is 23,072 pesos: this is what the model predicts for an apartment located 0 meters from a public bicycle station, a value that lies between the intercepts of models 1 and 2. Based on R², the model explains only 1.7 percent of the variance in price. We plot it as follows:
ggplot(datosPreciosAlqCapfiltrados) +
  geom_abline(slope = coef(regresion3)[2], intercept = coef(regresion3)[1]) +
  geom_point(aes(x = distanciaBici, y = price), size = 0.1) +
  labs(x = "Distance to public bike station (m)", y = "Price ($)") +
  theme(axis.title = element_text(size = 10))
Now we run a multiple regression with the price as the dependent variable and, as independent variables, the distances to the nearest subway, train and public bicycle stations, the number of rooms, the covered surface area and the amenities variable:
regresion_multiple1 <- lm(formula = price ~ distanciaSubte + distanciaTren + distanciaBici + rooms + surface_co + amenities,
                          data = datosPreciosAlqCapfiltrados)
summary(regresion_multiple1)
##
## Call:
## lm(formula = price ~ distanciaSubte + distanciaTren + distanciaBici +
## rooms + surface_co + amenities, data = datosPreciosAlqCapfiltrados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -85387 -3939 -957 2640 36809
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6948.72531 217.08175 32.010 < 0.0000000000000002 ***
## distanciaSubte 0.52573 0.10602 4.959 0.0000007176615 ***
## distanciaTren -1.60028 0.12314 -12.996 < 0.0000000000000002 ***
## distanciaBici -0.77352 0.05113 -15.128 < 0.0000000000000002 ***
## rooms 633.91677 93.85106 6.754 0.0000000000149 ***
## surface_co 265.09441 3.59063 73.830 < 0.0000000000000002 ***
## amenities1 5993.87725 135.89789 44.106 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6748 on 13886 degrees of freedom
## Multiple R-squared: 0.5419, Adjusted R-squared: 0.5417
## F-statistic: 2738 on 6 and 13886 DF, p-value: < 0.00000000000000022
We see that R² increases, which indicates that this model fits better: it explains 54 percent of the variance in price. The number of rooms, the amenities and the covered area (**surface_co**) are positively related to the rental price of the apartment. For example, one more room raises the rent by about 634 pesos, holding the other variables constant (note that if the covered area were not included in the model, the “influence” of each room would be larger). You can also see how the coefficient of distance to the subway went from -1.07 to 0.53 when these new variables were incorporated, so its sign changed.
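To account for spatial correlation, we add the neighborhood of each unit (**l3**) to the regression as a categorical variable, in order to reduce the bias caused by differences between neighborhoods that the other variables do not capture.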
regresion_conbarrios <- lm(formula = price ~ distanciaSubte + distanciaBici + distanciaTren + rooms + surface_co + amenities + l3,
data = datosPreciosAlqCapfiltrados, na.action=na.exclude)
summary(regresion_conbarrios)
##
## Call:
## lm(formula = price ~ distanciaSubte + distanciaBici + distanciaTren +
## rooms + surface_co + amenities + l3, data = datosPreciosAlqCapfiltrados,
## na.action = na.exclude)
##
## Residuals:
## Min 1Q Median 3Q Max
## -74900 -3622 -765 2427 39456
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3964.7401 657.4749 6.030 0.000000001678737051 ***
## distanciaSubte 1.1438 0.1710 6.689 0.000000000023343895 ***
## distanciaBici -0.3644 0.1231 -2.959 0.003092 **
## distanciaTren -0.8358 0.1413 -5.913 0.000000003435115134 ***
## rooms 900.4950 85.9490 10.477 < 0.0000000000000002 ***
## surface_co 241.6544 3.3251 72.676 < 0.0000000000000002 ***
## amenities1 4911.9321 126.8406 38.725 < 0.0000000000000002 ***
## l3Agronomía -2110.4873 1971.5228 -1.070 0.284419
## l3Almagro 1211.2771 681.1462 1.778 0.075378 .
## l3Balvanera -1597.8682 714.6627 -2.236 0.025378 *
## l3Barracas -2037.6794 908.4811 -2.243 0.024916 *
## l3Barrio Norte 3501.0789 677.1635 5.170 0.000000237089956110 ***
## l3Belgrano 4464.7078 697.3694 6.402 0.000000000158084555 ***
## l3Boca -2916.7347 1065.8474 -2.737 0.006217 **
## l3Boedo -1233.3600 841.7982 -1.465 0.142903
## l3Caballito 153.6862 669.7270 0.229 0.818502
## l3Catalinas 623.1325 4368.3214 0.143 0.886570
## l3Centro / Microcentro -632.8901 777.8190 -0.814 0.415846
## l3Chacarita -937.7134 924.9516 -1.014 0.310697
## l3Coghlan 1131.9353 979.7258 1.155 0.247963
## l3Colegiales 1116.1759 753.1215 1.482 0.138345
## l3Congreso -1321.3782 825.9250 -1.600 0.109649
## l3Constitución -3678.6226 952.1518 -3.863 0.000112 ***
## l3Flores -1465.5035 784.1731 -1.869 0.061665 .
## l3Floresta -2849.3219 956.5638 -2.979 0.002900 **
## l3Las Cañitas 6279.2594 815.1815 7.703 0.000000000000014203 ***
## l3Liniers -4735.8188 1322.2923 -3.582 0.000343 ***
## l3Mataderos -2724.9118 1427.0103 -1.910 0.056215 .
## l3Monserrat -1564.5334 830.9302 -1.883 0.059739 .
## l3Monte Castro -3329.0195 1285.2488 -2.590 0.009603 **
## l3Nuñez 4648.3380 828.6202 5.610 0.000000020652522533 ***
## l3Once -2320.5952 767.7696 -3.023 0.002511 **
## l3Palermo 5217.6118 648.9377 8.040 0.000000000000000969 ***
## l3Parque Avellaneda -2206.0264 2323.7927 -0.949 0.342474
## l3Parque Centenario 810.8586 977.9682 0.829 0.407048
## l3Parque Chacabuco -158.6450 920.9527 -0.172 0.863234
## l3Parque Chas 180.8649 1461.3693 0.124 0.901504
## l3Parque Patricios -1459.1679 1048.6757 -1.391 0.164115
## l3Paternal -2443.3241 957.3157 -2.552 0.010713 *
## l3Pompeya -2331.5624 1578.7942 -1.477 0.139752
## l3Puerto Madero 15618.4805 829.4538 18.830 < 0.0000000000000002 ***
## l3Recoleta 4866.5846 664.2877 7.326 0.000000000000250209 ***
## l3Retiro 3913.7513 736.4548 5.314 0.000000108722331346 ***
## l3Saavedra 1385.3172 972.9967 1.424 0.154537
## l3San Cristobal -328.7790 831.6930 -0.395 0.692618
## l3San Nicolás -163.8802 808.7793 -0.203 0.839430
## l3San Telmo -414.4537 748.7827 -0.554 0.579928
## l3Tribunales -1589.1469 1122.5900 -1.416 0.156913
## l3Velez Sarsfield -1438.3380 2041.7394 -0.704 0.481154
## l3Versalles -5261.0013 1709.9004 -3.077 0.002097 **
## l3Villa Crespo 1011.2204 678.2993 1.491 0.136032
## l3Villa del Parque -2925.5255 917.0362 -3.190 0.001425 **
## l3Villa Devoto -1566.8516 1049.9173 -1.492 0.135628
## l3Villa General Mitre -2583.8236 1307.7199 -1.976 0.048195 *
## l3Villa Lugano -7503.3245 1356.2287 -5.532 0.000000032144775184 ***
## l3Villa Luro -3447.9683 1238.7307 -2.783 0.005385 **
## l3Villa Ortuzar -8.2323 1118.8689 -0.007 0.994130
## l3Villa Pueyrredón -866.4443 1035.6129 -0.837 0.402804
## l3Villa Real -1995.2748 2955.2285 -0.675 0.499581
## l3Villa Riachuelo -8279.2829 4466.7537 -1.854 0.063827 .
## l3Villa Santa Rita -925.7407 1483.3156 -0.624 0.532571
## l3Villa Soldati -6494.5015 3178.2953 -2.043 0.041033 *
## l3Villa Urquiza 2145.2355 763.0355 2.811 0.004939 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6110 on 13830 degrees of freedom
## Multiple R-squared: 0.626, Adjusted R-squared: 0.6243
## F-statistic: 373.4 on 62 and 13830 DF, p-value: < 0.00000000000000022
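To help read the `l3` rows above: when a categorical variable enters `lm`, R by default expands it into one indicator column per level except the first, which becomes the reference level absorbed by the intercept. Each neighborhood coefficient is therefore a difference relative to that reference. A small sketch with a hypothetical three-level factor (the data are made up for illustration):

```r
# Hypothetical listings in three neighborhoods
toy <- data.frame(
  price  = c(20000, 22000, 30000, 31000, 27000, 26000),
  barrio = factor(c("Almagro", "Almagro", "Palermo", "Palermo",
                    "Recoleta", "Recoleta"))
)

# Design matrix that lm() builds internally: the first level gets no column
X <- model.matrix(price ~ barrio, data = toy)
colnames(X)

# Intercept = mean price in the reference level; others = differences from it
fit <- lm(price ~ barrio, data = toy)
coef(fit)
```

Here the intercept equals the mean price in the reference neighborhood (Almagro), and the `barrioPalermo` and `barrioRecoleta` coefficients are the gaps between each neighborhood's mean price and that reference.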