This study analyses the European housing market at the request of a real-estate company that wants to understand what drives property prices. The company operates in several cities and has provided data on housing characteristics (such as median floor area and number of rooms) and on economic conditions (such as GDP per capita and the unemployment rate).
The main objective is to identify the variables that significantly influence the median property price (Price_Median) and to quantify their impact with a multiple linear regression model. The methodology has four steps: (i) an exploratory data analysis (descriptive statistics and a correlation matrix), (ii) the specification and estimation of a regression model, (iii) verification of the Ordinary Least Squares (OLS) assumptions, and (iv) interpretation of the results to derive recommendations.
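For reference, step (ii) estimates the coefficients by OLS, which minimises the sum of squared residuals and has the closed form beta_hat = (X'X)^(-1) X'y, where X holds the regressors plus a column of ones for the intercept and y is the vector of prices. The sketch below is a minimal base-R check on simulated data that this formula reproduces the coefficients returned by lm(); the variables x1 and x2 and the true coefficients are illustrative assumptions, unrelated to the housing dataset.

```r
# Minimal sketch: OLS closed form versus lm() on simulated data.
# All data here are synthetic and purely illustrative.
set.seed(123)
n  <- 50
x1 <- rnorm(n, mean = 70, sd = 15)   # illustrative regressor
x2 <- rnorm(n, mean = 8,  sd = 3)    # illustrative regressor
y  <- 500 + 20 * x1 - 40 * x2 + rnorm(n, sd = 100)

# Design matrix with an intercept column
X <- cbind(1, x1, x2)

# beta_hat = (X'X)^{-1} X'y, solved as a linear system instead of inverting X'X
beta_closed <- solve(crossprod(X), crossprod(X, y))

# The same model estimated with lm(), which uses a QR decomposition internally
fit <- lm(y ~ x1 + x2)

# Both approaches should agree up to floating-point error
print(cbind(closed_form = drop(beta_closed), lm = coef(fit)))
```

Solving the normal equations with solve(A, b) avoids forming an explicit inverse; in practice lm() is preferable because its QR-based fit is numerically more stable when regressors are nearly collinear.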
The report is structured as follows. After this introduction, an a priori analysis sets out the theoretical expectations. An exploratory and descriptive statistical analysis follows, then the modelling and the interpretation of the results. Finally, the model assumptions are verified and recommendations are offered to the company.
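Verifying the OLS assumptions usually covers at least residual normality, homoskedasticity and the absence of severe multicollinearity. The following is a minimal base-R sketch of those three checks on simulated data, not the report's own procedure: a Shapiro-Wilk test via shapiro.test(), a studentised (Koenker) Breusch-Pagan statistic computed by hand as n times the R-squared from regressing the squared residuals on the regressors, and variance inflation factors VIF_j = 1 / (1 - R2_j). The objects x1, x2, dat and vif are hypothetical.

```r
# Minimal sketch of common OLS diagnostics in base R, on simulated data.
set.seed(42)
n   <- 50
x1  <- rnorm(n)
x2  <- 0.8 * x1 + rnorm(n, sd = 0.6)   # deliberately correlated with x1
y   <- 1 + 2 * x1 - x2 + rnorm(n)
dat <- data.frame(y, x1, x2)
fit <- lm(y ~ x1 + x2, data = dat)

# 1) Normality of residuals: Shapiro-Wilk test
sw <- shapiro.test(residuals(fit))

# 2) Heteroskedasticity: studentised Breusch-Pagan (Koenker) statistic,
#    n * R^2 from regressing the squared residuals on the regressors,
#    compared with a chi-squared distribution with df = number of regressors
e2   <- residuals(fit)^2
aux  <- lm(e2 ~ x1 + x2, data = dat)
bp   <- n * summary(aux)$r.squared
bp_p <- pchisq(bp, df = 2, lower.tail = FALSE)

# 3) Multicollinearity: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
#    regressing regressor j on the remaining regressors
vif <- c(
  x1 = 1 / (1 - summary(lm(x1 ~ x2, data = dat))$r.squared),
  x2 = 1 / (1 - summary(lm(x2 ~ x1, data = dat))$r.squared)
)

print(list(shapiro_p = sw$p.value, bp_stat = bp, bp_p = bp_p, vif = vif))
```

Small p-values in the first two tests would point to non-normal or heteroskedastic errors; VIF values well above common rules of thumb (often 5 or 10) would flag collinearity, which is worth checking in this model because floor area and number of rooms are likely to move together.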
Drawing on economic theory and the existing literature on the housing market, we can state expectations about the relationship between the median property price (Price_Median) and the independent variables considered in the model.
First, the physical characteristics of the dwellings are expected to have a positive and significant effect on price. Specifically, theory suggests that median floor area (Area_Median) and the median number of rooms (Room_Median) are positively related to price, because more floor space and more rooms increase the dwelling's utility and therefore its market value.
The demographic variables are less clear-cut. The effect of population density (Density) is ambiguous: on the one hand, higher density may signal stronger housing demand in urban areas, pushing prices up; on the other, very high density may be associated with congestion and negative externalities that weigh on prices. Total population (Population), by contrast, is expected to be positively related to price, since a larger population implies greater potential demand.
Among the macroeconomic variables, GDP per capita (GDP_PC) measures the income level of the region. A positive relationship is expected, because higher income allows households to devote more resources to housing. The unemployment rate (URate) is expected to have a negative effect on prices, since higher unemployment reduces households' purchasing power. Finally, the mortgage interest rate (MIR2010) is also expected to have a negative effect, because higher rates raise the cost of credit and reduce housing demand.
In summary, the variables describing the dwelling (area and rooms) and the income level (GDP per capita) are expected to have a positive effect, while those reflecting adverse economic conditions (unemployment and interest rates) are expected to have a negative effect. Expectations for the demographic variables are less clear, so their relationship with prices must be established empirically.
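Combining these expectations with the six regressors that are finally retained for estimation (listed at the end of this section), the model can be written as follows. The inequalities are the a priori signs argued above, not estimated results; Density is discussed here but is not among the retained regressors.

```latex
\begin{aligned}
\text{Price\_Median}_i = {}& \beta_0 + \beta_1\,\text{Area\_Median}_i + \beta_2\,\text{Room\_Median}_i + \beta_3\,\text{Population}_i \\
& + \beta_4\,\text{GDP\_PC}_i + \beta_5\,\text{URate}_i + \beta_6\,\text{MIR2010}_i + \varepsilon_i, \qquad i = 1, \dots, n \\[4pt]
& \beta_1 > 0,\quad \beta_2 > 0,\quad \beta_3 > 0,\quad \beta_4 > 0,\quad \beta_5 < 0,\quad \beta_6 < 0
\end{aligned}
```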
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)  # plotting
library(GGally)
library(Hmisc)
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, units
library(corrplot)
## corrplot 0.95 loaded
library(readxl)
# Load the Excel file (adjust the path to the file's location on your machine)
base_taller <- read_excel("C:/Users/carab/Downloads/Rosi_files/dp2015-13_Dataset.xls")
# Open the table in a viewer tab (only meaningful in interactive sessions)
if (interactive()) View(base_taller)
head(base_taller)
## # A tibble: 6 × 184
## City City_Eng City_Short NAds Price_Median Price_Mean Area_Median Area_Mean
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Amste… Amsterd… AMS 9520 3420. 3426. 71 75.3
## 2 Athina Athens ATH 10782 2064. 2109. 76 77.2
## 3 Barce… Barcelo… BCN 5479 3140 3268. 77 80.8
## 4 Beogr… Belgrade BEG 12797 1417. 1466. 58 60.2
## 5 Berlin Berlin BER 16772 2150. 2314. 75 82.3
## 6 Bruxe… Brussels BRU 7879 2357. 2428. 95 98.4
## # ℹ 176 more variables: Room_Median <dbl>, Room_Mean <dbl>, Euro_area <dbl>,
## # EU <dbl>, Population <dbl>, City_Area <dbl>, Density <dbl>, GDP_PC <dbl>,
## # GDP_PC_PPS <dbl>, GDP_PC2008 <dbl>, GDP_PC2009 <dbl>, GDP_PC2010 <dbl>,
## # Gini <dbl>, HOR <dbl>, Kearny_GCI2010 <dbl>, LRIR <dbl>,
## # Inflation2010 <dbl>, Inflation2011 <dbl>, URate <dbl>, MIR2009 <dbl>,
## # MIR2010 <dbl>, Mortgage_PC2010 <dbl>, Tppl1989_1993 <dbl>,
## # Tppl1994_1998 <dbl>, Tppl1999_2002 <dbl>, Tppl2003_2006 <dbl>, …
tail(base_taller)
## # A tibble: 6 × 184
## City City_Eng City_Short NAds Price_Median Price_Mean Area_Median Area_Mean
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Tall… Tallinn TAL 8876 1062. 1059. 52.6 54.1
## 2 Tori… Turin TUR 8525 2225 2321. 75 80.5
## 3 Vale… Valencia VAL 14537 1741. 1849. 93 96.5
## 4 Viln… Vilnius VIL 2794 1251. 1289. 58 59.1
## 5 Wars… Warsaw WAW 155154 1934. 1976. 54 56.7
## 6 Wien Vienna VIE 10370 3571. 3657. 87 93.0
## # ℹ 176 more variables: Room_Median <dbl>, Room_Mean <dbl>, Euro_area <dbl>,
## # EU <dbl>, Population <dbl>, City_Area <dbl>, Density <dbl>, GDP_PC <dbl>,
## # GDP_PC_PPS <dbl>, GDP_PC2008 <dbl>, GDP_PC2009 <dbl>, GDP_PC2010 <dbl>,
## # Gini <dbl>, HOR <dbl>, Kearny_GCI2010 <dbl>, LRIR <dbl>,
## # Inflation2010 <dbl>, Inflation2011 <dbl>, URate <dbl>, MIR2009 <dbl>,
## # MIR2010 <dbl>, Mortgage_PC2010 <dbl>, Tppl1989_1993 <dbl>,
## # Tppl1994_1998 <dbl>, Tppl1999_2002 <dbl>, Tppl2003_2006 <dbl>, …
summary(base_taller)
## City City_Eng City_Short NAds
## Length:50 Length:50 Length:50 Min. : 576
## Class :character Class :character Class :character 1st Qu.: 4932
## Mode :character Mode :character Mode :character Median : 9694
## Mean : 17924
## 3rd Qu.: 15981
## Max. :155154
##
## Price_Median Price_Mean Area_Median Area_Mean
## Min. : 503.1 Min. : 534.7 Min. : 47.00 Min. : 51.12
## 1st Qu.:1305.3 1st Qu.:1322.1 1st Qu.: 55.03 1st Qu.: 59.10
## Median :2107.2 Median :2200.7 Median : 65.50 Median : 68.07
## Mean :2436.7 Mean :2535.0 Mean : 68.56 Mean : 72.14
## 3rd Qu.:3092.2 3rd Qu.:3176.5 3rd Qu.: 78.50 3rd Qu.: 82.63
## Max. :8590.9 Max. :8865.7 Max. :100.00 Max. :109.49
##
## Room_Median Room_Mean Euro_area EU Population
## Min. :2.00 Min. :1.805 Min. :0.0 Min. :0.0 Min. : 401389
## 1st Qu.:2.00 1st Qu.:2.172 1st Qu.:0.0 1st Qu.:0.0 1st Qu.: 799132
## Median :2.00 Median :2.476 Median :0.5 Median :1.0 Median : 1148799
## Mean :2.47 Mean :2.515 Mean :0.5 Mean :0.7 Mean : 1872288
## 3rd Qu.:3.00 3rd Qu.:2.832 3rd Qu.:1.0 3rd Qu.:1.0 3rd Qu.: 1707910
## Max. :3.00 Max. :3.394 Max. :1.0 Max. :1.0 Max. :12915158
##
## City_Area Density GDP_PC GDP_PC_PPS
## Min. : 84.8 Min. : 1315 Min. : 2535 Min. : 4696
## 1st Qu.: 169.9 1st Qu.: 2571 1st Qu.:11862 1st Qu.:15047
## Median : 375.2 Median : 3297 Median :29550 Median :30400
## Mean : 523.4 Mean : 4552 Mean :31026 Mean :31335
## 3rd Qu.: 500.0 3rd Qu.: 5551 3rd Qu.:42750 3rd Qu.:42750
## Max. :5512.0 Max. :20618 Max. :86464 Max. :76200
##
## GDP_PC2008 GDP_PC2009 GDP_PC2010 Gini
## Min. : 2535 Min. : 1872 Min. : 2301 Min. :23.60
## 1st Qu.:10537 1st Qu.: 9384 1st Qu.:10685 1st Qu.:28.35
## Median :25848 Median :24680 Median :25899 Median :31.85
## Mean :31226 Mean :28888 Mean :30422 Mean :32.89
## 3rd Qu.:47699 3rd Qu.:43128 3rd Qu.:48444 3rd Qu.:36.40
## Max. :88300 Max. :84421 Max. :85861 Max. :52.10
##
## HOR Kearny_GCI2010 LRIR Inflation2010
## Min. :12.80 Min. :0.0000 Min. : 0.690 Min. :-1.600
## 1st Qu.:30.02 1st Qu.:0.0000 1st Qu.: 3.770 1st Qu.: 1.450
## Median :66.15 Median :0.0000 Median : 4.515 Median : 2.000
## Mean :58.74 Mean :0.9976 Mean : 6.525 Mean : 3.885
## 3rd Qu.:83.25 3rd Qu.:2.3050 3rd Qu.:10.706 3rd Qu.: 8.400
## Max. :97.42 Max. :5.8600 Max. :15.970 Max. :10.500
##
## Inflation2011 URate MIR2009 MIR2010
## Min. : 1.200 Min. : 0.300 Min. : 1.260 Min. : 0.920
## 1st Qu.: 2.500 1st Qu.: 5.802 1st Qu.: 3.277 1st Qu.: 2.993
## Median : 3.100 Median : 7.990 Median : 4.360 Median : 3.815
## Mean : 4.816 Mean : 9.446 Mean : 7.457 Mean : 7.026
## 3rd Qu.: 5.275 3rd Qu.:11.908 3rd Qu.: 9.352 3rd Qu.: 8.615
## Max. :53.200 Max. :23.124 Max. :26.000 Max. :26.200
## NA's :6 NA's :2 NA's :2
## Mortgage_PC2010 Tppl1989_1993 Tppl1994_1998 Tppl1999_2002
## Min. : 0.190 Min. : 464773 Min. : 421249 Min. : 399685
## 1st Qu.: 0.395 1st Qu.: 702444 1st Qu.: 704303 1st Qu.: 722104
## Median : 6.475 Median :1067365 Median : 964346 Median : 993134
## Mean :10.195 Mean :1450095 Mean :1393183 Mean :1570878
## 3rd Qu.:14.090 3rd Qu.:1655272 3rd Qu.:1611954 3rd Qu.:1698320
## Max. :45.160 Max. :6829300 Max. :6901300 Max. :8803468
## NA's :2 NA's :17 NA's :19 NA's :14
## Tppl2003_2006 Tppl2007_2009 GDP_PC_PPS1989_1993 GDP_PC_PPS1994_1998
## Min. : 392306 Min. : 401389 Mode:logical Min. : 6900
## 1st Qu.: 689874 1st Qu.: 704162 NA's:50 1st Qu.:13675
## Median :1000174 Median :1021956 Median :20700
## Mean :1363199 Mean :1467902 Mean :23215
## 3rd Qu.:1622183 3rd Qu.:1695450 3rd Qu.:29500
## Max. :7413100 Max. :7668300 Max. :54300
## NA's :14 NA's :20 NA's :16
## GDP_PC_PPS1999_2002 GDP_PC_PPS2003_2006 GDP_PC_PPS2007_2009 CITIES
## Min. :10600 Min. :13900 Min. :16000 Length:50
## 1st Qu.:21175 1st Qu.:22700 1st Qu.:26550 Class :character
## Median :27800 Median :31100 Median :34300 Mode :character
## Mean :30556 Mean :33851 Mean :39020
## 3rd Qu.:37575 3rd Qu.:43300 3rd Qu.:48450
## Max. :65800 Max. :70100 Max. :76200
## NA's :16 NA's :15 NA's :15
## DemoDepend1989_1993 DemoDepend1994_1998 DemoDepend1999_2002
## Min. :48.10 Min. :49.40 Min. :48.40
## 1st Qu.:53.83 1st Qu.:53.40 1st Qu.:54.40
## Median :61.30 Median :61.00 Median :57.00
## Mean :60.58 Mean :59.36 Mean :57.36
## 3rd Qu.:66.72 3rd Qu.:63.80 3rd Qu.:59.30
## Max. :72.40 Max. :68.10 Max. :71.50
## NA's :22 NA's :25 NA's :19
## DemoDepend2003_2006 DemoDepend2007_2009 DemoODepend1989_1993
## Min. :48.20 Min. :45.10 Min. :11.60
## 1st Qu.:52.30 1st Qu.:51.85 1st Qu.:21.00
## Median :55.50 Median :55.40 Median :24.30
## Mean :57.08 Mean :56.45 Mean :24.02
## 3rd Qu.:59.90 3rd Qu.:59.80 3rd Qu.:26.30
## Max. :73.10 Max. :73.90 Max. :35.20
## NA's :17 NA's :23 NA's :17
## DemoODepend1994_1998 DemoODepend1999_2002 DemoODepend2003_2006
## Min. :19.90 Min. :17.80 Min. : 8.00
## 1st Qu.:22.10 1st Qu.:23.05 1st Qu.:22.30
## Median :24.90 Median :25.30 Median :25.60
## Mean :24.97 Mean :25.87 Mean :25.63
## 3rd Qu.:26.50 3rd Qu.:27.60 3rd Qu.:27.90
## Max. :33.80 Max. :39.70 Max. :41.20
## NA's :21 NA's :14 NA's :13
## DemoODepend2007_2009 Thh1989_1993 Thh1994_1998 Thh1999_2002
## Min. :16.50 Min. : 181700 Min. : 172189 Min. : 173215
## 1st Qu.:23.60 1st Qu.: 321757 1st Qu.: 300098 1st Qu.: 301566
## Median :27.25 Median : 468546 Median : 501000 Median : 459053
## Mean :27.29 Mean : 640428 Mean : 677489 Mean : 650780
## 3rd Qu.:29.25 3rd Qu.: 745727 3rd Qu.: 784475 3rd Qu.: 757578
## Max. :42.10 Max. :2841000 Max. :3002000 Max. :3015997
## NA's :20 NA's :20 NA's :27 NA's :13
## Thh2003_2006 Thh2007_2009 Ndwe1989_1993 Ndwe1994_1998
## Min. : 169913 Min. : 179823 Min. : 245810 Min. : 172218
## 1st Qu.: 315000 1st Qu.: 311600 1st Qu.: 336712 1st Qu.: 318326
## Median : 445413 Median : 441678 Median : 640641 Median : 483800
## Mean : 740042 Mean : 709285 Mean : 675929 Mean : 656970
## 3rd Qu.: 861892 3rd Qu.: 778750 3rd Qu.: 826938 3rd Qu.: 756840
## Max. :3112000 Max. :3243000 Max. :1734320 Max. :1792443
## NA's :23 NA's :27 NA's :35 NA's :43
## Ndwe1999_2002 Ndwe2003_2006 Ndwe2007_2009 Napart1989_1993
## Min. : 172753 Min. : 175200 Min. : 180500 Min. : 216902
## 1st Qu.: 318722 1st Qu.: 309745 1st Qu.: 300026 1st Qu.: 293286
## Median : 459753 Median : 473332 Median : 398317 Median : 493569
## Mean : 714348 Mean : 761303 Mean : 575072 Mean : 583210
## 3rd Qu.: 779094 3rd Qu.: 827429 3rd Qu.: 743266 3rd Qu.: 783494
## Max. :3401080 Max. :3414094 Max. :1890837 Max. :1128801
## NA's :13 NA's :19 NA's :29 NA's :46
## Napart1994_1998 Napart1999_2002 Napart2003_2006 Napart2007_2009
## Mode:logical Min. : 50230 Min. : 52280 Min. : 165200
## NA's:50 1st Qu.: 278208 1st Qu.: 287361 1st Qu.: 270472
## Median : 364509 Median : 371303 Median : 310492
## Mean : 533246 Mean : 542447 Mean : 579920
## 3rd Qu.: 651279 3rd Qu.: 660014 3rd Qu.: 663682
## Max. :1692262 Max. :1694180 Max. :1721929
## NA's :16 NA's :27 NA's :38
## Nhouse1989_1993 Nhouse1994_1998 Nhouse1999_2002 Nhouse2003_2006
## Min. : 10383 Mode:logical Min. : 5354 Min. : 10162
## 1st Qu.: 19111 NA's:50 1st Qu.: 21275 1st Qu.: 27275
## Median : 28908 Median : 43705 Median : 45387
## Mean : 50611 Mean : 103348 Mean : 74495
## 3rd Qu.: 40959 3rd Qu.: 92317 3rd Qu.: 98770
## Max. :153693 Max. :1553888 Max. :307987
## NA's :45 NA's :16 NA's :27
## Nhouse2007_2009 Aphouse1989_1993 Aphouse1994_1998 Aphouse1999_2002
## Min. : 1779 Min. : 477.3 Min. : 585.6 Min. : 160.0
## 1st Qu.: 35839 1st Qu.:1116.4 1st Qu.: 976.0 1st Qu.: 951.3
## Median : 70374 Median :1800.0 Median :1629.8 Median :1759.0
## Mean : 203085 Mean :1830.0 Mean :1797.1 Mean :1727.5
## 3rd Qu.: 144459 3rd Qu.:2350.0 3rd Qu.:2550.0 3rd Qu.:2476.8
## Max. :1570149 Max. :3700.0 Max. :3500.0 Max. :3784.0
## NA's :38 NA's :39 NA's :38 NA's :31
## Aphouse2003_2006 Aphouse2007_2009 ApapartMincome1989_1993
## Min. : 408.6 Min. : 238.7 Min. :0.1080
## 1st Qu.:1097.0 1st Qu.:1466.5 1st Qu.:0.1108
## Median :2200.0 Median :2800.0 Median :0.1190
## Mean :2187.3 Mean :2714.4 Mean :0.1270
## 3rd Qu.:2838.5 3rd Qu.:3833.0 3rd Qu.:0.1452
## Max. :4530.0 Max. :5399.9 Max. :0.1540
## NA's :27 NA's :31 NA's :44
## ApapartMincome1994_1998 ApapartMincome1999_2002 ApapartMincome2003_2006
## Min. :0.1070 Min. :0.0440 Min. :0.0800
## 1st Qu.:0.1138 1st Qu.:0.0775 1st Qu.:0.0915
## Median :0.1255 Median :0.0955 Median :0.1145
## Mean :0.1230 Mean :0.1095 Mean :0.1335
## 3rd Qu.:0.1305 3rd Qu.:0.1080 3rd Qu.:0.1610
## Max. :0.1380 Max. :0.3050 Max. :0.2660
## NA's :44 NA's :38 NA's :34
## ApapartMincome2007_2009 Arent-housing1989_1993 Arent-housing1994_1998
## Min. :0.0650 Mode:logical Mode:logical
## 1st Qu.:0.0950 NA's:50 NA's:50
## Median :0.1100
## Mean :0.1334
## 3rd Qu.:0.1340
## Max. :0.2870
## NA's :37
## Arent-housing1999_2002 Arent-housing2003_2006 Arent-housing2007_2009
## Min. : 3.00 Min. : 5.00 Min. : 8.00
## 1st Qu.:12.25 1st Qu.: 13.00 1st Qu.: 18.00
## Median :70.50 Median : 78.00 Median : 88.00
## Mean :55.00 Mean : 58.59 Mean : 76.06
## 3rd Qu.:80.50 3rd Qu.: 85.00 3rd Qu.: 99.00
## Max. :99.00 Max. :105.00 Max. :167.00
## NA's :34 NA's :33 NA's :33
## Alarea1989_1993 Alarea1994_1998 Alarea1999_2002 Alarea2003_2006
## Min. :14.91 Min. :16.70 Min. :13.20 Min. :14.80
## 1st Qu.:19.28 1st Qu.:21.06 1st Qu.:24.27 1st Qu.:32.71
## Median :32.75 Median :33.35 Median :34.90 Median :38.20
## Mean :28.58 Mean :30.16 Mean :31.65 Mean :34.29
## 3rd Qu.:35.12 3rd Qu.:38.12 3rd Qu.:38.00 3rd Qu.:40.35
## Max. :37.10 Max. :39.30 Max. :47.70 Max. :45.86
## NA's :32 NA's :38 NA's :19 NA's :30
## Alarea2007_2009 Phh-owndwe1989_1993 Phh-owndwe1994_1998 Phh-owndwe1999_2002
## Min. :15.85 Min. : 9.30 Min. :10.10 Min. :11.40
## 1st Qu.:27.03 1st Qu.:21.20 1st Qu.:16.45 1st Qu.:22.20
## Median :38.76 Median :39.90 Median :20.10 Median :50.00
## Mean :33.50 Mean :42.26 Mean :36.05 Mean :47.05
## 3rd Qu.:39.75 3rd Qu.:57.20 3rd Qu.:54.90 3rd Qu.:64.20
## Max. :46.44 Max. :86.50 Max. :78.10 Max. :88.80
## NA's :35 NA's :23 NA's :39 NA's :13
## Phh-owndwe2003_2006 Phh-owndwe2007_2009 Urate1989_1993 Urate1994_1998
## Min. :11.50 Min. :12.80 Min. : 1.300 Min. : 2.00
## 1st Qu.:19.27 1st Qu.:19.93 1st Qu.: 4.025 1st Qu.: 8.60
## Median :23.50 Median :21.25 Median : 7.150 Median : 9.30
## Mean :36.59 Mean :36.03 Mean :10.575 Mean :12.24
## 3rd Qu.:50.80 3rd Qu.:41.15 3rd Qu.:14.250 3rd Qu.:16.60
## Max. :84.70 Max. :85.20 Max. :42.700 Max. :27.80
## NA's :32 NA's :38 NA's :26 NA's :29
## Urate1999_2002 Urate2003_2006 Urate2007_2009 Ncom-head1989_1993
## Min. : 2.600 Min. : 3.300 Min. : 1.100 Mode:logical
## 1st Qu.: 5.250 1st Qu.: 6.700 1st Qu.: 5.225 NA's:50
## Median : 8.000 Median : 8.900 Median : 6.550
## Mean : 9.842 Mean : 9.134 Mean : 6.906
## 3rd Qu.:12.475 3rd Qu.:11.300 3rd Qu.: 8.550
## Max. :31.800 Max. :19.100 Max. :15.300
## NA's :14 NA's :15 NA's :32
## Ncom-head1994_1998 Ncom-head1999_2002 Ncom-head2003_2006 Ncom-head2007_2009
## Mode:logical Min. : 8.00 Min. : 2.00 Min. : 9.00
## NA's:50 1st Qu.: 23.25 1st Qu.: 12.50 1st Qu.: 21.75
## Median : 46.00 Median : 21.00 Median : 38.00
## Mean :171.55 Mean : 42.74 Mean : 64.42
## 3rd Qu.:117.00 3rd Qu.: 37.50 3rd Qu.: 68.75
## Max. :985.00 Max. :331.00 Max. :210.00
## NA's :28 NA's :27 NA's :38
## Mhhincome1989_1993 Mhhincome1994_1998 Mhhincome1999_2002 Mhhincome2003_2006
## Min. :11913 Min. : 1091 Min. : 1641 Min. : 2877
## 1st Qu.:14350 1st Qu.: 8118 1st Qu.:12148 1st Qu.:12475
## Median :14988 Median :15500 Median :17476 Median :17400
## Mean :14511 Mean :11525 Mean :15866 Mean :15766
## 3rd Qu.:15225 3rd Qu.:16000 3rd Qu.:21700 3rd Qu.:20600
## Max. :15700 Max. :17400 Max. :26490 Max. :26544
## NA's :42 NA's :35 NA's :25 NA's :33
## Mhhincome2007_2009 Ahhincome1989_1993 Ahhincome1994_1998 Ahhincome1999_2002
## Min. : 3437 Mode:logical Mode:logical Min. : 2873
## 1st Qu.:13587 NA's:50 NA's:50 1st Qu.:21302
## Median :21650 Median :24900
## Mean :18643 Mean :22507
## 3rd Qu.:23275 3rd Qu.:26022
## Max. :32210 Max. :38516
## NA's :36 NA's :35
## Ahhincome2003_2006 Ahhincome2007_2009 RQ1-Q4earn1989_1993 RQ1-Q4earn1994_1998
## Min. : 3592 Min. : 4278 Mode:logical Mode:logical
## 1st Qu.:21926 1st Qu.:23952 NA's:50 NA's:50
## Median :25150 Median :28200
## Mean :22890 Mean :24842
## 3rd Qu.:27525 3rd Qu.:30437
## Max. :40250 Max. :35917
## NA's :32 NA's :31
## RQ1-Q4earn1999_2002 RQ1-Q4earn2003_2006 RQ1-Q4earn2007_2009
## Min. :0.20 Min. :0.3000 Min. :0.3000
## 1st Qu.:0.30 1st Qu.:0.3000 1st Qu.:0.3000
## Median :0.40 Median :0.3000 Median :0.3000
## Mean :0.35 Mean :0.3429 Mean :0.3286
## 3rd Qu.:0.40 3rd Qu.:0.4000 3rd Qu.:0.3750
## Max. :0.50 Max. :0.4000 Max. :0.4000
## NA's :32 NA's :36 NA's :36
## HhincomeQ21989_1993 HhincomeQ21994_1998 HhincomeQ21999_2002
## Mode:logical Mode:logical Min. : 1374
## NA's:50 NA's:50 1st Qu.:11138
## Median :16500
## Mean :14535
## 3rd Qu.:18350
## Max. :28444
## NA's :32
## HhincomeQ22003_2006 HhincomeQ22007_2009 HhincomeQ31989_1993
## Min. : 2498 Min. : 2838 Mode:logical
## 1st Qu.:12900 1st Qu.:11096 NA's:50
## Median :16900 Median :18350
## Mean :14058 Mean :15866
## 3rd Qu.:17900 3rd Qu.:19975
## Max. :23502 Max. :27899
## NA's :37 NA's :36
## HhincomeQ31994_1998 HhincomeQ31999_2002 HhincomeQ32003_2006
## Mode:logical Min. : 2009 Min. : 3338
## NA's:50 1st Qu.:15869 1st Qu.:19509
## Median :22907 Median :23400
## Mean :20194 Mean :19306
## 3rd Qu.:25450 3rd Qu.:24500
## Max. :43628 Max. :30372
## NA's :32 NA's :37
## HhincomeQ32007_2009 Tlandarea1989_1993 Tlandarea1994_1998 Tlandarea1999_2002
## Min. : 3989 Min. : 39.0 Min. : 83.8 Min. : 38.9
## 1st Qu.:15830 1st Qu.: 139.5 1st Qu.: 144.1 1st Qu.: 158.3
## Median :25450 Median : 267.1 Median : 287.2 Median : 248.4
## Mean :21855 Mean : 363.2 Mean : 364.2 Mean : 382.6
## 3rd Qu.:27325 3rd Qu.: 487.8 3rd Qu.: 495.0 3rd Qu.: 494.0
## Max. :36527 Max. :1498.7 Max. :1285.3 Max. :1572.0
## NA's :36 NA's :26 NA's :23 NA's :17
## Tlandarea2003_2006 Tlandarea2007_2009 Larea-leisure1989_1993
## Min. : 38.9 Min. : 84.7 Min. :1.50
## 1st Qu.: 141.3 1st Qu.: 148.2 1st Qu.:3.05
## Median : 217.0 Median : 248.3 Median :4.60
## Mean : 327.9 Mean : 359.3 Mean :4.60
## 3rd Qu.: 426.2 3rd Qu.: 496.0 3rd Qu.:6.15
## Max. :1285.3 Max. :1307.7 Max. :7.70
## NA's :21 NA's :25 NA's :48
## Larea-leisure1994_1998 Larea-leisure1999_2002 Larea-leisure2003_2006
## Min. :2.000 Min. : 0.00 Min. : 2.40
## 1st Qu.:2.600 1st Qu.: 4.35 1st Qu.:10.40
## Median :3.200 Median : 9.45 Median :22.00
## Mean :3.667 Mean :15.34 Mean :21.76
## 3rd Qu.:4.500 3rd Qu.:24.60 3rd Qu.:34.20
## Max. :5.800 Max. :38.90 Max. :42.80
## NA's :47 NA's :32 NA's :33
## Larea-leisure2007_2009 Parea-housing1989_1993 Parea-housing1994_1998
## Min. : 5.80 Min. :34 Min. :16.50
## 1st Qu.:18.70 1st Qu.:34 1st Qu.:29.25
## Median :26.70 Median :34 Median :37.10
## Mean :25.09 Mean :34 Mean :38.79
## 3rd Qu.:31.18 3rd Qu.:34 3rd Qu.:44.05
## Max. :42.00 Max. :34 Max. :71.30
## NA's :38 NA's :49 NA's :43
## Parea-housing1999_2002 Parea-housing2003_2006 Parea-housing2007_2009
## Min. : 4.30 Min. :10.70 Min. :13.10
## 1st Qu.:14.75 1st Qu.:14.25 1st Qu.:15.12
## Median :20.20 Median :19.60 Median :18.00
## Mean :23.02 Mean :24.40 Mean :19.12
## 3rd Qu.:24.45 3rd Qu.:27.98 3rd Qu.:22.70
## Max. :72.00 Max. :72.10 Max. :28.60
## NA's :31 NA's :34 NA's :38
## Ppldens1989_1993 Ppldens1994_1998 Ppldens1999_2002 Ppldens2003_2006
## Min. : 1852 Min. : 2014 Min. : 1384 Min. : 1243
## 1st Qu.: 2674 1st Qu.: 2627 1st Qu.: 2510 1st Qu.: 2608
## Median : 3775 Median : 3816 Median : 3768 Median : 4030
## Mean : 5357 Mean : 4622 Mean : 5271 Mean : 5103
## 3rd Qu.: 6031 3rd Qu.: 5617 3rd Qu.: 5633 3rd Qu.: 6196
## Max. :19797 Max. :15240 Max. :20287 Max. :20467
## NA's :26 NA's :26 NA's :18 NA's :21
## Ppldens2007_2009 Netresidens-housingarea1989_1993
## Min. : 1313 Min. :48871
## 1st Qu.: 2486 1st Qu.:48871
## Median : 3306 Median :48871
## Mean : 4477 Mean :48871
## 3rd Qu.: 5778 3rd Qu.:48871
## Max. :16454 Max. :48871
## NA's :25 NA's :49
## Netresidens-housingarea1994_1998 Netresidens-housingarea1999_2002
## Min. : 7075 Min. : 6422
## 1st Qu.: 7362 1st Qu.: 13804
## Median :11080 Median : 18127
## Mean :18732 Mean : 42694
## 3rd Qu.:22451 3rd Qu.: 25980
## Max. :45694 Max. :465043
## NA's :46 NA's :31
## Netresidens-housingarea2003_2006 Netresidens-housingarea2007_2009
## Min. : 8265 Min. : 8537
## 1st Qu.:12476 1st Qu.:13269
## Median :16750 Median :16869
## Mean :17101 Mean :16338
## 3rd Qu.:20177 3rd Qu.:18911
## Max. :28404 Max. :22743
## NA's :33 NA's :38
## APApartment1989_1993 APApartment1994_1998 APApartment1999_2002
## Min. :1600 Min. : 848 Min. : 217.7
## 1st Qu.:1725 1st Qu.:1700 1st Qu.:1072.7
## Median :1800 Median :2000 Median :1342.5
## Mean :1883 Mean :1821 Mean :1425.4
## 3rd Qu.:1950 3rd Qu.:2150 3rd Qu.:1879.2
## Max. :2400 Max. :2200 Max. :2666.8
## NA's :44 NA's :43 NA's :34
## APApartment2003_2006 APApartment2007_2009 Temp_Jul1989_1993 Temp_Jul1994_1998
## Min. : 341 Min. : 962 Min. :17.40 Min. :14.80
## 1st Qu.:1352 1st Qu.:1660 1st Qu.:18.20 1st Qu.:20.85
## Median :2007 Median :2150 Median :19.00 Median :26.00
## Mean :2017 Mean :2468 Mean :20.77 Mean :25.33
## 3rd Qu.:2681 3rd Qu.:3232 3rd Qu.:22.45 3rd Qu.:29.55
## Max. :4486 Max. :5269 Max. :25.90 Max. :36.00
## NA's :23 NA's :29 NA's :47 NA's :39
## Temp_Jul1999_2002 Temp_Jul2003_2006 Temp_Jul2007_2009 Temp_Jan1989_1993
## Min. :18.50 Min. :16.00 Min. :16.70 Min. :-1.800
## 1st Qu.:20.00 1st Qu.:19.00 1st Qu.:18.50 1st Qu.: 0.300
## Median :21.00 Median :20.45 Median :20.30 Median : 2.400
## Mean :22.40 Mean :21.44 Mean :22.06 Mean : 3.333
## 3rd Qu.:25.12 3rd Qu.:24.52 3rd Qu.:25.12 3rd Qu.: 5.900
## Max. :31.50 Max. :29.20 Max. :32.00 Max. : 9.400
## NA's :20 NA's :16 NA's :22 NA's :47
## Temp_Jan1994_1998 Temp_Jan1999_2002 Temp_Jan2003_2006 Temp_Jan2007_2009
## Min. :-8.500 Min. :-7.200 Min. :-7.700 Min. :-3.000
## 1st Qu.: 1.800 1st Qu.:-0.625 1st Qu.:-1.475 1st Qu.: 0.375
## Median : 4.500 Median : 1.700 Median : 1.600 Median : 1.950
## Mean : 4.122 Mean : 2.480 Mean : 2.006 Mean : 2.754
## 3rd Qu.: 7.600 3rd Qu.: 6.150 3rd Qu.: 5.525 3rd Qu.: 3.650
## Max. :13.400 Max. :13.200 Max. :11.900 Max. :11.700
## NA's :41 NA's :20 NA's :16 NA's :22
## Latitude_deg Latitude_min Latitude_sec Longitude_deg
## Min. :37.00 Min. : 0.00 Min. : 0.00 Min. :-9.00
## 1st Qu.:44.25 1st Qu.:17.50 1st Qu.: 0.00 1st Qu.: 6.00
## Median :50.00 Median :28.00 Median : 0.00 Median :14.00
## Mean :48.72 Mean :29.98 Mean :14.07 Mean :17.44
## 3rd Qu.:53.00 3rd Qu.:45.75 3rd Qu.:32.75 3rd Qu.:26.75
## Max. :59.00 Max. :56.00 Max. :59.76 Max. :60.00
##
## Longitude_min Longitude_sec Lat Lon
## Min. :-59.00 Min. :-57.000 Min. :37.38 Min. :-9.185
## 1st Qu.: 6.00 1st Qu.: 0.000 1st Qu.:44.88 1st Qu.: 6.829
## Median : 21.00 Median : 0.000 Median :50.04 Median :14.336
## Mean : 20.18 Mean : 8.731 Mean :49.22 Mean :17.779
## 3rd Qu.: 39.25 3rd Qu.: 22.250 3rd Qu.:53.50 3rd Qu.:27.143
## Max. : 59.00 Max. : 59.000 Max. :59.93 Max. :60.583
##
## Liveability2010 Mercer_Qual_Liv2011 Mercer_Per_Safe2011 ECM2010
## Min. :61.00 Min. : 1.0 Min. : 5.00 Min. :0.0200
## 1st Qu.:80.00 1st Qu.:16.0 1st Qu.: 11.00 1st Qu.:0.0550
## Median :90.00 Median :30.0 Median : 20.50 Median :0.1000
## Mean :86.23 Mean :33.2 Mean : 34.94 Mean :0.1663
## 3rd Qu.:93.00 3rd Qu.:42.0 3rd Qu.: 40.50 3rd Qu.:0.2300
## Max. :98.00 Max. :84.0 Max. :199.00 Max. :0.8500
## NA's :19 NA's :25 NA's :34 NA's :23
## ECM_Cost2010
## Min. :0.0200
## 1st Qu.:0.1600
## Median :0.2700
## Mean :0.4637
## 3rd Qu.:0.6350
## Max. :1.4200
## NA's :23
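The summary above covers all 184 columns, most of which are not used in the model and contain many missing values. A minimal sketch, assuming base R only, that restricts the descriptive statistics and the correlation matrix of the exploratory step to a chosen set of columns: describe_vars is a hypothetical helper and toy is a small made-up data frame used only to show the shape of the output.

```r
# Hypothetical helper: descriptive statistics for selected columns of a data frame.
describe_vars <- function(df, vars) {
  stopifnot(all(vars %in% names(df)))
  out <- sapply(df[vars], function(x) c(
    n       = sum(!is.na(x)),
    missing = sum(is.na(x)),
    mean    = mean(x, na.rm = TRUE),
    sd      = sd(x, na.rm = TRUE),
    min     = min(x, na.rm = TRUE),
    median  = median(x, na.rm = TRUE),
    max     = max(x, na.rm = TRUE)
  ))
  t(out)  # one row per variable, one column per statistic
}

# Made-up toy data (not from the report), with one missing value in column a
toy <- data.frame(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 4, 6, 8, 10),
  z = c(5, 3, 4, 1, 2)
)

desc <- describe_vars(toy, c("a", "b", "z"))

# Correlation matrix on complete cases only, i.e. the same rows lm() would keep
corr <- cor(toy[c("a", "b", "z")], use = "complete.obs")

print(round(desc, 2))
print(round(corr, 2))

# With the report's data this would be called as, e.g. (not run here):
# model_vars <- c("Price_Median", "Area_Median", "Room_Median", "Population",
#                 "GDP_PC", "URate", "MIR2010")
# describe_vars(base_taller, model_vars)
# cor(base_taller[model_vars], use = "complete.obs")
```

Using use = "complete.obs" keeps the correlations on the same observations the regression will use; use = "pairwise.complete.obs" would instead use every available pair, which can mix different subsets of cities across cells.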
A key issue that will affect the subsequent analyses is the presence of missing values (NA) in the dataset. To see how many values are missing in each variable, we use the following code.
# Count the missing values (NA) in each column (base R; dplyr is not needed here)
faltantes <- data.frame(
  Variable = names(base_taller),
  NAs = colSums(is.na(base_taller))
)
faltantes
## Variable NAs
## City City 0
## City_Eng City_Eng 0
## City_Short City_Short 0
## NAds NAds 0
## Price_Median Price_Median 0
## Price_Mean Price_Mean 0
## Area_Median Area_Median 0
## Area_Mean Area_Mean 0
## Room_Median Room_Median 0
## Room_Mean Room_Mean 0
## Euro_area Euro_area 0
## EU EU 0
## Population Population 0
## City_Area City_Area 0
## Density Density 0
## GDP_PC GDP_PC 0
## GDP_PC_PPS GDP_PC_PPS 0
## GDP_PC2008 GDP_PC2008 0
## GDP_PC2009 GDP_PC2009 0
## GDP_PC2010 GDP_PC2010 0
## Gini Gini 0
## HOR HOR 0
## Kearny_GCI2010 Kearny_GCI2010 0
## LRIR LRIR 0
## Inflation2010 Inflation2010 0
## Inflation2011 Inflation2011 6
## URate URate 0
## MIR2009 MIR2009 2
## MIR2010 MIR2010 2
## Mortgage_PC2010 Mortgage_PC2010 2
## Tppl1989_1993 Tppl1989_1993 17
## Tppl1994_1998 Tppl1994_1998 19
## Tppl1999_2002 Tppl1999_2002 14
## Tppl2003_2006 Tppl2003_2006 14
## Tppl2007_2009 Tppl2007_2009 20
## GDP_PC_PPS1989_1993 GDP_PC_PPS1989_1993 50
## GDP_PC_PPS1994_1998 GDP_PC_PPS1994_1998 16
## GDP_PC_PPS1999_2002 GDP_PC_PPS1999_2002 16
## GDP_PC_PPS2003_2006 GDP_PC_PPS2003_2006 15
## GDP_PC_PPS2007_2009 GDP_PC_PPS2007_2009 15
## CITIES CITIES 13
## DemoDepend1989_1993 DemoDepend1989_1993 22
## DemoDepend1994_1998 DemoDepend1994_1998 25
## DemoDepend1999_2002 DemoDepend1999_2002 19
## DemoDepend2003_2006 DemoDepend2003_2006 17
## DemoDepend2007_2009 DemoDepend2007_2009 23
## DemoODepend1989_1993 DemoODepend1989_1993 17
## DemoODepend1994_1998 DemoODepend1994_1998 21
## DemoODepend1999_2002 DemoODepend1999_2002 14
## DemoODepend2003_2006 DemoODepend2003_2006 13
## DemoODepend2007_2009 DemoODepend2007_2009 20
## Thh1989_1993 Thh1989_1993 20
## Thh1994_1998 Thh1994_1998 27
## Thh1999_2002 Thh1999_2002 13
## Thh2003_2006 Thh2003_2006 23
## Thh2007_2009 Thh2007_2009 27
## Ndwe1989_1993 Ndwe1989_1993 35
## Ndwe1994_1998 Ndwe1994_1998 43
## Ndwe1999_2002 Ndwe1999_2002 13
## Ndwe2003_2006 Ndwe2003_2006 19
## Ndwe2007_2009 Ndwe2007_2009 29
## Napart1989_1993 Napart1989_1993 46
## Napart1994_1998 Napart1994_1998 50
## Napart1999_2002 Napart1999_2002 16
## Napart2003_2006 Napart2003_2006 27
## Napart2007_2009 Napart2007_2009 38
## Nhouse1989_1993 Nhouse1989_1993 45
## Nhouse1994_1998 Nhouse1994_1998 50
## Nhouse1999_2002 Nhouse1999_2002 16
## Nhouse2003_2006 Nhouse2003_2006 27
## Nhouse2007_2009 Nhouse2007_2009 38
## Aphouse1989_1993 Aphouse1989_1993 39
## Aphouse1994_1998 Aphouse1994_1998 38
## Aphouse1999_2002 Aphouse1999_2002 31
## Aphouse2003_2006 Aphouse2003_2006 27
## Aphouse2007_2009 Aphouse2007_2009 31
## ApapartMincome1989_1993 ApapartMincome1989_1993 44
## ApapartMincome1994_1998 ApapartMincome1994_1998 44
## ApapartMincome1999_2002 ApapartMincome1999_2002 38
## ApapartMincome2003_2006 ApapartMincome2003_2006 34
## ApapartMincome2007_2009 ApapartMincome2007_2009 37
## Arent-housing1989_1993 Arent-housing1989_1993 50
## Arent-housing1994_1998 Arent-housing1994_1998 50
## Arent-housing1999_2002 Arent-housing1999_2002 34
## Arent-housing2003_2006 Arent-housing2003_2006 33
## Arent-housing2007_2009 Arent-housing2007_2009 33
## Alarea1989_1993 Alarea1989_1993 32
## Alarea1994_1998 Alarea1994_1998 38
## Alarea1999_2002 Alarea1999_2002 19
## Alarea2003_2006 Alarea2003_2006 30
## Alarea2007_2009 Alarea2007_2009 35
## Phh-owndwe1989_1993 Phh-owndwe1989_1993 23
## Phh-owndwe1994_1998 Phh-owndwe1994_1998 39
## Phh-owndwe1999_2002 Phh-owndwe1999_2002 13
## Phh-owndwe2003_2006 Phh-owndwe2003_2006 32
## Phh-owndwe2007_2009 Phh-owndwe2007_2009 38
## Urate1989_1993 Urate1989_1993 26
## Urate1994_1998 Urate1994_1998 29
## Urate1999_2002 Urate1999_2002 14
## Urate2003_2006 Urate2003_2006 15
## Urate2007_2009 Urate2007_2009 32
## Ncom-head1989_1993 Ncom-head1989_1993 50
## Ncom-head1994_1998 Ncom-head1994_1998 50
## Ncom-head1999_2002 Ncom-head1999_2002 28
## Ncom-head2003_2006 Ncom-head2003_2006 27
## Ncom-head2007_2009 Ncom-head2007_2009 38
## Mhhincome1989_1993 Mhhincome1989_1993 42
## Mhhincome1994_1998 Mhhincome1994_1998 35
## Mhhincome1999_2002 Mhhincome1999_2002 25
## Mhhincome2003_2006 Mhhincome2003_2006 33
## Mhhincome2007_2009 Mhhincome2007_2009 36
## Ahhincome1989_1993 Ahhincome1989_1993 50
## Ahhincome1994_1998 Ahhincome1994_1998 50
## Ahhincome1999_2002 Ahhincome1999_2002 35
## Ahhincome2003_2006 Ahhincome2003_2006 32
## Ahhincome2007_2009 Ahhincome2007_2009 31
## RQ1-Q4earn1989_1993 RQ1-Q4earn1989_1993 50
## RQ1-Q4earn1994_1998 RQ1-Q4earn1994_1998 50
## RQ1-Q4earn1999_2002 RQ1-Q4earn1999_2002 32
## RQ1-Q4earn2003_2006 RQ1-Q4earn2003_2006 36
## RQ1-Q4earn2007_2009 RQ1-Q4earn2007_2009 36
## HhincomeQ21989_1993 HhincomeQ21989_1993 50
## HhincomeQ21994_1998 HhincomeQ21994_1998 50
## HhincomeQ21999_2002 HhincomeQ21999_2002 32
## HhincomeQ22003_2006 HhincomeQ22003_2006 37
## HhincomeQ22007_2009 HhincomeQ22007_2009 36
## HhincomeQ31989_1993 HhincomeQ31989_1993 50
## HhincomeQ31994_1998 HhincomeQ31994_1998 50
## HhincomeQ31999_2002 HhincomeQ31999_2002 32
## HhincomeQ32003_2006 HhincomeQ32003_2006 37
## HhincomeQ32007_2009 HhincomeQ32007_2009 36
## Tlandarea1989_1993 Tlandarea1989_1993 26
## Tlandarea1994_1998 Tlandarea1994_1998 23
## Tlandarea1999_2002 Tlandarea1999_2002 17
## Tlandarea2003_2006 Tlandarea2003_2006 21
## Tlandarea2007_2009 Tlandarea2007_2009 25
## Larea-leisure1989_1993 Larea-leisure1989_1993 48
## Larea-leisure1994_1998 Larea-leisure1994_1998 47
## Larea-leisure1999_2002 Larea-leisure1999_2002 32
## Larea-leisure2003_2006 Larea-leisure2003_2006 33
## Larea-leisure2007_2009 Larea-leisure2007_2009 38
## Parea-housing1989_1993 Parea-housing1989_1993 49
## Parea-housing1994_1998 Parea-housing1994_1998 43
## Parea-housing1999_2002 Parea-housing1999_2002 31
## Parea-housing2003_2006 Parea-housing2003_2006 34
## Parea-housing2007_2009 Parea-housing2007_2009 38
## Ppldens1989_1993 Ppldens1989_1993 26
## Ppldens1994_1998 Ppldens1994_1998 26
## Ppldens1999_2002 Ppldens1999_2002 18
## Ppldens2003_2006 Ppldens2003_2006 21
## Ppldens2007_2009 Ppldens2007_2009 25
## Netresidens-housingarea1989_1993 Netresidens-housingarea1989_1993 49
## Netresidens-housingarea1994_1998 Netresidens-housingarea1994_1998 46
## Netresidens-housingarea1999_2002 Netresidens-housingarea1999_2002 31
## Netresidens-housingarea2003_2006 Netresidens-housingarea2003_2006 33
## Netresidens-housingarea2007_2009 Netresidens-housingarea2007_2009 38
## APApartment1989_1993 APApartment1989_1993 44
## APApartment1994_1998 APApartment1994_1998 43
## APApartment1999_2002 APApartment1999_2002 34
## APApartment2003_2006 APApartment2003_2006 23
## APApartment2007_2009 APApartment2007_2009 29
## Temp_Jul1989_1993 Temp_Jul1989_1993 47
## Temp_Jul1994_1998 Temp_Jul1994_1998 39
## Temp_Jul1999_2002 Temp_Jul1999_2002 20
## Temp_Jul2003_2006 Temp_Jul2003_2006 16
## Temp_Jul2007_2009 Temp_Jul2007_2009 22
## Temp_Jan1989_1993 Temp_Jan1989_1993 47
## Temp_Jan1994_1998 Temp_Jan1994_1998 41
## Temp_Jan1999_2002 Temp_Jan1999_2002 20
## Temp_Jan2003_2006 Temp_Jan2003_2006 16
## Temp_Jan2007_2009 Temp_Jan2007_2009 22
## Latitude_deg Latitude_deg 0
## Latitude_min Latitude_min 0
## Latitude_sec Latitude_sec 0
## Longitude_deg Longitude_deg 0
## Longitude_min Longitude_min 0
## Longitude_sec Longitude_sec 0
## Lat Lat 0
## Lon Lon 0
## Liveability2010 Liveability2010 19
## Mercer_Qual_Liv2011 Mercer_Qual_Liv2011 25
## Mercer_Per_Safe2011 Mercer_Per_Safe2011 34
## ECM2010 ECM2010 23
## ECM_Cost2010 ECM_Cost2010 23
Two missing values were identified in the independent variable MIR2010 (mortgage interest rate). This matters because, when OLS is estimated, R drops every row that has a missing value in any of the model variables; the analysis code removes these rows explicitly with na.omit(), which leaves 48 complete observations. The analysis uses eight variables, considering only 2010 data: the dependent variable is the median property price (Price_Median), and the independent variables are the median dwelling area (Area_Median), the median number of rooms (Room_Median), population density (Density), total population (Population), GDP per capita (GDP_PC), the unemployment rate (URate) and the mortgage interest rate (MIR2010).
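The listwise deletion described above can be checked with a minimal sketch on simulated data (the data frame `df_sim` and its columns `y`, `x1`, `x2` are hypothetical, not taken from `base_taller`). Under R's default options, `lm()` uses `na.action = na.omit`, so a row with a missing value in any model variable is dropped; removing it beforehand with `na.omit()`, as the analysis code does, should give the same coefficients.

```r
# Hypothetical simulated data: 10 rows, with one missing value in x2
set.seed(123)
df_sim <- data.frame(
  y  = rnorm(10, mean = 2500, sd = 500),
  x1 = rnorm(10, mean = 70, sd = 15),
  x2 = c(rnorm(9, mean = 4, sd = 2), NA)
)

# Number of rows that are complete in every model variable
n_complete <- sum(complete.cases(df_sim[, c("y", "x1", "x2")]))

# lm() drops the incomplete row on its own (default na.action is na.omit)
fit_auto <- lm(y ~ x1 + x2, data = df_sim)

# Explicit listwise deletion first, as the analysis code does with na.omit()
fit_manual <- lm(y ~ x1 + x2, data = na.omit(df_sim))

cat("Complete rows:", n_complete, "| observations used by lm():", nobs(fit_auto), "\n")
print(all.equal(coef(fit_auto), coef(fit_manual)))
```

Applied to the actual data, `nobs()` on the fitted model should match the number of complete rows (48 in the output of this section).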
# Load required libraries
library(psych)
##
## Adjuntando el paquete: 'psych'
## The following object is masked from 'package:Hmisc':
##
## describe
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(dplyr)
library(ggplot2)
# Select the variables of interest
vars_analisis <- c("Price_Median", "Area_Median", "Room_Median", "Density",
"Population", "GDP_PC", "URate", "MIR2010")
# Subset the columns, coerce them to numeric and drop rows with missing values
datos_analisis <- base_taller[, vars_analisis]
datos_analisis <- data.frame(lapply(datos_analisis, as.numeric))
datos_analisis <- na.omit(datos_analisis)
# Basic descriptive statistics with summary()
cat("=== ESTADÍSTICOS DESCRIPTIVOS BÁSICOS ===\n")
## === ESTADÍSTICOS DESCRIPTIVOS BÁSICOS ===
summary_stats <- summary(datos_analisis)
print(summary_stats)
## Price_Median Area_Median Room_Median Density
## Min. : 503.1 Min. : 47.00 Min. :2.00 Min. : 1315
## 1st Qu.:1324.7 1st Qu.: 55.10 1st Qu.:2.00 1st Qu.: 2551
## Median :2167.8 Median : 68.50 Median :2.25 Median : 3297
## Mean :2488.7 Mean : 69.17 Mean :2.49 Mean : 4549
## 3rd Qu.:3127.2 3rd Qu.: 79.25 3rd Qu.:3.00 3rd Qu.: 5394
## Max. :8590.9 Max. :100.00 Max. :3.00 Max. :20618
## Population GDP_PC URate MIR2010
## Min. : 401389 Min. : 2535 Min. : 1.700 Min. : 0.920
## 1st Qu.: 784105 1st Qu.:19825 1st Qu.: 5.805 1st Qu.: 2.993
## Median : 1132700 Median :30700 Median : 7.990 Median : 3.815
## Mean : 1885327 Mean :32034 Mean : 9.545 Mean : 7.026
## 3rd Qu.: 1704168 3rd Qu.:43000 3rd Qu.:11.908 3rd Qu.: 8.615
## Max. :12915158 Max. :86464 Max. :23.124 Max. :26.200
# More detailed statistics with psych::describe()
cat("\n=== ESTADÍSTICOS DESCRIPTIVOS DETALLADOS ===\n")
##
## === ESTADÍSTICOS DESCRIPTIVOS DETALLADOS ===
detallado_stats <- describe(datos_analisis)
print(round(detallado_stats, 3))
## vars n mean sd median trimmed mad
## Price_Median 1 48 2488.72 1620.91 2167.76 2272.81 1316.81
## Area_Median 2 48 69.17 14.90 68.50 68.48 17.79
## Room_Median 3 48 2.49 0.50 2.25 2.49 0.37
## Density 4 48 4548.62 3520.04 3296.74 3928.58 1479.41
## Population 5 48 1885327.48 2431882.99 1132700.00 1322120.12 667765.26
## GDP_PC 6 48 32033.68 20843.98 30700.00 30494.03 18532.50
## URate 7 48 9.54 5.26 7.99 9.13 4.76
## MIR2010 8 48 7.03 6.84 3.82 5.60 1.26
## min max range skew kurtosis se
## Price_Median 503.07 8590.87 8087.80 1.54 2.98 233.96
## Area_Median 47.00 100.00 53.00 0.28 -0.99 2.15
## Room_Median 2.00 3.00 1.00 0.04 -2.02 0.07
## Density 1315.07 20617.90 19302.83 2.74 8.74 508.07
## Population 401389.00 12915158.00 12513769.00 3.15 9.93 351012.07
## GDP_PC 2534.91 86463.74 83928.83 0.53 -0.26 3008.57
## URate 1.70 23.12 21.42 0.75 -0.08 0.76
## MIR2010 0.92 26.20 25.28 1.82 2.32 0.99
# Pearson correlation matrix
cat("\n=== MATRIZ DE CORRELACIONES ===\n")
##
## === MATRIZ DE CORRELACIONES ===
cor_matrix <- cor(datos_analisis, use = "complete.obs")
print(round(cor_matrix, 3))
## Price_Median Area_Median Room_Median Density Population GDP_PC
## Price_Median 1.000 0.121 0.276 0.498 0.132 0.630
## Area_Median 0.121 1.000 0.748 0.061 0.060 0.304
## Room_Median 0.276 0.748 1.000 0.041 0.020 0.502
## Density 0.498 0.061 0.041 1.000 0.105 0.157
## Population 0.132 0.060 0.020 0.105 1.000 -0.172
## GDP_PC 0.630 0.304 0.502 0.157 -0.172 1.000
## URate -0.140 0.272 0.197 0.058 -0.097 -0.034
## MIR2010 -0.383 -0.545 -0.546 -0.090 0.017 -0.591
## URate MIR2010
## Price_Median -0.140 -0.383
## Area_Median 0.272 -0.545
## Room_Median 0.197 -0.546
## Density 0.058 -0.090
## Population -0.097 0.017
## GDP_PC -0.034 -0.591
## URate 1.000 -0.346
## MIR2010 -0.346 1.000
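The descriptive statistics show wide dispersion: Price_Median ranges from 503 to 8,591 (standard deviation 1,621) and is right-skewed (skewness 1.54). Population and Density are even more skewed (skewness 3.15 and 2.74), and the mean of MIR2010 (7.03) is almost twice its median (3.82), which points to a few cities with very high mortgage rates. In the correlation matrix, Price_Median is most strongly associated with GDP_PC (0.63) and Density (0.50), and negatively with MIR2010 (-0.38); since no single correlation exceeds 0.63 (r² of about 0.40), a multiple regression is needed to capture the joint effect of several variables. Area_Median and Room_Median are highly correlated with each other (0.75), and MIR2010 correlates with GDP_PC (-0.59) and with both housing characteristics (about -0.55), so multicollinearity must be checked. Finally, the presence of outliers suggests verifying the robustness of the model through a sensitivity analysis.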
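The outliers mentioned above (for example, a maximum Price_Median of 8,590.87 against a third quartile of 3,127.2) can be handled with a simple sensitivity analysis: refit the model without the most influential observations and compare the coefficients. The sketch below assumes Cook's distance with the common 4/n cutoff as the influence criterion, which is a rule of thumb rather than a choice made in the original analysis, and uses simulated data with hypothetical columns `gdp`, `dens` and `price`.

```r
# Hypothetical simulated data: 48 cities, two of them with artificially inflated prices
set.seed(42)
n <- 48
sim <- data.frame(gdp = runif(n, 2500, 86000), dens = runif(n, 1300, 20000))
sim$price <- 500 + 0.04 * sim$gdp + 0.18 * sim$dens + rnorm(n, sd = 900)
sim$price[1:2] <- sim$price[1:2] + 6000

fit_full <- lm(price ~ gdp + dens, data = sim)

# Flag influential rows with the rule of thumb Cook's distance > 4/n
cd <- cooks.distance(fit_full)
keep <- cd <= 4 / n   # logical index, safe even when no row is flagged

# Refit on the remaining rows and compare the coefficients side by side
fit_trimmed <- lm(price ~ gdp + dens, data = sim[keep, ])
comparison <- cbind(all_rows = coef(fit_full), without_influential = coef(fit_trimmed))
print(round(comparison, 4))
cat("Rows removed:", sum(!keep), "\n")
```

On the real data, the same steps would use `cooks.distance(modelo)` and `update(modelo, data = datos_completos[keep, ])`; coefficients that change sign or lose significance after the refit would deserve closer inspection.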
# Set the default ggplot2 theme
theme_set(theme_minimal())
# A) DISTRIBUTION HISTOGRAMS
cat("\nGenerando histogramas de distribución...\n")
##
## Generando histogramas de distribución...
par(mfrow = c(3, 3))
for(i in 1:ncol(datos_analisis)) {
hist(datos_analisis[,i], main = paste("Distribución de", colnames(datos_analisis)[i]),
xlab = colnames(datos_analisis)[i], col = "lightblue", breaks = 15)
}
par(mfrow = c(1, 1))
# B) BOXPLOTS TO DETECT OUTLIERS
cat("\nGenerando boxplots...\n")
##
## Generando boxplots...
par(mfrow = c(3, 3))
for(i in 1:ncol(datos_analisis)) {
boxplot(datos_analisis[,i], main = paste("Boxplot de", colnames(datos_analisis)[i]),
col = "lightgreen")
}
par(mfrow = c(1, 1))
# C) VIOLIN PLOTS FOR THE MAIN VARIABLES
cat("\nGenerando gráficos de violín...\n")
##
## Generando gráficos de violín...
library(vioplot)
## Cargando paquete requerido: sm
## Package 'sm', version 2.2-6.0: type help(sm) for summary information
## Cargando paquete requerido: zoo
##
## Adjuntando el paquete: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
par(mfrow = c(2, 2))
vioplot(datos_analisis$Price_Median, main = "Distribución Price_Median", col = "gold")
vioplot(datos_analisis$Area_Median, main = "Distribución Area_Median", col = "lightblue")
vioplot(datos_analisis$GDP_PC, main = "Distribución GDP_PC", col = "lightgreen")
vioplot(datos_analisis$URate, main = "Distribución URate", col = "pink")
par(mfrow = c(1, 1))
library(ggplot2)
# Prepare the map data: Lat and Lon use decimal commas, so replace them with dots before converting to numeric
datos_mapa <- base_taller %>%
mutate(
Lat = as.numeric(gsub(",", ".", Lat)),
Lon = as.numeric(gsub(",", ".", Lon)),
GDP_PC = as.numeric(GDP_PC)
) %>%
filter(!is.na(Lat) & !is.na(Lon))
# Basic map without a country basemap: each point is a city placed by longitude and latitude
ggplot(datos_mapa, aes(x = Lon, y = Lat)) +
geom_point(aes(color = GDP_PC, size = GDP_PC), alpha = 0.7) +
scale_color_gradient(low = "lightblue", high = "darkblue") +
theme_minimal() +
labs(title = "PIB per cápita en ciudades europeas")
The map shows how GDP per capita is distributed across European cities, placing each city by its latitude and longitude. Most cities are concentrated in the central part of the continent, especially between latitudes 45° and 55° and longitudes 0° and 20°. Both the color and the size of each circle encode GDP per capita: larger, darker circles indicate higher income, while smaller, lighter ones indicate lower income. Higher levels of GDP per capita predominate in the cities of western and central Europe, whereas lower values are observed in eastern Europe.
# Correlation matrix with significance tests (Hmisc::rcorr) shown as a heat map (corrplot)
library(Hmisc)
library(corrplot)
# Select the numeric variables of interest
vars <- c("Price_Median", "Area_Median", "Room_Median", "GDP_PC", "URate", "MIR2010")
# Coerce to numeric and drop incomplete rows before computing correlations
datos_cor <- base_taller[, vars]
datos_cor <- data.frame(lapply(datos_cor, as.numeric))
datos_cor <- na.omit(datos_cor)
# Compute correlations and p-values
cor_test <- rcorr(as.matrix(datos_cor))
matrizcor <- round(cor_test$r, 3)
pval <- cor_test$P
# Make sure dimensions and names match
stopifnot(identical(dim(matrizcor), dim(pval)))
rownames(pval) <- rownames(matrizcor)
colnames(pval) <- colnames(matrizcor)
# Replace any NA in pval (e.g. on the diagonal) with 1
pval[is.na(pval)] <- 1
# Plot
corrplot(
matrizcor,
method = "color",
type = "upper",
tl.col = "black",
tl.srt = 45,
addCoef.col = "black",
number.cex = 0.7,
p.mat = pval, # p-value matrix
sig.level = 0.05, # significance level
insig = "blank", # hide non-significant correlations
title = "Mapa de calor de correlaciones significativas",
mar = c(0, 0, 2, 0) # plot margins (room for the title)
)
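The blanking rule in the heat map above (`sig.level = 0.05`, `insig = "blank"`) rests on the usual t-test for a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom; the `P` matrix of `Hmisc::rcorr()` is, as far as we understand, based on this same test. The sketch uses simulated variables `x` and `y` (hypothetical) to reproduce the p-value by hand, checks it against base R's `cor.test()`, and derives the smallest |r| that is significant for n = 48.

```r
# Hypothetical simulated pair of variables for n = 48 cities
set.seed(1)
n <- 48
x <- rnorm(n)
y <- 0.3 * x + rnorm(n)

r <- cor(x, y)
# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt((n - 2) / (1 - r^2))
p_manual <- 2 * pt(-abs(t_stat), df = n - 2)

# Reference p-value from base R's Pearson correlation test
p_ref <- cor.test(x, y)$p.value

# Smallest |r| reaching significance at 5%: solve t_crit = r * sqrt(df / (1 - r^2)) for r
t_crit <- qt(0.975, df = n - 2)
r_crit <- t_crit / sqrt(t_crit^2 + n - 2)

cat("r =", round(r, 3), "| manual p =", signif(p_manual, 4),
    "| cor.test p =", signif(p_ref, 4), "\n")
cat("Critical |r| at 5%:", round(r_crit, 3), "\n")
```

With 48 observations the critical value is about 0.28, so, assuming the heat map is computed on the same 48 complete rows as the correlation matrix printed earlier, correlations such as Price_Median with Area_Median (0.121) or with Room_Median (0.276) would appear blank.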
## Scatter-plot matrix of variable pairs
# Select the variables
vars <- c("Price_Median", "Area_Median", "Room_Median", "GDP_PC", "URate", "MIR2010")
# Scatter-plot matrix in a custom color
pairs(base_taller[vars],
main = "Matriz de dispersión",
pch = 19, col = "blue") # other colors also work, e.g. "red", "purple", "orange"
## Chart 1: scatter plot of Price_Median vs Area_Median
ggplot(base_taller, aes(x = Area_Median, y = Price_Median)) +
geom_point(color = "steelblue", alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = "Dispersión de Price_Median vs Area_Median",
x = "Area_Median", y = "Price_Median") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Chart 2: scatter plot of Price_Median vs Room_Median
ggplot(base_taller, aes(x = Room_Median, y = Price_Median)) +
geom_point(color = "steelblue", alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = "Dispersión de Price_Median vs Room_Median",
x = "Room_Median", y = "Price_Median") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Chart 3: scatter plot of Price_Median vs GDP_PC
ggplot(base_taller, aes(x = GDP_PC, y = Price_Median)) +
geom_point(color = "steelblue", alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = "Dispersión de Price_Median vs GDP_PC",
x = "GDP_PC", y = "Price_Median") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Chart 4: scatter plot of Price_Median vs URate
ggplot(base_taller, aes(x = URate, y = Price_Median)) +
geom_point(color = "steelblue", alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = "Dispersión de Price_Median vs URate",
x = "URate", y = "Price_Median") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Chart 5: scatter plot of Price_Median vs MIR2010
ggplot(base_taller, aes(x = MIR2010, y = Price_Median)) +
geom_point(color = "steelblue", alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = "Dispersión de Price_Median vs MIR2010",
x = "MIR2010", y = "Price_Median") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
# 1. LOAD LIBRARIES
library(car)
## Cargando paquete requerido: carData
##
## Adjuntando el paquete: 'car'
## The following object is masked from 'package:psych':
##
## logit
## The following object is masked from 'package:dplyr':
##
## recode
library(lmtest)
library(gridExtra)
##
## Adjuntando el paquete: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
# 2. PREPARE DATA
cat("=== PREPARACIÓN DE DATOS ===\n")
## === PREPARACIÓN DE DATOS ===
variables_modelo <- c("Price_Median", "Area_Median", "Room_Median", "Density",
"Population", "GDP_PC", "URate", "MIR2010")
# Keep only the variables that exist in base_taller
vars_disponibles <- variables_modelo[variables_modelo %in% names(base_taller)]
cat("Variables disponibles:", paste(vars_disponibles, collapse = ", "), "\n")
## Variables disponibles: Price_Median, Area_Median, Room_Median, Density, Population, GDP_PC, URate, MIR2010
datos_completos <- base_taller[, vars_disponibles]
datos_completos <- data.frame(lapply(datos_completos, as.numeric))
datos_completos <- na.omit(datos_completos)
cat("Observaciones finales:", nrow(datos_completos), "\n\n")
## Observaciones finales: 48
# 3. DESCRIPTIVE STATISTICS
cat("=== ESTADÍSTICOS DESCRIPTIVOS ===\n")
## === ESTADÍSTICOS DESCRIPTIVOS ===
print(describe(datos_completos))
## vars n mean sd median trimmed mad
## Price_Median 1 48 2488.72 1620.91 2167.76 2272.81 1316.80
## Area_Median 2 48 69.17 14.90 68.50 68.48 17.79
## Room_Median 3 48 2.49 0.50 2.25 2.49 0.37
## Density 4 48 4548.62 3520.04 3296.74 3928.59 1479.41
## Population 5 48 1885327.48 2431882.99 1132700.00 1322120.12 667765.26
## GDP_PC 6 48 32033.68 20843.98 30700.00 30494.03 18532.50
## URate 7 48 9.55 5.26 7.99 9.13 4.76
## MIR2010 8 48 7.03 6.84 3.82 5.60 1.26
## min max range skew kurtosis se
## Price_Median 503.07 8590.87 8087.80 1.54 2.98 233.96
## Area_Median 47.00 100.00 53.00 0.28 -0.99 2.15
## Room_Median 2.00 3.00 1.00 0.04 -2.02 0.07
## Density 1315.07 20617.90 19302.83 2.74 8.74 508.07
## Population 401389.00 12915158.00 12513769.00 3.15 9.93 351012.07
## GDP_PC 2534.91 86463.74 83928.83 0.53 -0.26 3008.57
## URate 1.70 23.12 21.42 0.75 -0.08 0.76
## MIR2010 0.92 26.20 25.28 1.82 2.32 0.99
# 4. EXPLORATORY PLOTS (ggplot2 + gridExtra, avoiding base-graphics margin errors)
cat("\n=== GRÁFICOS EXPLORATORIOS ===\n")
##
## === GRÁFICOS EXPLORATORIOS ===
# Histograms
plot_list <- list()
for(i in 1:ncol(datos_completos)) {
var_name <- colnames(datos_completos)[i]
p <- ggplot(datos_completos, aes(x = .data[[var_name]])) +
geom_histogram(fill = "lightblue", color = "black", bins = 10) +
labs(title = paste("Distribución de", var_name), x = var_name) +
theme_minimal()
plot_list[[i]] <- p
}
grid.arrange(grobs = plot_list, ncol = 3)
# Scatter plots against Price_Median
scatter_list <- list()
vars_ind <- setdiff(vars_disponibles, "Price_Median")
for(var in vars_ind) {
p <- ggplot(datos_completos, aes(x = .data[[var]], y = Price_Median)) +
geom_point(alpha = 0.6, color = "blue") +
geom_smooth(method = "lm", color = "red") +
labs(title = paste("Price_Median vs", var), x = var, y = "Price_Median") +
theme_minimal()
scatter_list[[var]] <- p
}
grid.arrange(grobs = scatter_list, ncol = 3)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
# 5. REGRESSION MODEL
cat("\n=== MODELO DE REGRESIÓN LINEAL MÚLTIPLE ===\n")
##
## === MODELO DE REGRESIÓN LINEAL MÚLTIPLE ===
# Build the model formula: Price_Median regressed on all other selected variables
formula_modelo <- as.formula(paste("Price_Median ~", paste(vars_ind, collapse = " + ")))
modelo <- lm(formula_modelo, data = datos_completos)
# Detailed results
summary_modelo <- summary(modelo)
print(summary_modelo)
##
## Call:
## lm(formula = formula_modelo, data = datos_completos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1839.24 -694.31 52.88 490.60 2596.36
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.487e+03 1.250e+03 1.190 0.241245
## Area_Median -1.699e+01 1.702e+01 -0.998 0.324133
## Room_Median 1.940e+02 5.286e+02 0.367 0.715582
## Density 1.844e-01 4.591e-02 4.016 0.000253 ***
## Population 1.173e-04 6.807e-05 1.724 0.092485 .
## GDP_PC 4.082e-02 1.112e-02 3.671 0.000706 ***
## URate -4.559e+01 3.445e+01 -1.323 0.193250
## MIR2010 -3.396e+01 3.510e+01 -0.967 0.339108
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1074 on 40 degrees of freedom
## Multiple R-squared: 0.6265, Adjusted R-squared: 0.5612
## F-statistic: 9.586 on 7 and 40 DF, p-value: 6.164e-07
# 6. MODEL DIAGNOSTICS
cat("\n=== DIAGNÓSTICOS DEL MODELO ===\n")
##
## === DIAGNÓSTICOS DEL MODELO ===
# Multicollinearity (variance inflation factors, VIF)
cat("ANÁLISIS DE MULTICOLINEALIDAD (VIF):\n")
## ANÁLISIS DE MULTICOLINEALIDAD (VIF):
vif_resultados <- vif(modelo)
print(data.frame(Variable = names(vif_resultados), VIF = round(vif_resultados, 2)))
## Variable VIF
## Area_Median Area_Median 2.62
## Room_Median Room_Median 2.85
## Density Density 1.06
## Population Population 1.12
## GDP_PC GDP_PC 2.19
## URate URate 1.34
## MIR2010 MIR2010 2.35
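The VIFs above can be reproduced from their definition, VIF_j = 1 / (1 - R²_j), where R²_j is the R² of an auxiliary regression of predictor j on all the other predictors. For example, Room_Median's VIF of 2.85 implies R²_j = 1 - 1/2.85 ≈ 0.65, well below the thresholds of 5 or 10 commonly used as rules of thumb. This sketch uses simulated predictors `x1`, `x2`, `x3` (hypothetical) and also checks an equivalent shortcut, the diagonal of the inverse correlation matrix of the predictors, which for purely numeric predictors should match what `car::vif()` reports.

```r
# Hypothetical simulated predictors; x2 is correlated with x1 by construction
set.seed(7)
n <- 48
x1 <- rnorm(n)
x2 <- 0.75 * x1 + rnorm(n, sd = 0.6)
x3 <- rnorm(n)
X <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R2_j), with R2_j from regressing x_j on the other predictors
vif_aux <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: diagonal of the inverse of the predictors' correlation matrix
vif_inv <- diag(solve(cor(X)))

print(round(rbind(auxiliary = vif_aux, inverse_cor = vif_inv), 3))
```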
# Assumption tests
cat("\nPRUEBAS DE SUPUESTOS:\n")
##
## PRUEBAS DE SUPUESTOS:
cat("Normalidad (Shapiro-Wilk):")
## Normalidad (Shapiro-Wilk):
shapiro_test <- shapiro.test(residuals(modelo))
print(shapiro_test)
##
## Shapiro-Wilk normality test
##
## data: residuals(modelo)
## W = 0.96001, p-value = 0.1011
cat("\nHomocedasticidad (Breusch-Pagan):")
##
## Homocedasticidad (Breusch-Pagan):
bp_test <- bptest(modelo)
print(bp_test)
##
## studentized Breusch-Pagan test
##
## data: modelo
## BP = 9.3483, df = 7, p-value = 0.2286
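The studentized Breusch-Pagan statistic reported above (BP = 9.3483, df = 7, one degree of freedom per regressor) corresponds to Koenker's version, which `lmtest::bptest()` uses by default: n times the R² of an auxiliary regression of the squared residuals on the model's regressors. A sketch on simulated data (hypothetical columns `x1`, `x2`, `y`); a p-value above 0.05, like the 0.2286 above, means homoscedasticity is not rejected.

```r
# Hypothetical simulated data with constant error variance
set.seed(11)
n <- 48
d <- data.frame(x1 = runif(n, 0, 10), x2 = runif(n, 0, 10))
d$y <- 2 + 0.5 * d$x1 - 0.3 * d$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2, data = d)

# Auxiliary regression of the squared OLS residuals on the same regressors
d$e2 <- residuals(fit)^2
aux <- lm(e2 ~ x1 + x2, data = d)

# Koenker's studentized LM statistic: n * R^2, chi-square with k = 2 df under H0
k <- 2
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = k, lower.tail = FALSE)
cat("BP =", round(bp_stat, 4), "| df =", k, "| p-value =", round(bp_p, 4), "\n")
```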
cat("\nAutocorrelación (Durbin-Watson):")
##
## Autocorrelación (Durbin-Watson):
dw_test <- dwtest(modelo)
print(dw_test)
##
## Durbin-Watson test
##
## data: modelo
## DW = 1.9592, p-value = 0.439
## alternative hypothesis: true autocorrelation is greater than 0
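The Durbin-Watson statistic compares each residual with the one in the previous row, d = sum((e_t - e_{t-1})^2) / sum(e_t^2), with values near 2 indicating no first-order autocorrelation. That makes it meaningful for time series or data with a natural ordering; for a cross-section of cities, the result depends on the row order of the file, so DW = 1.9592 (p = 0.439) should be read as a check of the current ordering only. The sketch below (simulated data, hypothetical names) shows that shuffling rows leaves the OLS coefficients unchanged but changes d.

```r
# Hypothetical simulated cross-section
set.seed(3)
n <- 48
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)

# Durbin-Watson d: squared differences of consecutive residuals over the RSS (0 <= d <= 4)
dw_stat <- function(fit) {
  e <- residuals(fit)
  sum(diff(e)^2) / sum(e^2)
}

fit_a <- lm(y ~ x, data = d)
# The same observations in a random row order
d_shuffled <- d[sample(n), ]
fit_b <- lm(y ~ x, data = d_shuffled)

cat("Same coefficients:", isTRUE(all.equal(coef(fit_a), coef(fit_b))), "\n")
cat("DW original order:", round(dw_stat(fit_a), 4),
    "| DW shuffled order:", round(dw_stat(fit_b), 4), "\n")
```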
# 7. DIAGNOSTIC PLOTS
cat("\n=== GRÁFICOS DE DIAGNÓSTICO ===\n")
##
## === GRÁFICOS DE DIAGNÓSTICO ===
par(mfrow = c(2, 2))
plot(modelo, which = 1:4)
par(mfrow = c(1, 1))
# 8. SUMMARY TABLE OF RESULTS
cat("\n=== TABLA RESUMEN DE RESULTADOS ===\n")
##
## === TABLA RESUMEN DE RESULTADOS ===
resultados_tabla <- data.frame(
Variable = c("Intercepto", vars_ind),
Coeficiente = round(coef(modelo), 4),
Error_Std = round(summary_modelo$coefficients[, 2], 4),
t_value = round(summary_modelo$coefficients[, 3], 3),
p_value = round(summary_modelo$coefficients[, 4], 4),
Significancia = ifelse(summary_modelo$coefficients[, 4] < 0.001, "***",
ifelse(summary_modelo$coefficients[, 4] < 0.01, "**",
ifelse(summary_modelo$coefficients[, 4] < 0.05, "*",
ifelse(summary_modelo$coefficients[, 4] < 0.1, ".", "ns"))))
)
print(resultados_tabla)
## Variable Coeficiente Error_Std t_value p_value Significancia
## (Intercept) Intercepto 1487.1538 1250.2043 1.190 0.2412 ns
## Area_Median Area_Median -16.9894 17.0183 -0.998 0.3241 ns
## Room_Median Room_Median 193.9879 528.6325 0.367 0.7156 ns
## Density Density 0.1844 0.0459 4.016 0.0003 ***
## Population Population 0.0001 0.0001 1.724 0.0925 .
## GDP_PC GDP_PC 0.0408 0.0111 3.671 0.0007 ***
## URate URate -45.5866 34.4492 -1.323 0.1933 ns
## MIR2010 MIR2010 -33.9564 35.0971 -0.967 0.3391 ns
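In the table above, Population's coefficient and standard error both print as 0.0001 because `round(..., 4)` truncates 1.173e-04. Rescaling a predictor by a constant c multiplies its slope by c and leaves t values and p-values unchanged, so expressing population in millions would give a slope of 1.173e-04 × 10^6 = 117.3 price units per million inhabitants, and GDP_PC in thousands would give 4.082e-02 × 1000 = 40.82 per thousand units of GDP per capita. A sketch on simulated data (the columns `pop`, `gdp`, `price`, `pop_mill`, `gdp_thou` are hypothetical):

```r
# Hypothetical simulated data with population and GDP per capita in raw units
set.seed(5)
n <- 48
d <- data.frame(pop = runif(n, 4e5, 1.3e7), gdp = runif(n, 2500, 86000))
d$price <- 1000 + 1.2e-4 * d$pop + 0.04 * d$gdp + rnorm(n, sd = 1000)
fit_raw <- lm(price ~ pop + gdp, data = d)

# Hypothetical rescaled units: population in millions, GDP per capita in thousands
d$pop_mill <- d$pop / 1e6
d$gdp_thou <- d$gdp / 1e3
fit_scaled <- lm(price ~ pop_mill + gdp_thou, data = d)

# Slopes are multiplied by the scale factor; t values (and p-values) do not change
print(round(cbind(raw = coef(fit_raw), rescaled = coef(fit_scaled)), 4))
print(round(cbind(t_raw = summary(fit_raw)$coefficients[, "t value"],
                  t_rescaled = summary(fit_scaled)$coefficients[, "t value"]), 3))
```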
# 9. AUTOMATED INTERPRETATION
cat("\n=== INTERPRETACIÓN DE RESULTADOS ===\n")
##
## === INTERPRETACIÓN DE RESULTADOS ===
# Goodness of fit
cat("BONDAD DE AJUSTE:\n")
## BONDAD DE AJUSTE:
cat("- R-cuadrado:", round(summary_modelo$r.squared, 4), "(", round(summary_modelo$r.squared * 100, 1), "%)\n")
## - R-cuadrado: 0.6265 ( 62.7 %)
cat("- R-cuadrado ajustado:", round(summary_modelo$adj.r.squared, 4), "\n")
## - R-cuadrado ajustado: 0.5612
cat("- Estadístico F:", round(summary_modelo$fstatistic[1], 2), "\n")
## - Estadístico F: 9.59
cat("- Valor-p del modelo:", pf(summary_modelo$fstatistic[1],
summary_modelo$fstatistic[2],
summary_modelo$fstatistic[3],
lower.tail = FALSE), "\n")
## - Valor-p del modelo: 6.16443e-07
# Significant variables (p < 0.05), excluding the intercept
vars_significativas <- rownames(summary_modelo$coefficients)[summary_modelo$coefficients[, 4] < 0.05]
vars_significativas <- vars_significativas[vars_significativas != "(Intercept)"]
cat("\nVARIABLES SIGNIFICATIVAS (p < 0.05):",
if(length(vars_significativas) > 0) paste(vars_significativas, collapse = ", ") else "Ninguna", "\n")
##
## VARIABLES SIGNIFICATIVAS (p < 0.05): Density, GDP_PC
# Most influential variables: standardized coefficients b_j * sd(x_j) / sd(y)
coef_estandarizados <- coef(modelo)[-1] * apply(model.matrix(modelo)[, -1], 2, sd) / sd(datos_completos$Price_Median)
top_variables <- names(sort(abs(coef_estandarizados), decreasing = TRUE)[1:3])
cat("VARIABLES MÁS INFLUYENTES:", paste(top_variables, collapse = ", "), "\n")
## VARIABLES MÁS INFLUYENTES: GDP_PC, Density, Population
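The standardized coefficients used above, b_j * sd(x_j) / sd(y), measure the change in Price_Median, in standard deviations, associated with a one-standard-deviation change in each predictor, which makes variables measured in different units comparable. They should coincide with the slopes of a regression run on z-scores; this sketch checks that on simulated data (hypothetical columns `x1`, `x2`, `y`). Note that the ranking ignores significance: Population ranks third even though its p-value is 0.092.

```r
# Hypothetical simulated data with predictors on very different scales
set.seed(9)
n <- 48
d <- data.frame(x1 = rnorm(n, 70, 15), x2 = rnorm(n, 32000, 20000))
d$y <- 1500 - 10 * d$x1 + 0.04 * d$x2 + rnorm(n, sd = 1000)

fit <- lm(y ~ x1 + x2, data = d)

# Formula used in the analysis code: b_j * sd(x_j) / sd(y)
beta_formula <- coef(fit)[-1] * sapply(d[, c("x1", "x2")], sd) / sd(d$y)

# Same values from a regression on z-scores of every variable
d_z <- as.data.frame(scale(d))
beta_z <- coef(lm(y ~ x1 + x2, data = d_z))[-1]

print(round(rbind(formula = beta_formula, z_scores = beta_z), 4))
```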
# Assumption checks: p > 0.05 means the test does not reject its null hypothesis
cat("\nVERIFICACIÓN DE SUPUESTOS:\n")
##
## VERIFICACIÓN DE SUPUESTOS:
cat("- Normalidad de residuos:", ifelse(shapiro_test$p.value > 0.05, "CUMPLE", "NO CUMPLE"),
"(p =", round(shapiro_test$p.value, 4), ")\n")
## - Normalidad de residuos: CUMPLE (p = 0.1011 )
cat("- Homocedasticidad:", ifelse(bp_test$p.value > 0.05, "CUMPLE", "NO CUMPLE"),
"(p =", round(bp_test$p.value, 4), ")\n")
## - Homocedasticidad: CUMPLE (p = 0.2286 )
cat("- Autocorrelación:", ifelse(dw_test$p.value > 0.05, "CUMPLE", "NO CUMPLE"),
"(p =", round(dw_test$p.value, 4), ")\n")
## - Autocorrelación: CUMPLE (p = 0.439 )
# 10. PREDICTIONS AND APPLICATIONS
cat("\n=== APLICACIONES PRÁCTICAS ===\n")
##
## === APLICACIONES PRÁCTICAS ===
cat("Ecuación del modelo:\n")
## Ecuación del modelo:
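coefs <- coef(modelo)
# "%+.4g" keeps 4 significant digits and an explicit sign, so small slopes such as
# Population's are not printed as 0 and negative terms do not appear as "+ -"
terminos <- sprintf("%+.4g * %s", coefs[-1], names(coefs)[-1])
cat("Price_Median =", sprintf("%.2f", coefs[1]), paste(terminos, collapse = " "), "\n")
## Price_Median = 1487.15 -16.99 * Area_Median +194 * Room_Median +0.1844 * Density +0.0001173 * Population +0.04082 * GDP_PC -45.59 * URate -33.96 * MIR2010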
# Prediction example
if(nrow(datos_completos) > 0) {
ejemplo_prediccion <- predict(modelo, newdata = datos_completos[1, , drop = FALSE])
cat("\nEjemplo de predicción para la primera observación:\n")
cat("Valor real:", datos_completos$Price_Median[1], "\n")
cat("Valor predicho:", round(ejemplo_prediccion, 2), "\n")
cat("Residuo:", round(datos_completos$Price_Median[1] - ejemplo_prediccion, 2), "\n")
}
##
## Ejemplo de predicción para la primera observación:
## Valor real: 3419.973
## Valor predicho: 3397.91
## Residuo: 22.07
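The single prediction above (3,397.91 against an actual value of 3,419.97) says nothing about uncertainty. With a residual standard error of 1,074, typical individual errors are on the order of a thousand units, and `predict()` can report this through intervals: `interval = "confidence"` for the mean price of cities with given characteristics, and `interval = "prediction"` for a single city, which is always wider. A sketch on simulated data; the new city's values are hypothetical.

```r
# Hypothetical simulated data
set.seed(21)
n <- 48
d <- data.frame(gdp = runif(n, 2500, 86000), dens = runif(n, 1300, 20000))
d$price <- 1000 + 0.04 * d$gdp + 0.18 * d$dens + rnorm(n, sd = 1000)
fit <- lm(price ~ gdp + dens, data = d)

# Hypothetical new city; values are illustrative, not taken from base_taller
new_city <- data.frame(gdp = 30000, dens = 3300)

# Confidence interval: uncertainty about the mean price of cities like this one
ci <- predict(fit, newdata = new_city, interval = "confidence", level = 0.95)
# Prediction interval: also includes the residual noise of an individual city
pred_int <- predict(fit, newdata = new_city, interval = "prediction", level = 0.95)
print(rbind(confidence = ci[1, ], prediction = pred_int[1, ]))
```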
cat("\n=== ANÁLISIS COMPLETADO ===\n")
##
## === ANÁLISIS COMPLETADO ===
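The multiple linear regression results show that Density and GDP_PC have a statistically significant effect on the median property price at the 5% level (both p < 0.001), while Population is significant only at the 10% level (p = 0.092). The model explains about 62.7% of the variability in prices (R² = 0.6265; adjusted R² = 0.5612), which indicates a moderate explanatory capacity. According to the standardized coefficients, the most influential variables are GDP_PC, Density and Population, suggesting that economic and urban-context factors, rather than dwelling characteristics, are the main determinants of prices; contrary to the a priori expectation, neither Area_Median nor Room_Median is significant. The checks of the OLS assumptions are satisfactory: the Shapiro-Wilk (p = 0.101), Breusch-Pagan (p = 0.229) and Durbin-Watson (p = 0.439) tests do not reject normality, homoscedasticity or the absence of autocorrelation, and all VIFs are below 3. This supports the robustness of the estimates, although failing to reject a null hypothesis does not prove that the assumption holds.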
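Area_Median and Room_Median are individually non-significant in the model, but they are strongly correlated with each other (0.748), and correlated predictors can be jointly relevant even when each individual t-test fails. A joint F-test of H0: β_Area = β_Room = 0, comparing the full model with a restricted one via `anova()`, addresses this directly. The sketch below uses simulated data (hypothetical columns `area`, `rooms`, `gdp`, `price`) and verifies the F statistic by hand: F = ((RSS_r - RSS_f) / q) / (RSS_f / df_f).

```r
# Hypothetical simulated data; rooms is correlated with area by construction
set.seed(8)
n <- 48
area <- rnorm(n, mean = 69, sd = 15)
rooms <- 0.025 * area + rnorm(n, sd = 0.35)
gdp <- runif(n, 2500, 86000)
d <- data.frame(area, rooms, gdp)
d$price <- 800 + 5 * d$area + 150 * d$rooms + 0.04 * d$gdp + rnorm(n, sd = 1000)

full <- lm(price ~ area + rooms + gdp, data = d)
restricted <- lm(price ~ gdp, data = d)   # drops both housing characteristics

# H0: the coefficients of area and rooms are both zero (q = 2 restrictions)
f_table <- anova(restricted, full)
print(f_table)

# The same F statistic from the residual sums of squares
rss_r <- sum(residuals(restricted)^2)
rss_f <- sum(residuals(full)^2)
q <- 2
f_manual <- ((rss_r - rss_f) / q) / (rss_f / df.residual(full))
cat("Manual F:", round(f_manual, 4), "\n")
```

On the actual data, the restricted model could be obtained with `update(modelo, . ~ . - Area_Median - Room_Median)` and compared through `anova(restricted, modelo)`; both models would be fitted on `datos_completos`, so they use the same rows, which the comparison requires.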