This study analyzes the European real estate market at the request of a real estate company that wants to understand what drives property prices. The company operates in several cities and has provided data on housing characteristics (such as median area and median number of rooms) and on economic conditions (such as GDP per capita and the unemployment rate). First, the economic theory of real estate holds that a dwelling's physical characteristics (size, number of rooms, location) directly determine its market value, because they raise the utility perceived by the consumer (Rosen, 1974; Lancaster, 1966). We therefore expect median area (Area_Median) and the median number of rooms (Room_Median) to have a positive and significant relationship with property prices.
Among the demographic variables, the effect of population density (Density) is ambiguous, while total population (Population) is expected to push prices up. According to the theory of urban housing demand (Alonso, 1964; Muth, 1969), higher density can raise prices because it reflects stronger demand pressure; excessive density, however, can bring congestion and negative externalities that reduce willingness to pay for housing in those areas. A larger total population implies greater potential demand for property, hence the expected positive effect.
Among the macroeconomic variables, GDP per capita reflects the level of income available in each region and is expected to be positively related to housing prices, since higher incomes allow households to devote more resources to housing. The unemployment rate, by contrast, should lower prices, because it limits purchasing power and weakens housing demand. Similarly, high mortgage interest rates make credit more expensive and restrict households' ability to buy, which tends to ease upward pressure on prices.
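The expected signs discussed above can be collected in a small lookup table and turned into a model formula. The sketch below is only illustrative: treating MIR2010 as the mortgage-rate variable and using a plain linear specification are assumptions, and the names `expected_signs` and `model_formula` are hypothetical.

```r
# Expected sign of each regressor, as argued in the text ("?" = ambiguous).
# Assumption: MIR2010 is used as the mortgage interest rate variable.
expected_signs <- data.frame(
  variable = c("Area_Median", "Room_Median", "Density", "Population",
               "GDP_PC", "URate", "MIR2010"),
  expected_sign = c("+", "+", "?", "+", "+", "-", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical linear specification with Price_Median as the response;
# the functional form (levels vs. logs) is an assumption, not a result.
model_formula <- reformulate(expected_signs$variable, response = "Price_Median")

print(expected_signs)
print(model_formula)
```

Keeping the hypotheses in a table like this makes it easy to compare them later against the estimated coefficient signs.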
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(GGally)
library(Hmisc)
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, units
library(corrplot)
## corrplot 0.95 loaded
library(readxl)
# Load the Excel file
base_taller <- read_excel("C:/Users/carab/Downloads/Rosi_files/dp2015-13_Dataset.xls")
# Open the table in a new tab (interactive sessions only)
if (interactive()) View(base_taller)
head(base_taller)
## # A tibble: 6 × 184
## City City_Eng City_Short NAds Price_Median Price_Mean Area_Median Area_Mean
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Amste… Amsterd… AMS 9520 3420. 3426. 71 75.3
## 2 Athina Athens ATH 10782 2064. 2109. 76 77.2
## 3 Barce… Barcelo… BCN 5479 3140 3268. 77 80.8
## 4 Beogr… Belgrade BEG 12797 1417. 1466. 58 60.2
## 5 Berlin Berlin BER 16772 2150. 2314. 75 82.3
## 6 Bruxe… Brussels BRU 7879 2357. 2428. 95 98.4
## # ℹ 176 more variables: Room_Median <dbl>, Room_Mean <dbl>, Euro_area <dbl>,
## # EU <dbl>, Population <dbl>, City_Area <dbl>, Density <dbl>, GDP_PC <dbl>,
## # GDP_PC_PPS <dbl>, GDP_PC2008 <dbl>, GDP_PC2009 <dbl>, GDP_PC2010 <dbl>,
## # Gini <dbl>, HOR <dbl>, Kearny_GCI2010 <dbl>, LRIR <dbl>,
## # Inflation2010 <dbl>, Inflation2011 <dbl>, URate <dbl>, MIR2009 <dbl>,
## # MIR2010 <dbl>, Mortgage_PC2010 <dbl>, Tppl1989_1993 <dbl>,
## # Tppl1994_1998 <dbl>, Tppl1999_2002 <dbl>, Tppl2003_2006 <dbl>, …
tail(base_taller)
## # A tibble: 6 × 184
## City City_Eng City_Short NAds Price_Median Price_Mean Area_Median Area_Mean
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Tall… Tallinn TAL 8876 1062. 1059. 52.6 54.1
## 2 Tori… Turin TUR 8525 2225 2321. 75 80.5
## 3 Vale… Valencia VAL 14537 1741. 1849. 93 96.5
## 4 Viln… Vilnius VIL 2794 1251. 1289. 58 59.1
## 5 Wars… Warsaw WAW 155154 1934. 1976. 54 56.7
## 6 Wien Vienna VIE 10370 3571. 3657. 87 93.0
## # ℹ 176 more variables: Room_Median <dbl>, Room_Mean <dbl>, Euro_area <dbl>,
## # EU <dbl>, Population <dbl>, City_Area <dbl>, Density <dbl>, GDP_PC <dbl>,
## # GDP_PC_PPS <dbl>, GDP_PC2008 <dbl>, GDP_PC2009 <dbl>, GDP_PC2010 <dbl>,
## # Gini <dbl>, HOR <dbl>, Kearny_GCI2010 <dbl>, LRIR <dbl>,
## # Inflation2010 <dbl>, Inflation2011 <dbl>, URate <dbl>, MIR2009 <dbl>,
## # MIR2010 <dbl>, Mortgage_PC2010 <dbl>, Tppl1989_1993 <dbl>,
## # Tppl1994_1998 <dbl>, Tppl1999_2002 <dbl>, Tppl2003_2006 <dbl>, …
summary(base_taller)
## City City_Eng City_Short NAds
## Length:50 Length:50 Length:50 Min. : 576
## Class :character Class :character Class :character 1st Qu.: 4932
## Mode :character Mode :character Mode :character Median : 9694
## Mean : 17924
## 3rd Qu.: 15981
## Max. :155154
##
## Price_Median Price_Mean Area_Median Area_Mean
## Min. : 503.1 Min. : 534.7 Min. : 47.00 Min. : 51.12
## 1st Qu.:1305.3 1st Qu.:1322.1 1st Qu.: 55.03 1st Qu.: 59.10
## Median :2107.2 Median :2200.7 Median : 65.50 Median : 68.07
## Mean :2436.7 Mean :2535.0 Mean : 68.56 Mean : 72.14
## 3rd Qu.:3092.2 3rd Qu.:3176.5 3rd Qu.: 78.50 3rd Qu.: 82.63
## Max. :8590.9 Max. :8865.7 Max. :100.00 Max. :109.49
##
## Room_Median Room_Mean Euro_area EU Population
## Min. :2.00 Min. :1.805 Min. :0.0 Min. :0.0 Min. : 401389
## 1st Qu.:2.00 1st Qu.:2.172 1st Qu.:0.0 1st Qu.:0.0 1st Qu.: 799132
## Median :2.00 Median :2.476 Median :0.5 Median :1.0 Median : 1148799
## Mean :2.47 Mean :2.515 Mean :0.5 Mean :0.7 Mean : 1872288
## 3rd Qu.:3.00 3rd Qu.:2.832 3rd Qu.:1.0 3rd Qu.:1.0 3rd Qu.: 1707910
## Max. :3.00 Max. :3.394 Max. :1.0 Max. :1.0 Max. :12915158
##
## City_Area Density GDP_PC GDP_PC_PPS
## Min. : 84.8 Min. : 1315 Min. : 2535 Min. : 4696
## 1st Qu.: 169.9 1st Qu.: 2571 1st Qu.:11862 1st Qu.:15047
## Median : 375.2 Median : 3297 Median :29550 Median :30400
## Mean : 523.4 Mean : 4552 Mean :31026 Mean :31335
## 3rd Qu.: 500.0 3rd Qu.: 5551 3rd Qu.:42750 3rd Qu.:42750
## Max. :5512.0 Max. :20618 Max. :86464 Max. :76200
##
## GDP_PC2008 GDP_PC2009 GDP_PC2010 Gini
## Min. : 2535 Min. : 1872 Min. : 2301 Min. :23.60
## 1st Qu.:10537 1st Qu.: 9384 1st Qu.:10685 1st Qu.:28.35
## Median :25848 Median :24680 Median :25899 Median :31.85
## Mean :31226 Mean :28888 Mean :30422 Mean :32.89
## 3rd Qu.:47699 3rd Qu.:43128 3rd Qu.:48444 3rd Qu.:36.40
## Max. :88300 Max. :84421 Max. :85861 Max. :52.10
##
## HOR Kearny_GCI2010 LRIR Inflation2010
## Min. :12.80 Min. :0.0000 Min. : 0.690 Min. :-1.600
## 1st Qu.:30.02 1st Qu.:0.0000 1st Qu.: 3.770 1st Qu.: 1.450
## Median :66.15 Median :0.0000 Median : 4.515 Median : 2.000
## Mean :58.74 Mean :0.9976 Mean : 6.525 Mean : 3.885
## 3rd Qu.:83.25 3rd Qu.:2.3050 3rd Qu.:10.706 3rd Qu.: 8.400
## Max. :97.42 Max. :5.8600 Max. :15.970 Max. :10.500
##
## Inflation2011 URate MIR2009 MIR2010
## Min. : 1.200 Min. : 0.300 Min. : 1.260 Min. : 0.920
## 1st Qu.: 2.500 1st Qu.: 5.802 1st Qu.: 3.277 1st Qu.: 2.993
## Median : 3.100 Median : 7.990 Median : 4.360 Median : 3.815
## Mean : 4.816 Mean : 9.446 Mean : 7.457 Mean : 7.026
## 3rd Qu.: 5.275 3rd Qu.:11.908 3rd Qu.: 9.352 3rd Qu.: 8.615
## Max. :53.200 Max. :23.124 Max. :26.000 Max. :26.200
## NA's :6 NA's :2 NA's :2
## Mortgage_PC2010 Tppl1989_1993 Tppl1994_1998 Tppl1999_2002
## Min. : 0.190 Min. : 464773 Min. : 421249 Min. : 399685
## 1st Qu.: 0.395 1st Qu.: 702444 1st Qu.: 704303 1st Qu.: 722104
## Median : 6.475 Median :1067365 Median : 964346 Median : 993134
## Mean :10.195 Mean :1450095 Mean :1393183 Mean :1570878
## 3rd Qu.:14.090 3rd Qu.:1655272 3rd Qu.:1611954 3rd Qu.:1698320
## Max. :45.160 Max. :6829300 Max. :6901300 Max. :8803468
## NA's :2 NA's :17 NA's :19 NA's :14
## Tppl2003_2006 Tppl2007_2009 GDP_PC_PPS1989_1993 GDP_PC_PPS1994_1998
## Min. : 392306 Min. : 401389 Mode:logical Min. : 6900
## 1st Qu.: 689874 1st Qu.: 704162 NA's:50 1st Qu.:13675
## Median :1000174 Median :1021956 Median :20700
## Mean :1363199 Mean :1467902 Mean :23215
## 3rd Qu.:1622183 3rd Qu.:1695450 3rd Qu.:29500
## Max. :7413100 Max. :7668300 Max. :54300
## NA's :14 NA's :20 NA's :16
## GDP_PC_PPS1999_2002 GDP_PC_PPS2003_2006 GDP_PC_PPS2007_2009 CITIES
## Min. :10600 Min. :13900 Min. :16000 Length:50
## 1st Qu.:21175 1st Qu.:22700 1st Qu.:26550 Class :character
## Median :27800 Median :31100 Median :34300 Mode :character
## Mean :30556 Mean :33851 Mean :39020
## 3rd Qu.:37575 3rd Qu.:43300 3rd Qu.:48450
## Max. :65800 Max. :70100 Max. :76200
## NA's :16 NA's :15 NA's :15
## DemoDepend1989_1993 DemoDepend1994_1998 DemoDepend1999_2002
## Min. :48.10 Min. :49.40 Min. :48.40
## 1st Qu.:53.83 1st Qu.:53.40 1st Qu.:54.40
## Median :61.30 Median :61.00 Median :57.00
## Mean :60.58 Mean :59.36 Mean :57.36
## 3rd Qu.:66.72 3rd Qu.:63.80 3rd Qu.:59.30
## Max. :72.40 Max. :68.10 Max. :71.50
## NA's :22 NA's :25 NA's :19
## DemoDepend2003_2006 DemoDepend2007_2009 DemoODepend1989_1993
## Min. :48.20 Min. :45.10 Min. :11.60
## 1st Qu.:52.30 1st Qu.:51.85 1st Qu.:21.00
## Median :55.50 Median :55.40 Median :24.30
## Mean :57.08 Mean :56.45 Mean :24.02
## 3rd Qu.:59.90 3rd Qu.:59.80 3rd Qu.:26.30
## Max. :73.10 Max. :73.90 Max. :35.20
## NA's :17 NA's :23 NA's :17
## DemoODepend1994_1998 DemoODepend1999_2002 DemoODepend2003_2006
## Min. :19.90 Min. :17.80 Min. : 8.00
## 1st Qu.:22.10 1st Qu.:23.05 1st Qu.:22.30
## Median :24.90 Median :25.30 Median :25.60
## Mean :24.97 Mean :25.87 Mean :25.63
## 3rd Qu.:26.50 3rd Qu.:27.60 3rd Qu.:27.90
## Max. :33.80 Max. :39.70 Max. :41.20
## NA's :21 NA's :14 NA's :13
## DemoODepend2007_2009 Thh1989_1993 Thh1994_1998 Thh1999_2002
## Min. :16.50 Min. : 181700 Min. : 172189 Min. : 173215
## 1st Qu.:23.60 1st Qu.: 321757 1st Qu.: 300098 1st Qu.: 301566
## Median :27.25 Median : 468546 Median : 501000 Median : 459053
## Mean :27.29 Mean : 640428 Mean : 677489 Mean : 650780
## 3rd Qu.:29.25 3rd Qu.: 745727 3rd Qu.: 784475 3rd Qu.: 757578
## Max. :42.10 Max. :2841000 Max. :3002000 Max. :3015997
## NA's :20 NA's :20 NA's :27 NA's :13
## Thh2003_2006 Thh2007_2009 Ndwe1989_1993 Ndwe1994_1998
## Min. : 169913 Min. : 179823 Min. : 245810 Min. : 172218
## 1st Qu.: 315000 1st Qu.: 311600 1st Qu.: 336712 1st Qu.: 318326
## Median : 445413 Median : 441678 Median : 640641 Median : 483800
## Mean : 740042 Mean : 709285 Mean : 675929 Mean : 656970
## 3rd Qu.: 861892 3rd Qu.: 778750 3rd Qu.: 826938 3rd Qu.: 756840
## Max. :3112000 Max. :3243000 Max. :1734320 Max. :1792443
## NA's :23 NA's :27 NA's :35 NA's :43
## Ndwe1999_2002 Ndwe2003_2006 Ndwe2007_2009 Napart1989_1993
## Min. : 172753 Min. : 175200 Min. : 180500 Min. : 216902
## 1st Qu.: 318722 1st Qu.: 309745 1st Qu.: 300026 1st Qu.: 293286
## Median : 459753 Median : 473332 Median : 398317 Median : 493569
## Mean : 714348 Mean : 761303 Mean : 575072 Mean : 583210
## 3rd Qu.: 779094 3rd Qu.: 827429 3rd Qu.: 743266 3rd Qu.: 783494
## Max. :3401080 Max. :3414094 Max. :1890837 Max. :1128801
## NA's :13 NA's :19 NA's :29 NA's :46
## Napart1994_1998 Napart1999_2002 Napart2003_2006 Napart2007_2009
## Mode:logical Min. : 50230 Min. : 52280 Min. : 165200
## NA's:50 1st Qu.: 278208 1st Qu.: 287361 1st Qu.: 270472
## Median : 364509 Median : 371303 Median : 310492
## Mean : 533246 Mean : 542447 Mean : 579920
## 3rd Qu.: 651279 3rd Qu.: 660014 3rd Qu.: 663682
## Max. :1692262 Max. :1694180 Max. :1721929
## NA's :16 NA's :27 NA's :38
## Nhouse1989_1993 Nhouse1994_1998 Nhouse1999_2002 Nhouse2003_2006
## Min. : 10383 Mode:logical Min. : 5354 Min. : 10162
## 1st Qu.: 19111 NA's:50 1st Qu.: 21275 1st Qu.: 27275
## Median : 28908 Median : 43705 Median : 45387
## Mean : 50611 Mean : 103348 Mean : 74495
## 3rd Qu.: 40959 3rd Qu.: 92317 3rd Qu.: 98770
## Max. :153693 Max. :1553888 Max. :307987
## NA's :45 NA's :16 NA's :27
## Nhouse2007_2009 Aphouse1989_1993 Aphouse1994_1998 Aphouse1999_2002
## Min. : 1779 Min. : 477.3 Min. : 585.6 Min. : 160.0
## 1st Qu.: 35839 1st Qu.:1116.4 1st Qu.: 976.0 1st Qu.: 951.3
## Median : 70374 Median :1800.0 Median :1629.8 Median :1759.0
## Mean : 203085 Mean :1830.0 Mean :1797.1 Mean :1727.5
## 3rd Qu.: 144459 3rd Qu.:2350.0 3rd Qu.:2550.0 3rd Qu.:2476.8
## Max. :1570149 Max. :3700.0 Max. :3500.0 Max. :3784.0
## NA's :38 NA's :39 NA's :38 NA's :31
## Aphouse2003_2006 Aphouse2007_2009 ApapartMincome1989_1993
## Min. : 408.6 Min. : 238.7 Min. :0.1080
## 1st Qu.:1097.0 1st Qu.:1466.5 1st Qu.:0.1108
## Median :2200.0 Median :2800.0 Median :0.1190
## Mean :2187.3 Mean :2714.4 Mean :0.1270
## 3rd Qu.:2838.5 3rd Qu.:3833.0 3rd Qu.:0.1452
## Max. :4530.0 Max. :5399.9 Max. :0.1540
## NA's :27 NA's :31 NA's :44
## ApapartMincome1994_1998 ApapartMincome1999_2002 ApapartMincome2003_2006
## Min. :0.1070 Min. :0.0440 Min. :0.0800
## 1st Qu.:0.1138 1st Qu.:0.0775 1st Qu.:0.0915
## Median :0.1255 Median :0.0955 Median :0.1145
## Mean :0.1230 Mean :0.1095 Mean :0.1335
## 3rd Qu.:0.1305 3rd Qu.:0.1080 3rd Qu.:0.1610
## Max. :0.1380 Max. :0.3050 Max. :0.2660
## NA's :44 NA's :38 NA's :34
## ApapartMincome2007_2009 Arent-housing1989_1993 Arent-housing1994_1998
## Min. :0.0650 Mode:logical Mode:logical
## 1st Qu.:0.0950 NA's:50 NA's:50
## Median :0.1100
## Mean :0.1334
## 3rd Qu.:0.1340
## Max. :0.2870
## NA's :37
## Arent-housing1999_2002 Arent-housing2003_2006 Arent-housing2007_2009
## Min. : 3.00 Min. : 5.00 Min. : 8.00
## 1st Qu.:12.25 1st Qu.: 13.00 1st Qu.: 18.00
## Median :70.50 Median : 78.00 Median : 88.00
## Mean :55.00 Mean : 58.59 Mean : 76.06
## 3rd Qu.:80.50 3rd Qu.: 85.00 3rd Qu.: 99.00
## Max. :99.00 Max. :105.00 Max. :167.00
## NA's :34 NA's :33 NA's :33
## Alarea1989_1993 Alarea1994_1998 Alarea1999_2002 Alarea2003_2006
## Min. :14.91 Min. :16.70 Min. :13.20 Min. :14.80
## 1st Qu.:19.28 1st Qu.:21.06 1st Qu.:24.27 1st Qu.:32.71
## Median :32.75 Median :33.35 Median :34.90 Median :38.20
## Mean :28.58 Mean :30.16 Mean :31.65 Mean :34.29
## 3rd Qu.:35.12 3rd Qu.:38.12 3rd Qu.:38.00 3rd Qu.:40.35
## Max. :37.10 Max. :39.30 Max. :47.70 Max. :45.86
## NA's :32 NA's :38 NA's :19 NA's :30
## Alarea2007_2009 Phh-owndwe1989_1993 Phh-owndwe1994_1998 Phh-owndwe1999_2002
## Min. :15.85 Min. : 9.30 Min. :10.10 Min. :11.40
## 1st Qu.:27.03 1st Qu.:21.20 1st Qu.:16.45 1st Qu.:22.20
## Median :38.76 Median :39.90 Median :20.10 Median :50.00
## Mean :33.50 Mean :42.26 Mean :36.05 Mean :47.05
## 3rd Qu.:39.75 3rd Qu.:57.20 3rd Qu.:54.90 3rd Qu.:64.20
## Max. :46.44 Max. :86.50 Max. :78.10 Max. :88.80
## NA's :35 NA's :23 NA's :39 NA's :13
## Phh-owndwe2003_2006 Phh-owndwe2007_2009 Urate1989_1993 Urate1994_1998
## Min. :11.50 Min. :12.80 Min. : 1.300 Min. : 2.00
## 1st Qu.:19.27 1st Qu.:19.93 1st Qu.: 4.025 1st Qu.: 8.60
## Median :23.50 Median :21.25 Median : 7.150 Median : 9.30
## Mean :36.59 Mean :36.03 Mean :10.575 Mean :12.24
## 3rd Qu.:50.80 3rd Qu.:41.15 3rd Qu.:14.250 3rd Qu.:16.60
## Max. :84.70 Max. :85.20 Max. :42.700 Max. :27.80
## NA's :32 NA's :38 NA's :26 NA's :29
## Urate1999_2002 Urate2003_2006 Urate2007_2009 Ncom-head1989_1993
## Min. : 2.600 Min. : 3.300 Min. : 1.100 Mode:logical
## 1st Qu.: 5.250 1st Qu.: 6.700 1st Qu.: 5.225 NA's:50
## Median : 8.000 Median : 8.900 Median : 6.550
## Mean : 9.842 Mean : 9.134 Mean : 6.906
## 3rd Qu.:12.475 3rd Qu.:11.300 3rd Qu.: 8.550
## Max. :31.800 Max. :19.100 Max. :15.300
## NA's :14 NA's :15 NA's :32
## Ncom-head1994_1998 Ncom-head1999_2002 Ncom-head2003_2006 Ncom-head2007_2009
## Mode:logical Min. : 8.00 Min. : 2.00 Min. : 9.00
## NA's:50 1st Qu.: 23.25 1st Qu.: 12.50 1st Qu.: 21.75
## Median : 46.00 Median : 21.00 Median : 38.00
## Mean :171.55 Mean : 42.74 Mean : 64.42
## 3rd Qu.:117.00 3rd Qu.: 37.50 3rd Qu.: 68.75
## Max. :985.00 Max. :331.00 Max. :210.00
## NA's :28 NA's :27 NA's :38
## Mhhincome1989_1993 Mhhincome1994_1998 Mhhincome1999_2002 Mhhincome2003_2006
## Min. :11913 Min. : 1091 Min. : 1641 Min. : 2877
## 1st Qu.:14350 1st Qu.: 8118 1st Qu.:12148 1st Qu.:12475
## Median :14988 Median :15500 Median :17476 Median :17400
## Mean :14511 Mean :11525 Mean :15866 Mean :15766
## 3rd Qu.:15225 3rd Qu.:16000 3rd Qu.:21700 3rd Qu.:20600
## Max. :15700 Max. :17400 Max. :26490 Max. :26544
## NA's :42 NA's :35 NA's :25 NA's :33
## Mhhincome2007_2009 Ahhincome1989_1993 Ahhincome1994_1998 Ahhincome1999_2002
## Min. : 3437 Mode:logical Mode:logical Min. : 2873
## 1st Qu.:13587 NA's:50 NA's:50 1st Qu.:21302
## Median :21650 Median :24900
## Mean :18643 Mean :22507
## 3rd Qu.:23275 3rd Qu.:26022
## Max. :32210 Max. :38516
## NA's :36 NA's :35
## Ahhincome2003_2006 Ahhincome2007_2009 RQ1-Q4earn1989_1993 RQ1-Q4earn1994_1998
## Min. : 3592 Min. : 4278 Mode:logical Mode:logical
## 1st Qu.:21926 1st Qu.:23952 NA's:50 NA's:50
## Median :25150 Median :28200
## Mean :22890 Mean :24842
## 3rd Qu.:27525 3rd Qu.:30437
## Max. :40250 Max. :35917
## NA's :32 NA's :31
## RQ1-Q4earn1999_2002 RQ1-Q4earn2003_2006 RQ1-Q4earn2007_2009
## Min. :0.20 Min. :0.3000 Min. :0.3000
## 1st Qu.:0.30 1st Qu.:0.3000 1st Qu.:0.3000
## Median :0.40 Median :0.3000 Median :0.3000
## Mean :0.35 Mean :0.3429 Mean :0.3286
## 3rd Qu.:0.40 3rd Qu.:0.4000 3rd Qu.:0.3750
## Max. :0.50 Max. :0.4000 Max. :0.4000
## NA's :32 NA's :36 NA's :36
## HhincomeQ21989_1993 HhincomeQ21994_1998 HhincomeQ21999_2002
## Mode:logical Mode:logical Min. : 1374
## NA's:50 NA's:50 1st Qu.:11138
## Median :16500
## Mean :14535
## 3rd Qu.:18350
## Max. :28444
## NA's :32
## HhincomeQ22003_2006 HhincomeQ22007_2009 HhincomeQ31989_1993
## Min. : 2498 Min. : 2838 Mode:logical
## 1st Qu.:12900 1st Qu.:11096 NA's:50
## Median :16900 Median :18350
## Mean :14058 Mean :15866
## 3rd Qu.:17900 3rd Qu.:19975
## Max. :23502 Max. :27899
## NA's :37 NA's :36
## HhincomeQ31994_1998 HhincomeQ31999_2002 HhincomeQ32003_2006
## Mode:logical Min. : 2009 Min. : 3338
## NA's:50 1st Qu.:15869 1st Qu.:19509
## Median :22907 Median :23400
## Mean :20194 Mean :19306
## 3rd Qu.:25450 3rd Qu.:24500
## Max. :43628 Max. :30372
## NA's :32 NA's :37
## HhincomeQ32007_2009 Tlandarea1989_1993 Tlandarea1994_1998 Tlandarea1999_2002
## Min. : 3989 Min. : 39.0 Min. : 83.8 Min. : 38.9
## 1st Qu.:15830 1st Qu.: 139.5 1st Qu.: 144.1 1st Qu.: 158.3
## Median :25450 Median : 267.1 Median : 287.2 Median : 248.4
## Mean :21855 Mean : 363.2 Mean : 364.2 Mean : 382.6
## 3rd Qu.:27325 3rd Qu.: 487.8 3rd Qu.: 495.0 3rd Qu.: 494.0
## Max. :36527 Max. :1498.7 Max. :1285.3 Max. :1572.0
## NA's :36 NA's :26 NA's :23 NA's :17
## Tlandarea2003_2006 Tlandarea2007_2009 Larea-leisure1989_1993
## Min. : 38.9 Min. : 84.7 Min. :1.50
## 1st Qu.: 141.3 1st Qu.: 148.2 1st Qu.:3.05
## Median : 217.0 Median : 248.3 Median :4.60
## Mean : 327.9 Mean : 359.3 Mean :4.60
## 3rd Qu.: 426.2 3rd Qu.: 496.0 3rd Qu.:6.15
## Max. :1285.3 Max. :1307.7 Max. :7.70
## NA's :21 NA's :25 NA's :48
## Larea-leisure1994_1998 Larea-leisure1999_2002 Larea-leisure2003_2006
## Min. :2.000 Min. : 0.00 Min. : 2.40
## 1st Qu.:2.600 1st Qu.: 4.35 1st Qu.:10.40
## Median :3.200 Median : 9.45 Median :22.00
## Mean :3.667 Mean :15.34 Mean :21.76
## 3rd Qu.:4.500 3rd Qu.:24.60 3rd Qu.:34.20
## Max. :5.800 Max. :38.90 Max. :42.80
## NA's :47 NA's :32 NA's :33
## Larea-leisure2007_2009 Parea-housing1989_1993 Parea-housing1994_1998
## Min. : 5.80 Min. :34 Min. :16.50
## 1st Qu.:18.70 1st Qu.:34 1st Qu.:29.25
## Median :26.70 Median :34 Median :37.10
## Mean :25.09 Mean :34 Mean :38.79
## 3rd Qu.:31.18 3rd Qu.:34 3rd Qu.:44.05
## Max. :42.00 Max. :34 Max. :71.30
## NA's :38 NA's :49 NA's :43
## Parea-housing1999_2002 Parea-housing2003_2006 Parea-housing2007_2009
## Min. : 4.30 Min. :10.70 Min. :13.10
## 1st Qu.:14.75 1st Qu.:14.25 1st Qu.:15.12
## Median :20.20 Median :19.60 Median :18.00
## Mean :23.02 Mean :24.40 Mean :19.12
## 3rd Qu.:24.45 3rd Qu.:27.98 3rd Qu.:22.70
## Max. :72.00 Max. :72.10 Max. :28.60
## NA's :31 NA's :34 NA's :38
## Ppldens1989_1993 Ppldens1994_1998 Ppldens1999_2002 Ppldens2003_2006
## Min. : 1852 Min. : 2014 Min. : 1384 Min. : 1243
## 1st Qu.: 2674 1st Qu.: 2627 1st Qu.: 2510 1st Qu.: 2608
## Median : 3775 Median : 3816 Median : 3768 Median : 4030
## Mean : 5357 Mean : 4622 Mean : 5271 Mean : 5103
## 3rd Qu.: 6031 3rd Qu.: 5617 3rd Qu.: 5633 3rd Qu.: 6196
## Max. :19797 Max. :15240 Max. :20287 Max. :20467
## NA's :26 NA's :26 NA's :18 NA's :21
## Ppldens2007_2009 Netresidens-housingarea1989_1993
## Min. : 1313 Min. :48871
## 1st Qu.: 2486 1st Qu.:48871
## Median : 3306 Median :48871
## Mean : 4477 Mean :48871
## 3rd Qu.: 5778 3rd Qu.:48871
## Max. :16454 Max. :48871
## NA's :25 NA's :49
## Netresidens-housingarea1994_1998 Netresidens-housingarea1999_2002
## Min. : 7075 Min. : 6422
## 1st Qu.: 7362 1st Qu.: 13804
## Median :11080 Median : 18127
## Mean :18732 Mean : 42694
## 3rd Qu.:22451 3rd Qu.: 25980
## Max. :45694 Max. :465043
## NA's :46 NA's :31
## Netresidens-housingarea2003_2006 Netresidens-housingarea2007_2009
## Min. : 8265 Min. : 8537
## 1st Qu.:12476 1st Qu.:13269
## Median :16750 Median :16869
## Mean :17101 Mean :16338
## 3rd Qu.:20177 3rd Qu.:18911
## Max. :28404 Max. :22743
## NA's :33 NA's :38
## APApartment1989_1993 APApartment1994_1998 APApartment1999_2002
## Min. :1600 Min. : 848 Min. : 217.7
## 1st Qu.:1725 1st Qu.:1700 1st Qu.:1072.7
## Median :1800 Median :2000 Median :1342.5
## Mean :1883 Mean :1821 Mean :1425.4
## 3rd Qu.:1950 3rd Qu.:2150 3rd Qu.:1879.2
## Max. :2400 Max. :2200 Max. :2666.8
## NA's :44 NA's :43 NA's :34
## APApartment2003_2006 APApartment2007_2009 Temp_Jul1989_1993 Temp_Jul1994_1998
## Min. : 341 Min. : 962 Min. :17.40 Min. :14.80
## 1st Qu.:1352 1st Qu.:1660 1st Qu.:18.20 1st Qu.:20.85
## Median :2007 Median :2150 Median :19.00 Median :26.00
## Mean :2017 Mean :2468 Mean :20.77 Mean :25.33
## 3rd Qu.:2681 3rd Qu.:3232 3rd Qu.:22.45 3rd Qu.:29.55
## Max. :4486 Max. :5269 Max. :25.90 Max. :36.00
## NA's :23 NA's :29 NA's :47 NA's :39
## Temp_Jul1999_2002 Temp_Jul2003_2006 Temp_Jul2007_2009 Temp_Jan1989_1993
## Min. :18.50 Min. :16.00 Min. :16.70 Min. :-1.800
## 1st Qu.:20.00 1st Qu.:19.00 1st Qu.:18.50 1st Qu.: 0.300
## Median :21.00 Median :20.45 Median :20.30 Median : 2.400
## Mean :22.40 Mean :21.44 Mean :22.06 Mean : 3.333
## 3rd Qu.:25.12 3rd Qu.:24.52 3rd Qu.:25.12 3rd Qu.: 5.900
## Max. :31.50 Max. :29.20 Max. :32.00 Max. : 9.400
## NA's :20 NA's :16 NA's :22 NA's :47
## Temp_Jan1994_1998 Temp_Jan1999_2002 Temp_Jan2003_2006 Temp_Jan2007_2009
## Min. :-8.500 Min. :-7.200 Min. :-7.700 Min. :-3.000
## 1st Qu.: 1.800 1st Qu.:-0.625 1st Qu.:-1.475 1st Qu.: 0.375
## Median : 4.500 Median : 1.700 Median : 1.600 Median : 1.950
## Mean : 4.122 Mean : 2.480 Mean : 2.006 Mean : 2.754
## 3rd Qu.: 7.600 3rd Qu.: 6.150 3rd Qu.: 5.525 3rd Qu.: 3.650
## Max. :13.400 Max. :13.200 Max. :11.900 Max. :11.700
## NA's :41 NA's :20 NA's :16 NA's :22
## Latitude_deg Latitude_min Latitude_sec Longitude_deg
## Min. :37.00 Min. : 0.00 Min. : 0.00 Min. :-9.00
## 1st Qu.:44.25 1st Qu.:17.50 1st Qu.: 0.00 1st Qu.: 6.00
## Median :50.00 Median :28.00 Median : 0.00 Median :14.00
## Mean :48.72 Mean :29.98 Mean :14.07 Mean :17.44
## 3rd Qu.:53.00 3rd Qu.:45.75 3rd Qu.:32.75 3rd Qu.:26.75
## Max. :59.00 Max. :56.00 Max. :59.76 Max. :60.00
##
## Longitude_min Longitude_sec Lat Lon
## Min. :-59.00 Min. :-57.000 Min. :37.38 Min. :-9.185
## 1st Qu.: 6.00 1st Qu.: 0.000 1st Qu.:44.88 1st Qu.: 6.829
## Median : 21.00 Median : 0.000 Median :50.04 Median :14.336
## Mean : 20.18 Mean : 8.731 Mean :49.22 Mean :17.779
## 3rd Qu.: 39.25 3rd Qu.: 22.250 3rd Qu.:53.50 3rd Qu.:27.143
## Max. : 59.00 Max. : 59.000 Max. :59.93 Max. :60.583
##
## Liveability2010 Mercer_Qual_Liv2011 Mercer_Per_Safe2011 ECM2010
## Min. :61.00 Min. : 1.0 Min. : 5.00 Min. :0.0200
## 1st Qu.:80.00 1st Qu.:16.0 1st Qu.: 11.00 1st Qu.:0.0550
## Median :90.00 Median :30.0 Median : 20.50 Median :0.1000
## Mean :86.23 Mean :33.2 Mean : 34.94 Mean :0.1663
## 3rd Qu.:93.00 3rd Qu.:42.0 3rd Qu.: 40.50 3rd Qu.:0.2300
## Max. :98.00 Max. :84.0 Max. :199.00 Max. :0.8500
## NA's :19 NA's :25 NA's :34 NA's :23
## ECM_Cost2010
## Min. :0.0200
## 1st Qu.:0.1600
## Median :0.2700
## Mean :0.4637
## 3rd Qu.:0.6350
## Max. :1.4200
## NA's :23
# Count missing values (NA) per variable
faltantes <- data.frame(
  Variable = names(base_taller),
  NAs = colSums(is.na(base_taller))
)
faltantes
## Variable NAs
## City City 0
## City_Eng City_Eng 0
## City_Short City_Short 0
## NAds NAds 0
## Price_Median Price_Median 0
## Price_Mean Price_Mean 0
## Area_Median Area_Median 0
## Area_Mean Area_Mean 0
## Room_Median Room_Median 0
## Room_Mean Room_Mean 0
## Euro_area Euro_area 0
## EU EU 0
## Population Population 0
## City_Area City_Area 0
## Density Density 0
## GDP_PC GDP_PC 0
## GDP_PC_PPS GDP_PC_PPS 0
## GDP_PC2008 GDP_PC2008 0
## GDP_PC2009 GDP_PC2009 0
## GDP_PC2010 GDP_PC2010 0
## Gini Gini 0
## HOR HOR 0
## Kearny_GCI2010 Kearny_GCI2010 0
## LRIR LRIR 0
## Inflation2010 Inflation2010 0
## Inflation2011 Inflation2011 6
## URate URate 0
## MIR2009 MIR2009 2
## MIR2010 MIR2010 2
## Mortgage_PC2010 Mortgage_PC2010 2
## Tppl1989_1993 Tppl1989_1993 17
## Tppl1994_1998 Tppl1994_1998 19
## Tppl1999_2002 Tppl1999_2002 14
## Tppl2003_2006 Tppl2003_2006 14
## Tppl2007_2009 Tppl2007_2009 20
## GDP_PC_PPS1989_1993 GDP_PC_PPS1989_1993 50
## GDP_PC_PPS1994_1998 GDP_PC_PPS1994_1998 16
## GDP_PC_PPS1999_2002 GDP_PC_PPS1999_2002 16
## GDP_PC_PPS2003_2006 GDP_PC_PPS2003_2006 15
## GDP_PC_PPS2007_2009 GDP_PC_PPS2007_2009 15
## CITIES CITIES 13
## DemoDepend1989_1993 DemoDepend1989_1993 22
## DemoDepend1994_1998 DemoDepend1994_1998 25
## DemoDepend1999_2002 DemoDepend1999_2002 19
## DemoDepend2003_2006 DemoDepend2003_2006 17
## DemoDepend2007_2009 DemoDepend2007_2009 23
## DemoODepend1989_1993 DemoODepend1989_1993 17
## DemoODepend1994_1998 DemoODepend1994_1998 21
## DemoODepend1999_2002 DemoODepend1999_2002 14
## DemoODepend2003_2006 DemoODepend2003_2006 13
## DemoODepend2007_2009 DemoODepend2007_2009 20
## Thh1989_1993 Thh1989_1993 20
## Thh1994_1998 Thh1994_1998 27
## Thh1999_2002 Thh1999_2002 13
## Thh2003_2006 Thh2003_2006 23
## Thh2007_2009 Thh2007_2009 27
## Ndwe1989_1993 Ndwe1989_1993 35
## Ndwe1994_1998 Ndwe1994_1998 43
## Ndwe1999_2002 Ndwe1999_2002 13
## Ndwe2003_2006 Ndwe2003_2006 19
## Ndwe2007_2009 Ndwe2007_2009 29
## Napart1989_1993 Napart1989_1993 46
## Napart1994_1998 Napart1994_1998 50
## Napart1999_2002 Napart1999_2002 16
## Napart2003_2006 Napart2003_2006 27
## Napart2007_2009 Napart2007_2009 38
## Nhouse1989_1993 Nhouse1989_1993 45
## Nhouse1994_1998 Nhouse1994_1998 50
## Nhouse1999_2002 Nhouse1999_2002 16
## Nhouse2003_2006 Nhouse2003_2006 27
## Nhouse2007_2009 Nhouse2007_2009 38
## Aphouse1989_1993 Aphouse1989_1993 39
## Aphouse1994_1998 Aphouse1994_1998 38
## Aphouse1999_2002 Aphouse1999_2002 31
## Aphouse2003_2006 Aphouse2003_2006 27
## Aphouse2007_2009 Aphouse2007_2009 31
## ApapartMincome1989_1993 ApapartMincome1989_1993 44
## ApapartMincome1994_1998 ApapartMincome1994_1998 44
## ApapartMincome1999_2002 ApapartMincome1999_2002 38
## ApapartMincome2003_2006 ApapartMincome2003_2006 34
## ApapartMincome2007_2009 ApapartMincome2007_2009 37
## Arent-housing1989_1993 Arent-housing1989_1993 50
## Arent-housing1994_1998 Arent-housing1994_1998 50
## Arent-housing1999_2002 Arent-housing1999_2002 34
## Arent-housing2003_2006 Arent-housing2003_2006 33
## Arent-housing2007_2009 Arent-housing2007_2009 33
## Alarea1989_1993 Alarea1989_1993 32
## Alarea1994_1998 Alarea1994_1998 38
## Alarea1999_2002 Alarea1999_2002 19
## Alarea2003_2006 Alarea2003_2006 30
## Alarea2007_2009 Alarea2007_2009 35
## Phh-owndwe1989_1993 Phh-owndwe1989_1993 23
## Phh-owndwe1994_1998 Phh-owndwe1994_1998 39
## Phh-owndwe1999_2002 Phh-owndwe1999_2002 13
## Phh-owndwe2003_2006 Phh-owndwe2003_2006 32
## Phh-owndwe2007_2009 Phh-owndwe2007_2009 38
## Urate1989_1993 Urate1989_1993 26
## Urate1994_1998 Urate1994_1998 29
## Urate1999_2002 Urate1999_2002 14
## Urate2003_2006 Urate2003_2006 15
## Urate2007_2009 Urate2007_2009 32
## Ncom-head1989_1993 Ncom-head1989_1993 50
## Ncom-head1994_1998 Ncom-head1994_1998 50
## Ncom-head1999_2002 Ncom-head1999_2002 28
## Ncom-head2003_2006 Ncom-head2003_2006 27
## Ncom-head2007_2009 Ncom-head2007_2009 38
## Mhhincome1989_1993 Mhhincome1989_1993 42
## Mhhincome1994_1998 Mhhincome1994_1998 35
## Mhhincome1999_2002 Mhhincome1999_2002 25
## Mhhincome2003_2006 Mhhincome2003_2006 33
## Mhhincome2007_2009 Mhhincome2007_2009 36
## Ahhincome1989_1993 Ahhincome1989_1993 50
## Ahhincome1994_1998 Ahhincome1994_1998 50
## Ahhincome1999_2002 Ahhincome1999_2002 35
## Ahhincome2003_2006 Ahhincome2003_2006 32
## Ahhincome2007_2009 Ahhincome2007_2009 31
## RQ1-Q4earn1989_1993 RQ1-Q4earn1989_1993 50
## RQ1-Q4earn1994_1998 RQ1-Q4earn1994_1998 50
## RQ1-Q4earn1999_2002 RQ1-Q4earn1999_2002 32
## RQ1-Q4earn2003_2006 RQ1-Q4earn2003_2006 36
## RQ1-Q4earn2007_2009 RQ1-Q4earn2007_2009 36
## HhincomeQ21989_1993 HhincomeQ21989_1993 50
## HhincomeQ21994_1998 HhincomeQ21994_1998 50
## HhincomeQ21999_2002 HhincomeQ21999_2002 32
## HhincomeQ22003_2006 HhincomeQ22003_2006 37
## HhincomeQ22007_2009 HhincomeQ22007_2009 36
## HhincomeQ31989_1993 HhincomeQ31989_1993 50
## HhincomeQ31994_1998 HhincomeQ31994_1998 50
## HhincomeQ31999_2002 HhincomeQ31999_2002 32
## HhincomeQ32003_2006 HhincomeQ32003_2006 37
## HhincomeQ32007_2009 HhincomeQ32007_2009 36
## Tlandarea1989_1993 Tlandarea1989_1993 26
## Tlandarea1994_1998 Tlandarea1994_1998 23
## Tlandarea1999_2002 Tlandarea1999_2002 17
## Tlandarea2003_2006 Tlandarea2003_2006 21
## Tlandarea2007_2009 Tlandarea2007_2009 25
## Larea-leisure1989_1993 Larea-leisure1989_1993 48
## Larea-leisure1994_1998 Larea-leisure1994_1998 47
## Larea-leisure1999_2002 Larea-leisure1999_2002 32
## Larea-leisure2003_2006 Larea-leisure2003_2006 33
## Larea-leisure2007_2009 Larea-leisure2007_2009 38
## Parea-housing1989_1993 Parea-housing1989_1993 49
## Parea-housing1994_1998 Parea-housing1994_1998 43
## Parea-housing1999_2002 Parea-housing1999_2002 31
## Parea-housing2003_2006 Parea-housing2003_2006 34
## Parea-housing2007_2009 Parea-housing2007_2009 38
## Ppldens1989_1993 Ppldens1989_1993 26
## Ppldens1994_1998 Ppldens1994_1998 26
## Ppldens1999_2002 Ppldens1999_2002 18
## Ppldens2003_2006 Ppldens2003_2006 21
## Ppldens2007_2009 Ppldens2007_2009 25
## Netresidens-housingarea1989_1993 Netresidens-housingarea1989_1993 49
## Netresidens-housingarea1994_1998 Netresidens-housingarea1994_1998 46
## Netresidens-housingarea1999_2002 Netresidens-housingarea1999_2002 31
## Netresidens-housingarea2003_2006 Netresidens-housingarea2003_2006 33
## Netresidens-housingarea2007_2009 Netresidens-housingarea2007_2009 38
## APApartment1989_1993 APApartment1989_1993 44
## APApartment1994_1998 APApartment1994_1998 43
## APApartment1999_2002 APApartment1999_2002 34
## APApartment2003_2006 APApartment2003_2006 23
## APApartment2007_2009 APApartment2007_2009 29
## Temp_Jul1989_1993 Temp_Jul1989_1993 47
## Temp_Jul1994_1998 Temp_Jul1994_1998 39
## Temp_Jul1999_2002 Temp_Jul1999_2002 20
## Temp_Jul2003_2006 Temp_Jul2003_2006 16
## Temp_Jul2007_2009 Temp_Jul2007_2009 22
## Temp_Jan1989_1993 Temp_Jan1989_1993 47
## Temp_Jan1994_1998 Temp_Jan1994_1998 41
## Temp_Jan1999_2002 Temp_Jan1999_2002 20
## Temp_Jan2003_2006 Temp_Jan2003_2006 16
## Temp_Jan2007_2009 Temp_Jan2007_2009 22
## Latitude_deg Latitude_deg 0
## Latitude_min Latitude_min 0
## Latitude_sec Latitude_sec 0
## Longitude_deg Longitude_deg 0
## Longitude_min Longitude_min 0
## Longitude_sec Longitude_sec 0
## Lat Lat 0
## Lon Lon 0
## Liveability2010 Liveability2010 19
## Mercer_Qual_Liv2011 Mercer_Qual_Liv2011 25
## Mercer_Per_Safe2011 Mercer_Per_Safe2011 34
## ECM2010 ECM2010 23
## ECM_Cost2010 ECM_Cost2010 23
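Many of the historical variables are missing for most of the 50 cities, and some columns are entirely NA. A minimal sketch of how the share of missing values per column could be computed and used to screen variables is shown here on a toy data frame. The toy data, the 50% threshold, and the names `toy`, `na_share`, `keep`, and `toy_screened` are assumptions for illustration, not part of the original analysis.

```r
# Toy data frame standing in for base_taller (values are made up)
toy <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(NA, 2, NA, NA),
  c = c(NA, NA, NA, NA),
  d = c(1, NA, 3, 4)
)

# Share of missing values per column
na_share <- colMeans(is.na(toy))

# Keep only columns with at most 50% missing values (arbitrary threshold)
keep <- names(na_share)[na_share <= 0.5]
toy_screened <- toy[, keep, drop = FALSE]

print(round(na_share, 2))
print(names(toy_screened))
```

Working with shares rather than raw counts makes the threshold independent of the number of cities in the sample.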
# Load required libraries
library(psych)
##
## Attaching package: 'psych'
## The following object is masked from 'package:Hmisc':
##
## describe
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(dplyr)
library(ggplot2)
# Seleccionar variables de interés
vars_analisis <- c("Price_Median", "Area_Median", "Room_Median", "Density",
"Population", "GDP_PC", "URate", "MIR2010")
# Filtrar datos y convertir a numérico
datos_analisis <- base_taller[, vars_analisis]
datos_analisis <- data.frame(lapply(datos_analisis, as.numeric))
datos_analisis <- na.omit(datos_analisis)
# Estadísticos descriptivos básicos con summary()
cat("=== ESTADÍSTICOS DESCRIPTIVOS BÁSICOS ===\n")
## === ESTADÍSTICOS DESCRIPTIVOS BÁSICOS ===
summary_stats <- summary(datos_analisis)
print(summary_stats)
## Price_Median Area_Median Room_Median Density
## Min. : 503.1 Min. : 47.00 Min. :2.00 Min. : 1315
## 1st Qu.:1324.7 1st Qu.: 55.10 1st Qu.:2.00 1st Qu.: 2551
## Median :2167.8 Median : 68.50 Median :2.25 Median : 3297
## Mean :2488.7 Mean : 69.17 Mean :2.49 Mean : 4549
## 3rd Qu.:3127.2 3rd Qu.: 79.25 3rd Qu.:3.00 3rd Qu.: 5394
## Max. :8590.9 Max. :100.00 Max. :3.00 Max. :20618
## Population GDP_PC URate MIR2010
## Min. : 401389 Min. : 2535 Min. : 1.700 Min. : 0.920
## 1st Qu.: 784105 1st Qu.:19825 1st Qu.: 5.805 1st Qu.: 2.993
## Median : 1132700 Median :30700 Median : 7.990 Median : 3.815
## Mean : 1885327 Mean :32034 Mean : 9.545 Mean : 7.026
## 3rd Qu.: 1704168 3rd Qu.:43000 3rd Qu.:11.908 3rd Qu.: 8.615
## Max. :12915158 Max. :86464 Max. :23.124 Max. :26.200
# Estadísticos más detallados con psych::describe()
cat("\n=== ESTADÍSTICOS DESCRIPTIVOS DETALLADOS ===\n")
##
## === ESTADÍSTICOS DESCRIPTIVOS DETALLADOS ===
detallado_stats <- describe(datos_analisis)
print(round(detallado_stats, 3))
## vars n mean sd median trimmed mad
## Price_Median 1 48 2488.72 1620.91 2167.76 2272.81 1316.81
## Area_Median 2 48 69.17 14.90 68.50 68.48 17.79
## Room_Median 3 48 2.49 0.50 2.25 2.49 0.37
## Density 4 48 4548.62 3520.04 3296.74 3928.58 1479.41
## Population 5 48 1885327.48 2431882.99 1132700.00 1322120.12 667765.26
## GDP_PC 6 48 32033.68 20843.98 30700.00 30494.03 18532.50
## URate 7 48 9.54 5.26 7.99 9.13 4.76
## MIR2010 8 48 7.03 6.84 3.82 5.60 1.26
## min max range skew kurtosis se
## Price_Median 503.07 8590.87 8087.80 1.54 2.98 233.96
## Area_Median 47.00 100.00 53.00 0.28 -0.99 2.15
## Room_Median 2.00 3.00 1.00 0.04 -2.02 0.07
## Density 1315.07 20617.90 19302.83 2.74 8.74 508.07
## Population 401389.00 12915158.00 12513769.00 3.15 9.93 351012.07
## GDP_PC 2534.91 86463.74 83928.83 0.53 -0.26 3008.57
## URate 1.70 23.12 21.42 0.75 -0.08 0.76
## MIR2010 0.92 26.20 25.28 1.82 2.32 0.99
# Correlation matrix
cat("\n=== MATRIZ DE CORRELACIONES ===\n")
##
## === MATRIZ DE CORRELACIONES ===
cor_matrix <- cor(datos_analisis, use = "complete.obs")
print(round(cor_matrix, 3))
## Price_Median Area_Median Room_Median Density Population GDP_PC
## Price_Median 1.000 0.121 0.276 0.498 0.132 0.630
## Area_Median 0.121 1.000 0.748 0.061 0.060 0.304
## Room_Median 0.276 0.748 1.000 0.041 0.020 0.502
## Density 0.498 0.061 0.041 1.000 0.105 0.157
## Population 0.132 0.060 0.020 0.105 1.000 -0.172
## GDP_PC 0.630 0.304 0.502 0.157 -0.172 1.000
## URate -0.140 0.272 0.197 0.058 -0.097 -0.034
## MIR2010 -0.383 -0.545 -0.546 -0.090 0.017 -0.591
## URate MIR2010
## Price_Median -0.140 -0.383
## Area_Median 0.272 -0.545
## Room_Median 0.197 -0.546
## Density 0.058 -0.090
## Population -0.097 0.017
## GDP_PC -0.034 -0.591
## URate 1.000 -0.346
## MIR2010 -0.346 1.000
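The matrix above shows that Area_Median and Room_Median are correlated at 0.748, and GDP_PC and MIR2010 at -0.591, so some collinearity among the regressors is plausible. One common way to quantify it, which the original analysis does not include, is the variance inflation factor. The sketch below uses simulated data rather than the study's dataset, and `vif_manual` is a hypothetical helper written only for this illustration.

```r
# Simulated predictors (hypothetical values, not the study's data)
set.seed(123)
n <- 48
area  <- rnorm(n, mean = 70, sd = 15)
rooms <- 0.03 * area + rnorm(n, sd = 0.3)   # built to correlate with area
gdp   <- rnorm(n, mean = 32000, sd = 20000)
X <- data.frame(area, rooms, gdp)

# Hypothetical helper: VIF_j = 1 / (1 - R2_j), where R2_j is the R-squared
# from regressing predictor j on all the remaining predictors.
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    f  <- reformulate(setdiff(names(df), v), response = v)
    r2 <- summary(lm(f, data = df))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(X)
print(round(vifs, 2))
```

A VIF close to 1 suggests little overlap with the other regressors. Values well above that mean the coefficient's standard error is inflated, which could help explain why two correlated housing characteristics both come out non-significant.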
# Configurar tema para gráficos
theme_set(theme_minimal())
# A) HISTOGRAMAS DE DISTRIBUCIÓN
library(readxl)
library(ggplot2)
# Cargar la base de datos
datos <- read_excel("C:/Users/carab/Downloads/Rosi_files/dp2015-13_Dataset.xls")
# Variables principales
vars <- c("Price_Median", "Area_Median", "Room_Median",
"Population", "GDP_PC", "URate", "MIR2010")
# Generate one histogram per variable, with a legend
for (v in vars) {
  # aes() with the .data pronoun replaces the deprecated aes_string()
  p <- ggplot(datos, aes(x = .data[[v]], fill = v)) +
    geom_histogram(color = "black", bins = 20, alpha = 0.7) +
    labs(title = paste("Distribución de", v),
         x = v,
         y = "Frecuencia",
         fill = "Variable") +
    theme_minimal(base_size = 14) +
    theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
          axis.title = element_text(face = "bold"),
          legend.position = "right",
          panel.grid.major = element_line(color = "purple"))
  print(p) # Display each histogram
}
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_bin()`).
# c) GRÁFICO DE VIOLÍN PARA PRINCIPALES VARIABLES
cat("\nGenerando gráficos de violín...\n")
##
## Generando gráficos de violín...
library(vioplot)
## Cargando paquete requerido: sm
## Package 'sm', version 2.2-6.0: type help(sm) for summary information
## Cargando paquete requerido: zoo
##
## Adjuntando el paquete: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
par(mfrow = c(2, 2))
vioplot(datos_analisis$Price_Median, main = "Distribución Price_Median", col = "gold")
vioplot(datos_analisis$Area_Median, main = "Distribución Area_Median", col = "lightblue")
vioplot(datos_analisis$GDP_PC, main = "Distribución GDP_PC", col = "lightgreen")
vioplot(datos_analisis$URate, main = "Distribución URate", col = "pink")
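The `describe()` output reports positive skew for Price_Median (1.54), Density (2.74) and Population (3.15), and the distribution plots are consistent with long right tails. A log transformation is one common option in that situation, although the original analysis does not apply one. The sketch below uses simulated log-normal values, not the study's data. `skewness_simple` is a hypothetical helper based on the plain moment formula, so its values may differ slightly from those of `psych::describe()`.

```r
# Simulated right-skewed "prices" (hypothetical, not the study's data)
set.seed(42)
price_sim <- rlnorm(500, meanlog = 7.6, sdlog = 0.6)

# Hypothetical helper: moment-based sample skewness
skewness_simple <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

skew_raw <- skewness_simple(price_sim)
skew_log <- skewness_simple(log(price_sim))

cat("Skewness (raw):", round(skew_raw, 2), "\n")
cat("Skewness (log):", round(skew_log, 2), "\n")
```

If the logged variable is close to symmetric, a model in logs may behave better. Its coefficients would then be read as approximate percentage effects rather than level effects.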
# CARGAR LIBRERÍAS
library(leaflet)
library(readxl)
# CARGAR Y PREPARAR DATOS
datos <- read_excel("C:/Users/carab/Downloads/Rosi_files/dp2015-13_Dataset.xls")
# Verificar nombres de columnas reales
cat("Nombres de columnas en tu dataset:\n")
## Nombres de columnas en tu dataset:
print(names(datos))
## [1] "City" "City_Eng"
## [3] "City_Short" "NAds"
## [5] "Price_Median" "Price_Mean"
## [7] "Area_Median" "Area_Mean"
## [9] "Room_Median" "Room_Mean"
## [11] "Euro_area" "EU"
## [13] "Population" "City_Area"
## [15] "Density" "GDP_PC"
## [17] "GDP_PC_PPS" "GDP_PC2008"
## [19] "GDP_PC2009" "GDP_PC2010"
## [21] "Gini" "HOR"
## [23] "Kearny_GCI2010" "LRIR"
## [25] "Inflation2010" "Inflation2011"
## [27] "URate" "MIR2009"
## [29] "MIR2010" "Mortgage_PC2010"
## [31] "Tppl1989_1993" "Tppl1994_1998"
## [33] "Tppl1999_2002" "Tppl2003_2006"
## [35] "Tppl2007_2009" "GDP_PC_PPS1989_1993"
## [37] "GDP_PC_PPS1994_1998" "GDP_PC_PPS1999_2002"
## [39] "GDP_PC_PPS2003_2006" "GDP_PC_PPS2007_2009"
## [41] "CITIES" "DemoDepend1989_1993"
## [43] "DemoDepend1994_1998" "DemoDepend1999_2002"
## [45] "DemoDepend2003_2006" "DemoDepend2007_2009"
## [47] "DemoODepend1989_1993" "DemoODepend1994_1998"
## [49] "DemoODepend1999_2002" "DemoODepend2003_2006"
## [51] "DemoODepend2007_2009" "Thh1989_1993"
## [53] "Thh1994_1998" "Thh1999_2002"
## [55] "Thh2003_2006" "Thh2007_2009"
## [57] "Ndwe1989_1993" "Ndwe1994_1998"
## [59] "Ndwe1999_2002" "Ndwe2003_2006"
## [61] "Ndwe2007_2009" "Napart1989_1993"
## [63] "Napart1994_1998" "Napart1999_2002"
## [65] "Napart2003_2006" "Napart2007_2009"
## [67] "Nhouse1989_1993" "Nhouse1994_1998"
## [69] "Nhouse1999_2002" "Nhouse2003_2006"
## [71] "Nhouse2007_2009" "Aphouse1989_1993"
## [73] "Aphouse1994_1998" "Aphouse1999_2002"
## [75] "Aphouse2003_2006" "Aphouse2007_2009"
## [77] "ApapartMincome1989_1993" "ApapartMincome1994_1998"
## [79] "ApapartMincome1999_2002" "ApapartMincome2003_2006"
## [81] "ApapartMincome2007_2009" "Arent-housing1989_1993"
## [83] "Arent-housing1994_1998" "Arent-housing1999_2002"
## [85] "Arent-housing2003_2006" "Arent-housing2007_2009"
## [87] "Alarea1989_1993" "Alarea1994_1998"
## [89] "Alarea1999_2002" "Alarea2003_2006"
## [91] "Alarea2007_2009" "Phh-owndwe1989_1993"
## [93] "Phh-owndwe1994_1998" "Phh-owndwe1999_2002"
## [95] "Phh-owndwe2003_2006" "Phh-owndwe2007_2009"
## [97] "Urate1989_1993" "Urate1994_1998"
## [99] "Urate1999_2002" "Urate2003_2006"
## [101] "Urate2007_2009" "Ncom-head1989_1993"
## [103] "Ncom-head1994_1998" "Ncom-head1999_2002"
## [105] "Ncom-head2003_2006" "Ncom-head2007_2009"
## [107] "Mhhincome1989_1993" "Mhhincome1994_1998"
## [109] "Mhhincome1999_2002" "Mhhincome2003_2006"
## [111] "Mhhincome2007_2009" "Ahhincome1989_1993"
## [113] "Ahhincome1994_1998" "Ahhincome1999_2002"
## [115] "Ahhincome2003_2006" "Ahhincome2007_2009"
## [117] "RQ1-Q4earn1989_1993" "RQ1-Q4earn1994_1998"
## [119] "RQ1-Q4earn1999_2002" "RQ1-Q4earn2003_2006"
## [121] "RQ1-Q4earn2007_2009" "HhincomeQ21989_1993"
## [123] "HhincomeQ21994_1998" "HhincomeQ21999_2002"
## [125] "HhincomeQ22003_2006" "HhincomeQ22007_2009"
## [127] "HhincomeQ31989_1993" "HhincomeQ31994_1998"
## [129] "HhincomeQ31999_2002" "HhincomeQ32003_2006"
## [131] "HhincomeQ32007_2009" "Tlandarea1989_1993"
## [133] "Tlandarea1994_1998" "Tlandarea1999_2002"
## [135] "Tlandarea2003_2006" "Tlandarea2007_2009"
## [137] "Larea-leisure1989_1993" "Larea-leisure1994_1998"
## [139] "Larea-leisure1999_2002" "Larea-leisure2003_2006"
## [141] "Larea-leisure2007_2009" "Parea-housing1989_1993"
## [143] "Parea-housing1994_1998" "Parea-housing1999_2002"
## [145] "Parea-housing2003_2006" "Parea-housing2007_2009"
## [147] "Ppldens1989_1993" "Ppldens1994_1998"
## [149] "Ppldens1999_2002" "Ppldens2003_2006"
## [151] "Ppldens2007_2009" "Netresidens-housingarea1989_1993"
## [153] "Netresidens-housingarea1994_1998" "Netresidens-housingarea1999_2002"
## [155] "Netresidens-housingarea2003_2006" "Netresidens-housingarea2007_2009"
## [157] "APApartment1989_1993" "APApartment1994_1998"
## [159] "APApartment1999_2002" "APApartment2003_2006"
## [161] "APApartment2007_2009" "Temp_Jul1989_1993"
## [163] "Temp_Jul1994_1998" "Temp_Jul1999_2002"
## [165] "Temp_Jul2003_2006" "Temp_Jul2007_2009"
## [167] "Temp_Jan1989_1993" "Temp_Jan1994_1998"
## [169] "Temp_Jan1999_2002" "Temp_Jan2003_2006"
## [171] "Temp_Jan2007_2009" "Latitude_deg"
## [173] "Latitude_min" "Latitude_sec"
## [175] "Longitude_deg" "Longitude_min"
## [177] "Longitude_sec" "Lat"
## [179] "Lon" "Liveability2010"
## [181] "Mercer_Qual_Liv2011" "Mercer_Per_Safe2011"
## [183] "ECM2010" "ECM_Cost2010"
# Limpiar datos (usando nombres genéricos - ajústalos según tu dataset)
# Asumiendo que las columnas se llaman "Lon" y "Lat"
if ("Lon" %in% names(datos) && "Lat" %in% names(datos)) {
datos <- datos[complete.cases(datos$Lon, datos$Lat), ]
} else {
# Si tienen otros nombres, busca columnas que puedan ser coordenadas
coords_cols <- grep("lon|lat|longitud|latitud", names(datos), ignore.case = TRUE, value = TRUE)
cat("Columnas que podrían ser coordenadas:", paste(coords_cols, collapse = ", "), "\n")
stop("Ajusta los nombres de las columnas de coordenadas en el código")
}
# BUSCAR VARIABLE DE PIB (ajusta el nombre según tu dataset)
variable_pib <- NULL
posibles_nombres_pib <- c("GDP_PC", "PIB", "PIB_PC", "GDP", "PIB_per_capita", "GDP_per_capita")
for (nombre in posibles_nombres_pib) {
if (nombre %in% names(datos)) {
variable_pib <- nombre
break
}
}
if (is.null(variable_pib)) {
cat("No se encontró variable de PIB. Variables numéricas disponibles:\n")
numeric_vars <- names(datos)[sapply(datos, is.numeric)]
print(numeric_vars)
stop("Selecciona una variable numérica y ajusta 'variable_pib' en el código")
}
cat("Usando variable:", variable_pib, "\n")
## Usando variable: GDP_PC
# CONFIGURACIÓN DEL MAPA
centro_europa <- c(50, 10) # Latitud, Longitud (centro de Europa)
nivel_zoom <- 4
# CREAR EL MAPA
paleta_pib <- colorNumeric(
palette = c("#f7fbff", "#08306b"), # Azul claro a oscuro
domain = datos[[variable_pib]]
)
mapa <- leaflet(datos) %>%
setView(lng = centro_europa[2], lat = centro_europa[1], zoom = nivel_zoom) %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addCircleMarkers(
lng = ~Lon,
lat = ~Lat,
radius = 6,
fillColor = ~paleta_pib(datos[[variable_pib]]),
fillOpacity = 0.8,
stroke = TRUE,
color = "white",
weight = 1,
popup = ~paste(
"<b>PIB per cápita:</b> €",
format(round(datos[[variable_pib]], 0), big.mark = ",")
)
) %>%
addLegend(
position = "bottomright",
pal = paleta_pib,
values = datos[[variable_pib]],
title = "PIB per cápita (€)",
opacity = 0.9
) %>%
addScaleBar(
position = "bottomleft",
options = scaleBarOptions(metric = TRUE, imperial = FALSE)
)
# MOSTRAR MAPA
print(mapa)
cat("Mapa creado exitosamente!\n")
## Mapa creado exitosamente!
library(Hmisc)
library(corrplot)
vars <- c("Price_Median", "Area_Median", "Room_Median", "GDP_PC", "URate", "MIR2010")
# Asegurar que sean numéricas y eliminar NA
datos_cor <- base_taller[, vars]
datos_cor <- data.frame(lapply(datos_cor, as.numeric))
datos_cor <- na.omit(datos_cor)
# Calcular correlaciones
cor_test <- rcorr(as.matrix(datos_cor))
# Matriz de correlaciones y p-valores
matrizcor <- round(cor_test$r, 2) # round to 2 decimals
pval <- cor_test$P
# Reemplazar posibles NA
pval[is.na(pval)] <- 1
# Graficar correlación completa
corrplot(
matrizcor,
method = "color",
type = "full", # show the full matrix
tl.col = "black",
tl.srt = 45,
addCoef.col = "black", # mostrar coeficientes en cada celda
number.cex = 0.8,
p.mat = pval, # Matriz de p-valores
sig.level = 0.05, # Solo significativas
insig = "blank", # Ocultar las no significativas
title = "Mapa de calor de correlaciones significativas",
mar = c(0, 0, 2, 0)
)
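The heat map blanks out every cell whose p-value is 0.05 or higher. For Pearson correlations that p-value normally comes from a t statistic with n - 2 degrees of freedom, and the sketch below assumes this is what `rcorr()` reports. It reproduces the calculation by hand on simulated data (not the study's dataset) and compares it with `stats::cor.test()`.

```r
# Simulated pair of variables (hypothetical, not the study's data)
set.seed(1)
n <- 48
x <- rnorm(n)
y <- 0.3 * x + rnorm(n)

r <- cor(x, y)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat   <- r * sqrt((n - 2) / (1 - r^2))
p_manual <- 2 * pt(-abs(t_stat), df = n - 2)

# Built-in test for comparison
p_builtin <- cor.test(x, y)$p.value

cat("r =", round(r, 3),
    "| p (manual) =", signif(p_manual, 3),
    "| p (cor.test) =", signif(p_builtin, 3), "\n")
```

With about 48 observations, a moderate correlation can fall on either side of the 0.05 threshold. A blank cell in the plot is therefore weak evidence and does not show that the variables are unrelated.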
## Matriz de dispersión con pares de variables
# Seleccionar las variables
vars <- c("Price_Median", "Area_Median", "Room_Median", "GDP_PC", "URate", "MIR2010")
# Matriz de dispersión con otro color
pairs(base_taller[vars],
main = "Matriz de dispersión",
pch = 19, col = "blue") # puedes cambiar: "red", "purple", "orange", etc.
## Dispersión de Price_Median vs Area_Median
ggplot(base_taller, aes(x = Area_Median, y = Price_Median)) +
  geom_point(aes(color = "Puntos"), alpha = 0.6, size = 2) +
  geom_smooth(aes(color = "Línea de tendencia"), method = "lm", se = FALSE, linewidth = 1) +
  scale_color_manual(name = "Elementos",
                     values = c("Puntos" = "steelblue", "Línea de tendencia" = "red")) +
  labs(title = "Dispersión de Price_Median vs Area_Median",
       x = "Area_Median", y = "Price_Median") +
  theme_minimal() +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5, face = "bold"),
        axis.title = element_text(face = "bold"))
## `geom_smooth()` using formula = 'y ~ x'
## Gráfico 2 – Price_Median vs Room_Median
# Dispersión de Price_Median vs Room_Median con mejoras estéticas
ggplot(base_taller, aes(x = Room_Median, y = Price_Median)) +
geom_point(aes(color = "Puntos"), alpha = 0.6, size = 2) +
geom_smooth(aes(color = "Línea de tendencia"), method = "lm", se = FALSE, linewidth = 1) +
scale_color_manual(name = "Elementos",
values = c("Puntos" = "steelblue", "Línea de tendencia" = "red")) +
labs(title = "Dispersión de Price_Median vs Room_Median",
x = "Room_Median", y = "Price_Median") +
theme_minimal() +
theme(legend.position = "bottom",
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold"))
## `geom_smooth()` using formula = 'y ~ x'
## Gráfico 3
## Dispersión de Price_Median vs GDP_PC
ggplot(base_taller, aes(x = GDP_PC, y = Price_Median)) +
geom_point(aes(color = "Puntos"), alpha = 0.6, size = 2) +
geom_smooth(aes(color = "Línea de tendencia"), method = "lm", se = FALSE, linewidth = 1) +
scale_color_manual(name = "Elementos",
values = c("Puntos" = "steelblue", "Línea de tendencia" = "red")) +
labs(title = "Dispersión de Price_Median vs GDP_PC",
x = "GDP_PC", y = "Price_Median") +
theme_minimal() +
theme(legend.position = "bottom",
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold"))
## `geom_smooth()` using formula = 'y ~ x'
## Gráfico 4
## Dispersión de Price_Median vs URate
ggplot(base_taller, aes(x = URate, y = Price_Median)) +
geom_point(aes(color = "Puntos"), alpha = 0.6, size = 2) +
geom_smooth(aes(color = "Línea de tendencia"), method = "lm", se = FALSE, linewidth = 1) +
scale_color_manual(name = "Elementos",
values = c("Puntos" = "steelblue", "Línea de tendencia" = "red")) +
labs(title = "Dispersión de Price_Median vs URate",
x = "URate", y = "Price_Median") +
theme_minimal() +
theme(legend.position = "bottom",
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold"))
## `geom_smooth()` using formula = 'y ~ x'
## Gráfico 5
## Dispersión de Price_Median vs MIR2010
ggplot(base_taller, aes(x = MIR2010, y = Price_Median)) +
geom_point(aes(color = "Puntos"), alpha = 0.6, size = 2) +
geom_smooth(aes(color = "Línea de tendencia"), method = "lm", se = FALSE, linewidth = 1) +
scale_color_manual(name = "Elementos",
values = c("Puntos" = "steelblue", "Línea de tendencia" = "red")) +
labs(title = "Dispersión de Price_Median vs MIR2010",
x = "MIR2010", y = "Price_Median") +
theme_minimal() +
theme(legend.position = "bottom",
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold"))
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
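The "Removed 2 rows" warnings suggest that `base_taller` still has missing values in MIR2010, in Price_Median, or in both. The scatter plots use the raw table, not the `na.omit()` version used for the descriptive statistics. A quick check of where the gaps are can be sketched as follows. The `toy` data frame is hypothetical and stands in for the real table.

```r
# Hypothetical stand-in for the real table, with a few gaps
toy <- data.frame(
  Price_Median = c(3420, 2064, NA, 2150),
  MIR2010      = c(3.1, NA, 4.2, NA)
)

# Missing values per column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Rows usable by any analysis that needs both variables
n_complete <- sum(complete.cases(toy))
cat("Complete rows:", n_complete, "of", nrow(toy), "\n")
```

Running the same two calls on the actual data before plotting would show which cities drop out of each figure. It would also show whether the plots and the regression use the same sample.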
# 3: Modelo de regresión lineal múltiple
# 1. CARGAR LIBRERÍAS
library(gridExtra)
##
## Adjuntando el paquete: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
# 2. PREPARAR DATOS
cat("=== PREPARACIÓN DE DATOS ===\n")
## === PREPARACIÓN DE DATOS ===
variables_modelo <- c("Price_Median", "Area_Median", "Room_Median", "Density",
"Population", "GDP_PC", "URate", "MIR2010")
# Filtrar variables existentes
vars_disponibles <- variables_modelo[variables_modelo %in% names(base_taller)]
cat("Variables disponibles:", paste(vars_disponibles, collapse = ", "), "\n")
## Variables disponibles: Price_Median, Area_Median, Room_Median, Density, Population, GDP_PC, URate, MIR2010
datos_completos <- base_taller[, vars_disponibles]
datos_completos <- data.frame(lapply(datos_completos, as.numeric))
datos_completos <- na.omit(datos_completos)
cat("Observaciones finales:", nrow(datos_completos), "\n\n")
## Observaciones finales: 48
# 3. ESTADÍSTICOS DESCRIPTIVOS
cat("=== ESTADÍSTICOS DESCRIPTIVOS ===\n")
## === ESTADÍSTICOS DESCRIPTIVOS ===
print(summary(datos_completos))
## Price_Median Area_Median Room_Median Density
## Min. : 503.1 Min. : 47.00 Min. :2.00 Min. : 1315
## 1st Qu.:1324.7 1st Qu.: 55.10 1st Qu.:2.00 1st Qu.: 2551
## Median :2167.8 Median : 68.50 Median :2.25 Median : 3297
## Mean :2488.7 Mean : 69.17 Mean :2.49 Mean : 4549
## 3rd Qu.:3127.2 3rd Qu.: 79.25 3rd Qu.:3.00 3rd Qu.: 5394
## Max. :8590.9 Max. :100.00 Max. :3.00 Max. :20618
## Population GDP_PC URate MIR2010
## Min. : 401389 Min. : 2535 Min. : 1.700 Min. : 0.920
## 1st Qu.: 784105 1st Qu.:19825 1st Qu.: 5.805 1st Qu.: 2.993
## Median : 1132700 Median :30700 Median : 7.990 Median : 3.815
## Mean : 1885327 Mean :32034 Mean : 9.545 Mean : 7.026
## 3rd Qu.: 1704168 3rd Qu.:43000 3rd Qu.:11.908 3rd Qu.: 8.615
## Max. :12915158 Max. :86464 Max. :23.124 Max. :26.200
# 4. GRÁFICOS EXPLORATORIOS
cat("\n=== GRÁFICOS EXPLORATORIOS ===\n")
##
## === GRÁFICOS EXPLORATORIOS ===
# Histogramas
plot_list <- list()
for(i in 1:ncol(datos_completos)) {
var_name <- colnames(datos_completos)[i]
p <- ggplot(datos_completos, aes(x = .data[[var_name]])) +
geom_histogram(fill = "lightblue", color = "black", bins = 10) +
labs(title = paste("Distribución de", var_name), x = var_name) +
theme_minimal()
plot_list[[i]] <- p
}
grid.arrange(grobs = plot_list, ncol = 3)
# Diagramas de dispersión vs Price_Median
scatter_list <- list()
vars_ind <- setdiff(vars_disponibles, "Price_Median")
for(var in vars_ind) {
p <- ggplot(datos_completos, aes(x = .data[[var]], y = Price_Median)) +
geom_point(alpha = 0.6, color = "blue") +
geom_smooth(method = "lm", color = "red") +
labs(title = paste("Price_Median vs", var), x = var, y = "Price_Median") +
theme_minimal()
scatter_list[[var]] <- p
}
grid.arrange(grobs = scatter_list, ncol = 3)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
# 5. MODELO DE REGRESIÓN
cat("\n=== MODELO DE REGRESIÓN LINEAL MÚLTIPLE ===\n")
##
## === MODELO DE REGRESIÓN LINEAL MÚLTIPLE ===
# Especificar fórmula del modelo
formula_modelo <- as.formula(paste("Price_Median ~", paste(vars_ind, collapse = " + ")))
modelo <- lm(formula_modelo, data = datos_completos)
# Resultados detallados
summary_modelo <- summary(modelo)
print(summary_modelo)
##
## Call:
## lm(formula = formula_modelo, data = datos_completos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1839.24 -694.31 52.88 490.60 2596.36
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.487e+03 1.250e+03 1.190 0.241245
## Area_Median -1.699e+01 1.702e+01 -0.998 0.324133
## Room_Median 1.940e+02 5.286e+02 0.367 0.715582
## Density 1.844e-01 4.591e-02 4.016 0.000253 ***
## Population 1.173e-04 6.807e-05 1.724 0.092485 .
## GDP_PC 4.082e-02 1.112e-02 3.671 0.000706 ***
## URate -4.559e+01 3.445e+01 -1.323 0.193250
## MIR2010 -3.396e+01 3.510e+01 -0.967 0.339108
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1074 on 40 degrees of freedom
## Multiple R-squared: 0.6265, Adjusted R-squared: 0.5612
## F-statistic: 9.586 on 7 and 40 DF, p-value: 6.164e-07
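The raw coefficients above are expressed in very different units: Population is about 1.173e-04 per inhabitant, while URate is about -45.59 per percentage point. Their magnitudes therefore cannot be compared directly. One option, not part of the original analysis, is to report standardized coefficients as well. The sketch below uses simulated data with hypothetical variable names. It relies on the identity beta_j = b_j * sd(x_j) / sd(y).

```r
# Simulated data (hypothetical, not the study's dataset)
set.seed(7)
n <- 48
sim <- data.frame(
  density = rlnorm(n, meanlog = 8.2, sdlog = 0.6),
  gdp_pc  = rnorm(n, mean = 32000, sd = 20000)
)
sim$price <- 500 + 0.18 * sim$density + 0.04 * sim$gdp_pc + rnorm(n, sd = 1000)

# Model in original units
fit_raw <- lm(price ~ density + gdp_pc, data = sim)

# Same model after centring and scaling every variable
fit_std <- lm(price ~ density + gdp_pc, data = as.data.frame(scale(sim)))

# Equivalent manual conversion of the raw slopes
beta_manual <- coef(fit_raw)[-1] *
  sapply(sim[c("density", "gdp_pc")], sd) / sd(sim$price)

print(round(cbind(raw = coef(fit_raw)[-1],
                  standardized = coef(fit_std)[-1]), 4))
```

Standardized slopes are read as "standard deviations of price per standard deviation of the regressor". This makes it easier to compare a per-inhabitant effect with a per-euro effect.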
# 6. RESULTADOS RESUMIDOS EN TABLA
cat("\n=== TABLA RESUMEN DE RESULTADOS ===\n")
##
## === TABLA RESUMEN DE RESULTADOS ===
resultados_tabla <- data.frame(
Variable = c("Intercepto", vars_ind),
Coeficiente = round(coef(modelo), 4),
Error_Std = round(summary_modelo$coefficients[, 2], 4),
t_value = round(summary_modelo$coefficients[, 3], 3),
p_value = round(summary_modelo$coefficients[, 4], 4)
)
print(resultados_tabla)
## Variable Coeficiente Error_Std t_value p_value
## (Intercept) Intercepto 1487.1538 1250.2043 1.190 0.2412
## Area_Median Area_Median -16.9894 17.0183 -0.998 0.3241
## Room_Median Room_Median 193.9879 528.6325 0.367 0.7156
## Density Density 0.1844 0.0459 4.016 0.0003
## Population Population 0.0001 0.0001 1.724 0.0925
## GDP_PC GDP_PC 0.0408 0.0111 3.671 0.0007
## URate URate -45.5866 34.4492 -1.323 0.1933
## MIR2010 MIR2010 -33.9564 35.0971 -0.967 0.3391
# 7. INTERPRETACIÓN BÁSICA
cat("\n=== INTERPRETACIÓN DE RESULTADOS ===\n")
##
## === INTERPRETACIÓN DE RESULTADOS ===
# Bondad de ajuste
cat("BONDAD DE AJUSTE:\n")
## BONDAD DE AJUSTE:
cat("- R-cuadrado:", round(summary_modelo$r.squared, 4), "(", round(summary_modelo$r.squared * 100, 1), "%)\n")
## - R-cuadrado: 0.6265 ( 62.7 %)
cat("- R-cuadrado ajustado:", round(summary_modelo$adj.r.squared, 4), "\n")
## - R-cuadrado ajustado: 0.5612
cat("- Estadístico F:", round(summary_modelo$fstatistic[1], 2), "\n")
## - Estadístico F: 9.59
# Variables significativas
vars_significativas <- rownames(summary_modelo$coefficients)[summary_modelo$coefficients[, 4] < 0.05]
vars_significativas <- vars_significativas[vars_significativas != "(Intercept)"]
cat("\nVARIABLES SIGNIFICATIVAS (p < 0.05):",
if(length(vars_significativas) > 0) paste(vars_significativas, collapse = ", ") else "Ninguna", "\n")
##
## VARIABLES SIGNIFICATIVAS (p < 0.05): Density, GDP_PC
# 8. ECUACIÓN DEL MODELO
cat("\n=== ECUACIÓN DEL MODELO ===\n")
##
## === ECUACIÓN DEL MODELO ===
cat("Price_Median =", round(coef(modelo)[1], 2))
## Price_Median = 1487.15
for(i in 2:length(coef(modelo))) {
cat(" +", round(coef(modelo)[i], 2), "*", names(coef(modelo))[i])
}
## + -16.99 * Area_Median + 193.99 * Room_Median + 0.18 * Density + 0 * Population + 0.04 * GDP_PC + -45.59 * URate + -33.96 * MIR2010
cat("\n")
cat("\n=== ANÁLISIS COMPLETADO ===\n")
##
## === ANÁLISIS COMPLETADO ===
# 1. CARGAR LIBRERÍAS
library(knitr)
library(broom)
# 2. ESTIMACIÓN MCO
cat("=== ESTIMACIÓN POR MÍNIMOS CUADRADOS ORDINARIOS (MCO) ===\n")
## === ESTIMACIÓN POR MÍNIMOS CUADRADOS ORDINARIOS (MCO) ===
# Definir variables independientes (excluyendo Price_Median que es la dependiente)
variables_independientes <- setdiff(vars_disponibles, "Price_Median")
# Crear fórmula del modelo
formula_mco <- as.formula(paste("Price_Median ~", paste(variables_independientes, collapse = " + ")))
cat("Fórmula del modelo:", deparse(formula_mco), "\n\n")
## Fórmula del modelo: Price_Median ~ Area_Median + Room_Median + Density + Population + GDP_PC + URate + MIR2010
# Estimación MCO
modelo_mco <- lm(formula_mco, data = datos_completos)
# 3. RESULTADOS DE LA ESTIMACIÓN MCO
cat("=== RESULTADOS DEL MODELO MCO ===\n")
## === RESULTADOS DEL MODELO MCO ===
# Resumen completo
resumen_mco <- summary(modelo_mco)
print(resumen_mco)
##
## Call:
## lm(formula = formula_mco, data = datos_completos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1839.24 -694.31 52.88 490.60 2596.36
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.487e+03 1.250e+03 1.190 0.241245
## Area_Median -1.699e+01 1.702e+01 -0.998 0.324133
## Room_Median 1.940e+02 5.286e+02 0.367 0.715582
## Density 1.844e-01 4.591e-02 4.016 0.000253 ***
## Population 1.173e-04 6.807e-05 1.724 0.092485 .
## GDP_PC 4.082e-02 1.112e-02 3.671 0.000706 ***
## URate -4.559e+01 3.445e+01 -1.323 0.193250
## MIR2010 -3.396e+01 3.510e+01 -0.967 0.339108
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1074 on 40 degrees of freedom
## Multiple R-squared: 0.6265, Adjusted R-squared: 0.5612
## F-statistic: 9.586 on 7 and 40 DF, p-value: 6.164e-07
# 4. TABLA DE COEFICIENTES MCO FORMATO BONITO
cat("\n=== TABLA DE COEFICIENTES MCO ===\n")
##
## === TABLA DE COEFICIENTES MCO ===
tabla_coeficientes <- tidy(modelo_mco)
tabla_coeficientes$significancia <- ifelse(tabla_coeficientes$p.value < 0.001, "***",
ifelse(tabla_coeficientes$p.value < 0.01, "**",
ifelse(tabla_coeficientes$p.value < 0.05, "*",
ifelse(tabla_coeficientes$p.value < 0.1, ".", ""))))
print(tabla_coeficientes)
## # A tibble: 8 × 6
## term estimate std.error statistic p.value significancia
## <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 (Intercept) 1487. 1250. 1.19 0.241 ""
## 2 Area_Median -17.0 17.0 -0.998 0.324 ""
## 3 Room_Median 194. 529. 0.367 0.716 ""
## 4 Density 0.184 0.0459 4.02 0.000253 "***"
## 5 Population 0.000117 0.0000681 1.72 0.0925 "."
## 6 GDP_PC 0.0408 0.0111 3.67 0.000706 "***"
## 7 URate -45.6 34.4 -1.32 0.193 ""
## 8 MIR2010 -34.0 35.1 -0.967 0.339 ""
# Mostrar tabla formateada
kable(tabla_coeficientes,
col.names = c("Variable", "Coeficiente", "Error Estándar", "Estadístico t", "Valor p", "Significancia"),
digits = 4,
caption = "Resultados de la Estimación MCO")
| Variable | Coeficiente | Error Estándar | Estadístico t | Valor p | Significancia |
|---|---|---|---|---|---|
| (Intercept) | 1487.1538 | 1250.2043 | 1.1895 | 0.2412 | |
| Area_Median | -16.9894 | 17.0183 | -0.9983 | 0.3241 | |
| Room_Median | 193.9879 | 528.6325 | 0.3670 | 0.7156 | |
| Density | 0.1844 | 0.0459 | 4.0160 | 0.0003 | *** |
| Population | 0.0001 | 0.0001 | 1.7237 | 0.0925 | . |
| GDP_PC | 0.0408 | 0.0111 | 3.6711 | 0.0007 | *** |
| URate | -45.5866 | 34.4492 | -1.3233 | 0.1933 | |
| MIR2010 | -33.9564 | 35.0971 | -0.9675 | 0.3391 | |
# 5. ESTADÍSTICOS DE BONDAD DE AJUSTE
cat("\n=== BONDAD DE AJUSTE ===\n")
##
## === BONDAD DE AJUSTE ===
bondad_ajuste <- glance(modelo_mco)
cat("R-cuadrado:", round(bondad_ajuste$r.squared, 4), "(", round(bondad_ajuste$r.squared * 100, 1), "%)\n")
## R-cuadrado: 0.6265 ( 62.7 %)
cat("R-cuadrado ajustado:", round(bondad_ajuste$adj.r.squared, 4), "\n")
## R-cuadrado ajustado: 0.5612
cat("Estadístico F:", round(bondad_ajuste$statistic, 2), "\n")
## Estadístico F: 9.59
cat("Valor-p del modelo:", bondad_ajuste$p.value, "\n")
## Valor-p del modelo: 6.16443e-07
cat("Número de observaciones:", bondad_ajuste$nobs, "\n")
## Número de observaciones: 48
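The adjusted R-squared and the F statistic can both be recovered from the reported R-squared, the 48 observations and the 7 regressors. The sketch below plugs in the rounded figures printed above, so small discrepancies in the last decimal are expected.

```r
r2 <- 0.6265   # Multiple R-squared as reported above (rounded)
n  <- 48       # observations
k  <- 7        # regressors, excluding the intercept

# Adjusted R-squared penalises R-squared for the number of regressors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Overall F statistic for H0: all slopes are zero
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))

cat("Adjusted R-squared:", round(adj_r2, 3), "\n")
cat("F statistic:", round(f_stat, 2), "\n")
```

The gap between 0.6265 and roughly 0.56 is the price of estimating seven slopes with only 48 cities.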
# 6. ECUACIÓN DEL MODELO MCO
cat("\n=== ECUACIÓN ESTIMADA POR MCO ===\n")
##
## === ECUACIÓN ESTIMADA POR MCO ===
coeficientes <- coef(modelo_mco)
cat("Price_Median =", round(coeficientes[1], 4))
## Price_Median = 1487.154
for(i in 2:length(coeficientes)) {
signo <- ifelse(coeficientes[i] >= 0, "+", "")
cat(" ", signo, round(coeficientes[i], 4), "*", names(coeficientes)[i])
}
## -16.9894 * Area_Median + 193.9879 * Room_Median + 0.1844 * Density + 1e-04 * Population + 0.0408 * GDP_PC -45.5866 * URate -33.9564 * MIR2010
cat("\n")
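In the printed equation the Population coefficient appears as `1e-04`, because `round(x, 4)` discards most of its significant digits. A possible alternative is to format by significant digits and let `sprintf()` handle the signs. The values below are copied from the coefficient table above, and only a subset of terms is included to keep the sketch short.

```r
# Coefficients copied from the reported estimates (subset, for illustration)
coefs <- c("(Intercept)" = 1487.1538, Area_Median = -16.9894,
           Density = 0.1844, Population = 1.173e-04)

# "%+.4g" keeps four significant digits and always prints the sign
terms <- sprintf("%+.4g * %s", coefs[-1], names(coefs)[-1])

equation <- paste("Price_Median =",
                  sprintf("%.4g", coefs[1]),
                  paste(terms, collapse = " "))
cat(equation, "\n")
```

Formatting by significant digits keeps very small coefficients, such as a per-inhabitant effect, readable without changing the precision shown for the larger ones.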
# 7. INTERPRETACIÓN DE COEFICIENTES
cat("\n=== INTERPRETACIÓN DE COEFICIENTES ===\n")
##
## === INTERPRETACIÓN DE COEFICIENTES ===
for(i in 2:length(coeficientes)) {
var_name <- names(coeficientes)[i]
coef_value <- coeficientes[i]
p_value <- tabla_coeficientes$p.value[tabla_coeficientes$term == var_name]
cat("•", var_name, ":")
if(p_value < 0.05) {
cat(" SIGNIFICATIVO")
} else {
cat(" No significativo")
}
cat(" - Coeficiente:", round(coef_value, 4))
if(p_value < 0.05) {
if(coef_value > 0) {
cat(" (efecto positivo)")
} else {
cat(" (efecto negativo)")
}
}
cat(" | p-value:", round(p_value, 4), "\n")
}
## • Area_Median : No significativo - Coeficiente: -16.9894 | p-value: 0.3241
## • Room_Median : No significativo - Coeficiente: 193.9879 | p-value: 0.7156
## • Density : SIGNIFICATIVO - Coeficiente: 0.1844 (efecto positivo) | p-value: 3e-04
## • Population : No significativo - Coeficiente: 1e-04 | p-value: 0.0925
## • GDP_PC : SIGNIFICATIVO - Coeficiente: 0.0408 (efecto positivo) | p-value: 7e-04
## • URate : No significativo - Coeficiente: -45.5866 | p-value: 0.1933
## • MIR2010 : No significativo - Coeficiente: -33.9564 | p-value: 0.3391
# 8. PREDICCIONES DEL MODELO MCO
cat("\n=== PREDICCIONES MCO ===\n")
##
## === PREDICCIONES MCO ===
predicciones <- predict(modelo_mco)
residuos <- residuals(modelo_mco)
# Mostrar primeras 5 predicciones vs valores reales
cat("Primeras 5 observaciones (Real vs Predicho):\n")
## Primeras 5 observaciones (Real vs Predicho):
ejemplo_pred <- data.frame(
Real = datos_completos$Price_Median[1:5],
Predicho = round(predicciones[1:5], 2),
Residuo = round(residuos[1:5], 2)
)
print(ejemplo_pred)
## Real Predicho Residuo
## 1 3419.973 3397.91 22.07
## 2 2064.004 1955.39 108.62
## 3 3140.000 4570.87 -1430.87
## 5 2150.338 2020.00 130.33
## 6 2357.143 3138.36 -781.22
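Residuals such as the -1430.87 in the third row can be summarized with a few standard error measures. The residual standard error that `summary()` reports (1074 in this model) divides the residual sum of squares by the residual degrees of freedom. The in-sample RMSE divides by n and is therefore always somewhat smaller. The sketch below shows both on simulated data, not on the study's dataset.

```r
# Simulated regression (hypothetical data)
set.seed(99)
n  <- 48
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)
res <- residuals(fit)

rmse <- sqrt(mean(res^2))                     # divides by n
mae  <- mean(abs(res))                        # mean absolute error
rse  <- sqrt(sum(res^2) / df.residual(fit))   # divides by n - k - 1

cat("RMSE:", round(rmse, 3),
    "| MAE:", round(mae, 3),
    "| Residual SE:", round(rse, 3), "\n")
```

All three measures are computed in-sample, so they say nothing about out-of-sample accuracy. Assessing that would require a hold-out set or cross-validation, and the original analysis does neither.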
# 9. GRÁFICO DE PREDICCIONES VS REALES CON LEYENDA
# Crear un data frame para el gráfico
df_grafico <- data.frame(Real = datos_completos$Price_Median, Predicho = predicciones)
# Crear el gráfico
ggplot(df_grafico, aes(x = Real, y = Predicho)) +
geom_point(aes(color = "Predicciones"), alpha = 0.6) +
  # slope and intercept go inside aes() so the colour/linetype mapping (and legend) is not ignored
  geom_abline(data = data.frame(slope = 1, intercept = 0),
              aes(slope = slope, intercept = intercept,
                  color = "Línea de identidad (y=x)",
                  linetype = "Línea de identidad (y=x)")) +
  scale_color_manual(name = "Elementos",
                     values = c("Predicciones" = "blue", "Línea de identidad (y=x)" = "red")) +
  scale_linetype_manual(name = "Elementos", values = c("Línea de identidad (y=x)" = "dashed")) +
  labs(title = "Predicciones MCO vs Valores Reales",
       x = "Valor Real de Price_Median",
       y = "Valor Predicho por MCO") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        legend.position = "bottom")