Final project
Data coding
Introduction
The dataset contains information on 2,930 properties in Ames, Iowa, sold between 2006 and 2010, with columns describing:
House features (bedrooms, garage, fireplace, pool, porch, etc.)
Location (neighborhood)
Lot information (zoning, shape, size, etc.)
Condition and quality ratings
Sale price
The modelling goal is to understand how a house's sale price (Sale_Price) varies with the other information available about it, such as its characteristics and location.
Descriptive statistics
na_count <- sapply(ames, function(col) sum(is.na(col)))  # number of missing values per column
na_count
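A quick check of the per-column count on a small made-up data frame (`toy` is hypothetical, not part of the report): `sum(is.na(col))` counts missing values directly, and base R's `colSums(is.na(df))` gives the same counts without `sapply()`, because `is.na()` applied to a data frame returns a logical matrix.

```r
# Hypothetical toy data frame with a few missing values
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = c(TRUE, FALSE, TRUE, FALSE)
)

# Per-column NA counts with sapply()
na_sapply <- sapply(toy, function(col) sum(is.na(col)))

# Vectorised equivalent: is.na() on a data frame gives a logical matrix
na_colsums <- colSums(is.na(toy))

print(na_sapply)
print(na_colsums)
```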
MS_SubClass MS_Zoning Lot_Frontage Lot_Area
0 0 0 0
Street Alley Lot_Shape Land_Contour
0 0 0 0
Utilities Lot_Config Land_Slope Neighborhood
0 0 0 0
Condition_1 Condition_2 Bldg_Type House_Style
0 0 0 0
Overall_Cond Year_Built Year_Remod_Add Roof_Style
0 0 0 0
Roof_Matl Exterior_1st Exterior_2nd Mas_Vnr_Type
0 0 0 0
Mas_Vnr_Area Exter_Cond Foundation Bsmt_Cond
0 0 0 0
Bsmt_Exposure BsmtFin_Type_1 BsmtFin_SF_1 BsmtFin_Type_2
0 0 0 0
BsmtFin_SF_2 Bsmt_Unf_SF Total_Bsmt_SF Heating
0 0 0 0
Heating_QC Central_Air Electrical First_Flr_SF
0 0 0 0
Second_Flr_SF Gr_Liv_Area Bsmt_Full_Bath Bsmt_Half_Bath
0 0 0 0
Full_Bath Half_Bath Bedroom_AbvGr Kitchen_AbvGr
0 0 0 0
TotRms_AbvGrd Functional Fireplaces Garage_Type
0 0 0 0
Garage_Finish Garage_Cars Garage_Area Garage_Cond
0 0 0 0
Paved_Drive Wood_Deck_SF Open_Porch_SF Enclosed_Porch
0 0 0 0
Three_season_porch Screen_Porch Pool_Area Pool_QC
0 0 0 0
Fence Misc_Feature Misc_Val Mo_Sold
0 0 0 0
Year_Sold Sale_Type Sale_Condition Sale_Price
0 0 0 0
Longitude Latitude
0 0
library(summarytools)
descr(ames)
Descriptive Statistics
ames
N: 2930
Bedroom_AbvGr Bsmt_Full_Bath Bsmt_Half_Bath Bsmt_Unf_SF BsmtFin_SF_1
----------------- --------------- ---------------- ---------------- ------------- --------------
Mean 2.85 0.43 0.06 559.07 4.18
Std.Dev 0.83 0.52 0.25 439.54 2.23
Min 0.00 0.00 0.00 0.00 0.00
Q1 2.00 0.00 0.00 219.00 3.00
Median 3.00 0.00 0.00 465.50 3.00
Q3 3.00 1.00 0.00 802.00 7.00
Max 8.00 3.00 2.00 2336.00 7.00
MAD 0.00 0.00 0.00 414.39 2.97
IQR 1.00 1.00 0.00 582.75 4.00
CV 0.29 1.22 4.01 0.79 0.53
Skewness 0.31 0.62 3.94 0.92 0.09
SE.Skewness 0.05 0.05 0.05 0.05 0.05
Kurtosis 1.88 -0.75 14.90 0.40 -1.51
N.Valid 2930.00 2930.00 2930.00 2930.00 2930.00
Pct.Valid 100.00 100.00 100.00 100.00 100.00
Table: Table continues below
BsmtFin_SF_2 Enclosed_Porch Fireplaces First_Flr_SF Full_Bath
----------------- -------------- ---------------- ------------ -------------- -----------
Mean 49.71 23.01 0.60 1159.56 1.57
Std.Dev 169.14 64.14 0.65 391.89 0.55
Min 0.00 0.00 0.00 334.00 0.00
Q1 0.00 0.00 0.00 876.00 1.00
Median 0.00 0.00 1.00 1084.00 2.00
Q3 0.00 0.00 1.00 1384.00 2.00
Max 1526.00 1012.00 4.00 5095.00 4.00
MAD 0.00 0.00 1.48 349.89 0.00
IQR 0.00 0.00 1.00 507.75 1.00
CV 3.40 2.79 1.08 0.34 0.35
Skewness 4.14 4.01 0.74 1.47 0.17
SE.Skewness 0.05 0.05 0.05 0.05 0.05
Kurtosis 18.74 28.42 0.10 6.95 -0.54
N.Valid 2930.00 2930.00 2930.00 2930.00 2930.00
Pct.Valid 100.00 100.00 100.00 100.00 100.00
Table: Table continues below
Garage_Area Garage_Cars Gr_Liv_Area Half_Bath Kitchen_AbvGr Latitude
----------------- ------------- ------------- ------------- ----------- --------------- ----------
Mean 472.66 1.77 1499.69 0.38 1.04 42.03
Std.Dev 215.19 0.76 505.51 0.50 0.21 0.02
Min 0.00 0.00 334.00 0.00 0.00 41.99
Q1 320.00 1.00 1126.00 0.00 1.00 42.02
Median 480.00 2.00 1442.00 0.00 1.00 42.03
Q3 576.00 2.00 1743.00 1.00 1.00 42.05
Max 1488.00 5.00 5642.00 2.00 3.00 42.06
MAD 182.36 0.00 461.09 0.00 0.00 0.02
IQR 256.00 1.00 616.75 1.00 0.00 0.03
CV 0.46 0.43 0.34 1.32 0.20 0.00
Skewness 0.24 -0.22 1.27 0.70 4.31 -0.49
SE.Skewness 0.05 0.05 0.05 0.05 0.05 0.05
Kurtosis 0.94 0.24 4.12 -1.03 19.82 -0.18
N.Valid 2930.00 2930.00 2930.00 2930.00 2930.00 2930.00
Pct.Valid 100.00 100.00 100.00 100.00 100.00 100.00
Table: Table continues below
Longitude Lot_Area Lot_Frontage Mas_Vnr_Area Misc_Val Mo_Sold
----------------- ----------- ----------- -------------- -------------- ---------- ---------
Mean -93.64 10147.92 57.65 101.10 50.64 6.22
Std.Dev 0.03 7880.02 33.50 178.63 566.34 2.71
Min -93.69 1300.00 0.00 0.00 0.00 1.00
Q1 -93.66 7440.00 43.00 0.00 0.00 4.00
Median -93.64 9436.50 63.00 0.00 0.00 6.00
Q3 -93.62 11556.00 78.00 163.00 0.00 8.00
Max -93.58 215245.00 313.00 1600.00 17000.00 12.00
MAD 0.03 3024.50 25.20 0.00 0.00 2.97
IQR 0.04 4115.00 35.00 162.75 0.00 4.00
CV 0.00 0.78 0.58 1.77 11.18 0.44
Skewness -0.31 12.81 0.03 2.62 21.98 0.19
SE.Skewness 0.05 0.05 0.05 0.05 0.05 0.05
Kurtosis -0.94 264.39 2.15 9.34 564.85 -0.46
N.Valid 2930.00 2930.00 2930.00 2930.00 2930.00 2930.00
Pct.Valid 100.00 100.00 100.00 100.00 100.00 100.00
Table: Table continues below
Open_Porch_SF Pool_Area Sale_Price Screen_Porch Second_Flr_SF
----------------- --------------- ----------- ------------ -------------- ---------------
Mean 47.53 2.24 180796.06 16.00 335.46
Std.Dev 67.48 35.60 79886.69 56.09 428.40
Min 0.00 0.00 12789.00 0.00 0.00
Q1 0.00 0.00 129500.00 0.00 0.00
Median 27.00 0.00 160000.00 0.00 0.00
Q3 70.00 0.00 213500.00 0.00 704.00
Max 742.00 800.00 755000.00 576.00 2065.00
MAD 40.03 0.00 54856.20 0.00 0.00
IQR 70.00 0.00 84000.00 0.00 703.75
CV 1.42 15.87 0.44 3.51 1.28
Skewness 2.53 16.92 1.74 3.95 0.87
SE.Skewness 0.05 0.05 0.05 0.05 0.05
Kurtosis 10.92 299.06 5.10 17.81 -0.42
N.Valid 2930.00 2930.00 2930.00 2930.00 2930.00
Pct.Valid 100.00 100.00 100.00 100.00 100.00
Table: Table continues below
Three_season_porch Total_Bsmt_SF TotRms_AbvGrd Wood_Deck_SF Year_Built
----------------- -------------------- --------------- --------------- -------------- ------------
Mean 2.59 1051.26 6.44 93.75 1971.36
Std.Dev 25.14 440.97 1.57 126.36 30.25
Min 0.00 0.00 2.00 0.00 1872.00
Q1 0.00 793.00 5.00 0.00 1954.00
Median 0.00 990.00 6.00 0.00 1973.00
Q3 0.00 1302.00 7.00 168.00 2001.00
Max 508.00 6110.00 15.00 1424.00 2010.00
MAD 0.00 349.89 1.48 0.00 37.06
IQR 0.00 508.50 2.00 168.00 47.00
CV 9.70 0.42 0.24 1.35 0.02
Skewness 11.39 1.15 0.75 1.84 -0.60
SE.Skewness 0.05 0.05 0.05 0.05 0.05
Kurtosis 149.63 9.08 1.15 6.73 -0.50
N.Valid 2930.00 2930.00 2930.00 2930.00 2930.00
Pct.Valid 100.00 100.00 100.00 100.00 100.00
Table: Table continues below
Year_Remod_Add Year_Sold
----------------- ---------------- -----------
Mean 1984.27 2007.79
Std.Dev 20.86 1.32
Min 1950.00 2006.00
Q1 1965.00 2007.00
Median 1993.00 2008.00
Q3 2004.00 2009.00
Max 2010.00 2010.00
MAD 20.76 1.48
IQR 39.00 2.00
CV 0.01 0.00
Skewness -0.45 0.13
SE.Skewness 0.05 0.05
Kurtosis -1.34 -1.16
N.Valid 2930.00 2930.00
Pct.Valid 100.00 100.00
The descriptive statistics show that lot area (Lot_Area) ranges from 1,300 to about 215,000 square feet, that the houses were built between 1872 and 2010, and that above-grade living area (Gr_Liv_Area) ranges from 334 to 5,642 square feet, with a mean of about 1,500 square feet. On average, a house has about 1.6 full bathrooms (plus 0.4 half bathrooms), one kitchen and three bedrooms (mean 2.85).
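Two of the less familiar statistics in the `descr()` output can be reproduced in base R. CV is the standard deviation divided by the mean: for Sale_Price, 79,886.69 / 180,796.06 rounds to the 0.44 in the table. For MAD, the sketch assumes the median absolute deviation scaled by 1.4826, as in R's `mad()`; this is consistent with values such as 1.48 for Fireplaces, but it is an assumption about summarytools, not a statement of its implementation. The vector `x` is made up for illustration.

```r
# Hypothetical sample of living areas in square feet (illustration only)
x <- c(900, 1100, 1250, 1400, 1500, 1750, 2100, 3200)

cv      <- sd(x) / mean(x)                 # coefficient of variation
mad_raw <- median(abs(x - median(x)))      # unscaled median absolute deviation
mad_r   <- mad(x)                          # R's mad(): 1.4826 * unscaled MAD by default
iqr     <- IQR(x)                          # Q3 - Q1 with R's default quantile type 7

round(c(CV = cv, MAD = mad_r, IQR = iqr), 2)

# Sale_Price check using Std.Dev and Mean from the table above
round(79886.69 / 180796.06, 2)
```

Note that quartiles and IQR can differ slightly across tools depending on the quantile definition used.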
Distribution of Sale_Price
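library(ggplot2)
library(dplyr)
# Histogram of the sale price on the original scale
ggplot(ames, aes(x = Sale_Price)) +
  geom_histogram(bins = 50, col = "white", fill = "lightblue")
# Same histogram with a logarithmic x axis
ggplot(ames, aes(x = Sale_Price)) +
  geom_histogram(bins = 50, col = "white", fill = "lightblue") +
  scale_x_log10()
# Replace Sale_Price with its base-10 logarithm for the rest of the analysis
ames <- ames %>% mutate(Sale_Price = log10(Sale_Price))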
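The log transform is motivated by the right skew of Sale_Price in the descriptive statistics (skewness 1.74, mean 180,796 above the median 160,000). The sketch below uses simulated log-normal "prices" rather than the Ames data to show that log10 removes most of this skew, and that values on the log10 scale are converted back with 10^x. The `skew_moment()` helper is a simple moment-based estimator written for illustration; summarytools may use a slightly different formula.

```r
set.seed(42)

# Simulated right-skewed prices (hypothetical: log-normal, median about 160,000)
price <- rlnorm(5000, meanlog = log(160000), sdlog = 0.4)

# Simple moment-based sample skewness (illustrative helper)
skew_moment <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

log_price <- log10(price)

# Skewness before and after the transform
c(raw = skew_moment(price), log10 = skew_moment(log_price))

# Back-transform: 10^log10(p) recovers p
all.equal(10^log_price, price)
```

Any prediction made on the transformed scale must be passed through 10^x before it can be read as a price in dollars.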
Sale_Price by neighborhood
Sale_Price by number of bedrooms
Sale_Price by house condition
Correlation between variables
Correlation coefficients for the variable pairs whose correlation exceeds 0.5 in absolute value, sorted from strongest to weakest (the Freq column holds the coefficient).
Var1 Var2 Freq
2250 BsmtFin_Type_1 BsmtFin_SF_1 0.9991444
4050 Garage_Cars Garage_Area 0.8898660
1650 Exterior_1st Exterior_2nd 0.8654165
3594 Gr_Liv_Area TotRms_AbvGrd 0.8077721
2921 Total_Bsmt_SF First_Flr_SF 0.8004287
1037 MS_SubClass Bldg_Type 0.7188418
2976 House_Style Second_Flr_SF 0.7175417
2400 BsmtFin_Type_2 BsmtFin_SF_2 -0.7113410
5296 Gr_Liv_Area Sale_Price 0.6958623
5308 Garage_Cars Sale_Price 0.6748777
3599 Bedroom_AbvGr TotRms_AbvGrd 0.6726472
3075 Second_Flr_SF Gr_Liv_Area 0.6552512
5309 Garage_Area Sale_Price 0.6507663
1942 Year_Built Foundation 0.6366324
3298 Gr_Liv_Area Full_Bath 0.6303208
5289 Total_Bsmt_SF Sale_Price 0.6256220
5272 Year_Built Sale_Price 0.6154845
1350 Year_Built Year_Remod_Add 0.6120953
3371 Second_Flr_SF Half_Bath 0.6116337
5294 First_Flr_SF Sale_Price 0.6026285
5273 Year_Remod_Add Sale_Price 0.5861531
3593 Second_Flr_SF TotRms_AbvGrd 0.5852137
3346 House_Style Half_Bath 0.5850323
5299 Full_Bath Sale_Price 0.5773341
4725 Pool_Area Pool_QC -0.5699490
3074 First_Flr_SF Gr_Liv_Area 0.5621658
3792 Year_Built Garage_Type -0.5430286
3940 Year_Built Garage_Cars 0.5379817
3597 Full_Bath TotRms_AbvGrd 0.5285992
3446 Gr_Liv_Area Bedroom_AbvGr 0.5168075
5306 Garage_Type Sale_Price -0.5047736
3445 Second_Flr_SF Bedroom_AbvGr 0.5046506
2683 Year_Remod_Add Heating_QC -0.5036757
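A list like the one above can be built from a correlation matrix by keeping each pair once (upper triangle) and filtering on |r| > 0.5; the Var1/Var2/Freq column names are what `as.data.frame()` returns for a table, which suggests an approach of this kind, but the report's exact code is not shown. The sketch uses the built-in `mtcars` data, whose columns are all numeric. In the list above, factor columns such as Exterior_1st and MS_SubClass also appear, presumably through their integer codes, so those coefficients depend on an arbitrary level order and should be read with caution.

```r
# Correlation matrix of the numeric columns of a built-in dataset
m <- cor(mtcars)

# Keep each pair only once: drop the diagonal and the lower triangle
m[lower.tri(m, diag = TRUE)] <- NA

# Long format: one row per (Var1, Var2) pair, coefficient in column Freq
cor_pairs <- as.data.frame(as.table(m))
cor_pairs <- cor_pairs[!is.na(cor_pairs$Freq) & abs(cor_pairs$Freq) > 0.5, ]

# Sort by absolute correlation, strongest first
cor_pairs <- cor_pairs[order(-abs(cor_pairs$Freq)), ]
head(cor_pairs)
```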
Sale_Price vs. year built
Sale_Price vs. total basement area
Sale_Price by garage type
Sale_Price vs. garage area
Regression
Lasso and Ridge
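For reference, the two penalised least-squares objectives in standard notation (the symbols are introduced here, not taken from the report): n observations, response y_i (presumably log10 Sale_Price), p predictors x_ij, and a penalty weight λ ≥ 0.

$$
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta_0,\,\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2
$$

$$
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta_0,\,\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert
$$

The intercept β_0 is not penalised, and predictors are usually standardised first so that the penalty treats them equally. With λ = 0 both reduce to ordinary least squares; as λ grows, ridge shrinks all coefficients towards zero, while the L1 penalty of the lasso can set some of them exactly to zero, so it also performs variable selection. Packages differ in how they scale the loss term (glmnet, for instance, divides it by 2n), which changes the numeric value of λ but not the family of solutions.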
Cross-validation and model selection
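A minimal base-R sketch of k-fold cross-validation for comparing candidate linear models by RMSE. It uses the built-in `mtcars` data and two hypothetical formulas, not the report's models; `rmse()` and `cv_rmse()` are illustrative helpers. The folds are drawn once and shared by both models so that they are compared on identical splits.

```r
set.seed(1)

# Root mean squared error between observed and predicted values
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Cross-validated RMSE, one value per fold (illustrative helper);
# assumes the left-hand side of the formula is a plain column name
cv_rmse <- function(formula, data, folds) {
  response <- all.vars(formula)[1]
  sapply(sort(unique(folds)), function(f) {
    fit  <- lm(formula, data = data[folds != f, ])  # train on the other folds
    test <- data[folds == f, ]                      # evaluate on fold f
    rmse(test[[response]], predict(fit, newdata = test))
  })
}

# Assign each row to one of k folds of (nearly) equal size
k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

# Two hypothetical candidate models; lower mean CV RMSE is preferred
m1 <- cv_rmse(mpg ~ wt, mtcars, folds)
m2 <- cv_rmse(mpg ~ wt + hp + qsec, mtcars, folds)
c(model_1 = mean(m1), model_2 = mean(m2))
```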
Random Forest
rmse
0.07736809
0.3616007
0.3566065
0.07481052
0.01971177
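Because Sale_Price was replaced by its log10 before modelling, these RMSE values are presumably on the log10 scale; this assumes every model was fitted to the transformed variable, which the output does not confirm. An RMSE of r on that scale corresponds roughly to a typical multiplicative error factor of 10^r on the price scale. The sketch converts the printed values without assigning them to specific models, since the output does not label them.

```r
# RMSE values as printed in the report (unlabelled in the output)
rmse_log10 <- c(0.07736809, 0.3616007, 0.3566065, 0.07481052, 0.01971177)

# Approximate typical multiplicative error factor on the original price scale
error_factor <- 10^rmse_log10

# The same factor expressed as an approximate percentage error
round(100 * (error_factor - 1), 1)
```

The two values near 0.36 are about five times larger than the others; if they really were on the log10 scale they would imply errors above 100%, so they may have been computed on a different response scale, which the output alone cannot settle.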