The author explores the relationship between a quality of life index and the property-price-to-income ratio, as listed on Numbeo.com, a database of user-contributed data about cities and countries worldwide.
https://www.numbeo.com/quality-of-life/rankings.jsp
The dataset has the following indices, but we look only at columns 3 (Quality.of.Life.Index) and 8 (Property.Price.to.Income.Ratio), keeping column 2 (City) as a label. We remove one outlier (Caracas, Venezuela), which has a Property Price to Income Ratio of 143.21, nearly 3 times the second highest, likely due to inflation.
names(life)
## [1] "ï..Rank" "City"
## [3] "Quality.of.Life.Index" "Purchasing.Power.Index"
## [5] "Safety.Index" "Health.Care.Index"
## [7] "Cost.of.Living.Index" "Property.Price.to.Income.Ratio"
## [9] "Traffic.Commute.Time.Index" "Pollution.Index"
## [11] "Climate.Index"
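The first column name, "ï..Rank", looks like the residue of a UTF-8 byte order mark (BOM) at the start of the CSV file. The import code is not shown, so the following is only a minimal sketch of one way to avoid it, assuming the data were read with `read.csv`. The temporary file and its single data row are made up for illustration.

```r
# Write a tiny CSV that starts with a UTF-8 BOM (hypothetical stand-in file).
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)                  # the BOM bytes
writeBin(charToRaw("Rank,City\n1,Example City\n"), con)     # header + one row
close(con)

# Declaring the encoding as "UTF-8-BOM" makes R drop the BOM on read,
# so the first column is named "Rank" rather than a mangled variant.
fixed <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(fixed)
```

The rest of the analysis does not use the Rank column, so the mangled name does not affect the results.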
life <- life[c(2,3,8)]
life <- life %>% filter(Property.Price.to.Income.Ratio < 60)
We perform a linear regression, after noting a negative correlation of about -0.650.
## [1] -0.6498441
##
## Call:
## lm(formula = life$Quality.of.Life ~ life$Property.Price.to.Income.Ratio,
## data = life)
##
## Residuals:
## Min 1Q Median 3Q Max
## -79.276 -20.598 1.484 21.050 74.443
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 181.1469 3.2977 54.93 <2e-16
## life$Property.Price.to.Income.Ratio -3.1301 0.2452 -12.77 <2e-16
##
## (Intercept) ***
## life$Property.Price.to.Income.Ratio ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28.06 on 223 degrees of freedom
## Multiple R-squared: 0.4223, Adjusted R-squared: 0.4197
## F-statistic: 163 on 1 and 223 DF, p-value: < 2.2e-16
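The code that produced the correlation and the model summary above is not shown. A minimal sketch of how such a fit might be obtained follows, run on synthetic data because the Numbeo table is not reproduced here. The data frame `life_toy`, the model name `lm1_toy`, and the simulated values are assumptions for illustration only; the column names mirror those in the dataset.

```r
# Synthetic stand-in for the real data (values are simulated, not Numbeo's).
set.seed(42)
n <- 225
ratio <- runif(n, min = 2, max = 50)
life_toy <- data.frame(
  Property.Price.to.Income.Ratio = ratio,
  Quality.of.Life.Index = 180 - 3 * ratio + rnorm(n, sd = 28)
)

# Pearson correlation between the two columns.
r <- cor(life_toy$Quality.of.Life.Index,
         life_toy$Property.Price.to.Income.Ratio)

# Simple linear regression of quality of life on the price-to-income ratio.
lm1_toy <- lm(Quality.of.Life.Index ~ Property.Price.to.Income.Ratio,
              data = life_toy)
summary(lm1_toy)

# In a one-predictor regression, R-squared equals the squared correlation.
c(r_squared_from_cor = r^2, r_squared_from_lm = summary(lm1_toy)$r.squared)
```

The same identity holds for the reported figures: squaring the correlation of -0.6498441 gives about 0.4223, the Multiple R-squared in the summary.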
plot(life$Property.Price.to.Income.Ratio, life$Quality.of.Life.Index)
abline(lm1, col="blue")
The regression line has a slope of -3.13, suggesting a drop of 3.13 points in the Quality of Life index for every 1-point increase in the ratio.
Next, we plot the residuals against the fitted values:
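To make the slope concrete, here is a small worked example using the coefficients as printed in the summary output. The helper `predicted_qol` is a hypothetical name introduced for illustration, and the ratios 10 and 11 are arbitrary example inputs.

```r
# Coefficients copied from the model summary (rounded as printed there).
intercept <- 181.1469
slope <- -3.1301

# Predicted Quality of Life index for a given price-to-income ratio.
predicted_qol <- function(ratio) intercept + slope * ratio

predicted_qol(c(10, 11))        # about 149.85 and 146.72
diff(predicted_qol(c(10, 11)))  # the slope, about -3.13
```

This describes an association across cities in the sample, not the effect of changing the ratio within a single city.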
plot(fitted(lm1),resid(lm1))
abline(h=0, col="blue")
For fitted Quality of Life values above 100, the residuals appear to be randomly scattered about 0, while for the sparse fitted values below 100, the model tends to predict higher QoL indices than actually observed.
hist(resid(lm1))
The histogram above suggests that the residuals are approximately normal.
Other related diagnostic plots are displayed here; note the observation with index 196.
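A histogram is only a visual check. As a hedged sketch of how the normality of residuals could be examined more formally, the example below applies a Shapiro-Wilk test and a normal Q-Q correlation to a model fitted on synthetic data. The objects `toy` and `toy_fit` and all simulated values are assumptions for illustration, not the Numbeo data.

```r
# Synthetic data with normally distributed errors (simulated, not Numbeo's).
set.seed(1)
ratio <- runif(200, min = 2, max = 50)
toy <- data.frame(
  Property.Price.to.Income.Ratio = ratio,
  Quality.of.Life.Index = 180 - 3 * ratio + rnorm(200, sd = 28)
)
toy_fit <- lm(Quality.of.Life.Index ~ Property.Price.to.Income.Ratio,
              data = toy)
res <- resid(toy_fit)

# Shapiro-Wilk test: a small p-value would be evidence against normality.
sw <- shapiro.test(res)
sw$p.value

# Correlation between theoretical and sample quantiles; values close to 1
# indicate that the points of a normal Q-Q plot lie near a straight line.
qq <- qqnorm(res, plot.it = FALSE)
cor(qq$x, qq$y)
```

On the real model, the same calls would be applied to `resid(lm1)`.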
opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0),
mar = c(4.1, 4.1, 2.1, 1.1))
plot(lm1)
par(opar)
Cities whose observed values deviate most from the model's predictions include the following three:
life[c(196,224,225),]
## City Quality.of.Life.Index
## 196 Hong Kong, Hong Kong 100.90
## 224 Dhaka, Bangladesh 60.46
## 225 Lagos, Nigeria 51.31
## Property.Price.to.Income.Ratio
## 196 49.42
## 224 13.23
## 225 17.23
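As a check on these three rows, the fitted values and residuals can be recomputed by hand from the coefficients printed in the model summary. This is a sketch using the rounded coefficients, so the results agree with the model only to about two decimal places; the data frame `cities` and its `Fitted` and `Residual` columns are names introduced here for illustration.

```r
# Coefficients copied from the model summary (rounded as printed there).
intercept <- 181.1469
slope <- -3.1301

# The three rows shown above, re-entered by hand.
cities <- data.frame(
  City = c("Hong Kong, Hong Kong", "Dhaka, Bangladesh", "Lagos, Nigeria"),
  Quality.of.Life.Index = c(100.90, 60.46, 51.31),
  Property.Price.to.Income.Ratio = c(49.42, 13.23, 17.23)
)

# Fitted value = intercept + slope * ratio; residual = observed - fitted.
cities$Fitted <- intercept + slope * cities$Property.Price.to.Income.Ratio
cities$Residual <- cities$Quality.of.Life.Index - cities$Fitted
cities[, c("City", "Fitted", "Residual")]
```

Hong Kong's residual comes out near +74.44 and Dhaka's near -79.28, matching the Max and Min residuals in the summary. Hong Kong scores far above what its very high ratio predicts, while Dhaka and Lagos score far below what their moderate ratios predict.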