Numbeo Quality of Life Indices

The author explores the relationship between the Quality of Life Index and the Property Price to Income Ratio, as listed on Numbeo.com, a database of user-contributed data about cities and countries worldwide.

https://www.numbeo.com/quality-of-life/rankings.jsp

The dataset has the following indices, but we look only at columns 3 and 8 (Quality of Life Index and Property Price to Income Ratio), keeping column 2 (City) as a label. We remove one outlier (Caracas, Venezuela), which has a Property Price to Income Ratio of 143.21, nearly 3 times the second highest, likely due to inflation.

names(life)
##  [1] "Rank"                           "City"                          
##  [3] "Quality.of.Life.Index"          "Purchasing.Power.Index"        
##  [5] "Safety.Index"                   "Health.Care.Index"             
##  [7] "Cost.of.Living.Index"           "Property.Price.to.Income.Ratio"
##  [9] "Traffic.Commute.Time.Index"     "Pollution.Index"               
## [11] "Climate.Index"
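library(dplyr)  # provides %>% and filter()
life <- life[c(2, 3, 8)]  # keep City, Quality.of.Life.Index, Property.Price.to.Income.Ratio
life <- life %>% filter(Property.Price.to.Income.Ratio < 60)  # drops the Caracas outlier (143.21)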

Linear Regression

We perform a linear regression after noting a negative correlation of about -0.650 between the two variables.

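# The calls were hidden in the original output; reconstructed from the Call shown below
cor(life$Quality.of.Life.Index, life$Property.Price.to.Income.Ratio)
## [1] -0.6498441
lm1 <- lm(life$Quality.of.Life ~ life$Property.Price.to.Income.Ratio, data = life)
summary(lm1)
## 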
## Call:
## lm(formula = life$Quality.of.Life ~ life$Property.Price.to.Income.Ratio, 
##     data = life)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -79.276 -20.598   1.484  21.050  74.443 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)                         181.1469     3.2977   54.93   <2e-16
## life$Property.Price.to.Income.Ratio  -3.1301     0.2452  -12.77   <2e-16
##                                        
## (Intercept)                         ***
## life$Property.Price.to.Income.Ratio ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 28.06 on 223 degrees of freedom
## Multiple R-squared:  0.4223, Adjusted R-squared:  0.4197 
## F-statistic:   163 on 1 and 223 DF,  p-value: < 2.2e-16
plot(life$Property.Price.to.Income.Ratio, life$Quality.of.Life.Index)  # full column name instead of partial matching
abline(lm1, col="blue")
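
In a regression with a single predictor, the Multiple R-squared equals the square of the correlation between the two variables. A quick arithmetic check, using only the two numbers printed in the output above, shows that they agree:

```r
# Correlation as printed in the output above
r <- -0.6498441

# Squared correlation; in simple linear regression this is R-squared
r_squared <- r^2
round(r_squared, 4)  # compare with "Multiple R-squared: 0.4223"
```

So roughly 42% of the variance in the Quality of Life Index is accounted for by the Property Price to Income Ratio in this model.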

The regression line has a slope of -3.13, suggesting a drop of 3.13 points in the Quality of Life index for every 1-point increase in the ratio.
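
To make the slope concrete, the fitted line can be written out by hand from the printed coefficients. This is a minimal sketch: `predict_qol` is a hypothetical helper name, and the ratios 10 and 11 are arbitrary illustration values, not cities from the dataset.

```r
# Coefficients copied from the regression summary above
intercept <- 181.1469
slope     <- -3.1301

# Hypothetical helper: predicted Quality of Life Index for a given ratio
predict_qol <- function(ratio) intercept + slope * ratio

at_10 <- predict_qol(10)   # prediction at a ratio of 10
at_11 <- predict_qol(11)   # prediction at a ratio of 11
at_11 - at_10              # the difference is the slope
```

With the fitted model object available, `predict(lm1, ...)` would give the same numbers up to the rounding of the printed coefficients.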

Next, we plot the residuals against the fitted values:

plot(fitted(lm1),resid(lm1))
abline(h=0, col="blue")

For fitted Quality of Life values above 100, the residuals appear to be randomly distributed about 0, while for the sparse fitted values below 100, the model predicts higher QoL indices than actually observed.

hist(resid(lm1))

The histogram above suggests that the residuals are approximately normally distributed.

Other diagnostic plots for the model are displayed below; note the observation with index 196.

opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0),
            mar = c(4.1, 4.1, 2.1, 1.1))
plot(lm1)

par(opar)
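
The panels drawn by `plot()` on an `lm` object label the most extreme observations by row index. One numeric counterpart is Cook's distance, which measures how much each row influences the fit. A minimal sketch on a small made-up data set (the values in `toy` are hypothetical, not Numbeo data), in which the last point is deliberately placed far from the trend of the others:

```r
# Made-up data: ten points near a line, plus one distant point in row 11
toy <- data.frame(
  ratio = c(1:10, 50),
  qol   = c(180 - 3 * (1:10) + rep(c(1, -1), 5), 100)
)
fit <- lm(qol ~ ratio, data = toy)

cd <- cooks.distance(fit)          # influence of each row on the fit
flagged <- unname(which.max(cd))   # row index of the most influential point
flagged
```

Applied to the model in this report, the same call would be `cooks.distance(lm1)`.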

Three cities have observed values that lie far from the model's predictions, that is, residuals of large magnitude:

life[c(196,224,225),]
##                     City Quality.of.Life.Index
## 196 Hong Kong, Hong Kong                100.90
## 224    Dhaka, Bangladesh                 60.46
## 225       Lagos, Nigeria                 51.31
##     Property.Price.to.Income.Ratio
## 196                          49.42
## 224                          13.23
## 225                          17.23
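
As a rough check, the residuals for these three cities can be recomputed by hand from the printed coefficients and the values listed above. The column names `qol` and `ratio` are shortened here for illustration only:

```r
# Coefficients copied from the regression summary
intercept <- 181.1469
slope     <- -3.1301

# The three rows listed above
cities <- data.frame(
  City  = c("Hong Kong, Hong Kong", "Dhaka, Bangladesh", "Lagos, Nigeria"),
  qol   = c(100.90, 60.46, 51.31),
  ratio = c(49.42, 13.23, 17.23)
)

cities$fitted   <- intercept + slope * cities$ratio  # predicted index
cities$residual <- cities$qol - cities$fitted        # observed minus predicted
print(cities[, c("City", "fitted", "residual")], digits = 4)
```

The residuals come out at about 74.44 for Hong Kong, -79.28 for Dhaka, and -75.91 for Lagos. The first two match the Max and Min of the residual summary (74.443 and -79.276) up to the rounding of the coefficients.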