The Numbeo website (www.numbeo.com) provides access to a variety of data. One table lists prices of certain items in selected cities around the world. The site also reports an overall cost-of-living index for each city, based on the costs of hundreds of items and expressed relative to New York City. For example, London at 110.69 is 10.69% more expensive than New York. The data file Cost_of_living_2013.txt includes the Cost of Living Index, a Rent Index, a Groceries Index, a Restaurant Price Index, and a Local Purchasing Power Index, which measures the ability of the average wage earner in a city to buy goods and services. All indices are measured relative to New York City, which is scored 100.
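Because every index is scaled so that New York City equals 100, the percentage difference from New York is simply the index minus 100. A minimal sketch of that arithmetic (the helper name `pct_vs_nyc` is hypothetical and not part of the data file or any package):

```r
# Hypothetical helper: percentage difference from New York City (index = 100).
# Positive values mean more expensive than New York, negative values less.
pct_vs_nyc <- function(index) index - 100

pct_vs_nyc(110.69)  # London: about 10.69, i.e. 10.69% more expensive
```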
setwd("C:/Users/cris-/Desktop/r111")
x <- read.table("Cost_of_living_2013.txt", sep = '\t', header = TRUE)
names(x) #getting to know my dataset
## [1] "City" "Cost.of.Living.Index"
## [3] "Rent.Index" "Groceries.Index"
## [5] "Restaurant.Price.Index" "Local.Purchasing.Power.Index"
Cost of Living Index vs. Rent Index
Moderate positive linear relationship
plot(Cost.of.Living.Index ~ Rent.Index, data = x)
Cost of Living Index vs. Groceries Index
Strong positive linear relationship
plot(Cost.of.Living.Index ~ Groceries.Index, data = x)
Cost of Living Index vs. Restaurant Price Index
Strong positive linear relationship
plot(Cost.of.Living.Index ~ Restaurant.Price.Index, data = x)
Cost of Living Index vs. Local Purchasing Power Index
Weak positive linear relationship
plot(Cost.of.Living.Index ~ Local.Purchasing.Power.Index, data = x)
Cost of Living Index vs. Rent Index
cor(x$Cost.of.Living.Index, x$Rent.Index)
## [1] 0.7722926
Cost of Living Index vs. Groceries Index
cor(x$Cost.of.Living.Index, x$Groceries.Index)
## [1] 0.9538616
Cost of Living Index vs. Restaurant Price Index
cor(x$Cost.of.Living.Index, x$Restaurant.Price.Index)
## [1] 0.9493554
Cost of Living Index vs. Local Purchasing Power Index
cor(x$Cost.of.Living.Index, x$Local.Purchasing.Power.Index)
## [1] 0.525902
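The four correlations above repeat the same call with a different predictor each time. As a sketch of how that could be done in one step, the example below uses the built-in `mtcars` data as a stand-in (an assumption, so that the code runs without the cost-of-living file), with `mpg` playing the role of the response:

```r
# Stand-in data: built-in mtcars, not the cost-of-living file.
d <- mtcars[, c("mpg", "wt", "hp", "disp")]

# Correlation of the response (first column) with each remaining column.
r <- sapply(d[-1], function(v) cor(v, d$mpg))
round(r, 3)
```

With the data frame `x` from this report, the same pattern would presumably apply to the four index columns, with `Cost.of.Living.Index` as the response.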
Model 1
m1 <- lm(Cost.of.Living.Index ~ Rent.Index, data = x)
coef(m1)
## (Intercept) Rent.Index
## 45.232600 1.024624
Model 2
m2 <- lm(Cost.of.Living.Index ~ Groceries.Index, data = x)
coef(m2)
## (Intercept) Groceries.Index
## 9.2178364 0.9529463
Model 3
m3 <- lm(Cost.of.Living.Index ~ Restaurant.Price.Index, data = x)
coef(m3)
## (Intercept) Restaurant.Price.Index
## 24.6635984 0.8033304
Model 4
m4 <- lm(Cost.of.Living.Index ~ Local.Purchasing.Power.Index, data = x)
coef(m4)
## (Intercept) Local.Purchasing.Power.Index
## 48.9974246 0.3761637
R-squared
R-squared is the fraction (often quoted as a percentage) of the variability in the response variable, here the Cost of Living Index, that is accounted for by the regression model.
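To make the definition concrete, R-squared can be computed by hand as one minus the ratio of residual to total variation in the response. The sketch below again uses the built-in `mtcars` data as a stand-in (an assumption; the names `fit`, `sse`, `sst`, and `r2_by_hand` are illustrative only):

```r
# Stand-in data: built-in mtcars, not the cost-of-living file.
fit <- lm(mpg ~ wt, data = mtcars)

sse <- sum(residuals(fit)^2)                   # variation left unexplained by the model
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # total variation in the response
r2_by_hand <- 1 - sse / sst

# Matches the value reported by summary()
all.equal(r2_by_hand, summary(fit)$r.squared)

# With a single predictor, R-squared is also the squared correlation
all.equal(r2_by_hand, cor(mtcars$mpg, mtcars$wt)^2)
```

The second check is why, in a one-predictor model, ranking by correlation and ranking by R-squared give the same order.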
Model 1
summary(m1)$r.squared
## [1] 0.5964358
Model 2
summary(m2)$r.squared
## [1] 0.909852
Model 3
summary(m3)$r.squared
## [1] 0.9012757
Model 4
summary(m4)$r.squared
## [1] 0.2765729
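The best model is Model 2 (Groceries Index), since it has the highest correlation (r = 0.954) and the highest R-squared (0.910). The worst is Model 4 (Local Purchasing Power Index), with an R-squared of only 0.277.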
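The same comparison can be made programmatically by collecting the fitted models in a list and extracting R-squared from each. This is only a sketch on the built-in `mtcars` data (a stand-in, not the cost-of-living file), and the list name `models` is illustrative:

```r
# Stand-in data: built-in mtcars; three one-predictor models for mpg.
models <- list(
  wt   = lm(mpg ~ wt,   data = mtcars),
  hp   = lm(mpg ~ hp,   data = mtcars),
  disp = lm(mpg ~ disp, data = mtcars)
)

# R-squared for every model, highest first
r2 <- sapply(models, function(m) summary(m)$r.squared)
sort(r2, decreasing = TRUE)

# Name of the model with the highest R-squared
names(which.max(r2))
```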
summary(m2)
##
## Call:
## lm(formula = Cost.of.Living.Index ~ Groceries.Index, data = x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.2714 -6.2766 0.4478 5.2780 20.7336
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.21784 1.31039 7.034 1.22e-11 ***
## Groceries.Index 0.95295 0.01677 56.831 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.356 on 320 degrees of freedom
## Multiple R-squared: 0.9099, Adjusted R-squared: 0.9096
## F-statistic: 3230 on 1 and 320 DF, p-value: < 2.2e-16
Fitted value: predicted Cost of Living Index for Beijing
predicting <- which(x$City == 'Beijing, China') # row index of Beijing
m2$fitted.values[predicting]
## 172
## 88.85556
Residual for the Beijing prediction
The residual is negative (-11.67), so the model overestimates Beijing's cost of living. The Cost of Living Index predicted from the Groceries Index is 88.86 (11.14% less expensive than New York), while the observed index is 88.86 - 11.67 = 77.19.
m2$residuals[predicting]
## 172
## -11.66556
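A residual is the observed value minus the fitted value, so a negative residual means the model predicted too high. The sketch below checks that identity for one row of the built-in `mtcars` data (a stand-in; the row chosen and the variable names are illustrative only):

```r
# Stand-in data: built-in mtcars, not the cost-of-living file.
fit <- lm(mpg ~ wt, data = mtcars)
i <- which(rownames(mtcars) == "Mazda RX4")  # row to inspect

observed <- mtcars$mpg[i]
fitted_i <- unname(fitted(fit)[i])
resid_i  <- unname(residuals(fit)[i])

# residual = observed - fitted
all.equal(resid_i, observed - fitted_i)

# A negative residual indicates an overestimate for that row
resid_i < 0
```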