2024-09-16

Cost of Living Index by Country Dataset

Multiple Linear Regression

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon \]

Where:

  • \(y\) = Dependent variable (Cost of Living Index)
  • \(\beta_0\) = Intercept
  • \(\beta_1\) = Coefficient for Rent Index
  • \(\beta_2\) = Coefficient for Groceries Index
  • \(x_1\) = Rent Index
  • \(x_2\) = Groceries Index
  • \(\epsilon\) = Error term (residual)

Rent & Groceries Indices

Call:
lm(formula = Cost.of.Living.Index ~ Rent.Index + Groceries.Index, 
    data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.829  -2.460  -0.302   2.647  10.153 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      5.35365    1.10862   4.829 4.16e-06 ***
Rent.Index       0.28593    0.05206   5.492 2.32e-07 ***
Groceries.Index  0.75995    0.03484  21.814  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.145 on 118 degrees of freedom
Multiple R-squared:  0.9352,    Adjusted R-squared:  0.9341 
F-statistic: 851.4 on 2 and 118 DF,  p-value: < 2.2e-16
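
Plugging the estimates from the coefficient table into the model gives the fitted equation below. The second line is a worked prediction for a purely hypothetical country with Rent Index 50 and Groceries Index 60; these inputs are illustrative assumptions and are not taken from the dataset.

```latex
\[ \hat{y} = 5.354 + 0.286\,x_1 + 0.760\,x_2 \]

% Hypothetical inputs: x_1 = 50 (Rent Index), x_2 = 60 (Groceries Index)
\[ \hat{y} = 5.35365 + 0.28593(50) + 0.75995(60)
           = 5.35365 + 14.2965 + 45.597
           \approx 65.25 \]
```

Read this way, holding the Rent Index fixed, a one-point increase in the Groceries Index is associated with an increase of about 0.76 points in the predicted Cost of Living Index.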

Plotly 3D Plot

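library(plotly)  # provides plot_ly()

# 3D scatter: the two predictors on x and y, the response on z
plot_ly(data=df, x=~Rent.Index, y=~Groceries.Index, 
        z=~Cost.of.Living.Index,
        type="scatter3d", mode="markers")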

Regression Line using Groceries Index

`geom_smooth()` using formula = 'y ~ x'

Residuals Plot

Residual Calculation

\[ \text{Residual} = y - \hat{y} \]

Where:

  • \(y\) = Actual value (observed Cost of Living Index)
  • \(\hat{y}\) = Predicted value (from the multiple linear regression model)
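
The residual formula above can be checked directly in R. This is a minimal sketch, not the deck's original code: because the dataset is not reproduced here, `df` is a simulated stand-in with the same column names, and the coefficients used to generate it are assumptions chosen only to resemble the fitted model.

```r
# Simulated stand-in for the deck's `df` (NOT the real dataset)
set.seed(1)
n <- 121
df <- data.frame(
  Rent.Index      = runif(n, 5, 70),
  Groceries.Index = runif(n, 20, 110)
)
df$Cost.of.Living.Index <- 5 + 0.3 * df$Rent.Index +
  0.75 * df$Groceries.Index + rnorm(n, sd = 4)

# Same model formula as in the regression output
fit <- lm(Cost.of.Living.Index ~ Rent.Index + Groceries.Index, data = df)

# Residual = observed value minus predicted value
y_hat        <- fitted(fit)
manual_resid <- df$Cost.of.Living.Index - y_hat

# The manual calculation should agree with R's built-in extractor
resid_match <- isTRUE(all.equal(unname(manual_resid), unname(resid(fit))))
print(resid_match)
```

With the real `df` loaded, only the lines from `fit <- lm(...)` onward would be needed.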