Simple Linear Regression

Contact : \(\downarrow\)
Email
Instagram https://www.instagram.com/dhelaagatha/
RPubs https://rpubs.com/dhelaasafiani/
Name Dhela Agatha
Student ID (NIM) 20214920009
Study Program Statistics

Load Data and Library

library(ggplot2)
library(dplyr)
library(broom)
library(ggpubr)
inc <- read.csv("income.data.csv")
summary(inc)
##        X             income        happiness    
##  Min.   :  1.0   Min.   :1.506   Min.   :0.266  
##  1st Qu.:125.2   1st Qu.:3.006   1st Qu.:2.266  
##  Median :249.5   Median :4.424   Median :3.473  
##  Mean   :249.5   Mean   :4.467   Mean   :3.393  
##  3rd Qu.:373.8   3rd Qu.:5.992   3rd Qu.:4.503  
##  Max.   :498.0   Max.   :7.482   Max.   :6.863

Assumption Testing on The Data Used

Autocorrelation Test

Because we only have one independent variable and one dependent variable, we don’t need to test for any hidden relationships among variables.

Normality Test

Test whether the data used are normally distributed.

hist(inc$happiness)

shapiro.test(inc$happiness)
## 
##  Shapiro-Wilk normality test
## 
## data:  inc$happiness
## W = 0.98705, p-value = 0.0002095

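The Shapiro-Wilk p-value (0.0002) is below 0.05, so the test formally rejects exact normality. With 498 observations, however, the test is sensitive to small deviations; W = 0.987 is close to 1 and the histogram is roughly bell-shaped, so the data are treated as approximately normal.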
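A normal Q-Q plot is a common visual complement to the histogram and the Shapiro-Wilk test. The sketch below is only an illustration: it uses synthetic stand-in values (`happiness_sim` is a hypothetical name, not the survey data) so that it runs on its own. With the real data, `inc$happiness` would take its place.

```r
# Synthetic stand-in for inc$happiness (NOT the survey data): 498 normal draws.
set.seed(1)
happiness_sim <- rnorm(498, mean = 3.4, sd = 1.4)

# Points lying close to the reference line suggest approximate normality.
qqnorm(happiness_sim)
qqline(happiness_sim)

# Shapiro-Wilk: the null hypothesis is "the data are normal",
# so a small p-value is evidence AGAINST normality.
sw <- shapiro.test(happiness_sim)
reject_normality <- sw$p.value < 0.05
reject_normality
```

Reading the result this way keeps the direction of the test clear: a p-value below 0.05 rejects normality, while a larger one means only that normality is not rejected.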

Linearity

The dependent and independent variables must have a clear linear relationship.

plot(happiness ~ income, data = inc)

The plot shows a strong positive linear relationship between happiness and income.

Homogeneity of Variance

Homogeneity of variance is checked after the model has been built, to confirm that the prediction error stays roughly the same across the range of predictions.
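Once a fitted model exists, this check is usually done by looking at residual plots. The sketch below is a hedged illustration rather than the analysis itself: it fits a model on synthetic stand-in data (`sim` and `fit` are hypothetical names, and the generating coefficients are assumptions) so that it runs without the CSV file.

```r
# Synthetic stand-in data (NOT the survey): a linear trend with constant noise.
set.seed(42)
sim <- data.frame(income = runif(498, min = 1.5, max = 7.5))
sim$happiness <- 0.2 + 0.7 * sim$income + rnorm(498, sd = 0.7)

fit <- lm(happiness ~ income, data = sim)

# Draw the standard diagnostic plots for an lm object in a 2x2 grid,
# then restore the previous graphics settings.
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```

In the "Residuals vs Fitted" panel, a roughly horizontal band with even spread suggests constant variance. With the real data, the same two plotting lines would be applied to the fitted model object instead of `fit`.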

Linear Model

lm.inc <- lm(happiness ~ income, data = inc)

summary(lm.inc)
## 
## Call:
## lm(formula = happiness ~ income, data = inc)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.02479 -0.48526  0.04078  0.45898  2.37805 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.20427    0.08884   2.299   0.0219 *  
## income       0.71383    0.01854  38.505   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7181 on 496 degrees of freedom
## Multiple R-squared:  0.7493, Adjusted R-squared:  0.7488 
## F-statistic:  1483 on 1 and 496 DF,  p-value: < 2.2e-16
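The coefficient table can also be turned into an approximate 95% confidence interval for the slope. The sketch below copies the estimate, standard error, and degrees of freedom from the output above; the variable names are illustrative only.

```r
# Values copied from the summary(lm.inc) coefficient table above.
est <- 0.71383   # slope estimate for income
se  <- 0.01854   # its standard error
df  <- 496       # residual degrees of freedom

# Two-sided 95% interval: estimate +/- t critical value * standard error.
t_crit <- qt(0.975, df)
ci <- est + c(-1, 1) * t_crit * se
ci
```

The interval runs from roughly 0.68 to 0.75. With the fitted object available, `confint(lm.inc)` computes the same kind of interval directly.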

The model above tests whether there is a linear relationship between income and happiness in a survey of 498 people with incomes ranging from about $15k to $75k (income is recorded in units of $10,000). Happiness is reported as a score; the observed values range from about 0.27 to 6.86.

\[ Y = 0.204 + 0.7138 X \]

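Because the p-value is below 0.05, the relationship is statistically significant, and the linear model fits reasonably well: it explains about 75% of the variance in happiness (R-squared = 0.7493).

This means that each one-unit increase in X (that is, $10,000 of income) is associated with an increase of about 0.71 in the happiness index.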
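As a worked example of the fitted equation, the sketch below plugs in a hypothetical income of $50,000 (X = 5). The coefficients are copied from the model summary; `income_new` and `pred` are illustrative names.

```r
# Coefficients copied from the summary(lm.inc) output.
b0 <- 0.20427   # intercept
b1 <- 0.71383   # slope per unit of income (1 unit = $10,000)

income_new <- 5                  # hypothetical income of $50,000
pred <- b0 + b1 * income_new     # predicted happiness score
pred
```

The predicted score is about 3.77. With the fitted object available, `predict(lm.inc, newdata = data.frame(income = 5))` should give the same value up to rounding of the coefficients.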

Graph Visualization

Scatter Plot

income.graph <- ggplot(inc, aes(x = income, y = happiness)) +
  geom_point()
income.graph

Add The Regression Line

income.graph <- income.graph + geom_smooth(method="lm", col="black")

income.graph

Add The Regression Equation

income.graph <- income.graph +
  stat_regline_equation(label.x = 3, label.y = 7)

income.graph

Graph Finalization

income.graph +
  theme_minimal() +
  labs(title = "Reported happiness as a function of income",
      x = "Income (x$10,000)",
      y = "Happiness score (0 to 10)")

Results

The graph shows a clear positive relationship between income and happiness, and the model summary confirms that the relationship is statistically significant.

cor(inc$income, inc$happiness)
## [1] 0.8656337

The correlation coefficient between the two variables is about 0.866, which indicates a strong positive linear association. It measures the strength of the association, not a percentage of influence. This agrees with the model: each $10,000 increase in income is associated with an increase of about 0.71 on the happiness scale.
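In simple linear regression, the squared correlation coefficient equals the model's R-squared, which links this result back to the model summary. A quick check, using the correlation value printed above:

```r
# Correlation copied from the cor(inc$income, inc$happiness) output.
r <- 0.8656337

# In a one-predictor linear model, r squared equals Multiple R-squared.
r_squared <- r^2
round(r_squared, 4)
```

The result, 0.7493, matches the Multiple R-squared reported by `summary(lm.inc)`.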

Therefore, it’s true that MONEY CAN’T BUY HAPPINESS, but it can be used to buy a BMW, which will make you happy :)