Simple Linear Regression
| Contact | |
| Email | dehlaagatha@gmail.com |
| Instagram | https://www.instagram.com/dhelaagatha/ |
| RPubs | https://rpubs.com/dhelaasafiani/ |
| Name | Dhela Agatha |
| NIM | 20214920009 |
| Study program | Statistics |
Load Data and Library
library(ggplot2)
library(dplyr)
library(broom)
library(ggpubr)
inc <- read.csv("income.data.csv")
summary(inc)
##        X             income        happiness    
##  Min.   :  1.0   Min.   :1.506   Min.   :0.266  
##  1st Qu.:125.2   1st Qu.:3.006   1st Qu.:2.266  
##  Median :249.5   Median :4.424   Median :3.473  
##  Mean   :249.5   Mean   :4.467   Mean   :3.393  
##  3rd Qu.:373.8   3rd Qu.:5.992   3rd Qu.:4.503  
##  Max.   :498.0   Max.   :7.482   Max.   :6.863
Assumption Testing on The Data Used
Independence of Observations (No Autocorrelation)
Because the model has only one independent variable and one dependent variable, there are no other predictors whose hidden relationships with income would need to be tested.
Normality Test
Test whether the data used are normally distributed.
hist(inc$happiness)
shapiro.test(inc$happiness)
##
## Shapiro-Wilk normality test
##
## data: inc$happiness
## W = 0.98705, p-value = 0.0002095
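The Shapiro-Wilk p-value is below 0.05, so the test formally rejects exact normality. With roughly 500 observations, however, the test flags even small deviations; W = 0.987 is close to 1 and the histogram is roughly bell-shaped, so the data are treated as approximately normal.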
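To make the direction of the test explicit: the null hypothesis of Shapiro-Wilk is that the sample is normal, so a small p-value is evidence against normality. The sketch below uses simulated data rather than income.data.csv (the variable name `skewed` and the seed are assumptions for illustration) and adds a normal Q-Q plot, which is often easier to judge than the p-value alone for samples of several hundred points.

```r
# Illustration only: simulated data, not inc$happiness.
set.seed(1)                  # arbitrary seed, for reproducibility
skewed <- rexp(500)          # clearly non-normal (right-skewed) sample

# H0: the sample comes from a normal distribution
sw <- shapiro.test(skewed)
sw$p.value < 0.05            # TRUE here: normality is rejected

# Q-Q plot: points far from the reference line indicate non-normality
qqnorm(skewed)
qqline(skewed)
```

The same two calls, `qqnorm()` and `qqline()`, could be applied to `inc$happiness` to support the visual judgment made from the histogram.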
Linearity
The dependent and independent variables must have a clear linear relationship.
plot(happiness ~ income, data = inc)
The plot shows a strong positive linear relationship between happiness and income.
Homogeneity of Variance
Homogeneity of variance (homoscedasticity) is checked after the model has been fitted, to confirm that the prediction error stays roughly the same across the whole range of income.
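A minimal sketch of how such a check could look once a model exists. It is self-contained, so it fits its own model on simulated data; the names `sim` and `fit`, the seed, and the simulated coefficients are assumptions for illustration, not the document's data or results.

```r
# Illustration only: simulated data with constant error variance.
set.seed(42)
sim <- data.frame(income = runif(200, 1.5, 7.5))
sim$happiness <- 0.2 + 0.7 * sim$income + rnorm(200, sd = 0.7)

fit <- lm(happiness ~ income, data = sim)

# Residuals vs fitted values: an even band around 0 suggests homoscedasticity
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# The four standard diagnostic plots for an lm object
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))
```

For the real analysis, the same `plot()` calls would be applied to the fitted model object instead of `fit`. A funnel shape in the residual plot (spread growing or shrinking with the fitted values) would indicate heteroscedasticity.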
Linear Model
lm.inc <- lm(happiness ~ income, data = inc)
summary(lm.inc)
## 
## Call:
## lm(formula = happiness ~ income, data = inc)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.02479 -0.48526  0.04078  0.45898  2.37805 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.20427    0.08884   2.299   0.0219 *  
## income       0.71383    0.01854  38.505   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7181 on 496 degrees of freedom
## Multiple R-squared:  0.7493, Adjusted R-squared:  0.7488 
## F-statistic:  1483 on 1 and 496 DF,  p-value: < 2.2e-16
The model tests whether there is a linear relationship between income and happiness in our survey data: 498 respondents with incomes ranging from about $15k to $75k (income is recorded in units of $10,000), with happiness measured on a scale of 0 to 10.
\[ Y = 0.204 + 0.7138 X \]
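Because the p-value is below 0.05, the relationship is statistically significant, and the linear model fits reasonably well: R-squared is about 0.75, meaning income explains roughly 75% of the variance in happiness.
In other words, each one-unit increase in X (that is, $10,000 of income) is associated with an increase of about 0.71 in the happiness index.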
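As a worked example of reading the fitted equation, the sketch below plugs the reported coefficients into a small helper. The function name `predict_happiness` is hypothetical and not part of the original analysis; the coefficients are copied from the model summary.

```r
# Coefficients copied from the summary(lm.inc) output
b0 <- 0.20427   # intercept
b1 <- 0.71383   # slope per one unit of income ($10,000)

# Hypothetical helper: predicted happiness for a given income (in $10,000)
predict_happiness <- function(income) b0 + b1 * income

predict_happiness(5)                          # 0.20427 + 0.71383 * 5 = 3.77342
predict_happiness(5) - predict_happiness(4)   # 0.71383, i.e. the slope
```

With the fitted model object available, `predict(lm.inc, newdata = data.frame(income = 5))` should give the same prediction up to rounding of the coefficients.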
Graph Visualization
Scatter Plot
income.graph <- ggplot(inc, aes(x = income, y = happiness)) +
  geom_point()
income.graph
Add The Regression Line
income.graph <- income.graph + geom_smooth(method = "lm", col = "black")
income.graph
Add The Regression Equation
income.graph <- income.graph +
  stat_regline_equation(label.x = 3, label.y = 7)
income.graph
Graph Finalization
income.graph +
  theme_minimal() +
  labs(title = "Reported happiness as a function of income",
       x = "Income (x$10,000)",
       y = "Happiness score (0 to 10)")
Results
From the graph, we can conclude that there is a significant positive relationship between income and happiness.
cor(inc$income, inc$happiness)
## [1] 0.8656337
The correlation coefficient between the two variables is about 0.866, which indicates a strong positive linear association; its square, about 0.749, matches the model's R-squared. Consistent with the fitted model, each $10,000 increase in income is associated with an increase of about 0.71 on the happiness scale.
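In simple linear regression, the squared Pearson correlation equals the model's R-squared. A minimal sketch on simulated data (the variables `x` and `y`, the seed, and the simulated coefficients are assumptions for illustration) shows the identity, followed by the same arithmetic on the value reported here.

```r
# Illustration only: simulated data, not income.data.csv.
set.seed(7)
x <- runif(300, 1.5, 7.5)
y <- 0.2 + 0.7 * x + rnorm(300, sd = 0.7)

r  <- cor(x, y)                      # Pearson correlation
r2 <- summary(lm(y ~ x))$r.squared   # R-squared of the simple regression

all.equal(r^2, r2)                   # TRUE: the two quantities coincide

# With the correlation reported in this document
0.8656337^2                          # about 0.7493, the Multiple R-squared
```

This is why a correlation of 0.866 corresponds to about 75% of variance explained, not 86.5%.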
Therefore, it's true that MONEY CAN'T BUY HAPPINESS, but it can be used to buy a BMW, which will make you happy :)