Assignment 6: Statistical Computing

Simple Linear Regression in R


Contact ↓
Email
Instagram https://www.instagram.com/nbrigittag/
RPubs https://rpubs.com/naftalibrigitta/
Name Naftali Brigitta Gunawan
Student ID (NIM) 20214920002

Step 1 : Load the data into R

library(ggplot2)
library(dplyr)
library(broom)
library(ggpubr)

income <- read.csv("income.data.csv")
summary(income)
##        X             income        happiness    
##  Min.   :  1.0   Min.   :1.506   Min.   :0.266  
##  1st Qu.:125.2   1st Qu.:3.006   1st Qu.:2.266  
##  Median :249.5   Median :4.424   Median :3.473  
##  Mean   :249.5   Mean   :4.467   Mean   :3.393  
##  3rd Qu.:373.8   3rd Qu.:5.992   3rd Qu.:4.503  
##  Max.   :498.0   Max.   :7.482   Max.   :6.863

From summary(income), we can identify the variables:

  • Dependent variable = happiness

  • Independent variable = income

The X column appears to be a row index (1 to 498) and is not used in the model.


Step 2 : Make sure your data meet the assumptions

There are four main assumptions for linear regression.

1. Independence of observations (or no autocorrelation)

Because we have only one independent variable and one dependent variable, we don’t need to test for hidden relationships among variables, so we can move on to the next step.

2. Normality

hist(income$happiness)

The histogram is roughly bell-shaped (more observations in the middle and fewer in the tails), so we can move on to the next step.
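A histogram is only a quick visual check. A normal Q-Q plot and the Shapiro-Wilk test can give a second opinion. The sketch below runs on simulated data rather than income.data.csv: the name sim_happiness and its mean and standard deviation are assumptions chosen for illustration. Note that the regression assumption strictly concerns the residuals, so the same check can be repeated on the residuals once the model is fitted.

```r
# Simulated stand-in for income$happiness (assumed values, not the real data)
set.seed(1)
sim_happiness <- rnorm(498, mean = 3.4, sd = 1.4)

# Normal Q-Q plot: points close to the reference line suggest normality
qqnorm(sim_happiness)
qqline(sim_happiness, col = "red")

# Shapiro-Wilk test: a large p-value means no evidence against normality
sw <- shapiro.test(sim_happiness)
sw$p.value
```

To apply this to the real data, replace sim_happiness with income$happiness.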

3. Linearity

The relationship between the independent and dependent variables must be linear. We can check this visually with a scatter plot.

plot(happiness ~ income, data = income)

The relationship in the plot looks roughly linear, so we can move on to the next step.
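To make "roughly linear" a little less subjective, one option is to overlay a lowess smoother on the scatter plot and compute the Pearson correlation. This is a minimal sketch on simulated data: the data frame sim and the coefficients used to generate it are assumptions, not values taken from income.data.csv.

```r
# Simulated stand-in for the income data (assumed parameters)
set.seed(2)
sim <- data.frame(income = runif(498, min = 1.5, max = 7.5))
sim$happiness <- 0.2 + 0.7 * sim$income + rnorm(498, sd = 0.7)

# Scatter plot with a lowess smoother: if the smoother is close to a
# straight line, a linear model is a reasonable choice
plot(happiness ~ income, data = sim)
lines(lowess(sim$income, sim$happiness), col = "red", lwd = 2)

# Pearson correlation measures the strength of the linear association
r <- cor(sim$income, sim$happiness)
r
```

A smoother that bends noticeably would suggest transforming a variable or using a non-linear model instead.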

4. Homoscedasticity (or homogeneity of variance)

This means that the size of the prediction error doesn’t change significantly across the range of predicted values. We can test this assumption later, after fitting the linear model.


Step 3 : Perform the linear regression analysis

income.happiness.lm <- lm(happiness ~ income, data = income)

summary(income.happiness.lm)
## 
## Call:
## lm(formula = happiness ~ income, data = income)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.02479 -0.48526  0.04078  0.45898  2.37805 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.20427    0.08884   2.299   0.0219 *  
## income       0.71383    0.01854  38.505   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7181 on 496 degrees of freedom
## Multiple R-squared:  0.7493, Adjusted R-squared:  0.7488 
## F-statistic:  1483 on 1 and 496 DF,  p-value: < 2.2e-16

The results are:

  • The estimates (Estimate) for the model parameters: the value of the y-intercept (in this case 0.204) and the estimated effect of income on happiness (0.714).

  • The p-value (Pr(>|t|)) for income is less than 2e-16, far below 0.05, so we reject the null hypothesis that income has no effect on happiness.
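The coefficients can be read as the regression equation happiness = 0.204 + 0.714 × income. As a small worked example, the sketch below plugs the values printed in the summary() output into that equation, recomputes the t value, and builds an approximate 95% confidence interval for the slope. The variable names (b0, b1, se_b1) are chosen here for illustration, and the income value of 5 (that is, $50,000) is an arbitrary example.

```r
# Values copied from the summary() output above
b0    <- 0.20427   # intercept
b1    <- 0.71383   # slope for income
se_b1 <- 0.01854   # standard error of the slope

# Predicted happiness for an income of 5 (in units of $10,000)
pred <- b0 + b1 * 5
pred   # 3.77342

# The t value is the estimate divided by its standard error
# (close to the 38.505 reported; the small gap comes from rounded inputs)
t_value <- b1 / se_b1

# Approximate 95% confidence interval for the slope, using 496 residual df
ci <- b1 + c(-1, 1) * qt(0.975, df = 496) * se_b1
ci
```

With the fitted model object available, confint(income.happiness.lm) and predict() give the same quantities without rounding.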


Step 4 : Check for homoscedasticity

par(mfrow=c(2,2))
plot(income.happiness.lm)

par(mfrow=c(1,1))

In the diagnostic plots, the trend lines through the residuals are almost perfectly straight. Based on these residuals, we can say that our model meets the assumption of homoscedasticity.
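The diagnostic plots are a visual check. A numeric complement is a Breusch-Pagan-style test, sketched here in base R in its studentized (Koenker) form: regress the squared residuals on the predictor and compare n × R² with a chi-squared distribution. The simulated data frame sim and the names aux, bp_stat and bp_p are assumptions for illustration. The lmtest package provides bptest() for the same purpose.

```r
# Simulated stand-in for the income data (assumed parameters)
set.seed(3)
sim <- data.frame(income = runif(498, min = 1.5, max = 7.5))
sim$happiness <- 0.2 + 0.7 * sim$income + rnorm(498, sd = 0.7)
fit <- lm(happiness ~ income, data = sim)

# Auxiliary regression: do the squared residuals depend on income?
sim$sq_resid <- resid(fit)^2
aux <- lm(sq_resid ~ income, data = sim)

# Test statistic: n * R^2 of the auxiliary regression,
# chi-squared with 1 degree of freedom under homoscedasticity
bp_stat <- nobs(fit) * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
c(statistic = bp_stat, p.value = bp_p)
```

A small p-value would indicate that the residual variance changes with income, which would contradict the homoscedasticity assumption.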


Step 5 : Visualize the results with a graph

1. Plot the data points on a graph

income.graph <- ggplot(income, aes(x = income, y = happiness)) +
  geom_point()
income.graph

2. Add the linear regression line to the plotted data

We can use geom_smooth() with method = "lm" to draw the linear regression line.

income.graph <- income.graph + geom_smooth(method="lm", col="red")

income.graph

3. Add the equation for the regression line

income.graph <- income.graph +
  stat_regline_equation(label.x = 3, label.y = 7)

income.graph

4. Make the graph ready for publication

We can use theme_bw() to apply a clean theme and labs() to add custom labels.

income.graph +
  theme_bw() +
  labs(title = "Reported happiness as a function of income",
       x = "Income (x$10,000)",
       y = "Happiness score (0 to 10)")


Step 6 : Report your results

We found a significant relationship between income and happiness (p < 0.001, R² = 0.75), with a 0.71-unit (± 0.02 standard error) increase in reported happiness for every $10,000 increase in income.
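To avoid copying numbers by hand, the reported values can be pulled directly from the model summary. This is a minimal sketch on simulated data: the data frame sim, the generating coefficients, and the format of the report string are assumptions for illustration.

```r
# Simulated stand-in for the income data (assumed parameters)
set.seed(4)
sim <- data.frame(income = runif(498, min = 1.5, max = 7.5))
sim$happiness <- 0.2 + 0.7 * sim$income + rnorm(498, sd = 0.7)
fit <- lm(happiness ~ income, data = sim)

# Extract the quantities needed for the report
s        <- summary(fit)
slope    <- s$coefficients["income", "Estimate"]
slope_se <- s$coefficients["income", "Std. Error"]
p_value  <- s$coefficients["income", "Pr(>|t|)"]
r2       <- s$r.squared

# Assemble a one-line report string
report <- sprintf("slope = %.2f (SE %.3f), R2 = %.2f, p %s",
                  slope, slope_se, r2,
                  if (p_value < 0.001) "< 0.001" else sprintf("= %.3f", p_value))
report
```

Replacing fit with income.happiness.lm produces the same string for the real model.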