This experiment follows a linear regression tutorial on Scribbr: https://www.scribbr.com/statistics/linear-regression-in-r
The dataset contains paired observations of income and happiness, described as a sample of 500 people. The file loaded here has 498 rows, as the summary and the 496 residual degrees of freedom below show.
# Import the CSV dataset into R.
income_dataset_url <- 'https://raw.githubusercontent.com/stephen-haslett/data605/data605-week-11/income_to_happy.csv'
income_dataset <- read.csv(income_dataset_url)
head(income_dataset)
##   X   income happiness
## 1 1 3.862647  2.314489
## 2 2 4.979381  3.433490
## 3 3 4.923957  4.599373
## 4 4 3.214372  2.791114
## 5 5 7.196409  5.596398
## 6 6 3.729643  2.458556
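The `X` column in the output above appears to be a row index saved along with the CSV rather than a measured variable. A minimal sketch of the kind of sanity checks that can be run before modelling follows. It uses a small hypothetical data frame, `toy`, with values rounded from the first five rows printed above, so that it runs without the download. The names `toy`, `toy_clean`, `x_is_row_index`, `n_rows` and `n_missing` are illustrative and not part of the original analysis.

```r
# Hypothetical stand-in for income_dataset: same column names,
# values rounded from the first five rows shown by head().
toy <- data.frame(
  X = 1:5,
  income = c(3.9, 5.0, 4.9, 3.2, 7.2),
  happiness = c(2.3, 3.4, 4.6, 2.8, 5.6)
)

# Confirm that X simply counts rows, then keep only the measured variables.
x_is_row_index <- all(toy$X == seq_len(nrow(toy)))
toy_clean <- toy[, c("income", "happiness")]

# Count rows and missing values per column before fitting anything.
n_rows <- nrow(toy_clean)
n_missing <- colSums(is.na(toy_clean))

print(x_is_row_index)
print(n_rows)
print(n_missing)
```

Applied to the real data, the same checks would use `income_dataset` in place of `toy`.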
Take a quick look at the data.
summary(income_dataset)
##        X             income        happiness
##  Min.   :  1.0   Min.   :1.506   Min.   :0.266
##  1st Qu.:125.2   1st Qu.:3.006   1st Qu.:2.266
##  Median :249.5   Median :4.424   Median :3.473
##  Mean   :249.5   Mean   :4.467   Mean   :3.393
##  3rd Qu.:373.8   3rd Qu.:5.992   3rd Qu.:4.503
##  Max.   :498.0   Max.   :7.482   Max.   :6.863
hist(income_dataset$happiness)
plot(happiness ~ income, data = income_dataset)
Is there a linear relationship between income and happiness?
income_dataset_happiness_lm <- lm(happiness ~ income, data = income_dataset)
summary(income_dataset_happiness_lm)
##
## Call:
## lm(formula = happiness ~ income, data = income_dataset)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.02479 -0.48526  0.04078  0.45898  2.37805
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.20427    0.08884   2.299   0.0219 *
## income       0.71383    0.01854  38.505   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7181 on 496 degrees of freedom
## Multiple R-squared: 0.7493, Adjusted R-squared: 0.7488
## F-statistic: 1483 on 1 and 496 DF, p-value: < 2.2e-16
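The summary above gives a fitted line of roughly happiness = 0.204 + 0.714 × income, with about 75% of the variance in happiness explained by income (R-squared 0.7493). As a worked illustration, the sketch below plugs the printed coefficients into that equation, builds an approximate 95% confidence interval for the slope, and checks that the squared slope t value is close to the reported F-statistic, as expected in a one-predictor model. The income value of 5 is a hypothetical input, and the variable names are illustrative. Because the coefficients are copied from rounded output, the results are approximate.

```r
# Coefficients copied from the summary() output above (rounded as printed).
intercept <- 0.20427
slope <- 0.71383
slope_se <- 0.01854
resid_df <- 496

# Predicted happiness at a hypothetical income value of 5
# (in the same units as the income column).
income_value <- 5
predicted_happiness <- intercept + slope * income_value

# Approximate 95% confidence interval for the slope:
# estimate +/- t critical value * standard error.
t_crit <- qt(0.975, df = resid_df)
slope_ci <- slope + c(-1, 1) * t_crit * slope_se

# With a single predictor, the F-statistic equals the squared slope t value.
f_from_t <- (slope / slope_se)^2

print(predicted_happiness)
print(slope_ci)
print(f_from_t)
```

With the fitted model object available, `predict()` and `confint()` would give the same quantities from the unrounded estimates.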
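Check whether the residuals are centered on zero across the range of fitted values. In this case they are: the red trend lines in the diagnostic plots stay close to the reference lines, which suggests that a linear model is a reasonable fit, so we can continue with the analysis.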
par(mfrow = c(2,2))
plot(income_dataset_happiness_lm)
par(mfrow=c(1,1))
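One caveat about the residual check: for a least-squares fit with an intercept, the overall mean of the residuals is zero by construction, so the informative question is whether the mean stays near zero within different ranges of the fitted values. A minimal sketch of a numeric version of that check follows. It uses simulated stand-in data rather than the real dataset, and the simulation parameters, the quartile binning and all variable names are assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-in data (not the real dataset): a linear trend plus noise.
income <- runif(200, min = 1.5, max = 7.5)
happiness <- 0.2 + 0.7 * income + rnorm(200, sd = 0.7)
fit <- lm(happiness ~ income)

# With an intercept in the model, OLS residuals average to zero by construction.
overall_mean <- mean(residuals(fit))

# More informative: the mean residual within quartile bins of the fitted values.
breaks <- quantile(fitted(fit), probs = seq(0, 1, by = 0.25))
bins <- cut(fitted(fit), breaks = breaks, include.lowest = TRUE)
binned_means <- tapply(residuals(fit), bins, mean)

print(overall_mean)
print(binned_means)
```

Bin means that drift steadily away from zero would point to curvature that a straight line misses, which is the pattern the red line in the Residuals vs Fitted panel is meant to reveal.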
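# ggplot2 provides ggplot(), geom_point() and geom_smooth().
library(ggplot2)
# Scatter plot of happiness against income.
income_dataset_graph <- ggplot(income_dataset, aes(x = income, y = happiness)) +
  geom_point()
income_dataset_graph
# Overlay the fitted regression line.
income_dataset_graph <- income_dataset_graph + geom_smooth(method = "lm", col = "black")
income_dataset_graph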
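By default, `geom_smooth(method = "lm")` draws a shaded 95% confidence band around the fitted line. A rough sketch of how comparable numbers can be obtained from a fitted model with base R's `predict()` follows. It uses simulated stand-in data rather than the real dataset, and the income values 2, 4 and 6 and all variable names are hypothetical.

```r
set.seed(2)

# Simulated stand-in data (not the real dataset): a linear trend plus noise.
income <- runif(100, min = 1.5, max = 7.5)
happiness <- 0.2 + 0.7 * income + rnorm(100, sd = 0.7)
fit <- lm(happiness ~ income)

# Fitted values with a 95% confidence interval for the mean response
# at a few hypothetical income values.
new_points <- data.frame(income = c(2, 4, 6))
band <- predict(fit, newdata = new_points, interval = "confidence", level = 0.95)

print(band)
```

The result has one row per requested income value, with columns `fit`, `lwr` and `upr` for the fitted value and the interval bounds.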