Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?
We were asked to choose a dataset of interest and build a multiple regression model from it. The dataset I chose is the World Happiness Report.
First, I load dplyr (for glimpse()), import the data, and look at its structure.
library(dplyr)  # provides glimpse()
worldhappiness <- read.csv(file = "https://raw.githubusercontent.com/jnaval88/DATA605/main/Week12/world-happiness-report.csv")
# looking at the data
glimpse(worldhappiness)
## Rows: 1,949
## Columns: 11
## $ Country.name <chr> "Afghanistan", "Afghanistan", "Afghan…
## $ year <int> 2008, 2009, 2010, 2011, 2012, 2013, 2…
## $ Life.Ladder <dbl> 3.724, 4.402, 4.758, 3.832, 3.783, 3.…
## $ Log.GDP.per.capita <dbl> 7.370, 7.540, 7.647, 7.620, 7.705, 7.…
## $ Social.support <dbl> 0.451, 0.552, 0.539, 0.521, 0.521, 0.…
## $ Healthy.life.expectancy.at.birth <dbl> 50.80, 51.20, 51.60, 51.92, 52.24, 52…
## $ Freedom.to.make.life.choices <dbl> 0.718, 0.679, 0.600, 0.496, 0.531, 0.…
## $ Generosity <dbl> 0.168, 0.190, 0.121, 0.162, 0.236, 0.…
## $ Perceptions.of.corruption <dbl> 0.882, 0.850, 0.707, 0.731, 0.776, 0.…
## $ Positive.affect <dbl> 0.518, 0.584, 0.618, 0.611, 0.710, 0.…
## $ Negative.affect <dbl> 0.258, 0.237, 0.275, 0.267, 0.268, 0.…
names(worldhappiness)
## [1] "Country.name" "year"
## [3] "Life.Ladder" "Log.GDP.per.capita"
## [5] "Social.support" "Healthy.life.expectancy.at.birth"
## [7] "Freedom.to.make.life.choices" "Generosity"
## [9] "Perceptions.of.corruption" "Positive.affect"
## [11] "Negative.affect"
Next, I find the number of rows in the dataset before cleaning.
# Number of rows in full dataset
nrow(worldhappiness)
## [1] 1949
# Exclude rows that have missing data in ANY variable
worldhappiness_no_NA <- na.omit(worldhappiness)
summary(worldhappiness_no_NA)
## Country.name year Life.Ladder Log.GDP.per.capita
## Length:1708 Min. :2005 Min. :2.375 Min. : 6.635
## Class :character 1st Qu.:2010 1st Qu.:4.595 1st Qu.: 8.394
## Mode :character Median :2013 Median :5.364 Median : 9.457
## Mean :2013 Mean :5.447 Mean : 9.322
## 3rd Qu.:2017 3rd Qu.:6.259 3rd Qu.:10.272
## Max. :2020 Max. :7.971 Max. :11.648
## Social.support Healthy.life.expectancy.at.birth Freedom.to.make.life.choices
## Min. :0.2900 Min. :32.30 Min. :0.2580
## 1st Qu.:0.7410 1st Qu.:58.17 1st Qu.:0.6440
## Median :0.8350 Median :65.10 Median :0.7575
## Mean :0.8103 Mean :63.23 Mean :0.7394
## 3rd Qu.:0.9080 3rd Qu.:68.69 3rd Qu.:0.8520
## Max. :0.9870 Max. :77.10 Max. :0.9850
## Generosity Perceptions.of.corruption Positive.affect
## Min. :-0.3350000 Min. :0.035 Min. :0.3220
## 1st Qu.:-0.1112500 1st Qu.:0.697 1st Qu.:0.6230
## Median :-0.0255000 Median :0.806 Median :0.7220
## Mean :-0.0006376 Mean :0.751 Mean :0.7095
## 3rd Qu.: 0.0890000 3rd Qu.:0.875 3rd Qu.:0.8013
## Max. : 0.6890000 Max. :0.983 Max. :0.9440
## Negative.affect
## Min. :0.0940
## 1st Qu.:0.2080
## Median :0.2590
## Mean :0.2694
## 3rd Qu.:0.3192
## Max. :0.7050
# Number of rows after removing rows with missing values
nrow(worldhappiness_no_NA)
## [1] 1708
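Removing missing values dropped 241 rows (1,949 - 1,708). To see which variables caused this, you can count the missing values in each column with base R. The sketch below runs on a small made-up data frame (`toy`, hypothetical values) so that it is self-contained. On the real data, the same calls would be `colSums(is.na(worldhappiness))` and `sum(!complete.cases(worldhappiness))`.

```r
# Small made-up data frame (hypothetical values) with a few missing entries
toy <- data.frame(
  year        = c(2008L, 2009L, 2010L, 2011L),
  Life.Ladder = c(3.7, 4.4, NA, 5.1),
  Generosity  = c(0.17, NA, NA, 0.05)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Rows that na.omit() drops: any row with at least one NA
rows_dropped <- sum(!complete.cases(toy))
print(rows_dropped)
```

In this toy example, Generosity has two missing values, and two rows would be dropped by na.omit().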
# looking at new data
glimpse(worldhappiness_no_NA)
## Rows: 1,708
## Columns: 11
## $ Country.name <chr> "Afghanistan", "Afghanistan", "Afghan…
## $ year <int> 2008, 2009, 2010, 2011, 2012, 2013, 2…
## $ Life.Ladder <dbl> 3.724, 4.402, 4.758, 3.832, 3.783, 3.…
## $ Log.GDP.per.capita <dbl> 7.370, 7.540, 7.647, 7.620, 7.705, 7.…
## $ Social.support <dbl> 0.451, 0.552, 0.539, 0.521, 0.521, 0.…
## $ Healthy.life.expectancy.at.birth <dbl> 50.80, 51.20, 51.60, 51.92, 52.24, 52…
## $ Freedom.to.make.life.choices <dbl> 0.718, 0.679, 0.600, 0.496, 0.531, 0.…
## $ Generosity <dbl> 0.168, 0.190, 0.121, 0.162, 0.236, 0.…
## $ Perceptions.of.corruption <dbl> 0.882, 0.850, 0.707, 0.731, 0.776, 0.…
## $ Positive.affect <dbl> 0.518, 0.584, 0.618, 0.611, 0.710, 0.…
## $ Negative.affect <dbl> 0.258, 0.237, 0.275, 0.267, 0.268, 0.…
names(worldhappiness_no_NA)
## [1] "Country.name" "year"
## [3] "Life.Ladder" "Log.GDP.per.capita"
## [5] "Social.support" "Healthy.life.expectancy.at.birth"
## [7] "Freedom.to.make.life.choices" "Generosity"
## [9] "Perceptions.of.corruption" "Positive.affect"
## [11] "Negative.affect"
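In this step I fit a multiple linear regression on the cleaned dataset. The response is year, and the predictors are Life Ladder, healthy life expectancy at birth, freedom to make life choices, social support, generosity, positive affect and negative affect.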
# Multiple Linear Regression
worldhappiness_lm <- lm(worldhappiness_no_NA$year ~ worldhappiness_no_NA$Life.Ladder + worldhappiness_no_NA$Healthy.life.expectancy.at.birth + worldhappiness_no_NA$Freedom.to.make.life.choices + worldhappiness_no_NA$Social.support + worldhappiness_no_NA$Generosity + worldhappiness_no_NA$Positive.affect + worldhappiness_no_NA$Negative.affect, data = worldhappiness_no_NA)
summary(worldhappiness_lm)
##
## Call:
## lm(formula = worldhappiness_no_NA$year ~ worldhappiness_no_NA$Life.Ladder +
## worldhappiness_no_NA$Healthy.life.expectancy.at.birth + worldhappiness_no_NA$Freedom.to.make.life.choices +
## worldhappiness_no_NA$Social.support + worldhappiness_no_NA$Generosity +
## worldhappiness_no_NA$Positive.affect + worldhappiness_no_NA$Negative.affect,
## data = worldhappiness_no_NA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.8517 -2.7037 0.2048 2.9311 9.3336
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 1999.90721 1.13114
## worldhappiness_no_NA$Life.Ladder -0.62356 0.14939
## worldhappiness_no_NA$Healthy.life.expectancy.at.birth 0.13015 0.01864
## worldhappiness_no_NA$Freedom.to.make.life.choices 11.70890 0.83700
## worldhappiness_no_NA$Social.support -1.52846 1.13442
## worldhappiness_no_NA$Generosity -2.20828 0.61086
## worldhappiness_no_NA$Positive.affect -2.92461 1.16561
## worldhappiness_no_NA$Negative.affect 11.89024 1.22595
## t value Pr(>|t|)
## (Intercept) 1768.039 < 2e-16 ***
## worldhappiness_no_NA$Life.Ladder -4.174 3.14e-05 ***
## worldhappiness_no_NA$Healthy.life.expectancy.at.birth 6.984 4.10e-12 ***
## worldhappiness_no_NA$Freedom.to.make.life.choices 13.989 < 2e-16 ***
## worldhappiness_no_NA$Social.support -1.347 0.178048
## worldhappiness_no_NA$Generosity -3.615 0.000309 ***
## worldhappiness_no_NA$Positive.affect -2.509 0.012197 *
## worldhappiness_no_NA$Negative.affect 9.699 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.676 on 1700 degrees of freedom
## Multiple R-squared: 0.189, Adjusted R-squared: 0.1857
## F-statistic: 56.61 on 7 and 1700 DF, p-value: < 2.2e-16
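Written out with shortened variable names, the fitted model from the output above is:

$$
\begin{aligned}
\widehat{\text{year}} = {} & 1999.907 - 0.624\,\text{LifeLadder} + 0.130\,\text{HealthyLifeExp} + 11.709\,\text{Freedom} \\
& - 1.528\,\text{SocialSupport} - 2.208\,\text{Generosity} - 2.925\,\text{PosAffect} + 11.890\,\text{NegAffect}
\end{aligned}
$$

Each slope is the expected change in year for a one-unit increase in that predictor, holding the others fixed. For example, a one-point increase in Life Ladder is associated with an expected year about 0.62 lower. The intercept (about 1999.9) is the predicted year when every predictor is zero. That point lies outside the observed data (healthy life expectancy never falls below 32.3), so the intercept has no practical meaning here.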
In this step I create quadratic (squared) terms for healthy life expectancy at birth and for freedom to make life choices.
# Quadratic terms for healthy life expectancy at birth and freedom to make life choices
worldhappiness_Quad <- worldhappiness_no_NA$Healthy.life.expectancy.at.birth^2
worldhappiness_Quad_1 <- worldhappiness_no_NA$Freedom.to.make.life.choices^2
summary(worldhappiness_Quad)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1043 3384 4238 4057 4718 5944
summary(worldhappiness_Quad_1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.06656 0.41474 0.57381 0.56717 0.72590 0.97023
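As written, `worldhappiness_Quad` and `worldhappiness_Quad_1` are standalone vectors and are not part of `worldhappiness_lm`. A common way to include a quadratic term is inside the model formula with `I()`, which makes `^` mean arithmetic squaring rather than formula crossing.

With a squared term the slope is no longer constant: it equals b1 + 2 * b2 * x. The sketch below uses simulated data (hypothetical `x` and `y`, not the happiness file). On the real data, the analogous term would be `I(Healthy.life.expectancy.at.birth^2)`.

```r
set.seed(42)
n <- 200
# Simulated data (hypothetical): y rises with x but flattens out
x <- runif(n, 30, 80)                  # range similar to healthy life expectancy
y <- -20 + 0.8 * x - 0.005 * x^2 + rnorm(n, sd = 0.3)
d <- data.frame(x = x, y = y)

# I() makes ^ an arithmetic square inside the formula
quad_fit <- lm(y ~ x + I(x^2), data = d)
b <- coef(quad_fit)
print(b)

# Marginal effect of x at a given value x0: b1 + 2 * b2 * x0
slope_at <- function(x0) b[["x"]] + 2 * b[["I(x^2)"]] * x0
slope_at(c(40, 70))
```

A negative coefficient on the squared term means the curve bends downward: the slope shrinks as x increases.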
In this step I create a dichotomous (dummy) term from the country name. It equals 0 for the United States and 1 for every other country.
# Create the dichotomous term in a new column so that Country.name is kept
# (0 = United States, 1 = any other country)
worldhappiness_no_NA$Not.US <- ifelse(worldhappiness_no_NA$Country.name == "United States", 0, 1)
print(worldhappiness_no_NA$Not.US)
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [186] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [223] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [260] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [297] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [334] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [371] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [408] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [445] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [482] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [519] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [556] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [593] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [630] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [667] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [704] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [741] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [778] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [815] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [852] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [889] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [926] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [963] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1000] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1037] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1074] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1111] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1148] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1185] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1222] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1259] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1296] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1333] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1370] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1407] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1444] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1481] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1518] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1555] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1592] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## [1629] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1666] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1703] 1 1 1 1 1 1
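A dichotomous term enters `lm()` like any other column. With the coding above (United States = 0, other countries = 1), its coefficient is the estimated difference in mean response for other countries relative to the United States, holding the other predictors fixed. Judging from the printed vector, only 13 of the 1,708 rows are the United States, so that coefficient would rest on very few observations.

The sketch below uses simulated data (hypothetical `group`, `x`, `y`) with the same 0/1 coding.

```r
set.seed(7)
n <- 300
# Simulated data (hypothetical): a 0/1 indicator and one quantitative predictor
d <- data.frame(group = rbinom(n, 1, 0.2), x = rnorm(n, mean = 65, sd = 7))
# True model: group 1 sits 0.8 units above group 0 at every x
d$y <- 2 + 0.05 * d$x + 0.8 * d$group + rnorm(n, sd = 0.2)

dummy_fit <- lm(y ~ x + group, data = d)
print(coef(dummy_fit))

# The dummy coefficient is a vertical shift: same slope, different intercepts
intercept_group0 <- coef(dummy_fit)[["(Intercept)"]]
intercept_group1 <- intercept_group0 + coef(dummy_fit)[["group"]]
```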
# Interaction between two quantitative variables: healthy life expectancy x freedom
worldhappiness_Interaction <- worldhappiness_no_NA$Healthy.life.expectancy.at.birth * worldhappiness_no_NA$Freedom.to.make.life.choices
summary(worldhappiness_Interaction)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 12.05 38.05 46.94 47.17 56.53 72.32
plot(worldhappiness_Interaction)
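The interaction above multiplies two quantitative variables, but the assignment also asks for a dichotomous vs. quantitative interaction. In R's formula syntax, `a * b` expands to `a + b + a:b`.

The sketch below fits one model containing a quadratic term, a 0/1 dummy and a dummy-by-quantitative interaction. It uses simulated data (hypothetical `life_exp`, `freedom`, `us`, `y`, standing in for healthy life expectancy, freedom and the United States indicator). The interaction coefficient is the difference in the `freedom` slope between the two groups.

```r
set.seed(11)
n <- 400
# Simulated stand-in data (hypothetical values, not the World Happiness file)
d <- data.frame(
  life_exp = runif(n, 45, 75),   # plays the role of healthy life expectancy
  freedom  = runif(n, 0.3, 1),   # plays the role of freedom to make life choices
  us       = rbinom(n, 1, 0.5)   # 0/1 dummy
)
# True model: curved in life_exp; freedom slope is 2 when us = 0, 3.5 when us = 1
d$y <- -5 + 0.3 * d$life_exp - 0.002 * d$life_exp^2 +
  2 * d$freedom + 0.5 * d$us + 1.5 * d$freedom * d$us + rnorm(n, sd = 0.2)

# freedom * us expands to freedom + us + freedom:us
full_fit <- lm(y ~ life_exp + I(life_exp^2) + freedom * us, data = d)
b <- coef(full_fit)

# Slope of freedom within each group
slope_us0 <- b[["freedom"]]
slope_us1 <- b[["freedom"]] + b[["freedom:us"]]
print(round(c(slope_us0 = slope_us0, slope_us1 = slope_us1), 2))
```

The coefficient on `us` alone is the group difference when `freedom` is zero. It should be read together with the interaction rather than on its own.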
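# Residual analysis: show the four diagnostic plots (residuals vs fitted, Q-Q,
# scale-location, residuals vs leverage) in one 2 x 2 panel
par(mfrow = c(2, 2))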
plot(worldhappiness_lm)
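The diagnostic plots can be complemented with a few numerical checks using base R functions:

- `rstandard()` flags large standardized residuals.
- `shapiro.test()` tests the residuals for normality.

With about 1,700 rows, a Shapiro-Wilk test can reject normality for small, practically harmless departures, so it is best read alongside the Q-Q plot. The sketch uses simulated data (hypothetical `x`, `y`). On the real model the calls would be `shapiro.test(residuals(worldhappiness_lm))` and `rstandard(worldhappiness_lm)`.

```r
set.seed(3)
n <- 200
# Simulated data (hypothetical) that satisfies the linear-model assumptions
d <- data.frame(x = runif(n))
d$y <- 1 + 2 * d$x + rnorm(n, sd = 0.3)
fit <- lm(y ~ x, data = d)

res <- residuals(fit)
# With an intercept in the model, OLS residuals average to (numerically) zero
mean_res <- mean(res)

# Shapiro-Wilk normality test on the residuals (base R; needs 3 to 5000 values)
sw <- shapiro.test(res)
print(sw$p.value)

# Count standardized residuals beyond +/- 3 as potential outliers
n_extreme <- sum(abs(rstandard(fit)) > 3)
print(n_extreme)
```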
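Ans. In conclusion, I think the linear model was appropriate for this dataset. The residuals vs fitted plot appears to have roughly constant variability, and the Q-Q plot indicates that the residuals are roughly normally distributed. The overall F-test (F = 56.61 on 7 and 1700 DF, p-value < 2.2e-16) shows that the predictors are jointly related to the response. Six of the seven predictors are significant at the 0.05 level (all except Social.support). The fit is modest, however. The residual standard error is 3.676 (in years, since year is the response) on 1700 degrees of freedom. The multiple R-squared of 0.189 (adjusted R-squared 0.1857) means the model explains only about 19% of the variation in year. Lastly, the model shows that several of the variables are related to the response.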