Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?

Introduction

We were asked to choose a data set of interest and build a multiple regression model from it. The data set I chose is the World Happiness Report.

Data Import

I will import the data set I chose and read it into R.

library(dplyr)  # provides glimpse(), used below to inspect the data

worldhappiness <- read.csv(file = "https://raw.githubusercontent.com/jnaval88/DATA605/main/Week12/world-happiness-report.csv")

Inspecting the data before performing any cleaning

# looking at the data

glimpse(worldhappiness)
## Rows: 1,949
## Columns: 11
## $ Country.name                     <chr> "Afghanistan", "Afghanistan", "Afghan…
## $ year                             <int> 2008, 2009, 2010, 2011, 2012, 2013, 2…
## $ Life.Ladder                      <dbl> 3.724, 4.402, 4.758, 3.832, 3.783, 3.…
## $ Log.GDP.per.capita               <dbl> 7.370, 7.540, 7.647, 7.620, 7.705, 7.…
## $ Social.support                   <dbl> 0.451, 0.552, 0.539, 0.521, 0.521, 0.…
## $ Healthy.life.expectancy.at.birth <dbl> 50.80, 51.20, 51.60, 51.92, 52.24, 52…
## $ Freedom.to.make.life.choices     <dbl> 0.718, 0.679, 0.600, 0.496, 0.531, 0.…
## $ Generosity                       <dbl> 0.168, 0.190, 0.121, 0.162, 0.236, 0.…
## $ Perceptions.of.corruption        <dbl> 0.882, 0.850, 0.707, 0.731, 0.776, 0.…
## $ Positive.affect                  <dbl> 0.518, 0.584, 0.618, 0.611, 0.710, 0.…
## $ Negative.affect                  <dbl> 0.258, 0.237, 0.275, 0.267, 0.268, 0.…
names(worldhappiness)
##  [1] "Country.name"                     "year"                            
##  [3] "Life.Ladder"                      "Log.GDP.per.capita"              
##  [5] "Social.support"                   "Healthy.life.expectancy.at.birth"
##  [7] "Freedom.to.make.life.choices"     "Generosity"                      
##  [9] "Perceptions.of.corruption"        "Positive.affect"                 
## [11] "Negative.affect"

Finding the number of rows in my data set before cleaning

# Number of rows in full dataset
nrow(worldhappiness)
## [1] 1949

Data Cleaning

# Exclude rows that have missing data in ANY variable
worldhappiness_no_NA <- na.omit(worldhappiness)
summary(worldhappiness_no_NA)
##  Country.name            year       Life.Ladder    Log.GDP.per.capita
##  Length:1708        Min.   :2005   Min.   :2.375   Min.   : 6.635    
##  Class :character   1st Qu.:2010   1st Qu.:4.595   1st Qu.: 8.394    
##  Mode  :character   Median :2013   Median :5.364   Median : 9.457    
##                     Mean   :2013   Mean   :5.447   Mean   : 9.322    
##                     3rd Qu.:2017   3rd Qu.:6.259   3rd Qu.:10.272    
##                     Max.   :2020   Max.   :7.971   Max.   :11.648    
##  Social.support   Healthy.life.expectancy.at.birth Freedom.to.make.life.choices
##  Min.   :0.2900   Min.   :32.30                    Min.   :0.2580              
##  1st Qu.:0.7410   1st Qu.:58.17                    1st Qu.:0.6440              
##  Median :0.8350   Median :65.10                    Median :0.7575              
##  Mean   :0.8103   Mean   :63.23                    Mean   :0.7394              
##  3rd Qu.:0.9080   3rd Qu.:68.69                    3rd Qu.:0.8520              
##  Max.   :0.9870   Max.   :77.10                    Max.   :0.9850              
##    Generosity         Perceptions.of.corruption Positive.affect 
##  Min.   :-0.3350000   Min.   :0.035             Min.   :0.3220  
##  1st Qu.:-0.1112500   1st Qu.:0.697             1st Qu.:0.6230  
##  Median :-0.0255000   Median :0.806             Median :0.7220  
##  Mean   :-0.0006376   Mean   :0.751             Mean   :0.7095  
##  3rd Qu.: 0.0890000   3rd Qu.:0.875             3rd Qu.:0.8013  
##  Max.   : 0.6890000   Max.   :0.983             Max.   :0.9440  
##  Negative.affect 
##  Min.   :0.0940  
##  1st Qu.:0.2080  
##  Median :0.2590  
##  Mean   :0.2694  
##  3rd Qu.:0.3192  
##  Max.   :0.7050
# Number of rows remaining after removing rows with missing values
nrow(worldhappiness_no_NA)
## [1] 1708
# looking at new data 

glimpse(worldhappiness_no_NA)
## Rows: 1,708
## Columns: 11
## $ Country.name                     <chr> "Afghanistan", "Afghanistan", "Afghan…
## $ year                             <int> 2008, 2009, 2010, 2011, 2012, 2013, 2…
## $ Life.Ladder                      <dbl> 3.724, 4.402, 4.758, 3.832, 3.783, 3.…
## $ Log.GDP.per.capita               <dbl> 7.370, 7.540, 7.647, 7.620, 7.705, 7.…
## $ Social.support                   <dbl> 0.451, 0.552, 0.539, 0.521, 0.521, 0.…
## $ Healthy.life.expectancy.at.birth <dbl> 50.80, 51.20, 51.60, 51.92, 52.24, 52…
## $ Freedom.to.make.life.choices     <dbl> 0.718, 0.679, 0.600, 0.496, 0.531, 0.…
## $ Generosity                       <dbl> 0.168, 0.190, 0.121, 0.162, 0.236, 0.…
## $ Perceptions.of.corruption        <dbl> 0.882, 0.850, 0.707, 0.731, 0.776, 0.…
## $ Positive.affect                  <dbl> 0.518, 0.584, 0.618, 0.611, 0.710, 0.…
## $ Negative.affect                  <dbl> 0.258, 0.237, 0.275, 0.267, 0.268, 0.…
names(worldhappiness_no_NA)
##  [1] "Country.name"                     "year"                            
##  [3] "Life.Ladder"                      "Log.GDP.per.capita"              
##  [5] "Social.support"                   "Healthy.life.expectancy.at.birth"
##  [7] "Freedom.to.make.life.choices"     "Generosity"                      
##  [9] "Perceptions.of.corruption"        "Positive.affect"                 
## [11] "Negative.affect"
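
Before dropping incomplete rows, it can be useful to see which columns contribute the missing values. The sketch below uses a small made-up data frame (`toy`, not the happiness data) to show the pattern with `colSums(is.na(...))` and `complete.cases()`; the same calls should apply to `worldhappiness`.

```r
# Toy data frame standing in for the real data (all values are invented).
toy <- data.frame(
  Life.Ladder    = c(3.7, 4.4, NA, 5.1),
  Generosity     = c(0.17, NA, 0.12, 0.16),
  Social.support = c(0.45, 0.55, 0.54, 0.52)
)

# Count of missing values in each column.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Rows that na.omit() would keep: those with no NA in any column.
rows_kept <- sum(complete.cases(toy))
print(rows_kept)
```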

Multiple Linear Regression Model

In this step I will fit a multiple linear regression on the cleaned data set. The model below uses year as the response and the happiness measures as predictors.

# Multiple Linear Regression

worldhappiness_lm <- lm(worldhappiness_no_NA$year ~ worldhappiness_no_NA$Life.Ladder + worldhappiness_no_NA$Healthy.life.expectancy.at.birth + worldhappiness_no_NA$Freedom.to.make.life.choices + worldhappiness_no_NA$Social.support + worldhappiness_no_NA$Generosity + worldhappiness_no_NA$Positive.affect + worldhappiness_no_NA$Negative.affect, data = worldhappiness_no_NA)

summary(worldhappiness_lm)
## 
## Call:
## lm(formula = worldhappiness_no_NA$year ~ worldhappiness_no_NA$Life.Ladder + 
##     worldhappiness_no_NA$Healthy.life.expectancy.at.birth + worldhappiness_no_NA$Freedom.to.make.life.choices + 
##     worldhappiness_no_NA$Social.support + worldhappiness_no_NA$Generosity + 
##     worldhappiness_no_NA$Positive.affect + worldhappiness_no_NA$Negative.affect, 
##     data = worldhappiness_no_NA)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.8517  -2.7037   0.2048   2.9311   9.3336 
## 
## Coefficients:
##                                                         Estimate Std. Error
## (Intercept)                                           1999.90721    1.13114
## worldhappiness_no_NA$Life.Ladder                        -0.62356    0.14939
## worldhappiness_no_NA$Healthy.life.expectancy.at.birth    0.13015    0.01864
## worldhappiness_no_NA$Freedom.to.make.life.choices       11.70890    0.83700
## worldhappiness_no_NA$Social.support                     -1.52846    1.13442
## worldhappiness_no_NA$Generosity                         -2.20828    0.61086
## worldhappiness_no_NA$Positive.affect                    -2.92461    1.16561
## worldhappiness_no_NA$Negative.affect                    11.89024    1.22595
##                                                        t value Pr(>|t|)    
## (Intercept)                                           1768.039  < 2e-16 ***
## worldhappiness_no_NA$Life.Ladder                        -4.174 3.14e-05 ***
## worldhappiness_no_NA$Healthy.life.expectancy.at.birth    6.984 4.10e-12 ***
## worldhappiness_no_NA$Freedom.to.make.life.choices       13.989  < 2e-16 ***
## worldhappiness_no_NA$Social.support                     -1.347 0.178048    
## worldhappiness_no_NA$Generosity                         -3.615 0.000309 ***
## worldhappiness_no_NA$Positive.affect                    -2.509 0.012197 *  
## worldhappiness_no_NA$Negative.affect                     9.699  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.676 on 1700 degrees of freedom
## Multiple R-squared:  0.189,  Adjusted R-squared:  0.1857 
## F-statistic: 56.61 on 7 and 1700 DF,  p-value: < 2.2e-16
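
Because `data = worldhappiness_no_NA` is already supplied, the `worldhappiness_no_NA$` prefixes in the formula are redundant; bare column names give the same fit with shorter coefficient labels. A minimal sketch on simulated data (the data frame `toy` and its values are invented for illustration):

```r
set.seed(1)
# Simulated stand-in for the cleaned data (values are invented).
toy <- data.frame(
  Life.Ladder = rnorm(50, mean = 5.4, sd = 1),
  Generosity  = rnorm(50, mean = 0, sd = 0.15)
)
toy$year <- 2005 + round(runif(50, 0, 15))

# Style used above: columns referenced with `$` inside the formula.
fit_dollar <- lm(toy$year ~ toy$Life.Ladder + toy$Generosity, data = toy)

# Equivalent, more idiomatic style: bare names resolved through `data`.
fit_bare <- lm(year ~ Life.Ladder + Generosity, data = toy)

# Same estimates, different labels.
print(coef(fit_dollar))
print(coef(fit_bare))
```

The bare-name form also lets `predict(fit_bare, newdata = ...)` pick up new values, which the `$` form does not do reliably.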

Quadratic Term

In this step I will create quadratic terms for healthy life expectancy at birth and for freedom to make life choices.

# Quadratic terms for healthy life expectancy at birth and freedom to make life choices
worldhappiness_Quad <- worldhappiness_no_NA$Healthy.life.expectancy.at.birth^2 

worldhappiness_Quad_1 <- worldhappiness_no_NA$Freedom.to.make.life.choices^2

summary(worldhappiness_Quad)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1043    3384    4238    4057    4718    5944
summary(worldhappiness_Quad_1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.06656 0.41474 0.57381 0.56717 0.72590 0.97023
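
The squared vectors above are computed but not entered into the regression. One way to include a quadratic term is to write `I(x^2)` directly in the formula. The sketch below uses simulated data with hypothetical names (`toy`, `hle`, `ladder`) and is not the fit reported in this write-up:

```r
set.seed(42)
# Simulated predictor on roughly the scale of healthy life expectancy.
toy <- data.frame(hle = runif(200, min = 40, max = 75))
# Invented curved relationship plus noise.
toy$ladder <- 1 + 0.20 * toy$hle - 0.0015 * toy$hle^2 + rnorm(200, sd = 0.3)

# I() protects ^ so that it means arithmetic squaring inside the formula.
fit_quad <- lm(ladder ~ hle + I(hle^2), data = toy)

# Same model with a precomputed squared column.
toy$hle_sq <- toy$hle^2
fit_quad_col <- lm(ladder ~ hle + hle_sq, data = toy)

print(coef(fit_quad))
```

With a quadratic term, the effect of a one-unit increase in `hle` is no longer constant: the slope at a given value is `b1 + 2 * b2 * hle`, where `b1` and `b2` are the linear and squared coefficients.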

Dichotomous Term

In this step I create a dichotomous (dummy) variable from the country name, coded 0 for the United States and 1 for every other country.

# Create the dichotomous term (United States = 0, all other countries = 1)
worldhappiness_no_NA$Country.name <-  ifelse(worldhappiness_no_NA$Country.name == "United States", 0, 1)

print(worldhappiness_no_NA$Country.name)
##    [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##   [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##   [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [186] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [223] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [260] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [297] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [334] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [371] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [408] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [445] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [482] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [519] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [556] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [593] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [630] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [667] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [704] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [741] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [778] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [815] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [852] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [889] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [926] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [963] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1000] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1037] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1074] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1111] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1148] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1185] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1222] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1259] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1296] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1333] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1370] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1407] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1444] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1481] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1518] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1555] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1592] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## [1629] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1666] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1703] 1 1 1 1 1 1
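
Overwriting `Country.name` discards the country labels, and a full printout of the vector is hard to check by eye. One alternative, sketched on a toy data frame (the column name `not_us` and all values are hypothetical), is to store the indicator in a new column and summarise it with `table()`:

```r
# Toy data standing in for the real data (values are invented).
toy <- data.frame(
  Country.name = c("Canada", "United States", "United States", "Mexico"),
  Life.Ladder  = c(7.2, 7.0, 6.9, 6.5),
  stringsAsFactors = FALSE
)

# Keep Country.name intact; store the indicator in a new column.
# Coding follows the step above: United States = 0, all others = 1.
toy$not_us <- ifelse(toy$Country.name == "United States", 0, 1)

# A frequency table is easier to check than printing every value.
counts <- table(toy$not_us)
print(counts)
```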

Interaction between healthy life expectancy and freedom to make life choices

# Interaction

worldhappiness_Interaction <- worldhappiness_no_NA$Healthy.life.expectancy.at.birth * worldhappiness_no_NA$Freedom.to.make.life.choices

summary(worldhappiness_Interaction)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.05   38.05   46.94   47.17   56.53   72.32
plot(worldhappiness_Interaction)
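
The product above multiplies two quantitative variables. For a dichotomous-by-quantitative interaction, the indicator can be crossed with a quantitative predictor in the formula using `*`. A hedged sketch on simulated data (all names and values here are hypothetical, and this model is not one fitted in this report):

```r
set.seed(7)
n <- 300
toy <- data.frame(
  group = rbinom(n, size = 1, prob = 0.5),   # dichotomous indicator (0/1)
  x     = runif(n, min = 0, max = 10)        # quantitative predictor
)
# Invented truth: slope 0.5 when group = 0, slope 0.5 + 0.8 when group = 1.
toy$y <- 2 + 1.5 * toy$group + 0.5 * toy$x + 0.8 * toy$group * toy$x +
  rnorm(n, sd = 0.5)

# `group * x` expands to group + x + group:x.
fit_int <- lm(y ~ group * x, data = toy)
b <- coef(fit_int)

# Reading the coefficients:
#   b["x"]        slope of x when group = 0
#   b["group:x"]  additional slope of x when group = 1
#   b["group"]    difference in intercept between the groups (at x = 0)
slope_group0 <- b[["x"]]
slope_group1 <- b[["x"]] + b[["group:x"]]
print(c(group0 = slope_group0, group1 = slope_group1))
```

In the happiness data, the analogous formula would cross the 0/1 country indicator with one of the quantitative columns.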

Residual Analysis

# Plot of Residual Analysis

par(mfrow = c(1,1))
plot(worldhappiness_lm)
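
Calling `plot()` on an `lm` object produces four diagnostic plots; with `mfrow = c(1, 1)` they are drawn one per page. A 2-by-2 layout puts them side by side, and a couple of numeric checks can supplement the visual read. A sketch on simulated data (the `toy` data frame is invented; `shapiro.test()` comes from base R's stats package):

```r
set.seed(3)
# Simulated data with a truly linear relationship (values are invented).
toy <- data.frame(x = runif(150, 0, 10))
toy$y <- 1 + 2 * toy$x + rnorm(150, sd = 1)
fit <- lm(y ~ x, data = toy)

# All four diagnostic plots on one page, then restore the previous layout.
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)

# Numeric companions to the plots.
res <- residuals(fit)
print(mean(res))           # OLS residuals average to ~0 when an intercept is fitted
print(shapiro.test(res))   # normality test; a small p-value suggests non-normal residuals
```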

Conclusion

Was the linear model appropriate? Why or why not?

Ans. In conclusion, I think the linear model was appropriate for this data set: the residuals vs. fitted plot appears to show constant variability, and the Q-Q plot indicates that the residuals are roughly normally distributed. The model has a residual standard error of 3.676 on 1700 degrees of freedom, a multiple R-squared of 0.189 (adjusted R-squared of 0.1857), and an overall p-value of < 2.2e-16, so the predictors are jointly significant even though they explain only about 19% of the variation in the response. Lastly, most of the individual predictors show a statistically significant relationship with the response; only social support does not (p = 0.178).