Week 12

Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?

Getting started: Load libraries

# readr provides read_csv(), which is used below to import the dataset
library(readr)

For this assignment, I will use the Human Freedom Index 2023, which presents the state of human freedom worldwide based on personal, civil, and economic freedom indicators.

Importing and reading the dataset

data <- read_csv("https://raw.githubusercontent.com/Heleinef/Data-Science-Master_Heleine/main/Human%20Freedom%20Index_2023.csv")

## Rows: 495 Columns: 146
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (5): iso, countries, region, ef_government_tax_income_data, ef_governm...
## dbl (139): year, hf_score, hf_rank, hf_quartile, pf_rol_procedural, pf_rol_c...
## lgl   (2): pf_identity_inheritance_widows, pf_identity_inheritance_daughters
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

hfi <- data

names(hfi)

##   [1] "year"                                
##   [2] "iso"                                 
##   [3] "countries"                           
##   [4] "region"                              
##   [5] "hf_score"                            
##   [6] "hf_rank"                             
##   [7] "hf_quartile"                         
##   [8] "pf_rol_procedural"                   
##   [9] "pf_rol_civil"                        
##  [10] "pf_rol_criminal"                     
##  [11] "pf_rol_vdem"                         
##  [12] "pf_rol"                              
##  [13] "pf_ss_homicide"                      
##  [14] "pf_ss_homicide_data"                 
##  [15] "pf_ss_disappearances_disap"          
##  [16] "pf_ss_disappearances_violent"        
##  [17] "pf_ss_disappearances_violent_data"   
##  [18] "pf_ss_disappearances_organized"      
##  [19] "pf_ss_disappearances_fatalities"     
##  [20] "pf_ss_disappearances_fatalities_data"
##  [21] "pf_ss_disappearances_injuries"       
##  [22] "pf_ss_disappearances_injuries_data"  
##  [23] "pf_ss_disappearances_torture"        
##  [24] "pf_ss_killings"                      
##  [25] "pf_ss_disappearances"                
##  [26] "pf_ss"                               
##  [27] "pf_movement_vdem_foreign"            
##  [28] "pf_movement_vdem_men"                
##  [29] "pf_movement_vdem_women"              
##  [30] "pf_movement_vdem"                    
##  [31] "pf_movement_cld"                     
##  [32] "pf_movement"                         
##  [33] "pf_religion_freedom_vdem"            
##  [34] "pf_religion_freedom_cld"             
##  [35] "pf_religion_freedom"                 
##  [36] "pf_religion_suppression"             
##  [37] "pf_religion"                         
##  [38] "pf_assembly_entry"                   
##  [39] "pf_assembly_freedom_house"           
##  [40] "pf_assembly_freedom_bti"             
##  [41] "pf_assembly_freedom_cld"             
##  [42] "pf_assembly_freedom"                 
##  [43] "pf_assembly_parties_barriers"        
##  [44] "pf_assembly_parties_bans"            
##  [45] "pf_assembly_parties_auton"           
##  [46] "pf_assembly_parties"                 
##  [47] "pf_assembly_civil"                   
##  [48] "pf_assembly"                         
##  [49] "pf_expression_direct_killed"         
##  [50] "pf_expression_direct_killed_data"    
##  [51] "pf_expression_direct_jailed"         
##  [52] "pf_expression_direct_jailed_data"    
##  [53] "pf_expression_direct"                
##  [54] "pf_expression_vdem_cultural"         
##  [55] "pf_expression_vdem_harass"           
##  [56] "pf_expression_vdem_gov"              
##  [57] "pf_expression_vdem_internet"         
##  [58] "pf_expression_vdem_selfcens"         
##  [59] "pf_expression_vdem"                  
##  [60] "pf_expression_house"                 
##  [61] "pf_expression_bti"                   
##  [62] "pf_expression_cld"                   
##  [63] "pf_expression"                       
##  [64] "pf_identity_same_m"                  
##  [65] "pf_identity_same_f"                  
##  [66] "pf_identity_same"                    
##  [67] "pf_identity_divorce"                 
##  [68] "pf_identity_inheritance_widows"      
##  [69] "pf_identity_inheritance_daughters"   
##  [70] "pf_identity_inheritance"             
##  [71] "pf_identity_fgm"                     
##  [72] "pf_identity"                         
##  [73] "pf_score"                            
##  [74] "pf_rank"                             
##  [75] "ef_government_consumption"           
##  [76] "ef_government_consumption_data"      
##  [77] "ef_government_transfers"             
##  [78] "ef_government_transfers_data"        
##  [79] "ef_government_investment"            
##  [80] "ef_government_investment_data"       
##  [81] "ef_government_tax_income"            
##  [82] "ef_government_tax_income_data"       
##  [83] "ef_government_tax_payroll"           
##  [84] "ef_government_tax_payroll_data"      
##  [85] "ef_government_tax"                   
##  [86] "ef_government_soa"                   
##  [87] "ef_government"                       
##  [88] "ef_legal_judicial"                   
##  [89] "ef_legal_courts"                     
##  [90] "ef_legal_protection"                 
##  [91] "ef_legal_military"                   
##  [92] "ef_legal_integrity"                  
##  [93] "ef_legal_enforcement"                
##  [94] "ef_legal_regulatory"                 
##  [95] "ef_legal_police"                     
##  [96] "ef_gender"                           
##  [97] "ef_legal"                            
##  [98] "ef_money_growth"                     
##  [99] "ef_money_growth_data"                
## [100] "ef_money_sd"                         
## [101] "ef_money_sd_data"                    
## [102] "ef_money_inflation"                  
## [103] "ef_money_inflation_data"             
## [104] "ef_money_currency"                   
## [105] "ef_money"                            
## [106] "ef_trade_tariffs_revenue"            
## [107] "ef_trade_tariffs_revenue_data"       
## [108] "ef_trade_tariffs_mean"               
## [109] "ef_trade_tariffs_mean_data"          
## [110] "ef_trade_tariffs_sd"                 
## [111] "ef_trade_tariffs_sd_data"            
## [112] "ef_trade_tariffs"                    
## [113] "ef_trade_regulatory_nontariff"       
## [114] "ef_trade_regulatory_costs"           
## [115] "ef_trade_regulatory"                 
## [116] "ef_trade_black"                      
## [117] "ef_trade_movement_open"              
## [118] "ef_trade_movement_capital"           
## [119] "ef_trade_movement_visit"             
## [120] "ef_trade_movement_assets"            
## [121] "ef_trade_movement"                   
## [122] "ef_trade"                            
## [123] "ef_regulation_credit_ownership"      
## [124] "ef_regulation_credit_private"        
## [125] "ef_regulation_credit_interest"       
## [126] "ef_regulation_credit"                
## [127] "ef_regulation_labor_minwage"         
## [128] "ef_regulation_labor_firing"          
## [129] "ef_regulation_labor_bargain"         
## [130] "ef_regulation_labor_hours"           
## [131] "ef_regulation_labor_dismissal"       
## [132] "ef_regulation_labor_conscription"    
## [133] "ef_regulation_labor_foreign"         
## [134] "ef_regulation_labor"                 
## [135] "ef_regulation_business_burden"       
## [136] "ef_regulation_business_costs"        
## [137] "ef_regulation_business_impartial"    
## [138] "ef_regulation_business_compliance"   
## [139] "ef_regulation_business"              
## [140] "ef_regulation_enter_openness"        
## [141] "ef_regulation_enter_permits"         
## [142] "ef_regulation_enter_distortion"      
## [143] "ef_regulation_enter"                 
## [144] "ef_regulation"                       
## [145] "ef_score"                            
## [146] "ef_rank"

1. Fitting a simple multiple regression model

# Subset the dataset with the variables of interest
hfi_subset <- hfi[, c("ef_score", "hf_score", "pf_score", "ef_money", "ef_trade")]

# Fit multiple regression model
model <- lm(hf_score ~ ef_score + pf_score + ef_money + ef_trade, data = hfi_subset)

# Print the summary of the model
summary(model)

## 
## Call:
## lm(formula = hf_score ~ ef_score + pf_score + ef_money + ef_trade, 
##     data = hfi_subset)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0091435 -0.0025929 -0.0000205  0.0026162  0.0087637 
## 
## Coefficients:
##               Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)  1.449e-03  1.044e-03    1.387    0.166    
## ef_score     4.162e-01  4.853e-04  857.658   <2e-16 ***
## pf_score     5.835e-01  1.347e-04 4331.914   <2e-16 ***
## ef_money     9.187e-05  1.982e-04    0.464    0.643    
## ef_trade    -2.385e-05  2.375e-04   -0.100    0.920    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.003429 on 490 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 1.751e+07 on 4 and 490 DF,  p-value: < 2.2e-16
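An R-squared of 1 together with a residual standard error of about 0.003 suggests that hf_score is almost exactly a weighted sum of ef_score and pf_score (the estimated weights are about 0.42 and 0.58). The sketch below shows what such a fit looks like on simulated data. The data frame `sim`, its columns, and the 0.4/0.6 weights are made up for illustration, and whether the index is actually built this way is an assumption.

```r
# Simulated data only: names and weights are illustrative assumptions
set.seed(4)
n <- 100
sim <- data.frame(
  a = runif(n, 3, 9),
  b = runif(n, 3, 9),
  noise_var = runif(n, 3, 9)  # predictor unrelated to the response
)

# Response built as a fixed weighted sum of a and b, rounded to two decimals
sim$total <- round(0.4 * sim$a + 0.6 * sim$b, 2)

fit <- lm(total ~ a + b + noise_var, data = sim)

# The fit recovers the weights almost exactly; noise_var gets a near-zero coefficient
round(coef(fit), 3)
summary(fit)$r.squared
```

In a fit like this the coefficients mostly restate how the response was constructed, so the extra predictors have little left to explain.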

2. Adding one quadratic term, one dichotomous term, and one interaction term to the model

Selecting my working variables

# Sub-setting the dataset with my variables of interest
hfi_subset <- hfi[, c("hf_score", "ef_score", "pf_score", "ef_money", "ef_trade", "region")]

Creating quadratic term

# Creating quadratic term for ef_score
hfi_subset$ef_score_squared <- hfi_subset$ef_score^2

Creating dichotomous term

# Creating dichotomous term for region (e.g., if region is "Europe", then 1, else 0)
hfi_subset$is_europe <- ifelse(hfi_subset$region == "Europe", 1, 0)
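Before using a dummy variable such as is_europe, it can help to confirm that it actually takes both values. The sketch below uses a small made-up data frame (`toy` and its region labels are assumptions for illustration, not rows from the dataset) to show how an exact match with `==` differs from a pattern match with `grepl()` when the labels are longer strings.

```r
# Toy data: the region labels are illustrative assumptions
toy <- data.frame(
  region = c("Eastern Europe", "Western Europe", "South Asia", "Oceania"),
  stringsAsFactors = FALSE
)

# Exact match: 1 only if the label is literally "Europe"
toy$is_europe_exact <- ifelse(toy$region == "Europe", 1, 0)

# Pattern match: 1 if the label contains "Europe" anywhere
toy$is_europe_pattern <- as.integer(grepl("Europe", toy$region, fixed = TRUE))

# A usable dummy must contain both 0s and 1s
table(toy$is_europe_exact)
table(toy$is_europe_pattern)
```

On the real data, `table(hfi_subset$region)` would list the labels that are actually present, and `table(hfi_subset$is_europe)` would show whether the dummy varies.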

Creating interaction term between ef_score and ef_money

# Creating interaction term between ef_score and ef_money
hfi_subset$interaction_ef_money_ef_score <- hfi_subset$ef_money * hfi_subset$ef_score
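The product above multiplies two quantitative variables, whereas the assignment asks for a dichotomous vs. quantitative interaction. A minimal sketch of the latter on simulated data follows. The names `sim`, `x`, `d`, and `y` and all numeric values are hypothetical. R's formula syntax builds the interaction and quadratic terms directly, so no extra columns are needed.

```r
# Simulated data only: all names and values are illustrative assumptions
set.seed(1)
n <- 200
sim <- data.frame(
  x = runif(n, 0, 10),            # quantitative predictor
  d = rep(c(0, 1), each = n / 2)  # dichotomous predictor (0/1)
)

# True model: the slope of x is 1 when d = 0 and 1 + 0.5 = 1.5 when d = 1
sim$y <- 2 + 1 * sim$x + 3 * sim$d + 0.5 * sim$x * sim$d + rnorm(n, sd = 0.5)

# x * d expands to x + d + x:d; I(x^2) adds a quadratic term inside the formula
fit <- lm(y ~ x * d + I(x^2), data = sim)
coef(fit)
```

The `x:d` coefficient estimates how much the slope of `x` differs between the two groups, and the `d` coefficient is the difference in intercepts.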

3. Fitting the new multiple regression model

# Fitting multiple regression model with quadratic, dichotomous, and interaction terms
model <- lm(hf_score ~ ef_score + ef_score_squared + pf_score + ef_money + ef_trade + is_europe + interaction_ef_money_ef_score, data = hfi_subset)

The model summary

# Print the summary of the model
summary(model)

## 
## Call:
## lm(formula = hf_score ~ ef_score + ef_score_squared + pf_score + 
##     ef_money + ef_trade + is_europe + interaction_ef_money_ef_score, 
##     data = hfi_subset)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0090682 -0.0026219 -0.0001298  0.0025724  0.0089482 
## 
## Coefficients: (1 not defined because of singularities)
##                                 Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)                   -5.300e-03  4.505e-03   -1.177    0.240    
## ef_score                       4.177e-01  2.090e-03  199.858   <2e-16 ***
## ef_score_squared              -2.347e-05  2.280e-04   -0.103    0.918    
## pf_score                       5.835e-01  1.346e-04 4335.157   <2e-16 ***
## ef_money                       7.026e-04  8.921e-04    0.788    0.431    
## ef_trade                       4.769e-05  2.467e-04    0.193    0.847    
## is_europe                             NA         NA       NA       NA    
## interaction_ef_money_ef_score -1.306e-04  1.501e-04   -0.870    0.385    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.003422 on 488 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 1.173e+07 on 6 and 488 DF,  p-value: < 2.2e-16
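The note "1 not defined because of singularities" and the NA row for is_europe mean that lm() could not estimate that coefficient. This happens when a predictor is constant or is an exact linear combination of other predictors. The sketch below reproduces the behaviour on simulated data (`sim` and `const_dummy` are hypothetical names) and shows one way to find the affected terms.

```r
# Simulated data only: names are illustrative assumptions
set.seed(2)
sim <- data.frame(x = rnorm(50))
sim$y <- 1 + 2 * sim$x + rnorm(50, sd = 0.1)
sim$const_dummy <- 0  # a dummy that never equals 1

fit <- lm(y ~ x + const_dummy, data = sim)

# Coefficients that lm() could not estimate are reported as NA
dropped <- names(coef(fit))[is.na(coef(fit))]
dropped

# Columns with a single distinct value carry no information for the fit
sapply(sim, function(col) length(unique(col)))
```

Running the same `is.na(coef(model))` check on the fitted model would flag is_europe.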

Residual analysis

# Residual analysis: show the four diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(model)
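The diagnostic plots can be complemented with simple numeric checks. The sketch below uses simulated data (`sim` and `fit` are hypothetical, not the model above) and base R only: a Shapiro-Wilk test for normality of the residuals and an auxiliary regression of squared residuals on fitted values as a rough check for non-constant variance. It is an informal illustration, not a full diagnostic procedure.

```r
# Simulated data only: names and values are illustrative assumptions
set.seed(3)
sim <- data.frame(x = runif(100, 0, 10))
sim$y <- 1 + 2 * sim$x + rnorm(100)
fit <- lm(y ~ x, data = sim)

res <- residuals(fit)

# Normality: a small p-value would suggest non-normal residuals
sw <- shapiro.test(res)
sw$p.value

# Constant variance: regress squared residuals on fitted values;
# a clearly non-zero slope would hint at heteroscedasticity
aux <- lm(I(res^2) ~ fitted(fit))
summary(aux)$coefficients
```

The same two checks could be applied to `residuals(model)` and `fitted(model)` from the regression fitted earlier.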

Conclusion:

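Based on the summary output and the residual analysis, the linear model appears appropriate for these data. The residual plots show no clear violation of the linear-model assumptions (linearity, homoscedasticity, and normality of residuals). Two caveats from the summary output are worth noting. First, the R-squared of 1 and the residual standard error of about 0.003 indicate that hf_score is almost exactly a linear combination of ef_score and pf_score, so the quadratic, interaction, ef_money, and ef_trade terms add essentially nothing (none is statistically significant). Second, the is_europe coefficient could not be estimated (it is reported as NA), so the dichotomous term does not contribute to the fitted model.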