'data.frame': 27820 obs. of 12 variables:
$ country : Factor w/ 101 levels "Albania","Antigua and Barbuda",..: 1 1 1 1 1 1 1 1 1 1 ...
$ year : int 1987 1987 1987 1987 1987 1987 1987 1987 1987 1987 ...
$ sex : Factor w/ 2 levels "female","male": 2 2 1 2 2 1 1 1 2 1 ...
$ age : Factor w/ 6 levels "15-24 years",..: 1 3 1 6 2 6 3 2 5 4 ...
$ suicides_no : int 21 16 14 1 9 1 6 4 1 0 ...
$ population : int 312900 308000 289700 21800 274300 35600 278800 257200 137500 311000 ...
$ suicides.100k.pop : num 6.71 5.19 4.83 4.59 3.28 2.81 2.15 1.56 0.73 0 ...
$ country.year : Factor w/ 2321 levels "Albania1987",..: 1 1 1 1 1 1 1 1 1 1 ...
$ HDI.for.year : num NA NA NA NA NA NA NA NA NA NA ...
$ gdp_for_year.... : Factor w/ 2321 levels "1,002,219,052,968",..: 727 727 727 727 727 727 727 727 727 727 ...
$ gdp_per_capita....: int 796 796 796 796 796 796 796 796 796 796 ...
$ generation : Factor w/ 6 levels "Boomers","G.I. Generation",..: 3 6 3 2 1 2 6 1 2 3 ...
country year sex
0.00 0.00 0.00
age suicides_no population
0.00 0.00 0.00
suicides.100k.pop country.year HDI.for.year
0.00 0.00 69.94
gdp_for_year.... gdp_per_capita.... generation
0.00 0.00 0.00
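The table above reads as the percentage of missing values per column, with only `HDI.for.year` affected (69.94%). A minimal sketch of how such a table can be produced, assuming a data frame called `df` (a hypothetical name, filled here with toy values rather than the real data):

```r
# Toy stand-in for the suicide data; names and values are illustrative only.
df <- data.frame(
  country      = c("Albania", "Albania", "Austria", "Austria"),
  suicides_no  = c(21L, 16L, 14L, 1L),
  HDI.for.year = c(NA, NA, NA, 0.619)
)

# is.na() gives a logical matrix; colMeans() turns it into the share of
# missing values per column, which is then expressed in percent.
pct_missing <- round(colMeans(is.na(df)) * 100, 2)
print(pct_missing)
```

The same call on the full data would presumably give the 0.00 / 69.94 pattern printed above.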
[1] 27820 12
'data.frame': 27820 obs. of 11 variables:
$ country : Factor w/ 101 levels "Albania","Antigua and Barbuda",..: 1 1 1 1 1 1 1 1 1 1 ...
$ year : int 1987 1987 1987 1987 1987 1987 1987 1987 1987 1987 ...
$ sex : Factor w/ 2 levels "female","male": 2 2 1 2 2 1 1 1 2 1 ...
$ age : Factor w/ 6 levels "15-24 years",..: 1 3 1 6 2 6 3 2 5 4 ...
$ suicides_no : int 21 16 14 1 9 1 6 4 1 0 ...
$ population : int 312900 308000 289700 21800 274300 35600 278800 257200 137500 311000 ...
$ suicides.100k.pop : num 6.71 5.19 4.83 4.59 3.28 2.81 2.15 1.56 0.73 0 ...
$ HDI.for.year : num NA NA NA NA NA NA NA NA NA NA ...
$ gdp_per_capita....: int 796 796 796 796 796 796 796 796 796 796 ...
$ generation : Factor w/ 6 levels "Boomers","G.I. Generation",..: 3 6 3 2 1 2 6 1 2 3 ...
$ continent : Factor w/ 5 levels "Africa","Americas",..: 4 4 4 4 4 4 4 4 4 4 ...
country year sex age suicides_no population suicides.100k.pop
1 Albania 1987 male 15-24 years 21 312900 6.71
2 Albania 1987 male 35-54 years 16 308000 5.19
3 Albania 1987 female 15-24 years 14 289700 4.83
4 Albania 1987 male 75+ years 1 21800 4.59
5 Albania 1987 male 25-34 years 9 274300 3.28
6 Albania 1987 female 75+ years 1 35600 2.81
HDI.for.year gdp_per_capita.... generation continent
1 NA 796 Generation X Europe
2 NA 796 Silent Europe
3 NA 796 Generation X Europe
4 NA 796 G.I. Generation Europe
5 NA 796 Boomers Europe
6 NA 796 G.I. Generation Europe
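Between the two `str()` printouts, `country.year` and `gdp_for_year....` disappear and a `continent` factor appears. The code that did this is not shown, so the following is only a sketch of one way to get the same shape, using a small hand-written lookup table (`continent_lookup` is hypothetical, as is the toy `df`):

```r
# Toy data; the real data frame has 27820 rows.
df <- data.frame(
  country      = c("Albania", "Argentina", "Japan"),
  year         = c(1987L, 1990L, 2000L),
  country.year = c("Albania1987", "Argentina1990", "Japan2000"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from country name to continent (assumption, not from the source).
continent_lookup <- c(Albania = "Europe", Argentina = "Americas", Japan = "Asia")

# Drop the redundant key column, then map each country to its continent.
# as.character() guards against indexing by factor codes instead of names.
df$country.year <- NULL
df$continent <- factor(unname(continent_lookup[as.character(df$country)]))

str(df)
```

For 101 countries a hand-written table is tedious; a package such as countrycode can map country names to continents, though whether it was used here is not visible in the output.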
#REMOVE country VARIABLE FROM MASTER DATASET
[1] "year" "sex" "age"
[4] "suicides_no" "population" "suicides.100k.pop"
[7] "HDI.for.year" "gdp_per_capita...." "generation"
[10] "continent"
year sex age
0 0 0
suicides_no population suicides.100k.pop
0 0 0
HDI.for.year gdp_per_capita.... generation
0 0 0
continent
0
year sex age
0 0 0
suicides_no population suicides.100k.pop
0 0 0
HDI.for.year gdp_per_capita.... generation
100 0 0
continent
0
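The two tables above suggest the master data was split into one part where `HDI.for.year` is fully observed (0% missing) and one part where it is entirely missing (100%). A minimal sketch of such a split, assuming data frames named `master`, `sub` and `missing` (`sub` appears in the model call below; the other two names are assumptions):

```r
# Toy master data with some HDI values missing.
master <- data.frame(
  year               = c(1987L, 1995L, 1987L, 2000L),
  gdp_per_capita.... = c(796L, 835L, 796L, 1200L),
  HDI.for.year       = c(NA, 0.619, NA, 0.656)
)

# Rows with an observed HDI go to the training set, the rest to the set to impute.
has_hdi <- !is.na(master$HDI.for.year)
sub     <- master[has_hdi, ]
missing <- master[!has_hdi, ]

# Percent missing per column in each part: all zeros for sub,
# 100 for HDI.for.year in missing.
print(round(colMeans(is.na(sub)) * 100, 2))
print(round(colMeans(is.na(missing)) * 100, 2))
```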
#TRAIN MODEL ON ‘COMPLETE’ SUB DATASET & PREDICT Human Development Index
Call:
lm(formula = HDI.for.year ~ ., data = sub)
Residuals:
Min 1Q Median 3Q Max
-0.187340 -0.021486 0.004924 0.031165 0.144485
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.231e+00 2.922e-01 -14.477 < 2e-16 ***
year 2.450e-03 1.472e-04 16.641 < 2e-16 ***
sexmale -3.063e-03 1.202e-03 -2.548 0.01085 *
age25-34 years -1.965e-03 2.018e-03 -0.974 0.33002
age35-54 years -9.841e-03 3.274e-03 -3.006 0.00266 **
age5-14 years 5.801e-03 2.744e-03 2.114 0.03453 *
age55-74 years -1.175e-02 5.270e-03 -2.229 0.02583 *
age75+ years -1.162e-02 6.447e-03 -1.803 0.07148 .
suicides_no 8.342e-06 1.190e-06 7.010 2.57e-12 ***
population 1.558e-09 1.991e-10 7.823 5.81e-15 ***
suicides.100k.pop 9.751e-05 3.952e-05 2.468 0.01362 *
gdp_per_capita.... 2.380e-06 2.727e-08 87.295 < 2e-16 ***
generationG.I. Generation 7.902e-03 4.832e-03 1.636 0.10198
generationGeneration X -3.750e-03 2.911e-03 -1.288 0.19771
generationGeneration Z -1.291e-02 6.507e-03 -1.984 0.04731 *
[ reached getOption("max.print") -- omitted 6 rows ]
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.04864 on 8343 degrees of freedom
Multiple R-squared: 0.7293, Adjusted R-squared: 0.7286
F-statistic: 1124 on 20 and 8343 DF, p-value: < 2.2e-16
[1] 0.04857793
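The value 0.04857793 sits just below the reported residual standard error of 0.04864, which is what one would expect if it is the in-sample RMSE: RMSE divides the residual sum of squares by n, while the residual standard error divides by the residual degrees of freedom. The sketch below illustrates that relationship on the built-in `mtcars` data (a stand-in, not the suicide data):

```r
# Fit a small linear model on a built-in dataset.
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit)

# RMSE: square root of the mean squared residual (divides by n).
rmse <- sqrt(mean(res^2))

# Residual standard error as reported by summary() (divides by n - p).
rse <- summary(fit)$sigma

# The two differ only by the factor sqrt(df_residual / n).
rmse_from_rse <- rse * sqrt(df.residual(fit) / length(res))

print(c(rmse = rmse, rse = rse, rmse_from_rse = rmse_from_rse))
```

Applied to the output above, with 8343 residual degrees of freedom and 21 estimated coefficients (n = 8364), 0.04864 * sqrt(8343 / 8364) is approximately 0.04858, consistent with the printed value.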
[1] "year" "sex" "age"
[4] "suicides_no" "population" "suicides.100k.pop"
[7] "HDI.for.year" "gdp_per_capita...." "generation"
[10] "continent" "pred"
[1] "year" "sex" "age"
[4] "suicides_no" "population" "suicides.100k.pop"
[7] "HDI.for.year" "gdp_per_capita...." "generation"
[10] "continent" "HDI.pred"
[1] "year" "sex" "age"
[4] "suicides_no" "population" "suicides.100k.pop"
[7] "HDI.pred" "gdp_per_capita...." "generation"
[10] "continent" "HDI.for.year"
[1] "year" "sex" "age"
[4] "suicides_no" "population" "suicides.100k.pop"
[7] "HDI.for.year" "gdp_per_capita...." "generation"
[10] "continent" "pred"
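The sequence of `names()` printouts above shows a prediction column being added (`pred`), renamed (`HDI.pred`), and swapped into the place of `HDI.for.year`. The exact steps are not shown; the following is a simplified sketch of the same idea, assuming toy versions of `sub` (complete rows) and `missing` (rows to impute, a hypothetical name):

```r
# Toy training rows with observed HDI.
sub <- data.frame(
  year         = c(1990, 1995, 2000, 2005, 2010),
  gdp          = c(800, 835, 1200, 2700, 4000),
  HDI.for.year = c(0.600, 0.619, 0.660, 0.700, 0.740)
)

# Toy rows where HDI is missing and has to be predicted.
missing <- data.frame(
  year         = c(1987, 1992),
  gdp          = c(796, 810),
  HDI.for.year = NA_real_
)

# Train on the complete rows, then predict for the incomplete ones.
fit <- lm(HDI.for.year ~ ., data = sub)
missing$pred <- predict(fit, newdata = missing)

# Overwrite the empty column with the predictions and drop the helper column,
# so both data frames end up with identical column names and order.
missing$HDI.for.year <- missing$pred
missing$pred <- NULL

print(names(missing))
```

Overwriting in place avoids the rename-and-reorder steps and keeps the column order aligned with `sub`, which matters for the later merge.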
#CHECK names() OF BOTH DATASETS AND MERGE THEM
[1] "year" "sex" "age"
[4] "suicides_no" "population" "suicides.100k.pop"
[7] "HDI.for.year" "gdp_per_capita...." "generation"
[10] "continent"
[1] "year" "sex" "age"
[4] "suicides_no" "population" "suicides.100k.pop"
[7] "HDI.for.year" "gdp_per_capita...." "generation"
[10] "continent"
[1] 27820 10
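With both parts showing the same ten column names, stacking them restores the full 27820 rows. A minimal sketch of that check-then-merge step, assuming the two parts are called `sub` and `missing` and the result `final` (`missing` is a hypothetical name; `final` matches the later model call):

```r
# Toy versions of the two parts, already sharing the same columns.
sub <- data.frame(
  year         = c(1995L, 2000L),
  sex          = factor(c("male", "female")),
  HDI.for.year = c(0.619, 0.656)
)
missing <- data.frame(
  year         = c(1987L, 1988L, 1989L),
  sex          = factor(c("female", "male", "male")),
  HDI.for.year = c(0.601, 0.604, 0.607)  # imputed values in the real workflow
)

# rbind() on data frames matches columns by name, so verify they agree first.
stopifnot(identical(names(sub), names(missing)))

# Stack the complete rows on top of the imputed rows.
final <- rbind(sub, missing)
print(dim(final))
```

Because `rbind()` keeps the original row names, the first rows of the merged data can start at a number other than 1, as in the `head()` output that follows (rows 73 to 78).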
year sex age suicides_no population suicides.100k.pop
73 1995 male 25-34 years 13 232900 5.58
74 1995 male 55-74 years 9 178000 5.06
75 1995 female 75+ years 2 40800 4.90
76 1995 female 15-24 years 13 283500 4.59
77 1995 male 15-24 years 11 241200 4.56
78 1995 male 75+ years 1 25100 3.98
HDI.for.year gdp_per_capita.... generation continent
73 0.619 835 Generation X Europe
74 0.619 835 Silent Europe
75 0.619 835 G.I. Generation Europe
76 0.619 835 Generation X Europe
77 0.619 835 Generation X Europe
78 0.619 835 G.I. Generation Europe
'data.frame': 27820 obs. of 10 variables:
$ year : int 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 ...
$ sex : Factor w/ 2 levels "female","male": 2 2 1 1 2 2 2 1 1 2 ...
$ age : Factor w/ 6 levels "15-24 years",..: 2 5 6 1 1 6 3 2 3 4 ...
$ suicides_no : int 13 9 2 13 11 1 14 7 8 6 ...
$ population : int 232900 178000 40800 283500 241200 25100 375900 264000 356400 376500 ...
$ suicides.100k.pop : num 5.58 5.06 4.9 4.59 4.56 3.98 3.72 2.65 2.24 1.59 ...
$ HDI.for.year : num 0.619 0.619 0.619 0.619 0.619 0.619 0.619 0.619 0.619 0.619 ...
$ gdp_per_capita....: int 835 835 835 835 835 835 835 835 835 835 ...
$ generation : Factor w/ 6 levels "Boomers","G.I. Generation",..: 3 6 2 3 3 2 1 3 1 5 ...
$ continent : Factor w/ 5 levels "Africa","Americas",..: 4 4 4 4 4 4 4 4 4 4 ...
Call:
lm(formula = suicides_no ~ ., data = final)
Residuals:
Min 1Q Median 3Q Max
-3035.9 -164.0 10.6 157.0 17446.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.127e+04 1.894e+03 11.231 < 2e-16 ***
year -1.237e+01 9.698e-01 -12.756 < 2e-16 ***
sexmale 9.221e+01 8.521e+00 10.821 < 2e-16 ***
age25-34 years 3.880e+01 1.612e+01 2.407 0.01609 *
age35-54 years 1.303e+02 2.519e+01 5.175 2.30e-07 ***
age5-14 years -8.879e+01 1.684e+01 -5.274 1.35e-07 ***
age55-74 years 1.180e+02 3.769e+01 3.131 0.00174 **
age75+ years -1.581e+00 4.389e+01 -0.036 0.97127
population 1.271e-04 1.081e-06 117.530 < 2e-16 ***
suicides.100k.pop 1.198e+01 2.518e-01 47.566 < 2e-16 ***
HDI.for.year 4.792e+03 1.409e+02 34.025 < 2e-16 ***
gdp_per_capita.... -1.142e-02 4.085e-04 -27.954 < 2e-16 ***
generationG.I. Generation -6.277e+01 3.306e+01 -1.899 0.05760 .
generationGeneration X -1.430e+01 1.925e+01 -0.743 0.45769
generationGeneration Z 7.279e+01 4.233e+01 1.720 0.08553 .
[ reached getOption("max.print") -- omitted 6 rows ]
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 638.7 on 27799 degrees of freedom
Multiple R-squared: 0.4991, Adjusted R-squared: 0.4987
F-statistic: 1385 on 20 and 27799 DF, p-value: < 2.2e-16
[1] 638.4137
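Both coefficient tables above are cut off by `max.print` ("omitted 6 rows"), so some coefficients (the remaining `generation` levels and, presumably, the `continent` levels) are not visible. A small sketch of two ways to see the full table, shown on the built-in `mtcars` data as a stand-in:

```r
# Stand-in model with several coefficients.
fit <- lm(mpg ~ ., data = mtcars)

# The coefficient table is an ordinary matrix and can be extracted directly.
coefs <- coef(summary(fit))
print(dim(coefs))

# Temporarily raise the print limit, print the full table, then restore it.
old <- options(max.print = 10000)
print(coefs)
options(old)
```

Extracting the matrix also makes it easy to sort or filter coefficients, for example by p-value, instead of reading them off the console.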