Question

The attached who.csv dataset contains real-world data from 2008. The variables included follow:
- Country: name of the country
- LifeExp: average life expectancy for the country in years
- InfantSurvival: proportion of those surviving to one year or more
- Under5Survival: proportion of those surviving to five years or more
- TBFree: proportion of the population without TB
- PropMD: proportion of the population who are MDs
- PropRN: proportion of the population who are RNs
- PersExp: mean personal expenditures on healthcare in US dollars at average exchange rate
- GovtExp: mean government expenditures per capita on healthcare, US dollars at average exchange rate
- TotExp: sum of personal and government expenditures

Solution

Load required libraries

library(tidyverse)
library(kableExtra)

Read the Data into memory

url_who <- "https://raw.githubusercontent.com/chinedu2301/data605-computer-maths/main/homeworks/hw12/who.csv"
who_raw <- read_csv(url_who)

Display the data

who_raw_display = kable(who_raw) %>%
                  kable_paper("hover", full_width = F) %>%
                  scroll_box(width = "850px", height = "350px")
who_raw_display
Country LifeExp InfantSurvival Under5Survival TBFree PropMD PropRN PersExp GovtExp TotExp
Afghanistan 42 0.835 0.743 0.99769 0.0002288 0.0005723 20 92 112
Albania 71 0.985 0.983 0.99974 0.0011431 0.0046144 169 3128 3297
Algeria 71 0.967 0.962 0.99944 0.0010605 0.0020914 108 5184 5292
Andorra 82 0.997 0.996 0.99983 0.0032973 0.0035000 2589 169725 172314
Angola 41 0.846 0.740 0.99656 0.0000704 0.0011462 36 1620 1656
Antigua and Barbuda 73 0.990 0.989 0.99991 0.0001429 0.0027738 503 12543 13046
Argentina 75 0.986 0.983 0.99952 0.0027802 0.0007410 484 19170 19654
Armenia 69 0.979 0.976 0.99920 0.0036987 0.0049189 88 1856 1944
Australia 82 0.995 0.994 0.99993 0.0023320 0.0091494 3181 187616 190797
Austria 80 0.996 0.996 0.99990 0.0036109 0.0064587 3788 189354 193142
Azerbaijan 64 0.927 0.911 0.99913 0.0036600 0.0084779 62 780 842
Bahamas 74 0.987 0.986 0.99960 0.0009541 0.0040459 1224 55783 57007
Bahrain 75 0.991 0.990 0.99955 0.0026793 0.0059675 710 45784 46494
Bangladesh 63 0.948 0.931 0.99609 0.0002749 0.0002530 12 75 87
Barbados 75 0.989 0.988 0.99989 0.0010990 0.0033720 725 24433 25158
Belarus 69 0.994 0.992 0.99929 0.0047587 0.0124571 204 11315 11519
Belgium 79 0.996 0.995 0.99989 0.0042305 0.0140792 3451 239105 242556
Belize 69 0.986 0.984 0.99944 0.0008901 0.0010745 198 5376 5574
Benin 55 0.912 0.852 0.99865 0.0000355 0.0006608 28 600 628
Bhutan 64 0.937 0.930 0.99904 0.0000801 0.0011248 52 407 459
Bolivia 66 0.950 0.939 0.99734 0.0011042 0.0019340 71 2860 2931
Bosnia and Herzegovina 75 0.987 0.985 0.99943 0.0014111 0.0046694 243 6578 6821
Botswana 52 0.910 0.876 0.99546 0.0003848 0.0025581 431 19604 20035
Brazil 72 0.981 0.980 0.99945 0.0010466 0.0034814 371 13940 14311
Brunei Darussalam 77 0.992 0.991 0.99901 0.0010471 0.0055497 519 30562 31081
Bulgaria 73 0.990 0.988 0.99959 0.0002535 0.0045532 272 11550 11822
Burkina Faso 47 0.878 0.796 0.99524 0.0000493 0.0004566 27 304 331
Burundi 49 0.891 0.819 0.99286 0.0000245 0.0001649 3 10 13
Cambodia 62 0.935 0.918 0.99335 0.0001442 0.0007836 29 140 169
Cameroon 51 0.913 0.851 0.99763 0.0001719 0.0014328 49 784 833
Canada 81 0.995 0.994 0.99996 0.0019126 0.0100446 3430 192800 196230
Cape Verde 70 0.975 0.966 0.99676 0.0004451 0.0007900 114 5394 5508
Central African Republic 48 0.886 0.826 0.99472 0.0000776 0.0003782 13 190 203
Chad 46 0.876 0.791 0.99430 0.0000330 0.0002387 22 234 256
Chile 78 0.992 0.991 0.99984 0.0010477 0.0006073 397 17952 18349
China 73 0.980 0.976 0.99799 0.0014021 0.0009795 81 1302 1383
Colombia 74 0.983 0.979 0.99941 0.0012898 0.0005255 201 12410 12611
Comoros 65 0.949 0.932 0.99914 0.0001406 0.0007188 14 304 318
Congo 54 0.921 0.874 0.99434 0.0002049 0.0009954 31 915 946
Cook Islands 73 0.984 0.981 0.99976 0.0014286 0.0057143 466 27264 27730
Costa Rica 78 0.989 0.988 0.99983 0.0011830 0.0008304 327 15376 15703
Côte d’Ivoire 53 0.910 0.873 0.99253 0.0001100 0.0005382 34 315 349
Croatia 76 0.995 0.994 0.99936 0.0024693 0.0054592 651 30210 30861
Cuba 78 0.995 0.993 0.99990 0.0059081 0.0074448 310 21075 21385
Cyprus 80 0.997 0.996 0.99994 0.0332281 0.0039728 1350 39399 40749
Czech Republic 77 0.997 0.996 0.99990 0.0035916 0.0089430 868 56137 57005
Democratic Republic of the Congo 47 0.871 0.795 0.99355 0.0000961 0.0004747 5 66 71
Denmark 79 0.997 0.996 0.99993 0.0035519 0.0099582 4350 314588 318938
Djibouti 56 0.914 0.870 0.98700 0.0001709 0.0003614 61 4002 4063
Dominica 74 0.987 0.985 0.99984 0.0005588 0.0046618 288 13206 13494
Dominican Republic 70 0.975 0.971 0.99882 0.0016297 0.0015967 197 4148 4345
Ecuador 73 0.979 0.976 0.99805 0.0013888 0.0015593 147 3717 3864
Egypt 68 0.971 0.965 0.99969 0.0024256 0.0033679 78 1290 1368
El Salvador 71 0.978 0.975 0.99936 0.0011739 0.0007547 177 5700 5877
Equatorial Guinea 46 0.876 0.794 0.99596 0.0003085 0.0005464 211 6474 6685
Eritrea 63 0.952 0.926 0.99782 0.0000458 0.0005339 8 80 88
Estonia 73 0.995 0.994 0.99960 0.0032940 0.0069007 516 27393 27909
Ethiopia 56 0.923 0.877 0.99359 0.0000239 0.0001919 6 64 70
Fiji 69 0.984 0.982 0.99970 0.0004562 0.0019928 148 5355 5503
Finland 79 0.997 0.997 0.99996 0.0032992 0.0089204 2824 133956 136780
France 81 0.996 0.995 0.99989 0.0033797 0.0079244 3819 234850 238669
Gabon 58 0.940 0.909 0.99572 0.0003013 0.0051701 276 17220 17496
Gambia 59 0.916 0.886 0.99577 0.0000938 0.0011311 15 550 565
Georgia 70 0.972 0.968 0.99916 0.0046463 0.0040314 123 1248 1371
Germany 80 0.996 0.995 0.99995 0.0034417 0.0080106 3628 209250 212878
Ghana 57 0.924 0.880 0.99621 0.0001408 0.0008565 30 490 520
Greece 80 0.996 0.996 0.99984 0.0049947 0.0035962 2580 65195 67775
Grenada 68 0.983 0.980 0.99992 0.0007547 0.0030755 342 6944 7286
Guatemala 68 0.969 0.959 0.99897 0.0007648 0.0034528 132 2400 2532
Guinea 53 0.902 0.839 0.99534 0.0001075 0.0004801 21 66 87
Guinea-Bissau 48 0.881 0.800 0.99687 0.0001142 0.0006513 10 90 100
Guyana 64 0.954 0.938 0.99785 0.0004953 0.0023518 60 1400 1460
Haiti 61 0.940 0.920 0.99598 0.0002063 0.0000883 28 546 574
Honduras 70 0.977 0.973 0.99905 0.0005275 0.0012237 91 2162 2253
Hungary 73 0.994 0.993 0.99979 0.0030399 0.0091639 855 40602 41457
Iceland 81 0.998 0.997 0.99997 0.0037584 0.0099329 5154 395622 400776
India 63 0.943 0.924 0.99701 0.0005607 0.0011913 36 203 239
Indonesia 68 0.974 0.966 0.99747 0.0001289 0.0007863 26 588 614
Iran (Islamic Republic of) 71 0.970 0.965 0.99972 0.0008805 0.0015811 212 7973 8185
Iraq 56 0.963 0.953 0.99922 0.0006669 0.0013331 59 2948 3007
Ireland 80 0.996 0.996 0.99989 0.0029363 0.0194032 3993 193553 197546
Israel 81 0.996 0.995 0.99994 0.0036913 0.0062568 1533 93748 95281
Italy 81 0.997 0.996 0.99994 0.0036578 0.0071373 2692 140148 142840
Jamaica 72 0.974 0.968 0.99992 0.0008348 0.0016206 170 4399 4569
Japan 83 0.997 0.996 0.99971 0.0021130 0.0094615 2936 159192 162128
Jordan 71 0.979 0.975 0.99994 0.0023494 0.0032185 241 9047 9288
Kazakhstan 64 0.974 0.971 0.99858 0.0037556 0.0073853 148 5510 5658
Kenya 53 0.921 0.879 0.99666 0.0001233 0.0010153 24 231 255
Kiribati 65 0.953 0.936 0.99598 0.0002128 0.0027660 118 4578 4696
Kuwait 78 0.991 0.989 0.99975 0.0017416 0.0035768 687 51940 52627
Kyrgyzstan 66 0.964 0.959 0.99863 0.0024168 0.0058612 28 396 424
Lao People’s Democratic Republic 60 0.941 0.925 0.99708 0.0003473 0.0009724 18 84 102
Latvia 71 0.992 0.991 0.99940 0.0031455 0.0056094 443 18224 18667
Lebanon 70 0.973 0.969 0.99988 0.0020814 0.0011640 460 17400 17860
Lesotho 42 0.898 0.868 0.99487 0.0000446 0.0005629 41 437 478
Liberia 44 0.843 0.765 0.99422 0.0000288 0.0002892 10 413 423
Libyan Arab Jamahiriya 72 0.983 0.982 0.99982 0.0011707 0.0044974 223 13175 13398
Lithuania 71 0.993 0.991 0.99939 0.0039642 0.0076702 448 19932 20380
Luxembourg 80 0.997 0.996 0.99990 0.0027223 0.0095835 6330 476420 482750
Madagascar 59 0.928 0.885 0.99585 0.0002715 0.0002955 9 162 171
Malawi 50 0.924 0.880 0.99678 0.0000196 0.0005353 19 252 271
Malaysia 72 0.990 0.988 0.99875 0.0006518 0.0016612 222 6732 6954
Maldives 72 0.974 0.970 0.99946 0.0010067 0.0029533 316 8100 8416
Mali 46 0.881 0.783 0.99422 0.0000880 0.0006967 28 434 462
Malta 79 0.995 0.994 0.99995 0.0038617 0.0059531 1235 91776 93011
Marshall Islands 63 0.950 0.944 0.99759 0.0004138 0.0026207 294 18876 19170
Mauritania 58 0.922 0.875 0.99394 0.0001028 0.0006219 17 451 468
Mauritius 73 0.988 0.985 0.99960 0.0010407 0.0036773 218 4704 4922
Mexico 74 0.971 0.965 0.99975 0.0018596 0.0008418 474 16340 16814
Micronesia (Federated States of) 69 0.967 0.959 0.99891 0.0005405 0.0022523 290 5830 6120
Monaco 82 0.997 0.996 0.99998 0.0056364 0.0140606 6128 458700 464828
Mongolia 66 0.965 0.958 0.99809 0.0025843 0.0033881 35 1539 1574
Montenegro 74 0.991 0.990 0.99951 0.0020516 0.0057171 299 13725 14024
Morocco 72 0.966 0.963 0.99921 0.0005183 0.0007885 89 1947 2036
Mozambique 50 0.904 0.862 0.99376 0.0000245 0.0002948 14 315 329
Namibia 61 0.955 0.939 0.99342 0.0002921 0.0030020 165 3888 4053
Nauru 61 0.975 0.970 0.99866 0.0010000 0.0063000 567 30200 30767
Nepal 62 0.954 0.941 0.99756 0.0001948 0.0004278 16 64 80
Netherlands 80 0.996 0.995 0.99994 0.0036949 0.0146024 3560 187191 190751
New Zealand 80 0.995 0.994 0.99991 0.0019783 0.0083425 2403 159960 162363
Nicaragua 71 0.971 0.964 0.99926 0.0003697 0.0010597 75 2183 2258
Niger 42 0.852 0.747 0.99686 0.0000215 0.0002051 9 85 94
Nigeria 48 0.901 0.809 0.99385 0.0002413 0.0014532 27 392 419
Niue 70 0.966 0.958 0.99915 0.0020000 0.0110000 1082 35211 36293
Norway 80 0.997 0.996 0.99996 0.0037531 0.0161332 5910 380380 386290
Oman 74 0.990 0.989 0.99986 0.0016850 0.0037376 312 18886 19198
Pakistan 63 0.922 0.903 0.99737 0.0007851 0.0004393 15 105 120
Palau 69 0.990 0.989 0.99949 0.0015000 0.0060500 690 43890 44580
Panama 76 0.982 0.977 0.99957 0.0013476 0.0024811 351 17424 17775
Papua New Guinea 62 0.946 0.927 0.99487 0.0000443 0.0004581 34 390 424
Paraguay 75 0.981 0.978 0.99900 0.0010563 0.0017056 92 2006 2098
Peru 73 0.979 0.975 0.99813 0.0010801 0.0006201 125 4453 4578
Philippines 68 0.976 0.968 0.99568 0.0010476 0.0055749 37 882 919
Poland 75 0.994 0.993 0.99973 0.0019939 0.0052339 495 21266 21761
Portugal 79 0.997 0.996 0.99976 0.0034160 0.0046298 1800 75458 77258
Qatar 77 0.991 0.989 0.99927 0.0026188 0.0059440 2186 163680 165866
Republic of Korea 79 0.995 0.995 0.99877 0.0015618 0.0019159 973 41715 42688
Republic of Moldova 68 0.984 0.981 0.99846 0.0029097 0.0067908 58 1504 1562
Romania 73 0.986 0.984 0.99860 0.0019253 0.0042122 250 9504 9754
Russian Federation 66 0.990 0.987 0.99875 0.0042884 0.0084784 277 12483 12760
Rwanda 52 0.903 0.840 0.99438 0.0000456 0.0003854 19 220 239
Saint Kitts and Nevis 71 0.983 0.981 0.99983 0.0009200 0.0039600 478 9933 10411
Saint Lucia 75 0.988 0.986 0.99978 0.0045951 0.0020307 323 5068 5391
Saint Vincent and the Grenadines 70 0.983 0.980 0.99953 0.0007417 0.0037250 218 6302 6520
Samoa 68 0.977 0.972 0.99975 0.0002703 0.0016757 113 2093 2206
San Marino 82 0.997 0.997 0.99995 0.0351290 0.0708387 3490 278163 281653
Sao Tome and Principe 61 0.937 0.904 0.99748 0.0005226 0.0019871 49 2419 2468
Saudi Arabia 70 0.979 0.974 0.99938 0.0014172 0.0030657 448 27621 28069
Senegal 59 0.940 0.884 0.99496 0.0000492 0.0002723 38 504 542
Serbia 73 0.993 0.992 0.99959 0.0019877 0.0042873 212 7956 8168
Seychelles 72 0.988 0.987 0.99944 0.0014070 0.0073721 557 20502 21059
Sierra Leone 40 0.841 0.731 0.99023 0.0000293 0.0004371 8 164 172
Singapore 80 0.997 0.997 0.99975 0.0014560 0.0043565 944 30100 31044
Slovakia 74 0.993 0.992 0.99982 0.0031307 0.0066364 626 26096 26722
Slovenia 78 0.997 0.996 0.99985 0.0023603 0.0078516 1495 55233 56728
Solomon Islands 67 0.945 0.928 0.99806 0.0001240 0.0013492 28 442 470
South Africa 51 0.944 0.931 0.99002 0.0007214 0.0038205 437 10920 11357
Spain 81 0.996 0.996 0.99976 0.0030829 0.0074940 2152 118426 120578
Sri Lanka 72 0.989 0.987 0.99920 0.0005456 0.0017303 51 360 411
Sudan 60 0.938 0.911 0.99581 0.0002939 0.0008846 29 462 491
Suriname 68 0.971 0.961 0.99905 0.0004198 0.0015121 209 7326 7535
Swaziland 42 0.888 0.836 0.98916 0.0001508 0.0060212 146 2256 2402
Sweden 81 0.997 0.996 0.99995 0.0032155 0.0106857 3727 255696 259423
Switzerland 82 0.996 0.995 0.99995 0.0038648 0.0106174 5694 258248 263942
Syrian Arab Republic 72 0.988 0.987 0.99960 0.0005329 0.0014060 61 1581 1642
Tajikistan 64 0.944 0.932 0.99702 0.0019980 0.0049947 18 100 118
Thailand 72 0.993 0.992 0.99803 0.0003536 0.0027186 98 2079 2177
The former Yugoslav Republic of Macedonia 73 0.985 0.983 0.99967 0.0025476 0.0043384 224 11060 11284
Timor-Leste 66 0.953 0.945 0.99211 0.0000709 0.0016113 45 1053 1098
Togo 57 0.931 0.893 0.99213 0.0000351 0.0003022 18 205 223
Tonga 71 0.980 0.976 0.99966 0.0003000 0.0035000 104 1896 2000
Trinidad and Tobago 69 0.967 0.962 0.99990 0.0007560 0.0027508 513 3575 4088
Tunisia 72 0.981 0.977 0.99972 0.0013049 0.0027936 158 4620 4778
Turkey 73 0.976 0.974 0.99968 0.0015694 0.0029448 383 18632 19015
Turkmenistan 63 0.955 0.949 0.99922 0.0024923 0.0047001 156 4888 5044
Tuvalu 65 0.969 0.962 0.99496 0.0010000 0.0050000 212 8786 8998
Uganda 50 0.922 0.866 0.99439 0.0000739 0.0006344 22 78 100
Ukraine 67 0.980 0.976 0.99886 0.0030871 0.0083434 128 4624 4752
United Arab Emirates 78 0.992 0.992 0.99976 0.0011676 0.0024341 833 45969 46802
United Kingdom 79 0.995 0.994 0.99988 0.0022085 0.0122411 3064 240120 243184
United Republic of Tanzania 50 0.926 0.882 0.99541 0.0000208 0.0003369 17 225 242
United States of America 78 0.993 0.992 0.99997 0.0024132 0.0088152 6350 231822 238172
Uruguay 75 0.987 0.985 0.99969 0.0037178 0.0008646 404 15824 16228
Uzbekistan 68 0.962 0.956 0.99855 0.0026153 0.0107543 26 444 470
Vanuatu 69 0.970 0.964 0.99935 0.0001357 0.0016290 67 1056 1123
Venezuela (Bolivarian Republic of) 74 0.982 0.979 0.99948 0.0017653 0.0010298 247 10528 10775
Viet Nam 72 0.985 0.983 0.99775 0.0005215 0.0007170 37 270 307
Yemen 61 0.925 0.900 0.99868 0.0003101 0.0006325 39 448 487
Zambia 43 0.898 0.818 0.99432 0.0001081 0.0018818 36 595 631
Zimbabwe 43 0.945 0.915 0.99403 0.0001577 0.0007074 21 324 345


Glimpse of the data

glimpse(who_raw)
## Rows: 190
## Columns: 10
## $ Country        <chr> "Afghanistan", "Albania", "Algeria", "Andorra", "Angola…
## $ LifeExp        <dbl> 42, 71, 71, 82, 41, 73, 75, 69, 82, 80, 64, 74, 75, 63,…
## $ InfantSurvival <dbl> 0.835, 0.985, 0.967, 0.997, 0.846, 0.990, 0.986, 0.979,…
## $ Under5Survival <dbl> 0.743, 0.983, 0.962, 0.996, 0.740, 0.989, 0.983, 0.976,…
## $ TBFree         <dbl> 0.99769, 0.99974, 0.99944, 0.99983, 0.99656, 0.99991, 0…
## $ PropMD         <dbl> 0.000228841, 0.001143127, 0.001060478, 0.003297297, 0.0…
## $ PropRN         <dbl> 0.000572294, 0.004614439, 0.002091362, 0.003500000, 0.0…
## $ PersExp        <dbl> 20, 169, 108, 2589, 36, 503, 484, 88, 3181, 3788, 62, 1…
## $ GovtExp        <dbl> 92, 3128, 5184, 169725, 1620, 12543, 19170, 1856, 18761…
## $ TotExp         <dbl> 112, 3297, 5292, 172314, 1656, 13046, 19654, 1944, 1907…


Summary of the data

summary(who_raw)
##    Country             LifeExp      InfantSurvival   Under5Survival  
##  Length:190         Min.   :40.00   Min.   :0.8350   Min.   :0.7310  
##  Class :character   1st Qu.:61.25   1st Qu.:0.9433   1st Qu.:0.9253  
##  Mode  :character   Median :70.00   Median :0.9785   Median :0.9745  
##                     Mean   :67.38   Mean   :0.9624   Mean   :0.9459  
##                     3rd Qu.:75.00   3rd Qu.:0.9910   3rd Qu.:0.9900  
##                     Max.   :83.00   Max.   :0.9980   Max.   :0.9970  
##      TBFree           PropMD              PropRN             PersExp       
##  Min.   :0.9870   Min.   :0.0000196   Min.   :0.0000883   Min.   :   3.00  
##  1st Qu.:0.9969   1st Qu.:0.0002444   1st Qu.:0.0008455   1st Qu.:  36.25  
##  Median :0.9992   Median :0.0010474   Median :0.0027584   Median : 199.50  
##  Mean   :0.9980   Mean   :0.0017954   Mean   :0.0041336   Mean   : 742.00  
##  3rd Qu.:0.9998   3rd Qu.:0.0024584   3rd Qu.:0.0057164   3rd Qu.: 515.25  
##  Max.   :1.0000   Max.   :0.0351290   Max.   :0.0708387   Max.   :6350.00  
##     GovtExp             TotExp      
##  Min.   :    10.0   Min.   :    13  
##  1st Qu.:   559.5   1st Qu.:   584  
##  Median :  5385.0   Median :  5541  
##  Mean   : 40953.5   Mean   : 41696  
##  3rd Qu.: 25680.2   3rd Qu.: 26331  
##  Max.   :476420.0   Max.   :482750

Question 1

Provide a scatterplot of LifeExp~TotExp, and run simple linear regression. Do not transform the variables. Provide and interpret the F statistics, R^2, standard error, and p-values only. Discuss whether the assumptions of simple linear regression are met.

Scatter Plot

p = ggplot(who_raw, aes(x=TotExp, y=LifeExp)) + geom_point() + theme_minimal() +
    theme(panel.grid.major = element_line(colour = "lemonchiffon3"),
    panel.grid.minor = element_line(colour = "lemonchiffon3"),
    axis.title = element_text(size = 13),
    axis.text = element_text(size = 11),
    axis.text.x = element_text(family = "sans",
        size = 11), axis.text.y = element_text(family = "sans",
        size = 11), plot.title = element_text(size = 15,
        hjust = 0.5), panel.background = element_rect(fill = "gray85"),
    plot.background = element_rect(fill = "antiquewhite")) +labs(title = "LifeExp vs TotExp",
    x = "TotExp", y = "LifeExp")
p


Simple Linear Regression

lm_who <- lm(LifeExp ~ TotExp, data = who_raw)
summary(lm_who)
## 
## Call:
## lm(formula = LifeExp ~ TotExp, data = who_raw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -24.764  -4.778   3.154   7.116  13.292 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.475e+01  7.535e-01  85.933  < 2e-16 ***
## TotExp      6.297e-05  7.795e-06   8.079 7.71e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.371 on 188 degrees of freedom
## Multiple R-squared:  0.2577, Adjusted R-squared:  0.2537 
## F-statistic: 65.26 on 1 and 188 DF,  p-value: 7.714e-14

Residual vs Fitted

plot(fitted(lm_who),resid(lm_who), main="Residuals vs Fitted", xlab = "Fitted", ylab = "Residuals")
abline(0, 0)


Q-Q Plot

qqnorm(resid(lm_who))
qqline(resid(lm_who))


Interpret the F statistics, R^2, standard error, and p-values only:
F statistic: 65.26 on 1 and 188 DF - This tests whether the model with TotExp fits better than an intercept-only model. A value this large indicates that it does.
p-value: 7.714e-14 - The p-value is far below 0.05, so the relationship between TotExp and LifeExp is statistically significant.
R-squared: 0.2577 - The model accounts for only about 25.77% of the variability in LifeExp.
Residual standard error: 9.371 on 188 degrees of freedom - Observed LifeExp values deviate from the regression line by about 9.37 years on average.
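The reported statistics are linked by a few simple identities, which gives a quick sanity check on the summary output. The sketch below re-uses only the rounded numbers printed above, so agreement is approximate; the variable names are illustrative and not part of the original analysis.

```r
# Values copied (rounded) from summary(lm_who) above
f_stat  <- 65.26   # F-statistic
df1     <- 1       # numerator degrees of freedom
df2     <- 188     # denominator (residual) degrees of freedom
t_slope <- 8.079   # t value of the TotExp coefficient

# In simple regression the F statistic equals the squared slope t value
t_squared <- t_slope^2

# p-value of the F test: upper tail of the F(df1, df2) distribution
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

# R-squared recovered from F and the degrees of freedom
r_squared <- (f_stat * df1) / (f_stat * df1 + df2)

c(t_squared = t_squared, p_value = p_value, r_squared = r_squared)
```

This is why the p-value of the F test and the p-value of the TotExp coefficient coincide in a model with a single predictor.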

There are four (4) main assumptions for Linear Regression and they are:
  • Linearity:
  • The relationship between X and Y must be linear. As can be seen from the scatter plot above, the relationship between LifeExp and TotExp is not linear, so this condition is not satisfied.
  • Homoscedasticity:
  • There should be constant variance in the residuals. In the Residuals vs Fitted plot shown above, the variance does not appear to be constant, so the homoscedasticity criterion is not satisfied.
  • Normality:
  • The residuals should be normally distributed. In the Q-Q plot shown above, the residuals do not follow a normal distribution.
  • Independence:
  • The observations should be independent of each other. This is difficult to determine from the data alone, so we may have to rely on the assumptions provided by the data collector.
    Since the Linearity, Homoscedasticity, and Normality conditions are not satisfied, we can conclude that the assumptions for Linear Regression are not met.

    Question 2

    Raise life expectancy to the 4.6 power (i.e., LifeExp^4.6). Raise total expenditures to the 0.06 power (nearly a log transform, TotExp^.06). Plot LifeExp^4.6 as a function of TotExp^.06, and re-run the simple regression model using the transformed variables. Provide and interpret the F statistics, R^2, standard error, and p-values. Which model is “better?”

    who_raw2 <- who_raw %>% 
                mutate(LifeExp2 = LifeExp^(4.6),
                TotExp2 = TotExp^(0.06))
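The prompt describes TotExp^.06 as "nearly a log transform". One way to see this is that, over the range of TotExp in the data (13 to 482750), x^0.06 is almost perfectly linearly related to log(x). A small sketch, using a handful of illustrative values spanning that range rather than the dataset itself:

```r
# Illustrative expenditure values spanning the observed range of TotExp
# (13 and 482750 are the minimum and maximum from the summary above;
# the intermediate values are arbitrary).
tot_exp <- c(13, 100, 1000, 10000, 100000, 482750)

power_scale <- tot_exp^0.06   # transform used in Question 2
log_scale   <- log(tot_exp)   # natural log for comparison

# A correlation close to 1 means the two transforms differ by
# little more than a linear rescaling on this range.
transform_cor <- cor(power_scale, log_scale)
transform_cor
```

Running the same comparison on the actual column (for example `cor(who_raw$TotExp^0.06, log(who_raw$TotExp))`) would be the natural follow-up.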

    Scatter Plot - Q2

    p2 = ggplot(who_raw2, aes(x=TotExp2, y=LifeExp2)) + geom_point() + theme_minimal() +
        theme(panel.grid.major = element_line(colour = "lemonchiffon3"),
        panel.grid.minor = element_line(colour = "lemonchiffon3"),
        axis.title = element_text(size = 13),
        axis.text = element_text(size = 11),
        axis.text.x = element_text(family = "sans",
            size = 11), axis.text.y = element_text(family = "sans",
            size = 11), plot.title = element_text(size = 15,
            hjust = 0.5), panel.background = element_rect(fill = "gray85"),
        plot.background = element_rect(fill = "antiquewhite")) +labs(title = "LifeExp2 vs TotExp2",
        x = "TotExp2", y = "LifeExp2")
    p2


    Simple Linear Regression - Q2

    lm_who2 <- lm(LifeExp2 ~ TotExp2, data = who_raw2)
    summary(lm_who2)
    ## 
    ## Call:
    ## lm(formula = LifeExp2 ~ TotExp2, data = who_raw2)
    ## 
    ## Residuals:
    ##        Min         1Q     Median         3Q        Max 
    ## -308616089  -53978977   13697187   59139231  211951764 
    ## 
    ## Coefficients:
    ##               Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept) -736527910   46817945  -15.73   <2e-16 ***
    ## TotExp2      620060216   27518940   22.53   <2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 90490000 on 188 degrees of freedom
    ## Multiple R-squared:  0.7298, Adjusted R-squared:  0.7283 
    ## F-statistic: 507.7 on 1 and 188 DF,  p-value: < 2.2e-16

    Residual vs Fitted - Q2

    plot(fitted(lm_who2),resid(lm_who2), main="Residuals vs Fitted - Q2", xlab = "Fitted", ylab = "Residuals")
    abline(0, 0)


    Q-Q Plot - Q2

    qqnorm(resid(lm_who2))
    qqline(resid(lm_who2))


    Interpret the F statistics, R^2, standard error, and p-values only:
    F statistic: 507.7 on 1 and 188 DF - This is much larger than the 65.26 of the first model, so the transformed predictor explains far more of the variation in the transformed response than an intercept-only model does.
    p-value: < 2.2e-16 - The p-value is far below 0.05, so the relationship between TotExp^0.06 and LifeExp^4.6 is statistically significant.
    R-squared: 0.7298 - The model accounts for about 72.98% of the variability in LifeExp^4.6.
    Residual standard error: 90490000 on 188 degrees of freedom - Observed LifeExp^4.6 values deviate from the regression line by about 90,490,000 on average. Because the response is on the LifeExp^4.6 scale, this value cannot be compared directly with the 9.371 of the first model.
    There are four (4) main assumptions for Linear Regression and they are:
  • Linearity:
  • The relationship between X and Y must be linear. As can be seen from the scatter plot above, the relationship between LifeExp^4.6 and TotExp^0.06 is approximately linear, so this condition is satisfied.
  • Homoscedasticity:
  • There should be constant variance in the residuals. In the Residuals vs Fitted plot shown above, the variance appears to be roughly constant, so the homoscedasticity criterion is satisfied.
  • Normality:
  • The residuals should be normally distributed. In the Q-Q plot shown above, the residuals follow a normal distribution reasonably well.
  • Independence:
  • The observations should be independent of each other. This is difficult to determine from the data alone, so we may have to rely on the assumptions provided by the data collector. The residual plot shows no obvious pattern that would suggest dependence between observations.
    Since the Linearity, Homoscedasticity, Normality, and Independence conditions are satisfied, we can conclude that the assumptions for Linear Regression are met.

    The second model, which uses the transformed variables, is clearly better: R-squared rises from 0.2577 to 0.7298, the F statistic rises from 65.26 to 507.7, and the regression assumptions are now reasonably satisfied. This shows that when data do not appear to satisfy the assumptions of linear regression, a suitable transformation can sometimes produce variables that do, so linear regression remains usable on the dataset.


    Question 3

    Using the results from Question 2, forecast life expectancy when TotExp^.06 = 1.5. Then forecast life expectancy when TotExp^.06 = 2.5

    Based on the results of the model with the transformed data above, the linear relationship is given by:
    \(LifeExp2 = -736527910 + 620060216*TotExp2\)
    Using the equation above, we can forecast life expectancy for the given values of TotExp^.06.
    When TotExp^.06 = 1.5

    TotExp2 = 1.5
    LifeExp2 = -736527910 + 620060216*TotExp2
    LifeExp = LifeExp2 ^ (1/4.6) # We have to transform back to get the actual LifeExp
    LifeExp
    ## [1] 63.31153

    Therefore, for TotExp^.06 = 1.5, the LifeExp will be about 63.3 after transforming back to the original units.

    When TotExp^.06 = 2.5

    TotExp2 = 2.5
    LifeExp2 = -736527910 + 620060216*TotExp2
    LifeExp = LifeExp2 ^ (1/4.6) # We have to transform back to get the actual LifeExp
    LifeExp
    ## [1] 86.50645
    Therefore, for TotExp^.06 = 2.5, the LifeExp will be about 86.5 after transforming back to the original units.
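The two forecasts above repeat the same steps, so they can be wrapped in a small function. This is a sketch only: `forecast_life_exp` is a hypothetical helper name, and the coefficients are the rounded values printed in the model summary rather than the stored fit.

```r
# Coefficients copied from summary(lm_who2) above
b0 <- -736527910   # intercept
b1 <-  620060216   # slope on TotExp^0.06

# Hypothetical helper: predict on the LifeExp^4.6 scale,
# then undo the power transform to return years.
forecast_life_exp <- function(tot_exp_06) {
  life_exp_46 <- b0 + b1 * tot_exp_06
  life_exp_46^(1 / 4.6)
}

forecasts <- forecast_life_exp(c(1.5, 2.5))
round(forecasts, 1)
```

With the fitted object available, `predict(lm_who2, newdata = data.frame(TotExp2 = c(1.5, 2.5)))^(1/4.6)` should give essentially the same values using the unrounded coefficients. Note that for inputs below roughly 1.19 the prediction on the transformed scale is negative, so the back-transform is undefined (R returns NaN).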

    Question 4

    Build the following multiple regression model and interpret the F Statistics, R^2, standard error, and p-values. How good is the model? LifeExp = b0 + b1 x PropMD + b2 x TotExp + b3 x PropMD x TotExp

    Scatter Plot - Q4

    p4 = ggplot(who_raw, aes(x=(TotExp + PropMD + (PropMD * TotExp)), y=LifeExp)) + geom_point() + theme_minimal() +
        theme(panel.grid.major = element_line(colour = "lemonchiffon3"),
        panel.grid.minor = element_line(colour = "lemonchiffon3"),
        axis.title = element_text(size = 13),
        axis.text = element_text(size = 11),
        axis.text.x = element_text(family = "sans",
            size = 11), axis.text.y = element_text(family = "sans",
            size = 11), plot.title = element_text(size = 15,
            hjust = 0.5), panel.background = element_rect(fill = "gray85"),
        plot.background = element_rect(fill = "antiquewhite")) +labs(title = "LifeExp - Multi Regression",
        x = "TotExp + PropMD + (PropMD * TotExp)", y = "LifeExp")
    p4


    Multiple Linear Regression - Q4

    lm_who4 <- lm(LifeExp ~ TotExp + PropMD + (PropMD * TotExp), data = who_raw)
    summary(lm_who4)
    ## 
    ## Call:
    ## lm(formula = LifeExp ~ TotExp + PropMD + (PropMD * TotExp), data = who_raw)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -27.320  -4.132   2.098   6.540  13.074 
    ## 
    ## Coefficients:
    ##                 Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)    6.277e+01  7.956e-01  78.899  < 2e-16 ***
    ## TotExp         7.233e-05  8.982e-06   8.053 9.39e-14 ***
    ## PropMD         1.497e+03  2.788e+02   5.371 2.32e-07 ***
    ## TotExp:PropMD -6.026e-03  1.472e-03  -4.093 6.35e-05 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 8.765 on 186 degrees of freedom
    ## Multiple R-squared:  0.3574, Adjusted R-squared:  0.3471 
    ## F-statistic: 34.49 on 3 and 186 DF,  p-value: < 2.2e-16
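In an R formula, `*` is the crossing operator: `a * b` already expands to both main effects plus their interaction. The formula used above therefore specifies each main effect twice, which R silently de-duplicates. The sketch below checks this with `terms()`, which expands a formula without needing any data; `f_long` and `f_short` are illustrative names.

```r
# Formula as written in the model call above
f_long  <- LifeExp ~ TotExp + PropMD + (PropMD * TotExp)
# Shorter spelling using only the crossing operator
f_short <- LifeExp ~ TotExp * PropMD

# terms() expands a formula into its individual model terms
labels_long  <- attr(terms(f_long),  "term.labels")
labels_short <- attr(terms(f_short), "term.labels")

labels_long
identical(labels_long, labels_short)
```

Either spelling yields the same three terms that appear in the coefficient table: TotExp, PropMD, and their interaction.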

    Residual vs Fitted - Q4

    plot(fitted(lm_who4),resid(lm_who4), main="Residuals vs Fitted - Q4", xlab = "Fitted", ylab = "Residuals")
    abline(0, 0)


    Q-Q Plot - Q4

    qqnorm(resid(lm_who4))
    qqline(resid(lm_who4))


    Interpret the F statistics, R^2, standard error, and p-values only:
    F statistic: 34.49 on 3 and 186 DF - The three terms together fit better than an intercept-only model.
    p-value: < 2.2e-16 - The overall p-value is far below 0.05, so the model is statistically significant. Each coefficient is also individually significant (p-values of 9.39e-14 for TotExp, 2.32e-07 for PropMD, and 6.35e-05 for the interaction).
    R-squared: 0.3574 - The model accounts for only about 35.74% of the variability in LifeExp.
    Residual standard error: 8.765 on 186 degrees of freedom - Observed LifeExp values deviate from the fitted values by about 8.77 years on average.
    There are four (4) main assumptions for Linear Regression and they are:
  • Linearity:
  • The relationship between X and Y must be linear. As can be seen from the scatter plot above, the relationship between LifeExp and TotExp + PropMD + (PropMD * TotExp) is not linear, so this condition is not satisfied.
  • Homoscedasticity:
  • There should be constant variance in the residuals. In the Residuals vs Fitted plot shown above, the variance does not appear to be constant, so the homoscedasticity criterion is not satisfied.
  • Normality:
  • The residuals should be normally distributed. In the Q-Q plot shown above, the residuals do not follow a normal distribution; the residual summary is also skewed (minimum of -27.320 against a maximum of 13.074, with a median of 2.098).
  • Independence:
  • The observations should be independent of each other. This is difficult to determine from the data alone, so we may have to rely on the assumptions provided by the data collector.
    Since the Linearity, Homoscedasticity, and Normality conditions are not satisfied, we can conclude that the assumptions for Linear Regression are not met.

    Comparing the results of the three models, the second model is still the best. The third model is slightly better than the first (R-squared of 0.3574 versus 0.2577, residual standard error of 8.765 versus 9.371), but it falls far short of the second model (R-squared of 0.7298). Adding a predictor and an interaction term does not fix the non-linearity in the untransformed data, so a more complex model is not automatically a better one.
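Because the third model uses more predictors than the other two, adjusted R-squared is the fairer yardstick for the comparison. The sketch below recomputes it from the rounded R-squared values reported in the three summaries; the data frame and its column names are illustrative, not part of the original analysis.

```r
# R-squared values and sizes copied from the three summaries above
models <- data.frame(
  model      = c("Q1: LifeExp ~ TotExp",
                 "Q2: LifeExp^4.6 ~ TotExp^0.06",
                 "Q4: LifeExp ~ TotExp * PropMD"),
  r_squared  = c(0.2577, 0.7298, 0.3574),
  predictors = c(1, 1, 3),
  n          = 190
)

# Adjusted R-squared penalises extra predictors:
# 1 - (1 - R^2) * (n - 1) / (n - p - 1)
models$adj_r_squared <- with(
  models,
  1 - (1 - r_squared) * (n - 1) / (n - predictors - 1)
)

# Rank the models from best to worst adjusted R-squared
models[order(-models$adj_r_squared), c("model", "adj_r_squared")]
```

The ranking is unchanged after the adjustment, and the recomputed values agree with the adjusted R-squared figures printed by `summary()` up to rounding.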

    Question 5

    Forecast LifeExp when PropMD=.03 and TotExp = 14. Does this forecast seem realistic? Why or why not?

    Based on the results of the multiple regression model from Question 4, the linear relationship is given by:
    \(LifeExp = 62.77 + 1497 \times PropMD + 7.233 \times 10^{-5} \times TotExp - 6.026 \times 10^{-3} \times PropMD \times TotExp\)
    Using the equation above, we can forecast life expectancy for the given PropMD and TotExp.
    When PropMD = 0.03 and TotExp = 14

    PropMD = 0.03
    TotExp = 14
    LifeExp5 = 6.277 * 10^(1) +  (1.497 * 10^(3) * PropMD) + (7.233 * 10^(-5) * TotExp) - (6.026 * 10^(-3) * PropMD * TotExp)
    LifeExp5
    ## [1] 107.6785

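    From the data provided, the maximum LifeExp is 83 and the mean is about 67.4, so a forecast of about 107.7 years is not realistic. The inputs are also an unusual combination: PropMD = 0.03 is close to the largest value in the data (0.0351), while TotExp = 14 is close to the smallest (13), so the model is extrapolating well beyond the combinations it was fitted on.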
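To see where the unrealistic value comes from, the forecast can be split term by term. The sketch below uses the rounded coefficients from the Question 4 summary; the vector names are illustrative.

```r
# Coefficients copied from summary(lm_who4) above
coefs <- c(intercept = 6.277e+01, TotExp = 7.233e-05,
           PropMD = 1.497e+03, interaction = -6.026e-03)

prop_md <- 0.03
tot_exp <- 14

# Contribution of each term to the forecast, in years
contributions <- c(
  intercept   = coefs[["intercept"]],
  TotExp      = coefs[["TotExp"]] * tot_exp,
  PropMD      = coefs[["PropMD"]] * prop_md,
  interaction = coefs[["interaction"]] * prop_md * tot_exp
)

round(contributions, 4)
forecast_q5 <- sum(contributions)
forecast_q5
```

The PropMD term alone adds about 44.9 years on top of the intercept, while the TotExp and interaction terms are negligible at TotExp = 14. In the fitted data the negative interaction term offsets a high PropMD only when TotExp is also large, which is not the case here.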