── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(dplyr)
library(sf)
Linking to GEOS 3.11.2, GDAL 3.8.2, PROJ 9.3.1; sf_use_s2() is TRUE
Rows: 3306 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): Country Name, Country Code, Region, IncomeGroup
dbl (12): Year, Life Expectancy World Bank, Prevelance of Undernourishment, ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
# A tibble: 6 × 16
`Country Name` `Country Code` Region IncomeGroup Year Life Expectancy Worl…¹
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 Afghanistan AFG South… Low income 2001 56.3
2 Angola AGO Sub-S… Lower midd… 2001 47.1
3 Albania ALB Europ… Upper midd… 2001 74.3
4 Andorra AND Europ… High income 2001 NA
5 United Arab Em… ARE Middl… High income 2001 74.5
6 Argentina ARG Latin… Upper midd… 2001 73.8
# ℹ abbreviated name: ¹`Life Expectancy World Bank`
# ℹ 10 more variables: `Prevelance of Undernourishment` <dbl>, CO2 <dbl>,
# `Health Expenditure %` <dbl>, `Education Expenditure %` <dbl>,
# Unemployment <dbl>, Corruption <dbl>, Sanitation <dbl>, Injuries <dbl>,
# Communicable <dbl>, NonCommunicable <dbl>
Clean the data
# Remove rows with missing values in any of the variables used in the analysis
data_clean <- data |>
  filter(!is.na(`Life Expectancy World Bank`)) |>
  filter(!is.na(`Prevelance of Undernourishment`)) |>
  filter(!is.na(CO2)) |>
  filter(!is.na(`Health Expenditure %`)) |>
  filter(!is.na(`Education Expenditure %`)) |>
  filter(!is.na(Unemployment)) |>
  filter(!is.na(Sanitation)) |>
  filter(!is.na(Injuries))

# Drop the columns I don't want/deem unnecessary:
# Corruption (12), Communicable (15), NonCommunicable (16)
data_clean2 <- data_clean[, -c(12, 15, 16)]

# Remove the non-numeric columns (Country Name, Country Code, Region, IncomeGroup)
# for the correlation plot; Year is numeric, so it stays
cor_data_clean3 <- data_clean2[, -c(1, 2, 3, 4)]

# Rename columns for ease of use
cor_data_clean4 <- cor_data_clean3 |>
  rename(lifeExp = `Life Expectancy World Bank`,
         undernourished = `Prevelance of Undernourishment`,
         healthExp = `Health Expenditure %`,
         eduExp = `Education Expenditure %`,
         unemployment = Unemployment,
         sanitation = Sanitation,
         injuries = Injuries)
ggplot(cor_data_clean4, aes(x = undernourished, y = lifeExp)) +
  labs(title = "Undernourishment vs Life Expectancy",
       x = "Undernourishment",
       y = "Life Expectancy") +
  geom_point(color = "pink") +
  geom_smooth(method = "lm") +
  theme_minimal()
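`geom_smooth(method = lm)` draws the least-squares line from a simple regression of `lifeExp` on `undernourished`, using the default formula `y ~ x`. The shaded band is a 95% confidence interval by default. The sketch below fits that same line directly with `lm()` so the slope can be read off as a number. It uses a small synthetic data frame named `toy` as a stand-in, because the real `cor_data_clean4` is not reproduced here. Its values are made up and are not results from the dataset.

```r
# Synthetic stand-in for cor_data_clean4 (made-up values, NOT the real data)
set.seed(1)
toy <- data.frame(undernourished = runif(50, 2, 40))
toy$lifeExp <- 75 - 0.4 * toy$undernourished + rnorm(50, sd = 2)

# The same simple regression that geom_smooth(method = lm) draws (formula y ~ x)
simple_fit <- lm(lifeExp ~ undernourished, data = toy)

coef(simple_fit)               # intercept and slope of the fitted line
summary(simple_fit)$r.squared  # share of variance in lifeExp explained
```

Running the same `lm()` call on `cor_data_clean4` gives the slope behind the plotted line.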
The R-squared went down for the second fit, so we stick with the first fit, `fit1`.
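The code that fits the models is not shown in this section. The sketch below makes two assumptions. First, `fit1` is an ordinary `lm()` of `lifeExp` on the seven predictors that appear in the model equation at the end of this section. Second, the comparison model is `fit_reduced`, which is hypothetical: the actual second fit is not shown, so this one simply drops `CO2` and `injuries` for illustration. The data are synthetic.

```r
# Synthetic data with the same column names as cor_data_clean4 (made-up values)
set.seed(42)
n <- 200
toy <- data.frame(
  undernourished = runif(n, 2, 40),
  CO2            = runif(n, 1e3, 1e7),
  healthExp      = runif(n, 1, 15),
  eduExp         = runif(n, 1, 8),
  unemployment   = runif(n, 1, 25),
  sanitation     = runif(n, 10, 100),
  injuries       = runif(n, 1e4, 1e7)
)
toy$lifeExp <- 70 - 0.5 * toy$undernourished + 0.5 * toy$healthExp +
  0.08 * toy$sanitation + rnorm(n, sd = 3)

# Assumed form of fit1: lifeExp on all seven predictors
fit1 <- lm(lifeExp ~ undernourished + CO2 + healthExp + eduExp +
             unemployment + sanitation + injuries, data = toy)

# Hypothetical comparison model (NOT the document's second fit)
fit_reduced <- update(fit1, . ~ . - CO2 - injuries)

# Collect R-squared and adjusted R-squared for a fitted lm
fit_stats <- function(fit) {
  s <- summary(fit)
  c(r2 = s$r.squared, adj_r2 = s$adj.r.squared)
}

rbind(fit1 = fit_stats(fit1), fit_reduced = fit_stats(fit_reduced))

# Nested-model F-test: do CO2 and injuries improve the fit?
anova(fit_reduced, fit1)
```

When one model is nested inside another and fitted to the same rows, dropping predictors can never raise R-squared. Plain R-squared therefore always favours the larger model. Adjusted R-squared and the `anova()` F-test both account for the number of terms, so they give a fairer comparison.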
Diagnostic Plots
# Show the four standard lm diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(fit1)
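For an `lm` object, `plot()` draws four panels by default (`which = c(1, 2, 3, 5)`):

- Residuals vs Fitted
- Normal Q-Q
- Scale-Location
- Residuals vs Leverage, with Cook's distance contours

The sketch below computes the quantities behind those panels so they can be inspected numerically. It runs on a small synthetic model named `toy_fit` so that it is self-contained; swapping in `fit1` should work the same way. The 4/n cut-off for Cook's distance is a common rule of thumb, not something `plot.lm` uses. Its own contours are drawn at 0.5 and 1.

```r
# Small synthetic model so the example runs on its own (made-up data)
set.seed(7)
toy <- data.frame(x = runif(60, 0, 10))
toy$y <- 3 + 2 * toy$x + rnorm(60)
toy_fit <- lm(y ~ x, data = toy)

diag_df <- data.frame(
  fitted   = fitted(toy_fit),         # x-axis of Residuals vs Fitted and Scale-Location
  resid    = residuals(toy_fit),      # y-axis of Residuals vs Fitted
  std_res  = rstandard(toy_fit),      # used by Q-Q, Scale-Location and Residuals vs Leverage
  leverage = hatvalues(toy_fit),      # x-axis of Residuals vs Leverage
  cooks    = cooks.distance(toy_fit)  # drawn as contours on Residuals vs Leverage
)

# Rows whose Cook's distance exceeds the 4/n rule of thumb (possible influential points)
which(diag_df$cooks > 4 / nrow(diag_df))
```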
Equation
The equation for my model is:

$$\widehat{\text{lifeExp}} = 71.71 - 0.5634\,\text{undernourished} - 2.036\times10^{-7}\,\text{CO2} + 0.5543\,\text{healthExp} - 0.1641\,\text{eduExp} - 0.1986\,\text{unemployment} + 0.07684\,\text{sanitation} + 5.695\times10^{-8}\,\text{injuries}$$
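To read the equation, plug in a value for each predictor and add up the terms. The sketch below does that using the printed, rounded coefficients. The input values in `x` describe a hypothetical country chosen for illustration and are not taken from the data. If `fit1` is the model this equation came from, `predict(fit1, newdata = new_country)` would give the same answer using unrounded coefficients. Here `new_country` would be a one-row data frame built by the reader.

```r
# Coefficients copied from the model equation (rounded as printed)
coefs <- c(
  intercept      = 71.71,
  undernourished = -0.5634,
  CO2            = -2.036e-07,
  healthExp      = 0.5543,
  eduExp         = -0.1641,
  unemployment   = -0.1986,
  sanitation     = 7.684e-02,
  injuries       = 5.695e-08
)

# Hypothetical predictor values (illustrative only, not from the dataset)
x <- c(undernourished = 10, CO2 = 1e5, healthExp = 6, eduExp = 4,
       unemployment = 8, sanitation = 70, injuries = 1e6)

# Intercept plus the sum of coefficient * value, matched by predictor name
predict_life_exp <- function(x, coefs) {
  unname(coefs["intercept"] + sum(coefs[names(x)] * x))
}

pred <- predict_life_exp(x, coefs)
round(pred, 2)  # 72.57
```

The very small coefficients on `CO2` and `injuries` only matter for large raw values. For these inputs they contribute about -0.02 and +0.06 years respectively.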