Data
Below, load your data and offer one sentence (or two) description of
each data set (excepting our map data).
## Selection
load(here::here("data", "dashboard_data", "5_nations_df.rda" ))
## Map data
load(here::here("data", "dashboard_data", "world_map2_project.rda" ))
load(here::here("data", "dashboard_data", "gapminder_dat.rda" ))
## Gapminder One
load(here::here("data", "dashboard_data", "basic_tidy.rda" ))
Gapminder Case Study
I chose five countries in the data set: Brazil, Nigeria, China,
United Kingdom, and Australia. They are from different continent and
develop differently. By this way, I can be eaasier to find out the
rule.
five_df %>% glimpse()
## Rows: 285
## Columns: 11
## $ country <chr> "Australia", "Brazil", "China", "Nigeria", "United Ki…
## $ continent <fct> Oceania, Americas, Asia, Africa, Europe, Oceania, Ame…
## $ region <fct> Australia and New Zealand, South America, Eastern Asi…
## $ decade <fct> 1960s, 1960s, 1960s, 1960s, 1960s, 1960s, 1960s, 1960…
## $ year <int> 1960, 1960, 1960, 1960, 1960, 1961, 1961, 1961, 1961,…
## $ infant_mortality <dbl> 20.3, 129.4, 190.0, 165.0, 22.9, 20.0, 126.1, 161.0, …
## $ life_expectancy <dbl> 70.87, 55.27, 30.53, 40.39, 71.02, 71.14, 55.78, 32.9…
## $ fertility <dbl> 3.45, 6.21, 3.99, 6.35, 2.69, 3.55, 6.19, 3.28, 6.35,…
## $ population <dbl> 10292328, 72493585, 644450173, 45211614, 52410496, 10…
## $ gdp <dbl> 96677859364, 105343379555, 70348527260, 12836410903, …
## $ gdp_cap_ppp <dbl> 16100, 3910, 1220, 1500, 15800, 15900, 4150, 986, 151…
Gapminder Download One
This data set is basic welfare index based on 7 indicators, infant
mortality rate, life expectancy, kilocalories per person per day,
literacy, mean years of schooling, educational equality and health
equality and downloaded from Gapminder.org/data.
basic_tidy %>% glimpse()
## Rows: 7,200
## Columns: 3
## $ country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "A…
## $ Year <int> 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 19…
## $ Score <dbl> 16.2, 16.2, 16.1, 16.1, 15.9, 15.8, 15.8, 15.8, 15.8, 18.7, 18…
EDA Work
Requirements are a minimum of two (2) Exploratory Data Analysis
graphs; and a minimum of two (2) sets of Summary Statistics which
generally should directly complement, support, or relate to the EDA
visualizations.
Question 1—global trend
This graph is to show the current status of BWI in the world.
BWI_global <- gapminder_dat %>%
left_join(basic_tidy, by = c("year" = "Year", "country")) %>%
tibble() %>%
select(country, continent, region, decade, year, gdp_cap_ppp, life_expectancy, Score)
BWI_dat_1975 <- basic_tidy %>%
filter(Year == 1975) %>%
complete(country = world_map2$country,
fill = (list(number = NA )) ) %>%
left_join(world_map2, by = "country") %>%
replace_na(list(Year = 1975))
BWI_dat_1975
## # A tibble: 99,442 × 11
## country Year Score long lat group order code_2 code_3 code_num form_name
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <chr>
## 1 Afghani… 1975 16.2 74.9 37.2 2 12 AF AFG 4 Islamic …
## 2 Afghani… 1975 16.2 74.8 37.2 2 13 AF AFG 4 Islamic …
## 3 Afghani… 1975 16.2 74.8 37.2 2 14 AF AFG 4 Islamic …
## 4 Afghani… 1975 16.2 74.7 37.3 2 15 AF AFG 4 Islamic …
## 5 Afghani… 1975 16.2 74.7 37.3 2 16 AF AFG 4 Islamic …
## 6 Afghani… 1975 16.2 74.7 37.3 2 17 AF AFG 4 Islamic …
## 7 Afghani… 1975 16.2 74.6 37.2 2 18 AF AFG 4 Islamic …
## 8 Afghani… 1975 16.2 74.4 37.2 2 19 AF AFG 4 Islamic …
## 9 Afghani… 1975 16.2 74.4 37.1 2 20 AF AFG 4 Islamic …
## 10 Afghani… 1975 16.2 74.5 37.1 2 21 AF AFG 4 Islamic …
## # … with 99,432 more rows
BWI_map_1975 <- BWI_dat_1975 %>%
filter(code_3 != "ATA") %>%
ggplot(aes(x = long,
y = lat,
group = group,
label = country)) +
geom_polygon(aes(fill = Score) )+
scale_fill_viridis_c(option = "C") +
labs(fill = "Index",
title = "Basic Welfare Index for 1975") +
theme_void()
plotly::ggplotly(BWI_map_1975)
BWI_map_1975

BWI_dat_2019 <- basic_tidy %>%
filter(Year == 2019) %>%
complete(country = world_map2$country,
fill = (list(number = NA )) ) %>%
left_join(world_map2, by = "country") %>%
replace_na(list(Year = 2019))
BWI_dat_2019
## # A tibble: 99,442 × 11
## country Year Score long lat group order code_2 code_3 code_num form_name
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <chr>
## 1 Afghani… 2019 42 74.9 37.2 2 12 AF AFG 4 Islamic …
## 2 Afghani… 2019 42 74.8 37.2 2 13 AF AFG 4 Islamic …
## 3 Afghani… 2019 42 74.8 37.2 2 14 AF AFG 4 Islamic …
## 4 Afghani… 2019 42 74.7 37.3 2 15 AF AFG 4 Islamic …
## 5 Afghani… 2019 42 74.7 37.3 2 16 AF AFG 4 Islamic …
## 6 Afghani… 2019 42 74.7 37.3 2 17 AF AFG 4 Islamic …
## 7 Afghani… 2019 42 74.6 37.2 2 18 AF AFG 4 Islamic …
## 8 Afghani… 2019 42 74.4 37.2 2 19 AF AFG 4 Islamic …
## 9 Afghani… 2019 42 74.4 37.1 2 20 AF AFG 4 Islamic …
## 10 Afghani… 2019 42 74.5 37.1 2 21 AF AFG 4 Islamic …
## # … with 99,432 more rows
BWI_map_2019 <- BWI_dat_2019 %>%
filter(code_3 != "ATA") %>%
ggplot(aes(x = long,
y = lat,
group = group,
label = country)) +
geom_polygon(aes(fill = Score) )+
scale_fill_viridis_c(option = "C") +
labs(fill = "Index",
title = "Basic Welfare Index for 1975") +
theme_void()
plotly::ggplotly(BWI_map_2019)
BWI_map_2019

BWI_global %>%
group_by(continent) %>%
ggplot( aes(x = continent, y = Score, color = continent)) +
geom_boxplot() +
stat_summary(fun.y = mean, shape = 8) +
guides(color = "none") +
facet_wrap(~decade) +
labs(x = "",
y = "Index",
title = "Basic Welfare Index by decades")

BWI_global
## # A tibble: 10,545 × 8
## country continent region decade year gdp_cap_ppp life_expectancy Score
## <chr> <fct> <fct> <fct> <int> <dbl> <dbl> <dbl>
## 1 Albania Europe South… 1960s 1960 2660 62.9 NA
## 2 Algeria Africa North… 1960s 1960 6480 47.5 NA
## 3 Angola Africa Middl… 1960s 1960 2300 36.0 NA
## 4 Antigua and … Americas Carib… 1960s 1960 3630 63.0 NA
## 5 Argentina Americas South… 1960s 1960 10200 65.4 NA
## 6 Armenia Asia Weste… 1960s 1960 6670 66.9 NA
## 7 Aruba Americas Carib… 1960s 1960 NA 65.7 NA
## 8 Australia Oceania Austr… 1960s 1960 16100 70.9 NA
## 9 Austria Europe Weste… 1960s 1960 12000 68.8 NA
## 10 Azerbaijan Asia Weste… 1960s 1960 5500 61.3 NA
## # … with 10,535 more rows
Summary Stats
Should relate to your plot.
these_3 <- c("1970s", "1980s", "1990s", "2000s", "2010s")
BWI_global %>%
group_by(year, country) %>%
filter(decade%in%these_3) %>%
dplyr::summarize(Avg_Ind = mean(Score, na.rm = TRUE),
SD_Ind = sd(Score, na.rm = TRUE),
Min_Ind = min(Score, na.rm = TRUE),
Max_Ind = max(Score, na.rm = TRUE))
## # A tibble: 8,695 × 6
## # Groups: year [47]
## year country Avg_Ind SD_Ind Min_Ind Max_Ind
## <int> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1970 Albania NaN NA Inf -Inf
## 2 1970 Algeria NaN NA Inf -Inf
## 3 1970 Angola NaN NA Inf -Inf
## 4 1970 Antigua and Barbuda NaN NA Inf -Inf
## 5 1970 Argentina NaN NA Inf -Inf
## 6 1970 Armenia NaN NA Inf -Inf
## 7 1970 Aruba NaN NA Inf -Inf
## 8 1970 Australia NaN NA Inf -Inf
## 9 1970 Austria NaN NA Inf -Inf
## 10 1970 Azerbaijan NaN NA Inf -Inf
## # … with 8,685 more rows
EDA notes
My third plot shows me that (1) Americas and Asia are statistically
similar. The mean and median of the two continents are close but Asia’s
data is scattered compared with Americas. (2) The index of Africa is the
lowest and that of Europe keeps highest. (3) Oceania ranks second but
owns the most scattered data. Moreover, all of them keep
progressing.
Question 2—case study in five countries
In this section, we have a case study in 5 countries.
these_nation <- c("Australia", "China", "Nigeria", "Brazil", "United Kingdom")
five_2000s <- BWI_global%>%
filter(decade == "2000s", country%in%these_nation)
anova_one <- aov(Score ~ country, five_2000s )
results_one <- TukeyHSD(anova_one) %>%
broom::tidy() %>%
dplyr::arrange(desc(abs(estimate))) %>%
mutate(across(where(is.numeric), round, 4))
results_one$decade <- "2000s"
five_1980s <- BWI_global %>%
filter(decade == "1980s", country%in%these_nation)
anova_two <- aov(Score ~ country, five_1980s )
results_two <- TukeyHSD(anova_two ) %>%
broom::tidy() %>%
dplyr::arrange(desc(abs(estimate))) %>%
mutate(across(where(is.numeric), round, 4))
results_two$decade <- "1980s"
results_five <- results_one %>%
bind_rows(results_two) %>%
select(decade, everything())
tuk_gg <- results_five %>%
ggplot(aes(estimate, contrast ,
color = contrast) ) +
geom_point() +
geom_errorbarh(aes(xmin = conf.low, xmax = conf.high)) +
geom_vline(xintercept = 0, alpha = 0.3, linetype = 2) +
facet_wrap(~decade) +
labs(title = "Basic Welfare Index Means: 1980s & 2000s",
subtitle = "ANOVA Results for Five Countries. 95% CI.",
y = "", x = "Mean Difference Estimate") +
guides(color = "none") +
theme(axis.text.y = element_text(size=7))
tuk_gg <- plotly::ggplotly(tuk_gg , tooltip = c("text" ))
tuk_gg %>% layout(showlegend = FALSE)
Output
results_five
## # A tibble: 20 × 8
## decade term contrast null.value estimate conf.low conf.high adj.p.value
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2000s country Nigeria-Au… 0 -55.5 -58.8 -52.2 0
## 2 2000s country United Kin… 0 53.2 49.9 56.5 0
## 3 2000s country Nigeria-Ch… 0 -30.2 -33.6 -26.9 0
## 4 2000s country Nigeria-Br… 0 -28.7 -32.1 -25.4 0
## 5 2000s country Brazil-Aus… 0 -26.8 -30.1 -23.4 0
## 6 2000s country China-Aust… 0 -25.3 -28.6 -21.9 0
## 7 2000s country United Kin… 0 24.5 21.1 27.8 0
## 8 2000s country United Kin… 0 23.0 19.6 26.3 0
## 9 2000s country United Kin… 0 -2.3 -5.62 1.02 0.298
## 10 2000s country China-Braz… 0 1.51 -1.81 4.83 0.697
## 11 1980s country Nigeria-Au… 0 -48.5 -50.9 -46.1 0
## 12 1980s country United Kin… 0 46.8 44.4 49.2 0
## 13 1980s country Brazil-Aus… 0 -32.3 -34.7 -29.9 0
## 14 1980s country United Kin… 0 30.6 28.2 33.0 0
## 15 1980s country China-Aust… 0 -25.3 -27.7 -22.9 0
## 16 1980s country United Kin… 0 23.6 21.3 26.0 0
## 17 1980s country Nigeria-Ch… 0 -23.2 -25.6 -20.8 0
## 18 1980s country Nigeria-Br… 0 -16.2 -18.6 -13.8 0
## 19 1980s country China-Braz… 0 6.99 4.60 9.38 0
## 20 1980s country United Kin… 0 -1.65 -4.04 0.736 0.300
The relationship between economy and BWI seems very close. We will
prove it by this plot.
lm_5B <- BWI_global %>%
filter(country%in%these_nation, between(year, 1975, 2019))%>%
mutate(across(where(is.numeric), round, 2))%>%
ggplot( aes(x = year,
y = Score,
size = gdp_cap_ppp,
color = country)) +
geom_point(alpha = 0.7) +
geom_smooth(method = "lm", show.legend = FALSE) +
labs(title = "Five Countries",
y ="Index",
x = "Year + Per Capita GDP (PPP) + Country",
color = "Country",
text = "Per Capita (PPP)",
subtitle = "Basic Welfare Index ~ gdp_cap_ppp + country + year")
lm_5B

Summary Stats
Should relate to your plot.
my_fm <- as.formula("Score ~ year + gdp_cap_ppp +country")
mod_5B <- lm(formula = my_fm , data = BWI_global)
clean__mod_5B <- broom::tidy(mod_5B, conf.int = TRUE) %>%
mutate(across(where(is.numeric), round, 4))
clean__mod_5B
## # A tibble: 156 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -1131. 7.62 -149. 0 -1146. -1.12e+3
## 2 year 0.596 0.0038 156. 0 0.588 6.03e-1
## 3 gdp_cap_ppp 0.0001 0 12.8 0 0.0001 1 e-4
## 4 countryAlgeria -12.5 0.662 -18.9 0 -13.8 -1.12e+1
## 5 countryAngola -35.9 0.662 -54.3 0 -37.2 -3.46e+1
## 6 countryArgentina 0.913 0.668 1.37 0.172 -0.396 2.22e+0
## 7 countryArmenia -3.26 0.758 -4.31 0 -4.75 -1.78e+0
## 8 countryAustralia 17.1 0.695 24.6 0 15.8 1.85e+1
## 9 countryAustria 16.4 0.709 23.2 0 15.0 1.78e+1
## 10 countryAzerbaijan -16.3 0.758 -21.5 0 -17.8 -1.48e+1
## # … with 146 more rows
mod_stats <- broom::glance(mod_5B)
mod_stats
## # A tibble: 1 × 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.978 0.977 3.03 1703. 0 155 -15350. 31013. 32068.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
EDA notes
The result shows that all of the 5 countries keep progressing which
match the global trend. The differences are that the index of Australia
and United Kingdom that are developed countries is the highest, that of
China and Brazil that are developing countries is in the middle, and
that of Nigeria that is less developed countries is the lowest. The
adjusted R-squared value for the linear model: 0.977401 which shows that
the relationship between economy and BWI is close. It is a positive
correlation. However, from 1970s to 1980s, this correlation does not
happen between China and Nigeria, which is that China’s Per Capita GDP
(PPP) is lower than Nigeria, but the index of China is higher than
Nigeria. We conjecture that one of the reasons is that at that time
China was experiencing disasters both from human and the nature, leading
to economic depression.
Question 3—other factors
In this section, we will take life expectancy as an example to see
whether it will influence BWI.
five_2000s <- BWI_global%>%
filter(decade == "2000s", country%in%these_nation)
anova_one <- aov(life_expectancy ~ country, five_2000s )
results_one <- TukeyHSD(anova_one) %>%
broom::tidy() %>%
dplyr::arrange(desc(abs(estimate))) %>%
mutate(across(where(is.numeric), round, 4))
results_one$decade <- "2000s"
five_1980s <- BWI_global %>%
filter(decade == "1980s", country%in%these_nation)
anova_two <- aov(life_expectancy ~ country, five_1980s )
results_two <- TukeyHSD(anova_two ) %>%
broom::tidy() %>%
dplyr::arrange(desc(abs(estimate))) %>%
mutate(across(where(is.numeric), round, 4))
results_two$decade <- "1980s"
results_five_2 <- results_one %>%
bind_rows(results_two) %>%
select(decade, everything())
tuk_g <- results_five_2 %>%
ggplot(aes(estimate, contrast ,
color = contrast) ) +
geom_point() +
geom_errorbarh(aes(xmin = conf.low, xmax = conf.high)) +
geom_vline(xintercept = 0, alpha = 0.3, linetype = 2) +
facet_wrap(~decade) +
labs(title = "Life Expectancy Means: 1980s & 2000s",
subtitle = "ANOVA Results for Five Countries. 95% CI.",
y = "", x = "Mean Difference Estimate") +
guides(color = "none") +
theme(axis.text.y = element_text(size=7))
tuk_g
### Output
results_five_2
## # A tibble: 20 × 8
## decade term contrast null.value estimate conf.low conf.high adj.p.value
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2000s country Nigeria-Au… 0 -24.0 -25.5 -22.4 0
## 2 2000s country United Kin… 0 21.9 20.3 23.4 0
## 3 2000s country Nigeria-Br… 0 -15.2 -16.7 -13.7 0
## 4 2000s country Nigeria-Ch… 0 -14.8 -16.3 -13.2 0
## 5 2000s country China-Aust… 0 -9.2 -10.7 -7.66 0
## 6 2000s country Brazil-Aus… 0 -8.76 -10.3 -7.22 0
## 7 2000s country United Kin… 0 7.13 5.59 8.67 0
## 8 2000s country United Kin… 0 6.69 5.15 8.23 0
## 9 2000s country United Kin… 0 -2.07 -3.61 -0.529 0.0036
## 10 2000s country China-Braz… 0 -0.44 -1.98 1.10 0.926
## 11 1980s country Nigeria-Au… 0 -22.4 -23.4 -21.3 0
## 12 1980s country United Kin… 0 21.3 20.2 22.4 0
## 13 1980s country Nigeria-Br… 0 -12.3 -13.4 -11.3 0
## 14 1980s country Nigeria-Ch… 0 -12.3 -13.4 -11.2 0
## 15 1980s country China-Aust… 0 -10.0 -11.1 -8.97 0
## 16 1980s country Brazil-Aus… 0 -10.0 -11.1 -8.95 0
## 17 1980s country United Kin… 0 8.98 7.90 10.1 0
## 18 1980s country United Kin… 0 8.96 7.88 10.0 0
## 19 1980s country United Kin… 0 -1.07 -2.15 0.0118 0.0538
## 20 1980s country China-Braz… 0 -0.02 -1.10 1.06 1
lm_5B_lif <- BWI_global %>%
filter(between(year, 1975, 2019), country%in%these_nation)%>%
ggplot( aes(x = year,
y = Score,
size = life_expectancy,
color = country) ) +
geom_point(alpha = 0.7) +
geom_smooth(method = "lm", show.legend = FALSE) +
labs(title = "Five Nations",
y ="Index",
x = "Year + Life Expectancy + Countries",
color = "Nation",
text = "Per Capita (PPP)",
size = "PPP",
subtitle = "Basic Welfare Index ~ life expectancy + country + year")
lm_5B_lif
### Output
my_fm_lif <- as.formula("Score ~ year + life_expectancy +country")
mod_5B_lif <- lm(formula = my_fm_lif , data = BWI_global)
clean_mod_5B_lif <- broom::tidy(mod_5B_lif, conf.int = TRUE) %>%
mutate(across(where(is.numeric), round, 4))
clean_mod_5B_lif
## # A tibble: 156 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -882. 7.38 -120. 0 -897. -868.
## 2 year 0.447 0.004 112. 0 0.439 0.455
## 3 life_expectancy 0.653 0.0112 58.4 0 0.631 0.674
## 4 countryAlgeria -9.48 0.537 -17.7 0 -10.5 -8.43
## 5 countryAngola -21.4 0.593 -36.0 0 -22.5 -20.2
## 6 countryArgentina 2.71 0.535 5.06 0 1.66 3.76
## 7 countryArmenia -0.340 0.615 -0.554 0.580 -1.55 0.864
## 8 countryAustralia 17.1 0.537 31.9 0 16.1 18.2
## 9 countryAustria 17.9 0.536 33.5 0 16.9 19.0
## 10 countryAzerbaijan -10.8 0.619 -17.5 0 -12.0 -9.60
## # … with 146 more rows
mod_stats_lif <- broom::glance(mod_5B_lif)
mod_stats_lif
## # A tibble: 1 × 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.986 0.985 2.45 2628. 0 155 -14050. 28414. 29469.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
Notes
The first plot tells us that both in 1980s and 2000s, meaningful
differences exist between the countries being compared. Thus, according
to this result, difference life expectancy exists between different
countries we chose. The second plot tells us that a linear positive
correlation exists in these five countries. With the increasing life
expectancy, BWI also keeps increasing. Similar to the result between per
capita GDP(ppp) and BWI, the five countries divide into three, Australia
and United Kingdom, China and Brazil, and Nigeria. Thus, life expectancy
is also the factor that influence BWI.
Main data set
BWI_global %>%
filter(country%in%these_nation, between(year, 1975, 2019))%>%
DT::datatable(., class = "compact")
Save your results
dash_results <- c("anova_one", "anova_two", "basic_map_1997", "basic_map_2016", "basic_tidy", "BWI_dat_1975", "BWI_dat_2019", "BWI_global", "BWI_map_1975", "BWI_map_2019", "clean__mod_5B", "clean_mod_5B_lif", "country_ISO_codes2", "five_1980s", "five_2000s", "five_df", "gapminder_dat", "lm_5B", "lm_5B_lif", "mod_5B", "mod_5B_lif", "mod_stats", "mod_stats_lif", "my_fm", "my_fm_lif", "results_five", "results_five_2", "results_one", "results_two", "these_3", "these_nation", "tuk_g", "tuk_gg", "world_map2" )
save(list = dash_results, file = here::here("data", "dashboard_data", "dash_results.rda") )
save.image( file = here::here("data", "dashboard_data","picture.RData"))