Data
## Selection
load(here::here("data", "tidy_data", "countries_df.rda" ))
## Map data
load(here::here("data", "tidy_data", "world_map2_project.rda" ))
load(here::here("data", "tidy_data", "gapminder_dat.rda" ))
## Gapminder One
load(here::here("data", "dashboard_data", "murder_tidy.rda" ))
## Gapminder Two
load(here::here("data", "dashboard_data", "basic_tidy.rda" ))
Gapminder Case Study
I randomly chose five countries for my case study. They are Colombia,
Russia, South Africa, United States, and Venezuela.
# glimpse()
mine_df %>% glimpse()
## Rows: 285
## Columns: 11
## $ country <chr> "Colombia", "Russia", "South Africa", "United States"…
## $ continent <fct> Americas, Europe, Africa, Americas, Americas, America…
## $ region <fct> South America, Eastern Europe, Southern Africa, North…
## $ decade <fct> 1960s, 1960s, 1960s, 1960s, 1960s, 1960s, 1960s, 1960…
## $ year <int> 1960, 1960, 1960, 1960, 1960, 1961, 1961, 1961, 1961,…
## $ infant_mortality <dbl> 89.3, NA, NA, 25.9, 59.9, 87.6, NA, NA, 25.4, 58.1, 8…
## $ life_expectancy <dbl> 58.03, 68.61, 49.01, 69.91, 60.08, 58.63, 68.85, 49.4…
## $ fertility <dbl> 6.81, 2.56, 6.17, 3.67, 6.62, 6.80, 2.46, 6.14, 3.63,…
## $ population <dbl> 16480384, 119860289, 17396367, 186176524, 8146845, 16…
## $ gdp <dbl> 1.901766e+10, NA, 3.833607e+10, 2.479391e+12, 4.11187…
## $ gdp_cap_ppp <dbl> 4580, 6390, 7960, 20800, 13900, 4650, 6870, 8050, 209…
Gapminder Download One
This data set is about the total number of estimated deaths from
interpersonal violence and were downloaded from Gapminder.org/data.
# glimpse()
murder_tidy %>% glimpse()
## Rows: 3,880
## Columns: 3
## $ country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "A…
## $ Year <int> 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 20…
## $ Deaths <dbl> 3270, 3350, 3450, 3590, 3790, 3990, 4310, 4570, 4750, 4850, 49…
Gapminder Download Two
This data set is basic welfare index based on 7 indicators, infant
mortality rate, life expectancy, kilocalories per person per day,
literacy, mean years of schooling, educational equality and health
equality and downloaded from Gapminder.org/data.
# glimpse()
basic_tidy %>% glimpse()
## Rows: 3,200
## Columns: 3
## $ country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "A…
## $ Year <int> 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 20…
## $ Score <dbl> 25.8, 25.8, 26.3, 29.5, 29.5, 30.7, 30.7, 30.9, 34.0, 34.0, 34…
EDA Work
Requirements are a minimum of two (2) Exploratory Data Analysis
graphs; and a minimum of two (2) sets of Summary Statistics which
generally should directly complement, support, or relate to the EDA
visualizations.
Graph One—The death toll by murder
I want to use histogram to see the trends of each country
directly.
these_five <- c("Colombia", "Russia", "United States", "Venezuela", "South Africa")
murder_tidy %>%
filter(country%in%these_five ) %>%
ggplot( aes(x = Year, y = Deaths, fill = country)) +
geom_col() +
facet_wrap(~country)

labs(x = "year",
y = "Death toll",
color = "Country",
title = "Death Toll in Total by Years")
## $x
## [1] "year"
##
## $y
## [1] "Death toll"
##
## $colour
## [1] "Country"
##
## $title
## [1] "Death Toll in Total by Years"
##
## attr(,"class")
## [1] "labels"
Summary Stats
## Because we loaded plotly, we need to say which version of summarize
## Use dplyr::summarize()
murder_tidy %>%
group_by(Year, country) %>%
filter(country%in%these_five ) %>%
dplyr::summarize(Avg_Deaths = mean(Deaths, na.rm = TRUE),
SD_Deaths = sd(Deaths, na.rm = TRUE),
Min_Deaths = min(Deaths, na.rm = TRUE),
Max_Deaths = max(Deaths, na.rm = TRUE))
## # A tibble: 100 × 6
## # Groups: Year [20]
## Year country Avg_Deaths SD_Deaths Min_Deaths Max_Deaths
## <int> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1997 Colombia 24800 NA 24800 24800
## 2 1997 Russia 43400 NA 43400 43400
## 3 1997 South Africa 19800 NA 19800 19800
## 4 1997 United States 19300 NA 19300 19300
## 5 1997 Venezuela 4870 NA 4870 4870
## 6 1998 Colombia 24700 NA 24700 24700
## 7 1998 Russia 42500 NA 42500 42500
## 8 1998 South Africa 20400 NA 20400 20400
## 9 1998 United States 18700 NA 18700 18700
## 10 1998 Venezuela 5330 NA 5330 5330
## # … with 90 more rows
EDA notes
My plot tells me that the death toll of Colombia, Russia, and South
Africa is decreasing while Venezuela, on the contary, is increasing. The
death toll of United States does not have great change during the two
decades. Besides, in many years, the average of Russia is the 1st out of
these 5 countries, but it is keeping decreasing.
Graph Two—Basic welfare index
The same as the plot above.
## turn off scientific exponents?
# options(scipen = 99) # make hard to use
these_five <- c("Colombia", "Russia", "United States", "Venezuela", "South Africa")
basic_tidy %>%
filter(country%in%these_five ) %>%
ggplot( aes(x = Year, y = Score, fill = country)) +
geom_col() +
facet_wrap(~country)

labs(x = "Year",
y = "Score",
color = "Country",
title = "Basic Welfare Index by Years")
## $x
## [1] "Year"
##
## $y
## [1] "Score"
##
## $colour
## [1] "Country"
##
## $title
## [1] "Basic Welfare Index by Years"
##
## attr(,"class")
## [1] "labels"
Summary Stats
basic_tidy %>%
group_by(Year, country) %>%
filter(country%in%these_five ) %>%
dplyr::summarize(Avg_Score = mean(Score, na.rm = TRUE),
SD_Score = sd(Score, na.rm = TRUE),
Min_Score = min(Score, na.rm = TRUE),
Max_Score = max(Score, na.rm = TRUE))
## # A tibble: 100 × 6
## # Groups: Year [20]
## Year country Avg_Score SD_Score Min_Score Max_Score
## <int> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1997 Colombia 57.6 NA 57.6 57.6
## 2 1997 Russia 60.2 NA 60.2 60.2
## 3 1997 South Africa 43.7 NA 43.7 43.7
## 4 1997 United States 74.3 NA 74.3 74.3
## 5 1997 Venezuela 59.9 NA 59.9 59.9
## 6 1998 Colombia 57.5 NA 57.5 57.5
## 7 1998 Russia 60.3 NA 60.3 60.3
## 8 1998 South Africa 43.8 NA 43.8 43.8
## 9 1998 United States 74.3 NA 74.3 74.3
## 10 1998 Venezuela 59.9 NA 59.9 59.9
## # … with 90 more rows
EDA notes
Except Venezuela, all the index of countries have slightly increased
during the two decades while Venezuela has slightly decreased. With the
result in last part, is there any relationship between death toll by
murder and basic welfare index?
Same for BWI
basic_global <- gapminder_dat %>%
left_join(basic_tidy, by = c("year" = "Year", "country")) %>%
tibble() %>%
select(country, continent, region, decade, year, Score)
basic_global
## # A tibble: 10,545 × 6
## country continent region decade year Score
## <chr> <fct> <fct> <fct> <int> <dbl>
## 1 Albania Europe Southern Europe 1960s 1960 NA
## 2 Algeria Africa Northern Africa 1960s 1960 NA
## 3 Angola Africa Middle Africa 1960s 1960 NA
## 4 Antigua and Barbuda Americas Caribbean 1960s 1960 NA
## 5 Argentina Americas South America 1960s 1960 NA
## 6 Armenia Asia Western Asia 1960s 1960 NA
## 7 Aruba Americas Caribbean 1960s 1960 NA
## 8 Australia Oceania Australia and New Zealand 1960s 1960 NA
## 9 Austria Europe Western Europe 1960s 1960 NA
## 10 Azerbaijan Asia Western Asia 1960s 1960 NA
## # … with 10,535 more rows
basic_global %>%
na.omit() %>%
ggplot( aes(x = Score, fill = continent)) +
geom_density(alpha = 0.4) +
facet_wrap(~decade, scales = "free") +
labs(title = "Basic Welfare Index in continents",
x = "Scores") +
theme(legend.position = "top")

basic_global <- gapminder_dat %>%
left_join(basic_tidy, by = c("year" = "Year", "country")) %>%
tibble() %>%
select(country, continent, region, decade, year, Score)
basic_global
## # A tibble: 10,545 × 6
## country continent region decade year Score
## <chr> <fct> <fct> <fct> <int> <dbl>
## 1 Albania Europe Southern Europe 1960s 1960 NA
## 2 Algeria Africa Northern Africa 1960s 1960 NA
## 3 Angola Africa Middle Africa 1960s 1960 NA
## 4 Antigua and Barbuda Americas Caribbean 1960s 1960 NA
## 5 Argentina Americas South America 1960s 1960 NA
## 6 Armenia Asia Western Asia 1960s 1960 NA
## 7 Aruba Americas Caribbean 1960s 1960 NA
## 8 Australia Oceania Australia and New Zealand 1960s 1960 NA
## 9 Austria Europe Western Europe 1960s 1960 NA
## 10 Azerbaijan Asia Western Asia 1960s 1960 NA
## # … with 10,535 more rows
Three_decades <- c("1990s", "2000s", "2010s")
basic_global_stats <- basic_global %>%
filter(decade %in% Three_decades ) %>%
ggplot( aes(x = continent, y = Score, fill = continent)) +
geom_col(alpha = 0.5) +
facet_wrap(~decade) +
labs(title = "Basic Welfare Index around the World",
y = "BWI in continents",
x = "") +
coord_flip() +
guides(fill = "none")
basic_global_stats

EDA note
For BWI, it seems that it has similar trend to Death toll.
Summary Stats
murder_global %>%
group_by(decade) %>%
count(continent)
## # A tibble: 30 × 3
## # Groups: decade [6]
## decade continent n
## <fct> <fct> <int>
## 1 1960s Africa 510
## 2 1960s Americas 360
## 3 1960s Asia 470
## 4 1960s Europe 390
## 5 1960s Oceania 120
## 6 1970s Africa 510
## 7 1970s Americas 360
## 8 1970s Asia 470
## 9 1970s Europe 390
## 10 1970s Oceania 120
## # … with 20 more rows
## Okay, but
murder_global_stats <- murder_global %>%
na.omit() %>%
group_by(decade, continent, country)%>%
dplyr::summarize(Avg_Mur = mean(Deaths, na.rm = TRUE),
Median_Mur = mean(Deaths, na.rm = TRUE),
SD_Mur = sd(Deaths, na.rm = TRUE),
Min_Mur = min(Deaths, na.rm = TRUE),
Max_Mur = max(Deaths, na.rm = TRUE))
murder_global_stats
## # A tibble: 540 × 8
## # Groups: decade, continent [15]
## decade continent country Avg_Mur Median_Mur SD_Mur Min_Mur Max_Mur
## <fct> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1990s Africa Algeria 462. 462. 8.96 452 468
## 2 1990s Africa Angola 636. 636. 47.2 582 667
## 3 1990s Africa Benin 168 168 6 162 174
## 4 1990s Africa Botswana 110. 110. 41.1 69.8 152
## 5 1990s Africa Burkina Faso 102. 102. 3.05 98.9 105
## 6 1990s Africa Burundi 62.2 62.2 0.473 61.8 62.7
## 7 1990s Africa Cameroon 525 525 18.7 505 542
## 8 1990s Africa Cape Verde 48.2 48.2 1.39 47 49.7
## 9 1990s Africa Central African R… 212 212 11 201 223
## 10 1990s Africa Chad 192 192 12.1 181 205
## # … with 530 more rows
Same for BWI
## Okay, but
basic_global %>%
group_by(decade) %>%
count(continent)
## # A tibble: 30 × 3
## # Groups: decade [6]
## decade continent n
## <fct> <fct> <int>
## 1 1960s Africa 510
## 2 1960s Americas 360
## 3 1960s Asia 470
## 4 1960s Europe 390
## 5 1960s Oceania 120
## 6 1970s Africa 510
## 7 1970s Americas 360
## 8 1970s Asia 470
## 9 1970s Europe 390
## 10 1970s Oceania 120
## # … with 20 more rows
basic_global_stats <- basic_global %>%
na.omit() %>%
group_by(decade, continent, country)%>%
dplyr::summarize(Avg_Sco = mean(Score, na.rm = TRUE),
Median_Sco = mean(Score, na.rm = TRUE),
SD_Sco = sd(Score, na.rm = TRUE),
Min_Sco = min(Score, na.rm = TRUE),
Max_Sco = max(Score, na.rm = TRUE))
basic_global_stats
## # A tibble: 461 × 8
## # Groups: decade, continent [15]
## decade continent country Avg_Sco Median_Sco SD_Sco Min_Sco Max_Sco
## <fct> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1990s Africa Algeria 50.5 50.5 0.153 50.3 50.6
## 2 1990s Africa Angola 18.3 18.3 3.93 16 22.8
## 3 1990s Africa Benin 31.0 31.0 0.115 30.9 31.1
## 4 1990s Africa Botswana 42.1 42.1 0.252 41.9 42.4
## 5 1990s Africa Burkina Faso 27.7 27.7 1.56 26.8 29.5
## 6 1990s Africa Burundi 27.3 27.3 0 27.3 27.3
## 7 1990s Africa Cameroon 29.6 29.6 1.56 28.7 31.4
## 8 1990s Africa Cape Verde 48.7 48.7 1.68 46.8 50
## 9 1990s Africa Central African R… 23.1 23.1 0 23.1 23.1
## 10 1990s Africa Chad 24.3 24.3 0.300 24 24.6
## # … with 451 more rows
EDA notes
The results show that either in the five countries or in the world
the global trends of Death toll and Basic welfare index are similar.
User Viz
The requirement is minimum of one (1) user-interactive Geo-Spatial
Mapping of the data. If for whatever reason this seems irrelevant to
your project, please substitute with one (1) user-interactive Timeline
or Time Series analysis.
You are NOT limited to one (1) only, but please use the Interactive
feature for enlightenment not enterainment.
Likewise, please note that maps are often more useful when paired:
say, the global state of 1990 vs. 2020, for example. Or, whichever years
make the most sense for your data and questions.
Choropleth One
# Choose two good years for my data
murder_dat_1997 <- murder_tidy %>%
filter(Year == 1997) %>%
complete(country = world_map2$country,
fill = (list(number = NA )) ) %>%
left_join(world_map2, by = "country") %>%
replace_na(list(Year = 1997))
murder_dat_1997
## # A tibble: 99,442 × 11
## country Year Deaths long lat group order code_2 code_3 code_num form_name
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <chr>
## 1 Afghan… 1997 3270 74.9 37.2 2 12 AF AFG 4 Islamic …
## 2 Afghan… 1997 3270 74.8 37.2 2 13 AF AFG 4 Islamic …
## 3 Afghan… 1997 3270 74.8 37.2 2 14 AF AFG 4 Islamic …
## 4 Afghan… 1997 3270 74.7 37.3 2 15 AF AFG 4 Islamic …
## 5 Afghan… 1997 3270 74.7 37.3 2 16 AF AFG 4 Islamic …
## 6 Afghan… 1997 3270 74.7 37.3 2 17 AF AFG 4 Islamic …
## 7 Afghan… 1997 3270 74.6 37.2 2 18 AF AFG 4 Islamic …
## 8 Afghan… 1997 3270 74.4 37.2 2 19 AF AFG 4 Islamic …
## 9 Afghan… 1997 3270 74.4 37.1 2 20 AF AFG 4 Islamic …
## 10 Afghan… 1997 3270 74.5 37.1 2 21 AF AFG 4 Islamic …
## # … with 99,432 more rows
murder_map_1997 <- murder_dat_1997 %>%
filter(code_3 != "ATA") %>%
ggplot(aes(x = long,
y = lat,
group = group,
label = country)) +
geom_polygon(aes(fill = Deaths) )+
scale_fill_viridis_c(option = "C") +
labs(fill = "Deaths",
title = "Death Toll by Murder for 1997") +
theme_void()
# interactive version
plotly::ggplotly(murder_map_1997)
murder_map_1997

murder_dat_2016 <- murder_tidy %>%
filter(Year == 2016) %>%
complete(country = world_map2$country,
fill = (list(number = NA )) ) %>%
left_join(world_map2, by = "country") %>%
replace_na(list(Year = 2016))
murder_dat_2016
## # A tibble: 99,442 × 11
## country Year Deaths long lat group order code_2 code_3 code_num form_name
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <chr>
## 1 Afghan… 2016 6270 74.9 37.2 2 12 AF AFG 4 Islamic …
## 2 Afghan… 2016 6270 74.8 37.2 2 13 AF AFG 4 Islamic …
## 3 Afghan… 2016 6270 74.8 37.2 2 14 AF AFG 4 Islamic …
## 4 Afghan… 2016 6270 74.7 37.3 2 15 AF AFG 4 Islamic …
## 5 Afghan… 2016 6270 74.7 37.3 2 16 AF AFG 4 Islamic …
## 6 Afghan… 2016 6270 74.7 37.3 2 17 AF AFG 4 Islamic …
## 7 Afghan… 2016 6270 74.6 37.2 2 18 AF AFG 4 Islamic …
## 8 Afghan… 2016 6270 74.4 37.2 2 19 AF AFG 4 Islamic …
## 9 Afghan… 2016 6270 74.4 37.1 2 20 AF AFG 4 Islamic …
## 10 Afghan… 2016 6270 74.5 37.1 2 21 AF AFG 4 Islamic …
## # … with 99,432 more rows
murder_map_2016 <- murder_dat_2016 %>%
filter(code_3 != "ATA") %>%
ggplot(aes(x = long,
y = lat,
group = group,
label = country)) +
geom_polygon(aes(fill = Deaths) )+
scale_fill_viridis_c(option = "C") +
labs(fill = "Deaths",
title = "Death Toll by Murder, 2016") +
theme_void()
# interactive version
plotly::ggplotly(murder_map_2016)
murder_map_2016

## remove Antarctica "ATA" if you wish
## run ggplot version to test
# plotly::ggplotly(nuke_1975_map)
Notes: The main global trend of Death toll is decreasing. It is
improving.
Choropleth Two
# Similar to above
basic_dat_1997 <- basic_tidy %>%
filter(Year == 1997) %>%
complete(country = world_map2$country,
fill = (list(nuclear_per = NA)) ) %>%
left_join(world_map2, by = "country") %>%
replace_na(list(Year = 1997))
## remove Antarctica "ATA"
basic_map_1997 <- basic_dat_1997 %>%
filter(code_3 != "ATA") %>%
ggplot(aes(x = long,
y = lat,
group = group,
label = country)) +
geom_polygon(aes(fill = Score) ) +
scale_fill_viridis_c(option = "C") +
theme_void() +
labs(fill = "Score",
title = "Basic Welfare Index, 1997")
basic_map_1997

plotly::ggplotly(basic_map_1997)
basic_dat_2016 <- basic_tidy %>%
filter(Year == 2016) %>%
complete(country = world_map2$country,
fill = (list(nuclear_per = NA)) ) %>%
left_join(world_map2, by = "country") %>%
replace_na(list(Year = 2016))
## remove Antarctica "ATA"
basic_map_2016 <- basic_dat_2016 %>%
filter(code_3 != "ATA") %>%
ggplot(aes(x = long,
y = lat,
group = group,
label = country)) +
geom_polygon(aes(fill = Score) ) +
scale_fill_viridis_c(option = "C") +
theme_void() +
labs(fill = "Score",
title = "Basic Welfare Index, 2016")
basic_map_2016

plotly::ggplotly(basic_map_2016)
Notes: Also improving.
Timeline
OPTIONAL. One numeric variable over a range of years. Use color to
show country (or region, etc).
# Save if good.
big_five_1 <- bind_rows(mine_df)
murder_five <- big_five_1 %>%
left_join(murder_tidy, by = c("country", "year" = "Year") )
# I will save my plotly graph if I like
# I have added an additional color command
five_murder_pg <- murder_five %>%
plot_ly(x = ~year,
y = ~Deaths,
type = "scatter",
mode = "lines+markers",
color = ~country,
colors = "Set3",
showlegend = TRUE ) %>%
layout( xaxis = list(title = "Year"),
yaxis = list(title = "Death Toll by Murder per Nation"),
title = "Big Five: Death Toll by Murder per Nation" )
# Run it
five_murder_pg
Notes: It is strange that the death toll of Venezuela is rising while
other four is falling.
big_five_2 <- bind_rows(mine_df)
BWI_five <- big_five_2 %>%
left_join(basic_tidy, by = c("country", "year" = "Year") )
# I will save my plotly graph if I like
# I have added an additional color command
five_BWI_pg <- BWI_five %>%
plot_ly(x = ~year,
y = ~Score,
type = "scatter",
mode = "lines+markers",
color = ~country,
colors = "Set3",
showlegend = TRUE ) %>%
layout( xaxis = list(title = "Year"),
yaxis = list(title = "Basic Welfare Index per Nation"),
title = "Big Five: Basic Welfare Index per Nation" )
# Run it
five_BWI_pg
Notes: The index of Venezuela is also different from the other four
countries.
Model or Hypothesis Test
This part has three requirements:
-
A useful visualization which suggests either your inference or
hypothesis;
-
-
The model or test output in a readable format (not the base output);
-
An explanation of what the model or test output reveals.
lm Viz
A ggplot version is fine.
## GGplot version
lm_5M <- murder_five %>%
filter(between(year, 1997, 2016))%>%
ggplot( aes(x = year,
y = Deaths,
size = gdp_cap_ppp,
color = country) ) +
geom_point(alpha = 0.7) +
geom_smooth(method = "lm", show.legend = FALSE) +
labs(title = "Five Nations",
y ="Year",
x = "Death Toll by Murder + Nations",
color = "Nation",
text = "Per Capita (PPP)",
size = "PPP",
subtitle = "Death Toll ~ gdp_cap_ppp + country + year")
lm_5M

# clean up the x axis?
lm_5B <- BWI_five %>%
filter(between(year, 1997, 2016))%>%
ggplot( aes(x = year,
y = Score,
size = gdp_cap_ppp,
color = country) ) +
geom_point(alpha = 0.7) +
geom_smooth(method = "lm", show.legend = FALSE) +
labs(title = "Five Nations",
y ="Year",
x = "Basic Welfare Index + Nations",
color = "Nation",
text = "Per Capita (PPP)",
size = "PPP",
subtitle = "Basic Welfare Index ~ gdp_cap_ppp + country + year")
lm_5B

A plotly version requires more coding and care.
## GPPLot to plot
## Add text for hover
## Round down numbers for better viewing
## Add custom axis breaks?
## Remove the geom_smooth which does not play well in plotly
lm_graph_5M <- murder_five %>%
filter(between(year, 1997, 2016))%>%
mutate(across(where(is.numeric), round, 2))%>%
ggplot( aes(x = year,
y = Deaths,
size = gdp_cap_ppp,
color = country,
text = paste0("Nation: ", country,
"<br />Death Toll: ", Deaths,
"<br />Per Capita GDP (PPP): ", gdp_cap_ppp,
"<br />Year:", year )) ) +
geom_point(alpha = 0.7) +
labs(title = "Five Nations",
y ="Death Toll",
x = "Year + Per Capita GDP (PPP) + Nation",
color = "Nation",
text = "Per Capita (PPP)") +
scale_x_continuous() +
theme_minimal() +
guides(size = "none")
#lm_graph_5ey
lm_graph_5M <- plotly::ggplotly(lm_graph_5M, tooltip = "text" )
lm_graph_5M
Notes: The result of United States and Russia proves my “negative”
assumption may be wrong. The death toll of Russia is larger that United
States but Russia’s GDP is smaller than the US’s.
lm_graph_5B <- BWI_five %>%
filter(between(year, 1997, 2016))%>%
mutate(across(where(is.numeric), round, 2))%>%
ggplot( aes(x = year,
y = Score,
size = gdp_cap_ppp,
color = country,
text = paste0("Nation: ", country,
"<br />Score: ", Score,
"<br />Per Capita GDP (PPP): ", gdp_cap_ppp,
"<br />Year:", year )) ) +
geom_point(alpha = 0.7) +
labs(title = "Five Nations",
y ="Score",
x = "Year + Per Capita GDP (PPP) + Nation",
color = "Nation",
text = "Per Capita (PPP)") +
scale_x_continuous() +
theme_minimal() +
guides(size = "none")
#lm_graph_5ey
lm_graph_5B <- plotly::ggplotly(lm_graph_5B, tooltip = "text" )
lm_graph_5B
lm Output
my_fm <- as.formula("Deaths ~ gdp_cap_ppp + country + year")
# model
mod_5M <- lm(formula = my_fm , data = murder_five)
# summary(mod_5ey) ## the standard but messy output
# output to tidy
clean__mod_5M <- broom::tidy(mod_5M, conf.int = TRUE) %>%
mutate(across(where(is.numeric), round, 4))
# output -- use DT later
clean__mod_5M
## # A tibble: 7 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -47311. 2.27e+5 -0.208 0.835 -4.98e5 4.04e+5
## 2 gdp_cap_ppp -0.912 2.08e-1 -4.38 0 -1.33e0 -4.98e-1
## 3 countryRussia 23498. 2.31e+3 10.2 0 1.89e4 2.81e+4
## 4 countrySouth Africa 1185. 1.17e+3 1.01 0.315 -1.14e3 3.51e+3
## 5 countryUnited States 36179. 8.88e+3 4.08 0.0001 1.86e4 5.38e+4
## 6 countryVenezuela -3977. 1.94e+3 -2.05 0.0436 -7.84e3 -1.16e+2
## 7 year 38.9 1.14e+2 0.340 0.734 -1.88e2 2.66e+2
# see key stats
mod_stats <- broom::glance(mod_5M)
mod_stats
## # A tibble: 1 × 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.856 0.846 3695. 91.8 7.29e-37 6 -960. 1935. 1956.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
my_fm <- as.formula("Score ~ gdp_cap_ppp + country + year")
# model
mod_5B <- lm(formula = my_fm , data = BWI_five)
# summary(mod_5ey) ## the standard but messy output
# output to tidy
clean__mod_5B <- broom::tidy(mod_5B, conf.int = TRUE) %>%
mutate(across(where(is.numeric), round, 4))
# output -- use DT later
clean__mod_5B
## # A tibble: 7 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -516. 94.9 -5.44 0 -705. -328.
## 2 gdp_cap_ppp 0.0004 0.0001 4.17 0.0001 0.0002 0.0005
## 3 countryRussia 1.97 0.966 2.04 0.0437 0.0567 3.89
## 4 countrySouth Africa -16.2 0.490 -33.1 0 -17.2 -15.2
## 5 countryUnited States 0.284 3.71 0.0765 0.939 -7.08 7.65
## 6 countryVenezuela -3.05 0.812 -3.75 0.0003 -4.66 -1.44
## 7 year 0.286 0.0477 6.00 0 0.192 0.381
# see key stats
mod_stats <- broom::glance(mod_5B)
mod_stats
## # A tibble: 1 × 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.981 0.979 1.54 785. 2.41e-77 6 -182. 379. 400.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
lm Notes
My assumption of the relationships between death toll and economy,
index and economy seems wrong and needs further exploration.
Save Your Work
As a general rule, I would use the dashboard for “serving” not
“cooking.” You can make some final touches for the presentation. But I
do not recommend building all the plots from stratch in the
dashboard.
If you created a native Plotly graph, save that as it. If you created
a GGplot graph that you want to use plotly::ggploty() on,
save it as a ggplot. Do the touch-up work in the dash.
Likewise, save “as is” your tidy model output results, summary stats,
and at least one larger data set. Add the touch-up code,
DT::datatable() or reactable in the
dashboard.
Get your results
Run ls() to see what’s in your environment.
ls()
## [1] "basic_dat_1997" "basic_dat_2016" "basic_global"
## [4] "basic_global_stats" "basic_map_1997" "basic_map_2016"
## [7] "basic_tidy" "big_five_1" "big_five_2"
## [10] "BWI_five" "clean__mod_5B" "clean__mod_5M"
## [13] "country_ISO_codes2" "five_BWI_pg" "five_murder_pg"
## [16] "gapminder_dat" "lm_5B" "lm_5M"
## [19] "lm_graph_5B" "lm_graph_5M" "mine_df"
## [22] "mod_5B" "mod_5M" "mod_stats"
## [25] "murder_dat_1997" "murder_dat_2016" "murder_five"
## [28] "murder_global" "murder_global_stats" "murder_map_1997"
## [31] "murder_map_2016" "murder_tidy" "my_fm"
## [34] "these_five" "Three_decades" "world_map2"
Now, choose the items you want to keep and save them as a vector. Or,
be lazy and just save everything! But this might slow you down when
knitting the dashboard.
Save your results
dash_results <- c( "basic_dat_1997", "basic_dat_2016", "basic_global", "basic_global_stats", "basic_map_1997", "basic_map_2016", "basic_tidy", "big_five_1", "big_five_2", "BWI_five", "clean__mod_5B", "clean__mod_5M", "country_ISO_codes2", "five_BWI_pg", "gapminder_dat", "lm_5B", "lm_5M", "lm_graph_5B", "lm_graph_5M", "mine_df", "mod_5B", "mod_5M", "mod_stats", "murder_dat_1997", "murder_dat_2016", "murder_five", "murder_global", "murder_global_stats", "murder_map_1997", "murder_map_2016", "murder_tidy", "my_fm", "these_five", "Three_decades", "world_map2" )
save(list = dash_results, file = here::here("data", "dashboard_data", "dash_results.rda"))
#### OR
save.image( file = here::here("data", "dashboard_data","big_mess_images.RData"))
Either way, be sure to have a list of your objects for each section
of your dashboard. Consider mapping out your dashboard by hand first.
Then code it. Design before coding to tell a story.