Final Project Worksheet

Name: Hazel Here

ID:2019221022 Here

Date: June 08, 2022

library(tidyverse)
library(here)
library(visdat)
library(plotly)

Data

## Selection
load(here::here("data", "tidy_data", "countries_df.rda" ))

## Map data
load(here::here("data", "tidy_data", "world_map2_project.rda" ))
load(here::here("data", "tidy_data", "gapminder_dat.rda" ))

## Gapminder One
load(here::here("data", "dashboard_data", "murder_tidy.rda" ))

## Gapminder Two
load(here::here("data", "dashboard_data", "basic_tidy.rda" ))

Gapminder Case Study

I randomly chose five countries for my case study. They are Colombia, Russia, South Africa, United States, and Venezuela.

# glimpse()
mine_df %>% glimpse()

## Rows: 285
## Columns: 11
## $ country          <chr> "Colombia", "Russia", "South Africa", "United States"…
## $ continent        <fct> Americas, Europe, Africa, Americas, Americas, America…
## $ region           <fct> South America, Eastern Europe, Southern Africa, North…
## $ decade           <fct> 1960s, 1960s, 1960s, 1960s, 1960s, 1960s, 1960s, 1960…
## $ year             <int> 1960, 1960, 1960, 1960, 1960, 1961, 1961, 1961, 1961,…
## $ infant_mortality <dbl> 89.3, NA, NA, 25.9, 59.9, 87.6, NA, NA, 25.4, 58.1, 8…
## $ life_expectancy  <dbl> 58.03, 68.61, 49.01, 69.91, 60.08, 58.63, 68.85, 49.4…
## $ fertility        <dbl> 6.81, 2.56, 6.17, 3.67, 6.62, 6.80, 2.46, 6.14, 3.63,…
## $ population       <dbl> 16480384, 119860289, 17396367, 186176524, 8146845, 16…
## $ gdp              <dbl> 1.901766e+10, NA, 3.833607e+10, 2.479391e+12, 4.11187…
## $ gdp_cap_ppp      <dbl> 4580, 6390, 7960, 20800, 13900, 4650, 6870, 8050, 209…

Gapminder Download One

This data set is about the total number of estimated deaths from interpersonal violence and were downloaded from Gapminder.org/data.

# glimpse()
murder_tidy %>% glimpse()

## Rows: 3,880
## Columns: 3
## $ country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "A…
## $ Year    <int> 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 20…
## $ Deaths  <dbl> 3270, 3350, 3450, 3590, 3790, 3990, 4310, 4570, 4750, 4850, 49…

Gapminder Download Two

This data set is basic welfare index based on 7 indicators, infant mortality rate, life expectancy, kilocalories per person per day, literacy, mean years of schooling, educational equality and health equality and downloaded from Gapminder.org/data.

# glimpse()
basic_tidy %>% glimpse()

## Rows: 3,200
## Columns: 3
## $ country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "A…
## $ Year    <int> 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 20…
## $ Score   <dbl> 25.8, 25.8, 26.3, 29.5, 29.5, 30.7, 30.7, 30.9, 34.0, 34.0, 34…

Research Questions

First, I want to know the trends of death toll because of murder and basic welfare index in the recent two decades in the five nations and then in the whole world. By doing this, I can have an overview about these two. Second, I want to explore the relationship between death toll and index. I also want to figure out how economy influences them. Third, I want to explore the relationship between death toll and economy, between index and economy. My assumption is that the former is negative and the latter is positive.

EDA Work

Requirements are a minimum of two (2) Exploratory Data Analysis graphs; and a minimum of two (2) sets of Summary Statistics which generally should directly complement, support, or relate to the EDA visualizations.

Graph One—The death toll by murder

I want to use histogram to see the trends of each country directly.

these_five <- c("Colombia", "Russia", "United States", "Venezuela", "South Africa")

murder_tidy %>%
  filter(country%in%these_five ) %>%
  ggplot(  aes(x = Year, y = Deaths, fill = country)) +
  geom_col() +
  facet_wrap(~country)

  labs(x = "year", 
       y = "Death toll",
       color = "Country",
       title = "Death Toll in Total by Years")

## $x
## [1] "year"
## 
## $y
## [1] "Death toll"
## 
## $colour
## [1] "Country"
## 
## $title
## [1] "Death Toll in Total by Years"
## 
## attr(,"class")
## [1] "labels"

Summary Stats

## Because we loaded plotly, we need to say which version of summarize
##  Use dplyr::summarize()
murder_tidy %>%
  group_by(Year, country) %>%
  filter(country%in%these_five ) %>%
  dplyr::summarize(Avg_Deaths = mean(Deaths, na.rm = TRUE),
                   SD_Deaths = sd(Deaths, na.rm = TRUE),
                   Min_Deaths = min(Deaths, na.rm = TRUE),
                   Max_Deaths = max(Deaths, na.rm = TRUE))

## # A tibble: 100 × 6
## # Groups:   Year [20]
##     Year country       Avg_Deaths SD_Deaths Min_Deaths Max_Deaths
##    <int> <chr>              <dbl>     <dbl>      <dbl>      <dbl>
##  1  1997 Colombia           24800        NA      24800      24800
##  2  1997 Russia             43400        NA      43400      43400
##  3  1997 South Africa       19800        NA      19800      19800
##  4  1997 United States      19300        NA      19300      19300
##  5  1997 Venezuela           4870        NA       4870       4870
##  6  1998 Colombia           24700        NA      24700      24700
##  7  1998 Russia             42500        NA      42500      42500
##  8  1998 South Africa       20400        NA      20400      20400
##  9  1998 United States      18700        NA      18700      18700
## 10  1998 Venezuela           5330        NA       5330       5330
## # … with 90 more rows

EDA notes

My plot tells me that the death toll of Colombia, Russia, and South Africa is decreasing while Venezuela, on the contary, is increasing. The death toll of United States does not have great change during the two decades. Besides, in many years, the average of Russia is the 1st out of these 5 countries, but it is keeping decreasing.

Graph Two—Basic welfare index

The same as the plot above.

## turn off scientific exponents?
# options(scipen = 99) # make hard to use
these_five <- c("Colombia", "Russia", "United States", "Venezuela", "South Africa")

basic_tidy %>%
  filter(country%in%these_five ) %>%
  ggplot(  aes(x = Year, y = Score, fill = country)) +
  geom_col() +
  facet_wrap(~country)

  labs(x = "Year", 
       y = "Score",
       color = "Country",
       title = "Basic Welfare Index by Years")

## $x
## [1] "Year"
## 
## $y
## [1] "Score"
## 
## $colour
## [1] "Country"
## 
## $title
## [1] "Basic Welfare Index by Years"
## 
## attr(,"class")
## [1] "labels"

Summary Stats

basic_tidy %>%
  group_by(Year, country) %>%
  filter(country%in%these_five ) %>%
  dplyr::summarize(Avg_Score = mean(Score, na.rm = TRUE),
                   SD_Score = sd(Score, na.rm = TRUE),
                   Min_Score = min(Score, na.rm = TRUE),
                   Max_Score = max(Score, na.rm = TRUE))

## # A tibble: 100 × 6
## # Groups:   Year [20]
##     Year country       Avg_Score SD_Score Min_Score Max_Score
##    <int> <chr>             <dbl>    <dbl>     <dbl>     <dbl>
##  1  1997 Colombia           57.6       NA      57.6      57.6
##  2  1997 Russia             60.2       NA      60.2      60.2
##  3  1997 South Africa       43.7       NA      43.7      43.7
##  4  1997 United States      74.3       NA      74.3      74.3
##  5  1997 Venezuela          59.9       NA      59.9      59.9
##  6  1998 Colombia           57.5       NA      57.5      57.5
##  7  1998 Russia             60.3       NA      60.3      60.3
##  8  1998 South Africa       43.8       NA      43.8      43.8
##  9  1998 United States      74.3       NA      74.3      74.3
## 10  1998 Venezuela          59.9       NA      59.9      59.9
## # … with 90 more rows

EDA notes

Except Venezuela, all the index of countries have slightly increased during the two decades while Venezuela has slightly decreased. With the result in last part, is there any relationship between death toll by murder and basic welfare index?

Graph Extra—The global trends of both two data

In this part, I want to figure out the trends of these two data in the world.

murder_global <- gapminder_dat  %>%
  left_join(murder_tidy, by = c("year" = "Year", "country")) %>%
  tibble() %>%
  select(country, continent, region, decade, year, Deaths)


murder_global

## # A tibble: 10,545 × 6
##    country             continent region                    decade  year Deaths
##    <chr>               <fct>     <fct>                     <fct>  <int>  <dbl>
##  1 Albania             Europe    Southern Europe           1960s   1960     NA
##  2 Algeria             Africa    Northern Africa           1960s   1960     NA
##  3 Angola              Africa    Middle Africa             1960s   1960     NA
##  4 Antigua and Barbuda Americas  Caribbean                 1960s   1960     NA
##  5 Argentina           Americas  South America             1960s   1960     NA
##  6 Armenia             Asia      Western Asia              1960s   1960     NA
##  7 Aruba               Americas  Caribbean                 1960s   1960     NA
##  8 Australia           Oceania   Australia and New Zealand 1960s   1960     NA
##  9 Austria             Europe    Western Europe            1960s   1960     NA
## 10 Azerbaijan          Asia      Western Asia              1960s   1960     NA
## # … with 10,535 more rows

murder_global %>%
  na.omit() %>%
  ggplot( aes(x = Deaths, fill = continent)) +
  geom_density(alpha = 0.4) +
  facet_wrap(~decade, scales = "free") +
  scale_x_log10() +
  labs(title = "Death Toll by Murder",
       x = "Death toll(log10)") +
  theme(legend.position = "top")

murder_global <- gapminder_dat  %>%
  left_join(murder_tidy, by = c("year" = "Year", "country")) %>%
  tibble() %>%
  select(country, continent, region, decade, year, Deaths)


murder_global

## # A tibble: 10,545 × 6
##    country             continent region                    decade  year Deaths
##    <chr>               <fct>     <fct>                     <fct>  <int>  <dbl>
##  1 Albania             Europe    Southern Europe           1960s   1960     NA
##  2 Algeria             Africa    Northern Africa           1960s   1960     NA
##  3 Angola              Africa    Middle Africa             1960s   1960     NA
##  4 Antigua and Barbuda Americas  Caribbean                 1960s   1960     NA
##  5 Argentina           Americas  South America             1960s   1960     NA
##  6 Armenia             Asia      Western Asia              1960s   1960     NA
##  7 Aruba               Americas  Caribbean                 1960s   1960     NA
##  8 Australia           Oceania   Australia and New Zealand 1960s   1960     NA
##  9 Austria             Europe    Western Europe            1960s   1960     NA
## 10 Azerbaijan          Asia      Western Asia              1960s   1960     NA
## # … with 10,535 more rows

Three_decades <- c("1990s", "2000s", "2010s")

murder_global_stats <- murder_global %>% 
  filter(decade %in% Three_decades ) %>%
  ggplot( aes(x = continent, y = log10(Deaths), fill = continent)) +
  geom_col(alpha = 0.5) +
  facet_wrap(~decade) +
  labs(title = "Death toll by murder",
       y = "Death Toll around the World",
       x = "") +
  coord_flip() +
  guides(fill = "none")

murder_global_stats

### EDA note

In this part, it is found that death toll in the world increased between 1990s and 2000s, and then decreased between 2000s and 2010s.

Same for BWI

basic_global <- gapminder_dat  %>%
  left_join(basic_tidy, by = c("year" = "Year", "country")) %>%
  tibble() %>%
  select(country, continent, region, decade, year, Score)


basic_global

## # A tibble: 10,545 × 6
##    country             continent region                    decade  year Score
##    <chr>               <fct>     <fct>                     <fct>  <int> <dbl>
##  1 Albania             Europe    Southern Europe           1960s   1960    NA
##  2 Algeria             Africa    Northern Africa           1960s   1960    NA
##  3 Angola              Africa    Middle Africa             1960s   1960    NA
##  4 Antigua and Barbuda Americas  Caribbean                 1960s   1960    NA
##  5 Argentina           Americas  South America             1960s   1960    NA
##  6 Armenia             Asia      Western Asia              1960s   1960    NA
##  7 Aruba               Americas  Caribbean                 1960s   1960    NA
##  8 Australia           Oceania   Australia and New Zealand 1960s   1960    NA
##  9 Austria             Europe    Western Europe            1960s   1960    NA
## 10 Azerbaijan          Asia      Western Asia              1960s   1960    NA
## # … with 10,535 more rows

basic_global %>%
  na.omit() %>%
  ggplot( aes(x = Score, fill = continent)) +
  geom_density(alpha = 0.4) +
  facet_wrap(~decade, scales = "free") +
  labs(title = "Basic Welfare Index in continents",
       x = "Scores") +
  theme(legend.position = "top")

basic_global <- gapminder_dat  %>%
  left_join(basic_tidy, by = c("year" = "Year", "country")) %>%
  tibble() %>%
  select(country, continent, region, decade, year, Score)


basic_global

## # A tibble: 10,545 × 6
##    country             continent region                    decade  year Score
##    <chr>               <fct>     <fct>                     <fct>  <int> <dbl>
##  1 Albania             Europe    Southern Europe           1960s   1960    NA
##  2 Algeria             Africa    Northern Africa           1960s   1960    NA
##  3 Angola              Africa    Middle Africa             1960s   1960    NA
##  4 Antigua and Barbuda Americas  Caribbean                 1960s   1960    NA
##  5 Argentina           Americas  South America             1960s   1960    NA
##  6 Armenia             Asia      Western Asia              1960s   1960    NA
##  7 Aruba               Americas  Caribbean                 1960s   1960    NA
##  8 Australia           Oceania   Australia and New Zealand 1960s   1960    NA
##  9 Austria             Europe    Western Europe            1960s   1960    NA
## 10 Azerbaijan          Asia      Western Asia              1960s   1960    NA
## # … with 10,535 more rows

Three_decades <- c("1990s", "2000s", "2010s")

basic_global_stats <- basic_global %>% 
  filter(decade %in% Three_decades ) %>%
  ggplot( aes(x = continent, y = Score, fill = continent)) +
  geom_col(alpha = 0.5) +
  facet_wrap(~decade) +
  labs(title = "Basic Welfare Index around the World",
       y = "BWI in continents",
       x = "") +
  coord_flip() +
  guides(fill = "none")

basic_global_stats

EDA note

For BWI, it seems that it has similar trend to Death toll.

Summary Stats

murder_global %>%
  group_by(decade) %>%
  count(continent)

## # A tibble: 30 × 3
## # Groups:   decade [6]
##    decade continent     n
##    <fct>  <fct>     <int>
##  1 1960s  Africa      510
##  2 1960s  Americas    360
##  3 1960s  Asia        470
##  4 1960s  Europe      390
##  5 1960s  Oceania     120
##  6 1970s  Africa      510
##  7 1970s  Americas    360
##  8 1970s  Asia        470
##  9 1970s  Europe      390
## 10 1970s  Oceania     120
## # … with 20 more rows

  ## Okay, but 
murder_global_stats <- murder_global %>%
  na.omit() %>%
   group_by(decade, continent, country)%>%
  dplyr::summarize(Avg_Mur = mean(Deaths, na.rm = TRUE),
                   Median_Mur = mean(Deaths, na.rm = TRUE),
                   SD_Mur = sd(Deaths, na.rm = TRUE),
                   Min_Mur = min(Deaths, na.rm = TRUE),
                   Max_Mur = max(Deaths, na.rm = TRUE))
murder_global_stats

## # A tibble: 540 × 8
## # Groups:   decade, continent [15]
##    decade continent country            Avg_Mur Median_Mur SD_Mur Min_Mur Max_Mur
##    <fct>  <fct>     <chr>                <dbl>      <dbl>  <dbl>   <dbl>   <dbl>
##  1 1990s  Africa    Algeria              462.       462.   8.96    452     468  
##  2 1990s  Africa    Angola               636.       636.  47.2     582     667  
##  3 1990s  Africa    Benin                168        168    6       162     174  
##  4 1990s  Africa    Botswana             110.       110.  41.1      69.8   152  
##  5 1990s  Africa    Burkina Faso         102.       102.   3.05     98.9   105  
##  6 1990s  Africa    Burundi               62.2       62.2  0.473    61.8    62.7
##  7 1990s  Africa    Cameroon             525        525   18.7     505     542  
##  8 1990s  Africa    Cape Verde            48.2       48.2  1.39     47      49.7
##  9 1990s  Africa    Central African R…   212        212   11       201     223  
## 10 1990s  Africa    Chad                 192        192   12.1     181     205  
## # … with 530 more rows

Same for BWI

  ## Okay, but 

basic_global %>%
  group_by(decade) %>%
  count(continent)

## # A tibble: 30 × 3
## # Groups:   decade [6]
##    decade continent     n
##    <fct>  <fct>     <int>
##  1 1960s  Africa      510
##  2 1960s  Americas    360
##  3 1960s  Asia        470
##  4 1960s  Europe      390
##  5 1960s  Oceania     120
##  6 1970s  Africa      510
##  7 1970s  Americas    360
##  8 1970s  Asia        470
##  9 1970s  Europe      390
## 10 1970s  Oceania     120
## # … with 20 more rows

basic_global_stats <- basic_global %>%
  na.omit() %>%
   group_by(decade, continent, country)%>%
  dplyr::summarize(Avg_Sco = mean(Score, na.rm = TRUE),
                   Median_Sco = mean(Score, na.rm = TRUE),
                   SD_Sco = sd(Score, na.rm = TRUE),
                   Min_Sco = min(Score, na.rm = TRUE),
                   Max_Sco = max(Score, na.rm = TRUE))
basic_global_stats

## # A tibble: 461 × 8
## # Groups:   decade, continent [15]
##    decade continent country            Avg_Sco Median_Sco SD_Sco Min_Sco Max_Sco
##    <fct>  <fct>     <chr>                <dbl>      <dbl>  <dbl>   <dbl>   <dbl>
##  1 1990s  Africa    Algeria               50.5       50.5  0.153    50.3    50.6
##  2 1990s  Africa    Angola                18.3       18.3  3.93     16      22.8
##  3 1990s  Africa    Benin                 31.0       31.0  0.115    30.9    31.1
##  4 1990s  Africa    Botswana              42.1       42.1  0.252    41.9    42.4
##  5 1990s  Africa    Burkina Faso          27.7       27.7  1.56     26.8    29.5
##  6 1990s  Africa    Burundi               27.3       27.3  0        27.3    27.3
##  7 1990s  Africa    Cameroon              29.6       29.6  1.56     28.7    31.4
##  8 1990s  Africa    Cape Verde            48.7       48.7  1.68     46.8    50  
##  9 1990s  Africa    Central African R…    23.1       23.1  0        23.1    23.1
## 10 1990s  Africa    Chad                  24.3       24.3  0.300    24      24.6
## # … with 451 more rows

EDA notes

The results show that either in the five countries or in the world the global trends of Death toll and Basic welfare index are similar.

User Viz

The requirement is minimum of one (1) user-interactive Geo-Spatial Mapping of the data. If for whatever reason this seems irrelevant to your project, please substitute with one (1) user-interactive Timeline or Time Series analysis.

You are NOT limited to one (1) only, but please use the Interactive feature for enlightenment not enterainment.

Likewise, please note that maps are often more useful when paired: say, the global state of 1990 vs. 2020, for example. Or, whichever years make the most sense for your data and questions.

Choropleth One

# Choose two good years for my data
murder_dat_1997 <- murder_tidy %>%
  filter(Year == 1997) %>% 
  complete(country = world_map2$country, 
           fill = (list(number = NA )) ) %>%
  left_join(world_map2, by = "country") %>%
  replace_na(list(Year = 1997))

murder_dat_1997

## # A tibble: 99,442 × 11
##    country  Year Deaths  long   lat group order code_2 code_3 code_num form_name
##    <chr>   <int>  <dbl> <dbl> <dbl> <dbl> <dbl> <chr>  <chr>     <dbl> <chr>    
##  1 Afghan…  1997   3270  74.9  37.2     2    12 AF     AFG           4 Islamic …
##  2 Afghan…  1997   3270  74.8  37.2     2    13 AF     AFG           4 Islamic …
##  3 Afghan…  1997   3270  74.8  37.2     2    14 AF     AFG           4 Islamic …
##  4 Afghan…  1997   3270  74.7  37.3     2    15 AF     AFG           4 Islamic …
##  5 Afghan…  1997   3270  74.7  37.3     2    16 AF     AFG           4 Islamic …
##  6 Afghan…  1997   3270  74.7  37.3     2    17 AF     AFG           4 Islamic …
##  7 Afghan…  1997   3270  74.6  37.2     2    18 AF     AFG           4 Islamic …
##  8 Afghan…  1997   3270  74.4  37.2     2    19 AF     AFG           4 Islamic …
##  9 Afghan…  1997   3270  74.4  37.1     2    20 AF     AFG           4 Islamic …
## 10 Afghan…  1997   3270  74.5  37.1     2    21 AF     AFG           4 Islamic …
## # … with 99,432 more rows

murder_map_1997 <- murder_dat_1997 %>%
  filter(code_3 != "ATA") %>%
   ggplot(aes(x = long, 
             y = lat, 
             group = group, 
             label = country)) +
  geom_polygon(aes(fill = Deaths) )+
  scale_fill_viridis_c(option = "C") +
  labs(fill = "Deaths",
       title = "Death Toll by Murder for 1997") +
  theme_void()

# interactive version
plotly::ggplotly(murder_map_1997)

murder_map_1997

murder_dat_2016 <- murder_tidy %>%
  filter(Year == 2016) %>% 
  complete(country = world_map2$country, 
           fill = (list(number = NA )) ) %>%
  left_join(world_map2, by = "country") %>%
  replace_na(list(Year = 2016))

murder_dat_2016

## # A tibble: 99,442 × 11
##    country  Year Deaths  long   lat group order code_2 code_3 code_num form_name
##    <chr>   <int>  <dbl> <dbl> <dbl> <dbl> <dbl> <chr>  <chr>     <dbl> <chr>    
##  1 Afghan…  2016   6270  74.9  37.2     2    12 AF     AFG           4 Islamic …
##  2 Afghan…  2016   6270  74.8  37.2     2    13 AF     AFG           4 Islamic …
##  3 Afghan…  2016   6270  74.8  37.2     2    14 AF     AFG           4 Islamic …
##  4 Afghan…  2016   6270  74.7  37.3     2    15 AF     AFG           4 Islamic …
##  5 Afghan…  2016   6270  74.7  37.3     2    16 AF     AFG           4 Islamic …
##  6 Afghan…  2016   6270  74.7  37.3     2    17 AF     AFG           4 Islamic …
##  7 Afghan…  2016   6270  74.6  37.2     2    18 AF     AFG           4 Islamic …
##  8 Afghan…  2016   6270  74.4  37.2     2    19 AF     AFG           4 Islamic …
##  9 Afghan…  2016   6270  74.4  37.1     2    20 AF     AFG           4 Islamic …
## 10 Afghan…  2016   6270  74.5  37.1     2    21 AF     AFG           4 Islamic …
## # … with 99,432 more rows

murder_map_2016 <- murder_dat_2016 %>%
  filter(code_3 != "ATA") %>%
   ggplot(aes(x = long, 
             y = lat, 
             group = group, 
             label = country)) +
  geom_polygon(aes(fill = Deaths) )+
  scale_fill_viridis_c(option = "C") +
  labs(fill = "Deaths",
       title = "Death Toll by Murder, 2016") +
  theme_void()

# interactive version
plotly::ggplotly(murder_map_2016)

murder_map_2016

## remove Antarctica "ATA" if you wish




## run ggplot version to test


# plotly::ggplotly(nuke_1975_map)

Notes: The main global trend of Death toll is decreasing. It is improving.

Choropleth Two

# Similar to above

basic_dat_1997 <- basic_tidy %>%
  filter(Year == 1997) %>% 
  complete(country = world_map2$country, 
           fill = (list(nuclear_per = NA)) ) %>%
  left_join(world_map2, by = "country") %>%
  replace_na(list(Year = 1997))


## remove Antarctica "ATA"

basic_map_1997 <- basic_dat_1997   %>%
  filter(code_3 != "ATA") %>%
  ggplot(aes(x = long, 
             y = lat, 
             group = group, 
             label = country)) +
  geom_polygon(aes(fill = Score) ) +
  scale_fill_viridis_c(option = "C") +
  theme_void() +
  labs(fill = "Score",
       title = "Basic Welfare Index, 1997")



basic_map_1997

plotly::ggplotly(basic_map_1997)

basic_dat_2016 <- basic_tidy %>%
  filter(Year == 2016) %>% 
  complete(country = world_map2$country, 
           fill = (list(nuclear_per = NA)) ) %>%
  left_join(world_map2, by = "country") %>%
  replace_na(list(Year = 2016))


## remove Antarctica "ATA"

basic_map_2016 <- basic_dat_2016   %>%
  filter(code_3 != "ATA") %>%
  ggplot(aes(x = long, 
             y = lat, 
             group = group, 
             label = country)) +
  geom_polygon(aes(fill = Score) ) +
  scale_fill_viridis_c(option = "C") +
  theme_void() +
  labs(fill = "Score",
       title = "Basic Welfare Index, 2016")



basic_map_2016

plotly::ggplotly(basic_map_2016)

Notes: Also improving.

Timeline

OPTIONAL. One numeric variable over a range of years. Use color to show country (or region, etc).

# Save if good.
big_five_1 <- bind_rows(mine_df)

murder_five <- big_five_1 %>% 
  left_join(murder_tidy, by = c("country", "year" = "Year") )


# I will save my plotly graph if I like 
# I have added an additional color command

five_murder_pg <- murder_five %>%
  plot_ly(x = ~year, 
          y = ~Deaths, 
          type = "scatter",
          mode = "lines+markers",
          color = ~country,
          colors = "Set3",
          showlegend = TRUE ) %>%
  layout( xaxis = list(title = "Year"),
    yaxis = list(title = "Death Toll by Murder per Nation"),
    title = "Big Five: Death Toll by Murder per Nation" )

# Run it
five_murder_pg

Notes: It is strange that the death toll of Venezuela is rising while other four is falling.

big_five_2 <- bind_rows(mine_df)

BWI_five <- big_five_2 %>% 
  left_join(basic_tidy, by = c("country", "year" = "Year") )


# I will save my plotly graph if I like 
# I have added an additional color command

five_BWI_pg <- BWI_five %>%
  plot_ly(x = ~year, 
          y = ~Score, 
          type = "scatter",
          mode = "lines+markers",
          color = ~country,
          colors = "Set3",
          showlegend = TRUE ) %>%
  layout( xaxis = list(title = "Year"),
    yaxis = list(title = "Basic Welfare Index per Nation"),
    title = "Big Five: Basic Welfare Index per Nation" )

# Run it
five_BWI_pg

Notes: The index of Venezuela is also different from the other four countries.

Model or Hypothesis Test

This part has three requirements:

A useful visualization which suggests either your inference or hypothesis;
The model or test output in a readable format (not the base output);
An explanation of what the model or test output reveals.

lm Viz

A ggplot version is fine.

## GGplot version
lm_5M <- murder_five %>% 
  filter(between(year, 1997, 2016))%>%
  ggplot( aes(x = year, 
              y = Deaths,
              size = gdp_cap_ppp,
              color = country) ) + 
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", show.legend = FALSE) +
  labs(title = "Five Nations",
       y ="Year", 
       x = "Death Toll by Murder + Nations",
       color = "Nation",
       text = "Per Capita (PPP)",
       size = "PPP",
       subtitle = "Death Toll ~ gdp_cap_ppp + country + year") 

lm_5M

# clean up the x axis?

lm_5B <- BWI_five %>% 
  filter(between(year, 1997, 2016))%>%
  ggplot( aes(x = year, 
              y = Score,
              size = gdp_cap_ppp,
              color = country) ) + 
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", show.legend = FALSE) +
  labs(title = "Five Nations",
       y ="Year", 
       x = "Basic Welfare Index + Nations",
       color = "Nation",
       text = "Per Capita (PPP)",
       size = "PPP",
       subtitle = "Basic Welfare Index ~ gdp_cap_ppp + country + year") 

lm_5B

A plotly version requires more coding and care.

## GPPLot to plot
## Add text for hover
## Round down numbers for better viewing
## Add custom axis breaks?
## Remove the geom_smooth which does not play well in plotly

lm_graph_5M <- murder_five %>% 
  filter(between(year, 1997, 2016))%>%
  mutate(across(where(is.numeric), round, 2))%>%
  ggplot( aes(x = year, 
              y = Deaths,
              size = gdp_cap_ppp,
              color = country, 
              text = paste0("Nation: ", country,
                           "<br />Death Toll: ", Deaths,  
                           "<br />Per Capita GDP (PPP): ", gdp_cap_ppp,
                           "<br />Year:", year )) ) + 
  geom_point(alpha = 0.7) +
  labs(title = "Five Nations",
       y ="Death Toll", 
       x = "Year + Per Capita GDP (PPP) + Nation",
       color = "Nation",
       text = "Per Capita (PPP)") +
  scale_x_continuous() +  
  theme_minimal() +
  guides(size = "none")

#lm_graph_5ey



lm_graph_5M <- plotly::ggplotly(lm_graph_5M, tooltip = "text" )

lm_graph_5M

Notes: The result of United States and Russia proves my “negative” assumption may be wrong. The death toll of Russia is larger that United States but Russia’s GDP is smaller than the US’s.

lm_graph_5B <- BWI_five %>% 
  filter(between(year, 1997, 2016))%>%
  mutate(across(where(is.numeric), round, 2))%>%
  ggplot( aes(x = year, 
              y = Score,
              size = gdp_cap_ppp,
              color = country, 
              text = paste0("Nation: ", country,
                           "<br />Score: ", Score,  
                           "<br />Per Capita GDP (PPP): ", gdp_cap_ppp,
                           "<br />Year:", year )) ) + 
  geom_point(alpha = 0.7) +
  labs(title = "Five Nations",
       y ="Score", 
       x = "Year + Per Capita GDP (PPP) + Nation",
       color = "Nation",
       text = "Per Capita (PPP)") +
  scale_x_continuous() +  
  theme_minimal() +
  guides(size = "none")

#lm_graph_5ey



lm_graph_5B <- plotly::ggplotly(lm_graph_5B, tooltip = "text" )

lm_graph_5B

lm Output

my_fm <- as.formula("Deaths ~ gdp_cap_ppp + country + year")

# model
mod_5M <- lm(formula = my_fm , data = murder_five)

# summary(mod_5ey) ## the standard but messy output

# output to tidy
clean__mod_5M <- broom::tidy(mod_5M, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), round, 4))

# output -- use DT later
clean__mod_5M

## # A tibble: 7 × 7
##   term                   estimate std.error statistic p.value conf.low conf.high
##   <chr>                     <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
## 1 (Intercept)          -47311.      2.27e+5    -0.208  0.835   -4.98e5   4.04e+5
## 2 gdp_cap_ppp              -0.912   2.08e-1    -4.38   0       -1.33e0  -4.98e-1
## 3 countryRussia         23498.      2.31e+3    10.2    0        1.89e4   2.81e+4
## 4 countrySouth Africa    1185.      1.17e+3     1.01   0.315   -1.14e3   3.51e+3
## 5 countryUnited States  36179.      8.88e+3     4.08   0.0001   1.86e4   5.38e+4
## 6 countryVenezuela      -3977.      1.94e+3    -2.05   0.0436  -7.84e3  -1.16e+2
## 7 year                     38.9     1.14e+2     0.340  0.734   -1.88e2   2.66e+2

# see key stats
mod_stats <- broom::glance(mod_5M)
mod_stats

## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.856         0.846 3695.      91.8 7.29e-37     6  -960. 1935. 1956.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

my_fm <- as.formula("Score ~ gdp_cap_ppp + country + year")

# model
mod_5B <- lm(formula = my_fm , data = BWI_five)

# summary(mod_5ey) ## the standard but messy output

# output to tidy
clean__mod_5B <- broom::tidy(mod_5B, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), round, 4))

# output -- use DT later
clean__mod_5B

## # A tibble: 7 × 7
##   term                  estimate std.error statistic p.value  conf.low conf.high
##   <chr>                    <dbl>     <dbl>     <dbl>   <dbl>     <dbl>     <dbl>
## 1 (Intercept)          -516.       94.9      -5.44    0      -705.     -328.    
## 2 gdp_cap_ppp             0.0004    0.0001    4.17    0.0001    0.0002    0.0005
## 3 countryRussia           1.97      0.966     2.04    0.0437    0.0567    3.89  
## 4 countrySouth Africa   -16.2       0.490   -33.1     0       -17.2     -15.2   
## 5 countryUnited States    0.284     3.71      0.0765  0.939    -7.08      7.65  
## 6 countryVenezuela       -3.05      0.812    -3.75    0.0003   -4.66     -1.44  
## 7 year                    0.286     0.0477    6.00    0         0.192     0.381

# see key stats
mod_stats <- broom::glance(mod_5B)
mod_stats

## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.981         0.979  1.54      785. 2.41e-77     6  -182.  379.  400.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

lm Notes

My assumption of the relationships between death toll and economy, index and economy seems wrong and needs further exploration.

Data Table

Never DUMP the data on the user, but always try to make it available. For a large data set, a source link is fine. For the data sets we are using, we can do so in the report dashboard. The libraries DT and reactable are good choices.

Main data set

Using DT.

# Trim it a bit  example below
##  data %>% select(- c(continent, region, gdp)) %>%  DT::datatable()
mine_df %>% 
  select(- c(continent, region, gdp)) %>% 
  rename(infant_mort = infant_mortality, life_exp = life_expectancy, ppp = gdp_cap_ppp) %>%
  DT::datatable(., class = "compact")

Save Your Work

As a general rule, I would use the dashboard for “serving” not “cooking.” You can make some final touches for the presentation. But I do not recommend building all the plots from stratch in the dashboard.

If you created a native Plotly graph, save that as it. If you created a GGplot graph that you want to use plotly::ggploty() on, save it as a ggplot. Do the touch-up work in the dash.

Likewise, save “as is” your tidy model output results, summary stats, and at least one larger data set. Add the touch-up code, DT::datatable() or reactable in the dashboard.

Get your results

Run ls() to see what’s in your environment.

ls()

##  [1] "basic_dat_1997"      "basic_dat_2016"      "basic_global"       
##  [4] "basic_global_stats"  "basic_map_1997"      "basic_map_2016"     
##  [7] "basic_tidy"          "big_five_1"          "big_five_2"         
## [10] "BWI_five"            "clean__mod_5B"       "clean__mod_5M"      
## [13] "country_ISO_codes2"  "five_BWI_pg"         "five_murder_pg"     
## [16] "gapminder_dat"       "lm_5B"               "lm_5M"              
## [19] "lm_graph_5B"         "lm_graph_5M"         "mine_df"            
## [22] "mod_5B"              "mod_5M"              "mod_stats"          
## [25] "murder_dat_1997"     "murder_dat_2016"     "murder_five"        
## [28] "murder_global"       "murder_global_stats" "murder_map_1997"    
## [31] "murder_map_2016"     "murder_tidy"         "my_fm"              
## [34] "these_five"          "Three_decades"       "world_map2"

Now, choose the items you want to keep and save them as a vector. Or, be lazy and just save everything! But this might slow you down when knitting the dashboard.

Save your results

dash_results <- c( "basic_dat_1997", "basic_dat_2016", "basic_global", "basic_global_stats", "basic_map_1997", "basic_map_2016", "basic_tidy", "big_five_1", "big_five_2", "BWI_five", "clean__mod_5B", "clean__mod_5M", "country_ISO_codes2", "five_BWI_pg", "gapminder_dat", "lm_5B", "lm_5M", "lm_graph_5B", "lm_graph_5M", "mine_df", "mod_5B", "mod_5M", "mod_stats", "murder_dat_1997", "murder_dat_2016", "murder_five", "murder_global", "murder_global_stats", "murder_map_1997", "murder_map_2016", "murder_tidy", "my_fm", "these_five", "Three_decades", "world_map2" )
save(list = dash_results, file = here::here("data", "dashboard_data", "dash_results.rda"))





####  OR
save.image( file = here::here("data", "dashboard_data","big_mess_images.RData"))

Either way, be sure to have a list of your objects for each section of your dashboard. Consider mapping out your dashboard by hand first. Then code it. Design before coding to tell a story.

Final Project Worksheet

2022-06-08

Data

Gapminder Case Study

Gapminder Download One

Gapminder Download Two

Research Questions

EDA Work

Graph One—The death toll by murder

Summary Stats

EDA notes

Graph Two—Basic welfare index

Summary Stats

EDA notes

Graph Extra—The global trends of both two data

Same for BWI

EDA note

Summary Stats

Same for BWI

EDA notes

User Viz

Choropleth One

Choropleth Two

Timeline

Model or Hypothesis Test

lm Viz

lm Output

lm Notes

Data Table

Main data set

Save Your Work

Get your results

Save your results