HOMEWORK #9: Interactive Graphic
My goal was to make an interactive world map featuring the cause-of-death data set from Kaggle.
-The code below wrangles the cause-of-death data so that it can be drawn on a world map.
-I then used plotly to expose the data through the hover tooltip (hoverinfo).
-Then I decided to make a Shiny app using this data and the world map. My objective was to have the user choose a year and a specific cause of death, and then display a choropleth of the world showing how many people died of that disease/condition in that year. The purpose of such an app is to show global trends in causes of death. For example, we can see that deaths as a result of nutritional deficiencies increased between 1990 and 2019.
This Shiny app worked in my local environment. However, when I published the app with the data from all years (1990-2019) joined with the world map data, which produced a dataset of almost 3 million rows, I ran into an “out of memory” error.
After some troubleshooting I was still not able to get the app to function correctly. I was tempted to upgrade my account to get more than 1 GB of memory; however, I suspect the issue is more likely my inexperience with Shiny and my inefficient code.
I was able to publish a pared-down version of the app, which prompts the user to select a cause of death of interest and displays a world choropleth of that data for the year 2000.
In order to make this app more polished as a portfolio piece, I will:
-Transform the data to show the number of deaths per 100k in each country, as opposed to a total death count (in the current dataset, China appears red quite often due to population size).
-Include data for each year in a way that doesn’t prevent the Shiny app from functioning as a web app.
-Give the user a choice of how the data is represented graphically (choropleth, bar graph, scatterplot).
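As a rough sketch of the first item, the conversion to deaths per 100k might look like the following. The data frame and its values are made up for illustration; only the `2000 Population` column name comes from the population file used in this homework, and a per-year version would need population figures for every year.

```r
library(dplyr)

# Toy stand-in for the joined year-2000 data (all values are made up).
deaths <- data.frame(
  region = c("A", "B"),
  Meningitis = c(500, 50),
  Malaria = c(2000, 0),
  `2000 Population` = c(1e7, 5e5),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Every column except the region name and the population is a cause of death.
cause_cols <- setdiff(names(deaths), c("region", "2000 Population"))

# Convert each raw count to deaths per 100,000 people.
rates <- deaths %>%
  mutate(across(all_of(cause_cols), ~ .x / `2000 Population` * 1e5))

rates
```

In this toy example region B has far fewer meningitis deaths than region A but twice the rate, which is the kind of difference a raw count hides.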
This journey into the world of Shiny has been challenging but extremely rewarding.
DATA WRANGLING:
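#loading packages: tidyverse for read_csv, dplyr and ggplot2; plotly for ggplotly
#(map_data and coord_map also need the maps and mapproj packages to be installed)
library(tidyverse)
library(plotly)
#reading in dataset
cod = read_csv("cause_of_deaths.csv")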
## Rows: 6120 Columns: 34
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Country/Territory, Code
## dbl (32): Year, Meningitis, Alzheimer's Disease and Other Dementias, Parkins...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#world pop data for year 2000
pop = read_csv("world_population.csv")
## Rows: 234 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): CCA3, Country/Territory, Capital, Continent
## dbl (13): Rank, 2022 Population, 2020 Population, 2015 Population, 2010 Popu...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
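#renaming to make region column the same
pop = pop %>%
  rename(region = "Country/Territory") %>%
  select(region, `2000 Population`)
#anchored with ^ and $ so only the exact name is replaced, not longer names that contain it
#note: the Republic of the Congo and the Democratic Republic of the Congo are different countries, so this mapping should be double-checked
pop$region = gsub("^Republic of the Congo$", "Democratic Republic of the Congo", pop$region)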
#rename country column to 'region' (for joining later to map data)
cod = cod %>%
rename(region = "Country/Territory")
#making sure "United States" exists
cod %>%
filter(region == "United States")
cod %>%
filter(region == "Congo")
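#changing congo to full name (anchored with ^ and $ so only the exact name "Congo" is replaced, not other names that contain it)
cod$region = gsub("^Congo$", "Democratic Republic of the Congo", cod$region)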
cod
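Checking names one country at a time is slow. A possible shortcut, sketched here with made-up stand-in tables rather than the real data, is to use `anti_join` to list every region in the death data that has no match in the map data:

```r
library(dplyr)

# Toy stand-ins for the region names in each source (hypothetical values).
cod_names <- data.frame(
  region = c("United States", "Congo", "France"),
  stringsAsFactors = FALSE
)
map_names <- data.frame(
  region = c("USA", "Democratic Republic of the Congo", "France"),
  stringsAsFactors = FALSE
)

# Rows of cod_names whose region has no match in map_names.
unmatched <- anti_join(cod_names, map_names, by = "region")
unmatched$region
```

With the real objects, and once the map data has been loaded, the equivalent call would be along the lines of `anti_join(distinct(cod, region), distinct(world_map, region), by = "region")`.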
cod2000 = cod %>% filter(Year == 2000)
cod2000
world_map = map_data("world")
#testing world map
world_map %>%
  ggplot(aes(map_id = region)) +
  geom_map(map = world_map) +
  expand_limits(x = world_map$long, y = world_map$lat)
#checking names of countries
world_map %>%
filter(region == "United States")
#renaming USA to United States
world_map$region = gsub("USA", "United States", world_map$region)
world_map %>%
filter(region == "Democratic Republic of the Congo")
#only 1 year: 2000
ds = world_map %>% left_join(cod2000)
## Joining, by = "region"
ds = ds %>% left_join(pop)
## Joining, by = "region"
ds
#now cause of death data can be mapped with geom_polygon in ggplot
#joining the data set with all the years. This is the BIG one, but I'd like the Shiny app to let the user pick the year. I will omit population data because I don't have populations for each year in the original data set.
#ds5 = cod %>% inner_join(world_map)
#ds5
#ds5 has almost 3 million rows: joining every year and every country with all the map data is just too big. I've deployed a Shiny app with it, but it crashes with an "out of memory" error. There is probably a cleaner way to handle this task.
cod_allyears = world_map %>% left_join(cod)
## Joining, by = "region"
cod_allyears
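One possible way to avoid holding the all-years join in memory is to keep the death data and the map data separate and join them only after filtering to the selected year and cause. The sketch below uses small made-up tables, and `choropleth_data` is a hypothetical helper, not something from the app as published:

```r
library(dplyr)

# Toy stand-ins (hypothetical values): 2 regions with 3 polygon vertices each,
# and death counts for 2 regions over 3 years.
map_toy <- data.frame(
  region = rep(c("A", "B"), each = 3),
  long = c(0, 1, 1, 5, 6, 6),
  lat = c(0, 0, 1, 5, 5, 6),
  group = rep(c(1, 2), each = 3),
  stringsAsFactors = FALSE
)
cod_toy <- data.frame(
  region = rep(c("A", "B"), times = 3),
  Year = rep(c(1990, 2000, 2019), each = 2),
  Meningitis = c(10, 20, 11, 21, 12, 22),
  stringsAsFactors = FALSE
)

# Hypothetical helper: filter to one year and one cause BEFORE joining, so the
# result has one row per polygon vertex instead of one row per vertex per year.
choropleth_data <- function(map_df, cod_df, selected_year, cause) {
  one_year <- cod_df %>%
    filter(Year == selected_year) %>%
    select(region, value = all_of(cause))
  left_join(map_df, one_year, by = "region")
}

small <- choropleth_data(map_toy, cod_toy, 2000, "Meningitis")

# Size of the full join, computed without building it.
rows_all_years <- nrow(map_toy) * n_distinct(cod_toy$Year)

nrow(small)
rows_all_years
```

In a Shiny server, a helper like this could be called inside a reactive expression with the user's selected year and cause, so only one year of joined data exists at a time. Whether that is enough to stay under the memory limit would need to be tested on the deployed app.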
ggplot(cod_allyears %>% filter(Year == 2000), aes(x = long, y = lat, fill = Meningitis)) +
  geom_polygon(aes(group = group), color = "coral1") +
  expand_limits(x = cod_allyears$long, y = cod_allyears$lat) +
  coord_map("mercator", xlim = c(-180, 180)) +
  theme(
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    axis.line.y.left = element_blank(),
    axis.line.x.bottom = element_blank(),
    axis.ticks = element_blank())
plot = ds %>%
  ggplot(aes(long, lat)) +
  geom_polygon(aes(group = group, fill = region), color = "black") +
  expand_limits(x = ds$long, y = ds$lat) +
  coord_map("mercator", xlim = c(-180, 180)) +
  theme(legend.position = "none")
ggplotly(plot)
this.year = 2019
plot = cod_allyears %>%
  filter(Year == this.year) %>%
  ggplot(aes(x = long, y = lat, group = group,
             label = `Parkinson's Disease`,
             label1 = `Alzheimer's Disease and Other Dementias`,
             label2 = `Meningitis`,
             label3 = `Nutritional Deficiencies`,
             label4 = `Interpersonal Violence`,
             label5 = `Drug Use Disorders`,
             label6 = `Lower Respiratory Infections`,
             label7 = `Self-harm`,
             label8 = `Environmental Heat and Cold Exposure`,
             label9 = `Diabetes Mellitus`,
             label10 = `Protein-Energy Malnutrition`,
             label11 = `Cirrhosis and Other Chronic Liver Diseases`,
             label12 = `Acute Hepatitis`,
             label13 = `Malaria`,
             label14 = `Maternal Disorders`,
             label15 = `Tuberculosis`,
             label16 = `Neonatal Disorders`,
             label17 = Neoplasms,
             label18 = `Chronic Kidney Disease`,
             label19 = `Road Injuries`,
             label20 = `Digestive Diseases`,
             label21 = `Parkinson's Disease`, #duplicate of `label` above, so one of the 31 cause columns is missing from the tooltip
             label22 = `Drowning`,
             label23 = `HIV/AIDS`,
             label24 = `Cardiovascular Diseases`,
             label25 = `Alcohol Use Disorders`,
             label26 = `Diarrheal Diseases`,
             label27 = `Conflict and Terrorism`,
             label28 = Poisonings,
             label29 = `Chronic Respiratory Diseases`,
             label30 = `Fire, Heat, and Hot Substances`
  )) +
  geom_polygon(aes(fill = region), color = "black") +
  expand_limits(x = cod_allyears$long, y = cod_allyears$lat) +
  coord_map("mercator", xlim = c(-180, 180)) +
  theme(legend.position = "none")
ggplotly(plot)
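Listing every cause as its own `label` aesthetic works, but it is easy to repeat or miss a column. An alternative, sketched here on a made-up two-row table, is to build a single hover string per row from a vector of cause names:

```r
# Toy stand-in for one year of the joined data (hypothetical values).
df <- data.frame(
  region = c("A", "B"),
  Meningitis = c(10, 20),
  `Road Injuries` = c(300, 45),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# The causes to show in the tooltip; with the real data this could be every
# column other than the map and identifier columns.
causes <- c("Meningitis", "Road Injuries")

# One "Cause: count" vector per cause, then pasted together row by row with
# HTML line breaks, which plotly renders as new lines in the tooltip.
cause_lines <- lapply(causes, function(cause) paste0(cause, ": ", df[[cause]]))
df$hover <- paste0(df$region, "<br>", do.call(paste, c(cause_lines, sep = "<br>")))

df$hover
```

The `hover` column can then be mapped to the `text` aesthetic in `aes()`, with the plot rendered by `ggplotly(plot, tooltip = "text")` so that only that string is shown.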