Transit Data Exploration

Load Dependencies

library(tidyverse)
library(janitor)
library(leaflet)
library(htmlwidgets)
library(htmltools)

Loading the data

Start with the Transit Data

1. Download from the TidyTuesday site, if you haven’t already done so and saved to a CSV.

# Get the transit_cost data. Save it as a clean file for future re-runs

tuesdata <- tidytuesdayR::tt_load('2021-01-05')
tuesdata <- tidytuesdayR::tt_load(2021, week = 2)

transit_cost <- tuesdata$transit_cost %>%
  drop_na(country) %>% #drop the trailing statistical information
  rename(code_2 = country)

dim(transit_cost) #537 20

transit_cost <- transit_cost %>%
  distinct()

dim(transit_cost)  #535 20

2. Save as a csv so that this step can be skipped in future efforts.

write_csv(transit_cost, "transit_cost.csv")

3. Load the saved dataset and convert percent tunneled to a numeric decimal.

transit_cost <- read_csv("transit_cost.csv")

#convert tunnel_per to a numeric so it can be aggregated...
transit_cost$tunnel_per <- as.numeric(sub("%","",transit_cost$tunnel_per))/100

dim(transit_cost) #535 20

## # A tibble: 6 x 20
##       e code_2 city     line  start_year end_year    rr length tunnel_per tunnel
##   <dbl> <chr>  <chr>    <chr> <chr>      <chr>    <dbl>  <dbl>      <dbl>  <dbl>
## 1  7136 CA     Vancouv~ Broa~ 2020       2025         0    5.7      0.877    5  
## 2  7137 CA     Toronto  Vaug~ 2009       2017         0    8.6      1        8.6
## 3  7138 CA     Toronto  Scar~ 2020       2030         0    7.8      1        7.8
## 4  7139 CA     Toronto  Onta~ 2020       2030         0   15.5      0.57     8.8
## 5  7144 CA     Toronto  Yong~ 2020       2030         0    7.4      1        7.4
## 6  7145 NL     Amsterd~ Nort~ 2003       2018         0    9.7      0.73     7.1
## # ... with 10 more variables: stations <dbl>, source1 <chr>, cost <dbl>,
## #   currency <chr>, year <dbl>, ppp_rate <dbl>, real_cost <dbl>,
## #   cost_km_millions <dbl>, source2 <chr>, reference <chr>

Loading additional supportive data

I decided, after looking at the transit data, that I wanted to explore the question of how committed countries around the world were to development and growth of public transit. I had a theory that the United States, which appears to spend the most in raw dollars, was among the least committed to building the infrastructure, and that it could be shown that others who have made that commitment have found ways to efficiently and more cost-effectively build more transit that serves a larger portion of its geographic area. I also wanted to explore how this differentiated in different regions of the world.

To do that, I had to find some supplemental sources, and found ways to add the area of cities (although I am not confident that this data measures what it should, as many cities report a variety of area in squared kilometers, including some that report land and water, some that report urban and metro, and no one seems to put together a comprehensive list. For that, I had to put together several sources.

1. Country and Continent data to match country codes with names and add region information.
Source: https://www.kaggle.com/statchaitya/country-to-continent/version/1

country_continent <- read_csv("countryContinent.csv") %>%
  clean_names()

## # A tibble: 6 x 9
##   country code_2 code_3 country_code iso_3166_2 continent sub_region region_code
##   <chr>   <chr>  <chr>         <dbl> <chr>      <chr>     <chr>            <dbl>
## 1 "Afgha~ AF     AFG               4 ISO 3166-~ Asia      Southern ~         142
## 2 "\xc5l~ AX     ALA             248 ISO 3166-~ Europe    Northern ~         150
## 3 "Alban~ AL     ALB               8 ISO 3166-~ Europe    Southern ~         150
## 4 "Alger~ DZ     DZA              12 ISO 3166-~ Africa    Northern ~           2
## 5 "Ameri~ AS     ASM              16 ISO 3166-~ Oceania   Polynesia            9
## 6 "Andor~ AD     AND              20 ISO 3166-~ Europe    Southern ~         150
## # ... with 1 more variable: sub_region_code <dbl>

List of cities with geocodes and population data.
Source: https://simplemaps.com/data/world-cities

worldcities <- read_csv("worldcities.csv") %>%
  clean_names()

## # A tibble: 6 x 11
##   city  city_ascii   lat   lng country iso2  iso3  admin_name capital population
##   <chr> <chr>      <dbl> <dbl> <chr>   <chr> <chr> <chr>      <chr>        <dbl>
## 1 Tokyo Tokyo      35.7  140.  Japan   JP    JPN   Tokyo      primary   37977000
## 2 Jaka~ Jakarta    -6.21 107.  Indone~ ID    IDN   Jakarta    primary   34540000
## 3 Delhi Delhi      28.7   77.2 India   IN    IND   Delhi      admin     29617000
## 4 Mumb~ Mumbai     19.0   72.8 India   IN    IND   Maharasht~ admin     23355000
## 5 Mani~ Manila     14.6  121.  Philip~ PH    PHL   Manila     primary   23088000
## 6 Shan~ Shanghai   31.2  121.  China   CN    CHN   Shanghai   admin     22120000
## # ... with 1 more variable: id <dbl>

List of cities with info on their geographic area
Source: http://www.citymayors.com/statistics/largest-cities-area-250.html

landmass <- read_csv("land_mass.csv") %>%
  clean_names() %>%
  arrange(country, city)

## # A tibble: 6 x 5
##    rank city         country   land_area pop_density
##   <dbl> <chr>        <chr>         <dbl>       <dbl>
## 1    21 Buenos Aires Argentina      2266        4950
## 2   112 Adelaide     Australia       729        1350
## 3    39 Brisbane     Australia      1603         950
## 4   243 Gold Coast   Australia       383        1100
## 5    25 Melbourne    Australia      2080        1500
## 6    76 Perth        Australia       964        1200

Clean the data and create a complete dataset with each project as level of analysis

Make changes to the datasets to facilitate matches

Change erroneous country codes in the transit data (non-ISO compliant)

transit_cost$code_2[transit_cost$code_2 == "UK"] <- "GB"

Rename cities in the transit data to match the names in the list of cities.

transit_cost$city[transit_cost$city == "Tel Aviv"] <- "Tel Aviv-Yafo"
transit_cost$city[transit_cost$city == "Ahmadabad"] <- "Ahmedabad"

Rename inaccurate city names in the list of cities to match with transit data.

worldcities$city_ascii[worldcities$city_ascii == "Tongshan"] <- "Xuzhou"

Modify the names of cities in the city list with geographic area in order to match with transit.

landmass$city[landmass$city == "Tokyo/Yokohama"] <- "Tokyo"
landmass$city[landmass$city == "New York Metro"] <- "New York"
landmass$city[landmass$city == "Montreal."] <- "Montreal"
landmass$city[landmass$city == "Ottawa/Hull"] <- "Ottawa"
landmass$city[landmass$city == "Tel Aviv"] <- "Tel Aviv-Yafo"
landmass$city[landmass$city == "Osaka/Kobe/Kyoto"] <- "Osaka"
landmass$city[landmass$city == "Seoul/Incheon"] <- "Seoul"
landmass$city[landmass$city == "Kuwait"] <- "Kuwait City"
landmass$city[landmass$city == "St Petersburg"] <- "Saint Petersburg"
landmass$city[landmass$city == "Nizhni Novgorod"] <- "Nizhniy Novgorod"
landmass$city[landmass$city == "San Francisco//Oakland"] <- "San Francisco"

Join info from suplemental datasets to transit cost data and fill in holes

Add country name and region/continent info to the transit data.

transit_cost_cont <- left_join(transit_cost, 
                           country_continent,
                           by = c("code_2")) %>%
  select(e, code_2, country, continent, sub_region, everything())

sum(is.na(transit_cost_cont$country)) 
# should be 0.  If it is 3, re-run recoding of UK change to GB
# and then, re-run the join

dim(transit_cost_cont) #535 28

Add info from the city list to the transit list.

transit_cost_geom <- left_join(transit_cost_cont,
                           worldcities,
                           by = c("city" = "city_ascii")) %>%
  distinct(city, line, start_year, end_year, .keep_all = TRUE) %>%
  rename(country = country.x) %>%
  select(-code_2,
         -code_3,
         -city.y,
         -country.y) %>%
  select(city, country, iso2, lat, lng, population, everything()) %>%
  arrange(country, city)

Note that Bahrain is not actually a city, but is listed as one in the transit data.
This is because the projects span several Bahrain cities. Keep Bahrain in the city list and set the geocode to the capital, Manana.

transit_cost_geom$iso2[transit_cost_geom$city == "Bahrain"] <- "BH"
transit_cost_geom$iso3[transit_cost_geom$city == "Bahrain"] <- "BHR"
transit_cost_geom$lat[transit_cost_geom$city == "Bahrain"] <- 26.2250
transit_cost_geom$lng[transit_cost_geom$city == "Bahrain"] <- 50.5775
transit_cost_geom$population[transit_cost_geom$city == "Bahrain"] <- 1501600

#rename dataset as exists to this point to "data"
data <- transit_cost_geom

Add the city geographic area information to the transit data.

data_2 <- left_join(data, 
                    landmass, 
                    by = "city") %>%
  rename(country = country.x) %>%
  select(-country.y,
         -rank)

dim(data_2) #535 36

Many of the cities were not included in the list of Geographic area of cities. That data needs to be added for each city, taken from the Wikipedia page for the city.

data_2$land_area[data_2$city == "Bahrain"] <- 785
data_2$land_area[data_2$city == "Dhaka"] <- 706
data_2$land_area[data_2$city == "Salvador"] <- 693
data_2$land_area[data_2$city == "Ad Dammam"] <- 800
data_2$land_area[data_2$city == "Salvador"] <- 693
data_2$land_area[data_2$city == "Ahmedabad"] <- 1866
data_2$land_area[data_2$city == "Amsterdam"] <- 219
data_2$land_area[data_2$city == "Bilbao"] <- 42
data_2$land_area[data_2$city == "Bucharest"] <- 1811
data_2$land_area[data_2$city == "Bursa"] <- 10422
data_2$land_area[data_2$city == "Busan"] <- 770
data_2$land_area[data_2$city == "Changchun"] <- 1855
data_2$land_area[data_2$city == "Changsha"] <- 3915
data_2$land_area[data_2$city == "Chengdu"] <- 4558
data_2$land_area[data_2$city == "Chiang Mai"] <- 2905
data_2$land_area[data_2$city == "Salvador"] <- 693
data_2$land_area[data_2$city == "Chongqing"] <- 5473
data_2$land_area[data_2$city == "Cologne"] <- 405
data_2$land_area[data_2$city == "Doha"] <- 132
data_2$land_area[data_2$city == "Dongguan"] <- 2465
data_2$land_area[data_2$city == "Dusseldorf"] <- 217
data_2$land_area[data_2$city == "Fuzhou"] <- 1768
data_2$land_area[data_2$city == "Geneva"] <- 16
data_2$land_area[data_2$city == "Guangzhou"] <- 3843
data_2$land_area[data_2$city == "Guiyang"] <- 2403
data_2$land_area[data_2$city == "Gurgaon"] <- 250
data_2$land_area[data_2$city == "Guangzhou"] <- 3843
data_2$land_area[data_2$city == "Hangzhou"] <- 8260
data_2$land_area[data_2$city == "Hanoi"] <- 320
data_2$land_area[data_2$city == "Hefei"] <- 839
data_2$land_area[data_2$city == "Izmir"] <- 2259
data_2$land_area[data_2$city == "Kaohsiung"] <- 363
data_2$land_area[data_2$city == "Karlsruhe"] <- 173
data_2$land_area[data_2$city == "Kharkiv"] <- 350
data_2$land_area[data_2$city == "Kochi"] <- 440
data_2$land_area[data_2$city == "Kunming"] <- 4013
data_2$land_area[data_2$city == "Kyiv"] <- 839
data_2$land_area[data_2$city == "Lanzhou"] <- 2433
data_2$land_area[data_2$city == "Leipzig"] <- 297
data_2$land_area[data_2$city == "Lucerne"] <- 37
data_2$land_area[data_2$city == "Malaga"] <- 827
data_2$land_area[data_2$city == "Malmo"] <- 332
data_2$land_area[data_2$city == "Mecca"] <- 760
data_2$land_area[data_2$city == "Leipzig"] <- 297
data_2$land_area[data_2$city == "Nagpur"] <- 394
data_2$land_area[data_2$city == "Nanchang"] <- 686
data_2$land_area[data_2$city == "Nanjing"] <- 1399
data_2$land_area[data_2$city == "Nanning"] <- 6559
data_2$land_area[data_2$city == "Nuremberg"] <- 186
data_2$land_area[data_2$city == "Oslo"] <- 480
data_2$land_area[data_2$city == "Panama City"] <- 275
data_2$land_area[data_2$city == "Prague"] <- 496
data_2$land_area[data_2$city == "Seville"] <- 140
data_2$land_area[data_2$city == "Sofia"] <- 5723
data_2$land_area[data_2$city == "Suzhou"] <- 2944
data_2$land_area[data_2$city == "Tainan"] <- 259
data_2$land_area[data_2$city == "Taiyuan"] <- 1460
data_2$land_area[data_2$city == "Thessaloniki"] <- 1286
data_2$land_area[data_2$city == "Wenzhou"] <- 1243
data_2$land_area[data_2$city == "Wuhan"] <- 1528
data_2$land_area[data_2$city == "Xi'an"] <- 5808
data_2$land_area[data_2$city == "Xiamen"] <- 1700
data_2$land_area[data_2$city == "Xuzhou"] <- 3037
data_2$land_area[data_2$city == "Zhengzhou"] <- 4271
data_2$land_area[data_2$city == "Zurich"] <- 88

Change names of countries with long awkward names to shortened version

project_data <- data_2

project_data$country[project_data$country == "Korea (Republic of)"] <- "South Korea"
project_data$country[project_data$country == "Iran (Islamic Republic of)"] <- "Iran"
project_data$country[project_data$country == "Russian Federation"] <- "Russia"
project_data$country[project_data$country == "Taiwan, Province of China"] <- "Taiwan"
project_data$country[project_data$country == "United Kingdom of Great Britain and Northern Ireland"] <- "United Kingdom"
project_data$country[project_data$country == "United States of America"] <- "United States"

The project dataset is now complete.

## # A tibble: 6 x 36
##   city     country iso2    lat   lng population     e continent sub_region line 
##   <chr>    <chr>   <chr> <dbl> <dbl>      <dbl> <dbl> <chr>     <chr>      <chr>
## 1 Buenos ~ Argent~ AR    -34.6 -58.4   16157000  7593 Americas  South Ame~ RER  
## 2 Melbour~ Austra~ AU    -37.8 145.     5078193  7280 Oceania   Australia~ Metr~
## 3 Perth    Austra~ AU    -32.0 116.     2059484  7587 Oceania   Australia~ Forr~
## 4 Sydney   Austra~ AU    -33.9 151.     5312163  7274 Oceania   Australia~ Metr~
## 5 Sydney   Austra~ AU    -33.9 151.     5312163  7275 Oceania   Australia~ Metr~
## 6 Vienna   Austria AT     48.2  16.4    1911191  7360 Europe    Western E~ Line~
## # ... with 26 more variables: start_year <chr>, end_year <chr>, rr <dbl>,
## #   length <dbl>, tunnel_per <dbl>, tunnel <dbl>, stations <dbl>,
## #   source1 <chr>, cost <dbl>, currency <chr>, year <dbl>, ppp_rate <dbl>,
## #   real_cost <dbl>, cost_km_millions <dbl>, source2 <chr>, reference <chr>,
## #   country_code <dbl>, iso_3166_2 <chr>, region_code <dbl>,
## #   sub_region_code <dbl>, iso3 <chr>, admin_name <chr>, capital <chr>,
## #   id <dbl>, land_area <dbl>, pop_density <dbl>

Create 2 dataframes for visualizations:

There will be two visualizations included in this project.

One is a map of all the cities that have transit projects reported. The circle size indicate their standing in an index I created by dividing the total length of the projects they have engaged (completed and not) by the area of the city (scaled from 0-100). The colors represent 5 bins of money the city spent or anticipates spending on the project pers square kilometer of the projects. As you click each city, more information about the city and the projects is available.

The second visualization is a bar graph that separates out the projects by sub-region, and represents average length of project by bar size (in 4 bins), average cost per project in color, and the percentage of projects completed by region at the time the data was publishedin white text on the bar.

Create dataframe with cities data to use for map visualization.

Determine the denominator for creation of a special variable for each city to create an index from 0-1, to be later scaled 0-100, for the total length of the city’s projects combined, divided by the square kilometers of the city. A value of 100 will represent the city with the greatest number of project kilometers per square kilometers of the city.

project_data %>%
  group_by(city) %>%
  mutate(km_index = sum(length)/land_area) %>%
  summary(km_index) 
#max = 1.01850 so divide the formula by that to create a 0-1 index

Note that the maximum value for project kilometers/square miles is 1.01850, so that will be the denominator in creating the final index.

Create the dataframe with city as unit of observation with data about all projects for that city on that row.

cities_data <- project_data %>%
  group_by(city) %>% 
  mutate(km_index = sum(length) / land_area / 1.01850,
         tot_length = sum(length),
         projects = n(),
         min_start = min(start_year),
         max_end = max(end_year),
         cost_usd = sum(as.numeric(real_cost)),
         tot_stations = sum(stations),
         percent_done = mean(as.numeric(tunnel_per), na.rm = T),
         cost_per_km = cost_usd / tot_length) %>%
  select(city, 
         country, 
         lat, 
         lng, 
         population, 
         continent, 
         sub_region, 
         km_index, 
         tot_length, 
         projects, 
         min_start, 
         max_end,
         cost_usd,
         land_area,
         tot_stations,
         percent_done,
         cost_per_km) %>%
  distinct()

Create friendly display names for all of the NA values, in the city data that may appear weird when added to visualization (map): min_start, max_end, percent_done, percent_completed, tot_stations.

#replace na in years with "Year?"
sum(is.na(cities_data$min_start)) #14
cities_data$start <- ifelse(is.na(cities_data$min_start), 
                                "Year?", 
                                cities_data$min_start)

sum(is.na(cities_data$max_end)) #22
cities_data$end <- ifelse(is.na(cities_data$max_end), 
                                "Year?", 
                                cities_data$max_end)

sum(is.na(cities_data$percent_done)) #10

# need to make % presentable before adding character values that will
# make it impossible to modify later. Create new variable for this

cities_data$percent_completed <- as.character(round(cities_data$percent_done * 100, 2))
cities_data$percent_completed <- ifelse(is.na(cities_data$percent_done), 
                              "unknown", 
                              cities_data$percent_completed)

cities_data$percent_completed

#replace na in "stations" with "unknown"

sum(is.na(cities_data$tot_stations)) #8
cities_data$stations <- ifelse(is.na(cities_data$tot_stations), 
                                   "unknown", 
                                   cities_data$tot_stations)

Clean up any wrong values for geocodes for cities (ie, Seville is not near Toronto)… Correct it in both the cities dataset and the project dataset.

cities_data$lat[cities_data$city == "Seville"] <- 37.3891
cities_data$lng[cities_data$city == "Seville"] <- -5.9845
project_data$lat[project_data$city == "Seville"] <- 37.3891
project_data$lng[project_data$city == "Seville"] <- -5.9845

## # A tibble: 6 x 21
## # Groups:   city [6]
##   city   country   lat   lng population continent sub_region km_index tot_length
##   <chr>  <chr>   <dbl> <dbl>      <dbl> <chr>     <chr>         <dbl>      <dbl>
## 1 Bueno~ Argent~ -34.6 -58.4   16157000 Americas  South Ame~  0.00867       20  
## 2 Melbo~ Austra~ -37.8 145.     5078193 Oceania   Australia~  0.00425        9  
## 3 Perth  Austra~ -32.0 116.     2059484 Oceania   Australia~  0.00866        8.5
## 4 Sydney Austra~ -33.9 151.     5312163 Oceania   Australia~  0.0384        66  
## 5 Vienna Austria  48.2  16.4    1911191 Europe    Western E~  0.0388        17.9
## 6 Bahra~ Bahrain  26.2  50.6    1501600 Asia      Western A~  0.127        101. 
## # ... with 12 more variables: projects <int>, min_start <chr>, max_end <chr>,
## #   cost_usd <dbl>, land_area <dbl>, tot_stations <dbl>, percent_done <dbl>,
## #   cost_per_km <dbl>, start <chr>, end <chr>, percent_completed <chr>,
## #   stations <chr>

Create dataframe with subregion data for the bar chart

Create the dataframe with subregion as unit of analysis

subregion_data <- project_data %>%
  group_by(sub_region) %>%
  mutate(tot_projects = n(),
         length = round(mean(length), 6),
         cost_usd = mean(as.numeric(real_cost)),
         land_area = mean(land_area),
         cost_per_km = sum(cost_usd) / sum(length),
         percent_done = mean(as.numeric(tunnel_per), na.rm = T)) %>%
  select(sub_region, 
         continent,
         tot_projects, 
         length, 
         cost_usd, 
         land_area, 
         cost_per_km,
         percent_done) %>%
  distinct() %>%
  arrange(desc(length))

Add a new variable containing binned data for cost per kilometer.

# Determine the cut point values to guide labeling
quantile(subregion_data$cost_per_km, probs = c(0.0, 0.25, 0.5, 0.75, 1.0))

# Bin the data into new variable
subregion_data$cost_per_km_bin <-
  cut_number(subregion_data$cost_per_km, 
             n = 4,
             labels = c("$98 - 203 Million", 
                        "$203 - 219 Million",
                        "$219 - 303 Million",
                        "$303 - 596 Million"))

## # A tibble: 6 x 9
## # Groups:   sub_region [6]
##   sub_region        continent tot_projects length cost_usd land_area cost_per_km
##   <chr>             <chr>            <int>  <dbl>    <dbl>     <dbl>       <dbl>
## 1 Western Asia      Asia                38   39.0    8418.     1303.        216.
## 2 Southern Asia     Asia                36   33.7    7024.      612.        208.
## 3 South America     Americas             8   25.1    5072.     1158.        202.
## 4 South-Eastern As~ Asia                23   23.8    8754.     1034.        368.
## 5 Eastern Asia      Asia               286   22.8    4119.     2513.        181.
## 6 Central America   Americas             5   21.5    4512.      699.        210.
## # ... with 2 more variables: percent_done <dbl>, cost_per_km_bin <fct>

Create plots

Create world map highlighting the city data.

Create base map with view of entire world.

map <- leaflet() %>%
  addTiles()

Add markers and popups to create Interactive Map

# Create color bins for cost_per_km based on 5 quantiles (leaflet-specific code)
qpal <- colorQuantile(c("#350480", #Purple
                        "#004EFF", #Blue
                        "#ffa500", #Gold
                        "#CC5500", #Orange
                        "#ff0000"), #Red
                      cities_data$cost_per_km, 
                      n = 5)

# Create HTML content for the popup 
content <- 
  paste("<center><b><u><span style = 'color:",qpal(cities_data$cost_per_km),"; font-size:16px'>", 
        cities_data$city, ", ", cities_data$country,
        "</span></u></b><br/> Population: ", format(cities_data$population, big.mark = ","),
        "<br/> Area: ", cities_data$land_area, "km<sup>2</sup>",        
        "<hr><i><b><span style = 'color:", qpal(cities_data$cost_per_km),"; font-size:14px'>", 
        cities_data$start, " - ", cities_data$end, "</b></i></span></center>",
        "Projects: ", cities_data$projects,
        "<br/> Total Length: ", cities_data$tot_length, " km",
        "<br/>", cities_data$stations, " stations",
        "<br/> Percent completed (1/2021): ", cities_data$percent_completed,"%",
        "<hr><center><span><b> Length Per Area Index (1-100): 
        </center></span><span><center>", round(cities_data$km_index * 100, 2), "</b></center></span>",
        "<hr><center><i><b><span style = 'color:",qpal(cities_data$cost_per_km), "; font-size:14px'>",
        "Cost Data</span></i></b></center>",
        "Cost: $", format(round(cities_data$cost_usd, digits = 0), big.mark = ","),
        " million",
        "<br/>Cost per km: $", round(cities_data$cost_per_km, 0), " million")

# add circle markers and popups to add overview information about projects by city
# base the radius of the circle markers on the index I generated of kilometers of
# projects divided by square kilometers of city.
# color based on 
interactive_map <- map %>%
  addCircleMarkers(data = cities_data, 
                   lng = ~lng,
                   lat = ~lat,
                   radius = ~ sqrt(km_index * 100), 
                   color = ~qpal(cost_per_km),
                   opacity = 0.5,
                   label = ~city,
                   popup = content)

Add legend and title to finalize the map

## Create label titles for colors on map

# Determine quantiles for the color index for the map

quantile(cities_data$cost_per_km, probs = c(0.0, 0.2, 0.4, 0.6, 0.8, 1.0))

# use cut point values to create text for labels to be added to map
cost_vector <- c("under $139: ","$139-172: ","$172-214: ","$214-315: ", "over $315: ")

## Add HTML for Title Line and Footer
title_text <- tags$div(
  HTML('<H1 style = "margin:0;"> Planned Transit Projects By City: </H1>
        <H2 style = "margin-top:10px; margin-bottom:5px; color:red;"> Cost per Kilometer and Kilometers per City Size </H2>
        <center><div style = "margin-top:5px;"><b>Click on a city to see data on city and planned transit projects.
        <div style = "margin:0; font-size: 10px; text-align:right">Transit Data from #tidyTuesday 1/5/2021</div>'))

footer_text <- tags$div(
  HTML('<div style ="margin:0; font-size:12px"><b>Circle size represents "Length per Area Index", a relative comparison of project length (in km) to city\'s geographic area (in km<sup>2</sup>)</b></div>'))

# Create final map by adding legend and title.
final_map <- interactive_map %>%
  addControl(title_text, position = "topright") %>%
  addControl(footer_text, position = "bottomleft") %>%
  addLegend(pal = qpal, 
            values = ~cost_per_km,
            position = "bottomleft",
            opacity = 1,
            data = cities_data,
            labFormat = labelFormat(cost_vector),
            title = "Cost per kilometer in $millions US")

Display the final interactive map

Click to jump to full sized version for better interactive experience.

Create bar chart with subregional data

# Create text-friendly "percent_completed" variable
subregion_data$percent_completed <- 
  paste0(as.character(round((subregion_data$percent_done * 100), 0)),"%")

# Make region a Factor to create order on bar chart
subregion_data$sub_region <- factor(subregion_data$sub_region,
                                    levels =
                                      c("Northern America",
                                        "Central America",
                                        "South America",
                                        "Western Asia",
                                        "Southern Asia",
                                        "South-Eastern Asia",
                                        "Eastern Asia",
                                        "Central Asia",
                                        "Northern Europe",
                                        "Southern Europe",
                                        "Western Europe",
                                        "Eastern Europe",
                                        "Northern Africa",
                                        "Australia and New Zealand"))

# Construct the bar plot 
bar_chart <- subregion_data %>%
  ggplot() +
  geom_col(mapping = aes(x = sub_region,
                         y = length,
                         fill = cost_per_km_bin)) +
  scale_fill_manual(values = c("#004EFF", #Blue
                               "#ffa500", #Gold
                               "#CC5500", #Orange
                               "#ff0000")) + #Red +
  scale_x_discrete(limits = rev(levels(subregion_data$sub_region))) +
  geom_text(aes(x = sub_region, 
                y = 0, 
                label = percent_completed),
            size = 5,
            color = "white", hjust = 0, nudge_y = 0.3) +
  coord_flip() +
  labs(y = "Average Length of Projects (km)",
       x = "Sub-Regions",
       title = "Transit Projects by Region ",
       subtitle = "Cost, Length and Percent Completed",
       fill = "Average cost/km:",
       caption = "Transit Data from #tidyTuesday 1/5/2021") +
  theme(axis.text = element_text(size = 14),
        axis.text.y = element_text(angle = 7, face = "italic"),
        axis.title = element_text(size = 18, face = "bold"), 
        plot.title = element_text(size = 28, face = "bold"), 
        plot.subtitle = element_text(size = 20, color = "red"),
        plot.background = element_rect(color = "black", size = 2, linetype = "solid"),
        plot.margin = margin(1, 1, 1, 1, "cm"),
        legend.text = element_text(size = 8), 
        legend.title = element_text(size = 12),
        legend.background = element_rect(fill = "yellow"),
        legend.box.background = element_rect(linetype = "solid"),
        legend.position = "bottom",
        legend.box = "horizontal"
        )

Display the bar chart

Save map as html and bar-chart as jpeg

saveWidget(final_map, "transit_projects.html")
ggsave(file = "bar_chart.jpeg", plot = bar_chart, width = 12, height = 8)