PHW251 Bonus Problem Set

Part 1

Question 1

Below is a function that ought to return “I’m an even number!” if x is an even number. However, we’re having trouble receiving a value despite x == 4, which we know is an even number. Fix the code chunk and explain why this error is occurring. You will have to change the eval=FALSE option in the code chunk header to get the chunk to knit in your PDF.

NOTE: %% is the “modulo” operator, which returns the remainder when you divide the left number by the right number. For example, try 2 %% 2 (should equal 0 as 2/2 = 1 with no remainder) and 5 %% 2 (should equal 1 as 5/2 = 2 with a remainder of 1).

return_even <- function(x){
  if (x %% 2 == 0) {
    return("I'm an even number!")
  }
}

x <- 4
return_even(x)

EXPLAIN THE ISSUE HERE: The error occurs because the function return_even is defined to take an argument x, but in the line return_even(), no argument is passed. When no value is supplied for x, R throws an error because the function cannot evaluate x %% 2 without a valid value for x

Question 2

R functions are not able to access global variables unless we provide them as inputs.

Below is a function that determines if a number is odd and adds 1 to that number. The function ought to return that value, but we can’t seem to access the value. Debug the code and explain why this error is occurring. Does it make sense to try and call odd_add_1 after running the function?

return_odd <- function(y){
  if (y %% 2 != 0) {
    odd_add_1 <- y + 1
    return(odd_add_1)  # Ensure the value is returned
  }
}

result <- return_odd(3)  # Save the output
result  # Access the returned value

When return_odd(3) is called, the function calculates odd_add_1 but does not return it or save it outside the function. Consequently, odd_add_1 is not available in the global environment, leading to an error when trying to access it.

Question 3

BMI calculations and conversions: - metric: \(BMI = weight (kg) / [height (m)]^2\) - imperial: \(BMI = 703 * weight (lbs) / [height (in)]^2\) - 1 foot = 12 inches - 1 cm = 0.01 meter

Below is a function bmi_imperial() that calculates BMI and assumes the weight and height inputs are from the imperial system (height in feet and weight in pounds).

df_colorado <- read_csv("data/colorado_data.csv")

## Rows: 24 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): location, gender
## dbl (3): height, weight, date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

bmi_imperial <- function(height, weight){
  bmi = (703 * weight)/(height * 12)^2
  return(bmi)
}

# calculate bmi for the first observation
bmi_imperial(df_colorado$height[1], df_colorado$weight[1])

## [1] 42.62802

Write a function called bmi_metric() that calculates BMI based on the metric system. You can test your function with the Taiwan data set excel file in the data folder, which has height in cm and weight in kg.

library(readr)
library(readxl)  
getwd()  # Check the current working directory

## [1] "/home/rstudio/PHW251_2024/problem_sets/problem_set_bonus"

list.files("/home/rstudio/PHW251_2024/problem_sets/problem_set_bonus/data")

##  [1] "bll_nyc_per_1000.csv"           "challenge_choropleth.png"      
##  [3] "colorado_data.csv"              "doctors.csv"                   
##  [5] "patients.csv"                   "rate_graph.png"                
##  [7] "schedule.csv"                   "Suicide Mortality by State.csv"
##  [9] "table_1.png"                    "table_2.png"                   
## [11] "taiwan_data.xlsx"               "visits.csv"

install.packages("readxl")  # Install the package if you haven’t already

## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)

           # Load the library

df_taiwan <- read_excel("/home/rstudio/PHW251_2024/problem_sets/problem_set_bonus/data/taiwan_data.xlsx")

head(df_taiwan)  # View the first few rows

str(df_taiwan)   # Check the structure and column names

## tibble [35 × 5] (S3: tbl_df/tbl/data.frame)
##  $ location: chr [1:35] "Taiwan" "Taiwan" "Taiwan" "Taiwan" ...
##  $ gender  : chr [1:35] "Female" "Male" "Female" "Male" ...
##  $ height  : num [1:35] 163 190 156 183 162 169 196 199 164 185 ...
##  $ weight  : num [1:35] 57 80 52 105 68 88 69 92 71 102 ...
##  $ date    : chr [1:35] "10-Oct-18" "02-Apr-18" "12-Jul-18" "10-Oct-18" ...

bmi_metric <- function(height, weight) {
  # Convert height from cm to meters
  height_m <- height / 100
  # Calculate BMI
  bmi <- weight / (height_m^2)
  return(bmi)
}

bmi_result <- bmi_metric(df_taiwan$height[1], df_taiwan$weight[1])
print(bmi_result)

## [1] 21.45357

# uncomment the line below to test the bmi calculation on the first row
bmi_metric(df_taiwan$height[1], df_taiwan$weight[1])

## [1] 21.45357

Question 4

Can you write a function called calculate_bmi() that combines both of the BMI functions? You will need to figure out a way to determine which calculation to perform based on the values in the data.

calculate_bmi <- function(height, weight) {
  if (max(height, na.rm = TRUE) > 10) {
    # Assume height is in cm, use metric system
    height_m <- height / 100
    bmi <- weight / (height_m^2)
  } else {
    # Assume height is in feet, use imperial system
    height_in_inches <- height * 12
    bmi <- (703 * weight) / (height_in_inches^2)
  }
  return(bmi)
}


# test
calculate_bmi(df_colorado$height[1], df_colorado$weight[1])

## [1] 42.62802

calculate_bmi(df_taiwan$height[1], df_taiwan$weight[1])

## [1] 21.45357

Question 5

Use your function calculate_bmi() to answer the following questions:

What is the average BMI of the individuals in the Colorado data set?

# Calculate BMI for all individuals in the Colorado dataset
df_colorado$bmi <- calculate_bmi(df_colorado$height, df_colorado$weight)

# Calculate the average BMI
average_bmi_colorado <- mean(df_colorado$bmi, na.rm = TRUE)

# Print the result
print(paste("The average BMI of individuals in the Colorado dataset is:", round(average_bmi_colorado, 2)))

## [1] "The average BMI of individuals in the Colorado dataset is: 45.61"

What is the average BMI of the individuals in the Taiwan data set?

# Assuming df_taiwan is already loaded and `calculate_bmi()` is defined

# Calculate BMI for all individuals in the Taiwan dataset
df_taiwan$bmi <- calculate_bmi(df_taiwan$height, df_taiwan$weight)

# Calculate the average BMI
average_bmi_taiwan <- mean(df_taiwan$bmi, na.rm = TRUE)

# Print the result
print(paste("The average BMI of individuals in the Taiwan dataset is:", round(average_bmi_taiwan, 2)))

## [1] "The average BMI of individuals in the Taiwan dataset is: 22.99"

Question 6

Combine the Colorado and Taiwan data sets into one data frame and calculate the BMI for every row using your calculate_bmi() function. Print the first six rows and the last six rows of that new data set.

df_colorado$location <- "Colorado"
df_taiwan$location <- "Taiwan"


df_colorado <- df_colorado[, c("location", "gender", "height", "weight", "date")]
df_taiwan <- df_taiwan[, c("location", "gender", "height", "weight", "date")]


df_combined <- rbind(df_colorado, df_taiwan)

df_combined$bmi <- calculate_bmi(df_combined$height, df_combined$weight)

print("First six rows:")

## [1] "First six rows:"

print(head(df_combined))

## # A tibble: 6 × 6
##   location gender height weight date         bmi
##   <chr>    <chr>   <dbl>  <dbl> <chr>      <dbl>
## 1 Colorado Male     4.82   203. 20173101  87318.
## 2 Colorado Female   5.22   176. 20170902  64738.
## 3 Colorado Male     4.59   284. 20171802 135015.
## 4 Colorado Male     4.72   320. 20172702 143516.
## 5 Colorado Male     5.02   267. 20170803 105875.
## 6 Colorado Female   5.15   337. 20171703 127201.

# Print the last six rows
print("Last six rows:")

## [1] "Last six rows:"

print(tail(df_combined))

## # A tibble: 6 × 6
##   location gender height weight date        bmi
##   <chr>    <chr>   <dbl>  <dbl> <chr>     <dbl>
## 1 Taiwan   Male      176     77 09-Jan-20  24.9
## 2 Taiwan   Female    181     80 21-Jan-20  24.4
## 3 Taiwan   Male      198    109 22-Mar-20  27.8
## 4 Taiwan   Female    178     65 09-Jan-18  20.5
## 5 Taiwan   Male      167     64 09-Jan-18  22.9
## 6 Taiwan   Female    164     59 22-Mar-20  21.9

Question 7

Make a boxplot that shows the BMI distribution of the combined data, separated by location on the x-axis. Use a theme of your choice, put a title on your graph, and hide the y-axis title.

NOTE: These data are for practice only and are not representative populations, which is why we aren’t comparing them with statistical tests. It would not be responsible to draw any conclusions from this graph!

library(ggplot2)

ggplot(df_combined, aes(x = location, y = bmi)) +
  geom_boxplot(fill = "skyblue", color = "black") +
  theme_minimal() +
  ggtitle("BMI Distribution by Location (Corrected)") +
  xlab("Location") +
  ylab(NULL) +
  theme(plot.title = element_text(hjust = 0.5))

Part 2

Question 8

Recall the patient data from a healthcare facility that we used in Part 2 of Problem Set 7.

We had four tables that were relational to each other and the following keys linking the tables together:

patient_id: patients, schedule
visit_id: schedule, visits
doctor_id: visits, doctors

Use a join to find out which patients have no visits on the schedule.

library(dplyr)

# Perform a left join to find patients without visits on the schedule
patients_without_visits <- patients %>%
  left_join(schedule, by = "patient_id") %>%
  filter(is.na(visit_id))

# View the result
patients_without_visits

Question 9

With this data, can you tell if those patients with no visits on the schedule have been assigned to a doctor? Why or why not?

# Left join patients_no_visits with visits to check for doctor assignments
patients_no_visits <- patients %>%
  left_join(schedule, by = "patient_id") %>%
  filter(is.na(visit_id))  # Patients with no visits on the schedule

# Join with the visits table using visit_id
patients_with_no_doctor <- patients_no_visits %>%
  left_join(visits, by = "visit_id")

# View the result
print("Patients with no visits and their doctor assignment (if any):")

## [1] "Patients with no visits and their doctor assignment (if any):"

print(patients_with_no_doctor)

## # A tibble: 7 × 10
##   patient_id   age race_ethnicity   gender_identity height weight visit_id date 
##        <dbl> <dbl> <chr>            <chr>            <dbl>  <dbl>    <dbl> <chr>
## 1       1013    32 <NA>             man             180       58        NA <NA> 
## 2       1015    71 White            man             151       62        NA <NA> 
## 3       1017    72 Asian            woman           191       62        NA <NA> 
## 4       1023    48 Asian            man               4.82   203.       NA <NA> 
## 5       1033    64 Asian            woman             6.43   256.       NA <NA> 
## 6       1038    38 African America… transgender       4.86   342.       NA <NA> 
## 7       1042    66 White            man               4.95   251.       NA <NA> 
## # ℹ 2 more variables: follow_up <chr>, doctor_id <dbl>

# Check if doctor_id is available
if (all(is.na(patients_with_no_doctor$doctor_id))) {
  print("None of these patients have been assigned to a doctor based on the current data.")
} else {
  print("Some patients without scheduled visits are linked to a doctor.")
}

## [1] "None of these patients have been assigned to a doctor based on the current data."

Based on the dataset and its relational structure, it is not possible to determine whether patients without scheduled visits have been assigned to a doctor. This is because the assignment of doctors is managed through the visits table, which requires a valid visit_id. Patients without a visit_id in the schedule table have no corresponding records in the visits table, leaving their doctor_id values as NA. Since the relationship between patients and doctors is indirect and dependent on scheduled visits, there is no data available to confirm doctor assignment for patients without visits. To address this limitation, additional data or a separate table directly linking patients to doctors (independent of visits) would be needed

Question 10

Assume those patients need primary care and haven’t been assigned a doctor yet. Which primary care doctors have the least amount of visits? Rank them from least to most visits.

library(dplyr)

primary_care_doctors <- doctors %>%
  filter(speciality == "primary care")

doctor_visit_counts <- visits %>%
  group_by(doctor_id) %>%
  summarize(total_visits = n(), .groups = "drop") %>%
  right_join(primary_care_doctors, by = "doctor_id") %>%
  replace_na(list(total_visits = 0))  # Replace NA with 0 for doctors with no visits

ranked_doctors <- doctor_visit_counts %>%
  arrange(total_visits)

ranked_doctors

Part 3

Recall in Problem Set 5, Part 2, we were working with data from New York City that tested children under 6 years old for elevated blood lead levels (BLL). [You can read more about the data on their website]).

About the data:

All NYC children are required to be tested for lead poisoning at around age 1 and age 2, and to be screened for risk of lead poisoning, and tested if at risk, up until age 6. These data are an indicator of children younger that 6 years of age tested in NYC in a given year with blood lead levels (BLL) of 5 mcg/dL or greater. In 2012, CDC established that a blood lead level of 5 mcg/dL is the reference level for exposure to lead in children. This level is used to identify children who have blood lead levels higher than most children’s levels. The reference level is determined by measuring the NHANES blood lead distribution in US children ages 1 to 5 years, and is reviewed every 4 years.

Question 11

Load in a cleaned-up version of the blood lead levels data:

bll_nyc_per_1000 <- read_csv("data/bll_nyc_per_1000.csv")

## Rows: 20 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): borough_id
## dbl (2): time_period, bll_5plus_1k
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(bll_nyc_per_1000)

library(dplyr)

# Filter and arrange data for 2013–2016
bll_nyc_filtered <- bll_nyc_per_1000 %>%
  filter(time_period >= 2013 & time_period <= 2016) %>%
  arrange(borough_id, time_period) %>%
  group_by(borough_id) %>%
  mutate(
    lag_bll = lag(bll_5plus_1k),  # Previous year's value
    change_indicator = case_when(
      is.na(lag_bll) ~ "No change",       # First year has no previous value
      bll_5plus_1k > lag_bll ~ "Increase",  # BLL increased
      bll_5plus_1k < lag_bll ~ "Decrease",  # BLL decreased
      TRUE ~ "No change"                   # No change
    )
  )

# Preview the processed data
head(bll_nyc_filtered)

library(tidyr)

# Reshape data for 2013–2016 into wide format
bll_nyc_wide <- bll_nyc_filtered %>%
  select(borough_id, time_period, bll_5plus_1k, change_indicator) %>%
  pivot_wider(
    names_from = time_period,
    values_from = c(bll_5plus_1k, change_indicator),
    names_glue = "{.value}_Year_{time_period}"
  )


str(bll_nyc_wide)

## gropd_df [5 × 9] (S3: grouped_df/tbl_df/tbl/data.frame)
##  $ borough_id                : chr [1:5] "Bronx" "Brooklyn" "Manhattan" "Queens" ...
##  $ bll_5plus_1k_Year_2013    : num [1:5] 20.1 30.2 15.2 18.2 17.6
##  $ bll_5plus_1k_Year_2014    : num [1:5] 18.7 26.8 14.1 18.5 17.1
##  $ bll_5plus_1k_Year_2015    : num [1:5] 15.7 22.6 10.6 15.4 12
##  $ bll_5plus_1k_Year_2016    : num [1:5] 15 22.3 8.1 14.3 14.8
##  $ change_indicator_Year_2013: chr [1:5] "No change" "No change" "No change" "No change" ...
##  $ change_indicator_Year_2014: chr [1:5] "Decrease" "Decrease" "Decrease" "Increase" ...
##  $ change_indicator_Year_2015: chr [1:5] "Decrease" "Decrease" "Decrease" "Decrease" ...
##  $ change_indicator_Year_2016: chr [1:5] "Decrease" "Decrease" "Decrease" "Decrease" ...
##  - attr(*, "groups")= tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
##   ..$ borough_id: chr [1:5] "Bronx" "Brooklyn" "Manhattan" "Queens" ...
##   ..$ .rows     : list<int> [1:5] 
##   .. ..$ : int 1
##   .. ..$ : int 2
##   .. ..$ : int 3
##   .. ..$ : int 4
##   .. ..$ : int 5
##   .. ..@ ptype: int(0) 
##   ..- attr(*, ".drop")= logi TRUE

# Preview the reshaped data
head(bll_nyc_wide)

Create a formattable table (example below) that shows the elevated blood lead levels per 1000 tested across 2013-2016. If the BLL increases from the previous year, turn the text red. If the BLL decreases from the previous year, turn the text green. To accomplish this color changing, you may want to create three indicator variables that check the value between years (e.g. use if_else). If you’ve have used conditional formatting on excel/google sheets, the concept is the same, but with R.

Note: If you are using if_else (hint hint) and checking by the year, you will likely need to use the left quote, actute, backtip, to reference the variable.

We have also provided you a function that you can use within your formattable table to reference this indicator variable to help reduce the code. However, you do not have to use this, and feel free to change the hex colors.

formattable(
  bll_nyc_wide %>%
    select(borough_id, starts_with("bll_5plus_1k_")),  # Select only the required columns
  list(
    `bll_5plus_1k_Year_2013` = formatter("span", style = function(x) {
      ifelse(
        bll_nyc_wide$change_indicator_Year_2013 == "Increase", 
        "color: #fd626e;",  # Red for increase
        ifelse(
          bll_nyc_wide$change_indicator_Year_2013 == "Decrease", 
          "color: #03d584;",  # Green for decrease
          "color: black;"     # Black for no change
        )
      )
    }),
    `bll_5plus_1k_Year_2014` = formatter("span", style = function(x) {
      ifelse(
        bll_nyc_wide$change_indicator_Year_2014 == "Increase", 
        "color: #fd626e;", 
        ifelse(
          bll_nyc_wide$change_indicator_Year_2014 == "Decrease", 
          "color: #03d584;", 
          "color: black;"
        )
      )
    }),
    `bll_5plus_1k_Year_2015` = formatter("span", style = function(x) {
      ifelse(
        bll_nyc_wide$change_indicator_Year_2015 == "Increase", 
        "color: #fd626e;", 
        ifelse(
          bll_nyc_wide$change_indicator_Year_2015 == "Decrease", 
          "color: #03d584;", 
          "color: black;"
        )
      )
    }),
    `bll_5plus_1k_Year_2016` = formatter("span", style = function(x) {
      ifelse(
        bll_nyc_wide$change_indicator_Year_2016 == "Increase", 
        "color: #fd626e;", 
        ifelse(
          bll_nyc_wide$change_indicator_Year_2016 == "Decrease", 
          "color: #03d584;", 
          "color: black;"
        )
      )
    })
  )
)

borough_id	bll_5plus_1k_Year_2013	bll_5plus_1k_Year_2014	bll_5plus_1k_Year_2015	bll_5plus_1k_Year_2016
Bronx	20.1	18.7	15.7	15.0
Brooklyn	30.2	26.8	22.6	22.3
Manhattan	15.2	14.1	10.6	8.1
Queens	18.2	18.5	15.4	14.3
Staten Island	17.6	17.1	12.0	14.8

Question 12

Starting with the data frame bll_nyc_per_1000 create a table with the DT library showing elevated blood lead levels per 1000 tested in 2013-2016 by borough. Below is an example of the table to replicate.

library(dplyr)
library(DT)

# Step 1: Filter and prepare the data
bll_nyc_filtered <- bll_nyc_per_1000 %>%
  filter(time_period >= 2013 & time_period <= 2016) %>%
  select(borough_id, time_period, bll_5plus_1k) %>%
  rename(Borough = borough_id, Year = time_period, `BLL > 5` = bll_5plus_1k)

# Step 2: Create the DT table
datatable(
  bll_nyc_filtered,
  options = list(
    pageLength = 10,  # Show 10 rows per page
    autoWidth = TRUE, # Adjust column widths automatically
    dom = 't',        # Remove search and pagination (optional, adjust as needed)
    columnDefs = list(list(className = 'dt-center', targets = "_all")) # Center align columns
  ),
  rownames = FALSE,  # Remove row numbers
  caption = "New York City: Elevated Blood Lead Levels 2013-2016 by Borough"
)

Part 4

Question 13

For this question, we will use suicide rates data that comes from the CDC.

Replicate the graph below using plotly.

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

## Rows: 300 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): STATE, NAME, URL
## dbl (2): YEAR, RATE
## num (1): DEATHS
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 300 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): STATE, NAME, URL
## dbl (2): YEAR, RATE
## num (1): DEATHS
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## Rows: 300
## Columns: 6
## $ YEAR   <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 201…
## $ STATE  <chr> "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA", "HI…
## $ NAME   <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colo…
## $ RATE   <dbl> 16.5, 24.6, 19.2, 18.3, 10.9, 21.9, 10.6, 11.4, 15.2, 14.6, 11.…
## $ DEATHS <dbl> 823, 184, 1438, 554, 4491, 1282, 419, 113, 3567, 1569, 176, 417…
## $ URL    <chr> "/nchs/pressroom/states/alabama/al.htm", "/nchs/pressroom/state…

## Loading required package: viridisLite

## Warning: `arrange_()` was deprecated in dplyr 0.7.0.
## ℹ Please use `arrange()` instead.
## ℹ See vignette('programming') for more help
## ℹ The deprecated feature was likely used in the plotly package.
##   Please report the issue at <https://github.com/ropensci/plotly/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Question 14

Create an interactive (choropleth) map with plotly similar to the one presented on the CDC website. On the CDC map you can hover over each state and see the state name in bold, the death rate, and deaths. We can do much of this with plotly. As a challenge, use only the 2018 data to create an interactive US map colored by the suicide rates of that state. When you hover over the state, you should see the state name in bold, the death rate, and the number of deaths.

Some key search terms that may help:

choropleth
hover text plotly r
hover text bold plotly r
plotly and html
html bold
html subtitle

Below is an image of an example final map when you hover over California.

Here is the shell of the map to get you started. Copy the plotly code into the chunk below this one and customize it.

# data pulled from CDC website described above 
df_suicide <- read_csv("data/Suicide Mortality by State.csv") %>%
  filter(YEAR == 2018)

plot_ly(df_suicide,
        type="choropleth",
        locationmode = "USA-states") %>%
  layout(geo=list(scope="usa"))

# Load necessary libraries
library(plotly)
library(dplyr)
library(readr)


df_suicide <- read_csv("data/Suicide Mortality by State.csv") %>%
  filter(YEAR == 2018)

## Rows: 300 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): STATE, NAME, URL
## dbl (2): YEAR, RATE
## num (1): DEATHS
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

df_suicide <- df_suicide %>%
  mutate(hover_text = paste0(
    "<b>", NAME, "</b><br>",
    "Rate: ", RATE, "<br>",
    "Deaths: ", DEATHS
  ))


plot_ly(
  data = df_suicide,
  type = "choropleth",
  locations = ~STATE,           
  locationmode = "USA-states",  
  z = ~RATE,                    
  text = ~hover_text,           
  hoverinfo = "text",           
  colorscale = "Blues",         
  colorbar = list(title = "Rate")  
) %>%
  layout(
    title = "Suicide Mortality Rates by State (2018)",
    geo = list(
      scope = "usa",            
      projection = list(type = "albers usa"), 
      showlakes = TRUE,         
      lakecolor = "rgb(255, 255, 255)"  
    )
  )