Be sure to change the author in the YAML to your name. Remember to keep it inside the quotes.
Questions that require the use of R will have an R code chunk below it.
Download the Registered Nurses dataset (titled nurses.csv) and nurses-salary.png from the Canvas page for this assignment and save these files to the folder where the RMD file is located.
There is less hand-holding in this assignment.
For this Challenge Problem assignment, you are going to be using the Data.World dataset on registered nurses in the United States between 1998 and 2020.1 The data includes total numbers of registered nurse and salary information at the state-level by year. The original documentation for this dataset can be found on its Tidy Tuesday page. [Note: I modified the names of the variables so there were no special characters or spaces, so be sure to inspect the names of the variables before working with the dataset.]
Practice Problems
Now let’s practice making tables and data visualizations to answer questions. Each question is supposed to be stand alone and not build from each other (unless specified).
Use the following posed question for Questions 1-4:
What is the general trend of total number of registered nurses across the years 1998 and 2020 in the United States?
nurses %>%
group_by(Year) %>%
summarise(total_rn = sum(Total_Employed_RN, na.rm = TRUE)) %>%
gt() %>%
tab_header(title = "Total Number of Registered Nurses (1998-2020)",
subtitle = "Total number of employed nurses by year") %>%
cols_label(total_rn = "Total Registered Nurses") %>%
fmt_number(columns = total_rn, decimals = 0)
| Total Number of Registered Nurses (1998-2020) | |
| Total number of employed nurses by year | |
| Year | Total Registered Nurses |
|---|---|
| 1998 | 2,043,250 |
| 1999 | 2,217,390 |
| 2000 | 2,203,940 |
| 2001 | 2,231,920 |
| 2002 | 2,253,860 |
| 2003 | 2,261,050 |
| 2004 | 2,327,450 |
| 2005 | 2,383,940 |
| 2006 | 2,433,610 |
| 2007 | 2,485,530 |
| 2008 | 2,560,710 |
| 2009 | 2,602,430 |
| 2010 | 2,673,150 |
| 2011 | 2,743,600 |
| 2012 | 2,652,290 |
| 2013 | 2,681,580 |
| 2014 | 2,705,890 |
| 2015 | 2,764,180 |
| 2016 | 2,875,400 |
| 2017 | 2,926,470 |
| 2018 | 2,972,150 |
| 2019 | 3,002,940 |
| 2020 | 3,006,970 |
nurses %>%
group_by(Year) %>%
summarise(total_rn = sum(Total_Employed_RN, na.rm = TRUE)) %>%
ggplot(aes(x = Year, y = total_rn)) +
geom_line(color = "red", linewidth = 1.2) +
geom_point(color = "black") +
labs(
title = "Total Registered Nurses (1998-2020)",
x = "Year", y = "Total Registered Nurses") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5))
Use the following posed question for Questions 5-9:
For the subset of states in the midwest (Minnesota, Wisconsin, Iowa, Illinois), how have the median hourly wages changed between 2000 and 2020?

wages <- nurses %>%
filter(State %in% c("Minnesota", "Wisconsin", "Iowa", "Illinois"),
Year %in% c(2000, 2020)) %>%
select(State, Year, Hourly_Wage_Median) %>%
group_by(State) %>%
mutate(wage_order = max(Hourly_Wage_Median)) %>%
ungroup() %>%
arrange(desc(wage_order), State, Year) %>%
select(-wage_order)
wages %>%
gt(groupname_col = "State", rowname_col = "Year") %>%
tab_header(title = "Median Hourly Wages (2000-2020)") %>%
cols_label(Hourly_Wage_Median = "Registered Nurses Median Wage") %>%
fmt_currency(columns = Hourly_Wage_Median, currency = "USD")
| Median Hourly Wages (2000-2020) | |
| Registered Nurses Median Wage | |
|---|---|
| Minnesota | |
| 2000 | $23.37 |
| 2020 | $38.24 |
| Wisconsin | |
| 2000 | $20.81 |
| 2020 | $35.35 |
| Illinois | |
| 2000 | $21.56 |
| 2020 | $34.91 |
| Iowa | |
| 2000 | $17.46 |
| 2020 | $29.39 |
diff() function) for each state.[Hint: You may need to arrange the data so that 2000 rows come first and then 2020 rows come next]
wages_changes <- wages %>%
filter(State %in% c("Minnesota", "Wisconsin", "Iowa", "Illinois"),
Year %in% c(2000, 2020)) %>%
group_by(State, Year) %>%
summarise(Hourly_Wage_Median = median(Hourly_Wage_Median, na.rm = TRUE), .groups = "drop")
nurse_wage <- wages_changes %>%
pivot_wider(names_from = Year, values_from = Hourly_Wage_Median) %>%
mutate(Change = `2020` - `2000`) %>%
pivot_longer(cols = c(`2000`, `2020`, Change), names_to = "Year", values_to = "Hourly_Wage_Median") %>%
mutate(
State = factor(State, levels = c("Minnesota", "Wisconsin", "Illinois", "Iowa")),
Year = factor(Year, levels = c("2000", "2020", "Change"))) %>%
arrange(State, Year)
nurse_wage %>%
gt(groupname_col = "State", rowname_col = "Year") %>%
tab_header(
title = "Median Hourly Wage Changes (2000-2020)",
subtitle = "Includes Median Wages 2000 & 2020") %>%
cols_label(Hourly_Wage_Median = "Registered Nurses Median Wage") %>%
fmt_currency(columns = Hourly_Wage_Median, currency = "USD")
| Median Hourly Wage Changes (2000-2020) | |
| Includes Median Wages 2000 & 2020 | |
| Registered Nurses Median Wage | |
|---|---|
| Minnesota | |
| 2000 | $23.37 |
| 2020 | $38.24 |
| Change | $14.87 |
| Wisconsin | |
| 2000 | $20.81 |
| 2020 | $35.35 |
| Change | $14.54 |
| Illinois | |
| 2000 | $21.56 |
| 2020 | $34.91 |
| Change | $13.35 |
| Iowa | |
| 2000 | $17.46 |
| 2020 | $29.39 |
| Change | $11.93 |
median_wages <- wages %>%
filter(State %in% c("Minnesota", "Wisconsin", "Iowa", "Illinois"),
Year %in% c(2000, 2020))
ggplot(median_wages, aes(x = Year, y = Hourly_Wage_Median, group = State, color = State)) +
geom_line(linewidth = 1.2) +
geom_point(size = 3) +
geom_text(aes(label = round(Hourly_Wage_Median, 1)),
vjust = -0.5, size = 3,
position = position_nudge(y = 0.7)) +
scale_x_continuous(breaks = c(2000, 2020)) +
scale_color_manual(values = c("red", "purple", "darkgreen", "blue")) +
labs(
title = "Median Hourly Wage Changes (2000-2020)",
x = "Year",
y = "Median Hourly Wages",
color = "State") +
theme_minimal(base_size = 11) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.title = element_text(face = "bold"),
panel.grid.minor = element_blank()
)
Putting It All Together
Now let’s put the grammar of tables to use (in addition to data wrangling) to answer a question.
For nurses, how do the median annual salaries compare across the states pre-pandemic (2019) vs. post-pandemic (2020)?
Tips:
Because this problem requires some tricky data wrangling that we haven’t covered yet (but will later), I have actually started this for you
You may still need to arrange the data so that 2019 rows come first and then 2020 rows come next.
The color palette in the table uses the following arguments:
palette = c("#ffffff", "#f2fbd2", "#c9ecb4", "#93d3ab", "#35b0ab")
and domain = NULL.
Your end result should show all 54 rows and not a subset of the rows, as shown in the image.
# you may need to change the name of the data set to match yours
wrangle_nurses <- nurses %>%
filter(Year %in% c(2019, 2020)) %>%
mutate(total_rn = Total_Employed_RN/1000,
annual_median = Annual_Salary_Median/1000) %>%
select(State, Year, total_rn, annual_median) %>%
pivot_wider(names_from = Year, values_from = c(total_rn, annual_median)) %>%
mutate(pct_diff = annual_median_2020/annual_median_2019 - 1) %>%
select(State, total_rn_2019, total_rn_2020, annual_median_2019, annual_median_2020, pct_diff) %>%
arrange(desc(pct_diff))
wrangle_nurses %>%
gt() %>%
tab_header(
title = md("**For nurses, how do the median annual salaries compare across the states pre-pandemic (2019) vs. post-pandemic (2020)?**")) %>%
cols_label(
State = "State",
total_rn_2019 = html("2019<sup>1</sup>"),
total_rn_2020 = html("2020<sup>1</sup>"),
annual_median_2019 = html("2019<sup>2</sup>"),
annual_median_2020 = html("2020<sup>2</sup>"),
pct_diff = html("Relative Percent Change<sup>3</sup>")) %>%
fmt_number(columns = c(total_rn_2019, total_rn_2020), decimals = 2) %>%
fmt_currency(columns = c(annual_median_2019, annual_median_2020), currency = "USD", decimals = 2) %>%
fmt_percent(columns = pct_diff, decimals = 2) %>%
gt_color_rows(columns = pct_diff,
palette = c("#ffffff", "#f2fbd2", "#c9ecb4", "#93d3ab", "#35b0ab"),
domain = NULL) %>%
tab_source_note(
source_note = html("Data source: Registered Nurses dataset (Tidy Tuesday, Oct. 05, 2019)<br>
1 Total employed, registered nurses, (per a thousand)<br>
2 Median annual salary, (per a thousand)<br>
3 Median annual salary percent change = 2020/2019 − 1"))
## Warning: Domain not specified, defaulting to observed range within each
## specified column.
| For nurses, how do the median annual salaries compare across the states pre-pandemic (2019) vs. post-pandemic (2020)? | |||||
| State | 20191 | 20201 | 20192 | 20202 | Relative Percent Change3 |
|---|---|---|---|---|---|
| Washington | 58.00 | 59.30 | $83.49 | $89.65 | 7.38% |
| California | 302.77 | 307.06 | $110.62 | $118.41 | 7.04% |
| North Dakota | 9.75 | 9.97 | $64.68 | $68.80 | 6.37% |
| Wyoming | 5.12 | 5.01 | $66.29 | $70.45 | 6.28% |
| Guam | 0.68 | 0.69 | $56.36 | $58.98 | 4.65% |
| Nebraska | 23.80 | 24.06 | $65.01 | $68.01 | 4.61% |
| Maryland | 53.15 | 51.48 | $76.78 | $79.81 | 3.95% |
| Alaska | 6.21 | 6.24 | $90.55 | $94.07 | 3.89% |
| Oregon | 36.66 | 36.84 | $93.37 | $96.79 | 3.66% |
| South Carolina | 46.86 | 45.93 | $63.75 | $66.05 | 3.61% |
| Pennsylvania | 148.04 | 146.64 | $70.54 | $72.97 | 3.44% |
| North Carolina | 99.96 | 99.11 | $64.60 | $66.82 | 3.44% |
| Louisiana | 40.87 | 41.69 | $64.04 | $66.24 | 3.44% |
| Vermont | 7.02 | 6.81 | $67.37 | $69.67 | 3.41% |
| Arkansas | 25.21 | 25.30 | $60.34 | $62.33 | 3.30% |
| Oklahoma | 31.35 | 32.95 | $63.65 | $65.68 | 3.19% |
| Massachusetts | 81.02 | 84.03 | $87.54 | $90.29 | 3.14% |
| Virginia | 66.04 | 66.45 | $70.23 | $72.42 | 3.12% |
| Utah | 21.65 | 23.69 | $65.24 | $67.18 | 2.97% |
| New Mexico | 17.35 | 17.10 | $73.18 | $75.35 | 2.97% |
| New Hampshire | 14.32 | 13.84 | $72.69 | $74.84 | 2.96% |
| Kansas | 30.37 | 29.43 | $60.80 | $62.55 | 2.88% |
| New York | 178.32 | 178.55 | $87.33 | $89.84 | 2.87% |
| Arizona | 54.59 | 55.52 | $76.82 | $79.01 | 2.85% |
| Idaho | 14.11 | 12.80 | $69.32 | $71.28 | 2.83% |
| West Virginia | 19.83 | 19.80 | $61.45 | $63.18 | 2.82% |
| Missouri | 68.84 | 71.56 | $62.47 | $64.22 | 2.80% |
| Wisconsin | 61.93 | 63.63 | $71.56 | $73.54 | 2.77% |
| Iowa | 32.98 | 32.61 | $59.49 | $61.13 | 2.76% |
| Ohio | 125.47 | 129.09 | $65.79 | $67.58 | 2.72% |
| Texas | 218.09 | 219.33 | $73.40 | $75.32 | 2.62% |
| Florida | 181.67 | 183.13 | $65.83 | $67.51 | 2.55% |
| Virgin Islands | 0.46 | 0.52 | $70.44 | $72.12 | 2.39% |
| South Dakota | 12.95 | 13.13 | $58.67 | $60.00 | 2.27% |
| Mississippi | 29.55 | 28.80 | $58.59 | $59.85 | 2.15% |
| Indiana | 67.51 | 66.46 | $63.67 | $65.00 | 2.09% |
| Colorado | 52.51 | 52.33 | $75.10 | $76.50 | 1.86% |
| Georgia | 75.43 | 73.23 | $68.41 | $69.63 | 1.78% |
| Tennessee | 63.33 | 61.04 | $61.20 | $62.22 | 1.67% |
| Kentucky | 43.84 | 42.94 | $62.18 | $63.06 | 1.42% |
| Connecticut | 34.74 | 33.40 | $81.63 | $82.77 | 1.40% |
| Illinois | 129.53 | 127.45 | $71.62 | $72.61 | 1.38% |
| New Jersey | 80.14 | 78.59 | $83.92 | $84.99 | 1.28% |
| Minnesota | 71.00 | 70.82 | $78.56 | $79.54 | 1.25% |
| Montana | 10.31 | 9.98 | $67.92 | $68.74 | 1.21% |
| Michigan | 96.90 | 97.82 | $72.26 | $73.04 | 1.08% |
| Maine | 14.49 | 14.16 | $68.94 | $69.51 | 0.83% |
| Nevada | 22.94 | 23.42 | $87.40 | $87.96 | 0.64% |
| Puerto Rico | 19.51 | 19.25 | $33.07 | $33.15 | 0.24% |
| Rhode Island | 12.63 | 12.15 | $83.14 | $83.06 | −0.10% |
| Hawaii | 11.33 | 11.26 | $110.70 | $110.41 | −0.26% |
| Alabama | 49.19 | 48.85 | $58.80 | $58.63 | −0.29% |
| Delaware | 11.73 | 11.41 | $72.83 | $72.11 | −0.99% |
| District of Columbia | 10.89 | 10.32 | $91.47 | $89.44 | −2.22% |
| Data source: Registered Nurses dataset (Tidy Tuesday, Oct. 05, 2019) 1 Total employed, registered nurses, (per a thousand) 2 Median annual salary, (per a thousand) 3 Median annual salary percent change = 2020/2019 − 1 |
|||||