DATA608 FinalProject

#Introduction

I am creating a visualization for my final project that highlights the global nursing shortage and its impact on healthcare systems worldwide. The nursing shortage has been declared a “global health emergency,” and it has worsened due to the COVID-19 pandemic. To ensure the reliability and relevance of the data, I will be using data sources such as the ICN’s Recover to Rebuild report, which provides evidence-based recommendations for investing in the nursing workforce, and data on nursing and midwifery personnel density from the World Bank and United Nations.

The data sets will include geography (global), time frame (various years), and data points such as the number of nurses and midwives per 1,000 people and nursing and midwifery personnel density per 1,000 population. The data shows that many countries around the world are experiencing a shortage of nurses and midwives, which can lead to inadequate healthcare services, increased workload and burnout for existing nurses, and ultimately, negative health outcomes for patients.

Creating this visualization is important because it sheds light on a critical issue in global healthcare and can inform policy decisions and investments in the nursing workforce. To create the visualization, I will use ggplot2 and plotly, and I will include interactive elements that let readers explore the data and understand the impact of the nursing shortage on healthcare systems worldwide.

My goal for this final project is to increase awareness of the nursing shortage and its impact on healthcare systems globally, and to provide insights and recommendations for addressing this critical issue. By using reliable data sources and robust visualization tools, I aim to present a compelling case for investing in the nursing workforce and improving healthcare outcomes for people around the world.

#Data Source

Link to Articles and Data Sources:

Nursing shortage branded a “global health emergency” - https://www.nursingtimes.net/news/global-nursing/nurse-shortage-branded-a-global-health-emergency-23-03-2023/
Recover to Rebuild: Investing in the Nursing Workforce for Health System Effectiveness - https://www.icn.ch/system/files/2023-03/ICN_Recover-to-Rebuild_report_EN.pdf
Nurses and midwives (per 1,000 people) - https://data.worldbank.org/indicator/SH.MED.NUMW.P3
Nursing and midwifery personnel density (per 1000 population) - http://data.un.org/Data.aspx?q=nursing+personnel&d=WHO&f=MEASURE_CODE:HRH_33

#Import libraries

Load all the necessary packages

library(ggplot2)
library(graphics)
library(stats)
library(plyr)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.1     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.1     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::arrange()   masks plyr::arrange()
## ✖ purrr::compact()   masks plyr::compact()
## ✖ dplyr::count()     masks plyr::count()
## ✖ dplyr::desc()      masks plyr::desc()
## ✖ dplyr::failwith()  masks plyr::failwith()
## ✖ dplyr::filter()    masks stats::filter()
## ✖ dplyr::id()        masks plyr::id()
## ✖ dplyr::lag()       masks stats::lag()
## ✖ dplyr::mutate()    masks plyr::mutate()
## ✖ dplyr::rename()    masks plyr::rename()
## ✖ dplyr::summarise() masks plyr::summarise()
## ✖ dplyr::summarize() masks plyr::summarize()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(knitr)
library(tidyr)
library(data.table)

## 
## Attaching package: 'data.table'
## 
## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
## 
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## 
## The following object is masked from 'package:purrr':
## 
##     transpose

library(lubridate)
library(purrr)
library(rtweet)

## 
## Attaching package: 'rtweet'
## 
## The following object is masked from 'package:purrr':
## 
##     flatten

library(tidytext)
library(lubridate)
library(dplyr)
library(plotly)

## 
## Attaching package: 'plotly'
## 
## The following objects are masked from 'package:plyr':
## 
##     arrange, mutate, rename, summarise
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

library(maps)

## 
## Attaching package: 'maps'
## 
## The following object is masked from 'package:purrr':
## 
##     map
## 
## The following object is masked from 'package:plyr':
## 
##     ozone

library(cowplot)

## 
## Attaching package: 'cowplot'
## 
## The following object is masked from 'package:lubridate':
## 
##     stamp

library(lubridate)
library(ggthemes)

## 
## Attaching package: 'ggthemes'
## 
## The following object is masked from 'package:cowplot':
## 
##     theme_map

library(scales)

## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor

library(htmlwidgets)
library(magrittr)

## 
## Attaching package: 'magrittr'
## 
## The following object is masked from 'package:purrr':
## 
##     set_names
## 
## The following object is masked from 'package:tidyr':
## 
##     extract

library(here)

## here() starts at C:/Users/Ivant/Desktop
## 
## Attaching package: 'here'
## 
## The following object is masked from 'package:plyr':
## 
##     here

library(reshape2)

## 
## Attaching package: 'reshape2'
## 
## The following objects are masked from 'package:data.table':
## 
##     dcast, melt
## 
## The following object is masked from 'package:tidyr':
## 
##     smiths

library(ggrepel)
library(ModelMetrics)

## 
## Attaching package: 'ModelMetrics'
## 
## The following object is masked from 'package:base':
## 
##     kappa

library(readr)
library(e1071)
library(corrplot)

## corrplot 0.92 loaded

library(FactoMineR)
library(VIFCP)
library(kableExtra)

## 
## Attaching package: 'kableExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows

library(Hmisc)

## 
## Attaching package: 'Hmisc'
## 
## The following object is masked from 'package:e1071':
## 
##     impute
## 
## The following object is masked from 'package:plotly':
## 
##     subplot
## 
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## 
## The following objects are masked from 'package:plyr':
## 
##     is.discrete, summarize
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, units

library(pROC)

## Type 'citation("pROC")' for a citation.
## 
## Attaching package: 'pROC'
## 
## The following object is masked from 'package:ModelMetrics':
## 
##     auc
## 
## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var

library(binr)

#Data Sample

The dataset provided in the following code consists of information about the number of nurses and midwives per 1,000 people in various countries.

The first five rows of the dataset provide a glimpse into its structure and variables, such as country name, country code, and data for various years.

dataset <- read.csv("https://raw.githubusercontent.com/IvanGrozny88/DATA608-Final-Project/main/API_SH.MED.NUMW.P3_DS2_en_csv_v2_5359576.csv",skip=3)
head(dataset,5)

##                  Country.Name Country.Code
## 1                       Aruba          ABW
## 2 Africa Eastern and Southern          AFE
## 3                 Afghanistan          AFG
## 4  Africa Western and Central          AFW
## 5                      Angola          AGO
##                           Indicator.Name Indicator.Code X1960 X1961 X1962 X1963
## 1 Nurses and midwives (per 1,000 people) SH.MED.NUMW.P3    NA    NA    NA    NA
## 2 Nurses and midwives (per 1,000 people) SH.MED.NUMW.P3    NA    NA    NA    NA
## 3 Nurses and midwives (per 1,000 people) SH.MED.NUMW.P3    NA    NA    NA    NA
## 4 Nurses and midwives (per 1,000 people) SH.MED.NUMW.P3    NA    NA    NA    NA
## 5 Nurses and midwives (per 1,000 people) SH.MED.NUMW.P3    NA    NA    NA    NA
##   X1964 X1965 X1966 X1967 X1968 X1969 X1970 X1971 X1972 X1973 X1974 X1975 X1976
## 1    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 2    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 3    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 4    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 5    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
##   X1977 X1978 X1979 X1980 X1981 X1982 X1983 X1984 X1985 X1986 X1987 X1988 X1989
## 1    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 2    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 3    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 4    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 5    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
##   X1990 X1991 X1992 X1993 X1994 X1995 X1996  X1997 X1998 X1999 X2000 X2001
## 1    NA    NA    NA    NA    NA    NA    NA     NA    NA    NA    NA    NA
## 2    NA    NA    NA    NA    NA    NA    NA     NA    NA    NA    NA    NA
## 3    NA    NA    NA    NA    NA    NA    NA     NA    NA    NA    NA    NA
## 4    NA    NA    NA    NA    NA    NA    NA     NA    NA    NA    NA    NA
## 5    NA    NA    NA    NA    NA    NA    NA 0.9163    NA    NA    NA    NA
##   X2002 X2003  X2004 X2005 X2006  X2007  X2008  X2009     X2010 X2011 X2012
## 1    NA    NA     NA    NA    NA     NA     NA     NA        NA    NA    NA
## 2    NA    NA     NA    NA    NA     NA     NA     NA 0.5996272    NA    NA
## 3    NA    NA     NA 0.582  0.44 0.4956 0.4971 0.6078        NA    NA    NA
## 4    NA    NA     NA    NA    NA     NA     NA     NA 0.9105157    NA    NA
## 5    NA    NA 0.9854    NA    NA     NA     NA 1.3144        NA    NA    NA
##    X2013  X2014  X2015  X2016  X2017    X2018 X2019 X2020 X2021  X
## 1     NA     NA     NA     NA     NA       NA    NA    NA    NA NA
## 2     NA     NA     NA     NA     NA 1.292536    NA    NA    NA NA
## 3 0.2495 0.1476 0.1299 0.1482 0.1755 0.446100    NA    NA    NA NA
## 4     NA     NA     NA     NA     NA 1.197929    NA    NA    NA NA
## 5     NA     NA     NA     NA     NA 0.407500    NA    NA    NA NA
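The empty trailing X column in the output above appears to come from a trailing comma at the end of each row of the CSV. As a minimal sketch (the inline rows and country names below are invented for illustration and are not taken from the dataset), columns that contain only missing values can be dropped right after reading:

```r
# Toy stand-in for a World Bank style CSV: three preamble lines, a header,
# and a trailing comma on every row. All values here are invented.
raw_text <- paste(
  "Preamble line one,,",
  "Preamble line two,,",
  "Preamble line three,,",
  "Country Name,Country Code,2017,2018,",
  "Aland,AAA,1.5,2.0,",
  "Borduria,BBB,,0.4,",
  sep = "\n"
)

# skip = 3 jumps over the preamble, as in the read.csv() call above
toy <- read.csv(text = raw_text, skip = 3)
names(toy)

# Keep only the columns that contain at least one non-missing value
toy_clean <- toy[, colSums(!is.na(toy)) > 0, drop = FALSE]
names(toy_clean)
```

Dropping by content rather than by name avoids hard-coding the X column, at the cost of also removing any year column that happens to be entirely empty.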

Display column names.

The following code returns the column names of the dataset, which is important for understanding the data and performing data analysis. By displaying the column names, we can get an idea of what kind of information is included in the dataset and what each column represents.

colnames(dataset)

##  [1] "Country.Name"   "Country.Code"   "Indicator.Name" "Indicator.Code"
##  [5] "X1960"          "X1961"          "X1962"          "X1963"         
##  [9] "X1964"          "X1965"          "X1966"          "X1967"         
## [13] "X1968"          "X1969"          "X1970"          "X1971"         
## [17] "X1972"          "X1973"          "X1974"          "X1975"         
## [21] "X1976"          "X1977"          "X1978"          "X1979"         
## [25] "X1980"          "X1981"          "X1982"          "X1983"         
## [29] "X1984"          "X1985"          "X1986"          "X1987"         
## [33] "X1988"          "X1989"          "X1990"          "X1991"         
## [37] "X1992"          "X1993"          "X1994"          "X1995"         
## [41] "X1996"          "X1997"          "X1998"          "X1999"         
## [45] "X2000"          "X2001"          "X2002"          "X2003"         
## [49] "X2004"          "X2005"          "X2006"          "X2007"         
## [53] "X2008"          "X2009"          "X2010"          "X2011"         
## [57] "X2012"          "X2013"          "X2014"          "X2015"         
## [61] "X2016"          "X2017"          "X2018"          "X2019"         
## [65] "X2020"          "X2021"          "X"

Merging columns into one and display.

In this code block, I have merged multiple columns of the dataset. This transformation has allowed me to convert the dataset from a wide format to a long format, making it easier to work with and visualize.

df<-pivot_longer(dataset,c("X1960","X1961","X1962","X1963","X1964","X1965","X1966","X1967","X1968","X1969","X1970","X1971","X1972","X1973","X1974","X1975","X1976","X1977","X1978","X1979","X1980","X1981","X1982","X1983","X1984","X1985","X1986","X1987","X1988","X1989","X1990","X1991","X1992","X1993","X1994","X1995","X1996","X1997","X1998","X1999","X2000","X2001","X2002","X2003","X2004","X2005","X2006","X2007","X2008","X2009","X2010","X2011","X2012","X2013","X2014","X2015","X2016","X2017","X2018","X2019","X2020","X2021","X"), values_to = "Value")
head(df,5)

## # A tibble: 5 × 6
##   Country.Name Country.Code Indicator.Name            Indicator.Code name  Value
##   <chr>        <chr>        <chr>                     <chr>          <chr> <dbl>
## 1 Aruba        ABW          Nurses and midwives (per… SH.MED.NUMW.P3 X1960    NA
## 2 Aruba        ABW          Nurses and midwives (per… SH.MED.NUMW.P3 X1961    NA
## 3 Aruba        ABW          Nurses and midwives (per… SH.MED.NUMW.P3 X1962    NA
## 4 Aruba        ABW          Nurses and midwives (per… SH.MED.NUMW.P3 X1963    NA
## 5 Aruba        ABW          Nurses and midwives (per… SH.MED.NUMW.P3 X1964    NA
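Listing every year column by name is easy to get wrong. As a sketch under stated assumptions (toy_wide and its values are invented, and the year columns are assumed to follow the X-plus-four-digits naming shown above), the columns can instead be selected by pattern, which also keeps the empty X column out of the pivot:

```r
library(tidyr)

# Invented wide-format data with two year columns and an empty X column
toy_wide <- data.frame(
  Country.Code = c("AAA", "BBB"),
  X2017 = c(1.5, NA),
  X2018 = c(2.0, 0.4),
  X = c(NA, NA)
)

# Pivot only the columns named "X" followed by exactly four digits
toy_long <- pivot_longer(
  toy_wide,
  cols = matches("^X[0-9]{4}$"),
  names_to = "name",
  values_to = "Value"
)
toy_long
```

The pattern-based selection keeps working if the source file later gains or loses year columns.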

The code below creates a column called “years” from the name column. Then, the “Country.Name”, “Country.Code”, “Indicator.Name”, “Indicator.Code”, “years”, and “Value” columns are selected and arranged in the order shown. Note that df has no years column at the time paste() is called, so R emits the warning shown below, and the new labels come out as the original names with a trailing space added, which is presumably why the tibble prints them in quotes.

df$years <- paste(df$name, df$years, sep = " ")

## Warning: Unknown or uninitialised column: `years`.

df <- df[c("Country.Name","Country.Code","Indicator.Name","Indicator.Code", "years", "Value")]
head(df,5)

## # A tibble: 5 × 6
##   Country.Name Country.Code Indicator.Name            Indicator.Code years Value
##   <chr>        <chr>        <chr>                     <chr>          <chr> <dbl>
## 1 Aruba        ABW          Nurses and midwives (per… SH.MED.NUMW.P3 "X19…    NA
## 2 Aruba        ABW          Nurses and midwives (per… SH.MED.NUMW.P3 "X19…    NA
## 3 Aruba        ABW          Nurses and midwives (per… SH.MED.NUMW.P3 "X19…    NA
## 4 Aruba        ABW          Nurses and midwives (per… SH.MED.NUMW.P3 "X19…    NA
## 5 Aruba        ABW          Nurses and midwives (per… SH.MED.NUMW.P3 "X19…    NA
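For plotting over time, numeric years are usually more convenient than labels such as X1960. A minimal sketch, using a hypothetical vector of labels in the same shape as the years column above:

```r
# Hypothetical labels; the trailing spaces stand in for stray whitespace
labels <- c("X1960 ", "X1961 ", "X2021 ")

# Trim whitespace, drop the leading "X", and convert to integer
years_num <- as.integer(sub("^X", "", trimws(labels)))
years_num
```

Applied to the data frame, the same expression could be assigned to a new column so that the original labels remain available.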

#Summary

The summary() function provides a quick summary of the dataset, including the minimum, maximum, median, mean, and quartile values of the Value column, along with its count of missing values.

summary (df)

##  Country.Name       Country.Code       Indicator.Name     Indicator.Code    
##  Length:16758       Length:16758       Length:16758       Length:16758      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##     years               Value       
##  Length:16758       Min.   : 0.048  
##  Class :character   1st Qu.: 1.504  
##  Mode  :character   Median : 4.035  
##                     Mean   : 4.855  
##                     3rd Qu.: 6.958  
##                     Max.   :22.308  
##                     NA's   :13903
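The NA count in the summary can be turned into a missing-data share with simple arithmetic. The sketch below uses only the two totals printed above; the toy data frame at the end is invented to show the same calculation on a column.

```r
total_rows   <- 16758  # Length reported by summary(df)
missing_rows <- 13903  # NA's reported for Value

# Share of rows without a reported value, in percent
missing_pct <- round(100 * missing_rows / total_rows, 1)
missing_pct

# The same calculation on a column, shown on invented values
toy <- data.frame(Value = c(1.2, NA, NA, 0.4))
missing_share <- mean(is.na(toy$Value))
missing_share
```

Roughly 83% of the country-year rows have no reported value, which is worth keeping in mind when reading the charts that follow.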

#Data Exploration and Visualization

The following code generates a bar chart with ggplot2 that shows, for each Country Code, the number of years for which a value was reported. The dataset is grouped by Country Code, and summarise() counts the non-missing values in each group. Counting with n() would instead count every row, missing or not, and give every country the same total. Note that this count measures how many years of data each country has, not the size of the shortage itself. In the chart, reorder() sorts the bars by count, geom_col() draws each bar at its counted value (equivalent to geom_bar() with stat set to “identity”), and the fill parameter is set to “blue” to set the bar color. The labs function adds a title and labels the x and y axes, coord_flip() flips the chart so that the Country Codes run along the vertical axis, and geom_text() prints each count next to its bar.

league <- df %>% 
  group_by(Country.Code) %>%
  # Count only the years that have a reported (non-missing) value
  summarise(Value = sum(!is.na(Value)))

# Visualization using ggplot 
ggplot(league, aes(x = reorder(Country.Code, Value), y = Value)) +
  geom_col(fill = "blue") +
  labs(title = "Years with reported data by Country.Code",
       x = "Country.Code", y = "Number of years with data") +
  coord_flip() +
  geom_text(aes(label = Value), hjust = -0.2, size = 3, color = "black")
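With one bar per country code, the chart becomes crowded. One possible refinement, sketched here on invented data (the column name years_reported and the cutoff n = 2 are assumptions for illustration), is to plot only the codes with the most reported years:

```r
library(dplyr)
library(ggplot2)

# Invented long-format data: three codes with three years each
toy <- data.frame(
  Country.Code = rep(c("AAA", "BBB", "CCC"), each = 3),
  Value = c(1.2, NA, 1.4, NA, NA, 0.5, 2.0, 2.1, 2.2)
)

coverage <- toy %>%
  group_by(Country.Code) %>%
  summarise(years_reported = sum(!is.na(Value))) %>%  # non-missing years per code
  slice_max(years_reported, n = 2, with_ties = FALSE) # keep the two best-covered codes

p_top <- ggplot(coverage,
                aes(x = reorder(Country.Code, years_reported), y = years_reported)) +
  geom_col(fill = "blue") +
  coord_flip() +
  labs(x = "Country.Code", y = "Number of years with data")
```

On the full dataset a larger cutoff, such as the top 20 or 30 codes, would likely be a more sensible starting point.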

The code below creates a scatter plot with ggplot2 to show how the reported values are distributed across country codes. The x-axis represents the country codes, and the y-axis represents the number of nurses and midwives per 1,000 people. Each point is one country-year observation. The alpha parameter of the geom_point function is set to 0.5, which makes the points 50% transparent so that overlapping observations remain visible. The labs function is used to add a title and axis labels to the plot. Rows with missing values are dropped from the plot, as the warning below indicates.

# Create a scatter plot using ggplot
ggplot(df, aes(x = Country.Code, y = Value)) +
  geom_point(alpha = 0.5) +
  labs(title = "Nurses and midwives per 1,000 people vs. Country.Code",
       x = "Country.Code",
       y = "Nurses and midwives (per 1,000 people)")

## Warning: Removed 13903 rows containing missing values (`geom_point()`).
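Because most country-year cells are empty, comparing countries on their most recent reported value can be easier to read than plotting every observation. A minimal sketch on invented data, assuming a numeric year column:

```r
library(dplyr)

# Invented long-format data with gaps in the most recent years
toy <- data.frame(
  Country.Code = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  year  = c(2016L, 2017L, 2018L, 2016L, 2017L, 2018L),
  Value = c(1.2, 1.4, NA, 0.5, NA, NA)
)

latest <- toy %>%
  filter(!is.na(Value)) %>%     # keep reported values only
  group_by(Country.Code) %>%
  slice_max(year, n = 1) %>%    # most recent reported year per code
  ungroup()
latest
```

The year column is kept in the result so that readers can see how old each country's latest figure is.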

The following code creates an interactive scatter plot with Plotly, which provides an interactive way to explore the dataset and look for patterns or trends. The scatter plot shows the Value column against the Country.Code column, where each point is one country-year observation colored by its value. Hovering over a point displays the name of the country, the Country.Code, and the value. The title of the scatter plot is “Nurses and midwives per 1,000 people vs. Country.Code,” and the labels for the x-axis and y-axis are “Country.Code” and “Nurses and midwives (per 1,000 people),” respectively.

# Create an interactive scatter plot using Plotly
plot_ly(df, x = ~Country.Code, y = ~Value, color = ~Value,
        text = ~Country.Name,  # supplies %{text} in the hover template
        type = "scatter", mode = "markers",
        hovertemplate = paste("Country: %{text}",
                              "<br>Country.Code: %{x}",
                              "<br>Value: %{y:.2f}")) %>%
  layout(title = "Nurses and midwives per 1,000 people vs. Country.Code",
         xaxis = list(title = "Country.Code"),
         yaxis = list(title = "Nurses and midwives (per 1,000 people)"),
         hovermode = "closest")

## Warning: Ignoring 13903 observations
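Since the project uses both ggplot2 and plotly, an existing ggplot can also be made interactive with plotly's ggplotly() function. A minimal sketch on invented data:

```r
library(ggplot2)
library(plotly)

# Invented values for three hypothetical country codes
toy <- data.frame(
  Country.Code = c("AAA", "BBB", "CCC"),
  Value = c(1.4, 0.5, 2.2)
)

p_static <- ggplot(toy, aes(x = Country.Code, y = Value)) +
  geom_point(alpha = 0.5) +
  labs(x = "Country.Code", y = "Nurses and midwives (per 1,000 people)")

# Convert the static ggplot into an interactive plotly widget
p_interactive <- ggplotly(p_static)
```

This route reuses the ggplot2 styling, while plot_ly() gives finer control over hover text and layout.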

The following interactive map shows the nursing shortage data by country and year, measured as the number of nurses and midwives per 1,000 people. It suggests that the nursing shortage is a common issue across many countries, with some countries reporting low values over longer periods than others.

# Create the map; Country.Code holds ISO-3 codes, so use the "ISO-3" location mode
p <- plot_geo(df, locationmode = "ISO-3") %>%
  add_trace(z = ~Value, locations = ~Country.Code, frame = ~years,
            color = ~Value)
# Export as an HTML file
htmlwidgets::saveWidget(p, file = "map.html")

## Warning: Ignoring 13903 observations

## Warning: Ignoring 13903 observations
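The World Bank file mixes countries with regional aggregates such as AFE and AFW (visible in the first rows of the dataset), and aggregate codes do not correspond to a single country on the map. One way to keep only country codes, sketched here on invented values, is to compare against the ISO 3166 alpha-3 codes in the iso3166 table that ships with the maps package. Whether that table covers every country code in this dataset is an assumption that would need checking.

```r
library(maps)

# Invented values; AFE and AFW are regional aggregate codes in the dataset
toy <- data.frame(
  Country.Code = c("AFG", "AFE", "AGO", "AFW"),
  Value = c(0.4, 1.3, 0.4, 1.2)
)

# Keep rows whose code appears among the ISO 3166 alpha-3 codes
toy_countries <- toy[toy$Country.Code %in% maps::iso3166$a3, , drop = FALSE]
toy_countries$Country.Code
```

Filtering before mapping also keeps regional averages out of the color scale.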

#Conclusion

Many countries are currently experiencing a shortage of nurses and midwives, which results in inadequate healthcare services, increased workload and burnout for existing nurses, and negative health outcomes for patients.

To illustrate the severity of this issue, I have used reliable data sources and robust visualization tools such as the ggplot and plotly libraries to create interactive visualizations that allow for exploring the data and gaining a better understanding of the impact of the nursing shortage on healthcare systems worldwide.

My aim in presenting this data is to raise awareness of the nursing shortage and its impact on healthcare systems globally. Additionally, I hope to provide insights and recommendations for addressing this critical issue. Through policy decisions and investments in the nursing workforce, we can improve healthcare outcomes and provide better care for patients worldwide.

#References

https://www.analytics-tuts.com/map-with-time-slider-using-plotly-in-r/

https://ggplot2.tidyverse.org/index.html

https://dplyr.tidyverse.org/index.html

https://plotly.com/r/line-and-scatter/

IvanTikhonov

2023-04-20