--- 
title: "Sai Lasya Navoor"
output: html_document
---

#Importing data

data <- read.csv("C:\\Users\\91814\\Desktop\\Statistics\\nurses.csv")
# Install and load ggplot2
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)

#Column 1 summary

col1_summary <- summary(data$Total_Employed_RN)

print(col1_summary)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     240   12210   31160   47704   60230  307060       5

The summary statistics for Total_Employed_RN describe how the number of employed Registered Nurses is distributed across the records in the dataset:

  • Min: 240 (fewest employed RNs in any record)

  • 1st Qu.: 12,210 (25% of records fall below this value)

  • Median: 31,160 (middle value)

  • Mean: 47,704 (average across records)

  • 3rd Qu.: 60,230 (75% of records fall below this value)

  • Max: 307,060 (most employed RNs in any record)

  • NA's: 5 missing values

Significance: The mean (47,704) is well above the median (31,160), so the distribution is right-skewed: a small number of records with very large RN workforces pull the average up.

Further Questions:

  1. Outliers: Are there states with exceptionally high or low RN employment?

  2. Regional Disparities: Are there regional patterns in RN distribution?

  3. Healthcare Impact: How does RN variation correlate with healthcare services?

  4. Temporal Trends: Are there trends in RN employment over different years?
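The outlier question can be given a first answer from the printed summary alone. The sketch below assumes the common 1.5 × IQR rule, which is a convention and not something the original analysis applied, and uses only the quartiles reported above.

```r
# Quartiles and maximum copied from the summary() output for Total_Employed_RN
q1 <- 12210
q3 <- 60230
max_val <- 307060

iqr <- q3 - q1                  # interquartile range
upper_fence <- q3 + 1.5 * iqr   # values above this are flagged as high outliers
lower_fence <- q1 - 1.5 * iqr   # negative here, so no count can fall below it

cat("IQR:", iqr, "\n")
cat("Upper fence:", upper_fence, "\n")
cat("Maximum exceeds upper fence:", max_val > upper_fence, "\n")
```

Under that rule the maximum of 307,060 lies well above the upper fence of 132,260, so at least one record would be flagged as a high outlier.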

#Column 2 summary

col2_summary <- summary(data$Annual_Salary_Avg) 
print(col2_summary)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   19190   49300   58750   59248   67378  120560       6

The summary statistics for Annual_Salary_Avg describe the distribution of average annual salaries for Registered Nurses:

  • Min: 19,190

  • 1st Qu.: 49,300

  • Median: 58,750

  • Mean: 59,248

  • 3rd Qu.: 67,378

  • Max: 120,560

  • NA's: 6 missing values

Significance: The mean and median are close (59,248 vs. 58,750), so the bulk of the salary distribution is roughly symmetric, while the maximum of 120,560 is about twice the median and points to a long upper tail.

Further Questions:

  1. Regional Disparities: Are there regional patterns in RN average annual salaries?

  2. Correlation with Employment: How does salary variation correlate with the total number of employed RNs?

  3. Workforce Impact: How do salary levels influence RN availability and retention?

  4. Factors Affecting Salaries: What state-specific factors contribute to the wide range of RN salaries?
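The correlation question can be explored with `cor()`, taking care with the missing values that both columns contain. This is a minimal sketch on a made-up data frame (`toy` and all of its values are hypothetical stand-ins for nurses.csv), not a result from the real data.

```r
# Hypothetical toy data standing in for nurses.csv; the values are invented
toy <- data.frame(
  Total_Employed_RN = c(1000, 5000, NA, 20000, 60000, 150000),
  Annual_Salary_Avg = c(48000, 52000, 55000, NA, 61000, 75000)
)

# use = "complete.obs" drops every row in which either column is NA
r <- cor(toy$Total_Employed_RN, toy$Annual_Salary_Avg, use = "complete.obs")

# Number of rows that actually entered the calculation
n_used <- sum(complete.cases(toy))

cat("Rows used:", n_used, "\n")
cat("Pearson correlation:", round(r, 3), "\n")
```

Because Total_Employed_RN is right-skewed, passing `method = "spearman"` to `cor()` may be a more robust choice than the default Pearson coefficient.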

#Categorical column summary

# table() already returns each state with its count. Pairing it with unique()
# misaligns the rows, because unique() keeps order of appearance while
# table() sorts alphabetically.
col3_val_count <- as.data.frame(table(data$State))
names(col3_val_count) <- c("State", "Count")

cat("Categorical Summary for State :\n") 
## Categorical Summary for State :
print(col3_val_count)
##                   State Count
## 1               Alabama    23
## 2                Alaska    23
## 3               Arizona    23
## 4              Arkansas    23
## 5            California    23
## 6              Colorado    23
## 7           Connecticut    23
## 8              Delaware    23
## 9  District of Columbia    23
## 10              Florida    23
## 11              Georgia    23
## 12                 Guam    23
## 13               Hawaii    23
## 14                Idaho    23
## 15             Illinois    23
## 16              Indiana    23
## 17                 Iowa    23
## 18               Kansas    23
## 19             Kentucky    23
## 20            Louisiana    23
## 21                Maine    23
## 22             Maryland    23
## 23        Massachusetts    23
## 24             Michigan    23
## 25            Minnesota    23
## 26          Mississippi    23
## 27             Missouri    23
## 28              Montana    23
## 29             Nebraska    23
## 30               Nevada    23
## 31        New Hampshire    23
## 32           New Jersey    23
## 33           New Mexico    23
## 34             New York    23
## 35       North Carolina    23
## 36         North Dakota    23
## 37                 Ohio    23
## 38             Oklahoma    23
## 39               Oregon    23
## 40         Pennsylvania    23
## 41          Puerto Rico    23
## 42         Rhode Island    23
## 43       South Carolina    23
## 44         South Dakota    23
## 45            Tennessee    23
## 46                Texas    23
## 47                 Utah    23
## 48              Vermont    23
## 49       Virgin Islands    23
## 50             Virginia    23
## 51           Washington    23
## 52        West Virginia    23
## 53            Wisconsin    23
## 54              Wyoming    23

The summary for the variable State shows that each of the 54 jurisdictions (50 states, the District of Columbia, Guam, Puerto Rico, and the Virgin Islands) appears exactly 23 times, for 1,242 rows in total.

Significance: The dataset is balanced across jurisdictions, so no state or territory is over- or under-represented when records are pooled.

Further Questions:

  1. Regional Patterns: Are there distinct regional patterns in RN employment and salaries?

  2. Territorial Influence: How do territories and the District of Columbia impact overall healthcare workforce trends?

  3. State-specific Analyses: Should certain analyses be tailored to specific states based on unique characteristics?

  4. Temporal Trends: How have the representation and characteristics of states evolved over different years?

#Three novel questions to investigate

  1. What are the variations in average and median hourly wages for Registered Nurses (RN) across different states in 2020?

  2. How does the total number of employed Registered Nurses vary among states in 2020?

  3. Is there a strong correlation between the hourly wage and annual salary for RNs across all states?

    Hourly Wage Variations:

    • Insight: Significant variations in average and median hourly wages for RNs across states in 2020.

    • Significance: Essential for assessing financial landscapes, aiding workforce planning, and attracting healthcare professionals.

    • Further Questions:

      • Are there specific states with significantly higher or lower hourly wages?

      • What factors contribute to the observed variations?

    RN Employment Variability:

    • Insight: Disparities in the total number of employed RNs across states in 2020.

    • Significance: Crucial for healthcare resource allocation, addressing shortages, and optimizing workforce distribution.

    • Further Questions:

      • Are there states experiencing shortages or surpluses of RNs?

      • What regional factors contribute to the observed variability?

    Hourly Wage vs. Annual Salary Correlation:

    • Insight: Examining the correlation between hourly wages and annual salaries for RNs across states.

    • Significance: Helps assess how changes in hourly wages may impact annual salaries, aiding both employers and employees.

    • Further Questions:

      • Are there states with notably strong or weak correlations?

      • What external factors contribute to observed correlation patterns?
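Before reading much into the wage-salary correlation, it is worth checking whether annual salary is simply a fixed multiple of hourly wage. The sketch below assumes the figures follow the U.S. Bureau of Labor Statistics convention of annual wage = hourly wage × 2,080 hours; the data source is not stated in this report, so that is an assumption, and `toy` is a hypothetical data frame with invented values.

```r
# Hypothetical toy data; the annual figures are built as hourly * 2080
toy <- data.frame(
  Hourly_Wage_Avg   = c(25.10, 30.25, 35.40, NA, 48.75),
  Annual_Salary_Avg = c(52208, 62920, 73632, 80000, 101400)
)

# Keep only rows where both values are present
ok <- complete.cases(toy)

# Implied number of paid hours per year for each complete row
ratio <- toy$Annual_Salary_Avg[ok] / toy$Hourly_Wage_Avg[ok]

# Pearson correlation on the same complete rows
r <- cor(toy$Hourly_Wage_Avg, toy$Annual_Salary_Avg, use = "complete.obs")

print(ratio)
cat("Correlation:", r, "\n")
```

If the ratio in the real data is constant in the same way, a near-perfect correlation is true by construction and says little about differences between states.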

Overall Implications: These insights inform healthcare policies, workforce planning, and financial considerations for RNs across states, contributing to more informed decision-making and efficient healthcare delivery. Further investigations into specific states and long-term trends can deepen our understanding of nursing workforce dynamics.

#Aggregate function for Question 2

total_employed_rn_by_state <- aggregate(Total_Employed_RN ~ State, data = data, FUN = sum)
print(total_employed_rn_by_state)
##                   State Total_Employed_RN
## 1               Alabama            946180
## 2                Alaska            121130
## 3               Arizona            937700
## 4              Arkansas            503920
## 5            California           5566540
## 6              Colorado            902540
## 7           Connecticut            767150
## 8              Delaware            207590
## 9  District of Columbia            222020
## 10              Florida           3526190
## 11              Georgia           1461450
## 12                 Guam             10070
## 13               Hawaii            219830
## 14                Idaho            253310
## 15             Illinois           2569850
## 16              Indiana           1309490
## 17                 Iowa            698910
## 18               Kansas            593800
## 19             Kentucky            926360
## 20            Louisiana            909870
## 21                Maine            312040
## 22             Maryland           1122460
## 23        Massachusetts           1825050
## 24             Michigan           1979710
## 25            Minnesota           1272460
## 26          Mississippi            606990
## 27             Missouri           1394100
## 28              Montana            194690
## 29             Nebraska            434100
## 30               Nevada            374640
## 31        New Hampshire            290730
## 32           New Jersey           1756880
## 33           New Mexico            305610
## 34             New York           3882230
## 35       North Carolina           1888280
## 36         North Dakota            171380
## 37                 Ohio           2672810
## 38             Oklahoma            592930
## 39               Oregon            681800
## 40         Pennsylvania           2934050
## 41          Puerto Rico            379720
## 42         Rhode Island            264480
## 43       South Carolina            838800
## 44         South Dakota            237720
## 45            Tennessee           1283370
## 46                Texas           3935710
## 47                 Utah            402360
## 48              Vermont            135440
## 49       Virgin Islands              7480
## 50             Virginia           1323510
## 51           Washington           1146690
## 52        West Virginia            405170
## 53            Wisconsin           1202980
## 54              Wyoming             99430

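Insights: The aggregation sums Total_Employed_RN over all 23 records for each state, so these totals are cumulative across the whole dataset rather than 2020 headcounts. For example, California's total of 5,566,540 far exceeds the largest single-record value of 307,060 reported in the column summary. The totals still rank states by overall RN workforce size.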

Significance:

  1. Workforce Magnitude: Highlights states with larger or smaller RN populations.

  2. Healthcare Capacity: Crucial for assessing each state’s healthcare workforce capacity.

  3. Resource Planning: Aids policymakers in resource allocation and targeted interventions.

Further Questions:

  1. Regional Patterns: Are there regional disparities in RN distribution?

  2. Healthcare Impact: How does RN workforce variation correlate with healthcare service quality?

  3. Influencing Factors: What contributes to differences in RN workforce size across states?

  4. Temporal Trends: How has RN distribution evolved over different years?
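To answer Question 2 for 2020 specifically, the records would need to be filtered to that year before comparing states. The sketch below assumes the dataset has a `Year` column, which is not shown anywhere in this report, and uses a hypothetical `toy` data frame with invented values.

```r
# Hypothetical toy data; `Year` is an assumed column name
toy <- data.frame(
  State = c("Alabama", "Alabama", "Alaska", "Alaska", "Guam", "Guam"),
  Year = c(2019, 2020, 2019, 2020, 2019, 2020),
  Total_Employed_RN = c(49000, 48850, 6000, 6240, 500, NA)
)

# Keep only the 2020 rows: one row per state, so no summing is needed
rn_2020 <- subset(toy, Year == 2020, select = c(State, Total_Employed_RN))

# Sort from largest to smallest workforce; NA values are placed last
rn_2020 <- rn_2020[order(-rn_2020$Total_Employed_RN), ]
print(rn_2020)

# For contrast: summing over every year inflates the totals
all_years <- aggregate(Total_Employed_RN ~ State, data = toy, FUN = sum)
print(all_years)
```

Note that the formula interface of `aggregate()` silently drops rows with a missing Total_Employed_RN, whereas the filtered table keeps them visible as NA.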

#Visual Summaries for Distribution of Hourly Wages and Annual Salaries:

#Boxplot for Hourly Wages

boxplot(data$Hourly_Wage_Avg, main="Boxplot for Hourly Wages", ylab="Hourly Wage", col="skyblue")

Insights: The boxplot shows the distribution of average hourly wages for Registered Nurses (RNs) pooled over all records in the dataset. The data are not filtered to 2020, so each state contributes up to 23 values.

Significance:

  1. Variability Highlighted: The plot reveals the range and central tendency of hourly wages, identifying potential outliers.

  2. Outlier Identification: Records with exceptionally high or low wages appear as points beyond the whiskers, although the plot itself does not label which states they belong to.

  3. Financial Landscape Snapshot: A visual representation aids in understanding how hourly wages vary, crucial for assessing the financial aspects of RN positions.

Further Questions:

  1. Factors Behind Outliers: What contributes to the extreme values in hourly wages in certain states?

  2. Regional Patterns: Are there discernible regional patterns in RN hourly wage distributions?

  3. Cost of Living Comparison: How does the wage distribution align with the cost of living in different states?

  4. Temporal Trends: How has the distribution evolved over different years?
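A boxplot draws outliers but does not name them. One way to recover which records they are is `boxplot.stats()`, which applies the same rule the plot uses. This is a minimal sketch on a hypothetical `toy` data frame; the state labels and wages are invented.

```r
# Hypothetical toy data; labels and wages are invented
toy <- data.frame(
  State = paste("State", LETTERS[1:8]),
  Hourly_Wage_Avg = c(20, 25, 27, 30, 31, 33, 35, 60)
)

# stats: lower whisker, lower hinge, median, upper hinge, upper whisker
# out:   values lying beyond 1.5 times the hinge spread
bp <- boxplot.stats(toy$Hourly_Wage_Avg)
print(bp$stats)
print(bp$out)

# Look up the rows that produced the flagged values
flagged <- toy[toy$Hourly_Wage_Avg %in% bp$out, ]
print(flagged)
```

Applied to the real data, the same lookup would list the state (and, if available, the year) behind each point drawn beyond the whiskers.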

#Barplot for Annual Salaries:

barplot(height = data$Annual_Salary_Avg, names.arg = data$State, 
        main = "Barplot for Annual Salaries", xlab = "State", ylab = "Annual Salary",
        col = "lightcoral", border = "black", space = 0.5)

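Insights: The barplot draws one bar per row of the dataset, so each state appears up to 23 times along the x-axis rather than once. The values are not restricted to 2020.

Significance:

  1. Quick Comparison: Gives a rough visual sense of the spread in RN salaries, though with this many bars individual states are hard to read.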

  2. Disparity Identification: Highlights variations in salaries, indicating potential economic and workforce differences.

  3. Policy Considerations: Useful for policymakers in allocating resources and planning interventions based on states with specific salary challenges.

Further Questions:

  1. Regional Patterns: Are there noticeable regional trends in RN annual salary distributions?

  2. Workforce Impact: How do salary variations relate to RN availability and retention in different states?

  3. External Factors: What external factors contribute to observed salary disparities?

  4. Temporal Analysis: How has the distribution of annual salaries changed over different years?
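A per-state comparison is easier to read when the data are first reduced to one value per state. The sketch below averages the salary within each state before plotting; it uses a hypothetical `toy` data frame with invented values, and averaging across records is an illustrative choice, not the method used above.

```r
# Hypothetical toy data; values are invented
toy <- data.frame(
  State = c("Alabama", "Alabama", "Alaska", "Alaska", "Arizona", "Arizona"),
  Annual_Salary_Avg = c(50000, 54000, 80000, NA, 70000, 74000)
)

# One mean per state; the formula interface drops rows with NA salaries
mean_salary <- aggregate(Annual_Salary_Avg ~ State, data = toy, FUN = mean)

# Sort ascending so the bars read from lowest to highest salary
mean_salary <- mean_salary[order(mean_salary$Annual_Salary_Avg), ]

# las = 2 rotates the state labels so they do not overlap
barplot(height = mean_salary$Annual_Salary_Avg,
        names.arg = mean_salary$State,
        main = "Mean Annual Salary by State", ylab = "Annual Salary",
        col = "lightcoral", border = "black", las = 2)
```

With the real data this yields 54 bars instead of 1,242, and the sorted order makes the highest- and lowest-paying states easy to spot.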

#Scatterplot of Hourly Wage vs. Annual Salary by State

# A single scatterplot (not a scatterplot matrix), with one point per record
wage_salary_scatter <- ggplot(data, aes(x = Hourly_Wage_Avg, y = Annual_Salary_Avg, color = State)) +
  geom_point() +
  labs(title = "Hourly Wage vs. Annual Salary by State", x = "Hourly Wage", y = "Annual Salary") +
  theme_minimal()
print(wage_salary_scatter)
## Warning: Removed 6 rows containing missing values (`geom_point()`).

Insights: The plot is a single scatterplot of average hourly wages against average annual salaries for Registered Nurses (RNs), with one point per record and each state color-coded. It covers all records rather than 2020 alone, and 6 rows with missing values are omitted.

Significance:

  1. State-wise Patterns: Color-coded points reveal variations in wage-salary relationships among states.

  2. Single-panel Comparison: Shows all states together in one static plot, so their wage-salary relationships can be compared directly.

  3. Cluster Identification: Clusters indicate groups of states with similar dynamics, hinting at regional trends.

Further Questions:

  1. Cluster Analysis: What factors contribute to distinct clusters in wage-salary relationships among states?

  2. Outlier Identification: Do specific states exhibit outlier behavior, and what factors explain such outliers?

  3. Regional Disparities: How do regional variations impact observed patterns?

  4. Temporal Analysis: How have state-wise relationships evolved over different years?

#Categorical variable with continuous variables

interaction_plot <- ggplot(data, aes(x = Hourly_Wage_Avg, y = Annual_Salary_Avg, color = State, shape = as.factor(Location_Quotient))) +
  geom_point() +
  labs(title = "Interactions between Categorical and Continuous Variables",
       x = "Hourly Wage", y = "Annual Salary", color = "State", shape = "Location Quotient") +
  theme_minimal()

print(interaction_plot)
## Warning: The shape palette can deal with a maximum of 6 discrete values because more
## than 6 becomes difficult to discriminate
## ℹ you have requested 130 values. Consider specifying shapes manually if you
##   need that many have them.
## Warning: Removed 1235 rows containing missing values (`geom_point()`).

Insights: The interaction plot relates hourly wages to annual salaries for Registered Nurses (RNs), using color for State and shape for Location Quotient (LQ). LQ is a continuous measure; converting it with as.factor() produces 130 distinct levels, far more than the 6 shapes ggplot2 supports, and the warnings show that 1,235 rows were dropped from the plot. The figure therefore displays only a small fraction of the data.

Significance:

  1. Two Grouping Variables: Attempts to show state and LQ influences on the wage-salary relationship at the same time.

  2. Visual Differentiation: Color indicates states, and shape represents LQ, facilitating trend identification.

  3. Static Overview: Presents both groupings in one static plot, though the dropped rows limit what can be concluded from it.

Further Questions:

  1. State-specific Trends: What trends emerge for states, and how do they relate to workforce dynamics?

  2. LQ Impact: How does LQ influence patterns, especially in states with notable variations?

  3. Policy Implications: How can policymakers use this information for targeted interventions?

  4. Temporal Changes: How have these interactions evolved over different years?
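One way to avoid the shape-palette warning is to group Location Quotient into a few bands before mapping it to shape. The sketch below uses `cut()` with cut points of 0.8 and 1.2; those thresholds, the band labels, and the `toy` data frame are all illustrative assumptions.

```r
# Hypothetical toy data; values are invented
toy <- data.frame(
  Hourly_Wage_Avg   = c(25, 30, 35, 40),
  Annual_Salary_Avg = c(52000, 62400, 72800, 83200),
  Location_Quotient = c(0.5, 0.9, 1.0, 1.3)
)

# Bin the continuous LQ into three bands (thresholds are arbitrary assumptions)
toy$LQ_Band <- cut(toy$Location_Quotient,
                   breaks = c(-Inf, 0.8, 1.2, Inf),
                   labels = c("Low", "Average", "High"))
print(table(toy$LQ_Band))

# Three shape levels stay well within ggplot2's limit of six
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(toy, ggplot2::aes(x = Hourly_Wage_Avg,
                                         y = Annual_Salary_Avg,
                                         shape = LQ_Band)) +
    ggplot2::geom_point() +
    ggplot2::labs(x = "Hourly Wage", y = "Annual Salary",
                  shape = "Location Quotient") +
    ggplot2::theme_minimal()
  print(p)
}
```

Rows with a missing Location_Quotient would still be dropped, so it is worth counting them with `sum(is.na(data$Location_Quotient))` before interpreting the plot.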