data <- read.csv("C:\\Users\\91814\\Desktop\\Statistics\\nurses.csv")
One of the most important continuous variables in the data set is “Annual_Salary_Median,” the median annual wage of registered nurses in each state for a given year. In the healthcare sector, salary is often a top priority for both employers and job seekers, and the median gives a helpful picture of the typical earning potential of registered nurses in each state. Note that the code in this analysis uses the related variable Annual_Salary_Avg (the average annual salary) as the response.
I’m choosing “State” as the categorical variable. Different states might have different average salaries for registered nurses due to factors like cost of living, demand for healthcare services, and state-specific regulations.
Null hypothesis (H0): The mean annual salary of registered nurses is the same across all states.
Alternative hypothesis (H1): The mean annual salary of registered nurses differs for at least one state.
# Load dplyr for data manipulation (aov() itself is part of base R's stats package)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
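# Treat State as a categorical variable
data$State <- as.factor(data$State)
# Merge selected states into regions; every other state keeps its own name.
# case_when() returns character values, which aov() converts back to a factor.
data <- data %>%
  mutate(State = case_when(
    State %in% c("California", "Oregon", "Washington") ~ "West Coast",
    State %in% c("Texas", "Louisiana", "Florida") ~ "Southern",
    State %in% c("New York", "Massachusetts", "Connecticut") ~ "Northeast",
    TRUE ~ as.character(State)
  ))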
# One-way ANOVA: does the mean salary differ across the State groups?
anova_result <- aov(Annual_Salary_Avg ~ State, data = data)
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## State 47 9.352e+10 1.990e+09 16.57 <2e-16 ***
## Residuals 1188 1.427e+11 1.201e+08
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 6 observations deleted due to missingness
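Degrees of Freedom (Df): The State term has 47 degrees of freedom. This is the number of groups minus one, so the State variable has 48 groups (individual states plus the three consolidated regions). The residuals have 1188 degrees of freedom.
Sum of Squares (Sum Sq): The between-group sum of squares (State) is 9.352e+10, and the within-group sum of squares (Residuals) is 1.427e+11.
Mean Square (Mean Sq): Each mean square is the sum of squares divided by its degrees of freedom: 1.990e+09 for State and 1.201e+08 for the residuals.
F value: The F value is 16.57, the ratio of the State mean square to the residual mean square.
Pr(>F): The p-value corresponding to the F statistic is <2e-16, or practically 0.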
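The entries in the ANOVA table are linked by simple arithmetic, which can be checked by hand. The sketch below recomputes the mean squares, the F value, and the p-value from the rounded figures printed in the table. The variable names are illustrative and not part of the original analysis, and small differences from the printed values are expected because the inputs are rounded.

```r
# Values copied from the printed ANOVA table (rounded as shown there)
ss_between <- 9.352e10   # Sum Sq for State
ss_within  <- 1.427e11   # Sum Sq for Residuals
df_between <- 47         # Df for State
df_within  <- 1188       # Df for Residuals

# Mean square = sum of squares / degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F statistic = between-group mean square / within-group mean square
f_value <- ms_between / ms_within

# The upper tail of the F distribution gives the p-value
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(f_value, 2)
p_value
```

A large F value means the variation between the State groups is large relative to the variation within them, which is why the p-value is so small here.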
The p-value (<2e-16) is far below the chosen significance level (e.g., 0.05), so we reject the null hypothesis. There is strong evidence that the average annual salary of registered nurses differs among the states (or regions, after consolidation). These results suggest that a registered nurse’s average annual salary is strongly associated with the state or region of employment. Healthcare institutions, legislators, and registered nurses themselves can use this information when making decisions about workforce planning, wage negotiation, and employment opportunities.
library(ggplot2)
ggplot(data, aes(x = State, y = Annual_Salary_Avg, fill = State)) +
  geom_boxplot() + # Add box plot
  labs(x = "State", y = "Annual Salary Average") +
  theme_minimal() + # Minimalist theme
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 6 rows containing non-finite values (`stat_boxplot()`).
# Calculate correlation coefficient between Location_Quotient and Annual_Salary_Avg, removing NA values
correlation <- cor(data$Location_Quotient, data$Annual_Salary_Avg, use = "complete.obs")
# Print correlation coefficient
correlation
## [1] -0.1251608
The negative correlation coefficient indicates that states with higher location quotients (that is, states where nursing employment is more concentrated than the national average) tend to have slightly lower average annual salaries for registered nurses. However, because the coefficient (about -0.13) is close to zero, the relationship is weak. It is also important to remember that correlation does not imply causation: nurse pay may be influenced by other factors that were not taken into account in this analysis.
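One way to make "weak" concrete is to square the correlation coefficient, which gives the share of variance in one variable that is linearly associated with the other. The short sketch below does this with the value printed above. The variable names are illustrative only.

```r
# Correlation coefficient copied from the output above
r <- -0.1251608

# r squared = proportion of variance shared through a linear relationship
r_squared <- r^2

round(r_squared, 5)          # as a proportion
round(100 * r_squared, 1)    # as a percentage
```

In a simple linear regression with one predictor, this quantity equals the Multiple R-squared, which is why the regression summary later in this analysis reports 0.01567.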
lm_model <- lm(Annual_Salary_Avg ~ Location_Quotient, data = data)
summary(lm_model)
##
## Call:
## lm(formula = Annual_Salary_Avg ~ Location_Quotient, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37410 -7303 -1797 6725 51506
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 75938 2655 28.601 < 2e-16 ***
## Location_Quotient -7913 2582 -3.064 0.00228 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11810 on 590 degrees of freedom
## (650 observations deleted due to missingness)
## Multiple R-squared: 0.01567, Adjusted R-squared: 0.014
## F-statistic: 9.39 on 1 and 590 DF, p-value: 0.002282
(Intercept): The estimated intercept is 75938. This is the predicted average annual salary for registered nurses when the location quotient is zero. A location quotient of zero would mean no nursing employment at all, so the intercept is an extrapolation that anchors the fitted line rather than a realistic scenario. A location quotient of 1 corresponds to the national average concentration of nursing employment, where the model predicts roughly $75,938 - $7,913 = $68,025.
Location_Quotient (Slope): The estimated slope is -7913. This coefficient shows how the predicted average annual salary of registered nurses changes with a one-unit increase in the location quotient: it falls by about $7,913. Equivalently, an increase of 0.1 in the location quotient corresponds to a predicted decrease of about $791.
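To see what the intercept and slope imply together, the fitted line can be evaluated at a few location quotients. The sketch below uses the rounded coefficients from the summary above. The function name and the chosen location quotients are hypothetical and serve only as an illustration.

```r
# Coefficients copied from the regression summary above (rounded)
intercept <- 75938
slope     <- -7913

# Fitted line: predicted salary = intercept + slope * location quotient
predict_salary <- function(location_quotient) {
  intercept + slope * location_quotient
}

# Hypothetical location quotients, chosen for illustration only
lq_values <- c(0.8, 1.0, 1.2)

data.frame(
  Location_Quotient = lq_values,
  Predicted_Salary  = predict_salary(lq_values)
)
```

With the fitted model object available, `predict(lm_model, newdata = data.frame(Location_Quotient = lq_values))` should give the same predictions from the unrounded coefficients.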
Based on these coefficients, the location quotient and the average annual salary of registered nurses have a statistically significant negative relationship (p = 0.00228). Put another way, average pay tends to be lower where the concentration of nursing employment is higher relative to the national average. The relationship is weak, however: the R-squared of 0.01567 means the location quotient explains only about 1.6% of the variation in average salary, and the model uses only the 592 observations with both values present (650 were dropped due to missingness).
The regression results suggest that states with above-average concentrations of nursing employment may have somewhat lower average earnings for registered nurses. Because the location quotient explains so little of the variation, healthcare professionals or legislators interested in registered nurse salaries should also consider other factors, such as supply and demand, cost of living, and healthcare legislation, when making decisions about workforce planning or salary negotiations.
library(ggplot2)
ggplot(data, aes(x = Location_Quotient, y = Annual_Salary_Avg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Location Quotient", y = "Annual Salary Average") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 650 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 650 rows containing missing values (`geom_point()`).
Interpretation: The intercept represents the expected average annual salary when the location quotient is 0. The location quotient coefficient represents the expected change in the average annual salary for a one-unit increase in the location quotient. Because this is a simple regression with a single predictor, no other variables are held constant.
As the location quotient increases, the average annual salary tends to decrease. Locations with lower location quotients therefore tend to show higher average salaries, although the association is weak and does not by itself establish that choosing such a location would raise pay.
Insights: The scatter plot and the linear regression make the relationship between the location quotient and the average annual salary easier to understand. Fitting a regression model and plotting the data shows whether a linear relationship exists and quantifies how salary levels vary with the location quotient. The coefficients clarify this further: the intercept gives the predicted average salary at a location quotient of zero, and the slope gives the change in average salary per one-unit increase in the location quotient. The sign and magnitude of the slope can then inform practical, though tentative, recommendations about locations.
Further Investigation: First, investigating the impact of other variables on average salaries, such as industry type, cost of living, or education levels, could give a more thorough understanding of what drives salaries. Second, potential non-linear relationships between the location quotient and the average salary should be examined using polynomial or other regression models. Finally, examining the spatial distribution of location quotients and salaries may reveal regional differences that affect the relationship, which would call for geographic information systems (GIS) techniques or spatial regression models to identify spatial trends. Pursuing these directions would improve our understanding of the variables that affect average salaries and refine any recommendations about locations.
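As a starting point for the non-linear check suggested above, a quadratic fit can be compared with the straight-line fit on the same observations. The function below is a minimal sketch, not part of the original analysis: its name is hypothetical, and it assumes a data frame with numeric columns Annual_Salary_Avg and Location_Quotient, as used earlier.

```r
# Compare a straight-line fit with a quadratic fit on the same rows.
# Assumes `df` has numeric columns Annual_Salary_Avg and Location_Quotient.
compare_linear_quadratic <- function(df) {
  # Keep only complete rows so both models use identical observations
  cols <- c("Annual_Salary_Avg", "Location_Quotient")
  complete <- df[complete.cases(df[, cols]), ]

  linear_fit    <- lm(Annual_Salary_Avg ~ Location_Quotient, data = complete)
  quadratic_fit <- lm(Annual_Salary_Avg ~ poly(Location_Quotient, 2),
                      data = complete)

  # Nested-model F test: does adding the squared term improve the fit?
  comparison <- anova(linear_fit, quadratic_fit)

  list(
    linear_r2    = summary(linear_fit)$r.squared,
    quadratic_r2 = summary(quadratic_fit)$r.squared,
    p_value      = comparison[["Pr(>F)"]][2]
  )
}
```

Calling `compare_linear_quadratic(data)` would return the R-squared of each model and the p-value of the comparison. A small p-value would suggest that the squared term captures curvature that the straight line misses.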