data <- read.csv("C:\\Users\\91814\\Desktop\\Statistics\\nurses.csv")
data$Years_of_Experience <- data$Hourly_Wage_Avg / data$Hourly_10th_Percentile
pair1 <- data[c("Annual_Salary_Avg", "Years_of_Experience")]
data$Avg_Wage_Difference <- data$Hourly_Wage_Avg - data$Hourly_Wage_Median
pair3 <- data[c("Yearly_Total_Employed_Aggregate", "Avg_Wage_Difference")]
head(pair1)
## Annual_Salary_Avg Years_of_Experience
## 1 60230 1.395663
## 2 95270 1.454286
## 3 80380 1.396963
## 4 63640 1.425245
## 5 120560 1.582742
## 6 77860 1.394560
head(pair3)
## Yearly_Total_Employed_Aggregate Avg_Wage_Difference
## 1 1903210 0.77
## 2 296300 0.58
## 3 2835110 0.66
## 4 1177860 0.63
## 5 16430660 1.03
## 6 2578000 0.65
Insights:
A new variable, Years_of_Experience, is produced by dividing
Hourly_Wage_Avg by Hourly_10th_Percentile.
pair1 holds Annual_Salary_Avg and the newly constructed
Years_of_Experience column.
Significance:
Despite its name, Years_of_Experience is a ratio: it shows how many
times larger the average hourly wage is than the 10th-percentile hourly
wage. It is at best an indirect proxy, not a measured number of years.
Examining the connection between Years_of_Experience and
Annual_Salary_Avg may shed light on the effects of pay differences in
the dataset.
Further Questions:
What is the relationship between Years_of_Experience and
Annual_Salary_Avg in a correlation analysis? Does a greater annual
income correlate with more years of experience?
Years of Experience Distribution: What is the Years of Experience
distribution like? Are there anomalies in the data, and if so, do they
indicate any particular patterns?
Insights: A new variable, Avg_Wage_Difference, is generated by
subtracting Hourly_Wage_Median from Hourly_Wage_Avg.
pair3 holds Yearly_Total_Employed_Aggregate and the newly generated
Avg_Wage_Difference column.
Significance:
Avg_Wage_Difference is the gap between the average and median hourly
wages. A positive gap means the average sits above the median, which is
typical of a right-skewed wage distribution.
Studying this gap in conjunction with
Yearly_Total_Employed_Aggregate can reveal patterns in the wage
distribution.
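To make the reading of a mean-minus-median gap concrete, here is a minimal sketch on made-up wage figures. The vector `wages` is hypothetical and is not taken from nurses.csv; it only illustrates how a single high value pulls the mean above the median.

```r
# Hypothetical hourly wages (illustrative only, not from nurses.csv)
wages <- c(28, 30, 31, 32, 33, 35, 60)

mean_wage   <- mean(wages)    # pulled upward by the single high value
median_wage <- median(wages)  # middle value, robust to the high value
wage_gap    <- mean_wage - median_wage

# A positive gap hints at a right-skewed distribution
print(c(mean = mean_wage, median = median_wage, gap = wage_gap))
```

A gap near zero would instead point to a roughly symmetric distribution, at least as far as these two summaries can tell.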
Further Questions: Effect of Wage Disparity on Employment: Does
the wage gap affect the aggregate employment figures
(Yearly_Total_Employed_Aggregate) in any way?
Sector Analysis: Is there a greater disparity in wages in any particular
sectors or geographical areas? Does this have anything to do with those
sectors’ employment trends?
library(ggplot2)
# Pair 1: Annual Salary vs. Years of Experience
ggplot(pair1, aes(x = Years_of_Experience, y = Annual_Salary_Avg)) +
  geom_point(alpha = 0.7) +  # Add transparency to points for better visibility
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Annual Salary vs. Years of Experience",
       x = "Years of Experience",
       y = "Annual Salary (Average)") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 6 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 6 rows containing missing values (`geom_point()`).
# Pair 2 (stored as pair3): Yearly Total Employed Aggregate vs. Average Wage Difference
ggplot(pair3, aes(x = Yearly_Total_Employed_Aggregate, y = Avg_Wage_Difference)) +
  geom_point(alpha = 0.7) +  # Add transparency to points for better visibility
  geom_smooth(method = "lm", se = FALSE, color = "green") +
  labs(title = "Yearly Total Employed Aggregate vs. Average Wage Difference",
       x = "Yearly Total Employed",
       y = "Average Wage Difference") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 6 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 6 rows containing missing values (`geom_point()`).
Insight: The scatter plot shows only a slight positive association between Years of Experience (the ratio of average to 10th-percentile hourly wage) and Annual Salary. The blue regression line indicates the general trend in the data.
Significance: The regression line’s positive slope indicates that annual salary tends to rise slightly as the ratio increases, but the correlation of about 0.086 reported for this pair shows that the trend is weak. The points’ transparency makes the density of data points in different regions of the plot visible.
Further Questions: Are there any outliers that significantly influence the trend? Does the relationship hold consistently across different subgroups or regions? What factors might contribute to variations in Annual Salary among observations with similar Years of Experience values?
Insight: The scatter plot shows no clear pattern between Yearly Total Employed and Average Wage Difference. The green regression line does not capture a clear trend in the data.
Significance: The lack of a strong pattern in the scatter plot suggests that Yearly Total Employed may not be a strong predictor of Average Wage Difference. The wide dispersion of points indicates variability in Average Wage Difference regardless of the level of Yearly Total Employed.
Further Questions: Are there specific regions or clusters where the relationship between Yearly Total Employed and Average Wage Difference is more pronounced? Could other variables not included in this analysis better explain the variation in Average Wage Difference? Is there any seasonal or temporal pattern in the data that influences the relationship?
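Both plots emit warnings that 6 rows were removed as non-finite or missing. The source does not show which rows these are or why, so the following is only a sketch of how such rows could be located, using a hypothetical data frame `toy` with made-up values in place of the real data.

```r
# Hypothetical stand-in for the real data frame (values are made up)
toy <- data.frame(
  Hourly_Wage_Avg        = c(30, 35, NA, 40, 0),
  Hourly_10th_Percentile = c(20, 0, 25, NA, 0)
)

# Division can yield Inf (x / 0), NaN (0 / 0), or NA (missing input)
toy$ratio <- toy$Hourly_Wage_Avg / toy$Hourly_10th_Percentile

# is.finite() is FALSE for NA, NaN, Inf, and -Inf alike
bad_rows <- which(!is.finite(toy$ratio))
print(toy[bad_rows, ])
```

Inspecting the flagged rows before plotting makes it easier to decide whether they should be dropped, corrected, or reported.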
Pair 1 - Annual Salary vs. Years of Experience:
Explanatory Variable: Years of Experience
Response Variable: Annual Salary
Pair 2 - Yearly Total Employed Aggregate vs. Average Wage Difference:
Explanatory Variable: Yearly Total Employed
Response Variable: Average Wage Difference
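The explanatory/response roles listed above determine which variable goes on which side of a model formula. As a minimal sketch on made-up numbers (the vectors `x` and `y` are hypothetical, not drawn from nurses.csv), a simple linear fit of the kind that `geom_smooth(method = "lm")` draws could be written as:

```r
# Hypothetical data: x is the explanatory variable, y the response
x <- c(1.2, 1.3, 1.4, 1.5, 1.6, 1.7)
y <- c(58000, 61000, 60500, 64000, 63000, 67000)

fit   <- lm(y ~ x)                # response on the left, explanatory on the right
slope <- unname(coef(fit)["x"])   # change in y per unit change in x
r     <- cor(x, y)

# For simple linear regression, R-squared equals the squared correlation
r_squared <- summary(fit)$r.squared
print(c(slope = slope, r = r, r_squared = r_squared))
```

Because R-squared is the square of the correlation in this setting, a small correlation implies that the fitted line explains only a small share of the variation in the response.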
# Pair 1
cor(pair1$Years_of_Experience, pair1$Annual_Salary_Avg, use = "complete.obs")
## [1] 0.08639091
# Pair 2
cor(pair3$Yearly_Total_Employed_Aggregate, pair3$Avg_Wage_Difference, use = "complete.obs")
## [1] 0.1051373
Insight: The correlation coefficient measures the strength and direction of the linear relationship between Years of Experience and Annual Salary. A positive correlation coefficient suggests a positive linear relationship.
Significance: The positive coefficient (about 0.086) agrees in sign with the upward-sloping line in the scatter plot, but its magnitude is close to 0, so the linear relationship is very weak. A value close to 1 would be needed to indicate a strong positive correlation.
Further Questions: Is the correlation consistent across different subgroups or categories? What other factors might influence the relationship between Years of Experience and Annual Salary? Are there nonlinear relationships or interactions not captured by the correlation coefficient?
Insight: The correlation coefficient measures the strength and direction of the linear relationship between Yearly Total Employed and Average Wage Difference.
Significance: The coefficient (about 0.105) is close to 0, which suggests a weak linear relationship or no linear relationship between the two variables. The scatter plot and the lack of a clear pattern in the regression line align with a correlation coefficient near 0.
Further Questions: Are there specific subsets or clusters within the data where a stronger relationship might exist? Could other variables not considered in this analysis explain the variation in Average Wage Difference? Does the relationship change over time or in different regions?
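`cor()` returns only a point estimate. One possible way to follow up on the questions about strength and nonlinearity is `cor.test()`, which adds a p-value and, for Pearson, a confidence interval, and which can also compute a rank-based (Spearman) coefficient. The sketch below uses simulated data. The vectors `x` and `y` are hypothetical and unrelated to nurses.csv.

```r
set.seed(1)                            # reproducible made-up data
x <- runif(50, min = 1, max = 2)       # hypothetical predictor
y <- exp(3 * x) + rnorm(50, sd = 5)    # monotone but curved response

# Pearson measures linear association; Spearman measures monotone association
pearson  <- cor.test(x, y, method = "pearson")
spearman <- cor.test(x, y, method = "spearman")

print(pearson$estimate)   # sample correlation
print(pearson$conf.int)   # 95% confidence interval for the correlation
print(spearman$estimate)  # rank correlation
```

If the Pearson interval for a pair includes 0, the data are compatible with no linear relationship. A Spearman coefficient that differs markedly from Pearson can hint at a monotone but nonlinear pattern.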
# Normal-approximation (z) confidence interval for the mean of one column
confidence_interval_cal <- function(data, variable, confidence_level = 0.95) {
  x <- data[[variable]]
  x <- na.omit(as.numeric(x))  # drop missing values before computing
  if (length(x) < 2) {
    warning("Fewer than two non-missing numeric values. Confidence interval cannot be calculated.")
    return(NULL)
  }
  n <- length(x)
  mean_value <- mean(x)
  std_dev <- sd(x)
  # z critical value times the standard error of the mean
  margin_of_error <- qnorm((1 + confidence_level) / 2) * std_dev / sqrt(n)
  return(list(
    variable = variable,
    confidence_interval = c(mean_value - margin_of_error, mean_value + margin_of_error),
    mean = mean_value
  ))
}
# Pair 1 - Annual Salary vs. Years of Experience
pair1_cfdc_int <- confidence_interval_cal(pair1, "Annual_Salary_Avg")
# Pair 2 (stored as pair3) - Yearly Total Employed Aggregate vs. Average Wage Difference
pair3_cfdc_int <- confidence_interval_cal(pair3, "Avg_Wage_Difference")
print(pair1_cfdc_int)
## $variable
## [1] "Annual_Salary_Avg"
##
## $confidence_interval
## [1] 58477.34 60019.26
##
## $mean
## [1] 59248.3
print(pair3_cfdc_int)
## $variable
## [1] "Avg_Wage_Difference"
##
## $confidence_interval
## [1] 0.5908932 0.6496246
##
## $mean
## [1] 0.6202589
Insight: The calculated 95% confidence interval for Annual Salary (Average) is [58477.34, 60019.26], with a mean of approximately 59248.3. This means we can be 95% confident that the true average Annual Salary falls within this interval.
Significance: The confidence interval provides a range within which we estimate the true population average Annual Salary to lie. The mean value (59248.3) serves as the point estimate for the average Annual Salary.
Further Questions: How does this confidence interval compare to salary benchmarks or industry standards? Are there specific subgroups within the data where the confidence interval differs significantly? What factors contribute to the variability in Annual Salary?
Insight: The calculated 95% confidence interval for Average Wage Difference is [0.5908932, 0.6496246], with a mean of approximately 0.6202589. This means we can be 95% confident that the true mean of Average Wage Difference falls within this interval.
Significance: The confidence interval provides a range within which we estimate the true population mean of Average Wage Difference to lie. The mean value (0.6202589) serves as the point estimate for that population mean.
Further Questions: How does this confidence interval align with expectations or industry norms for wage differences? Are there specific factors or regions that contribute to variations in Average Wage Difference? Does the confidence interval change when considering different time periods or subgroups?
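The intervals above come from `confidence_interval_cal()`, which uses a normal (z) critical value via `qnorm()`. A common alternative when the standard deviation is estimated from the sample is a t-based interval. The sketch below compares the two on simulated data. The vector `x` is hypothetical, and the choice of 25 observations is an assumption made only to make the difference visible.

```r
set.seed(42)
x <- rnorm(25, mean = 60000, sd = 8000)  # hypothetical salaries, not from nurses.csv

n  <- length(x)
se <- sd(x) / sqrt(n)                    # standard error of the mean

# z-based interval, as in confidence_interval_cal()
z_ci <- mean(x) + c(-1, 1) * qnorm(0.975) * se

# t-based interval with n - 1 degrees of freedom
t_ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

# t.test() reports the same t-based interval
builtin_ci <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

print(rbind(z = z_ci, t = t_ci, t.test = builtin_ci))
```

The t-based interval is always somewhat wider than the z-based one, and the two converge as the sample size grows, so for large samples the choice matters little.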