The dataset includes 200 public service employees, with information on their annual salary (€), years of employment, and gender. Of the sample, 131 are male and 69 are female. Salaries range from approximately €30,000 to €255,000, with a mean of around €108,000 and a median of about €93,000. Years of employment range from close to zero to just under 30, with a mean of about 16 years. This dataset provides a basis for exploring how salary relates to experience and whether gender plays a role.
library(openxlsx)  # provides read.xlsx()
SalaryData = read.xlsx("SalaryData.xlsx")
nrow(SalaryData)
## [1] 200
table(SalaryData$gender)
##
## Female Male
## 69 131
summary(SalaryData$salary)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 30028 60076 93164 108491 150437 255381
sd(SalaryData$salary)
## [1] 60116.36
summary(SalaryData$years_exp)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.007167 7.749079 16.662402 15.666479 22.823554 29.666752
sd(SalaryData$years_exp)
## [1] 8.760716
The scatterplot below shows a positive association between years of employment and salary. However, the relationship appears to be nonlinear: salaries rise with experience, but not at a constant rate, so a straight line fitted to raw salary is unlikely to describe the data well.
plot(SalaryData$years_exp, SalaryData$salary,
     main = "Scatterplot: Years of Experience vs. Salary",
     xlab = "Years of Experience", ylab = "Salary (€)",
     pch = 19, col = "steelblue")
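One way to check whether curvature like this disappears on the log scale is to add a squared experience term to a model on each scale and see whether it is needed. The sketch below does this on synthetic data only: the simulated values, the growth parameters, and the names `sim`, `raw_fit`, and `log_fit` are assumptions for illustration and are not taken from SalaryData.xlsx.

```r
# Synthetic illustration only: simulated values, not the SalaryData.xlsx records.
set.seed(42)
n <- 200
years_exp <- runif(n, min = 0, max = 30)

# Assumption: salary grows exponentially with experience, with multiplicative noise.
salary <- exp(10.4 + 0.06 * years_exp + rnorm(n, mean = 0, sd = 0.2))
sim <- data.frame(years_exp = years_exp, salary = salary)

# Add a squared term on each scale; a clearly non-zero squared term
# signals curvature that a straight line would miss.
raw_fit <- lm(salary ~ years_exp + I(years_exp^2), data = sim)
log_fit <- lm(log(salary) ~ years_exp + I(years_exp^2), data = sim)

# t statistics of the squared term on the raw and log scales
raw_t <- coef(summary(raw_fit))["I(years_exp^2)", "t value"]
log_t <- coef(summary(log_fit))["I(years_exp^2)", "t value"]
print(c(raw_scale = raw_t, log_scale = log_t))
```

If the real data behave like this simulation, the squared term would matter on the raw scale and much less on the log scale. The same two models could be fitted to `SalaryData` to check.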
To better estimate the relationship, we log-transformed the salary variable to linearize the association. A linear model was then fitted with log(salary) as the dependent variable and years of experience as the predictor.
SalaryData$log_salary = log(SalaryData$salary)
model = lm(log_salary ~ years_exp, data = SalaryData)
summary(model)
##
## Call:
## lm(formula = log_salary ~ years_exp, data = SalaryData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.74993 -0.11686 0.00666 0.11146 0.77461
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.436444 0.032197 324.14 <2e-16 ***
## years_exp 0.063322 0.001795 35.28 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2218 on 198 degrees of freedom
## Multiple R-squared: 0.8628, Adjusted R-squared: 0.8621
## F-statistic: 1245 on 1 and 198 DF, p-value: < 2.2e-16
The regression model shows a strong relationship between log-transformed salary and years of experience, with an R-squared of about 0.86. The slope is approximately 0.063 on the log scale, which corresponds to a salary increase of about 6.5% (exp(0.063) - 1) for each additional year of experience, on average. This supports the idea that salary grows exponentially with experience rather than linearly.
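Because the response is log(salary), the slope converts to a percentage change through exp(b) - 1; reading the slope itself as a percentage is the usual small-coefficient approximation. A minimal sketch of that arithmetic, using the slope from the regression output above (the variable names are illustrative):

```r
# Slope reported in the output for log_salary ~ years_exp
slope <- 0.063322

# Exact percentage change in salary per additional year of experience
pct_per_year <- (exp(slope) - 1) * 100
print(round(pct_per_year, 2))

# Multiplicative change in predicted salary over 10 additional years
growth_10y <- exp(10 * slope)
print(round(growth_10y, 2))
```

The ten-year factor shows how the yearly effect compounds, which is what exponential growth means here.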
To explore gender effects, we fit separate models for male and female employees.
model_male = lm(log_salary ~ years_exp, data = subset(SalaryData, gender == "Male"))
model_female = lm(log_salary ~ years_exp, data = subset(SalaryData, gender == "Female"))
summary(model_male)
##
## Call:
## lm(formula = log_salary ~ years_exp, data = subset(SalaryData,
## gender == "Male"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.29310 -0.10494 -0.03192 0.07773 0.66850
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.54381 0.02735 385.48 <2e-16 ***
## years_exp 0.06296 0.00165 38.17 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1685 on 129 degrees of freedom
## Multiple R-squared: 0.9186, Adjusted R-squared: 0.918
## F-statistic: 1457 on 1 and 129 DF, p-value: < 2.2e-16
summary(model_female)
##
## Call:
## lm(formula = log_salary ~ years_exp, data = subset(SalaryData,
## gender == "Female"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.39797 -0.04687 -0.00449 0.08348 0.16695
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.891181 0.036259 272.79 <2e-16 ***
## years_exp 0.081904 0.001789 45.79 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1095 on 67 degrees of freedom
## Multiple R-squared: 0.969, Adjusted R-squared: 0.9686
## F-statistic: 2096 on 1 and 67 DF, p-value: < 2.2e-16
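Both models show a positive relationship between experience and salary, but the coefficients differ noticeably. The male model has the higher intercept (10.54 versus 9.89 on the log scale), while the female model has the steeper slope (about 0.082 versus 0.063 per year). Over the observed range of experience, the fitted line for men therefore lies above the fitted line for women, with the gap narrowing as experience increases. This could indicate gender-based differences in starting salary and salary progression, but the separate models do not test the difference formally, so further investigation is needed to confirm and explain it.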
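As a rough first check, the two sets of coefficients can be compared directly from the reported summaries. The sketch below computes an approximate z statistic for the difference in slopes and the experience level at which the two fitted lines would cross. It assumes the male and female subsamples are independent, and the variable names are illustrative.

```r
# Coefficients and standard errors copied from the two summary() outputs above
b_male   <- c(intercept = 10.54381, slope = 0.06296)
b_female <- c(intercept = 9.891181, slope = 0.081904)
se_slope_male   <- 0.00165
se_slope_female <- 0.001789

# Approximate z statistic for the difference in slopes,
# treating the two subsamples as independent
slope_diff <- unname(b_female["slope"] - b_male["slope"])
z_slope <- slope_diff / sqrt(se_slope_male^2 + se_slope_female^2)

# Years of experience at which the two fitted lines would cross
cross_years <- unname(b_male["intercept"] - b_female["intercept"]) / slope_diff

print(round(c(z_slope = z_slope, cross_years = cross_years), 2))
```

With the data loaded, the same question could be asked in one pooled model, for example `lm(log_salary ~ years_exp * gender, data = SalaryData)`, where the interaction term estimates the difference in slopes.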