The dataset includes 200 public service employees: 100 male and 100 female. Salaries range from €30,203 to €331,348, with an average salary of €122,304 and a standard deviation of €79,030.12. The variable years_empl represents employees’ work experience in years, and is used as the predictor to estimate how experience influences salary.
# read.xlsx() is assumed here to come from the openxlsx package
library(openxlsx)
SalaryData = read.xlsx("SalaryData.xlsx")
# Number of observations (nrow() counts rows; row() returns a matrix of row indices)
nrow(SalaryData)
## [1] 200
table(SalaryData$gender)
##
## Female Male
## 100 100
summary(SalaryData$salary)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 30203 54208 97496 122304 179447 331348
sd(SalaryData$salary)
## [1] 79030.12
# The experience column is named years_empl (see names(SalaryData))
summary(SalaryData$years_empl)
sd(SalaryData$years_empl)
The scatterplot reveals a clear positive but non-linear relationship between years of employment and salary. Salary grows slowly at first and then increases more steeply with additional experience. This suggests an exponential pattern, which justifies the log transformation used later in the analysis.
names(SalaryData)
## [1] "years_empl" "salary" "gender"
# Keep only rows where both plotted variables are observed
plot_data = na.omit(SalaryData[, c("years_empl", "salary")])
# Scatterplot of salary against years of employment
plot(plot_data$years_empl, plot_data$salary,
     main = "Scatterplot: Years of Employment vs. Salary",
     xlab = "Years of Employment",
     ylab = "Salary (€)",
     pch = 19,
     col = "steelblue")
To account for the non-linear relationship observed in the scatterplot, a logarithmic transformation was applied to the salary variable. If salary grows roughly exponentially with experience, log-salary is approximately linear in years of employment, which makes the data suitable for linear regression. The model shows a very strong fit, with an R² of 0.917, indicating that years of employment explain about 92% of the variance in log-salary.
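The linearizing effect of the log transformation can be illustrated with a small sketch on synthetic data. The values below (a starting salary of 30000 and a growth rate of 0.07 per year) are hypothetical and are not taken from SalaryData; they only mimic an exponential pattern.

```r
# Hypothetical, noise-free data: salary grows by a fixed proportion per year
years  <- 0:40
salary <- 30000 * exp(0.07 * years)

# On the raw scale the yearly increments keep growing (non-linear)
range(diff(salary))

# On the log scale the yearly increments are constant (linear)
range(diff(log(salary)))

# A straight-line fit to log(salary) recovers the assumed growth rate
fit <- lm(log(salary) ~ years)
coef(fit)
```

With real data the points scatter around the line, but the same logic applies: a constant proportional increase per year becomes a constant slope on the log scale.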
SalaryData$log_salary = log(SalaryData$salary)
model = lm(log_salary ~ years_empl, data = SalaryData)
summary(model)
##
## Call:
## lm(formula = log_salary ~ years_empl, data = SalaryData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.77041 -0.12197 -0.00111 0.15234 0.41044
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.382774 0.027501 377.54 <2e-16 ***
## years_empl 0.070998 0.001517 46.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1933 on 198 degrees of freedom
## Multiple R-squared: 0.9171, Adjusted R-squared: 0.9167
## F-statistic: 2191 on 1 and 198 DF, p-value: < 2.2e-16
The estimated coefficient for years_empl is 0.071. Since the salary
variable was log-transformed, each additional year of employment is
associated with an average salary increase of approximately 7.36
percent (exp(0.070998) - 1 ≈ 0.0736). The model fits the data very
well, with an R-squared of 0.917. This means that about 92 percent of
the variation in salary (on the log scale) can be explained by years
of employment.
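The conversion from a log-scale coefficient to a percentage change can be checked with a short calculation. This is a minimal sketch that uses the coefficient reported in the model summary; the variable names (beta, pct_exact, pct_approx) are introduced here for illustration only.

```r
# Coefficient for years_empl, copied from the pooled model summary
beta <- 0.070998

# Exact percentage change in salary per additional year of employment
pct_exact <- (exp(beta) - 1) * 100

# Small-coefficient approximation: 100 * beta
pct_approx <- beta * 100

round(c(exact = pct_exact, approx = pct_approx), 2)
```

The approximation 100 * beta is close for small coefficients, but the exact back-transformation is slightly larger because growth compounds.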
Fitting separate regression models for men and women yields different slope estimates.
model_male = lm(log_salary ~ years_empl, data = subset(SalaryData, gender == "Male"))
model_female = lm(log_salary ~ years_empl, data = subset(SalaryData, gender == "Female"))
summary(model_male)
##
## Call:
## lm(formula = log_salary ~ years_empl, data = subset(SalaryData,
## gender == "Male"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.56063 -0.08644 0.00333 0.06960 0.38121
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.380951 0.030790 337.15 <2e-16 ***
## years_empl 0.076372 0.001698 44.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.153 on 98 degrees of freedom
## Multiple R-squared: 0.9538, Adjusted R-squared: 0.9533
## F-statistic: 2023 on 1 and 98 DF, p-value: < 2.2e-16
summary(model_female)
##
## Call:
## lm(formula = log_salary ~ years_empl, data = subset(SalaryData,
## gender == "Female"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.71847 -0.07628 0.01426 0.10656 0.40887
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.384598 0.036725 282.8 <2e-16 ***
## years_empl 0.065623 0.002025 32.4 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1825 on 98 degrees of freedom
## Multiple R-squared: 0.9146, Adjusted R-squared: 0.9138
## F-statistic: 1050 on 1 and 98 DF, p-value: < 2.2e-16
For men, the coefficient is 0.076, which corresponds to an average salary increase of about 7.9 percent per year of employment. For women, the coefficient is 0.066, meaning an average increase of around 6.8 percent per year. Both models show a strong fit: the R-squared is 0.954 for men and 0.915 for women. This indicates that experience is closely linked to salary in both groups, although the yearly increase is slightly higher for men in this sample.
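The percentages quoted for each group follow from the same back-transformation, and the size of the gap between the two slopes can be gauged with a rough calculation. The sketch below uses only the estimates and standard errors printed in the two summaries. The z statistic is an approximation that assumes the two subsamples are independent and the estimates are roughly normal; it is not part of the original analysis.

```r
# Slope estimates and standard errors copied from the two model summaries
slope <- c(Male = 0.076372, Female = 0.065623)
se    <- c(Male = 0.001698, Female = 0.002025)

# Back-transform the log-scale slopes into percentage growth per year
pct_per_year <- (exp(slope) - 1) * 100
round(pct_per_year, 1)

# Rough z statistic for the difference between the two slopes
# (assumption: independent samples, approximately normal estimates)
z_diff <- unname((slope["Male"] - slope["Female"]) / sqrt(sum(se^2)))
z_diff
```

A more thorough check would fit a single model on the full data with an interaction term, for example lm(log_salary ~ years_empl * gender, data = SalaryData), and inspect the interaction coefficient.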