1. Sample description

The dataset covers 200 public service employees and records their annual salary (€), years of employment, and gender. Of the sample, 131 are male and 69 are female. Salaries range from about €30,000 to about €255,000, with a mean of about €108,500 and a median of about €93,200. Years of employment range from almost 0 to just under 30, with a mean of about 15.7 years. The dataset is therefore suitable for exploring how salary relates to experience and whether gender plays a role.

# Assumption: read.xlsx() comes from the openxlsx package
library(openxlsx)
SalaryData = read.xlsx("SalaryData.xlsx")
# Number of observations (nrow() counts rows; row() returns a matrix of row indices)
nrow(SalaryData)
## [1] 200
table(SalaryData$gender)
## 
## Female   Male 
##     69    131
summary(SalaryData$salary)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   30028   60076   93164  108491  150437  255381
sd(SalaryData$salary)
## [1] 60116.36
summary(SalaryData$years_exp)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##  0.007167  7.749079 16.662402 15.666479 22.823554 29.666752
sd(SalaryData$years_exp)
## [1] 8.760716
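
As a quick cross-check of the descriptive statistics, the short calculation below re-derives the gender shares and two simple indicators of spread and skew from the values printed above. It is only a sketch: the numbers are typed in by hand from the output rather than read from SalaryData, and the variable names are illustrative.

```r
# Values copied by hand from the output above (not read from SalaryData.xlsx)
gender_counts <- c(Female = 69, Male = 131)
salary_mean   <- 108491
salary_median <- 93164
salary_sd     <- 60116.36

# Share of each gender in percent
gender_share <- round(100 * gender_counts / sum(gender_counts), 1)
print(gender_share)

# Coefficient of variation: standard deviation relative to the mean
salary_cv <- salary_sd / salary_mean
print(round(salary_cv, 2))

# A mean well above the median points to a right-skewed salary distribution
mean_over_median <- salary_mean / salary_median
print(round(mean_over_median, 2))
```

The mean lies roughly 16% above the median, which suggests a right-skewed salary distribution. Skew of this kind is one common reason for analysing salary on a log scale.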


2. Association between years of employment and salary (scatterplot)

The scatterplot below shows a positive association between years of employment and salary. However, the relationship appears to be nonlinear: salaries rise with experience, but not at a constant rate. The increase appears to steepen at higher experience levels, which points to a multiplicative (percentage) relationship rather than an additive one.

plot(SalaryData$years_exp, SalaryData$salary,
     main = "Scatterplot: Years of Experience vs. Salary",
     xlab = "Years of Experience", ylab = "Salary (€)",
     pch = 19, col = "steelblue")


3. Estimate salary by years of employment

To better estimate the relationship, we log-transformed the salary variable to linearize the association. A linear model was then fitted with log(salary) as the dependent variable and years of experience as the predictor.

SalaryData$log_salary = log(SalaryData$salary)
model = lm(log_salary ~ years_exp, data = SalaryData)
summary(model)
## 
## Call:
## lm(formula = log_salary ~ years_exp, data = SalaryData)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.74993 -0.11686  0.00666  0.11146  0.77461 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.436444   0.032197  324.14   <2e-16 ***
## years_exp    0.063322   0.001795   35.28   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2218 on 198 degrees of freedom
## Multiple R-squared:  0.8628, Adjusted R-squared:  0.8621 
## F-statistic:  1245 on 1 and 198 DF,  p-value: < 2.2e-16
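
To illustrate why the log transformation linearizes this kind of association, the sketch below fits both specifications to simulated data. The data are generated here and are not the SalaryData observations; the simulation parameters (10.44, 0.063, and a residual standard deviation of 0.22) are assumptions loosely based on the output above, and the object names are illustrative.

```r
# Simulated data only: parameters are loosely based on the fitted model above,
# but these are NOT the SalaryData observations.
set.seed(1)
n <- 200
years_sim <- runif(n, min = 0, max = 30)
log_salary_sim <- 10.44 + 0.063 * years_sim + rnorm(n, mean = 0, sd = 0.22)
sim <- data.frame(years_exp = years_sim, salary = exp(log_salary_sim))

# Fit the same predictor against raw salary and against log salary
fit_raw <- lm(salary ~ years_exp, data = sim)
fit_log <- lm(log(salary) ~ years_exp, data = sim)

# The log model should recover a slope close to the 0.063 used in the simulation
print(coef(fit_log))

# R-squared of both fits; the responses are on different scales, so treat the
# comparison as a rough indication only
print(c(raw = summary(fit_raw)$r.squared, log = summary(fit_log)$r.squared))
```

When salary grows by a roughly constant percentage per year, the log of salary is a straight line in years, so an ordinary linear model fits it directly.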


4. Interpretation

The regression model shows a strong relationship between log-transformed salary and years of experience, with an R-squared of 0.86. The slope is approximately 0.063, which means that each additional year of experience is associated with a salary that is about 6.3% higher on average (reading the log-scale slope as an approximate proportional change). This is consistent with salary growing exponentially with experience rather than linearly.
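
The percentage reading of the slope can be made exact by back-transforming from the log scale. The sketch below does this with the coefficients copied from the model output; `predict_salary` is an illustrative helper and not part of the original analysis.

```r
# Coefficients copied from the summary(model) output above
intercept <- 10.436444
slope     <- 0.063322

# Exact percentage change in salary per additional year of experience
pct_per_year <- (exp(slope) - 1) * 100
print(round(pct_per_year, 2))

# Illustrative helper (not part of the original analysis): back-transform the
# fitted log salary to euros. This ignores retransformation bias, so it is
# closer to a predicted median than to a predicted mean.
predict_salary <- function(years) exp(intercept + slope * years)
print(round(predict_salary(c(0, 10, 20))))
```

The exact figure is about 6.5% per year, slightly above the 6.3% given by reading the slope directly. The two readings stay close only while the slope is small.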


5. (Voluntary) Gender effects

To explore gender effects, we fit separate models for male and female employees.

model_male = lm(log_salary ~ years_exp, data = subset(SalaryData, gender == "Male"))
model_female = lm(log_salary ~ years_exp, data = subset(SalaryData, gender == "Female"))

summary(model_male)
## 
## Call:
## lm(formula = log_salary ~ years_exp, data = subset(SalaryData, 
##     gender == "Male"))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.29310 -0.10494 -0.03192  0.07773  0.66850 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.54381    0.02735  385.48   <2e-16 ***
## years_exp    0.06296    0.00165   38.17   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1685 on 129 degrees of freedom
## Multiple R-squared:  0.9186, Adjusted R-squared:  0.918 
## F-statistic:  1457 on 1 and 129 DF,  p-value: < 2.2e-16
summary(model_female)
## 
## Call:
## lm(formula = log_salary ~ years_exp, data = subset(SalaryData, 
##     gender == "Female"))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.39797 -0.04687 -0.00449  0.08348  0.16695 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 9.891181   0.036259  272.79   <2e-16 ***
## years_exp   0.081904   0.001789   45.79   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1095 on 67 degrees of freedom
## Multiple R-squared:  0.969,  Adjusted R-squared:  0.9686 
## F-statistic:  2096 on 1 and 67 DF,  p-value: < 2.2e-16
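
To put the two sets of coefficients side by side, the sketch below computes the ratio of predicted salaries at zero years of experience and the point at which the two fitted lines would intersect. The coefficients are copied by hand from the output above, and the variable names are illustrative.

```r
# Coefficients copied from the two summary() outputs above
male   <- c(intercept = 10.54381, slope = 0.06296)
female <- c(intercept = 9.891181, slope = 0.081904)

# Ratio of predicted salaries (female / male) at zero years of experience
start_ratio <- exp(female[["intercept"]] - male[["intercept"]])
print(round(start_ratio, 2))

# Years of experience at which the two fitted lines intersect
crossover_years <- (male[["intercept"]] - female[["intercept"]]) /
  (female[["slope"]] - male[["slope"]])
print(round(crossover_years, 1))
```

Under these point estimates the predicted starting salary for women is roughly half that for men, and the lines would meet only after about 34 years. That is beyond the largest observed value of about 29.7 years, so the male fitted line lies above the female one across the observed range. The calculation ignores sampling uncertainty in all four coefficients.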

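Both models show a positive relationship between experience and salary, but the coefficients differ noticeably. Male employees have the higher intercept (10.54 vs. 9.89), while female employees have the steeper slope (0.082 vs. 0.063). This could indicate gender-based differences in starting salary and salary progression. Separate models do not test whether these differences are statistically significant, so further investigation would be needed to confirm and explain them.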
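
One common way to check formally whether intercepts and slopes differ between groups is a single pooled model with an interaction term. The sketch below shows the idea on simulated data: the observations are generated here and are not SalaryData, the group sizes and parameters are assumptions loosely based on the output above, and the object names are illustrative.

```r
# Simulated data only: group sizes and parameters are loosely based on the
# output above, but these are NOT the SalaryData observations.
set.seed(42)
n_male   <- 131
n_female <- 69
sim <- data.frame(
  gender    = factor(rep(c("Male", "Female"), times = c(n_male, n_female)),
                     levels = c("Male", "Female")),
  years_exp = runif(n_male + n_female, min = 0, max = 30)
)
is_female <- sim$gender == "Female"
sim$log_salary <- ifelse(is_female,
                         9.89 + 0.082 * sim$years_exp,
                         10.54 + 0.063 * sim$years_exp) +
  rnorm(nrow(sim), mean = 0, sd = 0.15)

# Pooled model: years_exp * gender expands to both main effects plus their
# interaction, with Male as the reference level
fit_int <- lm(log_salary ~ years_exp * gender, data = sim)
print(round(coef(summary(fit_int)), 4))

# genderFemale           : difference in intercepts (Female minus Male)
# years_exp:genderFemale : difference in slopes (Female minus Male)
```

Applied to the real data, the t-test on the `years_exp:genderFemale` row would indicate whether the slopes differ by more than sampling variation would explain, and the `genderFemale` row would do the same for the intercepts.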