Abstract

As young adults enter the workforce, they may find that their wage expectations conflict with reality. Using a salary dataset from Kaggle, our team analyzed the correlation between age and earnings. As generational gaps widen, differing perceptions of what workers may demand from their employers divide the generations further. Are millennials poor financial decision makers? Did baby boomers have it easier paying for college? Are young people too lazy to work? Much discourse surrounds these questions; let’s take a statistical approach to finding answers. The Kaggle dataset is available at the following link: https://www.kaggle.com/datasets/codebreaker619/salary-data-with-age-and-experience

age<-c(21,21.5,21.7,22,22.2,23,23,23.3,23.3,23.6,23.9,24,24,24,25,25,26,27,28,29,30,30,31,32,33,34,35,36,37,38)
salary<-c(39343,46205,37731,43525,39891,56642,60150,54445,64445,57189,63218,55794,56957,57081,61111,67938,66029,83088,81363,93940,91738,98273,101302,113812,109431,105582,116969,112635,122391,121872)
 plot(age,salary)

 cor(age,salary)
## [1] 0.9745295
project<-lm(salary~age)
summary(project)$coef
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) -64878.124  6232.3206 -10.40995 3.935544e-11
## age           5176.281   225.1098  22.99448 1.013771e-19
summary(project)$r.squared
## [1] 0.9497078
anova(project)
## Analysis of Variance Table
## 
## Response: salary
##           Df     Sum Sq    Mean Sq F value    Pr(>F)    
## age        1 2.0699e+10 2.0699e+10  528.75 < 2.2e-16 ***
## Residuals 28 1.0961e+09 3.9147e+07                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(age,salary,main="Salary vs Age")
abline(project$coef,lty=1)

Conclusion

The difficulty of starting a first job is compounded by low starting salaries, as the linear model we created shows. With an R² of about 0.95, the model indicates a strong linear relationship between age and salary in this sample: each additional year of age is associated with roughly $5,176 more in predicted salary. The scatterplot shows no obvious outliers, and the points fall close to the fitted line. The older the individual, the higher the salary we predict. From this gap, it can be inferred that older generations may have a warped view of the struggle of the younger population. To conclude, throughout the decades, the experiences of each age group have contrasted with one another and contributed to distorted perspectives.
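
To make the prediction claim concrete, the sketch below refits the same model and predicts salary at two ages. The ages 25 and 35 and the names `fit`, `new_ages`, `pred`, and `std_res` are illustrative assumptions, not part of the original analysis. It also prints the largest standardized residual as one rough way to check for outliers.

```r
# Data copied from the analysis above
age <- c(21,21.5,21.7,22,22.2,23,23,23.3,23.3,23.6,23.9,24,24,24,25,25,26,27,28,29,30,30,31,32,33,34,35,36,37,38)
salary <- c(39343,46205,37731,43525,39891,56642,60150,54445,64445,57189,63218,55794,56957,57081,61111,67938,66029,83088,81363,93940,91738,98273,101302,113812,109431,105582,116969,112635,122391,121872)

# Same simple linear regression as `project` in the analysis
fit <- lm(salary ~ age)

# Hypothetical ages, chosen inside the observed range (21 to 38)
new_ages <- data.frame(age = c(25, 35))

# Point predictions with 95% prediction intervals (columns: fit, lwr, upr)
pred <- predict(fit, newdata = new_ages, interval = "prediction", level = 0.95)
print(round(pred))

# Standardized residuals; absolute values well above 2 to 3 are
# commonly treated as a reason to look closer at a point
std_res <- rstandard(fit)
print(max(abs(std_res)))
```

A prediction interval is used here rather than a confidence interval because the question is about an individual's salary, not the average salary at a given age.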

ANOVA Analysis

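From the ANOVA table, the regression sum of squares is SSR = 2.0699e+10 and the error sum of squares is SSE = 1.0961e+09.
The total sum of squares is therefore SST = SSR + SSE = 2.0699e+10 + 1.0961e+09 = 21,795,100,000.
We can then calculate R² = SSR/SST = 2.0699e+10/21,795,100,000 ≈ 0.9497, which rounds to 0.95 and agrees with the value reported by summary(project)$r.squared.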
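
The arithmetic above can be reproduced in R by reading the sums of squares directly from the ANOVA table instead of copying the rounded values by hand. This is a minimal sketch; the names `fit`, `tab`, `ssr`, `sse`, `sst`, and `r2` are introduced here for illustration only.

```r
# Data copied from the analysis above
age <- c(21,21.5,21.7,22,22.2,23,23,23.3,23.3,23.6,23.9,24,24,24,25,25,26,27,28,29,30,30,31,32,33,34,35,36,37,38)
salary <- c(39343,46205,37731,43525,39891,56642,60150,54445,64445,57189,63218,55794,56957,57081,61111,67938,66029,83088,81363,93940,91738,98273,101302,113812,109431,105582,116969,112635,122391,121872)

fit <- lm(salary ~ age)
tab <- anova(fit)

# Sums of squares taken from the "Sum Sq" column of the ANOVA table
ssr <- tab["age", "Sum Sq"]        # regression sum of squares
sse <- tab["Residuals", "Sum Sq"]  # error (residual) sum of squares
sst <- ssr + sse                   # total sum of squares

# R-squared as the share of total variation explained by age
r2 <- ssr / sst
print(r2)

# Cross-check against the value computed by summary()
print(all.equal(r2, summary(fit)$r.squared))
```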