CASE STUDY

WHETHER SALARY DEPENDS ON AGE AND/OR EDUCATION

Introduction

The given data set includes three variables: Age, Education, and Salary. To analyse this data we perform a multiple linear regression analysis, in which the dependent variable is Salary and the independent variables are Age and Education (dependent = salary; independent = age + education).

Hypothesis:

Hypothesis_1

Null Hypothesis (H0)- There is no statistically significant relationship between salary and age + education taken together

Alternative Hypothesis (H1)- There is a statistically significant relationship between salary and age + education taken together

Hypothesis_2

Null Hypothesis (H0)- There is no statistically significant relationship between salary and education

Alternative Hypothesis (H1)- There is a statistically significant relationship between salary and education

#Setting the working directory

#Let us load the dataset - Education (read_excel() comes from the readxl package)

library(readxl)
Project_AES <- read_excel("Education.xlsx")

#Let us proceed with the case

#To check the data in the dataset

View(Project_AES)

#Let us create a regression model to check the effect of the independent variables on the dependent variable

#Before we create the model, first store each column in its own variable

sal<-Project_AES$Salary
edu<-Project_AES$Education
age<-Project_AES$Age
model1<-lm(sal~edu+age)
model1
## 
## Call:
## lm(formula = sal ~ edu + age)
## 
## Coefficients:
## (Intercept)          edu          age  
##   -12.58281      3.62060      0.03253
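The same kind of model can also be written with the column names directly in the formula and the data frame passed through the `data` argument of `lm()`, which avoids the intermediate variables. The sketch below assumes nothing about the real file: `demo` is a synthetic, hypothetical stand-in with the same three column names, so its coefficients are not those of the case study.

```r
# Synthetic stand-in for Education.xlsx (hypothetical values, NOT the case-study data)
set.seed(1)
demo <- data.frame(
  Education = sample(10:22, 100, replace = TRUE),
  Age       = sample(22:60, 100, replace = TRUE)
)
demo$Salary <- 3.5 * demo$Education + rnorm(100, sd = 14)

# Same model structure as model1, with columns looked up in the data frame
demo_model <- lm(Salary ~ Education + Age, data = demo)

# Named coefficient vector: (Intercept), Education, Age
print(coef(demo_model))
```

With the real data, the equivalent call would be `lm(Salary ~ Education + Age, data = Project_AES)`.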

#To get the full regression output, use the code below

summary(model1)
## 
## Call:
## lm(formula = sal ~ edu + age)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -33.002  -8.406  -0.866   6.753  49.592 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -12.58281    5.70073  -2.207   0.0297 *  
## edu           3.62060    0.26562  13.631   <2e-16 ***
## age           0.03253    0.10944   0.297   0.7669    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.12 on 97 degrees of freedom
## Multiple R-squared:  0.668,  Adjusted R-squared:  0.6611 
## F-statistic: 97.57 on 2 and 97 DF,  p-value: < 2.2e-16

#Based on the regression output, here are the key points for decision making:

1.Coefficient:

Intercept= -12.58281; edu = 3.62060; age = 0.03253

Equation: y = a + b1x1 + b2x2

Salary = -12.58281 + 3.62060(edu) + 0.03253(age)

This equation can be used to predict salary (for example in marketing analysis, salary forecasting, and so on).
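As an illustration of how the fitted equation turns into a prediction, the sketch below plugs values into the reported coefficients. The inputs `new_edu = 16` and `new_age = 30` are hypothetical, chosen only to show the arithmetic, and their units are whatever units the data set uses.

```r
# Coefficients copied from the regression output above
b <- c(intercept = -12.58281, edu = 3.62060, age = 0.03253)

# Hypothetical inputs (not taken from the data set)
new_edu <- 16
new_age <- 30

# Salary = a + b1 * edu + b2 * age
predicted_salary <- b[["intercept"]] + b[["edu"]] * new_edu + b[["age"]] * new_age
print(predicted_salary)
```

For these inputs the predicted salary is about 46.32. With the fitted object available, `predict(model1, newdata = data.frame(edu = 16, age = 30))` performs the same calculation using the unrounded coefficients.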

2.Level of Significance: 5% (alpha = 0.05) [Level of confidence = 95%]

3.p-value:

Intercept = 0.0297 (*), which is < 0.05 (significant)

edu = <2e-16 (***), which is < 0.05 (significant)

age = 0.7669, which is > 0.05 (not significant)
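To make clear where these p-values come from, the sketch below recomputes the t values and two-sided p-values from the estimates and standard errors printed in the summary output. It relies only on the reported (rounded) numbers, so the last digits may differ slightly from what `summary(model1)` shows.

```r
# Estimates and standard errors copied from the summary(model1) output
est <- c(intercept = -12.58281, edu = 3.62060, age = 0.03253)
se  <- c(intercept = 5.70073,   edu = 0.26562, age = 0.10944)
df_resid <- 97  # residual degrees of freedom reported in the output

# t value = estimate / standard error
t_values <- est / se

# Two-sided p-value from the t distribution with 97 degrees of freedom
p_values <- 2 * pt(-abs(t_values), df = df_resid)

print(round(t_values, 3))
print(signif(p_values, 3))
```

The recomputed t values agree with the `t value` column of the output, and only the age coefficient has a p-value above 0.05.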

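4.R-squared: (0.668 = 66.8%)

For a simple regression (one independent variable), R-squared is considered.

5.Adjusted R-squared: (0.6611 = 66.11%)

For a multiple regression (more than one independent variable), Adjusted R-squared is considered.

This implies that the model explains about 66.11% of the variation in salary. Since age is not significant (p-value = 0.7669), this explanatory power comes from education, not age.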
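The relationship between R-squared, Adjusted R-squared, and the F-statistic can be checked with a few lines of arithmetic. The sketch below assumes n = 100 observations, inferred from the 97 residual degrees of freedom plus the 3 estimated coefficients, and it starts from the rounded R-squared of 0.668, so the results match the output only up to rounding.

```r
r2 <- 0.668   # Multiple R-squared from the output (rounded)
n  <- 100     # observations: 97 residual df + 3 estimated coefficients
k  <- 2       # independent variables: edu and age

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Overall F statistic expressed through R-squared
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))

print(round(adj_r2, 4))
print(round(f_stat, 2))
```

Both values land within rounding error of the reported 0.6611 and 97.57.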

Decision:

If the p-value is less than 0.05, reject the null hypothesis

If the p-value is greater than 0.05, fail to reject the null hypothesis
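The decision rule can be applied mechanically to the numbers in the regression output. In the sketch below, the joint test of age + education uses the reported F-statistic (97.57 on 2 and 97 DF), and the individual tests use the reported coefficient p-values. The helper `decide()` is a hypothetical name introduced for illustration, and 2e-16 stands in as an upper bound for the edu p-value, which the output shows only as `<2e-16`.

```r
alpha <- 0.05  # 5% level of significance

# Joint test of age + education: p-value of the overall F-test
p_joint <- pf(97.57, df1 = 2, df2 = 97, lower.tail = FALSE)

# Coefficient p-values as reported (2e-16 is an upper bound for edu)
p_coef <- c(edu = 2e-16, age = 0.7669)

# Hypothetical helper: apply the decision rule to one or more p-values
decide <- function(p, alpha) {
  ifelse(p < alpha, "reject H0", "fail to reject H0")
}

print(decide(p_joint, alpha))
print(decide(p_coef, alpha))
```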

Conclusion:

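Hypothesis 1: Reject the null hypothesis (F-statistic p-value < 2.2e-16, which is less than 0.05)

Hypothesis 2: Reject the null hypothesis (edu p-value < 2e-16, which is less than 0.05)

For age taken on its own, the p-value (0.7669) is greater than 0.05, so we fail to reject the null hypothesis that age has no effect on salary.

Therefore, the analysis shows that salary depends on education level and not on age.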

End