WHETHER SALARY DEPENDS ON AGE AND/OR EDUCATION
The given data set includes three variables: Age, Education, and Salary. To analyse this data, we perform a multiple regression analysis. For this regression analysis, the dependent and independent variables are: dependent = salary; independent = age + education
Null Hypothesis (H0)- There is no statistically significant relationship between salary and age + education
Alternative Hypothesis (H1)- There is a statistically significant relationship between salary and age + education
Null Hypothesis (H0)- There is no statistically significant relationship between salary and age
Alternative Hypothesis (H1)- There is a statistically significant relationship between salary and age
#Setting the working directory
#Let us load the dataset - Education
#read_excel() comes from the readxl package, so load it first
library(readxl)
Project_AES <- read_excel("Education.xlsx")
#Let us proceed with the case
#To check the data in the dataset
View(Project_AES)
#Let us create a regression model - to check the cause and effect b/w the dependent and independent variables
#Before we create the model, first create the variables
sal<-Project_AES$Salary
edu<-Project_AES$Education
age<-Project_AES$Age
model1<-lm(sal~edu+age)
model1
##
## Call:
## lm(formula = sal ~ edu + age)
##
## Coefficients:
## (Intercept)          edu          age
##   -12.58281      3.62060      0.03253
#To get the regression output use the below code
summary(model1)
##
## Call:
## lm(formula = sal ~ edu + age)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -33.002  -8.406  -0.866   6.753  49.592
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -12.58281    5.70073  -2.207   0.0297 *
## edu           3.62060    0.26562  13.631   <2e-16 ***
## age           0.03253    0.10944   0.297   0.7669
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.12 on 97 degrees of freedom
## Multiple R-squared: 0.668, Adjusted R-squared: 0.6611
## F-statistic: 97.57 on 2 and 97 DF, p-value: < 2.2e-16
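The same numbers can be read from the summary object instead of the printed text, which helps when the p-values feed a later decision step. The sketch below is a minimal illustration on simulated data, since Education.xlsx is not reproduced here. The names sim_edu, sim_age, sim_sal and the generating coefficients are assumptions made up for this example, so its output will not match the output above.

```r
# Simulated stand-in data; NOT the values from Education.xlsx.
set.seed(1)
n <- 100
sim_edu <- round(runif(n, 8, 22))    # hypothetical education values
sim_age <- round(runif(n, 20, 60))   # hypothetical ages
sim_sal <- -10 + 3.5 * sim_edu + rnorm(n, sd = 14)  # salary driven by education only

sim_model <- lm(sim_sal ~ sim_edu + sim_age)

# coef(summary(...)) returns a matrix with one row per term and the
# columns Estimate, Std. Error, t value and Pr(>|t|).
coef_table <- coef(summary(sim_model))
p_values <- coef_table[, "Pr(>|t|)"]
print(round(coef_table, 4))

# The overall fit statistics are stored in the summary object as well.
fit <- summary(sim_model)
cat("R-squared:", fit$r.squared, " Adjusted R-squared:", fit$adj.r.squared, "\n")
```

Applied to the real model, the same calls would be coef(summary(model1)) and summary(model1)$adj.r.squared.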
#Based on the regression output, here are the key points for decision making:
1.Coefficients:
Intercept = -12.58281; edu = 3.62060; age = 0.03253
Equation: y = a + b1*x1 + b2*x2
Salary = -12.58281 + 3.62060(edu) + 0.03253(age)
This equation can be used to predict salary (e.g. marketing analysis, salary forecasting, and so on).
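As a minimal sketch of how the equation is used for prediction, the code below plugs the reported coefficients into a small helper. The function name predict_salary and the inputs edu = 16, age = 30 are assumptions chosen for illustration; they are not taken from the dataset.

```r
# Coefficients copied from the regression output above.
b0    <- -12.58281  # intercept
b_edu <- 3.62060    # change in salary per one-unit increase in education
b_age <- 0.03253    # change in salary per one-unit increase in age

# predict_salary() is a hypothetical helper, not part of the original script.
predict_salary <- function(edu, age) {
  b0 + b_edu * edu + b_age * age
}

# Illustrative inputs (assumed values, not rows from the dataset).
pred <- predict_salary(edu = 16, age = 30)
print(pred)  # 46.32269
```

With the fitted model in memory, predict(model1, newdata = data.frame(edu = 16, age = 30)) should give the same value up to the rounding of the printed coefficients.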
2.Level of significance: 5% (0.05) [Level of confidence = 95%]
3.p-value:
Intercept = 0.0297 (*), which is < 0.05
edu = <2e-16 (***), which is < 0.05
age = 0.7669, which is > 0.05
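4.R-squared: 0.668 (66.8%)
For a simple regression (one independent variable), R-squared is considered.
5.Adjusted R-squared: 0.6611 (66.11%)
For a multiple regression (more than one independent variable), Adjusted R-squared is considered.
This implies that the model explains about 66.11% of the variation in salary. Since only education is statistically significant, this explanatory power is attributed to education, not age.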
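Adjusted R-squared penalises R-squared for the number of predictors. As a rough check, the sketch below recomputes it (and the F-statistic) from the printed figures using the standard formulas. The sample size is inferred from the 97 residual degrees of freedom, and because the printed R-squared is rounded, the results only match the output approximately.

```r
r2       <- 0.668   # Multiple R-squared from the output (rounded)
p        <- 2       # number of predictors: edu and age
df_resid <- 97      # residual degrees of freedom from the output
n        <- df_resid + p + 1   # implied number of observations

# Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# F = (R^2 / p) / ((1 - R^2) / (n - p - 1))
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

print(n)                 # 100
print(round(adj_r2, 3))  # 0.661
print(round(f_stat, 1))  # 97.6
```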
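If the p-value is less than 0.05, reject the null hypothesis.
If the p-value is greater than 0.05, fail to reject (accept) the null hypothesis.
Hypothesis 1: Reject the null hypothesis (overall model p-value < 2.2e-16)
Hypothesis 2: Fail to reject the null hypothesis (p-value for age = 0.7669 > 0.05)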
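The decision rule can be sketched in a few lines. The p-values below are typed in from the regression output; since "<2e-16" is only an upper bound, 2e-16 is used as a stand-in for edu, which is an assumption that does not affect the decision.

```r
alpha <- 0.05  # level of significance

# p-values from the regression output; 2e-16 is a stand-in for "<2e-16".
p_values <- c(Intercept = 0.0297, edu = 2e-16, age = 0.7669)

# Compare each p-value with alpha and label the outcome.
decision <- ifelse(p_values < alpha, "reject H0", "fail to reject H0")
print(decision)
```

Only the age term ends up as "fail to reject H0", which matches the conclusion that education is the significant predictor.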
Therefore, the analysis shows that salary depends on education level and not on age.