MBASalaries.df <- read.csv("C:/interships/MBA Starting Salaries Data.csv")
View(MBASalaries.df)
summary(MBASalaries.df$age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 22.00 25.00 27.00 27.36 29.00 48.00
The mean age of the students is about 27.4 years, and the median is 27.
summary(MBASalaries.df$gmat_tot)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 450.0 580.0 620.0 619.5 660.0 790.0
The mean GMAT total score (about 619.5) is well below the maximum score of 790 and close to the median of 620.
table(MBASalaries.df$sex)
##
## 1 2
## 206 68
prop.table(table(MBASalaries.df$sex))
##
## 1 2
## 0.7518248 0.2481752
About 75.18% of the students in the program were male (code 1), whereas only 24.82% were female (code 2).
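Labelling the numeric codes makes such tables easier to read and harder to misquote. The following is a minimal sketch that rebuilds the column from the counts printed above; the mapping 1 = male, 2 = female is an assumption taken from the surrounding text, and `sex_codes` and `sex_pct` are illustrative names.

```r
# Rebuild the sex column from the counts reported above (206 ones, 68 twos).
sex_codes <- c(rep(1, 206), rep(2, 68))

# Assumption: 1 = male, 2 = female, as the surrounding text implies.
sex <- factor(sex_codes, levels = c(1, 2), labels = c("male", "female"))

# Percentage shares, rounded to two decimals.
sex_pct <- round(100 * prop.table(table(sex)), 2)
print(sex_pct)
```

The two percentages sum to 100, which is a quick check against transcription slips.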
table(MBASalaries.df$salary)
##
## 0 998 999 64000 77000 78256 82000 85000 86000 88000
## 90 46 35 1 1 1 1 4 2 1
## 88500 90000 92000 93000 95000 96000 96500 97000 98000 99000
## 1 3 3 3 7 4 1 2 10 1
## 100000 100400 101000 101100 101600 102500 103000 104000 105000 106000
## 9 1 2 1 1 1 1 2 11 3
## 107000 107300 107500 108000 110000 112000 115000 118000 120000 126710
## 1 1 1 2 1 3 5 1 4 1
## 130000 145800 146000 162000 220000
## 1 1 1 1 1
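The values 998 and 999 are not salaries: 46 students are recorded as 998 and 35 as 999, which appear to be codes for students who did not disclose their salaries, and a further 90 students have a salary of 0. We will see whether the test scores help explain the salaries.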
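Codes like these need to be set to missing before any averages or models are computed, otherwise they are treated as very low salaries. A minimal sketch on a small made-up data frame follows; the reading of 998 and 999 as missing-data codes is an assumption, and `demo`, `salary_clean` and `placed` are hypothetical names.

```r
# Hypothetical mini data set; values chosen only to illustrate the recoding.
demo <- data.frame(salary = c(0, 998, 999, 85000, 105000))

# Assumption: 998 and 999 are missing-data codes, not salaries.
demo$salary_clean <- ifelse(demo$salary %in% c(998, 999), NA, demo$salary)

# Keep only rows with a reported, non-zero salary.
placed <- demo[!is.na(demo$salary_clean) & demo$salary_clean > 0, ]
print(placed)
```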
table(MBASalaries.df$satis)
##
## 1 2 3 4 5 6 7 998
## 1 1 5 17 74 97 33 46
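Setting aside the 46 students recorded as 998 (which appears to be a missing-response code), most students were well satisfied with this management program: 204 of the 228 who gave a rating chose 5 or higher.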
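The share of satisfied students can be computed directly from the counts in the table. This is a small sketch that assumes 998 marks a missing response rather than a rating; `satis_counts`, `rated` and `share_high` are illustrative names.

```r
# Counts copied from the table above; names are the recorded codes.
satis_counts <- c("1" = 1, "2" = 1, "3" = 5, "4" = 17,
                  "5" = 74, "6" = 97, "7" = 33, "998" = 46)

# Assumption: 998 marks a missing response rather than a rating.
rated <- satis_counts[names(satis_counts) != "998"]

# Share of respondents who gave a rating of 5 or higher.
share_high <- sum(rated[c("5", "6", "7")]) / sum(rated)
print(round(share_high, 3))
```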
summary(MBASalaries.df$gmat_tpc)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 78.0 87.0 84.2 94.0 99.0
table(MBASalaries.df$frstlang)
##
## 1 2
## 242 32
This table shows that 242 students have English as their first language and 32 students have some other first language. We will see whether first language affects the students' scores and salaries.
library(psych)
mba_job<- MBASalaries.df[which(MBASalaries.df$salary!='0'),]
t.test(mba_job$sex, mba_job$salary, var.equal = TRUE)
##
## Two Sample t-test
##
## data: mba_job$sex and mba_job$salary
## t = -15.012, df = 366, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -65725.64 -50500.56
## sample estimates:
## mean of x mean of y
## 1.244565 58114.342391
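As written, this test compares the mean of the sex codes (about 1.24) with the mean salary (about 58,114), so the very small p-value only shows that these two unrelated quantities differ. It says nothing about whether men and women earn different salaries, and it does not show that salary is independent of sex. Testing that question requires comparing salary between the two sex groups, for example with t.test(salary ~ sex, data = mba_job). Note also that mba_job still contains the 998 and 999 codes, which pull the mean salary far below the reported salaries.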
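A two-sample comparison of salary between the sexes uses the formula interface of `t.test`, with the outcome on the left and the grouping variable on the right. The sketch below runs on simulated data only, so its numbers say nothing about the MBA data set; `demo` and `res` are illustrative names.

```r
# Simulated data for illustration only; not the MBA data set.
set.seed(1)
demo <- data.frame(
  sex    = factor(rep(c("male", "female"), times = c(30, 20))),
  salary = c(rnorm(30, mean = 100000, sd = 15000),
             rnorm(20, mean = 98000, sd = 15000))
)

# Welch two-sample t-test of salary by group (does not assume equal variances).
res <- t.test(salary ~ sex, data = demo)
print(res$estimate)  # mean salary in each group
print(res$p.value)
```

Here the two estimates are both mean salaries, one per group, which is what the question about male and female pay needs.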
boxplot(MBASalaries.df$salary, xlab="salary", ylab="mba student", horizontal = TRUE)
salaryregg<-lm(mba_job$salary~mba_job$gmat_tot+mba_job$sex+mba_job$gmat_tpc+mba_job$work_yrs+mba_job$frstlang)
summary(salaryregg)
##
## Call:
## lm(formula = mba_job$salary ~ mba_job$gmat_tot + mba_job$sex +
## mba_job$gmat_tpc + mba_job$work_yrs + mba_job$frstlang)
##
## Residuals:
## Min 1Q Median 3Q Max
## -88592 -51648 20551 39847 126530
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 177593.0 55338.3 3.209 0.00158 **
## mba_job$gmat_tot -286.4 134.4 -2.131 0.03447 *
## mba_job$sex 14577.9 8679.1 1.680 0.09478 .
## mba_job$gmat_tpc 753.4 565.0 1.333 0.18407
## mba_job$work_yrs 3797.4 1519.1 2.500 0.01333 *
## mba_job$frstlang -32729.2 11226.1 -2.915 0.00401 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 50350 on 178 degrees of freedom
## Multiple R-squared: 0.1057, Adjusted R-squared: 0.08053
## F-statistic: 4.206 on 5 and 178 DF, p-value: 0.001228
salaryregg<-lm(mba_job$salary~mba_job$gmat_tot+mba_job$sex+mba_job$gmat_tpc+mba_job$work_yrs+mba_job$frstlang+mba_job$age)
summary(salaryregg)
##
## Call:
## lm(formula = mba_job$salary ~ mba_job$gmat_tot + mba_job$sex +
## mba_job$gmat_tpc + mba_job$work_yrs + mba_job$frstlang +
## mba_job$age)
##
## Residuals:
## Min 1Q Median 3Q Max
## -86355 -51529 20692 39672 125538
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 166385.2 76804.1 2.166 0.03162 *
## mba_job$gmat_tot -292.6 137.9 -2.122 0.03527 *
## mba_job$sex 14845.7 8794.5 1.688 0.09316 .
## mba_job$gmat_tpc 780.6 581.0 1.344 0.18081
## mba_job$work_yrs 3290.2 2845.2 1.156 0.24907
## mba_job$frstlang -33379.5 11670.4 -2.860 0.00474 **
## mba_job$age 556.8 2638.0 0.211 0.83309
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 50490 on 177 degrees of freedom
## Multiple R-squared: 0.1059, Adjusted R-squared: 0.07557
## F-statistic: 3.493 on 6 and 177 DF, p-value: 0.002736
Comparing the two regressions, the second model adds age but is not an improvement: the adjusted R-squared falls from 0.0805 to 0.0756, age is not significant (p = 0.83), and work_yrs loses its significance, probably because age and work experience are strongly correlated. The first model is therefore preferable. Neither model fits well: each explains only about 10.6% of the variation in salary, and the 184 rows still include the 998 and 999 codes, which distorts the estimates. The stars beside a coefficient mark statistical significance (a small p-value), not the size of that variable's effect on salary.
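Two nested models can be compared on adjusted R-squared and with a partial F-test from `anova()`. The sketch below uses simulated data in which age closely tracks work experience; the variable names mirror the report, but the numbers are made up and `demo`, `fit1`, `fit2`, `adj` and `cmp` are illustrative names.

```r
# Simulated data for illustration only; the variable names mirror the report.
set.seed(42)
n <- 100
work_yrs <- rpois(n, 4)
age <- 23 + work_yrs + rnorm(n, sd = 1)   # age tracks work experience closely
salary <- 90000 + 2500 * work_yrs + rnorm(n, sd = 10000)
demo <- data.frame(salary, work_yrs, age)

fit1 <- lm(salary ~ work_yrs, data = demo)
fit2 <- lm(salary ~ work_yrs + age, data = demo)

# Adjusted R-squared penalises the extra predictor.
adj <- c(fit1 = summary(fit1)$adj.r.squared,
         fit2 = summary(fit2)$adj.r.squared)
print(adj)

# Partial F-test: does adding age significantly reduce the residual variance?
cmp <- anova(fit1, fit2)
print(cmp)
```

Plain R-squared can never decrease when a predictor is added, which is why the adjusted value and the F-test are the more useful guides here.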
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplot(mba_job$sex,mba_job$salary,xlab = "sex", ylab = "salary")
scatterplot(mba_job$age,mba_job$salary,xlab = "age", ylab = "salary")
library(corrgram)
corrgram(mba_job, order=TRUE, lower.panel=panel.shade,
upper.panel=panel.pie, text.panel=panel.txt,
main="Corrgram of Variables")
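This analysis shows that 90 of the 274 students have a recorded salary of 0, which the analysis treats as not having a job; that is about a third of the class, not a small number. Most students who gave a rating were satisfied with the management programme. The regression analysis gave a broader view of the factors associated with salary, although the models explain only about 10% of its variation. The t-test as run compared the sex codes with the salary values, so it does not establish whether sex and salary are related; in the first regression the sex coefficient is not significant at the 5% level (p = 0.095). The scatterplots show how salary varies with sex and with age. Overall, the model fit is weak, and the analysis should be repeated after excluding the 998 and 999 codes.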