1. Reading the data file in R
setwd("C:/Users/Taiyyab Ali/Desktop/R language")
MBASalary <- read.csv("MBAStartingSalariesData.csv")
View(MBASalary)
str(MBASalary)
## 'data.frame': 274 obs. of 13 variables:
## $ age : int 23 24 24 24 24 24 25 25 25 25 ...
## $ sex : int 2 1 1 1 2 1 1 2 1 1 ...
## $ gmat_tot: int 620 610 670 570 710 640 610 650 630 680 ...
## $ gmat_qpc: int 77 90 99 56 93 82 89 88 79 99 ...
## $ gmat_vpc: int 87 71 78 81 98 89 74 89 91 81 ...
## $ gmat_tpc: int 87 87 95 75 98 91 87 92 89 96 ...
## $ s_avg : num 3.4 3.5 3.3 3.3 3.6 3.9 3.4 3.3 3.3 3.45 ...
## $ f_avg : num 3 4 3.25 2.67 3.75 3.75 3.5 3.75 3.25 3.67 ...
## $ quarter : int 1 1 1 1 1 1 1 1 1 1 ...
## $ work_yrs: int 2 2 2 1 2 2 2 2 2 2 ...
## $ frstlang: int 1 1 1 1 1 1 1 1 2 1 ...
## $ salary : int 0 0 0 0 999 0 0 0 999 998 ...
## $ satis : int 7 6 6 7 5 6 5 6 4 998 ...
library(psych)
describe(MBASalary)[ ,c(3,4,5,7,8,9)]
## mean sd median mad min max
## age 27.36 3.71 27 2.97 22 48
## sex 1.25 0.43 1 0.00 1 2
## gmat_tot 619.45 57.54 620 59.30 450 790
## gmat_qpc 80.64 14.87 83 14.83 28 99
## gmat_vpc 78.32 16.86 81 14.83 16 99
## gmat_tpc 84.20 14.02 87 11.86 0 99
## s_avg 3.03 0.38 3 0.44 2 4
## f_avg 3.06 0.53 3 0.37 0 4
## quarter 2.48 1.11 2 1.48 1 4
## work_yrs 3.87 3.23 3 1.48 0 22
## frstlang 1.12 0.32 1 0.00 1 2
## salary 39025.69 50951.56 999 1481.12 0 220000
## satis 172.18 371.61 6 1.48 1 998
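The means of `salary` and `satis` above are pulled far from their medians by values such as 998 and 999, which look like placeholder codes rather than real measurements. The sketch below treats them as missing-value codes. That is an assumption, since the data file itself does not document them. The data frame `toy` is hypothetical and holds only the first ten values of each column as printed by `str()`.

```r
# Toy data frame: first ten values of salary and satis from the str() output.
toy <- data.frame(
  salary = c(0, 0, 0, 0, 999, 0, 0, 0, 999, 998),
  satis  = c(7, 6, 6, 7, 5, 6, 5, 6, 4, 998)
)

# Assumption: 998 and 999 are placeholder codes, not real values.
missing_codes <- c(998, 999)
cleaned <- toy
cleaned$salary[cleaned$salary %in% missing_codes] <- NA
cleaned$satis[cleaned$satis %in% missing_codes]   <- NA

# Compare the mean with and without the placeholder codes.
mean(toy$satis)                    # includes the 998 code
mean(cleaned$satis, na.rm = TRUE)  # placeholder codes excluded
```

The same recoding could be applied to `MBASalary` before calling `describe()`, which would give summary statistics on the actual 1 to 7 satisfaction scale.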
table(MBASalary$age)
##
## 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 39 40 42 43 48
## 2 8 33 53 40 46 21 22 12 10 8 1 4 3 2 1 2 2 1 2 1
Most MBA students (253 of 274) are between 23 and 32 years old.
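The share of students in that age range can be checked directly from the frequency table. A minimal sketch that rebuilds the age vector from the printed counts, so it runs without the CSV file (the names `ages`, `counts` and `in_range` are introduced here for illustration):

```r
# Rebuild the age vector from the frequency table printed above.
ages   <- c(22:37, 39, 40, 42, 43, 48)
counts <- c(2, 8, 33, 53, 40, 46, 21, 22, 12, 10, 8,
            1, 4, 3, 2, 1, 2, 2, 1, 2, 1)
age <- rep(ages, times = counts)

# Students aged 23 to 32, inclusive.
in_range <- age >= 23 & age <= 32
sum(in_range)   # number of students in the range
mean(in_range)  # proportion of all students
```

With the full data loaded, the same check would be `mean(MBASalary$age >= 23 & MBASalary$age <= 32)`.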
3. Analysing the sex ratio of students
table(MBASalary$sex)
##
## 1 2
## 206 68
So there are 206 male students and 68 female students in total.
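Counts are easier to read with labels and shares attached. A small sketch, assuming code 1 means male and code 2 means female (this matches the axis labels used for the boxplot later in this report); the vector `sex` is rebuilt from the counts above rather than read from the file:

```r
# Rebuild the sex column from the counts printed above.
sex <- rep(c(1, 2), times = c(206, 68))

# Assumption: 1 = male, 2 = female.
sex_f <- factor(sex, levels = c(1, 2), labels = c("Male", "Female"))

sex_counts <- table(sex_f)
sex_counts                        # counts with readable labels
round(prop.table(sex_counts), 3)  # share of each group
```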
4. Which weighs more in the GMAT score: quantitative or verbal?
plot(~gmat_qpc + gmat_vpc, data = MBASalary, main = "Quantitative GMAT score vs Verbal GMAT score (both in percentile) ")
abline(0,1)
Most students scored well on both parts of the GMAT, but there are cases of students with a low verbal and a high quantitative percentile, and others with the opposite pattern.
plot(~MBASalary$frstlang+MBASalary$gmat_vpc,main = "Verbal score vs first language")
The lowest verbal score was earned by a native English speaker, and the scores in both groups are widely spread. So "English is not my native language" is not a convincing excuse for a low verbal score.
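A per-group summary makes this comparison more concrete than the scatter plot alone. The sketch below uses only the first ten rows shown by `str()`, so its numbers are illustrative and not a result for the full data. It assumes `frstlang` code 1 means English and code 2 means another language; `first_rows` and `lang` are names introduced here.

```r
# First ten rows of the two columns, copied from the str() output.
first_rows <- data.frame(
  gmat_vpc = c(87, 71, 78, 81, 98, 89, 74, 89, 91, 81),
  frstlang = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 1)
)

# Assumption: 1 = English, 2 = other first language.
lang <- factor(first_rows$frstlang, levels = c(1, 2),
               labels = c("English", "Other"))

# Range and median of the verbal percentile within each group.
rng <- tapply(first_rows$gmat_vpc, lang, range)
med <- tapply(first_rows$gmat_vpc, lang, median)
rng
med
```

On the full data, replacing `first_rows` with `MBASalary` would show whether the two groups really overlap as much as the plot suggests.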
6. Relation between GMAT score and gender
boxplot(MBASalary$gmat_tot~MBASalary$sex,horizontal = TRUE,yaxt="n", main = "Gender vs GMAT score")
axis(side = 2,at = c(1,2), labels = c("Male","Female"))
There is no noticeable difference between male and female GMAT scores, except that the highest score was earned by a male.
plot(~ MBASalary$s_avg + MBASalary$f_avg, main = "Fall MBA average vs Spring MBA average ")
abline(0,1)
So students who did well in spring also did well in fall, with three exceptions.
plot(~ MBASalary$gmat_tpc + MBASalary$f_avg, main = "Fall MBA average vs GMAT score")
abline(lm(f_avg ~ gmat_tpc, data = MBASalary))  # response must be the y-axis variable (f_avg)
On average, students who scored above the 50th percentile on the GMAT did well in their MBA courses.
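When a fitted line is added to a scatter plot, the response in `lm()` has to be the variable on the vertical axis, otherwise `abline()` draws the line for the wrong regression. A minimal sketch on the first ten rows from the `str()` output (the names `first_rows` and `fit` are introduced here, and the fitted coefficients are illustrative only):

```r
# First ten rows of the two columns, copied from the str() output.
first_rows <- data.frame(
  gmat_tpc = c(87, 87, 95, 75, 98, 91, 87, 92, 89, 96),
  f_avg    = c(3, 4, 3.25, 2.67, 3.75, 3.75, 3.5, 3.75, 3.25, 3.67)
)

# f_avg is on the y axis, so it is the response in the model.
fit <- lm(f_avg ~ gmat_tpc, data = first_rows)

plot(f_avg ~ gmat_tpc, data = first_rows,
     xlab = "GMAT total percentile", ylab = "Fall MBA average")
abline(fit)  # line for f_avg as a function of gmat_tpc

coef(fit)    # intercept and slope of the fitted line
```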
table(MBASalary$quarter)
##
## 1 2 3 4
## 69 70 70 65
Students are spread almost evenly across the four quarters.
8. Do years of work experience affect starting salary?
plot(~ MBASalary$work_yrs + MBASalary$salary)
Years of work experience affect salary only up to a certain limit. Most MBAs have 1 to 7 years of experience.
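The scatter plot includes rows where `salary` is 0, 998 or 999, which compress the real salaries into a narrow band at the top of the plot. One option is to plot only rows with a reported salary. The sketch below assumes that values up to 999 are not real salaries. The data frame `toy` contains made-up values for illustration only, not rows from the data set.

```r
# Hypothetical rows: made-up values, used only to show the filtering step.
toy <- data.frame(
  work_yrs = c(2, 2, 1, 3, 5, 2),
  salary   = c(0, 999, 998, 85000, 100000, 95000)
)

# Assumption: 0, 998 and 999 are codes, so keep only larger values.
placed <- subset(toy, salary > 999)

plot(salary ~ work_yrs, data = placed,
     xlab = "Years of work experience", ylab = "Starting salary",
     main = "Starting salary vs work experience (reported salaries only)")
```

Applied to the real data, this would be `subset(MBASalary, salary > 999)`.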
Most of the people

## R Markdown

1. Create summary statistics (e.g. mean, standard deviation, median, mode) for the important variables in the dataset.
2. Draw Box Plots / Bar Plots to visualize the distribution of each variable independently.
3. Draw Scatter Plots to understand how the variables are correlated pair-wise.
4. Draw a Corrgram; Create a Variance-Covariance Matrix.
Newvar <- MBASalary[which(MBASalary$satis<=7),]
boxplot(Newvar$satis, main = " Distribution of satisfaction level between MBAs", horizontal = TRUE)
On average, most students are satisfied with their MBA program, though a few are very disappointed with it.
library(corrgram)
library(ellipse)
corrgram(MBASalary, main = "corrgram plot of MBA Starting Salary data", lower.panel=panel.pts, upper.panel=panel.pie)
From this diagram, salary is only weakly correlated with the spring MBA average. The correlations between salary and the other variables are not clearly distinguishable from one another.
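The task list above also asks for a variance-covariance matrix, which base R provides through `cov()`; `cor()` gives the matching correlation matrix that the corrgram visualizes. A minimal sketch on the first ten rows of four numeric columns from the `str()` output (the names `first_rows`, `vc` and `cr` are introduced here, and the values are illustrative, not results for the full data):

```r
# First ten rows of four numeric columns, copied from the str() output.
first_rows <- data.frame(
  age      = c(23, 24, 24, 24, 24, 24, 25, 25, 25, 25),
  gmat_tot = c(620, 610, 670, 570, 710, 640, 610, 650, 630, 680),
  s_avg    = c(3.4, 3.5, 3.3, 3.3, 3.6, 3.9, 3.4, 3.3, 3.3, 3.45),
  f_avg    = c(3, 4, 3.25, 2.67, 3.75, 3.75, 3.5, 3.75, 3.25, 3.67)
)

vc <- cov(first_rows)  # variance-covariance matrix (variances on the diagonal)
cr <- cor(first_rows)  # correlation matrix (ones on the diagonal)

round(vc, 2)
round(cr, 2)
```

For the full data, placeholder codes such as 998 and 999 would need to be handled first, for example by recoding them to `NA` and calling `cov(..., use = "complete.obs")`.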