1. Reading the data file in R

setwd("C:/Users/Taiyyab Ali/Desktop/R language")
MBASalary <- read.csv("MBAStartingSalariesData.csv")
View(MBASalary)
str(MBASalary)
## 'data.frame':    274 obs. of  13 variables:
##  $ age     : int  23 24 24 24 24 24 25 25 25 25 ...
##  $ sex     : int  2 1 1 1 2 1 1 2 1 1 ...
##  $ gmat_tot: int  620 610 670 570 710 640 610 650 630 680 ...
##  $ gmat_qpc: int  77 90 99 56 93 82 89 88 79 99 ...
##  $ gmat_vpc: int  87 71 78 81 98 89 74 89 91 81 ...
##  $ gmat_tpc: int  87 87 95 75 98 91 87 92 89 96 ...
##  $ s_avg   : num  3.4 3.5 3.3 3.3 3.6 3.9 3.4 3.3 3.3 3.45 ...
##  $ f_avg   : num  3 4 3.25 2.67 3.75 3.75 3.5 3.75 3.25 3.67 ...
##  $ quarter : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ work_yrs: int  2 2 2 1 2 2 2 2 2 2 ...
##  $ frstlang: int  1 1 1 1 1 1 1 1 2 1 ...
##  $ salary  : int  0 0 0 0 999 0 0 0 999 998 ...
##  $ satis   : int  7 6 6 7 5 6 5 6 4 998 ...
2. Visualizing the data via a short summary
library(psych)
describe(MBASalary)[ ,c(3,4,5,7,8,9)]
##              mean       sd median     mad min    max
## age         27.36     3.71     27    2.97  22     48
## sex          1.25     0.43      1    0.00   1      2
## gmat_tot   619.45    57.54    620   59.30 450    790
## gmat_qpc    80.64    14.87     83   14.83  28     99
## gmat_vpc    78.32    16.86     81   14.83  16     99
## gmat_tpc    84.20    14.02     87   11.86   0     99
## s_avg        3.03     0.38      3    0.44   2      4
## f_avg        3.06     0.53      3    0.37   0      4
## quarter      2.48     1.11      2    1.48   1      4
## work_yrs     3.87     3.23      3    1.48   0     22
## frstlang     1.12     0.32      1    0.00   1      2
## salary   39025.69 50951.56    999 1481.12   0 220000
## satis      172.18   371.61      6    1.48   1    998
table(MBASalary$age)
## 
## 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 39 40 42 43 48 
##  2  8 33 53 40 46 21 22 12 10  8  1  4  3  2  1  2  2  1  2  1

Most MBA students (253 of 274) are between 23 and 32 years old.
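The age range can be checked directly against the frequency table. The following is a minimal sketch that rebuilds the age column from the counts printed above, so it runs without the CSV file; with the data loaded, `MBASalary$age` would be used in place of the rebuilt `age` vector.

```r
# Rebuild the age column from the table(MBASalary$age) output shown above.
ages   <- c(22:37, 39, 40, 42, 43, 48)
counts <- c(2, 8, 33, 53, 40, 46, 21, 22, 12, 10, 8,
            1, 4, 3, 2, 1, 2, 2, 1, 2, 1)
age <- rep(ages, times = counts)

# Count and share of students aged 23 to 32 (inclusive).
in_range       <- age >= 23 & age <= 32
n_in_range     <- sum(in_range)
share_in_range <- mean(in_range)

print(n_in_range)
print(round(share_in_range, 3))
```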

3. Analysing the sex ratio of students

table(MBASalary$sex)
## 
##   1   2 
## 206  68

So there are 206 male students and 68 female students in total.
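Labelling the codes makes the table easier to read. The sketch below assumes the coding stated in the text (1 = male, 2 = female) and rebuilds the column from the counts printed above; the names `sex_f`, `sex_counts` and `sex_share` are illustrative only.

```r
# Rebuild the sex column from the table(MBASalary$sex) output shown above.
sex <- rep(c(1L, 2L), times = c(206, 68))

# Assumed coding, as stated in the text: 1 = male, 2 = female.
sex_f <- factor(sex, levels = c(1, 2), labels = c("Male", "Female"))

sex_counts <- table(sex_f)
sex_share  <- round(prop.table(sex_counts), 3)

print(sex_counts)
print(sex_share)
```

With the data loaded, the same `factor()` call can be applied to `MBASalary$sex`.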

4. Which weighs more in the GMAT score: quantitative or verbal?

plot(~gmat_qpc + gmat_vpc, data = MBASalary, main =  "Quantitative GMAT score vs Verbal GMAT score (both in percentile) ")
abline(0,1)

Most people scored well on the GMAT, but there are cases where some people have a low verbal score and a high quantitative score, and others are just the opposite.

5. Let's check whether the people with low verbal scores don't have English as their first language
plot(~MBASalary$frstlang+MBASalary$gmat_vpc,main = "Verbal score vs First language")

The lowest score was earned by a native English speaker, and the scores of both groups are widely distributed, so the excuse "English is not my native language, that's why I didn't score well in verbal" does not hold up.
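A per-group summary gives the same comparison in numbers. The sketch below is an illustration, not the author's code: the helper `verbal_by_language()` is hypothetical, the coding 1 = English, 2 = other is assumed from the text, and the demo runs only on the first ten rows copied from the `str()` output, so the printed figures do not describe the full data.

```r
# First ten rows of the two columns, copied from the str() output above.
sample_rows <- data.frame(
  frstlang = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 1),
  gmat_vpc = c(87, 71, 78, 81, 98, 89, 74, 89, 91, 81)
)

# Hypothetical helper: size, minimum, median and maximum of the verbal
# percentile for each first-language group (assumed: 1 = English, 2 = other).
verbal_by_language <- function(df) {
  lang   <- factor(df$frstlang, levels = c(1, 2), labels = c("English", "Other"))
  groups <- split(df$gmat_vpc, lang)
  t(sapply(groups, function(v) {
    c(n = length(v), min = min(v), median = median(v), max = max(v))
  }))
}

verbal_summary <- verbal_by_language(sample_rows)
print(verbal_summary)
```

With the data loaded, `verbal_by_language(MBASalary)` would summarise all 274 students.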

6. Relation between GMAT score and gender

boxplot(MBASalary$gmat_tot~MBASalary$sex,horizontal = TRUE,yaxt="n", main = "Gender vs GMAT score")
axis(side = 2,at = c(1,2), labels = c("Male","Female"))

The boxplots show no marked difference between male and female GMAT scores, except that the highest score was earned by a male.
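The boxplot comparison is visual only. One way to put a number on it is Welch's two-sample t-test via `t.test()`. The sketch below is an illustration rather than the author's method; it runs on the first ten rows copied from the `str()` output, so its p-value says nothing about the full data. The sex coding (1 = male, 2 = female) follows the text.

```r
# First ten rows of the two columns, copied from the str() output above.
sample_rows <- data.frame(
  sex      = c(2, 1, 1, 1, 2, 1, 1, 2, 1, 1),
  gmat_tot = c(620, 610, 670, 570, 710, 640, 610, 650, 630, 680)
)

# Assumed coding, as stated in the text: 1 = male, 2 = female.
sample_rows$sex_f <- factor(sample_rows$sex, levels = c(1, 2),
                            labels = c("Male", "Female"))

# Welch two-sample t-test (the default of t.test) on total GMAT score by sex.
gmat_test <- t.test(gmat_tot ~ sex_f, data = sample_rows)

print(gmat_test$estimate)  # group means
print(gmat_test$p.value)
```

With the data loaded, the same call with `data = MBASalary` (after creating `sex_f`) would test all 274 students.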

7. Students who have done well in both exams are really good
plot(~ MBASalary$s_avg + MBASalary$f_avg, main = "Fall MBA average vs Spring MBA average ")
abline(0,1)

So students who did well in the spring also did well in the fall, with 3 exceptions.

8. Is it true that students who did well on the GMAT also score well in the MBA?
plot(~ MBASalary$gmat_tpc + MBASalary$f_avg, main = "Fall MBA average vs GMAT score")
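# Regress the y-axis variable (f_avg) on the x-axis variable (gmat_tpc) so the line matches the plot
abline(lm(f_avg ~ gmat_tpc, data = MBASalary))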

On average, students who scored above the 50th percentile did well in their MBA.
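Because `abline()` draws the line y = a + b x, the model passed to it has to regress the y-axis variable (`f_avg`) on the x-axis variable (`gmat_tpc`). The sketch below fits that model on the first ten rows copied from the `str()` output, so the coefficients are illustrative only and do not describe the full data.

```r
# First ten rows of the two columns, copied from the str() output above.
sample_rows <- data.frame(
  gmat_tpc = c(87, 87, 95, 75, 98, 91, 87, 92, 89, 96),
  f_avg    = c(3, 4, 3.25, 2.67, 3.75, 3.75, 3.5, 3.75, 3.25, 3.67)
)

# Fall average (y) as a linear function of GMAT total percentile (x).
fit <- lm(f_avg ~ gmat_tpc, data = sample_rows)
r   <- cor(sample_rows$gmat_tpc, sample_rows$f_avg)

print(coef(fit))  # intercept and slope used by abline(fit)
print(r)          # Pearson correlation between the two columns
```

With the data loaded and the scatter plot open, `abline(lm(f_avg ~ gmat_tpc, data = MBASalary))` adds the corresponding line.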

table(MBASalary$quarter)
## 
##  1  2  3  4 
## 69 70 70 65

Students are almost evenly spread across the four quarter categories.

9. Do years of work experience affect starting salary?

plot(~ MBASalary$work_yrs + MBASalary$salary)

Years of work experience affect salary only up to a point. Most MBAs have 1 to 7 years of experience.
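The `salary` column in the `str()` output contains the values 0, 998 and 999, and its median in the summary table is 999, so many rows do not hold an actual salary. Treating these three values as placeholder codes is an assumption (the source does not document them), and the helper `with_salary()` below is hypothetical. A minimal sketch of dropping such rows before plotting:

```r
# Assumed placeholder codes in the salary column (not documented in the source).
placeholder_codes <- c(0, 998, 999)

# Hypothetical helper: keep only rows whose salary is not a placeholder code.
with_salary <- function(df) {
  df[!(df$salary %in% placeholder_codes), , drop = FALSE]
}

# First ten rows of the two columns, copied from the str() output above.
sample_rows <- data.frame(
  work_yrs = c(2, 2, 2, 1, 2, 2, 2, 2, 2, 2),
  salary   = c(0, 0, 0, 0, 999, 0, 0, 0, 999, 998)
)

kept <- with_salary(sample_rows)
print(nrow(kept))  # none of the first ten rows holds an actual salary
```

With the data loaded, `plot(salary ~ work_yrs, data = with_salary(MBASalary))` would show only the students with a reported salary.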

1. Create summary statistics (e.g. mean, standard deviation, median, mode) for the important variables in the dataset.
2. Draw box plots / bar plots to visualize the distribution of each variable independently.
3. Draw scatter plots to understand how the variables are correlated pair-wise.
4. Draw a corrgram; create a variance-covariance matrix.

10. Are students satisfied with their program?
Newvar <- MBASalary[which(MBASalary$satis<=7),]
boxplot(Newvar$satis, main = "Distribution of satisfaction level among MBAs", horizontal = TRUE)

On average, most students are satisfied with their MBA program; a few students are very disappointed with it.

library(corrgram)
library(ellipse)
corrgram(MBASalary, main = "corrgram plot of MBA Starting Salary data", lower.panel=panel.pts, upper.panel=panel.pie)

From this diagram, salary is weakly correlated with the spring MBA average. The diagram does not show the correlations between salary and the other variables very clearly.
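A numeric variance-covariance or correlation matrix can back up what the corrgram shows. The sketch below uses `cov()` and `cor()` on the first ten rows copied from the `str()` output, so the numbers are illustrative only; treating the value 998 in `satis` as a missing-value code is an assumption.

```r
# First ten rows of four columns, copied from the str() output above.
sample_rows <- data.frame(
  gmat_tpc = c(87, 87, 95, 75, 98, 91, 87, 92, 89, 96),
  s_avg    = c(3.4, 3.5, 3.3, 3.3, 3.6, 3.9, 3.4, 3.3, 3.3, 3.45),
  f_avg    = c(3, 4, 3.25, 2.67, 3.75, 3.75, 3.5, 3.75, 3.25, 3.67),
  satis    = c(7, 6, 6, 7, 5, 6, 5, 6, 4, 998)
)

# Assumption: 998 is a placeholder code, so treat it as missing.
sample_rows$satis[sample_rows$satis == 998] <- NA

# Variance-covariance and correlation matrices over the complete rows only.
vcov_matrix <- cov(sample_rows, use = "complete.obs")
cor_matrix  <- cor(sample_rows, use = "complete.obs")

print(round(vcov_matrix, 3))
print(round(cor_matrix, 3))
```

With the data loaded, the same two calls can be applied to the numeric columns of `MBASalary` after recoding its placeholder values.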