## Reading the Dean's Dilemma Dataset
2b. First, we import the Dean’s Dilemma CSV file into an R data frame:
setwd("F:/Data Analytics for Managerial Applications")
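# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0), as the summary() output below assumes
mba.df <- read.csv("Data - Deans Dilemma.csv", stringsAsFactors = TRUE)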
View(mba.df)
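The import step can be tried without the original file. The sketch below is only an illustration: it writes a tiny made-up CSV to a temporary folder (the rows and the names toy.csv and toy.df are assumptions, not part of the real dataset) and reads it back, building the path with file.path() rather than relying on setwd().

```r
# Toy data: three made-up rows standing in for the real CSV (assumption)
toy.csv <- file.path(tempdir(), "toy_deans_dilemma.csv")
writeLines(c("SlNo,Gender,Salary",
             "1,M,270000",
             "2,F,0",
             "3,F,240000"), toy.csv)

# stringsAsFactors = TRUE turns text columns such as Gender into factors
toy.df <- read.csv(toy.csv, stringsAsFactors = TRUE)

str(toy.df)    # column types
nrow(toy.df)   # number of rows read
```

Using a full path keeps the script independent of the current working directory.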
2c. To summarize the dataset:
summary(mba.df)
## SlNo Gender Gender.B Percent_SSC Board_SSC
## Min. : 1.0 F:127 Min. :0.0000 Min. :37.00 CBSE :113
## 1st Qu.: 98.5 M:264 1st Qu.:0.0000 1st Qu.:56.00 ICSE : 77
## Median :196.0 Median :0.0000 Median :64.50 Others:201
## Mean :196.0 Mean :0.3248 Mean :64.65
## 3rd Qu.:293.5 3rd Qu.:1.0000 3rd Qu.:74.00
## Max. :391.0 Max. :1.0000 Max. :87.20
##
## Board_CBSE Board_ICSE Percent_HSC Board_HSC
## Min. :0.000 Min. :0.0000 Min. :40.0 CBSE : 96
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:54.0 ISC : 48
## Median :0.000 Median :0.0000 Median :63.0 Others:247
## Mean :0.289 Mean :0.1969 Mean :63.8
## 3rd Qu.:1.000 3rd Qu.:0.0000 3rd Qu.:72.0
## Max. :1.000 Max. :1.0000 Max. :94.7
##
## Stream_HSC Percent_Degree Course_Degree
## Arts : 18 Min. :35.00 Arts : 13
## Commerce:222 1st Qu.:57.52 Commerce :117
## Science :151 Median :63.00 Computer Applications: 32
## Mean :62.98 Engineering : 37
## 3rd Qu.:69.00 Management :163
## Max. :89.00 Others : 5
## Science : 24
## Degree_Engg Experience_Yrs Entrance_Test S.TEST
## Min. :0.00000 Min. :0.0000 MAT :265 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000 None : 67 1st Qu.:1.0000
## Median :0.00000 Median :0.0000 K-MAT : 24 Median :1.0000
## Mean :0.09463 Mean :0.4783 CAT : 22 Mean :0.8286
## 3rd Qu.:0.00000 3rd Qu.:1.0000 PGCET : 8 3rd Qu.:1.0000
## Max. :1.00000 Max. :3.0000 GCET : 2 Max. :1.0000
## (Other): 3
## Percentile_ET S.TEST.SCORE Percent_MBA
## Min. : 0.00 Min. : 0.00 Min. :50.83
## 1st Qu.:41.19 1st Qu.:41.19 1st Qu.:57.20
## Median :62.00 Median :62.00 Median :61.01
## Mean :54.93 Mean :54.93 Mean :61.67
## 3rd Qu.:78.00 3rd Qu.:78.00 3rd Qu.:66.02
## Max. :98.69 Max. :98.69 Max. :77.89
##
## Specialization_MBA Marks_Communication Marks_Projectwork
## Marketing & Finance:222 Min. :50.00 Min. :50.00
## Marketing & HR :156 1st Qu.:53.00 1st Qu.:64.00
## Marketing & IB : 13 Median :58.00 Median :69.00
## Mean :60.54 Mean :68.36
## 3rd Qu.:67.00 3rd Qu.:74.00
## Max. :88.00 Max. :87.00
##
## Marks_BOCA Placement Placement_B Salary
## Min. :50.00 Not Placed: 79 Min. :0.000 Min. : 0
## 1st Qu.:57.00 Placed :312 1st Qu.:1.000 1st Qu.:172800
## Median :63.00 Median :1.000 Median :240000
## Mean :64.38 Mean :0.798 Mean :219078
## 3rd Qu.:72.50 3rd Qu.:1.000 3rd Qu.:300000
## Max. :96.00 Max. :1.000 Max. :940000
##
library(psych)
describe(mba.df)
## vars n mean sd median trimmed
## SlNo 1 391 196.00 113.02 196.00 196.00
## Gender* 2 391 1.68 0.47 2.00 1.72
## Gender.B 3 391 0.32 0.47 0.00 0.28
## Percent_SSC 4 391 64.65 10.96 64.50 64.76
## Board_SSC* 5 391 2.23 0.87 3.00 2.28
## Board_CBSE 6 391 0.29 0.45 0.00 0.24
## Board_ICSE 7 391 0.20 0.40 0.00 0.12
## Percent_HSC 8 391 63.80 11.42 63.00 63.34
## Board_HSC* 9 391 2.39 0.85 3.00 2.48
## Stream_HSC* 10 391 2.34 0.56 2.00 2.36
## Percent_Degree 11 391 62.98 8.92 63.00 62.91
## Course_Degree* 12 391 3.85 1.61 4.00 3.81
## Degree_Engg 13 391 0.09 0.29 0.00 0.00
## Experience_Yrs 14 391 0.48 0.67 0.00 0.36
## Entrance_Test* 15 391 5.85 1.35 6.00 6.08
## S.TEST 16 391 0.83 0.38 1.00 0.91
## Percentile_ET 17 391 54.93 31.17 62.00 56.87
## S.TEST.SCORE 18 391 54.93 31.17 62.00 56.87
## Percent_MBA 19 391 61.67 5.85 61.01 61.45
## Specialization_MBA* 20 391 1.47 0.56 1.00 1.42
## Marks_Communication 21 391 60.54 8.82 58.00 59.68
## Marks_Projectwork 22 391 68.36 7.15 69.00 68.60
## Marks_BOCA 23 391 64.38 9.58 63.00 64.08
## Placement* 24 391 1.80 0.40 2.00 1.87
## Placement_B 25 391 0.80 0.40 1.00 0.87
## Salary 26 391 219078.26 138311.65 240000.00 217011.50
## mad min max range skew kurtosis
## SlNo 145.29 1.00 391.00 390.00 0.00 -1.21
## Gender* 0.00 1.00 2.00 1.00 -0.75 -1.45
## Gender.B 0.00 0.00 1.00 1.00 0.75 -1.45
## Percent_SSC 12.60 37.00 87.20 50.20 -0.06 -0.72
## Board_SSC* 0.00 1.00 3.00 2.00 -0.45 -1.53
## Board_CBSE 0.00 0.00 1.00 1.00 0.93 -1.14
## Board_ICSE 0.00 0.00 1.00 1.00 1.52 0.31
## Percent_HSC 13.34 40.00 94.70 54.70 0.29 -0.67
## Board_HSC* 0.00 1.00 3.00 2.00 -0.83 -1.13
## Stream_HSC* 0.00 1.00 3.00 2.00 -0.12 -0.72
## Percent_Degree 8.90 35.00 89.00 54.00 0.05 0.24
## Course_Degree* 1.48 1.00 7.00 6.00 0.00 -1.08
## Degree_Engg 0.00 0.00 1.00 1.00 2.76 5.63
## Experience_Yrs 0.00 0.00 3.00 3.00 1.27 1.17
## Entrance_Test* 0.00 1.00 9.00 8.00 -2.52 7.04
## S.TEST 0.00 0.00 1.00 1.00 -1.74 1.02
## Percentile_ET 25.20 0.00 98.69 98.69 -0.74 -0.69
## S.TEST.SCORE 25.20 0.00 98.69 98.69 -0.74 -0.69
## Percent_MBA 6.39 50.83 77.89 27.06 0.34 -0.52
## Specialization_MBA* 0.00 1.00 3.00 2.00 0.70 -0.56
## Marks_Communication 8.90 50.00 88.00 38.00 0.74 -0.25
## Marks_Projectwork 7.41 50.00 87.00 37.00 -0.26 -0.27
## Marks_BOCA 11.86 50.00 96.00 46.00 0.29 -0.85
## Placement* 0.00 1.00 2.00 1.00 -1.48 0.19
## Placement_B 0.00 0.00 1.00 1.00 -1.48 0.19
## Salary 88956.00 0.00 940000.00 940000.00 0.24 1.74
## se
## SlNo 5.72
## Gender* 0.02
## Gender.B 0.02
## Percent_SSC 0.55
## Board_SSC* 0.04
## Board_CBSE 0.02
## Board_ICSE 0.02
## Percent_HSC 0.58
## Board_HSC* 0.04
## Stream_HSC* 0.03
## Percent_Degree 0.45
## Course_Degree* 0.08
## Degree_Engg 0.01
## Experience_Yrs 0.03
## Entrance_Test* 0.07
## S.TEST 0.02
## Percentile_ET 1.58
## S.TEST.SCORE 1.58
## Percent_MBA 0.30
## Specialization_MBA* 0.03
## Marks_Communication 0.45
## Marks_Projectwork 0.36
## Marks_BOCA 0.48
## Placement* 0.02
## Placement_B 0.02
## Salary 6994.72
3a. To calculate the median salary of all students in the data sample:
median(mba.df[,"Salary"])
## [1] 240000
Therefore, the median salary is 240000.
3b. To calculate the percentage of students who were placed:
prop.table(table(mba.df$Placement))*100
##
## Not Placed Placed
## 20.2046 79.7954
Therefore, the percentage of students who were placed is approximately 79.8%.
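To see how prop.table(table(...)) produces percentages, here is a small sketch on made-up placement outcomes (the vector below is illustrative, not the real data). It also shows an equivalent shortcut for a single category.

```r
# Toy placement outcomes (made-up values, not the real data)
placement <- factor(c("Placed", "Not Placed", "Placed", "Placed"))

# Share of each category, in percent
pct <- prop.table(table(placement)) * 100
pct

# Equivalent shortcut for one category: the mean of a logical vector
placed.pct <- mean(placement == "Placed") * 100
placed.pct
```

Three of the four toy students are placed, so both approaches report 75 for the "Placed" category.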
3c. To create a subset of only placed students:
placed <- mba.df[which(mba.df$Placement == 'Placed'),]
View(placed)
View() lets us cross-check that the placed data frame contains only the records of students who were placed.
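The same cross-check can also be done in code rather than by eye. The sketch below uses a made-up three-row data frame (toy and toy.placed are hypothetical names) and writes the filter with subset(), which is one of several equivalent ways to express it.

```r
# Toy data frame (made-up rows)
toy <- data.frame(
  Placement = c("Placed", "Not Placed", "Placed"),
  Salary    = c(250000, 0, 300000)
)

# Same kind of filter as above, written with subset()
toy.placed <- subset(toy, Placement == "Placed")

# Programmatic checks instead of visual inspection
nrow(toy.placed)                        # how many rows survived the filter
all(toy.placed$Placement == "Placed")   # TRUE if only placed students remain
```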
3d. To calculate the median salary of only placed students, we check the median salary in the placed dataframe:
median(placed[,"Salary"])
## [1] 260000
Because the records of unplaced students are no longer included, the median salary rises to 260000.
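The effect of excluding unplaced students can be reproduced on a handful of made-up salaries. This sketch assumes, for illustration only, that an unplaced student is recorded with a salary of 0.

```r
# Made-up salaries; 0 stands for an unplaced student (assumption)
salary <- c(0, 0, 200000, 260000, 300000)

median.all    <- median(salary)               # zeros included: 200000
median.placed <- median(salary[salary > 0])   # zeros dropped:  260000

median.all
median.placed
```

Dropping the zeros removes the lowest values, so the middle value of what remains is higher.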
3e. To create a table that displays the mean salaries by gender:
aggregate(placed$Salary,by = list(sex = placed$Gender), mean)
## sex x
## 1 F 253068.0
## 2 M 284241.9
The resulting two-row table lists the mean salary for each gender. We used the placed data frame so that only placed candidates are considered.
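As an alternative sketch, aggregate() also accepts a formula, which keeps the original column names in the result instead of the generic x. The data frame below is made up for illustration.

```r
# Toy placed-students data (made-up values)
toy <- data.frame(
  Gender = factor(c("F", "M", "F", "M")),
  Salary = c(240000, 300000, 260000, 280000)
)

# Formula interface: output columns are named Gender and Salary
by.gender <- aggregate(Salary ~ Gender, data = toy, FUN = mean)
by.gender

# tapply() returns the same means as a named vector
tapply(toy$Salary, toy$Gender, mean)
```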
3f. To create a histogram showing the MBA performance of students who were placed:
hist(placed$Percent_MBA, xlab = "MBA Percentage",ylab = "Count",main = "MBA performance of placed students",breaks = 3,col = "grey")
The histogram is generated as desired.
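Note that hist() treats a single number passed to breaks as a suggestion and rounds the bin edges to "pretty" values, so the number of bars drawn can differ from 3. A small sketch on made-up percentages shows how to inspect the bins R actually chose without drawing a plot.

```r
# Made-up MBA percentages
pct <- c(52, 55, 58, 60, 61, 63, 66, 70, 74, 77)

# plot = FALSE returns the binning without drawing anything
h <- hist(pct, breaks = 3, plot = FALSE)

h$breaks   # bin edges R actually chose
h$counts   # observations per bin
```

To force exact bins, a vector of edges can be supplied instead, for example breaks = seq(50, 80, by = 10).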
3g. To create a subset of only unplaced students:
notplaced <- mba.df[which(mba.df$Placement == 'Not Placed'),]
View(notplaced)
View() lets us cross-check that the notplaced data frame contains only the records of students who were not placed.
3h. To split the screen and display the histograms for MBA performance of placed and unplaced students:
par(mfrow = c(1,2))
hist(placed$Percent_MBA, xlab = "MBA Percentage",ylab = "Count",main = "MBA performance of placed students",breaks = 3,col = "grey")
hist(notplaced$Percent_MBA, xlab = "MBA Percentage",ylab = "Count",main = "MBA performance of not placed students",breaks = 3,col = "grey")
The histograms are generated side-by-side as desired.
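One thing to keep in mind is that par(mfrow = ...) stays in effect for later plots. The sketch below, on made-up values, saves the previous settings and restores them afterwards. It also gives both panels the same x-axis range, which is an optional choice that makes the two groups easier to compare.

```r
# Made-up percentages for two groups
placed.pct    <- c(55, 60, 62, 65, 70, 75)
notplaced.pct <- c(51, 53, 56, 58, 61)

# par() returns the previous settings, so they can be restored afterwards
old.par <- par(mfrow = c(1, 2))

# A shared x-axis range makes the two panels directly comparable
rng <- range(placed.pct, notplaced.pct)
hist(placed.pct,    xlim = rng, main = "Placed",     xlab = "MBA Percentage")
hist(notplaced.pct, xlim = rng, main = "Not placed", xlab = "MBA Percentage")

par(old.par)   # back to the previous layout
```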
3i. To create two boxplots showing the comparison of salaries of males and females:
# str(placed)
boxplot(placed$Salary ~ placed$Gender, horizontal = TRUE, yaxt = "n", ylab = "Gender", xlab = "Salary", las =1, main = "Comparison of salaries of males and females")
axis(side = 2, at=c(1,2), labels = c("Females","Males"))
Because the two boxplots share one salary axis, the median salaries and interquartile ranges (IQRs) of males and females can be compared directly.
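The quantities read off a boxplot can also be computed directly. This sketch uses a made-up data frame. Note that boxplot() draws its boxes from hinges, which can differ slightly from the quartiles used by IQR().

```r
# Toy data (made-up values)
toy <- data.frame(
  Gender = factor(c("F", "F", "F", "M", "M", "M")),
  Salary = c(220000, 250000, 280000, 240000, 290000, 330000)
)

# Median per group: the line inside each box
med <- tapply(toy$Salary, toy$Gender, median)
med

# Interquartile range per group: roughly the length of each box
iqr <- tapply(toy$Salary, toy$Gender, IQR)
iqr
```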
3j. To create a data frame “placedET” that contains the data for all candidates who were placed AND took an MBA entrance test before admission to the MBA program:
placedET <- placed[which(placed$Placement == "Placed" & placed$S.TEST > 0),]
View(placedET)
As we can see, 261 candidates meet both criteria.
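A count like this can be obtained in code rather than read off the viewer. The sketch below combines two conditions with & on a made-up data frame (toy and toy.placedET are hypothetical names) and counts the matching rows in two equivalent ways.

```r
# Toy data (made-up rows); S.TEST = 1 means an entrance test was taken
toy <- data.frame(
  Placement = c("Placed", "Placed", "Not Placed", "Placed"),
  S.TEST    = c(1, 0, 1, 1)
)

# Both conditions must hold for a row to be kept
keep <- toy$Placement == "Placed" & toy$S.TEST > 0
toy.placedET <- toy[keep, ]

nrow(toy.placedET)   # rows in the filtered data frame
sum(keep)            # same count, taken from the logical vector
```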
3k. To create a scatterplot matrix for 3 variables - Salary, Percent_MBA, Percentile_ET - of placedET dataframe:
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplotMatrix(formula = ~ Salary + Percent_MBA + Percentile_ET, main = "Scatter Plot Matrix", cex = 0.8, data = placedET, diagonal = "density", spread = FALSE)
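For a version that needs no extra package, base R's pairs() draws a plain scatterplot matrix, and cor() puts numbers on the relationships the panels show. This is a sketch on made-up values, not a replacement for the car plot above.

```r
# Made-up values for the three variables
toy <- data.frame(
  Salary        = c(200000, 240000, 260000, 300000, 350000),
  Percent_MBA   = c(55, 60, 58, 66, 70),
  Percentile_ET = c(40, 75, 62, 80, 55)
)

# Base-R scatterplot matrix
pairs(toy, main = "Scatter Plot Matrix")

# Pairwise Pearson correlations between the three variables
cor.mat <- cor(toy)
round(cor.mat, 2)
```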