This is an R Markdown report on the case study “A Dean’s Dilemma”. It works through the tasks on the dataset that were set in the Week 1 Day 6 task list.
setwd("~/Muyeena/Internship/Deans dillemma")
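mbadata = read.csv("deans dilemma.csv", stringsAsFactors = TRUE) ## Read text columns as factors (not the default from R 4.0 onwards), matching the output below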
#View(mbadata)
str(mbadata) ## To get a basic idea about the structure of the dataset
## 'data.frame': 391 obs. of 26 variables:
## $ SlNo : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Gender : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 1 2 2 1 ...
## $ Gender.B : int 0 0 0 0 0 0 1 0 0 1 ...
## $ Percent_SSC : num 62 76.3 72 60 61 ...
## $ Board_SSC : Factor w/ 3 levels "CBSE","ICSE",..: 3 2 3 1 1 2 3 2 1 1 ...
## $ Board_CBSE : int 0 0 0 1 1 0 0 0 1 1 ...
## $ Board_ICSE : int 0 1 0 0 0 1 0 1 0 0 ...
## $ Percent_HSC : num 88 75.3 78 63 55 ...
## $ Board_HSC : Factor w/ 3 levels "CBSE","ISC","Others": 3 3 3 1 2 1 3 2 1 1 ...
## $ Stream_HSC : Factor w/ 3 levels "Arts","Commerce",..: 2 3 2 1 3 2 3 2 2 1 ...
## $ Percent_Degree : num 52 75.5 66.6 58 54 ...
## $ Course_Degree : Factor w/ 7 levels "Arts","Commerce",..: 7 3 4 5 4 2 6 5 2 5 ...
## $ Degree_Engg : int 0 0 1 0 1 0 0 0 0 0 ...
## $ Experience_Yrs : int 0 1 0 0 1 0 2 0 0 1 ...
## $ Entrance_Test : Factor w/ 9 levels "CAT","G-MAT",..: 6 6 7 6 6 7 7 6 6 7 ...
## $ S.TEST : int 1 1 0 1 1 0 0 1 1 0 ...
## $ Percentile_ET : num 55 86.5 0 75 66 ...
## $ S.TEST.SCORE : num 55 86.5 0 75 66 ...
## $ Percent_MBA : num 58.8 66.3 52.9 57.8 59.4 ...
## $ Specialization_MBA : Factor w/ 3 levels "Marketing & Finance",..: 2 1 1 1 2 1 2 1 1 2 ...
## $ Marks_Communication: int 50 69 50 54 52 53 63 74 65 50 ...
## $ Marks_Projectwork : int 65 70 61 66 65 70 56 72 76 59 ...
## $ Marks_BOCA : int 74 75 59 62 67 53 50 50 70 77 ...
## $ Placement : Factor w/ 2 levels "Not Placed","Placed": 2 2 2 2 2 2 2 2 2 2 ...
## $ Placement_B : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Salary : int 270000 200000 240000 250000 180000 300000 260000 235000 425000 240000 ...
head(mbadata) ## To display the first six rows of the dataset
## SlNo Gender Gender.B Percent_SSC Board_SSC Board_CBSE Board_ICSE
## 1 1 M 0 62.00 Others 0 0
## 2 2 M 0 76.33 ICSE 0 1
## 3 3 M 0 72.00 Others 0 0
## 4 4 M 0 60.00 CBSE 1 0
## 5 5 M 0 61.00 CBSE 1 0
## 6 6 M 0 55.00 ICSE 0 1
## Percent_HSC Board_HSC Stream_HSC Percent_Degree Course_Degree
## 1 88.00 Others Commerce 52.00 Science
## 2 75.33 Others Science 75.48 Computer Applications
## 3 78.00 Others Commerce 66.63 Engineering
## 4 63.00 CBSE Arts 58.00 Management
## 5 55.00 ISC Science 54.00 Engineering
## 6 64.00 CBSE Commerce 50.00 Commerce
## Degree_Engg Experience_Yrs Entrance_Test S.TEST Percentile_ET
## 1 0 0 MAT 1 55.0
## 2 0 1 MAT 1 86.5
## 3 1 0 None 0 0.0
## 4 0 0 MAT 1 75.0
## 5 1 1 MAT 1 66.0
## 6 0 0 None 0 0.0
## S.TEST.SCORE Percent_MBA Specialization_MBA Marks_Communication
## 1 55.0 58.80 Marketing & HR 50
## 2 86.5 66.28 Marketing & Finance 69
## 3 0.0 52.91 Marketing & Finance 50
## 4 75.0 57.80 Marketing & Finance 54
## 5 66.0 59.43 Marketing & HR 52
## 6 0.0 56.81 Marketing & Finance 53
## Marks_Projectwork Marks_BOCA Placement Placement_B Salary
## 1 65 74 Placed 1 270000
## 2 70 75 Placed 1 200000
## 3 61 59 Placed 1 240000
## 4 66 62 Placed 1 250000
## 5 65 67 Placed 1 180000
## 6 70 53 Placed 1 300000
The summary function gives the summary of each variable in the data set, including the frequency (or count) for categorical data.
The describe function offers a clean tabular format, giving the statistics of all the variables in the dataset. It marks categorical variables by appending an * to the variable name in the row label.
summary(mbadata)
## SlNo Gender Gender.B Percent_SSC Board_SSC
## Min. : 1.0 F:127 Min. :0.0000 Min. :37.00 CBSE :113
## 1st Qu.: 98.5 M:264 1st Qu.:0.0000 1st Qu.:56.00 ICSE : 77
## Median :196.0 Median :0.0000 Median :64.50 Others:201
## Mean :196.0 Mean :0.3248 Mean :64.65
## 3rd Qu.:293.5 3rd Qu.:1.0000 3rd Qu.:74.00
## Max. :391.0 Max. :1.0000 Max. :87.20
##
## Board_CBSE Board_ICSE Percent_HSC Board_HSC
## Min. :0.000 Min. :0.0000 Min. :40.0 CBSE : 96
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:54.0 ISC : 48
## Median :0.000 Median :0.0000 Median :63.0 Others:247
## Mean :0.289 Mean :0.1969 Mean :63.8
## 3rd Qu.:1.000 3rd Qu.:0.0000 3rd Qu.:72.0
## Max. :1.000 Max. :1.0000 Max. :94.7
##
## Stream_HSC Percent_Degree Course_Degree
## Arts : 18 Min. :35.00 Arts : 13
## Commerce:222 1st Qu.:57.52 Commerce :117
## Science :151 Median :63.00 Computer Applications: 32
## Mean :62.98 Engineering : 37
## 3rd Qu.:69.00 Management :163
## Max. :89.00 Others : 5
## Science : 24
## Degree_Engg Experience_Yrs Entrance_Test S.TEST
## Min. :0.00000 Min. :0.0000 MAT :265 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000 None : 67 1st Qu.:1.0000
## Median :0.00000 Median :0.0000 K-MAT : 24 Median :1.0000
## Mean :0.09463 Mean :0.4783 CAT : 22 Mean :0.8286
## 3rd Qu.:0.00000 3rd Qu.:1.0000 PGCET : 8 3rd Qu.:1.0000
## Max. :1.00000 Max. :3.0000 GCET : 2 Max. :1.0000
## (Other): 3
## Percentile_ET S.TEST.SCORE Percent_MBA
## Min. : 0.00 Min. : 0.00 Min. :50.83
## 1st Qu.:41.19 1st Qu.:41.19 1st Qu.:57.20
## Median :62.00 Median :62.00 Median :61.01
## Mean :54.93 Mean :54.93 Mean :61.67
## 3rd Qu.:78.00 3rd Qu.:78.00 3rd Qu.:66.02
## Max. :98.69 Max. :98.69 Max. :77.89
##
## Specialization_MBA Marks_Communication Marks_Projectwork
## Marketing & Finance:222 Min. :50.00 Min. :50.00
## Marketing & HR :156 1st Qu.:53.00 1st Qu.:64.00
## Marketing & IB : 13 Median :58.00 Median :69.00
## Mean :60.54 Mean :68.36
## 3rd Qu.:67.00 3rd Qu.:74.00
## Max. :88.00 Max. :87.00
##
## Marks_BOCA Placement Placement_B Salary
## Min. :50.00 Not Placed: 79 Min. :0.000 Min. : 0
## 1st Qu.:57.00 Placed :312 1st Qu.:1.000 1st Qu.:172800
## Median :63.00 Median :1.000 Median :240000
## Mean :64.38 Mean :0.798 Mean :219078
## 3rd Qu.:72.50 3rd Qu.:1.000 3rd Qu.:300000
## Max. :96.00 Max. :1.000 Max. :940000
##
library(psych) ## The describe function is present under the package psych
describe(mbadata)
## vars n mean sd median trimmed
## SlNo 1 391 196.00 113.02 196.00 196.00
## Gender* 2 391 1.68 0.47 2.00 1.72
## Gender.B 3 391 0.32 0.47 0.00 0.28
## Percent_SSC 4 391 64.65 10.96 64.50 64.76
## Board_SSC* 5 391 2.23 0.87 3.00 2.28
## Board_CBSE 6 391 0.29 0.45 0.00 0.24
## Board_ICSE 7 391 0.20 0.40 0.00 0.12
## Percent_HSC 8 391 63.80 11.42 63.00 63.34
## Board_HSC* 9 391 2.39 0.85 3.00 2.48
## Stream_HSC* 10 391 2.34 0.56 2.00 2.36
## Percent_Degree 11 391 62.98 8.92 63.00 62.91
## Course_Degree* 12 391 3.85 1.61 4.00 3.81
## Degree_Engg 13 391 0.09 0.29 0.00 0.00
## Experience_Yrs 14 391 0.48 0.67 0.00 0.36
## Entrance_Test* 15 391 5.85 1.35 6.00 6.08
## S.TEST 16 391 0.83 0.38 1.00 0.91
## Percentile_ET 17 391 54.93 31.17 62.00 56.87
## S.TEST.SCORE 18 391 54.93 31.17 62.00 56.87
## Percent_MBA 19 391 61.67 5.85 61.01 61.45
## Specialization_MBA* 20 391 1.47 0.56 1.00 1.42
## Marks_Communication 21 391 60.54 8.82 58.00 59.68
## Marks_Projectwork 22 391 68.36 7.15 69.00 68.60
## Marks_BOCA 23 391 64.38 9.58 63.00 64.08
## Placement* 24 391 1.80 0.40 2.00 1.87
## Placement_B 25 391 0.80 0.40 1.00 0.87
## Salary 26 391 219078.26 138311.65 240000.00 217011.50
## mad min max range skew kurtosis
## SlNo 145.29 1.00 391.00 390.00 0.00 -1.21
## Gender* 0.00 1.00 2.00 1.00 -0.75 -1.45
## Gender.B 0.00 0.00 1.00 1.00 0.75 -1.45
## Percent_SSC 12.60 37.00 87.20 50.20 -0.06 -0.72
## Board_SSC* 0.00 1.00 3.00 2.00 -0.45 -1.53
## Board_CBSE 0.00 0.00 1.00 1.00 0.93 -1.14
## Board_ICSE 0.00 0.00 1.00 1.00 1.52 0.31
## Percent_HSC 13.34 40.00 94.70 54.70 0.29 -0.67
## Board_HSC* 0.00 1.00 3.00 2.00 -0.83 -1.13
## Stream_HSC* 0.00 1.00 3.00 2.00 -0.12 -0.72
## Percent_Degree 8.90 35.00 89.00 54.00 0.05 0.24
## Course_Degree* 1.48 1.00 7.00 6.00 0.00 -1.08
## Degree_Engg 0.00 0.00 1.00 1.00 2.76 5.63
## Experience_Yrs 0.00 0.00 3.00 3.00 1.27 1.17
## Entrance_Test* 0.00 1.00 9.00 8.00 -2.52 7.04
## S.TEST 0.00 0.00 1.00 1.00 -1.74 1.02
## Percentile_ET 25.20 0.00 98.69 98.69 -0.74 -0.69
## S.TEST.SCORE 25.20 0.00 98.69 98.69 -0.74 -0.69
## Percent_MBA 6.39 50.83 77.89 27.06 0.34 -0.52
## Specialization_MBA* 0.00 1.00 3.00 2.00 0.70 -0.56
## Marks_Communication 8.90 50.00 88.00 38.00 0.74 -0.25
## Marks_Projectwork 7.41 50.00 87.00 37.00 -0.26 -0.27
## Marks_BOCA 11.86 50.00 96.00 46.00 0.29 -0.85
## Placement* 0.00 1.00 2.00 1.00 -1.48 0.19
## Placement_B 0.00 0.00 1.00 1.00 -1.48 0.19
## Salary 88956.00 0.00 940000.00 940000.00 0.24 1.74
## se
## SlNo 5.72
## Gender* 0.02
## Gender.B 0.02
## Percent_SSC 0.55
## Board_SSC* 0.04
## Board_CBSE 0.02
## Board_ICSE 0.02
## Percent_HSC 0.58
## Board_HSC* 0.04
## Stream_HSC* 0.03
## Percent_Degree 0.45
## Course_Degree* 0.08
## Degree_Engg 0.01
## Experience_Yrs 0.03
## Entrance_Test* 0.07
## S.TEST 0.02
## Percentile_ET 1.58
## S.TEST.SCORE 1.58
## Percent_MBA 0.30
## Specialization_MBA* 0.03
## Marks_Communication 0.45
## Marks_Projectwork 0.36
## Marks_BOCA 0.48
## Placement* 0.02
## Placement_B 0.02
## Salary 6994.72
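The starred rows in the describe output above (for example Gender*) are factors that have been converted to their integer level codes before the statistics were computed, so their mean and median describe the codes rather than a measured quantity. A minimal sketch with a made-up three-value factor (toy data, not taken from the case study) shows the conversion using base R only:

```r
# Toy factor (hypothetical values, not from the case study data)
gender <- factor(c("F", "M", "M"))

# Levels are sorted alphabetically, so "F" becomes 1 and "M" becomes 2
codes <- as.integer(gender)
codes

# The mean of the codes is the kind of value a starred row reports
mean_code <- mean(codes)
mean_code
```

Read this way, the Gender* mean of 1.68 simply reflects that about 68% of the rows carry level code 2 ("M").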
R has an in-built function called median, which gives the median of any variable in the dataset. The same function has been called for this task.
median(mbadata$Salary)
## [1] 240000
The median salary of all the students in the data set is 240000.
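This task requires a frequency table of placement status (the table function) and its conversion to percentages (the prop.table function):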
tplaced = table(mbadata$Placement)
tplaced
##
## Not Placed Placed
## 79 312
p.tplaced = round(prop.table(tplaced)*100, 2)
p.tplaced
##
## Not Placed Placed
## 20.2 79.8
Therefore, 79.8% of the students were placed.
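The table and prop.table steps can be checked on a small made-up vector where the answer is obvious by eye. The vector `status` below is hypothetical and only illustrates the pattern:

```r
# Toy placement vector (hypothetical values)
status <- c("Placed", "Placed", "Not Placed", "Placed")

counts <- table(status)                        # frequency of each category
shares <- round(prop.table(counts) * 100, 2)   # proportions converted to percentages
shares
```

With three of four toy students placed, the "Placed" share comes out at 75, which mirrors how 312 of 391 gives 79.8 in the report.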
This task uses the which function to subset the dataset on a condition: all rows satisfying the condition are copied into a new dataframe. The dim function then reports the dimensions of the new dataframe. As a basic check, we compare its number of rows with the total number of students placed (calculated above).
placed = mbadata[which(mbadata$Placement_B == 1),]
dim(placed)
## [1] 312 26
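One reason to wrap the condition in which, rather than indexing with the logical vector directly, is how missing values are handled. Placement_B has no missing values in this dataset, so both forms would give the same 312 rows here, but the sketch below (on a hypothetical data frame `toy` with one NA) shows where they differ:

```r
# Toy data frame (hypothetical) with one missing placement flag
toy <- data.frame(id = 1:4, placed_flag = c(1, 0, NA, 1))

by_which   <- toy[which(toy$placed_flag == 1), ]  # which() silently drops the NA comparison
by_logical <- toy[toy$placed_flag == 1, ]         # the NA comparison produces an all-NA row

nrow(by_which)
nrow(by_logical)
```

The extra row in `by_logical` is filled with NA, which is why a dim check after subsetting is a useful habit.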
The same median function is used, but now we use the dataframe placed.
median(placed$Salary) ## Notice that we have used the "placed" dataset.
## [1] 260000
The median salary of all the students who were placed is 260000.
For this task we use the aggregate function to compute the mean salary of the placed students for each gender.
aggregate(placed$Salary, by=list(Gender = placed$Gender), mean)
## Gender x
## 1 F 253068.0
## 2 M 284241.9
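As an alternative sketch, aggregate also accepts a formula, which keeps the original column name in the result instead of the generic `x`. The data frame `toy` below is made up for illustration:

```r
# Toy data (hypothetical salaries, not from the case study)
toy <- data.frame(
  Gender = c("F", "F", "M", "M"),
  Salary = c(200000, 300000, 250000, 350000)
)

# Formula interface: response on the left, grouping variable on the right
by_gender <- aggregate(Salary ~ Gender, data = toy, FUN = mean)
by_gender
```

Applied to the report's data, the equivalent call would presumably be `aggregate(Salary ~ Gender, data = placed, FUN = mean)`.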
The given histogram can be reproduced with the hist function, using arguments that match the required format.
hist(placed$Percent_MBA, ## The variable for which the histogram is required
xlab = "MBA Percentage", ## x-axis label
ylab = "Count", ## Y-axis Label
breaks = 3, ## Suggested number of bins (R treats a single number as a hint only)
col = "grey", ## Colour of the histogram
main = "MBA Performance of placed Students") ## Main title of the histogram
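Because a single number passed to breaks is only a suggestion, the bins can be pinned down by passing explicit break points instead. The sketch below uses a made-up vector `scores` and `plot = FALSE` so that the bin counts can be inspected without drawing anything:

```r
# Toy percentages (hypothetical values)
scores <- c(51, 55, 58, 61, 63, 66, 70, 74, 77)

# Explicit break points force exactly three bins: (50,60], (60,70], (70,80]
h <- hist(scores, breaks = seq(50, 80, by = 10), plot = FALSE)

h$breaks   # the bin edges actually used
h$counts   # number of values falling in each bin
```

Note that a value sitting on an edge, such as 70, is counted in the bin to its left, since hist uses right-closed intervals by default.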
This is similar to task 3b. Once again, the dim function is used as a quick manual check that the subset dataframe is correct.
notplaced = mbadata[which(mbadata$Placement_B == 0),]
dim(notplaced)
## [1] 79 26
To draw multi-panel plots we use the par function with the mfrow argument.
par(mfrow=c(1,2), mai=c(1,1,1,1)) ## The first number '1' indicates one row and '2' indicates two columns. "mai" argument is the margin in inches
with(placed, hist(Percent_MBA,
xlab = "MBA Percentage",
ylab = "Count",
breaks = 3,
col = "grey",
main = "MBA Performance of placed Students"))
with(notplaced, hist(Percent_MBA,
xlab = "MBA Percentage",
ylab = "Count",
breaks = 3,
col = "grey",
main = "MBA Performance of not placed Students"))
par(mfrow=c(1,1))
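The final par call above resets the layout but leaves the changed margins (mai) in place. One common pattern, sketched here with made-up data, is to save the value returned by par and hand it back afterwards, which restores every setting that was changed. The `pdf(NULL)` line is only there so the sketch runs without opening a window or writing a file:

```r
pdf(NULL)  # null graphics device for a self-contained run

# par() returns the previous values of the settings it changes
old_par <- par(mfrow = c(1, 2), mai = c(1, 1, 1, 1))

hist(c(55, 60, 65, 70), main = "Left panel (toy data)")
hist(c(52, 58, 61, 75), main = "Right panel (toy data)")

par(old_par)                 # restore both mfrow and mai in one step
restored <- par("mfrow")     # layout after restoring
invisible(dev.off())
```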
We use the boxplot function, passing it the two variables Salary and Gender from the placed dataset.
boxplot(Salary ~ Gender.B, data = placed,
horizontal = TRUE,
yaxt = "n",
ylab = "Gender",
xlab = "Salary",
main = "Comparison of Salaries of Males & Females")
axis(side = 2, at=c(1,2), labels = c("Males","Females"))
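An alternative to suppressing the axis and relabelling it by hand is to plot against a labelled factor, so that the group names come from the data and cannot drift out of step with the 0/1 coding. This is a sketch on made-up data; the column names mirror the dataset but the values are hypothetical:

```r
# Toy data (hypothetical salaries)
toy <- data.frame(
  Gender = c("M", "F", "M", "F", "M", "F"),
  Salary = c(250000, 200000, 350000, 300000, 300000, 260000)
)

# Relabel and order the levels so that males come first, as in the plot above
toy$Gender <- factor(toy$Gender, levels = c("M", "F"),
                     labels = c("Males", "Females"))

# plot = FALSE returns the box statistics and group names without drawing
bp <- boxplot(Salary ~ Gender, data = toy, horizontal = TRUE, plot = FALSE)
bp$names
```

Dropping `plot = FALSE` draws the boxplot with these names on the group axis.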
This dataframe should represent the students who were placed and who also took an entrance test (S.TEST equal to 1):
We use the same which function to create a subset, but with the placed dataframe.
placedET = placed[which(placed$S.TEST == 1),]
table(placed$S.TEST) ## To find the frequency of students in *placed* dataset who gave some test or the other
##
## 0 1
## 51 261
dim(placedET) ## The dimensions of the new dataframe so that we can manually compare it with the above value.
## [1] 261 26
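The same subset could presumably be built in one step from the full dataset by combining both conditions. The sketch below shows the idea with base R's subset function on a hypothetical four-row data frame that reuses the column names Placement_B and S.TEST:

```r
# Toy data (hypothetical flags): 1 = yes, 0 = no
toy <- data.frame(
  Placement_B = c(1, 1, 0, 1),
  S.TEST      = c(1, 0, 1, 1)
)

# Keep only rows that satisfy both conditions
both <- subset(toy, Placement_B == 1 & S.TEST == 1)
nrow(both)

# Cross-check against the two-way frequency table
cross <- table(toy$Placement_B, toy$S.TEST)
cross["1", "1"]
```

The row count and the table cell should agree, which is the same manual check the report performs with table and dim.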
For this task we use the scatterplotMatrix function from the car package.
library(car)
scatterplotMatrix(formula = ~ Salary + Percent_MBA + Percentile_ET,
cex = 0.6,
data = placedET)
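A scatterplot matrix is easier to read alongside the pairwise correlations of the same variables. As a minimal sketch, base R's cor function returns them as a matrix; the data frame `toy` below holds made-up values under the same three column names:

```r
# Toy data (hypothetical values, not from the case study)
toy <- data.frame(
  Salary        = c(200000, 240000, 260000, 300000, 350000),
  Percent_MBA   = c(55, 58, 62, 64, 70),
  Percentile_ET = c(60, 45, 80, 70, 90)
)

# Pearson correlations between every pair of columns
cors <- cor(toy)
round(cors, 2)
```

For the report's data, the corresponding call would be along the lines of `cor(placedET[, c("Salary", "Percent_MBA", "Percentile_ET")])`.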
With this task, we have come to the end of the assignment.