This is an assignment given on Week 1, Day 6 of the Data Analytics Internship under Prof. Sameer Mathur, IIML.
TASK 2a : Download and review the Data - Dean’s Dilemma.csv data file associated with this case.
TASK 2b : Read the data set into RStudio.
setwd("C:/Users/Krushna/Downloads/UDEMY")
dean.df <- read.csv("Data - Deans Dilemma.csv")
View(dean.df)
TASK 2c : Create summary statistics for the important variables in the dataset.
summary(dean.df)
Describing the data
library(psych)
describe(dean.df)
TASK 3a : Use R to calculate the median salary of all the students in the data sample.
# Use a distinct name so the variable does not mask the median() function
median_salary <- median(dean.df$Salary)
median_salary
TASK 3b : Use R to calculate the percentage of students who were placed, correct to 2 decimal places.
placement_counts <- table(dean.df$Placement)
# Percentage in each placement category, rounded to 2 decimal places
round(prop.table(placement_counts) * 100, 2)
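The share of placed students can also be computed directly, since the mean of a logical vector is the proportion of TRUE values. The sketch below uses a small made-up data frame (`toy.df`, hypothetical and not the case data) so that it runs on its own; with the real data, `dean.df$Placement` would take its place.

```r
# Toy data for illustration only; these values are not from the case file.
toy.df <- data.frame(
  Placement = c("Placed", "Placed", "Not Placed", "Placed",
                "Not Placed", "Placed", "Placed")
)

# A logical vector averages to the proportion of TRUE values.
pct_placed <- round(mean(toy.df$Placement == "Placed") * 100, 2)
pct_placed
```

This returns a single number rather than a two-row table, which may be closer to what the task asks for.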
TASK 3c : Use R to create a dataframe called placed, that contains a subset of only those students who were successfully placed.
placed<-dean.df[which(dean.df$Placement=="Placed"),]
View(placed)
TASK 3d : Use R to find the median salary of students who were placed.
mediansal<-median(placed$Salary)
mediansal
TASK 3e : Use R to create a table showing the mean salary of males and females, who were placed.
# The formula interface labels the output columns Gender and Salary
mytable <- aggregate(Salary ~ Gender, data = placed, FUN = mean)
mytable
TASK 3f : Use R to generate the following histogram showing a breakup of the MBA performance of the students who were placed.
hist(placed$Percent_MBA, xlab = "MBA Percentage", ylab = "Count", main = "MBA Performance of Placed Students", breaks=3, col="tan3")
TASK 3g : Create a dataframe called notplaced, that contains a subset of only those students who were NOT placed after their MBA.
notplaced<-dean.df[which(dean.df$Placement=="Not Placed"),]
View(notplaced)
TASK 3h : Draw two histograms side-by-side, visually comparing the MBA performance of Placed and Not Placed students.
old_par <- par(mfrow=c(1,2))  # two panels side by side; keep the previous settings
hist(placed$Percent_MBA, xlab = "MBA Percentage", ylab = "Count", main = "Placed Students", breaks=3, col="violetred3")
hist(notplaced$Percent_MBA, xlab = "MBA Percentage", ylab = "Count", main = "Not Placed Students", breaks=3, col="turquoise3")
par(old_par)  # restore the single-panel layout for later plots
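A single number passed to `breaks` is only a suggestion to `hist()`, so the two panels can end up with different bin edges. A minimal sketch of one way to make the panels directly comparable is to pass the same vector of break points to both calls. The data here (`toy_placed`, `toy_notplaced`) are randomly generated stand-ins, not the case data, and the 5-point bin width is an assumption chosen for illustration.

```r
# Toy MBA percentages for illustration only; not taken from the case data.
set.seed(1)
toy_placed    <- runif(40, min = 55, max = 80)
toy_notplaced <- runif(15, min = 50, max = 75)

# Common bin edges covering both groups, so bars line up across panels.
all_values    <- c(toy_placed, toy_notplaced)
shared_breaks <- seq(floor(min(all_values) / 5) * 5,
                     ceiling(max(all_values) / 5) * 5, by = 5)

old_par <- par(mfrow = c(1, 2))   # two panels in one row
h1 <- hist(toy_placed, breaks = shared_breaks, xlab = "MBA Percentage",
           ylab = "Count", main = "Placed Students", col = "violetred3")
h2 <- hist(toy_notplaced, breaks = shared_breaks, xlab = "MBA Percentage",
           ylab = "Count", main = "Not Placed Students", col = "turquoise3")
par(old_par)                      # restore the single-panel layout
```

With the real data, `placed$Percent_MBA` and `notplaced$Percent_MBA` would replace the toy vectors.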
TASK 3i : Use R to draw two boxplots, one below the other, comparing the distribution of salaries of males and females who were placed.
boxplot(Salary ~ Gender, data = placed, horizontal=TRUE, xlab="Salaries", ylab="Gender", main="Comparison of Salaries of Males and Females")
TASK 3j : Create a dataframe called placedET, representing students who were placed after the MBA and who also gave some MBA entrance test before admission into the MBA program.
placedET<-placed[which(placed$Entrance_Test!="None"),]
View(placedET)
TASK 3k : Draw a Scatter Plot Matrix for 3 variables – {Salary, Percent_MBA, Percentile_ET} using the dataframe placedET.
library(car)
scatterplotMatrix(~ Salary + Percent_MBA + Percentile_ET, data = placedET)