The goal of this project is to understand the best practices to coach to the Managers from Crossover company about strategies to improve productivity.
First the five data frames are created, cleaned and reshaped. Then the week column is added for the data frame of each week
knitr::opts_chunk$set(echo = T)
setwd("D:/personal/data science/prueba Cross/")
# Open and Reshape the Data of the first four sheets
DF.Total<-NULL
for (i in 1:4){
DF<-read.xlsx("Data Set Cross Over.xlsx", sheetIndex = i, header=T, colIndex = c(1:7))
colnames(DF)<-gsub("\\.","",colnames(DF))
## reshape the Data
DF.W<- reshape(DF,
varying = colnames(DF)[3:7],
v.names = "time",
timevar = "subject",
times = colnames(DF)[3:7],
direction = "long",
new.row.names = NULL
)
DF.W$week<-paste0("Week.",i)
DF.Total<-rbind(DF.Total,DF.W)
}
DF.Total<-select(DF.Total, -one_of("id"))
DF.Total<-DF.Total[which(DF.Total$subject!="Total"),]
DF.Total$new.var<-paste0(DF.Total$subject,".",DF.Total$week)
## Now metrics data frame is created and reshaped
metrics<-read.xlsx("Data Set Cross Over.xlsx", sheetName = "Metric Performance", header=T, rowIndex=c(1:5),colIndex = c(1:4))
colnames(metrics)[1]<-"subject"
## reshape metrics
metrics$subject<-gsub("\\s+","",metrics$subject)
metrics2<- reshape(metrics,
varying = c("Week.1","Week.2","Week.3"),
v.names = "metrics",
timevar = "week",
times = c("Week.1","Week.2","Week.3"),
ids = c("Dan", "Charles", "Anna", "Nate"),
direction = "long",
new.row.names = NULL)
## Data is cleaned
metrics2$new.var<-paste0(metrics2$subject,".",metrics2$week)
metrics2<-select(metrics2,-one_of("id"))
First at all, I plot the correlation of categories with metrics. Then, I filter categories with high correlation with metrics and a significant mean of time.
## DF.Total and metrics2 are merged
DF<-merge(DF.Total,metrics2,by.x="new.var", by.y="new.var", all.x=T)
DF<-DF[,c(2:6,9)]
category.table<-melt(DF, id=c(2,3), measure=c(4))
category.DF<-data.frame(cast(category.table, formula = subject.x ~ Category, fun.aggregate=sum))
subject<-category.DF$subject.x
column.names<-colnames(category.DF)
## Function time.percentage is created to transform time spent in each category.
time.percentage<- function(x){
return(round(x*100/sum(x),2))
}
## The data is transformed as a percentage of the Total Time dedicated to every category
category.DF<-t(apply(category.DF[,2:ncol(category.DF)],1, function(x) time.percentage(x)))
category.DF<-data.frame(subject=subject,category.DF)
metrics$metrics<-apply(metrics[,2:4],1,mean)
category.DF<-merge(category.DF,metrics,by.x="subject", by.y="subject")
Now that category is shown as a percentage of time for each manager, the correlation plot helps to find which categories have a higher correlation with the metric of performance of the team.
corrplot.mixed(cor(category.DF[,-c(1,16:18)]), sig.level=0.01)
Although there are a lot of correlations among categories, the first aproach to Q1 is in last row of the correlation plot. The categories which seem to contribute to a higher correlation with metrics of the team are: Chat(100%), Graphic Design(82%), Human Resources(91%), Support(87%) and Virtualization (96%). However the mean time of each category will help to filter the most relevant.
## The mean of the first three weeks is created for each manager
category.manager<-data.frame(cast(category.table, formula = subject.x ~ Category, fun.aggregate=mean))
t(category.manager)
## 1 2 3 4
## subject.x "Anna" "Charles" "Dan" "Nate"
## Chat " 72.2" "118.6" "211.4" "155.4"
## Development "19.3" "20.1" " 0.0" " 5.1"
## Distraction " 0.0" " 0.8" " 0.2" "10.5"
## Email "222.7143" "115.1429" "216.0000" "327.8571"
## Graphic.Design "0" "0" "1" "0"
## Human.Resources "56.48" "58.36" "72.32" "66.04"
## Learning...Training " 0.3333333" "13.6666667" " 0.0000000" " 0.0000000"
## Manual.Time " 0.00" "144.00" " 57.75" " 12.00"
## Meeting "137.70588" " 78.94118" " 84.76471" "105.29412"
## Office "29.02899" "30.36232" "26.20290" "26.50725"
## Other "0.7159533" "0.7237354" "0.8015564" "2.0583658"
## Sales...Marketing " 9.50" " 0.50" " 1.00" "23.25"
## Support "0.0" "0.5" "1.0" "0.0"
## Virtualization " 3.25" "12.25" "30.25" "23.25"
When the category.manager table and correlation plot are checked, the relevant categories correlated to higher metrics and significant amount of time are: Chat, Human.Resources and Virtualization.
It is important to dismiss or decrease time in categories with negative or insignificant influence on metrics.
In the correlation plot is shown categories with negative correlation with metrics which must be avoided whenever is possible. For example, Development (-70%) and meeting (-87%). Finally:
The idea is to help to Anna to understand how Dan does it, because he has the outstanding performance of all the team.
ggplot(data=DF,aes(Category, time, fill=subject.x))+geom_bar(stat="identity")+facet_grid(subject.x~.)+theme(legend.position="none")+ theme(axis.text.x=element_text(size=7))+ggtitle("Time spent by Category")
Chat category of Anna is the lowest of all the managers. The same happens with the Anna´s Virtualization category. Anna should focus more in the Human.Resources category.
Anna is the manager who spent more time in meeting of all the managers. The recommendation is decrease it.
I recommend she starts doubling the effort in Chat category because is highly and positively correlated to metrics and I think that chat category explains most of the success of Dan which spends almost three times more time in Chat category.