Email: go_moreno@hotmail.com

mobile: +57 317-746-8325

LinkedIn Profile: www.linkedin.com/in/gonzalomoreno25

Productivity Coach Analysis

The goal of this project is to understand the best practices to coach to the Managers from Crossover company about strategies to improve productivity.

Extraction , Reshaping and Cleaning of data

First the five data frames are created, cleaned and reshaped. Then the week column is added for the data frame of each week

knitr::opts_chunk$set(echo = T)


setwd("D:/personal/data science/prueba Cross/")

# Open and Reshape the Data of the first four sheets

DF.Total<-NULL

for (i in 1:4){
 
  DF<-read.xlsx("Data Set Cross Over.xlsx", sheetIndex = i, header=T, colIndex = c(1:7))
  colnames(DF)<-gsub("\\.","",colnames(DF))
  ## reshape the Data
  DF.W<- reshape(DF, 
       varying = colnames(DF)[3:7], 
       v.names = "time",
       timevar = "subject", 
       times = colnames(DF)[3:7], 
       direction = "long",
       new.row.names = NULL
       )
 
  DF.W$week<-paste0("Week.",i) 
 
 
  DF.Total<-rbind(DF.Total,DF.W)
  
  
}

DF.Total<-select(DF.Total, -one_of("id"))
DF.Total<-DF.Total[which(DF.Total$subject!="Total"),]
DF.Total$new.var<-paste0(DF.Total$subject,".",DF.Total$week)


## Now metrics data frame is created and reshaped

metrics<-read.xlsx("Data Set Cross Over.xlsx", sheetName = "Metric Performance", header=T, rowIndex=c(1:5),colIndex = c(1:4))
colnames(metrics)[1]<-"subject"


## reshape metrics
metrics$subject<-gsub("\\s+","",metrics$subject)
metrics2<- reshape(metrics, 
  varying = c("Week.1","Week.2","Week.3"), 
  v.names = "metrics",
  timevar = "week", 
  times = c("Week.1","Week.2","Week.3"), 
  ids = c("Dan", "Charles", "Anna", "Nate"),
  direction = "long",
  new.row.names = NULL)


## Data is cleaned
metrics2$new.var<-paste0(metrics2$subject,".",metrics2$week)
metrics2<-select(metrics2,-one_of("id"))

Questions

Q1. What kind of time distribution (% of time in each category) correlates with high metrics for this team?

First at all, I plot the correlation of categories with metrics. Then, I filter categories with high correlation with metrics and a significant mean of time.

## DF.Total and metrics2 are merged
DF<-merge(DF.Total,metrics2,by.x="new.var", by.y="new.var", all.x=T)
DF<-DF[,c(2:6,9)]

category.table<-melt(DF, id=c(2,3), measure=c(4))
category.DF<-data.frame(cast(category.table, formula = subject.x ~ Category, fun.aggregate=sum))


subject<-category.DF$subject.x
column.names<-colnames(category.DF)

## Function time.percentage is created to transform time spent in each category.
time.percentage<- function(x){
return(round(x*100/sum(x),2))
}

## The data is transformed as a percentage of the Total Time dedicated to every category

category.DF<-t(apply(category.DF[,2:ncol(category.DF)],1, function(x) time.percentage(x)))
category.DF<-data.frame(subject=subject,category.DF)


metrics$metrics<-apply(metrics[,2:4],1,mean)
category.DF<-merge(category.DF,metrics,by.x="subject", by.y="subject")

Now that category is shown as a percentage of time for each manager, the correlation plot helps to find which categories have a higher correlation with the metric of performance of the team.

corrplot.mixed(cor(category.DF[,-c(1,16:18)]), sig.level=0.01)

Although there are a lot of correlations among categories, the first aproach to Q1 is in last row of the correlation plot. The categories which seem to contribute to a higher correlation with metrics of the team are: Chat(100%), Graphic Design(82%), Human Resources(91%), Support(87%) and Virtualization (96%). However the mean time of each category will help to filter the most relevant.

## The mean of the first three weeks is created for each manager
category.manager<-data.frame(cast(category.table, formula = subject.x ~ Category, fun.aggregate=mean))
t(category.manager)
##                     1            2            3            4           
## subject.x           "Anna"       "Charles"    "Dan"        "Nate"      
## Chat                " 72.2"      "118.6"      "211.4"      "155.4"     
## Development         "19.3"       "20.1"       " 0.0"       " 5.1"      
## Distraction         " 0.0"       " 0.8"       " 0.2"       "10.5"      
## Email               "222.7143"   "115.1429"   "216.0000"   "327.8571"  
## Graphic.Design      "0"          "0"          "1"          "0"         
## Human.Resources     "56.48"      "58.36"      "72.32"      "66.04"     
## Learning...Training " 0.3333333" "13.6666667" " 0.0000000" " 0.0000000"
## Manual.Time         "  0.00"     "144.00"     " 57.75"     " 12.00"    
## Meeting             "137.70588"  " 78.94118"  " 84.76471"  "105.29412" 
## Office              "29.02899"   "30.36232"   "26.20290"   "26.50725"  
## Other               "0.7159533"  "0.7237354"  "0.8015564"  "2.0583658" 
## Sales...Marketing   " 9.50"      " 0.50"      " 1.00"      "23.25"     
## Support             "0.0"        "0.5"        "1.0"        "0.0"       
## Virtualization      " 3.25"      "12.25"      "30.25"      "23.25"

When the category.manager table and correlation plot are checked, the relevant categories correlated to higher metrics and significant amount of time are: Chat, Human.Resources and Virtualization.

Q2.a) What time distribution is ideal for this team? How do you know that?

It is important to dismiss or decrease time in categories with negative or insignificant influence on metrics.

In the correlation plot is shown categories with negative correlation with metrics which must be avoided whenever is possible. For example, Development (-70%) and meeting (-87%). Finally:

  1. Categories highly significants: Chat, Human Resources and Virtualization.
  2. Categories to delegate or decrease time spent: Distraction, email, learning, manual time, Graphic Design and Support.
  3. Categories to avoid: Development and meeting.

Q2.b) How can the manager coach her lowest performer (Anna) to make her the highest performer next week?

The idea is to help to Anna to understand how Dan does it, because he has the outstanding performance of all the team.

ggplot(data=DF,aes(Category, time, fill=subject.x))+geom_bar(stat="identity")+facet_grid(subject.x~.)+theme(legend.position="none")+ theme(axis.text.x=element_text(size=7))+ggtitle("Time spent by Category")

Chat category of Anna is the lowest of all the managers. The same happens with the Anna´s Virtualization category. Anna should focus more in the Human.Resources category.

Anna is the manager who spent more time in meeting of all the managers. The recommendation is decrease it.

Q2.c)If this manager had to push this team to increase productivity by 50% (i.e increase metric by 50%) from Week 3, what should she do? What coaching will you give her?

I recommend she starts doubling the effort in Chat category because is highly and positively correlated to metrics and I think that chat category explains most of the success of Dan which spends almost three times more time in Chat category.