In this document, we will try to explore data of manpower time record from a Pipeline Project. This data is recorded based on manpower check in and check out time and location. We will explore and analyze the data using base and statistical plotting in R to answer the following business questions :
Business Question 1. How many personnel are involved in the project and what is the daily average personnel? 2. Which company has the largest personnel involved in this project? 3. Who is the best personnel in terms of never check in late to the project?
Import all required library
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
Read the data
## project_name date_record end_of_week personnel_name
## 1 PIPELINE PROJECT 6/28/2024 6/28/2024 ARI AGUSTIAN
## 2 PIPELINE PROJECT 6/28/2024 6/28/2024 OMA LESMANA
## 3 PIPELINE PROJECT 6/28/2024 6/28/2024 YULIAN MAULANA FAJRI
## 4 PIPELINE PROJECT 6/28/2024 6/28/2024 NENG SASA
## 5 PIPELINE PROJECT 6/27/2024 6/28/2024 RAMA HAMZA PUTRA
## 6 PIPELINE PROJECT 6/27/2024 6/28/2024 ADEN SOMANTRI
## 7 PIPELINE PROJECT 6/27/2024 6/28/2024 SAEPUL ROHMAN
## 8 PIPELINE PROJECT 6/27/2024 6/28/2024 ARI AGUSTIAN
## 9 PIPELINE PROJECT 6/27/2024 6/28/2024 ARIF AKBAR MUTAQIN
## 10 PIPELINE PROJECT 6/27/2024 6/28/2024 OMA LESMANA
## day_shift_or_night_shift location_lat location_long gate_location
## 1 DAY SHIFT -6.748511 106.6681 Gate 1
## 2 NIGHT SHIFT -6.750850 106.6657 Gate 2
## 3 DAY SHIFT -6.748511 106.6681 Gate 1
## 4 DAY SHIFT -6.748511 106.6681 Gate 1
## 5 DAY SHIFT -6.748511 106.6681 Gate 1
## 6 NIGHT SHIFT -6.750850 106.6657 Gate 2
## 7 DAY SHIFT -6.748511 106.6681 Gate 1
## 8 DAY SHIFT -6.748511 106.6681 Gate 1
## 9 DAY SHIFT -6.748511 106.6681 Gate 1
## 10 DAY SHIFT -6.748511 106.6681 Gate 1
## clock_in clock_out effective_working_hours total_time
## 1 6/28/2024 8:09:31 AM 6/28/2024 5:45:00 PM 10 10
## 2 6/28/2024 6:23:10 PM 6/29/2024 6:45:00 AM 12 12
## 3 6/28/2024 7:59:55 AM 6/28/2024 5:45:00 PM 10 10
## 4 6/28/2024 8:44:08 AM 6/28/2024 5:45:00 PM 9 9
## 5 6/27/2024 8:04:41 AM 6/27/2024 5:45:00 PM 10 10
## 6 6/27/2024 5:46:45 PM 6/28/2024 6:45:00 AM 12 13
## 7 6/27/2024 8:08:48 AM 6/27/2024 5:45:00 PM 10 10
## 8 6/27/2024 8:18:01 AM 6/27/2024 5:45:00 PM 9 9
## 9 6/27/2024 7:54:11 AM 6/27/2024 5:45:00 PM 10 10
## 10 6/27/2024 7:51:56 AM 6/27/2024 5:45:00 PM 10 10
## direct_or_indirect company group id user_id
## 1 DIRECT WORKERS PT. A SUBCONTRACTOR 40732940 120
## 2 DIRECT WORKERS PT. B SUBCONTRACTOR 40748588 476
## 3 DIRECT WORKERS PT. C SUBCONTRACTOR 40772795 1077
## 4 DIRECT WORKERS PT. B SUBCONTRACTOR 40777347 6
## 5 INDIRECT WORKERS PT. A SUBCONTRACTOR 40714488 1078
## 6 INDIRECT WORKERS PT. B SUBCONTRACTOR 40720293 1081
## 7 DIRECT WORKERS PT. A SUBCONTRACTOR 40726864 28
## 8 DIRECT WORKERS PT. A SUBCONTRACTOR 40733111 120
## 9 DIRECT WORKERS PT. C SUBCONTRACTOR 40747599 1083
## 10 DIRECT WORKERS PT. B SUBCONTRACTOR 40748487 476
Let’s change the data type of the following columns 1. Column date_record, end_of_week to date -> using lubridate mdy() 2. gate_location and company to factor
data$date_record <- mdy(data$date_record)
data$end_of_week <- mdy(data$end_of_week)
data$clock_in <- mdy_hms(data$clock_in)
data$clock_out <- mdy_hms(data$clock_out)
data$gate_location <- as.factor(data$gate_location)
data$company <- as.factor(data$company)
str(data)## 'data.frame': 66685 obs. of 17 variables:
## $ project_name : chr "PIPELINE PROJECT" "PIPELINE PROJECT" "PIPELINE PROJECT" "PIPELINE PROJECT" ...
## $ date_record : Date, format: "2024-06-28" "2024-06-28" ...
## $ end_of_week : Date, format: "2024-06-28" "2024-06-28" ...
## $ personnel_name : chr "ARI AGUSTIAN" "OMA LESMANA" "YULIAN MAULANA FAJRI" "NENG SASA" ...
## $ day_shift_or_night_shift: chr "DAY SHIFT" "NIGHT SHIFT" "DAY SHIFT" "DAY SHIFT" ...
## $ location_lat : num -6.75 -6.75 -6.75 -6.75 -6.75 ...
## $ location_long : num 107 107 107 107 107 ...
## $ gate_location : Factor w/ 2 levels "Gate 1","Gate 2": 1 2 1 1 1 2 1 1 1 1 ...
## $ clock_in : POSIXct, format: "2024-06-28 08:09:31" "2024-06-28 18:23:10" ...
## $ clock_out : POSIXct, format: "2024-06-28 17:45:00" "2024-06-29 06:45:00" ...
## $ effective_working_hours : int 10 12 10 9 10 12 10 9 10 10 ...
## $ total_time : int 10 12 10 9 10 13 10 9 10 10 ...
## $ direct_or_indirect : chr "DIRECT WORKERS" "DIRECT WORKERS" "DIRECT WORKERS" "DIRECT WORKERS" ...
## $ company : Factor w/ 18 levels "PT. A","PT. B",..: 1 2 3 2 1 2 1 1 3 2 ...
## $ group : chr "SUBCONTRACTOR" "SUBCONTRACTOR" "SUBCONTRACTOR" "SUBCONTRACTOR" ...
## $ id : int 40732940 40748588 40772795 40777347 40714488 40720293 40726864 40733111 40747599 40748487 ...
## $ user_id : int 120 476 1077 6 1078 1081 28 120 1083 476 ...
Add column clockin_hour
## 'data.frame': 66685 obs. of 18 variables:
## $ project_name : chr "PIPELINE PROJECT" "PIPELINE PROJECT" "PIPELINE PROJECT" "PIPELINE PROJECT" ...
## $ date_record : Date, format: "2024-06-28" "2024-06-28" ...
## $ end_of_week : Date, format: "2024-06-28" "2024-06-28" ...
## $ personnel_name : chr "ARI AGUSTIAN" "OMA LESMANA" "YULIAN MAULANA FAJRI" "NENG SASA" ...
## $ day_shift_or_night_shift: chr "DAY SHIFT" "NIGHT SHIFT" "DAY SHIFT" "DAY SHIFT" ...
## $ location_lat : num -6.75 -6.75 -6.75 -6.75 -6.75 ...
## $ location_long : num 107 107 107 107 107 ...
## $ gate_location : Factor w/ 2 levels "Gate 1","Gate 2": 1 2 1 1 1 2 1 1 1 1 ...
## $ clock_in : POSIXct, format: "2024-06-28 08:09:31" "2024-06-28 18:23:10" ...
## $ clock_out : POSIXct, format: "2024-06-28 17:45:00" "2024-06-29 06:45:00" ...
## $ effective_working_hours : int 10 12 10 9 10 12 10 9 10 10 ...
## $ total_time : int 10 12 10 9 10 13 10 9 10 10 ...
## $ direct_or_indirect : chr "DIRECT WORKERS" "DIRECT WORKERS" "DIRECT WORKERS" "DIRECT WORKERS" ...
## $ company : Factor w/ 18 levels "PT. A","PT. B",..: 1 2 3 2 1 2 1 1 3 2 ...
## $ group : chr "SUBCONTRACTOR" "SUBCONTRACTOR" "SUBCONTRACTOR" "SUBCONTRACTOR" ...
## $ id : int 40732940 40748588 40772795 40777347 40714488 40720293 40726864 40733111 40747599 40748487 ...
## $ user_id : int 120 476 1077 6 1078 1081 28 120 1083 476 ...
## $ clockin_hour : int 8 18 7 8 8 17 8 8 7 7 ...
make function checkin_category to conver check in hours
to be categorical as below - 5am to 7am –> not late (day shift) - 7am
to 9am –> late (day shift) - 9am to 5pm –> very late (day shift) -
5pm to 7pm –> not late (night shift) - 7pm to 9pm –> late (night
shift) - 9pm to 5am –> very late (night shift)
checkin_category <- function(x) {
if (x <= 7)
{x <- "5am to 7am"}
else if (x > 7 & x <= 9)
{x <- "7am to 9am"}
else if (x > 9 & x <= 17)
{x <- "9am to 5pm"}
else if (x > 17 & x <= 19)
{x <- "5pm to 7pm"}
else if (x > 19 & x <= 21)
{x <- "7pm to 9pm"}
else
{x <- "9pm to 5am"}
}Create column clockin_cat
## project_name date_record end_of_week personnel_name
## 1 PIPELINE PROJECT 2024-06-28 2024-06-28 ARI AGUSTIAN
## 2 PIPELINE PROJECT 2024-06-28 2024-06-28 OMA LESMANA
## 3 PIPELINE PROJECT 2024-06-28 2024-06-28 YULIAN MAULANA FAJRI
## 4 PIPELINE PROJECT 2024-06-28 2024-06-28 NENG SASA
## 5 PIPELINE PROJECT 2024-06-27 2024-06-28 RAMA HAMZA PUTRA
## 6 PIPELINE PROJECT 2024-06-27 2024-06-28 ADEN SOMANTRI
## day_shift_or_night_shift location_lat location_long gate_location
## 1 DAY SHIFT -6.748511 106.6681 Gate 1
## 2 NIGHT SHIFT -6.750850 106.6657 Gate 2
## 3 DAY SHIFT -6.748511 106.6681 Gate 1
## 4 DAY SHIFT -6.748511 106.6681 Gate 1
## 5 DAY SHIFT -6.748511 106.6681 Gate 1
## 6 NIGHT SHIFT -6.750850 106.6657 Gate 2
## clock_in clock_out effective_working_hours total_time
## 1 2024-06-28 08:09:31 2024-06-28 17:45:00 10 10
## 2 2024-06-28 18:23:10 2024-06-29 06:45:00 12 12
## 3 2024-06-28 07:59:55 2024-06-28 17:45:00 10 10
## 4 2024-06-28 08:44:08 2024-06-28 17:45:00 9 9
## 5 2024-06-27 08:04:41 2024-06-27 17:45:00 10 10
## 6 2024-06-27 17:46:45 2024-06-28 06:45:00 12 13
## direct_or_indirect company group id user_id clockin_hour
## 1 DIRECT WORKERS PT. A SUBCONTRACTOR 40732940 120 8
## 2 DIRECT WORKERS PT. B SUBCONTRACTOR 40748588 476 18
## 3 DIRECT WORKERS PT. C SUBCONTRACTOR 40772795 1077 7
## 4 DIRECT WORKERS PT. B SUBCONTRACTOR 40777347 6 8
## 5 INDIRECT WORKERS PT. A SUBCONTRACTOR 40714488 1078 8
## 6 INDIRECT WORKERS PT. B SUBCONTRACTOR 40720293 1081 17
## clockin_cat
## 1 7am to 9am
## 2 5pm to 7pm
## 3 5am to 7am
## 4 7am to 9am
## 5 7am to 9am
## 6 9am to 5pm
To answer how many personnel are involved in the project, we use length() and unique()
## [1] 846
In total there are 846 personnel involved in the project so far.
To answer what is the weekly number of personnel, we use aggregate with FUN = length
avg_daily_pers <- aggregate(user_id ~ end_of_week,
data = data,
FUN = length)
#using subsetting to filter only the year 2024
avg_daily_pers <- avg_daily_pers[year(avg_daily_pers$end_of_week) == 2024, ]
avg_daily_pers## end_of_week user_id
## 113 2024-01-05 32
## 114 2024-01-12 59
## 115 2024-01-19 49
## 116 2024-01-26 39
## 117 2024-02-02 39
## 118 2024-02-09 35
## 119 2024-02-16 23
## 120 2024-02-23 36
## 121 2024-03-01 31
## 122 2024-03-08 22
## 123 2024-03-15 17
## 124 2024-03-22 30
## 125 2024-03-29 24
## 126 2024-04-05 46
## 127 2024-04-12 25
## 128 2024-04-19 54
## 129 2024-04-26 44
## 130 2024-05-03 37
## 131 2024-05-10 45
## 132 2024-05-17 40
## 133 2024-05-24 43
## 134 2024-05-31 46
## 135 2024-06-07 40
## 136 2024-06-14 44
## 137 2024-06-21 25
## 138 2024-06-28 40
Then we will create a barplot to see the distribution of weekly manpower to see when is the date of peak manpower loading.
barplot(avg_daily_pers$user_id, names.arg = avg_daily_pers$end_of_week,
main = "Daily Manpower Count",
xlab = "Date",
ylab = "Unique Manpower Count",
col = "lightblue",
las = 2)To answer this, first we will create histogram of personnel checkin time to see the most crowded time of clock in is between 6am to 7am where there is also people checking in after 9am
There are 864 personnel work in that project since the beginning until now, while the peak manpower loading in this year is on 12 January 2024. Analysing the checkin time, looks like most of the personnel works day shift and the most crowded checkin time is between 5am to 6am, but there are also still some people checking it late (after 9am for day shift).
Using base R and plot function can also help to make data analysis however to make it more interactive, it is better to user ggplot and plotly, also using dplyr will make it easier.