0.1 Overview

In this document, we will try to explore data of manpower time record from a Pipeline Project. This data is recorded based on manpower check in and check out time and location. We will explore and analyze the data using base and statistical plotting in R to answer the following business questions :

Business Question 1. How many personnel are involved in the project and what is the daily average personnel? 2. Which company has the largest personnel involved in this project? 3. Who is the best personnel in terms of never check in late to the project?

0.2 Data Preparation

Import all required library

library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(dplyr, warn.conflicts =  FALSE)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(glue)
library(scales)
library(tidygeocoder)
library(leaflet)

Read the data

data <- read.csv("data_input/site_personnel_daily_absensi_new.csv")
head(data, 10)
##        project_name date_record end_of_week       personnel_name
## 1  PIPELINE PROJECT   6/28/2024   6/28/2024         ARI AGUSTIAN
## 2  PIPELINE PROJECT   6/28/2024   6/28/2024          OMA LESMANA
## 3  PIPELINE PROJECT   6/28/2024   6/28/2024 YULIAN MAULANA FAJRI
## 4  PIPELINE PROJECT   6/28/2024   6/28/2024            NENG SASA
## 5  PIPELINE PROJECT   6/27/2024   6/28/2024     RAMA HAMZA PUTRA
## 6  PIPELINE PROJECT   6/27/2024   6/28/2024        ADEN SOMANTRI
## 7  PIPELINE PROJECT   6/27/2024   6/28/2024        SAEPUL ROHMAN
## 8  PIPELINE PROJECT   6/27/2024   6/28/2024         ARI AGUSTIAN
## 9  PIPELINE PROJECT   6/27/2024   6/28/2024   ARIF AKBAR MUTAQIN
## 10 PIPELINE PROJECT   6/27/2024   6/28/2024          OMA LESMANA
##    day_shift_or_night_shift location_lat location_long gate_location
## 1                 DAY SHIFT    -6.748511      106.6681        Gate 1
## 2               NIGHT SHIFT    -6.750850      106.6657        Gate 2
## 3                 DAY SHIFT    -6.748511      106.6681        Gate 1
## 4                 DAY SHIFT    -6.748511      106.6681        Gate 1
## 5                 DAY SHIFT    -6.748511      106.6681        Gate 1
## 6               NIGHT SHIFT    -6.750850      106.6657        Gate 2
## 7                 DAY SHIFT    -6.748511      106.6681        Gate 1
## 8                 DAY SHIFT    -6.748511      106.6681        Gate 1
## 9                 DAY SHIFT    -6.748511      106.6681        Gate 1
## 10                DAY SHIFT    -6.748511      106.6681        Gate 1
##                clock_in            clock_out effective_working_hours total_time
## 1  6/28/2024 8:09:31 AM 6/28/2024 5:45:00 PM                      10         10
## 2  6/28/2024 6:23:10 PM 6/29/2024 6:45:00 AM                      12         12
## 3  6/28/2024 7:59:55 AM 6/28/2024 5:45:00 PM                      10         10
## 4  6/28/2024 8:44:08 AM 6/28/2024 5:45:00 PM                       9          9
## 5  6/27/2024 8:04:41 AM 6/27/2024 5:45:00 PM                      10         10
## 6  6/27/2024 5:46:45 PM 6/28/2024 6:45:00 AM                      12         13
## 7  6/27/2024 8:08:48 AM 6/27/2024 5:45:00 PM                      10         10
## 8  6/27/2024 8:18:01 AM 6/27/2024 5:45:00 PM                       9          9
## 9  6/27/2024 7:54:11 AM 6/27/2024 5:45:00 PM                      10         10
## 10 6/27/2024 7:51:56 AM 6/27/2024 5:45:00 PM                      10         10
##    direct_or_indirect company         group       id user_id
## 1      DIRECT WORKERS   PT. A SUBCONTRACTOR 40732940     120
## 2      DIRECT WORKERS   PT. B SUBCONTRACTOR 40748588     476
## 3      DIRECT WORKERS   PT. C SUBCONTRACTOR 40772795    1077
## 4      DIRECT WORKERS   PT. B SUBCONTRACTOR 40777347       6
## 5    INDIRECT WORKERS   PT. A SUBCONTRACTOR 40714488    1078
## 6    INDIRECT WORKERS   PT. B SUBCONTRACTOR 40720293    1081
## 7      DIRECT WORKERS   PT. A SUBCONTRACTOR 40726864      28
## 8      DIRECT WORKERS   PT. A SUBCONTRACTOR 40733111     120
## 9      DIRECT WORKERS   PT. C SUBCONTRACTOR 40747599    1083
## 10     DIRECT WORKERS   PT. B SUBCONTRACTOR 40748487     476

0.3 Data Wrangling

0.3.1 Change data type

Let’s change the data type of the following columns 1. Column date_record, end_of_week to date -> using lubridate mdy() 2. gate_location and company to factor

data$date_record <- mdy(data$date_record)
data$end_of_week <- mdy(data$end_of_week)
data$clock_in <-  mdy_hms(data$clock_in)
data$clock_out <-  mdy_hms(data$clock_out)
data$gate_location <- as.factor(data$gate_location)
data$company <- as.factor(data$company)

str(data)
## 'data.frame':    66685 obs. of  17 variables:
##  $ project_name            : chr  "PIPELINE PROJECT" "PIPELINE PROJECT" "PIPELINE PROJECT" "PIPELINE PROJECT" ...
##  $ date_record             : Date, format: "2024-06-28" "2024-06-28" ...
##  $ end_of_week             : Date, format: "2024-06-28" "2024-06-28" ...
##  $ personnel_name          : chr  "ARI AGUSTIAN" "OMA LESMANA" "YULIAN MAULANA FAJRI" "NENG SASA" ...
##  $ day_shift_or_night_shift: chr  "DAY SHIFT" "NIGHT SHIFT" "DAY SHIFT" "DAY SHIFT" ...
##  $ location_lat            : num  -6.75 -6.75 -6.75 -6.75 -6.75 ...
##  $ location_long           : num  107 107 107 107 107 ...
##  $ gate_location           : Factor w/ 2 levels "Gate 1","Gate 2": 1 2 1 1 1 2 1 1 1 1 ...
##  $ clock_in                : POSIXct, format: "2024-06-28 08:09:31" "2024-06-28 18:23:10" ...
##  $ clock_out               : POSIXct, format: "2024-06-28 17:45:00" "2024-06-29 06:45:00" ...
##  $ effective_working_hours : int  10 12 10 9 10 12 10 9 10 10 ...
##  $ total_time              : int  10 12 10 9 10 13 10 9 10 10 ...
##  $ direct_or_indirect      : chr  "DIRECT WORKERS" "DIRECT WORKERS" "DIRECT WORKERS" "DIRECT WORKERS" ...
##  $ company                 : Factor w/ 18 levels "PT. A","PT. B",..: 1 2 3 2 1 2 1 1 3 2 ...
##  $ group                   : chr  "SUBCONTRACTOR" "SUBCONTRACTOR" "SUBCONTRACTOR" "SUBCONTRACTOR" ...
##  $ id                      : int  40732940 40748588 40772795 40777347 40714488 40720293 40726864 40733111 40747599 40748487 ...
##  $ user_id                 : int  120 476 1077 6 1078 1081 28 120 1083 476 ...

0.3.2 Feature Engineering

Add column clockin_hour

data$clockin_hour <- hour(data$clock_in)
str(data)
## 'data.frame':    66685 obs. of  18 variables:
##  $ project_name            : chr  "PIPELINE PROJECT" "PIPELINE PROJECT" "PIPELINE PROJECT" "PIPELINE PROJECT" ...
##  $ date_record             : Date, format: "2024-06-28" "2024-06-28" ...
##  $ end_of_week             : Date, format: "2024-06-28" "2024-06-28" ...
##  $ personnel_name          : chr  "ARI AGUSTIAN" "OMA LESMANA" "YULIAN MAULANA FAJRI" "NENG SASA" ...
##  $ day_shift_or_night_shift: chr  "DAY SHIFT" "NIGHT SHIFT" "DAY SHIFT" "DAY SHIFT" ...
##  $ location_lat            : num  -6.75 -6.75 -6.75 -6.75 -6.75 ...
##  $ location_long           : num  107 107 107 107 107 ...
##  $ gate_location           : Factor w/ 2 levels "Gate 1","Gate 2": 1 2 1 1 1 2 1 1 1 1 ...
##  $ clock_in                : POSIXct, format: "2024-06-28 08:09:31" "2024-06-28 18:23:10" ...
##  $ clock_out               : POSIXct, format: "2024-06-28 17:45:00" "2024-06-29 06:45:00" ...
##  $ effective_working_hours : int  10 12 10 9 10 12 10 9 10 10 ...
##  $ total_time              : int  10 12 10 9 10 13 10 9 10 10 ...
##  $ direct_or_indirect      : chr  "DIRECT WORKERS" "DIRECT WORKERS" "DIRECT WORKERS" "DIRECT WORKERS" ...
##  $ company                 : Factor w/ 18 levels "PT. A","PT. B",..: 1 2 3 2 1 2 1 1 3 2 ...
##  $ group                   : chr  "SUBCONTRACTOR" "SUBCONTRACTOR" "SUBCONTRACTOR" "SUBCONTRACTOR" ...
##  $ id                      : int  40732940 40748588 40772795 40777347 40714488 40720293 40726864 40733111 40747599 40748487 ...
##  $ user_id                 : int  120 476 1077 6 1078 1081 28 120 1083 476 ...
##  $ clockin_hour            : int  8 18 7 8 8 17 8 8 7 7 ...

make function checkin_category to conver check in hours to be categorical as below - 5am to 7am –> not late (day shift) - 7am to 9am –> late (day shift) - 9am to 5pm –> very late (day shift) - 5pm to 7pm –> not late (night shift) - 7pm to 9pm –> late (night shift) - 9pm to 5am –> very late (night shift)

checkin_category <- function(x) {
  if (x <= 7) 
    {x <- "5am to 7am"}
  else if (x > 7 & x <= 9)
  {x <- "7am to 9am"}
  else if (x > 9 & x <= 17)
  {x <- "9am to 5pm"}
  else if (x > 17 & x <= 19)
  {x <- "5pm to 7pm"}
  else if (x > 19 & x <= 21)
    {x <- "7pm to 9pm"}
  else 
    {x <- "9pm to 5am"}
}

Create column clockin_cat

data$clockin_cat <- sapply(X = data$clockin_hour, 
                            FUN = checkin_category)
head(data)
##       project_name date_record end_of_week       personnel_name
## 1 PIPELINE PROJECT  2024-06-28  2024-06-28         ARI AGUSTIAN
## 2 PIPELINE PROJECT  2024-06-28  2024-06-28          OMA LESMANA
## 3 PIPELINE PROJECT  2024-06-28  2024-06-28 YULIAN MAULANA FAJRI
## 4 PIPELINE PROJECT  2024-06-28  2024-06-28            NENG SASA
## 5 PIPELINE PROJECT  2024-06-27  2024-06-28     RAMA HAMZA PUTRA
## 6 PIPELINE PROJECT  2024-06-27  2024-06-28        ADEN SOMANTRI
##   day_shift_or_night_shift location_lat location_long gate_location
## 1                DAY SHIFT    -6.748511      106.6681        Gate 1
## 2              NIGHT SHIFT    -6.750850      106.6657        Gate 2
## 3                DAY SHIFT    -6.748511      106.6681        Gate 1
## 4                DAY SHIFT    -6.748511      106.6681        Gate 1
## 5                DAY SHIFT    -6.748511      106.6681        Gate 1
## 6              NIGHT SHIFT    -6.750850      106.6657        Gate 2
##              clock_in           clock_out effective_working_hours total_time
## 1 2024-06-28 08:09:31 2024-06-28 17:45:00                      10         10
## 2 2024-06-28 18:23:10 2024-06-29 06:45:00                      12         12
## 3 2024-06-28 07:59:55 2024-06-28 17:45:00                      10         10
## 4 2024-06-28 08:44:08 2024-06-28 17:45:00                       9          9
## 5 2024-06-27 08:04:41 2024-06-27 17:45:00                      10         10
## 6 2024-06-27 17:46:45 2024-06-28 06:45:00                      12         13
##   direct_or_indirect company         group       id user_id clockin_hour
## 1     DIRECT WORKERS   PT. A SUBCONTRACTOR 40732940     120            8
## 2     DIRECT WORKERS   PT. B SUBCONTRACTOR 40748588     476           18
## 3     DIRECT WORKERS   PT. C SUBCONTRACTOR 40772795    1077            7
## 4     DIRECT WORKERS   PT. B SUBCONTRACTOR 40777347       6            8
## 5   INDIRECT WORKERS   PT. A SUBCONTRACTOR 40714488    1078            8
## 6   INDIRECT WORKERS   PT. B SUBCONTRACTOR 40720293    1081           17
##   clockin_cat
## 1  7am to 9am
## 2  5pm to 7pm
## 3  5am to 7am
## 4  7am to 9am
## 5  7am to 9am
## 6  9am to 5pm

0.4 Data Visualization

0.4.1 Visualization #1 : How many personnel are involved in the project so far and what is the weekly personnel in the year 2024?

To answer how many personnel are involved in the project, we use length() and unique()

total_personnel_involved <- length(unique(data$user_id))
total_personnel_involved
## [1] 846

In total there are 846 personnel involved in the project so far.

To answer what is the weekly number of personnel, we use aggregate with FUN = length

avg_daily_pers <- aggregate(user_id ~ end_of_week,
                            data = data,
                            FUN = length)
#using subsetting to filter only the year 2024

avg_daily_pers <- avg_daily_pers[year(avg_daily_pers$end_of_week) == 2024, ]
avg_daily_pers
##     end_of_week user_id
## 113  2024-01-05      32
## 114  2024-01-12      59
## 115  2024-01-19      49
## 116  2024-01-26      39
## 117  2024-02-02      39
## 118  2024-02-09      35
## 119  2024-02-16      23
## 120  2024-02-23      36
## 121  2024-03-01      31
## 122  2024-03-08      22
## 123  2024-03-15      17
## 124  2024-03-22      30
## 125  2024-03-29      24
## 126  2024-04-05      46
## 127  2024-04-12      25
## 128  2024-04-19      54
## 129  2024-04-26      44
## 130  2024-05-03      37
## 131  2024-05-10      45
## 132  2024-05-17      40
## 133  2024-05-24      43
## 134  2024-05-31      46
## 135  2024-06-07      40
## 136  2024-06-14      44
## 137  2024-06-21      25
## 138  2024-06-28      40

Then we will create a barplot to see the distribution of weekly manpower to see when is the date of peak manpower loading.

barplot(avg_daily_pers$user_id, names.arg = avg_daily_pers$end_of_week, 
        main = "Daily Manpower Count",
        xlab = "Date",
        ylab = "Unique Manpower Count",
        col = "lightblue",
        las = 2)

0.4.2 Visualization #2 : when is the most crowded time of clock in and is there any personnel that checkin late?

To answer this, first we will create histogram of personnel checkin time to see the most crowded time of clock in is between 6am to 7am where there is also people checking in after 9am

hist(data$clockin_hour, breaks = 24, xlim = c(0, 24))
axis(1, at = seq(0, 24, by = 1))

0.5 Conclusion

There are 864 personnel work in that project since the beginning until now, while the peak manpower loading in this year is on 12 January 2024. Analysing the checkin time, looks like most of the personnel works day shift and the most crowded checkin time is between 5am to 6am, but there are also still some people checking it late (after 9am for day shift).

Using base R and plot function can also help to make data analysis however to make it more interactive, it is better to user ggplot and plotly, also using dplyr will make it easier.