This report analyzes employee data from the dataset employee_raw_dataset.csv.
The analysis is carried out in the R programming language and written in
R Markdown.
The analysis covers the following stages: data cleaning, transformation, exploratory analysis, and insight generation.
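The dataset contains 13 variables covering employee ID, gender, department, education level, job level, age, years of experience, training hours, monthly salary, performance score, attendance rate, completed projects, and promotion in the last three years.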
# Core libraries
library(tidyverse)
library(janitor)
library(skimr)
library(scales)
library(corrplot)
library(knitr)
library(kableExtra)
# Import dataset
employee <- read.csv("employee_raw_dataset - employee_raw_dataset.csv")
# Inspect the data structure
str(employee)
## 'data.frame': 510 obs. of 13 variables:
## $ Employee_ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Gender : chr "female" "Female" "MALE" "Female" ...
## $ Department : chr "Operations" "Hr" "Risk" "IT" ...
## $ Education_Level : chr "Master" "Bachelor" "Bachelor" "Bachelor" ...
## $ Job_Level : chr "Junior" "Officer" "Junior" "Manager" ...
## $ Age : int 15 70 32 27 16 41 29 27 35 38 ...
## $ Years_Experience : int 7 12 18 2 14 9 8 7 1 12 ...
## $ Training_Hours : num 25.6 10.2 33.2 13.8 104.1 ...
## $ Monthly_Salary : int 10702825 15735781 -5000000 21446446 10685352 4095998 6268244 12047713 16111528 9389963 ...
## $ Performance_Score: num 99.5 76.6 86 120 73.2 83.1 79.2 87.1 71.4 58.8 ...
## $ Attendance_Rate : num 98 85.5 91.6 85.7 130 ...
## $ Project_Completed: int 6 2 5 3 2 5 5 3 4 2 ...
## $ Promotion_Last_3Y: int 1 0 0 0 1 1 0 1 0 0 ...
## Employee_ID Gender Department Education_Level Job_Level Age
## 1 1 female Operations Master Junior 15
## 2 2 Female Hr Bachelor Officer 70
## 3 3 MALE Risk Bachelor Junior 32
## 4 4 Female IT Bachelor Manager 27
## 5 5 Female Finance Bachelor Junior 16
## 6 6 Female Operations PhD Senior Manager 41
## 7 7 MALE Hr Bachelor Officer 29
## 8 8 MALE IT Bachelor Manager 27
## 9 9 MALE Hr Master Supervisor 35
## 10 10 Female Compliance Bachelor Supervisor 38
## Years_Experience Training_Hours Monthly_Salary Performance_Score
## 1 7 25.6 10702825 99.5
## 2 12 10.2 15735781 76.6
## 3 18 33.2 -5000000 86.0
## 4 2 13.8 21446446 120.0
## 5 14 104.1 10685352 73.2
## 6 9 16.0 4095998 83.1
## 7 8 16.7 6268244 79.2
## 8 7 22.3 12047713 87.1
## 9 1 4.8 16111528 71.4
## 10 12 32.8 9389963 58.8
## Attendance_Rate Project_Completed Promotion_Last_3Y
## 1 98.0 6 1
## 2 85.5 2 0
## 3 91.6 5 0
## 4 85.7 3 0
## 5 130.0 2 1
## 6 90.9 5 1
## 7 92.8 5 0
## 8 95.4 3 1
## 9 106.8 4 0
## 10 94.3 2 0
## [1] 510 13
## [1] "Employee_ID" "Gender" "Department"
## [4] "Education_Level" "Job_Level" "Age"
## [7] "Years_Experience" "Training_Hours" "Monthly_Salary"
## [10] "Performance_Score" "Attendance_Rate" "Project_Completed"
## [13] "Promotion_Last_3Y"
|  |  |
|---|---|
| Name | employee |
| Number of rows | 510 |
| Number of columns | 13 |
| Column type frequency: | |
| character | 4 |
| numeric | 9 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Gender | 0 | 1 | 1 | 7 | 0 | 8 | 0 |
| Department | 0 | 1 | 0 | 10 | 15 | 14 | 0 |
| Education_Level | 0 | 1 | 1 | 15 | 0 | 8 | 0 |
| Job_Level | 0 | 1 | 6 | 14 | 0 | 5 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Employee_ID | 0 | 1.00 | 245.65 | 147.12 | 1.00e+00 | 118.25 | 245.5 | 372.75 | 500.0 | ▇▇▇▇▇ |
| Age | 0 | 1.00 | 33.73 | 8.87 | 1.10e+01 | 28.00 | 33.0 | 40.00 | 70.0 | ▂▇▇▁▁ |
| Years_Experience | 0 | 1.00 | 8.05 | 4.94 | -6.00e+00 | 5.00 | 8.0 | 11.00 | 22.0 | ▁▅▇▅▁ |
| Training_Hours | 35 | 0.93 | 20.91 | 15.83 | 2.00e-01 | 9.70 | 17.1 | 27.90 | 104.1 | ▇▃▁▁▁ |
| Monthly_Salary | 25 | 0.95 | 12492230.33 | 5094031.94 | -5.00e+06 | 9554922.00 | 12061459.0 | 15254192.00 | 44648274.0 | ▁▇▃▁▁ |
| Performance_Score | 0 | 1.00 | 77.74 | 11.97 | 3.96e+01 | 70.90 | 77.3 | 84.50 | 120.0 | ▁▃▇▂▁ |
| Attendance_Rate | 0 | 1.00 | 92.94 | 5.61 | 7.78e+01 | 89.53 | 92.9 | 96.00 | 130.0 | ▂▇▁▁▁ |
| Project_Completed | 0 | 1.00 | 5.07 | 2.30 | 0.00e+00 | 4.00 | 5.0 | 6.00 | 13.0 | ▂▇▅▂▁ |
| Promotion_Last_3Y | 0 | 1.00 | 0.24 | 0.43 | 0.00e+00 | 0.00 | 0.0 | 0.00 | 1.0 | ▇▁▁▁▂ |
In this stage, the data is cleaned to ensure its quality before analysis.
## Employee_ID Gender Department Education_Level
## 0 0 0 0
## Job_Level Age Years_Experience Training_Hours
## 0 0 0 35
## Monthly_Salary Performance_Score Attendance_Rate Project_Completed
## 25 0 0 0
## Promotion_Last_3Y
## 0
## [1] "employee_id" "gender" "department"
## [4] "education_level" "job_level" "age"
## [7] "years_experience" "training_hours" "monthly_salary"
## [10] "performance_score" "attendance_rate" "project_completed"
## [13] "promotion_last_3y"
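The missing-value counts and snake_case column names printed above are consistent with a check along the following lines. This is only a minimal sketch on a small made-up data frame (`toy` is hypothetical and is not the report's data); `colSums(is.na(...))` is base R, and `clean_names()` comes from the janitor package that the report loads.

```r
library(janitor)

# Hypothetical toy data that mimics the raw column naming; not the real dataset
toy <- data.frame(
  Employee_ID = 1:4,
  Training_Hours = c(25.6, NA, 33.2, 13.8),
  Monthly_Salary = c(10702825, 15735781, NA, NA),
  Promotion_Last_3Y = c(1, 0, 0, 0)
)

# Count missing values per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Convert the column names to snake_case
toy_clean <- clean_names(toy)
print(names(toy_clean))
```

Running the missing-value count before renaming, as here, keeps the original column names in the printed output, which matches the order of the two outputs shown in the report.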
The dataset contains inconsistent gender spellings, such as "female", "Female", and "MALE", so the values are standardized.
## [1] "Female" "Male" "unknown" "-"
## [1] "Operations" "Hr" "Risk" "It" "Finance"
## [6] "Compliance" "Marketing" "Xdept" "" "Unknown"
## [11] "999" "???"
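One way to obtain values like those printed above is to normalize case and whitespace and then map known variants to a canonical label. The sketch below is an assumption about the approach, not the report's actual code: the toy values, the single-letter variants "F" and "m", and the helper column `gender_key` are invented for illustration.

```r
library(dplyr)
library(stringr)

# Toy values; some spellings (e.g. "F", "m", " finance") are invented for illustration
toy <- tibble(
  gender = c("female", "Female", "MALE", "F", "m", "unknown", "-"),
  department = c("Operations", "Hr", "Risk", "IT", " finance", "999", "")
)

toy_std <- toy %>%
  mutate(
    # Lower-cased, trimmed key used only for matching
    gender_key = str_to_lower(str_trim(gender)),
    gender = case_when(
      gender_key %in% c("female", "f") ~ "Female",
      gender_key %in% c("male", "m") ~ "Male",
      TRUE ~ gender # leave unrecognized values untouched
    ),
    # Title-case the department, which turns "IT" into "It"
    department = str_to_title(str_trim(department))
  ) %>%
  select(-gender_key)

print(unique(toy_std$gender))
print(unique(toy_std$department))
```

Note that this kind of standardization only harmonizes spelling. Placeholder labels such as "unknown", "-", "999", or the empty string survive it, as they do in the output above, and would need a separate rule (for example, recoding them to `NA`).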
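Several variables contain unrealistic values, such as ages as low as 11, a negative monthly salary (-5,000,000), a performance score of 120, and an attendance rate of 130. The first summary below shows the data before these rows are removed; the second shows the 423 rows that remain afterwards.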
## employee_id gender department education_level
## Min. : 1.0 Length:510 Length:510 Length:510
## 1st Qu.:118.2 Class :character Class :character Class :character
## Median :245.5 Mode :character Mode :character Mode :character
## Mean :245.6
## 3rd Qu.:372.8
## Max. :500.0
##
## job_level age years_experience training_hours
## Length:510 Min. :11.00 Min. :-6.000 Min. : 0.20
## Class :character 1st Qu.:28.00 1st Qu.: 5.000 1st Qu.: 9.70
## Mode :character Median :33.00 Median : 8.000 Median : 17.10
## Mean :33.73 Mean : 8.049 Mean : 20.91
## 3rd Qu.:40.00 3rd Qu.:11.000 3rd Qu.: 27.90
## Max. :70.00 Max. :22.000 Max. :104.10
## NA's :35
## monthly_salary performance_score attendance_rate project_completed
## Min. :-5000000 Min. : 39.60 Min. : 77.80 Min. : 0.000
## 1st Qu.: 9554922 1st Qu.: 70.90 1st Qu.: 89.53 1st Qu.: 4.000
## Median :12061459 Median : 77.30 Median : 92.90 Median : 5.000
## Mean :12492230 Mean : 77.74 Mean : 92.94 Mean : 5.073
## 3rd Qu.:15254192 3rd Qu.: 84.50 3rd Qu.: 96.00 3rd Qu.: 6.000
## Max. :44648274 Max. :120.00 Max. :130.00 Max. :13.000
## NA's :25
## promotion_last_3y
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.2392
## 3rd Qu.:0.0000
## Max. :1.0000
##
## [1] 423 13
## employee_id gender department education_level
## Min. : 6.0 Length:423 Length:423 Length:423
## 1st Qu.:134.5 Class :character Class :character Class :character
## Median :256.0 Mode :character Mode :character Mode :character
## Mean :253.6
## 3rd Qu.:371.5
## Max. :500.0
##
## job_level age years_experience training_hours
## Length:423 Min. :18.00 Min. :-6.000 Min. : 0.200
## Class :character 1st Qu.:29.00 1st Qu.: 5.000 1st Qu.: 9.425
## Mode :character Median :33.00 Median : 8.000 Median :16.850
## Mean :34.07 Mean : 8.026 Mean :20.235
## 3rd Qu.:40.00 3rd Qu.:11.000 3rd Qu.:27.325
## Max. :58.00 Max. :22.000 Max. :98.200
## NA's :33
## monthly_salary performance_score attendance_rate project_completed
## Min. : 1403602 Min. :39.60 Min. : 77.80 Min. : 0.000
## 1st Qu.: 9562876 1st Qu.:70.55 1st Qu.: 89.50 1st Qu.: 4.000
## Median :12057961 Median :77.20 Median : 92.70 Median : 5.000
## Mean :12593885 Mean :76.82 Mean : 92.29 Mean : 5.099
## 3rd Qu.:15215972 3rd Qu.:83.80 3rd Qu.: 95.70 3rd Qu.: 6.000
## Max. :44648274 Max. :99.20 Max. :100.00 Max. :12.000
##
## promotion_last_3y
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.2222
## 3rd Qu.:0.0000
## Max. :1.0000
##
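The drop from 510 to 423 rows is consistent with a row filter on plausible value ranges. The following is a minimal sketch under stated assumptions: the toy rows are illustrative, and the bounds (ages 18 to 60, scores and rates between 0 and 100) are hypothetical choices that merely agree with the cleaned summary above; the report's actual thresholds are not shown.

```r
library(dplyr)

# Toy rows loosely modeled on the printed data; not the real dataset
toy <- tibble(
  employee_id = 1:6,
  age = c(15, 70, 32, 27, 41, 29),
  monthly_salary = c(10702825, 15735781, -5000000, 21446446, 4095998, NA),
  performance_score = c(99.5, 76.6, 86, 120, 83.1, 79.2),
  attendance_rate = c(98, 85.5, 91.6, 85.7, 90.9, 92.8)
)

# Hypothetical validity bounds; the upper age limit in particular is an assumption
toy_valid <- toy %>%
  filter(
    between(age, 18, 60),
    monthly_salary > 0, # also drops rows where salary is NA
    between(performance_score, 0, 100),
    between(attendance_rate, 0, 100)
  )

print(toy_valid$employee_id)
```

Because `filter()` drops rows where a condition evaluates to `NA`, a rule on `monthly_salary` also removes rows with a missing salary, which would explain why the cleaned summary lists no `NA's` for that column while `training_hours` still has 33. The cleaned summary also still shows a minimum of -6 for `years_experience`, so an additional rule such as `years_experience >= 0` may be worth considering.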
This stage creates new features that support the analysis.
## [1] "Senior" "Adult" "Adult" "Adult" "Adult" "Adult"
## [1] "employee_id" "gender" "department"
## [4] "education_level" "job_level" "age"
## [7] "years_experience" "training_hours" "monthly_salary"
## [10] "performance_score" "attendance_rate" "project_completed"
## [13] "promotion_last_3y" "age_group" "salary_category"
## [16] "performance_status"
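The three new columns (`age_group`, `salary_category`, `performance_status`) can be derived with conditional recoding. The sketch below is illustrative only: the cut points and every label other than "Adult" and "Senior" are assumptions, chosen so that an age of 41 falls in "Senior" and ages 27 to 38 fall in "Adult", as in the printed output.

```r
library(dplyr)

# Toy rows; the last row is invented to exercise the remaining categories
toy <- tibble(
  age = c(41, 29, 27, 38, 22),
  monthly_salary = c(4095998, 6268244, 12047713, 9389963, 21446446),
  performance_score = c(83.1, 79.2, 87.1, 58.8, 91.0)
)

toy_feat <- toy %>%
  mutate(
    # Hypothetical age bands
    age_group = case_when(
      age < 25 ~ "Young",
      age < 40 ~ "Adult",
      TRUE ~ "Senior"
    ),
    # Hypothetical salary bands, in the same currency unit as monthly_salary
    salary_category = case_when(
      monthly_salary < 8e6 ~ "Low",
      monthly_salary < 15e6 ~ "Medium",
      TRUE ~ "High"
    ),
    # Hypothetical performance threshold
    performance_status = if_else(performance_score >= 80, "High", "Needs improvement")
  )

print(head(toy_feat$age_group))
print(names(toy_feat))
```

`case_when()` evaluates its conditions in order, so each band only needs an upper bound and the final `TRUE` branch catches everything else.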
## gender n
## 1 - 2
## 2 Female 255
## 3 Male 164
## 4 unknown 2
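Insight: After cleaning, female employees (255) outnumber male employees (164), and four records still carry unrecognized gender values ("-" and "unknown").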
## # A tibble: 5 × 2
## job_level avg_salary
## <fct> <dbl>
## 1 Manager 14284178.
## 2 Supervisor 12901059.
## 3 Junior 12711203.
## 4 Officer 11709883.
## 5 Senior Manager 10865866.
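A table of this shape, one row per job level sorted by descending average salary, can be produced with a grouped summary. This is a sketch on made-up numbers rather than the report's code; `salary_by_level` is a hypothetical name.

```r
library(dplyr)

# Toy data; values are invented for illustration
toy <- tibble(
  job_level = factor(c("Manager", "Manager", "Junior", "Junior", "Officer")),
  monthly_salary = c(14e6, 16e6, 12e6, NA, 11e6)
)

salary_by_level <- toy %>%
  group_by(job_level) %>%
  # na.rm = TRUE ignores missing salaries instead of returning NA for the group
  summarise(avg_salary = mean(monthly_salary, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(avg_salary))

print(salary_by_level)
```

The same pattern, with `performance_score` in place of `monthly_salary` and `department` as the grouping column, would give an average-performance table by department.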
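Insight: Managers have the highest average monthly salary (about 14.3 million), while Senior Managers have the lowest (about 10.9 million), so average salary does not rise consistently with job level in this dataset.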
## # A tibble: 12 × 2
## department avg_performance
## <fct> <dbl>
## 1 "999" 83.1
## 2 "It" 79.2
## 3 "Risk" 78.3
## 4 "???" 78.2
## 5 "Finance" 78.1
## 6 "Compliance" 75.9
## 7 "Marketing" 75.9
## 8 "Unknown" 75.8
## 9 "Hr" 75.4
## 10 "Operations" 74.8
## 11 "" 74.0
## 12 "Xdept" 70.5
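Insight: Among the recognizable departments, It has the highest average performance score (79.2) and Operations the lowest (74.8). Invalid labels such as "999", "???", "Xdept", "Unknown", and the empty string are still present and should be cleaned before departments are compared.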
The earlier error occurred because the column promotion_last_3_y
was not found in the dataset. The analysis is therefore made more
flexible by checking the column names first.
## [1] "employee_id" "gender" "department"
## [4] "education_level" "job_level" "age"
## [7] "years_experience" "training_hours" "monthly_salary"
## [10] "performance_score" "attendance_rate" "project_completed"
## [13] "promotion_last_3y" "age_group" "salary_category"
## [16] "performance_status"
## Kolom promotion tidak ditemukan pada dataset.
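The printed column names include `promotion_last_3y`, yet the message still reports that no promotion column was found, which suggests the check is looking for a different spelling. One possible way to make the lookup tolerant of naming differences is to match on a pattern instead of an exact name. The sketch below assumes a single column whose name starts with "promotion"; `toy`, `promo_col`, and `promo_rate` are hypothetical names.

```r
# Toy data using the cleaned column name shown in the report
toy <- data.frame(
  employee_id = 1:5,
  promotion_last_3y = c(1, 0, 0, 1, 0)
)

# Find any column whose name starts with "promotion"
promo_col <- grep("^promotion", names(toy), value = TRUE)

if (length(promo_col) == 1) {
  # Share of employees promoted (the column is coded 0/1)
  promo_rate <- mean(toy[[promo_col]], na.rm = TRUE)
  message("Promotion rate from column '", promo_col, "': ", promo_rate)
} else {
  promo_rate <- NA_real_
  message("No unique promotion column found in the dataset.")
}
```

Requiring exactly one match avoids silently picking the wrong column if several names share the prefix.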
Insight:
Insight:
Based on the analysis carried out, several key results were obtained:
The employee data analysis in R was completed through several key stages: data cleaning, transformation, exploratory analysis, and insight generation.
The results of the analysis show that:
This report can serve as a basis for HR decision-making and for developing employee management strategies.
Some recommendations based on the results of the analysis:
The cleaned dataset is saved under the name:
employee_cleaned.csv
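Saving the cleaned data can be done with base R's `write.csv()`. The sketch below is an illustration under assumptions: `toy_clean` stands in for the cleaned data frame, and the file is written to a temporary directory so the example does not overwrite anything; in the report itself the path would simply be "employee_cleaned.csv".

```r
# Hypothetical stand-in for the cleaned data frame
toy_clean <- data.frame(
  employee_id = c(6, 7),
  gender = c("Female", "Male")
)

# Write to a temporary location for this example
out_path <- file.path(tempdir(), "employee_cleaned.csv")

# row.names = FALSE prevents an extra index column in the file
write.csv(toy_clean, out_path, row.names = FALSE)

# Read the file back to confirm that it round-trips
check <- read.csv(out_path)
print(check)
```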