1. Introduction

This report analyzes employee data from the dataset employee_raw_dataset.csv. The analysis is carried out in R and written up in R Markdown format.

The analysis proceeds in the following stages:

  1. Data Cleaning & Preparation
  2. Data Transformation & Feature Engineering
  3. Exploratory Data Analysis (EDA)
  4. Analysis and Insights
  5. Conclusions from the Analysis

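The dataset contains information on employee identity and demographics (Employee_ID, Gender, Age), position (Department, Job_Level), education and experience (Education_Level, Years_Experience, Training_Hours), compensation (Monthly_Salary), and performance (Performance_Score, Attendance_Rate, Project_Completed, Promotion_Last_3Y).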

2. Importing Libraries and the Dataset

# Core libraries
library(tidyverse)
library(janitor)
library(skimr)
library(scales)
library(corrplot)
library(knitr)
library(kableExtra)
# Import dataset
employee <- read.csv("employee_raw_dataset - employee_raw_dataset.csv")

# Inspect the data structure
str(employee)
## 'data.frame':    510 obs. of  13 variables:
##  $ Employee_ID      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Gender           : chr  "female" "Female" "MALE" "Female" ...
##  $ Department       : chr  "Operations" "Hr" "Risk" "IT" ...
##  $ Education_Level  : chr  "Master" "Bachelor" "Bachelor" "Bachelor" ...
##  $ Job_Level        : chr  "Junior" "Officer" "Junior" "Manager" ...
##  $ Age              : int  15 70 32 27 16 41 29 27 35 38 ...
##  $ Years_Experience : int  7 12 18 2 14 9 8 7 1 12 ...
##  $ Training_Hours   : num  25.6 10.2 33.2 13.8 104.1 ...
##  $ Monthly_Salary   : int  10702825 15735781 -5000000 21446446 10685352 4095998 6268244 12047713 16111528 9389963 ...
##  $ Performance_Score: num  99.5 76.6 86 120 73.2 83.1 79.2 87.1 71.4 58.8 ...
##  $ Attendance_Rate  : num  98 85.5 91.6 85.7 130 ...
##  $ Project_Completed: int  6 2 5 3 2 5 5 3 4 2 ...
##  $ Promotion_Last_3Y: int  1 0 0 0 1 1 0 1 0 0 ...
##    Employee_ID Gender Department Education_Level      Job_Level Age
## 1            1 female Operations          Master         Junior  15
## 2            2 Female         Hr        Bachelor        Officer  70
## 3            3   MALE       Risk        Bachelor         Junior  32
## 4            4 Female         IT        Bachelor        Manager  27
## 5            5 Female    Finance        Bachelor         Junior  16
## 6            6 Female Operations             PhD Senior Manager  41
## 7            7   MALE         Hr        Bachelor        Officer  29
## 8            8   MALE         IT        Bachelor        Manager  27
## 9            9   MALE         Hr          Master     Supervisor  35
## 10          10 Female Compliance        Bachelor     Supervisor  38
##    Years_Experience Training_Hours Monthly_Salary Performance_Score
## 1                 7           25.6       10702825              99.5
## 2                12           10.2       15735781              76.6
## 3                18           33.2       -5000000              86.0
## 4                 2           13.8       21446446             120.0
## 5                14          104.1       10685352              73.2
## 6                 9           16.0        4095998              83.1
## 7                 8           16.7        6268244              79.2
## 8                 7           22.3       12047713              87.1
## 9                 1            4.8       16111528              71.4
## 10               12           32.8        9389963              58.8
##    Attendance_Rate Project_Completed Promotion_Last_3Y
## 1             98.0                 6                 1
## 2             85.5                 2                 0
## 3             91.6                 5                 0
## 4             85.7                 3                 0
## 5            130.0                 2                 1
## 6             90.9                 5                 1
## 7             92.8                 5                 0
## 8             95.4                 3                 1
## 9            106.8                 4                 0
## 10            94.3                 2                 0

3. Data Understanding

3.1 Dataset Dimensions

## [1] 510  13

3.2 Variable Names

##  [1] "Employee_ID"       "Gender"            "Department"       
##  [4] "Education_Level"   "Job_Level"         "Age"              
##  [7] "Years_Experience"  "Training_Hours"    "Monthly_Salary"   
## [10] "Performance_Score" "Attendance_Rate"   "Project_Completed"
## [13] "Promotion_Last_3Y"

3.3 Descriptive Statistics

Data summary
Name                      employee
Number of rows            510
Number of columns         13
_______________________
Column type frequency:
  character               4
  numeric                 9
________________________
Group variables           None

Variable type: character

skim_variable    n_missing  complete_rate  min  max  empty  n_unique  whitespace
Gender                   0              1    1    7      0         8           0
Department               0              1    0   10     15        14           0
Education_Level          0              1    1   15      0         8           0
Job_Level                0              1    6   14      0         5           0

Variable type: numeric

skim_variable      n_missing  complete_rate         mean          sd         p0         p25         p50          p75        p100  hist
Employee_ID                0           1.00       245.65      147.12   1.00e+00      118.25       245.5       372.75       500.0  ▇▇▇▇▇
Age                        0           1.00        33.73        8.87   1.10e+01       28.00        33.0        40.00        70.0  ▂▇▇▁▁
Years_Experience           0           1.00         8.05        4.94  -6.00e+00        5.00         8.0        11.00        22.0  ▁▅▇▅▁
Training_Hours            35           0.93        20.91       15.83   2.00e-01        9.70        17.1        27.90       104.1  ▇▃▁▁▁
Monthly_Salary            25           0.95  12492230.33  5094031.94  -5.00e+06  9554922.00  12061459.0  15254192.00  44648274.0  ▁▇▃▁▁
Performance_Score          0           1.00        77.74       11.97   3.96e+01       70.90        77.3        84.50       120.0  ▁▃▇▂▁
Attendance_Rate            0           1.00        92.94        5.61   7.78e+01       89.53        92.9        96.00       130.0  ▂▇▁▁▁
Project_Completed          0           1.00         5.07        2.30   0.00e+00        4.00         5.0         6.00        13.0  ▂▇▅▂▁
Promotion_Last_3Y          0           1.00         0.24        0.43   0.00e+00        0.00         0.0         0.00         1.0  ▇▁▁▁▂

4. Data Cleaning & Preparation

In this stage the data is cleaned to ensure its quality before analysis.

4.1 Checking Missing Values

##       Employee_ID            Gender        Department   Education_Level 
##                 0                 0                 0                 0 
##         Job_Level               Age  Years_Experience    Training_Hours 
##                 0                 0                 0                35 
##    Monthly_Salary Performance_Score   Attendance_Rate Project_Completed 
##                25                 0                 0                 0 
## Promotion_Last_3Y 
##                 0
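The report does not show the code behind these counts. A minimal base R sketch of a per-column missing-value check follows; the data frame `toy` and its values are illustrative stand-ins for `employee`, not the real data.

```r
# Toy stand-in for `employee` (illustrative values only)
toy <- data.frame(
  training_hours = c(25.6, NA, 33.2, NA),
  monthly_salary = c(10702825, 15735781, NA, 21446446),
  age            = c(32, 27, 41, 29)
)

# Number of missing values in each column
na_count <- colSums(is.na(toy))

# Share of missing values in each column, in percent
na_share <- round(100 * colMeans(is.na(toy)), 1)

# Combine both views into one small table
data.frame(n_missing = na_count, pct_missing = na_share)
```

Reporting the percentage alongside the count makes it easier to judge whether dropping or imputing the affected rows is the more reasonable choice.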

4.2 Cleaning Column Names

##  [1] "employee_id"       "gender"            "department"       
##  [4] "education_level"   "job_level"         "age"              
##  [7] "years_experience"  "training_hours"    "monthly_salary"   
## [10] "performance_score" "attendance_rate"   "project_completed"
## [13] "promotion_last_3y"
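The report loads janitor, whose `clean_names()` is a common way to obtain names like these, but the code actually used is not shown. Because the original names already use underscores, lower-casing alone reproduces the output above, which the following base R sketch illustrates (a swapped-in technique, not necessarily the author's).

```r
# Original column names as printed in section 3.2
raw_names <- c(
  "Employee_ID", "Gender", "Department", "Education_Level", "Job_Level",
  "Age", "Years_Experience", "Training_Hours", "Monthly_Salary",
  "Performance_Score", "Attendance_Rate", "Project_Completed",
  "Promotion_Last_3Y"
)

# Lower-case every name; underscores are already in place
clean <- tolower(raw_names)

# The promotion column ends in "3y", with no underscore before the "y"
clean[13]
```

The exact spelling of the last name matters later: code that looks for `promotion_last_3_y` will not find it.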

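4.3 Standardizing Gender Values

The dataset spells gender inconsistently, for example:

  • female
  • Female
  • MALE
  • male

These values are therefore standardized. As the output below shows, the placeholder values "unknown" and "-" are still present after this step.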

## [1] "Female"  "Male"    "unknown" "-"
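One way to standardize the spellings and also turn the remaining placeholders into missing values is sketched below. The helper `standardize_gender()` is hypothetical, and the single-letter codes "f" and "m" are an assumption about what the raw data might contain.

```r
# Hypothetical helper: map known spellings to two labels, everything else to NA
standardize_gender <- function(x) {
  key <- tolower(trimws(x))                    # ignore case and stray spaces
  out <- rep(NA_character_, length(key))       # default: treat as missing
  out[key %in% c("female", "f")] <- "Female"   # "f" is an assumed variant
  out[key %in% c("male", "m")]   <- "Male"     # "m" is an assumed variant
  out
}

gender_raw <- c("female", "Female", "MALE", "male", "unknown", "-")
gender_std <- standardize_gender(gender_raw)

# Count each label, including the values that became NA
table(gender_std, useNA = "ifany")
```

Mapping from an explicit list of accepted spellings, instead of only changing the case, keeps unexpected entries from surviving as separate categories.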

4.4 Standardizing Department

##  [1] "Operations" "Hr"         "Risk"       "It"         "Finance"   
##  [6] "Compliance" "Marketing"  "Xdept"      ""           "Unknown"   
## [11] "999"        "???"
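The output above still contains title-cased acronyms ("Hr", "It") and entries that do not look like department names ("Xdept", "", "Unknown", "999", "???"). A possible follow-up step is sketched here; the helper `standardize_department()` and the list of valid departments are assumptions based only on the values printed above.

```r
# Hypothetical helper: keep only departments from an assumed valid list
standardize_department <- function(x) {
  key <- toupper(trimws(x))
  # Assumed valid departments; names are the upper-cased lookup keys
  valid <- c(
    OPERATIONS = "Operations", HR = "HR", RISK = "Risk", IT = "IT",
    FINANCE = "Finance", COMPLIANCE = "Compliance", MARKETING = "Marketing"
  )
  # Keys that are not in the list (including "") come back as NA
  unname(valid[key])
}

dept_raw <- c("Hr", "It", " operations ", "999", "???", "", "Xdept", "Unknown")
dept_std <- standardize_department(dept_raw)
dept_std
```

Whether "Xdept" is a real department is not clear from the report; if it is, it would need to be added to the lookup.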

4.5 Handling Implausible Values

Several variables contain unrealistic values, such as:

  • Age below 18 or implausibly high
  • Negative Monthly Salary
  • Attendance Rate above 100
  • Performance Score above 100
##   employee_id       gender           department        education_level   
##  Min.   :  1.0   Length:510         Length:510         Length:510        
##  1st Qu.:118.2   Class :character   Class :character   Class :character  
##  Median :245.5   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :245.6                                                           
##  3rd Qu.:372.8                                                           
##  Max.   :500.0                                                           
##                                                                          
##   job_level              age        years_experience training_hours  
##  Length:510         Min.   :11.00   Min.   :-6.000   Min.   :  0.20  
##  Class :character   1st Qu.:28.00   1st Qu.: 5.000   1st Qu.:  9.70  
##  Mode  :character   Median :33.00   Median : 8.000   Median : 17.10  
##                     Mean   :33.73   Mean   : 8.049   Mean   : 20.91  
##                     3rd Qu.:40.00   3rd Qu.:11.000   3rd Qu.: 27.90  
##                     Max.   :70.00   Max.   :22.000   Max.   :104.10  
##                                                      NA's   :35      
##  monthly_salary     performance_score attendance_rate  project_completed
##  Min.   :-5000000   Min.   : 39.60    Min.   : 77.80   Min.   : 0.000   
##  1st Qu.: 9554922   1st Qu.: 70.90    1st Qu.: 89.53   1st Qu.: 4.000   
##  Median :12061459   Median : 77.30    Median : 92.90   Median : 5.000   
##  Mean   :12492230   Mean   : 77.74    Mean   : 92.94   Mean   : 5.073   
##  3rd Qu.:15254192   3rd Qu.: 84.50    3rd Qu.: 96.00   3rd Qu.: 6.000   
##  Max.   :44648274   Max.   :120.00    Max.   :130.00   Max.   :13.000   
##  NA's   :25                                                             
##  promotion_last_3y
##  Min.   :0.0000   
##  1st Qu.:0.0000   
##  Median :0.0000   
##  Mean   :0.2392   
##  3rd Qu.:0.0000   
##  Max.   :1.0000   
## 

4.6 Cleaning Anomalous Data
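The filtering code is not shown in the report. The sketch below applies the four rules from section 4.5 to the first seven rows printed in section 2. The upper age limit `max_age` is a hypothetical choice, since the report only says "too high", and dropping rows with a missing salary is an assumption.

```r
# First seven rows from the head() output in section 2 (selected columns)
toy <- data.frame(
  employee_id       = 1:7,
  age               = c(15, 70, 32, 27, 16, 41, 29),
  monthly_salary    = c(10702825, 15735781, -5000000, 21446446,
                        10685352, 4095998, 6268244),
  performance_score = c(99.5, 76.6, 86.0, 120.0, 73.2, 83.1, 79.2),
  attendance_rate   = c(98.0, 85.5, 91.6, 85.7, 130.0, 90.9, 92.8)
)

max_age <- 65  # hypothetical upper bound, not stated in the report

# TRUE for rows that pass every plausibility rule
keep <- with(toy,
  age >= 18 & age <= max_age &
  !is.na(monthly_salary) & monthly_salary > 0 &
  performance_score <= 100 &
  attendance_rate <= 100
)

cleaned <- toy[keep, ]
cleaned$employee_id
```

The summary in section 4.7 still shows a minimum of -6 for years_experience, so a rule such as `years_experience >= 0` would be a natural addition to this filter.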

4.7 Dataset After Cleaning

## [1] 423  13
##   employee_id       gender           department        education_level   
##  Min.   :  6.0   Length:423         Length:423         Length:423        
##  1st Qu.:134.5   Class :character   Class :character   Class :character  
##  Median :256.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :253.6                                                           
##  3rd Qu.:371.5                                                           
##  Max.   :500.0                                                           
##                                                                          
##   job_level              age        years_experience training_hours  
##  Length:423         Min.   :18.00   Min.   :-6.000   Min.   : 0.200  
##  Class :character   1st Qu.:29.00   1st Qu.: 5.000   1st Qu.: 9.425  
##  Mode  :character   Median :33.00   Median : 8.000   Median :16.850  
##                     Mean   :34.07   Mean   : 8.026   Mean   :20.235  
##                     3rd Qu.:40.00   3rd Qu.:11.000   3rd Qu.:27.325  
##                     Max.   :58.00   Max.   :22.000   Max.   :98.200  
##                                                      NA's   :33      
##  monthly_salary     performance_score attendance_rate  project_completed
##  Min.   : 1403602   Min.   :39.60     Min.   : 77.80   Min.   : 0.000   
##  1st Qu.: 9562876   1st Qu.:70.55     1st Qu.: 89.50   1st Qu.: 4.000   
##  Median :12057961   Median :77.20     Median : 92.70   Median : 5.000   
##  Mean   :12593885   Mean   :76.82     Mean   : 92.29   Mean   : 5.099   
##  3rd Qu.:15215972   3rd Qu.:83.80     3rd Qu.: 95.70   3rd Qu.: 6.000   
##  Max.   :44648274   Max.   :99.20     Max.   :100.00   Max.   :12.000   
##                                                                         
##  promotion_last_3y
##  Min.   :0.0000   
##  1st Qu.:0.0000   
##  Median :0.0000   
##  Mean   :0.2222   
##  3rd Qu.:0.0000   
##  Max.   :1.0000   
## 

5. Data Transformation & Feature Engineering

This stage creates new features that support the analysis.

5.1 Creating Age Groups

## [1] "Senior" "Adult"  "Adult"  "Adult"  "Adult"  "Adult"
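The cut-offs behind `age_group` are not given in the report. A minimal sketch with `cut()` follows; the break points and the "Young" label are hypothetical, chosen only so that the example ages fall into the "Adult" and "Senior" labels seen above.

```r
# Example ages (illustrative)
age <- c(41, 29, 27, 38, 18, 58)

# Hypothetical bins: up to 24, 25 to 39, 40 and above
age_group <- cut(
  age,
  breaks = c(-Inf, 24, 39, Inf),
  labels = c("Young", "Adult", "Senior")
)

as.character(age_group)
```

`cut()` returns a factor whose levels follow the order of `labels`, which keeps the groups in a sensible order in tables and plots.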

5.2 Creating Salary Categories

5.3 Creating Performance Status
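The report does not state how `salary_category` and `performance_status` are defined. The sketch below shows one possible definition of each: salary split into thirds by quantile, and performance split at fixed scores. The labels and the thresholds 70 and 85 are assumptions.

```r
# Salary category: split at the 1/3 and 2/3 quantiles (assumed rule)
salary <- c(4095998, 6268244, 9389963, 12047713, 16111528, 21446446)
salary_breaks <- quantile(salary, probs = c(0, 1/3, 2/3, 1))
salary_category <- cut(
  salary,
  breaks = salary_breaks,
  labels = c("Low", "Medium", "High"),
  include.lowest = TRUE   # keep the minimum salary in the first bin
)

# Performance status: fixed thresholds (hypothetical values)
performance_score <- c(99.2, 76.6, 58.8, 85.0)
performance_status <- ifelse(
  performance_score >= 85, "High",
  ifelse(performance_score >= 70, "Medium", "Low")
)

as.character(salary_category)
performance_status
```

Quantile-based bins give groups of similar size, while fixed thresholds keep the meaning of each label stable when the data changes.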

5.4 Converting Variables to Factors

##  [1] "employee_id"        "gender"             "department"        
##  [4] "education_level"    "job_level"          "age"               
##  [7] "years_experience"   "training_hours"     "monthly_salary"    
## [10] "performance_score"  "attendance_rate"    "project_completed" 
## [13] "promotion_last_3y"  "age_group"          "salary_category"   
## [16] "performance_status"
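When a categorical variable has a natural order, that order can be stored in the factor itself. The sketch below does this for job level; the ranking from Junior to Senior Manager is an assumption, since the report does not define it.

```r
job_level <- c("Junior", "Officer", "Manager", "Supervisor", "Senior Manager")

# Assumed seniority order, lowest to highest
job_levels_ordered <- c("Junior", "Officer", "Supervisor",
                        "Manager", "Senior Manager")

job_level_f <- factor(job_level, levels = job_levels_ordered, ordered = TRUE)

# Ordered factors support comparisons: is Manager above Supervisor?
job_level_f[3] > job_level_f[4]
```

With explicit levels, plots and summary tables list the job levels by seniority instead of alphabetically.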

6. Exploratory Data Analysis (EDA)

6.1 Gender Distribution

6.2 Department Distribution

6.3 Age Distribution

6.4 Salary Distribution

6.5 Salary by Job Level

6.6 Performance by Department

6.7 Correlation of Numeric Variables
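The correlation plot itself is not included in the rendered report. A minimal sketch of the underlying computation follows, using a toy data frame with illustrative values. Because training_hours contains missing values, the example uses pairwise-complete observations; this is one reasonable choice, not necessarily the one made in the report.

```r
# Toy stand-in for the numeric columns (illustrative values only)
toy <- data.frame(
  training_hours    = c(25.6, 10.2, NA, 13.8, 16.0, 32.8),
  performance_score = c(99.5, 76.6, 86.0, 83.1, 79.2, 58.8),
  attendance_rate   = c(98.0, 85.5, 91.6, 85.7, 90.9, 94.3)
)

# Keep numeric columns only
numeric_cols <- toy[sapply(toy, is.numeric)]

# Pearson correlations; each pair uses the rows where both values are present
cor_matrix <- cor(numeric_cols, use = "pairwise.complete.obs")

round(cor_matrix, 2)

# With the corrplot package loaded, the matrix could then be drawn with:
# corrplot(cor_matrix, method = "color")
```

Without the `use` argument, any pair involving a column with missing values would come back as NA.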

7. Analysis & Insights

7.1 Gender Analysis

##    gender   n
## 1       -   2
## 2  Female 255
## 3    Male 164
## 4 unknown   2

Insight:

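  • Female employees account for 255 of the 423 cleaned records (about 60%) and male employees for 164 (about 39%), so the workforce skews female.
  • Four records still carry the placeholder values "-" and "unknown" and should be recoded as missing.
  • If the company wants a more balanced workforce, it can consider a diversity hiring strategy.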

7.2 Salary Analysis

## # A tibble: 5 × 2
##   job_level      avg_salary
##   <fct>               <dbl>
## 1 Manager         14284178.
## 2 Supervisor      12901059.
## 3 Junior          12711203.
## 4 Officer         11709883.
## 5 Senior Manager  10865866.

Insight:

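  • Average salary does not rise with job level in this dataset: Manager has the highest average (about 14.3 million), while Senior Manager has the lowest (about 10.9 million), below Junior (about 12.7 million).
  • Salary alone is therefore not a reliable indicator of seniority or responsibility here, and the Senior Manager figure deserves a closer check.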
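A group average is easier to interpret next to the group size, because a mean based on few employees can be misleading. The base R sketch below computes both on made-up data; the tibble above suggests the report used dplyr, so this is a swapped-in equivalent and not the original code.

```r
# Made-up salaries for illustration
toy <- data.frame(
  job_level      = c("Junior", "Junior", "Manager", "Manager", "Senior Manager"),
  monthly_salary = c(10, 14, 20, 22, 9) * 1e6
)

# Mean salary and head count per job level
avg_salary  <- tapply(toy$monthly_salary, toy$job_level, mean, na.rm = TRUE)
n_employees <- table(toy$job_level)

summary_tbl <- data.frame(
  job_level  = names(avg_salary),
  avg_salary = as.numeric(avg_salary),
  n          = as.integer(n_employees[names(avg_salary)])
)

# Sort from highest to lowest average salary
summary_tbl <- summary_tbl[order(-summary_tbl$avg_salary), ]
summary_tbl
```

Applied to the real data, the `n` column would show how many employees stand behind the Senior Manager average.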

7.3 Performance Analysis

## # A tibble: 12 × 2
##    department   avg_performance
##    <fct>                  <dbl>
##  1 "999"                   83.1
##  2 "It"                    79.2
##  3 "Risk"                  78.3
##  4 "???"                   78.2
##  5 "Finance"               78.1
##  6 "Compliance"            75.9
##  7 "Marketing"             75.9
##  8 "Unknown"               75.8
##  9 "Hr"                    75.4
## 10 "Operations"            74.8
## 11 ""                      74.0
## 12 "Xdept"                 70.5

Insight:

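  • The ranking still includes invalid department labels ("999", "???", "", "Unknown", "Xdept"), so the top and bottom positions are not meaningful until those labels are resolved.
  • Among the valid departments, It has the highest average performance (79.2) and can serve as a benchmark for the others.
  • Operations (74.8) and Hr (75.4) have the lowest averages and may need a review of their work systems or additional training.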

7.4 Promotion Analysis

An earlier error occurred because the column promotion_last_3_y was not found in the dataset; after name cleaning, the column is called promotion_last_3y. The analysis was therefore made more flexible by checking the column names first.

##  [1] "employee_id"        "gender"             "department"        
##  [4] "education_level"    "job_level"          "age"               
##  [7] "years_experience"   "training_hours"     "monthly_salary"    
## [10] "performance_score"  "attendance_rate"    "project_completed" 
## [13] "promotion_last_3y"  "age_group"          "salary_category"   
## [16] "performance_status"
## Promotion column not found in the dataset.

Insight:

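  • The promotion analysis is meant to examine the relationship between promotion, performance, and salary.
  • With a promotion column available, it can show whether promoted employees have higher performance.
  • The code is written so that it does not fail when the column name differs. In this run, however, it reported the column as missing even though promotion_last_3y appears in the list above, so no promotion results were produced.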
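A column check that matches on a name prefix, instead of one exact spelling, would find `promotion_last_3y`. The sketch below illustrates the idea on a toy data frame; the pattern `^promotion` and the made-up scores are assumptions for illustration.

```r
# Toy data with the column name as it appears after cleaning
toy <- data.frame(
  performance_score = c(80, 90, 70, 85),
  promotion_last_3y = c(0, 1, 0, 1)
)

# Find every column whose name starts with "promotion"
promo_col <- grep("^promotion", names(toy), value = TRUE)

if (length(promo_col) == 1) {
  # Average performance for non-promoted (0) and promoted (1) employees
  avg_by_promo <- tapply(toy$performance_score, toy[[promo_col]], mean)
  print(avg_by_promo)
} else {
  message("No unique promotion column found; candidates: ",
          paste(promo_col, collapse = ", "))
}
```

Requiring exactly one match avoids silently picking the wrong column if several names share the prefix.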

7.5 Training Hours Analysis

Insight:

  • The plot is used to see whether training is associated with performance.
  • An upward trend would be consistent with an effective training program, although it does not by itself establish that training causes the improvement.
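A trend read from a plot can be backed by a number. The sketch below fits a simple linear model and computes a correlation on made-up data; it illustrates the check and does not reproduce any result from the report.

```r
# Made-up data for illustration; one training value is missing
toy <- data.frame(
  training_hours    = c(5, 10, NA, 20, 30, 40),
  performance_score = c(70, 72, 80, 75, 79, 83)
)

# Linear model; lm() drops rows with missing values by default
fit <- lm(performance_score ~ training_hours, data = toy)

# Estimated change in performance score per additional training hour
slope <- coef(fit)[["training_hours"]]

# Pearson correlation on the complete rows
r <- cor(toy$training_hours, toy$performance_score, use = "complete.obs")

c(slope = slope, correlation = r)
```

A positive slope and correlation describe an association only; other factors, such as job level, may drive both variables.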

8. Analysis Results

Based on the analysis carried out, the main results are:

  1. The dataset contains inconsistent entries and anomalous values.
  2. Data cleaning reduced the dataset from 510 to 423 rows and removed implausible ages, salaries, attendance rates, and performance scores; invalid department labels and a negative years_experience value remain.
  3. Average salary differs by job level but does not follow seniority: Manager is highest and Senior Manager lowest.
  4. Some departments have higher average performance than others.
  5. Training hours show a positive relationship with performance score.
  6. The link between performance and promotion could not be assessed, because the promotion analysis did not find the promotion column.

9. Conclusion

The employee data analysis in R was carried out through several key stages: data cleaning, transformation, exploratory analysis, and insight generation.

The results of the analysis are summarized in section 8.

This report can be used as a basis for HR decision-making and for developing employee management strategies.

10. Recommendations

Recommendations based on the analysis:

  1. Expand training programs to improve performance.
  2. Monitor departments with low performance.
  3. Standardize data entry so that anomalous values do not occur.
  4. Develop a performance-based promotion system.

11. Exporting the Clean Dataset

The cleaned dataset is saved under the name:

employee_cleaned.csv
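The export code is not shown in the report. A minimal sketch with `write.csv()` follows; it writes a two-row toy data frame to a temporary directory so the example has no side effects, whereas the report presumably writes to the working directory.

```r
# Toy stand-in for the cleaned data (illustrative values only)
cleaned <- data.frame(
  employee_id    = c(6L, 7L),
  gender         = c("Female", "Male"),
  monthly_salary = c(4095998, 6268244)
)

# Temporary location used here only for the example
out_path <- file.path(tempdir(), "employee_cleaned.csv")

# row.names = FALSE avoids an extra unnamed index column in the file
write.csv(cleaned, out_path, row.names = FALSE)

# Read the file back to confirm the structure survived the round trip
check <- read.csv(out_path)
str(check)
```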