Standard HR Quality Control Analytics Report

Overview

This report presents a quality control assessment of the harmonized payroll data, which comprises the Contract, Personnel, and Establishment modules, against the standard harmonization dictionary (govhr::harmonization_dict()). All diagnostic tables are generated automatically by the compute_qualitycontrol() function, which evaluates these three modules.

The quality control report is organized into three main sections:

  • Data Basics
    Foundational diagnostics including module dimensions, variable structure and dictionary conformity, primary key integrity, cross-module orphan checks, salary logic validation, and date logic checks. This section establishes structural soundness and internal consistency before the detailed data quality patterns are examined.

  • Missingness Report
    Comprehensive measurement and visualization of missing data patterns overall and across key analytical dimensions (occupation, ISCO mapping, reference period, and establishment). Identifies variables requiring cleaning, imputation, or further validation.

  • Volatility Analysis
    Time-series diagnostics evaluating temporal stability of salary aggregates, contract counts, and work hours across reference periods. High volatility may indicate data quality issues, organizational changes, or policy shifts requiring investigation.

Diagnostic Sections

Data Basics

This section provides a high-level profile of the harmonized payroll data and summarizes the structure of the three core modules—Contract, Personnel, and Establishment—before detailed diagnostics are introduced.

Module Dimensions

The harmonized dataset includes the following components:

Module Dimensions Summary

Module         Observations  Variables
Contract              8,885         23
Personnel             8,672          9
Establishment            47          9

Temporal and Cross-Sectional Coverage

This section visualizes data coverage across time periods, highlighting potential spikes or gaps in observations that may indicate data quality issues or structural changes.
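
As a rough illustration of the coverage check described above, the sketch below counts observations per reference period and lists months with no observations at all. It is written in Python for illustration only and is not the govhr implementation; the function names and the 'YYYY-MM' period format are assumptions.

```python
from collections import Counter

def coverage_by_period(ref_periods):
    """Count observations per reference period, ordered chronologically."""
    return dict(sorted(Counter(ref_periods).items()))

def missing_months(ref_periods):
    """List 'YYYY-MM' periods absent between the first and last observed month."""
    observed = sorted(set(ref_periods))
    if not observed:
        return []
    year, month = map(int, observed[0].split("-"))
    last = observed[-1]
    gaps = []
    while f"{year:04d}-{month:02d}" < last:
        label = f"{year:04d}-{month:02d}"
        if label not in observed:
            gaps.append(label)
        # Advance one calendar month, rolling over the year after December
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return gaps

# Hypothetical reference periods (assumed 'YYYY-MM' strings, one per observation)
periods = ["2020-01", "2020-01", "2020-02", "2020-04"]
print(coverage_by_period(periods))  # {'2020-01': 2, '2020-02': 1, '2020-04': 1}
print(missing_months(periods))      # ['2020-03']
```

A period with a count far above or below its neighbours, or a period that is missing entirely, is a candidate for follow-up with the data provider.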

Variable Structure & Dictionary Conformity

This section compares the variables present in each harmonized module against the standard harmonization dictionary. It identifies:

  • Missing variables: Expected by the dictionary but absent from the data
  • Extra variables: Present in the data but not in the dictionary

Structure Diagnostics by Module

Module         Missing Variables [1]     Extra Variables [2]
Contract       personnel_id              worker_id
Establishment  est_name_en               est_name_english, ref_date, est_type
Personnel      birth_date, country_code  est_id, est_name_native, birth_day

[1] Red badges indicate variables expected by the dictionary but missing from the data
[2] Orange badges indicate variables present in the data but not in the dictionary
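
The conformity check amounts to two set differences between the dictionary's expected variables and the columns actually present. The sketch below shows the idea in Python; it is an illustration under assumptions, not the logic of compute_qualitycontrol(), and the function name and the shortened column lists are hypothetical.

```python
def compare_to_dictionary(expected, present):
    """Return (missing, extra) variable names, each sorted alphabetically."""
    expected_set, present_set = set(expected), set(present)
    missing = sorted(expected_set - present_set)  # in the dictionary, not in the data
    extra = sorted(present_set - expected_set)    # in the data, not in the dictionary
    return missing, extra

# Hypothetical, shortened column lists chosen to mirror the Contract row above
expected_contract = ["contract_id", "personnel_id", "ref_date"]
present_contract = ["contract_id", "worker_id", "ref_date"]

missing, extra = compare_to_dictionary(expected_contract, present_contract)
print(missing)  # ['personnel_id']
print(extra)    # ['worker_id']
```

A missing/extra pair with similar names, such as personnel_id and worker_id, often points to a variable that was not renamed during harmonization rather than to genuinely absent data.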

Primary Key Integrity

This section validates that the Contract module's composite primary key (contract_id, personnel_id, ref_date) uniquely identifies each record, with no duplicate key combinations.

Primary Key Uniqueness Check

Check                       Status
Primary Keys Unique         ✓ PASS
Number of Duplicate Groups  0
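
A uniqueness check of this kind can be sketched by counting how often each key combination occurs and reporting the combinations seen more than once. The Python below is a minimal illustration, not the report's own code; the function name and the sample records are hypothetical.

```python
from collections import Counter

def duplicate_key_groups(records, key_fields):
    """Map each key tuple that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(rec[f] for f in key_fields) for rec in records)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical Contract records: same contract in two periods is not a duplicate
records = [
    {"contract_id": "C1", "personnel_id": "P1", "ref_date": "2020-01-01"},
    {"contract_id": "C1", "personnel_id": "P1", "ref_date": "2020-02-01"},
    {"contract_id": "C2", "personnel_id": "P2", "ref_date": "2020-01-01"},
]
key = ["contract_id", "personnel_id", "ref_date"]

dups = duplicate_key_groups(records, key)
print("PASS" if not dups else "FAIL", len(dups))  # PASS 0
```

The number of entries in the returned dictionary corresponds to the "Number of Duplicate Groups" row: one group per repeated key combination, regardless of how many times it repeats.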

Cross-Module Orphan Checks

This section identifies orphaned records: IDs referenced in the Contract module that do not exist in the Personnel or Establishment modules.

Orphan Records Summary

Check                                                            Missing Count  Status
Personnel IDs in Contract missing from Personnel module                      0  ✓ PASS
Establishment IDs in Contract missing from Establishment module              0  ✓ PASS
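
An orphan check is essentially an anti-join: the distinct IDs referenced by the child module minus the IDs defined in the parent module. The Python sketch below illustrates this under assumptions; the function name and the sample IDs are hypothetical and do not come from the report's data.

```python
def orphan_ids(child_ids, parent_ids):
    """Return the sorted distinct IDs referenced by the child but absent from the parent."""
    return sorted(set(child_ids) - set(parent_ids))

# Hypothetical IDs: personnel referenced in Contract vs. defined in Personnel
contract_personnel_ids = ["P1", "P2", "P2", "P3"]
personnel_ids = ["P1", "P2", "P3", "P4"]

orphans = orphan_ids(contract_personnel_ids, personnel_ids)
print(len(orphans), "PASS" if not orphans else "FAIL")  # 0 PASS
```

Note that the check is one-directional: P4 exists in the parent module without any contract, which is not an orphan under this definition.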

Missingness Report

This section examines patterns of missing data across the Contract, Personnel, and Establishment modules. Understanding missingness is essential for data quality control because it highlights variables that may require cleaning, imputation, or further validation by the data provider.

We evaluate:

  1. Variable-level missingness (percent missing in each variable)
  2. Module-level summaries
  3. Visualizations
    • Missingness barplots
    • Missingness heatmaps
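
Variable-level missingness is the share of records in which a variable has no value. The Python sketch below shows one way to compute it; it is an illustration only, and both the function name and the rule that None and the empty string count as missing are assumptions rather than the report's definition.

```python
def percent_missing(records, variables):
    """Return {variable: percent of records where the value is None or ''}."""
    n = len(records)
    result = {}
    for var in variables:
        # A key that is absent from a record is also treated as missing
        missing = sum(1 for rec in records if rec.get(var) in (None, ""))
        result[var] = 100.0 * missing / n if n else 0.0
    return result

# Hypothetical Personnel records
personnel = [
    {"personnel_id": "P1", "gender": "F"},
    {"personnel_id": "P2", "gender": None},
    {"personnel_id": "P3", "gender": "M"},
    {"personnel_id": "P4", "gender": ""},
]
print(percent_missing(personnel, ["personnel_id", "gender"]))
# {'personnel_id': 0.0, 'gender': 50.0}
```

Computing the same quantity within groups (for example, per reference period or per establishment) gives the values that a missingness heatmap would display.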

Volatility Analysis

This section evaluates temporal stability across key HR metrics: salary aggregates, wagebill, staff counts, and contract counts. Volatility is measured using rolling coefficients of variation (CV, the standard deviation divided by the mean within a moving window) or period-over-period percent changes to identify establishments with unstable patterns over time.

High volatility may indicate:

  • Data quality issues (inconsistent reporting, missing periods)
  • Legitimate organizational changes (restructuring, budget changes)
  • Seasonal patterns or policy changes
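
The two volatility measures named in this section can be sketched as follows. This is a Python illustration under assumptions, not the report's implementation: the function names, the window length, the use of the population standard deviation, and the sample series are all hypothetical choices.

```python
from statistics import mean, pstdev

def rolling_cv(values, window):
    """Coefficient of variation (population std / mean) over each trailing window."""
    out = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        m = mean(chunk)
        out.append(pstdev(chunk) / m if m != 0 else float("nan"))
    return out

def percent_changes(values):
    """Period-over-period percent change; NaN where the previous value is zero."""
    return [
        100.0 * (b - a) / a if a != 0 else float("nan")
        for a, b in zip(values, values[1:])
    ]

# Hypothetical salary aggregate for one unit across four reference periods
salary = [100, 100, 100, 200]
cv = rolling_cv(salary, window=3)
print([round(x, 4) for x in cv])  # [0.0, 0.3536]
print(percent_changes(salary))    # [0.0, 0.0, 100.0]
```

The CV is scale-free, so it allows comparison across units with very different salary levels, while percent changes pinpoint the specific period in which a jump occurred.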

The heatmaps below show volatility metrics for the 50 contracts with the highest average salary volatility, making it easier to spot patterns that warrant further investigation.

Contract Volatility in Salary

Contract Count Volatility

Contract Level Volatility in Hours Worked