Introduction

Encryption is critical to safeguarding confidential data at a time when patient confidentiality and data protection are paramount. Healthcare organizations must use strong encryption to prevent unauthorized access and to stay compliant with regulations such as HIPAA and GDPR.

This tutorial focuses on encrypting patient data in R and RStudio, with an emphasis on AES encryption techniques.

We also examine the distinctions between anonymization, hashing, and encryption to clarify the role each plays in data protection and to determine the best method of safeguarding patient data.

Why Protect Patient Data?

Encryption is required for the following reasons:

Data Confidentiality: Ensures that only authorized individuals can access patient data.
Data Integrity: Helps prevent unauthorized changes to data.
Regulatory Compliance: Adheres to legal requirements to protect patient data (e.g., GDPR, HIPAA).

Anonymization vs. Hashing vs. Encryption: Which is Best?

| Feature | Anonymization | Hashing | Encryption |
| --- | --- | --- | --- |
| Purpose | Irreversibly remove identifiable data to protect privacy | Convert data into a fixed-length string for integrity and security | Securely transform data so only authorized parties can recover it |
| Reversibility | Irreversible (cannot retrieve original data) | Irreversible (one-way function) | Reversible (can decrypt using the correct key) |
| Use Cases | Compliance with privacy laws (GDPR, HIPAA), removing PII from datasets | Password storage, data integrity verification | Secure communication, protecting sensitive data (e.g., credit card info, medical records) |
| Security Level | High (if done correctly) | Medium (prone to brute-force attacks without salting) | Very High (depends on encryption algorithm and key strength) |
| Examples | Generalization (e.g., replacing birth dates with age ranges), Masking (e.g., replacing names with random identifiers) | MD5, SHA-256, SHA-512 | AES, DES, Blowfish, RSA |
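
The Reversibility row can be made concrete with a minimal sketch, assuming the openssl package is installed. The identifier `patient-0042` is a made-up value for illustration, and the key and IV are generated on the spot rather than managed securely.

```r
library(openssl)

value <- "patient-0042"  # hypothetical identifier, not from the dataset

# Hashing: a one-way function that yields a fixed-length digest.
# There is no key and no function that turns the digest back into the input.
hashed <- as.character(sha256(value))
digest_length <- nchar(hashed)  # SHA-256 is 32 bytes, shown as hex characters

# Encryption: reversible for anyone who holds the key and IV.
key <- rand_bytes(32)  # 256-bit AES key
iv  <- rand_bytes(16)  # 16-byte IV for CBC mode
ciphertext <- aes_cbc_encrypt(charToRaw(value), key = key, iv = iv)
recovered  <- rawToChar(aes_cbc_decrypt(ciphertext, key = key, iv = iv))

round_trip_ok <- identical(recovered, value)
```

The digest always has the same length regardless of input, while the ciphertext can be turned back into the original value only with the matching key and IV.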

Best Method

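Encryption is the best choice when data must remain confidential yet recoverable by authorized users: it supports reversible data protection and helps meet compliance requirements. Confidentiality alone does not guarantee integrity, which additionally requires an authenticated encryption mode or a separate integrity check.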

Anonymization is the best choice for irreversible data privacy, when patient identities must be removed entirely.

Hashing helps verify data integrity but does not by itself provide complete protection against unauthorized use.

Conclusion: Encryption is the best method of safeguarding patient data while still allowing approved use as needed.
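
For contrast with encryption, the generalization form of anonymization mentioned in the comparison table could be sketched as follows. This is only an illustration in base R: the age bands and their labels are assumptions chosen for the example, and real anonymization needs a broader assessment of re-identification risk.

```r
# Ages taken from the sample patient data used later in this tutorial
ages <- c(34, 29, 45, 50, 38, 42, 31, 46, 55, 37)

# Replace exact ages with coarse bands (hypothetical boundaries).
# right = FALSE makes each interval closed on the left: [30, 40), [40, 50), ...
age_band <- cut(
  ages,
  breaks = c(0, 30, 40, 50, 60, Inf),
  labels = c("Under 30", "30-39", "40-49", "50-59", "60+"),
  right = FALSE
)

anonymized <- data.frame(AgeBand = as.character(age_band))
table(anonymized$AgeBand)
```

Unlike encryption, there is no key that restores the exact ages from the bands, which is what makes the transformation irreversible.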

Requirements

Before you begin, ensure that you have R and RStudio installed.

Install the required packages:

# Set a CRAN mirror
options(repos = c(CRAN = "https://cran.rstudio.com/"))
# tidyverse and DT are also loaded later in this tutorial
install.packages(c("tidyverse", "digest", "openssl", "DT"))

Loading and Inspecting Patient Data

library(tidyverse)
library(digest)
library(openssl)
library(DT)
# Load sample data
patient_data <- tibble(
  PatientID = 1:10,
  PatientGender = c("Male", "Female", "Female", "Male", "Male", "Female", "Male", "Female", "Male", "Female"),
  PatientAge = c(34, 29, 45, 50, 38, 42, 31, 46, 55, 37),
  Diagnosis = c("Hypertension", "Diabetes", "Asthma", "Cancer", "Hypertension", "Diabetes", "Asthma", "Cancer", "Hypertension", "Diabetes")
)

Using datatable to Display Data

datatable(patient_data)

Encryption Techniques

a) AES Encryption

AES (Advanced Encryption Standard) is a symmetric-key block cipher: the same key is used to encrypt and to decrypt. The example here uses AES in CBC mode with a 256-bit key to encrypt the PatientID column.

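# Generate a secret key and IV
key <- rand_bytes(32) # AES key must be 16, 24, or 32 bytes (32 bytes = AES-256)
iv <- rand_bytes(16)  # AES block size is 16 bytes, so the CBC IV is 16 bytes

# Encrypt a single string with AES-CBC and return the result as base64 text
aes_encrypt <- function(text, key, iv) {
  raw <- charToRaw(text)
  encrypted <- aes_cbc_encrypt(raw, key, iv)
  return(base64_encode(encrypted))
}

# Encrypt PatientID and drop the plaintext column.
# Note: one IV is shared by every row to keep the demo simple, so equal
# plaintexts give equal ciphertexts; production code should use a fresh IV per value.
patient_data_aes <- patient_data %>%
  mutate(PatientID_Encrypted = sapply(as.character(PatientID), function(x) aes_encrypt(x, key, iv), USE.NAMES = FALSE)) %>%
  select(-PatientID)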

# Display encrypted data
datatable(patient_data_aes, options = list(pageLength = 5))
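
Because CBC mode with a fixed IV maps identical inputs to identical outputs, a common refinement is to draw a fresh IV for every value and store it alongside the ciphertext. The following is a minimal sketch of that idea, not part of the original workflow: the helper names `encrypt_with_fresh_iv` and `decrypt_with_stored_iv` are hypothetical, and prepending the IV to the ciphertext is one storage layout among several.

```r
library(openssl)

key <- rand_bytes(32)  # 256-bit AES key

# Hypothetical helper: encrypt one string under a newly generated IV and
# return base64 text containing the 16-byte IV followed by the ciphertext.
encrypt_with_fresh_iv <- function(text, key) {
  iv <- rand_bytes(16)
  ciphertext <- aes_cbc_encrypt(charToRaw(text), key = key, iv = iv)
  # c() returns a plain raw vector: IV first, ciphertext after it
  base64_encode(c(iv, ciphertext))
}

# Hypothetical helper: split the stored IV from the ciphertext, then decrypt.
decrypt_with_stored_iv <- function(encoded, key) {
  blob <- base64_decode(encoded)
  iv <- blob[1:16]
  ciphertext <- blob[-(1:16)]
  rawToChar(aes_cbc_decrypt(ciphertext, key = key, iv = iv))
}

first  <- encrypt_with_fresh_iv("7", key)
second <- encrypt_with_fresh_iv("7", key)

same_output <- identical(first, second)            # differs thanks to the random IV
recovered   <- decrypt_with_stored_iv(first, key)  # original text comes back
```

The IV does not need to be secret, only unpredictable and unique per value, which is why it can travel with the ciphertext while the key stays protected.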

b) XAES Encryption

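In this tutorial, XAES (Extended AES) refers to applying AES-CBC several times in succession with the same key and IV. It is not a standardized algorithm, and repeating encryption under the same key does not add meaningful strength over a single AES pass, so treat it as an illustration rather than a security upgrade.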

# Encrypt PatientID using XAES (Multiple AES Rounds)
xaes_encrypt <- function(text, key, iv, rounds = 5) {
  raw <- charToRaw(text)
  encrypted <- raw
  # Each round encrypts the previous round's output with the same key and IV
  for (i in seq_len(rounds)) {
    encrypted <- aes_cbc_encrypt(encrypted, key, iv)
  }
  return(base64_encode(encrypted))
}

# Encrypt PatientID and drop the plaintext column
patient_data_xaes <- patient_data %>%
  mutate(PatientID_Encrypted = sapply(as.character(PatientID), function(x) xaes_encrypt(x, key, iv, rounds = 5), USE.NAMES = FALSE)) %>%
  select(-PatientID)

# Display encrypted data
datatable(patient_data_xaes, options = list(pageLength = 5))

Decryption Example

To decrypt data, we use the same key and IV that were used for encryption.

# Decrypt PatientID from AES encryption
aes_decrypt <- function(encrypted_text, key, iv) {
  encrypted <- base64_decode(encrypted_text)
  decrypted <- aes_cbc_decrypt(encrypted, key, iv)
  return(rawToChar(decrypted))
}

# The decrypted IDs come back as character strings
patient_data_decrypted <- patient_data_aes %>%
  mutate(PatientID_Decrypted = sapply(PatientID_Encrypted, function(x) aes_decrypt(x, key, iv), USE.NAMES = FALSE))

datatable(patient_data_decrypted, options = list(pageLength = 5))
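
The multi-round variant can be reversed the same way, by applying the decryption step once per round. The sketch below is self-contained and uses hypothetical names (`multi_round_encrypt`, `multi_round_decrypt`); it assumes the same key, IV, and round count are available on both sides.

```r
library(openssl)

key <- rand_bytes(32)
iv  <- rand_bytes(16)

# Hypothetical helper: apply AES-CBC `rounds` times, then base64-encode.
multi_round_encrypt <- function(text, key, iv, rounds = 5) {
  data <- charToRaw(text)
  for (i in seq_len(rounds)) {
    data <- aes_cbc_encrypt(data, key = key, iv = iv)
  }
  base64_encode(data)
}

# Hypothetical helper: undo the rounds one at a time.
# Each decryption pass strips the padding added by the matching encryption pass.
multi_round_decrypt <- function(encoded, key, iv, rounds = 5) {
  data <- base64_decode(encoded)
  for (i in seq_len(rounds)) {
    data <- aes_cbc_decrypt(data, key = key, iv = iv)
  }
  rawToChar(data)
}

ids <- as.character(1:10)
encrypted_ids <- vapply(ids, multi_round_encrypt, character(1),
                        key = key, iv = iv, USE.NAMES = FALSE)
decrypted_ids <- vapply(encrypted_ids, multi_round_decrypt, character(1),
                        key = key, iv = iv, USE.NAMES = FALSE)
```

The round count has to match on both sides; decrypting with fewer rounds leaves the data still encrypted.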

Conclusions

Using AES and XAES encryption, healthcare organizations can strengthen the security of patient data in support of HIPAA and GDPR compliance. In this tutorial, we have demonstrated how patient data can be encrypted and decrypted in R and RStudio.

For best practice, always:

Understanding the differences between anonymization, hashing, and encryption enables organizations to select the right data protection method while preserving privacy and regulatory compliance.

Best regards, Marios Vardalachakis