Welcome to the SangerTools package

This package has been created to provide convenient functions for working with population health data. It is specifically aimed at healthcare providers and services that focus on population health management.

Many of the functions are centered around the Master Patient Index format. Where each row is a patient and each column is an observation of that patient.

In the next sections we will take you through how you use the tool.

Loading in Health Data

To load in the Master Patient Index attached to the SangerTools package follow these steps:


health_data <- SangerTools::master_patient_index

PseudoNHSNumber	Sex	Obesity	Age	IMD_Decile	Ethnicity	Locality	PrimaryCareNetwork
DMLQS0759Z	Female	0	59	1	White	North Cotswolds	North and South Gloucester
CAGCX7932K	Male	0	37	1	White	North Cotswolds	The Forest of Dean
VXKJK2204N	Female	0	69	10	White	Stroud and Berkeley Vale	North and South Gloucester
GBXXP7419B	Male	1	27	5	White	Gloucester City	The Forest of Dean
RMHVR7395Q	Male	0	84	4	White	Stroud and Berkeley Vale	North Cotswolds
DODAE0089W	Female	0	91	5	White	Cheltenham	Cheltenham St Pauls

The data contains:

PseudoNHSNumber - A Pseudonymised NHS Patient Identifier
Sex - The identifiable sex of the patient
Smoker - Health Condition Flag: 1 denotes if the patient is a smoker
Diabetes - Health Condition Flag: 1 denotes if the patient has diabetes
Dementia - Health Condition Flag: 1 denotes if the patient has dementia
Obesity - Health Condition Flag: 1 denotes if the patient is Obese
Age - Age of the patient
IMD_Decile - The decile of indices of multiple deprivation
Ethnicity - The identifiable ethnicity of the patient}
Locality - The region where the patient lives - sampled from Gloucestershire Clinical Commissioning Group (GCCG)
PrimaryCareNetwork - The network of General Practioners that the patient is registered with - sampled from (GCCG)

With the data we can start to work with the function in the package.

Age bands

It is common practice to transform continuous age values into a smaller set of categories.

There are two functions in the package for doing this.

age_bandizer
age_bandizer2

age_bandizer Uses a tidyverse philosophy of function creation with Non-Standard Evaluation. It will produce a new column with 5 year age bands as a factor.

age_bandizer2 uses Standard Evaluation and currently takes in puts of band size 2,5,10 & 20.

health_data <- SangerTools::age_bandizer(df = health_data,
                                         Age_col = Age)
#health_data <- SangerTools::age_bandizer_2(df = health_data,Age_col = "Age",Age_band_size = 2)

Age	Ageband
59	55-59
37	35-39
69	65-69
27	25-29
84	80-84
91	90-94

Generating a categorical column chart easily

The package makes it very simple to create a categorical column chart. This will be implemented in the next example:

# Group by Ethnicity
diabetes_df <- health_data %>% 
  dplyr::filter(Diabetes==1)
  
  SangerTools::categorical_col_chart(df = diabetes_df,
                                     grouping_var = Ethnicity)+
  scale_fill_sanger()+ 
  labs(title = "Diabetic Patients by Ethnicity",
       subtitle = "Nearly All Diabetics are White",
       x = NULL, 
       y = "Number of Patients") + 
  coord_flip()


# Group by Sex
health_data %>% 
  dplyr::filter(Diabetes==1) %>% 
  SangerTools::categorical_col_chart(Sex) + 
  scale_fill_sanger()+
  labs(title = "Diabetic Patients by Gender",
       x = NULL, 
       y = "Number of Patients")

It really is that simple to generate very nice looking proportional charts.

Crude Prevalence

Here we will look at the crude rate of diabetes To obtain the crude prevalence rate, this can be achieved below:

 crude_prevalence <- SangerTools::crude_rates(df = health_data,
                                              Condition =  Diabetes, 
                                              Locality)
#> Joining, by = "Locality"

Locality	Cohort_Size	Diabetes_Population	Prevalence_1k
The Forest of Dean	926	56	60.47516
Tewkesbury Newent and Staunton	676	38	56.21302
Stroud and Berkeley Vale	1806	94	52.04873
South Cotswolds	878	44	50.11390
Cheltenham	2490	118	47.38956
Gloucester City	2704	126	46.59763
North Cotswolds	520	22	42.30769

Age Standardised Rates

Let’s revisit the example above. Diabetes is highly confounded by age. Most diabetics will be diagnosed after the age of 40.


asr_prevalence <- SangerTools::standardised_rates_df(df = health_data,
                                   Split_by = Locality,
                                   Condition = Diabetes, 
                                   Population_Standard = NULL,
                                   Granular = FALSE,
                                   Ageband )
#> Joining, by = c("Locality", "Ageband")
#> Joining, by = "Ageband"

Locality	Standardised_Rate_1k
The Forest of Dean	59.53998
Tewkesbury Newent and Staunton	55.59723
Stroud and Berkeley Vale	51.83629
South Cotswolds	49.96958
Cheltenham	47.41501
Gloucester City	46.29408
North Cotswolds	39.88497

Given that each of the Localities now has the population structure of the county as whole; we can see slight differences to the crude prevalence rates.

Age Standardised Rates with User Defined Population Structure

We will use another dataset attached to the package; the UK population from the year 2018. This is broken down by 5 year age bands. For a user define population structure to integrate with standardised_rates_df ensure to change the name of the population column to Pop_Weight

Load 2018 UK Population Structure


uk_pop18<- SangerTools::uk_pop_standard

names(uk_pop18) <- c("Pop_Weight","Ageband")

Pop_Weight	Ageband
3,914,000	0-4
4,139,000	5-9
3,859,000	10-14
3,669,000	15-19
4,185,000	20-24
4,527,000	25-29

UK Age Standardised Diabetes Prevalence

asr_uk <- SangerTools::standardised_rates_df(df = health_data,
                                   Split_by = Locality,
                                   Condition = Diabetes, 
                                   Population_Standard = uk_pop18,
                                   Granular = FALSE,
                                   Ageband )
#> Joining, by = c("Locality", "Ageband")
#> Joining, by = "Ageband"

Locality	Standardised_Rate_1k
The Forest of Dean	60.60304
Tewkesbury Newent and Staunton	49.59213
Stroud and Berkeley Vale	45.56593
South Cotswolds	43.48904
Gloucester City	43.20954
Cheltenham	42.74935
North Cotswolds	34.66700

Combining Results

To view both crude and standardised rates we can use dplyr::left_join

combined_rates <- crude_prevalence %>% 
  dplyr::left_join(asr_prevalence, by = c("Locality"))

Locality	Cohort_Size	Diabetes_Population	Prevalence_1k	Standardised_Rate_1k
The Forest of Dean	926	56	60.47516	59.53998
Tewkesbury Newent and Staunton	676	38	56.21302	55.59723
Stroud and Berkeley Vale	1806	94	52.04873	51.83629
South Cotswolds	878	44	50.11390	49.96958
Cheltenham	2490	118	47.38956	47.41501
Gloucester City	2704	126	46.59763	46.29408
North Cotswolds	520	22	42.30769	39.88497

Other functionality added into the package would be to use the multiple CSV reader and clipboard functions.

Excel Clipboard function

This copies a data frame to the clipboard for you for then pasting into Excel sheets, or csvs, or raw text.


SangerTools::excel_clip(combined_rates)

There is the potential to read from multiple CSVs as well and then these can be fed into data frames.

Multiple CSV reader

To implement this function you would need to have a number of CSVs contained in a folder. To read these in, follow the below instructions:

file_path = 'my_file_path_where_csvs_are_stored'

if (length(SangerTools::multiple_csv_reader(file_path))==0){
  message("This won't work without changing the variable input to a local file path with CSVs in")
}
#> This won't work without changing the variable input to a local file path with CSVs in

multiple_excel_reader is the equivalent for excel files; however please read function documentation page as both functions have a strict set of requirements for execute.

Splitting Dataframes and Saving

split_and_save is a quick way to split a dataframe on a specified column into subsequent dataframes after which each of dataframes is dynamically written to a location choice

SangerTools::split_and_save(
 df = health_data,
 Split_by = "Locality",
 file_path = "Inputs/",
 prefix = NULL
)

Results to SQL

Write your results to SQL Server ensuring that the table name appears as expected.

This function makes a number of assumptions and is limited in scope; please read documentation.

SangerTools::df_to_sql(df = combined_rates,
                       driver = "SQL SERVER",
                       server = "Org-sql-db",
                       database = "MyReports",
                       sql_table_name = "Diabetes_Prevalence",
                       overwrite = FALSE)

See Brand Colours

This is an anonymous function and can be called without any arguments


show_brand_palette()

#> [1] "#9880BB" "#0061BA" "#3BBCD9" "#223873" "#71B72B"

See More Colours

This is also an anonymous function; it will show an extended colour palette

show_extended_palette()

#>  [1] "#9880BB" "#0061BA" "#3BBCD9" "#223873" "#71B72B" "#D585BA" "#007761"
#>  [8] "#4D8076" "#00C9A7" "#4A4453" "#C27767" "#D5CABD"

Closing

More functions are being added to this tool and a new version of the file will be released on CRAN very soon. Keep an eye out on the associated GitHub for updates.

SangerTools Vignette