1.- Overview:

Goal: To describe the occurrence of a disease (fractures) in a defined population (osteoporosis patients) over a period of time (2022).

The incidence rate of fracture will be calculated as the number of new (first-ever) fracture events over the total person-time at risk in the reference population. Person-time at risk is defined for each patient as the time they are at risk of experiencing a first-ever fracture in 2022.

Inclusion/exclusion criteria rules:

Inclusion criteria

  • A minimum of 365 days of database history is required to identify prevalent patients.

  • The start of the time at risk will be defined as the latest of the following dates:

    • 1st January 2022 (study start)
    • Start of registration with the database + 365 days (minimum required database history)
  • The end of time at risk will be defined as the earliest of the following dates:

    • 31st December 2022 (study end)
    • End of registration with the database
    • First occurrence of a fracture event in 2022
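
The start and end rules above reduce to a row-wise maximum and minimum over dates. A minimal sketch in base R, assuming a hypothetical data frame whose column names (`registration_start`, `registration_end`, `first_fracture_2022`) are illustrative and not taken from the study database:

```r
# Toy data; all column names and values are assumptions for illustration only.
df <- data.frame(
  person_id           = c(1, 2, 3),
  registration_start  = as.Date(c("2019-05-01", "2021-06-15", "2020-01-01")),
  registration_end    = as.Date(c("2023-03-31", "2022-10-31", "2024-01-01")),
  first_fracture_2022 = as.Date(c(NA, NA, "2022-07-15"))
)

study_start <- as.Date("2022-01-01")
study_end   <- as.Date("2022-12-31")

# Start of time at risk: latest of study start and registration start + 365 days.
df$risk_start_date <- pmax(study_start, df$registration_start + 365)

# End of time at risk: earliest of study end, registration end and first fracture.
# na.rm = TRUE ignores a missing fracture date (no fracture recorded in 2022).
df$risk_end_date <- pmin(study_end, df$registration_end,
                         df$first_fracture_2022, na.rm = TRUE)

df[, c("person_id", "risk_start_date", "risk_end_date")]
```

Using `pmax()` and `pmin()` keeps the calculation vectorised over patients, so no explicit loop is needed.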

Exclusion criteria

  • Patients that are not registered in the database in 2022 will not be included in the analysis.
  • Patients with a fracture event recorded at any time before the study period (i.e. prevalent patients) will not be included in the analysis.

The following formula will be used to calculate the incidence rate of fracture:

$IncidenceRate = \frac{numberIncidentPatients}{TotalPersonYearsAtRisk} \times 100000$

Where the total person-years at risk will be the sum of all patients’ time at risk (in years) as defined above. Incident patients are defined as those patients experiencing a first-ever fracture in 2022. Incidence rate will be reported by gender and in total.
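
As an illustration of how this formula could be applied by gender and in total, here is a minimal base R sketch. The function name `incidence_rate`, the column names and the toy values are all hypothetical. They are not the custom functions or data used in this report.

```r
# Hypothetical helper: incidence rate per `multiplier` person-years.
incidence_rate <- function(n_incident, person_years, multiplier = 100000) {
  n_incident / person_years * multiplier
}

# Toy per-patient data (illustrative values only).
df <- data.frame(
  gender             = c("Female", "Female", "Male", "Male"),
  incident           = c(1, 0, 1, 0),            # 1 = first-ever fracture in 2022
  time_at_risk_years = c(0.5, 1.0, 0.25, 0.75)
)

# Sum incident patients and person-years within each gender.
by_gender <- aggregate(cbind(incident, time_at_risk_years) ~ gender,
                       data = df, FUN = sum)
by_gender$rate <- incidence_rate(by_gender$incident, by_gender$time_at_risk_years)

# Overall rate: sum over all patients, then divide.
overall <- incidence_rate(sum(df$incident), sum(df$time_at_risk_years))
```

The overall rate is computed from the pooled numerator and denominator, not as an average of the per-gender rates.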

2.- Steps for analysis (end-to-end)


2.1.- Load libraries and main sources

2.2.- Declare custom functions (for table shells specified in the Study Analysis Plan)

2.3.- Pre-processing and data quality. Data transformations 1

  • Ensure that the column data types are what we expect, using sapply(df_person, class). In this case the “dates” are stored as strings and need to be converted to dates.
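
A minimal sketch of this check and conversion, assuming the date columns are ISO-formatted strings. The column names below are hypothetical stand-ins for the real ones in `df_person`:

```r
# Toy person table; column names and values are assumptions for illustration.
df_person <- data.frame(
  person_id          = c(1, 2),
  registration_start = c("2019-05-01", "2021-06-15"),
  registration_end   = c("2023-03-31", "2022-10-31"),
  stringsAsFactors   = FALSE
)

sapply(df_person, class)  # the two date columns are reported as "character"

# Convert every date-like column in one step.
date_cols <- c("registration_start", "registration_end")
df_person[date_cols] <- lapply(df_person[date_cols], as.Date, format = "%Y-%m-%d")

sapply(df_person, class)  # the two date columns are now "Date"
```

Passing an explicit `format` makes values that do not match the pattern come back as `NA`, so they can be counted and inspected as part of the quality check.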

2.4.- Data transformations 2 (get study population based on the rules of the study)

QC NOTE: 114 patients have information in both the condition_ocurrence table and the person_study table. 157 patients have registration information but no condition information.

## 
## FALSE  TRUE 
##   157   114
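
A count like the one above can be produced with `%in%` and `table()`. The sketch below uses toy data. The data frame names `df_person_study` and `df_condition` are assumptions standing in for the person_study and condition tables.

```r
# Toy tables; names and values are illustrative only.
df_person_study <- data.frame(person_id = c(1, 2, 3, 4))
df_condition    <- data.frame(person_id = c(2, 2, 4))  # one row per recorded event

# TRUE when the registered patient has at least one condition record.
has_condition <- df_person_study$person_id %in% df_condition$person_id

table(has_condition)
```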

QC NOTE: At this stage we know which patients in our study population had a fracture and which did not. Patients with NA (no condition data) may or may not be used in this study. There are two options: remove the NA patients, or treat them as having no fracture/condition reported. In step 2.6 we will decide which option to take.

## 
## fracture in study period (2022)                     no fracture 
##                              81                              33 
##                            <NA> 
##                             157

2.5.- Data transformations 3. Calculate person_time_risk per patient

QC NOTE: For some patients (6), time_at_risk_years is 0 (time_at_risk_years is the difference risk_end_date - risk_start_date).

    person_id time_at_risk_years risk_start_date risk_end_date
45        884                  0      2022-12-31    2022-12-31
81       1602                  0      2022-12-29    2022-12-29
117      2122                  0      2022-12-05    2022-12-05
151      2854                  0      2022-12-30    2022-12-30
158      2976                  0      2022-12-30    2022-12-30
181      3358                  0      2022-07-15    2022-07-15
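
A minimal sketch of how the time at risk could be derived and the zero-time patients flagged. The first two rows are taken from the table above and the third is a hypothetical patient added for contrast. Dividing by 365.25 days per year is an assumption, because the report does not state which conversion it uses.

```r
df <- data.frame(
  person_id       = c(884, 3358, 1),  # person_id 1 is hypothetical
  risk_start_date = as.Date(c("2022-12-31", "2022-07-15", "2022-01-01")),
  risk_end_date   = as.Date(c("2022-12-31", "2022-07-15", "2022-12-31"))
)

# Days between start and end of risk, converted to years (365.25 is an assumption).
days_at_risk <- as.numeric(difftime(df$risk_end_date, df$risk_start_date,
                                    units = "days"))
df$time_at_risk_years <- days_at_risk / 365.25

# Patients whose risk window starts and ends on the same day.
zero_time <- df[df$time_at_risk_years == 0, ]
zero_time$person_id
```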

2.6.- Questions

  • Should we include patients with person-time equal to zero?
  • Should we include patients with no condition_ocurrence data?

The incidence rate is the number of new (incident) cases during study follow-up divided by the person-time at risk throughout the observation period. Because some patients have no follow-up time and others have no condition_ocurrence data, we will provide a set of results ‘A’ that excludes these patients, and sets of results ‘B’, ‘C’ and ‘D’ that show how the final output differs.

2.7.- Generate table shells (see results below)


3.- Results:

3.1.A Incidence rate table shells: Using custom functions

                                       Female      Male  Overall
Patients with a first-ever fracture 44.000000 32.000000  76.0000
Person-years at risk                15.980835 15.230664  31.2115
Incidence rate (per person-year)     2.753298  2.101025   2.4350

Table 3.1.A interpretation.- Incidence rate per person-year. Overall column: 109 patients meet the conditions for inclusion in the incidence rate analysis (registered in 2022, with at least 365 days of database history, non-zero follow-up (person-time > 0) and at least one event in the condition_table). The first row shows how many of these patients had a fracture in 2022 (76). The second row shows the summed person-time of these 109 patients, in years. The incidence rate is the first row divided by the second. The “incidence rate” in the last row can be multiplied by 100,000 to obtain the incidence rate per 100,000 person-years.
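
As a worked check of the Overall column in Table 3.1.A, using only the values shown in the table:

$\frac{76}{31.2115} \approx 2.435 \text{ per person-year}$

$2.435 \times 100000 = 243500 \text{ per 100,000 person-years}$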


3.1.B Incidence rate table shells: Using epiR library

        Incidence rate 95% CI lower 95% CI upper
Female        2.753298     2.000551     3.696174
Male          2.101025     1.437099     2.966020
Overall       2.435000     1.918503     3.047766

Table 3.1.B interpretation.- Incidence rate per person-year. Overall column: 109 patients meet the conditions for inclusion in the incidence rate analysis (registered in 2022, with at least 365 days of database history, non-zero follow-up (person-time > 0) and at least one event in the condition_table). The incidence rate is calculated from these patients, and the ‘epiR’ library also calculates the 95% CI limits.
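
The report does not show the ‘epiR’ call itself. As a cross-check that needs no extra packages, the sketch below computes an exact Poisson confidence interval for a rate in base R. The function name `exact_rate_ci` is hypothetical, and the assumption that the table was produced with an exact method is ours. For the Overall inputs (76 events, 31.2115 person-years) it should give limits close to those in Table 3.1.B.

```r
# Exact Poisson confidence interval for an incidence rate.
# Hypothetical helper, not the function used in the report.
exact_rate_ci <- function(events, person_years, conf_level = 0.95) {
  alpha <- 1 - conf_level
  # Chi-square formulation of the exact Poisson limits for the event count.
  lower_count <- if (events > 0) qchisq(alpha / 2, df = 2 * events) / 2 else 0
  upper_count <- qchisq(1 - alpha / 2, df = 2 * (events + 1)) / 2
  c(est   = events / person_years,
    lower = lower_count / person_years,
    upper = upper_count / person_years)
}

overall <- exact_rate_ci(events = 76, person_years = 31.2115)
overall
```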

3.2.A Plot incidence rate (‘ggplot’ library style & ‘incidence’ library style)


Figure 1.- The distribution of the weekly incidence per gender seems to be “bimodal”, so it cannot be modeled with a log-linear regression function.