Your new client is a health insurance company. After a lengthy review of their business, the insurance company has decided to prioritize improvements in medication adherence. For our initial work, we will focus on patients with heart disease and how well they take their medications.
Your team has received some modest training from a physician. Here are the basic facts you need to know. Heart disease is one of the most pervasive health problems, especially for older patients. The initial diagnosis typically occurs too late. Most patients only become aware that they have heart disease after experiencing an acute episode. This can be limited to moderate symptoms, which might be treated by either medications or a light procedure. In more severe cases, the patient might suffer a major event such as a myocardial infarction (heart attack) or need a significant surgical operation. Whether minor or major, these events often include a hospitalization. After the initial diagnosis, patients are typically prescribed a range of medications. Three primary therapies include ACE inhibitors, beta blockers, and statins.
The insurance company has helpfully compiled data on a large number of patients. They have included a number of important clinical factors about their baseline conditions. Then, starting from the time of their initial diagnoses of heart disease, the patients were tracked based upon which medications were filled at the pharmacy. The medication records are presented in the form of panel data. A single patient’s records are linked by a unique identifier. The time measurements represent the number of days since baseline. Prescriptions are typically filled for 30 or 90 days of medications. For this study, you may assume that the patients qualified for our study and reasonably could have been expected to be prescribed all of the medicines we are tracking.
In this project, you will develop an approach to working with the information. The client company has provided a list of questions they would like to address. In addition to building the report, our team would also like you to present recommendations on how to improve upon the infrastructure. We also want you to identify opportunities for the client to make use of the information you’re working with in novel ways.
This project is divided into 4 parts:
Part 1: Summarizing the data.
Part 2: Answering specific questions about medication adherence.
Part 3: Generalizing and automating the reporting infrastructure for use beyond the current version.
Part 4: Identifying opportunities.
How would you summarize the data? For each table, write 2-4 sentences with relevant information. Briefly describe what is measured in the data and provide a summary of the information. You can show a table or graphic, but keep things short.
This part of the report will be directed to your internal team at the consulting company. It is intended to document the sources of information that were used in the project. It will also describe the data in less technical terms to team members who are not data scientists. If another member of the team joins the project later, they will rely on your descriptions to gain familiarity with the data. To that end, we recommend providing some instructions that will help other consultants use the information more effectively.
The Baseline Information table has one row per patient, identified by a unique patient ID. It records whether each patient has diabetes and classifies the severity of the heart disease event at baseline. It also provides the patient’s age, sex, and geographic region.
The Adherence table links to the Baseline Information table through the same unique patient ID. Each patient has multiple rows, and each row covers a time interval with a beginning (t1) and an end (t2), measured in days since the baseline diagnosis. The binary variables ace, bb, and statin indicate whether the patient possessed a filled prescription for ACE inhibitors, beta blockers, and statins, respectively, during that interval.
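For team members new to the data, the link between the two tables can be sketched as follows. This is a minimal illustration on made-up rows, not the client's data; the ID column name `id` is an assumption, while `t1`, `t2`, `ace`, `bb`, and `statin` follow the description above.

```r
library(data.table)

# Toy stand-ins for the two tables. The column name `id` is hypothetical.
baseline <- data.table(
  id = c("p1", "p2"),
  age = c(61, 72),
  diabetes = c(0, 1)
)
adherence <- data.table(
  id = c("p1", "p1", "p2"),
  t1 = c(0, 30, 0),
  t2 = c(30, 60, 90),
  ace = c(1, 0, 1),
  bb = c(1, 1, 0),
  statin = c(1, 1, 1)
)

# Left join: one row per adherence interval, with the patient's
# baseline covariates repeated on each of that patient's rows.
combined <- merge(adherence, baseline, by = "id", all.x = TRUE)
print(combined)
```

Because the baseline covariates repeat on every interval row, patient-level summaries and models should first be reduced to one row per patient.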
In addition to your summary, our team has identified specific questions of interest. Please provide these answers in output that is easy to read (e.g. tables).
This part of the report will be directed to medical case management teams throughout the client’s company. The idea is to give them the useful information they need to act on the specific questions they posed. Plan your communication accordingly.
Notes: Using data.table, most of these calculations can be solved in a moderate number of steps. Many of the questions may require information from multiple tables. Use the merge function to combine tables as needed. HTML-friendly tables can be constructed using the datatable function in the DT package.
These questions were carefully crafted based upon the client’s needs. It is important to answer them based on what is stated. To that end, please read each question closely and answer it accordingly.
What was the median length of followup? What percentage of the patients had at least 1 year of records?
The median length of followup time for a patient is 453 days. 54.21% of patients have at least 1 year of records.
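One way to obtain figures like these is sketched below on toy data. It assumes that a patient's follow-up length is the largest t2 on record and that the ID column is called `id`; both are assumptions, not confirmed details of the client's data.

```r
library(data.table)

# Toy interval records; `id` is a hypothetical column name.
records <- data.table(
  id = c("p1", "p1", "p2", "p3", "p3"),
  t1 = c(0, 30, 0, 0, 200),
  t2 = c(30, 400, 90, 200, 365)
)

# Follow-up per patient, taken here as the latest interval end.
followup <- records[, .(days = max(t2)), by = id]

median_followup <- median(followup$days)
pct_one_year <- 100 * mean(followup$days >= 365)

print(median_followup)
print(round(pct_one_year, 2))
```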
For patients with at least 1 year of follow-up, their one-year adherence to a medication is the proportion of days in the first year after diagnosis during which the medication was possessed. For each medication, what was the average one-year adherence for the population? Use only the patients with at least 1 year of follow-up records.
## Medication Adherence
## 1: ace 61.30
## 2: bb 70.78
## 3: statin 80.10
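The one-year adherence defined in this question can be sketched as below. The example assumes each row covers the days from t1 up to (but not including) t2, so an interval contributes t2 - t1 days, and it truncates any interval that runs past day 365. The data and the `id` column name are made up for illustration.

```r
library(data.table)

# Toy interval records for one medication; `id` is a hypothetical name.
records <- data.table(
  id = c("p1", "p1", "p1", "p2", "p2"),
  t1 = c(0, 100, 300, 0, 200),
  t2 = c(100, 300, 500, 200, 365),
  ace = c(1, 0, 1, 1, 1)
)

# Patients with at least one year of follow-up.
eligible <- records[, .(days = max(t2)), by = id][days >= 365, id]

# Keep intervals that start inside the first year and cut them at day 365.
first_year <- records[id %in% eligible & t1 < 365]
first_year[, days_in_year := pmin(t2, 365) - t1]

# Share of the 365 days on which the medication was possessed.
per_patient <- first_year[, .(rate = sum(ace * days_in_year) / 365), by = id]
average_pct <- 100 * mean(per_patient$rate)

print(per_patient)
print(round(average_pct, 2))
```

In this toy case the first patient possesses the medication on 100 + 65 = 165 of 365 days and the second on all 365 days.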
How many medications are the patients taking? For patients with at least one year of follow-up, use their records during the first year after the initial diagnosis. Calculate the overall percentage distribution of the days that the patients are taking 0, 1, 2, and all 3 medications.
## Medications Proportion
## 1: 0 0.04
## 2: 1 0.20
## 3: 2 0.38
## 4: 3 0.38
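A distribution of this kind weights each interval by its length in days rather than counting rows. The sketch below shows the idea on toy data, under the same assumptions as before (intervals of t2 - t1 days, hypothetical `id` column).

```r
library(data.table)

# Toy interval records; `id` is a hypothetical column name.
records <- data.table(
  id = c("p1", "p1", "p2"),
  t1 = c(0, 100, 0),
  t2 = c(100, 365, 400),
  ace = c(1, 0, 1),
  bb = c(1, 0, 1),
  statin = c(1, 1, 0)
)

eligible <- records[, .(days = max(t2)), by = id][days >= 365, id]
first_year <- records[id %in% eligible & t1 < 365]

# Days contributed by each interval within the first year.
first_year[, days := pmin(t2, 365) - t1]
# Number of medications possessed during the interval (0 to 3).
first_year[, n_meds := ace + bb + statin]

# Total patient-days at each medication count, then the share of all days.
dist <- first_year[, .(days = sum(days)), keyby = n_meds]
dist[, proportion := days / sum(days)]
print(dist)
```

A medication count that never occurs (0 in this toy example) simply has no row, so a production version might add the missing levels explicitly.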
What is the impact of age, sex, region, diabetes, and baseline condition on the one-year adherence to each medication? Use only the patients with at least 1 year of follow-up records. Fit separate linear regression models for each medicine. Then briefly comment on the results.
##
## Call:
## lm(formula = adherence ~ age + sex + region + diabetes + baseline.condition,
## data = q4_dat_ace)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.69834 -0.15450 -0.01105 0.14852 0.52164
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 0.9170122 0.0154077
## age -0.0038342 0.0002301
## sexMale -0.0186314 0.0027533
## regionNortheast 0.0249788 0.0039741
## regionSouth 0.0048107 0.0044613
## regionWest 0.0300116 0.0038461
## diabetes 0.0477187 0.0032497
## baseline.conditionmoderate symptoms or light procedure -0.1142654 0.0028932
## t value Pr(>|t|)
## (Intercept) 59.517 < 2e-16 ***
## age -16.666 < 2e-16 ***
## sexMale -6.767 1.34e-11 ***
## regionNortheast 6.285 3.32e-10 ***
## regionSouth 1.078 0.281
## regionWest 7.803 6.26e-15 ***
## diabetes 14.684 < 2e-16 ***
## baseline.conditionmoderate symptoms or light procedure -39.495 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2243 on 26593 degrees of freedom
## Multiple R-squared: 0.07452, Adjusted R-squared: 0.07428
## F-statistic: 305.9 on 7 and 26593 DF, p-value: < 2.2e-16
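Living in the Northeast or West region and having diabetes are associated with significantly higher one-year adherence to ACE inhibitors, while older age, male sex, and a baseline condition of moderate symptoms or light procedure are associated with significantly lower adherence. Baseline condition has the largest effect: patients with moderate symptoms or a light procedure average about 11 percentage points lower adherence than the reference group. The South region does not differ significantly from the reference region, and the model explains only about 7% of the variance in adherence.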
##
## Call:
## lm(formula = adherence ~ age + sex + region + diabetes + baseline.condition,
## data = q4_dat_bb)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.78888 -0.14383 0.01898 0.15975 0.43316
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 1.0469036 0.0145373
## age -0.0043447 0.0002171
## sexMale -0.0325345 0.0025991
## regionNortheast 0.0207448 0.0037467
## regionSouth -0.0087552 0.0042092
## regionWest 0.0120076 0.0036288
## diabetes 0.0340711 0.0030718
## baseline.conditionmoderate symptoms or light procedure -0.0868558 0.0027328
## t value Pr(>|t|)
## (Intercept) 72.015 < 2e-16 ***
## age -20.015 < 2e-16 ***
## sexMale -12.517 < 2e-16 ***
## regionNortheast 5.537 3.11e-08 ***
## regionSouth -2.080 0.037533 *
## regionWest 3.309 0.000938 ***
## diabetes 11.091 < 2e-16 ***
## baseline.conditionmoderate symptoms or light procedure -31.782 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2125 on 26780 degrees of freedom
## Multiple R-squared: 0.06041, Adjusted R-squared: 0.06016
## F-statistic: 246 on 7 and 26780 DF, p-value: < 2.2e-16
Living in the Northeast or West region and having diabetes are associated with significantly higher one-year adherence to beta blockers. Older age, male sex, living in the South region (p = 0.038), and a baseline condition of moderate symptoms or light procedure are associated with significantly lower adherence. Baseline condition again has the largest effect, at about 9 percentage points, and the model explains about 6% of the variance.
##
## Call:
## lm(formula = adherence ~ age + sex + region + diabetes + baseline.condition,
## data = q4_dat_st)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.87019 -0.09046 0.01311 0.13374 0.28313
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 1.0152454 0.0113606
## age -0.0026537 0.0001696
## sexMale -0.0118672 0.0020364
## regionNortheast 0.0045849 0.0029354
## regionSouth -0.0068873 0.0032941
## regionWest -0.0003789 0.0028410
## diabetes 0.0282418 0.0024103
## baseline.conditionmoderate symptoms or light procedure -0.0646760 0.0021437
## t value Pr(>|t|)
## (Intercept) 89.366 < 2e-16 ***
## age -15.647 < 2e-16 ***
## sexMale -5.828 5.69e-09 ***
## regionNortheast 1.562 0.1183
## regionSouth -2.091 0.0366 *
## regionWest -0.133 0.8939
## diabetes 11.717 < 2e-16 ***
## baseline.conditionmoderate symptoms or light procedure -30.170 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1673 on 27034 degrees of freedom
## Multiple R-squared: 0.04649, Adjusted R-squared: 0.04625
## F-statistic: 188.3 on 7 and 27034 DF, p-value: < 2.2e-16
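Having diabetes is associated with significantly higher one-year adherence to statins. Older age, male sex, living in the South region (p = 0.037), and a baseline condition of moderate symptoms or light procedure are associated with significantly lower adherence. The Northeast and West regions do not differ significantly from the reference region. Baseline condition has the largest effect, at about 6.5 percentage points, and the model explains under 5% of the variance.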
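The three models share one formula, so they can be fit in a loop and their coefficients compared side by side. The sketch below uses simulated data with made-up effect sizes and only a subset of the covariates; the variable names `ace`, `bb`, and `statin` for the per-patient adherence columns are assumptions. It also shows how to read a coefficient on the adherence scale: in the ACE inhibitor output above, the age coefficient of -0.0038 corresponds to about 3.8 percentage points lower adherence per ten years of age.

```r
set.seed(1)
n <- 500

# Simulated patient-level data; values and effect sizes are made up.
patients <- data.frame(
  age = round(runif(n, 40, 90)),
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  diabetes = rbinom(n, 1, 0.3)
)

# Hypothetical helper: simulate an adherence proportion clipped to [0, 1].
simulate_adherence <- function(intercept) {
  raw <- intercept - 0.004 * patients$age + 0.04 * patients$diabetes +
    rnorm(n, sd = 0.2)
  pmin(1, pmax(0, raw))
}
patients$ace <- simulate_adherence(0.90)
patients$bb <- simulate_adherence(1.00)
patients$statin <- simulate_adherence(1.05)

# Fit the same linear model once per medication.
medications <- c(ace = "ace", bb = "bb", statin = "statin")
fits <- lapply(medications, function(med) {
  lm(reformulate(c("age", "sex", "diabetes"), response = med), data = patients)
})

# One column of coefficients per medication.
coef_table <- sapply(fits, coef)
print(round(coef_table, 4))

# Change in adherence, in percentage points, per ten additional years of age.
print(round(100 * 10 * coef_table["age", ], 1))
```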
For each medicine, what percentage of the patients filled a prescription in the first two weeks after their initial diagnoses?
## Medications Percentage
## 1: ace 59.17
## 2: bb 66.00
## 3: statin 79.80
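A sketch of how such initiation percentages could be computed is shown below on toy data. It assumes the first fill date is the t1 of the earliest interval with possession, and that "the first two weeks" means a first fill on or before day 14; the cutoff convention and the `id` column name are assumptions.

```r
library(data.table)

# Toy interval records for two medications; `id` is a hypothetical name.
records <- data.table(
  id = c("p1", "p1", "p2", "p2", "p3"),
  t1 = c(0, 10, 0, 20, 0),
  t2 = c(10, 40, 20, 50, 30),
  ace = c(0, 1, 0, 1, 1),
  statin = c(1, 1, 0, 0, 0)
)
meds <- c("ace", "statin")

# Long format: one row per interval and medication.
long <- melt(records, id.vars = c("id", "t1", "t2"), measure.vars = meds,
             variable.name = "medication", value.name = "possessed")

# Day of the first interval in which each medication was possessed.
first_fill <- long[possessed == 1, .(first_day = min(t1)),
                   by = .(id, medication)]

# Share of all patients who initiated on or before day 14.
n_patients <- uniqueN(records$id)
pct <- first_fill[first_day <= 14,
                  .(percentage = 100 * .N / n_patients), by = medication]
print(pct)
```

The denominator is all patients, including those who never filled the medication at all.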
Now let’s compare those who filled a prescription for a statin in the first two weeks after diagnosis to those who did not. Do these two groups have different baseline covariates? Compare the groups based on their ages. Then compare the distribution of baseline conditions in the two groups. For continuous variables, compare their means using a t-test. For the categorical variables, compare their distributions using a chi-squared test.
##
## Welch Two Sample t-test
##
## data: age_yes and age_no
## t = -13.822, df = 20634, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.8687130 -0.6529347
## sample estimates:
## mean of x mean of y
## 64.68361 65.44443
The t-test indicates a statistically significant difference in mean age between patients who filled a statin prescription in the first two weeks and those who did not. The mean age is 64.68 for patients who did and 65.44 for patients who did not, a difference of about 0.76 years (95% confidence interval 0.65 to 0.87), so the difference is small in practical terms.
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: baseline_conditions
## X-squared = 776.59, df = 1, p-value < 2.2e-16
The chi-squared test indicates that there is a statistically significant difference in the distribution of baseline conditions between patients who filled a statin prescription in the first two weeks and those who did not.
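The two comparisons can be reproduced in outline as below. The data are simulated, and the grouping variable `early_statin` and the label used for the more severe baseline condition are assumptions made for illustration.

```r
set.seed(42)
n <- 400

# Simulated patient-level data; group sizes and values are made up.
early_statin <- rbinom(n, 1, 0.8)  # 1 = filled a statin within two weeks
age <- round(rnorm(n, mean = 65, sd = 10))
baseline.condition <- sample(
  c("major heart attack or operation", "moderate symptoms or light procedure"),
  n, replace = TRUE
)

# Continuous covariate: Welch two-sample t-test on mean age.
age_test <- t.test(age[early_statin == 1], age[early_statin == 0])

# Categorical covariate: chi-squared test on the 2 x 2 table of counts.
condition_table <- table(early_statin, baseline.condition)
condition_test <- chisq.test(condition_table)

print(age_test$estimate)
print(condition_table)
print(condition_test$p.value)
```

With groups this large, very small differences can be statistically significant, so the group means and the table of counts are worth reporting alongside the p-values.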
How do the variables of age, sex, region, diabetes, and baseline condition impact the likelihood of initiating a medication within 14 days? For each medicine, fit a logistic regression model and comment on the results.
##
## Call:
## glm(formula = ace_14 ~ age + sex + region + diabetes + baseline.condition,
## family = "binomial", data = q7_ace)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7961 -1.3583 0.8248 0.9723 1.1598
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 1.800170 0.078078
## age -0.014283 0.001164
## sexMale -0.064694 0.013822
## regionNortheast 0.069831 0.019856
## regionSouth -0.022910 0.022247
## regionWest 0.096315 0.019243
## diabetes 0.179836 0.016415
## baseline.conditionmoderate symptoms or light procedure -0.471113 0.014793
## z value Pr(>|z|)
## (Intercept) 23.056 < 2e-16 ***
## age -12.269 < 2e-16 ***
## sexMale -4.681 2.86e-06 ***
## regionNortheast 3.517 0.000437 ***
## regionSouth -1.030 0.303088
## regionWest 5.005 5.58e-07 ***
## diabetes 10.956 < 2e-16 ***
## baseline.conditionmoderate symptoms or light procedure -31.847 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 121213 on 93606 degrees of freedom
## Residual deviance: 119862 on 93599 degrees of freedom
## AIC: 119878
##
## Number of Fisher Scoring iterations: 4
Presence of diabetes has a significant positive relationship with 14-day initiation of ACE inhibitors, while higher age and male sex (compared with female) have significant negative relationships. The Northeast and West regions show significantly higher odds of 14-day initiation than the reference region, and the South does not differ significantly. A baseline condition of moderate symptoms or light procedure is associated with significantly lower odds of initiation than a major heart attack or operation baseline.
##
## Call:
## glm(formula = bb_14 ~ age + sex + region + diabetes + baseline.condition,
## family = "binomial", data = q7_bb)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9790 -1.4349 0.7570 0.8579 1.0990
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 2.64699 0.08278
## age -0.02219 0.00123
## sexMale -0.18039 0.01459
## regionNortheast 0.11342 0.02101
## regionSouth -0.04510 0.02333
## regionWest 0.06662 0.02024
## diabetes 0.17536 0.01737
## baseline.conditionmoderate symptoms or light procedure -0.43663 0.01571
## z value Pr(>|z|)
## (Intercept) 31.975 < 2e-16 ***
## age -18.042 < 2e-16 ***
## sexMale -12.366 < 2e-16 ***
## regionNortheast 5.399 6.72e-08 ***
## regionSouth -1.933 0.053244 .
## regionWest 3.292 0.000995 ***
## diabetes 10.094 < 2e-16 ***
## baseline.conditionmoderate symptoms or light procedure -27.790 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 112271 on 93606 degrees of freedom
## Residual deviance: 110870 on 93599 degrees of freedom
## AIC: 110886
##
## Number of Fisher Scoring iterations: 4
Presence of diabetes has a significant positive relationship with 14-day initiation of beta blockers, while higher age and male sex (compared with female) have significant negative relationships. The Northeast and West regions show significantly higher odds of 14-day initiation than the reference region; the South shows lower odds, but the difference falls just short of significance at the 5% level (p = 0.053). A baseline condition of moderate symptoms or light procedure is associated with significantly lower odds of initiation than a major heart attack or operation baseline.
##
## Call:
## glm(formula = st_14 ~ age + sex + region + diabetes + baseline.condition,
## family = "binomial", data = q7_st)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3249 0.4732 0.5680 0.6445 0.8266
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 3.455115 0.101840
## age -0.022406 0.001507
## sexMale -0.097113 0.017882
## regionNortheast 0.073572 0.025790
## regionSouth -0.020082 0.028687
## regionWest 0.032474 0.024802
## diabetes 0.221175 0.021730
## baseline.conditionmoderate symptoms or light procedure -0.557409 0.020051
## z value Pr(>|z|)
## (Intercept) 33.927 < 2e-16 ***
## age -14.869 < 2e-16 ***
## sexMale -5.431 5.61e-08 ***
## regionNortheast 2.853 0.00434 **
## regionSouth -0.700 0.48392
## regionWest 1.309 0.19042
## diabetes 10.179 < 2e-16 ***
## baseline.conditionmoderate symptoms or light procedure -27.800 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 82949 on 93606 degrees of freedom
## Residual deviance: 81781 on 93599 degrees of freedom
## AIC: 81797
##
## Number of Fisher Scoring iterations: 4
Presence of diabetes has a significant positive relationship with 14-day initiation of statins, while higher age and male sex (compared with female) have significant negative relationships. Only the Northeast region shows significantly higher odds of 14-day initiation than the reference region; the South and West do not differ significantly. A baseline condition of moderate symptoms or light procedure is associated with significantly lower odds of initiation than a major heart attack or operation baseline.
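Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios that are easier to communicate. For example, the baseline condition coefficient of -0.471 in the ACE inhibitor model corresponds to an odds ratio of about exp(-0.471) ≈ 0.62. The sketch below fits a reduced model on simulated data with made-up effect sizes and reports odds ratios with Wald confidence intervals.

```r
set.seed(7)
n <- 1000

# Simulated patient-level data; effect sizes are made up.
d <- data.frame(
  age = round(runif(n, 40, 90)),
  diabetes = rbinom(n, 1, 0.3)
)
log_odds <- 1.8 - 0.014 * d$age + 0.18 * d$diabetes
d$ace_14 <- rbinom(n, 1, plogis(log_odds))  # 1 = initiated within 14 days

fit <- glm(ace_14 ~ age + diabetes, family = binomial, data = d)

# Odds ratios with 95% Wald confidence intervals.
odds_ratios <- exp(coef(fit))
ci <- exp(confint.default(fit))
print(round(cbind(odds_ratio = odds_ratios, ci), 3))

# Converting a reported coefficient by hand.
print(round(exp(-0.471113), 3))
```

An odds ratio below 1 means lower odds of initiating within 14 days; an odds ratio above 1 means higher odds.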
For those who did fill their prescriptions within 2 weeks, how long does it typically take to fill that first prescription after the initial diagnosis? For each medicine, provide the mean, median, and standard deviation in units of days.
## Measure Days
## 1: mean 1.670801
## 2: median 0.000000
## 3: sd 2.721382
## Measure Days
## 1: mean 1.729408
## 2: median 0.000000
## 3: sd 2.756269
## Measure Days
## 1: mean 1.747807
## 2: median 0.000000
## 3: sd 2.751965
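Summaries like the three tables above can be produced in one grouped step once the first fill day per patient and medication is known. The sketch below assumes a hypothetical table `first_fill` with columns `id`, `medication`, and `first_day`, and the same day-14 cutoff convention.

```r
library(data.table)

# Toy first-fill days; the table and column names are hypothetical.
first_fill <- data.table(
  id = c("p1", "p2", "p3", "p1", "p2"),
  medication = c("ace", "ace", "ace", "statin", "statin"),
  first_day = c(0, 3, 40, 0, 9)
)

# Restrict to patients who filled within two weeks, then summarize by medicine.
summary_tab <- first_fill[first_day <= 14,
  .(mean_days = mean(first_day),
    median_days = median(first_day),
    sd_days = sd(first_day)),
  by = medication]
print(summary_tab)
```

Labeling each row with its medication avoids the ambiguity of separate unlabeled tables.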
How does filling a prescription in the first two weeks impact adherence? If we want to see that a medicine is working, we need to start the observation after the patient has had a chance to fill the prescription. To answer this question, we will follow a number of steps:
Identify which patients filled a prescription in the first two weeks.
Then, for each patient with at least 379 days of followup, measure the one-year adherence rate (see Question 2) starting at two weeks after the initial diagnosis. This interval will begin at day 14 and last for 365 days.
Fit a linear regression model of this one-year adherence including the baseline covariates (age, sex, region, diabetes, baseline condition) and an indicator of whether this patient filled a prescription for the medicine in the first two weeks.
Perform this analysis for each medicine and comment on the results.
##
## Call:
## lm(formula = rate ~ age + sex + region + diabetes + baseline.condition +
## ace_14, data = q9_ace_ba)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.13917 -0.06836 -0.05776 0.09274 0.88662
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 0.1432542 0.0072698
## age 0.0001263 0.0001071
## sexMale -0.0015130 0.0012864
## regionNortheast -0.0005469 0.0018526
## regionSouth 0.0006000 0.0020791
## regionWest -0.0003283 0.0017938
## diabetes -0.0033896 0.0015240
## baseline.conditionmoderate symptoms or light procedure 0.0017837 0.0013631
## ace_14 -0.0113595 0.0013211
## t value Pr(>|t|)
## (Intercept) 19.705 <2e-16 ***
## age 1.179 0.2384
## sexMale -1.176 0.2395
## regionNortheast -0.295 0.7678
## regionSouth 0.289 0.7729
## regionWest -0.183 0.8548
## diabetes -2.224 0.0261 *
## baseline.conditionmoderate symptoms or light procedure 1.309 0.1907
## ace_14 -8.598 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1045 on 26436 degrees of freedom
## Multiple R-squared: 0.003394, Adjusted R-squared: 0.003093
## F-statistic: 11.25 on 8 and 26436 DF, p-value: 4.891e-16
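The regression indicates that having diabetes (p = 0.026) and filling an ACE inhibitor prescription in the first two weeks (p < 2e-16) are associated with significantly lower one-year ACE inhibitor adherence measured from day 14. Both effects are small: early fillers average about 1.1 percentage points lower adherence, and the model explains under 1% of the variance. The fitted adherence rates (around 0.14) are also far below the one-year averages of 61% to 80% reported in Question 2, so the rate calculation behind this model should be rechecked before the result is acted on.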
##
## Call:
## lm(formula = rate ~ age + sex + region + diabetes + baseline.condition +
## bb_14, data = q9_bb_ba)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.13784 -0.06735 -0.05922 0.09260 0.88850
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 0.1405686 0.0073247
## age 0.0001328 0.0001073
## sexMale -0.0016558 0.0012887
## regionNortheast -0.0005588 0.0018545
## regionSouth 0.0005346 0.0020810
## regionWest -0.0006381 0.0017949
## diabetes -0.0036794 0.0015249
## baseline.conditionmoderate symptoms or light procedure 0.0024229 0.0013621
## bb_14 -0.0070578 0.0013720
## t value Pr(>|t|)
## (Intercept) 19.191 < 2e-16 ***
## age 1.237 0.2160
## sexMale -1.285 0.1988
## regionNortheast -0.301 0.7632
## regionSouth 0.257 0.7973
## regionWest -0.356 0.7222
## diabetes -2.413 0.0158 *
## baseline.conditionmoderate symptoms or light procedure 1.779 0.0753 .
## bb_14 -5.144 2.71e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1046 on 26436 degrees of freedom
## Multiple R-squared: 0.001607, Adjusted R-squared: 0.001304
## F-statistic: 5.317 on 8 and 26436 DF, p-value: 1.086e-06
The regression indicates that having diabetes (p = 0.016) and filling a beta blocker prescription in the first two weeks (p < 0.001) are associated with significantly lower one-year beta blocker adherence measured from day 14. The early-fill effect is small, at about 0.7 percentage points.
##
## Call:
## lm(formula = rate ~ age + sex + region + diabetes + baseline.condition +
## st_14, data = q9_st_ba)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.13391 -0.06621 -0.05991 0.09205 0.89399
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 0.1310451 0.0073944
## age 0.0001754 0.0001073
## sexMale -0.0013346 0.0012882
## regionNortheast -0.0007764 0.0018551
## regionSouth 0.0005932 0.0020820
## regionWest -0.0007285 0.0017957
## diabetes -0.0040491 0.0015259
## baseline.conditionmoderate symptoms or light procedure 0.0032844 0.0013645
## st_14 0.0019017 0.0016132
## t value Pr(>|t|)
## (Intercept) 17.722 < 2e-16 ***
## age 1.634 0.10218
## sexMale -1.036 0.30020
## regionNortheast -0.419 0.67557
## regionSouth 0.285 0.77571
## regionWest -0.406 0.68497
## diabetes -2.654 0.00797 **
## baseline.conditionmoderate symptoms or light procedure 2.407 0.01609 *
## st_14 1.179 0.23848
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1046 on 26436 degrees of freedom
## Multiple R-squared: 0.0006597, Adjusted R-squared: 0.0003573
## F-statistic: 2.181 on 8 and 26436 DF, p-value: 0.02577
The regression indicates that having diabetes is associated with significantly lower one-year statin adherence measured from day 14 (p = 0.008), and that a baseline condition of moderate symptoms or light procedure is associated with significantly higher adherence (p = 0.016). Filling a statin prescription in the first two weeks is not significantly associated with later adherence (p = 0.238).
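The adherence rate used in these models covers the 365 days that begin at day 14, which requires clipping each interval to the window from day 14 to day 379. A minimal sketch of that calculation is below. The function name `window_adherence`, its arguments, and the `id` column are hypothetical, and intervals are again assumed to cover t2 - t1 days.

```r
library(data.table)

# Hypothetical helper: adherence to one medication over a fixed window.
window_adherence <- function(records, med, from = 14, window_days = 365) {
  to <- from + window_days
  # Only patients followed through the end of the window qualify.
  eligible <- records[, .(days = max(t2)), by = id][days >= to, id]
  out <- records[id %in% eligible]
  # Days of each interval that fall inside [from, to); zero if none do.
  out[, overlap := pmax(0, pmin(t2, to) - pmax(t1, from))]
  out[, .(rate = sum(overlap * get(med)) / window_days), by = id]
}

# Toy data: p1 is followed to day 400, p2 only to day 300.
records <- data.table(
  id = c("p1", "p1", "p1", "p2"),
  t1 = c(0, 14, 200, 0),
  t2 = c(14, 200, 400, 300),
  ace = c(0, 1, 0, 1)
)

rates <- window_adherence(records, "ace")
print(rates)
```

Here p1 possesses the medication from day 14 to day 200, giving 186 of 365 days, and p2 is excluded for having fewer than 379 days of follow-up. Rates computed this way should be on the same 0 to 1 scale as the one-year adherence from Question 2.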
Once a patient starts a medication, how long do they continuously have a filled prescription? For each patient who filled a medication, start with the first filled prescription and count the duration of days until a gap occurs or follow-up ends. Then provide the mean, median, and standard deviation for these durations. Do this separately for each medicine.
## Measure Days
## 1: mean 22.55261
## 2: median 20.00000
## 3: sd 26.40845
## Measure Days
## 1: mean 22.97599
## 2: median 21.00000
## 3: sd 27.15116
## Measure Days
## 1: mean 25.12437
## 2: median 25.00000
## 3: sd 28.67346
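The duration of the first continuous spell of possession can be sketched as below. The example assumes that each patient's rows are sorted by t1 and cover consecutive time without holes, so a gap appears as a row with the medication flag equal to 0. The function name `first_run_length` and the `id` column are hypothetical.

```r
library(data.table)

# Hypothetical helper: days from the first fill until the first gap
# or the end of follow-up. Assumes rows are sorted by t1 and contiguous.
first_run_length <- function(t1, t2, possessed) {
  first_row <- match(1, possessed)       # first interval with possession
  if (is.na(first_row)) return(NA_real_) # medication never filled
  gaps <- which(possessed == 0)
  gaps_after <- gaps[gaps > first_row]
  last_row <- if (length(gaps_after) > 0) min(gaps_after) - 1 else length(possessed)
  t2[last_row] - t1[first_row]
}

# Toy data: p1 has a gap after 30 days, p2 never has a gap, p3 never fills.
records <- data.table(
  id = c("p1", "p1", "p1", "p1", "p2", "p2", "p3"),
  t1 = c(0, 5, 35, 50, 0, 90, 0),
  t2 = c(5, 35, 50, 80, 90, 120, 60),
  statin = c(0, 1, 0, 1, 1, 1, 0)
)
setorder(records, id, t1)

durations <- records[, .(duration = first_run_length(t1, t2, statin)), by = id]
print(durations)

# Summary over patients who filled the medication at least once.
print(durations[!is.na(duration),
                .(mean = mean(duration), median = median(duration),
                  sd = sd(duration))])
```

Spells that end because follow-up ends are censored rather than true discontinuations, which is worth stating when these durations are reported.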
This part of the report will be directed internally to your team’s engagement manager. The idea is to present these approaches to your team. The work will then be conveyed to the client’s technical team and middle managers who are working closely with you on the project. Plan your communication accordingly.
Did you see any problems with the data set? If so, whom would you report them to, and what would you do to address them? What would be different about the next version of the data?
One problem I see with the data set is that each patient is assigned only one baseline condition. This may make the analysis easier, but it can lead to inaccuracies because we cannot see changes in condition over time. For instance, if a patient whose baseline is classified as moderate symptoms later has a major heart attack or operation during the course of treatment, their medication plan and adherence may change, but we would still group them with their original category.
Another limitation of this data set is that time is recorded only as days since each patient’s diagnosis, not as a calendar date. This makes it difficult to track and analyze aggregate trends over calendar time.
To address these limitations, I would find out who owns these data sets within the company, and report my findings to them. In order to perform more comprehensive analyses over time, I would suggest and facilitate adding additional columns for start date and current (at the time of record) condition in the next version of the data.
If the organization wants to monitor this kind of information over time, what would they need to provide, and at what frequency?
If the organization wants to monitor medication adherence over time, they would need to start by establishing the expectation or goal for each medication. For each patient, it would be beneficial to record if and when they are first prescribed a given medication, for how long, and at what frequency they are supposed to take it. We would also collect data on the days on which they actually fill each prescription, and therefore the days they are in possession of the medication, until their treatment plan is complete. We can then compare actual against planned possession for each patient and each medication, and monitor this continuously to spot trends and take targeted actions to improve adherence.
How would you build on the reporting capabilities that you have created? What would you design next?
To build on these reporting capabilities, I would like to analyze the interactions between ACE inhibitors, beta blockers, and statins. In this project, we looked at each medication individually, but in reality, it may be important to understand how taking multiple medications concurrently and/or in sequence affects the trends and drivers of adherence.
Additionally, I would like to collect and analyze the reasons for lags or other events in the panel data. For instance, if the medications are expensive and patients cannot always afford to refill them on time, that should be reflected in adherence reports. It would also be interesting to collect the prices for each medication and see if this has an impact on adherence rate differences between the medications and between demographic groups as well as if changes in prices correlate with changes in adherence rates over time.
I would also like to record and analyze health results of the patients we are collecting the medication logs for. This would help us to see how patient demographics, prescriptions and adherence rates impact negative outcomes like heart attacks and deaths as well as positive outcomes like successful completion of treatment and long-term heart stability. With this information, we would be able to ensure that we are making decisions that best target the well-being of patients and the overall goals of the company.
This part of the report will be directed externally to your client’s senior leadership. Your work will help to determine the future direction of the project and the company’s contract with this client. Plan your communication accordingly.
What are some opportunities to learn valuable information and inform strategic decisions? List a number of questions that you might explore.
How does prescription adherence rate change over time? Are there trends in patient refill behavior over the course of treatment?
How often are gaps in medication adherence due to the patient not showing up versus the pharmacy not having prescriptions ready?
If some patients have reminders or notifications for pickups, is there a significant difference in adherence between those who do and those who do not? What are the most effective methods of increasing adherence?
To what extent does medication adherence impact the health effectiveness of each medication?
What kind of interventions would you build to help improve medication adherence? Which populations would you work with? How would you help them?
I would like to build a dashboard to continuously monitor adherence for each patient’s medications in real time in addition to analyzing and reporting on it retroactively. For patients who use mobile and online applications, we could develop an interface that shows them which medications they are prescribed, when they last picked up, and how many days they have left, so that they are aware of their supply and notified to refill before they run out. For patients who are older or otherwise less inclined to use an app, we could implement something similar on a periodic basis via an automated phone calling system.
The data reveal that populations who have had moderate symptoms or a light procedure have significantly lower one-year medication adherence than populations who started treatment with a major heart attack or operation. We want to help them by ensuring they properly adhere to their prescriptions to reduce the risk of their conditions worsening to anything more severe. This group can be a priority to target with the refill monitoring and notification system, and then we can perform further analysis to detect if there is an improvement in their adherence as a result of the system.
How would you approach other decisionmakers within the organization to assess their priorities and help them better utilize the available information?
There are additional stakeholders in the company beyond the medical case management teams who could benefit from this data if we implement the right processes to address their interests. For example, the decision maker in charge of customer engagement may be able to adopt an individualized, prescriptive approach to customer notifications and alerts to improve medication adherence based on customers’ past behavior and aggregate trends. Another possibility is that we could identify the decision maker in charge of negotiating medication prices and policies with the pharmaceutical companies and utilize this data to better predict demand and optimize accordingly.
Once I have identified the decision makers, I would meet with each of them to understand what they care about and how their business area’s success is measured. After discussing their business goals, I would learn about their current day-to-day processes for retrieving the information they need to get their work done. From there, we can identify pain points and time-consuming steps that could be streamlined with a reproducible reporting process. Knowing what they want to do, I can begin to build potential solutions and work with the decision makers along the way to refine them.
To establish credibility and to give decision makers some starting ideas, I could share the prior work I’ve done with other areas of the client’s organization and discuss the outcomes and impact of those projects.
Video Submission: Make a 2-minute pitch to the client with a proposal for the next phase of work. Include in your request a budget, a time frame, and staffing levels. Explain why this proposal would be valuable for the client and worth the investment in your consulting services. Please submit this answer as a short video recording. You may use any video recording program you feel comfortable with. The only requirements are that you are visible and audible in the video. You may also submit a file of slides if that is part of your pitch.