author: Sam Weiss date: 4/4/2016 autosize: true
The goal is to figure out which drugs people switch from and to, and why.
Here I present a three-step formulation:
Label the data: find which texts are associated with switching, and whether a drug was stopped or started.
Extract text features from the data. These can be thought of as topics.
Model the relation between the labels and the extracted features.
The data sent to me was very messy: some of the text was about drug companies' financials, and some was in other languages. I decided to scrape reviews from WebMD for each drug provided, which gave a standard dataset to work with (data enclosed in the email). This captured ~2,700 reviews/posts.
I used spaCy, an NLP library, to build a dependency tree for the words within each sentence. With it I can correctly classify a sentence like:
FALSE [1] " doctor stopped byetta was put on januvia and glimepiride mgs"
FALSE       relation  first        second                      sentence
FALSE 11480 nsubj     doctor       stopped                     916
FALSE 11481 ROOT      stopped      ROOT*stopped                916
FALSE 11482 nsubjpass byetta       put-stopped                 916
FALSE 11483 auxpass   was          put-stopped                 916
FALSE 11484 ccomp     put          stopped                     916
FALSE 11485 prep      on           put-stopped                 916
FALSE 11486 pobj      januvia      on-put-stopped              916
FALSE 11487 cc        and          januvia-on-put-stopped      916
FALSE 11488 compound  glimepiride  mgs-januvia-on-put-stopped  916
FALSE 11489                        mgs-januvia-on-put-stopped  916
FALSE 11490 conj      mgs          januvia-on-put-stopped      916
From this you can see that Byetta’s chain of head words is only “put-stopped”, while Januvia’s is “on-put-stopped”. The extra “on” separates the drug that was stopped (Byetta) from the drug that was started (Januvia), which allows me to correctly label the transition from one drug to another. This is very powerful, and I think a lot more can be done to determine the structure of a sentence. For now I’ll just use it for labelling purposes.
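A minimal sketch of this labelling idea in plain Python, using the head words read off the dependency table for the example sentence. The drug lexicon and the rule "`on` in the chain means started" are assumptions made for illustration; the actual rule set used on the full data is not shown here.

```python
# Token -> head word, copied from the dependency table for the example
# sentence. The root ("stopped") has no head. Keying by word assumes each
# word appears only once in the sentence.
HEADS = {
    "doctor": "stopped",
    "stopped": None,
    "byetta": "put",
    "was": "put",
    "put": "stopped",
    "on": "put",
    "januvia": "on",
    "and": "januvia",
    "glimepiride": "mgs",
    "mgs": "januvia",
}

# Assumed drug lexicon for this one sentence.
DRUGS = {"byetta", "januvia", "glimepiride"}


def ancestor_path(word, heads):
    """Return the chain of head words from `word` up to the root."""
    path = []
    head = heads[word]
    while head is not None:
        path.append(head)
        head = heads[head]
    return path


def label_drug(word, heads):
    """Hypothetical rule: under 'stopped', an 'on' in the chain means started."""
    path = ancestor_path(word, heads)
    if "stopped" not in path:
        return "unknown"
    return "started" if "on" in path else "stopped"


paths = {drug: "-".join(ancestor_path(drug, HEADS)) for drug in sorted(DRUGS)}
labels = {drug: label_drug(drug, HEADS) for drug in sorted(DRUGS)}
```

Here `paths` reproduces the "second" column of the table for the three drug names, and `labels` turns each chain into a stopped/started label.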
The first question was which drugs people switch from and to. Below is a Sankey diagram of this, showing which drugs a consumer stops (on the left) and starts (on the right).
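The link counts behind such a diagram can be tallied roughly as follows. This is only a sketch: the four (stopped, started) pairs are read off three of the posts quoted in this deck for illustration and are not the full set of labelled posts, and the variable names are my own.

```python
from collections import Counter

# (stopped, started) pairs; illustrative only, not the real labelled data.
transitions = [
    ("byetta", "januvia"),
    ("byetta", "glimepiride"),
    ("byetta", "victoza"),
    ("januvia", "actos"),
]

# One Sankey link per distinct (stopped, started) pair, weighted by count.
link_counts = Counter(transitions)

# Flatten into parallel source/target/value lists, a shape that Sankey
# plotting tools commonly accept.
sources = [stopped for stopped, _started in link_counts]
targets = [started for _stopped, started in link_counts]
values = list(link_counts.values())

# Totals per side: how often each drug is stopped (left) or started (right).
stopped_totals = Counter(stopped for stopped, _ in transitions)
started_totals = Counter(started for _, started in transitions)
```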
However, only about 90 posts mentioned switching to a different drug by name. I can’t do much statistics on 90 observations, so I can’t answer the question directly. Instead, I tried to answer a different question: why do people start and stop specific drugs?
The methodology is to extract features and correlate them with whether a person starts or stops taking a certain drug. I used a method called “Latent Dirichlet Allocation” (LDA), a kind of “Topic Model”, to extract features. It assumes each sentence is a mixture of different “Topics”. Below are the most important words for each topic. We have to interpret the results ourselves, but it seems to do an OK job for a first try.
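To make the topic-extraction step concrete, here is a small collapsed Gibbs sampler for LDA in plain Python. It is a sketch of the general technique, not the code used for the results in this deck: the toy corpus is a handful of words hand-picked from the quoted reviews, and the number of topics, `alpha`, `beta` and the iteration count are arbitrary assumptions.

```python
import random


def fit_lda(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]        # tokens per (doc, topic)
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]
    topic_total = [0] * n_topics                      # tokens per topic
    assign = []                                       # topic of every token

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token, then resample its topic given the rest.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions: the extracted "features".
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word


def topic_name(word_counts, n_words=3):
    """Name a topic by joining its most frequent words with underscores."""
    top = sorted(word_counts, key=lambda w: -word_counts[w])[:n_words]
    return "_".join(top)


# Toy corpus: words hand-picked from the quoted reviews (illustrative only).
docs = [
    ["gained", "lbs", "swelling", "legs"],
    ["leg", "swelling", "pain"],
    ["cost", "generic", "costly"],
    ["insurance", "cost"],
    ["nausea", "horrible"],
    ["nausea", "vomiting"],
]
theta, topic_word = fit_lda(docs, n_topics=3)
topic_names = [topic_name(counts) for counts in topic_word]
```

Each row of `theta` is one document's topic mixture, and `topic_names` labels topics by their top words in the same underscore-joined style as the topic labels in this deck. On a corpus this small the topics themselves are not meaningful.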
Next we regress whether a person stops or starts a drug on these features. The model coefficients are then used to determine what is associated with an increase in the probability of stopping or starting a drug. Below, the topic is “gain_also_weight_swell_leg_pain_lbs”, which covers weight gain, fluid retention and swelling. We can see that Actos is more associated with this feature for BOTH stopping and starting the drug… This is because people often discuss the changes they notice when they start a new drug, not why they started it. For example:
FALSE [1] "i am so glad i read these reviews after only weeks of taking actos i have leg swellingpain and severe abdominal problems"
FALSE [2] "ive been taking actos for about months the med works well but i have gained about lbs and i have swelling in my legs and some slight breathing problems"
FALSE [1] "i was started on januvia yrs ago or so and was doing well and blood sugars were under control and then had to stop and take actos because of high cost of januvia"
FALSE [2] "why is this medicine so costly why do you all not have a generic to actos"
FALSE [1] "insurance forced me from byetta to victoza due to the cost"
FALSE [2] " all im not too happy with is the cost of victoza"
FALSE [1] "bydureon definetly has helped me nausea was horrible in the begining but has gotten better"
FALSE [2] " i have had two bouts of extreme nausea and vomiting which i first attributed to the flu but now have realized it was probably a side effect of bydureon"
FALSE [1] " i didnt get the nausea except when i injected byetta into my arm"
FALSE [2] " i was nauseated the first time i took byetta and one other time after that"
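The regression step described earlier can be sketched as follows. The deck does not say which model was fitted; this assumes a logistic regression, trained here by plain gradient descent, and the topic weights and stop labels below are made-up toy numbers rather than values from the WebMD data.

```python
import math


def fit_logistic(X, y, lr=0.5, steps=2000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))    # predicted probability
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical data: one feature per post (its weight on a single topic,
# e.g. the weight-gain/swelling topic) and whether the post mentions
# stopping the drug (1) or not (0).
topic_weight = [[0.9], [0.8], [0.7], [0.6], [0.3], [0.2], [0.1], [0.1]]
mentions_stop = [1, 1, 1, 0, 1, 0, 0, 0]

coefs, intercept = fit_logistic(topic_weight, mentions_stop)
```

A positive entry in `coefs` means that a higher weight on that topic goes with a higher estimated probability of the label. As the Actos examples show, that association alone does not say why the person stopped or started.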
I haven’t been able to directly answer the question you asked. However, I do think these methods show there is value in identifying which drugs are associated with which topics. This will at least give you a better idea of what to look for.