Synopsis

Over 2,000 comments across several categories of healthcare.gov’s help pages
Real users submitting feedback through the “Give Feedback” widget
Synthesize the data in an understandable way
Identify the pain points of healthcare.gov

Data was provided by the Centers for Medicare & Medicaid Services (CMS) from the Open Enrollment Period (9 million Americans enrolled).

Outline

Moving the data to a statistical system
General data features
Specific Natural Language Processing
Pain Points and notes for further analysis

Downloaded Data and General Summaries

Local download of the 2,000+ comments (the second sheet of the Excel document)

csv <- "Business Analyst Data Analysis Presentation - Open Enrollment Help Page Comments - Comments.csv"

medicare <- read.csv(csv, stringsAsFactors = FALSE)
# Reads as 2,179 observations of 2 columns (URL and comment)

## [1] 51

## [1] 2175

51 unique URLs from which the comments were sourced
2,175 unique comments
Some comments may be blank or coincidentally exact matches of one another
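
The two values above are printed without the calls that produced them. A minimal sketch of how such counts could be obtained is shown here, assuming the columns are named URL and Comment (names inferred from later output, not confirmed) and using a small made-up data frame in place of the real CSV:

```r
# Toy stand-in for the data frame read from the CSV (all values are made up).
medicare_toy <- data.frame(
  URL = c("help/a/", "help/a/", "help/b/", "help/b/", "help/c/"),
  Comment = c("how to cancel", "", "how to cancel", "not clear", ""),
  stringsAsFactors = FALSE
)

# Distinct help pages and distinct comment strings.
n_urls <- length(unique(medicare_toy$URL))
n_comments <- length(unique(medicare_toy$Comment))

# Blank comments, and rows that exactly repeat an earlier comment.
n_blank <- sum(!nzchar(trimws(medicare_toy$Comment)))
n_repeats <- sum(duplicated(medicare_toy$Comment))

print(c(urls = n_urls, comments = n_comments, blank = n_blank, repeats = n_repeats))
```

On the real data, the gap between 2,179 rows and 2,175 unique comments would correspond to four rows that repeat an earlier comment (blank or otherwise).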

Approach Reasoning - NLP

The reasons I chose Natural Language Processing in R:

The data is small but rich, so an extensive process such as n-gram modeling is computationally feasible on a single laptop.
The data comes from a specific population: people who wanted to enroll, who sought help in doing so, and who also decided to give feedback.
This population may have unique (possibly demographic) similarities that can be identified by both the language use and the most common feedback.

Demographics and other valuable information would be helpful in the future for making actual product recommendations for the system.

Common Feedback Categories

## # A tibble: 51 x 2
##                                                               URL Comment
##                                                             <chr>   <int>
##  1                           help/what-health-coverage-do-i-have/     213
##  2                  help/parent-and-caretaker-relative-questions/     162
##  3                                         help/add-other-income/     128
##  4 help/i-am-having-trouble-logging-in-to-my-marketplace-account/     114
##  5                                      help/deduction-questions/     108
##  6                                     help/automatic-enrollment/     107
##  7                                     help/disability-questions/     103
##  8                          help/found-not-eligible-for-medicaid/     101
##  9                                   help/losing-health-coverage/     101
## 10                                  help/information-on-medicare/      95
## # ... with 41 more rows

Not understanding the coverage you have
Parents, Caretakers, and Relative Questions
Adding other income
Logging in to your marketplace account
Deduction questions
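
The code behind the per-page table is not shown in this deck. As a rough sketch of the idea only, comments can be tallied per URL and sorted in descending order with base R. The data frame below is made up, and the column names URL and Comment are assumptions:

```r
# Made-up comments; only the URL column matters for the tally.
comments_toy <- data.frame(
  URL = c("help/a/", "help/b/", "help/a/", "help/c/", "help/a/", "help/b/"),
  Comment = c("c1", "c2", "c3", "c4", "c5", "c6"),
  stringsAsFactors = FALSE
)

# Count rows per URL, then order from most to least commented.
per_url <- as.data.frame(
  table(URL = comments_toy$URL),
  responseName = "n",
  stringsAsFactors = FALSE
)
per_url <- per_url[order(-per_url$n), ]

print(head(per_url, 5))
```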

Common Feedback Stats

## # A tibble: 51 x 2
##                                                               URL Comment
##                                                             <chr>   <int>
##  1                           help/what-health-coverage-do-i-have/     213
##  2                  help/parent-and-caretaker-relative-questions/     162
##  3                                         help/add-other-income/     128
##  4 help/i-am-having-trouble-logging-in-to-my-marketplace-account/     114
##  5                                      help/deduction-questions/     108
##  6                                     help/automatic-enrollment/     107
##  7                                     help/disability-questions/     103
##  8                          help/found-not-eligible-for-medicaid/     101
##  9                                   help/losing-health-coverage/     101
## 10                                  help/information-on-medicare/      95
## # ... with 41 more rows

Average number of comments per category is 43
Median is 20
Heavily skewed toward the top 10 categories.

For feasibility, it may be prudent to only seek to solve the most common pain points (for example, those with 80 or more comments).
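
The average and the skew can be checked from figures already reported in this deck. The sketch below assumes that the 2,179 rows are spread across the 51 URLs (that is, blank comments are included in the per-page counts); the median of 20 cannot be reproduced here because only the top rows of the table are shown:

```r
# Figures taken from the slides above.
total_comments <- 2179   # rows reported when the CSV was read
n_categories <- 51       # unique URLs
top10 <- c(213, 162, 128, 114, 108, 107, 103, 101, 101, 95)

# Mean comments per category (reported as 43 after rounding).
mean_per_category <- total_comments / n_categories

# Share of all comments that sit in the ten most-commented pages.
top10_share <- sum(top10) / total_comments

print(round(mean_per_category, 1))
print(round(100 * top10_share, 1))
```

Under that assumption, roughly 57% of the comments fall in 10 of the 51 categories, which is consistent with a mean well above the median.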

Questions

Do clients generally know what needs to be fixed first?
Does having NAVA select the features to develop best serve the population?

80/20 Specific Analysis of Top 11 Categories

## # A tibble: 11 x 2
##                                                               URL Comment
##                                                             <chr>   <int>
##  1                           help/what-health-coverage-do-i-have/     213
##  2                  help/parent-and-caretaker-relative-questions/     162
##  3                                         help/add-other-income/     128
##  4 help/i-am-having-trouble-logging-in-to-my-marketplace-account/     114
##  5                                      help/deduction-questions/     108
##  6                                     help/automatic-enrollment/     107
##  7                                     help/disability-questions/     103
##  8                          help/found-not-eligible-for-medicaid/     101
##  9                                   help/losing-health-coverage/     101
## 10                                  help/information-on-medicare/      95
## 11                              help/reconciling-your-tax-credit/      89

Filtering to categories with at least twice the average number of comments
These 11 categories alone contain over 1,300 of the comments.
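
The cutoff can be illustrated with the counts shown in the table. This is a sketch, not the original filtering code: it takes the average as 2,179 / 51 (an assumption based on the row and URL counts reported earlier) and applies the threshold only to the 11 rows visible here, since the remaining 40 counts are not shown:

```r
# Counts copied from the table above, keyed by help page.
top11 <- c(
  "help/what-health-coverage-do-i-have/" = 213,
  "help/parent-and-caretaker-relative-questions/" = 162,
  "help/add-other-income/" = 128,
  "help/i-am-having-trouble-logging-in-to-my-marketplace-account/" = 114,
  "help/deduction-questions/" = 108,
  "help/automatic-enrollment/" = 107,
  "help/disability-questions/" = 103,
  "help/found-not-eligible-for-medicaid/" = 101,
  "help/losing-health-coverage/" = 101,
  "help/information-on-medicare/" = 95,
  "help/reconciling-your-tax-credit/" = 89
)

# Twice the average number of comments per category (about 85.5).
threshold <- 2 * (2179 / 51)

# Keep the categories at or above the threshold and total their comments.
kept <- top11[top11 >= threshold]
print(length(kept))
print(sum(kept))
```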

Specific Analysis, Within Groups

Fundamental features are not being explained adequately.
Example: Automatic Enrollment
The most common trigram is “how to cancel”: an easily identifiable user story that is still difficult for some users.

count3[6,]

## # A tibble: 1 x 3
## # Groups:   URL [1]
##                          URL       trigram     n
##                        <chr>         <chr> <int>
## 1 help/automatic-enrollment/ how to cancel    15
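
How count3 was built is not shown in this deck. The sketch below illustrates the general technique of counting word trigrams within each help page, using base R only. The helper word_ngrams and the comments are hypothetical, and the tokenization rule (lower-case, split on anything other than letters, digits, and apostrophes) is an assumption that may differ from the original:

```r
# Hypothetical helper: split one comment into lower-case word n-grams.
word_ngrams <- function(text, n) {
  words <- strsplit(tolower(text), "[^a-z0-9']+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  vapply(
    seq_len(length(words) - n + 1),
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

# Made-up comments for two help pages.
toy <- data.frame(
  URL = c(rep("help/automatic-enrollment/", 3), "help/add-other-income/"),
  Comment = c(
    "How to cancel my plan?",
    "I need to know how to cancel",
    "Not clear how to cancel automatic enrollment",
    "What to do if income changes"
  ),
  stringsAsFactors = FALSE
)

# Count trigrams separately within each URL, most frequent first.
by_url <- lapply(split(toy$Comment, toy$URL), function(comments) {
  grams <- unlist(lapply(comments, word_ngrams, n = 3))
  sort(table(grams), decreasing = TRUE)
})

top <- by_url[["help/automatic-enrollment/"]]
print(head(top, 3))
```

Changing n to 4 or 5 gives the quadruplets and 5-word groups discussed on the later slides.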

Common Word Groupings

Most common word quadruplets among all categories:

## # A tibble: 35,181 x 2
##                          quadgram     n
##                             <chr> <int>
##  1 individual insurance non group    26
##  2   insurance non group coverage    25
##  3                  what to do if    19
##  4             how to answer this    17
##  5                it is not clear    15
##  6                 the end of the    14
##  7                   to do if you    14
##  8                i don't know if    13
##  9            it would be helpful    13
## 10        to answer this question    13
## # ... with 35,171 more rows

Specific Analysis, Within Groups

Looking at the broadest tested case: 4 and 5 grams

numeric benchmarks (19, 60) cause people to seek advice
coverage ending within 60 days
disabled children over the age of 19

For example, in the parent and caretaker questions, several comments either indicate that a feature is missing (“there is no option for”) or seek extra advice (“19 but…”).
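
One simple way to check how often such numeric benchmarks appear is to search the comments for the numbers as whole words. This is an illustrative sketch on made-up comments, not the n-gram analysis used in the deck; the word-boundary pattern keeps "2019" from being counted as a mention of 19:

```r
# Made-up comments for illustration.
toy_comments <- c(
  "My coverage ends within 60 days, what do I do?",
  "My disabled child is over 19 but still lives with me",
  "She turns 19 next month",
  "Enrolled in 2019 without problems"
)

benchmarks <- c("19", "60")

# Number of comments mentioning each benchmark as a standalone number.
mentions <- vapply(
  benchmarks,
  function(b) sum(grepl(paste0("\\b", b, "\\b"), toy_comments, perl = TRUE)),
  integer(1)
)

print(mentions)
```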

Specific Analysis, Within Groups

Most common 5-word groups in the parent, caretaker, relative category:

Specific Analysis of Top 11

Looking at the most common problems based on counting the different word pairs (or triplets, or quadruplets) we see a few things:

The comments show a clear lack of understanding: “how to”, “if you”, “how do i”, “what to do if”…
People are commenting in the help section because the help mechanisms (whether FAQs, live chat, etc.) aren’t working.
Users feel that their specific situations are unique enough to warrant specific instruction: “if…”

This is different from feeling that a feature should exist but doesn’t, or that an interface is too difficult to use.

Final Words

Things to consider with more time:

More data
All comments are from a specific window during the Open Enrollment Period
Healthcare management goes beyond setup

Final Words

More data on the commenters
Do certain demographics have more issues with certain topics than others? Example: in “add-other-income” and “deduction”, HSA contributions lowering income
Strong local word groupings in rarely used categories
More NLP

Due to time constraints, I decided against removing words or engaging in sentiment analysis (for example: do users give more “negative” feedback in certain categories than in others?).

AUTHOR’S NOTE: A detailed presentation with full annotations and code is available.

Pain Points Analysis