Health insurance companies that manage Medicaid and Medicare plans are required to report quality measures to government agencies. These quality measures are important for the companies' ratings, and millions of dollars are spent to improve them. In addition, the higher a plan's quality, the more likely consumers are to choose it, which has a very visible impact on the insurer's revenue.
The following analysis examines the relationship between a health plan's quality rating and its membership count. The data source is open government data available at this URL (https://www.healthdata.gov/dataset/managed-care-regional-consumer-guide).
Two separate files were downloaded, merged, and aggregated to produce the final CSV file used below.
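The merge-and-aggregate step is not shown in the original analysis. The sketch below illustrates one way it could be done in base R, assuming one file holds a rating per plan per quality domain and the other holds member counts per plan per region. The data frames `ratings` and `members` and their columns `Domain`, `Domain.Rating`, `Region`, and `Members` are hypothetical toy stand-ins, not the real files.

```r
# HYPOTHETICAL inputs: the real files' names and columns are not given in the source.
ratings <- data.frame(
  Plan.ID       = c(1, 1, 2, 2),
  Plan.Name     = c("Plan A", "Plan A", "Plan B", "Plan B"),
  Domain        = c("Access", "Prevention", "Access", "Prevention"),
  Domain.Rating = c(3.0, 4.0, 2.5, 3.5)
)
members <- data.frame(
  Plan.ID = c(1, 1, 2),
  Region  = c("East", "West", "East"),
  Members = c(1000, 2000, 5000)
)

# Average the domain ratings within each plan
avg_rating <- aggregate(Domain.Rating ~ Plan.ID + Plan.Name, data = ratings, FUN = mean)
names(avg_rating)[names(avg_rating) == "Domain.Rating"] <- "Avg..Domain.Rating"

# Total the members of each plan across regions
member_count <- aggregate(Members ~ Plan.ID, data = members, FUN = sum)
names(member_count)[names(member_count) == "Members"] <- "Plan.MemberCount"

# Join the two per-plan summaries on the plan identifier
healthquality_toy <- merge(avg_rating, member_count, by = "Plan.ID")

# Optionally write the result out as the final CSV (file name is hypothetical)
# write.csv(healthquality_toy, "HealthPlanQuality_toy.csv", row.names = FALSE)
```

Aggregating each file to one row per plan before calling `merge()` avoids duplicating ratings across regions, which would happen if the raw files were joined first.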
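Step 1. Read in the raw CSV data file.
# Aggregated CSV hosted on GitHub; read.csv() accepts a URL directly
filepath <- "https://github.com/angus001/Data605/raw/master/HealthPlanQuality_Raw_20171108.csv"
# Use TRUE rather than T, since T is an ordinary variable that can be overwritten
healthquality <- read.csv(filepath, header = TRUE, sep = ",")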
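The column name `Avg..Domain.Rating` with its double dot comes from `read.csv()`, which by default (`check.names = TRUE`) passes headers through `make.names()`, turning spaces and other invalid characters into `.`. The sketch below assumes the raw header was literally `Avg. Domain Rating`; that header text is a guess, not confirmed by the source. It uses the first row printed by `head()` in Step 2.

```r
# ASSUMED raw header text; the first data row is copied from the head() output below
csv_text <- "Plan ID,Plan Name,Avg. Domain Rating,Plan MemberCount
1040678,Univera Healthcare,3.094,971664"

toy <- read.csv(text = csv_text)   # header = TRUE and sep = "," are the defaults
names(toy)
# "Plan.ID" "Plan.Name" "Avg..Domain.Rating" "Plan.MemberCount"

# make.names() keeps the existing "." after "Avg" and turns each space into "."
make.names("Avg. Domain Rating")
# "Avg..Domain.Rating"
```

Passing `check.names = FALSE` would keep the original headers, at the cost of needing backticks (for example `` healthquality$`Avg. Domain Rating` ``) to refer to them.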
Step 2. Inspect the first six rows to confirm the data was read in correctly.
head(healthquality)
## Plan.ID Plan.Name Avg..Domain.Rating Plan.MemberCount
## 1 1040678 Univera Healthcare 3.094 971664
## 2 1050178 HIP (EmblemHealth) 2.593 79269965
## 3 1070680 Independent Health 3.344 3180056
## 4 1080383 MVP Health Care 3.306 26087568
## 5 1090384 CDPHP 3.903 27137232
## 6 1130185 MetroPlus Health Plan 3.525 16864460
Step 3. Plot a histogram of member counts across health plans.
hist(healthquality$Plan.MemberCount,
     main = "Distribution of Plan Member Counts", xlab = "Member Count")
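Even the six rows shown in Step 2 range from under one million to nearly 80 million members, so a histogram on the raw scale tends to crowd most plans into the first bar. A log10 scale is one common way to spread them out. This sketch uses only those six printed values; on the full data the same call would be `hist(log10(healthquality$Plan.MemberCount))`.

```r
# The six member counts printed by head(healthquality) in Step 2
member_counts <- c(971664, 79269965, 3180056, 26087568, 27137232, 16864460)

# Compute the histogram of log10 member counts without drawing it yet
h <- hist(log10(member_counts), plot = FALSE)

# Draw it: equal-width bins in log10 units correspond to equal ratios of member count
plot(h, main = "Plan Member Counts (log10 scale)",
     xlab = "log10(Member Count)", ylab = "Number of plans")
```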
Step 4. Fit a linear regression of quality rating (response) on member count (predictor).
lmmodel <- lm(healthquality$Avg..Domain.Rating ~ healthquality$Plan.MemberCount)
summary(lmmodel)
##
## Call:
## lm(formula = healthquality$Avg..Domain.Rating ~ healthquality$Plan.MemberCount)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.3226 -0.2747 0.0831 0.3929 1.1086
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.753e+00 1.768e-01 15.571 1.21e-12 ***
## healthquality$Plan.MemberCount 1.528e-09 1.702e-09 0.898 0.38
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6243 on 20 degrees of freedom
## Multiple R-squared: 0.03873, Adjusted R-squared: -0.00933
## F-statistic: 0.8059 on 1 and 20 DF, p-value: 0.38
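A few quantities in this summary answer the question posed at the start. With one predictor and an intercept, the residual degrees of freedom equal n − 2, so 20 degrees of freedom mean 22 plans were used in the fit. The slope of 1.528e-09 rating points per member means that 10 million additional members correspond to a predicted change of only about 1.528e-09 × 10^7 ≈ 0.015 rating points. With p = 0.38 and R² ≈ 0.039 (member count explains about 4% of the variation in rating), the data show no statistically significant linear relationship between membership and quality rating. The sketch below pulls these values out of a fitted model programmatically; so that it runs on its own, it refits on just the six rows printed in Step 2, so its numbers will differ from the 22-plan model above.

```r
# Six rows copied from the head(healthquality) output in Step 2 (not the full data)
hq6 <- data.frame(
  Plan.MemberCount   = c(971664, 79269965, 3180056, 26087568, 27137232, 16864460),
  Avg..Domain.Rating = c(3.094, 2.593, 3.344, 3.306, 3.903, 3.525)
)
fit <- lm(Avg..Domain.Rating ~ Plan.MemberCount, data = hq6)

coefs   <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
slope   <- coefs["Plan.MemberCount", "Estimate"]
p_value <- coefs["Plan.MemberCount", "Pr(>|t|)"]
r2      <- summary(fit)$r.squared
df_res  <- df.residual(fit)     # n - 2 for one predictor plus intercept

# Predicted rating change for 10 million additional members
effect_10m <- slope * 1e7

# The same arithmetic with the slope reported for the full model above
1.528e-09 * 1e7   # 0.01528
```

When applied to `lmmodel` itself, the coefficient row is named `healthquality$Plan.MemberCount` rather than `Plan.MemberCount`, because that formula refers to the columns with `$`; refitting with `data = healthquality` gives the shorter name.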
Step 5. Create a scatter plot and draw the regression line from the model in Step 4. Note the curly braces, which group the two calls so they run together as one expression and abline() draws on the plot just created.
{
  plot(healthquality$Plan.MemberCount, healthquality$Avg..Domain.Rating,
       col = "navyblue", pch = 16,   # add cex = 1.3 to enlarge the points
       xlab = "Member Counts", ylab = "Quality Rating")
  abline(lmmodel, col = "red")       # fitted line from Step 4
}
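A variant of Step 5 is to fit with a `data =` argument and use the formula interface of `plot()`, which keeps the axis and coefficient names short, and to label each point with its plan so outliers can be identified. This sketch again uses only the six rows printed in Step 2 as a stand-in for the full `healthquality` data; the object names `hq6` and `fit6` are illustrative.

```r
# Six rows copied from the head(healthquality) output in Step 2 (not the full data)
hq6 <- data.frame(
  Plan.Name = c("Univera Healthcare", "HIP (EmblemHealth)", "Independent Health",
                "MVP Health Care", "CDPHP", "MetroPlus Health Plan"),
  Plan.MemberCount   = c(971664, 79269965, 3180056, 26087568, 27137232, 16864460),
  Avg..Domain.Rating = c(3.094, 2.593, 3.344, 3.306, 3.903, 3.525)
)
fit6 <- lm(Avg..Domain.Rating ~ Plan.MemberCount, data = hq6)

{
  # Formula interface: response on the y axis, predictor on the x axis
  plot(Avg..Domain.Rating ~ Plan.MemberCount, data = hq6,
       col = "navyblue", pch = 16,
       xlab = "Member Counts", ylab = "Quality Rating")
  abline(fit6, col = "red")
  # Label each point with its plan name, placed below the point (pos = 1)
  text(hq6$Plan.MemberCount, hq6$Avg..Domain.Rating,
       labels = hq6$Plan.Name, pos = 1, cex = 0.7)
}
```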
Note: the following lists the built-in color names that contain "blue", any of which can be passed to the col argument of plot().
colors()[grep("blue",colors())]
## [1] "aliceblue" "blue" "blue1"
## [4] "blue2" "blue3" "blue4"
## [7] "blueviolet" "cadetblue" "cadetblue1"
## [10] "cadetblue2" "cadetblue3" "cadetblue4"
## [13] "cornflowerblue" "darkblue" "darkslateblue"
## [16] "deepskyblue" "deepskyblue1" "deepskyblue2"
## [19] "deepskyblue3" "deepskyblue4" "dodgerblue"
## [22] "dodgerblue1" "dodgerblue2" "dodgerblue3"
## [25] "dodgerblue4" "lightblue" "lightblue1"
## [28] "lightblue2" "lightblue3" "lightblue4"
## [31] "lightskyblue" "lightskyblue1" "lightskyblue2"
## [34] "lightskyblue3" "lightskyblue4" "lightslateblue"
## [37] "lightsteelblue" "lightsteelblue1" "lightsteelblue2"
## [40] "lightsteelblue3" "lightsteelblue4" "mediumblue"
## [43] "mediumslateblue" "midnightblue" "navyblue"
## [46] "powderblue" "royalblue" "royalblue1"
## [49] "royalblue2" "royalblue3" "royalblue4"
## [52] "skyblue" "skyblue1" "skyblue2"
## [55] "skyblue3" "skyblue4" "slateblue"
## [58] "slateblue1" "slateblue2" "slateblue3"
## [61] "slateblue4" "steelblue" "steelblue1"
## [64] "steelblue2" "steelblue3" "steelblue4"
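An equivalent and slightly more direct form is `grep(..., value = TRUE)`, which returns the matching names instead of their positions. Drawing a few of them as swatches is one way to choose a color by eye before passing it to `col =`; showing only the first 12 is an arbitrary choice for readability.

```r
# Same result as colors()[grep("blue", colors())]
blues <- grep("blue", colors(), value = TRUE)

# Preview the first 12 matches as color swatches, names written vertically
barplot(rep(1, 12), col = blues[1:12], names.arg = blues[1:12],
        las = 2, axes = FALSE, cex.names = 0.7)
```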