Task

We are to choose any three of the “wide” datasets identified in the Week 5 Discussion items and for each:

  • Create a .CSV file that includes all of the information included in the dataset. We’re encouraged to use a “wide” structure similar to how the information appears in the discussion item, so we can practice tidying and transformations.
  • Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data.
  • Perform the analysis requested in the discussion item.
  • Code should be in an R Markdown file, posted to rpubs.com, and should include narrative descriptions of your data cleanup work, analysis, and conclusions.

Introduction

For my second dataset, I decided to take a look at hospitals in the tri-state area (NY, NJ, CT). I would like to determine which state has the best hospitals overall, as well as which hospitals have the top ratings.

Hypothesis

My hypothesis is that New Jersey will have the best overall ratings as well as the best HCAHPS scores.

Loading data and required libraries

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following objects are masked from 'package:plyr':
## 
##     arrange, mutate, rename, summarise
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
Since we are only concerned with the hospitals located in the Tri-State area, our first task is to create a subset that includes only those states. Our main focus is exploring how well each hospital is rated, so we will look only at the “achievement points” columns. I would also like to see which category has the highest correlation with HCAHPS scores.
## [1] TRUE
##                                               NAME STATE COMM.NURSES
## 35                                BRISTOL HOSPITAL    CT           1
## 110                     CLARA MAASS MEDICAL CENTER    NJ           0
## 111 UNIV MEDICAL CENTER OF PRINCETON AT PLAINSBORO    NJ           0
## 112            VIRTUA WEST JERSEY HOSPITALS BERLIN    NJ           7
## 113                       RIVERVIEW MEDICAL CENTER    NJ           2
## 114        ROBERT WOOD JOHNSON UNIVERSITY HOSPITAL    NJ           1
##     COMM.DOCTORS RESPONSIVENESS PAIN.MAN COMM.MEDS ENVIRONMENT DISCHARGE
## 35             0              1        1         0           0         6
## 110            0              0        0         0           0         0
## 111            0              0        0         0           4         0
## 112            0              4        3         6           1         2
## 113            0              0        3         2           0         2
## 114            0              1        0         0           0         0
##     OVERALL HCAHPS.BASE HCAHPS.CON
## 35        0          10         16
## 110       0           3         13
## 111       3          11         17
## 112       4          28         19
## 113       1          13         18
## 114       1           5         14
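The chunk that produced the output above is not shown. A minimal sketch of the subsetting step might look like the following. The data frame name `hospitals` is hypothetical, the CT and NJ rows are copied from the printed output, and the third row is an invented placeholder included only so the filter has something to drop.

```r
library(dplyr)

# Toy stand-in for the full data set (hypothetical object name).
# Rows 1-2 come from the printed output; row 3 is an invented placeholder.
hospitals <- data.frame(
  NAME        = c("BRISTOL HOSPITAL", "CLARA MAASS MEDICAL CENTER", "EXAMPLE HOSPITAL"),
  STATE       = c("CT", "NJ", "TX"),
  COMM.NURSES = c("1", "0", "5"),
  HCAHPS.BASE = c("10", "3", "20"),
  stringsAsFactors = FALSE
)

# Keep only hospitals located in the Tri-State area.
tri_state <- hospitals %>%
  filter(STATE %in% c("NY", "NJ", "CT"))

print(tri_state)
```

Using `%in%` rather than chained `==` comparisons keeps the list of states in one place, which makes it easy to extend.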
Before I can do any calculations, I need to convert the score columns from character to numeric. I also set na.rm = TRUE in the calculations so that NA values are ignored and do not affect the results.
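As a rough sketch of this conversion, assuming every column other than NAME and STATE holds scores stored as text: the second row, including its "Not Available" entry, is invented to show how a non-numeric value becomes NA.

```r
library(dplyr)

# Toy data: row 1 is from the printed output, row 2 is an invented placeholder.
scores_chr <- data.frame(
  NAME        = c("BRISTOL HOSPITAL", "EXAMPLE HOSPITAL"),
  STATE       = c("CT", "NY"),
  COMM.NURSES = c("1", "Not Available"),
  OVERALL     = c("0", "2"),
  stringsAsFactors = FALSE
)

# Convert every score column to numeric; text that is not a number becomes NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
scores_num <- scores_chr %>%
  mutate(across(-c(NAME, STATE), ~ suppressWarnings(as.numeric(.x))))

# na.rm = TRUE drops the NA before averaging instead of returning NA.
mean(scores_num$COMM.NURSES, na.rm = TRUE)
```

Note that na.rm is an argument of summary functions such as mean() and sum(), not of as.numeric(), so it belongs in the calculation step rather than in the conversion.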
I’ll use the dplyr and plyr packages to transform the data.
I noticed that there were a few entries where all columns are 0. I would like to remove those entries.
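One possible way to drop such rows with dplyr is sketched below. It treats "all columns" as all numeric columns. The second row is an invented all-zero placeholder.

```r
library(dplyr)

# Toy data: row 1 is from the printed output, row 2 is an invented all-zero row.
scores <- data.frame(
  NAME        = c("BRISTOL HOSPITAL", "EXAMPLE HOSPITAL"),
  STATE       = c("CT", "NY"),
  COMM.NURSES = c(1, 0),
  DISCHARGE   = c(6, 0),
  OVERALL     = c(0, 0),
  stringsAsFactors = FALSE
)

# Keep a row only if at least one numeric column is non-zero.
scores_nonzero <- scores %>%
  filter(if_any(where(is.numeric), ~ .x != 0))

print(scores_nonzero)
```

A hospital with a single zero, such as OVERALL for Bristol Hospital here, is kept. Only rows that are zero everywhere are removed.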
My data is now clean enough to start the analysis and calculations.

Now to see which state has the best ratings overall.
##   STATES AVERAGE
## 1     CT    1.92
## 2     NJ    1.20
## 3     NY    1.56
##   STATES SCORE
## 1     CT 32.25
## 2     NJ 29.00
## 3     NY 30.30
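The code behind the two tables above is not shown, so the following is only a sketch of how per-state averages of this kind could be computed. It assumes the rating is the OVERALL column and that the HCAHPS score is HCAHPS.BASE plus HCAHPS.CON. Both are assumptions, and the column names `mean_overall` and `mean_hcahps` are illustrative. It uses only the six rows printed earlier, so its results will not match the full-data tables.

```r
library(dplyr)

# The six rows printed earlier, reduced to the columns used here.
sample_rows <- data.frame(
  STATE       = c("CT", "NJ", "NJ", "NJ", "NJ", "NJ"),
  OVERALL     = c(0, 0, 3, 4, 1, 1),
  HCAHPS.BASE = c(10, 3, 11, 28, 13, 5),
  HCAHPS.CON  = c(16, 13, 17, 19, 18, 14),
  stringsAsFactors = FALSE
)

# Average per state, then sort so the highest HCAHPS average comes first.
state_summary <- sample_rows %>%
  group_by(STATE) %>%
  dplyr::summarise(
    mean_overall = round(mean(OVERALL, na.rm = TRUE), 2),
    mean_hcahps  = round(mean(HCAHPS.BASE + HCAHPS.CON, na.rm = TRUE), 2)
  ) %>%
  arrange(desc(mean_hcahps))

print(state_summary)
```

Calling `dplyr::summarise` explicitly is a defensive choice. plyr and dplyr both export `summarise`, as the masking messages at the top show, and the plyr version ignores `group_by()` and would return a single overall row.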

Conclusion

We can conclude that Connecticut has the best hospitals in the Tri-State area based on both customer feedback and HCAHPS ratings. This contradicts my hypothesis: New Jersey had the lowest state average on both measures (1.20 and 29.00). The best hospital in the Tri-State area is ADIRONDACK MEDICAL CENTER, with an HCAHPS score of 89 out of 100 and an average rating of 8.86 out of 10.