The data I want to use for my final project is from the New York State Department of Health and can be found here:

https://health.data.ny.gov/dataset/Hospital-Inpatient-Discharges-SPARCS-De-Identified/22g3-z7e7

The dataset contains discharge-level detail on patient characteristics, diagnoses, treatments, services, and charges. Variables for race, gender, and geolocation are also included.

Variable Names:

##  [1] "hospital_service_area"          "hospital_county"               
##  [3] "operating_certificate_number"   "permanent_facility_id"         
##  [5] "facility_name"                  "age_group"                     
##  [7] "zip_code_3_digits"              "gender"                        
##  [9] "race"                           "ethnicity"                     
## [11] "length_of_stay"                 "type_of_admission"             
## [13] "patient_disposition"            "discharge_year"                
## [15] "ccs_diagnosis_code"             "ccs_diagnosis_description"     
## [17] "ccs_procedure_code"             "ccs_procedure_description"     
## [19] "apr_drg_code"                   "apr_drg_description"           
## [21] "apr_mdc_code"                   "apr_mdc_description"           
## [23] "apr_severity_of_illness_code"   "apr_severity_of_illness"       
## [25] "apr_risk_of_mortality"          "apr_medical_surgical"          
## [27] "payment_typology_1"             "payment_typology_2"            
## [29] "payment_typology_3"             "abortion_edit_indicator"       
## [31] "emergency_department_indicator" "total_charges"                 
## [33] "total_costs"                    "birth_weight"

Sample of data

Project Description

I intend to create an interactive graphic that shows the costs of the top hospital providers in New York City, with drilldowns that let the user choose types of admission, procedures, and diagnosis categories. I imagine line graphs or bar charts showing high-cost trending procedures and providers. Each of those selection options could be shown in a separate graph or on a separate tab. I’d also like to include filters for DRG description and age group.
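To make the drilldown idea concrete, here is a minimal sketch of the logic behind one chart: keep only the rows that match the dropdown selections, then rank facilities by summed cost. The function name `top_providers_by_cost` and the sample rows are hypothetical and made up for illustration. Only the column names come from the variable list above.

```python
from collections import defaultdict


def top_providers_by_cost(rows, n=5, **filters):
    """Sum total_costs per facility for rows matching every filter.

    `filters` maps a column name (e.g. type_of_admission) to the value
    picked in a dropdown; a value of None means "no filter applied".
    """
    totals = defaultdict(float)
    for row in rows:
        if all(v is None or row.get(k) == v for k, v in filters.items()):
            totals[row["facility_name"]] += float(row["total_costs"])
    # Highest total cost first, keeping only the top n facilities.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up placeholder rows for illustration only (not real SPARCS values).
rows = [
    {"facility_name": "Hospital A", "type_of_admission": "Emergency", "total_costs": "1000.50"},
    {"facility_name": "Hospital B", "type_of_admission": "Elective", "total_costs": "2500.00"},
    {"facility_name": "Hospital A", "type_of_admission": "Emergency", "total_costs": "499.50"},
    {"facility_name": "Hospital B", "type_of_admission": "Emergency", "total_costs": "300.00"},
]

print(top_providers_by_cost(rows, n=2, type_of_admission="Emergency"))
```

The same function could back every tab: each dropdown simply adds another keyword filter, such as `apr_drg_description` or `age_group`.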

Technologies

Initially I wanted to create a Shiny app in R, using plotly for the interactive dropdown options, but my app couldn’t handle the size of the dataset, which is 900MB. Instead of reducing the data, I’ve decided to switch to Python and create a true web app that displays information directly from the Socrata API. I intend to use plotly to create interactive graphics that allow the user to select from dropdown options.
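One way to avoid loading the full 900MB file is to let the API do the aggregation and return only the summary rows. Below is a rough sketch of building such a request with SoQL query parameters. It assumes the dataset is exposed at the usual `/resource/<dataset-id>.json` endpoint, that `total_costs` is stored as a number, and that the five NYC boroughs appear in `hospital_county` under the names listed. None of these have been verified against the data, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed SODA endpoint, derived from the dataset ID in the URL (22g3-z7e7).
BASE_URL = "https://health.data.ny.gov/resource/22g3-z7e7.json"

# Assumption: hospital_county values for the five NYC boroughs.
NYC_COUNTIES = ("Bronx", "Kings", "Manhattan", "Queens", "Richmond")


def soql_literal(value):
    """Quote a string for SoQL, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def top_providers_url(counties=NYC_COUNTIES, admission_type=None, limit=10):
    """Build a URL asking the server to sum total_costs per facility."""
    county_list = ", ".join(soql_literal(c) for c in counties)
    where = "hospital_county in (" + county_list + ")"
    if admission_type is not None:
        where += " AND type_of_admission = " + soql_literal(admission_type)
    params = {
        "$select": "facility_name, sum(total_costs) AS provider_cost",
        "$where": where,
        "$group": "facility_name",
        "$order": "provider_cost DESC",
        "$limit": str(limit),
    }
    return BASE_URL + "?" + urlencode(params)


print(top_providers_url(admission_type="Emergency", limit=5))
```

The resulting URL can be fetched with any HTTP client, and the JSON response passed to plotly. Each dropdown change would then trigger one small request instead of a scan over the full file.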

Relevance

As a healthcare analyst working for New York’s premier healthcare company, I often work on healthcare cost drilldowns. I believe that a cost visualization tool could bring major positive changes to my company’s analytics department.