Vignette Title

Karma Gyatso

2022-10-30

library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
#> ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
#> ✔ tibble  3.1.7     ✔ dplyr   1.0.9
#> ✔ tidyr   1.2.0     ✔ stringr 1.4.0
#> ✔ readr   2.1.2     ✔ forcats 0.5.2
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
library(ggplot2)

Load the data from GitHub (originally downloaded from Kaggle), check the class of the resulting object, and print all variable names.

patient_records_test <- read.csv("https://raw.githubusercontent.com/karmaggyatso/CUNY_SPS/main/Github_data607/tidyverse_assignment/Train-1617360447408-1660719685476.csv")
class(patient_records_test)
#> [1] "data.frame"
names(patient_records_test)
#>  [1] "index"              "encounter_id"       "patient_id"        
#>  [4] "race"               "gender"             "age"               
#>  [7] "weight"             "time_in_hospital"   "medical_specialty" 
#> [10] "num_lab_procedures" "num_procedures"     "num_medications"   
#> [13] "number_outpatient"  "number_emergency"   "number_inpatient"  
#> [16] "diag_1"             "diag_2"             "diag_3"            
#> [19] "diag_4"             "diag_5"             "number_diagnoses"  
#> [22] "X1"                 "X2"                 "X3"                
#> [25] "X4"                 "X5"                 "X6"                
#> [28] "X7"                 "X8"                 "X9"                
#> [31] "X10"                "X11"                "X12"               
#> [34] "X13"                "X14"                "X15"               
#> [37] "X16"                "X17"                "X18"               
#> [40] "X19"                "X20"                "X21"               
#> [43] "X22"                "X23"                "X24"               
#> [46] "X25"                "change"             "diabetesMed"       
#> [49] "readmitted"
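Before cleaning, it can help to see which columns use "?" as a placeholder for unknown values, since the cleaning step later removes it from medical_specialty. The sketch below uses a small hypothetical data frame named `records` in place of `patient_records_test`; comparing a data frame with `==` returns a logical matrix, and `colSums()` counts the matches per column.

```r
# Hypothetical toy data frame standing in for patient_records_test
records <- data.frame(
  medical_specialty = c("Cardiology", "?", "InternalMedicine", "?"),
  race              = c("Caucasian", "AfricanAmerican", "?", "Caucasian"),
  age               = c("[70-80)", "[60-70)", "[70-80)", "[50-60)"),
  stringsAsFactors  = FALSE
)

# Count "?" placeholders in every column; na.rm guards against real NAs
placeholder_counts <- colSums(records == "?", na.rm = TRUE)
placeholder_counts
```

Running `colSums(patient_records_test == "?", na.rm = TRUE)` on the real data would list the affected columns.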

Research question

What is the main factor behind patient readmission to the hospital? Are medical treatment and age group factors in readmission? What are the chances of readmission for a given medical treatment and age group?

Cases

Each case represents an individual patient who went to the hospital for treatment. There are 66,587 observations:

dim(patient_records_test)
#> [1] 66587    49
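Because the data contain both encounter_id and patient_id, one quick check is whether any patient appears in more than one row. A minimal sketch, using a hypothetical toy data frame in which one patient has two encounters:

```r
# Hypothetical toy records: patient 101 appears in two separate encounters
records <- data.frame(
  encounter_id = c(1, 2, 3, 4),
  patient_id   = c(101, 101, 102, 103)
)

n_rows     <- nrow(records)                      # total rows
n_patients <- length(unique(records$patient_id)) # distinct patients
c(rows = n_rows, patients = n_patients)
```

On the real data, comparing `length(unique(patient_records_test$patient_id))` with 66,587 would show whether each row is a distinct patient or a distinct encounter.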

Resources

Data source (cited in MLA 9th edition): Vutukuri, Girish. “Hospital_Administration.” Hospital_Administration | Kaggle, www.kaggle.com/datasets/girishvutukuri/hospital-administration/code?resource=download. Accessed 30 Oct. 2022.

Analysis

Data cleaning

Some records have medical_specialty recorded as “?”. Since we cannot tell which specialty these are, we omit those records from the new data frame. We then group the records by medical_specialty and count how many patients were readmitted.

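hospital_readmission_by_treatment <- patient_records_test %>% 
  filter(readmitted == 1) %>%            # keep only readmitted patients
  filter(medical_specialty != "?") %>%   # drop records with an unknown specialty
  group_by(medical_specialty) %>%
  count() %>%                            # number of readmissions per specialty
  arrange(desc(n))                       # most readmissions first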


hospital_readmission_by_treatment
#> # A tibble: 58 × 2
#> # Groups:   medical_specialty [58]
#>    medical_specialty              n
#>    <chr>                      <int>
#>  1 InternalMedicine            4183
#>  2 Emergency/Trauma            2503
#>  3 Family/GeneralPractice      2363
#>  4 Cardiology                  1489
#>  5 Surgery-General              898
#>  6 Nephrology                   605
#>  7 Radiologist                  339
#>  8 Orthopedics                  305
#>  9 Pulmonology                  275
#> 10 Orthopedics-Reconstructive   245
#> # … with 48 more rows
#> # ℹ Use `print(n = ...)` to see more rows
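An alternative to filtering out “?” after loading is to declare it as a missing-value code when the file is read, using the `na.strings` argument of `read.csv()`. The sketch below parses a few hypothetical CSV lines rather than the real file, so it runs offline.

```r
# A few hypothetical CSV lines in which "?" marks an unknown specialty
csv_text <- "patient_id,medical_specialty,readmitted
1,Cardiology,1
2,?,0
3,InternalMedicine,1"

# na.strings tells read.csv to turn "?" (and empty fields) into NA while parsing
records <- read.csv(text = csv_text, na.strings = c("", "NA", "?"))

# Keep rows with a known specialty, analogous to filter(medical_specialty != "?")
known <- records[!is.na(records$medical_specialty), ]
nrow(known)
```

With this approach, functions such as `is.na()` and `na.omit()` treat the placeholder like any other missing value.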

A bar chart of the readmission counts by medical specialty:

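# reorder() sorts the bars by count so the largest specialties appear at the top
ggplot(hospital_readmission_by_treatment,
       aes(x = reorder(medical_specialty, n), y = n)) +
  geom_col() +
  labs(title = "Hospital readmission by medical specialty",
       x = "medical_specialty", y = "total count") +
  theme(text = element_text(size = 6)) +
  coord_flip()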
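With 58 specialties the chart is crowded, so one option is to plot only the top few. Note that `group_by()` followed by `count()` leaves the result grouped (the “Groups: medical_specialty [58]” line above), so `slice_max()` would pick the top rows within each group, that is, every row. A minimal sketch with hypothetical counts, calling `ungroup()` first:

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical counts, grouped the same way group_by() + count() leaves them
counts <- tibble(
  medical_specialty = c("InternalMedicine", "Cardiology", "Nephrology", "Urology"),
  n                 = c(50L, 30L, 20L, 10L)
) %>%
  group_by(medical_specialty)

grouped_top <- counts %>% slice_max(n, n = 2)                 # top 2 within each group
overall_top <- counts %>% ungroup() %>% slice_max(n, n = 2)   # top 2 overall

overall_top
```

The ungrouped result can then be passed to the same `ggplot()` call in place of the full table.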

We also have to consider age as a factor, so I group the records by age and count the total number of readmissions.


hospital_readmission_by_age <- patient_records_test %>% 
  filter(readmitted == 1) %>%   # keep only readmitted patients
  group_by(age) %>%
  count() %>%                   # number of readmissions per age group
  arrange(desc(n))

hospital_readmission_by_age
#> # A tibble: 10 × 2
#> # Groups:   age [10]
#>    age          n
#>    <chr>    <int>
#>  1 [70-80)   8179
#>  2 [60-70)   6860
#>  3 [80-90)   5349
#>  4 [50-60)   5129
#>  5 [40-50)   2814
#>  6 [30-40)   1016
#>  7 [90-100)   697
#>  8 [20-30)    520
#>  9 [10-20)    178
#> 10 [0-10)      22
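The counts above answer “how many”, while the research question asks about the chances of readmission. A rate needs the total number of patients in each group as well, so the sketch below summarises before filtering. It assumes `readmitted` is coded 1 for readmitted and 0 otherwise, as the `filter(readmitted == 1)` step suggests, and uses a hypothetical toy tibble in place of `patient_records_test`.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy records; readmitted is assumed to be 1 (yes) or 0 (no)
records <- tibble(
  age        = c("[70-80)", "[70-80)", "[70-80)", "[70-80)", "[80-90)", "[80-90)"),
  readmitted = c(1, 1, 0, 0, 1, 1)
)

readmission_rate_by_age <- records %>%
  group_by(age) %>%
  summarise(
    readmissions = sum(readmitted == 1),   # same count as filter() + count()
    patients     = n(),                    # all records in the age group
    rate         = readmissions / patients
  ) %>%
  arrange(desc(rate))

readmission_rate_by_age
```

In this toy example both age groups have two readmissions, yet their rates are 1.0 and 0.5, which is why counts alone cannot rank the chances of readmission.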

A bar chart of the readmission counts by age group:

ggplot(hospital_readmission_by_age, aes(x = age, y = n)) +
  geom_col() +
  labs(title = "Hospital readmission by age",
       x = "age_group", y = "total count") +
  coord_flip()

Conclusion:

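Patients treated under InternalMedicine account for the most readmissions, followed by Emergency/Trauma and Family/GeneralPractice. By age, the [70-80) group has the most readmissions, followed by [60-70) and [80-90). These are raw counts, so they partly reflect how many patients each specialty and age group sees; comparing readmission rates within each group would answer the question about the chances of readmission more directly. Further analysis can be done on age within the medical specialties with the most readmissions.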
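One way to start that further analysis is to group by both medical_specialty and age and compute a rate for each combination. A minimal sketch on a hypothetical toy tibble, again assuming `readmitted` is coded 1/0 and reusing the same “?” cleaning rule:

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy records with both grouping variables
records <- tibble(
  medical_specialty = c("InternalMedicine", "InternalMedicine", "InternalMedicine",
                        "Cardiology", "Cardiology", "?"),
  age        = c("[70-80)", "[70-80)", "[80-90)", "[70-80)", "[70-80)", "[70-80)"),
  readmitted = c(1, 0, 1, 0, 0, 1)
)

rate_by_specialty_age <- records %>%
  filter(medical_specialty != "?") %>%   # drop unknown specialties
  group_by(medical_specialty, age) %>%
  summarise(
    patients = n(),                      # group size, to judge how reliable a rate is
    rate     = mean(readmitted == 1),    # share of the group that was readmitted
    .groups  = "drop"                    # return an ungrouped tibble
  ) %>%
  arrange(desc(rate))

rate_by_specialty_age
```

Keeping the `patients` column matters here: the top row in this toy output has a rate of 1 but comes from a single patient.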

I have used tidyverse (dplyr) functions such as the pipe operator (%>%), filter(), and group_by().