I started learning about tidyverse package for a while and trying to practice data manipulation by using real life data set. I found this dataset is a good example of untidy data that can be manipulated by pivot function of dplyr package, a adjunct package of tidyverse.So I write this article to provide an example of how to use pivot function for data manipulation. Enjoy.
Environment setup
library(tidyverse) ##call the tidyverse package.
bangkok_hospital <- read.csv("bangkok_hospital.csv") ##load the dataset into R.
#The dataset is brief summary of hospitals, which are under Bangkok government in Thailand.
names(bangkok_hospital)
## [1] "X.ปฟid_9" "id_5" "dcode" "name"
## [5] "name_eng" "address" "bed_54" "bed_55"
## [9] "bed_56" "bed_57" "bed_58" "bed_59"
## [13] "bed_60" "bed_61" "out_patients51" "in_patients51"
## [17] "out_patients52" "in_patients52" "out_patients53" "in_patients53"
## [21] "out_patients54" "in_patients54" "out_patients55" "in_patients55"
## [25] "out_patients56" "in_patients56" "out_patients57" "in_patients57"
## [29] "out_patients58" "in_patients58" "out_patients59" "in_patients59"
## [33] "out_patients60" "in_patients60" "out_patients61" "in_patients61"
The name column contain each hospital name in Thai while name_eng in English. The bed_xx column contain each hospital capacity in term of number of beds while a suffix number xx stand for the last two digit of Buddha year. The out_patientsxx and the in_patientsxx represent the number of patients visit each hospital as an OPD and IPD respectively.Like bed_xx a suffix number xx also stand for the last two digit of Buddha year.
As you can see this type of data call wide table which is not tidy data yet. Because they are more than one columns that contain same type of data. I will manipulate this dataset to tidy dataset by pivot_longer fuction from dplyr package.
##For convenient I cut irrelevant columns out and delete last row because missingness.
bangkok_hospital <- bangkok_hospital %>%
select(-c(X.ปฟid_9,id_5,dcode,name,address))
bangkok_hospital <- bangkok_hospital[-10,]
bangkok_hospital
## name_eng bed_54 bed_55 bed_56 bed_57
## 1 Charoen Krung Pracharak 427 402 402 408
## 2 Bangkok Metropolitan Administration General 412 412 415 455
## 3 Taksin 402 430 430 461
## 4 Sirindhorn 243 226 230 230
## 5 Ratchaphiphat 132 132 132 135
## 6 Venerable Thawisak Jutindharo 100 100 100 118
## 7 Wetchakarunrat 76 76 76 76
## 8 Bangkok Metropolitan Administration Lat Krabang 60 60 60 60
## 9 Bangkhunthian Geriatric 0 0 0 0
## bed_58 bed_59 bed_60 bed_61 out_patients51 in_patients51 out_patients52
## 1 408 434 434 434 566631 20324 618145
## 2 446 452 475 475 489150 15657 511416
## 3 461 473 473 473 499787 21054 545452
## 4 243 228 242 324 231022 11770 265414
## 5 136 151 151 162 149230 6408 160758
## 6 120 126 126 126 121433 4365 126441
## 7 106 106 106 106 145660 4246 156963
## 8 60 60 60 60 147178 4661 145028
## 9 0 0 0 0 0 0 0
## in_patients52 out_patients53 in_patients53 out_patients54 in_patients54
## 1 21453 668379 22657 715289 21386
## 2 15257 527599 16456 594398 18231
## 3 20746 558261 20531 558094 18762
## 4 13345 348777 18270 349718 17215
## 5 6505 187235 7855 244510 9895
## 6 3971 140159 4506 236434 5160
## 7 4735 177198 4862 190213 4546
## 8 4396 147412 3287 142619 3426
## 9 0 0 0 0 0
## out_patients55 in_patients55 out_patients56 in_patients56 out_patients57
## 1 691768 20062 602080 16608 640438
## 2 598603 16964 624882 17946 707005
## 3 514690 17567 716363 20229 720888
## 4 337886 18555 378320 18143 384489
## 5 244223 8619 273523 9411 297424
## 6 271499 4831 259683 4756 331241
## 7 200215 3771 200556 3875 213467
## 8 166425 3224 173339 3849 217037
## 9 0 0 3725 0 10518
## in_patients57 out_patients58 in_patients58 out_patients59 in_patients59
## 1 15615 768409 21132 780026 20848
## 2 19492 655320 16355 654305 16082
## 3 20221 713909 19811 755730 20147
## 4 17259 413554 17280 452915 16802
## 5 9655 317148 9421 309508 10311
## 6 4495 343680 4541 380812 4493
## 7 3829 219742 3950 254720 4790
## 8 3901 201485 3548 220691 3764
## 9 0 15261 0 28401 0
## out_patients60 in_patients60 out_patients61 in_patients61
## 1 743879 19734 743854 20710
## 2 655100 15426 681995 16637
## 3 734525 19653 770257 20306
## 4 469425 15518 554520 17852
## 5 319425 10745 341451 11373
## 6 373528 4945 365690 5049
## 7 264784 5176 270583 4988
## 8 224260 3588 218091 3690
## 9 35744 0 57754 0
##Define vectors which represent opd and ipd columns name.
opd <- c("out_patients51","out_patients52","out_patients53",
"out_patients54","out_patients55","out_patients56",
"out_patients57","out_patients58","out_patients59",
"out_patients60","out_patients61")
ipd<- c("in_patients51","in_patients52","in_patients53",
"in_patients54","in_patients55","in_patients56",
"in_patients57","in_patients58","in_patients59",
"in_patients60","in_patients61")
##Create a separate table about number of beds capacity of each hospitals.
##Then transform wide table to long table with pivot_longer function
##with new column name year.
##Finally use str_remove for remove string "bed_" out of year column,
##the output is two number which stand for Buddha year
bangkok_hospital_bed <- bangkok_hospital %>%
select(name_eng,bed_54:bed_61)%>%
pivot_longer(cols = bed_54:bed_61,
names_to = "year",
values_to = "size_beds")
bangkok_hospital_bed$year <-str_remove(bangkok_hospital_bed$year, "bed_")
bangkok_hospital_bed
## # A tibble: 72 x 3
## name_eng year size_beds
## <chr> <chr> <int>
## 1 Charoen Krung Pracharak 54 427
## 2 Charoen Krung Pracharak 55 402
## 3 Charoen Krung Pracharak 56 402
## 4 Charoen Krung Pracharak 57 408
## 5 Charoen Krung Pracharak 58 408
## 6 Charoen Krung Pracharak 59 434
## 7 Charoen Krung Pracharak 60 434
## 8 Charoen Krung Pracharak 61 434
## 9 Bangkok Metropolitan Administration General 54 412
## 10 Bangkok Metropolitan Administration General 55 412
## # ... with 62 more rows
##Apply previous steps to opd and ipd column
bangkok_hospital_opd <- bangkok_hospital %>%
select(name_eng,opd)%>%
pivot_longer(cols = opd,
names_to = "year",
names_repair = "unique",
values_to = "opd_visit")
bangkok_hospital_opd$year <-str_remove(bangkok_hospital_opd$year,"out_patients")
bangkok_hospital_opd
## # A tibble: 99 x 3
## name_eng year opd_visit
## <chr> <chr> <int>
## 1 Charoen Krung Pracharak 51 566631
## 2 Charoen Krung Pracharak 52 618145
## 3 Charoen Krung Pracharak 53 668379
## 4 Charoen Krung Pracharak 54 715289
## 5 Charoen Krung Pracharak 55 691768
## 6 Charoen Krung Pracharak 56 602080
## 7 Charoen Krung Pracharak 57 640438
## 8 Charoen Krung Pracharak 58 768409
## 9 Charoen Krung Pracharak 59 780026
## 10 Charoen Krung Pracharak 60 743879
## # ... with 89 more rows
bangkok_hospital_ipd<- bangkok_hospital %>%
select(name_eng,ipd)%>%
pivot_longer(cols = ipd,
names_to = "year",
names_repair = "minimal",
values_to = "ipd_visit")
bangkok_hospital_ipd$year<- str_remove(bangkok_hospital_ipd$year,"in_patients")
bangkok_hospital_ipd
## # A tibble: 99 x 3
## name_eng year ipd_visit
## <chr> <chr> <int>
## 1 Charoen Krung Pracharak 51 20324
## 2 Charoen Krung Pracharak 52 21453
## 3 Charoen Krung Pracharak 53 22657
## 4 Charoen Krung Pracharak 54 21386
## 5 Charoen Krung Pracharak 55 20062
## 6 Charoen Krung Pracharak 56 16608
## 7 Charoen Krung Pracharak 57 15615
## 8 Charoen Krung Pracharak 58 21132
## 9 Charoen Krung Pracharak 59 20848
## 10 Charoen Krung Pracharak 60 19734
## # ... with 89 more rows
##Join three table together.
opd_ipd <- bangkok_hospital_opd %>%
inner_join(bangkok_hospital_ipd,by =c("name_eng","year"))
bangkok_hospital_tidy <- opd_ipd %>%
left_join(bangkok_hospital_bed,by =c("name_eng","year"))
##Now I got tidy data
bangkok_hospital_tidy
## # A tibble: 99 x 5
## name_eng year opd_visit ipd_visit size_beds
## <chr> <chr> <int> <int> <int>
## 1 Charoen Krung Pracharak 51 566631 20324 NA
## 2 Charoen Krung Pracharak 52 618145 21453 NA
## 3 Charoen Krung Pracharak 53 668379 22657 NA
## 4 Charoen Krung Pracharak 54 715289 21386 427
## 5 Charoen Krung Pracharak 55 691768 20062 402
## 6 Charoen Krung Pracharak 56 602080 16608 402
## 7 Charoen Krung Pracharak 57 640438 15615 408
## 8 Charoen Krung Pracharak 58 768409 21132 408
## 9 Charoen Krung Pracharak 59 780026 20848 434
## 10 Charoen Krung Pracharak 60 743879 19734 434
## # ... with 89 more rows
##For more tidy just turn Buddha year into CE
#Change data class form chr to number for computation
bangkok_hospital_tidy$year<-as.numeric(bangkok_hospital_tidy$year)
bangkok_hospital_tidy <- bangkok_hospital_tidy %>%
mutate("CE_year"= year+2500-543)%>%
select(-year)%>%
relocate(CE_year,.after=name_eng)
##Done
bangkok_hospital_tidy
## # A tibble: 99 x 5
## name_eng CE_year opd_visit ipd_visit size_beds
## <chr> <dbl> <int> <int> <int>
## 1 Charoen Krung Pracharak 2008 566631 20324 NA
## 2 Charoen Krung Pracharak 2009 618145 21453 NA
## 3 Charoen Krung Pracharak 2010 668379 22657 NA
## 4 Charoen Krung Pracharak 2011 715289 21386 427
## 5 Charoen Krung Pracharak 2012 691768 20062 402
## 6 Charoen Krung Pracharak 2013 602080 16608 402
## 7 Charoen Krung Pracharak 2014 640438 15615 408
## 8 Charoen Krung Pracharak 2015 768409 21132 408
## 9 Charoen Krung Pracharak 2016 780026 20848 434
## 10 Charoen Krung Pracharak 2017 743879 19734 434
## # ... with 89 more rows