I started learning about tidyverse package for a while and trying to practice data manipulation by using real life data set. I found this dataset is a good example of untidy data that can be manipulated by pivot function of dplyr package, a adjunct package of tidyverse.So I write this article to provide an example of how to use pivot function for data manipulation. Enjoy.

Environment setup

library(tidyverse) ##call the tidyverse package.
bangkok_hospital <- read.csv("bangkok_hospital.csv") ##load the dataset into R.
#The dataset is brief summary of hospitals, which are under Bangkok government in Thailand. 
names(bangkok_hospital)
##  [1] "X.ปฟid_9"       "id_5"           "dcode"          "name"          
##  [5] "name_eng"       "address"        "bed_54"         "bed_55"        
##  [9] "bed_56"         "bed_57"         "bed_58"         "bed_59"        
## [13] "bed_60"         "bed_61"         "out_patients51" "in_patients51" 
## [17] "out_patients52" "in_patients52"  "out_patients53" "in_patients53" 
## [21] "out_patients54" "in_patients54"  "out_patients55" "in_patients55" 
## [25] "out_patients56" "in_patients56"  "out_patients57" "in_patients57" 
## [29] "out_patients58" "in_patients58"  "out_patients59" "in_patients59" 
## [33] "out_patients60" "in_patients60"  "out_patients61" "in_patients61"

The name column contain each hospital name in Thai while name_eng in English. The bed_xx column contain each hospital capacity in term of number of beds while a suffix number xx stand for the last two digit of Buddha year. The out_patientsxx and the in_patientsxx represent the number of patients visit each hospital as an OPD and IPD respectively.Like bed_xx a suffix number xx also stand for the last two digit of Buddha year.

As you can see this type of data call wide table which is not tidy data yet. Because they are more than one columns that contain same type of data. I will manipulate this dataset to tidy dataset by pivot_longer fuction from dplyr package.

##For convenient I cut irrelevant columns out and delete last row because missingness.
bangkok_hospital <- bangkok_hospital %>%
                    select(-c(X.ปฟid_9,id_5,dcode,name,address))
bangkok_hospital <- bangkok_hospital[-10,]
bangkok_hospital
##                                          name_eng bed_54 bed_55 bed_56 bed_57
## 1                         Charoen Krung Pracharak    427    402    402    408
## 2     Bangkok Metropolitan Administration General    412    412    415    455
## 3                                          Taksin    402    430    430    461
## 4                                      Sirindhorn    243    226    230    230
## 5                                   Ratchaphiphat    132    132    132    135
## 6                   Venerable Thawisak Jutindharo    100    100    100    118
## 7                                  Wetchakarunrat     76     76     76     76
## 8 Bangkok Metropolitan Administration Lat Krabang     60     60     60     60
## 9                         Bangkhunthian Geriatric      0      0      0      0
##   bed_58 bed_59 bed_60 bed_61 out_patients51 in_patients51 out_patients52
## 1    408    434    434    434         566631         20324         618145
## 2    446    452    475    475         489150         15657         511416
## 3    461    473    473    473         499787         21054         545452
## 4    243    228    242    324         231022         11770         265414
## 5    136    151    151    162         149230          6408         160758
## 6    120    126    126    126         121433          4365         126441
## 7    106    106    106    106         145660          4246         156963
## 8     60     60     60     60         147178          4661         145028
## 9      0      0      0      0              0             0              0
##   in_patients52 out_patients53 in_patients53 out_patients54 in_patients54
## 1         21453         668379         22657         715289         21386
## 2         15257         527599         16456         594398         18231
## 3         20746         558261         20531         558094         18762
## 4         13345         348777         18270         349718         17215
## 5          6505         187235          7855         244510          9895
## 6          3971         140159          4506         236434          5160
## 7          4735         177198          4862         190213          4546
## 8          4396         147412          3287         142619          3426
## 9             0              0             0              0             0
##   out_patients55 in_patients55 out_patients56 in_patients56 out_patients57
## 1         691768         20062         602080         16608         640438
## 2         598603         16964         624882         17946         707005
## 3         514690         17567         716363         20229         720888
## 4         337886         18555         378320         18143         384489
## 5         244223          8619         273523          9411         297424
## 6         271499          4831         259683          4756         331241
## 7         200215          3771         200556          3875         213467
## 8         166425          3224         173339          3849         217037
## 9              0             0           3725             0          10518
##   in_patients57 out_patients58 in_patients58 out_patients59 in_patients59
## 1         15615         768409         21132         780026         20848
## 2         19492         655320         16355         654305         16082
## 3         20221         713909         19811         755730         20147
## 4         17259         413554         17280         452915         16802
## 5          9655         317148          9421         309508         10311
## 6          4495         343680          4541         380812          4493
## 7          3829         219742          3950         254720          4790
## 8          3901         201485          3548         220691          3764
## 9             0          15261             0          28401             0
##   out_patients60 in_patients60 out_patients61 in_patients61
## 1         743879         19734         743854         20710
## 2         655100         15426         681995         16637
## 3         734525         19653         770257         20306
## 4         469425         15518         554520         17852
## 5         319425         10745         341451         11373
## 6         373528          4945         365690          5049
## 7         264784          5176         270583          4988
## 8         224260          3588         218091          3690
## 9          35744             0          57754             0
##Define vectors which represent opd and ipd columns name.
opd <- c("out_patients51","out_patients52","out_patients53",
         "out_patients54","out_patients55","out_patients56",
         "out_patients57","out_patients58","out_patients59",
         "out_patients60","out_patients61")

ipd<- c("in_patients51","in_patients52","in_patients53",
        "in_patients54","in_patients55","in_patients56",
        "in_patients57","in_patients58","in_patients59",
        "in_patients60","in_patients61")


##Create a separate table about number of beds capacity of each hospitals.
##Then transform wide table to long table with pivot_longer function
##with new column name year.
##Finally use str_remove for remove string "bed_" out of year column, 
##the output is two number which stand for Buddha year
bangkok_hospital_bed <- bangkok_hospital %>%
    select(name_eng,bed_54:bed_61)%>%
    pivot_longer(cols = bed_54:bed_61,
                 names_to = "year",
                 values_to = "size_beds")
bangkok_hospital_bed$year <-str_remove(bangkok_hospital_bed$year, "bed_")
bangkok_hospital_bed
## # A tibble: 72 x 3
##    name_eng                                    year  size_beds
##    <chr>                                       <chr>     <int>
##  1 Charoen Krung Pracharak                     54          427
##  2 Charoen Krung Pracharak                     55          402
##  3 Charoen Krung Pracharak                     56          402
##  4 Charoen Krung Pracharak                     57          408
##  5 Charoen Krung Pracharak                     58          408
##  6 Charoen Krung Pracharak                     59          434
##  7 Charoen Krung Pracharak                     60          434
##  8 Charoen Krung Pracharak                     61          434
##  9 Bangkok Metropolitan Administration General 54          412
## 10 Bangkok Metropolitan Administration General 55          412
## # ... with 62 more rows
##Apply previous steps to opd and ipd column
bangkok_hospital_opd <- bangkok_hospital %>% 
    select(name_eng,opd)%>%
    pivot_longer(cols = opd, 
                 names_to = "year",
                 names_repair = "unique",
                 values_to = "opd_visit")
bangkok_hospital_opd$year <-str_remove(bangkok_hospital_opd$year,"out_patients")
bangkok_hospital_opd
## # A tibble: 99 x 3
##    name_eng                year  opd_visit
##    <chr>                   <chr>     <int>
##  1 Charoen Krung Pracharak 51       566631
##  2 Charoen Krung Pracharak 52       618145
##  3 Charoen Krung Pracharak 53       668379
##  4 Charoen Krung Pracharak 54       715289
##  5 Charoen Krung Pracharak 55       691768
##  6 Charoen Krung Pracharak 56       602080
##  7 Charoen Krung Pracharak 57       640438
##  8 Charoen Krung Pracharak 58       768409
##  9 Charoen Krung Pracharak 59       780026
## 10 Charoen Krung Pracharak 60       743879
## # ... with 89 more rows
bangkok_hospital_ipd<- bangkok_hospital %>% 
    select(name_eng,ipd)%>%
    pivot_longer(cols =  ipd,
                 names_to = "year",
                 names_repair = "minimal",
                 values_to = "ipd_visit")
bangkok_hospital_ipd$year<- str_remove(bangkok_hospital_ipd$year,"in_patients")
bangkok_hospital_ipd
## # A tibble: 99 x 3
##    name_eng                year  ipd_visit
##    <chr>                   <chr>     <int>
##  1 Charoen Krung Pracharak 51        20324
##  2 Charoen Krung Pracharak 52        21453
##  3 Charoen Krung Pracharak 53        22657
##  4 Charoen Krung Pracharak 54        21386
##  5 Charoen Krung Pracharak 55        20062
##  6 Charoen Krung Pracharak 56        16608
##  7 Charoen Krung Pracharak 57        15615
##  8 Charoen Krung Pracharak 58        21132
##  9 Charoen Krung Pracharak 59        20848
## 10 Charoen Krung Pracharak 60        19734
## # ... with 89 more rows
##Join three table together.
opd_ipd <- bangkok_hospital_opd %>%
                inner_join(bangkok_hospital_ipd,by =c("name_eng","year"))

bangkok_hospital_tidy <- opd_ipd %>%
                        left_join(bangkok_hospital_bed,by =c("name_eng","year"))

##Now I got tidy data
bangkok_hospital_tidy
## # A tibble: 99 x 5
##    name_eng                year  opd_visit ipd_visit size_beds
##    <chr>                   <chr>     <int>     <int>     <int>
##  1 Charoen Krung Pracharak 51       566631     20324        NA
##  2 Charoen Krung Pracharak 52       618145     21453        NA
##  3 Charoen Krung Pracharak 53       668379     22657        NA
##  4 Charoen Krung Pracharak 54       715289     21386       427
##  5 Charoen Krung Pracharak 55       691768     20062       402
##  6 Charoen Krung Pracharak 56       602080     16608       402
##  7 Charoen Krung Pracharak 57       640438     15615       408
##  8 Charoen Krung Pracharak 58       768409     21132       408
##  9 Charoen Krung Pracharak 59       780026     20848       434
## 10 Charoen Krung Pracharak 60       743879     19734       434
## # ... with 89 more rows
##For more tidy just turn Buddha year into CE
#Change data class form chr to number for computation
bangkok_hospital_tidy$year<-as.numeric(bangkok_hospital_tidy$year)
bangkok_hospital_tidy <- bangkok_hospital_tidy %>%
                          mutate("CE_year"= year+2500-543)%>%
                            select(-year)%>%
                             relocate(CE_year,.after=name_eng)
##Done
bangkok_hospital_tidy
## # A tibble: 99 x 5
##    name_eng                CE_year opd_visit ipd_visit size_beds
##    <chr>                     <dbl>     <int>     <int>     <int>
##  1 Charoen Krung Pracharak    2008    566631     20324        NA
##  2 Charoen Krung Pracharak    2009    618145     21453        NA
##  3 Charoen Krung Pracharak    2010    668379     22657        NA
##  4 Charoen Krung Pracharak    2011    715289     21386       427
##  5 Charoen Krung Pracharak    2012    691768     20062       402
##  6 Charoen Krung Pracharak    2013    602080     16608       402
##  7 Charoen Krung Pracharak    2014    640438     15615       408
##  8 Charoen Krung Pracharak    2015    768409     21132       408
##  9 Charoen Krung Pracharak    2016    780026     20848       434
## 10 Charoen Krung Pracharak    2017    743879     19734       434
## # ... with 89 more rows