Data Analysis of Medication Prescriptions

How it starts

This pet project started from a desire to track what my daughter’s doctors were prescribing her. My daughter Hayoon spent most of her days in daycare and would often end up catching various viruses or developing skin issues. Sometimes she needed to visit the doctor once or twice each week. In Korea, it is incredibly easy to and convenient to visit any doctor or clinic of your choice, but there is never enough time to deeply consult with medical professionals concerning the health issues that brought you in for their care. Each time we left the doctor’s office with a new prescription and I began to wonder exactly what kind of medicine was being prescribed. Even though I asked the doctors to share my daughter’s EMR data with me, I could not get them to share a digital version, so I needed to collect all of her paper prescriptions and type them into a spreadsheet by hand.

Questions

What diseases were afflicting her?
What medicines were being prescribed?
What are the trends in medicine type and amount?

Procedures of Visual information

I collected all medication prescriptions and analyzed the data to see what the relation was between disease codes and medications prescribed. I first collected the data by hand then analyzed and visualized them with R programming.

The project led me through the whole process of data analysis from collecting raw data, processing the data for analyzing, actual analysis and visualization. It encouraged me to join an R community and study statistics and R programming language.

Also, this personal project revealed to me the issue of health autonomy and medical data ownership when I was not allowed to access to my own daughter’s medical data. It showed me the true value of data, and led me to believe the individual should have some rights over it. It beckons me to learn more about the data market and data technology of the healthcare industry. It even led me to learn more about medical data saving controversies that erupted in Korea and about an uptick in medicinal overdoses due to “medical shopping”.

Data collection

Whenever one gets a prescription in Korea, they are always presented with two copies. One is for the patient and the other is for the drug store. I collected almost a year’s worth of paper prescription from my daughter’s doctor visits and transcribed the data into a spreadsheet.

1. 데이터 수집: 처방전 인쇄물 디지털 DB화

질병종류: 질병분류기호
의사종류: 선생님 이름 4-5명
약의 종류: 약 이름으로 약 고유번호 검색

I used ten variables for column headers, with key variables being “disease code” and “medicine name” to verify the existence of a correlation.

Data wrangling and analysis

Making a chart requires knowledge of statistics, and making them attractive requires visual literacy and graphic skills. I chose R programming language so that I could use a diversity of charts to make beautiful visualizations. Using graphic tools to create charts is both slow and imprecise. By using R, I could become more familiar with statistics and programming at the same time.

2. 데이터 분석

기초 통계
EDA: 시각화
블로깅으로 과정과 시행착오 정리

Data visualization

Before compiling the data, I had no idea what to ask or suspect. While wrangling the data, questions began to arise like why a certain diagnosis occurred so frequently, or the different prescribing habits of each doctor. In making these charts, not only were trends revealed, but it also brought new thoughts and questions to mind, deepening my desire to learn more.

필요한 패키지

readxl 패키지: 엑셀 파일 불러오기
tidyverse 패키지: ggplot2 포함해 tidya data를 만들기 위한 해들리 위컴의 패키지

library(readxl)

Warning: package 'readxl' was built under R version 3.5.2

library(tidyverse)

─ Attaching packages ────────────────────────────────────── tidyverse 1.2.1 ─

✔ ggplot2 3.2.1     ✔ purrr   0.2.5
✔ tibble  2.1.3     ✔ dplyr   0.8.3
✔ tidyr   0.8.1     ✔ stringr 1.3.1
✔ readr   1.1.1     ✔ forcats 0.3.0

Warning: package 'ggplot2' was built under R version 3.5.2

Warning: package 'tibble' was built under R version 3.5.2

Warning: package 'dplyr' was built under R version 3.5.2

─ Conflicts ──────────────────────────────────────── tidyverse_conflicts() ─
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

폰트 설정 패키지, 폰트설정

par(family="AppleGothic")

*엑셀 데이터 가져오기

hayoon<-read_excel("hayoon_hist.xlsx")

New names:
* `` -> ...13
* `` -> ...14
* `` -> ...15
* `` -> ...16
* `` -> ...17
* … and 3 more problems

str(hayoon)

Classes 'tbl_df', 'tbl' and 'data.frame':   156 obs. of  20 variables:
 $ date       : chr  "20190912" "20190912" "20190912" "20190912" ...
 $ persc      : chr  "04854호" "04854호" "04854호" "04854호" ...
 $ hosp       : chr  "아주맑은소아청소년과의원" "아주맑은소아청소년과의원" "아주맑은소아청소년과의원" "아주맑은소아청소년과의원" ...
 $ disease    : chr  "J303, R093" "J303, R093" "J303, R093" "J303, R093" ...
 $ doc        : chr  "김병기" "김병기" "김병기" "김병기" ...
 $ medic      : num  6.49e+08 6.44e+08 6.46e+08 6.46e+08 6.44e+08 ...
 $ medic_name : chr  "디푸루칸건조시럽" "레브로콜정" "코비안에스시럽" "씨투스건조시럽" ...
 $ once       : chr  "4" "0.5" "4" "0.33329999999999999" ...
 $ one_day    : num  1 3 3 3 3 3 3 3 3 3 ...
 $ total_count: num  4 4 4 4 4 3 5 5 5 5 ...
 $ no_ins     : chr  NA NA NA NA ...
 $ double     : chr  NA NA NA NA ...
 $ ...13      : logi  NA NA NA NA NA NA ...
 $ ...14      : logi  NA NA NA NA NA NA ...
 $ ...15      : logi  NA NA NA NA NA NA ...
 $ ...16      : logi  NA NA NA NA NA NA ...
 $ ...17      : logi  NA NA NA NA NA NA ...
 $ ...18      : logi  NA NA NA NA NA NA ...
 $ ...19      : logi  NA NA NA NA NA NA ...
 $ ...20      : logi  NA NA NA NA NA NA ...

#단순 통계 
names(hayoon) #칼럼명만 보기

 [1] "date"        "persc"       "hosp"        "disease"     "doc"        
 [6] "medic"       "medic_name"  "once"        "one_day"     "total_count"
[11] "no_ins"      "double"      "...13"       "...14"       "...15"      
[16] "...16"       "...17"       "...18"       "...19"       "...20"

dim(hayoon) # 행렬 개수

[1] 156  20

# group_by()와 같은 결과, excel의 countif()함수
table(hayoon$disease)


 H6501, J209  H6501, J459 H6501, L2088  H6641, J209  H6641, J303 
           6           11            7            5           22 
  J180, J459   J209, J060   J209, J303   J209, J459   J303, R093 
           6           10            9            6           12 
  J459, J459   L209, J209   L509, L039   R113, A090 
          33           17            6            4

*날짜를 Data값으로 변환하기 “data” 명칭 있는 칼럼 찾기

library(stringr) #문자열 패턴 찾기  패키지 
# date 란 단어가 있는 칼럼 모두 찾기 
date_cols <- str_detect(names(hayoon), "date")
date_cols

 [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

lubridate: charactor를 날짜로 변환하는 패키지 ymd: year-month-date롤 변환

library(lubridate)  #chr를 날짜로 변환하는 패키지


Attaching package: 'lubridate'

The following object is masked from 'package:base':

    date

# 하윤 데이터 칼럼에 date란 단어가 든 칼럼 날짜로 변환하기 
hayoon[ ,date_cols] <- lapply(hayoon[ ,date_cols], ymd) #lappy(데이터셋, 함수)

Warning: 2 failed to parse.

hayoon[ ,date_cols]

# A tibble: 156 x 1
   date      
   <date>    
 1 2019-09-12
 2 2019-09-12
 3 2019-09-12
 4 2019-09-12
 5 2019-09-12
 6 2019-09-10
 7 2019-09-10
 8 2019-09-10
 9 2019-09-10
10 2019-09-10
# … with 146 more rows

print(hayoon[ ,date_cols], n=5)

# A tibble: 156 x 1
  date      
  <date>    
1 2019-09-12
2 2019-09-12
3 2019-09-12
4 2019-09-12
5 2019-09-12
# … with 151 more rows

str(hayoon)

Classes 'tbl_df', 'tbl' and 'data.frame':   156 obs. of  20 variables:
 $ date       : Date, format: "2019-09-12" "2019-09-12" ...
 $ persc      : chr  "04854호" "04854호" "04854호" "04854호" ...
 $ hosp       : chr  "아주맑은소아청소년과의원" "아주맑은소아청소년과의원" "아주맑은소아청소년과의원" "아주맑은소아청소년과의원" ...
 $ disease    : chr  "J303, R093" "J303, R093" "J303, R093" "J303, R093" ...
 $ doc        : chr  "김병기" "김병기" "김병기" "김병기" ...
 $ medic      : num  6.49e+08 6.44e+08 6.46e+08 6.46e+08 6.44e+08 ...
 $ medic_name : chr  "디푸루칸건조시럽" "레브로콜정" "코비안에스시럽" "씨투스건조시럽" ...
 $ once       : chr  "4" "0.5" "4" "0.33329999999999999" ...
 $ one_day    : num  1 3 3 3 3 3 3 3 3 3 ...
 $ total_count: num  4 4 4 4 4 3 5 5 5 5 ...
 $ no_ins     : chr  NA NA NA NA ...
 $ double     : chr  NA NA NA NA ...
 $ ...13      : logi  NA NA NA NA NA NA ...
 $ ...14      : logi  NA NA NA NA NA NA ...
 $ ...15      : logi  NA NA NA NA NA NA ...
 $ ...16      : logi  NA NA NA NA NA NA ...
 $ ...17      : logi  NA NA NA NA NA NA ...
 $ ...18      : logi  NA NA NA NA NA NA ...
 $ ...19      : logi  NA NA NA NA NA NA ...
 $ ...20      : logi  NA NA NA NA NA NA ...

summary(hayoon)

      date               persc               hosp          
 Min.   :2018-12-10   Length:156         Length:156        
 1st Qu.:2019-04-02   Class :character   Class :character  
 Median :2019-06-01   Mode  :character   Mode  :character  
 Mean   :2019-06-04                                        
 3rd Qu.:2019-08-19                                        
 Max.   :2019-12-26                                        
 NA's   :2                                                 
   disease              doc                medic          
 Length:156         Length:156         Min.   : 64350611  
 Class :character   Class :character   1st Qu.:643504180  
 Mode  :character   Mode  :character   Median :643506111  
                                       Mean   :641438965  
                                       3rd Qu.:645700935  
                                       Max.   :671803433  
                                       NA's   :2          
  medic_name            once              one_day       total_count   
 Length:156         Length:156         Min.   :1.000   Min.   :1.000  
 Class :character   Class :character   1st Qu.:3.000   1st Qu.:3.000  
 Mode  :character   Mode  :character   Median :3.000   Median :3.000  
                                       Mean   :2.701   Mean   :3.131  
                                       3rd Qu.:3.000   3rd Qu.:3.000  
                                       Max.   :3.000   Max.   :5.000  
                                       NA's   :2       NA's   :3      
    no_ins             double           ...13          ...14        
 Length:156         Length:156         Mode:logical   Mode:logical  
 Class :character   Class :character   NA's:156       NA's:156      
 Mode  :character   Mode  :character                                
                                                                    
                                                                    
                                                                    
                                                                    
  ...15          ...16          ...17          ...18        
 Mode:logical   Mode:logical   Mode:logical   Mode:logical  
 NA's:156       NA's:156       NA's:156       NA's:156      
                                                            
                                                            
                                                            
                                                            
                                                            
  ...19          ...20        
 Mode:logical   Mode:logical  
 NA's:156       NA's:156

*missing value 찾기

# missing value 찾기
missing <- lapply(hayoon[,date_cols], is.na)
missing

$date
  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [45] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [67] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [78] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [89] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[100] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[111] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[122] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[144] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[155]  TRUE  TRUE

num_missing <- sapply(missing, sum)
num_missing

date 
   2

*불필요한 행열 제거

# 불필요한 칼럼 제거 
remove <- c(13:20)
# 불필요한 행(row) 제거 
remove_row<-c(155:156)

cleanHayoonL<-hayoon[-remove_row, -remove]

str(cleanHayoonL)

Classes 'tbl_df', 'tbl' and 'data.frame':   154 obs. of  12 variables:
 $ date       : Date, format: "2019-09-12" "2019-09-12" ...
 $ persc      : chr  "04854호" "04854호" "04854호" "04854호" ...
 $ hosp       : chr  "아주맑은소아청소년과의원" "아주맑은소아청소년과의원" "아주맑은소아청소년과의원" "아주맑은소아청소년과의원" ...
 $ disease    : chr  "J303, R093" "J303, R093" "J303, R093" "J303, R093" ...
 $ doc        : chr  "김병기" "김병기" "김병기" "김병기" ...
 $ medic      : num  6.49e+08 6.44e+08 6.46e+08 6.46e+08 6.44e+08 ...
 $ medic_name : chr  "디푸루칸건조시럽" "레브로콜정" "코비안에스시럽" "씨투스건조시럽" ...
 $ once       : chr  "4" "0.5" "4" "0.33329999999999999" ...
 $ one_day    : num  1 3 3 3 3 3 3 3 3 3 ...
 $ total_count: num  4 4 4 4 4 3 5 5 5 5 ...
 $ no_ins     : chr  NA NA NA NA ...
 $ double     : chr  NA NA NA NA ...

dim(cleanHayoonL)

[1] 154  12

*데이터 타입 정리

#질병분류기호 factor로 만들기 
cleanHayoonL$medic<-as.factor(cleanHayoonL$medic)
#질병분류기호 빈도 보기 
medic_freq<-table(cleanHayoonL$medic)

theme_ft <- theme_light()+
        theme(axis.title = element_text(size=8, face="bold"),
        axis.text.x = element_text(size=6, angle=90),
        axis.text.y = element_text(size=6),
        plot.title = element_text(size=10, color="#333333", face="bold.italic"))

Q. Which diseases were diagnosed?

*Chart 01: stacked bar

Disease codes are selected by the doctor to prescribe medicine. In the patient prescription copy, there were two major disease codes. To find out what disease she had most often in a year, I made a stack bar chart: the x axis is disease code and the y axis is the number of times it appears. Disease codes are the combination of alphabet and numbers, I factorized them to make a bar chart. The graph shows how often each disease code occured.

질병분류기호의 칼럼명은 disease이다. 막대챠트를 그리기 위해서는 카테고리 데이터가 필요하므로 factor로 변환시켜 주어 x좌표에 위치시킨다. 의약품의 칼럼명은 medic이다. 막대그래프안에 의약품별로 분류하기 위해 fill함수로 추가한다. 챠트를 보면 핑크색, 초록색 의약품이 전반적으로 씌여있다. 핑크색이 씌여지지 않은 막대 4, 5번이 눈에 띈다.

ggplot(cleanHayoonL, aes(x=factor(disease), fill=medic))+
  geom_bar()+
  labs(x='Disease codes',
       y='Count of prescription',
       title='Which diseases were diagnosed?',
       fill ='Drug codes')+
  theme_ft+
  scale_fill_hue(h=c(0,300))

Q. Which medications were prescribed for each disease code?

*Chart 02: bar, dodge

The positions were adjusted by dodging overlaps to the side. This makes it possible to more accurately know the type and dosage of drug prescribed by disease code classification.

geom_bar함수 안에 position=“dodge”로 설정하면 factor 요소가 분리되어 표현된다. 질병분류별 처방된 약의 종류와 용량을 더 정확하게 알 수 있다. geom_bar에 아무런 인자를 넣지 않으면 position=“stack”이 디폴트가 된다. 상단 챠트와 같이 그려진다.

ggplot(cleanHayoonL, aes(x=factor(disease), fill=medic))+geom_bar(position="dodge")+
    labs(x='Disease codes',
       y='Count of prescription',
       title='Which medications were prescribed for each disease code?',
       fill ='Drug codes')+
  theme_light()+
  theme_ft+
  scale_fill_hue(h=c(0,300))

하지만 약의 종류가 많아서 제대로 알아 볼 수가 없다. 어떻게 해야 할까?

Q. Which medications were prescribed the most?

*Chart 03:

The proper quantity could not be directly compared between different medications, so I utilized a ratio comparison. Also, there were 32 different medications; too many to simply convey information by assigning each medication a distinct color. An overload of colors would not help to differentiate between each medication. Using only one color was easier on the eyes, and I was able to differentiate between each one by adjusting the opacity. I used the color palette package to achieve this effect.

의약품 개수가 많으므로 색을 조정해 보기로 했다. 색 팔레트 패키지 설치 install.packages(“RColorBrewer”) position=“fill”로 하여 막대의 길이를 동일하게 해서 각 처방날 마다 약의 비율만 비교해 보기로 했다.

library(RColorBrewer)
display.brewer.all(n=10, exact.n=FALSE)

# 색깔 팔레트 선택해서 넣기
blues<-brewer.pal(9,"Blues")
blue_range<-colorRampPalette(blues)
ggplot(cleanHayoonL, aes(x=factor(disease), fill=medic))+
  geom_bar(position="fill")+
  labs(x='Disease codes',
       y='Count of prescription',
       title='Which medications were prescribed the most?',
       fill = 'Drug Codes')+  
  theme_ft+
  scale_fill_manual(values=blue_range(32)) #32개 의약품 색처리

다른 색으로도 챠트를 그려보자.

*Chart 03-2:

RColorBrewer패키지에서 색 셋트를 불러오는 함수는 brewer.pal(n, name)이다. n: 사용하려는 색의 개수, name: 팔레트 이름

#팔레트 "PuOr" 셋트를 선택하고, 그 중에 10개의 색을 기본으로 선택한다.
cols<-brewer.pal(10, "PuOr")

#선택한 색을 colorRampPalette의 함수로 interpolate 하도록 만든다.
# colorRampPalette함수는 가져온 색 셋트에서 몇 개의 색을 뿌려줄지 선택하면, 그 사이의 색을 섞어서 보여준다.

pal_m<-colorRampPalette(cols)

ggplot(cleanHayoonL, aes(x=factor(disease), fill=medic))+
  geom_bar(position="fill")+
  labs(x='Disease codes',
       y='Count of prescription',
       title='Which medications were prescribed the most?',
       fill = 'Drug Codes')+
  scale_fill_manual(values=pal_m(32)) + #선택한 팔렛트에서 32개의 색을 사용한다고 선택한다.
  theme_ft

Q. What kind of medicine were prescriped per disease codes?

*Chart 04:

I decided to sort each medication by disease classification code. Each box represents a disease code combination by doctor per visit. Multi-panel plots looked better than one bar chart to compare disease codes.

질병분류코드별로 의약품을 나누어 보기로 했다. 각 박스는 질병분류코드를 의미한다. x=처방받은 날짜, y=한번에 먹어야 하는 약의 용량, 색깔코드=의약품코드 질병코드별로 처방된 날짜와 의약품 보기가 훨씬 쉬워졌다.

class(cleanHayoonL$once)

[1] "character"

once_n<-as.numeric(cleanHayoonL$once)

Warning: 강제형변환에 의해 생성된 NA 입니다

scaleFUN <- function(x) sprintf("%.2f", x)

# 질병코드별 facet 그리드
ggplot(cleanHayoonL, aes(date, once_n, col=medic))+
  geom_jitter(alpha=0.5, size=3, width=0.2)+
  facet_wrap(~disease)+
  labs(x='Dates',
       y='One dose amount',
       title='What kind of medicine were prescriped per disease codes?',
       color = 'Drug Codes')+
  theme_ft+
  scale_y_continuous(labels=scaleFUN)

Warning: Removed 1 rows containing missing values (geom_point).

Q. What kind of medicine were prescriped per disease codes?

*Chart 04-2:

의사별로 어떤 질병코드로 처방했을까?

# 의사별 분류
ggplot(cleanHayoonL, aes(date, once_n, col=doc))+
  geom_point(alpha=0.5, size=3, width=0.2)+
  facet_wrap(~disease)+
  labs(x='Dates',
     y='One dose amount',
     title='Which disease codes were prescribed by doctors?',
     color = 'Doctor name')+
  theme_ft+
  scale_y_continuous(labels=scaleFUN)+
  theme(legend.text = element_text(family="AppleGothic"))

Warning: Ignoring unknown parameters: width

Warning: Removed 1 rows containing missing values (geom_point).

Visual literacy

*Chart 05

Disease codes are like blocks. The blocks can be combined with different codes for each doctor encounter. The graph below shows each code combination. For example, the code H6501 was diagnosed with J209, J459, and L2088 on different encounters. The density of color represents the number of medication administrations per day. The code H6501 represents otitis media and J209 is bronchitis. The graph shows that bronchitis was diagnosed several times that otitis media was also diagnosed.

좀 더 쉽게 볼 수 있는 챠트는 없을까?

x축, y축: 질병분류코드 색: 1일 투여횟수 (모든 의약품)

하루 섭취량이 가장 높은게 진한 색이어야 하는데, 반대로 되어있다.

#히트맵
cleanHayoonS<-separate(data=cleanHayoonL, col = disease, sep=", ", into = c("disease_1", "disease_2"))

ggplot(cleanHayoonS, aes(x=disease_1, disease_2, fill=one_day))+
  geom_tile()+
  labs(x="Disease Code 1", y="Disease Code 2", fill="Number of doses per day",
       title="Which disease has the most frequency to take per day?")+
  theme_ft

낮은 수는 연한 하늘색, 높은 수는 진한 하늘색으로 gradient를 준다.

ggplot(cleanHayoonS, aes(x=factor(disease_1), y=factor(disease_2), fill=one_day))+
  geom_tile()+
  labs(x="Disease Code 1", y="Disease Code 2", fill="Number of doses per day",
       title="Which disease has the most frequency to take per day?")+
  theme_ft+
  scale_fill_gradient(low="#bfddff", high="#002e63")

그래프 x축 첫번째 값인 H6501 질병코드는 J209, J459, L2088과 조합해서 처방된 적이 있다는 것을 알 수 있다.

혹여나 해서 다시 필터링해 보았다. H6501 은 급성 장액성 중이염 (양쪽 귀)를 의미하는 질병코드이다. 주요질병 중이염이 걸렸을 때 J209: 상세불명의 기관지염, J459: 상세불명의 천식, L2088: 기타 아토피성 피부염을 부질병으로 처방했다는 뜻이다. 색의 진하기를 의미하는 바는 하루의 1일 투여횟수를 의미한다. 진한 파랑색은 1일 투여횟수가 많다는 의미다. 1일 투여되는 약은 처방된 모든 의약품이다. 즉, 진한 파랑색으로 유추할 수 있는 것은

처방되는 약의 종류가 많다는 것을 우회적으로 유추할 수 있다(단, 다음의 사실 (2)로 이 사항 (1)은 오류일 수 있다.).
혹은 처방된 질병코드가 더 빈번했다는 것을 유추할 수 있다.

히트맵은 위와 같은 사항으로 정확한 사실을 전달해 주는 것이 어렵게 느껴진다. 다른 그래프를 그려봐야 겠다.

cleanHayoonS %>%
 filter(disease_1 == "H6501") %>%
 group_by(disease_2) %>%
 tally()

# A tibble: 3 x 2
  disease_2     n
  <chr>     <int>
1 J209          6
2 J459         11
3 L2088         7

The result

The most important takeaway from all this was that ordinary citizens do not have much right to own or use their own data. If people were able to easily access and browse their own data it would help them to be healthier and protect them from overprescription of medications.

Second, this exercise helped me to see data from different perspectives. An idle curiosity moved me to learn R programming language to wrangling the data. It took more than a year to become versed in R. This personal project motivated me to learn R, and I have become more savvy at data analysis and data visualization because of it.

느낀점

시각화 챠트를 만드는 과정에서 데이터를 보는 여러 관점으로 상황을 파악할 수 있었다.

제약점

질병분류기호는 실제로 감염 질병을 의미하는 것 아니라 약을 처방하기 위한 것임
3가지 이상의 질병분류기호가 존재할 수 있지만, 처방전에는 2개만 표시됨

향후 계획

일자별 처방 데이터 추가 입력시 자동 통계 프로그램

Reference

질병 데이터 표현 GUI
질병 데이터 분석 방법

Data Analysis of Medication Prescriptions

Miyoung Yoon

10/13/2019

How it starts

Questions

Procedures of Visual information

Data collection

1. 데이터 수집: 처방전 인쇄물 디지털 DB화

Data wrangling and analysis

2. 데이터 분석

Data visualization

필요한 패키지

폰트 설정 패키지, 폰트설정

*엑셀 데이터 가져오기

*날짜를 Data값으로 변환하기 “data” 명칭 있는 칼럼 찾기

*missing value 찾기

*불필요한 행열 제거

*데이터 타입 정리

Q. Which diseases were diagnosed?

*Chart 01: stacked bar

Q. Which medications were prescribed for each disease code?

*Chart 02: bar, dodge

Q. Which medications were prescribed the most?

*Chart 03:

*Chart 03-2:

Q. What kind of medicine were prescriped per disease codes?

*Chart 04:

Q. What kind of medicine were prescriped per disease codes?

*Chart 04-2:

Visual literacy

*Chart 05

The result

느낀점

제약점

향후 계획

Reference