Analytics of Seoul Sports Facilities

Preparation

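Before starting, take the downloaded Seoul community sports facility status data (서울시 생활체육시설 현황), delete its top three rows, remove every ',' thousands separator from the numeric columns, and save the result as CSV.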
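The manual cleanup can also be scripted. Below is a minimal base-R sketch. It assumes the raw export has exactly three title rows above the header and that numbers containing thousands separators are quoted. The sample lines and the column subset are made up for illustration.

```r
# Hypothetical raw export: three title rows, then the header, then quoted numbers with commas
raw_lines <- c(
  "Seoul community sports facilities",
  "(unit: count, m2)",
  "source: made-up sample",
  "year,gu,tot_cnt,tot_area",
  '2010,Jongno-gu,315,"72,082"',
  '2010,Jung-gu,388,"124,418"'
)

# Skip the three title rows and keep every column as text for now
df_raw <- read.csv(text = raw_lines, skip = 3, colClasses = "character")

# Strip the thousands separators and convert the numeric columns
num_cols <- c("tot_cnt", "tot_area")
df_raw[num_cols] <- lapply(df_raw[num_cols],
                           function(x) as.numeric(gsub(",", "", x, fixed = TRUE)))
str(df_raw)
```

In the tidyverse, read_csv(..., skip = 3) combined with parse_number() does the same job.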

Import library

library(tidyverse)

Read data

df = read_csv("data/data.csv", locale=locale('ko',encoding='euc-kr'))
df %>% str()

tibble [260 x 28] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Column1 : num [1:260] 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
 $ Column2 : chr [1:260] "서울시" "종로구" "중구" "용산구" ...
 $ Column3 : num [1:260] 11353 315 388 221 357 ...
 $ Column4 : num [1:260] 3106746 72082 124418 69597 78648 ...
 $ Column5 : chr [1:260] "12" "-" "1" "1" ...
 $ Column6 : chr [1:260] "54126" "-" "5808" "3676" ...
 $ Column7 : chr [1:260] "79" "5" "7" "5" ...
 $ Column8 : chr [1:260] "272236" "11281" "12686" "14044" ...
 $ Column9 : chr [1:260] "71" "1" "5" "1" ...
 $ Column10: chr [1:260] "248097" "1268" "18107" "519" ...
 $ Column11: num [1:260] 2290 17 23 41 77 94 99 107 93 83 ...
 $ Column12: chr [1:260] "353153" "1547" "2977" "5118" ...
 $ Column13: num [1:260] 1829 36 73 37 57 ...
 $ Column14: chr [1:260] "816058" "16510" "24956" "22653" ...
 $ Column15: chr [1:260] "1577" "45" "63" "29" ...
 $ Column16: chr [1:260] "447521" "10120" "20331" "7391" ...
 $ Column17: num [1:260] 5323 193 198 103 176 ...
 $ Column18: chr [1:260] "878862" "29785" "37672" "15892" ...
 $ Column19: chr [1:260] "3" "-" "-" "-" ...
 $ Column20: chr [1:260] "11605" "-" "-" "-" ...
 $ Column21: chr [1:260] "14" "1" "1" "-" ...
 $ Column22: chr [1:260] "7610" "331" "489" "-" ...
 $ Column23: chr [1:260] "155" "17" "17" "4" ...
 $ Column24: chr [1:260] "17477" "1240" "1392" "304" ...
 $ Column25: chr [1:260] "-" "-" "-" "-" ...
 $ Column26: chr [1:260] "-" "-" "-" "-" ...
 $ Column27: chr [1:260] "-" "-" "-" "-" ...
 $ Column28: chr [1:260] "-" "-" "-" "-" ...
 - attr(*, "spec")=
  .. cols(
  ..   Column1 = col_double(),
  ..   Column2 = col_character(),
  ..   Column3 = col_double(),
  ..   Column4 = col_double(),
  ..   Column5 = col_character(),
  ..   Column6 = col_character(),
  ..   Column7 = col_character(),
  ..   Column8 = col_character(),
  ..   Column9 = col_character(),
  ..   Column10 = col_character(),
  ..   Column11 = col_double(),
  ..   Column12 = col_character(),
  ..   Column13 = col_double(),
  ..   Column14 = col_character(),
  ..   Column15 = col_character(),
  ..   Column16 = col_character(),
  ..   Column17 = col_double(),
  ..   Column18 = col_character(),
  ..   Column19 = col_character(),
  ..   Column20 = col_character(),
  ..   Column21 = col_character(),
  ..   Column22 = col_character(),
  ..   Column23 = col_character(),
  ..   Column24 = col_character(),
  ..   Column25 = col_character(),
  ..   Column26 = col_character(),
  ..   Column27 = col_character(),
  ..   Column28 = col_character()
  .. )

Rename all columns in English. For names without an obvious English equivalent, the romanized Korean from Google Translate is used.

names(df) <- c('year', 'gu', 'tot_cnt', 'tot_area',
               'yoteujang_cnt', 'yoteujang_area',                      # yacht marina
               'ice_rink_cnt', 'ice_rink_area',
               'jonghabcheyugsiseol_cnt', 'jonghabcheyugsiseol_area',  # multi-sport complex
               'swimming_pool_cnt', 'swimming_pool_area',
               'cheyugdojang_cnt', 'cheyugdojang_area',                # martial arts studio
               'golpeuyeonseubjang_cnt', 'golpeuyeonseubjang_area',    # golf driving range
               'fitness_cnt', 'fitness_area', 'billiard_cnt', 'billiard_area',
               'sseolmaejang_cnt', 'sseolmaejang_area',                # sledding slope
               'mudojang_cnt', 'mudojang_area',                        # dance hall
               'mudohagwon_cnt', 'mudohagwon_area',                    # dance academy (no hyphen, so no backticks needed)
               'golpeujang_cnt', 'golpeujang_area')                    # golf course

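Convert all count and area columns to integer, convert the year column to character, drop the Seoul-wide total row, and replace missing values with 0.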

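df <- df %>% mutate(across(3:28, as.integer)) # all count/area columns to integer; '-' becomes NA with a coercion warning
df$year <- as.character(df$year)               # treat year as a category (character)
df <- df %>% filter(gu != '서울시')             # drop the Seoul-wide total row
df[is.na(df)] <- 0                             # every NA came from '-', which means zero, so fill with 0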
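The as.integer() step works because the source writes zero as '-'. Coercion turns '-' into NA (with a "NAs introduced by coercion" warning), and the last line then fills those NAs with 0. Here is a sketch that maps '-' to 0 explicitly instead. The values and the column names (yacht_cnt, yacht_area) are made up.

```r
# Made-up columns shaped like Column5/Column6 of the raw data ('-' means zero)
counts <- data.frame(
  gu         = c("Jongno-gu", "Jung-gu", "Yongsan-gu"),
  yacht_cnt  = c("-", "1", "1"),
  yacht_area = c("-", "5808", "3676"),
  stringsAsFactors = FALSE
)

# Replace the '-' placeholder with "0" first, then convert to integer
to_int <- function(x) as.integer(ifelse(x == "-", "0", x))
counts[-1] <- lapply(counts[-1], to_int)
str(counts)
```

Any other non-numeric value still becomes NA. An unexpected placeholder therefore stays visible instead of being silently counted as zero.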

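Download the administrative boundary maps from the site below and save them in the data directory. Four levels are available: 시도 (province), 시군구 (city/county/district), 읍면동 (town/neighborhood), and 리 (village).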

http://www.gisdeveloper.co.kr/?p=2332

Import rgdal library

The rgdal package provides bindings to GDAL (Geospatial Data Abstraction Library).
GDAL is a translator library for raster and vector geospatial data formats. It is released under an X/MIT-style open source license by the Open Source Geospatial Foundation.

library(rgdal)

Read the 시도 (province) map in .shp format

map<-readOGR('data/TL_SCCO_CTPRVN.shp')

OGR data source with driver: ESRI Shapefile 
Source: "C:\Rproject\seoul\data\TL_SCCO_CTPRVN.shp", layer: "TL_SCCO_CTPRVN"
with 17 features
It has 3 fields

Inspect the map data

map %>% class()

[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"

The map object is an instance of an S4 class, one of R's class systems. Each slot of an S4 object can be accessed with @.
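To make the slot idea concrete, here is a toy S4 class with two slots, loosely modelled on SpatialPolygonsDataFrame. The class name ToyMap and its contents are made up.

```r
library(methods)  # S4 classes are provided by the methods package

# Hypothetical class with a data slot and a bbox slot
setClass("ToyMap", slots = c(data = "data.frame", bbox = "matrix"))

toy <- new("ToyMap",
           data = data.frame(CTPRVN_CD = c("11", "26"),
                             CTP_ENG_NM = c("Seoul", "Busan")),
           bbox = matrix(c(0, 0, 10, 10), nrow = 2,
                         dimnames = list(c("x", "y"), c("min", "max"))))

slotNames(toy)   # "data" "bbox"
toy@data         # read a slot with @
toy@bbox["x", "max"]
```

slot(toy, "data") is the functional equivalent of toy@data.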

map %>% slotNames()

[1] "data"        "polygons"    "plotOrder"   "bbox"        "proj4string"

map@data %>% head()

  CTPRVN_CD       CTP_ENG_NM CTP_KOR_NM
0        42       Gangwon-do     강원도
1        41      Gyeonggi-do     경기도
2        48 Gyeongsangnam-do   경상남도
3        47 Gyeongsangbuk-do   경상북도
4        29          Gwangju 광주광역시
5        27            Daegu 대구광역시

map@data %>% nrow()

[1] 17

map@bbox %>% str()

 num [1:2, 1:2] 746110 1458754 1387950 2068444
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "x" "y"
  ..$ : chr [1:2] "min" "max"

Convert the map data to a data frame

df_map <- fortify(map) # fortify() from ggplot2 ("fortify a model with data"); its docs say it may be deprecated and recommend the broom package instead
df_map %>% head()

     long     lat order  hole piece id group
1 1091705 2034023     1 FALSE     1  0   0.1
2 1091705 2034038     2 FALSE     1  0   0.1
3 1091656 2034038     3 FALSE     1  0   0.1
4 1091616 2034059     4 FALSE     1  0   0.1
5 1091570 2034089     5 FALSE     1  0   0.1
6 1091550 2034094     6 FALSE     1  0   0.1

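df_map %>% str() # order = vertex drawing order, hole = whether the ring is a hole, piece = ring number within a feature, group = id.piece (one value per ring)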

'data.frame':   809620 obs. of  7 variables:
 $ long : num  1091705 1091705 1091656 1091616 1091570 ...
 $ lat  : num  2034023 2034038 2034038 2034059 2034089 ...
 $ order: int  1 2 3 4 5 6 7 8 9 10 ...
 $ hole : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ piece: Factor w/ 2206 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ id   : chr  "0" "0" "0" "0" ...
 $ group: Factor w/ 4748 levels "0.1","0.2","0.3",..: 1 1 1 1 1 1 1 1 1 1 ...
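Each polygon feature (id) can consist of several rings (piece). The group column identifies one ring, so geom_polygon() does not join separate rings. The base-R sketch below builds a fortify-style table for a made-up feature with two rings, following the id.piece labelling seen in the output above. ggplot2 stores piece and group as factors; plain character is used here for simplicity.

```r
# Made-up feature "0" made of two separate rings (e.g. a mainland and an island)
ring1 <- data.frame(long = c(0, 1, 1, 0), lat = c(0, 0, 1, 1))
ring2 <- data.frame(long = c(3, 4, 4), lat = c(3, 3, 4))

# Stack the rings, recording which ring (piece) each vertex belongs to
rings <- list(ring1, ring2)
pieces <- lapply(seq_along(rings), function(i) {
  r <- rings[[i]]
  r$piece <- i          # ring number within the feature
  r
})
tbl <- do.call(rbind, pieces)

tbl$order <- seq_len(nrow(tbl))                    # vertex drawing order within the feature
tbl$hole  <- FALSE                                 # neither ring is a hole here
tbl$id    <- "0"                                   # feature id
tbl$group <- paste(tbl$id, tbl$piece, sep = ".")   # one group per ring: "0.1", "0.2"

table(tbl$group)
```

Mapping group = group in aes() prevents geom_polygon() from drawing a line from the last vertex of one ring to the first vertex of the next.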

Draw the map with ggplot

df_map %>% 
  ggplot()+
  aes(x = long, y = lat, group=group, col=id)+
  geom_polygon(fill="white")+
  theme(legend.position = 'none')

Plot each id separately (each id is one region)

df_map[df_map$id %in% 0:5,] %>% 
  ggplot()+
  aes(x = long, y = lat, group=group, col=id)+
  geom_polygon(fill="white")+
  facet_wrap(~ id, nrow = 2, scales = 'free')+
  theme(legend.position = 'none')

Add the polygon id to the province attribute table

df_map_info <- map@data
df_map_info[,'id'] <- (1:nrow(map@data))-1
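The line above assumes that the i-th row of map@data is the polygon that fortify() labels i - 1. This holds here because the attribute table's row names are "0", "1", and so on (see the map@data %>% head() output above). Below is a small base-R sanity check of that assumption. The data frame info is a made-up stand-in for map@data.

```r
# Stand-in for map@data, with row names "0", "1", "2" like the output above
info <- data.frame(CTP_ENG_NM = c("Gangwon-do", "Gyeonggi-do", "Gyeongsangnam-do"),
                   row.names = c("0", "1", "2"))

info$id <- seq_len(nrow(info)) - 1   # same 0-based counter as above

# The assumption holds if the counter equals the row names
stopifnot(identical(as.character(info$id), rownames(info)))

# Alternative that does not depend on row order: read the id from the row names
info$id2 <- as.integer(rownames(info))
info
```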

Load the 시군구 (city/county/district) map. The file covers the whole country; Seoul is extracted from it below.

map <- readOGR("data/TL_SCCO_SIG.shp")

OGR data source with driver: ESRI Shapefile 
Source: "C:\Rproject\seoul\data\TL_SCCO_SIG.shp", layer: "TL_SCCO_SIG"
with 250 features
It has 3 fields

df_map <- fortify(map)
df_map %>% head()

     long     lat order  hole piece id group
1 1007462 2008949     1 FALSE     1  0   0.1
2 1007512 2008902     2 FALSE     1  0   0.1
3 1007698 2008919     3 FALSE     1  0   0.1
4 1007797 2008978     4 FALSE     1  0   0.1
5 1007921 2008946     5 FALSE     1  0   0.1
6 1007947 2008939     6 FALSE     1  0   0.1

df_map %>% 
  ggplot()+
  aes(x = long, y = lat, group=group, col=id)+
  geom_polygon(fill='white')+
  theme(legend.position = "none")

df_map_info <- map@data
df_map_info %>% head()

  SIG_CD   SIG_ENG_NM SIG_KOR_NM
0  42110 Chuncheon-si     춘천시
1  42130     Wonju-si     원주시
2  42150 Gangneung-si     강릉시
3  42170   Donghae-si     동해시
4  42190   Taebaek-si     태백시
5  42210    Sokcho-si     속초시

Now keep only Seoul's districts from the 시군구 data

df_map_info[,'id'] <- (1:nrow(df_map_info))-1 # this id matches df_map$id
df_map_info[,'SIDO'] <- df_map_info$SIG_CD %>%
  str_sub(start = 1, end = 2) %>%
  as.integer()

df_map_info_seoul <- df_map_info %>% 
  filter(SIDO==11) # 11 is the province code for Seoul

df_map_seoul <- df_map %>% 
  filter(id %in% df_map_info_seoul$id) # keep only the rows of df_map whose id is one of the Seoul ids
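Two details of this step can be checked in isolation. First, the province code is the first two digits of SIG_CD. Second, the final filter compares a character id (from fortify()) with a numeric one, and %in% handles this by converting both to character. Here is a base-R sketch on an illustrative excerpt of the attribute table.

```r
# Illustrative excerpt of the district attribute table (codes as strings, 0-based ids)
sig <- data.frame(SIG_CD     = c("11110", "11140", "26110", "42110"),
                  SIG_ENG_NM = c("Jongno-gu", "Jung-gu", "Jung-gu", "Chuncheon-si"),
                  id         = c(0, 1, 2, 3),
                  stringsAsFactors = FALSE)

# The first two digits of SIG_CD are the province code; 11 is Seoul
sig$SIDO <- as.integer(substr(sig$SIG_CD, 1, 2))
seoul <- sig[sig$SIDO == 11, ]

# Vertex ids as fortify() stores them (character); %in% compares them as text
vertex_ids <- c("0", "0", "1", "2", "3")
keep <- vertex_ids %in% seoul$id
keep
```

The implicit conversion works for the small ids used here. Round values such as 100000 are turned into "1e+05" by as.character(), however, so converting id explicitly is the safer habit, as the merge step below does with as.integer().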

Draw the map of Seoul

df_map_seoul %>% 
  ggplot()+
  aes(x = long, y = lat, group=group, col=id)+
  geom_polygon(fill='white')+
  theme(legend.position = 'none')

Merge the community sports facility data (df) with the map data (df_map_seoul)

df_map_seoul$id %>% class()

[1] "character"

df_map_seoul$id <- df_map_seoul$id %>% as.integer()

# 1. Join df_map_info_seoul and df using 'SIG_KOR_NM' (map) and 'gu' (facility data) as keys
df_map_info_seoul <- 
  left_join(x = df_map_info_seoul, y = df, by=c('SIG_KOR_NM'='gu'))

# 2. Join df_map_seoul and df_map_info_seoul using 'id' as the key
df_map_seoul_join <-
  left_join(x = df_map_seoul, y = df_map_info_seoul, by=c('id'='id'))
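df has one row per district per year (2010 to 2019), so the first join gives each district id ten attribute rows. The second join then repeats every map vertex once per year, which is why the plots below filter on year. The base-R sketch below uses merge() in place of left_join() and made-up numbers to show the row counts.

```r
# Made-up facility table: 2 districts x 2 years
fac <- data.frame(gu      = c("A-gu", "A-gu", "B-gu", "B-gu"),
                  year    = c("2018", "2019", "2018", "2019"),
                  tot_cnt = c(10, 12, 20, 25))

# District attribute table with the polygon id
info <- data.frame(id = c(0L, 1L), SIG_ENG_NM = c("A-gu", "B-gu"))

# Vertices: 3 for district 0, 4 for district 1
verts <- data.frame(id   = c(0L, 0L, 0L, 1L, 1L, 1L, 1L),
                    long = c(0, 1, 1, 2, 3, 3, 2),
                    lat  = c(0, 0, 1, 0, 0, 1, 1))

# Step 1: attach the yearly counts to each district (2 rows per district)
info_fac <- merge(info, fac, by.x = "SIG_ENG_NM", by.y = "gu", all.x = TRUE)

# Step 2: attach those rows to each vertex (every vertex repeated once per year)
joined <- merge(verts, info_fac, by = "id", all.x = TRUE)

nrow(joined)                            # 7 vertices x 2 years = 14
nrow(joined[joined$year == "2019", ])   # back to 7
```

merge() sorts its result by the key column by default (sort = TRUE), so the vertex order within a polygon is not guaranteed to survive. If you use merge() for plotting, re-sort by id and order afterwards.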

To print each district's name on the map, build a small data frame of label positions and values (df_gu).

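df_gu <- df_map_seoul_join %>% 
  filter(year == '2019') %>% # use the same year as the map fill below, otherwise tot_cnt is averaged over 2010-2019
  group_by(SIG_KOR_NM) %>% 
  summarise(long=mean(long), lat=mean(lat), tot_cnt=mean(tot_cnt))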
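The label position is the mean of each district's vertex coordinates. This is a rough stand-in for a centroid, but it is good enough for compact shapes. df_map_seoul_join holds one copy of every vertex per year, so the label value should come from the same year as the map fill. Here is a base-R sketch with aggregate() on made-up data.

```r
# Made-up joined table: district "A-gu" as a square, repeated for two years
joined <- data.frame(
  SIG_ENG_NM = "A-gu",
  long    = rep(c(0, 2, 2, 0), times = 2),
  lat     = rep(c(0, 0, 2, 2), times = 2),
  year    = rep(c("2018", "2019"), each = 4),
  tot_cnt = rep(c(10, 12), each = 4)
)

# Keep one year first, then average the coordinates and the (constant) count per district
labels <- aggregate(cbind(long, lat, tot_cnt) ~ SIG_ENG_NM,
                    data = joined[joined$year == "2019", ], FUN = mean)
labels                 # long = 1, lat = 1, tot_cnt = 12

mean(joined$tot_cnt)   # 11: averaging without the year filter mixes both years
```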

Plot the number of community sports facilities in each district

This map shows each district's total number of community sports facilities (tot_cnt) in 2019.

df_map_seoul_join %>% 
  filter(year=='2019') %>% 
  ggplot()+
  aes(x = long, y = lat)+
  geom_polygon(aes(group=group, fill=tot_cnt), color='black',size=0.7)+
  scale_fill_gradient(low = "yellow", high = "blue")+
  geom_text(data=df_gu, 
            aes(long, lat, label = paste0(SIG_KOR_NM,"\n","(",as.integer(tot_cnt),")")), size=3)

Now plot the 2010–2019 time series

library(ggrepel) # geom_text_repel() is used instead of geom_text() so that overlapping labels are pushed apart

df %>% 
  ggplot(aes(x = year, y = tot_cnt))+
  geom_line(aes(color=gu, group=gu))+
  geom_point(aes(color=gu))+
  scale_y_log10(n.breaks=10)+
  labs(title = "Total community sports facilities by district in Seoul (note: log scale)")+
  theme(plot.title = element_text(hjust = 0.5),legend.position = 'none')+
  geom_text_repel(data=df %>% filter(year=='2010'),
                  aes(label=gu, color=gu),size=2.5, hjust=1.2, vjust=0.1)

Use the shiny package to select and compare individual facility types

library(shiny)

shinyApp(
  ui = fluidPage(
    varSelectInput("variable", "Variable:", df[,3:28]), # choice list built from the column names (args: inputId, label, data); read in server as !!input$variable
    plotOutput("data") # displays the plot that server assigns to output$data
  ),
  server = function(input, output) {
    output$data <- renderPlot({
      ggplot(df, aes(x = year, y = !!input$variable))+ # was y = tot_cnt; !! injects the selected column symbol
        geom_line(aes(color=gu, group=gu))+
        geom_point(aes(color=gu))+
        #scale_y_log10(n.breaks=10)+
        #scale_y_sqrt(n.breaks=10)+
        theme(legend.position = 'none')+
        geom_text_repel(data=df %>% filter(year=='2010'),
                        aes(label=gu, color=gu),
                        size=2.5, hjust=2.0, vjust=0.1)
    })
  }
)
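varSelectInput() returns the selected column as a symbol. Inside aes(), !! (rlang's injection operator) splices that symbol into the expression, so the plot behaves as if the column name had been typed. Base R's bquote() does analogous splicing with .(). The sketch below uses it only to show what the injected expression looks like. The column value fitness_cnt is just an example choice.

```r
# The kind of value varSelectInput() provides: a symbol naming a column
selected <- as.name("fitness_cnt")

# Splice the symbol into an unevaluated call, as !! does inside aes()
call_expr <- bquote(aes(x = year, y = .(selected)))
print(call_expr)   # aes(x = year, y = fitness_cnt)

# Evaluating the symbol against a data frame picks that column
toy <- data.frame(year = c("2018", "2019"), fitness_cnt = c(5, 7))
eval(selected, toy)
```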

Shiny applications not supported in static R Markdown documents


Semper

2021-02-01

Statistical analysis of Seoul's community sports facilities


End