Textbook, p. 123

송지원

Q1. Load the midwest data from ggplot2 as a data frame, then examine its characteristics.

midwest <- as.data.frame(ggplot2::midwest)
head(midwest)
##   PID    county state  area poptotal popdensity popwhite popblack popamerindian
## 1 561     ADAMS    IL 0.052    66090  1270.9615    63917     1702            98
## 2 562 ALEXANDER    IL 0.014    10626   759.0000     7054     3496            19
## 3 563      BOND    IL 0.022    14991   681.4091    14477      429            35
## 4 564     BOONE    IL 0.017    30806  1812.1176    29344      127            46
## 5 565     BROWN    IL 0.018     5836   324.2222     5264      547            14
## 6 566    BUREAU    IL 0.050    35688   713.7600    35157       50            65
##   popasian popother percwhite  percblack percamerindan  percasian  percother
## 1      249      124  96.71206  2.5752761     0.1482826 0.37675897 0.18762294
## 2       48        9  66.38434 32.9004329     0.1788067 0.45172219 0.08469791
## 3       16       34  96.57128  2.8617170     0.2334734 0.10673071 0.22680275
## 4      150     1139  95.25417  0.4122574     0.1493216 0.48691813 3.69733169
## 5        5        6  90.19877  9.3728581     0.2398903 0.08567512 0.10281014
## 6      195      221  98.51210  0.1401031     0.1821340 0.54640215 0.61925577
##   popadults  perchsd percollege percprof poppovertyknown percpovertyknown
## 1     43298 75.10740   19.63139 4.355859           63628         96.27478
## 2      6724 59.72635   11.24331 2.870315           10529         99.08714
## 3      9669 69.33499   17.03382 4.488572           14235         94.95697
## 4     19272 75.47219   17.27895 4.197800           30337         98.47757
## 5      3979 68.86152   14.47600 3.367680            4815         82.50514
## 6     23444 76.62941   18.90462 3.275891           35107         98.37200
##   percbelowpoverty percchildbelowpovert percadultpoverty percelderlypoverty
## 1        13.151443             18.01172        11.009776          12.443812
## 2        32.244278             45.82651        27.385647          25.228976
## 3        12.068844             14.03606        10.852090          12.697410
## 4         7.209019             11.17954         5.536013           6.217047
## 5        13.520249             13.02289        11.143211          19.200000
## 6        10.399635             14.15882         8.179287          11.008586
##   inmetro category
## 1       0      AAR
## 2       0      LHR
## 3       0      AAR
## 4       1      ALU
## 5       0      AAR
## 6       0      AAR
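
Beyond head(), a few other base R functions give a quick overview of a data frame's size and column types. The following is a minimal sketch, assuming the ggplot2 package is installed; the choice of functions is a suggestion and not part of the textbook answer.

```r
# Load the data as in Q1 (assumes the ggplot2 package is installed).
midwest <- as.data.frame(ggplot2::midwest)

dim(midwest)      # number of rows and columns
str(midwest)      # type and first few values of every column
summary(midwest)  # summary statistics for each column
```

str() is usually the most compact way to see which columns are numeric and which are character before renaming or deriving variables.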

Q2. Rename the poptotal variable to total and the popasian variable to asian.

Make a copy of midwest.

midwest_new <- midwest

Load dplyr to use the rename() function.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Use rename() to change the variable names.

midwest_new <- rename(midwest_new, total = poptotal)
midwest_new <- rename(midwest_new, asian = popasian)

Check that the variable names have changed.

head(midwest_new)
##   PID    county state  area total popdensity popwhite popblack popamerindian
## 1 561     ADAMS    IL 0.052 66090  1270.9615    63917     1702            98
## 2 562 ALEXANDER    IL 0.014 10626   759.0000     7054     3496            19
## 3 563      BOND    IL 0.022 14991   681.4091    14477      429            35
## 4 564     BOONE    IL 0.017 30806  1812.1176    29344      127            46
## 5 565     BROWN    IL 0.018  5836   324.2222     5264      547            14
## 6 566    BUREAU    IL 0.050 35688   713.7600    35157       50            65
##   asian popother percwhite  percblack percamerindan  percasian  percother
## 1   249      124  96.71206  2.5752761     0.1482826 0.37675897 0.18762294
## 2    48        9  66.38434 32.9004329     0.1788067 0.45172219 0.08469791
## 3    16       34  96.57128  2.8617170     0.2334734 0.10673071 0.22680275
## 4   150     1139  95.25417  0.4122574     0.1493216 0.48691813 3.69733169
## 5     5        6  90.19877  9.3728581     0.2398903 0.08567512 0.10281014
## 6   195      221  98.51210  0.1401031     0.1821340 0.54640215 0.61925577
##   popadults  perchsd percollege percprof poppovertyknown percpovertyknown
## 1     43298 75.10740   19.63139 4.355859           63628         96.27478
## 2      6724 59.72635   11.24331 2.870315           10529         99.08714
## 3      9669 69.33499   17.03382 4.488572           14235         94.95697
## 4     19272 75.47219   17.27895 4.197800           30337         98.47757
## 5      3979 68.86152   14.47600 3.367680            4815         82.50514
## 6     23444 76.62941   18.90462 3.275891           35107         98.37200
##   percbelowpoverty percchildbelowpovert percadultpoverty percelderlypoverty
## 1        13.151443             18.01172        11.009776          12.443812
## 2        32.244278             45.82651        27.385647          25.228976
## 3        12.068844             14.03606        10.852090          12.697410
## 4         7.209019             11.17954         5.536013           6.217047
## 5        13.520249             13.02289        11.143211          19.200000
## 6        10.399635             14.15882         8.179287          11.008586
##   inmetro category
## 1       0      AAR
## 2       0      LHR
## 3       0      AAR
## 4       1      ALU
## 5       0      AAR
## 6       0      AAR
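
Both renames can also be written in a single rename() call, and the result can be checked programmatically instead of by reading the head() output. This is a sketch on a small hand-made data frame; the name `df` and its three rows (copied from the output above) are for illustration only.

```r
library(dplyr)

# Toy data: three rows taken from the head() output above.
df <- data.frame(
  county   = c("ADAMS", "ALEXANDER", "BOND"),
  poptotal = c(66090, 10626, 14991),
  popasian = c(249, 48, 16)
)

# rename() takes new_name = old_name pairs; several can go in one call.
df_new <- rename(df, total = poptotal, asian = popasian)

# TRUE only if both new names exist and neither old name remains.
all(c("total", "asian") %in% names(df_new)) &&
  !any(c("poptotal", "popasian") %in% names(df_new))
```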

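Q3. Using the total and asian variables, create a derived variable for 'Asian population as a percentage of total population', and draw a histogram to see how the cities are distributed.

midwest_new$ratio <- (midwest_new$asian / midwest_new$total) * 100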
head(midwest_new)
##   PID    county state  area total popdensity popwhite popblack popamerindian
## 1 561     ADAMS    IL 0.052 66090  1270.9615    63917     1702            98
## 2 562 ALEXANDER    IL 0.014 10626   759.0000     7054     3496            19
## 3 563      BOND    IL 0.022 14991   681.4091    14477      429            35
## 4 564     BOONE    IL 0.017 30806  1812.1176    29344      127            46
## 5 565     BROWN    IL 0.018  5836   324.2222     5264      547            14
## 6 566    BUREAU    IL 0.050 35688   713.7600    35157       50            65
##   asian popother percwhite  percblack percamerindan  percasian  percother
## 1   249      124  96.71206  2.5752761     0.1482826 0.37675897 0.18762294
## 2    48        9  66.38434 32.9004329     0.1788067 0.45172219 0.08469791
## 3    16       34  96.57128  2.8617170     0.2334734 0.10673071 0.22680275
## 4   150     1139  95.25417  0.4122574     0.1493216 0.48691813 3.69733169
## 5     5        6  90.19877  9.3728581     0.2398903 0.08567512 0.10281014
## 6   195      221  98.51210  0.1401031     0.1821340 0.54640215 0.61925577
##   popadults  perchsd percollege percprof poppovertyknown percpovertyknown
## 1     43298 75.10740   19.63139 4.355859           63628         96.27478
## 2      6724 59.72635   11.24331 2.870315           10529         99.08714
## 3      9669 69.33499   17.03382 4.488572           14235         94.95697
## 4     19272 75.47219   17.27895 4.197800           30337         98.47757
## 5      3979 68.86152   14.47600 3.367680            4815         82.50514
## 6     23444 76.62941   18.90462 3.275891           35107         98.37200
##   percbelowpoverty percchildbelowpovert percadultpoverty percelderlypoverty
## 1        13.151443             18.01172        11.009776          12.443812
## 2        32.244278             45.82651        27.385647          25.228976
## 3        12.068844             14.03606        10.852090          12.697410
## 4         7.209019             11.17954         5.536013           6.217047
## 5        13.520249             13.02289        11.143211          19.200000
## 6        10.399635             14.15882         8.179287          11.008586
##   inmetro category      ratio
## 1       0      AAR 0.37675897
## 2       0      LHR 0.45172219
## 3       0      AAR 0.10673071
## 4       1      ALU 0.48691813
## 5       0      AAR 0.08567512
## 6       0      AAR 0.54640215
hist(midwest_new$ratio)
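
In the output above, the new ratio column matches the existing percasian column row for row. The sketch below checks this on the six rows shown and then draws the histogram with explicit labels. The data frame name `df` and the label text are illustrative choices, not part of the textbook answer.

```r
# Toy data: the six rows printed by head() above.
df <- data.frame(
  total     = c(66090, 10626, 14991, 30806, 5836, 35688),
  asian     = c(249, 48, 16, 150, 5, 195),
  percasian = c(0.37675897, 0.45172219, 0.10673071,
                0.48691813, 0.08567512, 0.54640215)
)

# Same derivation as in Q3: Asian population as a percentage of the total.
df$ratio <- df$asian / df$total * 100

# Compare with the existing column, allowing for rounding in the printed values.
same <- isTRUE(all.equal(df$ratio, df$percasian, tolerance = 1e-6))
same

# Histogram with a title and an axis label.
hist(df$ratio,
     main = "Asian population share",
     xlab = "Percent of total population")
```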

Q4. Compute the overall mean of the Asian population percentage, and create a derived variable that assigns large when the value exceeds the mean and small otherwise.

mean(midwest_new$ratio)
## [1] 0.4872462
midwest_new$test <- ifelse(midwest_new$ratio > 0.4872462, "large", "small")
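
The threshold can also be computed inline with mean() rather than copied from the printed value, which avoids rounding and stays correct if the data change. This is a sketch on the six ratio values shown earlier; the mean of these six rows is not the same as the full-data mean of 0.4872462, so the split here is only illustrative.

```r
# Toy data: the six ratio values printed by head() in Q3.
ratio <- c(0.37675897, 0.45172219, 0.10673071,
           0.48691813, 0.08567512, 0.54640215)

# Label each value by comparing it with the mean of the same vector.
test <- ifelse(ratio > mean(ratio), "large", "small")

table(test)
```

On the full data the same pattern would read ifelse(midwest_new$ratio > mean(midwest_new$ratio), "large", "small").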

Q5. Create a frequency table and a frequency bar chart to see how many regions fall under large and small.

table(midwest_new$test)
## 
## large small 
##   119   318
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following object is masked _by_ '.GlobalEnv':
## 
##     midwest
qplot(midwest_new$test)
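
qplot() is deprecated as of ggplot2 3.4.0, so the same frequency bar chart can also be drawn with ggplot() and geom_bar(). This is a sketch assuming ggplot2 is installed; the toy data frame `df` stands in for midwest_new and is not the real data.

```r
library(ggplot2)

# Toy data standing in for midwest_new$test.
df <- data.frame(
  test = c("large", "large", "small", "large", "small", "large")
)

# geom_bar() counts the rows in each category of the x variable.
p <- ggplot(df, aes(x = test)) + geom_bar()
print(p)
```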