송지원
Q1. Load the midwest data from ggplot2 as a data frame and examine its characteristics.
midwest <- as.data.frame(ggplot2::midwest)
head(midwest)
## PID county state area poptotal popdensity popwhite popblack popamerindian
## 1 561 ADAMS IL 0.052 66090 1270.9615 63917 1702 98
## 2 562 ALEXANDER IL 0.014 10626 759.0000 7054 3496 19
## 3 563 BOND IL 0.022 14991 681.4091 14477 429 35
## 4 564 BOONE IL 0.017 30806 1812.1176 29344 127 46
## 5 565 BROWN IL 0.018 5836 324.2222 5264 547 14
## 6 566 BUREAU IL 0.050 35688 713.7600 35157 50 65
## popasian popother percwhite percblack percamerindan percasian percother
## 1 249 124 96.71206 2.5752761 0.1482826 0.37675897 0.18762294
## 2 48 9 66.38434 32.9004329 0.1788067 0.45172219 0.08469791
## 3 16 34 96.57128 2.8617170 0.2334734 0.10673071 0.22680275
## 4 150 1139 95.25417 0.4122574 0.1493216 0.48691813 3.69733169
## 5 5 6 90.19877 9.3728581 0.2398903 0.08567512 0.10281014
## 6 195 221 98.51210 0.1401031 0.1821340 0.54640215 0.61925577
## popadults perchsd percollege percprof poppovertyknown percpovertyknown
## 1 43298 75.10740 19.63139 4.355859 63628 96.27478
## 2 6724 59.72635 11.24331 2.870315 10529 99.08714
## 3 9669 69.33499 17.03382 4.488572 14235 94.95697
## 4 19272 75.47219 17.27895 4.197800 30337 98.47757
## 5 3979 68.86152 14.47600 3.367680 4815 82.50514
## 6 23444 76.62941 18.90462 3.275891 35107 98.37200
## percbelowpoverty percchildbelowpovert percadultpoverty percelderlypoverty
## 1 13.151443 18.01172 11.009776 12.443812
## 2 32.244278 45.82651 27.385647 25.228976
## 3 12.068844 14.03606 10.852090 12.697410
## 4 7.209019 11.17954 5.536013 6.217047
## 5 13.520249 13.02289 11.143211 19.200000
## 6 10.399635 14.15882 8.179287 11.008586
## inmetro category
## 1 0 AAR
## 2 0 LHR
## 3 0 AAR
## 4 1 ALU
## 5 0 AAR
## 6 0 AAR
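head() only shows the first six rows. As a minimal sketch that uses only base R functions, the overall shape and variable types can also be checked as follows.

```r
# Load the data exactly as in Q1
midwest <- as.data.frame(ggplot2::midwest)

dim(midwest)      # number of rows and columns
str(midwest)      # type of each variable plus the first few values
summary(midwest)  # summary statistics for each variable
```

str() is usually the quickest way to see which variables are numeric and which are character before renaming or computing anything.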
Q2. Rename the poptotal variable to total and the popasian variable to asian.
Make a copy of midwest.
midwest_new <- midwest
Load dplyr in order to use the rename function.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Use the rename function to change the variable names (new name = old name).
midwest_new <- rename(midwest_new, total = poptotal)
midwest_new <- rename(midwest_new, asian = popasian)
Check that the variable names have changed.
head(midwest_new)
## PID county state area total popdensity popwhite popblack popamerindian
## 1 561 ADAMS IL 0.052 66090 1270.9615 63917 1702 98
## 2 562 ALEXANDER IL 0.014 10626 759.0000 7054 3496 19
## 3 563 BOND IL 0.022 14991 681.4091 14477 429 35
## 4 564 BOONE IL 0.017 30806 1812.1176 29344 127 46
## 5 565 BROWN IL 0.018 5836 324.2222 5264 547 14
## 6 566 BUREAU IL 0.050 35688 713.7600 35157 50 65
## asian popother percwhite percblack percamerindan percasian percother
## 1 249 124 96.71206 2.5752761 0.1482826 0.37675897 0.18762294
## 2 48 9 66.38434 32.9004329 0.1788067 0.45172219 0.08469791
## 3 16 34 96.57128 2.8617170 0.2334734 0.10673071 0.22680275
## 4 150 1139 95.25417 0.4122574 0.1493216 0.48691813 3.69733169
## 5 5 6 90.19877 9.3728581 0.2398903 0.08567512 0.10281014
## 6 195 221 98.51210 0.1401031 0.1821340 0.54640215 0.61925577
## popadults perchsd percollege percprof poppovertyknown percpovertyknown
## 1 43298 75.10740 19.63139 4.355859 63628 96.27478
## 2 6724 59.72635 11.24331 2.870315 10529 99.08714
## 3 9669 69.33499 17.03382 4.488572 14235 94.95697
## 4 19272 75.47219 17.27895 4.197800 30337 98.47757
## 5 3979 68.86152 14.47600 3.367680 4815 82.50514
## 6 23444 76.62941 18.90462 3.275891 35107 98.37200
## percbelowpoverty percchildbelowpovert percadultpoverty percelderlypoverty
## 1 13.151443 18.01172 11.009776 12.443812
## 2 32.244278 45.82651 27.385647 25.228976
## 3 12.068844 14.03606 10.852090 12.697410
## 4 7.209019 11.17954 5.536013 6.217047
## 5 13.520249 13.02289 11.143211 19.200000
## 6 10.399635 14.15882 8.179287 11.008586
## inmetro category
## 1 0 AAR
## 2 0 LHR
## 3 0 AAR
## 4 1 ALU
## 5 0 AAR
## 6 0 AAR
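The two rename calls above can also be combined into one. The following is a sketch of that alternative, not a required form; it selects a few columns for the check so the renamed variables are easier to spot than in the full head() output.

```r
library(dplyr)

midwest <- as.data.frame(ggplot2::midwest)

# Rename both variables in a single call (new_name = old_name)
midwest_new <- midwest %>%
  rename(total = poptotal, asian = popasian)

# Inspect only the relevant columns instead of printing every column
head(midwest_new[, c("county", "total", "asian")])
```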
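Q3. Using the total and asian variables, create a derived variable for 'Asian population as a percentage of total population', then draw a histogram to see how the cities are distributed.
midwest_new$ratio <- (midwest_new$asian / midwest_new$total) * 100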
head(midwest_new)
## PID county state area total popdensity popwhite popblack popamerindian
## 1 561 ADAMS IL 0.052 66090 1270.9615 63917 1702 98
## 2 562 ALEXANDER IL 0.014 10626 759.0000 7054 3496 19
## 3 563 BOND IL 0.022 14991 681.4091 14477 429 35
## 4 564 BOONE IL 0.017 30806 1812.1176 29344 127 46
## 5 565 BROWN IL 0.018 5836 324.2222 5264 547 14
## 6 566 BUREAU IL 0.050 35688 713.7600 35157 50 65
## asian popother percwhite percblack percamerindan percasian percother
## 1 249 124 96.71206 2.5752761 0.1482826 0.37675897 0.18762294
## 2 48 9 66.38434 32.9004329 0.1788067 0.45172219 0.08469791
## 3 16 34 96.57128 2.8617170 0.2334734 0.10673071 0.22680275
## 4 150 1139 95.25417 0.4122574 0.1493216 0.48691813 3.69733169
## 5 5 6 90.19877 9.3728581 0.2398903 0.08567512 0.10281014
## 6 195 221 98.51210 0.1401031 0.1821340 0.54640215 0.61925577
## popadults perchsd percollege percprof poppovertyknown percpovertyknown
## 1 43298 75.10740 19.63139 4.355859 63628 96.27478
## 2 6724 59.72635 11.24331 2.870315 10529 99.08714
## 3 9669 69.33499 17.03382 4.488572 14235 94.95697
## 4 19272 75.47219 17.27895 4.197800 30337 98.47757
## 5 3979 68.86152 14.47600 3.367680 4815 82.50514
## 6 23444 76.62941 18.90462 3.275891 35107 98.37200
## percbelowpoverty percchildbelowpovert percadultpoverty percelderlypoverty
## 1 13.151443 18.01172 11.009776 12.443812
## 2 32.244278 45.82651 27.385647 25.228976
## 3 12.068844 14.03606 10.852090 12.697410
## 4 7.209019 11.17954 5.536013 6.217047
## 5 13.520249 13.02289 11.143211 19.200000
## 6 10.399635 14.15882 8.179287 11.008586
## inmetro category ratio
## 1 0 AAR 0.37675897
## 2 0 LHR 0.45172219
## 3 0 AAR 0.10673071
## 4 1 ALU 0.48691813
## 5 0 AAR 0.08567512
## 6 0 AAR 0.54640215
hist(midwest_new$ratio)
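The same derived variable and histogram can be sketched with dplyr's mutate and ggplot2. The bin count of 30 and the axis labels are choices made here for illustration, not part of the original answer.

```r
library(dplyr)
library(ggplot2)

# Rebuild the renamed data and add the percentage as a derived variable
midwest_new <- as.data.frame(ggplot2::midwest) %>%
  rename(total = poptotal, asian = popasian) %>%
  mutate(ratio = asian / total * 100)

# bins = 30 is an arbitrary choice; adjust it to change the resolution
p <- ggplot(midwest_new, aes(x = ratio)) +
  geom_histogram(bins = 30) +
  labs(x = "Asian population (% of total)", y = "Number of counties")
print(p)
```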
Q4. Compute the overall mean of the Asian population percentage, then create a derived variable that assigns large when a value exceeds the mean and small otherwise.
mean(midwest_new$ratio)
## [1] 0.4872462
midwest_new$test <- ifelse(midwest_new$ratio > mean(midwest_new$ratio), "large", "small")
Q5. Make a frequency table and a frequency bar chart to check how many regions fall into large and small.
table(midwest_new$test)
##
## large small
## 119 318
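Counts are easier to compare when they are also expressed as shares. The sketch below rebuilds the data in one pipeline so it runs on its own, then converts the frequency table to percentages with prop.table; the object name freq is introduced here only for illustration.

```r
library(dplyr)

# Rebuild the data with the ratio and the large/small label
midwest_new <- as.data.frame(ggplot2::midwest) %>%
  rename(total = poptotal, asian = popasian) %>%
  mutate(ratio = asian / total * 100,
         test = ifelse(ratio > mean(ratio), "large", "small"))

freq <- table(midwest_new$test)   # counts per group
round(prop.table(freq) * 100, 1)  # the same table as percentages
```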
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following object is masked _by_ '.GlobalEnv':
##
## midwest
qplot(midwest_new$test)
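qplot() has been deprecated since ggplot2 3.4.0, so newer versions print a warning. An equivalent bar chart can be sketched with ggplot() and geom_bar(); the axis labels are added here for illustration.

```r
library(dplyr)
library(ggplot2)

# Rebuild the data with the large/small label so the example is self-contained
midwest_new <- as.data.frame(ggplot2::midwest) %>%
  rename(total = poptotal, asian = popasian) %>%
  mutate(ratio = asian / total * 100,
         test = ifelse(ratio > mean(ratio), "large", "small"))

# geom_bar() counts the rows in each group, like qplot() on a single variable
p <- ggplot(midwest_new, aes(x = test)) +
  geom_bar() +
  labs(x = "Asian population share relative to the mean", y = "Number of counties")
print(p)
```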