Pg 123. Analysis Challenge!

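The ggplot2 package includes a dataset called midwest, which contains demographic information for 437 counties in the East North Central (Midwest) region of the United States. Use the midwest data to solve the following data analysis problems.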

Q1. Load ggplot2's midwest data as a data frame, then examine the characteristics of the data.

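# Convert ggplot2's built-in midwest data into a plain data frame
midwest <- as.data.frame(ggplot2::midwest)
# Per-column summary: quartiles and mean for numeric columns, length/type for character columns
summary(midwest)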
##       PID          county             state                area        
##  Min.   : 561   Length:437         Length:437         Min.   :0.00500  
##  1st Qu.: 670   Class :character   Class :character   1st Qu.:0.02400  
##  Median :1221   Mode  :character   Mode  :character   Median :0.03000  
##  Mean   :1437                                         Mean   :0.03317  
##  3rd Qu.:2059                                         3rd Qu.:0.03800  
##  Max.   :3052                                         Max.   :0.11000  
##     poptotal         popdensity          popwhite          popblack      
##  Min.   :   1701   Min.   :   85.05   Min.   :    416   Min.   :      0  
##  1st Qu.:  18840   1st Qu.:  622.41   1st Qu.:  18630   1st Qu.:     29  
##  Median :  35324   Median : 1156.21   Median :  34471   Median :    201  
##  Mean   :  96130   Mean   : 3097.74   Mean   :  81840   Mean   :  11024  
##  3rd Qu.:  75651   3rd Qu.: 2330.00   3rd Qu.:  72968   3rd Qu.:   1291  
##  Max.   :5105067   Max.   :88018.40   Max.   :3204947   Max.   :1317147  
##  popamerindian        popasian         popother        percwhite    
##  Min.   :    4.0   Min.   :     0   Min.   :     0   Min.   :10.69  
##  1st Qu.:   44.0   1st Qu.:    35   1st Qu.:    20   1st Qu.:94.89  
##  Median :   94.0   Median :   102   Median :    66   Median :98.03  
##  Mean   :  343.1   Mean   :  1310   Mean   :  1613   Mean   :95.56  
##  3rd Qu.:  288.0   3rd Qu.:   401   3rd Qu.:   345   3rd Qu.:99.07  
##  Max.   :10289.0   Max.   :188565   Max.   :384119   Max.   :99.82  
##    percblack       percamerindan        percasian        percother      
##  Min.   : 0.0000   Min.   : 0.05623   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.: 0.1157   1st Qu.: 0.15793   1st Qu.:0.1737   1st Qu.:0.09102  
##  Median : 0.5390   Median : 0.21502   Median :0.2972   Median :0.17844  
##  Mean   : 2.6763   Mean   : 0.79894   Mean   :0.4872   Mean   :0.47906  
##  3rd Qu.: 2.6014   3rd Qu.: 0.38362   3rd Qu.:0.5212   3rd Qu.:0.48050  
##  Max.   :40.2100   Max.   :89.17738   Max.   :5.0705   Max.   :7.52427  
##    popadults          perchsd        percollege        percprof      
##  Min.   :   1287   Min.   :46.91   Min.   : 7.336   Min.   : 0.5203  
##  1st Qu.:  12271   1st Qu.:71.33   1st Qu.:14.114   1st Qu.: 2.9980  
##  Median :  22188   Median :74.25   Median :16.798   Median : 3.8142  
##  Mean   :  60973   Mean   :73.97   Mean   :18.273   Mean   : 4.4473  
##  3rd Qu.:  47541   3rd Qu.:77.20   3rd Qu.:20.550   3rd Qu.: 4.9493  
##  Max.   :3291995   Max.   :88.90   Max.   :48.079   Max.   :20.7913  
##  poppovertyknown   percpovertyknown percbelowpoverty percchildbelowpovert
##  Min.   :   1696   Min.   :80.90    Min.   : 2.180   Min.   : 1.919      
##  1st Qu.:  18364   1st Qu.:96.89    1st Qu.: 9.199   1st Qu.:11.624      
##  Median :  33788   Median :98.17    Median :11.822   Median :15.270      
##  Mean   :  93642   Mean   :97.11    Mean   :12.511   Mean   :16.447      
##  3rd Qu.:  72840   3rd Qu.:98.60    3rd Qu.:15.133   3rd Qu.:20.352      
##  Max.   :5023523   Max.   :99.86    Max.   :48.691   Max.   :64.308      
##  percadultpoverty percelderlypoverty    inmetro         category        
##  Min.   : 1.938   Min.   : 3.547     Min.   :0.0000   Length:437        
##  1st Qu.: 7.668   1st Qu.: 8.912     1st Qu.:0.0000   Class :character  
##  Median :10.008   Median :10.869     Median :0.0000   Mode  :character  
##  Mean   :10.919   Mean   :11.389     Mean   :0.3432                     
##  3rd Qu.:13.182   3rd Qu.:13.412     3rd Qu.:1.0000                     
##  Max.   :43.312   Max.   :31.162     Max.   :1.0000
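summary() reports per-column statistics, but a few other base R functions give a quicker picture of the data's shape and types. A minimal sketch using only base R (`dim()`, `str()`, `head()`, `table()`); the column subset passed to `head()` is an arbitrary choice for illustration.

```r
# Rebuild the data frame so this block runs on its own
midwest <- as.data.frame(ggplot2::midwest)

dim(midwest)   # rows (counties) and columns (variables): 437 x 28
str(midwest)   # type of every column plus its first few values

# First six rows of a few columns chosen for illustration
head(midwest[, c("county", "state", "poptotal", "popasian")])

# Number of counties per state
table(midwest$state)
```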

Q2. Rename the variable poptotal (total population) to total, and popasian (Asian population) to asian.

library("dplyr")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# rename(data, new_name = old_name)
midwest <- dplyr::rename(midwest, total = poptotal)
midwest <- dplyr::rename(midwest, asian = popasian)
summary(midwest)
##       PID          county             state                area        
##  Min.   : 561   Length:437         Length:437         Min.   :0.00500  
##  1st Qu.: 670   Class :character   Class :character   1st Qu.:0.02400  
##  Median :1221   Mode  :character   Mode  :character   Median :0.03000  
##  Mean   :1437                                         Mean   :0.03317  
##  3rd Qu.:2059                                         3rd Qu.:0.03800  
##  Max.   :3052                                         Max.   :0.11000  
##      total           popdensity          popwhite          popblack      
##  Min.   :   1701   Min.   :   85.05   Min.   :    416   Min.   :      0  
##  1st Qu.:  18840   1st Qu.:  622.41   1st Qu.:  18630   1st Qu.:     29  
##  Median :  35324   Median : 1156.21   Median :  34471   Median :    201  
##  Mean   :  96130   Mean   : 3097.74   Mean   :  81840   Mean   :  11024  
##  3rd Qu.:  75651   3rd Qu.: 2330.00   3rd Qu.:  72968   3rd Qu.:   1291  
##  Max.   :5105067   Max.   :88018.40   Max.   :3204947   Max.   :1317147  
##  popamerindian         asian           popother        percwhite    
##  Min.   :    4.0   Min.   :     0   Min.   :     0   Min.   :10.69  
##  1st Qu.:   44.0   1st Qu.:    35   1st Qu.:    20   1st Qu.:94.89  
##  Median :   94.0   Median :   102   Median :    66   Median :98.03  
##  Mean   :  343.1   Mean   :  1310   Mean   :  1613   Mean   :95.56  
##  3rd Qu.:  288.0   3rd Qu.:   401   3rd Qu.:   345   3rd Qu.:99.07  
##  Max.   :10289.0   Max.   :188565   Max.   :384119   Max.   :99.82  
##    percblack       percamerindan        percasian        percother      
##  Min.   : 0.0000   Min.   : 0.05623   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.: 0.1157   1st Qu.: 0.15793   1st Qu.:0.1737   1st Qu.:0.09102  
##  Median : 0.5390   Median : 0.21502   Median :0.2972   Median :0.17844  
##  Mean   : 2.6763   Mean   : 0.79894   Mean   :0.4872   Mean   :0.47906  
##  3rd Qu.: 2.6014   3rd Qu.: 0.38362   3rd Qu.:0.5212   3rd Qu.:0.48050  
##  Max.   :40.2100   Max.   :89.17738   Max.   :5.0705   Max.   :7.52427  
##    popadults          perchsd        percollege        percprof      
##  Min.   :   1287   Min.   :46.91   Min.   : 7.336   Min.   : 0.5203  
##  1st Qu.:  12271   1st Qu.:71.33   1st Qu.:14.114   1st Qu.: 2.9980  
##  Median :  22188   Median :74.25   Median :16.798   Median : 3.8142  
##  Mean   :  60973   Mean   :73.97   Mean   :18.273   Mean   : 4.4473  
##  3rd Qu.:  47541   3rd Qu.:77.20   3rd Qu.:20.550   3rd Qu.: 4.9493  
##  Max.   :3291995   Max.   :88.90   Max.   :48.079   Max.   :20.7913  
##  poppovertyknown   percpovertyknown percbelowpoverty percchildbelowpovert
##  Min.   :   1696   Min.   :80.90    Min.   : 2.180   Min.   : 1.919      
##  1st Qu.:  18364   1st Qu.:96.89    1st Qu.: 9.199   1st Qu.:11.624      
##  Median :  33788   Median :98.17    Median :11.822   Median :15.270      
##  Mean   :  93642   Mean   :97.11    Mean   :12.511   Mean   :16.447      
##  3rd Qu.:  72840   3rd Qu.:98.60    3rd Qu.:15.133   3rd Qu.:20.352      
##  Max.   :5023523   Max.   :99.86    Max.   :48.691   Max.   :64.308      
##  percadultpoverty percelderlypoverty    inmetro         category        
##  Min.   : 1.938   Min.   : 3.547     Min.   :0.0000   Length:437        
##  1st Qu.: 7.668   1st Qu.: 8.912     1st Qu.:0.0000   Class :character  
##  Median :10.008   Median :10.869     Median :0.0000   Mode  :character  
##  Mean   :10.919   Mean   :11.389     Mean   :0.3432                     
##  3rd Qu.:13.182   3rd Qu.:13.412     3rd Qu.:1.0000                     
##  Max.   :43.312   Max.   :31.162     Max.   :1.0000
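dplyr's `rename()` takes `new_name = old_name` pairs, so both renames can also be written as a single call. This sketch does that and then checks the result with `names()`, which is a lighter-weight confirmation than printing a full summary().

```r
library(dplyr)

# Rebuild the data frame so this block runs on its own
midwest <- as.data.frame(ggplot2::midwest)

# Several new_name = old_name pairs in one rename() call
midwest <- rename(midwest, total = poptotal, asian = popasian)

# Only the two renamed columns change; every other name is untouched
names(midwest)
```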

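Q3. Using the total and asian variables, create a derived variable for "Asian population as a percentage of the total population", then draw a histogram to see how the counties are distributed.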

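# Derived variable: Asian population as a percentage of the total population
midwest$percentage_of_asian <- (midwest$asian / midwest$total) * 100
# Histogram of the percentages across the 437 counties
hist(midwest$percentage_of_asian)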
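The same derived variable can be written with dplyr's `mutate()`, and the histogram drawn with ggplot2 instead of base `hist()`. A sketch under those assumptions; `bins = 30` is ggplot2's default, written out only so it is easy to change.

```r
library(dplyr)
library(ggplot2)

# Rebuild the renamed data frame so this block runs on its own
midwest <- as.data.frame(ggplot2::midwest)
midwest <- rename(midwest, total = poptotal, asian = popasian)

# Same derived variable as above, expressed with mutate()
midwest <- mutate(midwest, percentage_of_asian = asian / total * 100)

# ggplot2 histogram of the Asian population percentage
p <- ggplot(midwest, aes(x = percentage_of_asian)) +
  geom_histogram(bins = 30)
print(p)
```

Most counties sit near zero with a long right tail, so a histogram of the raw percentages is strongly right-skewed.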

Q4. Compute the overall mean of the Asian population percentage, then create a derived variable that assigns “large” to values above the mean and “small” to all others.

mean(midwest$percentage_of_asian)
## [1] 0.4872462
# Compare against the exact mean rather than its rounded printout
midwest$test <- ifelse(midwest$percentage_of_asian > mean(midwest$percentage_of_asian), "large", "small")
table(midwest$test)
## 
## large small 
##   119   318
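Storing the mean in a variable avoids retyping its rounded printed value as a literal threshold. A dplyr-flavored sketch; the column name `asian_group` is hypothetical, chosen for this example in place of the original `test`.

```r
library(dplyr)

# Rebuild the data and the percentage variable so this block runs on its own
midwest <- as.data.frame(ggplot2::midwest)
midwest <- rename(midwest, total = poptotal, asian = popasian)
midwest <- mutate(midwest, percentage_of_asian = asian / total * 100)

# Exact mean, kept at full precision
mean_asian <- mean(midwest$percentage_of_asian)

# asian_group is a hypothetical column name used only in this sketch
midwest <- mutate(midwest,
                  asian_group = if_else(percentage_of_asian > mean_asian, "large", "small"))

table(midwest$asian_group)
```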

Q5. Use a frequency table and a frequency bar chart to check how many counties fall into “large” and “small”.

library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following object is masked _by_ '.GlobalEnv':
## 
##     midwest
table(midwest$test)
## 
## large small 
##   119   318
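# geom_bar() counts the rows in each group (used instead of qplot(), deprecated since ggplot2 3.4.0)
ggplot(midwest, aes(x = test)) + geom_bar()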
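The frequency table and the bar chart can also be tied together explicitly: dplyr's `count()` returns the table as a data frame, and `geom_col()` draws bars from those precomputed counts (whereas `geom_bar()` counts the rows itself). A sketch assuming the same `test` variable as above, rebuilt here so the block runs on its own.

```r
library(dplyr)
library(ggplot2)

# Rebuild the percentage and the large/small label in one mutate() call;
# later arguments can refer to columns created by earlier ones
midwest <- as.data.frame(ggplot2::midwest)
midwest <- mutate(midwest,
                  percentage_of_asian = popasian / poptotal * 100,
                  test = ifelse(percentage_of_asian > mean(percentage_of_asian), "large", "small"))

# Frequency table as a data frame with columns test and n
freq <- count(midwest, test)
freq

# Bars drawn from the precomputed counts in freq$n
p <- ggplot(freq, aes(x = test, y = n)) + geom_col()
print(p)
```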