Data wrangling: Homework 3

Read the data and inspect its structure

library(car)
#> Loading required package: carData
library(carData)
str(Prestige)
#> 'data.frame':    102 obs. of  6 variables:
#>  $ education: num  13.1 12.3 12.8 11.4 14.6 ...
#>  $ income   : int  12351 25879 9271 8865 8403 11030 8258 14163 11377 11023 ...
#>  $ women    : num  11.16 4.02 15.7 9.11 11.68 ...
#>  $ prestige : num  68.8 69.1 63.4 56.8 73.5 77.6 72.6 78.1 73.1 68.8 ...
#>  $ census   : int  1113 1130 1171 1175 2111 2113 2133 2141 2143 2153 ...
#>  $ type     : Factor w/ 3 levels "bc","prof","wc": 2 2 2 2 2 2 2 2 2 2 ...

Median prestige for the three occupation types

First split the Prestige data by type, then compute descriptive statistics of prestige within each group.

# Work on a copy and split it into one data frame per occupation type
dta <- Prestige
dta1 <- split(dta, dta$type)
# Median prestige within each type
lapply(dta1, function(x) median(x$prestige))
#> $bc
#> [1] 35.9
#> 
#> $prof
#> [1] 68.4
#> 
#> $wc
#> [1] 41.5

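This gives the median prestige for each type. Note that split() drops the occupations whose type is NA, so the three groups together cover 98 of the 102 rows.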
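The same group medians can also be obtained in one step with `tapply()`, without building the intermediate list. The sketch below is only an illustration: `toy` is a made-up data frame with invented values so that the code runs without `carData`; with the real data, `Prestige` would take its place.

```r
# Hypothetical toy data standing in for Prestige (values are invented)
toy <- data.frame(
  type     = factor(c("bc", "bc", "bc", "prof", "prof", "prof", "wc", "wc", "wc")),
  prestige = c(30, 40, 35, 70, 60, 65, 45, 50, 40)
)

# Median prestige per type, returned as a named 1-d array
med <- tapply(toy$prestige, toy$type, median)
med
```

`tapply()` returns a named vector-like result, which is convenient when the medians are needed later as numbers rather than as list elements.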

Use each type's median to classify `prestige` as High or Low

First, look at the range of prestige within each type.

lapply(dta1, function(x) summary(x$prestige))
#> $bc
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   17.30   27.10   35.90   35.53   42.60   54.90 
#> 
#> $prof
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   53.80   61.00   68.40   67.85   72.95   87.20 
#> 
#> $wc
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   26.50   35.90   41.50   42.24   47.50   67.50

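Across the three types, prestige ranges from 17.3 (the bc minimum) to 87.2 (the prof maximum), so outer breaks at 0 and 100 cover every observation.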

Classify each type by its own median.

# Intervals are right-closed, so a value equal to the median is labelled "Low"
dta1$bc$level <- with(dta1$bc, cut(prestige, ordered_result = TRUE, breaks = c(0, 35.9, 100), labels = c("Low", "High")))
dta1$prof$level <- with(dta1$prof, cut(prestige, ordered_result = TRUE, breaks = c(0, 68.4, 100), labels = c("Low", "High")))
dta1$wc$level <- with(dta1$wc, cut(prestige, ordered_result = TRUE, breaks = c(0, 41.5, 100), labels = c("Low", "High")))

lapply(dta1, function(y) table(y$level))
#> $bc
#> 
#>  Low High 
#>   24   20 
#> 
#> $prof
#> 
#>  Low High 
#>   16   15 
#> 
#> $wc
#> 
#>  Low High 
#>   12   11
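The medians above are typed in by hand. A possible alternative, sketched here under the assumption that all types live in one data frame, is to let `ave()` attach each row's group median and compare against it. The data frame `toy` and the column `med` are hypothetical names with invented values, used only so the example runs on its own.

```r
# Hypothetical toy data (values are invented)
toy <- data.frame(
  type     = factor(c("bc", "bc", "bc", "bc", "prof", "prof", "prof", "prof")),
  prestige = c(20, 30, 40, 50, 60, 70, 80, 90)
)

# ave() returns, for every row, the median of that row's type
toy$med <- ave(toy$prestige, toy$type, FUN = median)

# Values at or below the group median are "Low",
# mirroring the right-closed intervals that cut() uses by default
toy$level <- factor(ifelse(toy$prestige <= toy$med, "Low", "High"),
                    levels = c("Low", "High"), ordered = TRUE)

table(toy$type, toy$level)
```

This keeps the data in a single data frame, so the classification does not need to be repeated once per type.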

Crossing type with `prestige` level divides the data into 6 groups

How do education and income relate within each of these 6 groups?

res_bc <- aggregate(cbind(education, income) ~ level, data = dta1$bc, FUN = mean)
res_bc
#>   level education   income
#> 1   Low  7.870417 4087.125
#> 2  High  8.946000 6918.550
res_prof <- aggregate(cbind(education, income) ~ level, data = dta1$prof, FUN = mean)
res_prof
#>   level education    income
#> 1   Low  13.49375  8762.062
#> 2  High  14.71400 12476.667
res_wc <- aggregate(cbind(education, income) ~ level, data = dta1$wc, FUN = mean)
res_wc
#>   level education   income
#> 1   Low  10.56250 4751.667
#> 2  High  11.52273 5380.273
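The three `aggregate()` calls could also be collapsed into one by putting both grouping variables on the right-hand side of the formula. The following is a minimal sketch on a made-up data frame (`toy`, invented values); with the real data, the three pieces of `dta1` would first have to be recombined, for example with `do.call(rbind, dta1)`.

```r
# Hypothetical toy data with both grouping variables (values are invented)
toy <- data.frame(
  type      = factor(rep(c("bc", "prof"), each = 4)),
  level     = factor(rep(c("Low", "Low", "High", "High"), times = 2),
                     levels = c("Low", "High")),
  education = c(7, 8, 9, 10, 13, 14, 15, 16),
  income    = c(3000, 4000, 6000, 7000, 8000, 9000, 12000, 13000)
)

# Mean education and income for every type-by-level combination
res <- aggregate(cbind(education, income) ~ type + level, data = toy, FUN = mean)
res
```

The result has one row per combination, which makes the groups easier to compare side by side than three separate tables.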

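Visualize the results.

The axis limits of the three plots are set to cover the minimum and maximum of education and income in the full Prestige data, so that the plots share a common scale and can be compared directly.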

summary(Prestige$education)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   6.380   8.445  10.540  10.738  12.648  15.970
summary(Prestige$income)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>     611    4106    5930    6798    8187   25879
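Instead of reading the limits off the summaries and typing them in, they could be derived from the data. The sketch below assumes a small helper, `pad_range()`, which is a hypothetical name and not part of any package; `toy` holds only the minimum, median, and maximum reported by `summary()` above.

```r
# Toy data built from the min, median, and max shown in the summaries
toy <- data.frame(
  education = c(6.38, 10.54, 15.97),
  income    = c(611, 5930, 25879)
)

# Hypothetical helper: widen the observed range by a fraction on both sides
# so that no point sits exactly on the plot border
pad_range <- function(x, frac = 0.05) {
  r <- range(x, na.rm = TRUE)
  r + c(-1, 1) * frac * diff(r)
}

xlim <- pad_range(toy$income)
ylim <- pad_range(toy$education)
xlim
ylim
```

The resulting vectors can be passed directly as the `xlim` and `ylim` arguments of a plotting call.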

library(lattice)
xyplot(education ~ income | level, data=dta1$bc, type=c("g","p","r"), xlab="Income", xlim= c(600, 26000), ylab="Education", ylim= c(6, 16), main="Relationship between Income & Education in Type BC")


xyplot(education ~ income | level, data=dta1$prof, type=c("g","p","r"), xlab="Income", xlim= c(600, 26000), ylab="Education", ylim= c(6, 16), main="Relationship between Income & Education in Type PROF")


xyplot(education ~ income | level, data=dta1$wc, type=c("g","p","r"), xlab="Income", xlim= c(600, 26000), ylab="Education", ylim= c(6, 16), main="Relationship between Income & Education in Type WC")
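If the three types are kept in one data frame, a single lattice call with two conditioning variables could draw all six panels at once, and lattice then uses common axis limits across panels by default. This is a sketch on made-up data (`toy`, invented values), not the plots produced above.

```r
library(lattice)

# Hypothetical toy data with both grouping variables (values are invented)
toy <- data.frame(
  type      = factor(rep(c("bc", "prof"), each = 6)),
  level     = factor(rep(rep(c("Low", "High"), each = 3), times = 2),
                     levels = c("Low", "High")),
  income    = c(3000, 4000, 5000, 6000, 7000, 8000,
                8000, 9000, 10000, 12000, 13000, 14000),
  education = c(7, 8, 7.5, 9, 9.5, 10, 13, 13.5, 14, 15, 15.5, 16)
)

# One panel per level-by-type combination: grid, points, and regression line
p <- xyplot(education ~ income | level + type, data = toy,
            type = c("g", "p", "r"),
            xlab = "Income", ylab = "Education",
            main = "Income and education by prestige level and type")
p
```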

2020-Spring [Data Management] Instructor: SHEU, Ching-Fan

CHIU, Ming-Tzu

2020-04-13
