Use the Prestige{car} data set for this problem.
Find the median prestige score for each of the three types of occupation, respectively.
Use the median score in each type of occupation to define two levels of prestige: High and low, for each occupation, respectively.
Summarize the relationship between income and education for each category generated from crossing the factor prestige with the type of occupation.
library(car)
#> Loading required package: carData
library(carData)
str(Prestige)
#> 'data.frame': 102 obs. of 6 variables:
#> $ education: num 13.1 12.3 12.8 11.4 14.6 ...
#> $ income : int 12351 25879 9271 8865 8403 11030 8258 14163 11377 11023 ...
#> $ women : num 11.16 4.02 15.7 9.11 11.68 ...
#> $ prestige : num 68.8 69.1 63.4 56.8 73.5 77.6 72.6 78.1 73.1 68.8 ...
#> $ census : int 1113 1130 1171 1175 2111 2113 2133 2141 2143 2153 ...
#> $ type : Factor w/ 3 levels "bc","prof","wc": 2 2 2 2 2 2 2 2 2 2 ...
First, split the Prestige data by type, then compute descriptive statistics of prestige for each group.
dta <- Prestige
dta1 <- split(Prestige, Prestige$type)
lapply(dta1, function(x) median(x$prestige))
#> $bc
#> [1] 35.9
#>
#> $prof
#> [1] 68.4
#>
#> $wc
#> [1] 41.5
This gives the median prestige score for each type.
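The same medians can also be obtained without building the intermediate list. The following is a minimal sketch using base R's `tapply()`; the name `med` is introduced here for illustration only and is not part of the original solution.

```r
library(carData)  # provides the Prestige data frame

# Median prestige per occupation type, returned as a named 1-d array.
# Rows whose type is NA are dropped, just as split() drops them.
med <- with(Prestige, tapply(prestige, type, median))
med
```

Keeping the medians in one named object makes it possible to look them up by type (for example `med[["prof"]]`) instead of retyping the numbers later.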
Splitting prestige into two classes, High and Low
First, look at the range of prestige values within each of the three types.
lapply(dta1, function(x) summary(x$prestige))
#> $bc
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 17.30 27.10 35.90 35.53 42.60 54.90
#>
#> $prof
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 53.80 61.00 68.40 67.85 72.95 87.20
#>
#> $wc
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 26.50 35.90 41.50 42.24 47.50 67.50
The prestige values of all three types fall between 17 and 88.
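The range claim can be checked directly. This is a small sketch, assuming only base R and the `Prestige` data; `rng` is an illustrative name. It collects the minimum and maximum of each type in one matrix, which also confirms that outer break points of 0 and 100 are wide enough to enclose every score.

```r
library(carData)  # provides the Prestige data frame

# Minimum and maximum prestige score within each occupation type.
# Rows are (min, max); columns are the types bc, prof, wc.
rng <- sapply(split(Prestige$prestige, Prestige$type), range)
rownames(rng) <- c("min", "max")
rng
```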
Classify the occupations within each type as Low or High relative to that type's own median.
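# cut() uses right-closed intervals, so a score equal to the type median is "Low".
dta1$bc$level <- with(dta1$bc, cut(prestige, breaks = c(0, 35.9, 100), labels = c("Low", "High"), ordered_result = TRUE))
dta1$prof$level <- with(dta1$prof, cut(prestige, breaks = c(0, 68.4, 100), labels = c("Low", "High"), ordered_result = TRUE))
dta1$wc$level <- with(dta1$wc, cut(prestige, breaks = c(0, 41.5, 100), labels = c("Low", "High"), ordered_result = TRUE))
Crossing the High/Low prestige level with type divides the data into 6 groups. What is the relationship between education and income in each of these 6 groups?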
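One way to look at all six groups at once, without typing the medians by hand, is sketched below. It assumes base R's `ave()` to attach each type's median to every row; the names `dta2`, `type_median`, and `res_all` are hypothetical and not part of the original solution. The per-type tables that follow should show the same means, one type at a time.

```r
library(carData)  # provides the Prestige data frame

# Keep only rows with a known occupation type (split() drops the NA rows too).
dta2 <- subset(Prestige, !is.na(type))

# ave() returns, for every row, the median prestige of that row's type.
dta2$type_median <- ave(dta2$prestige, dta2$type, FUN = median)

# Scores at or below the type median are "Low", matching the right-closed
# intervals that cut() uses by default.
dta2$level <- factor(ifelse(dta2$prestige > dta2$type_median, "High", "Low"),
                     levels = c("Low", "High"), ordered = TRUE)

# Mean education and income for all six type-by-level groups in one table.
res_all <- aggregate(cbind(education, income) ~ type + level,
                     data = dta2, FUN = mean)
res_all
```

Computing the cut point from the data keeps the classification correct if the data set changes, at the cost of one extra helper column.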
res_bc <- aggregate(cbind(education, income) ~ level, data = dta1$bc, FUN = mean)
res_bc
#> level education income
#> 1 Low 7.870417 4087.125
#> 2 High 8.946000 6918.550
res_prof <- aggregate(cbind(education, income) ~ level, data = dta1$prof, FUN = mean)
res_prof
#> level education income
#> 1 Low 13.49375 8762.062
#> 2 High 14.71400 12476.667
res_wc <- aggregate(cbind(education, income) ~ level, data = dta1$wc, FUN = mean)
res_wc
#> level education income
#> 1 Low 10.56250 4751.667
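#> 2 High 11.52273 5380.273
Visualize the results.
Here, the axis scales of the High/Low plots for the three types are set so that they cover the minimum and maximum of education and income in the Prestige data.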
summary(Prestige$education)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 6.380 8.445 10.540 10.738 12.648 15.970
summary(Prestige$income)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 611 4106 5930 6798 8187 25879
library(lattice)
xyplot(education ~ income | level, data=dta1$bc, type=c("g","p","r"), xlab="Income", xlim= c(600, 26000), ylab="Education", ylim= c(6, 16), main="Relationship between Income & Education in Type BC")
xyplot(education ~ income | level, data=dta1$prof, type=c("g","p","r"), xlab="Income", xlim= c(600, 26000), ylab="Education", ylim= c(6, 16), main="Relationship between Income & Education in Type PROF")
xyplot(education ~ income | level, data=dta1$wc, type=c("g","p","r"), xlab="Income", xlim= c(600, 26000), ylab="Education", ylim= c(6, 16), main="Relationship between Income & Education in Type WC")
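The regression lines in the panels above show the direction of the relationship. A numeric companion is the correlation between income and education within each of the six groups. The following is a minimal sketch under the same High/Low rule (scores above the type median are High); the names `dta3`, `groups`, `cors`, and `sizes` are illustrative assumptions, and Pearson correlation is one choice of summary among several.

```r
library(carData)  # provides the Prestige data frame

# Rebuild the High/Low factor on the full data (rows with NA type removed).
dta3 <- subset(Prestige, !is.na(type))
dta3$level <- with(dta3, factor(
  ifelse(prestige > ave(prestige, type, FUN = median), "High", "Low"),
  levels = c("Low", "High")))

# Split into the six type-by-level groups, named like "bc.Low", and compute
# the group size and the Pearson correlation of income with education.
groups <- split(dta3, list(dta3$type, dta3$level))
cors <- sapply(groups, function(g) cor(g$income, g$education))
sizes <- sapply(groups, nrow)
data.frame(n = sizes, r = round(cors, 2))
```

Because some groups contain only a dozen or so occupations, the correlations are best read alongside the group sizes rather than on their own.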