Read data
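library(dplyr)      # filter(), select(), mutate(), group_by(), summarize(), arrange()
library(gtsummary)  # tbl_summary()
ncku <- read.csv(file = "C:/Users/user/Desktop/ncku_prof_V6.csv", header = TRUE, stringsAsFactors = TRUE) # convert every non-numeric column to a factor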
View(ncku)
str(ncku) #structure of data
## 'data.frame': 460 obs. of 14 variables:
## $ ID : int 10001 10002 10003 10004 10005 10006 10007 10008 10009 10010 ...
## $ Initial : Factor w/ 347 levels "BCT","BHC","BLC",..: 308 60 81 90 145 201 293 176 198 276 ...
## $ Citation: int 305 355 3452 15808 280 2506 672 5735 1118 685 ...
## $ H.id : int 9 11 10 65 10 22 14 40 19 14 ...
## $ Gender : Factor w/ 2 levels "F","M": 2 2 2 2 1 2 2 2 2 2 ...
## $ Degree : Factor w/ 2 levels "D","O": 1 1 1 2 2 1 1 1 2 1 ...
## $ Rank : int 3 2 1 1 2 2 1 1 2 1 ...
## $ College : Factor w/ 5 levels "ENG","LIB","MNG",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Dept : Factor w/ 25 levels "ACC","BAD","CEN",..: 10 10 10 10 10 10 10 10 10 10 ...
## $ Grads : int 3 10 0 92 25 41 36 54 74 195 ...
## $ FPY : int 2013 2008 2011 1997 2011 2002 2008 2001 1994 1991 ...
## $ Articles: int 30 22 14 349 23 90 36 123 26 70 ...
## $ StuApp : int 169 169 169 169 169 169 169 169 169 169 ...
## $ Colprof : int 309 309 309 309 309 309 309 309 309 309 ...
summary(ncku) #summary of the data
## ID Initial Citation H.id Gender
## Min. :10001 CHL : 9 Min. : 0.00 Min. : 0.0 F:111
## 1st Qu.:10116 CCL : 8 1st Qu.: 51.75 1st Qu.: 3.0 M:349
## Median :10230 YCL : 7 Median : 384.00 Median :10.0
## Mean :10280 CHC : 5 Mean : 1200.68 Mean :12.8
## 3rd Qu.:10387 CYC : 5 3rd Qu.: 1271.50 3rd Qu.:19.0
## Max. :10668 CCC : 4 Max. :33845.00 Max. :92.0
## (Other):422
## Degree Rank College Dept Grads
## D:138 Min. :1.000 ENG:184 MEN : 46 Min. : 0.00
## O:322 1st Qu.:1.000 LIB: 63 CEN : 36 1st Qu.: 8.00
## Median :1.000 MNG: 84 MAT : 34 Median : 26.00
## Mean :1.633 SCI: 72 MSE : 30 Mean : 44.29
## 3rd Qu.:2.000 SSC: 57 FLL : 25 3rd Qu.: 66.25
## Max. :3.000 ESC : 24 Max. :374.00
## (Other):265
## FPY Articles StuApp Colprof
## Min. :1979 Min. : 0.00 Min. : 8.00 Min. : 77.0
## 1st Qu.:1994 1st Qu.: 12.00 1st Qu.: 23.00 1st Qu.:107.0
## Median :2002 Median : 26.00 Median : 45.00 Median :140.0
## Mean :2001 Mean : 45.48 Mean : 77.64 Mean :190.3
## 3rd Qu.:2008 3rd Qu.: 60.00 3rd Qu.:140.00 3rd Qu.:309.0
## Max. :2021 Max. :481.00 Max. :215.00 Max. :309.0
## NA's :2
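ID
Initial: initials of the professor's name
Citation: number of citations to the professor's papers
H.id: citation index of the professor's papers (h-index)
Gender
Degree: country of the degree (D = domestic, O = overseas)
Rank: academic rank (1 = full professor, 2 = associate professor, 3 = assistant professor)
College: college (ENG = Engineering, LIB = Liberal Arts, SCI = Science, SSC = Social Sciences, MNG = Management)
Dept: department
Grads: number of supervised students who have graduated
FPY: year of the first journal article
Articles: number of journal articles
StuApp: number of students admitted by the department
Colprof: number of full-time faculty in the college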
Assessment 1 Using pipes, subset the professors data to include H.id higher than 12 and retain only the columns H.id, Gender, College, Rank, Degree, and Grads. Show the last six rows of the data.
a1 <- ncku %>%
  filter(H.id > 12) %>%
  select(H.id, Gender, College, Rank, Degree, Grads)
tail(a1,6) # Show the last six rows of the data.
## H.id Gender College Rank Degree Grads
## 187 14 F MNG 2 D 17
## 188 13 M MNG 1 O 89
## 189 15 M MNG 2 D 27
## 190 17 M MNG 1 D 26
## 191 27 M MNG 1 D 23
## 192 14 M MNG 2 D 9
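The same filter-then-select pattern can be tried on a small stand-in data frame. The sketch below is only an illustration: `toy` is a hypothetical object built from the first six values printed by `str(ncku)`, and the base-R `subset()` call is shown as an assumed equivalent, not as part of the original answer.

```r
library(dplyr)

# Hypothetical stand-in: first six professors as printed by str(ncku)
toy <- data.frame(
  H.id   = c(9L, 11L, 10L, 65L, 10L, 22L),
  Gender = factor(c("M", "M", "M", "M", "F", "M")),
  Grads  = c(3L, 10L, 0L, 92L, 25L, 41L)
)

# dplyr pipeline: keep rows with H.id above 12, then keep two columns
toy_pipe <- toy %>%
  filter(H.id > 12) %>%
  select(H.id, Grads)

# Base-R equivalent: subset() filters rows and selects columns in one call
toy_base <- subset(toy, H.id > 12, select = c(H.id, Grads))

toy_pipe
```

Only the rows with H.id 65 and 22 survive the filter. `subset()` keeps the original row names, whereas `filter()` renumbers the rows from 1.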
Assessment 2 Create a new data frame (newdta) from the professors data that meets the following criteria: 1. retain H.id, Gender, Degree, Rank, Grads columns and create two new columns called ‘academicy’ for the academic years and ‘Grads_m’ for the average of graduate students for each academic year. Show only the first six rows of the new dataframe.
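a2 <- ncku %>%
  filter(!is.na(FPY)) %>%              # drop rows with a missing first-publication year
  mutate(academicy = 2022 - FPY,       # 'academicy': academic years since the first journal article
         Grads_m = Grads / academicy)  # 'Grads_m': average number of graduates per academic year
View(a2)
newdta <- a2 %>% select(H.id, Gender, Degree, Rank, Grads, academicy, Grads_m) # select by name rather than by column position
head(newdta, 6) # Show only the first six rows of the new dataframe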
## H.id Gender Degree Rank Grads academicy Grads_m
## 1 9 M D 3 3 9 0.3333333
## 2 11 M D 2 10 14 0.7142857
## 3 10 M D 1 0 11 0.0000000
## 4 65 M O 1 92 25 3.6800000
## 5 10 F O 2 25 11 2.2727273
## 6 22 M D 2 41 20 2.0500000
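The arithmetic behind `academicy` and `Grads_m` can be checked by hand on a few rows. This is a minimal sketch, assuming a hypothetical data frame `toy` that reuses the FPY and Grads values of the first three professors and adds one made-up row with a missing FPY.

```r
library(dplyr)

# Hypothetical stand-in: FPY and Grads of the first three professors, plus one
# made-up row with a missing FPY to show what filter(!is.na(FPY)) removes
toy <- data.frame(
  FPY   = c(2013L, 2008L, 2011L, NA),
  Grads = c(3L, 10L, 0L, 5L)
)

toy_new <- toy %>%
  filter(!is.na(FPY)) %>%
  mutate(academicy = 2022 - FPY,          # years since the first journal article
         Grads_m   = Grads / academicy)   # graduates per academic year

toy_new
```

For the first row, 2022 - 2013 = 9 academic years and 3 / 9 = 0.333 graduates per year, which agrees with the first row of `newdta`.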
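Assessment 3 Use group_by() and summarize() to find the mean, standard deviation, variance, minimum, and maximum of H.id for each College, Gender, Rank, and Degree, together with the number of professors in each group, then sort the results by mean H.id in descending order. Based on these results, answer the following questions:
ncku %>%
  group_by(College) %>%
  summarize(mean_H.id = mean(H.id, na.rm = TRUE),
            sd_H.id   = sd(H.id, na.rm = TRUE),
            var_H.id  = var(H.id, na.rm = TRUE),
            min_H.id  = min(H.id, na.rm = TRUE),
            max_H.id  = max(H.id, na.rm = TRUE)) %>%
  arrange(desc(mean_H.id))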
## # A tibble: 5 x 6
## College mean_H.id sd_H.id var_H.id min_H.id max_H.id
## <fct> <dbl> <dbl> <dbl> <int> <int>
## 1 ENG 19.9 12.3 151. 2 92
## 2 SCI 13.5 12.3 151. 1 58
## 3 MNG 9.32 7.64 58.4 0 39
## 4 SSC 6.56 7.01 49.1 0 32
## 5 LIB 1.52 3.19 10.2 0 21
ncku %>%
  group_by(Gender) %>%
  summarize(mean_H.id = mean(H.id, na.rm = TRUE),
            sd_H.id   = sd(H.id, na.rm = TRUE),
            var_H.id  = var(H.id, na.rm = TRUE),
            min_H.id  = min(H.id, na.rm = TRUE),
            max_H.id  = max(H.id, na.rm = TRUE)) %>%
  arrange(desc(mean_H.id))
## # A tibble: 2 x 6
## Gender mean_H.id sd_H.id var_H.id min_H.id max_H.id
## <fct> <dbl> <dbl> <dbl> <int> <int>
## 1 M 14.4 12.6 158. 0 92
## 2 F 7.82 8.78 77.1 0 46
ncku %>%
  group_by(Rank) %>%
  summarize(mean_H.id = mean(H.id, na.rm = TRUE),
            sd_H.id   = sd(H.id, na.rm = TRUE),
            var_H.id  = var(H.id, na.rm = TRUE),
            min_H.id  = min(H.id, na.rm = TRUE),
            max_H.id  = max(H.id, na.rm = TRUE)) %>%
  arrange(desc(mean_H.id))
## # A tibble: 3 x 6
## Rank mean_H.id sd_H.id var_H.id min_H.id max_H.id
## <int> <dbl> <dbl> <dbl> <int> <int>
## 1 1 17.5 13.6 186. 0 92
## 2 2 8.18 7.45 55.5 0 40
## 3 3 6.59 6.77 45.8 0 37
ncku %>%
  group_by(Degree) %>%
  summarize(mean_H.id = mean(H.id, na.rm = TRUE),
            sd_H.id   = sd(H.id, na.rm = TRUE),
            var_H.id  = var(H.id, na.rm = TRUE),
            min_H.id  = min(H.id, na.rm = TRUE),
            max_H.id  = max(H.id, na.rm = TRUE)) %>%
  arrange(desc(mean_H.id))
## # A tibble: 2 x 6
## Degree mean_H.id sd_H.id var_H.id min_H.id max_H.id
## <fct> <dbl> <dbl> <dbl> <int> <int>
## 1 D 13.1 11.3 127. 0 54
## 2 O 12.7 12.4 154. 0 92
table(ncku$College)
##
## ENG LIB MNG SCI SSC
## 184 63 84 72 57
table(ncku$Gender)
##
## F M
## 111 349
table(ncku$Rank)
##
## 1 2 3
## 240 149 71
table(ncku$Degree)
##
## D O
## 138 322
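The four summaries above repeat the same `summarize()` call, and the group sizes are obtained separately with `table()`. One possible way to fold both into a single step is sketched below. `summarise_hid()` is a hypothetical helper, `toy` holds made-up values, and `n()` is used as an assumed replacement for the separate `table()` calls.

```r
library(dplyr)

# Hypothetical stand-in for ncku (made-up values)
toy <- data.frame(
  College = factor(c("ENG", "ENG", "ENG", "LIB", "LIB", "SCI")),
  Gender  = factor(c("M", "M", "F", "F", "M", "M")),
  H.id    = c(20L, 30L, 10L, 0L, 2L, 15L)
)

# Hypothetical helper: summarise H.id by any grouping variables passed in `...`
summarise_hid <- function(data, ...) {
  grouped <- group_by(data, ...)
  grouped %>%
    summarize(n         = n(),                       # group size
              mean_H.id = mean(H.id, na.rm = TRUE),
              sd_H.id   = sd(H.id, na.rm = TRUE),
              var_H.id  = var(H.id, na.rm = TRUE),
              min_H.id  = min(H.id, na.rm = TRUE),
              max_H.id  = max(H.id, na.rm = TRUE),
              .groups   = "drop") %>%                # return an ungrouped tibble
    arrange(desc(mean_H.id))
}

by_college <- summarise_hid(toy, College)
by_gender  <- summarise_hid(toy, Gender)
by_both    <- summarise_hid(toy, College, Gender)

by_college
```

Groups with a single member (SCI in the toy data) get `NA` for the standard deviation and variance, which is worth keeping in mind when reading small cells.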
2.1 What are the characteristics of the groups with the highest and the lowest mean H.id?
ncku %>%
  group_by(College, Gender, Rank, Degree) %>%
  summarize(mean_H.id = mean(H.id, na.rm = TRUE),
            sd_H.id   = sd(H.id, na.rm = TRUE),
            var_H.id  = var(H.id, na.rm = TRUE),
            min_H.id  = min(H.id, na.rm = TRUE),
            max_H.id  = max(H.id, na.rm = TRUE)) %>%
  arrange(desc(mean_H.id))
## `summarise()` has grouped output by 'College', 'Gender', 'Rank'. You can
## override using the `.groups` argument.
## # A tibble: 53 x 9
## # Groups: College, Gender, Rank [30]
## College Gender Rank Degree mean_H.id sd_H.id var_H.id min_H.id max_H.id
## <fct> <fct> <int> <fct> <dbl> <dbl> <dbl> <int> <int>
## 1 ENG F 1 D 34 10.4 108 28 46
## 2 ENG M 1 D 24.4 11.0 121. 6 54
## 3 ENG M 1 O 24.2 13.9 192. 3 92
## 4 SCI M 1 D 21 16.1 258 6 39
## 5 ENG F 1 O 19.5 9.71 94.3 10 32
## 6 SCI M 1 O 18.8 15.2 231. 3 58
## 7 SCI F 1 O 18.2 14.4 206. 3 34
## 8 ENG M 2 D 17.3 6.81 46.4 10 40
## 9 MNG M 1 D 16 6.48 42 8 27
## 10 MNG F 1 O 15.2 8.54 72.9 8 27
## # ... with 43 more rows
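The group with the highest mean is ENG, F, 1, D (College of Engineering, female, full professor, domestic degree).
The group with the lowest mean is LIB, M, 3, D (College of Liberal Arts, male, assistant professor, domestic degree).
2.2 "The average academic productivity of male professors in the College of Engineering is lower than that of female professors in the College of Engineering." Is this statement appropriate? Give your view with respect to the number of professors in the college.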
ncku %>%
group_by(College,Gender) %>%
summarize(mean_H.id = mean(H.id,na.rm=TRUE)) %>%
arrange(desc(mean_H.id))
## `summarise()` has grouped output by 'College'. You can override using the
## `.groups` argument.
## # A tibble: 10 x 3
## # Groups: College [5]
## College Gender mean_H.id
## <fct> <fct> <dbl>
## 1 ENG M 20.2
## 2 ENG F 16.9
## 3 SCI F 13.7
## 4 SCI M 13.5
## 5 MNG M 10.3
## 6 SSC F 7.36
## 7 MNG F 6.96
## 8 SSC M 6.06
## 9 LIB F 1.76
## 10 LIB M 1.24
table(ncku$College,ncku$Gender)
##
## F M
## ENG 17 167
## LIB 34 29
## MNG 24 60
## SCI 14 58
## SSC 22 35
ncku_college<-split(ncku,ncku$College)
t.test(H.id ~ Gender, data = ncku_college$"ENG")
##
## Welch Two Sample t-test
##
## data: H.id by Gender
## t = -1.1474, df = 20.155, p-value = 0.2647
## alternative hypothesis: true difference in means between group F and group M is not equal to 0
## 95 percent confidence interval:
## -9.339458 2.708954
## sample estimates:
## mean in group F mean in group M
## 16.88235 20.19760
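This statement is not appropriate. The group sizes are very unbalanced (17 female versus 167 male professors in ENG), and the mean H.id is in fact higher for men (20.2) than for women (16.9). The Welch t-test is also not significant (p = 0.2647 > .05), so the data do not support a difference in either direction.
2.3 Make at least one statement about the academic productivity of professors in the College of Liberal Arts, and explain your reasoning.
summary(ncku_college$LIB$H.id) # summary() reports NA counts on its own, so na.rm is not needed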
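The pieces used in this argument (group sizes, group means, p-value) can all be pulled from the objects R returns. A minimal sketch, assuming a hypothetical data frame `toy` with made-up H.id values for two unbalanced groups:

```r
# Hypothetical H.id values for two unbalanced groups (not the ncku data)
toy <- data.frame(
  H.id   = c(10, 28, 32, 46, 12,
             3, 6, 15, 20, 24, 31, 40, 54, 9, 18),
  Gender = factor(rep(c("F", "M"), times = c(5, 10)))
)

group_n <- table(toy$Gender)                  # group sizes
res     <- t.test(H.id ~ Gender, data = toy)  # Welch test is the default (var.equal = FALSE)

group_n
res$estimate   # the two group means
res$p.value    # two-sided p-value
res$conf.int   # 95% confidence interval for the difference in means (F minus M)
```

Reading the group sizes next to the means makes it easier to see when a comparison rests on a very small group.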
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.000 1.524 2.000 21.000
hist(ncku_college$"LIB"$H.id, breaks = 20)

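The results suggest that academic productivity in the College of Liberal Arts is rather low. The median H.id is 0, which means that at least half of its professors have an H.id of 0. The histogram shows this clearly as well.
Assessment 4 Produce the following three tables and, based on the results, write three conclusions.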
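The share of professors with an H.id of 0 can be stated directly instead of being inferred from the median. A minimal sketch, assuming a hypothetical vector `hid` of made-up values with many zeros (with the real data, `ncku_college$LIB$H.id` would take its place):

```r
# Hypothetical H.id values with many zeros (not the actual LIB data)
hid <- c(0, 0, 0, 0, 0, 0, 1, 2, 3, 21)

median_hid <- median(hid)     # middle of the distribution
share_zero <- mean(hid == 0)  # proportion of values equal to 0

median_hid
share_zero
```

Because `hid == 0` is a logical vector, its mean is the proportion of `TRUE` values.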
a4 <- ncku %>% select(College,Gender,Rank,Degree)
tbl_summary(a4,by=College)
| Characteristic | ENG, N = 184 | LIB, N = 63 | MNG, N = 84 | SCI, N = 72 | SSC, N = 57 |
|---|---|---|---|---|---|
| Gender | | | | | |
| F | 17 (9.2%) | 34 (54%) | 24 (29%) | 14 (19%) | 22 (39%) |
| M | 167 (91%) | 29 (46%) | 60 (71%) | 58 (81%) | 35 (61%) |
| Rank | | | | | |
| 1 | 111 (60%) | 29 (46%) | 36 (43%) | 36 (50%) | 28 (49%) |
| 2 | 44 (24%) | 29 (46%) | 27 (32%) | 27 (38%) | 22 (39%) |
| 3 | 29 (16%) | 5 (7.9%) | 21 (25%) | 9 (12%) | 7 (12%) |
| Degree | | | | | |
| D | 63 (34%) | 21 (33%) | 25 (30%) | 16 (22%) | 13 (23%) |
| O | 121 (66%) | 42 (67%) | 59 (70%) | 56 (78%) | 44 (77%) |
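In every college, professors with an overseas degree (O) outnumber those with a domestic degree (D).
In every college, full professors (rank 1) outnumber assistant professors (rank 3) and are at least as numerous as associate professors (rank 2). In LIB the full and associate professors are tied at 29 each.
In every college except Liberal Arts (LIB), male professors (M) outnumber female professors (F).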
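The rank comparison can be checked directly from the counts in the table. The sketch below copies the Rank rows of the `tbl_summary()` output into a matrix. `rank_counts` and `full_gt_assoc` are hypothetical names introduced here for illustration.

```r
# Rank counts per college, copied from the tbl_summary() table
rank_counts <- matrix(
  c(111, 44, 29,   # ENG
     29, 29,  5,   # LIB
     36, 27, 21,   # MNG
     36, 27,  9,   # SCI
     28, 22,  7),  # SSC
  nrow = 3,
  dimnames = list(Rank = c("1", "2", "3"),
                  College = c("ENG", "LIB", "MNG", "SCI", "SSC"))
)

# TRUE where full professors strictly outnumber associate professors
full_gt_assoc <- rank_counts["1", ] > rank_counts["2", ]
full_gt_assoc

# Column percentages, comparable to the percentages in the table
round(100 * prop.table(rank_counts, margin = 2), 1)
```

On these counts the strict comparison is `FALSE` only for LIB, where ranks 1 and 2 both have 29 professors.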