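library(dplyr)      # %>%, filter(), select(), mutate(), group_by(), summarize(), arrange()
library(gtsummary)  # tbl_summary(), used in Assessment 4
dta <- read.csv("C:/RStudio/ncku_prof_V6.csv", header = TRUE, stringsAsFactors = TRUE)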
head(dta)
##      ID Initial Citation H.id Gender Degree Rank College Dept Grads  FPY
## 1 10001     YCC      305    9      M      D    3     ENG  ESC     3 2013
## 2 10002     CYC      355   11      M      D    2     ENG  ESC    10 2008
## 3 10003     HBC     3452   10      M      D    1     ENG  ESC     0 2011
## 4 10004     HHC    15808   65      M      O    1     ENG  ESC    92 1997
## 5 10005     JSC      280   10      F      O    2     ENG  ESC    25 2011
## 6 10006     MYC     2506   22      M      D    2     ENG  ESC    41 2002
##   Articles StuApp Colprof
## 1       30    169     309
## 2       22    169     309
## 3       14    169     309
## 4      349    169     309
## 5       23    169     309
## 6       90    169     309
str(dta)
## 'data.frame':    460 obs. of  14 variables:
##  $ ID      : int  10001 10002 10003 10004 10005 10006 10007 10008 10009 10010 ...
##  $ Initial : Factor w/ 347 levels "BCT","BHC","BLC",..: 308 60 81 90 145 201 293 176 198 276 ...
##  $ Citation: int  305 355 3452 15808 280 2506 672 5735 1118 685 ...
##  $ H.id    : int  9 11 10 65 10 22 14 40 19 14 ...
##  $ Gender  : Factor w/ 2 levels "F","M": 2 2 2 2 1 2 2 2 2 2 ...
##  $ Degree  : Factor w/ 2 levels "D","O": 1 1 1 2 2 1 1 1 2 1 ...
##  $ Rank    : int  3 2 1 1 2 2 1 1 2 1 ...
##  $ College : Factor w/ 5 levels "ENG","LIB","MNG",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Dept    : Factor w/ 25 levels "ACC","BAD","CEN",..: 10 10 10 10 10 10 10 10 10 10 ...
##  $ Grads   : int  3 10 0 92 25 41 36 54 74 195 ...
##  $ FPY     : int  2013 2008 2011 1997 2011 2002 2008 2001 1994 1991 ...
##  $ Articles: int  30 22 14 349 23 90 36 123 26 70 ...
##  $ StuApp  : int  169 169 169 169 169 169 169 169 169 169 ...
##  $ Colprof : int  309 309 309 309 309 309 309 309 309 309 ...

Assessment 1

Select the rows in which H.id is greater than 12, then keep the variables H.id, Gender, College, Rank, Degree, and Grads.

dta1 <- dta %>%
    filter(H.id > 12) %>%
    select(H.id, Gender, College, Rank, Degree, Grads)
tail(dta1)
##     H.id Gender College Rank Degree Grads
## 187   14      F     MNG    2      D    17
## 188   13      M     MNG    1      O    89
## 189   15      M     MNG    2      D    27
## 190   17      M     MNG    1      D    26
## 191   27      M     MNG    1      D    23
## 192   14      M     MNG    2      D     9
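For comparison, the same row filter and column selection can be sketched in base R with `subset()`. The `toy` data frame below is hypothetical and only mimics the column names of `dta`. Note that `H.id > 12` is a strict comparison, so a professor with an H.id of exactly 12 is dropped.

```r
library(dplyr)

# Hypothetical toy data with the same column names as dta (values are made up)
toy <- data.frame(
  H.id    = c(9, 12, 13, 40),
  Gender  = factor(c("M", "F", "M", "F")),
  College = factor(c("ENG", "LIB", "MNG", "SCI")),
  Rank    = c(3L, 2L, 1L, 1L),
  Degree  = factor(c("D", "O", "D", "O")),
  Grads   = c(3L, 0L, 89L, 50L)
)
keep <- c("H.id", "Gender", "College", "Rank", "Degree", "Grads")

# dplyr version, as in Assessment 1
a <- toy %>% filter(H.id > 12) %>% select(all_of(keep))

# Base-R version: subset() filters rows and selects columns in one call
b <- subset(toy, H.id > 12, select = keep)

a$H.id  # 13 40: the row with H.id == 12 is excluded
```

The two versions keep the same rows and columns; `subset()` retains the original row names, while the dplyr result is renumbered.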

Assessment 2

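From dta, keep the variables H.id, Gender, Degree, Rank, and Grads, and add two new variables: "academicy", the number of academic years (2022 minus FPY), and "Grads_m", the average number of graduate students per academic year.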

dta2 <- dta %>%
    mutate(academicy = 2022 - FPY,           # academic years from FPY up to 2022
           Grads_m = Grads / academicy) %>%  # graduate students per academic year
    select(H.id, Gender, Degree, Rank, Grads, academicy, Grads_m)
head(dta2)
##   H.id Gender Degree Rank Grads academicy   Grads_m
## 1    9      M      D    3     3         9 0.3333333
## 2   11      M      D    2    10        14 0.7142857
## 3   10      M      D    1     0        11 0.0000000
## 4   65      M      O    1    92        25 3.6800000
## 5   10      F      O    2    25        11 2.2727273
## 6   22      M      D    2    41        20 2.0500000
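One caveat with `Grads_m`: a professor whose FPY is 2022 gets `academicy = 0`, so `Grads / academicy` yields `Inf` (or `NaN` when Grads is also 0). A minimal sketch of a guard, assuming such rows should simply become `NA`, uses dplyr's `if_else()`; the `toy` data frame is hypothetical.

```r
library(dplyr)

# Hypothetical toy rows; the last one has FPY == 2022, so academicy is 0
toy <- data.frame(Grads = c(3L, 92L, 5L), FPY = c(2013L, 1997L, 2022L))

toy2 <- toy %>%
  mutate(academicy = 2022 - FPY,
         # divide only when there is at least one academic year, else NA
         Grads_m = if_else(academicy > 0, Grads / academicy, NA_real_))

toy2$Grads_m  # 0.3333333 3.6800000 NA
```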

Assessment 3

arrange() sorts in ascending order by default; to sort in descending order, wrap the variable name in desc().
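A small sketch of the difference, using a hypothetical table of group means:

```r
library(dplyr)

# Hypothetical group means to sort
toy <- data.frame(group = c("a", "b", "c"),
                  mean_H.id = c(12.4, 34, 0.5),
                  stringsAsFactors = FALSE)

asc_sorted  <- arrange(toy, mean_H.id)        # default: ascending
desc_sorted <- arrange(toy, desc(mean_H.id))  # desc(): descending

asc_sorted$group   # "c" "a" "b"
desc_sorted$group  # "b" "a" "c"
```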

dta3 <- dta %>%
    group_by(College, Gender, Rank, Degree) %>%
    summarize(mean_H.id = mean(H.id, na.rm = TRUE),
              sd_H.id = sd(H.id),
              var_H.id = var(H.id),
              max_H.id = max(H.id),
              min_H.id = min(H.id),    # smallest H.id in the group
              count = n()) %>%         # number of professors in the group
    arrange(desc(mean_H.id))           # highest mean H.id first
## `summarise()` has grouped output by 'College', 'Gender', 'Rank'. You can
## override using the `.groups` argument.
head(dta3)
## # A tibble: 6 × 10
## # Groups:   College, Gender, Rank [3]
##   College Gender  Rank Degree mean_H.id sd_H.id var_H.id max_H.id min_H.id count
##   <fct>   <fct>  <int> <fct>      <dbl>   <dbl>    <dbl>    <int>    <int> <int>
## 1 ENG     F          1 D           34     10.4     108         46       46     3
## 2 ENG     M          1 D           24.4   11.0     121.        54       54    28
## 3 ENG     M          1 O           24.2   13.9     192.        92       92    76
## 4 SCI     M          1 D           21     16.1     258         39       39     4
## 5 ENG     F          1 O           19.5    9.71     94.3       32       32     4
## 6 SCI     M          1 O           18.8   15.2     231.        58       58    24
tail(dta3)
## # A tibble: 6 × 10
## # Groups:   College, Gender, Rank [5]
##   College Gender  Rank Degree mean_H.id sd_H.id var_H.id max_H.id min_H.id count
##   <fct>   <fct>  <int> <fct>      <dbl>   <dbl>    <dbl>    <int>    <int> <int>
## 1 LIB     F          3 O          0.5     0.707    0.5          1        1     2
## 2 LIB     M          2 D          0.167   0.408    0.167        1        1     6
## 3 LIB     F          2 D          0       0        0            0        0     6
## 4 LIB     F          3 D          0       0        0            0        0     2
## 5 LIB     M          1 D          0       0        0            0        0     4
## 6 LIB     M          3 D          0      NA       NA            0        0     1
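In the printed tables above, min_H.id equals max_H.id in every row; a group minimum has to come from `min()`. The sketch below runs the same kind of summary on a small hypothetical data frame, computes the minimum with `min()`, and passes `.groups = "drop"` (a `summarize()` argument in dplyr 1.0 and later) so the result carries no leftover grouping. That leftover grouping is what produces the `.groups` message above and the "Adding missing grouping variables" message when columns are selected later.

```r
library(dplyr)

# Hypothetical toy data: one ENG group and one LIB group
toy <- data.frame(
  College = factor(c("ENG", "ENG", "ENG", "LIB", "LIB")),
  Gender  = factor(c("M", "M", "M", "F", "F")),
  H.id    = c(10L, 20L, 30L, 0L, 1L)
)

smry <- toy %>%
  group_by(College, Gender) %>%
  summarize(mean_H.id = mean(H.id),
            sd_H.id   = sd(H.id),
            max_H.id  = max(H.id),
            min_H.id  = min(H.id),   # can differ from max_H.id
            count     = n(),
            .groups   = "drop") %>%  # return an ungrouped tibble
  arrange(desc(mean_H.id))

smry
```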

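(1) By college (College), professors in LIB (College of Liberal Arts) have lower H.id than those in ENG (College of Engineering) and SCI (College of Science). By rank (Rank), higher-ranked professors tend to have higher H.id (Rank 1 fills the top six rows), but the last six rows contain exceptions: for example, the LIB, M, Rank 1, D group has a mean H.id of 0. Finally, by where the degree was earned (Degree), the six groups with the highest mean H.id split evenly, three with overseas degrees (O) and three with domestic degrees (D), whereas five of the bottom six hold domestic degrees. Even so, this does not show that an overseas degree leads to a higher H.id.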
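The degree comparison in (1) is a simple count over the printed rows. As a sketch, the `mean_H.id` and `Degree` values below are copied from the `head(dta3)` and `tail(dta3)` output, and `table()` tallies domestic (D) versus overseas (O) degrees in each set of six.

```r
# mean_H.id and Degree copied from the head(dta3) / tail(dta3) output
top6 <- data.frame(
  mean_H.id = c(34, 24.4, 24.2, 21, 19.5, 18.8),
  Degree    = c("D", "D", "O", "D", "O", "O")
)
bottom6 <- data.frame(
  mean_H.id = c(0.5, 0.167, 0, 0, 0, 0),
  Degree    = c("O", "D", "D", "D", "D", "D")
)

top_counts    <- table(top6$Degree)     # D: 3, O: 3
bottom_counts <- table(bottom6$Degree)  # D: 5, O: 1
```

With only six groups on each side, and group sizes ranging from 1 to 76 professors, these tallies are descriptive rather than evidence of an effect.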

dta3_2 <- filter(dta3, College == "ENG")
dta3_2 <- select(dta3_2, College, Gender, mean_H.id)
## Adding missing grouping variables: `Rank`
show(dta3_2)
## # A tibble: 10 × 4
## # Groups:   College, Gender, Rank [6]
##     Rank College Gender mean_H.id
##    <int> <fct>   <fct>      <dbl>
##  1     1 ENG     F          34   
##  2     1 ENG     M          24.4 
##  3     1 ENG     M          24.2 
##  4     1 ENG     F          19.5 
##  5     2 ENG     M          17.3 
##  6     2 ENG     F          13   
##  7     3 ENG     M          12.4 
##  8     3 ENG     M          12   
##  9     2 ENG     M          11.2 
## 10     3 ENG     F           9.17

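(2) Looking at ENG (College of Engineering) on its own and comparing genders, female professors do not consistently have high H.id; only the first female group (Rank 1, D, mean 34) stands out. So the top five rows alone cannot show that male professors have lower average research output than female professors.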
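Group means ignore group size, and the standout female group (ENG, F, Rank 1, D) contains only 3 professors. A sketch of a size-weighted comparison with base R's `weighted.mean()`, using the four ENG Rank 1 rows from `head(dta3)`; their means are rounded as printed, so the results are approximate.

```r
# ENG, Rank 1 rows copied from head(dta3); means are rounded as printed
eng_r1 <- data.frame(
  Gender    = c("F", "M", "M", "F"),
  Degree    = c("D", "D", "O", "O"),
  mean_H.id = c(34, 24.4, 24.2, 19.5),
  count     = c(3, 28, 76, 4)
)

# Weight each group mean by its number of professors, separately per gender
wm <- sapply(split(eng_r1, eng_r1$Gender),
             function(d) weighted.mean(d$mean_H.id, d$count))
round(wm, 1)  # F about 25.7 (7 professors), M about 24.3 (104 professors)
```

The female figure rests on 7 professors against 104 men in these rows, which is why a single small group can dominate the comparison.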

dta3_3 <- filter(dta3, College == "LIB")
dta3_3 <- select(dta3_3, College, Gender, mean_H.id)
## Adding missing grouping variables: `Rank`
show(dta3_3)
## # A tibble: 11 × 4
## # Groups:   College, Gender, Rank [6]
##     Rank College Gender mean_H.id
##    <int> <fct>   <fct>      <dbl>
##  1     1 LIB     M          3.86 
##  2     1 LIB     F          3.19 
##  3     1 LIB     F          1.5  
##  4     2 LIB     F          0.833
##  5     2 LIB     M          0.727
##  6     3 LIB     F          0.5  
##  7     2 LIB     M          0.167
##  8     2 LIB     F          0    
##  9     3 LIB     F          0    
## 10     1 LIB     M          0    
## 11     3 LIB     M          0

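(3) Because dta3 is sorted by descending mean_H.id, every group with a mean of 0 must appear at the end of the table. Both the last six rows of dta3 and the dta3_3 table show that all of these zero-mean groups belong to LIB, so LIB is the only college with groups whose mean_H.id is 0.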
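Instead of reading the tail, the claim can be checked directly by filtering for zero means and listing the colleges involved. The sketch below runs on a hypothetical summary table shaped like `dta3`.

```r
library(dplyr)

# Hypothetical summary table with the same key columns as dta3
toy3 <- data.frame(
  College   = c("ENG", "SCI", "LIB", "LIB", "MNG", "LIB"),
  Gender    = c("M", "F", "F", "M", "F", "F"),
  mean_H.id = c(24.4, 21, 0.5, 0, 3.2, 0),
  stringsAsFactors = FALSE
)

zero_colleges <- toy3 %>%
  filter(mean_H.id == 0) %>%   # groups whose mean H.id is exactly 0
  distinct(College)

zero_colleges$College  # "LIB"
```

On `dta3` itself, call `ungroup()` before `distinct()`; on a grouped tibble, `distinct()` also keeps the grouping columns (College, Gender, Rank).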

Assessment 4

dta %>%
  select(College, Gender, Degree, Rank) %>%
  tbl_summary(by = College)
## Warning: The `fmt_missing()` function is deprecated and will soon be removed
## * Use the `sub_missing()` function instead
Characteristic   ENG, N = 184¹   LIB, N = 63¹    MNG, N = 84¹    SCI, N = 72¹    SSC, N = 57¹
Gender
  F              17 (9.2%)       34 (54%)        24 (29%)        14 (19%)        22 (39%)
  M              167 (91%)       29 (46%)        60 (71%)        58 (81%)        35 (61%)
Degree
  D              63 (34%)        21 (33%)        25 (30%)        16 (22%)        13 (23%)
  O              121 (66%)       42 (67%)        59 (70%)        56 (78%)        44 (77%)
Rank
  1              111 (60%)       29 (46%)        36 (43%)        36 (50%)        28 (49%)
  2              44 (24%)        29 (46%)        27 (32%)        27 (38%)        22 (39%)
  3              29 (16%)        5 (7.9%)        21 (25%)        9 (12%)         7 (12%)
¹ n (%)
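  1. Gender: men outnumber women in every college except LIB, where 54% of professors are female. The table reports headcounts, not H.id, so it does not by itself show that male professors are more productive.
  2. Degree: professors with overseas degrees (O) are the majority in every college, from 66% in ENG to 78% in SCI. This is again composition rather than research output; Assessment 3 concluded that the H.id data do not show an advantage for overseas degrees.
  3. Rank: Rank 1 is the largest group in every college (tied with Rank 2 in LIB), and Rank 3 is the smallest. The link between higher rank and higher H.id comes from the Assessment 3 group means, not from this table.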
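`tbl_summary(by = College)` reports counts with column percentages, i.e. how each college is composed. As a sketch, the Gender counts copied from the table above can be turned back into the same percentages with base R's `prop.table()`:

```r
# Gender counts by College, copied from the tbl_summary() output above
gender_counts <- matrix(
  c(17, 34, 24, 14, 22,    # F
    167, 29, 60, 58, 35),  # M
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender  = c("F", "M"),
                  College = c("ENG", "LIB", "MNG", "SCI", "SSC"))
)

# Column percentages: each college's gender split sums to 100
pct <- round(100 * prop.table(gender_counts, margin = 2), 1)
pct["F", ]  # ENG 9.2, LIB 54.0, MNG 28.6, SCI 19.4, SSC 38.6
```

The column totals reproduce the N shown in each header (184, 63, 84, 72, 57), and LIB is the only college where the female share exceeds 50%.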