Read data

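library(dplyr)      # %>%, filter(), select(), mutate(), group_by(), summarize()
library(gtsummary)  # tbl_summary()
ncku <- read.csv(file = "C:/Users/user/Desktop/ncku_prof_V6.csv", header = TRUE, stringsAsFactors = TRUE) # read character columns as factors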
View(ncku)
str(ncku) #structure of data
## 'data.frame':    460 obs. of  14 variables:
##  $ ID      : int  10001 10002 10003 10004 10005 10006 10007 10008 10009 10010 ...
##  $ Initial : Factor w/ 347 levels "BCT","BHC","BLC",..: 308 60 81 90 145 201 293 176 198 276 ...
##  $ Citation: int  305 355 3452 15808 280 2506 672 5735 1118 685 ...
##  $ H.id    : int  9 11 10 65 10 22 14 40 19 14 ...
##  $ Gender  : Factor w/ 2 levels "F","M": 2 2 2 2 1 2 2 2 2 2 ...
##  $ Degree  : Factor w/ 2 levels "D","O": 1 1 1 2 2 1 1 1 2 1 ...
##  $ Rank    : int  3 2 1 1 2 2 1 1 2 1 ...
##  $ College : Factor w/ 5 levels "ENG","LIB","MNG",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Dept    : Factor w/ 25 levels "ACC","BAD","CEN",..: 10 10 10 10 10 10 10 10 10 10 ...
##  $ Grads   : int  3 10 0 92 25 41 36 54 74 195 ...
##  $ FPY     : int  2013 2008 2011 1997 2011 2002 2008 2001 1994 1991 ...
##  $ Articles: int  30 22 14 349 23 90 36 123 26 70 ...
##  $ StuApp  : int  169 169 169 169 169 169 169 169 169 169 ...
##  $ Colprof : int  309 309 309 309 309 309 309 309 309 309 ...
summary(ncku) #summary of the data
##        ID           Initial       Citation             H.id      Gender 
##  Min.   :10001   CHL    :  9   Min.   :    0.00   Min.   : 0.0   F:111  
##  1st Qu.:10116   CCL    :  8   1st Qu.:   51.75   1st Qu.: 3.0   M:349  
##  Median :10230   YCL    :  7   Median :  384.00   Median :10.0          
##  Mean   :10280   CHC    :  5   Mean   : 1200.68   Mean   :12.8          
##  3rd Qu.:10387   CYC    :  5   3rd Qu.: 1271.50   3rd Qu.:19.0          
##  Max.   :10668   CCC    :  4   Max.   :33845.00   Max.   :92.0          
##                  (Other):422                                            
##  Degree       Rank       College        Dept         Grads       
##  D:138   Min.   :1.000   ENG:184   MEN    : 46   Min.   :  0.00  
##  O:322   1st Qu.:1.000   LIB: 63   CEN    : 36   1st Qu.:  8.00  
##          Median :1.000   MNG: 84   MAT    : 34   Median : 26.00  
##          Mean   :1.633   SCI: 72   MSE    : 30   Mean   : 44.29  
##          3rd Qu.:2.000   SSC: 57   FLL    : 25   3rd Qu.: 66.25  
##          Max.   :3.000             ESC    : 24   Max.   :374.00  
##                                    (Other):265                   
##       FPY          Articles          StuApp          Colprof     
##  Min.   :1979   Min.   :  0.00   Min.   :  8.00   Min.   : 77.0  
##  1st Qu.:1994   1st Qu.: 12.00   1st Qu.: 23.00   1st Qu.:107.0  
##  Median :2002   Median : 26.00   Median : 45.00   Median :140.0  
##  Mean   :2001   Mean   : 45.48   Mean   : 77.64   Mean   :190.3  
##  3rd Qu.:2008   3rd Qu.: 60.00   3rd Qu.:140.00   3rd Qu.:309.0  
##  Max.   :2021   Max.   :481.00   Max.   :215.00   Max.   :309.0  
##  NA's   :2
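ID
Initial: initials of the professor's name
Citation: number of citations to the professor's papers
H.id: citation index of the professor's papers
Gender
Degree: country of the degree (D = domestic, O = overseas)
Rank: academic rank (1 = full professor, 2 = associate professor, 3 = assistant professor)
College: college (ENG = Engineering, LIB = Liberal Arts, SCI = Science, SSC = Social Sciences, MNG = Management)
Dept: department
Grads: number of supervised students who have graduated
FPY: year of the first published journal article
Articles: number of journal articles
StuApp: number of students admitted by the department
Colprof: number of full-time faculty in the college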

Assessment 1 Using pipes, subset the professors data to include H.id higher than 12 and retain only the columns H.id, Gender, College, Rank, Degree, and Grads. Show the last six rows of the data.

a1 <- ncku %>%
  filter(H.id > 12) %>%
  select(H.id, Gender, College, Rank, Degree, Grads)
tail(a1,6) # Show the last six rows of the data.
##     H.id Gender College Rank Degree Grads
## 187   14      F     MNG    2      D    17
## 188   13      M     MNG    1      O    89
## 189   15      M     MNG    2      D    27
## 190   17      M     MNG    1      D    26
## 191   27      M     MNG    1      D    23
## 192   14      M     MNG    2      D     9

Assessment 2 Create a new data frame (newdta) from the professors data that retains the H.id, Gender, Degree, Rank, and Grads columns and adds two new columns: 'academicy' for the number of academic years and 'Grads_m' for the average number of graduate students per academic year. Show only the first six rows of the new data frame.

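a2 <- ncku %>%
  filter(!is.na(FPY)) %>%               # drop professors with no first-publication year
  mutate(academicy = 2022 - FPY,        # academic years since the first journal article
         Grads_m = Grads / academicy)   # average graduates per academic year

View(a2)
# select by name rather than by column position, which breaks if the columns are reordered
newdta <- a2 %>% select(H.id, Gender, Degree, Rank, Grads, academicy, Grads_m)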
  
head(newdta,6) # Show only the first six rows of the new dataframe
##   H.id Gender Degree Rank Grads academicy   Grads_m
## 1    9      M      D    3     3         9 0.3333333
## 2   11      M      D    2    10        14 0.7142857
## 3   10      M      D    1     0        11 0.0000000
## 4   65      M      O    1    92        25 3.6800000
## 5   10      F      O    2    25        11 2.2727273
## 6   22      M      D    2    41        20 2.0500000
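
One edge case in the calculation above: a professor whose first article appeared in the reference year would get academicy equal to 0, and Grads / academicy would then be Inf or NaN. The largest FPY in this data is 2021, so it does not occur here. The following is only a sketch on a made-up data frame (the names toy, ref_year, and toy_out are hypothetical) of one way to guard against it:

```r
library(dplyr)

# Hypothetical toy data, not the ncku data set
toy <- data.frame(Grads = c(10, 0, 4),
                  FPY   = c(2012, 2022, NA))
ref_year <- 2022  # reference year used in the report

toy_out <- toy %>%
  filter(!is.na(FPY)) %>%          # drop rows with an unknown first-publication year
  mutate(academicy = ref_year - FPY,
         # avoid dividing by zero: report NA when no full year has elapsed
         Grads_m = ifelse(academicy > 0, Grads / academicy, NA_real_))

print(toy_out)
```

Returning NA rather than Inf keeps later calls such as mean(Grads_m, na.rm = TRUE) meaningful.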

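Assessment 3 Use group_by() and summarize() to find the mean, standard deviation, variance, minimum, and maximum of H.id for each College, Gender, Rank, and Degree, together with the number of people in each group, then sort the colleges by mean H.id in descending order. Based on these results, answer the following questions: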

ncku %>%
  group_by(College) %>%
  summarize(mean_H.id = mean(H.id, na.rm = TRUE),
            sd_H.id = sd(H.id, na.rm = TRUE),
            var_H.id = var(H.id, na.rm = TRUE),
            min_H.id = min(H.id, na.rm = TRUE),
            max_H.id = max(H.id, na.rm = TRUE)) %>%
  arrange(desc(mean_H.id))
## # A tibble: 5 x 6
##   College mean_H.id sd_H.id var_H.id min_H.id max_H.id
##   <fct>       <dbl>   <dbl>    <dbl>    <int>    <int>
## 1 ENG         19.9    12.3     151.         2       92
## 2 SCI         13.5    12.3     151.         1       58
## 3 MNG          9.32    7.64     58.4        0       39
## 4 SSC          6.56    7.01     49.1        0       32
## 5 LIB          1.52    3.19     10.2        0       21
ncku %>%
  group_by(Gender) %>%
  summarize(mean_H.id = mean(H.id, na.rm = TRUE),
            sd_H.id = sd(H.id, na.rm = TRUE),
            var_H.id = var(H.id, na.rm = TRUE),
            min_H.id = min(H.id, na.rm = TRUE),
            max_H.id = max(H.id, na.rm = TRUE)) %>%
  arrange(desc(mean_H.id))
## # A tibble: 2 x 6
##   Gender mean_H.id sd_H.id var_H.id min_H.id max_H.id
##   <fct>      <dbl>   <dbl>    <dbl>    <int>    <int>
## 1 M          14.4    12.6     158.         0       92
## 2 F           7.82    8.78     77.1        0       46
ncku %>%
  group_by(Rank) %>%
  summarize(mean_H.id = mean(H.id, na.rm = TRUE),
            sd_H.id = sd(H.id, na.rm = TRUE),
            var_H.id = var(H.id, na.rm = TRUE),
            min_H.id = min(H.id, na.rm = TRUE),
            max_H.id = max(H.id, na.rm = TRUE)) %>%
  arrange(desc(mean_H.id))
## # A tibble: 3 x 6
##    Rank mean_H.id sd_H.id var_H.id min_H.id max_H.id
##   <int>     <dbl>   <dbl>    <dbl>    <int>    <int>
## 1     1     17.5    13.6     186.         0       92
## 2     2      8.18    7.45     55.5        0       40
## 3     3      6.59    6.77     45.8        0       37
ncku %>%
  group_by(Degree) %>%
  summarize(mean_H.id = mean(H.id, na.rm = TRUE),
            sd_H.id = sd(H.id, na.rm = TRUE),
            var_H.id = var(H.id, na.rm = TRUE),
            min_H.id = min(H.id, na.rm = TRUE),
            max_H.id = max(H.id, na.rm = TRUE)) %>%
  arrange(desc(mean_H.id))
## # A tibble: 2 x 6
##   Degree mean_H.id sd_H.id var_H.id min_H.id max_H.id
##   <fct>      <dbl>   <dbl>    <dbl>    <int>    <int>
## 1 D           13.1    11.3     127.        0       54
## 2 O           12.7    12.4     154.        0       92
table(ncku$College)
## 
## ENG LIB MNG SCI SSC 
## 184  63  84  72  57
table(ncku$Gender)
## 
##   F   M 
## 111 349
table(ncku$Rank)
## 
##   1   2   3 
## 240 149  71
table(ncku$Degree)
## 
##   D   O 
## 138 322
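
The group sizes above come from separate table() calls. As an alternative, dplyr's n() can report the count inside the same summarize() call, so the size of each group sits next to its statistics. A minimal sketch on made-up data (toy and toy_stats are hypothetical names, and the values are not from the ncku file):

```r
library(dplyr)

# Hypothetical toy data standing in for ncku
toy <- data.frame(College = c("ENG", "ENG", "ENG", "LIB", "LIB"),
                  H.id    = c(20, 30, 10, 0, 2))

toy_stats <- toy %>%
  group_by(College) %>%
  summarize(n = n(),                                # number of rows in the group
            mean_H.id = mean(H.id, na.rm = TRUE),
            sd_H.id = sd(H.id, na.rm = TRUE)) %>%
  arrange(desc(mean_H.id))

print(toy_stats)
```

Keeping n in the summary makes it harder to over-interpret a mean that rests on only a few people.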

2.1 What are the characteristics of the groups with the highest and the lowest mean H.id?

ncku %>%
  group_by(College, Gender, Rank, Degree) %>%
  summarize(mean_H.id = mean(H.id, na.rm = TRUE),
            sd_H.id = sd(H.id, na.rm = TRUE),
            var_H.id = var(H.id, na.rm = TRUE),
            min_H.id = min(H.id, na.rm = TRUE),
            max_H.id = max(H.id, na.rm = TRUE)) %>%
  arrange(desc(mean_H.id))
## `summarise()` has grouped output by 'College', 'Gender', 'Rank'. You can
## override using the `.groups` argument.
## # A tibble: 53 x 9
## # Groups:   College, Gender, Rank [30]
##    College Gender  Rank Degree mean_H.id sd_H.id var_H.id min_H.id max_H.id
##    <fct>   <fct>  <int> <fct>      <dbl>   <dbl>    <dbl>    <int>    <int>
##  1 ENG     F          1 D           34     10.4     108         28       46
##  2 ENG     M          1 D           24.4   11.0     121.         6       54
##  3 ENG     M          1 O           24.2   13.9     192.         3       92
##  4 SCI     M          1 D           21     16.1     258          6       39
##  5 ENG     F          1 O           19.5    9.71     94.3       10       32
##  6 SCI     M          1 O           18.8   15.2     231.         3       58
##  7 SCI     F          1 O           18.2   14.4     206.         3       34
##  8 ENG     M          2 D           17.3    6.81     46.4       10       40
##  9 MNG     M          1 D           16      6.48     42          8       27
## 10 MNG     F          1 O           15.2    8.54     72.9        8       27
## # ... with 43 more rows

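The group with the highest mean H.id is ENG, F, 1, D (College of Engineering, female, full professor, domestic degree).

The group with the lowest mean H.id is LIB, M, 3, D (College of Liberal Arts, male, assistant professor, domestic degree).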
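
The printed tibble shows only its first 10 rows, so the lowest group is not visible in the output above. One way to pull out both extremes directly is slice_max() and slice_min() on the ungrouped summary. This is a sketch on made-up data (toy, group_means, highest, and lowest are hypothetical names), not a rerun of the report's analysis:

```r
library(dplyr)

# Hypothetical toy data, not the ncku data set
toy <- data.frame(College = c("ENG", "ENG", "LIB", "LIB", "SCI"),
                  Gender  = c("F", "M", "F", "M", "M"),
                  H.id    = c(34, 24, 2, 1, 13))

group_means <- toy %>%
  group_by(College, Gender) %>%
  # .groups = "drop" returns an ungrouped tibble, so the slices below
  # look across all groups rather than within each College
  summarize(size = n(), mean_H.id = mean(H.id), .groups = "drop")

highest <- slice_max(group_means, mean_H.id, n = 1)  # row with the largest mean
lowest  <- slice_min(group_means, mean_H.id, n = 1)  # row with the smallest mean

print(highest)
print(lowest)
```

Printing the full summary with print(n = Inf) would also work when the number of groups is small.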

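2.2 "Male professors in the College of Engineering have lower average academic productivity than female professors in the College of Engineering." Is this statement appropriate? Give your view with reference to the number of professors in the college.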

ncku %>%
  group_by(College,Gender) %>%
  summarize(mean_H.id = mean(H.id,na.rm=TRUE)) %>%
  arrange(desc(mean_H.id))
## `summarise()` has grouped output by 'College'. You can override using the
## `.groups` argument.
## # A tibble: 10 x 3
## # Groups:   College [5]
##    College Gender mean_H.id
##    <fct>   <fct>      <dbl>
##  1 ENG     M          20.2 
##  2 ENG     F          16.9 
##  3 SCI     F          13.7 
##  4 SCI     M          13.5 
##  5 MNG     M          10.3 
##  6 SSC     F           7.36
##  7 MNG     F           6.96
##  8 SSC     M           6.06
##  9 LIB     F           1.76
## 10 LIB     M           1.24
table(ncku$College,ncku$Gender)
##      
##         F   M
##   ENG  17 167
##   LIB  34  29
##   MNG  24  60
##   SCI  14  58
##   SSC  22  35
ncku_college<-split(ncku,ncku$College)

t.test(H.id ~ Gender, data = ncku_college$"ENG")
## 
##  Welch Two Sample t-test
## 
## data:  H.id by Gender
## t = -1.1474, df = 20.155, p-value = 0.2647
## alternative hypothesis: true difference in means between group F and group M is not equal to 0
## 95 percent confidence interval:
##  -9.339458  2.708954
## sample estimates:
## mean in group F mean in group M 
##        16.88235        20.19760

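This statement is not appropriate. The group sizes in ENG are very unbalanced (17 female versus 167 male professors), and the Welch t-test finds no significant difference between the two means (p = 0.2647 > .05). The sample means even point the other way (M 20.2 versus F 16.9), so the data do not support the claim.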
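
To see why the unequal group sizes matter, compare the standard error of each group mean, sd / sqrt(n). The sketch below uses the ENG group sizes from the cross-table and, purely as an assumption for illustration, gives both genders the overall ENG standard deviation of 12.3, because the per-gender standard deviations are not shown in this report:

```r
# Group sizes from table(ncku$College, ncku$Gender) for ENG
n_f <- 17
n_m <- 167

# ASSUMPTION for illustration only: both groups share the overall ENG
# standard deviation of H.id (12.3); the real per-gender values may differ.
sd_assumed <- 12.3

se_f <- sd_assumed / sqrt(n_f)   # standard error of the female mean
se_m <- sd_assumed / sqrt(n_m)   # standard error of the male mean

cat("SE female:", round(se_f, 2), " SE male:", round(se_m, 2), "\n")
cat("Ratio female/male:", round(se_f / se_m, 2), "\n")
```

Under this assumption the female mean is estimated far less precisely than the male mean, which is consistent with the wide confidence interval reported by the t-test.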

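2.3 Make at least one statement about the academic productivity of professors in the College of Liberal Arts, and explain your reasoning.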

summary(ncku_college$LIB$H.id)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   1.524   2.000  21.000
hist(ncku_college$"LIB"$H.id, breaks = 20)

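The results suggest that academic productivity in the College of Liberal Arts is low: the median H.id is 0, which means that at least half of the professors have an H.id of 0. The histogram shows this clearly.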
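
The share of professors with an H.id of 0 can be computed directly rather than read off the median. A minimal sketch on a made-up vector (h, share_zero, and med are hypothetical names, and the values are not the LIB data):

```r
# Hypothetical toy vector, not the LIB data
h <- c(0, 0, 0, 0, 0, 1, 2, 5, 21)

share_zero <- mean(h == 0)   # proportion of values equal to 0
med <- median(h)             # middle value of the sorted vector

cat("Share with H.id = 0:", round(share_zero, 3), "\n")
cat("Median:", med, "\n")
```

In the report's data, the same calculation would be mean(ncku_college$LIB$H.id == 0).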

Assessment 4 Produce the following three tables and, based on the results, write three conclusions.

a4 <- ncku %>% select(College,Gender,Rank,Degree)
tbl_summary(a4,by=College)
Characteristic   ENG, N = 184¹   LIB, N = 63¹   MNG, N = 84¹   SCI, N = 72¹   SSC, N = 57¹
Gender
  F              17 (9.2%)       34 (54%)       24 (29%)       14 (19%)       22 (39%)
  M              167 (91%)       29 (46%)       60 (71%)       58 (81%)       35 (61%)
Rank
  1              111 (60%)       29 (46%)       36 (43%)       36 (50%)       28 (49%)
  2              44 (24%)        29 (46%)       27 (32%)       27 (38%)       22 (39%)
  3              29 (16%)        5 (7.9%)       21 (25%)       9 (12%)        7 (12%)
Degree
  D              63 (34%)        21 (33%)       25 (30%)       16 (22%)       13 (23%)
  O              121 (66%)       42 (67%)       59 (70%)       56 (78%)       44 (77%)
¹ n (%)

In every college, professors with an overseas degree (O) outnumber those with a domestic degree (D).

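In every college, full professors (rank 1) outnumber assistant professors (rank 3) and are at least as numerous as associate professors (rank 2). In LIB, full and associate professors are tied at 29 each.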

Except in the College of Liberal Arts (LIB), male professors (M) outnumber female professors (F) in every college.
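
The rank comparison can be checked against the counts in the table rather than by eye. The sketch below copies the Rank rows of the tbl_summary output into a matrix (rank_counts and full_strictly_most are hypothetical names) and tests, college by college, whether rank 1 strictly outnumbers both other ranks:

```r
# Rank counts per college, copied from the Rank rows of the tbl_summary output
rank_counts <- matrix(c(111, 29, 36, 36, 28,
                         44, 29, 27, 27, 22,
                         29,  5, 21,  9,  7),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(Rank = c("1", "2", "3"),
                                      College = c("ENG", "LIB", "MNG", "SCI", "SSC")))

# TRUE where rank 1 strictly outnumbers both rank 2 and rank 3
full_strictly_most <- rank_counts["1", ] > rank_counts["2", ] &
                      rank_counts["1", ] > rank_counts["3", ]
print(full_strictly_most)

# Sanity check: column totals should match table(ncku$College)
print(colSums(rank_counts))
```

The check comes out FALSE only for LIB, where ranks 1 and 2 both have 29 professors.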