tidyverse + dplyr = easy code reader

Jie Zou

2021-04-01

Read table

The data is extracted from http://www.cuny.edu/about/alumni-students-faculty/faculty/distinguished-professors/. It lists the distinguished professors who teach at CUNY. I have already preprocessed the data a little.

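library(dplyr)  # provides %>%, filter(), count(), mutate(), select(), arrange()
library(DT)     # provides datatable()
data <- read.csv("https://raw.githubusercontent.com/Sugarcane-svg/R/main/R607/Assignments/a6/professors_in_cuny.csv")
datatable(data)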

Tidy data

The data is already fairly clean because I didn’t extract complicated information, just name, department, email and office phone number. However, some professors did not provide an office phone number, so we are going to remove those rows.

data1 <- data %>%
  filter(!is.na(office_phone))
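
One caveat worth checking: `read.csv()` reads blank cells in a text column as empty strings rather than `NA`, so `!is.na()` alone would keep them. Whether the CUNY file stores missing phone numbers as `NA` or as blanks is not stated here, so the following is only a sketch on a small made-up data frame (the `toy` rows are hypothetical, not taken from the file):

```r
library(dplyr)

# Hypothetical rows for illustration; not taken from the CUNY file
toy <- data.frame(
  name = c("A", "B", "C"),
  office_phone = c("212-555-0100", NA, ""),
  stringsAsFactors = FALSE
)

# Keep rows whose phone number is neither NA nor an empty string
toy_clean <- toy %>%
  filter(!is.na(office_phone), office_phone != "")

toy_clean
```

Only the first row survives, because the second phone number is `NA` and the third is blank.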

Some Analysis Example

Here, we are going to perform some simple analyses on the “clean” data.

  • How many distinct colleges are listed, and how many professors are shown for each college?

    count() = group_by() + summarise(n = n())

As the calculation shows, there are 15 distinct colleges listed, and the number of professors at each college appears in column n (head() shows only the first six rows).

head(data1 %>%
  count(college))
##                            college  n
## 1                   Baruch College  7
## 2                 Brooklyn College  6
## 3         College of Staten Island  4
## 4             CUNY Graduate Center 42
## 5               CUNY School of Law  1
## 6 Graduate School of Public Health  2
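
To make the shortcut concrete, here is a minimal sketch showing that `count()` gives the same tally as `group_by()` followed by `summarise(n = n())`, and that `n_distinct()` returns the number of distinct colleges directly. The `toy` data frame and its college names are made up for illustration:

```r
library(dplyr)

# Hypothetical data for illustration
toy <- data.frame(
  college = c("X College", "X College", "Y College"),
  stringsAsFactors = FALSE
)

# Shortcut: one row per college with its row count in column n
by_count <- toy %>% count(college)

# Long form: group, then count the rows in each group
by_summarise <- toy %>%
  group_by(college) %>%
  summarise(n = n())

# Number of distinct colleges as a single value
n_colleges <- toy %>% summarise(colleges = n_distinct(college))

by_count
n_colleges
```

Applied to the real data, `nrow(data1 %>% count(college))` would give the number of distinct colleges without relying on `head()`.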
  • Are there popular departments (departments with more than three distinguished professors)?

    filter() = keep only the rows that satisfy the condition(s) you provide

Eight departments are considered popular in this case. Surprisingly, English leads the list with 12 professors, while the sciences are thinly represented.

a <- data1 %>% 
  count(department)

a %>% filter(n > 3)
##    department  n
## 1     English 12
## 2     History 11
## 3 Mathematics  6
## 4       Music  4
## 5  Philosophy  5
## 6     Physics  4
## 7  Psychology  5
## 8   Sociology  6
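
The two steps above can also be written as a single pipeline, with `arrange(desc(n))` putting the largest departments first. This is a sketch on made-up counts (the `toy` data frame is hypothetical):

```r
library(dplyr)

# Hypothetical data for illustration: 5 English, 4 Physics, 2 Music rows
toy <- data.frame(
  department = c(rep("English", 5), rep("Physics", 4), rep("Music", 2)),
  stringsAsFactors = FALSE
)

popular <- toy %>%
  count(department) %>%   # one row per department with its count n
  filter(n > 3) %>%       # keep departments with more than three rows
  arrange(desc(n))        # largest departments first

popular
```

Music is dropped because it has only two rows; English and Physics remain, in that order.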
  • Who works in the Graduate Center? Show each professor’s name and status.

    mutate() = add a column and fill in data

    select() = specify which column you want to see

data1 <- data1 %>%
  mutate(work_in_grad_center = ifelse(college == "CUNY Graduate Center", "yes", "no"))

b <- data1 %>% 
  select(name, work_in_grad_center)

datatable(b)
  • What share of professors work in the Graduate Center, and what share do not? (The percentage column holds a proportion between 0 and 1.)
b %>%
  count(work_in_grad_center) %>%
  mutate(percentage = n/sum(n))
##   work_in_grad_center  n percentage
## 1                  no 73  0.6347826
## 2                 yes 42  0.3652174
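
The `percentage` column above is a fraction of the total. If a value on the 0 to 100 scale is preferred, one option is to multiply by 100 and round. A minimal sketch on made-up data (the `toy` data frame is hypothetical):

```r
library(dplyr)

# Hypothetical data for illustration: one "yes" and three "no" rows
toy <- data.frame(
  work_in_grad_center = c("yes", "no", "no", "no"),
  stringsAsFactors = FALSE
)

shares <- toy %>%
  count(work_in_grad_center) %>%
  # Convert each count to a percentage of all rows, rounded to one decimal
  mutate(percentage = round(100 * n / sum(n), 1))

shares
```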
  • Sort by name (alphabetical)?

    arrange() = sort rows by the given column(s), ascending by default

head(data1 %>% 
       select(name)%>%
       arrange(name))
##               name
## 1 Alexandra Juhasz
## 2 Alison Griffiths
## 3     Andre Aciman
## 4 Anthony Tamburri
## 5     Arthur Apter
## 6    Azriel Genack
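
`arrange()` can also sort in descending order with `desc()`, or by several columns at once. A short sketch, again on a made-up data frame (the names and colleges in `toy` are hypothetical):

```r
library(dplyr)

# Hypothetical data for illustration
toy <- data.frame(
  name = c("Cara", "Abel", "Bea"),
  college = c("Y College", "X College", "X College"),
  stringsAsFactors = FALSE
)

# Reverse alphabetical order by name
z_to_a <- toy %>% arrange(desc(name))

# Sort by college first, then by name within each college
by_college <- toy %>% arrange(college, name)

z_to_a
by_college
```

Note that the `name` column in the real data appears to start with the first name, so sorting on it orders professors by first name rather than surname.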