Overview

This is the second part of my Project 2 assignment for DATA607 in the Fall 2023 term at CUNY SPS. In this assignment, I import a wide data set, tidy it, and then analyze it. This second data set contains the number of research doctorates from US universities, broken down by field, at five-year intervals from 1992 to 2022.

Tidying Data

In this code block, I load the necessary libraries and import the data from my GitHub repository.

library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.4     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(knitr)

raw_data <- read.csv("https://raw.githubusercontent.com/Marley-Myrianthopoulos/Data607Project2/main/Data607DoctorateFields.csv")

kable(raw_data, format = "pipe", caption = "Initial Doctorate Data", align = "lcccccccccccccccc")
Initial Doctorate Data
Table.1.3 X X.1 X.2 X.3 X.4 X.5 X.6 X.7 X.8 X.9 X.10 X.11 X.12 X.13 X.14
Research doctorate recipients, by historical major field of doctorate: Selected years, 1992–2022 NA
(Number and percent) NA
Field of doctorate 1992 1997 2002 2007 2012 2017 2022 NA
Number Percent Number Percent Number Percent Number Percent Number Percent Number Percent Number Percent NA
All fields 38,886 100.0 42,539 100.0 40,031 100.0 48,132 100.0 50,943 100.0 54,552 100.0 57,596 100.0 NA
Life sciences 7,172 18.4 8,421 19.8 8,478 21.2 10,702 22.2 11,964 23.5 12,554 23.0 13,211 22.9 NA
Agricultural sciences and natural resources 1,261 3.2 1,212 2.8 1,129 2.8 1,321 2.7 1,255 2.5 1,493 2.7 1,434 2.5 NA
Biological and biomedical sciences 4,799 12.3 5,788 13.6 5,695 14.2 7,238 15.0 8,322 16.3 8,566 15.7 9,218 16.0 NA
Health sciences 1,112 2.9 1,421 3.3 1,654 4.1 2,143 4.5 2,387 4.7 2,495 4.6 2,559 4.4 NA
Physical sciences and earth sciences 4,517 11.6 4,550 10.7 3,875 9.7 4,956 10.3 5,419 10.6 6,082 11.1 6,649 11.5 NA
Chemistry 2,213 5.7 2,148 5.0 1,922 4.8 2,318 4.8 2,416 4.7 2,699 4.9 3,060 5.3 NA
Geosciences, atmospheric sciences, and ocean sciences 767 2.0 803 1.9 689 1.7 875 1.8 941 1.8 1,169 2.1 1,181 2.1 NA
Physics and astronomy 1,537 4.0 1,599 3.8 1,264 3.2 1,763 3.7 2,062 4.0 2,214 4.1 2,408 4.2 NA
Mathematics and computer sciences 1,927 5.0 2,032 4.8 1,729 4.3 3,042 6.3 3,496 6.9 3,842 7.0 4,854 8.4 NA
Computer and information sciences 869 2.2 909 2.1 809 2.0 1,654 3.4 1,793 3.5 1,998 3.7 2,606 4.5 NA
Mathematics and statistics 1,058 2.7 1,123 2.6 920 2.3 1,388 2.9 1,703 3.3 1,844 3.4 2,248 3.9 NA
Psychology and social sciences 6,562 16.9 7,369 17.3 6,925 17.3 7,309 15.2 8,498 16.7 9,034 16.6 9,235 16.0 NA
Psychology 3,262 8.4 3,557 8.4 3,207 8.0 3,276 6.8 3,599 7.1 3,925 7.2 3,990 6.9 NA
Anthropology 320 0.8 434 1.0 496 1.2 512 1.1 547 1.1 446 0.8 415 0.7 NA
Economics 910 2.3 1,030 2.4 908 2.3 1,004 2.1 1,243 2.4 1,239 2.3 1,287 2.2 NA
Political science and government 513 1.3 665 1.6 606 1.5 588 1.2 724 1.4 743 1.4 678 1.2 NA
Sociology 495 1.3 577 1.4 547 1.4 576 1.2 633 1.2 683 1.3 611 1.1 NA
Other social sciences 1,062 2.7 1,106 2.6 1,161 2.9 1,353 2.8 1,752 3.4 1,998 3.7 2,254 3.9 NA
Engineering 5,438 14.0 6,114 14.4 5,081 12.7 7,749 16.1 8,469 16.6 9,776 17.9 11,530 20.0 NA
Aerospace, aeronautical, and astronautical engineering 234 0.6 273 0.6 209 0.5 267 0.6 307 0.6 379 0.7 374 0.6 NA
Bioengineering and biomedical engineering 147 0.4 211 0.5 246 0.6 637 1.3 943 1.9 1,032 1.9 1,228 2.1 NA
Chemical engineering 607 1.6 662 1.6 607 1.5 817 1.7 840 1.6 931 1.7 1,142 2.0 NA
Civil engineering 540 1.4 592 1.4 540 1.3 703 1.5 495 1.0 713 1.3 898 1.6 NA
Electrical, electronics, and communications engineering 1,278 3.3 1,460 3.4 1,212 3.0 1,967 4.1 1,938 3.8 1,879 3.4 2,193 3.8 NA
Industrial and manufacturing engineering 196 0.5 246 0.6 230 0.6 279 0.6 226 0.4 249 0.5 381 0.7 NA
Materials science engineering 365 0.9 483 1.1 364 0.9 646 1.3 743 1.5 937 1.7 1,136 2.0 NA
Mechanical engineering 855 2.2 929 2.2 771 1.9 1,071 2.2 1,220 2.4 1,398 2.6 1,676 2.9 NA
Other engineering 1,216 3.1 1,258 3.0 902 2.3 1,362 2.8 1,757 3.4 2,258 4.1 2,502 4.3 NA
Education 6,677 17.2 6,577 15.5 6,508 16.3 6,448 13.4 4,802 9.4 4,826 8.8 4,509 7.8 NA
Education administration 1,984 5.1 2,050 4.8 2,351 5.9 2,161 4.5 1,057 2.1 922 1.7 734 1.3 NA
Education research 2,503 6.4 2,695 6.3 2,776 6.9 2,671 5.5 2,516 4.9 2,373 4.3 2,289 4.0 NA
Teacher education 407 1.0 291 0.7 262 0.7 297 0.6 156 0.3 114 0.2 110 0.2 NA
Teaching fields 1,008 2.6 919 2.2 686 1.7 873 1.8 757 1.5 925 1.7 890 1.5 NA
Other education 775 2.0 622 1.5 433 1.1 446 0.9 316 0.6 492 0.9 486 0.8 NA
Humanities and arts 4,387 11.3 5,285 12.4 5,297 13.2 5,085 10.6 5,561 10.9 5,286 9.7 4,464 7.8 NA
Foreign languages and literature 562 1.4 652 1.5 627 1.6 607 1.3 684 1.3 618 1.1 442 0.8 NA
History 724 1.9 965 2.3 1,031 2.6 937 1.9 1,086 2.1 1,058 1.9 750 1.3 NA
Letters 1,278 3.3 1,550 3.6 1,455 3.6 1,340 2.8 1,638 3.2 1,462 2.7 1,292 2.2 NA
Other humanities and arts 1,823 4.7 2,118 5.0 2,184 5.5 2,201 4.6 2,153 4.2 2,148 3.9 1,980 3.4 NA
Other 2,206 5.7 2,191 5.2 2,138 5.3 2,841 5.9 2,734 5.4 3,152 5.8 3,144 5.5 NA
Business management and administration 1,248 3.2 1,245 2.9 1,113 2.8 1,506 3.1 1,404 2.8 1,565 2.9 1,450 2.5 NA
Communication 330 0.8 331 0.8 397 1.0 560 1.2 595 1.2 622 1.1 580 1.0 NA
Non-science and engineering fields nec 628 1.6 615 1.4 628 1.6 775 1.6 735 1.4 965 1.8 1,114 1.9 NA

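In this code block, I prepare the data for tidying by renaming the columns and removing the first four rows, which hold only the table title and header text rather than data. The final column of the file is empty, so it receives no name and shows up as NA in the header.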

colnames(raw_data) <- c("Field", "docs1992", "percent1992", "docs1997", "percent1997", "docs2002", "percent2002", "docs2007", "percent2007", "docs2012", "percent2012", "docs2017", "percent2017", "docs2022", "percent2022")

prep_data <- raw_data[-c(1:4),]

kable(prep_data, format = "pipe", caption = "Data Prepped for Tidying", align = "lcccccccccccccccc")
Data Prepped for Tidying
Field docs1992 percent1992 docs1997 percent1997 docs2002 percent2002 docs2007 percent2007 docs2012 percent2012 docs2017 percent2017 docs2022 percent2022 NA
5 All fields 38,886 100.0 42,539 100.0 40,031 100.0 48,132 100.0 50,943 100.0 54,552 100.0 57,596 100.0 NA
6 Life sciences 7,172 18.4 8,421 19.8 8,478 21.2 10,702 22.2 11,964 23.5 12,554 23.0 13,211 22.9 NA
7 Agricultural sciences and natural resources 1,261 3.2 1,212 2.8 1,129 2.8 1,321 2.7 1,255 2.5 1,493 2.7 1,434 2.5 NA
8 Biological and biomedical sciences 4,799 12.3 5,788 13.6 5,695 14.2 7,238 15.0 8,322 16.3 8,566 15.7 9,218 16.0 NA
9 Health sciences 1,112 2.9 1,421 3.3 1,654 4.1 2,143 4.5 2,387 4.7 2,495 4.6 2,559 4.4 NA
10 Physical sciences and earth sciences 4,517 11.6 4,550 10.7 3,875 9.7 4,956 10.3 5,419 10.6 6,082 11.1 6,649 11.5 NA
11 Chemistry 2,213 5.7 2,148 5.0 1,922 4.8 2,318 4.8 2,416 4.7 2,699 4.9 3,060 5.3 NA
12 Geosciences, atmospheric sciences, and ocean sciences 767 2.0 803 1.9 689 1.7 875 1.8 941 1.8 1,169 2.1 1,181 2.1 NA
13 Physics and astronomy 1,537 4.0 1,599 3.8 1,264 3.2 1,763 3.7 2,062 4.0 2,214 4.1 2,408 4.2 NA
14 Mathematics and computer sciences 1,927 5.0 2,032 4.8 1,729 4.3 3,042 6.3 3,496 6.9 3,842 7.0 4,854 8.4 NA
15 Computer and information sciences 869 2.2 909 2.1 809 2.0 1,654 3.4 1,793 3.5 1,998 3.7 2,606 4.5 NA
16 Mathematics and statistics 1,058 2.7 1,123 2.6 920 2.3 1,388 2.9 1,703 3.3 1,844 3.4 2,248 3.9 NA
17 Psychology and social sciences 6,562 16.9 7,369 17.3 6,925 17.3 7,309 15.2 8,498 16.7 9,034 16.6 9,235 16.0 NA
18 Psychology 3,262 8.4 3,557 8.4 3,207 8.0 3,276 6.8 3,599 7.1 3,925 7.2 3,990 6.9 NA
19 Anthropology 320 0.8 434 1.0 496 1.2 512 1.1 547 1.1 446 0.8 415 0.7 NA
20 Economics 910 2.3 1,030 2.4 908 2.3 1,004 2.1 1,243 2.4 1,239 2.3 1,287 2.2 NA
21 Political science and government 513 1.3 665 1.6 606 1.5 588 1.2 724 1.4 743 1.4 678 1.2 NA
22 Sociology 495 1.3 577 1.4 547 1.4 576 1.2 633 1.2 683 1.3 611 1.1 NA
23 Other social sciences 1,062 2.7 1,106 2.6 1,161 2.9 1,353 2.8 1,752 3.4 1,998 3.7 2,254 3.9 NA
24 Engineering 5,438 14.0 6,114 14.4 5,081 12.7 7,749 16.1 8,469 16.6 9,776 17.9 11,530 20.0 NA
25 Aerospace, aeronautical, and astronautical engineering 234 0.6 273 0.6 209 0.5 267 0.6 307 0.6 379 0.7 374 0.6 NA
26 Bioengineering and biomedical engineering 147 0.4 211 0.5 246 0.6 637 1.3 943 1.9 1,032 1.9 1,228 2.1 NA
27 Chemical engineering 607 1.6 662 1.6 607 1.5 817 1.7 840 1.6 931 1.7 1,142 2.0 NA
28 Civil engineering 540 1.4 592 1.4 540 1.3 703 1.5 495 1.0 713 1.3 898 1.6 NA
29 Electrical, electronics, and communications engineering 1,278 3.3 1,460 3.4 1,212 3.0 1,967 4.1 1,938 3.8 1,879 3.4 2,193 3.8 NA
30 Industrial and manufacturing engineering 196 0.5 246 0.6 230 0.6 279 0.6 226 0.4 249 0.5 381 0.7 NA
31 Materials science engineering 365 0.9 483 1.1 364 0.9 646 1.3 743 1.5 937 1.7 1,136 2.0 NA
32 Mechanical engineering 855 2.2 929 2.2 771 1.9 1,071 2.2 1,220 2.4 1,398 2.6 1,676 2.9 NA
33 Other engineering 1,216 3.1 1,258 3.0 902 2.3 1,362 2.8 1,757 3.4 2,258 4.1 2,502 4.3 NA
34 Education 6,677 17.2 6,577 15.5 6,508 16.3 6,448 13.4 4,802 9.4 4,826 8.8 4,509 7.8 NA
35 Education administration 1,984 5.1 2,050 4.8 2,351 5.9 2,161 4.5 1,057 2.1 922 1.7 734 1.3 NA
36 Education research 2,503 6.4 2,695 6.3 2,776 6.9 2,671 5.5 2,516 4.9 2,373 4.3 2,289 4.0 NA
37 Teacher education 407 1.0 291 0.7 262 0.7 297 0.6 156 0.3 114 0.2 110 0.2 NA
38 Teaching fields 1,008 2.6 919 2.2 686 1.7 873 1.8 757 1.5 925 1.7 890 1.5 NA
39 Other education 775 2.0 622 1.5 433 1.1 446 0.9 316 0.6 492 0.9 486 0.8 NA
40 Humanities and arts 4,387 11.3 5,285 12.4 5,297 13.2 5,085 10.6 5,561 10.9 5,286 9.7 4,464 7.8 NA
41 Foreign languages and literature 562 1.4 652 1.5 627 1.6 607 1.3 684 1.3 618 1.1 442 0.8 NA
42 History 724 1.9 965 2.3 1,031 2.6 937 1.9 1,086 2.1 1,058 1.9 750 1.3 NA
43 Letters 1,278 3.3 1,550 3.6 1,455 3.6 1,340 2.8 1,638 3.2 1,462 2.7 1,292 2.2 NA
44 Other humanities and arts 1,823 4.7 2,118 5.0 2,184 5.5 2,201 4.6 2,153 4.2 2,148 3.9 1,980 3.4 NA
45 Other 2,206 5.7 2,191 5.2 2,138 5.3 2,841 5.9 2,734 5.4 3,152 5.8 3,144 5.5 NA
46 Business management and administration 1,248 3.2 1,245 2.9 1,113 2.8 1,506 3.1 1,404 2.8 1,565 2.9 1,450 2.5 NA
47 Communication 330 0.8 331 0.8 397 1.0 560 1.2 595 1.2 622 1.1 580 1.0 NA
48 Non-science and engineering fields nec 628 1.6 615 1.4 628 1.6 775 1.6 735 1.4 965 1.8 1,114 1.9 NA

In this code block, I continue to clean the data. The source table lists broad fields and their specific fields in the same column, so I create a new column for the broad fields and copy each broad field name into the row of its first specific field. I then create a new data frame that drops the "All fields" row and the rows that contained only broad field totals, and keeps only the count columns. As a final preparatory step, I remove the commas from the counts and convert the resulting strings into numbers so that I can perform calculations on them later. In retrospect, it would have been easier to do this after pivoting the data, since I would have had to do it to only one column. A lesson for next time! The data is now cleaned and ready to be pivoted into a tidy format.

prep_data$broadfield <- ""
prep_data$broadfield[3] <- prep_data$Field[2]
prep_data$broadfield[7] <- prep_data$Field[6]
prep_data$broadfield[11] <- prep_data$Field[10]
prep_data$broadfield[14] <- prep_data$Field[13]
prep_data$broadfield[21] <- prep_data$Field[20]
prep_data$broadfield[31] <- prep_data$Field[30]
prep_data$broadfield[37] <- prep_data$Field[36]
prep_data$broadfield[42] <- prep_data$Field[41]

tidy_data <- prep_data[-c(1,2,6,10,13,20,30,36,41),c(17,1,2,4,6,8,10,12,14)]

tidy_data$docs1992 <- as.numeric(gsub(",","",tidy_data$docs1992))
tidy_data$docs1997 <- as.numeric(gsub(",","",tidy_data$docs1997))
tidy_data$docs2002 <- as.numeric(gsub(",","",tidy_data$docs2002))
tidy_data$docs2007 <- as.numeric(gsub(",","",tidy_data$docs2007))
tidy_data$docs2012 <- as.numeric(gsub(",","",tidy_data$docs2012))
tidy_data$docs2017 <- as.numeric(gsub(",","",tidy_data$docs2017))
tidy_data$docs2022 <- as.numeric(gsub(",","",tidy_data$docs2022))

kable(tidy_data, format = "pipe", caption = "Clean Data", align = "llccccccc")
Clean Data
broadfield Field docs1992 docs1997 docs2002 docs2007 docs2012 docs2017 docs2022
7 Life sciences Agricultural sciences and natural resources 1261 1212 1129 1321 1255 1493 1434
8 Biological and biomedical sciences 4799 5788 5695 7238 8322 8566 9218
9 Health sciences 1112 1421 1654 2143 2387 2495 2559
11 Physical sciences and earth sciences Chemistry 2213 2148 1922 2318 2416 2699 3060
12 Geosciences, atmospheric sciences, and ocean sciences 767 803 689 875 941 1169 1181
13 Physics and astronomy 1537 1599 1264 1763 2062 2214 2408
15 Mathematics and computer sciences Computer and information sciences 869 909 809 1654 1793 1998 2606
16 Mathematics and statistics 1058 1123 920 1388 1703 1844 2248
18 Psychology and social sciences Psychology 3262 3557 3207 3276 3599 3925 3990
19 Anthropology 320 434 496 512 547 446 415
20 Economics 910 1030 908 1004 1243 1239 1287
21 Political science and government 513 665 606 588 724 743 678
22 Sociology 495 577 547 576 633 683 611
23 Other social sciences 1062 1106 1161 1353 1752 1998 2254
25 Engineering Aerospace, aeronautical, and astronautical engineering 234 273 209 267 307 379 374
26 Bioengineering and biomedical engineering 147 211 246 637 943 1032 1228
27 Chemical engineering 607 662 607 817 840 931 1142
28 Civil engineering 540 592 540 703 495 713 898
29 Electrical, electronics, and communications engineering 1278 1460 1212 1967 1938 1879 2193
30 Industrial and manufacturing engineering 196 246 230 279 226 249 381
31 Materials science engineering 365 483 364 646 743 937 1136
32 Mechanical engineering 855 929 771 1071 1220 1398 1676
33 Other engineering 1216 1258 902 1362 1757 2258 2502
35 Education Education administration 1984 2050 2351 2161 1057 922 734
36 Education research 2503 2695 2776 2671 2516 2373 2289
37 Teacher education 407 291 262 297 156 114 110
38 Teaching fields 1008 919 686 873 757 925 890
39 Other education 775 622 433 446 316 492 486
41 Humanities and arts Foreign languages and literature 562 652 627 607 684 618 442
42 History 724 965 1031 937 1086 1058 750
43 Letters 1278 1550 1455 1340 1638 1462 1292
44 Other humanities and arts 1823 2118 2184 2201 2153 2148 1980
46 Other Business management and administration 1248 1245 1113 1506 1404 1565 1450
47 Communication 330 331 397 560 595 622 580
48 Non-science and engineering fields nec 628 615 628 775 735 965 1114
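As a rough sketch of the "lesson for next time" mentioned above, the same cleaning could be done by tagging broad fields by name and stripping commas only once, after pivoting. This is an illustration on a few toy rows copied from the table, not the code used in this project; the names `toy`, `broad_names`, and `toy_long` are made up for the example.

```r
library(dplyr)
library(tidyr)

# Toy rows in the same shape as prep_data (values copied from the table above).
toy <- tibble(
  Field    = c("Life sciences", "Health sciences", "Engineering", "Civil engineering"),
  docs1992 = c("7,172", "1,112", "5,438", "540"),
  docs1997 = c("8,421", "1,421", "6,114", "592")
)

# Assumption: broad fields are identified by name instead of by row position.
broad_names <- c("Life sciences", "Engineering")

toy_long <- toy %>%
  # Mark broad-field rows, then carry each broad field down to its sub-fields.
  mutate(broadfield = if_else(Field %in% broad_names, Field, NA_character_)) %>%
  fill(broadfield) %>%
  # Drop the rows that only held broad-field totals.
  filter(!Field %in% broad_names) %>%
  pivot_longer(starts_with("docs"), names_to = "Year", values_to = "Doctorates") %>%
  # After pivoting, there is a single column of counts to clean.
  mutate(Doctorates = as.numeric(gsub(",", "", Doctorates)))
```

Pivoting first means the `gsub` call appears once rather than once per year column, and matching on names avoids hard-coded row indices that would break if the source table gained or lost a row.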

In this code block, I start by filling in the empty cells in the broad field column. I used a for loop for this on the week 5 assignment, but picked up a better trick from the solution that Molly Siebecker shared for that assignment, and I wanted to try it here. I first convert the empty cells to NA values with na_if and then use the fill function to carry each broad field name down the column. I then use pivot_longer so that the year becomes a variable of its own, rather than each year being a separate column. Next, I use group_by and mutate to add a column calculating what percentage of that year's doctorates each field represents. Finally, I use a regular expression to remove the first four characters of the “Year” column and convert the result to an integer. Since every value is formatted as “docs[YYYY]”, this leaves just the year. The data is now “tidy”.

library(tidyr)
library(dplyr)
library(tidyverse)

tidy_data <- tidy_data %>%
  mutate(broadfield = na_if(broadfield, "")) %>%
  fill(broadfield) %>%
  pivot_longer(
    cols = -c("broadfield", "Field"),
    names_to = "Year",
    values_to = "Doctorates"
  ) %>%
  group_by(Year) %>%
  mutate(Year_Percent = round(Doctorates / sum(Doctorates) * 100, 1))

tidy_data$Year <- as.integer(sub("^....","",tidy_data$Year))

kable(tidy_data, format = "pipe", caption = "Tidy Doctorate Data", align = "llccc")
Tidy Doctorate Data
broadfield Field Year Doctorates Year_Percent
Life sciences Agricultural sciences and natural resources 1992 1261 3.2
Life sciences Agricultural sciences and natural resources 1997 1212 2.8
Life sciences Agricultural sciences and natural resources 2002 1129 2.8
Life sciences Agricultural sciences and natural resources 2007 1321 2.7
Life sciences Agricultural sciences and natural resources 2012 1255 2.5
Life sciences Agricultural sciences and natural resources 2017 1493 2.7
Life sciences Agricultural sciences and natural resources 2022 1434 2.5
Life sciences Biological and biomedical sciences 1992 4799 12.3
Life sciences Biological and biomedical sciences 1997 5788 13.6
Life sciences Biological and biomedical sciences 2002 5695 14.2
Life sciences Biological and biomedical sciences 2007 7238 15.0
Life sciences Biological and biomedical sciences 2012 8322 16.3
Life sciences Biological and biomedical sciences 2017 8566 15.7
Life sciences Biological and biomedical sciences 2022 9218 16.0
Life sciences Health sciences 1992 1112 2.9
Life sciences Health sciences 1997 1421 3.3
Life sciences Health sciences 2002 1654 4.1
Life sciences Health sciences 2007 2143 4.5
Life sciences Health sciences 2012 2387 4.7
Life sciences Health sciences 2017 2495 4.6
Life sciences Health sciences 2022 2559 4.4
Physical sciences and earth sciences Chemistry 1992 2213 5.7
Physical sciences and earth sciences Chemistry 1997 2148 5.0
Physical sciences and earth sciences Chemistry 2002 1922 4.8
Physical sciences and earth sciences Chemistry 2007 2318 4.8
Physical sciences and earth sciences Chemistry 2012 2416 4.7
Physical sciences and earth sciences Chemistry 2017 2699 4.9
Physical sciences and earth sciences Chemistry 2022 3060 5.3
Physical sciences and earth sciences Geosciences, atmospheric sciences, and ocean sciences 1992 767 2.0
Physical sciences and earth sciences Geosciences, atmospheric sciences, and ocean sciences 1997 803 1.9
Physical sciences and earth sciences Geosciences, atmospheric sciences, and ocean sciences 2002 689 1.7
Physical sciences and earth sciences Geosciences, atmospheric sciences, and ocean sciences 2007 875 1.8
Physical sciences and earth sciences Geosciences, atmospheric sciences, and ocean sciences 2012 941 1.8
Physical sciences and earth sciences Geosciences, atmospheric sciences, and ocean sciences 2017 1169 2.1
Physical sciences and earth sciences Geosciences, atmospheric sciences, and ocean sciences 2022 1181 2.1
Physical sciences and earth sciences Physics and astronomy 1992 1537 4.0
Physical sciences and earth sciences Physics and astronomy 1997 1599 3.8
Physical sciences and earth sciences Physics and astronomy 2002 1264 3.2
Physical sciences and earth sciences Physics and astronomy 2007 1763 3.7
Physical sciences and earth sciences Physics and astronomy 2012 2062 4.0
Physical sciences and earth sciences Physics and astronomy 2017 2214 4.1
Physical sciences and earth sciences Physics and astronomy 2022 2408 4.2
Mathematics and computer sciences Computer and information sciences 1992 869 2.2
Mathematics and computer sciences Computer and information sciences 1997 909 2.1
Mathematics and computer sciences Computer and information sciences 2002 809 2.0
Mathematics and computer sciences Computer and information sciences 2007 1654 3.4
Mathematics and computer sciences Computer and information sciences 2012 1793 3.5
Mathematics and computer sciences Computer and information sciences 2017 1998 3.7
Mathematics and computer sciences Computer and information sciences 2022 2606 4.5
Mathematics and computer sciences Mathematics and statistics 1992 1058 2.7
Mathematics and computer sciences Mathematics and statistics 1997 1123 2.6
Mathematics and computer sciences Mathematics and statistics 2002 920 2.3
Mathematics and computer sciences Mathematics and statistics 2007 1388 2.9
Mathematics and computer sciences Mathematics and statistics 2012 1703 3.3
Mathematics and computer sciences Mathematics and statistics 2017 1844 3.4
Mathematics and computer sciences Mathematics and statistics 2022 2248 3.9
Psychology and social sciences Psychology 1992 3262 8.4
Psychology and social sciences Psychology 1997 3557 8.4
Psychology and social sciences Psychology 2002 3207 8.0
Psychology and social sciences Psychology 2007 3276 6.8
Psychology and social sciences Psychology 2012 3599 7.1
Psychology and social sciences Psychology 2017 3925 7.2
Psychology and social sciences Psychology 2022 3990 6.9
Psychology and social sciences Anthropology 1992 320 0.8
Psychology and social sciences Anthropology 1997 434 1.0
Psychology and social sciences Anthropology 2002 496 1.2
Psychology and social sciences Anthropology 2007 512 1.1
Psychology and social sciences Anthropology 2012 547 1.1
Psychology and social sciences Anthropology 2017 446 0.8
Psychology and social sciences Anthropology 2022 415 0.7
Psychology and social sciences Economics 1992 910 2.3
Psychology and social sciences Economics 1997 1030 2.4
Psychology and social sciences Economics 2002 908 2.3
Psychology and social sciences Economics 2007 1004 2.1
Psychology and social sciences Economics 2012 1243 2.4
Psychology and social sciences Economics 2017 1239 2.3
Psychology and social sciences Economics 2022 1287 2.2
Psychology and social sciences Political science and government 1992 513 1.3
Psychology and social sciences Political science and government 1997 665 1.6
Psychology and social sciences Political science and government 2002 606 1.5
Psychology and social sciences Political science and government 2007 588 1.2
Psychology and social sciences Political science and government 2012 724 1.4
Psychology and social sciences Political science and government 2017 743 1.4
Psychology and social sciences Political science and government 2022 678 1.2
Psychology and social sciences Sociology 1992 495 1.3
Psychology and social sciences Sociology 1997 577 1.4
Psychology and social sciences Sociology 2002 547 1.4
Psychology and social sciences Sociology 2007 576 1.2
Psychology and social sciences Sociology 2012 633 1.2
Psychology and social sciences Sociology 2017 683 1.3
Psychology and social sciences Sociology 2022 611 1.1
Psychology and social sciences Other social sciences 1992 1062 2.7
Psychology and social sciences Other social sciences 1997 1106 2.6
Psychology and social sciences Other social sciences 2002 1161 2.9
Psychology and social sciences Other social sciences 2007 1353 2.8
Psychology and social sciences Other social sciences 2012 1752 3.4
Psychology and social sciences Other social sciences 2017 1998 3.7
Psychology and social sciences Other social sciences 2022 2254 3.9
Engineering Aerospace, aeronautical, and astronautical engineering 1992 234 0.6
Engineering Aerospace, aeronautical, and astronautical engineering 1997 273 0.6
Engineering Aerospace, aeronautical, and astronautical engineering 2002 209 0.5
Engineering Aerospace, aeronautical, and astronautical engineering 2007 267 0.6
Engineering Aerospace, aeronautical, and astronautical engineering 2012 307 0.6
Engineering Aerospace, aeronautical, and astronautical engineering 2017 379 0.7
Engineering Aerospace, aeronautical, and astronautical engineering 2022 374 0.6
Engineering Bioengineering and biomedical engineering 1992 147 0.4
Engineering Bioengineering and biomedical engineering 1997 211 0.5
Engineering Bioengineering and biomedical engineering 2002 246 0.6
Engineering Bioengineering and biomedical engineering 2007 637 1.3
Engineering Bioengineering and biomedical engineering 2012 943 1.9
Engineering Bioengineering and biomedical engineering 2017 1032 1.9
Engineering Bioengineering and biomedical engineering 2022 1228 2.1
Engineering Chemical engineering 1992 607 1.6
Engineering Chemical engineering 1997 662 1.6
Engineering Chemical engineering 2002 607 1.5
Engineering Chemical engineering 2007 817 1.7
Engineering Chemical engineering 2012 840 1.6
Engineering Chemical engineering 2017 931 1.7
Engineering Chemical engineering 2022 1142 2.0
Engineering Civil engineering 1992 540 1.4
Engineering Civil engineering 1997 592 1.4
Engineering Civil engineering 2002 540 1.3
Engineering Civil engineering 2007 703 1.5
Engineering Civil engineering 2012 495 1.0
Engineering Civil engineering 2017 713 1.3
Engineering Civil engineering 2022 898 1.6
Engineering Electrical, electronics, and communications engineering 1992 1278 3.3
Engineering Electrical, electronics, and communications engineering 1997 1460 3.4
Engineering Electrical, electronics, and communications engineering 2002 1212 3.0
Engineering Electrical, electronics, and communications engineering 2007 1967 4.1
Engineering Electrical, electronics, and communications engineering 2012 1938 3.8
Engineering Electrical, electronics, and communications engineering 2017 1879 3.4
Engineering Electrical, electronics, and communications engineering 2022 2193 3.8
Engineering Industrial and manufacturing engineering 1992 196 0.5
Engineering Industrial and manufacturing engineering 1997 246 0.6
Engineering Industrial and manufacturing engineering 2002 230 0.6
Engineering Industrial and manufacturing engineering 2007 279 0.6
Engineering Industrial and manufacturing engineering 2012 226 0.4
Engineering Industrial and manufacturing engineering 2017 249 0.5
Engineering Industrial and manufacturing engineering 2022 381 0.7
Engineering Materials science engineering 1992 365 0.9
Engineering Materials science engineering 1997 483 1.1
Engineering Materials science engineering 2002 364 0.9
Engineering Materials science engineering 2007 646 1.3
Engineering Materials science engineering 2012 743 1.5
Engineering Materials science engineering 2017 937 1.7
Engineering Materials science engineering 2022 1136 2.0
Engineering Mechanical engineering 1992 855 2.2
Engineering Mechanical engineering 1997 929 2.2
Engineering Mechanical engineering 2002 771 1.9
Engineering Mechanical engineering 2007 1071 2.2
Engineering Mechanical engineering 2012 1220 2.4
Engineering Mechanical engineering 2017 1398 2.6
Engineering Mechanical engineering 2022 1676 2.9
Engineering Other engineering 1992 1216 3.1
Engineering Other engineering 1997 1258 3.0
Engineering Other engineering 2002 902 2.3
Engineering Other engineering 2007 1362 2.8
Engineering Other engineering 2012 1757 3.4
Engineering Other engineering 2017 2258 4.1
Engineering Other engineering 2022 2502 4.3
Education Education administration 1992 1984 5.1
Education Education administration 1997 2050 4.8
Education Education administration 2002 2351 5.9
Education Education administration 2007 2161 4.5
Education Education administration 2012 1057 2.1
Education Education administration 2017 922 1.7
Education Education administration 2022 734 1.3
Education Education research 1992 2503 6.4
Education Education research 1997 2695 6.3
Education Education research 2002 2776 6.9
Education Education research 2007 2671 5.5
Education Education research 2012 2516 4.9
Education Education research 2017 2373 4.3
Education Education research 2022 2289 4.0
Education Teacher education 1992 407 1.0
Education Teacher education 1997 291 0.7
Education Teacher education 2002 262 0.7
Education Teacher education 2007 297 0.6
Education Teacher education 2012 156 0.3
Education Teacher education 2017 114 0.2
Education Teacher education 2022 110 0.2
Education Teaching fields 1992 1008 2.6
Education Teaching fields 1997 919 2.2
Education Teaching fields 2002 686 1.7
Education Teaching fields 2007 873 1.8
Education Teaching fields 2012 757 1.5
Education Teaching fields 2017 925 1.7
Education Teaching fields 2022 890 1.5
Education Other education 1992 775 2.0
Education Other education 1997 622 1.5
Education Other education 2002 433 1.1
Education Other education 2007 446 0.9
Education Other education 2012 316 0.6
Education Other education 2017 492 0.9
Education Other education 2022 486 0.8
Humanities and arts Foreign languages and literature 1992 562 1.4
Humanities and arts Foreign languages and literature 1997 652 1.5
Humanities and arts Foreign languages and literature 2002 627 1.6
Humanities and arts Foreign languages and literature 2007 607 1.3
Humanities and arts Foreign languages and literature 2012 684 1.3
Humanities and arts Foreign languages and literature 2017 618 1.1
Humanities and arts Foreign languages and literature 2022 442 0.8
Humanities and arts History 1992 724 1.9
Humanities and arts History 1997 965 2.3
Humanities and arts History 2002 1031 2.6
Humanities and arts History 2007 937 1.9
Humanities and arts History 2012 1086 2.1
Humanities and arts History 2017 1058 1.9
Humanities and arts History 2022 750 1.3
Humanities and arts Letters 1992 1278 3.3
Humanities and arts Letters 1997 1550 3.6
Humanities and arts Letters 2002 1455 3.6
Humanities and arts Letters 2007 1340 2.8
Humanities and arts Letters 2012 1638 3.2
Humanities and arts Letters 2017 1462 2.7
Humanities and arts Letters 2022 1292 2.2
Humanities and arts Other humanities and arts 1992 1823 4.7
Humanities and arts Other humanities and arts 1997 2118 5.0
Humanities and arts Other humanities and arts 2002 2184 5.5
Humanities and arts Other humanities and arts 2007 2201 4.6
Humanities and arts Other humanities and arts 2012 2153 4.2
Humanities and arts Other humanities and arts 2017 2148 3.9
Humanities and arts Other humanities and arts 2022 1980 3.4
Other Business management and administration 1992 1248 3.2
Other Business management and administration 1997 1245 2.9
Other Business management and administration 2002 1113 2.8
Other Business management and administration 2007 1506 3.1
Other Business management and administration 2012 1404 2.8
Other Business management and administration 2017 1565 2.9
Other Business management and administration 2022 1450 2.5
Other Communication 1992 330 0.8
Other Communication 1997 331 0.8
Other Communication 2002 397 1.0
Other Communication 2007 560 1.2
Other Communication 2012 595 1.2
Other Communication 2017 622 1.1
Other Communication 2022 580 1.0
Other Non-science and engineering fields nec 1992 628 1.6
Other Non-science and engineering fields nec 1997 615 1.4
Other Non-science and engineering fields nec 2002 628 1.6
Other Non-science and engineering fields nec 2007 775 1.6
Other Non-science and engineering fields nec 2012 735 1.4
Other Non-science and engineering fields nec 2017 965 1.8
Other Non-science and engineering fields nec 2022 1114 1.9
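The prefix-stripping step above could also be folded into the pivot itself. The sketch below uses the `names_prefix` and `names_transform` arguments of `pivot_longer` on two toy rows taken from the clean data; `toy_wide` and `toy_long` are illustrative names, and this is an alternative to the regular expression rather than the approach used in this project.

```r
library(dplyr)
library(tidyr)

# Two rows in the same shape as the clean (wide) data above.
toy_wide <- tibble(
  broadfield = c("Life sciences", "Engineering"),
  Field      = c("Health sciences", "Civil engineering"),
  docs1992   = c(1112, 540),
  docs2022   = c(2559, 898)
)

toy_long <- pivot_longer(
  toy_wide,
  cols = -c(broadfield, Field),
  names_to = "Year",
  names_prefix = "docs",                      # strip the leading "docs"
  names_transform = list(Year = as.integer),  # store the year as an integer
  values_to = "Doctorates"
)
```

With this variant the Year column is already an integer when the pivot finishes, so no separate `sub` call is needed afterwards.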

Data Analysis

Jonathan shared this data set and suggested looking at “how the make up of doctorate degrees has changed throughout the years.” Following his advice, I will analyze the data to determine which broad fields have experienced the greatest increase and greatest decrease in their share of doctorate degrees since 1992.

In this code block, I create a new data frame listing each broad field and the percentage of that year's total doctorates it represented.

broadfield_annual <- tidy_data %>%
  group_by(broadfield, Year) %>%
  summarise(Percent = sum(Year_Percent))
## `summarise()` has grouped output by 'broadfield'. You can override using the
## `.groups` argument.
kable(broadfield_annual, format = "pipe", caption = "Broad Field % of Doctorates by Year", align = "lcc")
Broad Field % of Doctorates by Year
broadfield Year Percent
Education 1992 17.1
Education 1997 15.5
Education 2002 16.3
Education 2007 13.3
Education 2012 9.4
Education 2017 8.8
Education 2022 7.8
Engineering 1992 14.0
Engineering 1997 14.4
Engineering 2002 12.6
Engineering 2007 16.1
Engineering 2012 16.6
Engineering 2017 17.9
Engineering 2022 20.0
Humanities and arts 1992 11.3
Humanities and arts 1997 12.4
Humanities and arts 2002 13.3
Humanities and arts 2007 10.6
Humanities and arts 2012 10.8
Humanities and arts 2017 9.6
Humanities and arts 2022 7.7
Life sciences 1992 18.4
Life sciences 1997 19.7
Life sciences 2002 21.1
Life sciences 2007 22.2
Life sciences 2012 23.5
Life sciences 2017 23.0
Life sciences 2022 22.9
Mathematics and computer sciences 1992 4.9
Mathematics and computer sciences 1997 4.7
Mathematics and computer sciences 2002 4.3
Mathematics and computer sciences 2007 6.3
Mathematics and computer sciences 2012 6.8
Mathematics and computer sciences 2017 7.1
Mathematics and computer sciences 2022 8.4
Other 1992 5.6
Other 1997 5.1
Other 2002 5.4
Other 2007 5.9
Other 2012 5.4
Other 2017 5.8
Other 2022 5.4
Physical sciences and earth sciences 1992 11.7
Physical sciences and earth sciences 1997 10.7
Physical sciences and earth sciences 2002 9.7
Physical sciences and earth sciences 2007 10.3
Physical sciences and earth sciences 2012 10.5
Physical sciences and earth sciences 2017 11.1
Physical sciences and earth sciences 2022 11.6
Psychology and social sciences 1992 16.8
Psychology and social sciences 1997 17.4
Psychology and social sciences 2002 17.3
Psychology and social sciences 2007 15.2
Psychology and social sciences 2012 16.6
Psychology and social sciences 2017 16.7
Psychology and social sciences 2022 16.0
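
The `summarise()` message printed above is only a note that the result is still grouped by broadfield. As a minimal sketch of how it could be silenced, the code below passes `.groups = "drop"` on a small stand-in for tidy_data built from the three "Other" rows for 2017 and 2022 shown at the end of the tidy table. The names `toy_tidy` and `toy_annual` and the column name `field` are assumptions for illustration, not objects from the analysis above.

```r
library(dplyr)

# Toy stand-in for tidy_data: the "Other" rows for 2017 and 2022 copied from
# the tidy table. The column name `field` is an assumption for illustration.
toy_tidy <- data.frame(
  broadfield = "Other",
  field = rep(c("Business management and administration",
                "Communication",
                "Non-science and engineering fields nec"), each = 2),
  Year = rep(c(2017, 2022), times = 3),
  Year_Percent = c(2.9, 2.5, 1.1, 1.0, 1.8, 1.9)
)

# Same aggregation as above, but .groups = "drop" returns an ungrouped
# result, so summarise() no longer prints the grouping message.
toy_annual <- toy_tidy %>%
  group_by(broadfield, Year) %>%
  summarise(Percent = sum(Year_Percent), .groups = "drop")

print(toy_annual)
```

The two sums should agree with the "Other" rows for 2017 and 2022 in the table above (5.8 and 5.4), which is also a quick check that the aggregation adds up the detailed fields as intended.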

In this code block, I use ggplot to graph the data for each broad field as a line graph over time. Visually, it appears that since 1992 Engineering has grown the most as a share of doctorates and Education has declined the most.

library(ggplot2)

ggplot(broadfield_annual, aes(x = Year, y = Percent, color = broadfield)) + 
  geom_line() + 
  labs(title = "Major Field Share of Doctorates", y = "Percent of Doctorates", color = "Major Field") +
  scale_x_continuous(breaks = seq(1992, 2022, by = 5))

In this code block, I confirm the findings I identified visually from the graph. I filter the data to 1992 and 2022, use pivot_wider to put each of those years in its own column, then mutate to add a column for the change in each broad field's share of doctorates from 1992 to 2022. Finally, I keep only the broad field and the change, and sort by change in descending order. The output confirms my visual observations: Engineering increased its share of doctorates by 6.0 percentage points from 1992 to 2022 (the largest increase), while Education decreased its share by 9.3 percentage points (the largest decrease).

broadfield_changes <- broadfield_annual %>%
  filter(Year %in% c(1992, 2022)) %>%
  pivot_wider(
    names_from = "Year",
    values_from = "Percent"
  ) %>%
  mutate(Change = `2022` - `1992`) %>%
  select(broadfield, Change) %>%
  arrange(desc(Change))

kable(broadfield_changes, format = "pipe", caption = "Change in Broad Field % of Doctorates, 1992-2022", align = "lc")
Change in Broad Field % of Doctorates, 1992-2022
broadfield Change
Engineering 6.0
Life sciences 4.5
Mathematics and computer sciences 3.5
Physical sciences and earth sciences -0.1
Other -0.2
Psychology and social sciences -0.8
Humanities and arts -3.6
Education -9.3
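
The Change column is a difference between two percentages, so it is measured in percentage points rather than as a percent change. As a rough sketch of the distinction, the base R code below recomputes both quantities for Engineering and Education using the 1992 and 2022 shares from the table of broad field percentages. The data frame `shares` and its column names are hypothetical and exist only for this illustration.

```r
# 1992 and 2022 shares copied from the "Broad Field % of Doctorates by Year"
# table. `shares` and its column names are hypothetical, for illustration only.
shares <- data.frame(
  broadfield = c("Engineering", "Education"),
  share_1992 = c(14.0, 17.1),
  share_2022 = c(20.0, 7.8)
)

# Absolute change in percentage points (what the Change column reports)
shares$point_change <- shares$share_2022 - shares$share_1992

# Relative change, expressed as a percent of the 1992 share
shares$relative_change <- round(100 * shares$point_change / shares$share_1992, 1)

print(shares)
```

Under this calculation, Engineering's 6.0 point gain is a relative increase of about 42.9% over its 1992 share, and Education's 9.3 point loss is a relative decrease of about 54.4%.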

Findings and Recommendations

This is the second consecutive assignment in which I have used pivot_longer to tidy wide data and then used pivot_wider during analysis to return it to a wide format. I think I have benefited from converting data in both directions, gaining a deeper understanding of the benefits of each format (as well as the code used to convert between them). There is a connection here to my job as a high school math teacher. The high school math standards in New York emphasize the benefits to students of understanding and converting between “multiple equivalent representations” of functions (such as a graph, an equation, and a table of input-output pairs), and I think understanding and converting between multiple representations of data has been very helpful to me. Going forward, I’ll continue to look for opportunities to work in “two directions” during data analysis.