Job Search Results: Academia to Data Science

Kinga Stryszowska-Hill

September 14, 2021

Background

I am a wetland ecologist with a PhD in Environmental Science. Currently, I am a Postdoctoral Associate at Murray State University researching restored wetlands along Mississippi River tributaries. A few months ago, I made the decision to transition from academia to industry and I identified data analytics/data science as a path that aligns with what I enjoy doing and what I do well. This week I am negotiating two industry job offers. Today, I am making a Sankey chart in R to visualize the final results of my job search process.

Timeline of My Transition

April 2021: I have an epiphany that I don’t want to stay in academia. I also have the thought that data analytics is something that I love doing and something that is worthwhile pursuing in industry.
May 2021: I conduct my first informational interview with an acquaintance who was a data scientist for the federal government. I learn the difference between data science and data analytics. Over the course of 4 months, I conduct 47 informational interviews and I love every single conversation.
June 2021: Over the course of 1 month, I submit 40 applications through LinkedIn and Indeed. I use the LinkedIn Easy Apply feature. I neither tailor my resume nor contact hiring managers or recruiters. This is my attempt at the “spray and pray” technique.
July 2021: The “spray and pray” approach yields poor results and also makes me feel frantic, like I always need to be looking for and applying to jobs. Since I am in no rush to leave my research position, I dial back and continue exclusively conducting informational interviews. Along the way, if an interviewee suggests that I apply to a position, I submit my resume tailored specifically to the position.
August 2021: My resume lands on the desk of a hiring manager and he likes it so much that he creates a special management position in analytics just for me. And it pays more than double my current salary. I am surprised and thrilled.
September 2021: I receive two job offers in one week. Both are extremely competitive and both are at excellent companies. I have trouble choosing which position is best for me. In the end, I choose a role as a data scientist at a green tech startup.

Let’s build the Sankey chart that summarizes this process. For this chart, I followed the tutorial by the R Graph Gallery: https://www.r-graph-gallery.com/321-introduction-to-interactive-sankey-diagram-2.html

Load Libraries

library(networkD3)
library(dplyr)
library(kableExtra)

Load Data

data_applications <- read.csv("DataApplications.csv")

View Table Structure

There are two columns: Source (how each interaction started) and Target (how each interaction ended). Each row is one interaction, and each value is the name of a category (e.g., Networking, Interview, Ghosted). I wanted the labels on the Sankey chart to include the number of interactions, so I built the counts into the value names (e.g., Networking: 10).

knitr::kable(head(data_applications)) %>%
  kableExtra::column_spec(1, width = "150px", extra_css = "vertical-align:middle;")

Source	Target
Networking: 10	Ghosted: 35
Networking: 10	Ghosted: 35
Networking: 10	Interview: 9
Networking: 10	Interview: 9
Networking: 10	Interview: 9
Networking: 10	Interview: 9
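The count-bearing labels do not have to be typed by hand. As a minimal sketch, assuming a hypothetical raw table with plain category names (the data frame `raw` and its toy rows below are made up for illustration and are not my real data), dplyr's add_count() can attach the totals before they are pasted into the names:

```r
library(dplyr)

# Hypothetical raw data: one row per interaction, plain category names
raw <- data.frame(
  Source = c("Networking", "Networking", "Networking", "Online", "Online"),
  Target = c("Interview", "Interview", "Offer", "Ghosted", "Rejected")
)

labelled <- raw %>%
  add_count(Source, name = "n_source") %>%   # total rows per Source
  add_count(Target, name = "n_target") %>%   # total rows per Target
  mutate(Source = paste0(Source, ": ", n_source),
         Target = paste0(Target, ": ", n_target)) %>%
  select(Source, Target)

labelled
```

With this approach each label reflects the number of rows in which that category appears, so the labels cannot drift out of sync with the data.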

Summarize

Summarize the applications data to get it ready for the Sankey chart. This summary counts the number of rows for each Source and Target pair. These counts represent the strength of the flow between the nodes. For example, for the Networking interaction (10 total), we can see that the Target was Ghosted for 2, Interview for 6, and Offer for 2.

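# Count the rows for each Source/Target pair; drop the grouping so links is a plain table
links <- data_applications %>% 
  group_by(Source, Target) %>% 
  summarise(Value = n(), .groups = "drop")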


knitr::kable(links) %>%
  kableExtra::column_spec(1, width = "150px", extra_css = "vertical-align:middle;")

Source	Target	Value
Networking: 10	Ghosted: 35	2
Networking: 10	Interview: 9	6
Networking: 10	Offer: 2	2
Online: 42	Ghosted: 35	30
Online: 42	Interview: 9	1
Online: 42	Rejected: 11	11
Recruiter: 3	Ghosted: 35	3
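To see what the counting step does on a small scale, here is a sketch that rebuilds only the six rows shown in the table preview earlier (the names `head_rows` and `head_links` are mine, for illustration) and counts them with dplyr's count(), which is shorthand for group_by() followed by summarise(n()):

```r
library(dplyr)

# The six previewed rows: 2 ended in Ghosted, 4 in Interview
head_rows <- data.frame(
  Source = rep("Networking: 10", 6),
  Target = c(rep("Ghosted: 35", 2), rep("Interview: 9", 4))
)

# One row per Source/Target pair, with the number of matching rows in Value
head_links <- head_rows %>%
  count(Source, Target, name = "Value")

head_links
```

These six rows collapse into two links. The full data set has more Networking rows, which is why the complete table shows 6 interviews rather than 4.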

Define the nodes.

Nodes are the groupings on the left and right side of the chart.

nodes <- data.frame(name = c(as.character(links$Source),
                             as.character(links$Target))) %>% unique()

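# networkD3 expects zero-based node indices, so subtract 1 from the one-based positions returned by match()
links$IDSource <- match(links$Source, nodes$name) - 1
links$IDTarget <- match(links$Target, nodes$name) - 1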
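The index lookup is easier to follow on a toy example. This sketch uses made-up names (`nodes_demo`, `idx_links`) with four nodes and two links; match() returns each name's position in the node list, and subtracting 1 converts it to the zero-based index the chart expects:

```r
# Toy node list: positions 1..4 in R, indices 0..3 for the chart
nodes_demo <- data.frame(name = c("Online", "Networking", "Ghosted", "Interview"))

idx_links <- data.frame(
  Source = c("Online", "Networking"),
  Target = c("Ghosted", "Interview")
)

# Look up each name's position in the node list, then shift to zero-based
idx_links$IDSource <- match(idx_links$Source, nodes_demo$name) - 1
idx_links$IDTarget <- match(idx_links$Target, nodes_demo$name) - 1

idx_links
```

A name that is missing from the node list would produce NA here, so checking for NA in these columns is a quick way to catch a mislabeled node.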

Define custom colors that I want to use for the chart.

I have 7 categories (nodes), so the scale needs 7 colors. Each name in the domain must match a node name exactly, including the count.

my_color <- 'd3.scaleOrdinal() .domain(["Online: 42", "Recruiter: 3","Networking: 10", "Ghosted: 35", "Rejected: 11", "Interview: 9", "Offer: 2"]) .range(["#435560", "#435560" , "#435560", "#F1F9F9", "#91C788", "#52734D", "#F38BA0"])'

#Adjust margins
margins <- c(top = 0, right = 0, bottom = 0, left = 100)
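Because the color scale is a JavaScript string, a typo in a node name silently breaks the color mapping. One way to reduce that risk, sketched here with hypothetical variable names (`node_names`, `node_colors`, `my_color_built`), is to assemble the string from R vectors so the names and colors stay paired:

```r
# Node names as they appear in the data, each paired with a color
node_names <- c("Online: 42", "Recruiter: 3", "Networking: 10",
                "Ghosted: 35", "Rejected: 11", "Interview: 9", "Offer: 2")
node_colors <- c("#435560", "#435560", "#435560",
                 "#F1F9F9", "#91C788", "#52734D", "#F38BA0")

# The two vectors must have the same length: one color per node
stopifnot(length(node_names) == length(node_colors))

# Quote each element and join with commas to form JavaScript array contents
domain_js <- paste0('"', node_names, '"', collapse = ", ")
range_js  <- paste0('"', node_colors, '"', collapse = ", ")

my_color_built <- sprintf("d3.scaleOrdinal().domain([%s]).range([%s])",
                          domain_js, range_js)
my_color_built
```

The resulting string can be passed to the colourScale argument in the same way as a hand-written one. If the node names come from the data (for example, nodes$name), the domain always matches the labels.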

Build the chart

sankeyNetwork(Links = links, Nodes = nodes,
              Source = "IDSource", Target = "IDTarget",
              Value = "Value",
              NodeID = "name", 
              fontSize = 14,
              colourScale = my_color,
              width = 900,
              margin = margins,
              sinksRight=FALSE)

Summary

42 Online applications resulted in 1 interview
Every recruiter who contacted me ended up ghosting me, but when I reached out to a recruiter myself I got a better response
Rejection is part of this process
Networking was a very effective strategy for me. I had a positive response from recruiters, managers, and others whom I reached out to
Out of 10 networking-based applications, only 2 ended in ghosting; all others led to interviews or offers (that’s an 80% conversion rate)
I built relationships along the way and I learned so much about what I want to do in this field
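
The conversion rates behind these takeaways can be computed directly from the link counts in the summary table. This is a minimal sketch in base R, assuming that "conversion" means an interaction that ended in an interview or an offer; the names `links_tbl`, `total`, `positive`, and `conversion` are mine, and the count suffixes are dropped from the labels for readability:

```r
# Link counts copied from the summary table
links_tbl <- data.frame(
  Source = c("Networking", "Networking", "Networking",
             "Online", "Online", "Online", "Recruiter"),
  Target = c("Ghosted", "Interview", "Offer",
             "Ghosted", "Interview", "Rejected", "Ghosted"),
  Value  = c(2, 6, 2, 30, 1, 11, 3)
)

# Assumption: an Interview or an Offer counts as a positive outcome
links_tbl$Positive <- links_tbl$Target %in% c("Interview", "Offer")

# Total and positive interactions per source
total    <- tapply(links_tbl$Value, links_tbl$Source, sum)
positive <- tapply(links_tbl$Value * links_tbl$Positive, links_tbl$Source, sum)

# Conversion rate in percent, rounded to one decimal place
conversion <- round(100 * positive / total, 1)
conversion
# Networking: 80, Online: 2.4, Recruiter: 0
```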