Introduction
This report is for Javin only. The overall goal is to analyze the types of jobs in the biotechnology field. I scraped indeed.com for jobs matching the following keywords:
biotechnology
biotech
biotech sales
biotechnology sales
biotech manufacturing
biotech marketing
I chose these keywords with a simple, non-scientific method. First, I typed the word “biotech” into the indeed.com search box. While I typed, indeed.com suggested related search terms such as biotech sales and biotech manufacturing. Second, I repeated the same procedure with the word “biotechnology”.
I also applied search filters so that only full-time, entry-level jobs in the United States were returned.
What does the data look like?
Using all the filters, I scraped 1260 listings. Table 1 shows the first three. The first listing, a research associate position, is located in Newark, CA, and its job summary makes clear that the role involves helping to commercialize products. The third listing is more interesting because its job summary does not make sense: it names a person and an institution but describes no duties. This may be due to a web scraping error or to a mistake in the job listing itself. Either way, the Natural Language Processing (NLP) techniques I use do not account for whether a listing makes sense; frankly, they are closer to a counting exercise.
| title | company | location | summary |
|---|---|---|---|
| Research Associate, Library Prep | Stealth Mode Startup | Newark, CA | Play a critical role in the development and commercialization of products that will have a major impact on the medical and life science fields. |
| Technicians in molecular clinical testing and microbiology t… | Potomac Urology | Alexandria, VA | Independently perform PCR runs, analyze data, and prepare technical reports according to standard operating procedures. |
| Associate Researcher | Icahn School of Medicine at Mount Sinai | New York, NY | Susmita Sahoo, at the Icahn School of Medicine at Mount Sinai, New York, NY. |
Unfortunately, roughly half of the 1260 listings turn out to be duplicates. For example, the keywords biotech and biotechnology may have returned the same job listing. I therefore want to keep only unique jobs, but deciding what counts as a unique listing is tricky. Consider a research associate job offered by company X in two different locations. Company X posts two listings with the same job title, job summary and job description that differ only by location, and these are genuinely two jobs. Similarly, for unknown reasons, companies sometimes post the same job description under different job titles. Hence, I choose not to deduplicate on title, summary and description alone.
To identify unique jobs, I create a new variable that combines the title and location. For the first listing in Table 1, this variable takes the value Research Associate, Library Prep_Stealth Mode Startup. Using this variable as a key, I remove duplicated listings. After this filter, the number of unique job listings falls to 622, a reduction of approximately 51%. At first glance this reduction seems high, so as a sanity check I also deduplicated on job title, job description and job summary; these filters yield a similar reduction.
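The deduplication step can be sketched in Python. This is a minimal sketch under stated assumptions, not the code behind this report: each scraped listing is assumed to be a dict with hypothetical `title`, `company` and `location` keys, the key is built from title and location as described above (editing `dedup_key` is all it takes to try a different combination), the first listing seen for each key is kept, and the 51% figure is checked with plain arithmetic.

```python
# Minimal deduplication sketch. Assumes each listing is a dict with
# hypothetical "title", "company" and "location" keys.


def dedup_key(listing: dict) -> str:
    """Combine title and location into a single key joined by "_"."""
    return f"{listing['title']}_{listing['location']}"


def unique_listings(listings: list) -> list:
    """Keep the first listing seen for each key, preserving order."""
    seen = set()
    kept = []
    for listing in listings:
        key = dedup_key(listing)
        if key not in seen:
            seen.add(key)
            kept.append(listing)
    return kept


def reduction_pct(before: int, after: int) -> float:
    """Percentage of listings removed by deduplication."""
    return 100 * (before - after) / before


if __name__ == "__main__":
    # Hypothetical data: one job returned twice (e.g. by both the "biotech"
    # and "biotechnology" searches) plus the same title in another city.
    listings = [
        {"title": "Research Associate", "company": "X", "location": "Newark, CA"},
        {"title": "Research Associate", "company": "X", "location": "Newark, CA"},
        {"title": "Research Associate", "company": "X", "location": "Boston, MA"},
    ]
    print(len(unique_listings(listings)))  # 2
    print(round(reduction_pct(1260, 622)))  # 51
```

Keeping the first occurrence preserves the original scrape order, so the surviving listing for each key is simply the one returned first by the searches.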
To summarize, I work with 622 listings for most of the analysis.
What are the most popular jobs?
Figure 1: The most frequent bigrams (two-word phrases) in job titles
Figure 1 shows that research associate is the most common title available to an undergraduate. In fact, 19% of all jobs are research associate positions.
Two other observations are in order:
The frequent appearance of the word “scientist” suggests that most of the jobs are technical.
There are fewer sales jobs than I had expected.
Note that Figure 1 reports only the top 10 bigrams. From the fourth most popular, research fellow, onward, every bigram occurs fewer than 25 times, so I cannot discern a pattern among the remaining job titles.
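Counting title bigrams, the "counting exercise" mentioned earlier, might look like the following sketch. The tokenizer (lowercase words, keeping `&` so that R&D survives), the choice to count each bigram at most once per title, and the function names are all assumptions for illustration, not the method used to produce Figure 1.

```python
import re
from collections import Counter


def title_bigrams(title: str) -> list:
    """Lowercase a title, split it into words, and return adjacent word pairs."""
    words = re.findall(r"[a-z0-9&]+", title.lower())  # keeps "r&d" intact
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def top_bigrams(titles: list, n: int = 10) -> list:
    """Return the n most common bigrams, counting each at most once per title."""
    counts = Counter()
    for title in titles:
        counts.update(set(title_bigrams(title)))
    return counts.most_common(n)


def share(titles: list, phrase: str) -> float:
    """Fraction of titles that contain the phrase (case-insensitive)."""
    return sum(phrase in t.lower() for t in titles) / len(titles)
```

Run on the 622 deduplicated titles, `top_bigrams(titles)` would produce a top-10 list of the kind plotted in Figure 1, and `share(titles, "research associate")` a share comparable to the 19% above; simple substring matching may not reproduce the exact counts.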
Figure 2: Relationships between the bigrams in job titles
Figure 2 shows how the bigrams relate to one another. Starting from the right, R&D and laboratory scientist are the most closely related. Similarly, cell culture, associate cell and cell biology are closely related to one another. This means that these bigrams tend to appear together in the same job titles. On the left, research associate is the most distant from the other bigrams even though it is the most frequent. In the same spirit, associate scientist is more closely related to research scientist than to research associate.
Figure 2 therefore suggests that the job functions of a research associate may differ from those of all other job titles. It also suggests that a laboratory scientist is probably involved with R&D at the company.
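The text does not say which relatedness measure underlies Figure 2, so the sketch below swaps in one plausible choice: the Jaccard similarity between the sets of titles in which two bigrams appear. Two bigrams that always occur in the same titles score 1.0 and bigrams that never co-occur score 0.0. The function names are hypothetical, and a clustering step (not shown) would still be needed to draw a diagram like Figure 2.

```python
import re
from itertools import combinations


def bigram_index(titles: list) -> dict:
    """Map each bigram to the set of title positions where it appears."""
    index = {}
    for i, title in enumerate(titles):
        words = re.findall(r"[a-z0-9&]+", title.lower())
        for a, b in zip(words, words[1:]):
            index.setdefault(f"{a} {b}", set()).add(i)
    return index


def jaccard(a: set, b: set) -> float:
    """Titles containing both bigrams, divided by titles containing either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_similarity(titles: list) -> list:
    """Jaccard similarity for every pair of bigrams, most related first."""
    index = bigram_index(titles)
    pairs = [
        (x, y, jaccard(index[x], index[y]))
        for x, y in combinations(sorted(index), 2)
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Under this measure, a frequent bigram such as research associate can still sit far from everything else if it rarely shares a title with other bigrams, which is the pattern described above.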
Where are the jobs located?
Based on Figure 2, I classify the jobs into three types. The first type is research associate, the most frequent job title. The second type combines three titles that group together in the cluster diagram: “research scientist”, “associate scientist” and “research fellow”. The third type, Other, covers all remaining jobs.
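The three-way classification can be sketched as a simple phrase match on the title. The type labels follow the text; the case-insensitive substring rule and the precedence (research associate is checked first, so a title matching both groups counts as research associate) are assumptions.

```python
# Labels follow the three types in the text; the matching rule and the
# precedence (research associate checked first) are assumptions.
SCIENTIST_OR_FELLOW = ("research scientist", "associate scientist", "research fellow")


def job_type(title: str) -> str:
    """Assign a job title to one of the three job types."""
    t = title.lower()
    if "research associate" in t:
        return "Research associate"
    if any(phrase in t for phrase in SCIENTIST_OR_FELLOW):
        return "Scientist or fellow"
    return "Other"
```

For example, `job_type("Associate Researcher")` returns "Other", because neither group's phrases occur in that title.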
Figure 3 plots the number of jobs in each state that advertises ten or more jobs. For example, of all the unique jobs listed, twelve are in Texas. Figure 3 also breaks down each state's jobs by the three job types. Three observations are in order:
The green bars, representing the research associate type, are prevalent only in California and Massachusetts. Of the 118 research associate jobs, 65 are in California and 27 are in Massachusetts; together the two states host 78% of these jobs.
The second job type, which I label scientist or fellow (titles containing “research scientist”, “associate scientist” or “research fellow”), is also concentrated in California and Massachusetts: the two states host 29 of the 60 jobs of this type. New York hosts six jobs of this type, but Figure 3 excludes them because the number of jobs is less than ten.
The Northeast region, consisting of Maryland, Pennsylvania, New York and New Jersey, also seems to host jobs of the “Other” type.
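The state-level counts and shares can be sketched as follows. This assumes each job has already been reduced to a `(state, job_type)` pair, and that locations follow the "City, ST" format seen in Table 1; `state_of`, `counts_by_state` and `share_in_states` are hypothetical helpers, and the ten-job cutoff mirrors the threshold used for Figure 3.

```python
from collections import Counter, defaultdict


def state_of(location: str) -> str:
    """Take the state code from a "City, ST" string (format is assumed)."""
    parts = location.rsplit(",", 1)[-1].split()
    return parts[0] if parts else ""


def counts_by_state(jobs: list, min_jobs: int = 10) -> dict:
    """Count (state, job type) pairs, keeping states with at least min_jobs jobs."""
    totals = Counter(state for state, _ in jobs)
    table = defaultdict(Counter)
    for state, jtype in jobs:
        if totals[state] >= min_jobs:
            table[state][jtype] += 1
    return dict(table)


def share_in_states(jobs: list, jtype: str, states: set) -> float:
    """Fraction of jobs of one type that are located in the given states."""
    of_type = [state for state, t in jobs if t == jtype]
    return sum(state in states for state in of_type) / len(of_type)
```

With 65 California and 27 Massachusetts entries among 118 research associate jobs, `share_in_states(jobs, "Research associate", {"CA", "MA"})` returns 92/118, about 0.78, which is the 78% quoted above.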