Introduction

This report is for Javin only. The overall goal is to analyze the types of jobs in the biotechnology field. I scraped indeed.com for jobs matching the following keywords:

  1. biotechnology

  2. biotech

  3. biotech sales

  4. biotechnology sales

  5. biotech manufacturing

  6. biotech marketing

I chose these keywords with the following informal, non-scientific method. First, I typed the word “biotech” into the indeed.com search box. While I typed, indeed.com suggested related search terms such as “biotech sales” and “biotech manufacturing”. Second, I repeated the same procedure with the word “biotechnology”.

I also filtered the search to full-time, entry-level jobs in the United States.

What does the data look like?

Using all the filters, I scraped 1260 listings. Table 1 shows the first three. The first listing, for a research associate, is located in Newark, CA, and its job summary indicates that the role involves developing and commercializing products. The third listing is more interesting because its job summary does not make sense: it is a sentence fragment naming a person and an institution rather than a description of the job. This may be a web-scraping error or a mistake in the job listing itself. Either way, the Natural Language Processing (NLP) methods I know do not check whether a listing makes sense. Frankly, the NLP I use is mostly a counting exercise.

Table 1: This table shows the first three listings. I omit the job description for clarity.
title | company | location | summary
Research Associate, Library Prep | Stealth Mode Startup | Newark, CA | Play a critical role in the development and commercialization of products that will have a major impact on the medical and life science fields.
Technicians in molecular clinical testing and microbiology t… | Potomac Urology | Alexandria, VA | Independently perform PCR runs, analyze data, and prepare technical reports according to standard operating procedures.
Associate Researcher | Icahn School of Medicine at Mount Sinai | New York, NY | Susmita Sahoo, at the Icahn School of Medicine at Mount Sinai, New York, NY.
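To make the “counting exercise” concrete, here is a minimal sketch of bag-of-words counting over the three titles in Table 1. It is an illustration only, not the code behind this report: the lowercase tokenization, the regular expression, and the small stopword set are assumptions.

```python
import re
from collections import Counter


def count_words(texts, stopwords=frozenset()):
    """Count lowercase word tokens across all texts, skipping stopwords.

    Assumption: a token is any run of ASCII letters; punctuation and
    digits are dropped.
    """
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts


# Titles copied from Table 1 (the truncated title is kept as scraped).
titles = [
    "Research Associate, Library Prep",
    "Technicians in molecular clinical testing and microbiology t…",
    "Associate Researcher",
]

counts = count_words(titles, stopwords={"in", "and"})
print(counts.most_common(3))
```

Note that “Researcher” and “Research” count as different words here; whether to stem them together is a design choice that changes the counts.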

Unfortunately, most of the 1260 listings are not unique. For example, the keywords “biotech” and “biotechnology” may return the same job listing. I therefore want to keep only unique jobs, but identifying them is tricky. Consider a research associate job offered by Company X in two different locations. Company X posts two listings with the same job title, job summary, and job description; the two listings differ only in location. Similarly, for unknown reasons, companies sometimes post the same job description under different job titles. Deduplicating on the text fields (title, summary, and description) could therefore merge distinct jobs, so I choose not to filter on them.

To identify unique jobs, I create another variable that combines the title and location. For the first listing in Table 1, this variable takes the value “Research Associate, Library Prep_Stealth Mode Startup”. Using this variable as a filter, I remove duplicated listings. After the filter, the number of unique job listings falls to 622, a reduction of approximately 51%. At first glance, this reduction seems high. As a check, I also filtered using the job title, job description, and job summary; these filters yield a similar reduction.
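The deduplication step can be sketched in plain Python as below. This is a hypothetical illustration, not the report's actual code: the field names (`keyword`, `title`, `location`, `summary`, `description`), the helper names, and the toy listings are all assumptions. The main key joins title and location with an underscore, as described above, and a second key built from title, summary, and description plays the role of the cross-check.

```python
def dedupe(listings, key_fn):
    """Return listings in original order, keeping the first one for each key."""
    seen = set()
    unique = []
    for row in listings:
        key = key_fn(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def title_location_key(row):
    # Combined variable: title and location joined by "_" (hypothetical helper).
    return f"{row['title']}_{row['location']}"


def content_key(row):
    # Cross-check key: title, summary and description only; location is ignored.
    return (row["title"], row["summary"], row["description"])


# Toy listings (hypothetical): rows 1 and 2 are the same posting returned by
# two keywords; row 3 is the same job posted by the same company in another city.
listings = [
    {"keyword": "biotech", "title": "Research Associate", "location": "Boston, MA",
     "summary": "Run assays.", "description": "Same text."},
    {"keyword": "biotechnology", "title": "Research Associate", "location": "Boston, MA",
     "summary": "Run assays.", "description": "Same text."},
    {"keyword": "biotech", "title": "Research Associate", "location": "San Diego, CA",
     "summary": "Run assays.", "description": "Same text."},
]

by_title_location = dedupe(listings, title_location_key)
by_content = dedupe(listings, content_key)
print(len(by_title_location), len(by_content))  # 2 1


def reduction(n_before, n_after):
    """Share of listings removed by deduplication."""
    return 1 - n_after / n_before


print(f"{reduction(1260, 622):.1%}")  # 50.6%
```

On the toy data, the title-and-location key drops the cross-keyword duplicate but keeps the same job posted in two cities, while the content-only key collapses all three rows into one. The last line reproduces the arithmetic behind the reported reduction: 638 of 1260 listings are removed, about 51%.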

To summarize, I work with 622 listings for most of the analysis.

Where are the jobs located?

Figure 3: This plot shows the jobs by state. The colors show different job types.

Based on the information in Figure 2, I classify the jobs into three types. The first type is research associate, the most frequent job listing. The second type merges three other groups from the cluster diagram: titles with “research scientist”, “associate scientist”, and “research fellow”. The third type, “Other”, covers all remaining jobs.
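A rule-based version of this three-way classification might look like the sketch below. The report derives the groups from the cluster diagram in Figure 2; the case-insensitive substring matching and the order of the checks here are assumptions. Under these rules, a title such as “Associate Researcher” falls into “Other”.

```python
# Title phrases for the second job type, taken from the text.
SCIENTIST_OR_FELLOW = ("research scientist", "associate scientist", "research fellow")


def job_type(title):
    """Classify a job title into one of three types by substring matching.

    Assumption: checks run in this order, so a title containing
    "research associate" is never counted as scientist or fellow.
    """
    t = title.lower()
    if "research associate" in t:
        return "Research Associate"
    if any(phrase in t for phrase in SCIENTIST_OR_FELLOW):
        return "Scientist or Fellow"
    return "Other"


for title in ["Research Associate, Library Prep", "Associate Scientist I", "Associate Researcher"]:
    print(title, "->", job_type(title))
```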

Figure 3 plots the number of jobs in the states that advertise ten or more jobs. For example, Texas has twelve of the unique job listings. The figure also breaks down each state's jobs by the three job types. Three observations are in order:

  1. The green bars, representing the research associate type, are prevalent only in California and Massachusetts. Of the 118 research associate jobs, 65 are in California and 27 are in Massachusetts; together, the two states host 78% of these jobs.

  2. The second job type, which I label scientist or fellow (titles with “research scientist”, “associate scientist”, or “research fellow”), is also concentrated in California and Massachusetts: the two states host 29 of the 60 jobs of this type. Note that New York hosts six such jobs, but Figure 3 excludes it because the number of jobs is fewer than ten.

  3. The northeast region, consisting of Maryland, Pennsylvania, New York, and New Jersey, also appears to host jobs of the “Other” type.
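The state-level counts behind Figure 3 and the 78% share in the first observation can be sketched as follows. The `location` and `job_type` field names, the rule that the state is the first token after the last comma, and the toy rows are assumptions for illustration; the threshold of ten jobs comes from the text.

```python
from collections import Counter, defaultdict


def state_of(location):
    """Return the first token after the last comma, e.g. 'Newark, CA 94560' -> 'CA'.

    Assumption: locations look like 'City, ST' with an optional ZIP code.
    """
    parts = location.rsplit(",", 1)[-1].split()
    return parts[0] if parts else ""


def counts_by_state(rows, min_jobs=10):
    """Count jobs per state and job type, keeping states with >= min_jobs jobs."""
    totals = Counter()
    by_type = defaultdict(Counter)
    for row in rows:
        state = state_of(row["location"])
        totals[state] += 1
        by_type[state][row["job_type"]] += 1
    return {state: dict(by_type[state]) for state, n in totals.items() if n >= min_jobs}


# Toy rows (hypothetical): 10 jobs in California, 3 in Colorado.
example_rows = (
    [{"location": "Newark, CA", "job_type": "Research Associate"}] * 7
    + [{"location": "San Diego, CA 92121", "job_type": "Other"}] * 3
    + [{"location": "Denver, CO", "job_type": "Other"}] * 3
)
example = counts_by_state(example_rows)
print(example)  # {'CA': {'Research Associate': 7, 'Other': 3}}

# First observation: research associate jobs in California plus Massachusetts.
share = (65 + 27) / 118
print(f"{share:.0%}")  # 78%
```

Colorado drops out of the toy result because it has fewer than ten jobs, which mirrors how states below the threshold are left out of Figure 3.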