Review: ‘My Experience As a Freelance Data Scientist’

Article Summary

Having spent a year working as a freelance data consultant, Greg Reda shares some of his experiences and learnings on the job. What were some of his hard-learned lessons? The main four are below:

1. Keep it simple, stupid

a. Let the client set the direction of the work, and don't do more than asked
b. Keep it simple unless told otherwise

2. Try to get systems access before the project begins

a. Getting access to data can take days, so it is best to get the ball rolling before the project starts

3. Productize the consulting

a. Have a deliverable product to give the client at the end of the project 
b. Have a fixed price already in mind for creating it

4. Don’t bill hourly

a. Tracking hours is difficult and limits margins 
b. Better to sell daily/weekly rates or productized consulting

Ultimately, Greg concludes that it was a good year in which he grew a lot as a data scientist. He went into it with the idea that there is:

a population of companies trying to figure out how to utilize their data, who are not interested in bringing on a consulting firm ($$$), and don’t necessarily know if they need a data scientist full-time yet.

Unfortunately, a lifestyle as a freelance data scientist can be difficult to maintain long term. While the freedom and options that are inherent in freelance work are nice, naturally there are downsides. There is no team working with you and, as one person, projects can become difficult to tackle. Additionally, companies are risk-averse and will not often hire an outside contractor to design an integral part of their infrastructure. Rather, freelancers are usually stuck with the dirty work that nobody on the company payroll wants to do. Still, for those who can tough out that mercenary sort of lifestyle, where you just do the job and get paid, Greg says there is certainly demand and pay for the roving freelance data scientist.

Statistics & Interpretations

According to Kaggle's State of Data Science and Machine Learning 2019 survey, the breakdown of data scientists by company size is:

Company Size	Count
0-49 employees	3530
50-249 employees	2088
250-999 employees	1651
1000-9,999 employees	2418
> 10,000 employees	2810

As seen in the table above, 3,530 data scientists work at companies of 0-49 employees (the smallest bracket that the survey measured). This is the largest group of participants, and it is quite possible that some of them are freelancers (where they are the only employee). Interestingly, the second largest group (2,810) works at companies with more than 10,000 employees, suggesting that there is a wide range of vocational options for those in the data science field. The Frequency Table of Company Size by Number of Employees is shown below:

Further information can be found in the executive summary of the Kaggle 2019 Survey of Data Science here.
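
To put the counts above in proportion, here is a minimal R sketch that converts them into percentage shares. The counts are copied from the table; the data frame and column names (company_size, size, count, share) are my own labels and do not come from the Kaggle survey.

```r
# Counts copied from the Kaggle 2019 table above; all names are illustrative.
company_size <- data.frame(
  size  = c("0-49", "50-249", "250-999", "1000-9,999", "> 10,000"),
  count = c(3530, 2088, 1651, 2418, 2810)
)

# Share of respondents in each bracket, as a percentage rounded to one decimal.
company_size$share <- round(100 * company_size$count / sum(company_size$count), 1)

# Sort from the largest bracket to the smallest.
company_size[order(-company_size$count), ]
```

By this calculation the 0-49 bracket holds roughly 28% of the 12,497 respondents in the table, and the > 10,000 bracket roughly 22%.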

More About the Author

Greg Reda

Greg Reda is a Californian data scientist and software engineer currently based in San Francisco. He grew up and spent much of his life in Chicago before moving to San Francisco. His hobbies include pizza-making, cycling, baking, and music. Currently, Greg works as a Machine Learning Engineer on the Delivery Logistics Team at Instacart, an app-based company that delivers groceries to people’s doors. You can see his full resume here.

His blog, where he talks about all things data science, can be reached here.

My Opinion About the Article

I believe this article gives great advice for data scientists who are weighing full-time work against freelancing. It breaks down the benefits of independent contracting (particularly in pay, freedom, and flexibility), while not shying away from mentioning the negatives (tediousness, lack of fulfillment). I like the list of lessons Greg learned from the experience - it can remind anyone who works with data to keep things simple and productize their work. It would have been helpful if Greg had also discussed his experience working as a full-time employee at a single company, perhaps covering his previous work as a Data Analyst Manager at GrubHub or his current work at Instacart.

Related Sources

Similar articles discussing data science freelancing can be found through the links below:

I Became a Freelancing Data Scientist by Wei Lin (August 4, 2019)

How to become a freelance Data Scientist by Carl Dawson (March 9, 2019)

How to Become a Freelance Data Scientist by Michael Rundell (August 1, 2016)

Freelancer or Employee: Your Best Arguments by Melanie Pinola (February 25, 2016)

Greg Reda’s Other Stuff

Greg Reda discusses a number of interesting data science topics on his website; some particularly eye-catching articles are linked below:

Data-Informed vs Data-Driven

Principles of Good Data Analysis

Cohort Analysis with Python

Hiring Data Scientists

Example Dataset

Using the datasets from the “nycflights13” package, we can take advantage of R Markdown’s flexibility to examine flights that departed NYC-area airports during 2013. In the table below of mean departure and arrival delays (in minutes), the carrier with the worst average delays is Frontier Airlines Inc. (averaging a 20.2 min departure delay and a 21.9 min arrival delay).

name	mean(Departure_Delay)	mean(Arrival_Delay)
AirTran Airways Corporation	18.605984	20.1159055
Alaska Airlines Inc.	5.830748	-9.9308886
American Airlines Inc.	8.569130	0.3642909
Delta Air Lines Inc.	9.223950	1.6443409
Endeavor Air Inc.	16.439574	7.3796692
Envoy Air	10.445381	10.7747334
ExpressJet Airlines Inc.	19.838929	15.7964311
Frontier Airlines Inc.	20.201175	21.9207048
Hawaiian Airlines Inc.	4.900585	-6.9152047
JetBlue Airways	12.967548	9.4579733
Mesa Airlines Inc.	18.898897	15.5569853
SkyWest Airlines Inc.	12.586207	11.9310345
Southwest Airlines Co.	17.661657	9.6491199
United Air Lines Inc.	12.016908	3.5580111
US Airways Inc.	3.744693	2.1295951
Virgin America	12.756646	1.7644644
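
A table like the one above could be produced along the following lines. This is a sketch and not the code behind the original analysis: it assumes the dplyr and nycflights13 packages are installed, and that Departure_Delay and Arrival_Delay are renamed versions of the package's dep_delay and arr_delay columns. The names carrier_delays, mean_departure_delay and mean_arrival_delay are my own. Depending on how missing values were handled, the averages may differ slightly from the table.

```r
library(dplyr)
library(nycflights13)

carrier_delays <- flights %>%
  # Attach the full carrier name from the airlines lookup table.
  left_join(airlines, by = "carrier") %>%
  # Assumed renaming, to match the column labels used in this post.
  rename(Departure_Delay = dep_delay, Arrival_Delay = arr_delay) %>%
  group_by(name) %>%
  # Some flights (for example cancelled ones) have missing delays,
  # so drop NAs when averaging.
  summarise(
    mean_departure_delay = mean(Departure_Delay, na.rm = TRUE),
    mean_arrival_delay   = mean(Arrival_Delay, na.rm = TRUE)
  )

# Carriers ordered from worst to best average departure delay.
arrange(carrier_delays, desc(mean_departure_delay))
```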

Furthermore, when we filter the dataset to just United Air Lines Inc. (UA), we can plot the departure delay against the arrival delay for each UA flight as a scatterplot (see below). This shows a strong positive correlation between the two values: as one goes up, so does the other. This makes sense, since a large departure delay would usually carry through to a commensurate arrival delay. The correlation coefficient was calculated to be 0.885 (see below).

cor(flightsUA$Departure_Delay, flightsUA$Arrival_Delay)
0.8853862
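
For completeness, here is a sketch of how flightsUA might be built before the cor() call above. It assumes the dplyr and nycflights13 packages and the same renamed columns. It also drops rows with a missing delay, because cor() returns NA under its default settings when either vector contains missing values.

```r
library(dplyr)
library(nycflights13)

flightsUA <- flights %>%
  # Keep only United flights.
  filter(carrier == "UA") %>%
  # Assumed renaming, to match the column labels used in this post.
  rename(Departure_Delay = dep_delay, Arrival_Delay = arr_delay) %>%
  # Keep only flights where both delays were recorded.
  filter(!is.na(Departure_Delay), !is.na(Arrival_Delay))

# Pearson correlation between departure and arrival delay for UA flights.
ua_cor <- cor(flightsUA$Departure_Delay, flightsUA$Arrival_Delay)
ua_cor
```

An alternative is to skip the second filter and pass use = "complete.obs" to cor(), which ignores incomplete pairs in the same way.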

References Section

To get the graphic of the data scientist: Data Scientist

To get the picture of Greg Reda: Greg Reda

To help create the map of California using ggplot2: Drawing beautiful maps programmatically with R, sf, and ggplot2

To get the graphic of the social media icons: Social Media