Article Link: My Experience As a Freelance Data Scientist
Written by Greg Reda on January 07, 2017.
The Freelance Data Scientist
Having spent a year working as a freelance data consultant, Greg Reda shares some of his experiences and lessons from the job. What were some of his hard-won lessons? The main four are below:
1. Keep it simple, stupid
a. Let the client set the direction of the work and don't do more than you are asked
b. Keep it simple unless told otherwise
2. Try to get systems access before the project begins
a. Getting access to data can take days, so it's best to get the ball rolling before the project starts
3. Productize the consulting
a. Have a deliverable product to give the client at the end of the project
b. Have a fixed price already in mind for creating it
4. Don’t bill hourly
a. Tracking hours is difficult and limits margins
b. Better to sell daily/weekly rates or productized consulting
Ultimately, Greg concludes that it was a good year in which he grew a lot as a data scientist. He went into it with the idea that there is:
> a population of companies trying to figure out how to utilize their data, who are not interested in bringing on a consulting firm ($$$), and don’t necessarily know if they need a data scientist full-time yet.
Unfortunately, a lifestyle as a freelance data scientist can be difficult to maintain long term. While the freedom and options that are inherent in freelance work are nice, naturally there are downsides. There is no team working with you and, as one person, projects can become difficult to tackle. Additionally, companies are risk-averse and will not often hire an outside contractor to design an integral part of their infrastructure. Rather, freelancers are usually stuck with the dirty work that nobody on the company payroll wants to do. Still, for those who can tough out that mercenary sort of lifestyle, where you just do the job and get paid, Greg says there is certainly demand and pay for the roving freelance data scientist.
According to Kaggle’s State of Data Science and Machine Learning 2019 survey, the breakdown of surveyed data scientists by company size is:
| Company Size | Count |
|---|---|
| 0-49 employees | 3530 |
| 50-249 employees | 2088 |
| 250-999 employees | 1651 |
| 1,000-9,999 employees | 2418 |
| > 10,000 employees | 2810 |
As seen in the table above, 3,530 data scientists work at companies with 0-49 employees (the smallest bracket that the survey measured). This is the largest group of participants, and it is quite possible that a number of them are freelancers (the sole employee). Interestingly, the second largest group of data scientists (2,810) works at companies with more than 10,000 employees, suggesting that there is a wide range of vocational options for those in the data science field. A quick way to chart this breakdown is sketched below.
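As an illustration, the survey counts can be charted with ggplot2. The sketch below simply hard-codes the five brackets and counts from the table above; the object and column names are my own, not taken from the Kaggle report.

```r
library(ggplot2)

# Respondent counts by company size, copied from the Kaggle 2019 survey table above.
company_size <- data.frame(
  bracket = factor(c("0-49", "50-249", "250-999", "1,000-9,999", "> 10,000"),
                   levels = c("0-49", "50-249", "250-999", "1,000-9,999", "> 10,000")),
  count = c(3530, 2088, 1651, 2418, 2810)
)

# Bar chart of the breakdown, with the smallest companies on the left.
ggplot(company_size, aes(x = bracket, y = count)) +
  geom_col() +
  labs(x = "Company size (employees)", y = "Respondents",
       title = "Kaggle 2019 survey respondents by company size")
```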
Further information can be found in the executive summary of Kaggle’s 2019 State of Data Science and Machine Learning survey here.
I believe this article gives great advice for data scientists who are considering whether to work full-time or freelance. It breaks down the benefits of independent contracting (particularly in pay, freedom, and flexibility) while not shying away from the negatives (tediousness, lack of fulfillment). I like the list of lessons Greg learned from the experience - it can remind everyone who works with data to keep it simple and productize their work. It would have been helpful if Greg had also discussed his experience working full-time for a single company, perhaps more about his previous work as a Data Analyst Manager at GrubHub or his current work with Instacart.
Using the datasets from the “nycflights13” package, we can take advantage of R Markdown’s flexibility to examine flights that departed the New York City-area airports in 2013. Looking at the table below of mean departure and arrival delays, we can see that the carrier with the worst average delays is Frontier Airlines Inc. (a 20.2 min mean departure delay and a 21.9 min mean arrival delay). A sketch of the code used to build this summary follows the table.
| Carrier | Mean Departure Delay (min) | Mean Arrival Delay (min) |
|---|---|---|
| AirTran Airways Corporation | 18.605984 | 20.1159055 |
| Alaska Airlines Inc. | 5.830748 | -9.9308886 |
| American Airlines Inc. | 8.569130 | 0.3642909 |
| Delta Air Lines Inc. | 9.223950 | 1.6443409 |
| Endeavor Air Inc. | 16.439574 | 7.3796692 |
| Envoy Air | 10.445381 | 10.7747334 |
| ExpressJet Airlines Inc. | 19.838929 | 15.7964311 |
| Frontier Airlines Inc. | 20.201175 | 21.9207048 |
| Hawaiian Airlines Inc. | 4.900585 | -6.9152047 |
| JetBlue Airways | 12.967548 | 9.4579733 |
| Mesa Airlines Inc. | 18.898897 | 15.5569853 |
| SkyWest Airlines Inc. | 12.586207 | 11.9310345 |
| Southwest Airlines Co. | 17.661657 | 9.6491199 |
| United Air Lines Inc. | 12.016908 | 3.5580111 |
| US Airways Inc. | 3.744693 | 2.1295951 |
| Virgin America | 12.756646 | 1.7644644 |
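The summary above can be reproduced with dplyr. The sketch below is a reconstruction under the assumption that the averages were computed per carrier after joining the airlines lookup table and dropping flights with missing delays (e.g., cancellations); exact values may differ slightly depending on how the original code handled missing data.

```r
library(nycflights13)  # flights that departed NYC airports in 2013
library(dplyr)

# Mean departure and arrival delays per carrier, joined to full carrier names.
# Missing delays (cancelled or diverted flights) are dropped before averaging.
carrier_delays <- flights %>%
  filter(!is.na(dep_delay), !is.na(arr_delay)) %>%
  left_join(airlines, by = "carrier") %>%
  group_by(name) %>%
  summarise(mean_departure_delay = mean(dep_delay),
            mean_arrival_delay   = mean(arr_delay)) %>%
  arrange(name)

carrier_delays
```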
Furthermore, when we filter the dataset to just the carrier United Air Lines Inc., we can plot each UA flight’s departure delay against its arrival delay to create a scatterplot (see below). The plot shows a strong positive correlation between the two values: as one goes up, so does the other. This makes sense, since a large departure delay should correspond to a commensurately large arrival delay. The correlation coefficient was calculated and found to be 0.885 (see below); the code behind this step is sketched after the correlation output.
| cor(flightsUA$Departure_Delay, flightsUA$Arrival_Delay) |
|---|
| 0.8853862 |
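The cor() call above references an object flightsUA with columns Departure_Delay and Arrival_Delay. A minimal sketch of how that object, the correlation, and the scatterplot could be produced is below; the renaming of dep_delay/arr_delay and the filtering of missing values are my assumptions, not necessarily the original code.

```r
library(nycflights13)
library(dplyr)
library(ggplot2)

# Keep only United Air Lines (UA) flights with recorded delays, and rename
# the delay columns to match the names used in the cor() call above.
flightsUA <- flights %>%
  filter(carrier == "UA", !is.na(dep_delay), !is.na(arr_delay)) %>%
  rename(Departure_Delay = dep_delay, Arrival_Delay = arr_delay)

# Correlation between departure and arrival delays (reported as ~0.885 above).
cor(flightsUA$Departure_Delay, flightsUA$Arrival_Delay)

# Scatterplot of departure vs. arrival delay for each UA flight.
ggplot(flightsUA, aes(x = Departure_Delay, y = Arrival_Delay)) +
  geom_point(alpha = 0.2) +
  labs(x = "Departure delay (min)", y = "Arrival delay (min)",
       title = "United Air Lines: departure vs. arrival delay")
```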
Social Media
Greg Reda’s Social Media:
You can reach his LinkedIn here.
Here is the link to his Twitter.
You can access his GitHub here.