I will discuss how DoorDash uses ML/AI to power its on-demand food delivery logistics. Running a three-sided marketplace creates challenges, and the company mitigates them with reinforcement learning and supervised learning algorithms. The company also uses operations research to handle multiple decisions, feeding objectives and constraints into an optimization system when it retrains its models. Overall, DoorDash is one of the most successful food delivery companies in America, as evidenced by its ability to outperform competitors such as Foodora and Uber Eats.
I will try to answer the following questions:
How does DoorDash use ML/AI to optimize its business?
What types of data, ML algorithms and performance-improving tricks does the company use (with examples)?
What challenges and problems has the company encountered in terms of implementation and operation of their ML/AI systems?
What are the most important business gains and benefits that ML/AI creates for DoorDash?
How does ML/AI governance work within DoorDash?
Many people still associate Artificial Intelligence (AI) with science fiction dystopias, but this perception is changing as AI advances and becomes more commonplace in our daily lives. Technological progress will continue to change the way we work, live and run our businesses. This is reflected in companies' enthusiasm for applying AI and Machine Learning (ML), for reasons such as lower operating costs through predictive analysis. The number of companies implementing AI and ML remains high, as evidenced by increased spending and research and development (R&D) directed at AI and ML implementation. As more companies profess to be AI/ML-driven, the effects are already being amplified: industries such as manufacturing, retail, service delivery, finance and education are transforming their core processes and business models.
According to Russell & Norvig, authors of one of the most widely used AI textbooks, machine learning is one of the most common types of AI used in business today, and it is primarily used to process large amounts of data quickly in support of business objectives. Many companies across industries have implemented ML and AI to reach their goals and increase revenue. I will therefore address how ML and AI have helped the food delivery company DoorDash mitigate the challenges of its three-sided marketplace business model. I will focus mainly on how DoorDash has benefited from reinforcement learning and recommendation systems to optimize its business, as evidenced by its ability to outperform its competitors.
Using a combination of machine-learning models, DoorDash's logistics engine serves personalized restaurant recommendations and delivery-time predictions to customers who want on-demand access to their local businesses. At the same time, it assigns Dashers to orders, sorting through trillions of options to find the best routes while dynamically calculating delivery prices. This digital logistics engine connects a three-sided marketplace of merchants, customers and independent contractors known as Dashers. Each side of the marketplace uses the platform for different reasons, and each brings its own challenges, which I will address. I will also discuss the reinforcement learning, gradient-boosted tree and logistic regression models that DoorDash uses to mitigate these challenges and optimize its business.
One of the most difficult challenges DoorDash has faced, and still faces, is its high-variance environment. It is difficult for the company to train consistent ML models because many natural and special events, such as COVID-19 and the Super Bowl, affect the performance of trained models. This affects the deployment, execution and performance of the models: they are harder to monitor because they keep receiving data they were not trained on.
The company also faces operational challenges such as scaling and complexity. It is difficult to manage data on a massive scale for users with different skill sets. Because the company relies on a tremendous amount of data, controlling costs is another major operational challenge: paying for data storage, vendors such as Amazon Web Services (AWS) and licences is very expensive and difficult to manage.
Figure 1. Assignment problem at DoorDash. Source: (Hwang, Ren & Tang, 2018).
As shown in Figure 1, between Dashers and merchants there is an assignment problem: how to assign Dashers to deliveries, together with producing travel estimates and hotspots that show Dashers where it is busy (Hwang, Ren & Tang, 2018). Between merchants and consumers there are typical e-commerce problems such as recommendations or personalization, search ranking and demand distribution. In addition, given the huge selection of merchants, it is difficult to determine which store a consumer is interested in.
DoorDash relies on real-time data from merchants, consumers and Dashers. This data consists of the type of food, prices and preparation time from the merchant, the travel time from merchant to consumer, and the consumer's preferred merchants. ML algorithms and models use all of this data to produce real-time estimates, recommendations for consumers, and search rankings of consumers' preferred merchants, along with suggestions. Because the company uses multiple matrices to match multiple deliveries, Dashers and merchants, it uses some tricks when retraining its models: it combines mixed-integer programming with Kotlin, a deep neural network agent, ML and an operations research system.
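To make the assignment problem concrete, here is a minimal sketch in Python. It is not DoorDash's dispatch algorithm: the Dasher names, order names and travel times are invented for illustration, and trying every pairing only works for a handful of Dashers.

```python
from itertools import permutations

# Hypothetical travel times in minutes: travel_minutes[dasher][order].
travel_minutes = {
    "dasher_a": {"order_1": 4, "order_2": 9, "order_3": 7},
    "dasher_b": {"order_1": 6, "order_2": 3, "order_3": 8},
    "dasher_c": {"order_1": 5, "order_2": 7, "order_3": 2},
}


def best_assignment(costs):
    """Return (total_cost, {dasher: order}) with the lowest total travel time.

    Every one-to-one pairing is tried, so this is only practical for tiny
    inputs; real systems rely on dedicated optimization solvers.
    """
    dashers = sorted(costs)
    orders = sorted(next(iter(costs.values())))
    best_total, best_pairs = None, None
    for ordering in permutations(orders, len(dashers)):
        total = sum(costs[d][o] for d, o in zip(dashers, ordering))
        if best_total is None or total < best_total:
            best_total, best_pairs = total, dict(zip(dashers, ordering))
    return best_total, best_pairs


total, pairs = best_assignment(travel_minutes)
print(total, pairs)
```

With three Dashers there are only six pairings to check, but the number of pairings grows factorially, which is one reason the problem becomes hard at marketplace scale.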
Supervised learning algorithm (gradient-boosted trees and logistic regression)
DoorDash uses supervised learning, an AI technique that can be applied to complex, multiple decisions. As shown in Figure 2, supervised learning gives DoorDash access to classification techniques, which enable it to predict discrete responses.
Figure 2. ML technique: supervised learning. Source: author's own.
The company also benefits from regression techniques, which help it predict continuous responses (MathWorks, 2020). Overall, a supervised learning algorithm trains a model on a known set of input data and known responses to that data (outputs) so that it can generate reasonable predictions for the response to new data. This is shown in detail in Figure 2 below. A gradient boosting model starts by training a decision tree with equal weights for each observation; in boosting, each new tree is then fit on a modified version of the original data set (Singh, 2018).
Figure 2. Schematic diagram of a gradient boosted ensemble of decision trees. Source: (Shoaran et al., 2018)
Within supervised learning, DoorDash specifically uses a gradient boosted decision tree model (GBM) (Cai, 2020). This model uses dozens of estimators and features taken from a table designed to aggregate features that are useful for the company's machine learning tasks, such as the days with high delivery demand, the number of customers gained, and whether a day was a holiday or a special event. The model uses a quantile loss function (Anggraina et al., 2019).
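The boosting idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DoorDash's model: it uses one-split trees (stumps), squared-error loss, and made-up data, and all function names are hypothetical. With squared error, the "modified version" of the data that each new tree sees is simply the residual left by the trees before it.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree (stump) that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((t - left_mean) ** 2 for t in left)
        error += sum((t - right_mean) ** 2 for t in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosted_stumps(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical data: items in an order vs. preparation time in minutes.
items = [1, 2, 3, 4, 5, 6]
prep_minutes = [5, 6, 8, 11, 15, 20]

mean_value = sum(prep_minutes) / len(prep_minutes)
baseline_mse = mse(lambda x: mean_value, items, prep_minutes)
boosted_mse = mse(fit_boosted_stumps(items, prep_minutes), items, prep_minutes)
print(baseline_mse, boosted_mse)
```

Each round corrects part of the error left by the previous rounds, so the boosted model's training error ends up below that of the constant baseline.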
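A quantile loss (also called pinball loss) penalizes under-prediction and over-prediction differently. The sketch below is a generic Python illustration with invented numbers and a hypothetical function name; the quantile level of 0.75 is an assumption chosen for the example, not a value from the source.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Average quantile (pinball) loss for a quantile level between 0 and 1."""
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        error = actual - predicted
        # Under-prediction (error > 0) costs quantile * error;
        # over-prediction (error < 0) costs (1 - quantile) * |error|.
        total += max(quantile * error, (quantile - 1) * error)
    return total / len(y_true)


# Hypothetical demand values and two forecasts that miss by 10 in each direction.
actual = [30, 40]
too_low = [20, 30]
too_high = [40, 50]

loss_low = pinball_loss(actual, too_low, quantile=0.75)    # 7.5
loss_high = pinball_loss(actual, too_high, quantile=0.75)  # 2.5
print(loss_low, loss_high)
```

At a quantile level above 0.5, forecasts that are too low cost more than forecasts that are too high by the same amount, which pushes the model toward the upper part of the distribution.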
Reinforcement learning algorithm (deep neural network agent)
To mitigate the assignment challenge described earlier, DoorDash turned to reinforcement learning, one of the most powerful AI techniques in ML. As shown in Figure 3, the algorithm takes raw data as input, such as consumer quoted times, estimated order-ready times, travel estimates, routing (multiple-delivery assignments) and Dasher utilization. All of this data is used by the deep neural network agent, which learns from feedback in the form of a reward (Hwang, Ren & Tang, 2018).
Figure 3. Reinforcement learning algorithm. Source: (TechVidvan, 2020)
As Figure 3 shows, reinforcement learning is driven by feedback, unlike supervised and unsupervised learning (TechVidvan, 2020). Because DoorDash's three-sided marketplace must support merchants, Dashers and consumers at the same time, the company needs an algorithm that can handle multiple matrices and make a variety of decisions as requested by any side of the marketplace. Using a deep neural network agent, DoorDash trains its models to make a sequence of decisions (Ren, 2020). In an uncertain, potentially complex environment, the agent learns to achieve a goal.
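The feedback loop can be illustrated with the simplest form of reinforcement learning, an epsilon-greedy bandit, written here in Python. This is a toy sketch and not DoorDash's deep agent: there is no neural network and no state, and the action names and reward values are invented for illustration.

```python
import random

# Hypothetical average reward of each dispatch action (higher is better).
TRUE_MEAN_REWARD = {"assign_nearest": 6.0, "batch_orders": 3.0, "wait": 1.0}


def environment_reward(action, rng):
    """Simulated environment: returns a noisy reward for the chosen action."""
    return TRUE_MEAN_REWARD[action] + rng.uniform(-0.5, 0.5)


def train_agent(n_steps=300, epsilon=0.1, seed=0):
    """Learn the average reward of each action from feedback alone."""
    rng = random.Random(seed)
    actions = list(TRUE_MEAN_REWARD)
    value = {a: 0.0 for a in actions}  # running average reward per action
    count = {a: 0 for a in actions}
    for step in range(n_steps):
        if step < len(actions):
            action = actions[step]                # try every action once
        elif rng.random() < epsilon:
            action = rng.choice(actions)          # explore
        else:
            action = max(actions, key=value.get)  # exploit the best estimate
        reward = environment_reward(action, rng)
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
    return value


learned = train_agent()
best_action = max(learned, key=learned.get)
print(best_action, learned)
```

The agent is never told which action is best. It only sees rewards, and its estimates improve as it balances trying new actions against repeating the one that has paid off so far.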
ML/AI powers all of DoorDash's deliveries through its logistics engine. The high-level goal of the logistics engine is to execute deliveries quickly and efficiently. This means ML/AI is used to optimize on-time delivery, keep the food warm and provide an efficient delivery marketplace. The dispatch algorithm uses ML to decide how to assign Dashers to deliveries, to predict travel estimates and to generate hotspots that show where it is busy. Moreover, because of ML/AI, DoorDash is able to predict demand and supply ahead of time and pull levers to keep them in balance (Hwang, Ren & Tang, 2018).
Because DoorDash uses a variety of ML algorithms, models and techniques, it has gained many business benefits since it started. These benefits include reduced operational expenses, increased trust (brand equity) from both internal stakeholders (employees and Dashers) and external stakeholders (consumers and merchants), increased revenue, more deals signed between DoorDash and merchants, and more consumers. All of these benefits came through the use of reinforcement learning, supervised learning and an operations research system (Ren, 2020).
Through the use of supervised learning, DoorDash became one of the most reliable, flexible and efficient food delivery companies in America in terms of precise delivery-time estimates, travel-time estimates, dynamic pricing and food preparation time estimates. This is enabled by classification techniques, which help predict discrete responses. DoorDash has also gained trust for its consistency in predicting continuous responses, for which it uses regression techniques. As a result, the company has gained more customers, leading to an increase in revenue (Tonse, 2020).
By using reinforcement learning, DoorDash has managed to outperform competitors such as GrubHub, Uber Eats, Postmates and Foodora. This makes DoorDash a unique food delivery company, as it can simplify complex challenges in the service delivery industry, such as recommendations and assignments, in a way that supports stable on-demand logistics (Tonse, 2020).
To keep all ML/AI decisions at DoorDash organized, the company relies on the four pillars of its ML platform: the modeling library, the model training pipeline, the features service and the prediction service. In addition, operations research and mathematics are used to calculate estimates on different matrices.
Figure 4. Architecture of DoorDash ML. Source: (Reddy, 2020)
Modeling Library
As shown in Figure 4, DoorDash uses a Python library as its modeling library. It is used to train models, to create model artifacts that the prediction service loads, and to make offline predictions (Reddy, 2020).
Model Training Pipeline
For production use, a pipeline is built to provide a platform on which models can be trained. Once a production script is pushed into a repo, the pipeline takes care of training the model and uploading the artifacts to the model store (Reddy, 2020). In other words, if the modeling library is the compiler that generates the model, then the model training pipeline is the build system (Reddy, 2020).
Features Service
To capture the environment state required for prediction, feature computation, feature storage and feature serving are needed. Feature computations can be either historical or real-time (Reddy, 2020).
Prediction Service
This service is in charge of loading models from the model store, evaluating the model in response to a request, fetching features from the feature store, generating prediction logs, and supporting shadowing and A/B testing (Reddy, 2020).
DoorDash relies on information from merchants, consumers and Dashers to run its on-demand logistics efficiently, so the company can be considered data hungry. This is shown by its reliance on real-time data and ML/AI. Compared with competitors such as Foodora and Uber Eats, DoorDash outperforms most of them because its use of ML/AI has helped it gain trust and increase revenue. Since DoorDash aims to provide a reliable, flexible and efficient three-sided marketplace, it is important for the company to maintain the balance of supply and demand between consumers and Dashers at various levels, such as dynamic pricing and delivery time. DoorDash made a good choice in using reinforcement learning and supervised learning algorithms. However, these are not enough on their own, as there are multiple matrices that require multiple decisions for merchants, consumers and Dashers. To balance this, DoorDash also uses operations research: when retraining its models, it feeds them into an optimization system together with objectives and constraints.
In conclusion, as companies in the service delivery industry profess to be ML/AI-driven, they can face challenges when implementing ML/AI, such as data storage cost, high variance, scaling and complexity. Because on-demand logistics requires tremendous amounts of data for analysis, DoorDash found data storage on AWS very expensive. Given the complexity of the matrices that DoorDash needed to tackle in its three-sided marketplace, it was also a challenge to scale down so that all required decisions could be executed. Although the company faced these challenges, it managed to mitigate them by using a supervised learning algorithm (gradient boosted decision trees), a reinforcement learning algorithm (deep neural network agent), PDM and operations research systems. By using these ML/AI algorithms, models and techniques, DoorDash realized various business benefits such as increased revenue, stakeholder trust and brand equity, and it also managed to outperform the majority of its competitors.
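The responsibilities listed above can be sketched as a single function in Python. Everything here is hypothetical and simplified: the stores are plain dictionaries, the "models" are small formulas, and the names (`MODEL_STORE`, `FEATURE_STORE`, `predict`) are invented for illustration rather than taken from DoorDash's platform.

```python
# Hypothetical model store: model id -> callable that scores a feature dict.
MODEL_STORE = {
    "eta_v1": lambda f: 10.0 + 2.0 * f["distance_km"],
    "eta_v2": lambda f: 8.0 + 2.5 * f["distance_km"],  # candidate model
}

# Hypothetical feature store: entity id -> precomputed features.
FEATURE_STORE = {"order_42": {"distance_km": 3.0}}

# Prediction log, kept so that predictions can be analyzed later.
PREDICTION_LOG = []


def predict(entity_id, model_id, shadow_model_id=None):
    """Fetch features, evaluate the model, and log the result.

    If a shadow model is given, it is evaluated on the same features and
    logged, but its output is never returned to the caller.
    """
    features = FEATURE_STORE[entity_id]
    prediction = MODEL_STORE[model_id](features)
    record = {"entity": entity_id, "model": model_id, "prediction": prediction}
    if shadow_model_id is not None:
        record["shadow_model"] = shadow_model_id
        record["shadow_prediction"] = MODEL_STORE[shadow_model_id](features)
    PREDICTION_LOG.append(record)
    return prediction


eta_minutes = predict("order_42", "eta_v1", shadow_model_id="eta_v2")
print(eta_minutes, PREDICTION_LOG[-1])
```

Shadowing in this sense lets a candidate model be compared with the live one on real requests before any traffic depends on it.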