SYNDICATED POST
By Shrinath Agarwal - Software Developer at Swiggy. Hosted on Medium.
Machine learning models play a critical role in providing a world-class customer experience and making business decisions at Swiggy. The Data Science Platform (DSP) powers ML and AI in production at Swiggy: it provides personalized recommendations, curates search results, helps assign delivery executives (DEs) to orders, recognizes fraud, and more. It is therefore important for DSP to implement services that can serve machine learning at scale within bounded latency SLAs. This post focuses on the optimizations the DSP team has made to improve latency at scale.
The DSP system is built on top of Akka to follow the Reactive principles described in Reactive Architecture. All work within the system happens via actors; multiple actors (e.g., ModelManager, ModelDiscovery, ModelPredictor, FeatureExtractor) are involved in serving a single prediction request. These actors communicate through a message-driven architecture, which helps ensure elasticity, resiliency, and responsiveness. The internal codebase is written primarily in Scala. DSP uses Redis as its primary feature store (i.e., a centralized repository for all precomputed features), along with feature sharing and discovery capabilities across the organization.
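To make the message-driven flow concrete, here is a minimal Akka Typed sketch of a predictor actor. The actor, message, and field names below are illustrative assumptions, not DSP's actual protocol, and the scoring is a placeholder.

```scala
// Minimal sketch of message-driven prediction serving with Akka Typed.
// ModelPredictor, Predict, and Prediction are illustrative names, not DSP's actual protocol.
import akka.actor.typed.{ActorRef, ActorSystem, Behavior}
import akka.actor.typed.scaladsl.Behaviors

object ModelPredictor {
  // Protocol: a prediction request carries precomputed features and a reply address.
  final case class Predict(entityId: String, features: Map[String, Double], replyTo: ActorRef[Prediction])
  final case class Prediction(entityId: String, score: Double)

  def apply(): Behavior[Predict] =
    Behaviors.receiveMessage { case Predict(id, features, replyTo) =>
      // Placeholder scoring; a real predictor would invoke the loaded model artifact.
      val score = features.values.sum
      replyTo ! Prediction(id, score)
      Behaviors.same
    }
}

object Main extends App {
  val system: ActorSystem[ModelPredictor.Predict] = ActorSystem(ModelPredictor(), "dsp-sketch")

  // Upstream actors (FeatureExtractor, ModelManager, etc.) would send messages like this
  // instead of calling the predictor synchronously; ignoreRef discards the reply here.
  system ! ModelPredictor.Predict(
    "order-123:de-456",
    Map("distanceKm" -> 2.4),
    system.ignoreRef[ModelPredictor.Prediction]
  )
}
```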
Many use cases require multiple predictions in a single API call. This fanout (a single API call creating multiple prediction calls) increases the throughput DSP services must handle, and the primary challenge becomes bounding the latency of the API call at high throughput. For many top-of-the-funnel ML models, that throughput grows with the business itself. One such use case is the Order-Assignment flow at Swiggy, which decides which order should be assigned to which delivery executive in a near-optimal way; this process involves multiple machine learning models.
Order-Assignment is generally fulfilled by city-level batch jobs that trigger at a configured time interval and make predictions at each edge (Order × DE) level. These predictions are then used as input to an algorithm that arrives at the most optimal assignment.
This requires predictions for a large number of possible Order × DE pairs at once, within a strict latency bound. Other use cases, such as discounting and ads, also require batch predictions at scale due to their high fanout factor.
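As an illustration of the fanout, the sketch below builds the Order × DE edges for one assignment cycle and chunks them into fixed-size batches. The names (Order, DeliveryExecutive, Edge) and the counts are assumptions for illustration, not actual production values.

```scala
// Illustrative sketch of the Order x DE fanout; names and counts are assumptions.
object FanoutSketch extends App {
  final case class Order(id: String)
  final case class DeliveryExecutive(id: String)
  final case class Edge(orderId: String, deId: String)

  // Every open order in a city is paired with every eligible DE.
  def edges(orders: Seq[Order], des: Seq[DeliveryExecutive]): Seq[Edge] =
    for {
      o  <- orders
      de <- des
    } yield Edge(o.id, de.id)

  val orders = (1 to 200).map(i => Order(s"order-$i"))
  val des    = (1 to 150).map(i => DeliveryExecutive(s"de-$i"))

  // 200 orders x 150 DEs = 30,000 edges to score within one assignment cycle.
  val allEdges = edges(orders, des)

  // Grouping edges into fixed-size batches bounds the size of each prediction call.
  val batches = allEdges.grouped(30).toSeq
  println(s"${allEdges.size} edges -> ${batches.size} prediction batches of up to 30")
}
```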
The above challenges boil down to exposing a batch prediction API that lets clients make bulk predictions (i.e., combine multiple requests into one). From a client's point of view, this reduces the total network round-trip time (RTT) compared to issuing an individual request for each prediction. DSP exposes a predict-set API for this, significantly reducing the number of calls clients need to make.
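The sketch below shows what such a batch contract could look like. The case class and field names are assumptions for illustration and not DSP's actual predict-set API.

```scala
// A minimal sketch of a batch ("predict-set") contract; names are illustrative assumptions.
object PredictSetSketch {
  final case class PredictionItem(requestId: String, features: Map[String, Double])
  final case class PredictSetRequest(modelId: String, items: Seq[PredictionItem])
  final case class PredictSetResponse(modelId: String, scores: Map[String, Double])

  // One round trip carries the whole batch, so the client pays the network RTT
  // once per batch instead of once per prediction.
  def predictSet(request: PredictSetRequest): PredictSetResponse =
    PredictSetResponse(
      request.modelId,
      request.items.map(item => item.requestId -> item.features.values.sum).toMap // placeholder scoring
    )
}
```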
While this addresses the engineering problem to some extent, it creates a different set of challenges: clients need even lower latency profiles for some business-critical use cases (e.g., the entire Order-Assignment flow is expected to complete all its tasks for a given city within 1 minute).
To achieve latency improvements at this scale, we analyzed the expensive operations involved in the prediction flow and how we could optimize them, and performed five types of optimizations.
We tested batch predictions using a high-throughput use case, Order-Assignment, where every order on Swiggy is scored against the available delivery executives within a given geographic boundary. With these optimizations in place, DSP was able to handle over 5K peak predictions per second for a single model with a P99 latency of 71 ms at a batch size of 30, down from an earlier P99 latency of 144 ms (almost a 2x gain).
Latencies before and after the change
This rollout demonstrated DSP’s readiness to handle much higher throughput for several other use cases such as discounting, ads, and listing. Although DSP’s role as the predictor for Swiggy has just begun, it has already been an exciting and active one!
Thanks to Siddhant Srivastava, Abhay Kumar Chaturvedi, and Pranay Mathur for their involvement in the development of this feature.