Modern search systems at scale are often architected as a real-time processing pipeline where the query and its results flow through multiple stages before returning them to the user. Concretely, a search query flowing through this pipeline might be stemmed by a stemmer, tokenized by a "parts of speech" natural language parser, transformed to a query plan conforming to a boolean retrieval model, executed against an inverted index, and finally the top-k results could be reordered based on an online machine-learned ranker. Many of these transformations of queries and results require performing a network IO to external specialized services. Our proposal is to model the search pipeline as a monad transformer composed of Reader monad and a Future.
About 1500 people search for handmade and vintage items on Etsy every second. Several different backends power Etsy's search, among them Solr, Elasticsearch, our own key-value-store Arizona, and services for machine learning and inference. How do all these systems work together, present a common interface to Etsy's developers and a coherent search experience to our users?
This problem requires a distributed system that scales very well, and has a state space that is still easy to reason about. We have built a smart proxy in scala with minimal state to solve this problem. It expands on the ideas of the "Your Server as a Function" paper. The idea is that basically all program state comes in via the request encoded in a Reader monad, the proxy calls out to the appropriate backend services and combines their responses, behaving like a pure function.
The proxy retrieves search results from a Solr backend, ranks them based on a machine learning model that incorporates the user's context, and returns the ranked results. If a search is unsuccessful, it can decide to kick off alternative search requests and return their results instead, without needing any frontend interaction. Via an intermediate tree representation that encodes a boolean retrieval model which is independent of the backends, we can combine key-value-store, machine learning and search indexes to improve the search result for the end user. The interplay of the different backends to create new services is achieved via a configuration system. It makes heavy use of Scala's type system to lift the types from the data fields in the storage backends into the scala code and ensure consistent types from the incoming thrift request throughout the entire system.
Twitter’s Finagle library provides us with RPC clients that abstract over different protocols like thrift, mux or http and lets us interact with them in an asynchronous manner via Futures. Using Futures, we can build custom machine learning pipelines by composing sequentially & concurrently over calls to backend services. It becomes easy to add custom query pipelines for machine learning, such as a recommendation query pipeline that queries a user database, gets the user’s purchase history and favorites, and constructs a recommendation set using a model that takes the user context as input. With experimentation as a first class citizen in our design, A/B testing allows us to test different feature sets and modelling techniques in our machine learning services.
In order to maintain independent failure domains for suggestions and search we run the proxy on Kubernetes as separate services. The Kubernetes autoscaler helps us to react to changing traffic patterns in a flexible way. We will talk about our learnings along the way of building this proxy, and trying to find the right abstraction for the search problem.