Apache Solr has come a long way from simple full-text search to modern-day analytics, geospatial, media, and multi-tenant search applications. However, it suffers from the inductive problem of “schema-resolution”.
While Apache Solr offers a “schema-less” mode, it does not solve this problem, because it generates a very generic schema under the hood. At Unbxd, a multi-tenant e-commerce search platform, arriving at an optimal schema is critical for performance and functionality.
This talk presents our contribution to Solr (SOLR-11741), a “schema-learning mode” that leverages a “field type hierarchy” to solve the schema inference problem by learning from source documents and run-time query patterns. The talk also covers how this feature can make searching, sorting, and faceting more efficient and provide insights into data anomalies.
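To make the idea of a field type hierarchy concrete, here is a minimal sketch of inferring a schema from source documents. It is an illustration only and not the SOLR-11741 implementation: the four-level hierarchy and the names `HIERARCHY`, `narrowest_type`, and `infer_schema` are assumptions made for this example, and run-time query patterns are not modeled.

```python
# Hypothetical hierarchy, ordered narrowest to widest.
# This is NOT the actual type list used by SOLR-11741.
HIERARCHY = ["int", "long", "double", "string"]

INT_MIN, INT_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1


def narrowest_type(value):
    """Return the narrowest type in HIERARCHY that can represent value."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; this toy hierarchy has no
        # boolean level, so fall back to the widest type.
        return "string"
    if isinstance(value, int):
        if INT_MIN <= value <= INT_MAX:
            return "int"
        if LONG_MIN <= value <= LONG_MAX:
            return "long"
        return "string"
    if isinstance(value, float):
        return "double"
    return "string"


def infer_schema(documents):
    """Map each field to the widest type observed across all documents."""
    schema = {}
    for doc in documents:
        for field, value in doc.items():
            seen = narrowest_type(value)
            current = schema.get(field, seen)
            # Widen only when a value does not fit the current type.
            schema[field] = max(current, seen, key=HIERARCHY.index)
    return schema


docs = [
    {"price": 10, "sku": "A1"},
    {"price": 9.99, "sku": 42},
    {"stock": 3_000_000_000},
]
print(infer_schema(docs))
```

In this sketch a field starts at the narrowest type that fits its first value and is widened only when a later value requires it. A field that widens unexpectedly, such as a numeric field that falls back to `string`, is one plausible signal of a data anomaly.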