Questions and Answers
Hey, Doug Turnbull! Thanks for doing this!
The main thing I remember from my AI courses a couple decades ago is the A* algorithm. Has that been unseated as the “ultimate” search algorithm? Will it ever be?
oh ha! A* is about graph search, no? This book is about natural language search 🙂 Of which I doubt there ever will be an ultimate algorithm given how diverse the space and domains are
Hi all, I’m excited to be here. Hopefully I can drag fellow authors here as well!
Hi Doug Turnbull , good to have you here again 🙂
- What are the commonly used “dials and knobs” in a search engine to fine-tune its behaviour? Example: synonym groups to handle domain-level business jargon. Who usually controls these “dials and knobs”? Is it the data scientists, the business team, or someone else?
Hey WingCode -
- Basically anything that defines the structure of the index and how it’s queried is fair game. Maybe some major groupings:
◦ Stages of ranking, from first-pass retrieval to later reranking against different criteria / loss functions
◦ Synonyms, stemming, lemmatization, any kind of NLP between the content and the index (or between the query and querying the index)
◦ Any kind of statistic that might indicate quality (page rank, sales, clicks, etc)
◦ Really the limit is your imagination! - I find it’s best if the domain expert manages direct synonyms, but the relevance/data team has to decide exactly how they interface into the main algorithm (rough sketch below)
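To make the synonyms/analysis knob a bit more concrete, here is a minimal sketch of wiring a domain-expert-maintained synonym list into an Elasticsearch analysis chain. The index name, field, and synonym pairs are all made up, and it assumes a local Elasticsearch:

```python
import requests

# Hypothetical index settings: the synonym list itself comes from the domain
# expert; the relevance/data team decides where it sits in the analysis chain.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "domain_synonyms": {
                    "type": "synonym",
                    # made-up examples of domain jargon
                    "synonyms": ["laptop, notebook", "tv, television"],
                }
            },
            "analyzer": {
                "product_text": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "domain_synonyms"],
                }
            },
        }
    },
    "mappings": {"properties": {"title": {"type": "text", "analyzer": "product_text"}}},
}

# Create the index (assumes Elasticsearch is running locally)
requests.put("http://localhost:9200/products", json=settings)
```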
- What are the characteristics of your dream search engine? example: For me personally, it is not using any of the facets or “sort by” options. The search engine knows my favorite color is red and usually I look out for the cheapest product out there.
If by search engine you mean the underlying, programmable search index technology used to build a search solution, I want:
- A math-oriented, not text-match-oriented, API (see Vespa’s ranking steps)
- An ability to mix traditional sparse and dense vector indices for hybrid retrieval (sketch after this list)
- Doing all those things at high speed
- Declarative configuration, not programmatic configuration, so we can iterate on the search solution independent of the end application
- Built-in ability to execute arbitrary Python code at query and index time with the classic data science toolkit
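A toy sketch of the “math-oriented, hybrid” idea - just blending a lexical (BM25-style) score with a dense cosine similarity. The weighting and defaults here are invented; real engines (e.g. Vespa ranking expressions) let you declare this kind of math server-side:

```python
import numpy as np

def hybrid_score(bm25_score: float, query_vec: np.ndarray, doc_vec: np.ndarray,
                 alpha: float = 0.7) -> float:
    """Blend a sparse (lexical) score with a dense vector similarity.
    alpha is a made-up tuning knob you'd fit to your own data."""
    cosine = float(np.dot(query_vec, doc_vec) /
                   (np.linalg.norm(query_vec) * np.linalg.norm(doc_vec)))
    return alpha * bm25_score + (1 - alpha) * cosine
```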
Since you mentioned Vespa, I’m curious if you would advise picking it over ES as a base search stack? 🙂
Probably yes these days, but I usually don’t recommend people go through extensive search rewrites just for the sake of the underlying index…
Thank you Doug for the detailed answers
When is a good time to start thinking about investing in LTR capabilities in your search stack?
I think you should always think about it, because the limiting factor is training data, and you would want good training data for a non-LTR solution anyway. Once you figure out the training data side the LTR optimization becomes “easy”
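Roughly, that training data boils down to graded (query, document) pairs plus whatever features the ranker will learn from - a judgment list. A tiny sketch with invented values:

```python
# Each row: query, candidate doc, relevance grade (from clicks or human
# judges), and ranking features. All values are made up for illustration.
training_rows = [
    {"query": "red shoes", "doc_id": "p1", "grade": 3,
     "features": {"bm25_title": 8.2, "price": 49.0, "ctr": 0.12}},
    {"query": "red shoes", "doc_id": "p7", "grade": 0,
     "features": {"bm25_title": 2.1, "price": 19.0, "ctr": 0.01}},
    {"query": "usb cable", "doc_id": "p3", "grade": 2,
     "features": {"bm25_title": 5.5, "price": 9.0, "ctr": 0.08}},
]
```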
Great, does the book cover the event analytics side of gathering relevant training data?
Yes! Very much so
Awesome
What are the low-hanging fruits in AI-powered search - stuff with the highest ROI in a short amount of time?
Anything around query understanding. Can you classify queries into categories? Types of intents, etc based on simple click statistics?
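As a rough sketch of the click-statistics idea (hypothetical log schema): if most clicks for a query land in one category, that category is a reasonable guess at the query’s intent.

```python
from collections import Counter, defaultdict

def classify_queries(click_log, min_share=0.6):
    """click_log: iterable of (query, clicked_category) pairs.
    Returns {query: category} for queries where one category dominates."""
    counts = defaultdict(Counter)
    for query, category in click_log:
        counts[query][category] += 1
    labels = {}
    for query, cats in counts.items():
        category, n = cats.most_common(1)[0]
        if n / sum(cats.values()) >= min_share:
            labels[query] = category
    return labels

# e.g. classify_queries([("iphone case", "accessories"),
#                        ("iphone case", "accessories"),
#                        ("iphone case", "phones")])
```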
Thanks! I’m looking forward to this part in the book. I have struggled with query understanding because of lack of click data. Simple methods like string matching lead to a lot of weird edge cases
That and spelling correction. I have tried the standard stuff like edit distance and metaphone based algos, but they still fall short of expectations
On a similar note, what are the things which will take a long time to give results, but in the end will be worth all the effort?
This question is really hard as it’s so domain-specific. IMO you really gotta work to spike ideas with an experiment before digging too deep. I’ve seen teams invest really heavily upfront, but not see payoff at the end. That’s a big thing to avoid.
Quite a few AI capabilities in search require a decent amount of data. How do you deal with this if you are starting from scratch?
Data as in clickstream data?
Well earlier I mentioned query understanding as an obvious win. But this classification can also be done through manual labelers (given a sufficiently well formulated task). Of course it breaks down as queries grow in complexity or domain specificity, but that’s a good start.
Ah yes, does the book cover the proper ways to approach manual labelling? That space also seems to have exploded (so many tools!)
Sadly we don’t cover this that much, but it’s a great topic!
Hey Doug Turnbull, any thoughts on assessing and tackling position bias through empirical methods like Randpair vs theoretical methods built into LTR models?
We found that the position biases calculated using methods like EM didn’t line up with what we found with Randpair - so wanted to know what’s more typical in industry. Thanks!
I haven’t messed with these, interesting!
Usually my approach is to debias the training data itself using a click model, so I’m not overly coupled to the LTR model itself and these sorts of assumptions…
Because then you can take that training data and study it independent of any model, and decide whether it reflects reality
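A minimal sketch of that debiasing step, using a simple position-based propensity and inverse-propensity weighting. The propensity numbers here are invented; in practice you would estimate them (e.g. via EM or result randomization):

```python
# Estimated probability a user even examined each position (made-up numbers).
examine_prob = {1: 1.0, 2: 0.7, 3: 0.5, 4: 0.35, 5: 0.25}

def debiased_label(clicked: bool, position: int) -> float:
    """Weight each click by 1 / P(examined at this position), so clicks deep
    in the results count for more than clicks at rank 1."""
    if not clicked:
        return 0.0
    return 1.0 / examine_prob.get(position, 0.1)
```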
In practice, search relevancy can be highly subjective, which makes results hard to evaluate and optimise for actual users. Do you think that “AI-powered search” is affected by this more/equally/less than “traditional” (keyword-based) search?
yes it can be highly subjective. I think it means that you have to know, given a keyword, the many possible intents it could have and instead of ranking to one of those intents, you give them a mixture of them. Then as your confidence grows in their intent (through personalization? or just better knowledge about queries), you zero-in on one of the intents…
I wrote an article about that! https://opensourceconnections.com/blog/2019/09/05/diversity-vs-relevance/
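One crude way to serve a mixture of intents is to interleave per-intent ranked lists - a sketch for illustration, not a method from the article:

```python
from itertools import zip_longest

def blend_intents(*ranked_lists):
    """Round-robin interleave results that were ranked separately per intent,
    dropping duplicates. A crude stand-in for real diversification."""
    seen, blended = set(), []
    for group in zip_longest(*ranked_lists):
        for doc in group:
            if doc is not None and doc not in seen:
                seen.add(doc)
                blended.append(doc)
    return blended

# e.g. blend_intents(["bank-finance-1", "bank-finance-2"], ["bank-river-1"])
```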
Speaking of labels, what do you think about clickstream data vs manual assessment (via platforms like mturk and similar)? What are the pros and cons for each? And how can we combine the two?
Clickstream data will be like implicit data, where users tell you what they wouldn’t say out loud - whether because they don’t want to say it out loud, or because they’re not conscious of it!
Mechanical Turk is great to overcome cold-start problems. But it might not reflect the reality of your use case/app. Especially if your app is domain-specific.
“Combining” is tough, rather I use them as different perspectives on the problem.
How do we know it’s time to add ML to our search pipeline?
And finally, what’s the easiest way of adding ML to our search pipeline? Let’s say we already have search on our platform, and use Solr or Elasticsearch for indexing all the documents.
This is really tough, and depends so much on the org. Some teams become Lucene experts and get into the nitty gritty of modifying the guts of the search engine. These teams tend to be very engineering heavy and don’t mind doing this kind of work. This solution is nice because it turns your search engine into a “one stop shop”, without needing extra services, to solve your search problems.
But of course, if you’re more data scientist heavy, you’d prefer to work in Python as much as possible 🙂 Such teams tend to build search services that front the search system. The nice thing about this is it’s not a single point of failure; you can fail over to Solr or Elasticsearch if the service becomes unavailable.
Of course, my dream solution would be a search service that lets me host and run the Python stack as the query side, exposing the underlying data structures to my Python code. Or something that lets me deploy TensorFlow or other models into the engine. The closest things out there are Vespa or Jina from what I’ve seen…
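A bare-bones sketch of that “service in front of the engine” pattern, with a plain-engine fallback when the ML rerank path fails. The URL, index, and rerank function are all hypothetical:

```python
import requests

ES_URL = "http://localhost:9200/products/_search"  # hypothetical index

def rerank_with_model(query_text, hits):
    # Placeholder for the real ML rerank (LTR model, embedding similarity, ...)
    return hits

def search(query_text: str, size: int = 10):
    body = {"query": {"match": {"title": query_text}}, "size": 50}
    hits = requests.post(ES_URL, json=body, timeout=2).json()["hits"]["hits"]
    try:
        hits = rerank_with_model(query_text, hits)
    except Exception:
        pass  # fall back to the engine's own ranking
    return hits[:size]
```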
I’ll ask a question! What does your search, discovery, or recommendations platform look like at a high level? What do you like, dislike about it? (For example, do you just use Elasticsearch, or do you use Vespa fronted by a Java service? etc etc)
Is it a question for you Doug Turnbull or for us? 😆
Oh for all of you sorry 😛
Maybe Cristian Javier Martinez can say a few words about OLX =)
The one I’m working on right now is pretty simple at a high level. LB-> Gateway -> Reco/Search Backend -> ES/Redis
The backend is written in Go. Using the official ES client lib to connect to it. Redis is used for caching user historical features / recently watched items, which we then use for reco.
Don’t use any vector search engine right now. Content size is small enough that we got away with an in-memory ANN index which gets built on service startup (this is used to serve reco).
What I don’t like - quality of the query understanding + spelling correction components, lack of good-quality labelled data to build good classifiers, ES’s JSON-based DSL is a pain, want to eventually decouple ANN from this system.
Doug Turnbull thanks for doing this!
I’m curious to understand how the best teams measure and understand performance of their search systems on an ongoing basis. What are the dashboards and alerts they’ll set up, and how do they use those to make incremental improvements to their search models?
Good question! It takes a lot of effort on different fronts and a deep appreciation of the pros and cons of different metrics:
- Human ratings, including judgments (evaluation of relevance of each result) and whole SERP evaluation (how ‘good’ does this search results page look)
- General search conversion rates over time (though these can be influenced by factors like checkout or product page design)
- Search CTR (understanding this is a combination of relevance, UX, perf; rough sketch of computing it after this list). Another flaw here is users won’t click if they get their answer from the Search UX itself
- Roundtrip latency to the user and other performance metrics, like p90 latency, etc. Super critical and highly correlated with performance
- Best and worst performing queries -> a great product person can analyze to see what patterns you do well / do poorly with
- Typeahead success: after users clickthrough to a typeahead query suggestion, do they take a follow on action or was the click in the end perhaps not so great
- Content performance: what content does well / poorly in the search system? Are there areas where the content itself needs to be tuned to be more findable
Probably a ton of others, but the really great teams work really, really hard here. It takes a lot of great data work to make use of these metrics!
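For the CTR metric above, a minimal sketch of the aggregation over a hypothetical clickstream log:

```python
from collections import defaultdict

def query_ctr(events, min_searches=50):
    """events: iterable of (query, event_type) with event_type in
    {"search", "click"}. Returns CTR per query with enough volume."""
    searches, clicks = defaultdict(int), defaultdict(int)
    for query, event_type in events:
        if event_type == "search":
            searches[query] += 1
        elif event_type == "click":
            clicks[query] += 1
    return {q: clicks[q] / n for q, n in searches.items() if n >= min_searches}
```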
This is super interesting. Quick follow-up: how do teams quantify best and worst performing queries?
Hi Doug Turnbull, interesting book! I have some basic questions for you.
- Could you please explain what a search engine actually is?
omg I have no idea. Some people use the term to mean just the technology that serves results from sparse (classic inverted) or dense vector (approximate nearest neighbor) indices. This is a piece of infrastructure
Other people use the term to refer to a full search solution that solves a specific problem. In the latter case, lots of pieces of infra can be involved - not just the index-serving part, but also query understanding and reranking layers.
- What kind of ML models are usually applied to search problems?
Classically, GBDTs (Gradient Boosted Decision Trees) have worked well and fast for ranking, as these play well with existing technologies. But increasingly, the problem can be distilled to a similarity function modeled in a nearest-neighbor index. This similarity might be the result of an embedding generated from a deep learning or other model.
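A sketch of the GBDT side using LightGBM’s ranking objective - the feature values and grades are invented, and the tiny dataset is only there to show the shape of the API:

```python
import numpy as np
import lightgbm as lgb  # assumes LightGBM is installed

# One row per (query, doc) pair: [bm25_title, ctr] - made-up features.
X = np.array([[8.2, 0.12], [2.1, 0.01], [5.5, 0.08], [1.0, 0.02]])
y = np.array([3, 0, 2, 0])   # relevance grades
group = [2, 2]               # first 2 rows belong to query 1, next 2 to query 2

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=50, min_child_samples=1)
ranker.fit(X, y, group=group)
scores = ranker.predict(X)   # higher score = rank higher
```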
- What was the most exciting AI-powered search problem you have been working on?
Aside from Shopify, I got to work on the Elasticsearch Learning to Rank plugin that helps power Wikipedia search. Exciting from a tech, data, and impact perspective 🙂 Also very high scale!
cool! Thank you for your answers 🙂
Doug Turnbull any thoughts on the right way to combine filtering and nearest neighbor search? do we do the former first and then the latter or the other way around? is there a way to do both reliably?
I actually don’t know 😅
With a faster NN index, I’d like to do the NN part first as it helps improve recall. Then filter out candidates. But I think combining the two is still an art, and very much an open area of research
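The “NN first, then filter” approach usually means over-fetching from the ANN index and post-filtering; a sketch with a made-up oversampling factor and a hypothetical ann_index.search API:

```python
def filtered_knn(ann_index, query_vec, predicate, k=10, oversample=5):
    """Post-filtering: pull more neighbors than needed from the ANN index,
    drop the ones failing the filter, keep the top k survivors. A very
    selective filter needs a bigger oversample (or pre-filtering instead)."""
    candidates = ann_index.search(query_vec, k * oversample)  # hypothetical API
    return [doc for doc in candidates if predicate(doc)][:k]
```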