Draft: use derived fields to implement per-ingredient recipe scoring #121

jayaddison · 2024-08-31T20:33:56Z

Describe the reason for these changes and the problem that they solve

During recipe search, we currently calculate a matching score for each ingredient against each recipe. The existing implementation sums these scores -- each represented by a constant_score within an independent power-of-ten numeric range -- and then infers the number of exact and inexact matches that occurred by, essentially, inspecting the digits of that single floating-point number.
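As a rough illustration of the digit-multiplexing idea described above, here is a minimal sketch. The digit values (`INEXACT = 1`, `EXACT = 2`) and the function names are hypothetical, since the real constants are not stated in this description, and plain Python integers stand in for the floating-point `_score`.

```python
# Hypothetical digit values; the real constants are not stated in this PR.
INEXACT, EXACT = 1, 2


def encode(statuses):
    """Pack one decimal digit per ingredient: ingredient i contributes digit * 10**i."""
    return sum(digit * 10**i for i, digit in enumerate(statuses))


def decode(score, n_ingredients):
    """Recover (exact_count, inexact_count) by reading the decimal digits back."""
    digits = [(score // 10**i) % 10 for i in range(n_ingredients)]
    return digits.count(EXACT), digits.count(INEXACT)


statuses = [EXACT, 0, INEXACT, EXACT]  # four query ingredients
score = encode(statuses)
print(score)             # 2102
print(decode(score, 4))  # (2, 1)
```

Because each ingredient needs its own power of ten, the magnitude of the summed score grows with the number of query ingredients.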

This works as expected, but it fails when a user enters more than 38 ingredients: at that point the summed score overflows the floating-point value.

This changeset implements a different approach: when a user query is performed, the query will dynamically construct a derived field -- a field that doesn't exist in the indexed recipe documents -- containing a list of boolean values with the same length as the list of query ingredients. For each recipe, each of the boolean values may be null (no match for that ingredient), false (matched, but not exactly), or true (matched exactly).

(inexact-matches are for query terms such as tofu matching against a recipe that mentions silken tofu as an ingredient)
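The three-state list can be sketched outside OpenSearch as follows. This is an illustration only: `found_flags` is a hypothetical name, and treating "the query term appears as a word inside a recipe ingredient" as an inexact match is an assumption made for the example, not necessarily the rule used by the search script.

```python
from typing import List, Optional, Set


def found_flags(query: List[str], recipe: Set[str]) -> List[Optional[bool]]:
    """Return one flag per query ingredient: True, False, or None."""
    # Assumption: an inexact match means the term is a word within an ingredient.
    recipe_words = {word for ingredient in recipe for word in ingredient.split()}
    flags: List[Optional[bool]] = []
    for term in query:
        if term in recipe:
            flags.append(True)    # matched exactly
        elif term in recipe_words:
            flags.append(False)   # matched, but not exactly
        else:
            flags.append(None)    # no match for this ingredient
    return flags


print(found_flags(["tofu", "garlic", "basil"], {"silken tofu", "garlic"}))
# [False, True, None]
```

The result always has the same length as the query, so position i in the list refers to query ingredient i.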

The derived field should provide a much more intuitive way to represent the match status of each ingredient. It is also a convenient data structure for calculating the total exact-match and inexact-match counts, which are needed to sort (rank) the recipe results.
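For example, the counts and a ranking could be computed from such lists as in the sketch below. The recipe names and flag values are made up, `match_counts` is a hypothetical helper, and the ordering (exact matches first, inexact matches as a tie-breaker) is an assumption. The sort runs in plain Python here rather than inside OpenSearch.

```python
from typing import List, Optional, Tuple


def match_counts(found: List[Optional[bool]]) -> Tuple[int, int]:
    """Return (exact_count, inexact_count) for one recipe's flag list."""
    exact = sum(1 for flag in found if flag is True)
    inexact = sum(1 for flag in found if flag is False)
    return exact, inexact


# Made-up flag lists for a three-ingredient query.
recipes = {
    "mapo tofu": [False, True, None],
    "garlic bread": [None, True, None],
    "tofu stir fry": [True, True, None],
}

# Assumed ordering: more exact matches first, then more inexact matches.
ranked = sorted(recipes, key=lambda name: match_counts(recipes[name]), reverse=True)
print(ranked)  # ['tofu stir fry', 'mapo tofu', 'garlic bread']
```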

Unfortunately, scoring and sorting using derived fields isn't supported in OpenSearch yet, but it may be soon.

Briefly summarize the changes

Derive a multi-valued boolean _found field at query-time, to replace the existing implementation that multiplexes power-of-ten scores into the floating-point _score value.
Update the built-in docstring documentation accordingly.

How have the changes been tested?

Local development testing (this requires an OpenSearch service instance).

List any issues that this change relates to
Resolves #114
Relates to opensearch-project/OpenSearch#12281

jayaddison added 8 commits August 31, 2024 00:17

search: Use derived fields to calculate ingredient match scores

6bda49f

search: Remove use of function_score

5bfc933

search: Cleanup: remove use of constant_score

13f7651

search: Refactor to accomodate black line length limits

2e9539c

search: Rectify derived-field generation method name

1ad9471

search: Relocate method declaration and usage to mirror query pipeline

b29b01c

search: Refactor match scripting to use set-based logic

9638e85

search: Refactor to use boolean instead of long values

1ad3dee

jayaddison mentioned this pull request Aug 31, 2024

Feature request: sorting using the output of multiple script scores opensearch-project/OpenSearch#3715

Closed

jayaddison changed the title ~~Draft: used derived fields to implement per-ingredient recipe scoring~~ Draft: use derived fields to implement per-ingredient recipe scoring Oct 22, 2024
