Draft: use derived fields to implement per-ingredient recipe scoring #121

Draft · 8 commits · base: main


jayaddison (Member)

Describe the reason for these changes and the problem that they solve

During recipe search, we currently calculate a matching score for each query ingredient against each recipe. Our implementation sums these scores -- each represented by a constant_score within its own power-of-ten numeric range -- and then infers the number of exact and inexact matches that occurred by, essentially, inspecting the digits of that single floating-point number.

This works as expected, but it fails when a user enters more than 38 ingredients: the summed score then exceeds the largest value that a single-precision (32-bit) floating-point _score can hold (roughly 3.4 × 10^38), and the value overflows.
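To make the failure mode concrete, here is a minimal Python sketch of power-of-ten score multiplexing. The encoding is a hypothetical stand-in, not necessarily the one the project uses: the i-th query ingredient (counting from 1) contributes 2 × 10^i for an exact match and 1 × 10^i for an inexact match. The helper names multiplexed_score, decode_counts and fits_in_float32 are also hypothetical. struct.pack with the "f" format stands in for storing the score as a 32-bit float.

```python
import struct
from typing import List, Optional, Tuple

# Hypothetical encoding, for illustration only: the i-th query ingredient
# (counting from 1) contributes 2 * 10**i for an exact match, 1 * 10**i for an
# inexact match, and nothing when the recipe does not mention it.
EXACT_DIGIT = 2
INEXACT_DIGIT = 1


def multiplexed_score(matches: List[Optional[bool]]) -> int:
    """Sum per-ingredient constant scores, one power of ten per ingredient."""
    total = 0
    for position, match in enumerate(matches, start=1):
        if match is True:
            total += EXACT_DIGIT * 10 ** position
        elif match is False:
            total += INEXACT_DIGIT * 10 ** position
    return total


def decode_counts(score: int) -> Tuple[int, int]:
    """Recover (exact, inexact) counts by inspecting the decimal digits."""
    digits = str(score)[:-1]  # drop the units digit, which is always zero here
    return digits.count(str(EXACT_DIGIT)), digits.count(str(INEXACT_DIGIT))


def fits_in_float32(value: int) -> bool:
    """Return True if value can be packed as a 32-bit float without overflow."""
    try:
        struct.pack("<f", float(value))
        return True
    except OverflowError:
        return False


print(decode_counts(multiplexed_score([True, False, None])))  # (1, 1)
print(fits_in_float32(multiplexed_score([True] * 38)))  # True
print(fits_in_float32(multiplexed_score([True] * 39)))  # False
```

With this particular encoding, the limit falls at 38 ingredients, which matches the reported issue. The real encoding may differ in its details, but any scheme that assigns each ingredient its own power of ten must run out of float range at some fixed query length.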

This changeset implements a different approach: when a user query is performed, the query will dynamically construct a derived field -- a field that doesn't exist in the indexed recipe documents -- containing a list of boolean values with the same length as the list of query ingredients. For each recipe, each of the boolean values may be null (no match for that ingredient), false (matched, but not exactly), or true (matched exactly).

(Inexact matches occur for query terms such as tofu matching a recipe that lists silken tofu as an ingredient.)
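The per-recipe value of the derived field can be illustrated in plain Python. This sketch is not the OpenSearch query-time implementation. It assumes a naive rule in which a match is exact when a recipe ingredient name equals the query term, and inexact when the term appears as a whole word inside a longer ingredient name. The function name found_flags is hypothetical.

```python
from typing import List, Optional


def found_flags(query_ingredients: List[str],
                recipe_ingredients: List[str]) -> List[Optional[bool]]:
    """One entry per query ingredient: True (exact), False (inexact), None (no match)."""
    # Hypothetical matching rule, for illustration only.
    names = [name.lower() for name in recipe_ingredients]
    flags: List[Optional[bool]] = []
    for term in query_ingredients:
        term = term.lower()
        if term in names:
            flags.append(True)  # the recipe lists this ingredient exactly
        elif any(term in name.split() for name in names):
            flags.append(False)  # e.g. "tofu" as a word within "silken tofu"
        else:
            flags.append(None)  # the recipe doesn't mention this ingredient
    return flags


print(found_flags(["tofu", "rice", "basil"], ["Silken tofu", "rice"]))
# [False, True, None]
```

The output always has one entry per query ingredient, in query order, so the position of each value identifies the ingredient it describes.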

The derived field should provide a much more intuitive representation of each ingredient's match status, and it is also a convenient data structure for calculating the total exact-match and inexact-match counts that are needed to sort (rank) the recipe results.
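Counting and ranking from such a list is straightforward. This Python sketch assumes that results are ordered by exact-match count and then by inexact-match count, both descending. That ordering is an assumption made for illustration, and match_counts and rank are hypothetical names. Because the counts are plain integers, query length no longer has an overflow limit.

```python
from typing import List, Optional, Sequence, Tuple

Flags = Sequence[Optional[bool]]


def match_counts(flags: Flags) -> Tuple[int, int]:
    """Count (exact, inexact) matches from a per-ingredient flag list."""
    exact = sum(1 for flag in flags if flag is True)
    inexact = sum(1 for flag in flags if flag is False)
    return exact, inexact


def rank(results: List[Tuple[str, Flags]]) -> List[str]:
    """Order recipe IDs by exact matches, then inexact matches, both descending."""
    ordered = sorted(results, key=lambda item: match_counts(item[1]), reverse=True)
    return [recipe_id for recipe_id, _ in ordered]


print(match_counts([True] * 100))  # (100, 0): no overflow, however long the query
print(rank([("a", [False, None]), ("b", [True, None]), ("c", [True, False])]))
# ['c', 'b', 'a']
```

Python compares the (exact, inexact) tuples lexicographically, which gives the two-level ordering without any digit arithmetic.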

Unfortunately, scoring and sorting on derived fields aren't supported in OpenSearch yet, although support may arrive soon.

Briefly summarize the changes

  1. Derive a multi-valued boolean _found field at query-time, to replace the existing implementation that multiplexes power-of-ten scores into the floating-point _score value.
  2. Update the built-in docstring documentation accordingly.

How have the changes been tested?

  1. Local development testing (this requires an OpenSearch service instance).

List any issues that this change relates to
Resolves #114
Relates to opensearch-project/OpenSearch#12281

jayaddison changed the title from Draft: used derived fields to implement per-ingredient recipe scoring to Draft: use derived fields to implement per-ingredient recipe scoring on Oct 22, 2024
Development

Successfully merging this pull request may close these issues.

Unable to search for more than 38 ingredients