Commit

v1.11: AI-powered search updates (#3011)
---------

Co-authored-by: Louis Dureuil <[email protected]>
guimachiavelli and dureuill authored Oct 9, 2024
1 parent 85da89e commit a3072d4
Showing 4 changed files with 42 additions and 11 deletions.
4 changes: 2 additions & 2 deletions .code-samples.meilisearch.yaml
@@ -1242,7 +1242,7 @@ search_parameter_guide_hybrid_1: |-
"q": "kitchen utensils",
"hybrid": {
"semanticRatio": 0.9,
"embedder": "default"
"embedder": "EMBEDDER_NAME"
}
}'
search_parameter_guide_vector_1: |-
@@ -1321,7 +1321,7 @@ search_parameter_reference_retrieve_vectors_1: |-
"q": "kitchen utensils",
"retrieveVectors": true,
"hybrid": {
"embedder": "default"
"embedder": "EMBEDDER_NAME"
}
}'
search_parameter_reference_distinct_1: |-
6 changes: 3 additions & 3 deletions learn/ai_powered_search/getting_started_with_ai_search.mdx
@@ -50,15 +50,15 @@ curl \

Next, you must generate vector embeddings for all documents in your dataset. Embeddings are mathematical representations of the meanings of words and sentences in your documents. Meilisearch relies on external providers to generate these embeddings. Use OpenAI for this tutorial.

Use the `embedders` index setting of the [update `/settings` endpoint](/reference/api/settings?utm_campaign=vector-search&utm_source=docs&utm_medium=vector-search-guide) to configure a default [OpenAI](https://platform.openai.com/) embedder:
Use the `embedders` index setting of the [update `/settings` endpoint](/reference/api/settings?utm_campaign=vector-search&utm_source=docs&utm_medium=vector-search-guide) to configure an [OpenAI](https://platform.openai.com/) embedder:

```sh
curl \
-X PATCH 'http://localhost:7700/indexes/kitchenware/settings' \
-H 'Content-Type: application/json' \
--data-binary '{
"embedders": {
"default": {
"openai": {
"source": "openAi",
"apiKey": "OPEN_AI_API_KEY",
"model": "text-embedding-3-small",
@@ -91,7 +91,7 @@ curl \
--data-binary '{
"q": "kitchen utensils made of wood",
"hybrid": {
"embedder": "default",
"embedder": "openai",
"semanticRatio": 0.7
}
}'
8 changes: 7 additions & 1 deletion reference/api/search.mdx
@@ -1181,7 +1181,7 @@ Configures Meilisearch to return search results based on a query's meaning and c

`hybrid` must be an object. It accepts two fields: `embedder` and `semanticRatio`.

`embedder` must be a string indicating an embedder configured with the `/settings` endpoint. If you don't specify an embedder and your index contains a single embedder, Meilisearch uses it by default. If an index contains multiple embedders, Meilisearch will use the embedder named `default`.
`embedder` must be a string indicating an embedder configured with the `/settings` endpoint. It is mandatory to specify a valid embedder when performing AI-powered searches.

`semanticRatio` must be a number between `0.0` and `1.0` indicating the proportion between keyword and semantic search results. `0.0` causes Meilisearch to only return keyword results. `1.0` causes Meilisearch to only return meaning-based results. Defaults to `0.5`.
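
As a minimal sketch, the request below sends a hybrid search that names its embedder explicitly. It assumes an embedder called `openai` is configured on a `kitchenware` index running at `localhost:7700`. The `EMBEDDER_NAME`, `SEMANTIC_RATIO`, and `PAYLOAD` variables and the `hybrid_search` function are illustrative names, not part of Meilisearch:

```sh
# Assumption: an embedder named "openai" exists in the index settings
EMBEDDER_NAME='openai'

# 0.0 returns keyword results only, 1.0 returns meaning-based results only
SEMANTIC_RATIO='0.9'

# Build the search payload with the embedder named explicitly
PAYLOAD="{
  \"q\": \"kitchen utensils\",
  \"hybrid\": {
    \"embedder\": \"${EMBEDDER_NAME}\",
    \"semanticRatio\": ${SEMANTIC_RATIO}
  }
}"

# Wrapped in a function so the request is only sent when you call it
hybrid_search() {
  curl \
    -X POST 'http://localhost:7700/indexes/kitchenware/search' \
    -H 'Content-Type: application/json' \
    --data-binary "$PAYLOAD"
}
```

Call `hybrid_search` to send the request. Lowering `SEMANTIC_RATIO` toward `0.0` favors keyword matches, while raising it toward `1.0` favors meaning-based matches.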

@@ -1205,6 +1205,12 @@ Use a custom vector to perform a search query. Must be an array of numbers corre

`vector` dimensions must match the dimensions of the embedder.

<Capsule intent="note">
If a query does not specify `q`, but contains both `vector` and a `hybrid.semanticRatio` greater than `0`, Meilisearch performs a pure semantic search.

If `q` is missing and `semanticRatio` is explicitly set to `0`, Meilisearch performs a placeholder search without any vector search results.
</Capsule>
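
The two cases in the note above can be sketched as follows. The three-value vector is only a placeholder, since a real vector must match the dimensions of the embedder. `EMBEDDER_NAME`, the `kitchenware` index, and the variable and function names are assumptions made for illustration:

```sh
# Case 1: no q, a vector, and semanticRatio greater than 0
# -> described above as a pure semantic search
SEMANTIC_ONLY='{
  "vector": [0.1, 0.2, 0.3],
  "hybrid": {
    "embedder": "EMBEDDER_NAME",
    "semanticRatio": 1.0
  }
}'

# Case 2: no q and semanticRatio explicitly set to 0
# -> described above as a placeholder search without vector results
PLACEHOLDER_ONLY='{
  "vector": [0.1, 0.2, 0.3],
  "hybrid": {
    "embedder": "EMBEDDER_NAME",
    "semanticRatio": 0
  }
}'

# Sends whichever payload is passed as the first argument
search() {
  curl \
    -X POST 'http://localhost:7700/indexes/kitchenware/search' \
    -H 'Content-Type: application/json' \
    --data-binary "$1"
}
```

For example, `search "$SEMANTIC_ONLY"` sends the first payload and `search "$PLACEHOLDER_ONLY"` sends the second.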

#### Example

<CodeSamples id="search_parameter_guide_vector_1" />
35 changes: 30 additions & 5 deletions reference/api/settings.mdx
@@ -2183,12 +2183,14 @@ These embedder objects may contain the following fields:
| **`url`** | String | `http://localhost:11434/api/embeddings` | The URL Meilisearch contacts when querying the embedder |
| **`apiKey`** | String | Empty | Authentication token Meilisearch should send with each request to the embedder. If not present, Meilisearch will attempt to read it from environment variables |
| **`model`** | String | Empty | The model your embedder uses when generating vectors |
| **`documentTemplate`** | String | `{% for field in fields %}{{field.name}}: {{field.value}}\n{% endfor %}` | Template defining the data Meilisearch sends the embedder |
| **`documentTemplate`** | String | `{% for field in fields %} {% if field.is_searchable and not field.value == nil %}{{ field.name }}: {{ field.value }} {% endif %} {% endfor %}` | Template defining the data Meilisearch sends to the embedder |
| **`documentTemplateMaxBytes`** | Integer | `400` | Maximum allowed size of rendered document template |
| **`dimensions`** | Integer | Empty | Number of dimensions in the chosen model. If not supplied, Meilisearch tries to infer this value |
| **`revision`** | String | Empty | Model revision hash |
| **`distribution`** | Object | Empty | Describes the natural distribution of search results. Must contain two fields, `mean` and `sigma`, each containing a numeric value between `0` and `1` |
| **`request`** | Object | Empty | A JSON value representing the request Meilisearch makes to the remote embedder |
| **`response`** | Object | Empty | A JSON value representing the response Meilisearch expects from the remote embedder |
| **`binaryQuantized`** | Boolean | Empty | Once set to `true`, irreversibly converts all vector dimensions to 1-bit values |

### Get embedder settings

@@ -2242,6 +2244,7 @@ Partially update the embedder settings for an index. When this setting is update
"apiKey": <String>,
"model": <String>,
"documentTemplate": <String>,
"documentTemplateMaxBytes": <Integer>,
"dimensions": <Integer>,
"revision": <String>,
"distribution": {
@@ -2250,7 +2253,8 @@ Partially update the embedder settings for an index. When this setting is update
},
"request": { },
"response": { },
"headers": { }
"headers": { },
"binaryQuantized": <Boolean>
}
}
```
@@ -2295,7 +2299,7 @@ This field is incompatible with `huggingFace` and `userProvided` embedders.

The model your embedder uses when generating vectors. Meilisearch officially supports the following models:

- `openAi`: `openai-text-embedding-ada-002`, `text-embedding-3-small`, and `text-embedding-3-large`
- `openAi`: `text-embedding-3-small`, `text-embedding-3-large`, `openai-text-embedding-ada-002`
- `huggingFace`: `BAAI/bge-base-en-v1.5`

Other models, such as [HuggingFace's BERT models](https://huggingface.co/models?other=bert) or those provided by Ollama and REST embedders, may also be compatible with Meilisearch.
@@ -2313,12 +2317,25 @@ This field is incompatible with `rest` and `userProvided` embedders.
You may use the following context values:

- `{{doc.FIELD}}`: `doc` stands for the document itself. `FIELD` must correspond to an attribute present on all documents. It will be replaced by the value of that field in the input document
- `{{fields}}`: a list of all the `field`s appearing in any document in the index. Each `field` object in this list has two properties: `name` and `value`. If a `field` does not exist in a document, `value` is `nil`
- `{{fields}}`: a list of all the `field`s appearing in any document in the index. Each `field` object in this list has the following properties:
- `name`: the field's attribute
- `value`: the field's value
- `is_searchable`: whether the field is present in the searchable attributes list

For best results, build short templates that only contain highly relevant data. If working with a long field, consider [truncating it](https://shopify.github.io/liquid/filters/truncatewords/). If you do not manually set it, `documentTemplate` will include all document fields. This may lead to suboptimal performance and relevancy.
If a `field` does not exist in a document, its `value` is `nil`.

For best results, build short templates that only contain highly relevant data. If working with a long field, consider [truncating it](https://shopify.github.io/liquid/filters/truncatewords/). If you do not manually set it, `documentTemplate` will include all searchable and non-null document fields. This may lead to suboptimal performance and relevancy.

This field is optional but strongly encouraged for all embedders.
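
A short template along these lines keeps the embedder input focused. This is a sketch that assumes every document has `name` and `description` fields, which are hypothetical, and that you are configuring an embedder called `openai` on a `kitchenware` index. The `PAYLOAD` variable and `update_embedder` function are illustrative names:

```sh
# Quoted heredoc: the shell leaves the Liquid {{ }} syntax untouched
PAYLOAD=$(cat <<'EOF'
{
  "embedders": {
    "openai": {
      "source": "openAi",
      "apiKey": "OPEN_AI_API_KEY",
      "model": "text-embedding-3-small",
      "documentTemplate": "A kitchenware item named {{doc.name}}: {{doc.description | truncatewords: 20}}"
    }
  }
}
EOF
)

# Wrapped in a function so the settings update only runs when you call it
update_embedder() {
  curl \
    -X PATCH 'http://localhost:7700/indexes/kitchenware/settings' \
    -H 'Content-Type: application/json' \
    --data-binary "$PAYLOAD"
}
```

The `truncatewords: 20` filter caps the hypothetical `description` field at 20 words, which keeps long fields from dominating the text sent to the embedder.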

##### `documentTemplateMaxBytes`

The maximum size of a rendered document template. Longer texts are truncated to fit the configured limit.

`documentTemplateMaxBytes` must be an integer. It defaults to `400`.

This field is optional for all embedders.
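
To get a rough idea of whether a template is likely to be truncated, you can measure the byte size of a sample rendering locally. The sketch below is only an approximation: the `RENDERED` text is a hypothetical rendering written by hand, not output produced by Meilisearch, and the variable names are illustrative:

```sh
# Hypothetical rendered template text for a single document
RENDERED='name: Wooden spoon
description: A hand-carved spoon made of olive wood'

# Limit to compare against; 400 is the documented default
MAX_BYTES=400

# wc -c counts bytes rather than characters, which matters for non-ASCII text.
# The arithmetic expansion strips any padding wc adds to its output.
RENDERED_BYTES=$(( $(printf '%s' "$RENDERED" | wc -c) ))

if [ "$RENDERED_BYTES" -gt "$MAX_BYTES" ]; then
  echo "Rendered template is ${RENDERED_BYTES} bytes and would exceed the limit"
else
  echo "Rendered template is ${RENDERED_BYTES} bytes and fits within ${MAX_BYTES}"
fi
```

If sample renderings regularly exceed the limit, consider shortening the template before raising `documentTemplateMaxBytes`.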

##### `dimensions`

Number of dimensions in the chosen model. If not supplied, Meilisearch tries to infer this value.
@@ -2460,6 +2477,14 @@ This field is optional when using the `rest` embedder.

This field is incompatible with all other embedders.

##### `binaryQuantized`

When set to `true`, compresses vectors by representing each of their dimensions with a 1-bit value. This reduces the relevancy of semantic search results, but greatly reduces database size.

<Capsule intent="danger" title="Binary quantization is an irreversible process">
**Activating `binaryQuantized` is irreversible.** Once enabled, Meilisearch converts all vectors and discards all vector data that does not fit within 1 bit. The only way to recover the vectors' original values is to re-vectorize the whole index in a new embedder.
</Capsule>
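
Because the change cannot be undone, it may be worth reviewing the payload before sending it. A minimal sketch, assuming an embedder called `openai` on a `kitchenware` index (both illustrative, as are the `PAYLOAD` variable and the `enable_binary_quantization` function):

```sh
# Settings payload that turns on binary quantization for one embedder
PAYLOAD=$(cat <<'EOF'
{
  "embedders": {
    "openai": {
      "source": "openAi",
      "apiKey": "OPEN_AI_API_KEY",
      "model": "text-embedding-3-small",
      "binaryQuantized": true
    }
  }
}
EOF
)

# Print the payload for review; nothing is sent at this point
echo "$PAYLOAD"

# Irreversible once applied, so the request only runs when you call this
enable_binary_quantization() {
  curl \
    -X PATCH 'http://localhost:7700/indexes/kitchenware/settings' \
    -H 'Content-Type: application/json' \
    --data-binary "$PAYLOAD"
}
```

Trying this on a copy of the index first lets you compare relevancy and database size before committing to the change.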

#### Example

<CodeSamples id="update_embedders_1" />