Commit

Merge pull request #43 from matthiasthomas/matthias/add-support-for-latest-embeddings-models

feat: add support for latest embedding models
pkoukk authored May 21, 2024
2 parents 475cdcd + 14e35cc commit 5fef437
Showing 3 changed files with 8 additions and 2 deletions.
4 changes: 3 additions & 1 deletion README.md
@@ -176,7 +176,7 @@ func NumTokensFromMessages(messages []openai.ChatCompletionMessage, model string
# Available Encodings
| Encoding name | OpenAI models |
| ----------------------- | ---------------------------------------------------- |
-| `cl100k_base` | `gpt-4`, `gpt-3.5-turbo`, `text-embedding-ada-002` |
+| `cl100k_base` | `gpt-4`, `gpt-3.5-turbo`, `text-embedding-ada-002`, `text-embedding-3-small`, `text-embedding-3-large` |
| `p50k_base` | Codex models, `text-davinci-002`, `text-davinci-003` |
| `r50k_base` (or `gpt2`) | GPT-3 models like `davinci` |

@@ -208,6 +208,8 @@ func NumTokensFromMessages(messages []openai.ChatCompletionMessage, model string
| text-davinci-edit-001 | p50k_edit |
| code-davinci-edit-001 | p50k_edit |
| text-embedding-ada-002 | cl100k_base |
+| text-embedding-3-small | cl100k_base |
+| text-embedding-3-large | cl100k_base |
| text-similarity-davinci-001 | r50k_base |
| text-similarity-curie-001 | r50k_base |
| text-similarity-babbage-001 | r50k_base |
4 changes: 3 additions & 1 deletion README_zh-hans.md
@@ -169,7 +169,7 @@ func NumTokensFromMessages(messages []openai.ChatCompletionMessage, model string
# available encodings
| Encoding name | OpenAI models |
| ----------------------- | ---------------------------------------------------- |
-| `cl100k_base` | `gpt-4`, `gpt-3.5-turbo`, `text-embedding-ada-002` |
+| `cl100k_base` | `gpt-4`, `gpt-3.5-turbo`, `text-embedding-ada-002`, `text-embedding-3-small`, `text-embedding-3-large` |
| `p50k_base` | Codex models, `text-davinci-002`, `text-davinci-003` |
| `r50k_base` (or `gpt2`) | GPT-3 models like `davinci` |

@@ -200,6 +200,8 @@ func NumTokensFromMessages(messages []openai.ChatCompletionMessage, model string
| text-davinci-edit-001 | p50k_edit |
| code-davinci-edit-001 | p50k_edit |
| text-embedding-ada-002 | cl100k_base |
+| text-embedding-3-small | cl100k_base |
+| text-embedding-3-large | cl100k_base |
| text-similarity-davinci-001 | r50k_base |
| text-similarity-curie-001 | r50k_base |
| text-similarity-babbage-001 | r50k_base |
2 changes: 2 additions & 0 deletions encoding.go
@@ -45,6 +45,8 @@ var MODEL_TO_ENCODING = map[string]string{
"code-davinci-edit-001": MODEL_P50K_EDIT,
// embeddings
"text-embedding-ada-002": MODEL_CL100K_BASE,
+"text-embedding-3-large": MODEL_CL100K_BASE,
+"text-embedding-3-small": MODEL_CL100K_BASE,
// old embeddings
"text-similarity-davinci-001": MODEL_R50K_BASE,
"text-similarity-curie-001": MODEL_R50K_BASE,
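
To illustrate what the two new map entries enable, here is a minimal, self-contained sketch of a model-to-encoding lookup. It does not import the library: `modelToEncoding` and `encodingForModel` are hypothetical names, the map is only a subset of the entries visible in this diff, and the constant values are assumed to match the encoding names listed in the README table.

```go
package main

import "fmt"

// Assumed values: these mirror the constant names in encoding.go and the
// encoding names shown in the README table, not the library's own source.
const (
	MODEL_CL100K_BASE = "cl100k_base"
	MODEL_R50K_BASE   = "r50k_base"
)

// modelToEncoding is an illustrative subset of the map touched by this commit.
var modelToEncoding = map[string]string{
	// embeddings
	"text-embedding-ada-002": MODEL_CL100K_BASE,
	"text-embedding-3-large": MODEL_CL100K_BASE,
	"text-embedding-3-small": MODEL_CL100K_BASE,
	// old embeddings
	"text-similarity-davinci-001": MODEL_R50K_BASE,
}

// encodingForModel is a hypothetical helper: it returns the encoding name
// registered for a model and reports whether the model was found.
func encodingForModel(model string) (string, bool) {
	enc, ok := modelToEncoding[model]
	return enc, ok
}

func main() {
	models := []string{"text-embedding-3-small", "text-embedding-3-large", "unknown-model"}
	for _, m := range models {
		if enc, ok := encodingForModel(m); ok {
			fmt.Printf("%s -> %s\n", m, enc)
		} else {
			fmt.Printf("%s -> no encoding registered\n", m)
		}
	}
}
```

Before this commit, a lookup for either `text-embedding-3-*` model in a map like this would have taken the "not found" branch; how the library itself handles unknown models is not shown in this diff.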