Commit

Couple docs edits
rjrudin committed Sep 24, 2024
1 parent 8ed927e commit a81d414
Showing 2 changed files with 15 additions and 17 deletions.
16 changes: 7 additions & 9 deletions docs/embedding.md
@@ -4,14 +4,6 @@ title: Embedding Examples
nav_order: 5
---

-## Table of contents
-{: .no_toc .text-delta }
-
-- TOC
-{:toc}
-
-## Adding embeddings with langchain4j
-
The vector queries shown in the [langchain](../rag-langchain-python/README.md),
[langchain4j](../rag-langchain-java), and [langchain.js](../rag-langchain-js/README.md) RAG examples
depend on embeddings - vector representations of text - being added to documents in MarkLogic. Vector queries can
@@ -21,6 +13,12 @@ This project demonstrates the use of a
the [MarkLogic Data Movement SDK](https://docs.marklogic.com/guide/java/data-movement) for adding embeddings to
documents in MarkLogic.

+## Table of contents
+{: .no_toc .text-delta }
+
+- TOC
+{:toc}
+
## Setup

This example depends both on the [main setup for all examples](../setup/README.md) and also on having run the
@@ -29,7 +27,7 @@ This example depends both on the [main setup for all examples](../setup/README.m
the text in Enron email documents and write each chunk of text to a separate document. This example will then use
langchain4j to generate an embedding for the chunk of text and add it to each chunk document.

-## Add embeddings example
+## Adding embedding to documents

To try the embedding example, run the following Gradle task:

16 changes: 8 additions & 8 deletions docs/splitting.md
@@ -4,21 +4,21 @@ title: Splitting Examples
nav_order: 4
---

-## Table of contents
-{: .no_toc .text-delta }
-
-- TOC
-{:toc}
-
-## Splitting documents with langchain4j
-
A RAG approach typically benefits from sending multiple smaller segments or "chunks" of text to an LLM. While MarkLogic
can efficiently ingest and index large documents, sending all the text in even a single document may either exceed
the number of tokens allowed by your LLM or may result in slower and more expensive responses from the LLM. Thus,
when importing or reprocessing documents in MarkLogic, your RAG approach may benefit from splitting the searchable
text in a document into smaller segments or "chunks" that allow for much smaller and more relevant segments of text
to be sent to the LLM.

+## Table of contents
+{: .no_toc .text-delta }
+
+- TOC
+{:toc}
+
+## Overview
+
This project demonstrates two different approaches to splitting documents:

1. Splitting the text in a document and storing each chunk in a new separate document.
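
As a rough illustration of the chunking idea described in `docs/splitting.md`, the sketch below splits text into segments of bounded size. It is not langchain4j's splitter and not the code in this repository: the class name `ChunkSplitter`, the `split` method, and the character-based limit are all assumptions made for illustration (an LLM limit is normally expressed in tokens, not characters).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of langchain4j or this project.
public class ChunkSplitter {

    // Splits text into chunks of at most maxChars characters, breaking on whitespace.
    // A single word longer than maxChars is kept whole as its own oversized chunk.
    public static List<String> split(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input is empty or all whitespace
            }
            // Close the current chunk if adding this word (plus a space) would exceed the limit.
            if (current.length() > 0 && current.length() + 1 + word.length() > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "MarkLogic can ingest large documents, but an LLM works better with small chunks.";
        for (String chunk : split(text, 30)) {
            System.out.println(chunk);
        }
    }
}
```

Each chunk produced this way could then be written to its own document or stored alongside the original, which are the kinds of approaches the Overview section goes on to list.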
