
Merge pull request #19 from marklogic/feature/docs-fixes
Various doc fixes
rjrudin authored Sep 24, 2024
2 parents fcadf30 + 6a5142b commit 5a729a4
Showing 6 changed files with 30 additions and 29 deletions.
8 changes: 4 additions & 4 deletions docs/embedding.md
@@ -4,8 +4,8 @@ title: Embedding Examples
nav_order: 5
---

-The vector queries shown in the [langchain](../rag-langchain-python/README.md),
-[langchain4j](../rag-langchain-java), and [langchain.js](../rag-langchain-js/README.md) RAG examples
+The vector queries shown in the [LangChain](rag-examples/rag-python.md),
+[langchain4j](rag-examples/rag-java.md), and [LangChain.js](rag-examples/rag-javascript.md) RAG examples
depend on embeddings - vector representations of text - being added to documents in MarkLogic. Vector queries can
then be implemented using [the new vector functions](https://docs.marklogic.com/12.0/js/vec) in MarkLogic 12.
This project demonstrates the use of a
@@ -21,9 +21,9 @@ documents in MarkLogic.

## Setup

-This example depends both on the [main setup for all examples](../setup/README.md) and also on having run the
+This example depends both on the [main setup for all examples](setup.md) and also on having run the
"Split to multiple documents" example program in the
-[document splitting examples](../splitting-langchain-java/README.md). That example program used langchain4j to split
+[document splitting examples](splitting.md). That example program used langchain4j to split
the text in Enron email documents and write each chunk of text to a separate document. This example will then use
langchain4j to generate an embedding for the chunk of text and add it to each chunk document.
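As a rough illustration of what "generate an embedding for the chunk of text and add it to each chunk document" means, consider the sketch below. It is illustrative only: the real example uses langchain4j with an Azure OpenAI embedding model, and `fake_embedding` here is a made-up stand-in, not a real model call.

```python
# Illustrative sketch: attach an embedding to each chunk document.
# `fake_embedding` is a hypothetical stand-in for a real embedding model call.

def fake_embedding(text: str) -> list:
    # Deterministic toy vector derived from the text; NOT a real embedding.
    return [sum(ord(c) for c in text) % 97 / 97.0, len(text) / 100.0]

def add_embedding(chunk_doc: dict) -> dict:
    # Add the vector representation alongside the chunk's existing text.
    chunk_doc["embedding"] = fake_embedding(chunk_doc["text"])
    return chunk_doc

chunks = [{"uri": "/enron/chunk-1.json", "text": "Meeting moved to Tuesday."}]
enriched = [add_embedding(doc) for doc in chunks]
```

The point is simply that each chunk document ends up carrying both its original text and a vector, which MarkLogic can then index for vector queries.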

12 changes: 6 additions & 6 deletions docs/index.md
@@ -15,27 +15,27 @@ execute these examples as-is, you will need an Azure OpenAI account and API key.

## Setup

-If you would like to try out the example programs, please [follow these instructions](setup/README.md).
+If you would like to try out the example programs, please [follow these instructions](setup.md).

## RAG Examples

MarkLogic excels at supporting RAG, or ["Retrieval-Augmented Generation"](https://python.langchain.com/docs/tutorials/rag/),
via its schema-agnostic nature as well as its powerful and flexible indexing. This repository contains the following
examples of RAG with MarkLogic:

-- The [rag-langchain-python](rag-langchain-python/README.md) project demonstrates RAG with Python, langchain, and MarkLogic.
-- The [rag-langchain-java](rag-langchain-java/README.md) project demonstrates RAG with Java, langchain4j, and MarkLogic.
-- The [rag-langchain-js](rag-langchain-js/README.md) project demonstrates RAG with JavaScript, langchain.js, and MarkLogic.
+- The [LangChain](rag-examples/rag-python.md) project demonstrates RAG with Python, LangChain, and MarkLogic.
+- The [langchain4j](rag-examples/rag-java.md) project demonstrates RAG with Java, langchain4j, and MarkLogic.
+- The [LangChain.js](rag-examples/rag-javascript.md) project demonstrates RAG with JavaScript, LangChain.js, and MarkLogic.

## Splitting / Chunking Examples

A RAG approach typically benefits from sending multiple smaller segments or "chunks" of text to an LLM. Please
-see [this guide on splitting documents](splitting-langchain-java/README.md) for more information on how to split
+see [this guide on splitting documents](splitting.md) for more information on how to split
your documents and why you may wish to do so.

## Embedding examples

To utilize the vector queries shown in the RAG Examples listed above, embeddings - vector representations of text -
should be added to your documents in MarkLogic.
-See [this guide on adding embeddings](embedding-langchain-java/README.md) for more information.
+See [this guide on adding embeddings](embedding.md) for more information.

12 changes: 6 additions & 6 deletions docs/rag-examples/rag-java.md
@@ -28,7 +28,7 @@ A key feature of MarkLogic is its ability to index all text in a document during
with MarkLogic is to select documents based on the words in a user's question.

To demonstrate this, you can run the Gradle `askWordQuery` task with any question. This example program uses a custom
-langchain retriever that selects documents in the `ai-examples-content` MarkLogic database containing one or more words
+langchain4j retriever that selects documents in the `ai-examples-content` MarkLogic database containing one or more words
in the given question. It then includes the top 10 most relevant documents in the request that it sends to Azure OpenAI.
For example:

@@ -46,8 +46,8 @@ of the configured deployment model):
You can alter the value of the `-Pquestion=` parameter to be any question you wish.

-Note as well that if you have tried the [Python langchain examples](../rag-langchain-python/README.md), you will notice
-some differences in the results. These differences are primarily due to the different prompts used by langchain and
+Note as well that if you have tried the [Python LangChain examples](rag-python.md), you will notice
+some differences in the results. These differences are primarily due to the different prompts used by LangChain and
langchain4j. See [the langchain4j documentation](https://docs.langchain4j.dev/intro) for more information on prompt
templates when using langchain4j.

@@ -99,7 +99,7 @@ the following process:

To try RAG with a vector query, you will need to have installed MarkLogic 12 and also have defined
`AZURE_EMBEDDING_DEPLOYMENT_NAME` in your `.env` file. Please see the
-[top-level README in this repository](../README.md) for more information.
+[setup guide](../setup.md) for more information.

You can now run the Gradle `vectorQueryExample` task:

@@ -117,11 +117,11 @@ An example result is shown below:
The results are similar to, but slightly different from, the results shown above for a simple word query. You can compare
the document URIs printed by each program to see that a different set of documents is selected by each approach.

-For an example of how to add embeddings to your data, please see [this embeddings example](../embedding-langchain-java/README.md).
+For an example of how to add embeddings to your data, please see [this embeddings example](../embedding.md).

## Summary

The three RAG approaches shown above - a simple word query, a contextual query, and a vector query - demonstrate how
-easily data can be queried and retrieved from MarkLogic using langchain. Identifying the optimal approach for your own
+easily data can be queried and retrieved from MarkLogic using langchain4j. Identifying the optimal approach for your own
data will require testing the approaches you choose and possibly leveraging additional MarkLogic indexes and/or
further enriching your data.
11 changes: 6 additions & 5 deletions docs/rag-examples/rag-javascript.md
@@ -23,8 +23,9 @@ Minimum versions of npm are dependent on the version of Node.
See [Node Releases](https://nodejs.org/en/about/previous-releases#looking-for-latest-release-of-a-version-branch)
for more information.

-For this LangChain.js example, in addition to the environment variables in the `.env` file described in the README in the
-root directory of this project, you'll also need to add the `AZURE_OPENAI_API_INSTANCE_NAME` setting to the `.env` file.
+For this LangChain.js example, in addition to the environment variables in the `.env` file described in the
+[setup guide](../setup.md), you'll also need to add the `AZURE_OPENAI_API_INSTANCE_NAME` setting to the `.env` file.
+
```
OPENAI_API_VERSION=2023-12-01-preview
AZURE_OPENAI_ENDPOINT=<Your Azure OpenAI endpoint>
@@ -69,14 +70,14 @@ documents are first selected in a manner similar to the approaches shown above -
set of indexes that have long been available in MarkLogic. The documents are then further filtered and sorted via
the following process:

-1. An embedding of the user's question is generated using [langchain and Azure OpenAI](https://python.langchain.com/docs/integrations/text_embedding/).
+1. An embedding of the user's question is generated using [LangChain.js and Azure OpenAI](https://python.langchain.com/docs/integrations/text_embedding/).
2. Using MarkLogic's new vector API, the generated embedding is compared against the embeddings in each
selected crime event document to generate a similarity score for each document.
3. The documents with the highest similarity scores are sent to the LLM to augment the user's question.

To try the `askVectorQuery.js` module, you will need to have installed MarkLogic 12 and also have defined
`AZURE_EMBEDDING_DEPLOYMENT_NAME` in your `.env` file. Please see the
-[top-level README in this repository](../README.md) for more information.
+[setup guide](../setup.md) for more information.

You can now run `askVectorQuery.js`:
```
@@ -97,6 +98,6 @@ the document URIs printed by each program to see that a different set of documen
## Summary

The three RAG approaches shown above - a simple word query, a contextual query, and a vector query - demonstrate how
-easily data can be queried and retrieved from MarkLogic using langchain. Identifying the optimal approach for your own
+easily data can be queried and retrieved from MarkLogic using LangChain.js. Identifying the optimal approach for your own
data will require testing the approaches you choose and possibly leveraging additional MarkLogic indexes and/or
further enriching your data.
14 changes: 7 additions & 7 deletions docs/rag-examples/rag-python.md
@@ -6,7 +6,7 @@ nav_order: 1
---

[Retrieval Augmented Generation (RAG)](https://python.langchain.com/docs/tutorials/rag/) can be implemented in Python
-with [langchain](https://python.langchain.com/docs/introduction/) and MarkLogic via a "retriever". The examples in this
+with [LangChain](https://python.langchain.com/docs/introduction/) and MarkLogic via a "retriever". The examples in this
directory demonstrate three different kinds of retrievers that you can consider for your own AI application.

## Table of contents
@@ -28,7 +28,7 @@ python -m venv .venv
source .venv/bin/activate
```

-Once you have a virtual environment created, run the following to install the necessary langchain dependencies along
+Once you have a virtual environment created, run the following to install the necessary LangChain dependencies along
with the [MarkLogic Python client](https://pypi.org/project/marklogic-python-client/):

pip install --quiet --upgrade langchain langchain-community langchain_openai marklogic_python_client
@@ -40,7 +40,7 @@ You are now ready to execute the example RAG programs.
A key feature of MarkLogic is its ability to index all text in a document during ingest. Thus, a simple approach to RAG
with MarkLogic is to select documents based on the words in a user's question.

-To demonstrate this, you can run the `ask_word_query.py` module with any question. The module uses a custom langchain
+To demonstrate this, you can run the `ask_word_query.py` module with any question. The module uses a custom LangChain
retriever that selects documents in the `ai-examples-content` MarkLogic database containing one or more of the words
in the given question. It then includes the top 10 most relevant documents in the request that it sends to Azure OpenAI.
For example:
@@ -85,14 +85,14 @@ documents are first selected in a manner similar to the approaches shown above -
set of indexes that have long been available in MarkLogic. The documents are then further filtered and sorted via
the following process:

-1. An embedding of the user's question is generated using [langchain and Azure OpenAI](https://python.langchain.com/docs/integrations/text_embedding/).
+1. An embedding of the user's question is generated using [LangChain and Azure OpenAI](https://python.langchain.com/docs/integrations/text_embedding/).
2. Using MarkLogic's new vector API, the generated embedding is compared against the embeddings in each
selected crime event document to generate a similarity score for each document.
3. The documents with the highest similarity scores are sent to the LLM to augment the user's question.
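The three steps above can be sketched in plain Python. This is a conceptual sketch only: in the actual example, MarkLogic's vector API computes the similarity server-side, and the document URIs and vectors below are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two vectors: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical chunk documents already selected by a word or contextual query,
# each with a stored embedding (invented values for illustration).
docs = [
    ("/chunk-1.json", [0.9, 0.1]),
    ("/chunk-2.json", [0.1, 0.9]),
    ("/chunk-3.json", [0.7, 0.3]),
]

# Step 1 (stand-in): the question's embedding would come from the embedding model.
question_embedding = [1.0, 0.0]

# Steps 2-3: score each document against the question and keep the best matches.
ranked = sorted(docs, key=lambda d: cosine_similarity(question_embedding, d[1]), reverse=True)
top = [uri for uri, _ in ranked[:2]]  # these would be sent to the LLM
```

The highest-scoring chunks are the ones whose stored vectors point in nearly the same direction as the question's vector, which is what "similarity score" means in the steps above.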

To try the `ask_vector_query.py` module, you will need to have installed MarkLogic 12 and also have defined
`AZURE_EMBEDDING_DEPLOYMENT_NAME` in your `.env` file. Please see the
-[top-level README in this repository](../README.md) for more information.
+[setup guide](../setup.md) for more information.

You can now run `ask_vector_query.py`:

@@ -107,11 +107,11 @@ An example result is shown below:
The results are similar to, but slightly different from, the results shown above for a simple word query. You can compare
the document URIs printed by each program to see that a different set of documents is selected by each approach.

-For an example of how to add embeddings to your data, please see [this embeddings example](../embedding-langchain-java/README.md).
+For an example of how to add embeddings to your data, please see [this embeddings example](../embedding.md).

## Summary

The three RAG approaches shown above - a simple word query, a contextual query, and a vector query - demonstrate how
-easily data can be queried and retrieved from MarkLogic using langchain. Identifying the optimal approach for your own
+easily data can be queried and retrieved from MarkLogic using LangChain. Identifying the optimal approach for your own
data will require testing the approaches you choose and possibly leveraging additional MarkLogic indexes and/or
further enriching your data.
2 changes: 1 addition & 1 deletion docs/splitting.md
@@ -30,7 +30,7 @@ to show how easily you can split and store chunks of text and thus get you start

## Setup

-Assuming you have followed the [setup instructions for these examples](../setup/README.md), then you already have a
+Assuming you have followed the [setup instructions for these examples](setup.md), then you already have a
database in your MarkLogic cluster named `ai-examples-content`. This database contains a small set - specifically,
3,034 text documents - of the
[Enron email dataset](https://www.loc.gov/item/2018487913/) in a collection named `enron`. These documents are good
