DOC-287 Mountable TS topics #725

Open
wants to merge 23 commits into base: v-WIP/24.3

@kbatuigas (Contributor) commented Aug 27, 2024

Description

Resolves https://github.com/redpanda-data/documentation-private/issues/2504
Related: Migrations API reference

Review deadline: 9 Oct.

Page previews

24.3 beta: Mountable Topics

Checks

  • New feature
  • Content gap
  • Support Follow-up
  • Small fix (typos, links, copyedits, etc)

netlify bot commented Aug 27, 2024

Deploy Preview for redpanda-docs-preview ready!

Name Link
🔨 Latest commit 835221f
🔍 Latest deploy log https://app.netlify.com/sites/redpanda-docs-preview/deploys/6712a783fddfc20008045831
😎 Deploy Preview https://deploy-preview-725--redpanda-docs-preview.netlify.app

. Enable xref:manage:tiered-storage.adoc[Tiered Storage] for specific topics, or for the entire cluster (all topics).
. xref:get-started:rpk-install.adoc[Install `rpk`], or ensure that you have access to the Admin API.

Contributor Author

Are there any limitations that users should know about? For example, number of topics you can include in a migration? Amount of time that a topic can "hibernate" in object storage until it is mounted (although maybe that's more to do with their object storage configuration)?

I'm not aware of any limitations like this. I've successfully run it with as many topics and partitions as I could create without problems.

Contributor Author

@bashtanov I saw in the rpk PR that a topic must have at least 3 partitions for it to be mounted from TS to a cluster, does that apply anywhere in this doc as well?

I never enforced or even heard about this restriction

@gene-redpanda (Contributor), Oct 14, 2024

I saw that in the RFC for migrations. I have no idea whether it is actually true: when I wrote that, my testing was against a version of migrations that wasn't actually working.

- `cut_over`
- `finished`

== Monitor progress

Contributor Author

When or why might users encounter errors? Can and should they retry?

I can think of the following situations:

  • attempting to mount a topic that does not exist in cloud storage
  • attempting to mount a topic that is already mounted to this or another cluster
  • any failures, such as Tiered Storage availability problems or multiple Redpanda nodes going down

All operations are retried indefinitely, so it's really unlikely that cancelling and restarting a migration would help. If there is an underlying problem, fixing it should help without restarting.
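
For illustration, the "wait rather than cancel and restart" advice could be sketched as a simple polling loop. This is only a sketch under assumptions: `get_state` is a hypothetical callable standing in for whatever status lookup you use (it is not a Redpanda API), the state names are the ones quoted in this review (`planned`, `cut_over`, `finished`), and treating a missing operation (`None`) as complete is an assumption based on the doc text saying that a finished operation is deleted.

```python
import time
from typing import Callable, List, Optional


def wait_for_operation(
    get_state: Callable[[], Optional[str]],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll a mount/unmount operation without cancelling or restarting it.

    `get_state` is a hypothetical lookup that returns the current state
    name, or None when the operation is no longer listed (assumed here to
    mean it finished and was deleted).
    Returns the distinct states observed, in order.
    """
    observed: List[str] = []
    for _ in range(max_polls):
        state = get_state()
        if state is None:
            # Operation no longer listed: assume it completed and was removed.
            return observed
        if not observed or observed[-1] != state:
            observed.append(state)
        if state == "finished":
            return observed
        sleep(poll_seconds)
    raise TimeoutError(
        f"operation still in progress after {max_polls} polls: {observed}"
    )
```

A timeout here is a prompt to look for the underlying problem (for example, Tiered Storage availability) rather than a signal to cancel, since the operation keeps retrying on its own.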

@kbatuigas kbatuigas changed the title [draft] Mountable TS topics [do not merge] Mountable TS topics Aug 30, 2024
@kbatuigas kbatuigas mentioned this pull request Oct 5, 2024
4 tasks
@kbatuigas kbatuigas changed the base branch from main to v-WIP/24.3 October 5, 2024 17:22
@kbatuigas kbatuigas changed the title [do not merge] Mountable TS topics Mountable TS topics Oct 7, 2024
@kbatuigas kbatuigas marked this pull request as ready for review October 7, 2024 17:21
@kbatuigas kbatuigas requested a review from a team as a code owner October 7, 2024 17:21
}
```

You may optionally include the topic namespace (`ns`). Default value: `kafka`

also it is the only one supported so far
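
To make the default concrete, here is a minimal sketch of building the `source_topic` entries shown in the request example, filling in `ns` when it is omitted. The helper names are hypothetical, and rejecting any namespace other than `kafka` is an assumption based on the comment above that it is the only one supported so far. The sketch builds only the per-topic entries; the rest of the request body is not shown in this excerpt and is not guessed at here.

```python
import json
from typing import Dict, Iterable, List, Optional

DEFAULT_NAMESPACE = "kafka"  # documented default for the optional `ns` field


def source_topic_entry(topic: str, ns: Optional[str] = None) -> Dict[str, Dict[str, str]]:
    """Build one entry of the form {"source_topic": {"ns": ..., "topic": ...}}."""
    namespace = DEFAULT_NAMESPACE if ns is None else ns
    if namespace != DEFAULT_NAMESPACE:
        # Assumption from the review thread: only `kafka` is supported so far.
        raise ValueError(f"unsupported namespace: {namespace!r}")
    return {"source_topic": {"ns": namespace, "topic": topic}}


def source_topic_entries(topics: Iterable[str]) -> List[Dict[str, Dict[str, str]]]:
    """Build entries for several topics, all in the default namespace."""
    return [source_topic_entry(name) for name in topics]


if __name__ == "__main__":
    # Topic names below are placeholders for illustration.
    print(json.dumps(source_topic_entries(["orders", "payments"]), indent=2))
```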

@asimms41 (Contributor) left a comment

Looks great. Just left one question but happy to approve.


|===

It is not currently possible to unmount a topic whose name matches multiple topics in the origin cluster.

Contributor Author

@mattschumpert I removed the "Troubleshoot" heading and whittled it down to just these few scenarios. From chatting with @bashtanov it sounds like there is additional work to do to enable the user to specify which topic or "incarnation" they want if they try to mount a topic with multiple matches on the name. And also some work to clarify log messages when running into issues. If there are other scenarios that have to be described here please let me know and we can get that into the next iteration of this doc.

@Deflaimun (Contributor)

Should we explain in simple terms (ELI5) what exactly "topic mount vs. topic unmount" is? Or do we assume that our users are already familiar with the subject?

Something like:
"In Redpanda, mountable topics are topics that [have these properties]. You can make a topic mountable by doing X. To unmount a topic, do Y."
(Are Redpanda topics mountable by default? When do they become mountable?)


== Unmount a topic from a cluster to object storage

When you unmount a topic, all incoming writes to the topic are blocked as Redpanda unmounts the topic from the cluster to object storage. Producers and consumers of the topic receive an error message indicating that the topic is no longer available. The unmounted topic is deleted in the source cluster.

Contributor

"The unmounted topic is deleted in the source cluster."
When does that happen? After the command has run successfully? Is the command async? If yes, can I track its progress? Is it possible that the command halts mid-operation? If yes, what do I do to recover?

Contributor

Also I believe we should add more emphasis to this part as it's super important. Suggest bold, but INFO admonition could also work

--
======

You cannot cancel mount and unmount operations in the following <<monitor-progress,states>>:

@Deflaimun (Contributor), Oct 16, 2024

How do I check those states?
Monitor should be before cancel, no? We talk about the states before introducing them

Contributor

See line 142 below--isn't that what you are asking about here @Deflaimun ?

Contributor Author

@Deflaimun I think in a previous commit the monitor section was actually before cancel. I'll change it back.

| State | Unmount operation (outbound) | Mount operation (inbound)

| `planned`
2+| Redpanda validates the operation definition.

Contributor

On the preview, this presentation looks weird. Looks like it only applies to the first header. See screenshot. As a workaround, consider making this text longer.
We should look into fixing it from the UI side, by centering the row in these cases
[image: screenshot]

@@ -1,8 +1,8 @@
-For topics with Tiered Storage enabled, you can mount and unmount topics to transfer the topic data between your cluster and object storage. This allows you to free up and reclaim unused partition space, or migrate a topic to a different cluster and hibernate or decommission the topic or even the entire cluster.
+For topics with Tiered Storage enabled, you can unmount a topic to detach segment data that is still on disk to object storage, and unmount a topic from object storage to attach the topic data to either the same origin cluster, or a different one. This allows you to hibernate a topic and free up and reclaim system resources taken up by the topic, or migrate a topic to a different cluster.

@Deflaimun (Contributor), Oct 16, 2024

This diff made the page more technically accurate but also harder to understand. Consider simplifying. Also see #725 (comment). Is unmount similar to just uploading the topic and its metadata to object storage?

Contributor

Seems odd that it is called Mountable, but most of the content here is about unmounting.

Contributor

free up/reclaim system resources are synonymous. Keep one or the other, but not both

Contributor Author

@Feediver1 That's my mistake, I'll change it.

Contributor Author

@Deflaimun (@mattschumpert to correct me if this is wrong) It's detaching the topic. Most of the topic data would have already been uploaded to Tiered Storage (based on how quickly users have configured segment data to be moved to the cloud). Unmounting takes the segment data that is still on disk, moves it to the cloud (and stops reads and writes) so that it's all ready to "attach" again to a cluster.

@@ -0,0 +1,211 @@
For topics with Tiered Storage enabled, you can unmount a topic to detach segment data that is still on disk to object storage, and unmount a topic from object storage to attach the topic data to either the same origin cluster, or a different one. This allows you to hibernate a topic and free up and reclaim system resources taken up by the topic, or migrate a topic to a different cluster.

Contributor

Suggested change
-For topics with Tiered Storage enabled, you can unmount a topic to detach segment data that is still on disk to object storage, and unmount a topic from object storage to attach the topic data to either the same origin cluster, or a different one. This allows you to hibernate a topic and free up and reclaim system resources taken up by the topic, or migrate a topic to a different cluster.
+For topics with Tiered Storage enabled, you can unmount a topic to detach segment data that is still on disk to object storage, and unmount a topic from object storage to attach the topic data to either the same origin cluster, or a different one. This allows you to hibernate a topic and free up or reclaim system resources taken up by the topic, or migrate a topic to a different cluster.


-Redpanda also transfers topic definitions when mounting or unmounting a topic.
+Redpanda also transfers topic manifests when mounting or unmounting a topic, so the topic can quickly accept reads and writes again and you can resume cluster workloads with ease.

@Deflaimun (Contributor), Oct 16, 2024

Wait. If the topic is deleted from the source cluster after unmounting, how does it accept reads and writes again?
Maybe this needs to be expanded.

Suggested change
-Redpanda also transfers topic manifests when mounting or unmounting a topic, so the topic can quickly accept reads and writes again and you can resume cluster workloads with ease.
+Redpanda also transfers topic manifests when mounting or unmounting a topic, making it possible to quickly resume operations after re-mounting.

(or something to that effect)

| The topic data in object storage is no longer available to mount to any clusters.

| `finished`
| The operation is complete and then deleted.

Contributor

Suggested change
-| The operation is complete and then deleted.
+| The operation is complete and deleted.

@Deflaimun (Contributor), Oct 16, 2024

What is deleted? The operation itself or the topic?
If the topic, consider this suggestion.

Suggested change
-| The operation is complete and then deleted.
+| The operation is complete and the topic is deleted from the source cluster.

Contributor Author

@Deflaimun The underlying migration is deleted. I think the topic itself would have already been deleted in the prior state. @bashtanov can you confirm?

@Deflaimun (Contributor), Oct 16, 2024

The operation being deleted seems weird to me. If deleted then I can't query about it anymore, thus the state will never be accessible. What if I want to audit the operation? Can the operation be offloaded to a log or something? @bashtanov


|===

It is not currently possible to unmount a topic whose name matches multiple topics in the origin cluster.

Contributor

Suggested change
-It is not currently possible to unmount a topic whose name matches multiple topics in the origin cluster.
+It is not possible to unmount a topic whose name matches multiple topics in the origin cluster.

@@ -11,7 +9,11 @@ An unmounted topic in object storage is detached from all clusters. The original

== Unmount a topic from a cluster to object storage

-When you unmount a topic, all incoming writes to the topic are blocked as Redpanda unmounts the topic from the cluster to object storage. Producers and consumers of the topic receive an error message indicating that the topic is no longer available. The unmounted topic is deleted in the source cluster.
+When you unmount a topic, all incoming writes to the topic are blocked as Redpanda unmounts the topic from the cluster to object storage. Producers and consumers of the topic receive an error message indicating that the topic is no longer available.

Contributor

What's the expected error message? How do I differentiate the error between an unmounted topic and one that never existed?

Contributor Author

@bashtanov just to check if you have those expected messages available.

@bashtanov, Oct 18, 2024

Yes, it is `failed to download manifest for topic`, and it is logged with warning severity.

@@ -0,0 +1,213 @@
For topics with Tiered Storage enabled, you can unmount a topic to detach segment data that is still on disk to object storage, and mount that topic to either the same origin cluster, or a different one. This allows you to hibernate a topic and free up system resources taken up by the topic, or migrate a topic to a different cluster.

'detach segment data' is implementation detail.

Something like:

'you can unmount a topic to safely detach it from a cluster while keeping the topic's data in the cluster's cloud storage bucket/container'

@mattschumpert left a comment

Still some unaddressed comments

@@ -0,0 +1,213 @@
For topics with Tiered Storage enabled, you can unmount a topic to detach segment data that is still on disk to object storage, and mount that topic to either the same origin cluster, or a different one. This allows you to hibernate a topic and free up system resources taken up by the topic, or migrate a topic to a different cluster.

Redpanda also transfers topic manifests when mounting or unmounting a topic, making it possible to quickly resume operations after mounting to the destination cluster.

I don't know that this is a useful statement to a user @kbatuigas . Nobody knows what topic manifests are and how they affect the user experience. No one ever interacts with a manifest directly. To them they are just mounting and unmounting a topic. Am I missing something @nvartolomei ?

Also, "transferring" is misleading. For TS, we already have manifests in the bucket. This is a 'detach/reattach' operation.

I think we can just remove this statement.


== Additional considerations

It is not possible to unmount a topic whose name matches multiple topics in the origin cluster.

I don't understand the point of this. There is no possibility for this ever to happen whatsoever. By definition the origin cluster cannot have duplicate topic names in the first place so there is no potential for this to be a problem. cc @nvartolomei

Contributor Author

@mattschumpert That's my mistake, Matt. I meant to describe a scenario that I believe will be handled by this change so I'll go ahead and remove this line.

"source_topic": {"ns": "kafka", "topic": "<source-topic-2-name>"}
},
{
"source_topic": {"ns": "kafka", "topic": "source-topic-3-name"},

angle brackets missing?

@bashtanov left a comment

only minor suggestions

@@ -7,7 +7,7 @@ For topics with Tiered Storage enabled, you can unmount a topic to safely detach

== Unmount a topic from a cluster to object storage

-When you unmount a topic, all incoming writes to the topic are blocked as Redpanda unmounts the topic from the cluster to object storage. Producers and consumers of the topic receive an error message indicating that the topic is no longer available.
+When you unmount a topic, all incoming writes to the topic are blocked as Redpanda unmounts the topic from the cluster to object storage. Producers and consumers of the topic receive a warning `Failed to download manifest for topic` indicating that the topic is no longer available.

@bashtanov, Oct 18, 2024

Could you change to lower case f in failed please? I would imagine them using grep, which is case-sensitive by default, to search for the line.

And change "indicating that the topic is no longer available" to "indicating that either the topic is unavailable or there are multiple topics under the specified name" or something like this. It will be clear from the rest of the message which one is the case.

Contributor Author

@bashtanov yes, thank you!

Contributor Author

@bashtanov actually, would "multiple topics under the specified name" ever apply in the case of unmount?

I messed up everything. This error -- `failed to download manifest for topic` -- is for mounting when the topic is not available or cannot be unambiguously defined. It will be in the logs.

As for producing into a topic that is about to be unmounted, they will get `invalid_topic_exception` or `resource_is_being_migrated`. When fetching from a not-yet-ready topic, it will be `invalid_topic_exception` as well. These will be in the protocol replies.
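
As a rough illustration of the two places these signals surface, the sketch below searches log lines for the mount failure message and checks a client-side error name against the two names mentioned above. It rests on assumptions: the log line layout in the usage note is made up, representing client errors as plain strings is a simplification (real Kafka clients surface them as typed errors or codes), and both function names are hypothetical. The search is case-sensitive by default to mirror grep's default behavior, which is why the lowercase `f` in the documented message matters.

```python
import re
from typing import Iterable, List

# Message logged (warning severity) when a mount cannot find or cannot
# unambiguously identify the topic, per the thread above.
MOUNT_FAILURE_LOG = "failed to download manifest for topic"

# Error names that producers/consumers may see in protocol replies while a
# topic is being unmounted or is not yet ready, per the thread above.
UNMOUNT_CLIENT_ERRORS = frozenset(
    {"invalid_topic_exception", "resource_is_being_migrated"}
)


def find_mount_failures(log_lines: Iterable[str], ignore_case: bool = False) -> List[str]:
    """Return log lines containing the mount failure message."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(re.escape(MOUNT_FAILURE_LOG), flags)
    return [line for line in log_lines if pattern.search(line)]


def may_indicate_unmount(error_name: str) -> bool:
    """True if the error name is one a client may see during an unmount.

    Note: invalid_topic_exception can have other causes, so this is a hint,
    not proof that an unmount is in progress.
    """
    return error_name in UNMOUNT_CLIENT_ERRORS
```

For example, given the hypothetical lines `"WARN failed to download manifest for topic kafka/orders"` and `"WARN Failed to download manifest for topic kafka/orders"`, the default search matches only the first, while `ignore_case=True` matches both.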

8 participants