Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: docs/design: Topics #4496

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

WIP: docs/design: Topics #4496

wants to merge 1 commit into from

Conversation

PhilipMetzger
Copy link
Collaborator

@PhilipMetzger PhilipMetzger commented Sep 17, 2024

Continues the conversation started in Discord and hopefully concludes it with a final design.

Rendered

docs/design/topics.md Outdated Show resolved Hide resolved
Continues the conversation started in Discord and hopefully concludes it with a final
design.

##### Obsoletion of `[experimental.auto-advance-bookmarks]`

Since Topics are infectious by nature, they can perfectly map to a Git branch.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would a topic with multiple heads map to a single Git branch (or maybe multiple branches)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence is unclear to me as well. I'm guessing what it's supposed to mean is that topics "perfectly map to" what people expect from Git branches, rather than that they'd actually be implemented as a 1:1 equivalent for Git branches. But since Git branches don't actually work the way Topics are described (i.e. they are not merely "infectious" but actually implicitly apply to all ancestor commits, and they cannot have multiple heads), they don't actually "map perfectly" to Git branches as understood by people who deeply understand the Git model (which is probably the minority of Git users, but certainly a nontrivial contingent).

Certainly this sentence cannot mean that topics are always represented in the git storage backend as branches, since that would be incompatible with both of the differences cited above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would a topic with multiple heads map to a single Git branch (or maybe multiple branches)?

Each separate head will get its own Git branch. There's also the variant of connecting them, but I do not think that is desirable, as the different heads should represent different kind of logical work.

This sentence is unclear to me as well. I'm guessing what it's supposed to mean is that topics "perfectly map to" what people expect from Git branches, rather than that they'd actually be implemented as a 1:1 equivalent for Git branches. But since Git branches don't actually work the way Topics are described (i.e. they are not merely "infectious" but actually implicitly apply to all ancestor commits, and they cannot have multiple heads), they don't actually "map perfectly" to Git branches as understood by people who deeply understand the Git model (which is probably the minority of Git users, but certainly a nontrivial contingent).

That is correct. But sharing the conceptual behavior never meant that we as Jujutsu VCS need to represent them the same internally.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushing the multiple heads of a topic as foo-1, foo-2, etc., is surprising to me. I was expecting topics to work roughly like this:

  • A commit can manually be labeled a part of topic "foo".
  • A new commit created on top of that commit will automatically be labeled "foo".
  • If heads(topic(foo)) has a single revision, the topic may be pushed as a git branch named "foo" with a command like jj git push -t foo
  • If heads(topic(foo)) has more than one revision, pushing the topic fails with an error (similar to hg refusing to create multiple remote heads)

I'm not sure why I would ever want to push a topic and get branches foo-1, foo-2, etc. The only reason might be to create a remote backup of my work, but I don't expect topics to fix this problem (it's something to solve for any anonymous head).

I think forges like Gerrit would support pushing multiple heads in the same topic, but I would expect jj gerrit to handle this.

I also don't know how creating multiple branches for a topic works as the graph changes over time. Suppose I had commits B and C in a single topic, both descending from A, and I pushed, creating branches foo-1 and foo-2. Then I create a merge commit D also in the topic. Does pushing delete foo-1 and foo-2, and replace them with just foo?

D
|\
B |  foo-1@origin
| C  foo-2@origin
|/
A


#### Git Interop

For all continuous Topics we can just simply export them as Git Branches. For
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guessing this is a typo, especially since it's contrasted with "non-contiguous":

Suggested change
For all continuous Topics we can just simply export them as Git Branches. For
For all contiguous Topics we can just simply export them as Git Branches. For


For all continuous Topics we can just simply export them as Git Branches. For
Topics which contain non-contiguous parts, we should allow exporting them
one-by-one or with a generated name, such as `<topic-name>-1` for each part.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would "one-by-one" mean, and why is it distinct from having a generated name? I'm guessing it means that the CLI would let the user decide to export just one contiguous "section" of a topic without exporting the others, but [a] what would the actual benefit to the user be, and [b] wouldn't there still need to be an auto-generated name in order to prevent future conflict when exporting another "section" of the same topic?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would "one-by-one" mean, and why is it distinct from having a generated name?

The distinction is mostly manual.

I'm guessing it means that the CLI would let the user decide to export just one contiguous "section" of a topic without exporting the others, but [a] what would the actual benefit to the user be, and [b] wouldn't there still need to be an auto-generated name in order to prevent future conflict when exporting another "section" of the same topic?

On point A, we give the user the choice of when they want to export another slice of a topic and on B the previous slice's branch may have already been deleted, so we'd be free to choose the name again.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This is something to handle whenever the topic has multiple heads, not just if the topic is disconnected. For example, the following topography is connected but can't be exported as one git branch:

B    topic=foo
|
| C  topic=foo
|/
A    topic=foo

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be export restrictions on Topics, as we cannot represent this as a Git Branch. This will hit the auto-generated name I mentioned previously.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, we currently skip export of bookmarks when they're conflicted, so it wouldn't be that different to skip exporting topics when they have multiple heads.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, we currently skip export of bookmarks when they're conflicted, so it wouldn't be that different to skip exporting topics when they have multiple heads.

Yes, that would also would be a variant of dealing with the issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this section now disagrees with the backend-implementation section below:

For the Git backend, we could either embed them in the message, like Arcanist or Gerrit do or store them as Git Notes, if necessary.

Or am I misunderstanding, and topics will be exported to Git as notes and branches?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda like the idea of exporting them as both. Notes will avoid modifying the commit hashes while supporting faithfully exporting discontiguous topics. We could perhaps use notes as the source of truth for syncing and treat branches as purely for compatibiltiy?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the Git backend, we could either embed them in the message, like Arcanist or Gerrit do or store them as Git Notes, if necessary.

Or am I misunderstanding, and topics will be exported to Git as notes and branches?

These are options beside exporting them as branches, which is the desired implementation for Git interop. We should enumerate all options to fully consider the whole design space.

We could perhaps use notes as the source of truth for syncing and treat branches as purely for compatibiltiy?

That's a good idea. I just don't know what that encompasses.


### Detailed Design

#### Git Interop
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a part of this that seems to be missing: where does a topic "end"? I'm guessing that you would export one git branch for each head commit in the topic, rather than one git branch for every commit in the topic (since that wouldn't really have much advantage compared to jj git push -c). But git doesn't have a concept of where a branch "starts" except relative to other branches, and then only in the context of commands (such as rebase) that start by finding common ancestors, which implicitly "prune" the branch. So for the "change-archival" use-case, what commits would be included in the archive topic?

I'm guessing the answer is something like: topic(x) := ::heads(topic) ~ ::trunk(), i.e. topic x contains the heads plus all ancestors minus everything on the trunk "branch". I think this is consistent with the property of hg topics that they "disappear" as soon as they are merged (though I haven't used hg and am not totally sure I followed the linked topic documentation). But whatever the rule is, it should be explicit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a part of this that seems to be missing: where does a topic "end"? I'm guessing that you would export one git branch for each head commit in the topic, rather than one git branch for every commit in the topic (since that wouldn't really have much advantage compared to jj git push -c).

Yes, this is also correct.

So for the "change-archival" use-case, what commits would be included in the archive topic?

This is @avamsi's use-case from #4180. This means that you locally add a description like archive: to your commits and then let jj infer it in someway or occasionally do it manually with a command.

I'm guessing the answer is something like: topic(x) := ::heads(topic) ~ ::trunk(), i.e. topic x contains the heads plus all ancestors minus everything on the trunk "branch".

Yes, that is correct and thanks for the formulation.

But whatever the rule is, it should be explicit.

Yes, I'll write it down.

@PhilipMetzger
Copy link
Collaborator Author

Also cc @necauqua and @noahmayr since they were heavily involved in the first discussion.

//
// This avoids rewriting the Change-ID, but must be implemented.
#[ContentHash(ignore)]
topics: HashMap<String, String>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do key and value represent here? key = topic_name, value = ??? Isn't it just a set of topic names?

FWIW, I assume topics are commit metadata, hence the commit id changes when topics are updated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do key and value represent here? key = topic_name, value = ??? Isn't it just a set of topic names?

How would a hashset implement a Topic on two different remotes then? It may also be worth it to just rename it to metadata.

WIW, I assume topics are commit metadata, hence the commit id changes when topics are updated.

I choose this option because of #3613 (comment), as you should be able to rewrite those lightweight tags easily. If we can change the underlying commit-id w/o rewriting the Change-ID it should be unnecessary.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, are you saying that topics will not be stored in the commit metadata? I think that needs to be clarified then. It wasn't clear at all to me. (The doc says that topics will be stored in a metadata map, but it doesn't say where that map is stored.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, are you saying that topics will not be stored in the commit metadata?

Where did I say that?

The doc says that topics will be stored in a metadata map, but it doesn't say where that map is stored.)

On the Commit which makes this dance I think necessary.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did I say that?

I was just asking if you did because it wasn't clear what you meant. You said "I choose this option" without saying which option you meant. It was in reply to Yuya's "I assume topics are commit metadata", which made me think it would be stored in the commit, but then you linked to a comment that said "After some discussion with @PhilipMetzger we came to the conclusion that we can't actually store topics on commits", which made me think that you meant that it should not be stored on commits.

On the Commit which makes this dance I think necessary.

I see. I think that information needs to go in the doc too. Thanks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would a hashset implement a Topic on two different remotes then? It may also be worth it to just rename it to metadata.

Oh, so topics is a map of remote_name: topic_name? I was just wondering why it was a map at all, because I just thought topic is a local thing (and mirrored to Git branch at remote.)

@necauqua
Copy link
Collaborator

necauqua commented Oct 15, 2024

While this works it falls short of having the metadata synced by multiple clients, which is something desirable. The prototype thus also avoids rewriting the Change-ID which is a good thing, but makes them only locally available.

Uh, the part where this was discussed completely escaped me?.
Why is "having the metadata synced by multiple clients" something desirable? I mean it surely would be nice, with an actuall jj remote that's yet to exist, but without that we have to resort to annoying and intrusive things like rewriting commit messages or similar.

I thought we were settled on them being locally available (like the oplog, for example) and for easy editing of topics without actually rewriting commits.

And for sharing them through a git remote, roundtripping through git branches - while only really works nice for single-head topics that branch off the trunk - is plenty imo.

Interoping with some gerrit features would be then the job of the gerrit integration etc.

## Overview

Until now, Jujutsu had no native set of topological branches, just
[Bookmarks][bm] which interact poorly with Git's expectation of branches.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it makes sense to refer to "Git's expectation" separately from the expectations of Git users, but if we do want to call out poor interaction between bookmarks and branches, then I think we should be more specific about where the tooling actually diverges in behavior. Git has a concept of a "current branch" that jj completely lacks, and it auto-advances the current branch, whereas jj doesn't auto-advance bookmarks except using the experimental feature that this design hopes to deprecate.

Suggested change
[Bookmarks][bm] which interact poorly with Git's expectation of branches.
[Bookmarks][bm] which interact poorly with Git's expectation of a "current branch" that is advanced automatically as commits are added.

@scott2000
Copy link
Collaborator

scott2000 commented Oct 16, 2024

I have a few questions which don't seem to be answered yet in the design doc:

  • Will jj bookmarks still be able to track remote git branches (meaning they'll be updated on fetch)?
    • If so, can a bookmark and a topic track the same remote git branch?
  • Will jj bookmarks still be pushed as git branches?
  • Will git branches still be fetched as jj remote bookmarks?
  • Can git branches be fetched as topics even if they were created by someone else?
    • If so, how does jj know what the base branch is?
      • Is there a command to set the base branch for fetching a topic?
      • If it defaults to trunk(), is it impossible to fetch trunk() (or a branch that's an ancestor of trunk()) as a topic since there's zero commits in that range?
      • How does this work for projects with multiple long-lived branches?
  • Is there such a thing as a "remote topic"?
    • If so, what determines whether a git branch is fetched as a remote bookmark or a remote topic?
    • Or are there only remote bookmarks still, and there's an explicit command to start tracking a remote bookmark as a local topic?
  • Can a topic and a bookmark have the same name?
    • If so, which takes precedence? Do they have different syntax?
  • Can a topic be conflicted?
    • If so, how are topic conflicts resolved?
  • If there's a series of commits trunk(), A, B, C and I set a topic as B, C and push it, I assume the remote git branch will be pointing to C. What happens if a coworker rebases the branch, creating the new series of commits trunk(), X, A', B', C' with the remote git branch pointing to C' now? When I fetch, does my topic get expanded to X, A', B', C', or does it become B', C' somehow? Or does it become conflicted?

@PhilipMetzger
Copy link
Collaborator Author

I'm going to create threads for top-level questions, since GitHub's UI for them isn't great and I expect that this will receive more than a handful of comments. Since this is isn't nearly done yet.

Comment on lines +142 to +144
While this works it falls short of having the metadata synced by multiple
clients, which is something desirable. The prototype thus also avoids rewriting
the Change-ID which is a good thing, but makes them only locally available.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@necauqua

Uh, the part where this was discussed completely escaped me?.
Why is "having the metadata synced by multiple clients" something desirable? I mean it surely would be nice, with an actuall jj remote that's yet to exist, but without that we have to resort to annoying and intrusive things like rewriting commit messages or similar.

This Design Doc contains two features which I desire for jj:

  1. Topics (Git Interop, a tagging mechanism)
  2. Commit Metadata (Topics are a part of them)

And we always should design for the native future and something which is compatible with the Google backend.

I only remembered the drawback of keeping them out of band from an earlier Design Doc, but I am sadly unable to find that conversation with Torquestomp.

I thought we were settled on them being locally available (like the oplog, for example) and for easy editing of topics without actually rewriting commits.

Yes, we agreed on that for Topics. But for pure Commit Metadata we never decided on it and it is feasible without to much cost, in my opinion.

And for sharing them through a git remote, roundtripping through git branches - while only really works nice for single-head topics that branch off the trunk - is plenty imo.

Agreed.

Interoping with some gerrit features would be then the job of the gerrit integration etc.

Where did you get that from, I never mentioned something like it? AFAIK, Gerrit Topics are once again some special refs on the Gerrit/Git Server.

Copy link
Collaborator

@necauqua necauqua Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Design Doc contains two features

Well the confusing part is then that the title says "design for topics", while as you just clarified it's actually for this generic maybe-shareable metadata that topics could be an instance of.

I think combining the two into one doc was a mistake - there were a lot of discussions and some sort of vision for the out-of-band topics, but this is first time I'm hearing about the other thing and the doc doesn't seem like everything was discussed and decided on because of it, at least at the moment.

Where did you get that from

Honestly don't remember 😂
Someone somewhere at some point mentioned somethings like that idk

Copy link
Collaborator

@necauqua necauqua Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed through the doc again - you mostly talk about topics specifically in it and a high-level results of the discussions that happened - but then starting on the storage paragraph you suddently derail into this generic commit metadata protobuf thing and it gets confusing - right after saying they could be exported to git as branches, no less

Where did you get that from

And it was probably from the "Prior work" section that mentioned gerrit topics :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scott2000

I have a few questions which don't seem to be answered yet in the design doc:

  • Will jj bookmarks still be able to track remote git branches (meaning they'll be updated on fetch)?

    • If so, can a bookmark and a topic track the same remote git branch?

The intent of this work is to remove this behavior and take over the necessary parts from Tracking Branches. Maybe some parts will be left intact to have something equivalent to a Git Tag, but I haven't thought deeply about it.

  • Will jj bookmarks still be pushed as git branches?

No, see the first answer.

  • Will git branches still be fetched as jj remote bookmarks?

Ditto.

  • Can git branches be fetched as topics even if they were created by someone else?

Yes, otherwise this feature would not help with Git interop.

  • If so, how does jj know what the base branch is?

This will continue to use Git tooling as a implementation, so it will be the merge-base.

Is there a command to set the base branch for fetching a topic?

I don't understand your question, please elaborate.

  • If it defaults to trunk(), is it impossible to fetch trunk() (or a branch that's an ancestor of trunk()) as a topic since there's zero commits in that range?

Thanks for this question, this will need some thought. Since the intended design is to allow free rewriting of commits, they can be preserved by the respective backend/server implementation.

  • How does this work for projects with multiple long-lived branches?

I don't understand your question here, since I have no base of your expectations here. Can you elaborate?

  • Is there such a thing as a "remote topic"?

See my first answer.

  • If so, what determines whether a git branch is fetched as a remote bookmark or a remote topic?

You'll hopefully only be able to fetch a Git Branch as a Topic, so this shouldn't occur.

Or are there only remote bookmarks still, and there's an explicit command to start tracking a remote bookmark as a local topic?

See my first answer for my current plan.

  • Can a topic and a bookmark have the same name?

Maybe? This is to be determined on how they reflect on a backend.

  • If so, which takes precedence? Do they have different syntax?

This behavior is to be determined here.

  • Can a topic be conflicted?

I don't think so, since introducing conflicting metadata seems to be a problem I don't understand.

  • If so, how are topic conflicts resolved?

As above.

  • If there's a series of commits trunk(), A, B, C and I set a topic as B, C and push it, I assume the remote git branch will be pointing to C. What happens if a coworker rebases the branch, creating the new series of commits trunk(), X, A', B', C' with the remote git branch pointing to C' now? When I fetch, does my topic get expanded to X, A', B', C', or does it become B', C' somehow? Or does it become conflicted?

I expect the first behavior I think, but that depends on how we want to integrate it.

Copy link
Collaborator

@scott2000 scott2000 Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a command to set the base branch for fetching a topic?

I don't understand your question, please elaborate.

Since a Git branch is just a pointer to a head commit, I believe it's unclear which commits are logically part of a branch unless you're comparing against some base branch/commit, so for fetching new remote branches from Git, I was just wondering how jj will determine which commits to include in the corresponding topic and whether it's user-configurable. It seems like there's a lot of options, such as:

  1. Topic contains ..branch_head
  2. Topic contains trunk()..branch_head
  3. Add a new configurable revset alias like git_topic_base(), then topic contains git_topic_base()..branch_head
  4. Add a new configurable revset alias like git_topic_revisions(branch_head), then topic just contains git_topic_revisions(branch_head)
  5. Store a base commit as metadata with every remote topic, then topic contains base_commit..branch_head

My concern for options 2 and 3 was that it's possible for the imported topic to be resolved to an empty set (if the branch head is an ancestor of the base branch/commit), which would need to be resolved somehow.

How does this work for projects with multiple long-lived branches?

I don't understand your question here, since I have no base of your expectations here. Can you elaborate?

My concern for option 2 is that some repos have multiple long-lived branches (e.g. branches A and B), and then there are branches off of those branches (e.g. C branches off from A, and D branches off from B). Ideally, it would be nice to be able to import C as containing A..C and D as containing B..D, but for option 2 they would be imported as trunk()..C and trunk()..D respectively. If trunk() is allowed to contain multiple revisions, maybe this wouldn't be an issue though since you could set "trunk()" = "main@origin | A@origin | B@origin".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I share the concern about branch pointers not being sufficient to indicate which commits are part of a topic.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I've been thinking about it more, and it seems like there could be another problem with the current approach. Currently, trunk() is generally defined in terms of remote bookmarks. If we get rid of remote bookmarks and replace them with remote topics, then trunk() would have to be defined based on a remote topic instead. However, from what I've heard, the plan is to exclude commits from a topic if they are present in trunk(). This leads to a circular definition, where trunk() depends on remote topics, but remote topics depend on trunk().

Maybe it would be better to keep remote bookmarks as-is, but just allow tracking a remote bookmark as a topic with a command like jj topic track <bookmark@remote>. Then trunk() could continue to be defined in terms of remote bookmarks, and there would be no circular dependency. This would also allow a command like jj topic track <bookmark@remote> --base <revset> to specify that the topic should include only commits in <revset>..<bookmark@remote> instead of the default trunk()..<bookmark@remote>. I don't think it would even be necessary to store <revset> anywhere, because for future fetches, the local topic could just be updated to roots(local_topic)-..updated_remote_bookmark.

Remote topics still could be supported, but they just wouldn't be used for the Git backend. It's conceivable that a native backend might want to support both remote bookmarks and remote topics, just like Git has both tags and branches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants