Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(proposals): Decouple the trestle CLI from the SDK #68

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

jpower432
Copy link
Member

@jpower432 jpower432 commented Aug 22, 2024

Related to #28

@jpower432
Copy link
Member Author

@butler54 @yana1205 I tagged both of you as reviewers for some early feedback. Most of the proposal is filled out, but some areas may still need refinement.

@yana1205
Copy link

yana1205 commented Sep 2, 2024

Schematically, will the Trestle Layered Architecture (fig.2) be separated into Editing/Authoring/transformation APIs and CLIs and Trestle Base?

image


## Future Milestones (Optional)

- Add a Go SDK to better support the C2P Go version and expand potential for integrations in the cloud native space.
Copy link

@yana1205 yana1205 Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for considering the use case of C2P.

This may have already been discussed, but I'd like to ask what you think about creating a Go binding?

The Go binding consists of OSCAL models (Go struct of Catalog/Profile/CDef/SSP/SAP/SAR/PoAM), and interfaces of OSCAL Base Model and validators. The interfaces are bound to the functions of the Python SDK.

Pros: Maintenance cost
Cons: Install Trestle-SDK and Python (or containerize them)

How to implement:

  • Wrap the functions as CLI or API so that Go binding can invoke it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this up @yana1205. This is something I have considered, but it has not been discussed. I found it be more straight forward to call Go from Python instead of the other way around. On top of the cons you listed for the binding solution, complexity may add maintenance costs as well IMO.

As I understand it, we have an initial use case for Go from the C2P perspective (through the use case you mentioned below and through some of re-implementation in the C2P /go directory). I think we could explore the trade-offs in more details in a follow-on proposal.

To confirm, does the outcome of binding vs porting affect your thoughts on this proposal which was just SDK separation?

@degenaro
Copy link
Contributor

degenaro commented Sep 3, 2024 via email

@yana1205
Copy link

yana1205 commented Sep 3, 2024

@degenaro
We have discussed more flexibility on the plugin language (oscal-compass/compliance-to-policy#23). OSCAL models in Go is helpful for the plugin development in Go. Having said that, this is a long term plan. Currently, no one implement plugins in Go.

If wrapping the Python CLI only, I don't see much benefit?

Maybe I am misunderstanding this comment but what I was suggesting is not wrapping the Python CLI but the Python SDK, which is for creating a language binding.

If using the Python SDK, that's pretty much replacing the Python code altogether, no?

Instead of the binding, porting the Python codes into Go is, yes, pretty much effort as re-implementing the Python codes in Go.

@jpower432
Copy link
Member Author

jpower432 commented Sep 3, 2024

Schematically, will the Trestle Layered Architecture (fig.2) be separated into Editing/Authoring/transformation APIs and CLIs and Trestle Base?

Yes. This is a really helpful way to split the graphic! For the transformer perspective, I do think OSCAL to OSCAL transformations may in scope of the SDK. That is a weak opinion though and may be something we implement if requested. @butler54 may have a stonger opinion than I do about this.

@degenaro
Copy link
Contributor

degenaro commented Sep 8, 2024

I added this to the topics for Sept. 10.

I advocate clearly identifying API classes and methods. I advocate the CLI employing only the API classes and methods. I'm not fully understanding the separation of python CLI from python API into separate repos, if that is the proposal here.

https://docs.google.com/document/d/1XTYM7xnWlIqd-8Nn5-qtgvgk8kH3NSmYle5yZvaS7qs/edit

@jpower432
Copy link
Member Author

jpower432 commented Sep 9, 2024

@degenaro Happy to discuss this at the Sept 10 community meeting more.

I think some of what I am proposing could be accomplished by separating the SDK into another published package. Below is some of my reasoning:

  1. Ensure that the boundaries are clean and clear between the CLI and SDK and there is a process for expanding the SDK with reusable logic.
  2. Ensure that the SDK can be installed and versioned separately to support SDK users who need to understand when breaking changes occur and do not need the overhead of the CLI supporting code (EDITED in terms dependency management).

I think there are pros and cons to making a separate repository:

Pros:

  • Increased codebase clarity and ability to navigate to make it easier for repositories other than the CLI to make use of it
  • Can make different release cycles for the SDK and CLI simpler
  • Gives us a place to document SDK specific use cases and examples
  • Makes porting to new languages more straight forward

Cons:

  • Additional overhead for CI/CD maintenance (this could be partially mitigated by centralized release management workflow automation)
  • Can increase challenges around SDK and CLI compatibility

@degenaro
Copy link
Contributor

degenaro commented Sep 10, 2024

I may not be on the same page with respect to SDK proposal. Please shoot down my fallacies.

If I understand what is wanted by this proposal, the diagram above is mis-labeled.Of the two separated boxes, the bottom box should be labeled OSCAL-SDK-Python (not comliance-trestle-sdk-python) and the top box should be labeled compliance-trestle-cli.

By OSCAL-SDK-Python (bottom box), what I mean is that there are 7 OSCAL classes that are created as part of the meta-schema to Python classes process using the get_oscal.py script that employs the datamodel-codegen tool. The resulting Python classes provide the ability to read/write OSCAL models (json) and the ability to perform in-memory CRUD operations for each. This is unexciting and yields little value-add from trestle itself.

The top box is where most of the excitement comprising the trestle value-add exists. Here are located, for example, the tasks (with no real API presently) that perform nice transformations, for example from csv-to-oscal-cd, and the code that allows for "agile authoring" enabling markdown <--> json transformations.

With respect to the "SDK" (or bottom box), why would I as a Golang programmer be wanting to download an OSCAl-SDK-Python, then write an adapter that goes between GoLang and Python? Why not simply eliminate the middle man and use the same generation trick that trestle currently does but generate Golang classes directly for each of the 7 OSCAL classes?

And if I were a Golang programmer and wanted to transform between markdown and json I could not do so using just the "SDK'.

With respect to the "CLI" (or top box), the Golang programmer would have to issue trestle CLI commands, should any trestle value-add functionality be desired.

Two other thoughts.

  1. trestle download size comprising SDK + CLI is < 500Kb. Pretty small.
  2. The model ("SDK") is tightly coupled to the "CLI". Model ("SDK") changes necessitate "CLI" changes. Mixing and matching versions could be a test-case nightmare.

Anyway, I've probably missed something fundamental?

@butler54
Copy link
Contributor

This is a really helpful way to split the graphic! For the transformer perspective, I do think OSCAL to OSCAL transformations may in scope of the SDK. That is a weak opinion though and may be something we implement if requested. @butler54 may have a stonger opinion than I do about this.

I think we should at least include the transformations that are effectively defined as part of the OSCAL standard. The one that comes to mind is profile resolution.

Transformations which are opinionated get a little trickier. The criteria I would give would be:

  1. Transformations must be OSCAL to OSCAL only
  2. Transformations cannot rely on external data.

Operations I could see as part of this:

  • Merge operations
  • Inheritance

So as an example CSV -> Component definition is not part of the SDK. Nor is any transformation associated specifically with a product.

@butler54
Copy link
Contributor

In terms of pulling out the sdk. I think the name is up for grabs based on what is in the community

compliance-trestle-sdk-python allows room to manoeuvre to include extra functionality over time.

In terms of versioning I would image that compliance-trestle will pin a range of the sdk (e.g. compliance-trestle-sdk-python>=1.0,<2

I think this will allow us to move faster on compliance-trestle with the api remaining stable. It will also allow us to release an updated SDK for OSCAL changes faster (and compliance trestle can catch up behind).

@butler54
Copy link
Contributor

In terms of approach to rolling out we could do this with minimal inteference:

Step 1:

  • Reorganise packages in compliance-trestle to quarantine the packages we want into a set of sub-modules (e.g. trestle.oscal)

Step 2.

Step 3.

  • Roll "SDK" into the compliance-trestle package as a breaking change (but only from a dependency perspective).

@butler54
Copy link
Contributor

In terms of a golang sdk. At this point in time I think it would be a huge change to couple python and go code.

I would recommend:

  1. Building a parallel SDK
  2. Defining a pool of common tests 'e.g. what we expect our OSCAL SDKs to do`
  3. Then consider connecting across languages.

My reticence on (3) is specifically our ability to manage multi-language builds. While this exists in python typically it's Python + C. I'm not sure how well it would work with go.

@degenaro
Copy link
Contributor

@butler54 is already onto how while I'm still on why. The proposal (in my own words) is to essentially segregate the 7 classes that comprise Python representations of the OSCAL models into a separate repo, call that an SDK and by doing so gain increased adoption of OSCAL. And this is because the excess baggage of "opinionated" trestle is too much to bear for those who don't want that part. As such I think this a more of a marketing problem. Certainly there is opportunity to more rigorously define an API comprising core and extended functionality. But in the end, there is no disadvantage to keeping all of Python trestle in one repo and I am skeptical that there is a contingent of potential users who would adopt OSCAL if trestle would only do the proposed separation. I've seen no one ask here or on gitter, for example, or have I missed something?

@jpower432 jpower432 changed the title feat(proposals): Decouple the trestle SDK from the CLI feat(proposals): Decouple the trestle CLI from the SDK Sep 11, 2024
@jpower432
Copy link
Member Author

@degenaro Thanks for taking the time to provide some feedback. I am trying to better understand your perspective, but the feedback you have provided so far is too general for me to address. Please see my below responses with clarifications and questions.

The proposal (in my own words) is to essentially segregate the 7 classes that comprise Python representations of the OSCAL models into a separate repo, call that an SDK and by doing so gain increased adoption of OSCAL. And this is because the excess baggage of "opinionated" trestle is too much to bear for those who don't want that part.

I think there is a miscommunication here. The intention is to include more than the base classes and as @butler54 mentioned above, it provides a place where we can include functionality over time. This is covered in the proposal document. There is an initial diagram and some links to existing trestle packages there. Could you please provide more concrete feedback on the design details? If there is something you believe would be a beneficial addition to the SDK, please add a suggestion or comment there.

As such I think this a more of a marketing problem.

Could you please elaborate here on the branding issue and where you see a divergence with the current branding? Trestle is branded as an open source OSCAL SDK. From my perspective, that aligns with with the goals of this proposal.

Certainly there is opportunity to more rigorously define an API comprising core and extended functionality. But in the end, there is no disadvantage to keeping all of Python trestle in one repo and

I have elaborated on this in the proposal and in earlier comments. Can you please give some feedback on what specifically you disagree with or what you see as missing or unclear?

I am skeptical that there is a contingent of potential users who would adopt OSCAL if trestle would only do the proposed separation.

Can you please share what you are looking for in terms of community support? This was shared at the last community meeting and what I perceived is that with some updates to the implementation and more understanding of the level of effort, this was supported by the group.

@degenaro
Copy link
Contributor

Let's use the recent 1.0.4 to 1.1.2 OSCAL upgrade for a thought experiment. If 1.0.4 were in 2 separate repos, then what would the steps be?

  1. SDK Repo: Create "SDK" 1.1.2 release
  2. CLI Repo: Create "CLI" 1.0.4 on top of 1.1.2 SDK release
  3. CLI Repo: Create "CLI" 1.1.2 on top of 1.1.2 SDK release

I'm not seeing the benefit of 2. In fact, I think 2 can be quite problematic, for example if SDK 1.1.2 dropped an artifact that the CLI 1.0.4 expected and employed.

Perhaps I'm not understanding separation of CLI from SDK proposal, but I'm not seeing the value presently.

@jflowers
Copy link
Member

@degenaro please read the proposal and comment in line as requested.

### Impact and desired outcome

**Improved Maintainability**: By reducing complexity and streamlining updates, the separation can make it easier to maintain and support both components.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have elaborated on this in the proposal and in earlier comments. Can you please give some feedback on what specifically you disagree with or what you see as missing or unclear?

Maintainability might more difficult. I think that OSCAL-SDK-CLI are tightly bound. Mixing and matching versions of OSCAL-SDK with CLIs might be challenging. It's not clear to me what the benefit of mixing and matching would be.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree these are tightly bound. I believe part of the discussion was to explore the level of effort to change that. It agree it could result in initial challenges, but tightly coupled components have challenges as well. I don't think we should be mixing and matching versions of OSCAL and I agree that would be unexpected. The SDK would adopt new versions of OSCAL and the CLI would adopt the new version of the OSCAL through leveraged the SDK. I think this could also make the OSCAL upgrades simpler because it can be done in smaller increments.

- `compliance-trestle` continues to contain the `trestle` CLI logic
- `compliance-trestle-python` is created as the Python SDK repository
- `compliance-trestle` and other OSCAL Compass projects migration to the new SDK for applicable logic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate here on the branding issue and where you see a divergence with the current branding? Trestle is branded as an open source OSCAL SDK. From my perspective, that aligns with with the goals of this proposal.

From the first paragraph of the compliance-trestle README: ...and provides an opinionated approach to OSCAL adoption. It looks to me like the SDK will not be opinionated. Meaning that if one wishes to use the SDK to manipulate opinionated items, that won't be possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you are suggesting adding the ability to manipulate OSCAL extensions defined by trestle through the SDK would remedy this issue?

- Abstractions:
- https://github.com/oscal-compass/compliance-trestle/tree/develop/trestle/core/resolver
- https://github.com/oscal-compass/compliance-trestle/blob/dfe892936e5960ad64f6f387dbe5918314049e89/trestle/core/validator.py

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a miscommunication here. The intention is to include more than the base classes and as @butler54 mentioned above, it provides a place where we can include functionality over time. This is covered in the proposal document. There is an initial diagram and some links to existing trestle packages there. Could you please provide more concrete feedback on the design details? If there is something you believe would be a beneficial addition to the SDK, please add a suggestion or comment there.

For sure, these classes can be moved to a separate repo. What I question is the value of doing so. It is surely the case that trestle versions are tightly coupled with OSCAL versions today. I'm having trouble imagining that the SDK will not be tightly coupled to OSCAL and that the CLI will not be tightly coupled to the SDK, version-wise. Thus, I'm having trouble understanding the value of separate repo, since a newer version of the SDK might very well be incompatible with and older version of the CLI, and vice-versa.

Copy link
Member Author

@jpower432 jpower432 Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see an SDK with well-defined interfaces as being useful to not only the CLI, but other projects. This provides a place to create reusable logic that is valuable outside of the context of the CLI. If the SDK has logic to support multiple projects interacting with OSCAL, I would think the CLI would have a different release cycle.

  • I agree the SDK would be tightly coupled to the OSCAL version
  • I agree the CLI should be leveraging a specific version of the SDK. For reusable functionality, the CLI can interact with the SDK through these defined interfaces (like the ProfileResolver).

I do not think the CLI would ever leverage the SDK and support different versions of OSCAL and agree that there would be compatibility issues there. If there is some wording in the proposal that indicates this is a use case, please let me know so I can change it.

This how I imagined the process:

  1. Create the release for SDK v1 (OSCAL 1.1.2)
  2. CLI at the current major version can migrate to SDK v1

Future Event - we are ready to move to OSCAL version 1.2.0

  1. SDK BREAKING CHANGE - SDK adopts OSCAL 1.2.0 and releases v2 for the CLI and other projects to leverage.
  2. CLI BREAKING CHANGE - Move to to SDK v2 to adopt OSCAL 1.2.0

Is this clarification helpful?


- What will the contribution process look like for logic in `trestle`?
- Will the SDK be maintained by the `compliance-trestle` maintainer or by a new set of maintainers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please share what you are looking for in terms of community support? This was shared at the last community meeting and what I perceived is that with some updates to the implementation and more understanding of the level of effort, this was supported by the group.

If there is community support for this proposal please move forward. I've been known to be wrong. I will keep quiet now.

Copy link
Contributor

@degenaro degenaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpower432 , thanks for your replies.

I said I will be quiet and I urge you to move forward with the community. I seem to be the odd duck.

I will restate my position for clarity:

  1. I think there is a lot of value in a clearly defined API
  2. I think there is a lot of value in a CLI that employs only the clearly defined API
  3. I think OSCAL-API-CLI are tightly coupled
  4. I think that trestle maintainers should choose OSCAL releases and make quantum leaps of OSCAL-API-CLI together
  5. I think that if the community (including API only users) needs a particular OSCAL version supported, that OSCAL-API-CLI should all advance together
  6. I think that OSCAL-API-CLI should remain co-located in the same repo for simplicity
  7. I think that customers who wish to use only the API part of trestle are free to ignore the CLI (which is small in size since it only uses the API)

Copy link
Member

@jflowers jflowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

giddy up!

@vikas-agarwal76
Copy link
Contributor

vikas-agarwal76 commented Sep 13, 2024

@jpower432 @degenaro @butler54 Here are my thoughts -

  1. I believe we all agree that there is merit in having a well defined API for the SDK and all other parts of trestle like CLI, plugins, transformers, etc. being updated to use that API to manipulate the OSCAL objects.
  2. What is not very clear at this point of time is the need for separating the SDK and CLI into 2 separate projects and the complexity of maintaining them and keeping in sync.

I feel we can divide this effort into 2 phases.

  1. In the first phase we can move ahead with 1. above where we clearly define the API for the SDK and restructure the code accordingly so that rest of the code- CLI, etc. make use of the API. This way our code would be ready to be split if required. This should not effect most of the current users of trestle as the CLI and Markdown commands usage will not change as this is only an internal restructuring. We can then make the API available for use by the community with proper documentation for it.
  2. In the second phase, based on the feedback from the community on whether there are use-cases where the community members are only using the SDK and not the CLI, the community can decide whether to split into 2 projects or not. That way we will have clear evidence of the requirement for a separate SDK before splitting. Also, we would be ready with most of the changes required for splitting.

So I propose lets create an Epic to start with 1. above to create a well defined API and restructure the code in the current project. Second phase can be decided after we complete this and have some feedback from the community.

@degenaro
Copy link
Contributor

@jpower432
Copy link
Member Author

@vikas-agarwal76 I am in favor of the proposed approach. I can update some of the language in proposal to reflect this. I rephrased my understanding of the approach below to ensure we are aligned before I make changes.

Phase 1: Restructure the code in the compliance-trestle package and split it into two published packages, compliance-trestle and compliance-trestle-sdk. This will not cause any breaking changed to the CLI user, but it will cause breaking changes current SDK users. Both packages will be managed in the same repository.
Phase 2: If the SDK begins to evolve independently or there is evidence that a single repository is not favored by the community, move it to another repository.

@vikas-agarwal76
Copy link
Contributor

@jpower432 Hi Jenn, do you intend to have 2 separate PyPI projects (published packages) for SDK and CLI from same git repo or just one package with clear separation between the 2? At least my thought was to keep them in same package (trestle) with proper structure (trestle.sdk, trestle.cli) in phase 1 and then separate out in phase 2 based on the feedback.

@jpower432
Copy link
Member Author

jpower432 commented Sep 23, 2024

Thank you for clarifying Phase 1 @vikas-agarwal76. Yes, my goal with this proposal was to have at least to two separate Python packages and document the considerations around that. That is not to say I disagree with starting with a single package as an implementation detail given that separating the package would require project infrastructure effort. However, I think our opinions differ on whether Phase 2 is optional. If the decision of the community is that Phase 2 is optional, I think we could consider this proposal to be rejected or on hold and the details of Phase 1 could be captured in compliance-trestle as this would affect the code structure of a single project.

Something I would like to understand going forward is what type of evidence would trigger Phase 2. I understand user feedback is a requirement, but there are other considerations that I outlined that could also contribute to the decision.

@jpower432
Copy link
Member Author

@vikas-agarwal76 I created oscal-compass/compliance-trestle#1713 to start capturing requirements for Phase 1. I will move this PR in a draft state until we get some feedback.

@jpower432 jpower432 marked this pull request as draft September 27, 2024 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants