Deciding on Some Changes to Code Organization #18
Replies: 5 comments 6 replies
-
+100 I love this! |
Beta Was this translation helpful? Give feedback.
-
Overall I agree, but I have few comments, but I must emphasize that those are just preferences based on similar pattern I see in other projects.
That is generally good idea, but I prefer to have one exceptions to this rule. A Crate that implements
I feel this might be a bit too excessive for lots of the cases. For instance, this directory. It only contains type definitions, and no logic. Everything thing in it is exported. Many of the types in here are few lines of code at most. I think a middle ground might be nicer, if a file describes data structures only with no (or every little) logic, we might want to bundle them together, so we would have
I didn't contribute any of the code there, but I prefer to keep fixing things there as we go and avoid having to rewrite it again from scratch.
I have no preference regarding using arrays vs readers and writers, But if you prefer readers/writers, did you considered bytes. It should be heavily tested and well optimized. |
Beta Was this translation helpful? Give feedback.
-
Most of the TPM types derive between TPMU/TPMT/TPM2B in order, so keeping those in the same module probably makes sense to avoid extra flipping back and forth when reading/writing. e.g. consider TPM2B_DIGEST (in the 'C' version of TpmTypes.h):
This suggests to me a tpmt-digest.rs meaning where |
Beta Was this translation helpful? Give feedback.
-
Hey All! I opened a poll for us to decide on the direction here (since most of the discussion has been about how to organize the submodules). Please respond there! Poll: #19 Aside from this I believe I have responded to the other concerns.
I will let this discussion stay open throughout the rest of the week! So if you want your voice heard, or disagree with anything I said please respond before then! |
Beta Was this translation helpful? Give feedback.
-
Alright, I'm guessing we're at the end of this discussion at this point. Note that the poll (#19) for how we will split up the specification base types will still be open until the middle of the week (Wednesday ~3PM PST) - if you care to please let your voice be heard! I will note that this will not be the only discussion about codestyle, layout, norms, etc that we will have. There's more to discuss, I just wanted to limit it to a few topics for the first discussion to break it up a bit. Next Steps:
After (3), I will close this discussion to keep the open discussion list clean. I'm trying to become a maintainer (PR) so that I can clean up what is already there, too. Thanks to those of you who contributed to this discussion! |
Beta Was this translation helpful? Give feedback.
-
Hello Everyone,
Nice to meet you all, my name is Trent! I work at Google on various security related things, and I'll be spending a lot of time on TPM-RS in the coming months. Emily has transitioned her leadership of the project to me, and as such I've been spending the past few months wrapping up my other work so that I can focus on TPM-RS.
This discussion isn't just to introduce myself though, I've been busy taking an inventory of this project and the direction it's heading. Over the past quarter I've learned a lot about TPMs by reading the specifications and introducing myself to the codebase. And over the past few weeks of July I've started pulling at threads relating to questions I had about the current code.
I've stepped away with a few thoughts I'd like to share on the direction I think this project should head. I'm hoping to get a gut check from others on this project if this is okay and if so we can start making some of these changes.
A Strong Foundation
I see that we're already adding some commands to the repo, and that's great - BUT I don't feel like the foundational elements of the code are very well-defined yet. I'd like for our focus to stay away from adding new commands for the time being, and towards cleaning up what we have (and adding tests, I think there's a lot of testing lacking).
I have tried a lot (but not all) of the things that I'm about to mention in my own fork. I don't intend on doing anything with that fork repo, it's just for demonstration, but if you want you can peruse it to see how some of my suggestions would pan out: https://github.com/trent-reed/tpm-rs/tree/dev/trent-reed/restructuring
Code Structure
Avoid Nested Crates
Rust has a flat crate structure, I think having crates defined within crates for the structure of the repo is kind of confusing. It makes it hard for newcomers to see what crates this project even owns. Not only that but it lacks clear delineation from one crate to the other, which feels to me like an anti-pattern.
I suggest that we bundle all the crates by their full crate name in a single directory (whether or not that's the root directory or a subdirectory named
crates
makes no difference to me, though since we want documentation to travel with the code I'm not sure we need any other directories; my first suggestion would be the root of the repo).Example: https://github.com/trent-reed/tpm-rs/tree/dev/trent-reed/restructuring
Each Crate Must Have a README
This is for newbies to the project to feel more welcome. I'll admit, I had a very, very hard time understanding where everything was and what everything was for. A simple readme in the root of a crate can help with that (as well as these other code organization strategies; looking at one big
lib.rs
is daunting and difficult to make sense of IMO).Example: https://github.com/trent-reed/tpm-rs/tree/dev/trent-reed/restructuring/tpm2-rs-marshal
Put Code in Separate Files
Today, most of the code that we have is in one single file (
lib.rs
or equivalent).Rust has a fairly comprehensive module system that allows us to move code around. There are two aspects to this module system: What the dev sees, and what the client (using the library) sees. We are able to provide additional segmentation of source files without introducing segmentation of access to types for the client.
This is called "re-exporting":
We should use re-exporting to segment our code for better type isolation, and to allow for an easier time developing and maintaining.
Example: https://github.com/trent-reed/tpm-rs/tree/dev/trent-reed/restructuring/tpm2-rs-base/src/types
Justification
This isn't just for code organization, it's also for controlling access to functions and data. The module system and the visibility system are closely related to one-another. Any type defined in a module file can access the private functions or data of other types within the same module. Similarly, submodules can access the private contents of their parents.
This leads me to my next point...
(Most) Types Should Exist In Their Own Modules
This is for a few reasons, one is better organization, the other is better control over visibility (as stated above). There can be instances where some types need helper types, or the types have very simple definitions. So this is less of a hard restriction and more like a good starting place.
If there is a closely-related type defined in the same module and it starts to become complicated (lots of impls, tests, etc), then we should consider splitting it into a submodule. Submodules are a natural way to extend functionality and limit access to the child types while still having private access from the child to the parent.
Only Modules We Want the Client To See Should be Pub
If you want to introduce a module that the client must access directly, you simply do not (and should not) re-export the types from the module. You should just mark the module as
pub
:Example: https://github.com/trent-reed/tpm-rs/blob/dev/trent-reed/restructuring/tpm2-rs-base/src/lib.rs
Large Tests Should Be Written in a Separate File
Rust has no such requirement that tests have to be defined in the same file as the implementation, and I don't see a point to doing it. It just complicates things by making the implementation messier. If the tests are small, or there are only a few of them and the implementation of the type is simple, maybe it's fine to define the tests in the same file - but in general err towards creating a new file for tests.
Example: https://github.com/trent-reed/tpm-rs/blob/dev/trent-reed/restructuring/tpm2-rs-client/src/client/tests.rs
Avoid
use super::*
Outside of Tests (But We Can Temporarily Use it While Breaking Up the Codebase)I think that for a first step towards splitting the repo into more reasonable submodules may involve using
use super::*
. This is roughly equivalent to the current setup of everything being inlib.rs
without taking a ton of time to restructure each file to re-state all of ouruse
statements. We should not do this in the future, but we can use it to start breaking up the code and then occasionally take cleanup PRs to slowly remove this later.As stated, it's reasonable for tests, because you often want to
use
all of the things in the parent module that you want to test over. Adding explicit using statements there can be kind of annoying (it's fine if you want to though, I think; I just don't think we need to require it here).I only mention it because my example restructuring uses it heavily.
Likewise, Probably Worth Avoiding Custom Preludes
You can simulate a custom prelude via a combination of
use super::*
andpub(crate) use custom_prelude::*
. You'll see I do this in my example restructuring of the project. I don't actually think this is a good thing, but again like mentioned, it may be a good first step to help us break the code up without causing a lot of work for whoever splits things up.https://github.com/trent-reed/tpm-rs/blob/dev/trent-reed/restructuring/tpm2-rs-base/src/prelude.rs
I only mention it because my example restructuring uses it heavily.
Split Logic Into Separate Modules As Well
Just because we don't define different types, doesn't mean that we can't define different modules to contain different logic. If we have some function that calls other functions and the codepath is obviously separatable, then it may be reasonable to do this. Only do this when it aids in readability and/or code separation.
One such example is in the marshalable proc-macro. There's two different implementations basically - enum and struct, so it's a perfect place to split them up. This allows it to be much easier to find the code you want. And we can still share things by using other common modules as well - if needed.
You can grant access to specific functions by marking them as
pub(super)
in the child module. This allows only the parent to access the type/function, so you can control visibility.Example: https://github.com/trent-reed/tpm-rs/blob/dev/trent-reed/restructuring/tpm2-rs-marshal-derive/src/marshalable.rs
Functional Changes
tpm2-rs-base
Remove
#[repr(C)]
from all of the types.I think the purpose of this is to allow/enable
zerocopy
, but franklyzerocopy
isn't actually being used from what I can tell. As such, defining this everywhere is currently not accomplishing anything. We can talk aboutzerocopy
support in the future, but I think that currently it's practically not doing anything for this code. So there's no point in having attributes which don't accomplish anything.Example: You can see this all over the sample repo, I removed these most everywhere.
tpm2-rs-errors
So, TpmError and TssError need some refactoring. I have an initial pass solution on this. Frankly I'm not sold on my rewrite, but I think it's a bit nicer than what we have today at least. I also think we can avoid
macro_rules
(though untested, so I may have messed up some of the logic, but it is just supposed to show the direction of the code), and if/when we can we should.I also think that
TssTspError
is the right error type for marshaling, notTpmError
(TPM error can only be reported from the TPM directly - not from us unmarshaling/marshaling something).Example (TPM Error): https://github.com/trent-reed/tpm-rs/blob/dev/trent-reed/restructuring/tpm2-rs-errors/src/tpm/tpm_error.rs
Example (TSS Error): https://github.com/trent-reed/tpm-rs/blob/dev/trent-reed/restructuring/tpm2-rs-errors/src/tss/tss_error.rs
The TPM error constants thus can look like this: https://github.com/trent-reed/tpm-rs/blob/dev/trent-reed/restructuring/tpm2-rs-errors/src/tpm/constants.rs
I think in the future it's easy to add support for mixing-in the associations with only the supporting error formats then. (Not shown in my example code, but you could imagine a function on
TpmFmt1Error
calledwith_association(association: TpmAssociation)
). This separation of the types also allows us to limit things, so that format0 errors can't mix in an association (there is not even a function for it).tpm2-rs-client
Frankly, I'm still thinking about how I would structure this.
I have some suggestions in the example repo, but I'm not sold on it. I think I'll make more concrete suggestions that take other things (like handles and sessions) into account later. I have some ideas, just haven't had time to explore them.
tpm2-rs-client-feature / tpm2-rs-service
I think we should avoid making any changes in these crates until we have the other crates nailed-down. If any of our changes break these crates, I'd just delete the code.
tpm2-rs-marshal(-derive)
I think we need to have a more structured type for writing, other than passing around a simple mutable buffer. That way the error handling can be contained in the type (kind of like the
UnmarshalBuf
but for marshaling), and we don't have to do thiswritten += ...;
stuff everywhere. Instead we should just have sequential readers and writers (that might be a good name for them too, since that's what they do).Basically, let's not force the code to pass around byte offsets and raw mutable buffers - abstract that into a helper type. Then we can be more sure error handling is happening as expected (frankly it's a little unclear to me if the current marshalable can panic or not if an ill-sized buffer is passed somewhere).
I also think we can improve this to account for cases that we currently hardcode instead of using the derive macro. I suggest we add an attribute
#[marshal(discriminant=<field>)]
that will use a separately-named field for the discriminant of an enum (this will allow us to get rid of the manual definitions in some of the types).Still thinking about the other cases, but I'm fairly convinced of the above. We should try to avoid implementing this manually when there's a path for implementing things in the proc-macro.
One other thing - there is a dependency on some internal functions defined by the proc-macro from outside of the proc-macro - this is not great. For now we can fix some of these by making them additional specific trait implementations:
But I think we can also update the marshalable to hide things in generated private submodules to avoid this happening again in the future. (This was found when moving code to separate modules.)
tpm2-rs-unionify(-derive)
I think that the buffer size calculations are closely related to
Marshalable
, so we should make those a part of theMarshalable
trait instead of having a separateUnionify
trait. (This will allow us to entirely remove the unionify and unionify-derive crates.)If everyone is okay with all this, I'd be happy to start sending some small PRs to clean things up! If not, let's discuss it here and come to a consensus on a direction. I think the current code layout is very difficult to maintain, so I want to deal with that before we do anything else.
Beta Was this translation helpful? Give feedback.
All reactions