Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support using an integer as the class/case discriminator in polymorphic serialization #2587

Open
wants to merge 12 commits into
base: dev
Choose a base branch
from

Conversation

ShreckYe
Copy link
Contributor

@ShreckYe ShreckYe commented Mar 3, 2024

As for now, this library serializes the class/type of a subclass using a serialName in polymorphic serialization, which can be overridden with the @SerialName annotation. I'd like to propose supporting using a number and requiring subclasses to use numbers in polymorphic serialization, as it shortens the size of the serialized message greatly, especially in a binary format such as Protobuf.

When choosing to use a binary format such as Protobuf, one of the biggest concerns is shortening the size of serialized messages. In such a case serializing a fully qualified name of a subclass by default doesn't make much sense, as the binary format such as Protobuf compared to a readable string format as JSON reduces the serialized size of other fields by more than 50%, while a class discriminator takes the same size and takes up the majority of the message size, which defies the whole point. An alternative might be to override the serial name with a short name or a single letter, but IMO, still, the message is longer and it's less elegant than using a number.

…rs, and serializers modules for serializing type numbers in polymorphic serialization without any support from the serialization plugin yet
…edSerializer` that causes the tests to break
Some miscellaneous changes:
1. Add a `getSerialPolymorphicNumberByBaseClass` function in `SerialDescriptor` to throw the appropriate exception.
1. Add `defaultDeserializerForNumber` which was missing in `PolymorphicModuleBuilder`.
…ion, and update an annotation property name

The corresponding commit: huanshankeji/kotlin@9860724
…Class` into `PluginGeneratedSerialDescriptor` where it belongs and add some more tests in `SerialPolymorphicNumberTest` (which fail for now)
…phicNumber` with `@SerialInfo` to be stored in `SerialDescriptor.annotations` and refactor related code

Main changes:
1. Extract common lazy properties in `CommonSerialDescriptor` and make both `PluginGeneratedSerialDescriptor` and `SerialDescriptorImpl` inherit it.
1. Support serial polymorphic numbers with JSON's custom implementations in `AbstractJsonTreeEncoder`, `StreamingJsonEncoder`, and `DynamicObjectEncoder` while adapting the `encodePolymorphically` and `decodeSerializableValuePolymorphic` functions.
1. Revert gradle.properties since there is no need to update the compiler plugin anymore.
1. Make all tests pass in `SerialPolymorphicNumberTest` and copy it into the Protobuf module and adapt.
…ents and reverting some unnecessary changes
@ShreckYe
Copy link
Contributor Author

ShreckYe commented Mar 3, 2024

More details to discuss

Current API design

The current design is like this:

@Serializable
@UseSerialPolymorphicNumbers
sealed class Sealed {
    @Serializable
    @SerialPolymorphicNumber(Sealed::class, 1)
    object Case : Sealed()
}
Json.encodeToString<Sealed>(Sealed.Case)
Json.decodeFromString<Sealed>("""{"type":1}""")

One marks the classes with @UseSerialPolymorphicNumbers and @SerialPolymorphicNumber, when the message is serialized it will be {"type":1} in JSON, for example.

Alternatives of implementation details and designs

The implemented code is a prototype. Here are improvements and alternatives to be discussed.

Alternative names for "serial polymorphic number"

There can be alternative names for the word "serial polymorphic number". For example, these are what I can think of:

  1. (serial) discriminator number
  2. polymorphic number
  3. polymorphic serial number
  4. (serial) subclass number
  5. (serial) case number

Functions

There are 2 properties useSerialPolymorphicNumbers and serialPolymorphicNumberByBaseClass and a function getSerialPolymorphicNumberByBaseClass which depend on annotations added to the SerialDescriptor interface, which are also overridden in CommonSerialDescriptor for lazy caching. If this brings about unintended breaking changes to the existing core API, I can refactor them to be extension functions in a Polymorphic.kt file like the one in kotlinx.serialization.json.internal.

The added function names may have better alternatives too.

Some more possible unimplemented APIs

An independent version of the @SerialPolymorphicNumber annotation

We can also add a version of the @SerialPolymorphicNumber annotation which is independent of the base class:

@Serializable
@UseSerialPolymorphicNumbers
sealed class Sealed {
    @Serializable
    @IndependentSerialPolymorphicNumber(1)
    object Case : Sealed()
}

And it shall be overridden by the dependent ones if specified.

Complements to the annotations in the SerializersModule DSL

In addition to the annotations, there can be completion APIs to set them in the SerializersModule DSL, to ensure this requirement for all data, or for external classes which can't be marked with annotations. So we can add the following APIs:

  1. allUseSerialPolymorphicNumbers property in SerializersModuleBuilder to require all polymorphic serialization data to use numbers, in case there are missing ones.
  2. useSerialPolymorphicNumbers property in PolymorphicModuleBuilder as a complement to @UseSerialPolymorphicNumbers, which can also override the @UseSerialPolymorphicNumbers annotation and allUseSerialPolymorphicNumbers.
  3. An additional serialPolymorphicNumber argument in PolymorphicModuleBuilder.subclass as a complement to @SerialPolymorphicNumber, which can override @SerialPolymorphicNumber annotation.

Example:

SerializersModule {
    allUseSerialPolymorphicNumbers = true
    polymorphic(Project::class) {
        useSerialPolymorphicNumbers = true
        subclass(OwnedProject::class, serialPolymorphicNumber = 1)
        defaultDeserializer { BasicProject.serializer() }
    }
}

Other uncompleted tasks

There are some KDocs I left empty with TODO since the APIs are not finalized yet.

The mardown docs in Kotlin Serialization Guide are also not updated yet.

@ShreckYe ShreckYe changed the base branch from master to dev March 3, 2024 18:53
@pdvrieze
Copy link
Contributor

pdvrieze commented Mar 4, 2024

There is a lot of added complexity and special casing in this code. The same outcome could be achieved with a custom implementation of a polymorphic serializer (that maps between numbers and serial names), that can be marked as serializer for the base type. There are then only a few additions needed from serialization itself:

  • A nicer way to enumerate all subtypes/subserializers for a base type (this is possible now, but not quite elegant)
  • Some way for the serialization plugin to enforce (or warn about) requirements on subtypes (such as the need to specify an annotation).

Note also that your numbers system is a challenge as the numbers need to be unique (possibly within the entire program) if they are to completely replace serial names (on types).

@recursive-rat4
Copy link
Contributor

Alternative names for "serial polymorphic number"

Number is ambiguous here, I typically use UByte.

@ShreckYe
Copy link
Contributor Author

ShreckYe commented Mar 5, 2024

@pdvrieze Thanks for the reply.

The same outcome could be achieved with a custom implementation of a polymorphic serializer (that maps between numbers and serial names), that can be marked as serializer for the base type.

This is indeed the approach I am currently adopting in my projects, where I need to specify two-way KClass Int conversions. However, I think it's more verbose and separates the discriminator numbers from the annotation configurations (like @ProtoNumber). So I think supporting this with annotations still looks more elegant and provides a better separation of concerns.

  • A nicer way to enumerate all subtypes/subserializers for a base type (this is possible now, but not quite elegant)

This looks good, but I think it doesn't help that much if such information is only available at runtime and such exhaustiveness checks are not available at compile time.

  • Some way for the serialization plugin to enforce (or warn about) requirements on subtypes (such as the need to specify an annotation).

Yeah this looks like a general and elegant way.

Note also that your numbers system is a challenge as the numbers need to be unique (possibly within the entire program) if they are to completely replace serial names (on types).

In the current implementation, the numbers are only unique in the scope of one certain base class (which is necessary). That's why a baseClass argument needs to be passed to @SerialPolymorphicNumber.

If you mean in the scope of one base class, the number of a subclass should be different in different places, I think we can support it in the SerializersModule DSL as mentioned in the "Some more possible unimplemented APIs" section above. As a similar example, @SerialName is also unique in this way.

@ShreckYe
Copy link
Contributor Author

ShreckYe commented Mar 5, 2024

@recursive-rat4 Yeah perhaps "integer" is a better word. I still think using an Int or a UInt is a better idea here. Though it's unlikely that a parent class has more than 256 subclasses, using an Int/UInt covers such cases and is as fast as using a Byte/UByte because of our 32-bit/64-bit CPU architectures.

@ShreckYe ShreckYe changed the title Support using a number as the class/case discriminator in polymorphic serialization Support using an integer as the class/case discriminator in polymorphic serialization Mar 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants