Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify properties #9

Merged
merged 3 commits into from
Oct 7, 2024
Merged

Simplify properties #9

merged 3 commits into from
Oct 7, 2024

Conversation

AntonOresten
Copy link
Member

@AntonOresten AntonOresten commented Sep 28, 2024

Closes #8

I did a little more than simplifying the properties, adding an atoms field to ProteinStructure, and removing the backbone field from ProteinChain. This is breaking and intended to be pushed with a version that easily serializes to HDF5.

Could use some more unit tests.

Copy link
Collaborator

@bicycle1885 bicycle1885 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Are you going to add the serializer to this PR?

@AntonOresten
Copy link
Member Author

Yes, here it is. I currently define ProteinChains.serialize and ProteinChains.deserialize which are meant work with collections of ProteinStructures, so there's currently no way to write single chains or structures. Is it worth having some ProteinDataset <: AbstractDict{String,ProteinStructure} type for easy indexing and unambiguous serialization IO, or are vectors good enough?

Copy link
Collaborator

@bicycle1885 bicycle1885 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the serialization and deserialization pair, which works nicely. One practical thing we'll need is partial/sequential deserialization from a serialized data file because the file would be too large to deserialize in one go. I assume some interface like the following:

data = Deserializer("data.h5")

# Sequential deserialization
for structure in data
    # Do something
end

# Partial deserialization
for id in some_id_list
    structure = data[id]  # Or multiple ids if it is more efficient
    # Do something
end

This can be done in a separated pull request.

@AntonOresten
Copy link
Member Author

Great idea with the lazy deserialization type. I might also try a corresponding "Serializer" with e.g. a Base.delete! method, as the current serialize mode keyword argument doesn't give a lot of control. Opening a PR soon!

@AntonOresten AntonOresten merged commit 4bb00a0 into main Oct 7, 2024
3 checks passed
@AntonOresten AntonOresten deleted the simplify-properties branch October 10, 2024 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplifying the chain type definition with named tuples
2 participants