Replies: 3 comments 2 replies
-
I think we should be able to achieve this quite easily. Creating a proper schema for the data and building the events from data upon loading will be some work, but it's not hard. Sidenote: |
Beta Was this translation helpful? Give feedback.
-
I am for getting rid of JSON in DB. Thank you for clarification that this discussion/decision is not about ES. |
Beta Was this translation helpful? Give feedback.
-
TL;DRI support the idea of storing the DLC in a separate table, moving away from the large JSON blob in the Comments
I don't see the relationship between changing the storage format of the data associated with an event and being able to make changes to its form. Can you elaborate on that?
I disagree with the idea that we are storing unnecessary events (or data) in general. After culling old DLC data from certain events, we are storing a fairly minimal amount of data. The name and timestamp of each event serves as an event log, which is something that I think we don't want to give up as it has helped with debugging in the past. All repetitive events come with no data associated with them, except for Loading a large JSON blob should only happen once per CFD. Optimising that is fine, but I don't think it's fair to say that this is a substantial problem now.
The number of saved events should not change if we keep an event log. Perhaps we should ask ourselves: are we saving too many events that are unimportant? Can we save event names in a smaller format (e.g. using custom sqlite enums)?. But in any case, this should have little to do with changing the storage format of event data. We should now only have one large, relevant blob of JSON per CFD, no matter how many events are associated with it.
These points are debatable for me. SuggestionOne possible disadvantage/task that may have not been considered: we will have to rework/drop the |
Beta Was this translation helpful? Give feedback.
-
There is long discussion in #1867 around how we can leverage an empty event table for performance improvement.
This discussion in here is about the particular proposal of getting rid the JSON blob in the DB and normalize our tables for better performance and improved readability plus upgradability.
Why JSON
First, let's see why we introduced JSON for storing event data in the first place. While event sourcing does not force JSON on you, it seems like it's the default choice for most:
+
it's easy to implement+
there is native support for JSON in most database systems+
it's language independentI don't think we ever had a discussion around alternatives and went with the most obvious choice.
What's the problem with JSON?
-
It's big: a serialized JSON object is big-
It's slow: JSON serialization is slow (if you remember flamegraph results)-
It's not easily readable in the db neither: while in general human readable, a single column, with a gigantic (>800kb) datablob is not of use for anyone.-
We have different json objects in the same column (this is due to Event Sourcing) which does not make reading any easier.-
Naturally db design says that each type belongs in its own column . If you are interested look upfallacy of the Entity-Attribute-Value antipattern
.-
It's hard to upgrade and hinders progress: Not being able to change the events hinders progress because we cannot iterate on the data model as easily. At some places or data model is wrong and we are not able to fix it which slows down innovation and cycle time of new features (e.g. aggregated positions?). The general event sourcing approach is to just introduce a new event but this means we need to keep supporting the old event forever.Alternatives
Binary
A binary serialization format requires an additional step when you want to read from the db.
For us, this can be a real hassle because we like to read from the DB, in fact, there are active efforts to move to a hosted DB solution so that we can read from the database during runtime. Because of the last point, I rule binary as a choice out but for the sake of completion, here are some pros and cons
+
more storage efficient than JSON. Binary should be more compressed than JSON. I saw numbers claiming 60%-70% in size reduction+
more efficient when serializing and deserializing-
not human readableProposal
We are already breaking the golden rules of event sourcing with #2041 to improve efficiency.
However, even after doing so we have unnecessary many events in the db and loading JSON from this is super slow.
As an example, as of last week, we had 31998 events in the DB while only having 12 open positions.
This is simply redonkulous.
Even culling old CFD data on an already culled DB takes a couple of seconds!
Long story short, my proposal is to drop the JSON blob an replace it in favor of a normalized DB.
Disadvantages
-
While we might be able to move one event at the time into an own table, this endeavour is big and working on the same parts of the code base should be avoided to reduce the risk of heavy merge conflicts.-
I don't see a simple SQL migration script (Figure out how to actually leverage an empty events table for performance improvements #1867 (reply in thread)). We can opt for either breaking backwards compatibility or write a migration script in Rust.-
We are moving away from event sourcing-
We will have to rework/drop the cull_old_dlcs mechanism introduced recently, since the SQL queries are designed around JSON.Advantages
+
we are moving away from event sourcing :P+
using a relational db simplifies reading+
it is more efficient because we don't have to serialize back and forth from JSON+
it needs less storage+
it is more cost efficient because of the last point+
we will be able to get rid ofCfdAggregate
which split up the same logic over various code parts and has been the source of various bugs already+
the complexity of our model is reduced to the db module, i.e. when done correctly (and wanted) we have a single function to load CFDs from the DB.+
it is more efficient+
it is extendible, we can easier extend the model and add additional events, fieldsBeta Was this translation helpful? Give feedback.
All reactions