feat: Result set streaming #702
## Performance

The main target of result set streaming is to reduce memory consumption, but it is important that clock cycles are not wasted, as the ultimate metric is request throughput. When result set streaming was initially implemented, its performance was measured (results). For certain implementations the performance was unacceptable. This PR addresses the fundamental performance issues of that extreme outlier while optimizing the other implementations where possible.

### Results

These are the results from the test. The current standard is that all queries are processed in …. The variance of ….
## Result set streaming

Currently it is possible to stream `Readable` objects into `INSERT` queries. This enables fast mass data inserts, but it is often also required to serve large result sets for `download` or `excel export` requests. Currently the biggest restriction for these endpoints is the default `1000` result `limit`, as the default batch size for an `excel export` from `UI5` is set to `5000` rows. This means that CAP currently receives 5x the requests for a single `excel export`. The most important reason for the default `1000` limit is keeping the application from running `out of memory`. Therefore, result set streaming enforces a `highWaterMark` according to the `node` standard streaming implementations. While this does not fully prevent a result set stream from using more memory than the `highWaterMark`, it is a soft limit enforced on the stream and provides a balance between throughput and memory usage.
### Raw stream

Up till now CAP has always loaded all data into JavaScript objects and processed it inside the JavaScript layer. With raw result set streaming the `json` result provided by the database connection is never parsed and is kept in a raw `Buffer` format. This greatly improves memory and CPU usage. The big drawback is that the result cannot be manipulated inside the JavaScript layer. Therefore it is not possible to call `after` handlers on the result set. Additionally, the protocol adapter is required to handle the `Readable` and write it correctly onto the `res` body stream.
### Object stream

For the cases where it is required to modify the results using JavaScript, the Object stream can be used. Instead of loading the whole result as an `Array` into memory, the same results are passed through as single objects, allowing the protocol adapters to serialize them back to `JSON` and write them into the `res` body as they are processed. While this does not come with the CPU usage benefits of raw streams, there are still the memory usage benefits, which can also result in reduced response times, as `V8` has garbage collection optimizations for short-lived objects.
### Expand streams

When using Object streams it might be the case that the root result set is only a few rows, but each row has a large number of children, which would still all be loaded into memory and be counted as a single object towards the `highWaterMark`. To prevent this from happening, it is possible to apply a recursive streaming approach that handles all `expand` columns as Object streams as well.

Depending on the database connection protocol it might still be required to load all results into the `Readable` buffer, as the order of the root and child rows is related: reading all root entries would require loading all children into the buffer, because they are interlaced between the root entries in the result set.
## Usage examples

### Raw stream

The most common use case for raw result set streams are protocol adapters, where the final result has to be …
### Object stream

The most common place for Object stream usage is in custom handlers that need to modify the data or do additional calculations.
### Expand stream

This is an extension of the previous use case, but with child rows.
## PR Status

- `JSON` result stream