🧪⚡ Load Testing & Discussion on Scalability and Performance #340

Open
ff137 opened this issue May 31, 2023 · 1 comment
Labels
question Further information is requested

Comments

Collaborator

ff137 commented May 31, 2023

  • How scalable is the system, and where are the bottlenecks?

    • If the cloud-api and the cloud-controller are in tip-top shape, then the bottlenecks should ultimately be on the agent side.
    • We should simplify and optimise the processes in cloud-api as much as possible, and perhaps build a second cloud-controller on a different framework. That way we can compare controller performance as well, and get a clearer idea of what performance improvements can be eked out.
    • It also becomes important to define benchmark metrics: what is an acceptable duration for onboarding, issuing, or verifying a credential, etc. "As fast as possible" is not a useful target because there will have to be trade-offs, so we must be able to say, e.g., "a minute is fine, but anything slower than 2 minutes will feel sluggish". That way we have a target to measure and scale against.
    • Can we predict where the main bottlenecks will be? At least the ones in our control: webhooks, trust registry, endorsing ... What can we start improving now to be prepared for scaling out?
  • What are the limitations of running a single multi-tenant instance, and how can this be scaled out with multiple instances? What are the unknowns, and what further development is required to achieve this? E.g. how can we ensure load balancing across instances?

    • How many instances will we need in order to provision wallets for, and handle transactions between, 100,000 agents (while meeting our benchmark goals)? What about 1 million, or 10 million?
  • What are the different scenarios we should cover for proper load testing? Create a million wallets, create a million schemas, issue a million credentials, verify a million credentials, and so on. We need a checklist of these categories to evaluate performance.

  • What is the fundamental bottleneck in the aries-cloud-agents? Can we a) improve it through open source contributions, or b) identify alternative agent providers that we would want to be able to swap in and out of our api?
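
To make the benchmark-metrics point above concrete, here is a minimal sketch of how measured durations could be checked against agreed targets. The names `Target`, `percentile`, `classify`, and `TARGETS` are hypothetical, the 60 s / 120 s thresholds are just the "a minute is fine, 2 minutes is sluggish" example from above, and using a nearest-rank 95th percentile is an assumption rather than an agreed metric.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Latency target for one operation, in seconds (placeholder values)."""
    acceptable_s: float  # at or below this is fine
    sluggish_s: float    # above this feels sluggish


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of durations."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def classify(samples, target, pct=95):
    """Compare the chosen percentile of the samples against the target."""
    value = percentile(samples, pct)
    if value <= target.acceptable_s:
        return "ok"
    if value <= target.sluggish_s:
        return "degraded"
    return "sluggish"


# Hypothetical target, taken from the example thresholds in this comment.
TARGETS = {"issue_credential": Target(acceptable_s=60, sluggish_s=120)}
```

Given a list of measured durations in seconds, `classify(durations, TARGETS["issue_credential"])` returns `"ok"`, `"degraded"`, or `"sluggish"`, which gives each load test scenario a pass/fail signal instead of a raw number.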

These are just some initial questions to spur further discussion, in the spirit of "start by getting things down on paper". They aren't trivial, so we should answer what we can now (defining our benchmark metrics, defining our test scenarios, etc.) and write down which other unknowns still need to be answered.
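
For the question of how many instances are needed at 100,000, 1 million, or 10 million agents, a back-of-envelope sketch could look like the following. It assumes throughput scales linearly with instance count, which is exactly one of the unknowns raised above, so treat it as a starting estimate only. The function name `instances_needed`, its parameters, and the default `headroom` are hypothetical; the per-instance capacity would have to come from real load test results.

```python
import math


def instances_needed(agents, requests_per_agent_per_s, instance_capacity_rps,
                     headroom=0.7):
    """Estimate the instance count for a given agent population.

    Assumes linear scaling: each instance serves `instance_capacity_rps`
    requests per second (a measured value) and should only be loaded to
    the `headroom` fraction of that capacity.
    """
    if instance_capacity_rps <= 0 or not 0 < headroom <= 1:
        raise ValueError("capacity must be positive and headroom in (0, 1]")
    total_rps = agents * requests_per_agent_per_s
    usable_rps = instance_capacity_rps * headroom
    # Always provision at least one instance, even for negligible load.
    return max(1, math.ceil(total_rps / usable_rps))
```

Calling it for each population size, with the request rate per agent and the per-instance capacity filled in from measurements, gives a first estimate to compare against what the load tests actually show.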

@ff137
Collaborator Author

ff137 commented Jan 16, 2024

@rblaine95 it may be worth tracking your load testing results here, unless they're already logged online elsewhere.

ff137 added the "question" label (Further information is requested) on Jan 16, 2024
ff137 changed the title from "Load Testing & Discussion on Scalability and Performance" to "🧪⚡ Load Testing & Discussion on Scalability and Performance" on Jul 24, 2024