Use Lazy for expensive properties in System Dataset #550
In my tests with repeated creation (clones) of Faker instances, creating this dataset was by far the most expensive step. Making it lazy should alleviate a lot of GC pressure until the Dataset is actually used.
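Purely for illustration, here's roughly the pattern the change boils down to. The type and property names below are made up for the sketch and are not the actual diff:

```csharp
using System;

// Illustrative stand-in for an expensive-to-build dataset; not the library's actual type.
public sealed class ExpensiveDataset
{
    public static ExpensiveDataset Load()
    {
        // Imagine large array/string allocations happening here.
        return new ExpensiveDataset();
    }
}

public sealed class SystemDatasetSketch
{
    // Before: built eagerly on construction, so every Faker clone pays the cost
    // even if the dataset is never read.
    // public ExpensiveDataset Data { get; } = ExpensiveDataset.Load();

    // After: wrapped in Lazy<T>, so the cost (and the allocations) only happen
    // the first time the property is actually accessed.
    private readonly Lazy<ExpensiveDataset> _data =
        new Lazy<ExpensiveDataset>(ExpensiveDataset.Load);

    public ExpensiveDataset Data => _data.Value;
}
```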
For my benchmark I ran 100 generators in parallel, each of which creates ~100 Fakers. The Fakers are cloned from a static base config. A generator holds all the configs needed to create the data for our app, and then generates that data.
This should also be reproducible with a simpler setup, which would probably drop the lazy version's allocations even lower, since far less data would actually be generated from the dataset.
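For reference, a rough sketch of the benchmark shape described above. The delegate names are assumptions standing in for our app-specific clone and generation code:

```csharp
using System;
using System.Threading.Tasks;

// Rough shape of the benchmark: 100 parallel generators, each cloning ~100 Fakers
// from a static base config and generating data from them.
public static class CloneBenchmarkSketch
{
    public static void Run(Func<object> cloneBaseConfig, Action<object> generateAppData)
    {
        Parallel.For(0, 100, _ =>               // ~100 generators in parallel
        {
            for (var i = 0; i < 100; i++)       // each creating ~100 Fakers
            {
                var faker = cloneBaseConfig();  // clone from the static base config
                generateAppData(faker);         // generate the app's data from it
            }
        });
    }
}
```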
Is there anything that speaks against this change? It speeds up data generation a lot in cases like mine without sacrificing functionality, and all tests passed without issues.