Excessive Has()/GetSize() calls, caching? #113

Closed
ShadowJonathan opened this issue Aug 5, 2020 · 7 comments
Labels
need/triage Needs initial labeling and prioritization

Comments

@ShadowJonathan
Contributor

I'm currently having a problem using this plugin with Backblaze B2 storage, which exposes an S3 API. B2 charges for "Class B" transactions, and HeadObject falls under that class; HeadObject is the call that GetSize() issues, which in turn is used by Has().
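
For context, here is a rough, illustrative sketch of that call chain, assuming the aws-sdk-go S3 client; the type and method names are mine, not the plugin's actual code:

```go
package s3ds

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

type datastore struct {
	client *s3.S3
	bucket string
}

// GetSize issues a HeadObject request, which B2 bills as a Class B transaction.
func (d *datastore) GetSize(key string) (int, error) {
	out, err := d.client.HeadObject(&s3.HeadObjectInput{
		Bucket: aws.String(d.bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		return -1, err
	}
	return int(aws.Int64Value(out.ContentLength)), nil
}

// Has answers a bitswap availability check by probing the object's size,
// so every Has() costs one HeadObject round trip.
func (d *datastore) Has(key string) (bool, error) {
	if _, err := d.GetSize(key); err != nil {
		return false, nil // simplification: real code distinguishes "not found" from other errors
	}
	return true, nil
}
```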

As far as I know, Has() is called every time a bitswap request comes in over the network, to check whether the block can be provided to the other node. At the rate this happens, the transactions build up quickly: I'm looking at a 300K transaction count after bringing this node up for just 5 hours.

That in turn translates to significant extra monthly costs. Would it be possible to "cache" the keys available in the S3 bucket? For example, perform a ListObjects call every minute or so and update a local "key list" that every Has() call is validated against (making no calls to the S3 API); a Put() would then automatically add that key to the list.
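
A minimal sketch of that key-list idea, with illustrative names only, again assuming the aws-sdk-go client; the refresh interval and error handling are placeholders:

```go
package s3cache

import (
	"sync"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

type keyListCache struct {
	mu     sync.RWMutex
	keys   map[string]struct{}
	client *s3.S3
	bucket string
}

// refresh replaces the cached key set with a fresh bucket listing. One listing
// pass per interval replaces many thousands of per-block HeadObject calls.
func (c *keyListCache) refresh() error {
	keys := make(map[string]struct{})
	err := c.client.ListObjectsV2Pages(&s3.ListObjectsV2Input{
		Bucket: aws.String(c.bucket),
	}, func(page *s3.ListObjectsV2Output, lastPage bool) bool {
		for _, obj := range page.Contents {
			keys[aws.StringValue(obj.Key)] = struct{}{}
		}
		return true
	})
	if err != nil {
		return err
	}
	c.mu.Lock()
	c.keys = keys
	c.mu.Unlock()
	return nil
}

// Has answers from the in-memory set, so no S3 request is made per bitswap query.
func (c *keyListCache) Has(key string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	_, ok := c.keys[key]
	return ok
}

// recordPut marks a freshly written key as present without waiting for the next refresh.
func (c *keyListCache) recordPut(key string) {
	c.mu.Lock()
	c.keys[key] = struct{}{}
	c.mu.Unlock()
}

// runRefreshLoop re-lists the bucket on the given interval (e.g. time.Minute).
func (c *keyListCache) runRefreshLoop(interval time.Duration) {
	for range time.Tick(interval) {
		_ = c.refresh() // errors could be logged; slightly stale data is still usable
	}
}
```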

@ShadowJonathan ShadowJonathan added the need/triage Needs initial labeling and prioritization label Aug 5, 2020
@welcome

welcome bot commented Aug 5, 2020

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

@ShadowJonathan
Contributor Author

This "Key List" could possibly also just exist in memory, though that could become problematic if simply too many keys exist in the s3 bucket (which is very possible, as these buckets can become quite large)

@ShadowJonathan ShadowJonathan changed the title Excessive Has() calls, caching? Excessive Has()/GetSize() calls, caching? Aug 5, 2020
@ShadowJonathan
Contributor Author

I discovered, by profiling in /debug/metrics/, that GetSize() is receiving a sizable number of calls as well; maybe some caching could help there too.

@aschmahmann
Contributor

@MichaelMure any thoughts?

@Stebalien
Member

@ShadowJonathan the right way to deal with this is to use the bloom filter cache: https://github.com/ipfs/go-ipfs/blob/master/docs/config.md#datastorebloomfiltersize. However, this will list every block you have on start.
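
For reference, enabling that filter is a one-line config change; the value is in bytes, 0 (the default) leaves it disabled, and the 1 MiB used here is just an example size:

```sh
ipfs config --json Datastore.BloomFilterSize 1048576
# restart the daemon afterwards; the filter is populated on start
```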

@Stebalien
Member

There's also an ARC cache to remember whether or not we have blocks (and their sizes). Unfortunately, the size of that cache is not tunable. See ipfs/go-ipfs-config#41 for some WIP work there.

In general, modifying this datastore directly is not the right approach.
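
For illustration, this is roughly how a blockstore gets wrapped with that caching layer via go-ipfs-blockstore; the sizes shown are the library defaults and only examples, and go-ipfs wires this up internally, so treat it as a sketch rather than a supported tuning knob:

```go
package blockcache

import (
	"context"

	ds "github.com/ipfs/go-datastore"
	blockstore "github.com/ipfs/go-ipfs-blockstore"
)

// newCachedBlockstore wraps a datastore-backed blockstore with a bloom filter
// plus an ARC cache that remembers block presence and sizes, so repeated
// Has()/GetSize() hits are answered without touching the underlying datastore.
func newCachedBlockstore(ctx context.Context, d ds.Batching) (blockstore.Blockstore, error) {
	bs := blockstore.NewBlockstore(d)

	opts := blockstore.DefaultCacheOpts()
	opts.HasBloomFilterSize = 512 << 10 // bytes; 0 would disable the bloom filter
	opts.HasARCCacheSize = 64 << 10     // number of cached presence/size entries

	return blockstore.CachedBlockstore(ctx, bs, opts)
}
```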

@Stebalien
Member

Note: I'm happy to re-open this if you disagree. I just don't want to leave it around given how old it is.
