[Observability] Permissionless demand observability #762

Open
6 tasks done
Olshansk opened this issue Aug 26, 2024 · 18 comments

Comments

@Olshansk
Member

Olshansk commented Aug 26, 2024

Objective

Baseline observability to support the effort here and the development efforts by @red-0ne related to permissionless

Origin Document

Goals

Deliverables

Update an existing dashboard (if it's available) or create a new one that contains the following:

  • Line charts showing:
    • Block size (in MB)
    • Num proofs per block
    • Total number of relays in network
  • Total number of relays in network
    • Double bar chart showing the average number of proofs per application AND per service
    • Relay mining difficulty per service (already have this but make it more accessible)
      Same chart needs to show what the multiplier looks like
  • Per (metered) Supplier charts
    • Double bar chart showing the cumulative number of relays serviced vs the cumulative number of estimated relays, per supplier

Non-goals / Non-deliverables

  • Building a new dashboard if one already exists

Creator: @Olshansk
Co-Owners: @okdas @red-0ne

@okdas
Member

okdas commented Aug 26, 2024

Going to extend this dashboard: https://grafana.poktroll.com/d/b799a130-3789-416d-aa7f-de5f4599cf03/network-overview?orgId=1

cc @Olshansk @red-0ne

@red-0ne
Contributor

red-0ne commented Sep 17, 2024

@okdas, @Olshansk I updated the deliverables to add Per (metered) Supplier charts.

Not all RelayMiners will be metered, but we can get these metrics from the RelayMiners we own.

@okdas
Member

okdas commented Sep 30, 2024

Changing some metric names per the Prometheus guide and adding new ones (including metrics that exist in cosmos-sdk modules but are missing in ours): #832

This will temporarily invalidate some dashboards on Dev/Test nets until we change the queries there.
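
As a rough illustration of the kind of renaming this involves, here is a minimal sketch that checks names against two widely documented Prometheus conventions: the allowed character set for metric names, and the `_total` suffix on counters. The function and the metric names used with it are hypothetical and are not taken from #832.

```python
import re

# Character set Prometheus accepts for metric names.
VALID_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def naming_issues(name: str, metric_type: str) -> list[str]:
    """Return a list of convention problems for one metric name (empty if none)."""
    issues = []
    if not VALID_NAME.match(name):
        issues.append("invalid characters")
    if name != name.lower():
        issues.append("contains uppercase")
    if metric_type == "counter" and not name.endswith("_total"):
        issues.append("counter should end in _total")
    return issues


# Hypothetical metric names, for illustration only.
print(naming_issues("example_relays_total", "counter"))
print(naming_issues("example_relays", "counter"))
print(naming_issues("example-Relays", "gauge"))
```

A check like this could run in CI to flag names before dashboards start depending on them.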

@okdas
Member

okdas commented Oct 4, 2024

Double bar chart showing the average number of proofs per application AND per service

I don't think we can expose this metric on full-node due to cardinality issues. I'll see if we can make it configurable/optional.

@Olshansk
Member Author

Olshansk commented Oct 4, 2024

@okdas

Changing some metric names pe

Do you mean the suffixes?

I don't think we can expose this metric on full-node due to cardinality issues

I realize this is a non-trivial metric. Curious if you can just explain/show the cardinality issue more?

@okdas
Member

okdas commented Oct 5, 2024

I realize this is a non-trivial metric. Could you explain or demonstrate the cardinality issue in more detail?

I've added a custom configuration that is disabled by default but can be enabled on smaller networks. It will soon be ready for review in #832. So this won't be a long-term issue anymore.

To provide context:
Imagine counting the number of relays on a full node based on on-chain data. The total on the network is easily calculated and represents a single time series.

If we break the total down by supplier (say, 5 suppliers), we get one time series per supplier: 5.
If we break the metric down by both supplier AND application (5 each), we now have 25 unique time series.
With thousands of suppliers and applications, the full node and telemetry infrastructure will struggle to handle this.
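
The multiplication described above can be sketched in a few lines. This is only an upper-bound estimate, assuming every combination of label values actually occurs; the label names and the "thousands" figures are illustrative assumptions, not measured network values.

```python
from math import prod


def series_count(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on time series for one metric: the product of the
    number of distinct values each label can take."""
    return prod(label_cardinalities.values())


print(series_count({}))                                   # no labels: a single series
print(series_count({"supplier": 5}))                      # one series per supplier
print(series_count({"supplier": 5, "application": 5}))    # every supplier/application pair
print(series_count({"supplier": 2000, "application": 3000}))
```

With the illustrative 2000 suppliers and 3000 applications, the bound is 6,000,000 series for a single metric, which is why the breakdown is kept behind an opt-in setting.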

Do you mean the suffixes?

Not just that, but names in general. We had a discussion on GitHub a long time ago, and I wanted to revisit it. Now seemed like a good time.

@okdas
Member

okdas commented Oct 21, 2024

After a quick conversation in #832, I want to check whether some of the data for the deliverables in this ticket can be imported into Grafana directly from the pocketdex db. We should be able to make it work; there just might be issues with JSON parsing.

@Olshansk
Member Author

@okdas Are you going to use the SQL or GraphQL endpoints? Asking just to understand what you're thinking.

cc @bryanchriswhite for visibility ☝️

@okdas
Member

okdas commented Oct 23, 2024

Are you going to use the SQL or GraphQL endpoints? Asking just to understand what you're thinking.

I think SQL is the better option because Grafana has an official PostgreSQL data source. There are community GraphQL data sources too, and I'll fall back to them if Postgres doesn't work for any reason.

LLMs write queries quickly and reliably now, I think it should be a good solution for what we need. Will check this out soon!

@bryanchriswhite
Contributor

bryanchriswhite commented Oct 24, 2024

... there just might be issues with JSON parsing.

@okdas, I'm not sure if any of the following context is helpful, but perhaps it adds useful detail:

Fields of indexed objects (blocks, txs, msgs, events, etc.) which contain JSON encoded objects should be stored in postgres using its json (or possibly jsonb, have to double-check) data type. The intent was to simplify indexer logic and be able to leverage postgres's JSON operators when querying.

@okdas
Member

okdas commented Oct 24, 2024

@bryanchriswhite I'd love to use JSON operators against jsonb columns. I assumed this was the case, but it appears the fields are actually text.

postgres=# \d messages
                  Table "localnet.messages"
     Column     |   Type    | Collation | Nullable | Default
----------------+-----------+-----------+----------+---------
 id             | text      |           | not null |
 type_url       | text      |           | not null |
 json           | text      |           |          |
 transaction_id | text      |           | not null |
 block_id       | text      |           | not null |
 _id            | uuid      |           | not null |
 _block_range   | int8range |           | not null |
Indexes:

I'll see if we can adjust that. I couldn't find where the migrations happen. Do we just let postgraphile parse the graphql schema and map it to postgres?

I've been using events successfully to extract metrics now. Events are not saved as JSON values, so that was easy. I think this should work just fine, but please let me know if any of you can think of a reason to use transactions instead.
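
To make the text-versus-jsonb point concrete, here is a minimal sketch of the client-side fallback: when the `json` column is plain text, each row has to be parsed after it is fetched instead of being filtered with JSON operators in the query. An in-memory SQLite table stands in for Postgres, and the `type_url` values and the `service_id` field are hypothetical; only the column names come from the `\d messages` output above. In Postgres itself, a text column can also be cast at query time (`"json"::jsonb`) to use operators such as `->>`, at the cost of parsing on every query.

```python
import json
import sqlite3
from collections import Counter

# SQLite stand-in for the Postgres "messages" table (subset of columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE messages (id TEXT NOT NULL, type_url TEXT NOT NULL, "json" TEXT)'
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        # Hypothetical rows; real payloads will have different fields.
        ("m1", "/example.MsgA", '{"service_id": "svc1"}'),
        ("m2", "/example.MsgA", '{"service_id": "svc2"}'),
        ("m3", "/example.MsgA", '{"service_id": "svc1"}'),
        ("m4", "/example.MsgB", None),
    ],
)


def count_by_field(db: sqlite3.Connection, field: str) -> Counter:
    """Parse the text column row by row and count values of one JSON field."""
    counts: Counter = Counter()
    for (raw,) in db.execute('SELECT "json" FROM messages WHERE "json" IS NOT NULL'):
        payload = json.loads(raw)
        if field in payload:
            counts[payload[field]] += 1
    return counts


print(count_by_field(conn, "service_id"))
```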

@bryanchriswhite
Contributor

bryanchriswhite commented Oct 25, 2024

@bryanchriswhite I'd love to use JSON operators against jsonb columns. I assumed this is the case, but it appears the fields are actually text.

I'll see if we can adjust that. I couldn't find where the migrations happen. Do we just let postgraphile parse the graphql schema and map it to postgres?

It's not PostGraphile that does this, but yes: to make a field use the JSON type in postgres, we need to define it as an entity/type with the @jsonField decorator, and then apply that entity/type to the desired field. See coin as an example.

@okdas
Member

okdas commented Oct 29, 2024

I had success integrating pocketdex into Grafana in #893. This should work for resolving the rest of the deliverables.

@Olshansk
Member Author

@okdas For DevNet/TestNet, would we need to add this to our deployment infra?

@okdas
Member

okdas commented Oct 29, 2024

@Olshansk yes - we are going to need to. Not sure if DevNets actually need it but yes for TestNet, 100%.

@Olshansk
Member Author

@okdas Can you please create a ticket? Just a screenshot of this convo should suffice.

@okdas
Member

okdas commented Oct 30, 2024

While I added multiple dashboards that have related metrics, I added a separate one to resolve deliverables in one place:
Image
Ignore the missing EMA/difficulty data: for now, it only updates when the value changes.

@Olshansk
Member Author

Appreciate the separation + update @okdas.

Before you head OOO, please link / state the instructions to reproduce. It could be as simple as:

  1. make localnet_up
  2. make test_e2e
  3. Go to localhost:8080

Trying it hands-on and trying to get intuition around the data and what's happening on-chain will be super valuable here.

okdas added a commit that referenced this issue Oct 31, 2024
## Summary

Refactor the foundation for E2E tokenomics observability w/ lots of new
data points.

Key changes include:
- `x/tokenomics` telemetry
- Begin/End blockers execution time management
- Custom `poktroll` telemetry config in `app.toml`

## Issue

- #762

---------

Co-authored-by: Daniel Olshansky <[email protected]>
Projects
Status: 🏗 In progress
Development

No branches or pull requests

4 participants