This module is used to collect telemetry data from the node.
We use two types of metrics:
- Time-series metrics: metrics collected at regular intervals and stored in a database.
- Event-series metrics: event-driven (i.e., not time-driven) metrics that are stored in a database.
- INVESTIGATE(team): Replacing logs with a proper events solution for recording Event Metrics.
We collect them by leveraging:
- Prometheus: for time-series metrics
- Plain logs: for event metrics
- INVESTIGATE(team): Might be substituted in the future with an events database.
You must provide a telemetry configuration to your node in config.json:
"enable_telemetry": true,
"telemetry": {
  "address": "0.0.0.0:9000",
  "endpoint": "/metrics"
}
- enable_telemetry: if true, configures the node to expose telemetry.
- address: the address on which the telemetry module's Prometheus server listens.
- endpoint: the path the telemetry module exposes so that Prometheus (or other services) can pull the metrics, usually referred to as the scraping endpoint.
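To make the three keys concrete, here is a minimal sketch of decoding them with Go's encoding/json. The Go types NodeConfig and TelemetryConfig and the helper parseTelemetryConfig are hypothetical; only the JSON keys come from the configuration above, and the node's real config loader may look quite different.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TelemetryConfig mirrors the "telemetry" object (hypothetical type; JSON keys from the README).
type TelemetryConfig struct {
	Address  string `json:"address"`
	Endpoint string `json:"endpoint"`
}

// NodeConfig holds only the telemetry-related keys of config.json (hypothetical type).
type NodeConfig struct {
	EnableTelemetry bool            `json:"enable_telemetry"`
	Telemetry       TelemetryConfig `json:"telemetry"`
}

// parseTelemetryConfig decodes the telemetry part of a config.json payload.
// encoding/json ignores unknown keys, so a full config.json decodes as well.
func parseTelemetryConfig(raw []byte) (NodeConfig, error) {
	var cfg NodeConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return NodeConfig{}, err
	}
	return cfg, nil
}

const sampleConfig = `{
  "enable_telemetry": true,
  "telemetry": {
    "address": "0.0.0.0:9000",
    "endpoint": "/metrics"
  }
}`

func main() {
	cfg, err := parseTelemetryConfig([]byte(sampleConfig))
	if err != nil {
		panic(err)
	}
	if !cfg.EnableTelemetry {
		fmt.Println("telemetry disabled")
		return
	}
	// The module listens on Address and serves metrics at Endpoint.
	fmt.Printf("serving metrics on %s at %s\n", cfg.Telemetry.Address, cfg.Telemetry.Endpoint)
}
```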
If you are not familiar with the time-series concepts related to Prometheus, you can review Prometheus Metrics.
Primarily, we use Gauges to keep track of:
- Block height
- Number of nodes online
In your module, make sure you have access to the bus, then use the metrics you need as follows:
// Retrieve the time-series agent:
timeSeriesTelemetry := module.GetBus().GetTelemetryModule().GetTimeSeriesAgent()
/* explore the methods you can use in shared/modules/telemetry_module.go */
// Increment a gauge by:
timeSeriesTelemetry.GaugeIncrement("gauge_name", 1)
/* ... */
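Module code that only needs to bump a gauge can depend on a narrow interface, which makes it easy to unit-test without a running Prometheus endpoint. In this sketch gaugeIncrementer, fakeAgent and recordCommittedBlock are hypothetical; the interface mirrors only the GaugeIncrement call shape shown above, and the real time-series agent interface in shared/modules/telemetry_module.go may differ (for example in the type of the second argument).

```go
package main

import "fmt"

// gaugeIncrementer is a hypothetical, narrowed interface mirroring only the
// GaugeIncrement call shown in the README. Check shared/modules/telemetry_module.go
// for the actual time-series agent interface.
type gaugeIncrementer interface {
	GaugeIncrement(name string, delta float64)
}

// fakeAgent records gauge updates in memory for unit tests.
type fakeAgent struct {
	gauges map[string]float64
}

// Compile-time check that fakeAgent satisfies the interface.
var _ gaugeIncrementer = (*fakeAgent)(nil)

func newFakeAgent() *fakeAgent { return &fakeAgent{gauges: map[string]float64{}} }

func (f *fakeAgent) GaugeIncrement(name string, delta float64) { f.gauges[name] += delta }

// recordCommittedBlock is hypothetical module logic that bumps a gauge per committed block.
func recordCommittedBlock(agent gaugeIncrementer) {
	agent.GaugeIncrement("block_height", 1)
}

func main() {
	agent := newFakeAgent()
	for i := 0; i < 3; i++ {
		recordCommittedBlock(agent)
	}
	fmt.Println(agent.gauges["block_height"]) // prints 3
}
```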
In the current implementation, we are recording events through logs.
Loki and Grafana are used to parse the logs and generate the desired graphs.
In your module, make sure you have access to the bus, then emit the events you need as follows:
// Retrieve the event metrics agent:
eventMetricsTelemetry := module.GetBus().GetTelemetryModule().GetEventMetricsAgent()
/* explore the methods you can use in shared/modules/telemetry_module.go */
// Emit an event by:
eventMetricsTelemetry.EmitEvent(
"namespace",
"event_name",
... // any other fields you want to include
)
/* ... */
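The exact line that EmitEvent writes is not shown here. The sketch below assumes a format consistent with the `|= "[EVENT] your_namespace your_event"` filter and the `| logfmt` stage used in the LogQL query that follows: an "[EVENT]" marker, the namespace, the event name, then key=value pairs. formatEventLine, emitEvent and the sample field names are hypothetical.

```go
package main

import (
	"log"
	"os"
	"sort"
	"strconv"
	"strings"
)

// formatEventLine builds a hypothetical "[EVENT] <namespace> <event_name> k=v ..." line.
// Keys are sorted for deterministic output; values that are empty or contain spaces,
// quotes or '=' are quoted so a logfmt parser can read them back.
func formatEventLine(namespace, eventName string, fields map[string]string) string {
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	b.WriteString("[EVENT] " + namespace + " " + eventName)
	for _, k := range keys {
		v := fields[k]
		if v == "" || strings.ContainsAny(v, " \"=") {
			v = strconv.Quote(v)
		}
		b.WriteString(" " + k + "=" + v)
	}
	return b.String()
}

// emitEvent writes the line through a standard logger, which prefixes date and time.
func emitEvent(logger *log.Logger, namespace, eventName string, fields map[string]string) {
	logger.Println(formatEventLine(namespace, eventName, fields))
}

func main() {
	logger := log.New(os.Stdout, "", log.LstdFlags)
	emitEvent(logger, "consensus", "block_committed", map[string]string{
		"height":   "42",
		"proposer": "node1",
	})
}
```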
To test this out, emit an event in your code, then open your local Grafana setup and go to the Explore page.
Run the following LogQL query:
{host="desktop-docker"} |= "[EVENT] your_namespace your_event" | pattern `<datetime> <_> <time> <type> <event_name> <any> <additional> <whitespaced> <items>` | logfmt
See the images below for an example of how to use the Explore functionality.
Go to the Explore page:
Type in your query and play with the results:
You should see a log stream; click a line to see how the pattern keyword in LogQL parsed it. You can then reference the parsed fields as needed.
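To build intuition for what the pattern and logfmt stages extract, here is a rough Go approximation. applyPattern handles only patterns whose captures are separated by single spaces: each capture takes the text up to the next space, the last capture takes the rest of the line, and `_` captures are discarded. parseSimpleLogfmt handles unquoted key=value pairs only. Both helpers and the capture names used below are illustrative; Loki's parsers support far more.

```go
package main

import (
	"fmt"
	"strings"
)

// applyPattern approximates a LogQL pattern like `<a> <b> <rest>` (single-space literals only).
func applyPattern(line string, names []string) (map[string]string, bool) {
	parts := strings.SplitN(line, " ", len(names))
	if len(parts) < len(names) {
		return nil, false // not enough tokens: the pattern does not match
	}
	out := make(map[string]string, len(names))
	for i, name := range names {
		if name == "_" {
			continue // unnamed capture, dropped like <_>
		}
		out[name] = parts[i]
	}
	return out, true
}

// parseSimpleLogfmt handles unquoted key=value pairs only; real logfmt also supports quoting.
func parseSimpleLogfmt(s string) map[string]string {
	out := map[string]string{}
	for _, field := range strings.Fields(s) {
		if k, v, ok := strings.Cut(field, "="); ok {
			out[k] = v
		}
	}
	return out
}

func main() {
	line := "[EVENT] consensus block_committed height=42 proposer=node1"
	fields, ok := applyPattern(line, []string{"_", "namespace", "event", "rest"})
	if !ok {
		fmt.Println("pattern did not match")
		return
	}
	fmt.Println(fields["namespace"], fields["event"])        // consensus block_committed
	fmt.Println(parseSimpleLogfmt(fields["rest"])["height"]) // 42
}
```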
Example: Counting how many events we've seen by type over 5m:
sum by (type) (count_over_time(
{host="desktop-docker"}
|= "[EVENT] your_namespace"
| pattern `<datetime> <_> <time> <type> <event_name> <any> <additional> <whitespaced> <items>`[5m]
))
Example: Counting how many events of a certain type we have seen over 5m:
sum (count_over_time(
{host="desktop-docker"}
|= "[EVENT] your_namespace your_event"
| pattern `<datetime> <_> <time> <type> <event_name> <any> <additional> <whitespaced> <items>`[5m]
))
To launch and start using Grafana, do the following:
- Prepare environment (one-time setup)
If you haven't already, make sure you install all necessary prerequisites in the Development README. Make sure you've installed the Docker Loki dependencies:
make docker_loki_install
- Spin up the stack
make lightweight_localnet
- Wait a few seconds and Voila!
- Grafana: Visit http://localhost:3000 and browse the pre-configured dashboards by opening the sidebar and clicking on Search Dashboards, as shown below.
- Victoria Metrics: Visit http://localhost:8428/vmui
- Check whether Victoria Metrics detects the Prometheus endpoint by checking the VM targets. You should see a section named pocket-nodes with the Prometheus endpoint details and status underneath.
NOTE: Make sure you use http and not https when developing locally.
We follow a specific pattern to define our metrics to guarantee consistency and ease of use.
Metric definitions for each module/domain/service are stored in a file named metrics.go inside a folder called telemetry in the module's folder, and they follow these rules:
- Every metric's name, description, and any additional information about the metric should be defined as a constant.
- Constants related to a metric's definition follow the naming pattern: <metric_name>_<metric_type>_<metric_attribute>
- The actual metric name value is left to the developer's discretion.
For example, to define a metric of type timeseries named nodes_alive_counter, the constants are defined as follows:
// metric_name=NODES_ALIVE_COUNTER
// metric_type=TIME_SERIES
// metric_attribute=NAME
const NODES_ALIVE_COUNTER_TIME_SERIES_NAME = "nodes_alive_counter"
const NODES_ALIVE_COUNTER_TIME_SERIES_DESCRIPTION = "Number of nodes online"
This makes the metrics easier to reference and to rename later on, and it gathers all metric definitions in a single file.
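Because the convention is purely lexical, it can be checked mechanically, for instance in a lint step over metrics.go files. The sketch below is hypothetical: splitConstName and its regular expression are illustrative, and the accepted types (TIME_SERIES, EVENT_SERIES) and attributes (NAME, DESCRIPTION) are assumptions based on the example above, to be extended as needed.

```go
package main

import (
	"fmt"
	"regexp"
)

// Example constants following the convention (taken from the example above).
const (
	NODES_ALIVE_COUNTER_TIME_SERIES_NAME        = "nodes_alive_counter"
	NODES_ALIVE_COUNTER_TIME_SERIES_DESCRIPTION = "Number of nodes online"
)

// constNamePattern checks <METRIC_NAME>_<METRIC_TYPE>_<METRIC_ATTRIBUTE>.
// The type and attribute alternatives are assumptions; extend them as needed.
var constNamePattern = regexp.MustCompile(`^([A-Z0-9]+(?:_[A-Z0-9]+)*)_(TIME_SERIES|EVENT_SERIES)_(NAME|DESCRIPTION)$`)

// splitConstName returns the metric name, type and attribute encoded in an identifier.
func splitConstName(ident string) (metric, typ, attr string, ok bool) {
	m := constNamePattern.FindStringSubmatch(ident)
	if m == nil {
		return "", "", "", false
	}
	return m[1], m[2], m[3], true
}

func main() {
	for _, ident := range []string{
		"NODES_ALIVE_COUNTER_TIME_SERIES_NAME",
		"NODES_ALIVE_COUNTER_TIME_SERIES_DESCRIPTION",
		"NodesAliveName",
	} {
		metric, typ, attr, ok := splitConstName(ident)
		if !ok {
			fmt.Printf("%s does not follow the convention\n", ident)
			continue
		}
		fmt.Printf("%s -> metric=%s type=%s attribute=%s\n", ident, metric, typ, attr)
	}
	fmt.Println(NODES_ALIVE_COUNTER_TIME_SERIES_NAME, "-", NODES_ALIVE_COUNTER_TIME_SERIES_DESCRIPTION)
}
```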