Releases: grafana/mimir
2.11.0-rc.0
This release contains 531 PRs from 55 authors, including new contributors Benjamin, Dominik Kepinski, Jonathan Donzallaz, Juraj Michálek, Kai.Ke, Ludovic Terrier, Luke, Maciej Lech, Matthew Penner, Michael Potter, Mihai Țimbota-Belin, Rasmus Werner Salling, Ying WANG, chencs, fayzal-g, kalle (jag), renovate[bot], sarthaktyagi-505, whoami. Thank you!
Grafana Mimir version 2.11.0-rc.0 release notes
Grafana Labs is excited to announce version 2.11 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Features and enhancements
- Sampled logging of errors in the ingester. A high-traffic Mimir cluster can occasionally become bogged down logging high volumes of repeated errors. You can now reduce the amount of errors written to logs by setting a sample rate via the `-ingester.error-sample-rate` CLI flag.
- Add total request size instance limit for ingesters. This limit protects the ingesters against requests that together may cause an OOM. Enable this feature by setting the `-ingester.instance-limits.max-inflight-push-requests-bytes` CLI flag in combination with the `-ingester.limit-inflight-requests-using-grpc-method-limiter` CLI flag.
- Reduce the resolution of incoming native histogram samples if the incoming sample has too many buckets compared to `-validation.max-native-histogram-buckets`. This is enabled by default but can be turned off by setting the `-validation.reduce-native-histogram-over-max-buckets` CLI flag to `false`.
- Improved query-scheduler performance under load. This is particularly apparent for clusters with large numbers of queriers.
- Ingester to querier chunks streaming, which reduces the memory utilization of queriers and the likelihood of OOMs.
- Ingester query request minimization, which reduces the number of query requests to ingesters, improving performance and resource utilization for both ingesters and queriers.
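As an illustration, the first two ingester features above are enabled purely through CLI flags. The following is a minimal sketch of an ingester invocation combining them; the flag names come from this release, but the sample rate and the 512 MiB limit are illustrative values, not recommendations:

```
mimir -target=ingester \
  -ingester.error-sample-rate=10 \
  -ingester.instance-limits.max-inflight-push-requests-bytes=536870912 \
  -ingester.limit-inflight-requests-using-grpc-method-limiter=true
```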
Experimental features
Grafana Mimir 2.11 includes new features that are considered experimental and disabled by default. Please use them with caution and report any issue you encounter:
- Block specified queries on a per-tenant basis. This is configured via the `blocked_queries` limit. See the docs for more information.
- Store metadata when ingesting metrics via OTLP. This makes metric description and type available when ingesting metrics via OTLP. You can enable this feature by setting the `-distributor.enable-otlp-metadata-storage` CLI flag to `true`.
- Reject gRPC push requests that the ingester/distributor is unable to accept before reading them into memory. You can enable this feature by using the `-ingester.limit-inflight-requests-using-grpc-method-limiter` and/or the `-distributor.limit-inflight-requests-using-grpc-method-limiter` CLI flags for the ingester and the distributor, respectively.
- Customize the memcached client write and read buffer size. The buffer allocated for each memcached connection can be configured via the following CLI flags:
  - For the blocks storage:
    - `-blocks-storage.bucket-store.chunks-cache.memcached.read-buffer-size-bytes`
    - `-blocks-storage.bucket-store.chunks-cache.memcached.write-buffer-size-bytes`
    - `-blocks-storage.bucket-store.index-cache.memcached.read-buffer-size-bytes`
    - `-blocks-storage.bucket-store.index-cache.memcached.write-buffer-size-bytes`
    - `-blocks-storage.bucket-store.metadata-cache.memcached.read-buffer-size-bytes`
    - `-blocks-storage.bucket-store.metadata-cache.memcached.write-buffer-size-bytes`
  - For the query frontend:
    - `-query-frontend.results-cache.memcached.read-buffer-size-bytes`
    - `-query-frontend.results-cache.memcached.write-buffer-size-bytes`
  - For the ruler storage:
    - `-ruler-storage.cache.memcached.read-buffer-size-bytes`
    - `-ruler-storage.cache.memcached.write-buffer-size-bytes`
- Configure the number of long-living workers used to process gRPC requests. This can decrease CPU usage by reducing the number of stack allocations. Configure this feature by using the `-server.grpc.num-workers` CLI flag.
- Enforce a limit in bytes on the `PostingsForMatchers` cache used by ingesters. This limit can be configured via the `-blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes` and `-blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes` CLI flags.
- Pre-allocate the pool of workers in the distributor that are used to send push requests to ingesters. This can decrease CPU usage by reducing the number of stack allocations. You can enable this feature by using the `-distributor.reusable-ingester-push-worker` flag.
- Include a `Retry-After` header in recoverable error responses from the distributor. This can protect your Mimir cluster from clients, including Prometheus, that default to retrying very quickly. Enable this feature by setting the `-distributor.retry-after-header.enabled` CLI flag.
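For instance, the memcached buffer sizes above could be tuned on a store-gateway as follows. This is a sketch only: the flag names are from the list above, while the 64 KiB values are illustrative starting points to experiment with, not recommendations from this release:

```
mimir -target=store-gateway \
  -blocks-storage.bucket-store.chunks-cache.memcached.read-buffer-size-bytes=65536 \
  -blocks-storage.bucket-store.chunks-cache.memcached.write-buffer-size-bytes=65536
```

Larger buffers can reduce syscall overhead per connection at the cost of more memory per connection, so the right value depends on item sizes and connection counts.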
Helm chart improvements
The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.
Important changes
In Grafana Mimir 2.11 the following behavior has changed:
- The utilization-based read path limiter now operates on Go heap size instead of RSS from the Linux proc file system.
The following configuration options were previously deprecated and have been removed in Grafana Mimir 2.11:
- The CLI flag `-querier.iterators`.
- The CLI flag `-querier.batch-iterators`.
- The CLI flag `-blocks-storage.bucket-store.bucket-index.enabled`.
- The CLI flag `-blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes`.
- The CLI flag `-blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes`.
- The CLI flag `-blocks-storage.bucket-store.max-chunk-pool-bytes`.

The following configuration options are deprecated and will be removed in Grafana Mimir 2.13:
- The CLI flag `-log.buffered`; buffered logging is now the default behavior.
The following metrics are removed:
- `cortex_query_frontend_workers_enqueued_requests_total`; use `cortex_query_frontend_enqueue_duration_seconds_count` instead.
The following configuration option defaults were changed:
- The CLI flag `-blocks-storage.bucket-store.index-header.sparse-persistence-enabled` now defaults to `true`.
- The default value for the CLI flag `-blocks-storage.bucket-store.index-header.lazy-loading-concurrency` was changed from `0` to `4`.
- The default value for the CLI flag `-blocks-storage.tsdb.series-hash-cache-max-size-bytes` was changed from `1GB` to `350MB`.
- The default value for the CLI flag `-blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage` was changed from `10` to `15`.
Bug fixes
- Ingester: Respect context cancellation during query execution. PR 6085
- Distributor: Return 529 when the ingestion rate limit is hit and the `distributor.service_overload_status_code_on_rate_limit_enabled` flag is active. PR 6549
- Query-scheduler: Prevent accumulation of stale querier connections. PR 6100
- Packaging: Fix preremove script preventing upgrades on RHEL-based OS. PR 6067
Changelog
2.11.0-rc.0
Grafana Mimir
- [CHANGE] The following deprecated configurations have been removed: #6673 #6779 #6808 #6814
  - `-querier.iterators`
  - `-querier.batch-iterators`
  - `-blocks-storage.bucket-store.max-chunk-pool-bytes`
  - `-blocks-storage.bucket-store.chunk-pool-min-bucket-size-bytes`
  - `-blocks-storage.bucket-store.chunk-pool-max-bucket-size-bytes`
  - `-blocks-storage.bucket-store.bucket-index.enabled`
- [CHANGE] Querier: Split worker gRPC config into separate client configs for the frontend and scheduler to allow TLS to be configured correctly when specifying the `tls_server_name`. The gRPC config specified under `-querier.frontend-client.*` will no longer apply to the scheduler client, and will need to be set explicitly under `-querier.scheduler-client.*`. #6445 #6573
- [CHANGE] Store-gateway: enable sparse index headers by default. Sparse index headers reduce the time to load an index header by up to 90%. #6005
- [CHANGE] Store-gateway: lazy-loading concurrency limit default value is now 4. #6004
- [CHANGE] General: enabled `-log.buffered` by default. The `-log.buffered` flag has been deprecated and will be removed in Mimir 2.13. #6131
- [CHANGE] Ingester: changed default `-blocks-storage.tsdb.series-hash-cache-max-size-bytes` setting from `1GB` to `350MB`. The new default cache size is enough to store the hashes for all series in an ingester, assuming up to 2M in-memory series per ingester and using the default 13h retention period for local TSDB blocks in the ingesters. #6130
- [CHANGE] Query-frontend: removed `cortex_query_frontend_workers_enqueued_requests_total`. Use `cortex_query_frontend_enqueue_duration_seconds_count` instead. #6121
- [CHANGE] Ingester / querier: enable ingester to querier chunks streaming by default and mark it as stable. #6174
- [CHANGE] Ingester / querier: enable ingester query request minimisation by default and mark it as stable. #6174
- [CHANGE] Ingester: changed the default value for the experimental configuration parameter `-blocks-storage.tsdb.early-head-compaction-min-estimated-series-reduction-percentage` from 10 to 15. #6186
- [CHANGE] Ingester: `/ingester/push` HTTP endpoint has been removed. This endpoint was added for testing and troubleshooting, but was never documented or used for anything. #6299...
2.9.4
Changelog
2.9.4
Grafana Mimir
- [ENHANCEMENT] Update Docker base images from `alpine:3.18.3` to `alpine:3.18.5`. #6895
All changes in this release: mimir-2.9.3...mimir-2.9.4
2.10.5
Changelog
2.10.5
Grafana Mimir
- [ENHANCEMENT] Update Docker base images from `alpine:3.18.3` to `alpine:3.18.5`. #6897
- [BUGFIX] Fixed possible series matcher corruption leading to wrong series being included in query results. #6886
Documentation
- [ENHANCEMENT] Document the concept of native histograms, how to send them to Mimir, and the migration path. #6757
- [ENHANCEMENT] Document native histograms query and visualization. #6757
All changes in this release: mimir-2.10.4...mimir-2.10.5
2.9.3
This release contains 1 PR from 1 author. Thank you!
Changelog
2.9.3
- [BUGFIX] Update `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` to `0.44`, which includes a fix for CVE-2023-45142. #6637
All changes in this release: mimir-2.9.2...mimir-2.9.3
2.10.4
This release contains 3 PRs from 1 author. Thank you!
Changelog
2.10.4
Grafana Mimir
- [BUGFIX] Update otelhttp library to v0.44.0 as a mitigation for CVE-2023-45142. #6634
All changes in this release: mimir-2.10.3...mimir-2.10.4
2.10.3
This release contains 1 PR from 1 author. Thank you!
Changelog
2.10.3
Grafana Mimir
- [BUGFIX] Update grpc-go library to 1.57.2-dev that includes a fix for a bug introduced in 1.57.1. #6419
All changes in this release: mimir-2.10.2...mimir-2.10.3
2.9.2
This release contains 5 PRs from 3 authors. Thank you!
Grafana Mimir version 2.9.2 release notes
Changelog
2.9.2
- [BUGFIX] Update grpc-go library to 1.56.3 and `golang.org/x/net` to `0.17`, which include a fix for CVE-2023-44487. #6353 #6364
All changes in this release: mimir-2.9.1...mimir-2.9.2
2.10.2
This release contains 2 PRs from 1 author. Thank you!
Warning
This release contains a known bug in the grpc-go library that drastically affects network performance of the servers. Mimir 2.10.3 was released to fix this issue.
Changelog
2.10.2
Grafana Mimir
- [BUGFIX] Update grpc-go library to 1.57.1 and `golang.org/x/net` to `0.17`, which include a fix for CVE-2023-44487. #6349
All changes in this release: mimir-2.10.1...mimir-2.10.2
2.10.1
This release contains 6 PRs from 4 authors. Thank you!
Changelog
2.10.1
Grafana Mimir
- [CHANGE] Update Go version to 1.21.3. #6244 #6325
- [BUGFIX] Query-frontend: Don't retry read requests rejected by the ingester due to utilization based read path limiting. #6032
- [BUGFIX] Ingester: fix panic in WAL replay of certain native histograms. #6086
All changes in this release: mimir-2.10.0...mimir-2.10.1
2.10.0
This release contains 455 PRs from 54 authors, including new contributors Aaron Sanders, Alexander Proschek, Aljoscha Pörtner, balazs92117, Francois Gouteroux, Franco Posa, Heather Yuan, jingyang, kendrickclark, m4r1u2, Milan Plžík, Samir Teymurov, Sven Haardiek, Thomas Schaaf, Tiago Posse. Thank you!
Grafana Mimir version 2.10.0 release notes
Grafana Labs is excited to announce version 2.10 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Features and enhancements
- Added support for rule filtering by passing `file`, `ruler_group` and `rule_name` parameters to the ruler endpoint `/api/v1/rules`.
- Added support to only count series that are considered active through the Cardinality API endpoint `/api/v1/cardinality/label_values` by passing the `count_method` parameter. You can set it to `active` to count only series that are considered active according to the `-ingester.active-series-metrics-idle-timeout` flag setting, rather than counting all in-memory series.
- Reduced the overall memory consumption by changing the internal data structure for labels. Expect ingesters to use around 15% less memory with this change, depending on the pattern of labels used, number of tenants, etc.
- Reduced the memory usage of the Active Series Tracker in the ingester.
- Added a buffered logging implementation that can be enabled through the `-log.buffered` CLI flag. This should reduce contention and resource usage under heavy usage patterns.
- Improved the performance of OTLP ingestion and added more detailed information to the traces in order to make troubleshooting problems easier.
- Improved the performance of series matching in the store-gateway by always including the `__name__` posting group, causing a reduction in the number of object storage API calls.
- Improved the performance of label values with matchers calls when the number of matched series is small. If you're using Grafana to query Grafana Mimir, make sure your Prometheus data source configuration has the Prometheus type set to `Mimir` and the `Version` set correctly in order to benefit from this improvement.
- Support to cache cardinality, label names, and label values query responses in the query-frontend. The cache is used when `-query-frontend.cache-results` is enabled, and `-query-frontend.results-cache-ttl-for-cardinality-query` or `-query-frontend.results-cache-ttl-for-labels-query` is set to a value greater than 0.
- Reduced wasted effort spent computing results that won't be used by having queriers cancel the requests sent to the ingesters in a zone upon receiving the first error from that zone.
- Reduced object storage use by enhancing the compactor to remove the bucket index, markers, and debug files when it detects zero remaining blocks in the bucket index. This cleanup process can be enabled by setting the `-compactor.no-blocks-file-cleanup-enabled` option to `true`.
- Added new debug HTTP endpoints `/ingester/tenants` and `/ingester/tsdb/{tenant}` to the ingester that provide debug information about tenants and their TSDBs.
- Added new metrics for tracking native histograms in active series: `cortex_ingester_active_native_histogram_series`, `cortex_ingester_active_native_histogram_series_custom_tracker`, `cortex_ingester_active_native_histogram_buckets`, `cortex_ingester_active_native_histogram_buckets_custom_tracker`. The first two are subsets of the existing and unmodified `cortex_ingester_active_series` and `cortex_ingester_active_series_custom_tracker` respectively, only tracking native histogram series, and the last two are the equivalent for tracking the number of buckets in native histogram series.
Additionally, the following previously experimental features are now considered stable:
- Support for a ruler storage cache. This cache should reduce the number of "list objects" API calls issued to the object storage when there are 2+ ruler replicas running in a Mimir cluster. The cache can be configured by setting the `-ruler-storage.cache.*` CLI flags or their respective YAML config options.
- Query sharding cardinality estimation. This feature allows query sharding to take into account the cardinality of similar requests executed previously when computing the maximum number of shards to use. You can enable it through the advanced CLI configuration flag `-query-frontend.query-sharding-target-series-per-shard`; we recommend starting with a value of `2500`.
- Query expression size limit. You can limit the size in bytes of the queries allowed to be processed through the CLI configuration flag `-query-frontend.max-query-expression-size-bytes`.
- Peer discovery / tenant sharding for overrides exporters. You can enable it through the CLI configuration flag `-overrides-exporter.ring.enabled`.
- Overrides exporter enabled metrics selection. You can select which metrics the overrides exporter should export through the CLI configuration flag `-overrides-exporter.enabled-metrics`.
- Per-tenant results cache TTL. The time-to-live duration for cached query results can be configured using the `results_cache_ttl` and `results_cache_ttl_for_out_of_order_time_window` parameters.
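As a sketch of the per-tenant results cache TTL above, the two parameters can be set per tenant, for example in a runtime overrides file. The parameter names come from this release; the tenant name and duration values are illustrative assumptions:

```yaml
overrides:
  tenant-a:
    # Keep cached query results for a week.
    results_cache_ttl: 7d
    # Expire results that overlap the out-of-order window much sooner,
    # since they may still change as late samples arrive.
    results_cache_ttl_for_out_of_order_time_window: 10m
```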
Experimental features
Grafana Mimir 2.10 includes new features that are considered as experimental and disabled by default. Please use them with caution and report any issues you encounter:
- Support for ingesting exponential histograms in OpenTelemetry format. Exponential histograms that are over the native histogram scale limit of 8 are downscaled to allow their ingestion.
- Store-gateway index-header loading improvements, which include the ability to persist the sparse index-header to disk instead of reconstructing it on every restart (`-blocks-storage.bucket-store.index-header-sparse-persistence-enabled`), the ability to persist the list of block IDs that were lazy-loaded while running and eagerly load them upon startup to prevent starting up with no loaded blocks (`-blocks-storage.bucket-store.index-header.eager-loading-startup-enabled`), and an option to limit the number of concurrent index-header loads when lazy-loading (`-blocks-storage.bucket-store.index-header-lazy-loading-concurrency`).
- Option to allow queriers to reduce pressure on ingesters by initially querying only the minimum set of ingesters required to reach quorum (`-querier.minimize-ingester-requests`).
- Early TSDB head compaction in the ingesters to reduce in-memory series when a certain threshold is reached. Useful for dealing with a high series churn rate (`-blocks-storage.tsdb.early-head-compaction-min-in-memory-series`).
- Spread-minimizing token generation algorithm for the ingesters. This new method drastically reduces the difference in series pushed to different ingesters. Please note that a migration process is required to switch from the previous random generation algorithm, which will be detailed once the feature is declared stable.
- Support for chunks streaming from store-gateways to queriers, which should reduce memory usage in the queriers. Can be enabled through the `-querier.prefer-streaming-chunks-from-store-gateways` option.
- Support for circuit-breaking the distributor write requests to the ingesters. This can be enabled through the `-ingester.client.circuit-breaker.*` configuration options and should let ingesters recover when under high pressure.
- Support to limit read requests based on CPU/memory utilization. This should alleviate pressure on the ingesters after receiving heavy queries and reduce the likelihood of disrupting the write path (`-ingester.read-path-cpu-utilization-limit`, `-ingester.read-path-memory-utilization-limit`, `-ingester.log-utilization-based-limiter-cpu-samples`).
Helm chart improvements
The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.
Important changes
In Grafana Mimir 2.10 we have changed the following behaviors:
- Query requests are initiated only to ingesters in the `ACTIVE` state in the ring. This is not expected to introduce any degradation in terms of query result correctness or high availability.
- Per-instance limit errors are no longer logged, to reduce resource usage when ingesters are under pressure. We encourage you to use metrics and alerting to monitor them instead. The following metrics have been added to count the number of requests rejected for hitting per-instance limits:
  - `cortex_distributor_instance_rejected_requests_total`
  - `cortex_ingester_instance_rejected_requests_total`
- The CLI flag `-validation.create-grace-period` is now enforced in the ingester. If you've configured `-validation.create-grace-period`, make sure the configuration is applied to ingesters too.
- The CLI flag `-validation.create-grace-period` is now enforced for exemplars. The `cortex_discarded_exemplars_total{reason="exemplar_too_far_in_future",user="..."}` series is incremented when exemplars are dropped because their timestamp is greater than "now + grace_period".
- The CLI flag `-validation.create-grace-period` is now enforced in the query-frontend even when the configured value is 0. When the value is 0, the query end time range is truncated to the current real-world time.
The following metrics were removed:
- `cortex_ingester_shipper_dir_syncs_total`
- `cortex_ingester_shipper_dir_sync_f...