feat(om2): add native histograms to OpenMetrics2.0 #2634


Open · wants to merge 32 commits into main

Conversation

krajorama (Member)

@krajorama krajorama commented Apr 24, 2025

Background

Based on https://github.com/prometheus/proposals/blob/main/proposals/2024-01-29_native_histograms_text_format.md
And OpenMetrics 2.0 WG discussions.

Changes

  • Allow structured complex types marked by "{" and "}" in the specification.
  • Allow multiple exemplars per complex type value.
  • Require that exemplars for complex type values have the timestamp. (1)
  • Be permissive about observing NaN, +Inf, -Inf. Discourage observing NaN.
  • Split histogram into ones with classic and native buckets.
  • For classic buckets, define behavior when observing NaN.
  • Define the native buckets and also how NaN is handled.
  • Define the text format of native histograms and also their exemplars.

(1) Not doing this after all: the requirement stems from an implementation limitation, and it also contradicts backwards compatibility when exposing classic histograms as a complex type.

Open questions / decisions

See OpenMetrics2.0 WG meeting notes tab

Signed-off-by: György Krajcsovits <[email protected]>

Numbers MUST be either floating-point numbers or integers. Note that ingestors of the format MAY only support float64. The non-real values NaN, +Inf and -Inf MUST be supported. NaN MUST NOT be considered a missing value, but it MAY be used to signal a division by zero.

Complex data types MUST contain all information necessary to recreate a Metric Type, with the exception of Created time and Exemplars.
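As a sketch of the number rules quoted above (not part of the spec text; the function name is illustrative), a float64-only ingestor could parse values like this, treating the three special spellings explicitly:

```python
import math

def parse_number(text):
    # NaN, +Inf and -Inf are the non-real spellings the format supports;
    # anything else is parsed as a regular number (assuming a float64-only
    # ingestor, which the spec explicitly permits).
    special = {"NaN": float("nan"), "+Inf": float("inf"), "-Inf": float("-inf")}
    if text in special:
        return special[text]
    return float(text)

assert parse_number("+Inf") == float("inf")
assert parse_number("-Inf") == float("-inf")
assert math.isnan(parse_number("NaN"))  # NaN is a value, not "missing"
assert parse_number("42") == 42.0
```

Note that per the spec text, a NaN result must be kept as an observed value rather than discarded as missing.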
Member Author

This assumes we'll have the created timestamp separate from the JSON-like data. prometheus/OpenMetrics#285

Signed-off-by: György Krajcsovits <[email protected]>
@krajorama (Member Author)

Note to self: add details on how summaries and classic histogram one-liners (so not NHCB spans/deltas) fit into it.

krajorama added 4 commits May 22, 2025 10:05
Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
Add semantic conventions about where complex types may occur.
Allow empty spans and deltas.
Be more precise.

Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
@ArthurSens ArthurSens left a comment


Had some minutes to read the PR. Couldn't read everything though, I'll do a more complete review another day!

Reviewed from mobile

@beorn7 beorn7 self-requested a review May 27, 2025 11:26
krajorama added 2 commits May 27, 2025 15:53
Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
@krajorama krajorama marked this pull request as ready for review May 27, 2025 14:01
I did not want to clutter the examples with adding "# EOF" to all,
so I made a new marker to explicitly add it on request.

There was only one faulty example where we had an extra space, see
the Exemplars section.

Signed-off-by: György Krajcsovits <[email protected]>
Allow multiple exemplars for complex types, i.e. native histograms.
But require that the timestamp is present.

Signed-off-by: György Krajcsovits <[email protected]>
@krajorama krajorama force-pushed the krajo/om2.0-native-histograms branch from c16426b to 7965787 Compare June 11, 2025 07:27
@beorn7 beorn7 left a comment


Thanks for doing all of this.

Mostly commented about the histogram aspect (but I couldn't resist and referred to OM specific things now and then).


Histograms with exponential buckets use the integer native histogram data type.

The integer native histogram data type is a JSON like structure with fields. There MUST NOT be any whitespace around fields.
Member

It's not really JSON. Many languages use vaguely this kind of structure. I wouldn't call out JSON because it creates the expectation that other JSON syntax rules also apply.

Member

There MUST NOT be any whitespace around fields.

Has it already been decided that OMv2 will stay whitespace intolerant?

It's one of my big big red flags with OMv1 that it doesn't tolerate whitespace like the classic Prometheus text format. If OMv2 will see the light and allow whitespace again, then of course we should allow whitespace here.

Member Author

We have not discussed this in OM2.0 explicitly. It comes from https://github.com/prometheus/proposals/blob/main/proposals/2024-01-29_native_histograms_text_format.md.

However, we did say that since native histograms use a structure that's essentially internal, we have kind of already given up on being very human-friendly. I still keep the count and sum fields up front because that's easy to understand and relates to PromQL.

A consequence is that I think we'll have to provide some tooling (like promtool) for people to pretty-print the output, and instrumentation libraries may also allow pretty printing themselves.

Member

I'm more shooting for the high-level consistency: If OMv2 will also require precise usage (and non-usage) of whitespace (like OMv1), then we should of course do the same inside the native histogram representation.

However, if OMv2 will be liberal WRT whitespace again (which I hope), then we should not be more restrictive inside the native histogram representation.

(I don't want to open the whitespace discussion here in this comment. Here I'm only proposing to use the same rule (either liberal whitespace use or precisely ruled whitespace placement) inside and outside of native histograms.)

Member Author

@krajorama krajorama Jun 26, 2025

I'd like to defer the space (SP) question to a separate PR.


Exponential bucket values MUST be ordered by their index, and their values MUST be placed in the `negative_deltas` (and/or `positive_deltas`) field using delta encoding, that is, the first bucket value is written as-is and each following value only as a delta relative to the previous value. For example, bucket values 1, 5, 4, 4 become 1, 4, -1, 0.

To map the `negative_deltas` (and/or `positive_deltas`) back to their indices, the `negative_spans` (and/or `positive_spans`) field MUST be constructed in the following way: each span consists of a pair of numbers, an integer called offset and a non-negative integer called length. Only the first span in each list can have a negative offset; it defines the index of the first bucket in its corresponding `negative_deltas` (and/or `positive_deltas`). The length defines the number of consecutive buckets the bucket list starts with. The offsets of the following spans define the number of excluded (and thus unpopulated) buckets. The lengths define the number of consecutive buckets in the list following the excluded buckets.
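The delta encoding and span-to-index mapping described above can be sketched as follows (illustrative helper names, not part of the spec; this is a minimal sketch of the stated rules, not a reference implementation):

```python
def encode_deltas(values):
    # Delta-encode bucket values: first value as-is, then each value
    # as a difference relative to the previous one.
    deltas, prev = [], 0
    for v in values:
        deltas.append(v - prev)
        prev = v
    return deltas

def decode_buckets(spans, deltas):
    # Map delta-encoded buckets back to {index: value} using spans.
    # Each span is (offset, length): the first span's offset is the index
    # of the first bucket; each later offset counts the unpopulated
    # buckets skipped since the previous span ended.
    buckets, idx, value = {}, 0, 0
    it = iter(deltas)
    for i, (offset, length) in enumerate(spans):
        idx = offset if i == 0 else idx + offset
        for _ in range(length):
            value += next(it)
            buckets[idx] = value
            idx += 1
    return buckets

# Bucket values 1, 5, 4, 4 delta-encode to 1, 4, -1, 0, as in the text:
assert encode_deltas([1, 5, 4, 4]) == [1, 4, -1, 0]
# One span starting at index 0 covering all four consecutive buckets:
assert decode_buckets([(0, 4)], [1, 4, -1, 0]) == {0: 1, 1: 5, 2: 4, 3: 4}
# A gap of 2 unpopulated buckets between two spans:
assert decode_buckets([(0, 2), (2, 2)], [1, 4, -1, 0]) == {0: 1, 1: 5, 4: 4, 5: 4}
```

The second span example shows why the encoding is compact: unpopulated buckets consume no entries in the deltas list, only an offset in the span list.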
Member

Capitalize after the colon: "each" → "Each"

Signed-off-by: György Krajcsovits <[email protected]>
even unpopulated ones.

Signed-off-by: György Krajcsovits <[email protected]>
Empty may be exposed for optimizing spans.

Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
Rename negative_deltas and positive_deltas to negative_buckets and
positive_buckets.

Distinguish integer and float histograms by the number format used in them.

If the number has a dot "." or exponent "e", then the number is a float.
Otherwise it is an integer.

Signed-off-by: György Krajcsovits <[email protected]>
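The integer/float distinction described in the commit message above can be sketched like this (illustrative function name; whether an uppercase "E" also counts is not stated here, so this sketch checks only the lowercase spelling):

```python
def number_kind(text):
    # Per the rule above: a dot "." or an exponent "e" marks a float;
    # otherwise the number is an integer.
    return "float" if ("." in text or "e" in text) else "integer"

assert number_kind("42") == "integer"
assert number_kind("-7") == "integer"
assert number_kind("4.2") == "float"
assert number_kind("1e3") == "float"
```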
The timestamp requirement came from current limitations in Prometheus
exemplar storage. However in the future this might be solved.

Also there's the case when a classic histogram exposes exemplars.

Signed-off-by: György Krajcsovits <[email protected]>
Classic Buckets MUST be sorted in numerically increasing order of "le", and the value of the "le" label MUST follow the rules for Canonical Numbers.

All Classic Buckets must be represented, even the unpopulated ones that didn't measure any values.
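The ordering requirement above can be sketched as a hypothetical validator (names are illustrative; real canonical-number parsing is stricter than Python's `float()`, which is assumed here only for the sketch):

```python
def le_sorted(les):
    # Check strictly increasing numeric order of "le" thresholds;
    # "+Inf" parses to infinity and therefore sorts last.
    values = [float(le) for le in les]
    return all(a < b for a, b in zip(values, values[1:]))

assert le_sorted(["0.1", "1", "10", "+Inf"])
assert not le_sorted(["1", "0.5", "+Inf"])
```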
Member

Hmm, not sure if that nails it. As Classic Buckets are cumulative, there aren't really "unpopulated buckets" - except at the lower end.

Another aspect is that we advertise the benefit of cumulative buckets that individual buckets can be removed from the histogram to reduce resolution without the need of updating other buckets.

What I was shooting for in my other comment is that the "le" values SHOULD NOT change frequently between scrapes. (Maybe that's somewhere implied where I didn't catch it. And it's definitely not a new thing in OMv2. I might just notice it now where we have the contrast to native histograms, where schema changes within the standard exponential schemas are "just fine", while all hell breaks loose whenever you change the "le" values with a classic histogram.)

Member Author

I'll remove it.

Member Author

What I was shooting for in my other comment is that the "le" values SHOULD NOT change frequently between scrapes. (Maybe that's somewhere implied where I didn't catch it. And it's definitely not a new thing in OMv2. I might just notice it now where we have the contrast to native histograms, where schema changes within the standard exponential schemas are "just fine", while all hell breaks loose whenever you change the "le" values with a classic histogram.)

I'm not sure that's in scope here. This isn't something we control in the exposition or indeed in the instrumentation. I mean, in contrast, the instrumentation could flat out refuse to observe NaN. But how do we ensure "le" doesn't change?

Member

It's not about ensuring it. It MAY change. But if you expose a different bucketing schema each scrape, you won't get any meaningful results. I guess everyone has taken that for granted, but maybe it should be mentioned somewhere.

Member Author

We discussed in DM that considering all the other things, maybe it's worth mentioning somehow.

Maybe say that implementation libraries should discourage users from frequently changing the Classic Bucket thresholds, as doing so can invalidate queries and make some aggregations over time impossible to execute.

hmm?


@beorn7 beorn7 left a comment


Some more comments.

Sorry for adding some comments outside of a review.

I also assume that non-outdated non-resolved comments are still on your radar to deal with later.


If the NaN value is not allowed, then the Count value MUST be equal to the sum of the negative, positive and zero native buckets.

If the NaN value is allowed, it MUST NOT be counted in any bucket and the Count MUST be greater than the sum of the negative, positive and zero native buckets. The ratonale is that NaN does not belong to any bucket mathematically, and instrumentation libraries traditionally don't put it into any bucket.
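The Count rule quoted above can be expressed as a hypothetical consistency check (all names are illustrative, not spec field names; the NaN count reflects the reviewer's point below that the difference between Count and the bucket sum equals the number of NaN observations):

```python
def count_consistent(count, negative_sum, positive_sum, zero_count, nan_count):
    # Count must equal all bucket populations plus NaN observations,
    # since NaN goes into no bucket. With nan_count == 0 this reduces
    # to the "NaN not allowed" rule above.
    return count == negative_sum + positive_sum + zero_count + nan_count

assert count_consistent(10, 3, 6, 1, 0)      # no NaN: Count equals bucket sum
assert count_consistent(12, 3, 6, 1, 2)      # two NaNs: Count exceeds sum by 2
assert not count_consistent(11, 3, 6, 1, 0)  # inconsistent histogram
```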
Member

Suggested change
If the NaN value is allowed, it MUST NOT be counted in any bucket and the Count MUST be greater than the sum of the negative, positive and zero native buckets. The ratonale is that NaN does not belong to any bucket mathematically, and instrumentation libraries traditionally don't put it into any bucket.
If the NaN value is allowed, it MUST NOT be counted in any bucket and the Count MUST be greater than the sum of the negative, positive and zero native buckets. The rationale is that NaN does not belong to any bucket mathematically, and instrumentation libraries traditionally don't put it into any bucket.

Beyond typos: Not sure if we can talk about a tradition already. Not counting NaN observations in any bucket was part of the nascent spec pretty early. So it didn't just happen out of tradition.

"Count MUST be greater than the sum..." → that's only the case if there are NaN observations in the histogram. Maybe use a wording that implies that the difference between Count and the sum of all buckets MUST be equal to the number of NaN observations.

Member Author

Took out "traditionally" and reworded per the diff. I do cover the cases with and without NaNs separately.
