feat(om2): add native histograms to OpenMetrics2.0 #2634


Open · wants to merge 32 commits into main

Conversation

krajorama (Member)

@krajorama krajorama commented Apr 24, 2025

Background

Based on https://github.com/prometheus/proposals/blob/main/proposals/2024-01-29_native_histograms_text_format.md
And OpenMetrics 2.0 WG discussions.

Changes

  • Allow structured complex types marked by "{" and "}" in the specification.
  • Allow multiple exemplars per complex type value.
  • Require that exemplars for complex type values have the timestamp. (1)
  • Be permissive about observing NaN, +Inf, -Inf. Discourage observing NaN.
  • Split histogram into ones with classic and native buckets.
  • For classic buckets, define behavior when observing NaN.
  • Define the native buckets and also how NaN is handled.
  • Define the text format of native histograms and also their exemplars.

(1) Not doing this after all: the requirement stems from an implementation limitation, and it also contradicts backwards compatibility when exposing classic histograms as a complex type.

Open questions / decisions

See OpenMetrics2.0 WG meeting notes tab

Signed-off-by: György Krajcsovits <[email protected]>

Numbers MUST be either floating-point numbers or integers. Note that ingestors of the format MAY only support float64. The non-real values NaN, +Inf and -Inf MUST be supported. NaN MUST NOT be considered a missing value, but it MAY be used to signal a division by zero.

Complex data types MUST contain all information necessary to recreate a Metric Type, with the exception of Created time and Exemplars.
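As a sketch of the number rules quoted above (not part of the spec text; the function name is illustrative), a float64-only ingestor could parse values like this, treating the three special spellings explicitly:

```python
import math

def parse_number(text):
    # NaN, +Inf and -Inf are the non-real spellings the format supports;
    # anything else is parsed as a regular number (assuming a float64-only
    # ingestor, which the spec explicitly permits).
    special = {"NaN": float("nan"), "+Inf": float("inf"), "-Inf": float("-inf")}
    if text in special:
        return special[text]
    return float(text)

assert parse_number("+Inf") == float("inf")
assert parse_number("-Inf") == float("-inf")
assert math.isnan(parse_number("NaN"))  # NaN is a value, not "missing"
assert parse_number("42") == 42.0
```

Note that per the spec text, a NaN result must be kept as an observed value rather than discarded as missing.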
Member Author

This assumes we'll have the created timestamp separate from the JSON-like data. prometheus/OpenMetrics#285

Signed-off-by: György Krajcsovits <[email protected]>
@krajorama (Member Author)

Note to self: add details on how summaries and classic histogram one-liners (so not NHCB spans/deltas) fit into it.

krajorama added 4 commits May 22, 2025 10:05
Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
Add semantic conventions about where complex types may occur.
Allow empty spans and deltas.
Be more precise.

Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
@ArthurSens ArthurSens left a comment


Had some minutes to read the PR. Couldn't read everything though, I'll do a more complete review another day!

Reviewed from mobile

@beorn7 beorn7 self-requested a review May 27, 2025 11:26
krajorama added 2 commits May 27, 2025 15:53
Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
@krajorama krajorama marked this pull request as ready for review May 27, 2025 14:01
I did not want to clutter the examples with adding "# EOF" to all,
so I made a new marker to explicitly add it on request.

There was only one faulty example where we had an extra space, see
the Exemplars section.

Signed-off-by: György Krajcsovits <[email protected]>
Allow multiple exemplars for complex types, i.e. native histograms.
But require that the timestamp is present.

Signed-off-by: György Krajcsovits <[email protected]>
@krajorama krajorama force-pushed the krajo/om2.0-native-histograms branch from c16426b to 7965787 Compare June 11, 2025 07:27
@beorn7 beorn7 left a comment


Thanks for doing all of this.

Mostly commented about the histogram aspect (but I couldn't resist and referred to OM specific things now and then).


Histograms with exponential buckets use the integer native histogram data type.

The integer native histogram data type is a JSON like structure with fields. There MUST NOT be any whitespace around fields.
Member

It's not really JSON. Many languages use vaguely this kind of structure. I wouldn't call out JSON because it creates the expectation that other JSON syntax rules also apply.

Member

There MUST NOT be any whitespace around fields.

Has it already been decided that OMv2 will stay whitespace intolerant?

It's one of my big big red flags with OMv1 that it doesn't tolerate whitespace like the classic Prometheus text format. If OMv2 will see the light and allow whitespace again, then of course we should allow whitespace here.

Member Author

We have not discussed this in OM2.0 explicitly. It comes from https://github.com/prometheus/proposals/blob/main/proposals/2024-01-29_native_histograms_text_format.md.

However, we did say that since native histograms use a structure that's essentially internal, we have kind of already given up on being very human-friendly. I still keep the count and sum fields up front because that's easy to understand and relates to PromQL.

A consequence is that I think we'll have to provide some tooling (like promtool) for people to pretty-print the output, and instrumentation libraries may also allow pretty printing themselves.

Member

I'm more shooting for the high-level consistency: If OMv2 will also require precise usage (and non-usage) of whitespace (like OMv1), then we should of course do the same inside the native histogram representation.

However, if OMv2 will be liberal WRT whitespace again (which I hope), then we should not be more restrictive inside the native histogram representation.

(I don't want to open the whitespace discussion here in this comment. Here I'm only proposing to use the same rule (either liberal whitespace use or precisely ruled whitespace placement) inside and outside of native histograms.)

Member Author

@krajorama krajorama Jun 26, 2025

I'd like to defer the space (SP) question to a separate PR.


Exponential bucket values MUST be ordered by their index, and their values MUST be placed in the `negative_deltas` (and/or `positive_deltas`) field using delta encoding, that is, the first bucket value is written as-is and each following value only as a delta relative to the previous value. For example, bucket values 1, 5, 4, 4 become 1, 4, -1, 0.

To map the `negative_deltas` (and/or `positive_deltas`) back to their indices, the `negative_spans` (and/or `positive_spans`) field MUST be constructed in the following way: each span consists of a pair of numbers, an integer called offset and a non-negative integer called length. Only the first span in each list can have a negative offset; it defines the index of the first bucket in its corresponding `negative_deltas` (and/or `positive_deltas`). The length defines the number of consecutive buckets the bucket list starts with. The offsets of the following spans define the number of excluded (and thus unpopulated) buckets. The lengths define the number of consecutive buckets in the list following the excluded buckets.
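The delta encoding and span-to-index mapping described above can be sketched as follows (illustrative helper names, not part of the spec; this is a minimal sketch of the stated rules, not a reference implementation):

```python
def encode_deltas(values):
    # Delta-encode bucket values: first value as-is, then each value
    # as a difference relative to the previous one.
    deltas, prev = [], 0
    for v in values:
        deltas.append(v - prev)
        prev = v
    return deltas

def decode_buckets(spans, deltas):
    # Map delta-encoded buckets back to {index: value} using spans.
    # Each span is (offset, length): the first span's offset is the index
    # of the first bucket; each later offset counts the unpopulated
    # buckets skipped since the previous span ended.
    buckets, idx, value = {}, 0, 0
    it = iter(deltas)
    for i, (offset, length) in enumerate(spans):
        idx = offset if i == 0 else idx + offset
        for _ in range(length):
            value += next(it)
            buckets[idx] = value
            idx += 1
    return buckets

# Bucket values 1, 5, 4, 4 delta-encode to 1, 4, -1, 0, as in the text:
assert encode_deltas([1, 5, 4, 4]) == [1, 4, -1, 0]
# One span starting at index 0 covering all four consecutive buckets:
assert decode_buckets([(0, 4)], [1, 4, -1, 0]) == {0: 1, 1: 5, 2: 4, 3: 4}
# A gap of 2 unpopulated buckets between two spans:
assert decode_buckets([(0, 2), (2, 2)], [1, 4, -1, 0]) == {0: 1, 1: 5, 4: 4, 5: 4}
```

The second span example shows why the encoding is compact: unpopulated buckets consume no entries in the deltas list, only an offset in the span list.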
Member

Capitalize after the colon: "each" → "Each"

Signed-off-by: György Krajcsovits <[email protected]>
even unpopulated ones.

Signed-off-by: György Krajcsovits <[email protected]>
Empty may be exposed for optimizing spans.

Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
Rename negative_deltas and positive_deltas to negative_buckets and
positive_buckets.

Distinguish integer and float histograms by the number format used in them.

If the number has a dot "." or exponent "e", then the number is a float.
Otherwise it is an integer.

Signed-off-by: György Krajcsovits <[email protected]>
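The integer/float distinction described in the commit message above can be sketched like this (illustrative function name; whether an uppercase "E" also counts is not stated here, so this sketch checks only the lowercase spelling):

```python
def number_kind(text):
    # Per the rule above: a dot "." or an exponent "e" marks a float;
    # otherwise the number is an integer.
    return "float" if ("." in text or "e" in text) else "integer"

assert number_kind("42") == "integer"
assert number_kind("-7") == "integer"
assert number_kind("4.2") == "float"
assert number_kind("1e3") == "float"
```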
The timestamp requirement came from current limitations in Prometheus
exemplar storage. However in the future this might be solved.

Also there's the case when a classic histogram exposes exemplars.

Signed-off-by: György Krajcsovits <[email protected]>
Classic Buckets MUST be sorted in numerically increasing order of "le", and the value of the "le" label MUST follow the rules for Canonical Numbers.

All Classic Buckets must be represented, even the unpopulated ones that didn't measure any values.
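The ordering requirement above can be sketched as a hypothetical validator (names are illustrative; real canonical-number parsing is stricter than Python's `float()`, which is assumed here only for the sketch):

```python
def le_sorted(les):
    # Check strictly increasing numeric order of "le" thresholds;
    # "+Inf" parses to infinity and therefore sorts last.
    values = [float(le) for le in les]
    return all(a < b for a, b in zip(values, values[1:]))

assert le_sorted(["0.1", "1", "10", "+Inf"])
assert not le_sorted(["1", "0.5", "+Inf"])
```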
Member

Hmm, not sure if that nails it. As Classic Buckets are cumulative, there aren't really "unpopulated buckets" - except at the lower end.

Another aspect is that we advertise the benefit of cumulative buckets that individual buckets can be removed from the histogram to reduce resolution without the need of updating other buckets.

What I was shooting for in my other comment is that the "le" values SHOULD NOT change frequently between scrapes. (Maybe that's somewhere implied where I didn't catch it. And it's definitely not a new thing in OMv2. I might just notice it now where we have the contrast to native histograms, where schema changes within the standard exponential schemas are "just fine", while all hell breaks loose whenever you change the "le" values with a classic histogram.)

Member Author

I'll remove it.

Member Author

What I was shooting for in my other comment is that the "le" values SHOULD NOT change frequently between scrapes. (Maybe that's somewhere implied where I didn't catch it. And it's definitely not a new thing in OMv2. I might just notice it now where we have the contrast to native histograms, where schema changes within the standard exponential schemas are "just fine", while all hell breaks loose whenever you change the "le" values with a classic histogram.)

I'm not sure that's in scope here. This isn't something we control in the exposition or indeed in the instrumentation. I mean, in contrast, the instrumentation could flat out refuse to observe NaN. But how do we ensure "le" doesn't change?

Member

It's not about ensuring it. It MAY change. But if you expose a different bucketing schema each scrape, you won't get any meaningful results. I guess everyone has taken that for granted, but maybe it should be mentioned somewhere.

Member Author

We discussed in DM that considering all the other things, maybe it's worth mentioning somehow.

Maybe say that implementation libraries should discourage users from frequently changing the Classic Bucket thresholds, as doing so can invalidate queries and make some aggregations over time impossible to execute.

hmm?


@beorn7 beorn7 left a comment


Some more comments.

Sorry for adding some comments outside of a review.

I also assume that non-outdated non-resolved comments are still on your radar to deal with later.


If the NaN value is not allowed, then the Count value MUST be equal to the sum of the negative, positive and zero native buckets.

If the NaN value is allowed, it MUST NOT be counted in any bucket and the Count MUST be greater than the sum of the negative, positive and zero native buckets. The ratonale is that NaN does not belong to any bucket mathematically, and instrumentation libraries traditionally don't put it into any bucket.
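The Count rule quoted above can be expressed as a hypothetical consistency check (all names are illustrative, not spec field names; the NaN count reflects the reviewer's point below that the difference between Count and the bucket sum equals the number of NaN observations):

```python
def count_consistent(count, negative_sum, positive_sum, zero_count, nan_count):
    # Count must equal all bucket populations plus NaN observations,
    # since NaN goes into no bucket. With nan_count == 0 this reduces
    # to the "NaN not allowed" rule above.
    return count == negative_sum + positive_sum + zero_count + nan_count

assert count_consistent(10, 3, 6, 1, 0)      # no NaN: Count equals bucket sum
assert count_consistent(12, 3, 6, 1, 2)      # two NaNs: Count exceeds sum by 2
assert not count_consistent(11, 3, 6, 1, 0)  # inconsistent histogram
```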
Member

Suggested change
If the NaN value is allowed, it MUST NOT be counted in any bucket and the Count MUST be greater than the sum of the negative, positive and zero native buckets. The ratonale is that NaN does not belong to any bucket mathematically, and instrumentation libraries traditionally don't put it into any bucket.
If the NaN value is allowed, it MUST NOT be counted in any bucket and the Count MUST be greater than the sum of the negative, positive and zero native buckets. The rationale is that NaN does not belong to any bucket mathematically, and instrumentation libraries traditionally don't put it into any bucket.

Beyond typos: Not sure if we can talk about a tradition already. Not counting NaN observations in any bucket was part of the nascent spec pretty early. So it didn't just happen out of tradition.

"Count MUST be greater than the sum..." → that's only the case if there are NaN observations in the histogram. Maybe use a wording that implies that the difference between Count and the sum of all buckets MUST be equal to the number of NaN observations.

Member Author

Took out "traditionally" and reworded per the diff. I do cover the cases with and without NaNs separately.
