Metrics improvements #1472

Closed
morhidi opened this issue Sep 16, 2022 · 6 comments · Fixed by #1645 or #1649
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments


morhidi commented Sep 16, 2022

Hi team,

We've successfully implemented the existing Metrics interface from JOSDK and are using it in the Flink Operator. Great job!

We'd be happy to see a few more metrics and improvements around the following areas:

  • Exposure of the complete CR in the Metrics interface
  • CR delay (time between a CR being created and first being detected by an operator)
  • Kubernetes API access (basically HTTP request/response metrics; this might be a better fit for the fabric8 client, and an interceptor that is independent of the HTTP client would be awesome)
  • A few other metrics about JOSDK internals (queue size, thread count, informer count, etc.)
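
To make the HTTP request/response item concrete, here is a minimal sketch of what a client-independent recorder could look like. It uses only the JDK. The class `HttpCallMetrics` and its methods are hypothetical names chosen for illustration; they are not part of fabric8 or JOSDK, and a small adapter for each concrete HTTP client is assumed to call `record` once per completed request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical recorder, independent of any HTTP client; not a fabric8 or JOSDK API. */
final class HttpCallMetrics {
  private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

  /** Builds a key such as "GET 2xx" from the HTTP method and the status class. */
  static String key(String method, int statusCode) {
    return method + " " + (statusCode / 100) + "xx";
  }

  /** Assumed to be called by a client-specific adapter once per completed request. */
  void record(String method, int statusCode, long elapsedNanos) {
    String k = key(method, statusCode);
    counts.computeIfAbsent(k, ignored -> new LongAdder()).increment();
    totalNanos.computeIfAbsent(k, ignored -> new LongAdder()).add(elapsedNanos);
  }

  /** Number of recorded requests for the key, or 0 if none were recorded. */
  long count(String k) {
    LongAdder adder = counts.get(k);
    return adder == null ? 0L : adder.sum();
  }

  /** Sum of elapsed nanoseconds for the key, or 0 if none were recorded. */
  long totalNanos(String k) {
    LongAdder adder = totalNanos.get(k);
    return adder == null ? 0L : adder.sum();
  }
}
```

Grouping by method and status class keeps the number of distinct keys small. Whether per-path detail is worth the extra cardinality is a separate design decision.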

Let me know what you think.

Thanks,
Matyas

@csviri csviri added this to the 4.2 milestone Sep 16, 2022
@metacosm
Collaborator

All of this sounds reasonable. 😄

@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 16, 2022
@csviri csviri added kind/feature Categorizes issue or PR as related to a new feature. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 16, 2022
@csviri csviri self-assigned this Dec 2, 2022
@csviri csviri linked a pull request Dec 2, 2022 that will close this issue
@csviri
Collaborator

csviri commented Dec 5, 2022

Just some thoughts:
CR delay (time between the CR being created and first detected by an operator): This is a little problematic, since when an operator starts we don't know whether a resource has been reconciled before. Also, a CR created while the operator is not running might lead to very high values in the metric.

It would be possible to add a metric that measures the delay between a resource event being received and first being processed. But even this might be an issue if a reconciliation is already running; that might also complicate things.

So queue size and thread count might be simpler indicators of the pressure on the operator, I guess.
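
As a rough illustration of the two ideas in this comment, the sketch below measures the delay from event received to first processed and reads the queue size and active thread count from an executor. It uses only the JDK. `PressureMetrics` and its method names are hypothetical and not part of JOSDK, and the sketch assumes the caller can hook into the points where an event arrives and where processing starts.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.function.LongSupplier;

/** Hypothetical helper for illustration; not a JOSDK API. */
final class PressureMetrics {
  private final Map<String, Long> firstReceivedNanos = new ConcurrentHashMap<>();
  private final LongSupplier nanoClock;

  /** The clock is injected so it can be replaced in tests; System::nanoTime works in production. */
  PressureMetrics(LongSupplier nanoClock) {
    this.nanoClock = nanoClock;
  }

  /** Keeps the earliest timestamp if several events arrive before processing starts. */
  void eventReceived(String resourceId) {
    firstReceivedNanos.putIfAbsent(resourceId, nanoClock.getAsLong());
  }

  /** Returns the delay since the earliest pending event, or empty if none was pending. */
  Optional<Duration> processingStarted(String resourceId) {
    Long received = firstReceivedNanos.remove(resourceId);
    if (received == null) {
      return Optional.empty();
    }
    return Optional.of(Duration.ofNanos(nanoClock.getAsLong() - received));
  }

  /** Number of tasks waiting in the executor's work queue. */
  static int queueSize(ThreadPoolExecutor executor) {
    return executor.getQueue().size();
  }

  /** Approximate number of threads currently running tasks. */
  static int activeThreads(ThreadPoolExecutor executor) {
    return executor.getActiveCount();
  }
}
```

Events that arrive while a reconciliation for the same resource is already running are the complication mentioned above. This sketch simply keeps the earliest pending timestamp, which is one possible choice rather than a settled answer.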

@csviri csviri linked a pull request Dec 5, 2022 that will close this issue
@csviri
Collaborator

csviri commented Dec 12, 2022

Will close this issue. I added the mentioned metrics, except the one I commented on above and the informer count (it might not be trivial to add in the current architecture; it is also fairly simple: in general, if the namespaces are not changing dynamically, these are static values). @morhidi, in case you think it still makes sense, please create a separate issue for that.

@csviri csviri closed this as completed Dec 12, 2022
@morhidi
Author

morhidi commented Dec 14, 2022

Thanks @csviri

3 participants