Prometheus - kubernetes-service-endpoints down #7986
Comments
The router stuff isn't set up correctly. It looks like it will be working in the next release; see openshift/origin#19318. The logging elasticsearch one seems like it should be working, but there are some changes that would need to be made. First of all, the kubernetes-service-endpoints job doesn't do auth. You would need to add something like:
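A minimal sketch of that kind of addition, assuming the Prometheus pod's service account token is sent with every scrape (the job name and paths below are illustrative, not the exact snippet from the comment):
- job_name: 'kubernetes-service-endpoints'
  # Present the Prometheus service account token so that auth-protected
  # exporters (such as the logging ES proxy) accept the scrape request.
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    # The endpoints may use certificates not signed by the pod's default CA,
    # so verification is skipped in this sketch.
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: endpoints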
@pat2man Any progress on your RBAC problem? For me the logging metrics endpoint works with bearer_token_file in the prometheus.yml configuration file, but I don't know how to solve a very similar error for haproxy.
With my cluster-admin user token the haproxy metrics endpoint works as expected, and also when I give the prometheus service account the same rights. Does anyone have some ideas for me? EDIT:
@Reamer if you take a look at the latest prometheus example, it has a prometheus-scraper cluster role: https://github.com/openshift/origin/blob/master/examples/prometheus/prometheus.yaml
The … However, there is another issue: the path … Or, if the goal is to have Prometheus scrape ES nodes via the proxy, then only …
To make it more clear, this is what I am talking about: I had a quick discussion about this with @jcantrill and he had the idea of creating separate extra Prometheus rules just for logging. He has a PR for this as well: https://github.com/openshift/origin/pull/18796/files I will try to push his approach further.
@lukas-vlcek you can annotate the service with prometheus.io/port to restrict it to the correct port.
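For illustration, the annotation sits on the service that exposes the metrics; the name, path, and port below are placeholders, not values taken from the logging templates:
apiVersion: v1
kind: Service
metadata:
  name: logging-es-prometheus        # placeholder name
  namespace: logging
  annotations:
    prometheus.io/scrape: "true"     # opt the service into auto-discovery
    prometheus.io/scheme: "https"
    prometheus.io/path: "/_prometheus/metrics"
    prometheus.io/port: "4443"       # scrape only this port, not the ES transport port
spec:
  ports:
  - name: proxy
    port: 4443
    targetPort: 4443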
@pat2man right, there is a PR to fix that: #8432
FYI I've submitted #8512 to get the router's metrics back.
Can anyone explain how to fix this in an existing (running) 3.9 cluster?
@prasenforu I think you can get around it by fixing a couple of permissions. At least it worked for me using ….
First, create the following cluster role:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-scraper
rules:
- apiGroups:
  - route.openshift.io
  resources:
  - routers/metrics
  verbs:
  - get
- apiGroups:
  - image.openshift.io
  resources:
  - registry/metrics
  verbs:
  - get
Then assign this cluster role to the Prometheus service account.
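A sketch of that binding, assuming the service account is named prometheus and lives in the openshift-metrics namespace; adjust both to your deployment:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-scraper
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-scraper
subjects:
- kind: ServiceAccount
  name: prometheus               # assumed service account name
  namespace: openshift-metrics   # assumed namespace of the Prometheus pod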
Let me try,
@prasenforu I used:
Not sure if it makes a difference.
I am not using a single-node cluster, is there any difference? There is no firewall protection either, and even from the Prometheus container I am able to fetch the metrics. Prometheus config:
@prasenforu right, so I've checked again and IIUC what's missing is that the …
Can you check whether this command succeeds after you've added the permissions?
oc rsh po/prometheus-0 sh -c 'curl -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" http://router.default.svc:1936/metrics'
If it does, I'd recommend adding another scrape configuration specifically for the router (loosely adapted from openshift/origin#18254):
- job_name: 'openshift-router'
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - default
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;router;1936-tcp
And modify the kubernetes-service-endpoints job to drop the router endpoints by adding a rule like this to its relabel_configs section:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
  action: drop
  regex: default;router;1936-tcp
I created a new scrape job (haproxy)
And modified the kubernetes-service-endpoints job to drop the router endpoints by adding the following rule to the relabel_configs section:
But there is still a red error in "kubernetes-service-endpoints", and as a result I am getting mail alerts, since alerting is configured in Prometheus.
@prasenforu can you double-check the configuration of the kubernetes-service-endpoints job?
Here is the configuration from the Prometheus UI console:
@prasenforu please try removing the endpoint port name from the drop rule, i.e.:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
  action: drop
  regex: default;router
Yes, now it works. Thanks for your valuable continuous support 👍
Coming back again on this issue. It looks like it's not auto-discovering service endpoints. Recently I added RabbitMQ with an attached MQ exporter. I can see all the metrics are exposed, but they are not visible in the Prometheus console. After I added another scrape job, similar to the haproxy (router) one, all the metrics became visible.
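For what it's worth, the kubernetes-service-endpoints job in the example config typically keeps only services annotated with prometheus.io/scrape: "true", so a separate scrape job should not be needed if the exporter's service is annotated. A sketch, with hypothetical names and port:
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-exporter          # hypothetical exporter service
  annotations:
    prometheus.io/scrape: "true"   # required for auto-discovery by the shared job
    prometheus.io/port: "9419"     # the exporter's metrics port (example value)
spec:
  ports:
  - name: metrics
    port: 9419
  selector:
    app: rabbitmq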
The …
All three ports were being scraped by Prometheus but only one worked. Related to openshift#7986
Hi Simon, coming back again! Hope you are doing well. We are trying to enable the OpenShift registry metrics. Everything has been done on the container side and I am able to get the metrics inside the docker-registry container.
Curl command from the docker-registry container:
But when I try to set it up as a job in Prometheus, I am facing an authentication error.
The error is as follows ...
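A sketch of such a registry job, assuming the metrics path of the integrated registry is /extensions/v2/metrics and that the registry accepts the Prometheus service account token (which the registry/metrics rule in the cluster role above is meant to allow); if the registry is instead configured with a static metrics secret, that secret would have to be supplied as the bearer token:
- job_name: 'openshift-registry'
  scheme: https
  metrics_path: /extensions/v2/metrics      # assumed metrics path
  tls_config:
    insecure_skip_verify: true              # registry serving cert may not match the pod CA
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  static_configs:
  - targets: ['docker-registry.default.svc:5000']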
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. /close
@openshift-bot: Closing this issue.
Description
I have installed OpenShift 3.9 using the playbooks (deploy_cluster.yml) with the test repo RPMs. I have also enabled metrics and Prometheus.
After the installation was done I looked at the Prometheus "Targets" page and most of the "kubernetes-service-endpoints" targets are down.
I looked particularly at the kubernetes_name="logging-es-prometheus" endpoint and I can see the following messages in the proxy container inside the logging-es-data-master-xxxx pod:
I also noticed that the endpoint URLs look kind of weird too:
https://90.49.0.11:9300_prometheus/metrics
Any hints on what might be wrong?
UPDATE:
I tried curl against the first endpoint (the "router" one) and I get a similar error:
Just found the defect for the ROUTER endpoint:
openshift/origin#17685