I have a Docker swarm running our business stack, defined in a docker-compose.yml, on two servers (nodes). The docker-compose.yml defines cAdvisor so that it starts on each of the two nodes, like this:
    cadvisor:
      image: gcr.io/google-containers/cadvisor:latest
      command: "--logtostderr --housekeeping_interval=30s"
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock:ro
        - /:/rootfs:ro
        - /var/run:/var/run
        - /sys:/sys:ro
        - /var/lib/docker/:/var/lib/docker:ro
        - /dev/disk:/dev/disk/:ro
      ports:
        - "9338:8080"
      deploy:
        mode: global
        resources:
          limits:
            memory: 128M
          reservations:
            memory: 64M

On a third server I run Docker separately from the swarm on nodes 1 and 2; this server hosts Prometheus and Grafana. Prometheus is configured to scrape only node1:9338 to get the cAdvisor information.
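For illustration, the relevant Prometheus scrape configuration on that third server looks roughly like this (the job name and interval are just examples, not necessarily the exact values I use):

    # prometheus.yml on the third server (job name and interval are illustrative)
    scrape_configs:
      - job_name: 'cadvisor'
        scrape_interval: 30s
        static_configs:
          - targets: ['node1:9338']   # only node1 is scraped at the moment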
I occasionally run into the problem that, when scraping node1:9338, not all containers running on nodes 1 and 2 show up in the cAdvisor statistics.
I assumed that cAdvisor synchronizes its information within the swarm, so that I could configure Prometheus to use only node1:9338 as the entry point into the swarm and scrape everything from there.
Or do I also have to put node2:9338 into my Prometheus configuration to reliably get the information from all nodes? If so, how is this supposed to scale, since I would have to add every new node to the Prometheus config?
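If the answer is yes, I assume the scrape config would have to list every node by hand, roughly like this (illustrative sketch):

    scrape_configs:
      - job_name: 'cadvisor'
        static_configs:
          - targets:
              - 'node1:9338'
              - 'node2:9338'
              # every future node would have to be appended here as well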
Running Prometheus together with the business stack in one swarm is not an option.
edit: Today I noticed strange behaviour when opening the cAdvisor metrics URLs http://node1:9338/metrics and http://node2:9338/metrics: both URLs show the same information, namely that of the containers running on node1. The information about the containers running on node2 is missing even when I request http://node2:9338/metrics.
Could it be that Docker's internal load balancing (the ingress routing mesh) is routing the request for http://node2:9338/metrics to the cAdvisor on node1, so that node1's metrics are shown even though node2 was requested?
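If that is what happens, my understanding is that publishing the port in host mode (the long ports syntax, compose file format 3.2+) would bypass the ingress routing mesh, so each node would answer with its own cAdvisor. A sketch of what I would try, assuming host-mode publishing is acceptable for this global service:

    cadvisor:
      # ...rest of the service definition as above...
      ports:
        - target: 8080      # cAdvisor's port inside the container
          published: 9338
          protocol: tcp
          mode: host        # bind directly on each node, bypassing the routing mesh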