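An exporter for monitoring nginx running on virtual machines, extracted from the exporter in the official ingress-nginx code.
The code is ported from the official controller-v0.49.3 release.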
nginx_socket: the Lua module monitor.lua sends nginx log data as JSON to /tmp/prometheus-nginx.socket; nginx_exporter reads the data from this socket and assembles it into metrics.
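
The socket flow described for nginx_socket can be sketched in a few lines of Python. This is only an illustration, not the exporter's actual Go code: it assumes a stream-type Unix socket and a payload that is a JSON array of objects with `host` and `status` keys. Those field names, and the helper names `aggregate` and `serve`, are assumptions; the real payload layout is defined by monitor.lua.

```python
import json
import os
import socket
from collections import Counter

SOCKET_PATH = "/tmp/prometheus-nginx.socket"  # path named in this README


def aggregate(payload: bytes, requests_total: Counter) -> None:
    """Decode one JSON payload and count requests per (host, status).

    Assumption: the payload is a JSON array of objects carrying "host" and
    "status" keys; the real field names are defined by monitor.lua.
    """
    for record in json.loads(payload):
        requests_total[(record["host"], str(record["status"]))] += 1


def serve(path: str = SOCKET_PATH) -> None:
    """Accept connections on a Unix socket and aggregate each payload."""
    requests_total: Counter = Counter()
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file from a previous run
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)
        server.listen()
        while True:
            conn, _ = server.accept()
            with conn:
                chunks = []
                # Read until the sender closes the connection.
                while chunk := conn.recv(65536):
                    chunks.append(chunk)
            aggregate(b"".join(chunks), requests_total)
```

A real exporter would expose the counters on an HTTP endpoint instead of keeping them in a local variable, and would tolerate malformed payloads rather than let the loop fail.
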
nginx_process: collects data from the /proc/PID/ directory to monitor CPU, memory, and I/O.
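
As a rough illustration of the kind of data nginx_process reads, the sketch below parses `/proc/<pid>/stat` for CPU time and resident memory, using the field positions documented in proc(5). It does not cover I/O and is not taken from the exporter's source; the names `ProcStat`, `parse_proc_stat`, and `read_process` are hypothetical.

```python
import os
from typing import NamedTuple


class ProcStat(NamedTuple):
    utime_ticks: int  # user-mode CPU time, in clock ticks
    stime_ticks: int  # kernel-mode CPU time, in clock ticks
    rss_pages: int    # resident set size, in pages


def parse_proc_stat(text: str) -> ProcStat:
    """Parse the contents of /proc/<pid>/stat (format per proc(5))."""
    # The command name (field 2) is parenthesised and may contain spaces,
    # so split only what follows the last ')'. rest[0] is field 3 (state).
    rest = text[text.rindex(")") + 2:].split()
    return ProcStat(
        utime_ticks=int(rest[11]),  # field 14
        stime_ticks=int(rest[12]),  # field 15
        rss_pages=int(rest[21]),    # field 24
    )


def read_process(pid: int) -> dict:
    """Return CPU seconds and resident memory in bytes for one process."""
    with open(f"/proc/{pid}/stat") as f:
        stat = parse_proc_stat(f.read())
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    page_size = os.sysconf("SC_PAGE_SIZE")
    return {
        "cpu_seconds": (stat.utime_ticks + stat.stime_ticks) / ticks_per_second,
        "resident_bytes": stat.rss_pages * page_size,
    }
```

Calling `read_process(os.getpid())` on a Linux host returns the values for the current Python process; for nginx, the PID would come from the nginx pid file or a process lookup.
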
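nginx_status: collects nginx connection data through the http_stub_status_module module. nginx must be compiled with --with-http_stub_status_module, and the following must be added to the configuration file:

```nginx
# Port used for monitoring
server {
    listen 8021;
    location /stub_status {
        stub_status on;
        access_log off;
        allow 127.0.0.1;
        # deny all;
    }
}
```

nginx_certificate: certificate monitoring module, not yet implemented.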
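
For the nginx_status module, the data comes from the plain-text page served by stub_status. The following sketch shows one way to parse that page in Python, assuming the standard stub_status output format and the port and path configured above (8021 and /stub_status). The names `parse_stub_status` and `fetch_stub_status` are hypothetical and are not part of the exporter.

```python
import re
import urllib.request

# Matches the standard three-part stub_status page.
STUB_STATUS_RE = re.compile(
    r"Active connections:\s*(\d+)\s+"
    r"server accepts handled requests\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)"
)
FIELDS = ("active", "accepts", "handled", "requests",
          "reading", "writing", "waiting")


def parse_stub_status(body: str) -> dict:
    """Turn the stub_status text page into a dict of integer values."""
    match = STUB_STATUS_RE.search(body)
    if match is None:
        raise ValueError("unexpected stub_status output")
    return dict(zip(FIELDS, map(int, match.groups())))


def fetch_stub_status(url: str = "http://127.0.0.1:8021/stub_status") -> dict:
    """Fetch and parse the status page; the URL assumes the config above."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_stub_status(response.read().decode())
```

Because the location only allows 127.0.0.1, `fetch_stub_status()` has to run on the same host as nginx.
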
1) nginx must be compiled with the Lua module.
2) Lua must have the cjson module.
```bash
yum install gcc -y
cd /usr/local/src/
wget --no-check-certificate https://luajit.org/download/LuaJIT-2.0.5.zip
unzip LuaJIT-2.0.5.zip
cd LuaJIT-2.0.5/
make install PREFIX=/usr/local/luajit

cd /usr/local/src/
wget --no-check-certificate https://kyne.com.au/~mark/software/download/lua-cjson-2.1.0.zip
unzip lua-cjson-2.1.0.zip
cd lua-cjson-2.1.0/
# The Makefile must be modified here, otherwise compilation fails
sed -i 's#^LUA_INCLUDE_DIR = .*#LUA_INCLUDE_DIR = /usr/local/src/LuaJIT-2.0.5/src#' Makefile
make && make install
```

- Copy the Lua scripts to the /data/nginx/lua directory (any directory works, as long as it matches the nginx configuration).
- Modify the http block of the nginx configuration, adding the following:

```nginx
http {
    # Directory containing the Lua scripts
    lua_package_path "/data/nginx/lua/?.lua;;";

    init_by_lua_block {
        collectgarbage("collect")

        -- init modules
        local ok, res

        ok, res = pcall(require, "monitor")
        if not ok then
            error("require failed: " .. tostring(res))
        else
            monitor = res
        end

        ok, res = pcall(require, "plugins")
        if not ok then
            error("require failed: " .. tostring(res))
        else
            plugins = res
        end
        -- load all plugins that'll be used here
        plugins.init({ })
    }

    init_worker_by_lua_block {
        monitor.init_worker(10000)
        plugins.run()
    }

    log_by_lua_block {
        monitor.call()
        plugins.run()
    }
    ......
}
```

- Start nginx_exporter
```bash
# Build nginx_exporter
git clone https://gitee.com/xianglinzeng/nginx_exporter.git
cd nginx_exporter
go mod tidy
go build -o nginx_exporter

# nginx_exporter flags
# -port  port to listen on, default 9123
# -v     log level 1 2 3 4 5; higher is more verbose. Default is 2 and may be omitted; use 5 for debugging
# -statuspath string
#        path served by http_stub_status_module, default /stub_status (default "/stub_status")
# -statusport string
#        port served by http_stub_status_module, default 8021 (default "8021")
./nginx_exporter -port=9999 -v=5

# Manage with systemd
mkdir /opt/nginx_exporter
cp nginx_exporter /opt/nginx_exporter
cat <<EOF > /usr/lib/systemd/system/nginx_exporter.service
[Unit]
Description=nginx_exporter
After=network.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/opt/nginx_exporter/nginx_exporter -port=9123 -v=2
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl restart nginx_exporter
systemctl status nginx_exporter.service
systemctl enable nginx_exporter
curl localhost:9123/metrics
```

- Add it to prometheus-operator using a ServiceMonitor
```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    prometheus: k8s
    k8s-apps: nginx-exporter
  name: nginx-exporter-sm
  namespace: monitoring
spec:
  endpoints:
  - port: metrics
    interval: 10s
    scheme: http
    path: /metrics
  jobLabel: k8s-app
  selector:
    matchLabels:
      metrics: nginx-exporter
---
apiVersion: v1
kind: Service
metadata:
  labels:
    # Key label used by the ServiceMonitor for discovery
    metrics: nginx-exporter
  name: nginx-exporter
  namespace: monitoring
spec:
  ports:
  - name: metrics # matches spec.endpoints.port in the ServiceMonitor
    port: 9123
    targetPort: 9123
---
apiVersion: v1
kind: Endpoints
metadata:
  name: nginx-exporter
  namespace: monitoring
  labels:
    metrics: nginx-exporter
subsets:
- addresses:
  - ip: 172.20.4.117
  ports:
  - name: metrics
    port: 9123
    protocol: TCP
```

| Alert name | Expression | Data collection time (minutes) | Trigger condition |
|---|---|---|---|
| NginxHighHttp4xxErrorRate | `sum(rate(nginx_http_requests_total{status=~"^4.."}[1m])) / sum(rate(nginx_http_requests_total[1m])) * 100 > 5` | 5 | HTTP 4xx error rate is too high. |
| NginxHighHttp5xxErrorRate | `sum(rate(nginx_http_requests_total{status=~"^5.."}[1m])) / sum(rate(nginx_http_requests_total[1m])) * 100 > 5` | 5 | HTTP 5xx error rate is too high. |
| NginxLatencyHigh | `histogram_quantile(0.99, sum(rate(nginx_http_request_duration_seconds_bucket[10m])) by (host, node)) > 10` | 5 | Latency is too high. |
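
To make the arithmetic behind the two error-rate alerts concrete, here is a small Python sketch of the same calculation on hand-made counter samples. It is a simplification of what Prometheus does: `rate` here is a plain difference over time without counter-reset handling or extrapolation, and the function names and sample layout are hypothetical.

```python
def rate(first: float, last: float, seconds: float) -> float:
    """Per-second increase of a counter between two samples (no reset handling)."""
    return (last - first) / seconds


def error_percentage(samples: dict, prefix: str, seconds: float) -> float:
    """Share of requests whose status starts with `prefix`, in percent.

    `samples` maps a status code string to the (first, last) counter values
    observed at the start and end of the window.
    """
    rates = {status: rate(a, b, seconds) for status, (a, b) in samples.items()}
    total = sum(rates.values())
    if total == 0:
        return 0.0  # simplification: PromQL would yield NaN for 0 / 0
    matching = sum(r for status, r in rates.items() if status.startswith(prefix))
    return matching / total * 100


# Hand-made samples over a 60-second window: status -> (first, last).
window = {"200": (1000, 1480), "404": (50, 110), "500": (10, 70)}
would_fire_4xx = error_percentage(window, "4", 60) > 5
would_fire_5xx = error_percentage(window, "5", 60) > 5
```

With these samples both status classes account for 60 of 600 requests in the window, so both comparisons against the 5 percent threshold come out true.
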
The Grafana dashboard is in the grafana directory. Preview:


