Zabbix - The Complete Guide to Zabbix Troubleshooting
Published: 2026-05-03 20:02
Zabbix Troubleshooting: Practical Fixes for a Server That Won't Start, Missing Monitoring Data, and Agent Connection Failures
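This article focuses on the three scenarios that cause the most pain in Zabbix operations: the Server failing to start, monitoring data stopping, and Agents that cannot connect. It walks you step by step through log analysis, configuration checks, and permission fixes to find the root cause and fix it.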
I. Introduction
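Anyone who has worked with Zabbix knows how it goes: it runs quietly most of the time, but once something breaks, the failures cascade. The Server won't start, items go red across the board, and no data comes in. After 10 years in the field, I've laid out the troubleshooting paths for the most frequent failures so you won't be left fumbling the next time it happens.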
II. Procedure
Step 1: Check the Zabbix Server service status
A Server that won't start is the most common problem. First confirm its status with systemctl or service:
# CentOS/RHEL and Ubuntu/Debian use the same command
systemctl status zabbix-server
Expected output (healthy):
● zabbix-server.service - Zabbix Server
Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2024-01-15 10:30:00 CST; 2h 30min ago
Main PID: 12345 (zabbix_server)
CGroup: /system.slice/zabbix-server.service
└─12345 /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf
Expected output (failed, a common case):
● zabbix-server.service - Zabbix Server
Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2024-01-15 10:30:00 CST; 5s ago
Process: 12344 ExecStart=/usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf (code=exited, status=1/FAILURE)
If the service shows failed, go straight to the journal to find out why:
journalctl -u zabbix-server -n 50 --no-pager
Step 2: Check the Zabbix log file
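Everything in this step starts from the LogFile path in zabbix_server.conf, and the lookup plus the tail can be scripted. Below is a minimal bash sketch, assuming the plain KEY=value format of Zabbix config files. The function name zbx_conf_get is hypothetical (it is not part of Zabbix), and it simply takes the last uncommented line for the key:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Zabbix): print the value of the last
# uncommented "KEY=value" line for KEY in a Zabbix config file.
# Lines starting with "#" are ignored; files pulled in via Include= are NOT followed.
zbx_conf_get() {
  local key=$1 conf=$2
  sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$conf" | tail -n 1
}

# Example: show the last 50 lines of the log the server is configured to write.
# log=$(zbx_conf_get LogFile /etc/zabbix/zabbix_server.conf)
# tail -n 50 "${log:-/var/log/zabbix/zabbix_server.log}"
```

An empty result means the key is not set in that file (or is only set in an included file, which this sketch does not read).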
Logs are the best place to start. First find the log file path:
# The default config file path is the same on CentOS/RHEL and Ubuntu
grep -i '^LogFile=' /etc/zabbix/zabbix_server.conf
Expected output:
LogFile=/var/log/zabbix/zabbix_server.log
View the last 50 lines of the log:
tail -n 50 /var/log/zabbix/zabbix_server.log
Expected output (typical database connection failure):
45678:20240115:103015.123 [Z3001] connection to database 'zabbix' failed: [1045] Access denied for user 'zabbix'@'localhost' (using password: YES)
45678:20240115:103015.124 database is down: reconnecting in 10 seconds
Expected output (typical configuration error):
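zabbix_server [45678]: cannot open config file "/etc/zabbix/zabbix_server.conf": [2] No such file or directory
(Because the server cannot read LogFile from a config file it cannot open, this message typically shows up on the console or in journalctl rather than in the log file.)
Step 3: Check the database connection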
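When the log reports a database error, the most telling test is to connect with exactly the settings the server itself uses. A minimal bash sketch, assuming the DBHost, DBName, DBUser, and DBPort options appear in zabbix_server.conf in plain KEY=value form; zbx_mysql_cmd is a hypothetical name, and falling back to localhost when DBHost is unset is an assumption of this sketch:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a mysql command line from the DB* settings in
# zabbix_server.conf. It prints the command instead of running it.
# Assumption: an unset DBHost is treated as localhost; -P is added only if DBPort is set.
zbx_mysql_cmd() {
  local conf=$1 host name user port
  host=$(sed -n 's/^DBHost=//p' "$conf" | tail -n 1)
  name=$(sed -n 's/^DBName=//p' "$conf" | tail -n 1)
  user=$(sed -n 's/^DBUser=//p' "$conf" | tail -n 1)
  port=$(sed -n 's/^DBPort=//p' "$conf" | tail -n 1)
  # -p with no value makes mysql prompt for the password (DBPassword is never read here)
  printf 'mysql -h %s%s -u %s -p %s\n' "${host:-localhost}" "${port:+ -P $port}" "$user" "$name"
}

# Example:
# zbx_mysql_cmd /etc/zabbix/zabbix_server.conf
```

Printing rather than executing lets you check the command first. If the password from DBPassword is then rejected with ERROR 1045, the config and the database disagree.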
Database errors in the log are a frequent problem. Test the MySQL connection by hand first:
# Test the MySQL connection
mysql -u zabbix -p -h localhost zabbix
# After entering the password, run a few simple queries
show databases;
use zabbix;
select count(*) from hosts;
Expected output (healthy):
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 12345
Server version: 8.0.35 MySQL Community Server
mysql> select count(*) from hosts;
+----------+
| count(*) |
+----------+
| 5 |
+----------+
1 row in set (0.01 sec)
Expected output (connection failure):
ERROR 1045 (28000): Access denied for user 'zabbix'@'localhost' (using password: YES)
If the connection fails, reset the password (be careful on production systems):
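# Log in to MySQL as root
mysql -u root -p
# Reset the zabbix user's password
ALTER USER 'zabbix'@'localhost' IDENTIFIED BY 'YOUR_NEW_PASSWORD';
# Afterwards, put the new password in DBPassword= in /etc/zabbix/zabbix_server.conf and restart the server:
# systemctl restart zabbix-server
Step 4: Check that the port is listening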
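Whether a port is listening can also be checked without reading grep output by eye. A minimal bash sketch; port_listening is a hypothetical helper name, and it relies on `ss -tln` and `netstat -tln` both printing the local address in the fourth column. It only matches the exact port after the last colon, so a search for 1005 does not match 10051:

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the `ss -tln` / `netstat -tln` output on stdin
# contains a socket whose local address (column 4) ends in ":PORT".
port_listening() {
  awk -v p="$1" '$4 ~ (":" p "$") { found = 1 } END { exit !found }'
}

# Example:
# if ss -tln | port_listening 10051; then echo "10051 is listening"; else echo "10051 is NOT listening"; fi
```

Because it reads stdin, the same function also works on output saved from another host.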
The Server is up but cannot be reached from outside? First confirm that the port is listening:
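# Check the Zabbix Server port (default 10051); run as root so -p can show the owning process
netstat -tlnp | grep 10051
# If netstat is not installed, use ss instead
ss -tlnp | grep 10051
Expected output (healthy):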
tcp 0 0 0.0.0.0:10051 0.0.0.0:* LISTEN 12345/zabbix_server
tcp6 0 0 :::10051 :::* LISTEN 12345/zabbix_server
Expected output (not listening, the problem case):
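# No output means the port is not listening
If the port is not listening, check the ListenPort and ListenIP parameters in the config file: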
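grep -E '^ListenPort|^ListenIP' /etc/zabbix/zabbix_server.conf
Expected output:
ListenPort=10051
# ListenIP is commented out by default, which means listening on all addresses. If the command prints nothing at all, ListenPort is commented out too and the default port 10051 applies.
Step 5: Troubleshoot Zabbix Agent connections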
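The most frequent root cause in this step is the Server= allowlist in zabbix_agentd.conf, which is checked further down. A minimal bash sketch of that check; agent_allows_ip is a hypothetical name, and it compares the IP against the comma-separated Server= entries as exact strings, so CIDR ranges or DNS names in Server= are not evaluated and a negative result is only a hint:

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if IP appears verbatim in the last uncommented
# Server= line of an agent config. ServerActive= lines are not matched.
# Simplification: exact string match only; CIDR ranges and DNS names are not expanded.
agent_allows_ip() {
  local ip=$1 conf=$2
  sed -n 's/^[[:space:]]*Server[[:space:]]*=//p' "$conf" | tail -n 1 \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fxq -- "$ip"
}

# Example (run on the Agent host, passing the Zabbix Server's IP):
# agent_allows_ip 192.168.1.10 /etc/zabbix/zabbix_agentd.conf || echo "Server IP is missing from Server="
```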
An Agent that cannot connect is the most common client-side problem. Start by checking the service status on the Agent host:
# CentOS/RHEL and Ubuntu use the same command
systemctl status zabbix-agent
Expected output (healthy):
● zabbix-agent.service - Zabbix Agent
Loaded: loaded (/usr/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2024-01-15 10:30:00 CST; 2h 30min ago
Main PID: 23456 (zabbix_agentd)
CGroup: /system.slice/zabbix-agent.service
└─23456 /usr/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf
Test whether local keys work on the Agent:
# Test a built-in agent key (agent.ping is not a custom key)
zabbix_agentd -t agent.ping
# Test a system key
zabbix_agentd -t system.uname
Expected output (healthy):
agent.ping [u|1]
system.uname [s|Linux server01 5.4.0--generic #1 SMP x86_64]
Expected output (configuration error):
ZBX_ERROR: cannot parse item key [agent.ping]
Test a passive check from the Server side (run this on the Zabbix Server host):
# Test with the zabbix_get tool
zabbix_get -s 192.168.1.100 -k agent.ping
# If zabbix_get is not installed, install it first
# CentOS/RHEL: yum install zabbix-get
# Ubuntu: apt install zabbix-get
Expected output (healthy):
1
Expected output (connection failure):
zabbix_get [45678]: Check access restriction in Zabbix agent configuration
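This error means the Server= address restriction in the Agent configuration does not include the address the Server connects from, so the Agent refuses the connection. Check the Agent configuration: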
grep -E '^Server=|^ServerActive=|^Hostname=' /etc/zabbix/zabbix_agentd.conf
Expected output:
Server=192.168.1.10,192.168.1.11
ServerActive=192.168.1.10:10051
Hostname=server-test-01
If the Server's IP is not in the Server= list, add it (entries are comma-separated). After editing the file, restart the Agent:
# CentOS/RHEL and Ubuntu use the same command
systemctl restart zabbix-agent
Step 6: Check SELinux and firewall configuration
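On an Enforcing system, the practical question is which Zabbix-related booleans are still off. A minimal bash sketch (zbx_sebool_off is a hypothetical name) that filters `getsebool -a` output, which prints one `name --> on|off` line per boolean, and keeps the ones whose name contains "zabbix":

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `getsebool -a` output on stdin and print the name of
# every boolean that contains "zabbix" and is currently off.
zbx_sebool_off() {
  awk '$1 ~ /zabbix/ && $3 == "off" { print $1 }'
}

# Example (RHEL-family host with SELinux enabled):
# getsebool -a | zbx_sebool_off
```

This only lists candidates. Turn on (with setsebool -P) just the booleans your setup actually needs.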
If all the configuration is correct but connections still fail, SELinux or the firewall is usually the culprit:
# Check the SELinux mode
getenforce
# List the Zabbix-related SELinux booleans
getsebool -a | grep zabbix
Expected output:
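Enforcing
httpd_can_connect_zabbix --> off
zabbix_can_network --> off
zabbix_run_sudo --> off
If SELinux is Enforcing and the booleans you need are off, turn them on:
# httpd_can_connect_zabbix lets the web frontend connect to the Server; zabbix_can_network lets Zabbix processes open network connections; -P makes the change persistent
setsebool -P httpd_can_connect_zabbix on
setsebool -P zabbix_can_network on
# Verify
getsebool -a | grep zabbix
Expected output:
httpd_can_connect_zabbix --> on
zabbix_can_network --> on
zabbix_run_sudo --> off
Check the firewall rules (CentOS/RHEL 7 and later use firewalld):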
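# List the ports that are already open
firewall-cmd --list-ports
# Open the Zabbix ports: 10051/tcp on the Server host (active agents and proxies connect to it),
# 10050/tcp on each Agent host (the Server connects to it for passive checks)
firewall-cmd --permanent --add-port=10051/tcp
firewall-cmd --permanent --add-port=10050/tcp
# Reload the firewall so the permanent rules take effect
firewall-cmd --reload
Expected output: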
success
success
success
Step 7: Fix the configuration problems behind Zabbix web interface errors
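The frontend finds the Server through $ZBX_SERVER and $ZBX_SERVER_PORT in zabbix.conf.php, so it helps to read exactly what those are set to. A minimal bash sketch; zbx_php_conf is a hypothetical name, and it is a simplified parser that assumes one `$NAME = 'value';` assignment per line with no quote characters inside the value:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the quoted value of a "$NAME = 'value';" line in
# zabbix.conf.php. Simplified: one assignment per line, no quotes inside the value.
zbx_php_conf() {
  local name=$1 file=$2
  awk -v k="$name" -v q="'" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      if (index(line, "$" k) != 1) next        # line must start with $NAME
      rest = substr(line, length(k) + 2)
      if (rest !~ /^[ \t]*=/) next              # then "=", which rules out $NAME_PORT
      re = "[" q "\"][^" q "\"]*[" q "\"]"       # first single- or double-quoted string
      if (match(rest, re)) print substr(rest, RSTART + 1, RLENGTH - 2)
    }' "$file"
}

# Example:
# zbx_php_conf ZBX_SERVER      /etc/zabbix/web/zabbix.conf.php
# zbx_php_conf ZBX_SERVER_PORT /etc/zabbix/web/zabbix.conf.php
```

In a typical setup, the port printed here should match the port the Server listens on (10051 by default).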
The web interface sometimes shows "Zabbix server is not running" even though the Server process is running:
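# The frontend shows this message when it cannot reach the Server at $ZBX_SERVER:$ZBX_SERVER_PORT
# As a first check, log in to MySQL and look at the availability of the "Zabbix server" host
# Note: newer Zabbix versions track availability per interface (interface.available), so this column may not exist in the hosts table
mysql -u zabbix -p -h localhost zabbix -e "SELECT hostid,host,status,available FROM hosts WHERE host='Zabbix server';"
Expected output: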
+--------+------------------+--------+-----------+
| hostid | host | status | available |
+--------+------------------+--------+-----------+
| 10084 | Zabbix server | 0 | 1 |
+--------+------------------+--------+-----------+
Meaning of available: 0 = unknown, 1 = available, 2 = unavailable (connection problem)
If available=2, check the server settings in zabbix.conf.php:
# Same path on CentOS/RHEL and Ubuntu; single quotes keep the shell from touching "$", and "\$" matches a literal "$"
grep -E '^\$ZBX_SERVER(_PORT)?[[:space:]]*=' /etc/zabbix/web/zabbix.conf.php
Expected output:
$ZBX_SERVER = 'localhost';
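$ZBX_SERVER_PORT = '10051';
If the configuration is correct but the frontend still cannot connect, check the web server error logs (frontend errors are not written to zabbix_server.log):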
# CentOS/RHEL (Nginx or Apache httpd)
tail -n 30 /var/log/nginx/error.log
tail -n 30 /var/log/httpd/error_log
# Ubuntu (Nginx or Apache)
tail -n 30 /var/log/nginx/error.log
tail -n 30 /var/log/apache2/error.log
III. FAQ
Q1: Zabbix Server fails to start with "cannot allocate shared memory"
This is a shared memory shortage, not a code problem. Adjust StartPollers and the related parameters in zabbix_server.conf, or lower CacheSize:
# View the current settings
grep -E '^StartPollers|^CacheSize|^HistoryCacheSize' /etc/zabbix/zabbix_server.conf
Expected output:
StartPollers=20
CacheSize=64M
HistoryCacheSize=128M
Fix: lower these values appropriately, or increase the system's shared memory limits:
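# View the system's shared memory limits
ipcs -lm
# If the limit is too low, raise it at runtime (value in bytes); to keep it across reboots,
# put the same setting in /etc/sysctl.conf or a file under /etc/sysctl.d/
sysctl -w kernel.shmmax=<bytes>
Q2: Monitoring data was fine, then suddenly all of it is gone and graphs show NaN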
This kind of problem is usually caused by failing database writes. First check whether the history tables are damaged:
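mysql -u root -p -h localhost zabbix -e "CHECK TABLE history, history_uint;"
# Note: REPAIR TABLE only supports MyISAM, ARCHIVE, and CSV tables; it cannot repair InnoDB tables
It may also be that the Housekeeper deleted too much history data. Check its configuration: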
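grep -E '^HousekeepingFrequency|^MaxHousekeeperDelete' /etc/zabbix/zabbix_server.conf
No output means both options are commented out and the defaults apply: HousekeepingFrequency=1 (housekeeping every hour) and MaxHousekeeperDelete=5000. Deleting that often can make the frontend sluggish. Keep in mind that the Housekeeper only removes data older than the configured history storage period, so if recent data disappears, check that period as well. More conservative values: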
HousekeepingFrequency=12
MaxHousekeeperDelete=1000
Q3: zabbix_agentd -t on the Agent fails with "ZBX_ERROR: cannot parse item key"
The key name itself is not misspelled; the UserParameter syntax is wrong. Check the custom key configuration:
# View the custom key configuration
cat /etc/zabbix/zabbix_agentd.conf.d/custom.conf
Typical mistake (a space between the key and the command):
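UserParameter=custom.vfs.dev.read.discovery /bin/bash -c "awk '/^d/ {print \$NF}'"
Correct form, UserParameter=<key>,<command>, with the key and the command separated by a comma:
UserParameter=custom.vfs.dev.read.discovery,/bin/bash -c "awk '/^d/ {print \$NF}'"
# Note: the separator between the key name and the command is an ASCII comma, not a space
# Inside double quotes the shell would expand $NF itself (to an empty string), so it is escaped as \$NF
After editing, restart the Agent and test: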
systemctl restart zabbix-agent
zabbix_agentd -t custom.vfs.dev.read.discovery
Q4: Zabbix web login spins forever or returns 502 Bad Gateway
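A 502 means Nginx is not getting a valid response from PHP-FPM, so this is a communication problem between Nginx and PHP-FPM. Check the PHP-FPM status first: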
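# CentOS/RHEL
systemctl status php-fpm
# Ubuntu (the unit name includes the PHP version, e.g. php8.1-fpm; quote the pattern so the shell does not expand it)
systemctl status 'php*-fpm.service'
Check the socket and session directory permissions: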
ls -la /var/lib/php/session/
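ls -la /var/run/php-fpm/
Make sure the Zabbix web frontend (the user the web server and PHP-FPM run as) can access the socket and the session directory. Then check that fastcgi_pass in the Nginx configuration points to the correct PHP-FPM socket: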
grep fastcgi_pass /etc/zabbix/nginx.conf
Expected output:
fastcgi_pass unix:/var/run/php-fpm/zabbix.sock;
IV. Summary
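Zabbix troubleshooting comes down to three things: logs are the entry point, configuration is usually the root cause, and the network is the path between components. Use the logs to pin down the type of error, then check the database and configuration files, and finally confirm SELinux and the firewall. When an Agent can't connect, focus on the Server= allowlist and the firewall; when the Server won't start, focus on shared memory and the database connection.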
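The "logs first" rule can be turned into a quick first pass over a log. A minimal bash sketch (zbx_triage is a hypothetical name) that counts lines matching a few failure messages quoted in this article and prints one `category count` line per category found; the pattern list is an assumption and far from a complete list of Zabbix errors:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines matching a few failure messages quoted in
# this article and print "category count" lines, sorted by category.
# Reads the files given as arguments, or stdin when there are none.
zbx_triage() {
  awk '
    /Access denied for user/                   { n["db-auth"]++;       next }
    /database is down/                         { n["db-down"]++;       next }
    /cannot allocate shared memory/            { n["shared-memory"]++; next }
    /cannot (open|parse) config(uration)? file/ { n["config-file"]++;   next }
    END { for (c in n) print c, n[c] }
  ' "$@" | sort
}

# Examples:
# zbx_triage /var/log/zabbix/zabbix_server.log
# journalctl -u zabbix-server --no-pager | zbx_triage
```

The config-file pattern deliberately requires "cannot open" or "cannot parse", so a normal startup line that merely mentions the configuration file is not counted.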
Key points:
- Service status plus logs is where troubleshooting starts; don't dive into the configuration first
- In my experience, database connection problems account for more than 60% of failures, so check them first
- When an Agent connection fails, the most likely cause is that the Server's IP is missing from the Server= allowlist
- On SELinux Enforcing systems, remember to turn on the relevant booleans
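For the "Server won't start" case, the shared-memory side (FAQ Q1) can be sanity-checked by adding up the configured cache sizes and comparing the total with the limits reported by `ipcs -lm`. A minimal bash sketch; the function names are hypothetical, the list of cache parameters is an assumption that may not match every Zabbix version, and only K/M/G suffixes are handled:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a size such as 64M, 512K, 1G, or plain bytes to bytes.
zbx_size_to_bytes() {
  local v=$1 n unit
  n=${v%[KkMmGg]}   # strip one trailing unit letter, if present
  unit=${v#"$n"}    # the stripped letter (empty for plain bytes)
  case $unit in
    '')   echo "$n" ;;
    [Kk]) echo $((n * 1024)) ;;
    [Mm]) echo $((n * 1024 * 1024)) ;;
    [Gg]) echo $((n * 1024 * 1024 * 1024)) ;;
  esac
}

# Hypothetical helper: sum the cache sizes explicitly set (uncommented) in zabbix_server.conf.
# Assumption: this parameter list; unset parameters are skipped, not replaced by defaults.
zbx_cache_total() {
  local conf=$1 total=0 key val
  for key in CacheSize HistoryCacheSize HistoryIndexCacheSize TrendCacheSize ValueCacheSize; do
    val=$(sed -n "s/^${key}=//p" "$conf" | tail -n 1)
    [ -n "$val" ] || continue
    total=$((total + $(zbx_size_to_bytes "$val")))
  done
  echo "$total"
}

# Example:
# zbx_cache_total /etc/zabbix/zabbix_server.conf   # then compare with: ipcs -lm
```

Because unset parameters are skipped rather than counted at their defaults, the total is a lower bound, which is enough for a rough check.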
Further reading:
- Zabbix official documentation - Troubleshooting section: https://www.zabbix.com/documentation/current/en/manual/appendix/troubleshooting
- Zabbix logs explained - what the DEBUG/INFO/ERROR log levels mean
- Performance tuning - tune the Zabbix Server's pollers and Java gateway to improve throughput