Loki - Log Aggregation: Configuration and Optimization
Published: 2026-05-02 18:02
Loki log aggregation: hands-on log collection and querying in a distributed architecture, putting an end to scattered logs and sluggish searches
1. Introduction
Picture this: an alert fires at 2 a.m., you log into a server to dig through logs, and discover they are scattered across 3 machines, with a single grep taking half a minute. While you're scrambling, the boss asks in the group chat, "Any progress on the root cause?" That's exactly why I committed to Loki back then: it's lightweight, fast, integrates seamlessly with the Prometheus ecosystem, and uses far fewer resources than ELK. After years of running it, and upgrading from 0.x through 2.x, here are all the pitfalls I've hit.
2. Steps
Step 1: Understand the Loki architecture and pick a deployment mode
Loki works completely differently from Elasticsearch: it does not index the log content itself, only the labels. That is why it is so resource-efficient. There are five core components:
```
┌──────────────────────────────────────────────────────────┐
│ Write path:  Distributor ──> Ingester ──> storage        │
│ Read path:   Query Frontend ──> Querier ──> storage      │
│ Background:  Compactor (compaction & retention)          │
└──────────────────────────────────────────────────────────┘
Distributor: receives log pushes, validates them, and distributes them to Ingesters
Ingester: buffers incoming writes and flushes them to storage as chunks
Querier: handles query requests
Query Frontend: query queueing and caching (optional)
Compactor: index compaction and retention management
```
For production, run at least 2 Querier nodes and 3 or more Ingesters (with replication) to guard against data loss. For single-machine testing, the all-in-one (monolithic) mode is fine.
Step 2: Quickly stand up a test environment with Docker
Get it running first to see the effect; Docker Compose is the easiest way:
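The label-only index is visible in how LogQL queries execute: the label matcher (e.g. `{job="nginx"}`) is resolved from the index, while a line filter like `|= "error"` scans the content of only the matching streams' chunks. A small sketch against the HTTP API, assuming a Loki reachable at localhost:3100 (as set up below) and GNU `date`:

```shell
# Label matcher -> index lookup; the |= filter -> chunk scan.
# start/end are nanosecond Unix timestamps; here: the last hour.
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job="nginx"} |= "error"' \
  --data-urlencode 'limit=10' \
  --data-urlencode "start=$(date -d '1 hour ago' +%s)000000000" \
  --data-urlencode "end=$(date +%s)000000000"
```

The practical consequence: keep label cardinality low (job, env, host) and push everything else into the log line, because every unique label combination creates a new stream.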
```bash
# Create a working directory
mkdir -p loki-test && cd loki-test
cat > docker-compose.yml << 'EOF'
version: "3.8"
services:
  loki:
    image: grafana/loki:2.9.2
    container_name: loki
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - ./data:/data
    restart: unless-stopped
  promtail:
    image: grafana/promtail:2.9.2
    container_name: promtail
    volumes:
      - /var/log:/var/log
      - ./promtail-config.yaml:/etc/promtail/promtail.yaml
    command: -config.file=/etc/promtail/promtail.yaml
    depends_on:
      - loki
  grafana:
    image: grafana/grafana:10.2.0
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=YOUR_GRAFANA_PASSWORD
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - loki
volumes:
  grafana-data:
EOF
```
Expected output of `docker-compose up -d` (run it once the config files from steps 3 and 4 are in place):
```
Creating network "loki-test_default" with the default driver
Creating loki ... done
Creating promtail ... done
Creating grafana ... done
```
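Once the stack is up, Loki's HTTP API exposes readiness and metrics endpoints you can poll instead of eyeballing container logs. A small sketch, assuming the port mapping above:

```shell
# Poll Loki's readiness endpoint; it returns "ready" once startup completes.
for i in $(seq 1 10); do
  if curl -fsS http://localhost:3100/ready 2>/dev/null | grep -q ready; then
    echo "Loki is ready"
    break
  fi
  echo "waiting for Loki... ($i)"
  sleep 3
done

# Prometheus-style metrics live on /metrics
curl -s http://localhost:3100/metrics | head -n 5
```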
Step 3: Configure Loki's basic parameters
Create the Loki configuration file:
```bash
cat > loki-config.yaml << 'EOF'
auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  max_transfer_retries: 0

schema_config:
  configs:
    - from: 2023-01-01
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        period: 24h
        prefix: loki_index_

storage_config:
  boltdb_shipper:
    active_index_directory: /data/index
    cache_location: /data/index_cache
    shared_store: filesystem
  filesystem:
    directory: /data/chunks

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h

compactor:
  working_directory: /data/compactor
  shared_store: filesystem
EOF
```
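Before (re)starting, it can save a round trip to validate the file. Loki 2.x ships a `-verify-config` flag that parses the config and exits; a hedged sketch reusing the same image (verify the flag exists on your version with `loki -help`):

```shell
# Validate loki-config.yaml without starting the server.
# Assumes the grafana/loki:2.9.2 image supports -verify-config.
docker run --rm \
  -v "$(pwd)/loki-config.yaml:/etc/loki/local-config.yaml" \
  grafana/loki:2.9.2 \
  -config.file=/etc/loki/local-config.yaml -verify-config
```

A non-zero exit code means a parse or validation error; the message names the offending key.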
After starting the stack, check Loki's logs:
```bash
docker logs loki --tail 20
```
Expected output:
```
level=info ts=2024-01-15T10:30:45.123Z caller=main.go:47 msg="Starting Loki" version=2.9.2
level=info ts=2024-01-15T10:30:45.456Z caller=server.go:260 http=3100 grpc=9095
level=info ts=2024-01-15T10:30:45.789Z caller=modules.go:1234 target=all msg="Starting module" module=query-frontend
level=info ts=2024-01-15T10:30:46.000Z caller=loki.go:45 msg="Loki started"
```
Seeing "Loki started" means startup succeeded. If instead you see errors like "connection refused", check whether port 3100 is already taken:
```bash
netstat -tlnp | grep 3100
```
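Before wiring up Promtail, you can smoke-test the write path by pushing a line straight to Loki's push API and reading it back. The `job="smoke-test"` label and message below are made up for the test:

```shell
# Push one log line with label job="smoke-test".
# Loki expects nanosecond timestamps, passed as strings.
TS_NS="$(date +%s)000000000"
curl -s -X POST http://localhost:3100/loki/api/v1/push \
  -H "Content-Type: application/json" \
  -d '{"streams":[{"stream":{"job":"smoke-test"},"values":[["'"$TS_NS"'","hello from curl"]]}]}'

# Query it back by label.
curl -G -s http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={job="smoke-test"}'
```

If the second command returns your line, ingestion and querying both work, and any later problem is on the Promtail side.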
Step 4: Configure Promtail to collect logs
Promtail is Loki's log-shipping agent, and its configuration is very flexible:
```bash
cat > promtail-config.yaml << 'EOF'
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/log/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # Collect Docker container logs via the Docker socket
  # (requires /var/run/docker.sock mounted into the promtail container)
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '(.*)'
        target_label: container
  # Collect system logs (Ubuntu/Debian)
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: system
          env: prod
          __path__: /var/log/syslog
  # Collect Nginx logs
  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          __path__: /var/log/nginx/*.log
    pipeline_stages:
      - regex:
          expression: '^(?P<remote_addr>\S+) (?P<ident>\S+) (?P<remote_user>\S+) \[(?P