
Loki: Log Aggregation Configuration and Optimization

Published: 2026-05-02 18:02
Loki log aggregation: hands-on log collection and querying for distributed architectures, putting an end to scattered logs and sluggish queries

1. Introduction

Picture this: it's 2 a.m., an alert fires, you log into a server to dig through logs, and discover they're scattered across three machines, with a single grep taking half a minute. While you're scrambling, the boss asks in the group chat, "Any progress on locating the issue?" That is exactly why I committed to Loki back then: it's lightweight, fast, integrates seamlessly with the Prometheus ecosystem, and uses far fewer resources than ELK. After years of running it, and upgrading from 0.x to 2.x, here are all the pitfalls I've hit along the way.

2. Procedure

Step 1: Understand the Loki architecture and pick the right deployment mode

Loki works completely differently from Elasticsearch: it does not index log content itself, only labels. That is where its resource savings come from. The core components are:

```
┌─────────────────────────────────────────────────────────┐
│  Distributor ──> Ingester ──> Compactor                 │
│       │                                                 │
│       └─> Query Frontend ──> Querier                    │
└─────────────────────────────────────────────────────────┘

Distributor:    receives logs, validates them, distributes to Ingesters
Ingester:       accepts writes, assembles them into chunks for storage
Querier:        handles query requests
Query Frontend: query queueing and caching (optional)
Compactor:      compaction and retention management
```

For production, run at least 2 Querier nodes, and 3 or more Ingesters to guard against data loss. For single-machine testing, the all-in-one mode is fine.
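Because only labels are indexed, every query starts from a label matcher; content filtering happens afterwards, inside the chunks. A minimal sketch of what that looks like through LogQL and the HTTP API, assuming a Loki instance is reachable at localhost:3100 (the `job="nginx"` label is illustrative):

```shell
# LogQL examples (label matcher first, then an optional content filter):
#   {job="nginx"}            -> select streams by label (index lookup only)
#   {job="nginx"} |= "500"   -> then scan chunk contents for "500"
#   rate({job="nginx"}[5m])  -> per-stream log rate over 5 minutes

# Loki's API takes nanosecond epoch timestamps; build a 1-hour window
# (date -d is GNU date, as found on typical Linux hosts):
start="$(date -d '1 hour ago' +%s)000000000"
end="$(date +%s)000000000"

# Range query over that window (prints JSON if Loki is up on :3100):
curl -sG "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job="nginx"} |= "500"' \
  --data-urlencode "start=${start}" \
  --data-urlencode "end=${end}"
```

The key design point: a badly chosen high-cardinality label (like a request ID) explodes the number of streams and defeats the whole model, so keep labels to a small, bounded set.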

Step 2: Spin up a test environment quickly with Docker

Get it running first to see the effect; Docker Compose is the easiest way:

```bash
# Create a working directory
mkdir -p loki-test && cd loki-test

cat > docker-compose.yml << 'EOF'
version: "3.8"
services:
  loki:
    image: grafana/loki:2.9.2
    container_name: loki
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - ./data:/data
    restart: unless-stopped

  promtail:
    image: grafana/promtail:2.9.2
    container_name: promtail
    volumes:
      - /var/log:/var/log
      - ./promtail-config.yaml:/etc/promtail/promtail.yaml
    command: -config.file=/etc/promtail/promtail.yaml
    depends_on:
      - loki

  grafana:
    image: grafana/grafana:10.2.0
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=YOUR_GRAFANA_PASSWORD
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - loki

volumes:
  grafana-data:
EOF
```

Then bring the stack up:

```bash
docker-compose up -d
```

Expected output:

```
Creating network "loki-test_default" with the default driver
Creating loki     ... done
Creating promtail ... done
Creating grafana  ... done
```
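Once the stack is up, it's worth verifying each piece before moving on. A quick smoke check, assuming the port mappings from the compose file above:

```shell
# Loki's readiness endpoint returns "ready" once all modules have started:
curl -s http://localhost:3100/ready

# List the label names Loki has seen; a non-empty "data" array confirms
# Promtail is actually shipping logs:
curl -s "http://localhost:3100/loki/api/v1/labels"

# Grafana should answer on :3000 (curl -w prints the HTTP status code;
# 200 means the login page is being served):
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/login
```

If `/ready` keeps reporting not-ready for more than a minute, check the Loki container logs before touching anything else; a config typo fails fast and loudly there.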

Step 3: Configure Loki's basic parameters

Create the Loki configuration file:

```bash
cat > loki-config.yaml << 'EOF'
auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  max_transfer_retries: 0

schema_config:
  configs:
    - from: 2023-01-01
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: loki_index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /data/index
    cache_location: /data/index_cache
    shared_store: filesystem
  filesystem:
    directory: /data/chunks

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h

compactor:
  working_directory: /data/compactor
  shared_store: filesystem
EOF
```

After starting, check the logs:

```bash
docker logs loki --tail 20
```

Expected output:

```
level=info ts=2024-01-15T10:30:45.123Z caller=main.go:47 msg="Starting Loki" version=2.9.2
level=info ts=2024-01-15T10:30:45.456Z caller=server.go:260 http=3100 grpc=9095
level=info ts=2024-01-15T10:30:45.789Z caller=modules.go:1234 target=all msg="Starting module" module=query-frontend
level=info ts=2024-01-15T10:30:46.000Z caller=loki.go:45 msg="Loki started"
```

Seeing "Loki started" means startup succeeded. If you see "connection refused"-type errors instead, check whether port 3100 is already taken:

```bash
netstat -tlnp | grep 3100
```
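Before wiring up Promtail, you can verify the write path by hand. This is a hedged sketch of pushing one line through Loki's push API and reading it back; the `job="smoke-test"` label is made up for this example:

```shell
# Loki timestamps are nanosecond epoch values:
ts="$(date +%s)000000000"

# One stream ({job="smoke-test"}) carrying a single log line:
payload='{"streams":[{"stream":{"job":"smoke-test"},"values":[["'"$ts"'","hello loki"]]}]}'

# Push it; the API answers HTTP 204 on success:
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Content-Type: application/json" \
  -X POST "http://localhost:3100/loki/api/v1/push" \
  --data "$payload"

# Read it back with an instant query:
curl -sG "http://localhost:3100/loki/api/v1/query" \
  --data-urlencode 'query={job="smoke-test"}'
```

Note the `reject_old_samples_max_age: 168h` limit from the config above: pushes with timestamps older than 7 days will be rejected, which is a common surprise when backfilling historical logs.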

Step 4: Configure Promtail to collect logs

Promtail is Loki's log collection agent, and its configuration is very flexible:

```bash
cat > promtail-config.yaml << 'EOF'
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/log/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # Collect Docker container logs via Docker service discovery
  # (requires /var/run/docker.sock mounted into the Promtail container)
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: container

  # Collect system logs (Ubuntu/Debian)
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: system
          env: prod
          __path__: /var/log/syslog

  # Collect Nginx logs
  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          __path__: /var/log/nginx/*.log
    pipeline_stages:
      - regex:
          # capture-group names restored; adjust them to your log format
          expression: '^(?P<ip>\S+) (?P<identd>\S+) (?P<user>\S+) \[(?P