Elasticsearch indexing error

Published: 2025-04-20


ERROR - {'took': 0, 'errors': True, 'items': [{'create': {'_index': 'xxxx-log-2.0-2022.01.11-000001', '_type': '_doc', '_id': 'wdrDR5YBNUJot4R74noE', 'status': 429, 'error': {'type': 'cluster_block_exception', 'reason': 'index [xxxx-log-2.0-2022.01.11-000001] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];'}}}]}

Key part: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block]

This is a write block that Elasticsearch applies when a node is running out of disk space.
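A client can detect this condition in code instead of reading it out of the logs. The following is a minimal Python sketch, assuming the bulk response has the dict shape shown in the log above. The function name `find_flood_stage_blocks` is hypothetical and not part of any Elasticsearch client.

```python
def find_flood_stage_blocks(bulk_response):
    """Return the sorted names of indices rejected by the flood-stage block."""
    blocked = set()
    if not bulk_response.get("errors"):
        return []
    for item in bulk_response.get("items", []):
        # Each item has a single key naming the action (create, index, update, delete).
        for result in item.values():
            error = result.get("error") or {}
            if (error.get("type") == "cluster_block_exception"
                    and "flood-stage watermark" in error.get("reason", "")):
                blocked.add(result["_index"])
    return sorted(blocked)


# Sample response copied from the error log in this article.
sample = {
    "took": 0,
    "errors": True,
    "items": [{"create": {
        "_index": "xxxx-log-2.0-2022.01.11-000001",
        "_type": "_doc",
        "_id": "wdrDR5YBNUJot4R74noE",
        "status": 429,
        "error": {
            "type": "cluster_block_exception",
            "reason": "index [xxxx-log-2.0-2022.01.11-000001] blocked by: "
                      "[TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage "
                      "watermark, index has read-only-allow-delete block];",
        },
    }}],
}

print(find_flood_stage_blocks(sample))
```

Retrying these writes will keep failing until the block is lifted, so a caller would normally stop sending and alert instead of retrying.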

Solution

Root cause analysis

{
  "error": {
    "type": "cluster_block_exception",
    "reason": "index [...] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark..."
  }
}

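Trigger: Elasticsearch's default disk watermark protection
flood-stage (the red line): disk usage ≥ 95%
• Once a node crosses this threshold, ES automatically marks every index with a shard on that node as read-only (deletes are still allowed)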
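The three watermark levels can be pictured as a simple threshold ladder. The sketch below is a simplified Python model, assuming the default percentages quoted in this article (85 / 90 / 95); the function `watermark_stage` is a hypothetical helper, not Elasticsearch's own implementation.

```python
# Watermarks in percent of disk used; assumed to match the defaults in this article.
DEFAULT_WATERMARKS = {"low": 85.0, "high": 90.0, "flood_stage": 95.0}


def watermark_stage(used_percent, watermarks=DEFAULT_WATERMARKS):
    """Return the highest watermark reached, or "ok" if none is reached."""
    # Check from the most severe level down so the first match wins.
    for stage in ("flood_stage", "high", "low"):
        if used_percent >= watermarks[stage]:
            return stage
    return "ok"


for used in (50, 87, 92, 96):
    print(used, watermark_stage(used))
```

At "low" ES stops allocating new shards to the node, at "high" it tries to move shards away, and at "flood_stage" it applies the read-only-allow-delete block described above.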


Emergency steps

1. Check the current disk state
# Show disk usage for every node
GET _cat/allocation?v&h=node,disk.percent,disk.avail,disk.total,disk.indices

# Check the watermark settings
GET _cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.disk*
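The allocation output can also be checked by a script. This is a minimal Python sketch, assuming the request is made with `format=json` appended so that each row arrives as a JSON object with string values; the node names in the sample and the helper `nodes_over_threshold` are hypothetical.

```python
import json


def nodes_over_threshold(cat_allocation_json, threshold_percent):
    """Return (node, used_percent) pairs at or above the threshold, fullest first."""
    rows = json.loads(cat_allocation_json)
    hits = []
    for row in rows:
        percent = row.get("disk.percent")
        if percent is None:  # the UNASSIGNED row carries no disk figures
            continue
        if int(percent) >= threshold_percent:
            hits.append((row["node"], int(percent)))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Made-up sample standing in for the body of GET _cat/allocation?format=json
sample = json.dumps([
    {"node": "es-data-1", "disk.percent": "96"},
    {"node": "es-data-2", "disk.percent": "71"},
    {"node": "UNASSIGNED", "disk.percent": None},
])

print(nodes_over_threshold(sample, 90))
```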
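2. Temporarily restore writes (emergency only)
# Disable the disk threshold check (use with care: the disk can fill up completely).
# Set the value back to null to restore the default once space has been freed.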
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.threshold_enabled": "false"
  }
}

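# Remove the read-only block from the index (replace your_index_name).
# Since 7.4, ES also releases the block by itself once disk usage drops below the high watermark.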
PUT your_index_name/_settings
{
  "index.blocks.read_only_allow_delete": null
}
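The unblock request can also be sent from a script. The following Python sketch only builds the request with the standard library; the base URL `http://localhost:9200` (no authentication) and the helper name `build_unblock_request` are assumptions for illustration.

```python
import json
import urllib.request


def build_unblock_request(base_url, index="_all"):
    """Build (but do not send) the PUT that clears read_only_allow_delete."""
    # A JSON null resets the setting, exactly like the console request above.
    body = json.dumps({"index.blocks.read_only_allow_delete": None}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_unblock_request("http://localhost:9200", "your_index_name")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it. Using the default `_all` target clears the block on every index, which is only worth doing after disk space has actually been freed.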
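3. Long-term solutions
Direction: specific measures
Clean up data:
- Delete old indices: DELETE /old_index_*
- Use ILM to manage the index lifecycle automatically
Expand storage:
- Increase disk capacity
- Add new data nodes
Optimize storage:
- Enable compression: "index.codec": "best_compression"
- Disable indexing for fields that are never searched
Adjust the watermarks:
- Edit elasticsearch.yml (requires a restart); the setting is also dynamic, so it can be changed through PUT _cluster/settings without a restart:
  cluster.routing.allocation.disk.watermark.flood_stage: 97%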

Preventive configuration suggestions

# elasticsearch.yml
# -------------------------------
# Disk watermark thresholds (adjust as needed)
cluster.routing.allocation.disk.watermark.low: 85% 
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%

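# Console request (not part of elasticsearch.yml)
# Delete old data automatically (example: roll over after 7 days, delete 7 days after rollover)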
PUT _ilm/policy/log_retention_policy 
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {"rollover": {"max_size": "50gb", "max_age": "7d"}}
      },
      "delete": {
        "min_age": "7d",
        "actions": {"delete": {}}
      }
    }
  }
}
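Because ILM measures the delete phase's min_age from the moment an index rolls over, the policy above keeps a document for between 7 and 14 days rather than exactly 7. The arithmetic can be sketched in Python as follows; the helper `retention_window_days` is hypothetical, and the calculation ignores the ILM poll interval and rollovers triggered early by max_size.

```python
def retention_window_days(rollover_max_age_days, delete_min_age_days):
    """Return (shortest, longest) age in days a document reaches before deletion."""
    # A document written just before rollover waits only for the delete min_age.
    shortest = delete_min_age_days
    # A document written right after index creation waits for rollover first.
    longest = rollover_max_age_days + delete_min_age_days
    return shortest, longest


print(retention_window_days(7, 7))  # (7, 14)
```

If a hard 7-day ceiling is required, the rollover max_age and the delete min_age would need to add up to 7 days, for example a 1-day rollover with a 6-day delete min_age.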

Monitoring example

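# Monitoring with Prometheus (example alerting rule)
# The metric and label names depend on the exporter in use; check what yours exposes.
- alert: ElasticsearchDiskFull
  expr: elasticsearch_cluster_filesystem_used_percent > 90
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "ES node disk almost full ({{ $value }}% used)"
    description: "Disk usage on node {{ $labels.node }} is above 90%"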

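Notes

  1. After force-disabling the disk check, finish the data cleanup or capacity expansion quickly (the suggested window is 4 hours), then re-enable the check.
  2. best_compression trades CPU for disk: expect roughly 10% more CPU load (an approximate figure that varies with workload).
  3. When changing the watermark thresholds, keep at least 5% of the disk as buffer space.

The steps above restore service quickly, but the underlying storage capacity problem still has to be solved, otherwise the protection mechanism will trigger again.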

