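1. Introduction to Elasticsearch
Elasticsearch is an open-source, distributed search and analytics engine built on Lucene and developed by Elastic. Its main characteristics are:
- Distributed: scales out to hundreds of servers and handles petabytes of data
- Near real-time: newly indexed data becomes searchable after the next index refresh, which by default happens about once per second
- Full-text search: powerful full-text search capabilities
- RESTful API: a simple, easy-to-use JSON-based API
- Versatile: not only a search engine but also a powerful analytics engine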
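2. Core Concepts
Before diving into Elasticsearch, we need to understand a few basic concepts. The table below maps each one to its rough relational-database counterpart:
Elasticsearch | Relational database |
---|---|
Index | Database |
Type | Table |
Document | Row |
Field | Column |
Mapping | Schema |
Shard | Data partition |
Replica | Data backup |
Note: mapping types are deprecated in 7.x, where every index holds the single type _doc, and removed in 8.x, so in practice an index now plays the role of a table.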
3. Installation and Setup
Installing Elasticsearch
# Download Elasticsearch (version 7.x is used as the example)
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.0-linux-x86_64.tar.gz
# Extract the archive
tar -xzf elasticsearch-7.17.0-linux-x86_64.tar.gz
# Start Elasticsearch
cd elasticsearch-7.17.0/
./bin/elasticsearch
Verify that the installation succeeded:
curl http://localhost:9200/
Output:
{
"name" : "node-1",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "xyzABCdefGHI123456",
"version" : {
"number" : "7.17.0",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "abcd1234",
"build_date" : "2022-01-01T12:34:56.789Z",
"build_snapshot" : false,
"lucene_version" : "8.11.1",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
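4. Basic Operations (CRUD)
Elasticsearch exposes its operations through a RESTful API. The commonly used HTTP methods are:
- GET: retrieve a resource
- POST: create a resource
- PUT: create or update a resource
- DELETE: delete a resource
- HEAD: check whether a resource exists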
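The console-style requests in this guide can be sent from any HTTP client. As a minimal sketch, assuming a local node at http://localhost:9200 with security disabled, the following uses only the Python standard library. The helper names build_request and send are illustrative and not part of any Elasticsearch client.

```python
import json
import urllib.request

# Assumption: a local, unsecured node like the one started in the installation section.
BASE_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = None
    headers = {}
    if body is not None:
        # Request bodies are JSON documents.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request):
    """Send a prepared request and decode the JSON response (needs a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Equivalent of "PUT /blog" with a settings body; building it needs no network.
req = build_request(
    "PUT", "/blog", {"settings": {"number_of_shards": 3, "number_of_replicas": 1}}
)
print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the example inspectable without a running cluster. Calling send(req) would perform the actual call.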
4.1 Creating an Index
# Syntax for creating an index
PUT /<index_name>
{
"settings": {
"number_of_shards": <number_of_primary_shards>,
"number_of_replicas": <number_of_replicas>
}
}
Example:
PUT /blog
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1
}
}
Response:
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "blog"
}
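4.2 Adding a Document
# Syntax for adding a document with an explicit ID
PUT /<index_name>/_doc/<document_id>
{
"<field1>": "<value1>",
"<field2>": "<value2>",
...
}
# Syntax for adding a document with an auto-generated ID
POST /<index_name>/_doc
{
"<field1>": "<value1>",
"<field2>": "<value2>",
...
}
Example: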
PUT /blog/_doc/1
{
"title": "Elasticsearch入门",
"author": "张三",
"content": "这是一篇关于Elasticsearch的入门文章",
"tags": ["搜索引擎", "Elasticsearch"],
"created_at": "2023-01-01T10:00:00"
}
Response:
{
"_index": "blog",
"_type": "_doc",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
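4.3 Retrieving Documents
# Syntax for retrieving a document by ID
GET /<index_name>/_doc/<document_id>
# Retrieve all documents (a search without a body returns the first 10 hits by default)
GET /<index_name>/_search
Example:
# Retrieve by ID
GET /blog/_doc/1
Response: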
{
"_index": "blog",
"_type": "_doc",
"_id": "1",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"found": true,
"_source": {
"title": "Elasticsearch入门",
"author": "张三",
"content": "这是一篇关于Elasticsearch的入门文章",
"tags": ["搜索引擎", "Elasticsearch"],
"created_at": "2023-01-01T10:00:00"
}
}
4.4 Updating a Document
# Syntax for updating a document
POST /<index_name>/_update/<document_id>
{
"doc": {
"<field1>": "<new_value1>",
"<field2>": "<new_value2>"
}
}
Example:
POST /blog/_update/1
{
"doc": {
"title": "Elasticsearch快速入门",
"tags": ["搜索引擎", "Elasticsearch", "教程"]
}
}
Response:
{
"_index": "blog",
"_type": "_doc",
"_id": "1",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 1,
"_primary_term": 1
}
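A partial update with "doc" is merged into the stored document: object fields are merged recursively, while scalar values and arrays are replaced. The following Python sketch only approximates that merge locally to show which fields survive. The function merge_partial is a hypothetical illustration, not Elasticsearch code.

```python
def merge_partial(source, partial):
    """Return a copy of `source` with `partial` merged in, approximating a "doc" update."""
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Object fields are merged recursively.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Scalars and arrays are replaced wholesale.
            merged[key] = value
    return merged


original = {
    "title": "Elasticsearch入门",
    "author": "张三",
    "tags": ["搜索引擎", "Elasticsearch"],
}
updated = merge_partial(
    original,
    {"title": "Elasticsearch快速入门", "tags": ["搜索引擎", "Elasticsearch", "教程"]},
)
print(updated["author"], len(updated["tags"]))
```

Fields that the partial document omits, such as author here, keep their stored values. That is why the update request only needs to list the fields that change.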
4.5 Deleting a Document
# Syntax for deleting a document
DELETE /<index_name>/_doc/<document_id>
Example:
DELETE /blog/_doc/1
Response:
{
"_index": "blog",
"_type": "_doc",
"_id": "1",
"_version": 3,
"result": "deleted",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 2,
"_primary_term": 1
}
4.6 Deleting an Index
# Syntax for deleting an index
DELETE /<index_name>
Example:
DELETE /blog
Response:
{
"acknowledged": true
}
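5. Search
Search is the core capability of Elasticsearch, which offers a rich set of query types.
5.1 Basic Queries
# Query syntax
GET /<index_name>/_search
{
"query": {
"<query_type>": {
"<parameter>": "<value>"
}
}
}
Example:
# Find documents whose title contains "Elasticsearch"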
GET /blog/_search
{
"query": {
"match": {
"title": "Elasticsearch"
}
}
}
Response:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.6931472,
"hits": [
{
"_index": "blog",
"_type": "_doc",
"_id": "1",
"_score": 0.6931472,
"_source": {
"title": "Elasticsearch快速入门",
"author": "张三",
"content": "这是一篇关于Elasticsearch的入门文章",
"tags": ["搜索引擎", "Elasticsearch", "教程"],
"created_at": "2023-01-01T10:00:00"
}
},
{
"_index": "blog",
"_type": "_doc",
"_id": "2",
"_score": 0.5753642,
"_source": {
"title": "深入理解Elasticsearch",
"author": "李四",
"content": "本文详细介绍Elasticsearch的内部原理",
"tags": ["Elasticsearch", "原理"],
"created_at": "2023-01-02T15:30:00"
}
}
]
}
}
5.2 Boolean Queries
GET /<index_name>/_search
{
"query": {
"bool": {
"must": [
{ "match": { "<field1>": "<value1>" } }
],
"should": [
{ "match": { "<field2>": "<value2>" } }
],
"must_not": [
{ "match": { "<field3>": "<value3>" } }
],
"filter": [
{ "term": { "<field4>": "<value4>" } }
]
}
}
}
Example:
# Find documents whose title contains "Elasticsearch" and whose author is not "王五"
GET /blog/_search
{
"query": {
"bool": {
"must": [
{ "match": { "title": "Elasticsearch" } }
],
"must_not": [
{ "term": { "author.keyword": "王五" } }
]
}
}
}
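In a bool query, must and should clauses contribute to the relevance score, while filter and must_not clauses run in filter context and only include or exclude documents. As a rough sketch, the helper below assembles such a body in Python and omits unused clauses. The name bool_query is illustrative, and the author.keyword sub-field assumes the default dynamic mapping for string fields.

```python
def bool_query(must=None, should=None, must_not=None, filter_=None):
    """Assemble a bool query body, leaving out any clause that is empty."""
    clauses = {
        "must": must,          # must match, contributes to the score
        "should": should,      # optional match, contributes to the score
        "must_not": must_not,  # must not match, no scoring
        "filter": filter_,     # must match, no scoring
    }
    return {"query": {"bool": {name: c for name, c in clauses.items() if c}}}


body = bool_query(
    must=[{"match": {"title": "Elasticsearch"}}],
    # Assumption: author.keyword exists because the index uses dynamic mapping.
    must_not=[{"term": {"author.keyword": "王五"}}],
)
print(sorted(body["query"]["bool"]))
```

The exclusion uses a term query on the keyword sub-field because a match query on an analyzed text field compares individual tokens rather than the whole value.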
5.3 Sorting Results
GET /<index_name>/_search
{
"query": {
"match_all": {}
},
"sort": [
{ "<field1>": { "order": "desc" } },
{ "<field2>": { "order": "asc" } }
]
}
Example:
# Query sorted by creation time, newest first
GET /blog/_search
{
"query": {
"match_all": {}
},
"sort": [
{ "created_at": { "order": "desc" } }
]
}
5.4 Pagination
GET /<index_name>/_search
{
"from": <offset>,
"size": <page_size>,
"query": {
"match_all": {}
}
}
Example:
# Paginated query: return page 2 with 10 results per page
GET /blog/_search
{
"from": 10,
"size": 10,
"query": {
"match_all": {}
}
}
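The from offset is zero-based, so page p with n results per page starts at (p - 1) * n. By default from + size may not exceed 10,000, the default value of the index.max_result_window setting. The following sketch encodes both rules. The helper name page_params is hypothetical.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_params(page, size=10):
    """Translate a 1-based page number into the "from" and "size" search parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    start = (page - 1) * size
    if start + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the default result window")
    return {"from": start, "size": size}


print(page_params(2))  # {'from': 10, 'size': 10}
```

For result sets deeper than the window, the search_after parameter is the usual alternative to from/size paging.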
5.5 Aggregations
GET /<index_name>/_search
{
"size": 0,
"aggs": {
"<aggregation_name>": {
"<aggregation_type>": {
"field": "<field_name>"
}
}
}
}
Example:
# Count the number of posts per author
GET /blog/_search
{
"size": 0,
"aggs": {
"authors": {
"terms": {
"field": "author.keyword",
"size": 10
}
}
}
}
Response:
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"authors": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "张三",
"doc_count": 3
},
{
"key": "李四",
"doc_count": 2
},
{
"key": "王五",
"doc_count": 1
}
]
}
}
}
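A terms aggregation returns one bucket per distinct value, each with a key and a doc_count. As a small sketch, a client could flatten the response above into a plain mapping like this. The function bucket_counts is illustrative, and the sample dictionary repeats only the relevant part of the response shown above.

```python
def bucket_counts(response, agg_name):
    """Map each bucket key of a terms aggregation to its document count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Trimmed copy of the aggregation response from the example above.
sample_response = {
    "aggregations": {
        "authors": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "张三", "doc_count": 3},
                {"key": "李四", "doc_count": 2},
                {"key": "王五", "doc_count": 1},
            ],
        }
    }
}

counts = bucket_counts(sample_response, "authors")
print(counts)
```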
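6. Real-World Use Cases
6.1 Site Search
Many websites build their search on Elasticsearch. Users can quickly find relevant content by keyword, with support for features such as highlighting, search suggestions, and spelling correction.
Example scenario: product search for an e-commerce site
# Create the product index (the ik_max_word analyzer comes from the IK analysis plugin, which must be installed separately)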
PUT /products
{
"mappings": {
"properties": {
"name": { "type": "text", "analyzer": "ik_max_word" },
"description": { "type": "text", "analyzer": "ik_max_word" },
"price": { "type": "float" },
"category": { "type": "keyword" },
"tags": { "type": "keyword" },
"stock": { "type": "integer" },
"created_at": { "type": "date" }
}
}
}
# Search for products whose name or description contains "手机" (mobile phone), sorted by price in descending order
GET /products/_search
{
"query": {
"multi_match": {
"query": "手机",
"fields": ["name", "description"]
}
},
"sort": [
{ "price": { "order": "desc" } }
]
}
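6.2 Log Analysis
Elasticsearch is the core component of the ELK stack (Elasticsearch, Logstash, Kibana) and is widely used for log collection and analysis.
Example scenario: web server log analysis
# Find error logs within a specific time range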
GET /logs/_search
{
"query": {
"bool": {
"must": [
{ "match": { "level": "ERROR" } }
],
"filter": [
{
"range": {
"timestamp": {
"gte": "2023-01-01T00:00:00",
"lte": "2023-01-31T23:59:59"
}
}
}
]
}
},
"sort": [
{ "timestamp": { "order": "desc" } }
]
}
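6.3 Data Visualization
Combined with Kibana, the data in Elasticsearch can be visualized as dashboards, line charts, pie charts, and more.
Example scenario: business monitoring dashboard
# Count API requests per hour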
GET /api_logs/_search
{
"size": 0,
"aggs": {
"requests_per_hour": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "hour"
}
}
}
}
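6.4 Real-Time Analytics
Elasticsearch supports near real-time data analysis, so it can back live monitoring and alerting systems.
Example scenario: anomaly monitoring
# Monitor failed requests from the last 5 minutes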
GET /system_logs/_search
{
"query": {
"bool": {
"must": [
{ "match": { "status": "error" } }
],
"filter": [
{
"range": {
"timestamp": {
"gte": "now-5m",
"lte": "now"
}
}
}
]
}
}
}
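An alerting job typically runs a query like the one above on a schedule and compares the hit count with a threshold. The sketch below shows one possible shape for that logic in Python. The names recent_status_query and should_alert, and the threshold value, are assumptions made for illustration.

```python
def recent_status_query(status, minutes):
    """Build a search body for documents with `status` in the last `minutes` minutes."""
    if minutes < 1:
        raise ValueError("minutes must be at least 1")
    return {
        "query": {
            "bool": {
                "must": [{"match": {"status": status}}],
                # Date math: "now-5m" means five minutes before the time of the request.
                "filter": [
                    {"range": {"timestamp": {"gte": f"now-{minutes}m", "lte": "now"}}}
                ],
            }
        }
    }


def should_alert(response, threshold):
    """Return True when the number of matching documents reaches the threshold."""
    return response["hits"]["total"]["value"] >= threshold


body = recent_status_query("error", 5)
print(body["query"]["bool"]["filter"][0]["range"]["timestamp"]["gte"])  # now-5m
```

Note that hits.total.value is an exact count only up to 10,000 by default. Beyond that the response reports a lower bound with "relation": "gte".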
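7. Advanced Features
7.1 Mapping
Mapping is the process of defining how a document and its fields are stored and indexed.
# Create an index with an explicit mapping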
PUT /users
{
"mappings": {
"properties": {
"username": { "type": "keyword" },
"email": { "type": "keyword" },
"bio": { "type": "text" },
"age": { "type": "integer" },
"join_date": { "type": "date" },
"location": { "type": "geo_point" }
}
}
}
7.2 Analyzers
Analyzers process text fields, performing steps such as tokenization and token filtering.
# Create a custom analyzer
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "asciifolding"]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "my_custom_analyzer"
}
}
}
}
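To get a feel for what this analyzer chain does, the following Python sketch approximates it locally: a regular expression stands in for the standard tokenizer, str.lower for the lowercase filter, and Unicode decomposition for asciifolding. It is only an approximation under those assumptions. The real filters handle more cases (asciifolding, for instance, also converts characters such as "ß"), and the _analyze API reports the actual tokens.

```python
import re
import unicodedata


def approximate_custom_analyzer(text):
    """Roughly mimic standard tokenizer + lowercase + asciifolding on `text`."""
    tokens = re.findall(r"\w+", text)  # crude stand-in for the standard tokenizer
    analyzed = []
    for token in tokens:
        lowered = token.lower()  # lowercase filter
        # Approximate asciifolding: decompose accented letters, then drop the accents.
        decomposed = unicodedata.normalize("NFKD", lowered)
        folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        analyzed.append(folded)
    return analyzed


print(approximate_custom_analyzer("Caf\u00e9 D\u00e9j\u00e0 Vu"))  # ['cafe', 'deja', 'vu']
```

Because the same analyzer runs at index time and at search time, a query for "cafe" would match a title containing "Café" under this configuration.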
7.3 Cluster Management
Check the cluster health status:
GET /_cluster/health
Response:
{
"cluster_name": "elasticsearch",
"status": "green",
"timed_out": false,
"number_of_nodes": 3,
"number_of_data_nodes": 3,
"active_primary_shards": 15,
"active_shards": 30,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100.0
}
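The status field summarizes shard allocation: green means all primary and replica shards are allocated, yellow means all primaries are allocated but at least one replica is not, and red means at least one primary shard is unallocated. A monitoring script might turn the response into a short message as sketched below. The function describe_health is hypothetical, and the sample dictionary copies a few fields from the response above.

```python
STATUS_MEANINGS = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primary shards are allocated, but some replicas are not",
    "red": "at least one primary shard is not allocated",
}


def describe_health(health):
    """Summarize a _cluster/health response as (is_healthy, message)."""
    status = health["status"]
    message = (
        f"{health['cluster_name']}: {status} ({STATUS_MEANINGS[status]}), "
        f"{health['unassigned_shards']} unassigned shard(s)"
    )
    return status == "green", message


# Subset of the fields from the response shown above.
sample_health = {"cluster_name": "elasticsearch", "status": "green", "unassigned_shards": 0}
ok, message = describe_health(sample_health)
print(ok, message)
```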
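8. Summary
Elasticsearch is a powerful search and analytics engine with the following strengths:
- Powerful search: supports full-text search, structured search, complex queries, and more
- Near real-time analytics: data becomes searchable and analyzable about a second after it is indexed
- Distributed architecture: easy to scale horizontally and built for high availability
- RESTful API: a simple, easy-to-use interface
- Rich ecosystem: integrates with Logstash, Kibana, Beats, and other tools to form a complete solution
This guide covered the basic concepts and operations of Elasticsearch, including index management, document CRUD, the main query types, and real-world use cases. With these fundamentals, you can start using Elasticsearch in your projects to build powerful search and analytics features.
As you learn more about Elasticsearch, you can explore further advanced features, such as more complex aggregations, geo search, and machine learning, to meet more demanding business requirements.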