Elasticsearch Operations

Published: 2023-01-15

1. Introduction to the inverted index

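An inverted index is built by tokenizing each article and creating an index entry for every resulting term.
Indexing whole articles this way can make the index grow explosively, so the terms are linked to the titles and the titles are in turn linked to the articles, as shown below:

| Term (index key) | Postings: (article, <positions>, count)                |
| 今天 (today)     | (article 1, <2,10>, 2) (article 3, <8>, 1)             |
| 星期天 (Sunday)  | (article 2, <12,25,100>, 3)                            |
| 出去玩 (go out)  | (article 5, <11,24,89>, 3) (article 1, <8,19>, 2)      |

Each entry records which articles a term (for example 今天) appears in, at which positions, and how many times.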
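The structure in the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `build_inverted_index` helper, the whitespace tokenizer and the sample articles are hypothetical and are not how Elasticsearch itself analyzes or stores text.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a list of (doc_id, positions, count) postings."""
    positions_by_term = defaultdict(dict)
    for doc_id, text in docs.items():
        # Toy analyzer (assumption): lowercase, then split on whitespace.
        for position, term in enumerate(text.lower().split()):
            positions_by_term[term].setdefault(doc_id, []).append(position)
    return {
        term: [(doc_id, pos, len(pos)) for doc_id, pos in sorted(postings.items())]
        for term, postings in positions_by_term.items()
    }


# Hypothetical sample articles, keyed by article id.
docs = {
    1: "today we go out today",
    2: "sunday is quiet",
    3: "go out on sunday",
}
index = build_inverted_index(docs)
print(index["today"])   # postings for one term
print(index["sunday"])
```

Looking a term up is then a single dictionary access, which is why a search does not need to scan every article.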

2. Index operations (an index is comparable to a database)

2.1 Create an index

PUT ymq
{
  "settings": {
    "index":{
      "number_of_shards":5,
      "number_of_replicas":1
    }
  }
}

2.2 View index settings

# View a single index
GET ymq/_settings
# View all indices
GET _all/_settings
# View specific indices
GET ymq,ymq2/_settings
# View all indices (shorthand)
GET _settings

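2.3 Update index settings (rarely needed; typically only the replica count is changed)

# Set the number of replicas to 2. The number of shards has to be decided when the index is created.
# The replica count can be changed afterwards (the request may fail).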
PUT ymq/_settings
{
  "number_of_replicas": 2
}

# Clear the read_only_allow_delete block on all indices
PUT _all/_settings
{
  "index": {
    "blocks": {
      "read_only_allow_delete": false
    }
  }
}

2.4 Delete an index

DELETE ymq

3. Mapping management (a mapping type is comparable to a table)

3.1 Introduction

Indices created in Elasticsearch 6.0.0 or later can contain only a single mapping type.

Indices created in 5.x with multiple mapping types continue to work as before in Elasticsearch 6.x. Mapping types were deprecated in Elasticsearch 7.0.0 and removed completely in 8.0.0.

Note: if an index has not been created, inserting a document creates it automatically.

3.2 Create a mapping (type, table)

PUT books
{
  "mappings": {
    "properties":{
      "title":{
        "type":"text"
      },
      "price":{
        "type":"integer"
      },
      "addr":{
        "type":"keyword"
      },
      "company":{
        "properties":{
          "name":{"type":"text"},
          "company_addr":{"type":"text"},
          "employee_count":{"type":"integer"}
        }
      },
      "publish_date":{"type":"date","format":"yyyy-MM-dd"}
    }
  }
}

3.3 View mappings

GET books/_mapping
GET _all/_mapping

3.4 Special note: a document can be inserted even when neither the index nor the mapping exists

PUT ymq2/_doc/1
{
  "title":"白雪公主和十个小矮人",
  "price":"99",
  "addr":"黑暗森里",
  "publish_date":"2018-05-19",
  "name":"ymq"
}

4. Basic document CRUD (a document is comparable to a row of data)

4.1 Insert documents

PUT books/_doc/1
{
  "title":"大头儿子小偷爸爸",
  "price":100,  
  "addr":"北京天安门",
  "company":{
    "name":"我爱北京天安门",
    "company_addr":"我的家在东北松花江傻姑娘",
    "employee_count":10
  },
  "publish_date":"2019-08-19"
}

PUT books/_doc/2
{
  "title":"白雪公主和十个小矮人",
  "price":"99", 
  "addr":"黑暗森里",
  "publish_date":"2018-05-19"
}

PUT books/_doc/3
{
  "title":"白雪公主和十个小矮人",
  "price":"99", 
  "addr":"黑暗森里",
  "publish_date":"2018-05-19",
   "name":"lqz"
}

4.2 Retrieve a document

# Format: index name/default type name/id
GET books/_doc/1

4.3 Two ways to update a document

4.3.1 Full replacement (not recommended: the whole document is overwritten)

PUT lqz/_doc/1
{
  "name":"顾老二",
  "age":30,
  "from": "gu",
  "desc": "皮肤黑、武器长、性格直",
  "tags": ["黑", "长", "直"]
}

4.3.2 Partial update

POST lqz/_doc/1/_update
{
  "doc": {
    "desc": "皮肤很黄,武器很长,性格很直",
    "tags": ["很黄","很长", "很直"]
  }
}
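The difference between the two update styles can be sketched in plain Python. The `full_replace` and `partial_update` helpers below are hypothetical names used for illustration; they mimic the observable result (full overwrite versus a recursive merge in which objects are merged and other values, including arrays, are replaced) and do not call Elasticsearch.

```python
def full_replace(stored, new_doc):
    """Mimic PUT index/_doc/id: the new body replaces the stored document."""
    return dict(new_doc)


def partial_update(stored, partial):
    """Mimic an update with "doc": merge objects recursively, overwrite other values."""
    merged = dict(stored)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)
        else:
            # Scalars and arrays are replaced as a whole, not combined.
            merged[key] = value
    return merged


stored = {"name": "顾老二", "age": 30, "tags": ["黑", "长", "直"]}
change = {"tags": ["很黄", "很长", "很直"]}

replaced = full_replace(stored, change)    # name and age are lost
updated = partial_update(stored, change)   # name and age are kept
print(replaced)
print(updated)
```

With a full replacement, any field left out of the request body disappears from the document, which is why the partial form is usually the safer choice for changing one or two fields.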

4.4 Delete a document

DELETE lqz/_doc/4

5. Querying documents

5.1 Difference between term and match

5.1.1 Introduction

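term: an exact match. The search term is not analyzed before searching, so it must be one of the tokens that analysis produced for the document.

match: the search text is analyzed first, and each resulting token is then matched in turn. Where term is an exact lookup, match is an analyzed (tokenized) search.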
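A toy model makes the distinction concrete. Everything here is an assumption for illustration: `analyze` is a stand-in analyzer that lowercases and splits on whitespace, and `term_query` and `match_query` are hypothetical helpers that only imitate the matching rule described above.

```python
def analyze(text):
    """Toy analyzer (assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def term_query(docs, field, value):
    """Ids of documents whose analyzed field contains `value` exactly as given."""
    return [doc_id for doc_id, doc in docs.items() if value in analyze(doc[field])]


def match_query(docs, field, text):
    """Ids of documents sharing at least one token with the analyzed search text."""
    wanted = set(analyze(text))
    return [doc_id for doc_id, doc in docs.items() if wanted & set(analyze(doc[field]))]


# Hypothetical sample documents.
docs = {
    1: {"desc": "beautiful cat"},
    2: {"desc": "beautiful dog"},
    3: {"desc": "dog"},
}

print(term_query(docs, "desc", "beautiful"))       # the token exists as-is
print(term_query(docs, "desc", "Beautiful Dog"))   # not analyzed, so no token equals it
print(match_query(docs, "desc", "Beautiful Dog"))  # analyzed into two tokens first
```

The second call finds nothing because the unanalyzed string "Beautiful Dog" is not itself a token, while the match version splits it into "beautiful" and "dog" and accepts a document containing either.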

5.1.2 Create the index and mapping (without the IK analyzer) and insert data

# Create the index and mapping
PUT lqz
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 2
  },
  "mappings": {
    "properties":{
      "title":{
        "type":"text"
      },
      "desc":{
        "type":"text"
      },
      "price":{
        "type":"integer"
      },
      "addr":{
        "type":"keyword"
      },
      "company":{
        "properties":{
          "name":{"type":"text"},
          "company_addr":{"type":"text"},
          "employee_count":{"type":"integer"}
        }
      },
      "publish_date":{"type":"date","format":"yyyy-MM-dd"}
    }
  }
}

# Insert data

PUT lqz/_doc/1
{
  "title":"so beautiful zero",
  "price":100,  
  "addr":"北京天安门",
  "desc":"beautiful cat",
  "company":{
    "name":"我爱北京天安门",
    "company_addr":"我的家在东北松花江傻姑娘",
    "employee_count":10
  },
  "publish_date":"2019-08-19"
}
 
PUT lqz/_doc/2
{
  "title":"so beautiful one",
  "price":200,  
  "addr":"北京天安门",
  "desc":"beautiful dog",
  "company":{
    "name":"我爱北京天安门",
    "company_addr":"我的家在东北松花江傻姑娘",
    "employee_count":10
  },
  "publish_date":"2019-08-19"
}


PUT lqz/_doc/3
{
  "title":"so beautiful tow",
  "price":698,  
  "addr":"北京天安门",
  "desc":"dog",
  "company":{
    "name":"我爱北京天安门",
    "company_addr":"我的家在东北松花江傻姑娘",
    "employee_count":10
  },
  "publish_date":"2019-08-19"
}

5.2 term

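5.2.1 term and terms

term: the search term is not analyzed; the query looks up exactly the term given.

terms: accepts several terms and matches documents that contain any of them.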

# term does not analyze the search term
GET lqz/_doc/_search
{
  "query": {
    "term": {
      "desc": "beautiful"
    }
  }
}
# terms does not analyze either; use it to look up several terms at once
GET lqz/_doc/_search
{
  "query": {
    "terms": {
      "title": ["beautiful", "so"]
    }
  }
}

5.3 match

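5.3.1 match, match_all and match_phrase

match: a full-text match. By default a document matches as soon as it contains at least one of the analyzed keywords.

match_all: matches every document in the index.

match_phrase: a phrase query. Every term must be present, and the terms must appear in the same order as in the given phrase.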
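The ordering requirement of match_phrase can be sketched as a check for a consecutive run of tokens. This is a simplified model under assumptions: `analyze` is a toy whitespace analyzer and `match_phrase` is a hypothetical helper, not the Elasticsearch implementation (which works on the positions stored in the inverted index).

```python
def analyze(text):
    """Toy analyzer (assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def match_phrase(field_text, phrase):
    """True when the phrase tokens occur consecutively and in the same order."""
    tokens = analyze(field_text)
    wanted = analyze(phrase)
    if not wanted:
        return False
    last_start = len(tokens) - len(wanted)
    # Slide a window of len(wanted) tokens over the field and compare.
    return any(tokens[i:i + len(wanted)] == wanted for i in range(last_start + 1))


title = "so beautiful tow"
print(match_phrase(title, "beautiful tow"))  # adjacent and in order
print(match_phrase(title, "tow beautiful"))  # right words, wrong order
print(match_phrase(title, "so tow"))         # right order, not adjacent
```

A plain match query would accept all three search texts, because each shares at least one token with the title.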

# match_all returns every document
GET lqz/_doc/_search
{
  "query": {
    "match_all": {}
  }
}

# match analyzes the search text before matching
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "title": "beautiful tow"
    }
  }
}

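5.4 Sorting

Not every field can be sorted on. Numeric, date and keyword fields work; analyzed text fields cannot be sorted by default.

# Sorting
# 1. Plain query without sorting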
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "addr": "北京天安门"
    }
  }
}

# 2. Descending order
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "addr": "北京天安门"
    }
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}

# 3. Ascending order
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "addr": "北京天安门"
    }
  },
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    }
  ]
}

# 4. match_all with ascending order
GET lqz/_doc/_search
{
  "query": {
    "match_all": {
    }
  },
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    }
  ]
}

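5.5 Pagination

Every clause of a search request is pluggable: clauses are independent of one another and separated by commas.

# Pagination
# from is a zero-based offset: skip the first two hits and return at most two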

GET lqz/_doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ], 
  "from": 2,
  "size": 2
}




# Note: because clauses are pluggable, the same pagination works without the sort clause
GET lqz/_doc/_search
{
  "query": {
    "match_all": {}
  }, 
  "from": 2,
  "size": 2
}
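The relationship between a page number and the from/size pair is simple arithmetic, sketched below. `page_to_from_size` and `paginate` are hypothetical helpers, and the price list is a made-up result set sorted in descending order.

```python
def page_to_from_size(page, page_size):
    """Translate a 1-based page number into a from/size pair."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {"from": (page - 1) * page_size, "size": page_size}


def paginate(hits, from_, size):
    """Mimic from/size on an already sorted list: skip `from_` hits, keep up to `size`."""
    return hits[from_:from_ + size]


prices_desc = [698, 200, 100]          # hypothetical hits, sorted by price descending

params = page_to_from_size(page=2, page_size=2)
print(params)                           # from/size for the second page
print(paginate(prices_desc, params["from"], params["size"]))
```

With three hits and a page size of two, the second page holds only the last hit, which is what a request with "from": 2 and "size": 2 would return for such a result set.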

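5.6 Boolean queries

  • must: AND relation, equivalent to and in a relational database

  • should: OR relation, equivalent to or in a relational database

  • must_not: NOT relation, equivalent to not in a relational database

  • filter: filter condition; it restricts the results without affecting the relevance score

  • range: restricts a field to a range of values

  • gt: greater than, equivalent to > in a relational database

  • gte: greater than or equal to, equivalent to >= in a relational database

  • lt: less than, equivalent to < in a relational database

  • lte: less than or equal to, equivalent to <= in a relational database

# bool query with should (OR condition)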
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "addr": "北京天安门"
          }
        },
        {
          "match": {
            "desc": "beautiful"
          }
        }
      ]
    }
  }
}





# must_not: matches documents that satisfy none of the clauses
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "addr": "北京天安门"
          }
        },
        {
          "match": {
            "desc": "beautiful"
          }
        },
        {
          "match": {
            "price": 698
          }
        }
      ]
    }
  }
}




# filter with comparison conditions: gt, lt, gte, lte
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "addr": "北京天安门"
          }
        }
      ],
      "filter": {
        "range": {
          "price": {
            "lt": 200
          }
        }
      }
    }
  }
}


# Range query
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "addr": "北京天安门"
          }
        }
      ],
      "filter": {
        "range": {
          "price": {
            "gte": 100,
            "lte": 150
          }
        }
      }
    }
  }
}
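When such request bodies are assembled in application code, the pluggable nature of the clauses maps naturally onto optional arguments. The sketch below only builds the JSON body as a Python dictionary; `bool_query` is a hypothetical helper, and sending the body to a cluster is left out on purpose.

```python
import json


def bool_query(must=None, should=None, must_not=None, filters=None):
    """Build a search body with a bool query, keeping only the clauses supplied."""
    clauses = {
        "must": must,
        "should": should,
        "must_not": must_not,
        "filter": filters,
    }
    return {"query": {"bool": {name: value for name, value in clauses.items() if value}}}


# Same intent as the range example: addr must match and price must lie in [100, 150].
body = bool_query(
    must=[{"match": {"addr": "北京天安门"}}],
    filters=[{"range": {"price": {"gte": 100, "lte": 150}}}],
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Clauses that are not passed in are simply absent from the body, so the same helper covers the should-only and must_not-only requests as well.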

5.7 Filtering the returned fields

# Basic usage
GET lqz/_doc/_search
{
  "query": {
    "match_all": {
      }
  },
  "_source":["name","age"]
}


# _source sits at the same level as query

GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must":{
        "match":{"from":"gu"}
      },
      
      "filter": {
        "range": {
          "age": {
            "lte": 25
          }
        }
      }
    }
  },
  "_source":["name","age"]
}





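5.8 Highlighting (this request returns no highlights)

# The query matches on price, but the field to highlight is from. By default only
# fields matched by the query are highlighted, so nothing is highlighted here.
GET lqz/_doc/_search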
{
  "query": {
    "match": {
      "price": "698"
    }
  },
  "highlight": {
    "pre_tags": "<b class='key' style='color:red'>",
    "post_tags": "</b>",
    "fields": {
    "from": {}
    }
  }
}

5.9 Aggregations


# sum, avg, max, min

# SQL equivalent: select avg(age) as my_avg from <table> where from = 'gu';
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_avg": {
      "avg": {
        "field": "age"
      }
    }
  },
  "_source": ["name", "age"]
}

# Maximum age
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_max": {
      "max": {
        "field": "age"
      }
    }
  },
  "_source": ["name", "age"]
}

# Minimum age
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_min": {
      "min": {
        "field": "age"
      }
    }
  },
  "_source": ["name", "age"]
}

# Sum of ages
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_sum": {
      "sum": {
        "field": "age"
      }
    }
  },
  "_source": ["name", "age"]
}



# Grouping


# Group everyone by age range (15-20, 20-25, 25-30) and compute the average age of each group.
GET lqz/_doc/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "age_group": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 15,
            "to": 20
          },
          {
            "from": 20,
            "to": 25
          },
          {
            "from": 25,
            "to": 30
          }
        ]
      },
      "aggs": {
        "my_avg": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}
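The bucketing rule behind a range aggregation is easy to get wrong at the boundaries: each range includes its from value and excludes its to value. The sketch below imitates that rule and the per-bucket average in plain Python; `range_buckets`, the output field names and the list of ages are hypothetical and do not reproduce the exact response format.

```python
def range_buckets(ages, ranges):
    """Group ages into [low, high) buckets and compute each bucket's average."""
    buckets = []
    for low, high in ranges:
        # "from" is inclusive and "to" is exclusive, as in a range aggregation.
        members = [age for age in ages if low <= age < high]
        average = sum(members) / len(members) if members else None
        buckets.append({
            "range": f"{low}-{high}",
            "doc_count": len(members),
            "avg_age": average,
        })
    return buckets


ages = [18, 20, 22, 25, 29, 30]        # hypothetical ages
buckets = range_buckets(ages, [(15, 20), (20, 25), (25, 30)])
for bucket in buckets:
    print(bucket)
```

An age of exactly 20 lands in the 20-25 bucket rather than in 15-20, and an age of 30 falls outside all three ranges.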


网站公告

今日签到

点亮在社区的每一天
去签到