【ElasticSearch Practical Series-01】Requirements Analysis and Data Generation

Published: 2025-07-08

ElasticSearch Practical Series: table of contents

Part | Link
【1】ElasticSearch Practical Series: Requirements Analysis and Data Generation | https://zhenghuisheng.blog.csdn.net/article/details/149178534

If you repost this article, please include the link: https://blog.csdn.net/zhenghuishengq/article/details/149178534

I. 【ElasticSearch Practical Series】Requirements Analysis and Data Generation

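To become more fluent with Elasticsearch (its query syntax, its underlying principles, and its use in real business development), this series studies ES in depth through hands-on practice.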

1. Requirements Analysis

1.1 Business Analysis

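Suppose I want to build a simple dating platform. It needs to filter users by basic attributes such as gender, age, height, weight, education, hometown, and working city. Using this requirement as the guiding scenario, we will study ES in depth and get familiar with how its syntax is used and how it works internally.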

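Based on these requirements, the fields ES needs to store are listed below. The index is named user, which corresponds roughly to a table name in a MySQL database.

  • When defining the mapping, plain numeric fields can simply use basic types such as Long or Integer;
  • If a field needs exact matching, make it keyword: it is not analyzed (tokenized), so it can be matched exactly with a term query;
  • If a field is text, its value is split into tokens by the field's analyzer, so it should later be searched with a match query.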
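To make the keyword/text difference concrete, here is a toy model in plain Java. It is not Elasticsearch code: the class name and the "lowercase + split on whitespace" analyzer are assumptions for illustration (the real default standard analyzer is more elaborate and, for example, splits Chinese text such as 石家庄市 into single characters). It shows why a term query only hits a keyword field when the whole value matches, while a match query analyzes the query string with the field's analyzer first.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model of keyword vs text fields (illustrative only, not Elasticsearch internals).
class KeywordVsTextDemo {

    enum FieldType { KEYWORD, TEXT }

    // Turns a value into indexed terms; a simplified stand-in for real analyzers.
    static List<String> analyze(FieldType type, String value) {
        if (type == FieldType.KEYWORD) {
            // keyword: the whole value is indexed as one unchanged term
            return Collections.singletonList(value);
        }
        // text: lowercase and split on whitespace (much simpler than the standard analyzer)
        return Arrays.asList(value.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // term query: the query string is NOT analyzed; it must equal one indexed term
    static boolean termQuery(FieldType type, String storedValue, String query) {
        return analyze(type, storedValue).contains(query);
    }

    // match query: the query string goes through the field's analyzer first,
    // and any shared term is a hit (like the default OR operator)
    static boolean matchQuery(FieldType type, String storedValue, String query) {
        List<String> indexed = analyze(type, storedValue);
        for (String token : analyze(type, query)) {
            if (indexed.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, `termQuery(TEXT, "Hello World", "Hello World")` is false because the text field only holds the terms `hello` and `world`. The same call against a KEYWORD field is true.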
import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Data
@Document(indexName = "user")
public class UserEO {
    @Id
    @Field(type = FieldType.Long)
    private Long id;
    @Field(type = FieldType.Keyword)
    private String nickName;
    /**
     * Gender: 1 = male, 0 = female
     */
    @Field(type = FieldType.Integer)
    private Integer sex;
    /**
     * Birth year
     */
    @Field(type = FieldType.Integer)
    private Integer birthYear;
    /**
     * Birth month
     */
    @Field(type = FieldType.Integer)
    private Integer birthMonth;
    /**
     * Birth day
     */
    @Field(type = FieldType.Integer)
    private Integer birthDay;
    /**
     * Height
     */
    @Field(type = FieldType.Integer)
    private Integer height;
    /**
     * Weight
     */
    @Field(type = FieldType.Integer)
    private Integer weight;

    /**
     * Education: 3 = below junior college, 4 = junior college, 5 = bachelor, 6 = master, 7 = doctorate
     */
    @Field(type = FieldType.Integer)
    private Integer eduLevel;

    /**
     * Current residence: province
     */
    @Field(type = FieldType.Keyword)
    private String liveProvince;
    /**
     * Current residence: city
     */
    @Field(type = FieldType.Keyword)
    private String liveCity;
    /**
     * Hometown: province
     */
    @Field(type = FieldType.Keyword)
    private String regProvince;
    /**
     * Hometown: city
     */
    @Field(type = FieldType.Keyword)
    private String regCity;
    /**
     * Deleted flag: 0 = not deleted, 1 = deleted
     */
    @Field(type = FieldType.Integer)
    private Integer delFlag;
}

1.2 Data Analysis

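In real projects, the data in ES is often synchronized from a MySQL database, for example through the canal middleware: canal presents itself as a replica of the MySQL primary, listens to the primary's binlog, and forwards the changes to ES. To practice the ES syntax first, we will instead write data into ES by hand from a Spring Boot project, starting with 100,000 records.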

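The user data is generated by hand: nickname and gender are random; the birth year is between 1990 and 2010 with a random month and day; height is between 150 and 185 (cm); weight is between 100 and 160; education level is between 3 (below junior college) and 7 (doctorate); the province is any province-level region in China, and the city is its capital. All of these ranges can be adjusted as needed.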
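Because the document stores birthYear, birthMonth and birthDay rather than an age, any "similar age" filter has to derive ages from those fields. Below is a minimal sketch with java.time. The class and method names are hypothetical. The birth-year bounds are a coarse prefilter, for example something a range query on birthYear could use, and the exact age still needs the full date.

```java
import java.time.LocalDate;
import java.time.Period;

// Hypothetical helper for turning stored birth-date fields into ages and back.
class AgeSketch {

    /** Exact age in whole years on `today` for a birth date stored as year/month/day fields. */
    static int ageOn(int birthYear, int birthMonth, int birthDay, LocalDate today) {
        return Period.between(LocalDate.of(birthYear, birthMonth, birthDay), today).getYears();
    }

    /**
     * Coarse birthYear bounds {from, to} (both inclusive) for people aged minAge..maxAge on `today`.
     * This is a superset: someone born in a boundary year may be one year outside the
     * age range, so exact filtering still needs ageOn(...) or a full-date comparison.
     */
    static int[] birthYearRange(int minAge, int maxAge, LocalDate today) {
        return new int[]{today.getYear() - maxAge - 1, today.getYear() - minAge};
    }
}
```

For instance, on 2025-07-08 the ages 20 to 25 map to birth years 1999 to 2005.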

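1.3 Features to Implement

Goal: fast data queries, with weighted scoring so that users with a higher match degree are shown first.

  • Query the data the user wants dynamically, for example high-quality users of the opposite sex, in the same city, or with higher education, and also filter by height and weight;
  • Show high-quality male users first: for example, opposite-sex users in the same city first, users of similar age first, and users with the same or higher education first;
  • Return the data the user wants quickly.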
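The "filter, then rank by weights" goal above can be sketched in plain Java. This is only an illustration of the idea, roughly what a bool filter plus a function_score query with filter-scoped weight functions and score_mode set to sum would compute. The weights (3/2/1), the 3-year age gap, and the Person class are hypothetical values chosen for the example, not taken from the article.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration of "hard filter first, then rank by summed weights".
// All weights and thresholds are hypothetical.
class MatchScoreSketch {
    static final int SAME_CITY_WEIGHT = 3;
    static final int SIMILAR_AGE_WEIGHT = 2;
    static final int EDU_WEIGHT = 1;
    static final int MAX_AGE_GAP = 3;

    /** Hypothetical minimal user view with only the fields this sketch needs. */
    static final class Person {
        final int sex;          // 1 = male, 0 = female, as in UserEO
        final int age;
        final int eduLevel;     // 3..7, as in UserEO
        final String liveCity;

        Person(int sex, int age, int eduLevel, String liveCity) {
            this.sex = sex;
            this.age = age;
            this.eduLevel = eduLevel;
            this.liveCity = liveCity;
        }
    }

    /** Sum of the weights of every condition the candidate satisfies. */
    static int score(Person me, Person candidate) {
        int s = 0;
        if (me.liveCity.equals(candidate.liveCity)) s += SAME_CITY_WEIGHT;            // same city
        if (Math.abs(me.age - candidate.age) <= MAX_AGE_GAP) s += SIMILAR_AGE_WEIGHT; // similar age
        if (candidate.eduLevel >= me.eduLevel) s += EDU_WEIGHT;                       // same or higher education
        return s;
    }

    /** Keep opposite-sex candidates only, then sort by score, highest first. */
    static List<Person> rank(Person me, List<Person> candidates) {
        List<Person> result = new ArrayList<>();
        for (Person candidate : candidates) {
            if (candidate.sex != me.sex) {
                result.add(candidate);
            }
        }
        result.sort(Comparator.comparingInt((Person p) -> score(me, p)).reversed());
        return result;
    }
}
```

Keeping the opposite-sex condition as a filter, not a weight, mirrors the usual ES split between non-scoring filter clauses and scoring functions.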

2. Data Generation Code

The data is generated by submitting many batch-insert tasks to a thread pool.
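The batching idea (100,000 records split into 1000 batches of 100) can be sketched independently of ES. This is a hypothetical helper, not the article's code. It also waits for every batch and surfaces failures through `ExecutorService.invokeAll`. The generator and sink are placeholders: in the article's setup they would roughly correspond to a method like `buildUserBaseInfo` and `userRepository.saveAll`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical batch-insert helper: split, run batches on a pool, wait for all of them.
class BatchInsertSketch {

    /**
     * Generates `total` records in batches of `batchSize`, hands each batch to `sink`
     * on the pool, and blocks until every batch is done. Returns the number of records written.
     */
    static <T> long insertInBatches(int total, int batchSize, Supplier<T> generator,
                                    Consumer<List<T>> sink, ExecutorService pool) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int start = 0; start < total; start += batchSize) {
            final int size = Math.min(batchSize, total - start); // the last batch may be smaller
            tasks.add(() -> {
                List<T> batch = new ArrayList<>(size);
                for (int i = 0; i < size; i++) {
                    batch.add(generator.get());
                }
                sink.accept(batch); // e.g. a bulk save of one batch
                return size;
            });
        }
        long written = 0;
        try {
            // invokeAll returns only after every task has completed, normally or exceptionally
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                written += f.get(); // a failed batch surfaces here instead of being silently lost
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for batches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        }
        return written;
    }
}
```

Waiting on the futures is the design choice here: the caller learns when all data is written and whether any batch failed, instead of returning right after submission.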

2.1 Basic Configuration

The detailed code follows. First, the core dependency:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    <version>2.7.10</version> <!-- choose the version that matches your Spring Boot version; it can be omitted when the Spring Boot parent manages it -->
</dependency>

Next, the yml configuration: put the host and port in application.yml so they are managed in one place.

es:
  param:
    connect:
      hostname: xx.xx.xx.xx
      port: 9200

The properties class bound to this configuration:

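import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

@Component
@ConfigurationProperties(prefix = "es.param.connect")
@Data
public class EsConnectProperties {
    private String hostname;
    private Integer port;
}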

Then configure the ES client connection and register the client in the Spring container:

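import lombok.extern.slf4j.Slf4j;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@Slf4j
public class ElasticSearchConfig {
    // shared request options (defaults for now; headers etc. can be added here)
    public static final RequestOptions COMMON_OPTIONS;
    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
        COMMON_OPTIONS = builder.build();
    }
    private final EsConnectProperties esConnectProperties;
    public ElasticSearchConfig(EsConnectProperties esConnectProperties) {
        this.esConnectProperties = esConnectProperties;
    }

    @Bean
    public RestHighLevelClient esRestClient() {
        log.info("ES client configured: {}:{}", esConnectProperties.getHostname(), esConnectProperties.getPort());
        // build the low-level client from host and port
        RestClientBuilder builder = RestClient.builder(new HttpHost(esConnectProperties.getHostname(), esConnectProperties.getPort()));
        // connect timeout 5 s, socket (read) timeout 60 s
        builder.setRequestConfigCallback(requestConfigBuilder ->
                requestConfigBuilder.setConnectTimeout(5000).setSocketTimeout(60000));
        // connection pool: at most 100 connections in total, 20 per route (host)
        builder.setHttpClientConfigCallback(httpClientBuilder ->
                httpClientBuilder.setMaxConnTotal(100).setMaxConnPerRoute(20));
        return new RestHighLevelClient(builder);
    }
}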

2.2 Thread Pool and Task Configuration

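Define a custom thread pool. Bulk writes to ES are mostly I/O-bound, so the maximum pool size follows the I/O-bound rule (2N, where N is the number of CPUs), and the blocking queue is a bounded linked queue.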

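import lombok.extern.slf4j.Slf4j;

import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

@Slf4j
public class ThreadPoolUtil {

    /**
     * I/O-bound: max pool size about 2N, so the CPU can switch to other threads while some wait on I/O;
     *            keep the core size at or below 2N, leaving some headroom.
     * CPU-bound: max pool size N or N+1; N uses the CPUs fully, the extra 1 covers page faults that would idle a CPU;
     *            keep the core size at or below N+1.
     * Use a pool when: 1) each task is short, 2) there are many tasks.
     */

    private static ThreadPoolExecutor pool = null;

    public static synchronized ThreadPoolExecutor getThreadPool() {
        if (pool == null) {
            // number of CPUs available to the JVM
            int cpuNum = Runtime.getRuntime().availableProcessors();
            log.info("Number of CPUs on this machine: {}", cpuNum);
            int maximumPoolSize = cpuNum * 2;
            pool = new ThreadPoolExecutor(
                    maximumPoolSize - 2,                // core pool size
                    maximumPoolSize,                    // max pool size
                    5L,                                 // idle non-core threads are released after 5 s
                    TimeUnit.SECONDS,
                    new LinkedBlockingQueue<>(50),      // bounded linked queue, capacity 50
                    Executors.defaultThreadFactory(),   // default thread factory
                    // when all threads are busy and the queue is full, run the task on the submitting
                    // thread (back-pressure) instead of throwing RejectedExecutionException (AbortPolicy)
                    new ThreadPoolExecutor.CallerRunsPolicy());
        }
        return pool;
    }
}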
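The rejection policy matters for this workload. With `ThreadPoolExecutor.AbortPolicy`, a 50-slot queue and at most 2N threads, submitting 1000 tasks in a tight loop will typically exhaust the capacity and throw `RejectedExecutionException`. `CallerRunsPolicy` instead runs the overflow task on the submitting thread, which naturally slows the submitter down. The demo below is a hypothetical, deliberately tiny pool (1 thread, queue capacity 1) that makes the difference observable.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Demo pool: 1 thread + queue of 1, so the third quick submission overflows.
class RejectionPolicyDemo {

    /** Returns how many tasks completed, or -1 if a submission was rejected. */
    static int run(RejectedExecutionHandler policy, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(1), policy);
        AtomicInteger done = new AtomicInteger();
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        Thread.sleep(50); // stands in for a slow bulk write
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    done.incrementAndGet();
                });
            }
        } catch (RejectedExecutionException e) {
            pool.shutdownNow(); // AbortPolicy: the submitter gets an exception
            return -1;
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```

`run(new ThreadPoolExecutor.AbortPolicy(), 5)` returns -1, while `run(new ThreadPoolExecutor.CallerRunsPolicy(), 5)` completes all 5 tasks.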

Define the task: it simply implements Runnable and sets every field of each generated user.

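import cn.hutool.core.util.IdUtil;
import lombok.extern.slf4j.Slf4j;

import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

@Slf4j
public class UserSaveTask implements Runnable {

    /** Users written per task; 1000 tasks x 100 = 100,000 users */
    private static final int BATCH_SIZE = 100;

    private final UserRepository userRepository;

    public UserSaveTask(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    /**
     * Build one batch of users and bulk-insert it with saveAll
     */
    @Override
    public void run() {
        List<UserEO> list = new ArrayList<>(BATCH_SIZE);
        log.info("Start inserting data...");
        for (int i = 0; i < BATCH_SIZE; i++) {
            list.add(buildUserBaseInfo());
        }
        try {
            userRepository.saveAll(list);
            log.info("Finished inserting data...");
        } catch (Exception e) {
            // a task submitted with submit() keeps its exception in the Future, so log it here
            log.error("Failed to insert batch", e);
        }
    }

    /**
     * Build the basic information of one random user
     * @return
     */
    public UserEO buildUserBaseInfo() {
        UserEO user = new UserEO();

        // user id, generated with the snowflake algorithm
        user.setId(IdUtil.getSnowflakeNextId());
        user.setNickName("用户" + getRandomString(6));
        // gender
        user.setSex(ThreadLocalRandom.current().nextInt(0, 2));
        // birth year, month and day
        int year = randBetween(1990, 2010);
        int month = randBetween(1, 12);
        int day = getRandomDay(year, month);
        user.setBirthYear(year);
        user.setBirthMonth(month);
        user.setBirthDay(day);

        // height and weight
        user.setHeight(randBetween(150, 185));
        user.setWeight(randBetween(100, 160));
        user.setEduLevel(randBetween(3, 7)); // below junior college ~ doctorate

        // current residence: province + city
        String[] live = CityUtil.getRandomCity();
        user.setLiveProvince(live[0]);
        user.setLiveCity(live[1]);

        // hometown: province + city
        String[] reg = CityUtil.getRandomCity();
        user.setRegProvince(reg[0]);
        user.setRegCity(reg[1]);

        // not deleted by default
        user.setDelFlag(0);
        return user;
    }

    // random alphanumeric string of the given length
    private static String getRandomString(int length) {
        String chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        StringBuilder sb = new StringBuilder(length);
        ThreadLocalRandom r = ThreadLocalRandom.current();
        for (int i = 0; i < length; i++) {
            sb.append(chars.charAt(r.nextInt(chars.length())));
        }
        return sb.toString();
    }

    // random int in [start, end], both inclusive (nextInt's upper bound is exclusive, hence end + 1)
    private static int randBetween(int start, int end) {
        return ThreadLocalRandom.current().nextInt(start, end + 1);
    }

    private static int getRandomDay(int year, int month) {
        // pick a valid day for this month (lengthOfMonth handles 28/29/30/31)
        return randBetween(1, LocalDate.of(year, month, 1).lengthOfMonth());
    }
}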

The city utility class only needs each province and its capital; the list also includes the four municipalities.

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class CityUtil {

    private static final List<String[]> PROVINCE_AND_CITY_LIST = Arrays.asList(
            new String[]{"北京市", "北京市"},
            new String[]{"天津市", "天津市"},
            new String[]{"上海市", "上海市"},
            new String[]{"重庆市", "重庆市"},
            new String[]{"河北省", "石家庄市"},
            new String[]{"山西省", "太原市"},
            new String[]{"辽宁省", "沈阳市"},
            new String[]{"吉林省", "长春市"},
            new String[]{"黑龙江省", "哈尔滨市"},
            new String[]{"江苏省", "南京市"},
            new String[]{"浙江省", "杭州市"},
            new String[]{"安徽省", "合肥市"},
            new String[]{"福建省", "福州市"},
            new String[]{"江西省", "南昌市"},
            new String[]{"山东省", "济南市"},
            new String[]{"河南省", "郑州市"},
            new String[]{"湖北省", "武汉市"},
            new String[]{"湖南省", "长沙市"},
            new String[]{"广东省", "广州市"},
            new String[]{"海南省", "海口市"},
            new String[]{"四川省", "成都市"},
            new String[]{"贵州省", "贵阳市"},
            new String[]{"云南省", "昆明市"},
            new String[]{"陕西省", "西安市"},
            new String[]{"甘肃省", "兰州市"},
            new String[]{"青海省", "西宁市"},
            new String[]{"台湾省", "台北市"},
            new String[]{"内蒙古自治区", "呼和浩特市"},
            new String[]{"广西壮族自治区", "南宁市"},
            new String[]{"西藏自治区", "拉萨市"},
            new String[]{"宁夏回族自治区", "银川市"},
            new String[]{"新疆维吾尔自治区", "乌鲁木齐市"},
            new String[]{"香港特别行政区", "香港"},
            new String[]{"澳门特别行政区", "澳门"}
    );

    public static String[] getRandomCity() {
        return PROVINCE_AND_CITY_LIST.get(ThreadLocalRandom.current().nextInt(PROVINCE_AND_CITY_LIST.size()));
    }
}

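2.3 Inserting Data

Define the UserRepository interface and annotate it with @Repository (the annotation is optional for Spring Data repository interfaces, which are picked up by repository scanning).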

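import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;
import org.springframework.stereotype.Repository;

@Repository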
public interface UserRepository extends ElasticsearchRepository<UserEO, Long> {
}

Then define a UserMatchService interface, starting with a single insert method:

public interface UserMatchService {
    AjaxResult matchSave();
}

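Then implement this interface and its method: submit 1000 tasks to the thread pool in a loop. Each task inserts 100 users, so 100,000 users in total.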

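import org.springframework.stereotype.Service;

import javax.annotation.Resource;
import java.util.concurrent.ThreadPoolExecutor;

/**
 *
 * @Author zhenghuisheng
 * @Date:2025/6/23 15:50
 */
@Service
public class UserMatchServiceImpl implements UserMatchService {

    @Resource
    private UserRepository userRepository;

    // shared thread pool
    private final ThreadPoolExecutor threadPool = ThreadPoolUtil.getThreadPool();

    /**
     * Generate 100,000 users with the thread pool (1000 tasks x 100 users)
     * @return
     */
    @Override
    public AjaxResult matchSave() {
        for (int i = 0; i < 1000; i++) {
            // submit a task; it runs asynchronously on the pool
            threadPool.submit(new UserSaveTask(userRepository));
        }
        // the tasks may still be running when this returns
        return AjaxResult.success("Data generation tasks submitted");
    }

}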

Finally, add the Controller:

@RestController
@RequestMapping("/es/user")
public class UserMatchController {

    @Resource
    private UserMatchService userMatchService;

    @GetMapping("/matchSave")
    public AjaxResult matchSave() {
        return userMatchService.matchSave();
    }

}

3. Viewing the Data in Kibana

After starting the project and calling the endpoint above, check the data in this index. The total document count:

GET /user/_count


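Next, look at the mapping, i.e. the data type of each field. Note that this result is Elasticsearch's default dynamic mapping (numbers as long, strings as text with a keyword sub-field), not the Integer/Keyword types declared with @Field. This suggests the index was created before, or without, the entity's mapping. To get the declared types, delete the index and recreate it from the entity (or with an explicit mapping) before importing data; until then, exact string matches have to use sub-fields such as liveCity.keyword.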

GET /user/_mapping
{
  "user" : {
    "mappings" : {
      "properties" : {
        "_class" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "birthDay" : {
          "type" : "long"
        },
        "birthMonth" : {
          "type" : "long"
        },
        "birthYear" : {
          "type" : "long"
        },
        "delFlag" : {
          "type" : "long"
        },
        "eduLevel" : {
          "type" : "long"
        },
        "height" : {
          "type" : "long"
        },
        "id" : {
          "type" : "long"
        },
        "liveCity" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "liveProvince" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "nickName" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "regCity" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "regProvince" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "sex" : {
          "type" : "long"
        },
        "weight" : {
          "type" : "long"
        }
      }
    }
  }
}

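View the data with pagination. from is a 0-based offset and size is the page size, so from=1&size=5 skips the first document and returns the next five; use from=0 for the first page.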

GET /user/_search?from=1&size=5
{
  "query": {
    "match_all": {}
  }
}

The response is as follows:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "user",
        "_type" : "_doc",
        "_id" : "1937084380165083277",
        "_score" : 1.0,
        "_source" : {
          "_class" : "com.zhs.elasticsearch.match.eo.UserEO",
          "id" : 1937084380165083277,
          "nickName" : "用户Rxo729",
          "sex" : 0,
          "birthYear" : 1998,
          "birthMonth" : 6,
          "birthDay" : 16,
          "height" : 179,
          "weight" : 153,
          "eduLevel" : 4,
          "liveProvince" : "河北省",
          "liveCity" : "石家庄市",
          "regProvince" : "辽宁省",
          "regCity" : "沈阳市",
          "delFlag" : 0
        }
      },
      {
        "_index" : "user",
        "_type" : "_doc",
        "_id" : "1937084380165083281",
        "_score" : 1.0,
        "_source" : {
          "_class" : "com.zhs.elasticsearch.match.eo.UserEO",
          "id" : 1937084380165083281,
          "nickName" : "用户pLNM3B",
          "sex" : 0,
          "birthYear" : 2007,
          "birthMonth" : 7,
          "birthDay" : 14,
          "height" : 172,
          "weight" : 131,
          "eduLevel" : 7,
          "liveProvince" : "西藏自治区",
          "liveCity" : "拉萨市",
          "regProvince" : "内蒙古自治区",
          "regCity" : "呼和浩特市",
          "delFlag" : 0
        }
      },
      {
        "_index" : "user",
        "_type" : "_doc",
        "_id" : "1937084380165083286",
        "_score" : 1.0,
        "_source" : {
          "_class" : "com.zhs.elasticsearch.match.eo.UserEO",
          "id" : 1937084380165083286,
          "nickName" : "用户yupBE5",
          "sex" : 0,
          "birthYear" : 1999,
          "birthMonth" : 10,
          "birthDay" : 29,
          "height" : 166,
          "weight" : 140,
          "eduLevel" : 7,
          "liveProvince" : "贵州省",
          "liveCity" : "贵阳市",
          "regProvince" : "澳门特别行政区",
          "regCity" : "澳门",
          "delFlag" : 0
        }
      },
      {
        "_index" : "user",
        "_type" : "_doc",
        "_id" : "1937084380165083290",
        "_score" : 1.0,
        "_source" : {
          "_class" : "com.zhs.elasticsearch.match.eo.UserEO",
          "id" : 1937084380165083290,
          "nickName" : "用户fTGRMJ",
          "sex" : 1,
          "birthYear" : 2003,
          "birthMonth" : 7,
          "birthDay" : 9,
          "height" : 182,
          "weight" : 128,
          "eduLevel" : 6,
          "liveProvince" : "海南省",
          "liveCity" : "海口市",
          "regProvince" : "辽宁省",
          "regCity" : "沈阳市",
          "delFlag" : 0
        }
      },
      {
        "_index" : "user",
        "_type" : "_doc",
        "_id" : "1937084380165083295",
        "_score" : 1.0,
        "_source" : {
          "_class" : "com.zhs.elasticsearch.match.eo.UserEO",
          "id" : 1937084380165083295,
          "nickName" : "用户v6ZwfS",
          "sex" : 0,
          "birthYear" : 1995,
          "birthMonth" : 12,
          "birthDay" : 11,
          "height" : 173,
          "weight" : 140,
          "eduLevel" : 5,
          "liveProvince" : "湖南省",
          "liveCity" : "长沙市",
          "regProvince" : "江苏省",
          "regCity" : "南京市",
          "delFlag" : 0
        }
      }
    ]
  }
}
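The from/size parameters used above follow simple arithmetic. The sketch below turns a 1-based page number into the 0-based from offset and checks the from + size limit. That limit is governed by the index setting index.max_result_window, which defaults to 10,000; deeper paging is usually done with search_after instead. The class name is hypothetical.

```java
// Hypothetical pagination helper for plain from/size searches.
class PaginationSketch {
    // default value of the index.max_result_window setting
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /** Converts a 1-based page number into the 0-based `from` offset. */
    static int fromForPage(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    /** A plain from/size search is only allowed while from + size <= max_result_window. */
    static boolean withinResultWindow(int from, int size) {
        return (long) from + size <= DEFAULT_MAX_RESULT_WINDOW;
    }
}
```

With size=5, page 1 maps to from=0 and page 3 to from=10; from=9996 with size=5 would exceed the default window.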

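At this point the data has been generated successfully. Note that hits.total in the response shows 10000 with relation gte: by default a search counts total hits accurately only up to 10,000, so use GET /user/_count (or set track_total_hits to true) to get the exact document count.

The full code is available on gitee: https://gitee.com/zhenghuisheng/elasticsearch_study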

