Elasticsearch: A Distributed Search Engine

Published 2023-02-12 · 1163 views


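Preface: our business needed a combined search feature, and the existing MySQL fuzzy search (LIKE queries) was no longer adequate. After evaluating the options, we chose Elasticsearch.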

I. Setting up the service with Docker

Installing ES

1. docker pull

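Check Docker Hub for the available versions; this article uses 7.17.1.

Image without the IK analyzer: docker.elastic.co/elasticsearch/elasticsearch:7.15.2
Image with the IK analyzer bundled (not every version is available): docker pull davyinsa/elasticsearch-ik

IK analyzer: https://github.com/medcl/elasticsearch-analysis-ik
The analyzer must be installed in the mounted plugins folder: /data/elasticsearch/plugins/ik-analyse/ . The IK version must match the Elasticsearch version exactly (for example, IK 7.17.1 for ES 7.17.1).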

docker pull elasticsearch:7.17.1

2. Run a temporary container to generate the default files

 docker run -d --name elasticsearch  -p 9200:9200 -p 9300:9300 -e  "discovery.type=single-node" -e ES_JAVA_OPTS="-Xms256m -Xmx256m" elasticsearch:7.17.1

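Parameters

  • -d: run the container in the background (detached)
  • --name: give the container a name (shown in the NAMES column of docker ps)
  • -p 9200:9200: publish the port to the host
    Port 9200 is the HTTP port used by external clients; port 9300 is the transport port used for communication between cluster nodes
  • -e "discovery.type=single-node": start as a single node
  • -e ES_JAVA_OPTS="-Xms256m -Xmx256m": limit the JVM heap size (initial and maximum)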
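
The two heap flags are normally given the same value: Elastic recommends equal -Xms and -Xmx, and the heap size bootstrap check enforces it whenever bootstrap checks apply (a single-node discovery setup like this one skips them). Below is a small sketch of a hypothetical HeapOptsCheck helper that verifies this in an ES_JAVA_OPTS string; it only looks for the first -Xms/-Xmx occurrence.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: checks that an ES_JAVA_OPTS string sets -Xms and -Xmx to the same value.
class HeapOptsCheck {
    private static final Pattern XMS = Pattern.compile("-Xms(\\S+)");
    private static final Pattern XMX = Pattern.compile("-Xmx(\\S+)");

    // Returns the value after the first match of the flag, e.g. "256m".
    static Optional<String> find(Pattern p, String opts) {
        Matcher m = p.matcher(opts);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** True only when both flags are present and carry the same value. */
    static boolean heapIsFixed(String javaOpts) {
        Optional<String> xms = find(XMS, javaOpts);
        Optional<String> xmx = find(XMX, javaOpts);
        return xms.isPresent() && xms.equals(xmx);
    }

    public static void main(String[] args) {
        System.out.println(heapIsFixed("-Xms256m -Xmx256m")); // true
        System.out.println(heapIsFixed("-Xms256m -Xmx512m")); // false
    }
}
```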

Make sure it started successfully:

docker ps

3. Set up external data volumes

  1. Create the directories
mkdir -p /data/elasticsearch/{config,data,logs,plugins}
  2. Copy the files out of the container
docker cp elasticsearch:/usr/share/elasticsearch/config /data/elasticsearch
docker cp elasticsearch:/usr/share/elasticsearch/logs /data/elasticsearch
docker cp elasticsearch:/usr/share/elasticsearch/data /data/elasticsearch
docker cp elasticsearch:/usr/share/elasticsearch/plugins /data/elasticsearch
  3. Edit elasticsearch.yml
vi /data/elasticsearch/config/elasticsearch.yml
  • Make sure the following settings are present; existing settings can be left unchanged
cluster.name: "docker-cluster"
network.host: 0.0.0.0
# CORS
http.cors.allow-origin: "*"
http.cors.enabled: true
http.cors.allow-headers: Authorization,X-Requested-With,Content-Length,Content-Type
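
A misspelled key such as network.hosts is easy to miss, and Elasticsearch rejects unknown setting names at startup, so such a typo typically keeps the container from starting. The sketch below uses a hypothetical FlatYmlCheck helper to read flat "key: value" lines like the ones above and confirm the expected keys exist before restarting. It is deliberately naive and is not a YAML parser (no nesting, lists or quoting rules).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: naive reader for flat "key: value" lines (NOT a full YAML parser).
class FlatYmlCheck {
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
            int colon = line.indexOf(':');
            if (colon <= 0) continue;                              // not a key: value line
            String key = line.substring(0, colon).strip();
            String value = line.substring(colon + 1).strip();
            settings.put(key, value);
        }
        return settings;
    }

    public static void main(String[] args) {
        Map<String, String> s = parse(List.of(
                "cluster.name: \"docker-cluster\"",
                "network.host: 0.0.0.0",
                "# CORS",
                "http.cors.enabled: true"));
        // A misspelled key such as "network.hosts" would simply be absent here.
        System.out.println(s.containsKey("network.host")); // true
        System.out.println(s.get("network.host"));         // 0.0.0.0
    }
}
```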

4. Stop and remove the temporary container

docker stop elasticsearch
docker rm elasticsearch

5. Start the container again with the external folders mounted

docker run -d --name elasticsearch \
-p 9200:9200 \
-p 9300:9300 \
-e "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms256m -Xmx256m" \
-v /data/elasticsearch/logs:/usr/share/elasticsearch/logs \
-v /data/elasticsearch/data:/usr/share/elasticsearch/data \
-v /data/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-v /data/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
elasticsearch:7.17.1

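Wait about a minute for the container to come up, then request port 9200.

Because this is a 7.x version, the X-Pack security features are disabled by default (8.x enables them by default), so the request works without credentials and returns: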

[root@iZuf6ai62xce7wexx4wwi9Z config]# curl "http://localhost:9200"
{
  "name" : "6a1036c69d59",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "0zgLiGhESGKQYTYy9gH4iA",
  "version" : {
    "number" : "7.17.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "e5acb99f822233d62d6444ce45a4543dc1c8059a",
    "build_date" : "2022-02-23T22:20:54.153567231Z",
    "build_snapshot" : false,
    "lucene_version" : "8.11.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
[root@iZuf6ai62xce7wexx4wwi9Z config]#
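
Instead of waiting a fixed minute, a client can poll port 9200 until it answers. Here is a sketch built on the JDK 11 java.net.http.HttpClient; the WaitForEs class and its methods are hypothetical. It treats only HTTP 200 as "ready", which matches this unsecured 7.x setup. Once security is enabled in step 6, an unauthenticated request gets 401 instead.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.IntSupplier;

// Hypothetical helper: poll until a probe reports HTTP 200. The probe is pluggable for testing.
class WaitForEs {
    /** Returns the 1-based attempt on which the probe returned 200, or -1 if it never did. */
    static int waitFor200(IntSupplier probe, int maxAttempts, long sleepMillis) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (probe.getAsInt() == 200) return attempt;
            Thread.sleep(sleepMillis);
        }
        return -1;
    }

    /** Real probe: GET the URI; connection errors and timeouts count as "not ready" (-1). */
    static IntSupplier httpProbe(HttpClient client, URI uri) {
        return () -> {
            try {
                HttpRequest req = HttpRequest.newBuilder(uri).timeout(Duration.ofSeconds(2)).GET().build();
                return client.send(req, HttpResponse.BodyHandlers.discarding()).statusCode();
            } catch (IOException e) {
                return -1;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(2)).build();
        int attempt = waitFor200(httpProbe(client, URI.create("http://localhost:9200")), 30, 2000);
        System.out.println(attempt > 0 ? "ready after attempt " + attempt : "not ready");
    }
}
```

Keeping the probe as an IntSupplier separates the retry loop from the HTTP call, so the loop can be exercised without a running cluster.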

6. Set a password for ES

  1. Enable X-Pack security in ES
vim /data/elasticsearch/config/elasticsearch.yml

Add xpack.security.enabled as shown below:

cluster.name: "docker-cluster-01"
network.host: 0.0.0.0
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization,X-Requested-With,Content-Length,Content-Type

# Enable X-Pack security here
xpack.security.enabled: true

Restart the ES container:

docker restart elasticsearch
  2. Enter the ES container and set the passwords
docker exec -ti elasticsearch /bin/bash
/usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive

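You will then be prompted to set a password for each of the following users (at least 6 characters); here they are all set to 123456: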

Initiating the setup of passwords for reserved users elastic,apm_system,kibana,kibana_system,logstash_system,beats_system,remote_monitoring_user.
You will be prompted to enter passwords as the process progresses.
Please confirm that you would like to continue [y/N]y


Enter password for [elastic]:
passwords must be at least [6] characters long
Try again.
Enter password for [elastic]:
Reenter password for [elastic]:
Enter password for [apm_system]:
Reenter password for [apm_system]:
Enter password for [kibana_system]:
Reenter password for [kibana_system]:
Enter password for [logstash_system]:
Reenter password for [logstash_system]:
Enter password for [beats_system]:
Reenter password for [beats_system]:
Enter password for [remote_monitoring_user]:
Reenter password for [remote_monitoring_user]:
Changed password for user [apm_system]
Changed password for user [kibana_system]
Changed password for user [kibana]
Changed password for user [logstash_system]
Changed password for user [beats_system]
Changed password for user [remote_monitoring_user]
Changed password for user [elastic]
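  3. Once the passwords are set, pass a user to access ES (curl prompts for the password). The sample output below was captured on an 8.1.2 instance, so its version fields differ from the 7.17.1 setup above.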
[root@k8s-master ~]# curl localhost:9200 -u elastic
Enter host password for user 'elastic':
{
  "name" : "cd52e7fbacd1",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "0S-V9zElSie_zXtcDRssAQ",
  "version" : {
    "number" : "8.1.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "31df9689e80bad366ac20176aa7f2371ea5eb4c1",
    "build_date" : "2022-03-29T21:18:59.991429448Z",
    "build_snapshot" : false,
    "lucene_version" : "9.0.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}
[root@k8s-master ~]#
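
From code, curl's -u elastic corresponds to an HTTP Basic Authorization header carrying the Base64 encoding of user:password. A sketch using the JDK 11 HttpClient; the class name EsBasicAuth is hypothetical, and the credentials are the example values set above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch: Java equivalent of `curl localhost:9200 -u elastic`, sending HTTP Basic auth.
class EsBasicAuth {
    /** Builds the Authorization header value: "Basic " + base64("user:password"). */
    static String basicAuth(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    /** GET request for the cluster root ("/") with the Basic auth header attached. */
    static HttpRequest rootRequest(String baseUrl, String user, String password) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/"))
                .header("Authorization", basicAuth(user, password))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(rootRequest("http://localhost:9200", "elastic", "123456"),
                        HttpResponse.BodyHandlers.ofString());
        // 200 plus the cluster info JSON when the password is right; 401 otherwise.
        System.out.println(resp.statusCode());
        System.out.println(resp.body());
    }
}
```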

II. Integrating Elasticsearch with Spring Boot

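Pay close attention to versions and check the compatibility requirements in the official documentation; a Spring Data Elasticsearch release only works with specific Elasticsearch versions, and the Elasticsearch API changes quickly: https://docs.spring.io/spring-data/elasticsearch/docs/current/reference/html/#preface.metadata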

Add the Spring Boot dependency

<dependency>
      <groupId>org.springframework.boot</groupId>       
      <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

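Configuration file (application.yml)

# Spring Boot 2.6+ property names; older versions use spring.elasticsearch.rest.*
spring:
  elasticsearch:
    uris: http://xx.xx.xx.xx:9200
    # leave empty while security is disabled; set them after step 6
    username: 
    password: 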

Define the JavaBean

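Annotations:

@Document declares the mapping between a Java class and an Elasticsearch index
indexName: index name (letters must be lowercase)
type: index type (mapping types were removed in ES 7, so this attribute is deprecated or gone in Spring Data Elasticsearch 4.x)
shards: number of primary shards; the default of 5 comes from older Spring Data versions (ES 7 itself defaults to 1, and newer Spring Data versions configure this via @Setting)
replicas: number of replica shards, default 1
createIndex: whether to create the index automatically when it does not exist, default true
Automatic index creation is not recommended (an auto-created index uses the default types and the default analyzer)
@Id marks the document's primary key
@Field describes a field's ES data type, whether it is analyzed, and similar settings; it is equivalent to the mapping definition
index: whether the field is indexed, default true; if false, the field cannot be queried
store: default false; a stored field is saved separately from _source, so it can be retrieved without loading and parsing _source. It costs extra disk space but saves computation.
type: data type (text, keyword, date, object, geo, etc.)
analyzer: the analyzer used for the field; a field that uses an analyzer is normally of type text.
format: the date/time format; see the official documentation: https://www.elastic.co/guide/reference/mapping/date-format/.
@CompletionField defines a completion field, used for autocomplete (suggest) searches
analyzer: the analyzer used for the field; a field that uses an analyzer is normally of type text.
searchAnalyzer: explicitly sets the search-time analyzer; by default it is the same as the index-time analyzer, which keeps tokenization consistent.
maxInputLength: maximum length of a single input, default 50 UTF-16 code points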
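
The lowercase rule for indexName is only part of what Elasticsearch enforces. According to the ES 7.x create-index documentation, an index name also must not contain \ / * ? " < > | , # or a space (and not :), must not start with -, _ or +, cannot be . or .., and is limited to 255 bytes. The hypothetical IndexNameCheck helper below encodes those rules so a bad @Document(indexName = ...) can be caught in a unit test. Elasticsearch itself remains the authority.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Sketch of the documented ES 7.x index-name rules; not an official validator.
class IndexNameCheck {
    // Characters that may not appear anywhere in an index name (":" is rejected since 7.0).
    private static final String FORBIDDEN = "\\/*?\"<>| ,#:";

    static boolean isValid(String name) {
        if (name == null || name.isEmpty()) return false;
        if (name.equals(".") || name.equals("..")) return false;
        if (!name.equals(name.toLowerCase(Locale.ROOT))) return false; // lowercase only
        char first = name.charAt(0);
        if (first == '-' || first == '_' || first == '+') return false;
        for (char c : name.toCharArray()) {
            if (FORBIDDEN.indexOf(c) >= 0) return false;
        }
        return name.getBytes(StandardCharsets.UTF_8).length <= 255;   // limit is in bytes
    }

    public static void main(String[] args) {
        System.out.println(isValid("job_index")); // true
        System.out.println(isValid("Job_Index")); // false: uppercase
    }
}
```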

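JobBean example: the Elasticsearch bean can be shared with the MySQL bean. Note that Spring Data Elasticsearch 4.x maps entities with its own converter rather than Jackson, so @JsonIgnore only affects JSON output; a field that must stay out of the index needs @Transient (org.springframework.data.annotation.Transient).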

package com.craftsman.common.dto;

import com.fasterxml.jackson.annotation.JsonIgnore;
import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

import java.util.Date;

@Data
@Document(indexName = "job_index")
public class JobForWorker {
    @Id
    @Field(type = FieldType.Integer)
    private Integer id;

    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String contactor;

    @Field(type = FieldType.Keyword)
    private Integer certLevel;

    @Field(type = FieldType.Keyword)
    private Integer entCertLevel;

    @JsonIgnore
    private String avatar;

    @Field(type = FieldType.Keyword)
    private String avatarUrl;

    @Field(type = FieldType.Date, value = "first_pub_time")
    private Date firstPubTime;

    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String desc;

    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String projectDesc;

    @JsonIgnore
    private String proCode;

    @Field(type = FieldType.Keyword)
    private String proName;
    @JsonIgnore
    private String cityCode;

    @Field(type = FieldType.Keyword)
    private String cityName;

    @JsonIgnore
    private String areaCode;
    @Field(type = FieldType.Keyword)
    private String areaName;

    @JsonIgnore
    private Integer workTypeId;
    @JsonIgnore
    private Integer subWorkTypeId;

    @Field(type = FieldType.Keyword)
    private String workType;

    @Field(type = FieldType.Keyword)
    private String subWorkType;
    @JsonIgnore
    private String imgId;

    @Field(type = FieldType.Keyword)
    private String imgUrl;

    /**
     * Source: normal = 匠圈, advanced = 优匠
     */
    @Field(type = FieldType.Keyword)
    private String source;

    /**
     * Whether the job is favorited
     */
    @Field(type = FieldType.Boolean)
    private boolean star;
    /**
     * Whether the job is pinned to the top
     */
    @Field(type = FieldType.Boolean)
    private boolean top;
    /**
     * Time at which the pin expires
     */
    @Field(type = FieldType.Date)
    private Date topEndTime;

    /**
     * ID of the user who entered the record
     */
    @JsonIgnore
    private Integer importerId;

    /**
     * Longitude
     */
    @Field(type = FieldType.Double)
    private Double lon;

    /**
     * Latitude
     */
    @Field(type = FieldType.Double)
    private Double lat;

    /**
     * Detailed address
     */
    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String detailAddress;
}

CRUD operations

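import lombok.extern.slf4j.Slf4j;
import lombok.val;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.SearchHit;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.mapping.IndexCoordinates;
import org.springframework.data.elasticsearch.core.query.UpdateQuery;
import org.springframework.data.util.Pair; // assumed; any Pair type with Pair.of(a, b) works
import org.springframework.stereotype.Component;

import java.util.List;
import java.util.stream.Collectors;

// JsonUtils (used in update) is a project-specific JSON helper that is not shown here.

@Slf4j
@Component
public class ElasticSearchUtils {
    @Autowired
    private ElasticsearchRestTemplate elasticsearchRestTemplate;

    /** Bulk save; returns null if the write fails. */
    public <T> Iterable<T> save(Iterable<T> entities) {
        try {
            return elasticsearchRestTemplate.save(entities);
        } catch (Exception e) {
            log.error("failed to write entities to elasticsearch", e);
            return null;
        }
    }

    /** Save a single entity; returns null if the write fails. */
    public <T> T save(T t) {
        try {
            return elasticsearchRestTemplate.save(t);
        } catch (Exception e) {
            log.error("failed to write entity to elasticsearch", e);
            return null;
        }
    }

    /** Delete a document by id from the index of the given entity class. */
    public Boolean delete(String id, Class<?> entityType) {
        try {
            elasticsearchRestTemplate.delete(id, entityType);
            return true;
        } catch (Exception e) {
            log.error("failed to delete elasticsearch document {}", id, e);
            return false;
        }
    }

    /**
     * @param entity : must be an instance of a class annotated with @Document (with indexName set),
     *               and its primary key must be set; other fields don't matter (same as deleting by id)
     */
    public Boolean delete(Object entity) {
        try {
            elasticsearchRestTemplate.delete(entity);
            return true;
        } catch (Exception e) {
            log.error("failed to delete elasticsearch entity", e);
            return false;
        }
    }

    /**
     * Partial update.
     *
     * @param id        primary key
     * @param object    the data to update (serialized to JSON and merged into the stored document)
     * @param classType document class, used to resolve the index name
     * @return true if the update succeeded
     */
    public Boolean update(String id, Object object, Class<?> classType) {
        UpdateQuery.Builder builder = UpdateQuery.builder(id)
                .withDocument(org.springframework.data.elasticsearch.core.document.Document.parse(JsonUtils.toJson(object)));
        IndexCoordinates of = IndexCoordinates.of(classType.getAnnotation(Document.class).indexName());
        try {
            elasticsearchRestTemplate.update(builder.build(), of);
            return true;
        } catch (Exception e) {
            log.error("failed to update elasticsearch document {}", id, e);
            return false;
        }
    }

    public <T> T get(String id, Class<T> classType) {
        return elasticsearchRestTemplate.get(id, classType);
    }

    public Boolean exists(String id, Class<?> classType) {
        return elasticsearchRestTemplate.exists(id, classType);
    }

    /** Full-text match on one field, paged (pageNum is 1-based). */
    public <T> Pair<Long, List<T>> selectPage(Integer pageNum,
                                              Integer pageSize,
                                              String key,
                                              String value,
                                              Class<T> classType) {
        val matchQuery = QueryBuilders
                .matchQuery(key, value);

        val boolQuery = QueryBuilders.boolQuery()
                .must(matchQuery);

        return nativeSearch(pageNum, pageSize, classType, boolQuery);
    }

    /** Exact term filter on one field plus full-text match on another, paged (pageNum is 1-based). */
    public <T> Pair<Long, List<T>> selectJobPage(Integer pageNum,
                                                 Integer pageSize,
                                                 String key,
                                                 String value,
                                                 String termKey,
                                                 String termValue,
                                                 Class<T> classType) {

        val termQuery = QueryBuilders.termQuery(termKey, termValue);
        val matchQuery = QueryBuilders
                .matchQuery(key, value);

        val boolQuery = QueryBuilders.boolQuery()
                .must(termQuery)
                .must(matchQuery);

        return nativeSearch(pageNum, pageSize, classType, boolQuery);
    }

    /** Runs the query and returns (total hits, hits on this page); returns null on failure. */
    private <T> Pair<Long, List<T>> nativeSearch(Integer pageNum,
                                                 Integer pageSize,
                                                 Class<T> classType,
                                                 BoolQueryBuilder boolQuery) {
        val query = new NativeSearchQueryBuilder()
                .withQuery(boolQuery)
                .withSorts(SortBuilders.scoreSort().order(SortOrder.DESC)) // sort by relevance first
                .withSorts(SortBuilders.fieldSort("first_pub_time").order(SortOrder.DESC)) // then by publish time
                // convert the 1-based pageNum to Spring's 0-based page index
                .withPageable(PageRequest.of(pageNum == null || pageNum == 0 ? 0 : pageNum - 1, pageSize))
                .build();

        IndexCoordinates of = IndexCoordinates.of(classType.getAnnotation(Document.class).indexName());
        SearchHits<T> page;
        try {
            page = elasticsearchRestTemplate.search(query, classType, of);
        } catch (Exception e) {
            log.error("elasticsearch search failed, check the ES server", e);
            return null;
        }
        log.info("es total num: {}", page.getTotalHits());
        return Pair.of(page.getTotalHits(),
                page.getSearchHits().stream().map(SearchHit::getContent).collect(Collectors.toList()));
    }
}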
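
selectPage and selectJobPage both end up in nativeSearch. When debugging, it helps to replay the same search with curl or Kibana Dev Tools. The sketch below uses a hypothetical JobQueryDsl helper to build roughly the query DSL that selectJobPage produces; the exact request Spring Data sends may carry extra parameters, and the JSON escaping here only handles \ and ". It also reflects two paging details. The 1-based pageNum becomes Spring's 0-based page, so from = (pageNum - 1) * pageSize. By default, Elasticsearch also rejects requests where from + size exceeds 10,000 (index.max_result_window).

```java
// Hypothetical helper: approximate query DSL for selectJobPage(...), for replaying in curl/Kibana.
class JobQueryDsl {
    // Minimal JSON string escaping (backslash and double quote only).
    static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Same conversion as nativeSearch(): null or 0 -> first page, otherwise pageNum - 1. */
    static int zeroBasedPage(Integer pageNum) {
        return pageNum == null || pageNum == 0 ? 0 : pageNum - 1;
    }

    /** Default index.max_result_window is 10,000; deeper pages need search_after or scroll. */
    static boolean withinDefaultWindow(int from, int size) {
        return from + size <= 10_000;
    }

    static String build(Integer pageNum, int pageSize, String key, String value,
                        String termKey, String termValue) {
        int from = zeroBasedPage(pageNum) * pageSize; // PageRequest(page, size) -> from = page * size
        return "{"
                + "\"from\":" + from + ",\"size\":" + pageSize + ","
                + "\"query\":{\"bool\":{\"must\":["
                + "{\"term\":{\"" + esc(termKey) + "\":\"" + esc(termValue) + "\"}},"
                + "{\"match\":{\"" + esc(key) + "\":\"" + esc(value) + "\"}}"
                + "]}},"
                + "\"sort\":[{\"_score\":{\"order\":\"desc\"}},{\"first_pub_time\":{\"order\":\"desc\"}}]"
                + "}";
    }

    public static void main(String[] args) {
        // Page 2 with 10 hits per page -> "from":10
        System.out.println(build(2, 10, "desc", "plumber", "source", "normal"));
        System.out.println(withinDefaultWindow(zeroBasedPage(2) * 10, 10)); // true
    }
}
```

The printed body can be sent as POST /job_index/_search. The term clause is an exact match, which suits keyword fields such as source, while match runs the field's analyzer (ik_max_word for desc).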
