Chinese Word Segmentation
ES supports Chinese word segmentation through plugins such as smartCN and IK; the IK analyzer is recommended.
Installing IK
GitHub releases page for the open-source IK analyzer: https://github.com/medcl/elasticsearch-analysis-ik/releases
📢📢📢 The IK analyzer version must match the ES version exactly.
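If you are not sure which ES version you are running, you can ask ES directly; the root endpoint returns version.number (shown here as a plain curl call, assuming plain HTTP access is enabled as in this tutorial's single-node test setup):

# The response includes "version": { "number": "8.1.2", ... }
curl http://localhost:9200/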
Using the IK Analyzer Plugin in Docker
# List running containers to find the ES container
root@redis01:~/es-kibana# docker ps
# Enter the container and locate the plugins directory
root@redis01:~/es-kibana# docker exec -it dab88e080a1a bash
elasticsearch@dab88e080a1a:~$ ll
total 912
drwxrwxr-x  1 root          root   4096 Apr 14 07:55 ./
drwxr-xr-x  1 root          root   4096 Mar 30 00:43 ../
-rw-r--r--  1 root          root    220 Mar 30 00:43 .bash_logout
-rw-r--r--  1 root          root   3771 Mar 30 00:43 .bashrc
drwxrwxr-x  3 elasticsearch root   4096 Apr 14 07:55 .cache/
-rw-r--r--  1 root          root    807 Mar 30 00:43 .profile
-r--r--r--  1 root          root   3860 Mar 29 21:18 LICENSE.txt
-r--r--r--  1 root          root 858797 Mar 29 21:22 NOTICE.txt
-r--r--r--  1 root          root   2710 Mar 29 21:18 README.asciidoc
drwxrwxr-x  1 elasticsearch root   4096 Mar 30 00:42 bin/
drwxrwxr-x  4 elasticsearch root   4096 Apr 14 06:43 config/
drwxrwxr-x  5 elasticsearch root   4096 Apr 15 02:49 data/
dr-xr-xr-x  1 root          root   4096 Mar 29 21:25 jdk/
dr-xr-xr-x  3 root          root   4096 Mar 29 21:25 lib/
drwxrwxr-x  1 elasticsearch root   4096 Apr 14 07:55 logs/
dr-xr-xr-x 66 root          root   4096 Mar 29 21:25 modules/
drwxrwxr-x  2 elasticsearch root   4096 Mar 29 21:22 plugins/
elasticsearch@dab88e080a1a:~$ cd plugins/
elasticsearch@dab88e080a1a:~/plugins$ pwd
/usr/share/elasticsearch/plugins
elasticsearch@dab88e080a1a:~/plugins$
Download
Download the IK release matching your ES version (about 4.3 MB).
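If you would rather fetch the archive on the Linux host directly, a sketch with wget follows; the URL assumes the project's usual v&lt;version&gt;/elasticsearch-analysis-ik-&lt;version&gt;.zip asset naming, so verify the exact name on the releases page:

# Example asset URL; confirm it against the releases page before running
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v8.1.2/elasticsearch-analysis-ik-8.1.2.zip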
Unzip
I unzip it on Windows here and rename the folder to ik-8.1.2.
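If you unzip on the Linux host instead, a minimal sketch (assuming the archive name from the download step):

# -d extracts into the target directory, giving the same ik-8.1.2 layout
unzip elasticsearch-analysis-ik-8.1.2.zip -d ik-8.1.2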
Upload the folder to the /root/es-kibana/ directory on the Linux host.
Volume Mount
Map the plugin directory into the container with a Docker volume mount.
# Stop the running ES and Kibana
root@redis01:~/es-kibana# docker compose down
# Edit docker-compose.yml
root@redis01:~/es-kibana# vim docker-compose.yml
docker-compose.yml
version: "1.0" volumes: data: config: networks: # 声明使用网络 es: services: elasticsearch: image: elasticsearch:8.1.2 ports: - "9200:9200" - "9300:9300" networks: - "es" environment: - "discovery.type=single-node" - "ES_JAVA_OPTS=-Xms512m -Xmx512m" volumes: - ./elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml - data:/usr/share/elasticsearch/data - config:/usr/share/elasticsearch/config - ./ik-8.1.2:/usr/share/elasticsearch/plugins/ik-8.1.2 kibana: image: kibana:8.1.2 ports: - "5601:5601" networks: - "es" volumes: - ./kibana.yml:/usr/share/kibana/config/kibana.yml
Start
# -d starts the services in detached (background) mode
root@redis01:~/es-kibana# docker compose up -d
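Once the containers are up, confirm that ES actually loaded the plugin. In Kibana's Dev Tools, the _cat/plugins API lists loaded plugins; if the mount worked you should see an analysis-ik entry:

GET /_cat/plugins?v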
Using IK
IK offers two segmentation granularities:
- ik_smart: the coarsest-grained segmentation
- ik_max_word: the finest-grained segmentation of the text
POST /_analyze
{
  "analyzer": "ik_smart",
  "text": "中华人民共和国国歌"
}

POST /_analyze
{
  "analyzer": "ik_max_word",
  "text": "中华人民共和国国歌"
}
Testing the IK Analyzer
DELETE /test

PUT /test
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

PUT /test/_doc/1
{
  "title": "今天是中华人民共和国成立多少周年,应该放中华人民共和国国歌"
}

GET /test/_search
{
  "query": {
    "term": {
      "title": {
        "value": "共和国"
      }
    }
  }
}
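The term query matches because ik_max_word emitted 共和国 as one of the indexed tokens. To inspect exactly which tokens a field produces, you can run _analyze against the index, which uses the field's configured analyzer:

GET /test/_analyze
{
  "field": "title",
  "text": "中华人民共和国国歌"
}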
Chinese word-level search now works as expected.
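A match query works as well, since the query string itself is segmented with the same analyzer before matching, e.g. on the test index above:

GET /test/_search
{
  "query": {
    "match": {
      "title": "国歌"
    }
  }
}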