udd06

Upravljanje digitalnim dokumentima Elasticsearch Analyzer / Digital documents management Elasticsearch Analyzer

DESCRIPTION:

SerbianAnalyzerElasticSearchPlugin is used for analyzing and preparing input text for indexing.

This plugin is written for Java 13.0.1, Gradle 6.0 and Elasticsearch 7.4.0.

TEXT PROCESS FLOW:

It uses StandardTokenizer for creating tokens out of input text. After tokenizer tokens go trough five filters:

1. LowerCaseFilter - filter transforms all characters from token to lowercase.
2. LatCyrFilter - takes lowercase words, and turns them from cyrilic to latinic.
3. StopFilter - removes the stop words/tokens.
4. SnowballFilter - stems tokens, taking only root of the word.
5. RemoveAccentsFilter - replaces 'Dj' to 'D'.

HOW TO BUILD PLUGIN

To build SerbianAnalyzer you will need Gradle. Position yourself in the root of the project and execute

./gradlew clean build

This will create distribution archive file located in build/distributions/ named serbian-analyzer-1.0-SNAPSHOT.zip.

HOW TO SETUP PLUGIN TO WORK WITH ELASTICSEARCH

Position yourself inside /bin directory of the elasticsearch root and execute

 ./elasticsearch-plugin install file:<absolute path of distribution archive>

Run elasticsearch server

Issue an mapping query on elasticsearch server:

 curl -H 'Content-Type: application/json' -X PUT -D '{"mappings":{"properties":{"content":{"type":"text","fields":{"sr":{"type":"text","analyzer":"serbian"},"en":{"type":"text","analyzer":"english"}}}}}}' http://localhost:9200/tweet

Afterwards, on creation of new tweet document, content field will be analyzed by default analyzer for english and custom serbian, which was installed as a plugin.

HOW TO REMOVE PLUGIN

Position yourself inside /bin directory of the elasticsearch root and execute

./elasticsearch-plugin remove serbian-analyzer

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
gradle/wrapper		gradle/wrapper
src/main		src/main
.gitignore		.gitignore
LICENSE.txt		LICENSE.txt
NOTICE.txt		NOTICE.txt
README.md		README.md
build.gradle		build.gradle
gradlew		gradlew
gradlew.bat		gradlew.bat
settings.gradle		settings.gradle

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

udd06

DESCRIPTION:

TEXT PROCESS FLOW:

HOW TO BUILD PLUGIN

HOW TO SETUP PLUGIN TO WORK WITH ELASTICSEARCH

HOW TO REMOVE PLUGIN

About

Releases

Packages

Contributors 2

Languages

License

chenejac/udd06

Folders and files

Latest commit

History

Repository files navigation

udd06

DESCRIPTION:

TEXT PROCESS FLOW:

HOW TO BUILD PLUGIN

HOW TO SETUP PLUGIN TO WORK WITH ELASTICSEARCH

HOW TO REMOVE PLUGIN

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages