Simple is better than complex
QuickEntity is a Python module designed to help you train your own Named Entity Recognition (NER) model quickly and easily. With QuickEntity, you can customize your NER model by providing your own list of named entities.
You can install QuickEntity by running the following command:
pip install quickentity
- Easy-to-use API for training NER models
- Ability to set language and load custom named entity lists
- Automatic saving of trained model to disk
QuickEntity requires:
- spacy (>= 3.5.0)
- nltk (>=3.7)
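As a quick check before training, the installed versions of these dependencies can be listed with the standard library. This is only a sketch: the helper name installed_version is hypothetical and is not part of QuickEntity.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent.

    Hypothetical helper for illustration; not part of the QuickEntity API.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Minimum versions as listed in the requirements above.
for name, minimum in {"spacy": "3.5.0", "nltk": "3.7"}.items():
    found = installed_version(name)
    status = found if found is not None else "not installed"
    print(f"{name}: {status} (minimum listed: {minimum})")
```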
To use QuickEntity, first import the QuickEntity class:
from quickentity import QuickEntity
Then, create an instance of the QuickEntity class:
phrase = "Steve played a pivotal role in the development of Apple, the company responsible for creating innovative products such as the iPad"
QE = QuickEntity(language="en", phrase=phrase, save_model=False)
The language parameter specifies the language of the text you want to train the model on (default is "en"). The phrase parameter is an example text phrase used to create a Doc object for training. The save_model parameter specifies whether to save the trained model to disk (default is True).
Before training the model, load your entity list using the read_json method:
ent_list = QE.read_json("entities.json")
The named entity list should be a JSON file containing a dictionary that maps each entity to its label, with the prefix B-. Here's an example:
{
"Apple":"B-ORG",
"Steve":"B-PERSON",
"iPad":"B-PRODUCT"
}
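As a minimal sketch, an entity file in this format could be generated and sanity-checked with Python's standard json module. The file name entities.json matches the example above; the helper name write_entity_file and the check on the B- prefix are illustrative assumptions, not part of QuickEntity.

```python
import json
from pathlib import Path


def write_entity_file(entities, path):
    """Write an entity -> label mapping to a JSON file after a basic check.

    Hypothetical helper for illustration; not part of the QuickEntity API.
    """
    for text, label in entities.items():
        # The labels described above carry the "B-" prefix.
        if not label.startswith("B-"):
            raise ValueError(f"label for {text!r} should start with 'B-': {label!r}")
    target = Path(path)
    target.write_text(json.dumps(entities, indent=2), encoding="utf-8")
    return target


entities = {"Apple": "B-ORG", "Steve": "B-PERSON", "iPad": "B-PRODUCT"}
entity_path = write_entity_file(entities, "entities.json")

# Read the file back to confirm it round-trips as a plain dictionary.
loaded = json.loads(entity_path.read_text(encoding="utf-8"))
```

The resulting file can then be passed to read_json as shown earlier.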
Next, process your text data using the process_text method to obtain the list of words, spaces, and entity labels:
model = QE.process_text(ent_list)
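To make "words, spaces, and entity labels" concrete, here is a rough standalone sketch of how a phrase could be aligned with an entity list. It uses a simple regular-expression tokenizer in place of nltk, and the function name label_tokens is hypothetical; QuickEntity's actual implementation may differ.

```python
import re


def label_tokens(phrase, ent_list):
    """Split a phrase into words, trailing-space flags, and entity labels.

    Illustrative only: a regex tokenizer stands in for nltk, and tokens
    that are not in ent_list receive the "O" (outside) label.
    """
    words, spaces, labels = [], [], []
    # \w+ matches a word; [^\w\s] matches a single punctuation character.
    for match in re.finditer(r"\w+|[^\w\s]", phrase):
        token = match.group()
        end = match.end()
        words.append(token)
        # True when the character right after the token is whitespace.
        spaces.append(end < len(phrase) and phrase[end].isspace())
        labels.append(ent_list.get(token, "O"))
    return words, spaces, labels


phrase = "Steve played a pivotal role in the development of Apple."
ent_list = {"Apple": "B-ORG", "Steve": "B-PERSON", "iPad": "B-PRODUCT"}
words, spaces, labels = label_tokens(phrase, ent_list)

# Print only the tokens that received an entity label.
for word, label in zip(words, labels):
    if label != "O":
        print(word, label)
```

The three parallel lists have the same shape as the words, spaces, and ents arguments accepted by spaCy's Doc constructor, which is presumably why the phrase is described as being used to create a Doc object.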
Once you've processed your text data, train the model using the train method:
QE.train(model)
Visualize the results of your model using the show method:
QE.show()
'''
1- install:
pip install quickentity
2- the punkt package from nltk is required for tokenization:
import nltk
nltk.download('punkt')
'''
from quickentity import QuickEntity
words = """ Steve played a pivotal role in the development of Apple,
the company responsible for creating innovative products such as the iPad."""
# configure QuickEntity; phrase is required
# language is "en" by default
# save_model is True by default
QE = QuickEntity(language="en", phrase=words, save_model=True)
# load the entities file in JSON format
ent_list = QE.read_json("ent_list.json")
# process the text data to associate entity labels
model = QE.process_text(ent_list)
# train the model
QE.train(model)
# output :
# file ./train.spacy saved on disk
# view in a jupyter-based notebook.
QE.show()
QuickEntity(language, phrase, save_model)
Create an instance of the QuickEntity class.
- language (string): Language for the NER model. Default is "en".
- phrase (string): Example text used for training.
- save_model (bool): Whether to save the trained model to disk. Default is True.
set_language(language): Set the language of the NER model.
- language (string): Language for the NER model.
read_json(file): Load named entities from a JSON file.
- file (string): Path to the JSON file containing named entities.
process_text(ent_list): Process the entities obtained from the read_json method to obtain the list of words, spaces, and entity labels.
- ent_list (object): Object returned by the read_json method.
train(model): Train the NER model using the processed training data.
- model (object): Object obtained from the process_text method.
show(): Visualize the results of the trained model.
- No parameters.
This project is licensed under the MIT License.