This is a PyQt5 application that provides a user-friendly interface for creating the JSON files needed to fine-tune the Named Entity Recognition (NER) component of a spaCy model. Users manually tag entities in text and save the annotations in JSON format. The tool also supports bulk tagging and undoing and redoing annotations.
Features:
- Load Text Files: Open and display text files for annotation (only `.txt` is currently supported).
- Manual Tagging: Select text and tag it with a specified entity type.
- Bulk Tagging: Automatically tag all occurrences of a specified text with a given entity type.
- Undo/Redo: Undo and redo tagging actions.
- Save Annotations: Save the tagged entities in JSON format.
- Context Menu: Right-click on tagged entities to delete them.
- Status Bar: Display messages in the status bar.
- Credits: Link to the developer's GitHub profile.
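
The Bulk Tagging and Save Annotations features listed above could work roughly as follows. This is a minimal sketch, not the tool's actual code: the function names `bulk_tag` and `to_record` are hypothetical, and the JSON layout (a `text` field plus a list of `[start, end, label]` entities) is an assumption based on the character-offset format commonly used for spaCy NER training data. The file this tool writes may differ.

```python
import json


def bulk_tag(text, phrase, label):
    """Return (start, end, label) for every non-overlapping occurrence of phrase."""
    spans = []
    if not phrase:
        # An empty search string would match everywhere, so tag nothing.
        return spans
    start = text.find(phrase)
    while start != -1:
        end = start + len(phrase)
        spans.append((start, end, label))
        # Continue searching after the match to avoid overlapping spans.
        start = text.find(phrase, end)
    return spans


def to_record(text, spans):
    """Build a JSON-serializable record (assumed layout, see the note above)."""
    return {"text": text, "entities": [list(span) for span in sorted(spans)]}


text = "Acme hired Ana. Ana joined Acme in May."
spans = bulk_tag(text, "Acme", "ORG") + bulk_tag(text, "Ana", "PERSON")
record = to_record(text, spans)
print(json.dumps(record, indent=2))
```

The offsets are character positions with an exclusive end, so `text[start:end]` recovers the tagged string. Matching here is exact and case-sensitive; whether the tool itself matches case-insensitively or on word boundaries is not stated.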
Requirements:
- Python 3.x
- spaCy
- PyQt5
Installation:
- Clone the repository and change into its directory:
  - `git clone https://github.com/MaddyDev-Glitch/ner-finetuning-tool.git`
  - `cd ner-finetuning-tool`
- Create a virtual environment and activate it:
  - `python -m venv venv`
  - `source venv/bin/activate` (on Windows, use `venv\Scripts\activate`)
- Install the dependencies:
  - `pip install -r requirements.txt`
  - `python -m spacy download en_core_web_sm`
- Run the application:
  - `python main.py`
Usage:
- Load a text file:
  - Click the `Open File` button and select a text file to load.
- Manual Tagging:
  - Select text in the main window.
  - Enter the tag name in the `Enter tag here...` box and press `Enter`.
- Bulk Tagging:
  - Enter the text to be tagged in the `Enter text to tag...` box.
  - Enter the tag name in the `Enter tag for bulk text...` box.
  - Click the `Apply Bulk Tag` button.
- Undo/Redo:
  - Use `Ctrl+Z` to undo the last action.
  - Use `Ctrl+Y` to redo the last undone action.
- Save Annotations:
  - Click the `Save to JSON` button and choose where to save the JSON file.
- Delete Tagged Entities:
  - Right-click an entry in the list and select `Delete` to remove the tag.
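
The undo and redo behavior described in the steps above is commonly built on two stacks. The following is a minimal sketch of that general technique, not the tool's actual implementation: the class name `AnnotationHistory` and its methods are hypothetical, and it tracks only tagging actions (deleting a tag through the context menu is not modeled).

```python
class AnnotationHistory:
    """Two-stack undo/redo for tagging actions (illustrative only)."""

    def __init__(self):
        self.spans = []   # current annotations as (start, end, label)
        self._undo = []   # actions that can be undone, most recent last
        self._redo = []   # undone actions that can be reapplied

    def add(self, span):
        self.spans.append(span)
        self._undo.append(span)
        # A new action invalidates anything that was waiting to be redone.
        self._redo.clear()

    def undo(self):
        if not self._undo:
            return None
        span = self._undo.pop()
        self.spans.remove(span)
        self._redo.append(span)
        return span

    def redo(self):
        if not self._redo:
            return None
        span = self._redo.pop()
        self.spans.append(span)
        self._undo.append(span)
        return span


history = AnnotationHistory()
history.add((0, 4, "ORG"))
history.add((11, 14, "PERSON"))
history.undo()   # removes the PERSON tag (what Ctrl+Z would do)
history.redo()   # restores it (what Ctrl+Y would do)
print(history.spans)
```

Clearing the redo stack whenever a new tag is added keeps the history linear, which is the conventional behavior in editors; whether this tool does the same is an assumption.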
Developed by MaddyDev-Glitch ✨
This project is licensed under the Apache License 2.0; see the LICENSE file for details.