Training of multi-label embeddings with k-shingled input sequences

ulf1/torch-multilabel-embedding

torch-multilabel-embedding

The package contains a PyTorch class to train an embedding matrix for multi-label inputs, i.e., instead of one ID per token (one-hot encoding), N IDs per token can be provided as model input.

A TensorFlow2/Keras implementation can be found here: https://github.com/ulf1/keras-multilabel-embedding (pip install keras-multilabel-embedding)
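
To make the idea of "N IDs per token" concrete, here is a minimal sketch in plain Python (no PyTorch) of what a multi-label lookup could compute. The helper names `make_embedding` and `multilabel_lookup` are hypothetical, and averaging the selected rows is an assumption made for illustration; the README does not state how the package combines the rows.

```python
import random


def make_embedding(vocab_size, embed_size, seed=42):
    """Create a vocab_size x embed_size matrix of random floats (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(embed_size)]
            for _ in range(vocab_size)]


def multilabel_lookup(emb, ids):
    """Combine the embedding rows selected by all IDs of one token.

    ASSUMPTION: rows are averaged. The actual package may sum them
    or combine them in some other way.
    """
    dim = len(emb[0])
    return [sum(emb[i][d] for i in ids) / len(ids) for d in range(dim)]


emb = make_embedding(vocab_size=5, embed_size=3)

# Same ID lists as in the Usage section: 4 tokens, 3 IDs per token
x_ids = [[1, 2, 4], [0, 1, 2], [2, 1, 4], [3, 2, 1]]
y = [multilabel_lookup(emb, ids) for ids in x_ids]

print(len(y), len(y[0]))  # prints "4 3": one vector of length embed_size per token
```

With a one-hot input each token would select exactly one row of the matrix. Here each token selects several rows, so tokens that share IDs also share part of their representation.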

Usage

import torch_multilabel_embedding as tml
import torch

# a sequence of multi-label data points
x_ids = [[1, 2, 4], [0, 1, 2], [2, 1, 4], [3, 2, 1]]
x_ids = torch.tensor(x_ids)

# initialize layer
layer = tml.MultiLabelEmbedding(
    vocab_size=5, embed_size=300, random_state=42)

# predict
y = layer(x_ids)
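
The repository description mentions k-shingled input sequences. As a rough sketch of how ID lists like `x_ids` might be produced, the plain-Python example below splits each token into character k-shingles (contiguous substrings of length k) and maps every distinct shingle to an integer ID. The functions `shingles` and `build_vocab` are hypothetical, and the use of character-level shingles is an assumption; the README does not describe the actual preprocessing.

```python
def shingles(token, k):
    """Return all contiguous character substrings of length k in token."""
    return [token[i:i + k] for i in range(len(token) - k + 1)]


def build_vocab(tokens, k):
    """Assign an integer ID to each distinct shingle, in order of first appearance."""
    vocab = {}
    for token in tokens:
        for s in shingles(token, k):
            vocab.setdefault(s, len(vocab))
    return vocab


# ASSUMPTION: equal-length tokens, so every token yields the same number of IDs
tokens = ["hello", "jello", "cello"]
k = 3

vocab = build_vocab(tokens, k)
ids = [[vocab[s] for s in shingles(t, k)] for t in tokens]

print(ids)         # prints [[0, 1, 2], [3, 1, 2], [4, 1, 2]]
print(len(vocab))  # prints 5
```

Tokens of different lengths would yield ID lists of different lengths, which would need padding or truncation to a fixed N before they can be stacked into a tensor as in the usage example.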

Appendix

Installation

The torch-multilabel-embedding package is available on PyPI and can also be installed directly from the git repo:

pip install torch-multilabel-embedding
pip install git+ssh://git@github.com/ulf1/torch-multilabel-embedding.git

Install a virtual environment

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
pip install -r requirements-demo.txt --no-cache-dir

(If your git repo is stored in a folder whose path contains whitespace, don't use the subfolder .venv. Use an absolute path without whitespace instead.)

Python commands

  • Jupyter for the examples: jupyter lab
  • Check syntax: flake8 --ignore=F401 --exclude=$(grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g')
  • Run Unit Tests: PYTHONPATH=. pytest

Publish

pandoc README.md --from markdown --to rst -s -o README.rst
python setup.py sdist 
twine upload -r pypi dist/*

Clean up

find . -type f -name "*.pyc" -delete
find . -type d -name "__pycache__" -prune -exec rm -r {} +
rm -r .pytest_cache
rm -r .venv

Support

Please open an issue for support.

Contributing

Please contribute using GitHub Flow. Create a branch, add commits, and open a pull request.
