
Add Onnx support for Canine Model #1987

Draft · wants to merge 1 commit into main
Conversation

RaghavPrabhakar66

What does this PR do?

Add support for Canine.

To run the model for Sequence Classification or Token Classification, make a temporary change here, from

VOCAB_SIZE = "vocab_size"

to

TYPE_VOCAB_SIZE = "vocab_size"

Fixes #555

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@ozancaglayan

@RaghavPrabhakar66 RaghavPrabhakar66 changed the title [WIP] Add Onnx support for Canine Model Add Onnx support for Canine Model Aug 12, 2024
@ozancaglayan

Hi there. Thanks a lot! I have another set of changes for this in my private branch, but I think the main problem we may be hitting here is the ad-hoc implementation of local attention in the Canine model, which does data-dependent Pythonic computations that are not properly traced. I think the shape incompatibility may be because tracing sticks to the dummy inputs given at tracing time, which no longer hold later?

We'd benefit from input from a more experienced dev here.

@ozancaglayan

ozancaglayan commented Aug 12, 2024

For my own PR, the major difference was adding a NormalizedTextConfig subclass for the vocabulary issue. This model does not have an explicit vocabulary, and hence no vocab_size attribute in its model config. The vocab size is the number of Unicode code points, which is hardcoded as 1114112 in the model. The class below implements a fake vocab_size attribute for the config. I think TYPE_VOCAB_SIZE is the size of the token_type_ids choices, which is 16 in the model config:

In [13]: AutoModel.from_pretrained("google/canine-s").config.type_vocab_size
Out[13]: 16

In [14]: CanineTokenizer().vocab_size
Out[14]: 1114112

so this is what I did for my patch:

class CanineNormalizedTextConfig(NormalizedTextConfig):
    def __getattr__(self, attr_name):
        # Canine has no explicit vocabulary: fake vocab_size as the number
        # of Unicode code points (0x10FFFF + 1), hardcoded in the model.
        if attr_name.upper() == 'VOCAB_SIZE':
            return 1114112
        return super().__getattr__(attr_name)
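To see why this fallback works outside of optimum, here is a minimal, self-contained sketch. The `NormalizedTextConfig` stand-in below is a hypothetical simplification of the real class in `optimum.utils` (which resolves lower-case attribute names through upper-cased class-level mappings onto the wrapped model config); `FakeCanineConfig` is likewise a stub standing in for the real google/canine-s config:

```python
class NormalizedTextConfig:
    # Hypothetical, trimmed-down stand-in for optimum's NormalizedTextConfig:
    # upper-cased class attributes name the fields on the wrapped model config.
    VOCAB_SIZE = "vocab_size"
    TYPE_VOCAB_SIZE = "type_vocab_size"

    def __init__(self, config):
        self.config = config

    def __getattr__(self, attr_name):
        # Resolve e.g. `vocab_size` via the class-level VOCAB_SIZE mapping.
        # getattr on the class walks the MRO and never re-enters this method.
        mapped_name = getattr(type(self), attr_name.upper(), None)
        if mapped_name is None:
            raise AttributeError(attr_name)
        return getattr(self.config, mapped_name)


class CanineNormalizedTextConfig(NormalizedTextConfig):
    def __getattr__(self, attr_name):
        # Canine's config has no vocab_size field: report the hardcoded
        # number of Unicode code points the character tokenizer covers.
        if attr_name.upper() == "VOCAB_SIZE":
            return 1114112
        return super().__getattr__(attr_name)


class FakeCanineConfig:
    # Stub config: the real google/canine-s config has type_vocab_size == 16
    # but no vocab_size attribute at all.
    type_vocab_size = 16


normalized = CanineNormalizedTextConfig(FakeCanineConfig())
print(normalized.vocab_size)       # 1114112, despite the config lacking it
print(normalized.type_vocab_size)  # 16, resolved through the normal mapping
```

In an actual exporter config, the subclass would then be wired into the Canine ONNX config class (optimum's convention is a `NORMALIZED_CONFIG_CLASS` class attribute), so the exporter sees a vocab_size without the temporary edit mentioned in the PR description.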

Successfully merging this pull request may close these issues.

Community contribution - optimum.exporters.onnx support for new models!