Automated Parser Generation and Serialization #333
base: develop
…port for multiple abbreviation dicts, edge case handling and exception handling
…ldingMOTIF into dliu-parser-generations
…into dliu-parser-generations
…import, modify parser serialization to include list of parsed labels and serialized parsers
Hey @dllliu and @TShapinsky -- does this incorporate changes from any outstanding PRs? I want to make sure my review sticks to @dllliu's code
There's a cherry-picked commit from the create parser UI branch which fixes the serialization. Besides that, no
Great work, @dllliu! Let me know if you have any questions on my comments.
from buildingmotif.label_parsing.tools import abbreviationsTool, codeLinter


class SerializedParserMetrics:
Let me know if you had something else in mind, but I'm wondering why this class is building the parser. I think this metrics class is better suited as an output of another method which builds the parsers.
I am not sure if I am misinterpreting, but the generate_parsers_for_points() and generate_parsers_for_clusters() methods in __init__ are building the parsers, while the class just populates the relevant instance variables with the appropriate information.
Yes, but these are called from this class. I think the entry point should be something like a ParserBuilder class which calls those generation methods and emits the Metrics class. We are not constructing the metrics. We are constructing parsers! The metrics are a side effect of this.
@gtfierro The ParserBuilder class now emits a ParserMetrics class which takes care of serialization and gathering parsing-related metrics. The ParserBuilder class still keeps track of the clustering and distance metrics, as that is directly related to parser generation. The notebook and tests have also been updated; let me know what you think.
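For readers following the thread, the split discussed above (a builder that constructs parsers and emits a metrics object as a by-product) might look roughly like the sketch below. Only the class names ParserBuilder and ParserMetrics come from the conversation; the fields, the build() and serialize() methods, and the toy delimiter-based "parsers" are assumptions made for illustration and are not the PR's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in: a "parser" here is just a callable that splits a label
# into tokens. The real parsers generated in the PR are not shown in this thread.
Parser = Callable[[str], List[str]]


@dataclass
class ParserMetrics:
    """Output object: holds parsing results and serializes them (assumed shape)."""

    parsers: Dict[str, Parser]
    parsed_labels: Dict[str, List[str]] = field(default_factory=dict)
    unparsed_labels: List[str] = field(default_factory=list)

    def serialize(self) -> Dict[str, object]:
        # Emit plain data only; parsers are represented by their names.
        return {
            "parsers": sorted(self.parsers),
            "parsed_labels": self.parsed_labels,
            "unparsed_labels": self.unparsed_labels,
        }


class ParserBuilder:
    """Entry point: constructs parsers, then emits metrics as a by-product."""

    def __init__(self, labels: List[str]) -> None:
        self.labels = labels

    def _generate_parsers(self) -> Dict[str, Parser]:
        # Placeholder for the PR's generation methods; the real logic
        # (clustering, distance metrics) is not reproduced here.
        return {
            "dot": lambda label: label.split("."),
            "underscore": lambda label: label.split("_"),
        }

    def build(self) -> ParserMetrics:
        parsers = self._generate_parsers()
        metrics = ParserMetrics(parsers=parsers)
        for label in self.labels:
            for parser in parsers.values():
                tokens = parser(label)
                if len(tokens) > 1:  # treat a successful split as "parsed"
                    metrics.parsed_labels[label] = tokens
                    break
            else:
                # No parser produced more than one token for this label.
                metrics.unparsed_labels.append(label)
        return metrics
```

With this shape, a caller would write something like `ParserBuilder(labels).build()` and then work with the returned metrics object, which keeps parser construction and metrics reporting in separate classes.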
pyproject.toml
Outdated
abbreviations = "^0.2.5"
langchain = "^0.2.3"
langchain-community = "^0.2.4"
pyenchant = "^3.2.2"
scikit-learn = "^1.5.0"
Can we gate your dependencies behind a feature flag? Call it something like label_parsing.
The latest commit also addresses this; please let me know if this is resolved.
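Gating dependencies behind an optional feature usually has a runtime side as well: code that needs the optional packages checks for them and fails with a clear message when they are absent. The sketch below shows one way that check could be written. The helper names and the error text are hypothetical, the extra name label_parsing is only the reviewer's suggestion, and the thread does not show how the PR actually implemented this.

```python
import importlib.util
from typing import Iterable, List

# Import names for some of the dependencies listed in the diff. The import name
# can differ from the distribution name (scikit-learn -> sklearn,
# pyenchant -> enchant). This tuple is an illustrative assumption.
LABEL_PARSING_MODULES = ("langchain", "langchain_community", "enchant", "sklearn")


def missing_modules(modules: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def require_label_parsing(modules: Iterable[str] = LABEL_PARSING_MODULES) -> None:
    """Raise ImportError with an actionable message if anything is missing."""
    missing = missing_modules(modules)
    if missing:
        raise ImportError(
            "label parsing needs optional dependencies that are not installed: "
            + ", ".join(missing)
            + ". Install the 'label_parsing' extra (hypothetical name)."
        )
```

Calling `require_label_parsing()` at the top of the label-parsing entry point would let the rest of the package import cleanly for users who never install the optional packages.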
… on develop, standardize documentation
…reviation recognition