Automated Parser Generation and Serialization #333

dllliu · 2024-07-09T20:56:30Z

Package parser, serialization, and metrics work into a SerializedParserMetrics class
Preprocessing and clustering of building point labels with DBSCAN
Parsers are run via dynamic module loading at runtime to obtain emitted tokens
Support for multiple abbreviation lists (tools to merge/sort abbreviations)
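
For readers unfamiliar with the clustering step, here is a minimal sketch of density-based clustering over point labels. It is not the PR's implementation: the PR depends on scikit-learn, while this sketch swaps in a small pure-Python DBSCAN so that it runs without extra packages. The distance function (difflib similarity), the `eps` and `min_samples` values, and the sample labels are all assumptions made for illustration.

```python
from collections import deque
from difflib import SequenceMatcher
from typing import List, Optional


def label_distance(a: str, b: str) -> float:
    """Assumed distance: 1 minus difflib's similarity ratio (0.0 means identical)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def dbscan(labels: List[str], eps: float, min_samples: int) -> List[int]:
    """Return one cluster id per label; -1 marks noise."""
    n = len(labels)
    # Neighborhoods include the point itself, as in the usual DBSCAN definition.
    neighbors = [
        [j for j in range(n) if label_distance(labels[i], labels[j]) <= eps]
        for i in range(n)
    ]
    assignment: List[Optional[int]] = [None] * n  # None means "not visited yet"
    cluster_id = -1
    for i in range(n):
        if assignment[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            assignment[i] = -1  # provisional noise; may become a border point later
            continue
        cluster_id += 1
        assignment[i] = cluster_id
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if assignment[j] == -1:
                assignment[j] = cluster_id  # noise reachable from a core point: border
            if assignment[j] is not None:
                continue
            assignment[j] = cluster_id
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point, so keep expanding
    return [a if a is not None else -1 for a in assignment]


# Hypothetical point labels, not taken from the PR's data.
sample_labels = [
    "AHU1.SAT",
    "AHU2.SAT",
    "VAV-101:ZoneTemp",
    "AHU3.SAT",
    "VAV-102:ZoneTemp",
    "chiller",
]
clusters = dbscan(sample_labels, eps=0.3, min_samples=2)
```

Labels that share a naming pattern end up in the same cluster, and a label with no close neighbors is marked as noise, which corresponds to the "not enough points to cluster" case listed in the checklist.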

Checklist

Example notebook showcasing usage
Unit tests for SerializedParserMetrics class
Detailed documentation for relevant functions and the SerializedParserMetrics class
Exception handling for file/IO errors, too few points to cluster, and invalid LLM output
Parser and Serializer work has been combined

Notes

scikit-learn requires Python 3.9 or later
Other added packages are: langchain, langchain-community, pyenchant, scikit-learn
Running all unit tests will take longer (SerializedParserMetrics class takes time to populate)
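
As an illustration of the runtime module loading mentioned in the description, the sketch below writes parser source to a temporary file and imports it with the standard library's importlib. The `parse` function, the module name, and the token format are hypothetical stand-ins; the PR's generated parsers are built from BuildingMOTIF's combinators and may well be loaded differently.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType
from typing import List, Tuple

# Hypothetical generated parser source; a real one would use parser combinators.
GENERATED_SOURCE = '''
def parse(label):
    # Split on a delimiter and tag every part as a generic token.
    return [("token", part) for part in label.split(".")]
'''


def load_module_from_source(source: str, module_name: str = "generated_parser") -> ModuleType:
    """Write source to a temporary .py file and import it as a module."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{module_name}.py"
        path.write_text(source)
        spec = importlib.util.spec_from_file_location(module_name, str(path))
        if spec is None or spec.loader is None:
            raise ImportError(f"could not create an import spec for {path}")
        module = importlib.util.module_from_spec(spec)
        # The module body runs here, while the file still exists.
        spec.loader.exec_module(module)
    return module


def emitted_tokens(source: str, label: str) -> List[Tuple[str, str]]:
    """Load the parser defined in source and run it on a single label."""
    module = load_module_from_source(source)
    return module.parse(label)
```

Because the module body executes arbitrary code, this approach is only appropriate for source the caller trusts or has validated first.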

gtfierro · 2024-07-10T21:31:01Z

Hey @dllliu and @TShapinsky -- does this incorporate changes from any outstanding PRs? I want to make sure my review sticks to @dllliu's code

TShapinsky · 2024-07-11T00:05:34Z

> Hey @dllliu and @TShapinsky -- does this incorporate changes from any outstanding PRs? I want to make sure my review sticks to @dllliu's code

There's a cherry-picked commit from the create parser UI branch which fixes the serialization. Besides that, no.

gtfierro

Great work, @dllliu! Let me know if you have any questions on my comments.

notebooks/examples/basic.csv

buildingmotif/label_parsing/SerializedParserMetrics.py

gtfierro · 2024-07-11T15:21:02Z

buildingmotif/label_parsing/SerializedParserMetrics.py

+from buildingmotif.label_parsing.tools import abbreviationsTool, codeLinter
+
+
+class SerializedParserMetrics:


Let me know if you had something else in mind, but I'm wondering why this class is building the parser. I think this metrics class is better suited as an output of another method which builds the parsers.

dllliu · 2024-07-12

I am not sure if I am misinterpreting, but the generate_parsers_for_points() and generate_parsers_for_clusters() methods in __init__ are building the parsers, while the class just populates the relevant instance variables with the appropriate information.

gtfierro · 2024-07-12

Yes, but these are called from this class. I think the entrypoint should be something like a ParserBuilder class which calls those generation methods, and emits the Metrics class. We are not constructing the metrics. We are constructing parsers! The metrics are a side-effect of this.

dllliu · 2024-07-12

@gtfierro The ParserBuilder class now emits a ParserMetrics class which takes care of serialization and gathering parsing-related metrics. The ParserBuilder class still keeps track of the clustering and distance metrics, as that is directly related to parser generation. The notebook and tests have also been updated. Let me know what you think.
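
The separation discussed in this thread, a builder that constructs parsers and emits a metrics object as a by-product, can be sketched roughly as follows. Only the class names ParserBuilder and ParserMetrics come from the thread; every field, method, and the delimiter-splitting "parser" are hypothetical and much simpler than what the PR implements.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Parser = Callable[[str], List[str]]


@dataclass
class ParserMetrics:
    """Holds the results of running a parser over labels (hypothetical fields)."""

    parsed: Dict[str, List[str]] = field(default_factory=dict)
    unparsed: List[str] = field(default_factory=list)

    @property
    def parsed_fraction(self) -> float:
        total = len(self.parsed) + len(self.unparsed)
        return len(self.parsed) / total if total else 0.0


class ParserBuilder:
    """Entry point: builds the parser, then emits metrics as an output."""

    def __init__(self, labels: Iterable[str], delimiter: str = ".") -> None:
        self.labels = list(labels)
        self.delimiter = delimiter
        self.parser = self._build_parser()

    def _build_parser(self) -> Parser:
        delimiter = self.delimiter

        def parse(label: str) -> List[str]:
            # Stand-in for a generated parser: an empty list means "no parse".
            return label.split(delimiter) if delimiter in label else []

        return parse

    def emit_metrics(self) -> ParserMetrics:
        metrics = ParserMetrics()
        for label in self.labels:
            tokens = self.parser(label)
            if tokens:
                metrics.parsed[label] = tokens
            else:
                metrics.unparsed.append(label)
        return metrics
```

With this shape, callers construct a ParserBuilder and treat the metrics as something they ask for afterwards, rather than constructing a metrics object that builds parsers internally.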

buildingmotif/label_parsing/combinators.py

buildingmotif/label_parsing/docs/usage.py

gtfierro · 2024-07-11T15:27:39Z

pyproject.toml

+abbreviations = "^0.2.5"
+langchain = "^0.2.3"
+langchain-community = "^0.2.4"
+pyenchant = "^3.2.2"
+scikit-learn = "^1.5.0"


Can we gate your dependencies behind a feature flag? Call it something like label_parsing

dllliu · 2024-07-12

The latest commit also addresses this. Please let me know if this is resolved.
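
One common companion to gating dependencies behind an optional feature is a runtime check that fails with a clear message when the optional packages are absent. The sketch below is an assumption about how that could look, not what the PR does; the helper names are hypothetical, and the extra name `label_parsing` is only the name suggested in the review comment above.

```python
import importlib.util
from typing import Dict, List

# Distribution name -> import name, for the packages listed in the PR notes.
LABEL_PARSING_PACKAGES: Dict[str, str] = {
    "langchain": "langchain",
    "langchain-community": "langchain_community",
    "pyenchant": "enchant",
    "scikit-learn": "sklearn",
}


def missing_packages(requirements: Dict[str, str]) -> List[str]:
    """Return the distribution names whose import module cannot be found."""
    return [
        dist
        for dist, module in requirements.items()
        if importlib.util.find_spec(module) is None
    ]


def require_extra(requirements: Dict[str, str], extra: str) -> None:
    """Raise ImportError naming the missing packages and the extra that provides them."""
    missing = missing_packages(requirements)
    if missing:
        raise ImportError(
            f"Missing optional packages: {', '.join(missing)}. "
            f"Install them with the '{extra}' extra."
        )
```

A label-parsing module could call `require_extra(LABEL_PARSING_PACKAGES, "label_parsing")` at import time so that users without the optional packages get an actionable error.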

notebooks/examples/usage.md

dllliu and others added 13 commits July 1, 2024 11:34

parser generation for clusters and three modes: combined, separate, terminal

728dada

package parser and cluster into a class for easier serialization, support for multiple abbreviation dicts, edge case handling and exception handling

afe5ba0

Delete buildingmotif/label_parsing/compare_parsers.py

dd67dd0

Delete buildingmotif/label_parsing/cluster.py

e3c665f

make abbreviations tool and write complete documentation for all code

8e08db9

Merge branch 'dliu-parser-generations' of https://github.com/NREL/BuildingMOTIF into dliu-parser-generations

bc367af

fix imports and add notebook

ba4aa37

remove uneeded files

0e9976c

prep for serialization

b3490b6

use inspect to help generate argument serialization

045744b

Merge branch 'serialize_fix' of https://github.com/NREL/BuildingMOTIF into dliu-parser-generations

42a065f

add unit tests, made usage document for llm loaded bia buildingmotif import, modify parser serialization to include list of parsed labels and serialized parsers

a7fb0ac

remove nltk, matplotlib, seaborn

2231299

dllliu requested a review from gtfierro July 9, 2024 20:56

dllliu added 2 commits July 9, 2024 16:18

fix styling

12f863f

style label parsing

ffac50e

gtfierro requested changes Jul 11, 2024

dllliu added 3 commits July 11, 2024 16:23

changes requested, rename files/folders, restore combinators.py based on develop, standardize documentation

b5a9dc9

ParserBuilder now emits ParserMetrics class, updated notebook and tests

8bdac3e

fix issue with abbreviations when writing to directory and update abbreviation recognition

ef1ff91
