[mysql/Oracle] Add ports for sql/mysql/Oracle, and add a better test suite for the grammar. #4278

Merged Oct 23, 2024 (21 commits)

@kaby76 commented Oct 10, 2024

This PR changes the sql/mysql/Oracle grammar. The grammar was refactored into "target agnostic format", and ports were added for Antlr4ng, CSharp, Java, Python3, and TypeScript. (Ports for other targets are left as an exercise for anyone interested.)

Grammar changes

Target agnostic format is the preferred format for grammars-v4 grammars. It allows a single copy of the .g4 files to be shared across targets, which is very important for keeping grammar fixes from diverging between targets.

Target agnostic means that the Antlr actions are still written in a target language, but in a syntax palatable across targets. The only syntax that works is a method call. For example, instead of BIT_OR_SYMBOL : B I T '_' O R { this.type = this.determineFunction(MySQLLexer.BIT_OR_SYMBOL); };, the code must be encapsulated in a method call: BIT_OR_SYMBOL : B I T '_' O R { this.doBitOr(); };. Expressions that use operators (e.g., >=, <, !, &&, ||) or constants like "true" and "false" cannot be used directly in Antlr actions because their syntax can differ across targets. Assignments are also not allowed because the name of the field varies across runtimes.
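As a rough illustration of the encapsulation described above, the action logic moves out of the grammar and into a base-class method. The sketch below uses Python3 with a stand-in lexer class instead of the real ANTLR runtime. LexerStub, the token type numbers, and the simplified body of determineFunction are assumptions made for this example, not the code in this PR.

```python
# Hypothetical sketch: LexerStub stands in for a generated lexer so the
# example runs without the ANTLR runtime installed.
class LexerStub:
    # Token type numbers are made up for this example.
    IDENTIFIER = 1
    BIT_OR_SYMBOL = 2

    def __init__(self, text):
        self.text = text   # full input
        self.pos = 0       # index just past the token that was matched
        self.type = None   # token type the lexer will emit


class MySQLLexerBase(LexerStub):
    def determineFunction(self, proposed):
        # Simplified assumption: keep the proposed keyword type only when
        # the next non-blank character is an opening parenthesis.
        rest = self.text[self.pos:].lstrip()
        return proposed if rest.startswith("(") else LexerStub.IDENTIFIER

    def doBitOr(self):
        # The assignment and the constant live here, so the grammar action
        # can be the plain call `{ this.doBitOr(); }`.
        self.type = self.determineFunction(LexerStub.BIT_OR_SYMBOL)


# Usage: pretend the lexer has just matched the six characters "BIT_OR".
lexer = MySQLLexerBase("BIT_OR (x)")
lexer.pos = len("BIT_OR")
lexer.doBitOr()
```

With the operator-free call in the grammar, each port only has to supply its own doBitOr in its base class.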

However, target agnostic format does not work perfectly because method calls themselves still have different syntax across targets. A method call is this->isThisFine() in C++ and self.isThisFine() in Python3. In C# and Java, the call can be written this.isThisFine() or isThisFine(), whereas in TypeScript the this. must appear, as in this.isThisFine(). In addition, an Antlr4 @header is required in some targets, but target-specific code cannot be placed in the grammar. To get around these syntactic issues, transformGrammar.py is used to modify the .g4s for the specific target.
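The kind of rewrite such a script performs can be sketched in a few lines. This is a minimal illustration for the Python3 target only; the function name to_python3_actions is hypothetical, and the actual transformGrammar.py in the repository may do more or work differently.

```python
def to_python3_actions(grammar_text):
    """Rewrite `this.` receivers in grammar text for the Python3 target.

    Hypothetical helper for illustration; not the script from this PR.
    """
    # Handle negated predicates first so no stray `!` is left behind.
    text = grammar_text.replace("!this.", "not self.")
    # Every remaining `this.` receiver becomes `self.`.
    return text.replace("this.", "self.")


# Usage: the shared action from the grammar, rewritten for Python3.
rule = "BIT_OR_SYMBOL : B I T '_' O R { this.doBitOr(); };"
python_rule = to_python3_actions(rule)
```

A plain string replacement like this would also touch a this. that appears inside a string literal or comment, which is acceptable only as long as the grammar contains none.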

The start rule for the grammar was changed to allow multiple statements per file, separated by semicolons.

Base class file names

The base class file names follow the standard used throughout grammars-v4: <grammar name> ('Parser' | 'Lexer') 'Base' <target-specific file extension>. MySQLBaseRecognizer is the parser base class, but Lexer and Parser both inherit from a Recognizer. I.e., they are all "recognizers."
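The naming pattern can be spelled out as a small helper. This is only a sketch: base_class_file and the EXTENSIONS table are hypothetical names, and the grammar name "MySQL" is an assumption taken from the MySQLLexer identifier mentioned earlier.

```python
# Source file extension per target; Antlr4ng and TypeScript both use .ts.
EXTENSIONS = {
    "Antlr4ng": ".ts",
    "CSharp": ".cs",
    "Java": ".java",
    "Python3": ".py",
    "TypeScript": ".ts",
}


def base_class_file(grammar, kind, target):
    """Build <grammar name> ('Parser' | 'Lexer') 'Base' <extension>."""
    if kind not in ("Parser", "Lexer"):
        raise ValueError("kind must be 'Parser' or 'Lexer'")
    return f"{grammar}{kind}Base{EXTENSIONS[target]}"


# Usage: expected base class file names for two of the ports.
java_lexer = base_class_file("MySQL", "Lexer", "Java")
python_parser = base_class_file("MySQL", "Parser", "Python3")
```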

Old files saved

The old Antlr4ng grammar and "TypeScript/" code were moved to "original/".

A new test suite added

The original grammar and its tester "demo.ts" tested less than 5% of the grammar. (dotnet trcover -i "select *, _latin1'😁' from sakila.actor where actor_id = 1;") This PR adds the tests from Positive-Technologies/examples/*.sql. The driver code required some initialization settings for the lexer and parser, and these are added for each port. (Initialization code should be in the base class constructors.) Many SQL statements were commented out because they would not parse; see the "#NB" comment lines in the test files.

NB: Someone will need to verify that the Oracle grammar is correct and that the commented-out SQL statements are in fact invalid.

@kaby76 changed the title from "[mysql/Oracle] Port sql/mysql/Oracle to CSharp." to "[mysql/Oracle] Add ports for sql/mysql/Oracle, and add a better test suite for the grammar." Oct 21, 2024
@kaby76 marked this pull request as ready for review October 23, 2024 00:21
@teverett

@kaby76 thanks!

@teverett merged commit 9f95ea4 into antlr:master Oct 23, 2024
19 checks passed