[mysql/Oracle] Add ports for sql/mysql/Oracle, and add a better test suite for the grammar. #4278

kaby76 · 2024-10-10T11:00:43Z

This PR changes the sql/mysql/Oracle grammar. The grammar was refactored into "target agnostic format", and ports were added for Antlr4ng, CSharp, Java, Python3, and TypeScript. (Ports for other targets are left as an exercise for anyone interested.)

Grammar changes

Target agnostic format is the preferred format for grammars-v4 grammars. It allows a single copy of the .g4 files to be shared across targets, which is important for keeping fixes to the grammar from diverging between targets.

Target agnostic means that the Antlr actions are still target-language code, but restricted to a syntax that is valid across targets. The only syntax that works is a method call. For example, instead of BIT_OR_SYMBOL : B I T '_' O R { this.type = this.determineFunction(MySQLLexer.BIT_OR_SYMBOL); };, the code must be encapsulated in a method call: BIT_OR_SYMBOL : B I T '_' O R { this.doBitOr(); };. Expressions that use operators (e.g., >=, <, !, &&, ||) or constants like "true" and "false" cannot be used directly in Antlr actions because their syntax can differ across targets. Assignments are also not allowed because the name of the field varies across runtimes.
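To illustrate where the encapsulated code ends up, here is a minimal Python sketch of a lexer base class. It is not the code from this PR: the class is a stand-in that does not use the Antlr runtime, the token type numbers are made up, and determineFunction is reduced to an assumed "is the next character an opening parenthesis" check.

```python
# Hypothetical stand-in for a lexer base class; not the code from this PR.
IDENTIFIER = 1       # assumed token type numbers, for illustration only
BIT_OR_SYMBOL = 2


class MySQLLexerBase:
    """Holds the target-specific logic that the grammar actions call into."""

    def __init__(self, lookahead: str):
        self.lookahead = lookahead  # stand-in for the lexer's input stream
        self.type = BIT_OR_SYMBOL   # token type proposed by the matched rule

    def determineFunction(self, proposed: int) -> int:
        # Simplified assumption: keep the proposed type only when an opening
        # parenthesis follows; otherwise fall back to a plain identifier.
        return proposed if self.lookahead.startswith("(") else IDENTIFIER

    def doBitOr(self) -> None:
        # The assignment and the token constant live here, so the .g4 action
        # only needs the method call.
        self.type = self.determineFunction(BIT_OR_SYMBOL)
```

With this shape, the action in the grammar contains no assignment, operator, or constant, only the call to doBitOr().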

However, target agnostic format does not work perfectly because method calls still have different syntax across targets. A method call for C++ is this->isThisFine(), and for Python3 it is self.isThisFine(). For C# and Java, the method call can be this.isThisFine() or isThisFine(), whereas in TypeScript the this. must appear, as in this.isThisFine(). An Antlr4 @header is required in some targets, but target-specific code cannot be placed in the grammar. To get around these syntactic issues, transformGrammar.py is used to modify the .g4 files for the specific target.
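As a rough idea of the kind of rewrite such a script performs, the following sketch converts this. to self. inside actions for the Python3 target. It is an assumption about the approach, not the contents of the actual transformGrammar.py; the function name to_python3 is hypothetical, and the regular expression assumes actions contain no nested braces.

```python
import re

# Matches one brace-delimited action with no nested braces
# (a simplifying assumption for this sketch).
ACTION = re.compile(r"\{[^{}]*\}")


def to_python3(grammar_text: str) -> str:
    """Rewrite 'this.' to 'self.' inside actions; leave everything else as-is."""
    return ACTION.sub(
        lambda m: re.sub(r"\bthis\.", "self.", m.group(0)),
        grammar_text,
    )


rule = "BIT_OR_SYMBOL : B I T '_' O R { this.doBitOr(); };"
converted = to_python3(rule)
# converted is "BIT_OR_SYMBOL : B I T '_' O R { self.doBitOr(); };"
```

Restricting the substitution to the text between braces keeps rule bodies and string literals outside actions unchanged.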

The start rule for the grammar was changed to allow multiple statements per file, separated by semicolons.

Base class file names

The base class file names follow the standard used throughout grammars-v4: <grammar name> ('Parser' | 'Lexer') 'Base' <target-specific file extension>. MySQLBaseRecognizer was the name of the parser base class, but that name is ambiguous because the Lexer and the Parser both inherit from a Recognizer; i.e., they are all "recognizers."
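The naming pattern can be spelled out as a small helper. This is only an illustration: the function name base_class_file is hypothetical, and the extension table is assumed from the usual file extensions of each target.

```python
# File extensions assumed from the usual conventions of each target.
EXTENSIONS = {
    "Antlr4ng": ".ts",
    "CSharp": ".cs",
    "Java": ".java",
    "Python3": ".py",
    "TypeScript": ".ts",
}


def base_class_file(grammar: str, kind: str, target: str) -> str:
    """Build <grammar name> ('Parser' | 'Lexer') 'Base' <extension>."""
    if kind not in ("Parser", "Lexer"):
        raise ValueError("kind must be 'Parser' or 'Lexer'")
    return f"{grammar}{kind}Base{EXTENSIONS[target]}"


parser_file = base_class_file("MySQL", "Parser", "Java")   # "MySQLParserBase.java"
lexer_file = base_class_file("MySQL", "Lexer", "Python3")  # "MySQLLexerBase.py"
```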

Old files saved

The old Antlr4ng grammar and "TypeScript/" code were moved to "original/".

A new test suite added

The original grammar and tester "demo.ts" tested less than 5% of the grammar (measured with dotnet trcover -i "select *, _latin1'😁' from sakila.actor where actor_id = 1;"). This PR adds the tests from Positive-Technologies/examples/*.sql. The driver code required some initialization settings for the lexer and parser, and these are added for each port. (Initialization code should be in the base class constructors.) Many SQL statements were commented out because they would not parse. See the "#NB" comment lines in the test files.

NB: Someone will need to verify that the Oracle grammar is correct and that the commented-out SQL statements are actually invalid.

teverett · 2024-10-23T15:36:16Z

@kaby76 thanks!

Port sql/mysql/Oracle to CSharp.

bb7ae07

kaby76 mentioned this pull request Oct 10, 2024

[mysql/Oracle] Grammar is ambiguous. #4279

Open

kaby76 added 12 commits October 10, 2024 09:12

Remove dependency on Antlr4BuildTasks (for trperf) and add StackQueue.

3304246

Add Antlr4ng target and redo actions in grammar.

9904bfc

Updates for CSharp and Antlr4ng targets.

68a13d3

Fix Antlr4ng code.

f590dbc

Force removal of original/ from Generated directory because tsc wants to compile that code.

1f08867

Fix gitattributes for mysql/Oracle/examples/

5e84a1c

Update test to remove parse errors.

53c330d

Remove errors and tree files.

e472a6c

Complete fixing tests.

240afda

Update formatting of "#NB".

5b94998

Add Java port.

f9627af

Add source file.

7642e66

kaby76 changed the title ~~[mysql/Oracle] Port sql/mysql/Oracle to CSharp.~~ [mysql/Oracle] Add ports for sql/mysql/Oracle, and add a better test suite for the grammar. Oct 21, 2024

kaby76 added 7 commits October 21, 2024 20:50

Add TypeScript port.

4033bf4

Adjust TypeScript lexer and parser settings.

b4be465

Update CSharp target.

91afa95

Update other targets as we added a few new methods for target agnostic format.

370e4cf

Update base class for TypeScript.

af2ffc5

Clean up.

7d583c9

Updates for all targets due to additional Antlr Action changes.

cddd57a

kaby76 marked this pull request as ready for review October 23, 2024 00:21

Make sure lexer and parser are initialized even for trparse and trperf tools.

4d7f3cc

teverett merged commit 9f95ea4 into antlr:master Oct 23, 2024
19 checks passed
