[mysql/Oracle] Add ports for sql/mysql/Oracle, and add a better test suite for the grammar. #4278
Merged
… to compile that code.
kaby76
changed the title
[mysql/Oracle] Port sql/mysql/Oracle to CSharp.
[mysql/Oracle] Add ports for sql/mysql/Oracle, and add a better test suite for the grammar.
Oct 21, 2024
@kaby76 thanks!
This is a change to the sql/mysql/Oracle grammar. The grammar was refactored to be in "target agnostic format". Ports were added for Antlr4ng, CSharp, Java, Python3, and TypeScript. (For other targets, I leave as an exercise for anyone who would be interested.)
Grammar changes
Target agnostic format is the preferred format for grammars-v4 grammars. This allows for a single copy of the .g4 files to be shared across targets, and is very important in order to stop diverging fixes to the grammar between targets.
Target agnostic means that the Antlr actions are in the target-specific language but in a syntax palatable across targets. The only syntax that works is a method call. For example, instead of
BIT_OR_SYMBOL : B I T '_' O R { this.type = this.determineFunction(MySQLLexer.BIT_OR_SYMBOL); };
the code must be encapsulated in a method call:
BIT_OR_SYMBOL : B I T '_' O R { this.doBitOr(); };
Expressions that use operators, e.g., >=, <, !, &&, ||, or constants like "true" and "false", cannot be used directly in Antlr actions because they can have different syntax across targets. Assignments are also not allowed because the name of the field varies across runtimes.
However, target agnostic format does not work perfectly because the syntax of a method call still differs between targets. A method call for C++ is this->isThisFine(), and for Python3 it is self.isThisFine(). For C# and Java, the method call can be this.isThisFine() or isThisFine(), whereas in TypeScript the this. must appear, as in this.isThisFine(). Antlr4 @header is required in some targets, but the target-specific code cannot be placed in the grammar. To get around these syntactic issues, transformGrammar.py is used to modify the .g4s for the specific target.
The start rule for the grammar was changed to allow multiple statements per file separated by a semi-colon.
Base class file names
The base class file names follow the standard used throughout grammars-v4:
<grammar name> ('Parser' | 'Lexer') 'Base' <target-specific file extension>
MySQLBaseRecognizer is the parser base class, but Lexer and Parser both inherit from a Recognizer. I.e., they are all "recognizers."
Old files saved
The old Antlr4ng grammar and "TypeScript/" code were moved to "original/".
A new test suite added
The original grammar and tester "demo.ts" tested less than 5% of the grammar. (dotnet trcover -i "select *, _latin1'😁' from sakila.actor where actor_id = 1;") This PR adds the tests from Positive-Technologies/examples/*.sql. The driver code required some initialization settings for the lexer and parser, and these are added for each port. (Initialization code should be in the base class constructors.) Many SQL statements were commented out as they would not parse. See the #NB comment lines in the test files.
NB: Someone will need to verify that the Oracle grammar is correct and verify that commented-out SQL statements are wrong.