Failed parsing #44

Open
GordonSmith opened this issue Mar 9, 2020 · 11 comments
Comments

@GordonSmith

(\b(?i:(integer|unsigned))(EX)?\b)

Specifically, the case-insensitive part: ?i:

@bd82
Owner

bd82 commented Mar 11, 2020

Thanks for reporting this @GordonSmith
Do you know which version of ECMAScript added this syntax?

This makes me wonder again whether I could transition Chevrotain to use another regExp parser (one which I do not have to maintain), as over time more and more regExp features are added, and I built this project mainly for use in Chevrotain.

@GordonSmith
Author

I don't know offhand, but I do know I have been using it for several years now.

@GordonSmith
Author

Out of curiosity why does Chevrotain need to parse the RegEx? I would have thought using them "black box" would have been sufficient?

@GordonSmith
Author

I didn't mention why this is important! I want to unify my VS Code "syntaxes" regexes with the ones I use in the Chevrotain lexer (with plans to automate the syncing).

In VS Code the declaration looks like this (from the JSON file):

        {
            "name": "entity.name.type.ecl",
            "match": "\\b(?i:(integer|unsigned))[1-8]?\\b"
        },

While in Chevrotain:

const IntegerType = createToken({ name: "IntegerType", pattern: /(\b(integer|unsigned)[1-8]?\b)/i });

My current plan was to standardize the two to look like this:

  • \\?i:(b((integer|unsigned))[1-8]?\\b)
  • /(\b(integer|unsigned)[1-8]?\b)/i

At which point I would have some hope of auto syncing...
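
One way such a sync step could look is sketched below. This is only a minimal sketch under stated assumptions: textMateToRegExp is a hypothetical helper name (it is not part of Chevrotain or VS Code), and it assumes that (?i: is the only inline modifier used in the grammar file. It rewrites each (?i: group into a plain non-capturing group and applies the i flag to the whole pattern, which widens case-insensitivity to the entire expression.

```javascript
// Hypothetical helper (not a Chevrotain or VS Code API).
// Rewrites every inline "(?i:" group to "(?:" and moves the
// case-insensitivity to the RegExp "i" flag instead.
// Caveats: the whole pattern becomes case-insensitive, and an escaped
// literal such as "\(?i:" would be rewritten incorrectly.
function textMateToRegExp(source) {
  const hasInlineFlag = source.includes("(?i:");
  const rewritten = source.replace(/\(\?i:/g, "(?:");
  return new RegExp(rewritten, hasInlineFlag ? "i" : "");
}

// The "match" value from the VS Code grammar, after JSON decoding.
const vscodeMatch = "\\b(?i:(integer|unsigned))[1-8]?\\b";
const pattern = textMateToRegExp(vscodeMatch);

console.log(pattern.test("INTEGER4")); // true
console.log(pattern.test("integers")); // false
```

The resulting RegExp could then be passed as the pattern of a createToken call like the one above. For this particular pattern the widened flag changes nothing, since only the (integer|unsigned) part contains letters.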

@bd82
Owner

bd82 commented Mar 12, 2020

Out of curiosity why does Chevrotain need to parse the RegEx? I would have thought using them "black box" would have been sufficient?

It is not mandatory; it is just for optimization purposes. By understanding which characters can match each token pattern, Chevrotain can save quite a lot of time during the lexing phase.

See: https://sap.github.io/chevrotain/docs/guide/performance.html#ensuring-lexer-optimizations

So I am not sure this issue should be a blocker for you.
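
To illustrate the general idea behind this kind of optimization, here is a toy lexer that indexes token types by the characters they can start with, so that only plausible candidates are tried at each offset. It is a sketch only and not Chevrotain's implementation: the startChars field, the token list, and the tokenize function are all made up for this example, and the start characters are written by hand where a regExp analyzer would derive them from the pattern.

```javascript
// Toy illustration only; this is not how Chevrotain is implemented.
// Each token type lists the characters it can start with.
const tokenTypes = [
  { name: "IntegerType", pattern: /(integer|unsigned)[1-8]?\b/iy, startChars: "iIuU" },
  { name: "Number", pattern: /\d+/y, startChars: "0123456789" },
  { name: "WhiteSpace", pattern: /\s+/y, startChars: " \t\r\n" },
];

// Map from a first character to the token types worth trying.
const candidatesByChar = new Map();
for (const tokType of tokenTypes) {
  for (const ch of tokType.startChars) {
    if (!candidatesByChar.has(ch)) candidatesByChar.set(ch, []);
    candidatesByChar.get(ch).push(tokType);
  }
}

function tokenize(text) {
  const tokens = [];
  let offset = 0;
  while (offset < text.length) {
    // Only try the patterns that can start with the current character.
    const candidates = candidatesByChar.get(text[offset]) || [];
    let matched = false;
    for (const tokType of candidates) {
      tokType.pattern.lastIndex = offset; // sticky flag anchors the match here
      const m = tokType.pattern.exec(text);
      if (m !== null) {
        tokens.push({ type: tokType.name, image: m[0] });
        offset += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`Unexpected character at offset ${offset}`);
  }
  return tokens;
}

console.log(tokenize("UNSIGNED4 42").map((t) => t.type));
// [ 'IntegerType', 'WhiteSpace', 'Number' ]
```

Without the index, every pattern would have to be tried at every offset, which is why a pattern that cannot be analyzed costs performance rather than correctness.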

@GordonSmith
Author

Interesting - as a potential side project I could see a "chevrotain grammar -> VSCode Language Extension" utility being able to get a huge % of the grunt work automated.

My gut says there is a disconnect between the Grammar and the CST tree (the loss of some of the semantic logic from the parser definition) that, if it was preserved in the CstNode as information, would simplify the Visitor pattern somewhat. At the moment it feels like I have to write everything twice (but slightly differently), but if I knew that certain children were "OR" and what the sequential order of the children was, then I could simply walk the CST Tree with a simpler visit pattern.

(sorry for nattering off topic).
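
As a rough illustration of recovering the sequential order of children, the sketch below walks a CST-like structure in source order by sorting on token offsets. It uses plain objects that mimic the CstNode shape (a name plus a children dictionary of arrays) and tokens carrying image and startOffset. The helper names isToken, startOf, childrenInSourceOrder and collectImages are hypothetical and not part of Chevrotain.

```javascript
// Plain objects standing in for CST nodes and tokens; helper names are made up.
function isToken(element) {
  return typeof element.image === "string";
}

// Earliest token offset covered by a token or by a (nested) node.
function startOf(element) {
  if (isToken(element)) return element.startOffset;
  const offsets = Object.values(element.children).flat().map(startOf);
  return Math.min(...offsets);
}

// Flatten the children dictionary and order the entries by source position.
function childrenInSourceOrder(cstNode) {
  return Object.values(cstNode.children)
    .flat()
    .sort((a, b) => startOf(a) - startOf(b));
}

// Example walk: collect token images in the order they appeared in the input.
function collectImages(cstNode) {
  const images = [];
  for (const child of childrenInSourceOrder(cstNode)) {
    if (isToken(child)) images.push(child.image);
    else images.push(...collectImages(child));
  }
  return images;
}

const cst = {
  name: "declaration",
  children: {
    Identifier: [{ image: "x", startOffset: 10 }],
    typeName: [
      {
        name: "typeName",
        children: { IntegerType: [{ image: "UNSIGNED4", startOffset: 0 }] },
      },
    ],
  },
};

console.log(collectImages(cst)); // [ 'UNSIGNED4', 'x' ]
```

This recomputes offsets on every comparison, so it trades speed for simplicity. It also says nothing about which alternative of an "OR" was taken, which would still have to be inferred from which child keys are present.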

@bd82
Owner

bd82 commented Mar 13, 2020

as a potential side project I could see a "chevrotain grammar -> VSCode Language Extension" utility being able to get a huge % of the grunt work automated.

I think you may be describing something like Xtext.

@bd82
Owner

bd82 commented Mar 13, 2020

I have created some editor logic utils specifically for the XML language.
The most complex case is the content assist logic, which is responsible for understanding the content assist **syntactic context** and executing a relevant callback to provide the **semantic** suggestions.

I find it hard to imagine how such logic would be generalized to a library.
The grammar information by itself is not sufficient; perhaps some additional "annotations" could provide the required extra info.

EDIT: you may want to look here: Chevrotain/chevrotain#921

@bd82
Owner

bd82 commented Mar 13, 2020

Regarding the CST structure: it is intentionally very simple to allow fast construction and traversal.

You may be able to override methods from the tree builder trait to change the CST structure being built.

Feel free to share your results.

@GordonSmith
Author

Re Xtext - yes, but 100% within JS (I actually had some experience with Xtext about 5 years ago, while writing a language extension for Eclipse - the same language I am partially implementing in Chevrotain now...).
FWIW I already have an LSP implementation, which hooks into our native compiler (C++ based), and while its semantic output is "ok" for a lot of things, it is only accurate at the time of syntax check. My primary goal is to see if I can get enough grammar together for auto formatting the language.

@bd82
Owner

bd82 commented Mar 15, 2020

Perhaps a Prettier plugin is relevant for you; here are a couple of examples using Chevrotain:

Although, as you mentioned, your main issue is getting together enough grammar for a not well-defined language to properly parse it.

The Java Parser (in prettier-java) above uses a lot of backtracking as lookahead to stay close to the Java Spec, which is very well defined, just not LL(K)... Perhaps such backtracking would be of use to you when handling your difficult grammar.
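
For readers unfamiliar with the technique, here is a toy sketch of backtracking used as lookahead: a rule is run speculatively, the position is always rewound, and the outcome decides which alternative to take. ToyParser and its rules are invented for this illustration. It is neither Chevrotain's API nor the prettier-java grammar, and in this tiny case a fixed lookahead would also work.

```javascript
// Toy token-stream parser; all names here are made up for illustration.
class ToyParser {
  constructor(tokens) {
    this.tokens = tokens;
    this.pos = 0;
  }

  consume(expected) {
    if (this.tokens[this.pos] !== expected) {
      throw new Error(`Expected ${expected} at position ${this.pos}`);
    }
    this.pos++;
  }

  // Speculatively run a rule, always rewind, and report whether it succeeded.
  backtrack(rule) {
    const saved = this.pos;
    try {
      rule.call(this);
      return true;
    } catch (e) {
      return false;
    } finally {
      this.pos = saved;
    }
  }

  // cast: "(" "id" ")" "id"
  cast() {
    this.consume("(");
    this.consume("id");
    this.consume(")");
    this.consume("id");
  }

  // parenthesized: "(" "id" ")"
  parenthesized() {
    this.consume("(");
    this.consume("id");
    this.consume(")");
  }

  // Both alternatives share a prefix, so try the longer one speculatively first.
  expression() {
    if (this.backtrack(this.cast)) {
      this.cast();
      return "cast";
    }
    this.parenthesized();
    return "parenthesized";
  }
}

console.log(new ToyParser(["(", "id", ")", "id"]).expression()); // cast
console.log(new ToyParser(["(", "id", ")"]).expression()); // parenthesized
```

The cost is that the chosen alternative is parsed twice, once speculatively and once for real, which is the usual price of backtracking.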
