createToken shortcut functions #1029

HoldYourWaffle · 2019-09-01T10:40:16Z

The current createToken function does exactly what it should do, but it's pretty clunky to use (especially when you have a lot of tokens).

I propose adding 2 extra helper functions to make creating your lexicon less painful:

createToken shorthand

Definition:

function createToken(name: string, pattern: RegExp, config?: Partial<ITokenConfig>) { ... }

As you can see it's just a simple shorthand for the 2 properties all tokens will need. Most tokens don't need any extra configuration, saving a lot of space and increasing readability. With this helper the full lexicon from the tutorial page can be shortened to this:

const Identifier = createToken("Identifier", /[a-zA-Z]\w*/ )
// We specify the "longer_alt" property to resolve keywords vs identifiers ambiguity.
// See: https://github.com/SAP/chevrotain/blob/master/examples/lexer/keywords_vs_identifiers/keywords_vs_identifiers.js
const Select = createToken("Select", /SELECT/, { longer_alt: Identifier })
const From = createToken("From", /FROM/, { longer_alt: Identifier })
const Where = createToken("Where", /WHERE/, { longer_alt: Identifier })

const Comma = createToken("Comma", /,/)
const Integer = createToken("Integer", /0|[1-9]\d*/ )
const GreaterThan = createToken("GreaterThan", />/ )
const LessThan = createToken("LessThan", /</ )

const WhiteSpace = createToken("WhiteSpace", /\s+/, { group: chevrotain.Lexer.SKIPPED })

I don't want to change the existing API, I just want to add an overload for convenience.

createKeyword

Almost every language has keywords, and all keywords have two things in common:

The longer_alt option should be set to an Identifier-like token (if such a token exists)
The name and pattern of the token are (almost always) the same

This repetition is ground for the other helper function I want to propose:

function createKeywordToken(keyword: string, identifierToken?: TokenType, caseSensitive = true) { ... }

The identifierToken parameter is needed to set the longer_alt option. It's optional because not every language might have such an Identifier-like token. There's also a caseSensitive option to toggle the i option for the generated regex pattern, SQL has caseinsensitive keywords for example.
The name property of the generated token would be the first-capitalized version of the keyword parameter, as per the 'standard'.

This extra helper allows the lexicon from the tutorial page to be shortened even further to just this:

const Identifier = createToken("Identifier", /[a-zA-Z]\w*/ )

const Select = createKeywordToken("select" Identifier, false)
const From = createKeywordToken("from", Identifier, false)
const Where = createKeywordToken("where", Identifier, false)

const Comma = createToken("Comma", /,/)
const Integer = createToken("Integer", /0|[1-9]\d*/ )
const GreaterThan = createToken("GreaterThan", />/ )
const LessThan = createToken("LessThan", /</ )

const WhiteSpace = createToken("WhiteSpace", /\s+/, { group: chevrotain.Lexer.SKIPPED })

I feel like there should be a way to set the Identifier-like token only once for all keywords, but I can't figure out how such a system would work. Perhaps someone with more JS experience can tune in on that.

The tutorial page is probably not the best example for the usefulness of these shorthands, the benefit grows massively the more tokens (especially keywords) you add (like when you want to put in a full SQL language).
I always define these shorthand functions myself, but I figured they could be useful for more people.

The text was updated successfully, but these errors were encountered:

bd82 · 2019-09-01T14:57:41Z

An interesting reference would be the Java Lexer Implemented as part of Prettier Java

https://github.com/jhipster/prettier-java/blob/master/packages/java-parser/src/tokens.js

HighLights

HoldYourWaffle · 2019-09-02T19:30:15Z

Interesting.

So do you think it'd make sense to add some of these shortcuts to the main API?

bd82 · 2019-09-02T21:30:34Z

So do you think it'd make sense to add some of these shortcuts to the main API?

Not sure, I am not sure its a good idea to have multiple ways to do things, additionally it is more consistent to have most of the APIs using the JavaScript Config Pattern. and I don't think it is a huge advantage to save a few property names...

I think end users would benefit more from other upgrades to the tokenizer, for example:

Support Lazy/Defered references in createToken function #999 which will make Keywords Vs Identifiers handling easier.

bd82 · 2019-09-04T17:04:39Z

I will be closing this for now as I think it would be quite easy to any end user to implement this themselves if they want and it would also be confusing to provide multiple ways to accomplish the same thing so I would rather stick with the more verbose but also extensible and consistent config object style API.

Thanks for the suggestion and feedback, it is appreciated 👍

Also note that if you want a really concise API for token/lexer definitions it is always possible to build
this externally from Chevroain, e.g:

https://github.com/aetherised/lexdoc

@HoldYourWaffle if you are interested in "good first issues" improvements in the API area
have a look at #1032, perhaps it would interest you? 😄

HoldYourWaffle · 2019-09-06T16:05:51Z

I agree that there isn't much value to adding this, I just figured "why not suggest it" 😄
Great to see you being open on this kind of stuff, makes contributing to this project a very pleasant experience.

Lexdoc looks really promising, maybe we could add a reference to it somewhere in the tutorial?
#1032 looks way out of my league unfortunately 😅

bd82 · 2019-09-06T18:12:15Z

#1032 looks way out of my league unfortunately 😅

It is just a similar tiny method that does return this.CST_STACK[this.CST_STACK.length - 1]
Most of the work is in APIs/docs/testing which is why it is a good first issue
as there is very little actual code / logic and it touches on full flow of implementing a new feature.

HoldYourWaffle · 2019-09-07T10:25:44Z

Now that I've read the issue with a not-very-tired-head it indeed looks very simple. Unfortunately I don't have the time to help right now due to deadline, but if it's still open when I do have time I'll be sure to take a look 😄

bd82 · 2019-09-07T10:53:10Z

Now that I've read the issue with a not-very-tired-head it indeed looks very simple. Unfortunately I don't have the time to help right now due to deadline, but if it's still open when I do have time I'll be sure to take a look 😄

Great, take your time no rush 😄
I am intentionally leaving some simple non-urgent issues open to be used as "good first issues"...
To make it easier for new contributors.

bd82 added API Question labels Sep 4, 2019

bd82 closed this as completed Sep 4, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

createToken shortcut functions #1029

createToken shortcut functions #1029

HoldYourWaffle commented Sep 1, 2019

bd82 commented Sep 1, 2019

HoldYourWaffle commented Sep 2, 2019

bd82 commented Sep 2, 2019

bd82 commented Sep 4, 2019

HoldYourWaffle commented Sep 6, 2019

bd82 commented Sep 6, 2019

HoldYourWaffle commented Sep 7, 2019

bd82 commented Sep 7, 2019

createToken shortcut functions #1029

createToken shortcut functions #1029

Comments

HoldYourWaffle commented Sep 1, 2019

createToken shorthand

createKeyword

bd82 commented Sep 1, 2019

HighLights

HoldYourWaffle commented Sep 2, 2019

bd82 commented Sep 2, 2019

bd82 commented Sep 4, 2019

HoldYourWaffle commented Sep 6, 2019

bd82 commented Sep 6, 2019

HoldYourWaffle commented Sep 7, 2019

bd82 commented Sep 7, 2019