WIP: Operator escaping #1221

Open: wants to merge 12 commits into base: master
@let-def let-def commented Apr 17, 2017

This is another attempt at solving https://github.com/facebook/reason/pull/909.

The rules for operator escaping are now:

  • / and * can be escaped,
  • only /* and */ have to be escaped, to /\* and *\/, so that they cannot be confused with comment delimiters,
  • when printing, the minimum number of \ is reintroduced (for instance, ^\/ in a source will be reprinted as ^/ because it is not ambiguous).
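
The rules above can be sketched as a pair of string functions. This is only an illustration under stated assumptions: the names `unescape_operator` and `escape_operator` are hypothetical, and the actual lexer and printer code in this PR may be organized differently.

```ocaml
(* Hypothetical helper: drop every backslash that escapes a slash or a
   star. Any other backslash is kept as-is. *)
let unescape_operator (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  let i = ref 0 in
  while !i < n do
    if s.[!i] = '\\' && !i + 1 < n && (s.[!i + 1] = '/' || s.[!i + 1] = '*')
    then begin
      (* Skip the backslash, keep the escaped character. *)
      Buffer.add_char buf s.[!i + 1];
      i := !i + 2
    end else begin
      Buffer.add_char buf s.[!i];
      incr i
    end
  done;
  Buffer.contents buf

(* Hypothetical helper: reintroduce the minimum number of backslashes.
   A backslash is inserted only between a slash and a star (in either
   order), which is where a comment delimiter would otherwise appear. *)
let escape_operator (s : string) : string =
  let buf = Buffer.create (String.length s + 2) in
  String.iteri
    (fun i c ->
      if i > 0 then begin
        let p = s.[i - 1] in
        if (p = '/' && c = '*') || (p = '*' && c = '/') then
          Buffer.add_char buf '\\'
      end;
      Buffer.add_char buf c)
    s;
  Buffer.contents buf
```

With these definitions, the source operator `^\/` unescapes to `^/`, and `escape_operator` prints `^/` back unchanged because it contains no comment delimiter.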

Implementation:

  • unescaping is now done at the lexer level: this makes the code simpler, and it avoids one intermediate AST,
  • the lexing of operators accepts all characters. There is no attempt to avoid */ or /* in the lexer rules, which would require a more expressive lexer generator; instead, a post-lexing step scans the lexeme for the beginning of a comment and cuts the lexeme there (giving the remaining characters back to the lexer).
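
The post-lexing step can be sketched as a plain scan over the operator lexeme. This is an assumption-laden illustration, not the code from this PR: `find_comment_marker` and `cut_operator` are hypothetical names, and the sketch assumes that both /* and */ trigger a cut.

```ocaml
(* Hypothetical helper: index of the first slash-star or star-slash pair
   in the lexeme, if any. A backslash between the two characters is
   enough to prevent a match. *)
let find_comment_marker (lexeme : string) : int option =
  let n = String.length lexeme in
  let rec scan i =
    if i + 1 >= n then None
    else
      match lexeme.[i], lexeme.[i + 1] with
      | '/', '*' | '*', '/' -> Some i
      | _ -> scan (i + 1)
  in
  scan 0

(* Hypothetical helper: split the lexeme at the comment marker. Returns
   the operator part and the number of characters to give back to the
   lexer. *)
let cut_operator (lexeme : string) : string * int =
  match find_comment_marker lexeme with
  | None -> (lexeme, 0)
  | Some i -> (String.sub lexeme 0 i, String.length lexeme - i)
```

For example, `cut_operator "+/*"` returns `("+", 2)`, while the escaped form `/\*` is returned whole with nothing given back. In an ocamllex-generated lexer, giving characters back would presumably be done by moving `lexbuf.Lexing.lex_curr_pos` backwards by that count; the exact mechanism used here is not shown in this description.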

Corner cases (need to reach a consensus...):
\/* or \*/ are erroneous: they are lexed as an empty operator followed by the beginning / ending of a comment.
For instance, \/* is lexed as \ followed by /*. The \ is interpreted as escaping nothing, so it denotes the empty operator.
So what should this sequence mean? An erroneous beginning/ending of a comment that deserves a warning, or the escaped version of an operator that looks like the beginning/ending of a comment?

GC stat after
==============

parsing:
        allocated_words: 37657442
        minor_words: 15582165
        promoted_words: 2166069
        major_words: 24241346
        minor_collections: 103
        major_collections: 24
        heap_words: 7015424
        heap_chunks: 20
        top_heap_words: 7015424
        compactions: 0

reformatting:
        allocated_words: 153497667
        minor_words: 131135436
        promoted_words: 36777601
        major_words: 59139832
        minor_collections: 553
        major_collections: 41
        heap_words: 9278464
        heap_chunks: 22
        top_heap_words: 9278464
        compactions: 0

GC stat before
==============

parsing:
        allocated_words: 41243812
        minor_words: 19168535
        promoted_words: 3911746
        major_words: 25987023
        minor_collections: 117
        major_collections: 25
        heap_words: 7015424
        heap_chunks: 20
        top_heap_words: 7015424
        compactions: 0

reformatting:
        allocated_words: 159464070
        minor_words: 137101839
        promoted_words: 39600932
        major_words: 61963163
        minor_collections: 576
        major_collections: 43
        heap_words: 8068096
        heap_chunks: 21
        top_heap_words: 8068096
        compactions: 0
GC stat after
=============

parsing:
        allocated_words: 35870858
        minor_words: 13795581
        promoted_words: 2141052
        major_words: 24216329
        minor_collections: 96
        major_collections: 24
        heap_words: 7015424
        heap_chunks: 20
        top_heap_words: 7015424
        compactions: 0

reformatting:
        allocated_words: 149597673
        minor_words: 127235442
        promoted_words: 36784443
        major_words: 59146674
        minor_collections: 539
        major_collections: 42
        heap_words: 395776
        heap_chunks: 1
        top_heap_words: 9278464
        compactions: 2
Replace a global buffer with a buffer local to the parsing code.

This fixes a bug when processing multiple inputs. For instance, with
refmt a.re b.re, the comments of b.re will (incorrectly) be read from a buffer
that begins with content from a.re.
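
The shape of this bug can be reproduced in a few lines. The sketch below is a simplified illustration, not the actual parser code: the function names are hypothetical, and "parsing" is reduced to appending the input to a buffer and reading it back.

```ocaml
(* Buggy shape: a single buffer shared by every call, never cleared. *)
let global_buffer : Buffer.t = Buffer.create 64

let read_with_global (source : string) : string =
  Buffer.add_string global_buffer source;
  (* Still contains whatever earlier calls appended. *)
  Buffer.contents global_buffer

(* Fixed shape: each call owns a fresh buffer. *)
let read_with_local (source : string) : string =
  let buffer = Buffer.create 64 in
  Buffer.add_string buffer source;
  Buffer.contents buffer
```

Calling `read_with_global` on the contents of a.re and then on those of b.re makes the second result start with the text of a.re, whereas `read_with_local` returns only the text of the file it was given.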
prefix ! is now "not"
postfix ^ is now dereferencing (instead of prefix !)
infix ++ is now string concatenation (instead of infix ^)

^ now defines a family of postfix operators (^@, ^/, etc.) that cannot
end with ".".
(How should existing OCaml operators be remapped?)