Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: move string_value to external scanner #235

Closed
wants to merge 2 commits into from

Conversation

calebdw
Copy link
Collaborator

@calebdw calebdw commented Mar 20, 2024

Checklist

  • All tests pass in CI
  • There are enough tests for the new fix/feature
  • Grammar rules have not been renamed unless absolutely necessary (0 rules renamed)
  • The conflicts section hasn't grown too much (0 new conflicts)
  • The parser size hasn't grown too much (no change)

Hello!

This moves the string_value rule to the external scanner.

Closes #234

Thanks!

@amaanq
Copy link
Member

amaanq commented Mar 21, 2024

does it have to be in the external scanner? this seems like an easy enough case to handle with an alternation I think?

@calebdw
Copy link
Collaborator Author

calebdw commented Mar 21, 2024

Some context on the issue:

The $string_value should match everything between the '...' except for the two possible escape sequences, \\ and \'.

A simplified version of the string rule is:

      string: $ => seq(
        '\'',
        repeat(choice(
          alias(token(choice('\\\\', '\\\'')), $.escape_sequence),
          token(prec(1, repeat1(/\\?[^'\\]/))), // $string_value
        )),
        '\'',
      ),

Note that the prec was added to $string_value because there were issues with parsing a string that started with a comment char (e.g., '#foo').

The above rule correctly parses single quoted strings with escape sequences and strings with comment chars. The problem arises when trying to parse a string with both an escape sequence and a comment char (e.g., '#^(\\)#').

I tried a lot of things to fix the issue but wasn't getting anywhere---this does not occur with double quoted strings because it uses the scanner for the body of the string. So I eventually just gave up and decided to use the scanner for the $string_value as well. This also solved the comment precedence issue so kinda killed two birds with one stone.

Granted, I'm not a tree-sitter (or regex) expert so there may be a better way to solve this---I'm not sure what you mean by alternation, could you show an example?

@calebdw
Copy link
Collaborator Author

calebdw commented Mar 25, 2024

Any thoughts @amaanq? I'm open to ideas :)

@amaanq
Copy link
Member

amaanq commented Apr 12, 2024

yeah you can just use + in the regex instead of repeat1, sorry for taking a while got super busy updating every other grammar and just forgot about this, I'll open a PR with a fix soon, if it doesn't look right let me know, but I think it should be good.

@amaanq amaanq closed this Apr 12, 2024
@calebdw
Copy link
Collaborator Author

calebdw commented Apr 12, 2024

No worries, I understand!

@calebdw
Copy link
Collaborator Author

calebdw commented Apr 23, 2024

@amaanq,

I've tried using:

      string_value: _ => token(prec(1, /(\\?[^'\\])+/)),

instead of:

      string_value: _ => token(prec(1, repeat1(/\\?[^'\\]/))),

but no luck. Am I missing something?

@amaanq
Copy link
Member

amaanq commented Apr 28, 2024

Sorry I've been kinda blocked on this because of #3305 upstream, I was debugging that with php because even rewriting the scanner to use the array/alloc headers didn't fix it, took a bit of time + had other stuff to do so I'll try and update php asap now, sorry

FYI that PR should fix the very sporadic failures when editing and undoing edits with php where the syntax trees are not consistent

@calebdw calebdw deleted the string_scanner branch May 19, 2024 03:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parsing error
2 participants