You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When using characters like "✓" and "✕", MLton reports:
Error: /home/runner/work/effekt/effekt/out/tests/effekt.mltests/lib_test.sml 391.69-391.76.
String constant with character too large for type: #"\u2715".
type: string
Error: /home/runner/work/effekt/effekt/out/tests/effekt.mltests/lib_test.sml 561.67-561.74.
String constant with character too large for type: #"\u2713".
type: string
either use a different string type on MLton which supports these large escapes (WideString is UTF-32 [?] as far as I understand http://mlton.org/Unicode)
or escape them byte-by-byte with the \[0-9]{3}-style syntax.
Solution
I think keeping strings UTF-8(-ish) is worth it, so I'd prefer the solution 2, even though it's slightly more work on our part.
case (acc, c) if (c.isControl || c < ' ' || c > '~') =>
acc ++= f"\\u${c.toInt}%04x"
I think that something like c.toString.getBytes("UTF-8") could be useful here to get a sequence of bytes which then get mapped to the \[0-9]{3} format each.
Testing
It would be very valuable to have a few more test for this behaviour and check that such characters work on all of the different backends.
The text was updated successfully, but these errors were encountered:
Fixeseffekt-lang#544
Update the `escape` function in `Transformer.scala` to convert characters outside the ASCII range to byte-by-byte format.
* Modify the `escape` function to use `c.toString.getBytes("UTF-8")` to get a sequence of bytes for characters outside the ASCII range.
* Map each byte to the `\\[0-9]{3}` format required for byte-by-byte escaping.
* Add a new test file `MLTests.scala` to ensure that characters like '✓' and '✕' are correctly escaped to the `\\[0-9]{3}` format.
* Verify that the new implementation works on all different backends.
---
For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/effekt-lang/effekt/issues/544?shareId=XXXX-XXXX-XXXX-XXXX).
Moved from #542
Motivation
When using characters like
"✓"
and"✕"
, MLton reports:Investigation
Here's the source for MLton's lexer which indicates support for
\[0-9]{3}
,\u[0-9A-F]{4}
and\U[0-9A-F]{8}
escapes: https://github.com/MLton/mlton/blob/680bfcc6d6d8df3e51220fd88d297830316b89b4/mlton/front-end/ml.lex#L446-L457 but there are no real docs for it, the only thing I found suggests that multi-byte escapes should be escaped to single-bytes (locked under a flag), see http://www.mlton.org/SuccessorML#ExtendedTextConstsThe error itself is defined here: https://github.com/MLton/mlton/blob/680bfcc6d6d8df3e51220fd88d297830316b89b4/mlton/elaborate/elaborate-core.fun#L451-L464, and I think this indicates that we should:
\[0-9]{3}
-style syntax.Solution
I think keeping strings UTF-8(-ish) is worth it, so I'd prefer the solution 2, even though it's slightly more work on our part.
Here's the code that needs to change:
effekt/effekt/shared/src/main/scala/effekt/generator/ml/Transformer.scala
Lines 637 to 638 in 08fc8fd
I think that something like
c.toString.getBytes("UTF-8")
could be useful here to get a sequence of bytes which then get mapped to the\[0-9]{3}
format each.Testing
It would be very valuable to have a few more test for this behaviour and check that such characters work on all of the different backends.
The text was updated successfully, but these errors were encountered: