Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source map improvements #1716

Open
wants to merge 25 commits into
base: master
Choose a base branch
from
Open

Source map improvements #1716

wants to merge 25 commits into from

Conversation

vouillon
Copy link
Member

I started working on improving the source map output of Wasm_of_ocaml, and got a bit side-tracked...

Many changes to preserve debug information from the bytecode as well as possible and put this information in the source map at appropriate places for the JavaScript debuggers.

Explain a bit how they are used by the debuggers.
The debuggers do not stop on some statements, like function
declarations. So there is no point in outputting some debug information
there.
The best position is at the beginning of the expression after the `=`.
There are some locations which are marked as ghost locations, but which
actually make sense.
Chrome will stop right after a return before returning from a function.
This will yield to an ambiguity if there is a statement there. To
prevent this, we add a space after a return statement when source maps
are enabled.
The Chrome debugger uses this information to get the original names of
variables. It does not look at all the identifier occurences but only at
where they are bound.

Also, we provide the original name, not the shortened name, which is not
very useful.
- Do not repeat the output if the location has not changed
- Start a new source map mapping before outputing the debug info as a
  comment (we need to start a mapping right after a return statement,
  moving it after a comment would not work)
- Outputting an identifier name will not change the location (there is
  no point in doing that)
This is simpler than to associate a location to all instructions.
Also, this avoid the issue of locations getting lost because an
instruction gets optimized away.
In particular, no longer track the location of variables. This may make
sense during code generation, though: the location of a statement should
be the earliest location of the effectful expressions it contains (if
there is any).
If the first block of the function starts with an event, we use the
event's location. This is useful in case of tail-calls, where some code
is generated before this first block. If we don't have an event, we set
the initial location to unkown to prevent previous locations to bleed
into the function body.
There is some code at the beginnning of exception handlers for which we
have no debugging information.
Statements will get the earliest location of the expressions performing
a call it contains, if there is any. Then, for an expression `if (e)
{...}`, we don't move first to the location of the conditional before
going back to the location within `e` which might be strictly before.

When queueing expressions, we ignore their locations if they don't
perform a call; when flushed, we use the current location. This again
prevent locations to be reordered.
The generated code is also hidden by using a dummy file.
@vouillon vouillon force-pushed the source-maps branch 2 times, most recently from 0dc5e0e to 2913f20 Compare October 18, 2024 14:12
Firefox assumes that a mapping stops at the end of a line, which is
inconvenient. When this happens, we repeat the mapping on the next line.
Outputting this source map would fail on 32-bit architectures since
its size, once Base64-encoded, would be over the string length limit
of 16 MiB.
^ ^ ^

Chrome uses the location of the opening parenthesis of a function
declaration to determine the function name in the stack.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do these infos come from ? Are there any link you can join to this doc ?

@@ -456,6 +456,33 @@ struct
in
traverse l e

let stop_on_statement st =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment here

The debuggers do not stop on some statements, like function
declarations. So there is no point in outputting some debug information
there.

Where does this info come from ?

, (Return_statement (Some e1, _), _)
, Some (Return_statement (Some e2, _), _) ) ->
( Return_statement
(Some (ECond (cond, e1, e2)), U (*ZZZ Use end of function? *))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a todo ?

| Env.Env_module (_summary, ident, _, description) when Ident.same ident ident' ->
Some description.Types.md_loc
| Env.Env_extension (_summary, ident, description) when Ident.same ident ident' ->
Some description.Types.ext_loc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't this affect tests-full ?

( Extern "caml_blit_string"
, [ Pv a'; Pc (Int zero1); Pv bytes'; Pc (Int zero2); Pv alen'' ] ) )
, _ )
; (Event _, _)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern seems fragile, can we do better ?

? caml_cps_call2(Stdlib[79], cst_toto$0, _o_)
: caml_cps_call2(Stdlib[79], cst_titi, _o_);
? caml_cps_call2
(Stdlib[79], cst_toto$0, function(_q_){return _p_(_q_);})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have this diff ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe some inlining optim needs to be updated ?

(link_flags
:standard
; The source map would be too large on 32-bit architectures otherwise
--no-source-map)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be enough to disable the inline sourcemap ?

(link_flags (:standard --source-map \ --source-map-inline)) 

Or maybe just disable sourcemap on 32bit arch only

@@ -501,55 +501,165 @@ let constant ~ctx x level =
type queue_elt =
{ prop : int
; ce : J.expression
; loc : J.location
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The let* rewrite make it hard to review the rest of the diff. could this commit be split, maybe ?

- omitted else clause
*)
instrs
| _, _ ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have an exhaustive pattern for event (at least on ev_kind) ?

let ignore_list =
List.filter sources ~f:(fun filename ->
String.length filename >= 9
&& String.equal (String.sub filename ~pos:0 ~len:9) "/builtin/")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String.starts_with

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants