Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diagnostics are too chatty #442

Open
davidanthoff opened this issue Jun 26, 2024 · 8 comments
Open

Diagnostics are too chatty #442

davidanthoff opened this issue Jun 26, 2024 · 8 comments

Comments

@davidanthoff
Copy link
Contributor

davidanthoff commented Jun 26, 2024

I've started implementing a linter that will eventually power the LS. For now all it does is use the diagnostic messages from JuliaSyntax. https://github.com/davidanthoff/StringBuilders.jl/pull/96/files is an example. There is one syntax error, a missing ) on line 14 and the error message for that is nice.

But then all the other error messages after it I feel ideally would not appear at all. In my mind once the parser recognizes that the error on line 14, it should continue parsing pretending that the error was fixed and then not report all these knock-off errors further down in the file.

This is actually probably a reason to not surface the diagnostics in the VS Code extension yet. CSTParser doesn't have a nice error message, but it just stops processing the file after this first error, which at the end of the day is probably a better user experience because the error list is not populated by lots of false-positive like things.

FYI @pfitzseb

@KristofferC
Copy link
Member

You can chose what to propagate, or? Analogous to #344

@davidanthoff
Copy link
Contributor Author

That is true, we can easily go back to the CSTParser situation by just reporting the first error. Having said that, it would be nice if the error recovery here would be better :)

@davidanthoff
Copy link
Contributor Author

dotnet/roslyn#16151 (comment) is interesting. I would imagine that in the case I linked to, we detect that a closing ) is missing, and then bubble up one level and continue parsing normally, as if that ) was there, right?

@davidanthoff
Copy link
Contributor Author

I started to surface JuliaSyntax.jl errors and warnings in the VS Code extension now (and also in a new GitHub lint action). And I'd really like to not just show the first diagnostics. If there are multiple errors in a file, it is really useful to show them all, so I think for now I'm going to err on the side of showing too many errors, hoping that we can gradually improve the error recovery here in JuliaSyntax.

@c42f
Copy link
Member

c42f commented Jul 18, 2024

Yeah, we don't recover well from this kind of thing. Hence #344 :-(

Recovery is a whole thing I haven't worked on. Partly because I don't believe it's possible to do it well with a big hand-written pile of heuristics. At least, not without a heroic amount of engineering effort.

I'd like to do it in a data-driven way instead because I think that's a good long term solution. ("Data driven" == a user-contributed database of common errors and their fixes along with human-readable explanations. Then use that in conjunction with a pluggable recovery system - initially perhaps a hand-written pattern matcher, but I think a learned model is the ultimate solution to this.)

@davidanthoff
Copy link
Contributor Author

Take everything that follows with a grain of salt because I'm still trying to understand JuliaSyntax and might mix things up :) But here we go.

If I understand what is happening in this case of a missing closing parenthesis for a function, then

function bump_closing_token(ps, closing_kind, alternative_closer_hint=nothing)
handles that. And the logic seems to be that it if there isn't a ) where it expects that, then it just keeps looking for the next ) and marks everything in between as an error node. Is that right?

If that is so, then the issue here doesn't seem to be a pattern matching system that sits on top of the nodes, but instead I think the actual nodes that are generated would have to look differently and ideally follow the strategy that is described in the C#/Roslyn issue I linked to above. If they don't find a closing parenthesis, the first thing they try is see whether the next token can be successfully parsed by the "context" above the current one, and if so continue in that way. So here the context that is missing the ) is the function arguments, and the context above would be the statements in the function body, and so Roslyn would just check whether the next thing that isn't the closing ) is a valid statement (presumably), and if so, contineu with that.

I'm not sure how that would be implemented here, though, as I looks to me as if inside bump_closing_token there is no info what the next higher context would be, so it seems hard to test whether the next token would be valid there, right?

@c42f
Copy link
Member

c42f commented Jul 19, 2024

And the logic seems to be that it if there isn't a ) where it expects that, then it just keeps looking for the next ) and marks everything in between as an error node. Is that right?

Yes, it's pretty much the simplest possible recovery algorithm. It's not good 😆

I'm not sure how that would be implemented here, though, as I looks to me as if inside bump_closing_token there is no info what the next higher context would be, so it seems hard to test whether the next token would be valid there, right?

Right. My understanding is that we'd have to manually push onto a stack of parent contexts as we recursively call the parsing functions. Essentially we're keeping a "shadow stack" which mirrors the actual program stack but is accessible at the source level.

It's not the worst thing ever and maybe could even be automated with a macro to wrap a push/pop around every parse_* function call body. But it's also not lovely. IIRC the rust-analyzer parser also does this kind of thing. And it seems like a reasonable amount of engineering effort to get something a lot better than what we have now.

I still don't think that kind of heuristic can ever be "really great", but if it's fairly easy it could be a worthwhile win.

@c42f
Copy link
Member

c42f commented Jul 19, 2024

dotnet/roslyn#16151 (comment) is interesting. I would imagine that in the case I linked to, we detect that a closing ) is missing, and then bubble up one level and continue parsing normally, as if that ) was there, right?

Btw if someone has the energy to make an experimental prototype of this and demonstrate what it could give us - and what the cost is in code complexity - that would be amazing. Doesn't have to be fancy - just a demo to give a taste.

I'm really busy right now so I can't write the code but I could look at a prototype.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants