Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sh: switch to LALRPOP? #34

Open
Arcterus opened this issue Jul 22, 2018 · 4 comments
Open

sh: switch to LALRPOP? #34

Arcterus opened this issue Jul 22, 2018 · 4 comments
Labels
question Further information is requested

Comments

@Arcterus
Copy link
Contributor

I would vastly prefer to use something like LALRPOP or pest instead of nom, but right now neither support arbitrary bytes as input. There is this issue for LALRPOP and this one for pest that deal with that problem, but I'm not sure if it's fine to just say that invalid UTF-8 paths can't be given directly as arguments for now (they'd still work if a glob was given where the expanded part was invalid UTF-8). Switching to either LALRPOP or pest should help improve the error messages and make it simpler to fix some of the current parsing issues.

As a side note, this is actually the same issue that prevented me from just using pest in the first place.

@Arcterus Arcterus added the question Further information is requested label Jul 22, 2018
@mssun
Copy link
Contributor

mssun commented Jul 30, 2018

I think it's OK to say that "invalid UTF-8 paths can't be given directly" for now and use LALRPOP or pest (pest looks good since it uses PGE for parsing). Then, let's explain our usages to the pest/LALRPOP authors and push the implementation of the "matching bytes" feature.

@Arcterus
Copy link
Contributor Author

Arcterus commented Aug 2, 2018

So, I ended up rewriting the parser without a library using iterators (retrieved from OsStr) directly. Currently we use EncodeWide on Windows and std::slice::Iter<u8> on Unix. While I haven't tested on Windows yet, I believe this solution should let the parser function on Windows. We still need to figure out how to deal with forking on Windows (e.g. for subshells), but it should be possible to at least enable some sort of shell script checker on all platforms now.

I imagine the performance can be vastly improved as I just did a simple port from nom. If this proves too difficult, we can switch to LALRPOP/Pest at a later date (once they at least support parsing u8 streams).

The rewritten parser is in the sh-parser branch. The REPL doesn't support multi-line inputs right now, but for the most part things seem to function better than before. I am at this point considering just integrating the changes into master as (IMO) a good REPL is less important than good scripting support.

@Arcterus
Copy link
Contributor Author

Arcterus commented Aug 2, 2018

BTW the code from sh-parser has now been added to master.

@mssun
Copy link
Contributor

mssun commented Aug 2, 2018

OK, good. Let's use the handwritten parser first and consider LALRPOP/Pest later (if we cannot handle it properly).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants