Stackoverflow when deriving token with block comment regex #400

Open
KarelPeeters opened this issue Jul 9, 2024 · 3 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@KarelPeeters

The following derive setup causes the build to fail:

#[derive(Logos)]
enum TokenType {
    #[regex(r"/\*([^\*]*\*+[^\*/])*([^\*]*\*+|[^\*])*\*/")]
    BlockComment,
}

The error printed is:

error: rustc interrupted by SIGSEGV, printing backtrace

/home/karel/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-1ccb730c51a3970e.so(+0x2ea5963)[0x7f95caea5963]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f95c7c42520]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0x9b7cf)[0x7f95b7c9b7cf]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0x959d7)[0x7f95b7c959d7]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0x97e78)[0x7f95b7c97e78]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0x95b78)[0x7f95b7c95b78]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcad1a)[0x7f95b7ccad1a]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0x7e906)[0x7f95b7c7e906]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcfa89)[0x7f95b7ccfa89]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcfc95)[0x7f95b7ccfc95]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0x62a64)[0x7f95b7c62a64]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd01c8)[0x7f95b7cd01c8]

### cycle encountered after 12 frames with period 14
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
### recursed 17 times

/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xcedab)[0x7f95b7ccedab]
/home/karel/Documents/mini/HwLang/demo/target/debug/deps/liblogos_derive-d7456505651eb2ec.so(+0xd02fe)[0x7f95b7cd02fe]

note: rustc unexpectedly overflowed its stack! this is a bug
note: maximum backtrace depth reached, frames may have been lost
note: we would appreciate a report at https://github.com/rust-lang/rust
help: you can increase rustc's stack size by setting RUST_MIN_STACK=16777216
note: backtrace dumped due to SIGSEGV! resuming signal
error: could not compile `demo` (bin "demo")

Other tokens (normal literal strings, other regular expressions) work fine. I assume this specific regex triggers infinite recursion somewhere in the derive machinery.

Note: I got this regex from the LALRPOP book here.

@KarelPeeters
Author

KarelPeeters commented Jul 9, 2024

After a bit of debugging:

  • It looks like this is caused by infinite recursion between merge_unchecked and insert.
  • If I replace the call to merge_unchecked with merge, I get the error "Merging two reserved nodes! This is a bug, please report it [...]". This means one of the following:
    • merge and merge_unchecked are not actually intended to be equivalent apart from some extra checking, in which case this error proves nothing.
    • There is a bigger logic issue that causes merge_unchecked to be called on an invalid pair of nodes.

I'll continue to investigate, but any advice would be appreciated!

@jeertmans jeertmans added the bug Something isn't working label Jul 17, 2024
@jeertmans
Collaborator

Hello @KarelPeeters! Thanks for reporting this bug (though I am not sure if this is a bug or a limitation of Logos).

Unfortunately, I don't have time to investigate this at the moment. However, your regex seems very complex, and it might be worth trying to simplify it, at least by breaking it down into multiple tokens or using callbacks (this is usually the simplest thing to do when trying to match block comments).

@facefaceless

For now, I think handling multi-line comments manually would be better. Here is a code snippet from my project.

...
#[token("/*", multiline_comment)]
BlockComment,
...
// Callback invoked after "/*" has matched; scans the rest of the input for "*/".
fn multiline_comment(lex: &mut Lexer<TokenType>) -> FilterResult<(), LogosLexError> {
    // ExpectSlash means the previous character was '*', so a '/' closes the comment.
    enum State {
        ExpectStar,
        ExpectSlash,
    }
    let remainder = lex.remainder();
    let (mut state, mut iter) = (State::ExpectStar, remainder.chars());
    while let Some(next_char) = iter.next() {
        match next_char {
            '\n' => {
                // Track line numbers and the byte offset where the new line begins.
                lex.extras.line += 1;
                lex.extras.line_beg = lex.span().end + (remainder.len() - iter.as_str().len());
                state = State::ExpectStar;
            }
            '*' => state = State::ExpectSlash,
            '/' if matches!(state, State::ExpectSlash) => {
                // Consume everything up to and including the closing "*/".
                lex.bump(remainder.len() - iter.as_str().len());
                return FilterResult::Skip;
            }
            _ => state = State::ExpectStar,
        }
    }
    // Reached end of input without finding "*/": consume the rest and report an error.
    lex.bump(remainder.len());
    FilterResult::Error(LogosLexError::IncompleteComment)
}
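
If line tracking is not needed inside the callback, the same scan can probably be reduced to a substring search. The following is only a minimal sketch using the standard library, not code from Logos or from the snippet above: block_comment_len and count_newlines are hypothetical helper names, and the input is assumed to be what lex.remainder() returns right after the "/*" token has matched.

```rust
/// Hypothetical helper: given the input that follows an opening "/*",
/// returns how many bytes belong to the comment (including the closing
/// "*/"), or `None` if the comment is never closed.
fn block_comment_len(remainder: &str) -> Option<usize> {
    // `str::find` returns the byte offset of the first "*/", if any.
    remainder.find("*/").map(|idx| idx + "*/".len())
}

/// Hypothetical helper: counts the newlines in the consumed comment text,
/// for lexers that keep a line counter in their extras.
fn count_newlines(consumed: &str) -> usize {
    consumed.bytes().filter(|&b| b == b'\n').count()
}
```

Inside a callback like the one above, the returned length would be passed to lex.bump, and a None result would map to the error branch. Because the search starts after the opening "/*", an input such as "/*/" is not mistaken for a complete comment.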
