Runtime stack overflow when lexing certain strings #424

Closed
Philogy opened this issue Sep 22, 2024 · 3 comments
Labels: bug (Something isn't working), question (Further information is requested)

Comments

Philogy commented Sep 22, 2024

I was trying to get C-style multiline comments working in logos. The state machine for them is quite simple, but I can't seem to express it in logos. The regex below seemed to work, but it causes a stack overflow at runtime when lexing certain inputs.
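
For reference, the state machine in question can be sketched by hand as below. This is a minimal, std-only illustration rather than logos code; the names `State` and `comment_end` are made up for this sketch, and it assumes the opening `/*` has already been consumed.

```rust
/// States of a scanner positioned just after an opening `/*`.
#[derive(Clone, Copy)]
enum State {
    /// Inside the comment body; the previous byte was not `*`.
    Body,
    /// The previous byte was `*`, so a following `/` closes the comment.
    Star,
}

/// Returns the number of bytes up to and including the closing `*/`,
/// or `None` if the comment is never closed.
/// (Hypothetical helper, not part of logos.)
fn comment_end(rest: &str) -> Option<usize> {
    let mut state = State::Body;
    for (i, b) in rest.bytes().enumerate() {
        state = match (state, b) {
            // `*` followed by `/` terminates the comment.
            (State::Star, b'/') => return Some(i + 1),
            // Any `*` (including one after another `*`) arms the terminator.
            (_, b'*') => State::Star,
            // Everything else returns to the plain body state.
            _ => State::Body,
        };
    }
    None
}

fn main() {
    // The text after `/*` on the first line of the report's input.
    let rest = "wow amazing!!!!!*** /*  **/";
    match comment_end(rest) {
        Some(n) => println!("comment body and terminator: {:?}", &rest[..n]),
        None => println!("unterminated comment"),
    }
}
```

The scan uses constant stack space regardless of how long the comment is.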

Code to reproduce:

use logos::Logos;

#[derive(Logos, Debug, PartialEq)]
#[logos(skip r"[ \t\n\f\r]+")] // Ignore this regex pattern between tokens
enum Token {
    // Tokens can be literal strings, of any length.
    #[token("#define")]
    Define,

    #[token("macro")]
    Macro,

    #[regex("//[^\n]*\n?", logos::skip)]
    LineComment,

    #[regex("/\\*([^\\*]*(\\*[^/])?)*\\*/", logos::skip)]
    MultiLineComment,

    // Or regular expressions.
    #[regex("0x[0-9a-fA-F]+")]
    HexLiteral,

    #[regex("[a-zA-Z_]\\w*:")]
    Label,

    #[regex("[a-zA-Z_]\\w*")]
    Ident,
}

fn main() {
    let src = "
/*wow amazing!!!!!*** /*  **/
// wow very nice
#define macro hi: very nice

";

    let mut lexer = Token::lexer(src);
    while let Some(token) = lexer.next() {
        println!("{:?} {}", token, lexer.slice());
    }
}

The error I'm getting:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
[1]    5609 abort      cargo run
Philogy changed the title from "Runtime panic when parsing lexing" to "Runtime panic when lexing based on certain regex" on Sep 22, 2024
Philogy changed the title from "Runtime panic when lexing based on certain regex" to "Runtime stack overflow when lexing certain strings" on Sep 22, 2024
jeertmans added the question and bug labels on Sep 26, 2024
jeertmans (Collaborator) commented

Hello @Philogy, I don't currently have much time to invest in this issue, but I would recommend the same thing I do for all comment-style lexing: create a token that matches only the start of a comment, and then process the rest of the comment with a callback. This is usually much better, as comments can contain almost any character, such as an escaped /, which makes it very hard to write a regex that handles every case.

See #421 (comment) for an example on XML comments, which is very similar to multiline strings.
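
A rough sketch of the core of such a callback is shown below, using only the standard library so that it runs on its own. The helper name `block_comment_len` is an assumption made for this example. In an actual logos callback attached to a `#[token("/*", ...)]` variant, the input would presumably come from `lex.remainder()` and the returned length would be passed to `lex.bump(..)`; that wiring is not shown or tested here.

```rust
/// Given the input that follows an opening `/*`, returns how many bytes
/// must be consumed to include the closing `*/`, or `None` if the
/// comment is never closed.
/// (Hypothetical helper; in a logos callback, `remainder` would be the
/// text after the matched `/*` token.)
fn block_comment_len(remainder: &str) -> Option<usize> {
    // `find` returns the byte index where `*/` starts; add 2 to include it.
    remainder.find("*/").map(|idx| idx + 2)
}

fn main() {
    let src = "/*wow amazing!!!!!*** /*  **/\n// wow very nice\n";

    // Pretend the lexer has just matched the two-byte `/*` token.
    let remainder = &src[2..];

    match block_comment_len(remainder) {
        // In a logos callback, this is where the lexer would be advanced.
        Some(n) => println!("skipped comment: {:?}", &src[..2 + n]),
        // An unterminated comment would be reported as an error.
        None => println!("unterminated comment"),
    }
}
```

Because the comment body is located with a plain substring search, no regex over the comment contents is needed.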

jeertmans commented Sep 26, 2024

Looks like this is a duplicate of #400, so closing this anyway :-)

conradludgate commented Oct 22, 2024

> Looks like this is a duplicate of #400, so closing this anyway :-)

Looks different to me. #400 seems to error in the derive, whereas this errors at runtime.

For what it's worth, I also encountered a stack overflow/infinite loop at runtime with a small test case:

use logos::Logos; // needed at this level for the derive macro

#[derive(Logos, Debug, PartialEq)]
enum TestToken {
    #[regex("c(a*b?)*c")]
    Token
}

#[cfg(test)]
mod logos_test {
    use logos::Logos;

    use crate::TestToken;

    #[test]
    fn overflow() {
        let _ = TestToken::lexer("c").next();
    }
}
