The . regex should not take the ASCII fast path
See #375 for an example of undefined behavior caused by this fast path.

TLDR: the ASCII fast path stops matching at the first matching byte, which
can split a multi-byte codepoint. Combined with `Lexer::remaining` (or even
just capturing the string, as in the issue), this lets non-UTF-8 strings
escape into user code. This is UNSOUND.
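
To illustrate the failure mode, here is a minimal sketch in safe Rust. It does not use logos itself: `take_one_byte` is a hypothetical stand-in for a fast path that consumes exactly one byte, and the input string is made up for the example.

```rust
/// Hypothetical stand-in for a byte-wise fast path: it consumes exactly one
/// byte, regardless of how many bytes the current code point occupies.
fn take_one_byte(input: &[u8]) -> (&[u8], &[u8]) {
    input.split_at(1)
}

fn main() {
    // 'é' is encoded as two bytes in UTF-8: 0xC3 0xA9.
    let input = "é!";

    // Consuming a single byte cuts the code point in half.
    let (matched, remaining) = take_one_byte(input.as_bytes());
    assert!(std::str::from_utf8(matched).is_err());
    assert!(std::str::from_utf8(remaining).is_err());

    // Consuming the whole code point keeps both halves valid UTF-8.
    let width = input.chars().next().unwrap().len_utf8();
    let (matched, remaining) = input.as_bytes().split_at(width);
    assert_eq!(std::str::from_utf8(matched), Ok("é"));
    assert_eq!(std::str::from_utf8(remaining), Ok("!"));
}
```

Here the checked `std::str::from_utf8` reports the problem. If either half were instead handed out as a `&str` without validation, the `str` invariant would be violated, which is the unsoundness described above.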
RustyYato committed Feb 16, 2024
1 parent ba69cc3 commit d44d81b
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions logos-codegen/src/graph/regex.rs
@@ -165,7 +165,7 @@ fn is_ascii(class: &ClassUnicode) -> bool {
let start = range.start() as u32;
let end = range.end() as u32;

- start < 128 && (end < 128 || end == 0x0010_FFFF)
+ start < 128 && end < 128
})
}

@@ -178,7 +178,7 @@ fn is_one_ascii(class: &ClassUnicode) -> bool {
let start = range.start() as u32;
let end = range.end() as u32;

- start < 128 && (end < 128 || end == 0x0010_FFFF)
+ start < 128 && end < 128
}

#[cfg(test)]
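
The effect of the changed predicate can be sketched on plain `(start, end)` pairs. This is an illustration only: `old_is_ascii_range` and `new_is_ascii_range` are hypothetical names that copy the two expressions from the diff, and the range `0x0B..=0x10FFFF` is an assumption about how the upper part of `.` (everything after `\n`) appears as a Unicode class range.

```rust
/// Predicate as it was before this commit: a range that starts in ASCII and
/// runs to the last code point (U+10FFFF) was still treated as ASCII.
fn old_is_ascii_range(start: u32, end: u32) -> bool {
    start < 128 && (end < 128 || end == 0x0010_FFFF)
}

/// Predicate after this commit: both ends must be ASCII.
fn new_is_ascii_range(start: u32, end: u32) -> bool {
    start < 128 && end < 128
}

fn main() {
    // Assumed shape of the upper range of `.`: U+000B ..= U+10FFFF.
    let (start, end) = (0x0B, 0x0010_FFFF);

    // The old check accepts it, so the class would take the ASCII fast path.
    assert!(old_is_ascii_range(start, end));
    // The new check rejects it, so the class is handled as full Unicode.
    assert!(!new_is_ascii_range(start, end));

    // A genuinely ASCII range such as `a-z` is accepted by both.
    assert!(old_is_ascii_range('a' as u32, 'z' as u32));
    assert!(new_is_ascii_range('a' as u32, 'z' as u32));
}
```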