Expression::lex parses negation inconsistently #263

spud · 2022-01-31T05:10:11Z

I've just tried out my first implementation of TNTSearch, so bear with me!

I'd been struggling with strange results from boolean searches using the "foo -bar" syntax, seeing results that were clearly inaccurate. Glancing at the source code, I noticed that the tilde (~) was also used for excluding words, so I tried the same query using "foo ~bar", expecting the same result set, but got totally different (and more accurate) results.

While debugging, I noticed that the output produced by Expression::lex was different in the two cases.

$ex = new Expression();
$tokens_1 = $ex->lex("foo -bar");
$tokens_2 = $ex->lex("foo ~bar");

The problem is that the two token arrays differ:
$tokens_1 != $tokens_2

That simple inconsistency is the basic bug for this report. I am aware of #246, but I cannot say whether the fix proposed below would address any aspect of that issue. I do know that $tokens_1 produced wildly inaccurate results while $tokens_2 produced much better matches, so the difference in tokens clearly carries through to the search results.

A quick look at the code for lex suggests that the parsing inconsistency can be fixed by reordering the initial search and replace arrays:
$bad = [' or ', ' ', '-'];
$good = ['|', '&', '~'];

This ends up producing the same token array in both situations. I'm just not familiar enough with the implications of that change (it's consistent, but is it right?) to go straight to a pull request. (But happy to if this is confirmed.)
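To make the effect of the ordering concrete, here is a minimal standalone sketch. It assumes lex feeds these arrays to PHP's str_replace(), which applies each search/replace pair in turn, so later pairs see the output of earlier ones. The normalize_query() helper is hypothetical and only mimics that one step; it is not TNTSearch's actual code, and I have not reproduced the library's current arrays here.

```php
<?php
// Hypothetical helper (not part of TNTSearch): mimics only the initial
// search-and-replace step, using the reordered arrays proposed above.
function normalize_query(string $query): string
{
    $bad  = [' or ', ' ', '-'];
    $good = ['|', '&', '~'];

    // str_replace() with array arguments runs the pairs one after another,
    // left to right, so the order of the entries affects the result.
    return str_replace($bad, $good, $query);
}

// Both negation spellings collapse to the same normalized string.
echo normalize_query('foo -bar'), PHP_EOL;   // foo&~bar
echo normalize_query('foo ~bar'), PHP_EOL;   // foo&~bar
echo normalize_query('foo or bar'), PHP_EOL; // foo|bar
```

With this ordering the space becomes & before the dash becomes ~, so "foo -bar" and "foo ~bar" reach the tokenizer as the same string. Whether that is the semantically right normalization is still the open question.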
