I've just tried out my first implementation of TNTSearch, so bear with me!
I'd been struggling with strange results from boolean searches using the "foo -bar" syntax, seeing results that were clearly inaccurate. Glancing at the source code, I noticed that the tilde (~) was also used for excluding words, so I tried the same query using "foo ~bar", expecting the same result set, but got totally different (and more accurate) results.

While debugging, I noticed that the output produced by Expression::lex was different in the two cases:

    $ex = new Expression();
    $tokens_1 = $ex->lex("foo -bar");
    $tokens_2 = $ex->lex("foo ~bar");
The problem is that $tokens_1 != $tokens_2.
That simple inconsistency is the basic bug for this report. But I am aware of #246, and I cannot speak to whether or not this fix might address any aspect of that issue. I do know that $tokens_1 was producing wildly inaccurate results, and $tokens_2 produced much better matches, so there is definitely a difference in the results they produce.
A quick look at the code for lex seems to indicate that the inconsistency in parsing can be rectified by changing the initial search and replace arrays into a different order:

    $bad = [' or ', ' ', '-'];
    $good = ['|', '&', '~'];
This ends up producing the same token array in both situations. I'm just not familiar enough with the implications of that change (it's consistent, but is it right?) to go straight to a pull request. (But happy to if this is confirmed.)
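
To illustrate why the order of those arrays matters, here is a minimal sketch using plain str_replace(), which applies each search/replace pair in sequence. The normalize() helper is hypothetical and only mimics the substitution step, so it returns a string rather than the token array that Expression::lex produces. The second pair of arrays is likewise an assumption, made up to show an order-dependent mismatch, and is not copied from the TNTSearch source.

```php
<?php
// Hypothetical helper: mimics only the search/replace step discussed above.
// It is not Expression::lex and does not build a token array.
function normalize(string $query, array $bad, array $good): string
{
    // str_replace() walks the arrays in order, feeding the result of
    // each replacement into the next one.
    return str_replace($bad, $good, $query);
}

// Ordering proposed in this report: spaces first, then the minus sign.
$bad  = [' or ', ' ', '-'];
$good = ['|', '&', '~'];

$a = normalize('foo -bar', $bad, $good);
$b = normalize('foo ~bar', $bad, $good);
var_dump($a === $b); // both queries normalize to the same string

// Assumed, made-up ordering in which a " -" pattern is handled before " ".
$hypBad  = [' or ', ' -', ' '];
$hypGood = ['|', '~', '&'];

$c = normalize('foo -bar', $hypBad, $hypGood);
$d = normalize('foo ~bar', $hypBad, $hypGood);
var_dump($c === $d); // the two queries no longer agree
```

With the proposed ordering, the space becomes & before the minus sign becomes ~, so both spellings of the exclusion end up identical. Whether that combined form (AND followed by NOT) is the intended meaning is the open question above.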