Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

most longest prefix value exist in the column value but it doesn't intersect with other columns values in the query that we need part of query retrieval #300

Open
fouadazem opened this issue Jul 10, 2022 · 7 comments

Comments

@fouadazem
Copy link

lets suppose we have
we have part of our query select R from Table where Longest(A=112233) and (B>5 and D<10)
A B D R
11223 1 3 15
1122 6 9 12

since the longest appears where is B and D doesn't fit base on our query.

so the case above we expect to receive the second line even we don't have the longest match for the second line.
but as result of the problem no records received part of the query output.

@npgall
Copy link
Owner

npgall commented Jul 10, 2022

Should that be an or() query in the middle instead?

@fouadazem
Copy link
Author

fouadazem commented Jul 10, 2022 via email

@npgall
Copy link
Owner

npgall commented Jul 10, 2022

I'm not sure that I understand what you need.

Is it that you effectively have 2 queries? By default you want results to be returned for the first query only. However if there are no results for the first query then you want instead the results for the second query to be returned?

If thats what you need could you just do 2 separate queries where your application has that logic?

If you're looking for something else, could you provide a test case which shows what you're looking for (e.g. assertions on the results)? Or else maybe you explain how it would be achieved in SQL?

@fouadazem
Copy link
Author

fouadazem commented Jul 10, 2022 via email

@fouadazem
Copy link
Author

fouadazem commented Jul 11, 2022 via email

@npgall
Copy link
Owner

npgall commented Jul 11, 2022

I think I understand what you're looking for now - that was a good example, thank you.

You are right that the longest prefix query will only match the longest one.

I think there are a few ways to solve your problem:

Option 1
Execute a query that only refers to your constraints on the B and C columns/attributes, and then order the results in descending order of the length of the match for column/attribute A.

In this case you can do the ordering/sorting of the results inside or outside of CQEngine.

To do it inside CQEngine, you might need to implement a special/custom attribute MATCH_LENGTH (or similar) which will return the length of the match. And, since the attribute will need to know the string that it is comparing the stored value with, you can pass that string to it as a query option. You can invoke it by supplying query option orderBy(descending(MATCH_LENGTH)) (and supply the input string as another query option as well).

This will leverage any indexes you have on B and C, but the sorting/ordering will not leverage indexes at all. (So this option is not a panacea, hence why I've mentioned other options below. See my discussion of tradeoffs below too.)

I do think CQEngine could better support you if you want to go with this option. If you get that working and you can contribute the code for the attribute you will write, I (or you via pull requests 😉) could use it as a basis for better supporting this use case in CQEngine.

Option 2
Don't use CQEngine for this, use the ConcurrentRadixTree directly instead.

This will allow you to use the radix tree as an index on A to retrieve the matches for your input string. You can then filter those matches to ensure they match B and C.

This will leverage an index on A, but not on B or C.

Option 3
Improve support in CQEngine to better leverage the ConcurrentRadixTree for queries like this on A, when an "index" ordering strategy is selected. This would basically merge support into CQEngine for Option 2.

Tradeoffs
Unfortunately what you really want to do is not supported well with CQEngine currently. The main challenge is that this use case is unlike most others, because an index on A which could support your query, would actually match every object in the collection. That's because it seems to be a kind of "soft" constraint that is used for ordering more than filtering. So therefore it would not integrate well with the core of the query engine which is based on set theory. However it could be integrated better with CQEngine's index ordering strategy.

Which approach will work best for you? It depends on the X% of the collection that is matched by your queries on B and C. If X is small, something like Option 1 should work well. If X is large, something like Option 2 or 3 would be better.

Hope that helps. And I hope I've understood your use case properly and that the above makes sense!

@fouadazem
Copy link
Author

fouadazem commented Jul 18, 2022

Hi Niall @npgall ,

Thanks for explanation and details and your quick response , appreciate that!

we are using the CQEngine version 3.5.0 .

first , would inform that we choose the option #3
Improve support in CQEngine to better leverage the ConcurrentRadixTree for queries like this on A, when an "index" ordering strategy is selected. This would basically merge support into CQEngine for Option 2.

how to get base specific field name values from the ResultSet ? i do expect to give field name and get me the values of this field name from the result set that m planning to do on CollectionQueryEngine : retrieveWithIndexOrdering ?

Best Regards Fouad

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants