Add better sorting of completions #1165

schmeic · 2024-06-13T01:52:12Z

Description

Sort completions by match point, then match length, then item text.

Note that match length is equal to the length of the search text for non fuzzy matches, and equal to the length of the match group for fuzzy matches.

This means that matches that start with the search text will be displayed first, followed by matches that contain the search text, and then fuzzy matches.

Sorting by item text when both match point and match length are the same for matched completions ensures that completions that start with the search text will be in alphabetical order. For example, searching for fr will display FROM before FROM_UNIXTIME.

Checklist

I've added this contribution to the changelog.md.
I've added my name to the AUTHORS file (or it's already there).

rolandwalker · 2024-06-13T12:03:25Z

Wow, you weren't joking about the tests.

From the English description, the algorithm sounds great.

I may have time to experiment with this on the weekend.

schmeic · 2024-06-13T18:37:20Z

Yeah, that's why I wanted to make sure the algorithm would be accepted before working on the tests. From my experiments with this new sorting, I've found that it has always put what I consider the most relevant results at the top.

The funny thing was that the find_matches(...) function was already storing all 3 of the parameters I used for sorting, and then ignoring everything but the item text at the end when creating the completions generator. That's what gave me the idea for the sorting algorithm.

rolandwalker · 2024-06-13T19:23:16Z

Let's let the PR hang around for some days and see if it attracts any objections.

schmeic · 2024-06-13T19:28:33Z

Ok. Maybe you'll have some time to experiment with it as well. I'm hoping it will attract some "likes"...

rolandwalker · 2024-06-13T21:48:47Z

Should the proposed change only apply when if not smart_completion: ? It would see that the point of smart_completion is to give categories in category order.

rolandwalker · 2024-06-13T21:52:11Z

mycli/sqlcompleter.py

+
+        def sorted_completions(matches):
+            # sort by match point, then match length, then item text
+            matches = sorted(matches, key=lambda m: (m[1], m[0], m[2].lower()))


As long as the sort is casefolding it might as well also do

m[2].lower().strip('`')

Yeah, I can certainly add that. Pretty sure I've seen someone else complain in another issue about the back tick messing with sorting.

schmeic · 2024-06-13T21:53:56Z

I don't think so, I personally would rather have a completion at the top that best matches my search text regardless of the category. Have to tried it out yet? Maybe you will agree after some experimenting.

I actually first tried doing the same sort in the find_matches(...) function, so per category, and the results were not ideal.

rolandwalker · 2024-06-13T22:19:52Z

I'm trying it now. It would seem that the config file lies when it claims

# Enables context sensitive auto-completion. If this is disabled then all
# possible completions will be listed.
smart_completion = True

because when set to False, table names and such are simply not returned.

It is rather helpful to see the completions in the "smart" ordering when nothing has yet been typed, like maybe

            if len(word_before_cursor) != 0 and word_before_cursor[-1] != '(':
                matches = sorted(matches, key=lambda m: (m[1], m[0], m[2].lower().strip('`')))

schmeic · 2024-06-13T22:51:59Z

Regarding the smart_completion = False setting, I think what you're seeing is a bug. You need to first do:

use <db name>

and then you should see table and column names. It doesn't work if you use a DSN to select the db.

By the way, I think it would be really useful to do fuzzy matching when not doing smart completion. Maybe this can be implemented as a setting?

rolandwalker · 2024-06-13T23:15:50Z

Why closed?

schmeic · 2024-06-13T23:30:46Z

Sorry, didn't realize I did that. I deleted a comment and maybe that closed it?

How can I reopen?

schmeic · 2024-06-13T23:32:47Z

Accidentally closed.

rolandwalker · 2024-06-14T11:31:47Z

Regarding the smart_completion = False setting, I think what you're seeing is a bug. You need to first do:
use
and then you should see table and column names. It doesn't work if you use a DSN to select the db.

Your suggestion works! But we really need to fix that bug.

By the way, I think it would be really useful to do fuzzy matching when not doing smart completion. Maybe this can be implemented as a setting?

Yes, though one issue is that I adore settings, and @amjith really does not. For example for the entire functionality in this PR, I think it would be fantastic if ~/.myclirc added

completion_aggresive_sort = True

or any other property name that makes sense.

I personally would rather have a completion at the top that best matches my search text regardless of the category.

Right. I personally would also prefer to use your algorithm. But we have to think of the most general usecase and the default expectation. That is why I proposed to enable it outside of the smart_completion codepath.

amjith · 2024-06-14T14:21:32Z

Let me catch up on the PR.

I'm not opposed to adding a config option but if the new completion behavior feels logical we can switch to it by default. Let me read through the proposal and try it out locally.

schmeic · 2024-06-14T17:34:19Z

I think this new sort is much better than the current default, so it seems like most people would prefer it. By the way, I just upgraded to the latest pip version of 1.27.2, and the sorting is now much worse than before. Now it appears to be returning results in alphabetical order, where as before the order was more by relevance (previous version I used was 1.26.1).

So now in 1.27.2, when I do this:
SELECT * FROM e
I see a bunch of tables starting with a at the top of the results. Previously, the top results were tables starting with e.

Also, there are other open issues I've seen where people complain about the fuzzy matching and want an option to turn it off because the fuzzy matches are often at the top of the results. My algorithm will also fix this since the fuzzy matches will always be below exact matches.

amjith · 2024-06-15T21:22:56Z

I'm still testing things out. Notes so far, I like that FROM is listed about FROM_UNIXTIME. There is a crash that happens when I type SOURCE . Haven't dug into it yet, if you have the time please take a look.

schmeic · 2024-06-17T18:24:25Z

I'm still testing things out. Notes so far, I like that FROM is listed about FROM_UNIXTIME. There is a crash that happens when I type SOURCE . Haven't dug into it yet, if you have the time please take a look.

Ok, I see the issue. The find_files() method is returning a Completion generator and the code is expecting an array of match tuples.

Any idea what happened to the sorting in the latest version (1.27.1) ? Why is it now alphabetical?

schmeic · 2024-06-27T19:22:51Z

Still waiting on feedback for this one...

amjith · 2024-06-27T19:33:32Z

I've tested it locally and I'm fine with the ordering. Happy to merge if you can fix the tests.

schmeic · 2024-06-27T19:38:46Z

Cool, I'll work on fixing the tests.

schmeic · 2024-07-09T21:56:31Z

Is there a way to run tests individually? I'm having a bit of trouble understanding the test setup.

amjith · 2024-07-09T23:23:39Z

We just use pytest. You can run pytest -k <name of the test> to run just a single test.

For example:

pytest -k test_suggested_multiple_column_names_with_dot will run only that one single test.

You can also specify partial names to select multiple tests.

pytest -k test_suggested will select all the tests that start with that name.

schmeic · 2024-07-10T00:28:05Z

Cool, thanks! I don't use python that much, so this is all pretty new to me. First test fixed...

schmeic · 2024-08-07T22:16:45Z

I've fixed all of the unit tests, but when I pushed my changes, I get an email saying the "PR run failed ...".

How do I fix this?

schmeic · 2024-08-19T20:08:12Z

Looks like the error is related to the test command, which is not directly related to this pull request.

"The test command is disabled and references to it are deprecated."

What's the status of this pull request?

rolandwalker · 2024-08-19T21:52:17Z

What happens if you change the line setuptools in requirements-dev.txt to setuptools<=71.1.0 ?

schmeic · 2024-08-20T17:56:44Z

Better, but why are most of these checks getting cancelled? Also, the one that failed had no errors:

8 features passed, 0 failed, 0 skipped
32 scenarios passed, 0 failed, 0 skipped
139 steps passed, 0 failed, 0 skipped, 0 undefined
Took 0m16.305s
Error: Process completed with exit code 1.

rolandwalker · 2024-08-20T21:36:13Z

First pytest is run, then behave is run. Then the CI run exits with a failure if either one failed. In this case, pytest is showing failures:

test/test_clistyle.py ss                                                 [  0%]
test/test_completion_engine.py ......................................... [ 15%]
...................x...................................................  [ 41%]
test/test_completion_refresher.py ....                                   [ 42%]
test/test_config.py ..............                                       [ 47%]
test/test_dbspecial.py ....                                              [ 49%]
test/test_main.py ...................                                    [ 56%]
test/test_naive_completion.py .....                                      [ 57%]
test/test_parseutils.py .............X........................           [ 71%]
test/test_prompt_utils.py .                                              [ 72%]
test/test_smart_completion_public_schema_only.py .................F....  [ 80%]
test/test_special_iocommands.py ...................                      [ 86%]
test/test_sqlexecute.py ...................................              [ 99%]
test/test_tabular_output.py F                                            [100%]

I believe the other CI runs in the matrix were canceled as soon as one failed.

schmeic · 2024-08-21T00:53:43Z

I realized that the reason one of the tests is failing is because my sort algorithm is comparing lower cased matches, which doesn't work for the special commands starting with \, since these are case sensitive. How should I handle this?

I feel like there shouldn't even be empty string completion, especially since it isn't really empty string, but rather blank string and requires entering something like a space to trigger it.

It's also strange that when I run ./setup.py test I get an error, but when I run the following command, it passes:
pytest -vv -k test_empty_string_completion test/test_smart_completion_public_schema_only.py

schmeic · 2024-09-12T19:10:08Z

Still waiting to hear back on how to handle this one failing unit test. Doesn't seem worth complicating the sort method just to support these special commands, and fix a single unit test.

schmeic · 2024-10-09T23:34:01Z

Pretty disappointed not to get any feedback on this, I spent a lot of time trying to fix all these unit tests and I think it's really close.

I wanted to contribute to this project, but I guess I can just use my own branch.

amjith · 2024-11-03T16:44:14Z

Sorry @schmeic. Life got ridiculous for the past few months. I'm taking a look now.

amjith · 2024-11-03T17:09:09Z

mycli/sqlcompleter.py

+        def sorted_completions(matches):
+            # sort by match point, then match length, then item text
+            matches = sorted(list(matches), key=lambda m:
+                             (m[1], m[0], m[2].lower().strip('`'), m[2].startswith('`')))


I think this strip is causing the tests to fail. The backticks are used to escape the table name if it happens to be a keyword. For instance if someone names the table table then it going to conflict with the keyword since table is a keyword. This adds backticks around it to escape the string to avoid the conflict.

See the comment by @rolandwalker on June 13. The idea is that a table/field surrounded by backticks should match the same one without backticks. I've tried the change the tests to work with this.

amjith · 2024-11-03T18:07:17Z

I noticed that the sort order seems to have messed up the ordering when it completes column names in a table.

This is the completion from your branch. The columns are not listed at the top but instead pushed down below the keywords:

This is the completion order from the main branch, the columns from the table are listed at the top.

amjith · 2024-11-04T03:59:16Z

@schmeic I went down a rabbit hole here. The reason we don't sort the completion candidates is because we have the keywords listed in a curated order that is based on popularity instead of lexicography.

But I went back to your original motivation for this PR. When you type SELECT * FR it should list FROM instead of FROM_UNIXTIMESTAMP. I modified the completion engine a little bit to allow only keywords (not functions) which fixes the issue superficially.

I'll open a new PR and tag you on it so you could take it for a spin and let me know what you think.

amjith · 2024-11-04T04:10:49Z

#1175

schmeic · 2024-11-04T18:30:02Z

@schmeic I went down a rabbit hole here. The reason we don't sort the completion candidates is because we have the keywords listed in a curated order that is based on popularity instead of lexicography.

But I went back to your original motivation for this PR. When you type SELECT * FR it should list FROM instead of FROM_UNIXTIMESTAMP. I modified the completion engine a little bit to allow only keywords (not functions) which fixes the issue superficially.

I'll open a new PR and tag you on it so you could take it for a spin and let me know what you think.

The FROM coming after FROM_UNIXTIMESTAMP was just one example of what I thought was wrong with the completions. Why should columns be at the top? There are plenty of times when I'm looking for a keyword, and don't want to see all those unsorted column names at the top. I think the completions are a lot more intuitive when everything is sorted and works the same.

amjith · 2024-11-05T17:10:23Z

The FROM coming after FROM_UNIXTIMESTAMP was just one example of what I thought was wrong with the completions.

I understand that. That's why I said my PR is a superficial fix.

Why should columns be at the top?

Imagine this scenario:
You just connected to a fresh DB and you don't know the tables or the columns in each table. You start typing SELECT * FROM and you get the list of tables at the top so you navigate using arrow keys to get an idea of the tables. Then you pick the table from the suggestion and continue with your query SELECT * FROM users WHERE , you don't know the columns in the table but after the WHERE clause you get a list of column names from the users table. You can find all the columns using the arrow keys and pick the one that makes the most sense and so on. If we take the approach you've suggested the user has to know the columns in the table in order to start typing a few letters to get the desired completion. It thwarts the ability discover the schema.

I think the current implementation is a more human centric approach with UX. In fact, the keywords are ordered (I think I mentioned this in my previous comment as well) to make sure the most common keywords are listed at the top. This is also meant to facilitate a user to pick things from the completion menu rather than continue to type to get to their desired keyword.

My goal was to ensure we can provide a user friendly UX for people who pick from a menu without compromising the efficiency of someone who likes to type (or knows the schema full well). But you pointed out a very valid flaw with the FROM and FROM_UNIXTIMESTAMP that was annoying for a user who prefers to type rather than use arrows. So I decided to fix that tactically.

I hope my counterpoint convinces you of the reason for sticking with the current approach. As always I'm open to revisiting the design if there are compelling reasons to do so.

schmeic · 2024-11-05T18:41:43Z

Ok, I guess I see your point about "discovering the schema", but I would argue that if you want to do that, use:
DESC <table name>

Once you start typing any characters, the current approach will typically provide worse results than my approach, especially when you want a keyword instead of a column. So, I still feel like my approach is better, since it will provide more relevant results the majority of the time. It seems pretty inefficient to try and discover the schema using autocomplete.

As far as your point about keyword ordering, they may be listed with more common ones at the top, but what if I'm looking for a less common one? My approach will find it much more efficiently. However, I think the keyword search should be doing fuzzy searching when using my approach. Currently, when I search for WHERE, I need to type the first 4 chars to get past WHEN. If we enable fuzzy searching, I would be able to just type wr.

amjith · 2024-11-07T05:41:52Z

Ok, I guess I see your point about "discovering the schema", but I would argue that if you want to do that, use:
DESC
The whole point of mycli is to not break your flow. When I am typing a query I don't want to stop in the middle and run a DESC table to continue my query. This becomes especially relevant when writing JOIN queries.
Adding fuzzy matching for keywords is not a bad idea. Let me see if I can prototype that in a PR.

amjith · 2024-11-07T06:36:08Z

I've updated #1175 to enable fuzzy matching for keywords. Can you give it a try?

You can install it as follows:

pip install -U https://github.com/dbcli/mycli/archive/refs/heads/fix-select-star.zip

schmeic · 2024-11-07T18:16:26Z

Sounds like you prefer good discovery over relevant results when text is typed. I want to find the completion I'm looking for with the fewest keystrokes. I'm not sure if it's possible to have both, let me know if you have any ideas on how to make it work.

I actually added fuzzy keyword matching on my branch, it works better.

Add better sorting of completions

e97ca2d

rolandwalker reviewed Jun 13, 2024

View reviewed changes

schmeic closed this Jun 13, 2024

schmeic reopened this Jun 13, 2024

Fix issue with file name completions

2bec7e8

Kevin Schmeichel added 3 commits August 2, 2024 11:51

Remove back tick before sorting completions

9409239

Fix some unit tests to work with new completing sorting

fa150e2

Fix how the Completion position is set

80d1945

Kevin Schmeichel added 3 commits August 6, 2024 10:42

Prevent duplicate completions

8139ab1

Fix smart_completion_public_schema_only unit tests

fb44577

Fix sql_output unit test

d532221

Fix an issue with casing of completions

60c9677

Try using an older version of setuptools for unit tests

77cd828

Comment out test_empty_string_completion for now

f5b674b

amjith reviewed Nov 3, 2024

View reviewed changes

Add better sorting of completions #1165

Are you sure you want to change the base?

Add better sorting of completions #1165

Conversation

schmeic commented Jun 13, 2024 • edited Loading

Description

Checklist

rolandwalker commented Jun 13, 2024

schmeic commented Jun 13, 2024

rolandwalker commented Jun 13, 2024

schmeic commented Jun 13, 2024

rolandwalker commented Jun 13, 2024

rolandwalker Jun 13, 2024

Choose a reason for hiding this comment

schmeic Jun 13, 2024

Choose a reason for hiding this comment

schmeic commented Jun 13, 2024 • edited Loading

rolandwalker commented Jun 13, 2024

schmeic commented Jun 13, 2024 • edited Loading

rolandwalker commented Jun 13, 2024

schmeic commented Jun 13, 2024

schmeic commented Jun 13, 2024

rolandwalker commented Jun 14, 2024

amjith commented Jun 14, 2024

schmeic commented Jun 14, 2024 • edited Loading

amjith commented Jun 15, 2024

schmeic commented Jun 17, 2024 • edited Loading

schmeic commented Jun 27, 2024

amjith commented Jun 27, 2024

schmeic commented Jun 27, 2024

schmeic commented Jul 9, 2024

amjith commented Jul 9, 2024

schmeic commented Jul 10, 2024

schmeic commented Aug 7, 2024 • edited Loading

schmeic commented Aug 19, 2024

rolandwalker commented Aug 19, 2024

schmeic commented Aug 20, 2024

rolandwalker commented Aug 20, 2024

schmeic commented Aug 21, 2024 • edited Loading

schmeic commented Sep 12, 2024

schmeic commented Oct 9, 2024

amjith commented Nov 3, 2024

amjith Nov 3, 2024

Choose a reason for hiding this comment

schmeic Nov 4, 2024

Choose a reason for hiding this comment

amjith commented Nov 3, 2024

amjith commented Nov 4, 2024

amjith commented Nov 4, 2024

schmeic commented Nov 4, 2024

amjith commented Nov 5, 2024 • edited Loading

schmeic commented Nov 5, 2024 • edited Loading

amjith commented Nov 7, 2024

amjith commented Nov 7, 2024

schmeic commented Nov 7, 2024

schmeic commented Jun 13, 2024 •

edited

Loading

schmeic commented Jun 13, 2024 •

edited

Loading

schmeic commented Jun 13, 2024 •

edited

Loading

schmeic commented Jun 14, 2024 •

edited

Loading

schmeic commented Jun 17, 2024 •

edited

Loading

schmeic commented Aug 7, 2024 •

edited

Loading

schmeic commented Aug 21, 2024 •

edited

Loading

amjith commented Nov 5, 2024 •

edited

Loading

schmeic commented Nov 5, 2024 •

edited

Loading