Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add custom joins #691

Open
wants to merge 27 commits into
base: master
Choose a base branch
from
Open

Conversation

cmacmackin
Copy link
Contributor

@cmacmackin cmacmackin commented Nov 19, 2023

This introduces the ability to JOIN schemas in aggregations ON fields other than the page ID. It required some significant refactoring of the SQL generation so will need extensive review.

Fixes #269, fixes #285, fixes #598

For the time being, joins are only supported on single-valued columns. Multi-valued columns represented another layer of complexity on what is already an extremely complicated set of changes. It would probably significantly delay me finishing this work, so I thought it would be best to get what I currently have merged in. Someone can always add support later.

This work involved the following changes:

  • Introducing additional config syntax to specify what fields to join schemas on (plus associated code to process and store this data)
  • Setting the "context" for all of the fake columns used by Search.
  • Moving logic for handling multi-valued fields to within the select() method
  • Moving datatype-specific select logic out of select() and into selectCol() (which gets called from select())
  • Adding a getSqlCompareValue() method to the datatype classes. This returns an expression for the contents of the column to be used either in filtering or joining. If need be, it will also insert any other conditional expressions needed into a query's WHERE clause.
  • Adding a getAdditionalJoinForComparison() method for the datatype classes. This returns information needed to perform any extra joins required when filtering or joining data. Returning the arguments, rather than setting up the join itself, allows the order of joins to be chosen appropriately for the context.
  • Adding wrapValues()/wrapValue() methods to the datatype classes. These will wrap a value (or array of values) that is being compared against in any additional logic that is needed (e.g., making it lower-case or converting it to digits).
  • Implement filter() only on AbstractBaseType and refactor it to use getSqlCompareValue() and getAdditionalJoinForComparison() for the type-specific logic.
  • Introducing a joinCondition() method to construct alternative JOIN ON clauses when generating SQL queries.
  • Introducing a joinConditionIfAdditionalJoin() method to allow any further type-specific logic for complicated joins (i.e., page titles)
  • Adding logic to a number of existing methods to handle custom join conditions.
  • In the QueryBuilder::addLeftJoin(), choose insertion order based on all tables referenced in the ON clause (not just the left table being joined against).
  • Updates to existing tests plus writing new tests for the new features (especially joining on page titles, as this is the most complicated case and also likely to be one of the most useful).

This included adding some arguments to old tests and writing a few new
ones to test custom join logic.
Now just need to implement the joins when performing searches...
This supports types that need no special processing of the column data
returned from the SQLite database and which are single-valued. Further
work will be needed to support multi-valued columns. I also need to
implement some more sophisticated logic for some of the more complex
data types. Basically, any type which overrides the `filter` method
will also need an override for the purposes of JOINing.
Moving the left-join needed for multi-columns into a separate method
should prove useful for filtering and joining as well.
After my refactoring, prefixes and postfixes are only registered as
values once when comparing columns against multiple RHS values. This
will work just fine but means that things differ slightly from
existing tests.
@splitbrain
Copy link
Member

@cmacmackin awesome! It will probably take us a while to review this but I appreciate the effort.

This allows the order of the additional join to be varied depending on
whether filtering or joining primary data.

Also renamed getSqlConstantValue to wrapValue, which is more
descriptive.
This is known not to work when the RHS of the JOIN is a page title;
the SQL ends up broken somehow. It does work when only the LHS is a
title though
It now respects order not only for the existing table being joined
against, but also for any other tables referenced in the ON
clause. This was necessary to get custom joins against page titles to
work properly.
These both represent complicated joins which actually require joining
against more than 1 table. The page title one is particularly
complicated, as it should actually match against two possible
values (id or title).
@cmacmackin cmacmackin marked this pull request as ready for review November 30, 2023 22:49
@cmacmackin
Copy link
Contributor Author

@splitbrain I've now finished my work on this. I appreciate that this is a very complicated pull request that will take some time to review. I'm not really in a rush. This was just something I started working on months ago and figured I should try to get it finished before tackling the other issues I recently raised.

I've written tests for new features that I've added but I can't speak for how well the existing test-suite covers some of my refactoring.

@cmacmackin
Copy link
Contributor Author

Also, I'm not experienced working with databases, so it is possible that some of the queries could be structured better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants