-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cherry-pick orca and gpopt commits till 16Jun2022 #688
Cherry-pick orca and gpopt commits till 16Jun2022 #688
Conversation
|
I will fix tests, see failed pipelines |
9085246
to
63511a5
Compare
this is because orca planner not support parallel feature. |
63511a5
to
9dbc53d
Compare
ec0fd40
to
0d1c0a1
Compare
If SubqueryAll operator has a Subquery scalar child then we should treat the subquery context as EsqctxtValue, even if the SubqueryAll comes from the predicate. Let's use the following example: ``` SELECT * FROM foo WHERE (SELECT a FROM foo limit 1) = ALL(SELECT b FROM bar); +--CScalarSubqueryAll(=)["b" (8)] |--CLogicalGet "bar" ("bar"), +--CScalarSubquery["a" (16)] +--CLogicalLimit <empty> global |--CLogicalGet "foo" ("foo"), |--CScalarConst (0) +--CScalarCast +--CScalarConst (1) ``` Setting subquery context EsqctxtValue signals to transform the scalar subquery into a left outer apply as opposed to an inner apply in function FGenerateCorrelatedApplyForScalarSubquery(). A left outer apply is necessary to preserve non-matching rows in the output (inner apply would have filtered them out) which are required to correctly determine the ALL_SUBLINK result. Co-authored-by: Orhan Kislal <[email protected]>
Issue is that with the overload we cannot properly raise while passing a variadic integer argument. Instead it will interpret the argument as severity_level. This leads to bogus error messages that don't make sense like: "Attribute number 23761064 not found in project list" Co-authored-by: David Kimura <[email protected]> Co-authored-by: Orhan Kislal <[email protected]>
Following should produce a HashAggregate alternative: ``` CREATE TABLE t(a int); INSERT INTO t VALUES (1), (2), (3); ANALYZE t; SET optimizer_enable_groupagg=off; EXPLAIN SELECT COUNT(DISTINCT a) FROM t GROUP BY (a); ``` Prior to this commit, the query alternatives added in CXformSplitDQA contain aggregates with distinct agg func. However, executor does not supported that for AGG_HASHED implementation. CXformGbAgg2HashAgg guards against constructing such a plan. This commit enables HashAggregate plans by constructing a two-stage aggregate alternative in CXformSplitDQA that does not include a distinct agg func.
When deriving distribution spec for `CPhysicalSplit`, ORCA checks if its disjoint with the hashed colrefs aswell as the equivalent hashed colrefs (PdshashedEquiv()). For certain nested queries with a JOIN on the same column, it is possible that the equivalent hashed colrefs also have other equivalent hashed colrefs. ORCA did not consider such cases and would skip recursively checking for hashed-equivalent colrefs and include it to check if its disjoint with the colrefs being modified. Thus generating a plan without a Redistribute on top of a Split node when updating the distribution column. This would lead to wrong results. This commit fixes the issue. Co-authored-by: gpopt <[email protected]> Close #13416
This commit adds a rewindablility spec for PhysicalAgg if it is created by an NL Join. This addition ensures that there is a materialize on top of the aggregate which improves the performance. Co-authored-by: Ekta Khanna <[email protected]>
Commit 6f175df1 added support for subplan test expression containing scalar ident on left side. But in order to properly support that scenario we needed to account for cast and coerceio as well.
When there is a param, `ExprCollation` relies on paramcollid to figure out the resultcollid. We weren't setting it while creating a param. This commit fixes it. Co-authored-by: Sambitesh Dash <[email protected]> Co-authored-by: Bhuvnesh Chaudhary <[email protected]>
Refactoring UpgradeRelLockIfNecessary() and CondUpgradeRelLock(). Currently CondUpgradeRelLock() always needs to open and close table to check if it's an AO table. Try to avoid that by passing the already opened table back to the caller. Since 55679b7, CondUpgradeRelLock() no longer needs to be called elsewhere other than UpgradeRelLockIfNecessary(), so we can just remove it to make the code much simpler.
This commit removes the function UpgradeRelLockAndReuseRelIfNecessary and clean up the implementation of CdbTryOpenTable. Now the logic is: 1. open the relation using some lockmode 2. if need correct lockmode (AO|AOCO with GDD enabled), then close and reopen it using upgraded lockmode 3. if need lockUpgraded information, deduce this using actually mode with request mode. Also we remove the post-check for AO table not upgrade, because under the new implementation that's impossible to happen and NOTE Greenplum does not support change table's storage kind yet (Even support some day, there will not be any problem). The other place that uses UpgradeRelLockAndReuseRelIfNecessary is gpdbwrappers.cpp:GPDBLockRelationOid. Before this commit, it passes nullptr to UpgradeRelLockAndReuseRelIfNecessary, thus will open, lock, close the relation, and then lock the relation oid again. This commit implement this by just calling LockRelationOid since there is no need to test lock-upgrade here: 1. Greenplum only need to upgrade lock level for target relation of DELETE | UPDATE statement 2. Parse-Analyze stage will always handle root partition correctly and store the lockmode in RTE 3. ORCA does not support DML on partiton table yet, and in future when implementing this, ORCA should use the lockmode in RTE of root so no need to deduce lock-upgrade 4. Index does not have lock-upgrade issue.
Found while rebasing Jesse's WIP refcount branch that is attempting to remove manual refcounting. Example demonstrating leak: ``` $ gpconfig -c optimizer_use_gpdb_allocators -v off --skipvalidation $ gpstop -air test=# CREATE TYPE myType as (a int, b int); test=# CREATE OR REPLACE FUNCTION myFunc() RETURNS myType IMMUTABLE as $$ SELECT 1, 2 $$ LANGUAGE SQL; test=# EXPLAIN SELECT * FROM myFunc(); ```
0d1c0a1
to
ee1876a
Compare
Rebased to resolve merge conflicts. Still need to fix tests and some review issues. I will take some time, because some test issues do not reproduce in my local development environment. I will let you know when I'm finished. |
ee1876a
to
2353066
Compare
2353066
to
849ce00
Compare
Original commit "Clean up logic in CdbTryOpenTable" should be fixed to correct support Cloudberry features: 1. CB has special singlenode mode. In that mode we act as if the global deadlock detector is enabled and try to use RowExclusiveLock 2. It is better to use RelationIsNonblockRelation macro instead of RelationIsAppendOptimized because of Commit b413e0b.
849ce00
to
ca2ac10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Important notes:
The file select_parallel_optimizer.out was deleted in the "Initial Cloudberry code dump 2" commit. The reason - this is because orca planner not support parallel feature
I cherry-picked in time-based fashion, from older to newer, and skipped "fix incorrect RTE's rellockmode in the process of translating DX" because I didn't know if the issue still existed or not.
Some fixes were already merged Cherry-pick ORCA InPlaceUpdate implementation #603.
Change logs
2nd iteration:
gporca
Apr 15, 2022 Fix subquery all subquery context (#13377) open-gpdb/gpdb@f5e3a58 - conflicts with 22fc48b
Apr 28, 2022 Remove overload raise function with severity_level (#13376) open-gpdb/gpdb@94db8a1
May 10 [ORCA] Enable more HashAggregate alternative plans (#13421) open-gpdb/gpdb@0276e5c
Jun 1, 2022 Fix missing Redistribute on top of Split Update with Orca open-gpdb/gpdb@f47b262
Jun 3, 2022 Materialize aggregations in NL Join inner child open-gpdb/gpdb@6420b60
gpopt
Mar 23, 2022 Fix testexpr translation of outer expr (#13296) open-gpdb/gpdb@fd9e35b
Mar 24, 2022 Fix paramcollid for param in ORCA translator (#13302) open-gpdb/gpdb@d03632b
Apr 27, 2022 Avoid opening table in CondUpgradeRelLock() when possible open-gpdb/gpdb@73e50fb - conflicts with 16b0b9d
May 6, 2022 Clean up logic in CdbTryOpenTable. open-gpdb/gpdb@43a5e58 - dangerous, need more thorougly review
Jun 16, 2022 [ORCA] Fix memory leaks in translator (#13656) open-gpdb/gpdb@6bbef9e
Why are the changes needed?
Describe why the changes are necessary.
Does this PR introduce any user-facing change?
If yes, please clarify the previous behavior and the change this PR proposes.
How was this patch tested?
Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.
Contributor's Checklist
Here are some reminders and checklists before/when submitting your pull request, please check them:
make installcheck
make -C src/test installcheck-cbdb-parallel
cloudberrydb/dev
team for review and approval when your PR is ready🥳