Cherry-pick orca and gpopt commits till 16Jun2022 #688

leborchuk · 2024-10-29T12:29:55Z

Important notes:

The file select_parallel_optimizer.out was deleted in the "Initial Cloudberry code dump 2" commit. The reason - this is because orca planner not support parallel feature
I cherry-picked in time-based fashion, from older to newer, and skipped "fix incorrect RTE's rellockmode in the process of translating DX" because I didn't know if the issue still existed or not.
Some fixes were already merged Cherry-pick ORCA InPlaceUpdate implementation #603.

Change logs

2nd iteration:
gporca
Apr 15, 2022 Fix subquery all subquery context (#13377) open-gpdb/gpdb@f5e3a58 - conflicts with 22fc48b
Apr 28, 2022 Remove overload raise function with severity_level (#13376) open-gpdb/gpdb@94db8a1
May 10 [ORCA] Enable more HashAggregate alternative plans (#13421) open-gpdb/gpdb@0276e5c
Jun 1, 2022 Fix missing Redistribute on top of Split Update with Orca open-gpdb/gpdb@f47b262
Jun 3, 2022 Materialize aggregations in NL Join inner child open-gpdb/gpdb@6420b60

gpopt
Mar 23, 2022 Fix testexpr translation of outer expr (#13296) open-gpdb/gpdb@fd9e35b
Mar 24, 2022 Fix paramcollid for param in ORCA translator (#13302) open-gpdb/gpdb@d03632b
Apr 27, 2022 Avoid opening table in CondUpgradeRelLock() when possible open-gpdb/gpdb@73e50fb - conflicts with 16b0b9d
May 6, 2022 Clean up logic in CdbTryOpenTable. open-gpdb/gpdb@43a5e58 - dangerous, need more thorougly review
Jun 16, 2022 [ORCA] Fix memory leaks in translator (#13656) open-gpdb/gpdb@6bbef9e

Why are the changes needed?

Describe why the changes are necessary.

Does this PR introduce any user-facing change?

If yes, please clarify the previous behavior and the change this PR proposes.

How was this patch tested?

Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.

Contributor's Checklist

Here are some reminders and checklists before/when submitting your pull request, please check them:

Make sure your Pull Request has a clear title and commit message. You can take git-commit template as a reference.
Sign the Contributor License Agreement as prompted for your first-time contribution(One-time setup).
Learn the coding contribution guide, including our code conventions, workflow and more.
List your communication in the GitHub Issues or Discussions (if has or needed).
Document changes.
Add tests for the change
Pass make installcheck
Pass make -C src/test installcheck-cbdb-parallel
Feel free to request cloudberrydb/dev team for review and approval when your PR is ready🥳

CLAassistant · 2024-10-29T12:30:05Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 7 committers have signed the CLA.

✅ leborchuk
❌ dgkimura
❌ khannaekta
❌ sambitesh
❌ huansong
❌ orhankislal
❌ kainwen
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

leborchuk · 2024-10-29T15:32:04Z

I will fix tests, see failed pipelines

yjhjstz · 2024-10-30T01:36:16Z

select_parallel_optimizer.out was deleted

this is because orca planner not support parallel feature.

src/backend/access/table/table.c

src/test/regress/expected/gist.out

src/backend/executor/execMain.c

src/backend/access/table/table.c

If SubqueryAll operator has a Subquery scalar child then we should treat the subquery context as EsqctxtValue, even if the SubqueryAll comes from the predicate. Let's use the following example: ``` SELECT * FROM foo WHERE (SELECT a FROM foo limit 1) = ALL(SELECT b FROM bar); +--CScalarSubqueryAll(=)["b" (8)] |--CLogicalGet "bar" ("bar"), +--CScalarSubquery["a" (16)] +--CLogicalLimit <empty> global |--CLogicalGet "foo" ("foo"), |--CScalarConst (0) +--CScalarCast +--CScalarConst (1) ``` Setting subquery context EsqctxtValue signals to transform the scalar subquery into a left outer apply as opposed to an inner apply in function FGenerateCorrelatedApplyForScalarSubquery(). A left outer apply is necessary to preserve non-matching rows in the output (inner apply would have filtered them out) which are required to correctly determine the ALL_SUBLINK result. Co-authored-by: Orhan Kislal <[email protected]>

Issue is that with the overload we cannot properly raise while passing a variadic integer argument. Instead it will interpret the argument as severity_level. This leads to bogus error messages that don't make sense like: "Attribute number 23761064 not found in project list" Co-authored-by: David Kimura <[email protected]> Co-authored-by: Orhan Kislal <[email protected]>

Following should produce a HashAggregate alternative: ``` CREATE TABLE t(a int); INSERT INTO t VALUES (1), (2), (3); ANALYZE t; SET optimizer_enable_groupagg=off; EXPLAIN SELECT COUNT(DISTINCT a) FROM t GROUP BY (a); ``` Prior to this commit, the query alternatives added in CXformSplitDQA contain aggregates with distinct agg func. However, executor does not supported that for AGG_HASHED implementation. CXformGbAgg2HashAgg guards against constructing such a plan. This commit enables HashAggregate plans by constructing a two-stage aggregate alternative in CXformSplitDQA that does not include a distinct agg func.

When deriving distribution spec for `CPhysicalSplit`, ORCA checks if its disjoint with the hashed colrefs aswell as the equivalent hashed colrefs (PdshashedEquiv()). For certain nested queries with a JOIN on the same column, it is possible that the equivalent hashed colrefs also have other equivalent hashed colrefs. ORCA did not consider such cases and would skip recursively checking for hashed-equivalent colrefs and include it to check if its disjoint with the colrefs being modified. Thus generating a plan without a Redistribute on top of a Split node when updating the distribution column. This would lead to wrong results. This commit fixes the issue. Co-authored-by: gpopt <[email protected]> Close #13416

This commit adds a rewindablility spec for PhysicalAgg if it is created by an NL Join. This addition ensures that there is a materialize on top of the aggregate which improves the performance. Co-authored-by: Ekta Khanna <[email protected]>

Commit 6f175df1 added support for subplan test expression containing scalar ident on left side. But in order to properly support that scenario we needed to account for cast and coerceio as well.

When there is a param, `ExprCollation` relies on paramcollid to figure out the resultcollid. We weren't setting it while creating a param. This commit fixes it. Co-authored-by: Sambitesh Dash <[email protected]> Co-authored-by: Bhuvnesh Chaudhary <[email protected]>

Refactoring UpgradeRelLockIfNecessary() and CondUpgradeRelLock(). Currently CondUpgradeRelLock() always needs to open and close table to check if it's an AO table. Try to avoid that by passing the already opened table back to the caller. Since 55679b7, CondUpgradeRelLock() no longer needs to be called elsewhere other than UpgradeRelLockIfNecessary(), so we can just remove it to make the code much simpler.

This commit removes the function UpgradeRelLockAndReuseRelIfNecessary and clean up the implementation of CdbTryOpenTable. Now the logic is: 1. open the relation using some lockmode 2. if need correct lockmode (AO|AOCO with GDD enabled), then close and reopen it using upgraded lockmode 3. if need lockUpgraded information, deduce this using actually mode with request mode. Also we remove the post-check for AO table not upgrade, because under the new implementation that's impossible to happen and NOTE Greenplum does not support change table's storage kind yet (Even support some day, there will not be any problem). The other place that uses UpgradeRelLockAndReuseRelIfNecessary is gpdbwrappers.cpp:GPDBLockRelationOid. Before this commit, it passes nullptr to UpgradeRelLockAndReuseRelIfNecessary, thus will open, lock, close the relation, and then lock the relation oid again. This commit implement this by just calling LockRelationOid since there is no need to test lock-upgrade here: 1. Greenplum only need to upgrade lock level for target relation of DELETE | UPDATE statement 2. Parse-Analyze stage will always handle root partition correctly and store the lockmode in RTE 3. ORCA does not support DML on partiton table yet, and in future when implementing this, ORCA should use the lockmode in RTE of root so no need to deduce lock-upgrade 4. Index does not have lock-upgrade issue.

Found while rebasing Jesse's WIP refcount branch that is attempting to remove manual refcounting. Example demonstrating leak: ``` $ gpconfig -c optimizer_use_gpdb_allocators -v off --skipvalidation $ gpstop -air test=# CREATE TYPE myType as (a int, b int); test=# CREATE OR REPLACE FUNCTION myFunc() RETURNS myType IMMUTABLE as $$ SELECT 1, 2 $$ LANGUAGE SQL; test=# EXPLAIN SELECT * FROM myFunc(); ```

leborchuk · 2024-11-01T14:05:06Z

Rebased to resolve merge conflicts. Still need to fix tests and some review issues. I will take some time, because some test issues do not reproduce in my local development environment. I will let you know when I'm finished.

src/backend/storage/lmgr/lmgr.c

Original commit "Clean up logic in CdbTryOpenTable" should be fixed to correct support Cloudberry features: 1. CB has special singlenode mode. In that mode we act as if the global deadlock detector is enabled and try to use RowExclusiveLock 2. It is better to use RelationIsNonblockRelation macro instead of RelationIsAppendOptimized because of Commit b413e0b.

jiaqizho

LGTM.

leborchuk added the cherry-pick cherry-pick upstream commts label Oct 29, 2024

leborchuk assigned jiaqizho and yjhjstz Oct 29, 2024

leborchuk force-pushed the CherryPickOrcaFeb2023_2iter branch from 9085246 to 63511a5 Compare October 29, 2024 21:19

my-ship-it requested a review from jiaqizho October 30, 2024 01:50

leborchuk force-pushed the CherryPickOrcaFeb2023_2iter branch from 63511a5 to 9dbc53d Compare October 30, 2024 12:52

reshke mentioned this pull request Oct 30, 2024

cherry-pick signle commit 43a5e58dfa: Clean up logic in CdbTryOpenTable. #689

Closed

9 tasks

my-ship-it requested a review from yjhjstz October 31, 2024 09:26

leborchuk force-pushed the CherryPickOrcaFeb2023_2iter branch 2 times, most recently from ec0fd40 to 0d1c0a1 Compare October 31, 2024 19:37