Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry-pick orca and gpopt commits till 16Jun2022 #688

Merged
merged 11 commits into from
Nov 4, 2024

Conversation

leborchuk
Copy link
Contributor

@leborchuk leborchuk commented Oct 29, 2024

Important notes:

  • The file select_parallel_optimizer.out was deleted in the "Initial Cloudberry code dump 2" commit. The reason - this is because orca planner not support parallel feature

  • I cherry-picked in time-based fashion, from older to newer, and skipped "fix incorrect RTE's rellockmode in the process of translating DX" because I didn't know if the issue still existed or not.

  • Some fixes were already merged Cherry-pick ORCA InPlaceUpdate implementation #603.


Change logs

2nd iteration:
gporca
Apr 15, 2022 Fix subquery all subquery context (#13377) open-gpdb/gpdb@f5e3a58 - conflicts with 22fc48b
Apr 28, 2022 Remove overload raise function with severity_level (#13376) open-gpdb/gpdb@94db8a1
May 10 [ORCA] Enable more HashAggregate alternative plans (#13421) open-gpdb/gpdb@0276e5c
Jun 1, 2022 Fix missing Redistribute on top of Split Update with Orca open-gpdb/gpdb@f47b262
Jun 3, 2022 Materialize aggregations in NL Join inner child open-gpdb/gpdb@6420b60

gpopt
Mar 23, 2022 Fix testexpr translation of outer expr (#13296) open-gpdb/gpdb@fd9e35b
Mar 24, 2022 Fix paramcollid for param in ORCA translator (#13302) open-gpdb/gpdb@d03632b
Apr 27, 2022 Avoid opening table in CondUpgradeRelLock() when possible open-gpdb/gpdb@73e50fb - conflicts with 16b0b9d
May 6, 2022 Clean up logic in CdbTryOpenTable. open-gpdb/gpdb@43a5e58 - dangerous, need more thorougly review
Jun 16, 2022 [ORCA] Fix memory leaks in translator (#13656) open-gpdb/gpdb@6bbef9e

Why are the changes needed?

Describe why the changes are necessary.

Does this PR introduce any user-facing change?

If yes, please clarify the previous behavior and the change this PR proposes.

How was this patch tested?

Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.

Contributor's Checklist

Here are some reminders and checklists before/when submitting your pull request, please check them:

  • Make sure your Pull Request has a clear title and commit message. You can take git-commit template as a reference.
  • Sign the Contributor License Agreement as prompted for your first-time contribution(One-time setup).
  • Learn the coding contribution guide, including our code conventions, workflow and more.
  • List your communication in the GitHub Issues or Discussions (if has or needed).
  • Document changes.
  • Add tests for the change
  • Pass make installcheck
  • Pass make -C src/test installcheck-cbdb-parallel
  • Feel free to request cloudberrydb/dev team for review and approval when your PR is ready🥳

@CLAassistant
Copy link

CLAassistant commented Oct 29, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 7 committers have signed the CLA.

✅ leborchuk
❌ dgkimura
❌ khannaekta
❌ sambitesh
❌ huansong
❌ orhankislal
❌ kainwen
You have signed the CLA already but the status is still pending? Let us recheck it.

@leborchuk leborchuk added the cherry-pick cherry-pick upstream commts label Oct 29, 2024
@leborchuk
Copy link
Contributor Author

I will fix tests, see failed pipelines

@yjhjstz
Copy link
Member

yjhjstz commented Oct 30, 2024

select_parallel_optimizer.out was deleted

this is because orca planner not support parallel feature.

dgkimura and others added 10 commits November 1, 2024 16:29
If SubqueryAll operator has a Subquery scalar child then we should treat
the subquery context as EsqctxtValue, even if the SubqueryAll comes from
the predicate. Let's use the following example:

```
   SELECT * FROM foo WHERE (SELECT a FROM foo limit 1) = ALL(SELECT b FROM bar);

   +--CScalarSubqueryAll(=)["b" (8)]
      |--CLogicalGet "bar" ("bar"),
      +--CScalarSubquery["a" (16)]
         +--CLogicalLimit <empty> global
            |--CLogicalGet "foo" ("foo"),
            |--CScalarConst (0)
            +--CScalarCast
               +--CScalarConst (1)
```

Setting subquery context EsqctxtValue signals to transform the scalar
subquery into a left outer apply as opposed to an inner apply in
function FGenerateCorrelatedApplyForScalarSubquery(). A left outer apply
is necessary to preserve non-matching rows in the output (inner apply
would have filtered them out) which are required to correctly determine
the ALL_SUBLINK result.

Co-authored-by: Orhan Kislal <[email protected]>
Issue is that with the overload we cannot properly raise while passing a
variadic integer argument. Instead it will interpret the argument as
severity_level. This leads to bogus error messages that don't make sense
like:

"Attribute number 23761064 not found in project list"

Co-authored-by: David Kimura <[email protected]>
Co-authored-by: Orhan Kislal <[email protected]>
Following should produce a HashAggregate alternative:

  ```
  CREATE TABLE t(a int);
  INSERT INTO t VALUES (1), (2), (3);
  ANALYZE t;
  SET optimizer_enable_groupagg=off;

  EXPLAIN SELECT COUNT(DISTINCT a) FROM t GROUP BY (a);
  ```

Prior to this commit, the query alternatives added in CXformSplitDQA
contain aggregates with distinct agg func. However, executor does not
supported that for AGG_HASHED implementation. CXformGbAgg2HashAgg guards
against constructing such a plan.

This commit enables HashAggregate plans by constructing a two-stage
aggregate alternative in CXformSplitDQA that does not include a distinct
agg func.
When deriving distribution spec for `CPhysicalSplit`, ORCA checks if its
disjoint with the hashed colrefs aswell as the equivalent hashed colrefs
(PdshashedEquiv()). For certain nested queries with a JOIN on the same
column, it is possible that the equivalent hashed colrefs also have
other equivalent hashed colrefs. ORCA did not consider such cases and
would skip recursively checking for hashed-equivalent colrefs and
include it to check if its disjoint with the colrefs being modified.
Thus generating a plan without a Redistribute on top of a Split node
when updating the distribution column. This would lead to wrong results.
This commit fixes the issue.

Co-authored-by: gpopt <[email protected]>

Close  #13416
This commit adds a rewindablility spec for PhysicalAgg if it is created
by an NL Join. This addition ensures that there is a materialize on top
of the aggregate which improves the performance.

Co-authored-by: Ekta Khanna <[email protected]>
Commit 6f175df1 added support for subplan test expression containing
scalar ident on left side. But in order to properly support that
scenario we needed to account for cast and coerceio as well.
When there is a param, `ExprCollation` relies on paramcollid to figure out the
resultcollid. We weren't setting it while creating a param.

This commit fixes it.

Co-authored-by: Sambitesh Dash <[email protected]>
Co-authored-by: Bhuvnesh Chaudhary <[email protected]>
Refactoring UpgradeRelLockIfNecessary() and CondUpgradeRelLock().

Currently CondUpgradeRelLock() always needs to open and close table
to check if it's an AO table. Try to avoid that by passing the already
opened table back to the caller.

Since 55679b7, CondUpgradeRelLock() no longer needs to be called
elsewhere other than UpgradeRelLockIfNecessary(), so we can just
remove it to make the code much simpler.
This commit removes the function UpgradeRelLockAndReuseRelIfNecessary
and clean up the implementation of CdbTryOpenTable. Now the logic is:

1. open the relation using some lockmode
2. if need correct lockmode (AO|AOCO with GDD enabled), then close and
   reopen it using upgraded lockmode
3. if need lockUpgraded information, deduce this using actually mode
   with request mode.

Also we remove the post-check for AO table not upgrade, because under
the new implementation that's impossible to happen and NOTE Greenplum
does not support change table's storage kind yet (Even support some
day, there will not be any problem).

The other place that uses UpgradeRelLockAndReuseRelIfNecessary is
gpdbwrappers.cpp:GPDBLockRelationOid. Before this commit, it passes
nullptr to UpgradeRelLockAndReuseRelIfNecessary, thus will open, lock,
close the relation, and then lock the relation oid again. This commit
implement this by just calling LockRelationOid since there is no need
to test lock-upgrade here:
   1. Greenplum only need to upgrade lock level for target relation of
      DELETE | UPDATE statement
   2. Parse-Analyze stage will always handle root partition correctly
      and store the lockmode in RTE
   3. ORCA does not support DML on partiton table yet, and in future
      when implementing this, ORCA should use the lockmode in RTE of
      root so no need to deduce lock-upgrade
   4. Index does not have lock-upgrade issue.
Found while rebasing Jesse's WIP refcount branch that is attempting to
remove manual refcounting.  Example demonstrating leak:

  ```
  $ gpconfig -c optimizer_use_gpdb_allocators -v off --skipvalidation
  $ gpstop -air
  test=# CREATE TYPE myType as (a int, b int);
  test=# CREATE OR REPLACE FUNCTION myFunc() RETURNS myType IMMUTABLE as $$ SELECT 1, 2 $$ LANGUAGE SQL;
  test=# EXPLAIN SELECT * FROM myFunc();
  ```
@leborchuk
Copy link
Contributor Author

Rebased to resolve merge conflicts. Still need to fix tests and some review issues. I will take some time, because some test issues do not reproduce in my local development environment. I will let you know when I'm finished.

Original commit "Clean up logic in CdbTryOpenTable" should be fixed to
correct support Cloudberry features:

1. CB has special singlenode mode. In that mode we act as if the global
deadlock detector is enabled and try to use RowExclusiveLock

2. It is better to use RelationIsNonblockRelation macro instead of RelationIsAppendOptimized because of Commit b413e0b.
Copy link
Contributor

@jiaqizho jiaqizho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@my-ship-it my-ship-it merged commit 69ba2c9 into apache:main Nov 4, 2024
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick cherry-pick upstream commts
Projects
None yet
Development

Successfully merging this pull request may close these issues.