[#4984] improvement(core, doris): Add the random distribution strategy #4985

Open
wants to merge 1 commit into
base: main

Contributor

@yuqi1129 yuqi1129 commented Sep 23, 2024

What changes were proposed in this pull request?

Add the RANDOM strategy to Distribution, to be used instead of EVEN.

Why are the changes needed?

Doris supports random distribution rather than even distribution.

Fix: #4984

Does this PR introduce any user-facing change?

N/A.

How was this patch tested?

Integration tests (IT).

EVEN,

/** Distributes data randomly across partitions or table. */
RANDOM;
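
For readers following the hunk above, here is a minimal sketch of how a RANDOM strategy might be turned into a Doris distribution clause. The names `DistributionStrategy` and `DorisDistributionClause` are hypothetical stand-ins, not the classes changed in this PR, and the sketch assumes Doris's `DISTRIBUTED BY HASH(...) BUCKETS n` and `DISTRIBUTED BY RANDOM BUCKETS n` syntax. Identifier quoting is left out for brevity.

```java
import java.util.List;

/** Hypothetical stand-in for the strategy enum touched by this PR. */
enum DistributionStrategy {
  HASH,
  RANGE,
  EVEN,
  /** Distributes data randomly across buckets, with no distribution columns. */
  RANDOM
}

/** Illustrative helper only; the real catalog code may be structured differently. */
final class DorisDistributionClause {
  private DorisDistributionClause() {}

  /** Renders the distribution clause of a CREATE TABLE statement. */
  static String render(DistributionStrategy strategy, List<String> columns, int buckets) {
    switch (strategy) {
      case HASH:
        if (columns.isEmpty()) {
          throw new IllegalArgumentException("HASH distribution needs at least one column");
        }
        return "DISTRIBUTED BY HASH(" + String.join(", ", columns) + ") BUCKETS " + buckets;
      case RANDOM:
        // RANDOM takes no columns, which is why it does not fit under HASH.
        return "DISTRIBUTED BY RANDOM BUCKETS " + buckets;
      default:
        // Assumption: this sketch treats every other strategy as unsupported for Doris.
        throw new IllegalArgumentException("Unsupported strategy in this sketch: " + strategy);
    }
  }

  public static void main(String[] args) {
    System.out.println(render(DistributionStrategy.HASH, List.of("id"), 8));
    System.out.println(render(DistributionStrategy.RANDOM, List.of(), 8));
  }
}
```

In this sketch EVEN is rejected rather than silently mapped to RANDOM, which keeps the open question from the review thread visible instead of deciding it in code.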
Contributor

What are the differences between EVEN and RANDOM?

AFAIK, RANDOM is a kind of implementation of EVEN

Contributor Author

Both aim to balance the distribution of data to optimize performance. "Random" places more emphasis on the randomness of the data, while "Even" focuses on maintaining the uniformity of the distribution.

They are slightly different.

Contributor Author

@FANNG1 do you have any comments on this issue?

Contributor

IMO, HASH, RANGE, and RANDOM are implementations of how we do the distribution, whereas EVEN is more like a distribution result; both HASH and RANDOM are implementations of EVEN.
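
To make the mechanism-versus-result distinction in this thread concrete, the following is a minimal sketch, not code from the PR. The class `BucketAssignmentSketch` and its methods are hypothetical. It assigns rows to buckets once by key hash and once at random, so the resulting per-bucket counts can be compared. In both cases "even" describes the outcome, while HASH and RANDOM describe how a row's bucket is chosen.

```java
import java.util.Random;

/** Hypothetical illustration of two assignment mechanisms; not part of the PR. */
final class BucketAssignmentSketch {
  private BucketAssignmentSketch() {}

  /** HASH: the bucket is a deterministic function of the key. */
  static int hashBucket(Object key, int buckets) {
    return Math.floorMod(key.hashCode(), buckets);
  }

  /** RANDOM: the bucket ignores the key entirely. */
  static int randomBucket(Random rng, int buckets) {
    return rng.nextInt(buckets);
  }

  /** Counts rows per bucket when integer keys 0..rows-1 are hashed. */
  static int[] countByHash(int rows, int buckets) {
    int[] counts = new int[buckets];
    for (int key = 0; key < rows; key++) {
      counts[hashBucket(key, buckets)]++;
    }
    return counts;
  }

  /** Counts rows per bucket when each row is placed at random. */
  static int[] countByRandom(int rows, int buckets, long seed) {
    int[] counts = new int[buckets];
    Random rng = new Random(seed);
    for (int row = 0; row < rows; row++) {
      counts[randomBucket(rng, buckets)]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    System.out.println(java.util.Arrays.toString(countByHash(1000, 4)));
    System.out.println(java.util.Arrays.toString(countByRandom(1000, 4, 42L)));
  }
}
```

With sequential integer keys the hashed counts come out exactly equal, while the random counts are only approximately equal. With skewed keys the hashed counts can be far from even, which is one way to read the argument that EVEN names a goal rather than a mechanism.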

Contributor

So is the recommendation to remove EVEN? But I remember that @yuqi1129 did some research and found a certain kind of table that uses EVEN as a distribution name.

Contributor

@yuqi1129, do you remember which kind of table uses even distribution? Could it be replaced by round-robin or random?

Contributor Author

Contributor Author

@jerryshao any thoughts on this point? #4991 depends on this one.

Contributor

You guys have better background on this; you can have an offline discussion and work out a solution.

Contributor Author

We have not reached an agreement yet, so I have postponed this to 0.7.0, as it is not a bug fix.

@yuqi1129 yuqi1129 removed the branch-0.6 label (Automatically cherry-pick commit to branch-0.6) Sep 27, 2024
@jerryshao
Contributor

@yuqi1129 this should be merged before 0.7.0.

@yuqi1129
Contributor Author

> @yuqi1129 this should be merged before 0.7.0.

Got it.

@yuqi1129
Contributor Author

Postponing this to 0.8.0, as it is not urgently needed.

Successfully merging this pull request may close these issues.

[Improvement] Add distribution strategy random as the drois table support random
4 participants