Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature][transform] transform support explode #7928

Open
wants to merge 7 commits into
base: dev
Choose a base branch
from

Conversation

CosmosNi
Copy link
Contributor

#7926
support explode transform function

Purpose of this pull request

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

@Hisoka-X
Copy link
Member

cc @corgy-w

@Hisoka-X
Copy link
Member

cc @YuriyGavrilov as well.

Explode {
source_table_name = "fake"
result_table_name = "fake1"
explode_fields = {"name":",","card":";"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should support list type value too. For example,

name age value list
Joy Ding,May Ding 20 [123, 234]
Kin Dom,Joy Dom 20 [123, 345]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should support list type value too. For example,

| name | age | value list |

|-------------------|-----|---------|

| Joy Ding,May Ding | 20 | [123, 234] |

| Kin Dom,Joy Dom | 20 | [123, 345] |

Just don't understand how names are grouped. List type for values great idea but if we lose link between column name and values not so good.

Copy link
Member

@Hisoka-X Hisoka-X Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


## Description

Split a row of data into multiple data according to the specified delimiter.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, this code basic is looks good to me.

But I have a thoughts about this explode feature, can we only support explode a list type column? the string split phase move to another phase, like add a sql function, split(str_col, ';')

More over, we can add explode as a sql function, then the transform all use Sql transfrom. the sql is looks like
select explode(split(str_col,';')) from

(Just discussion, WDYT @Hisoka-X )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @liunaijie , I think you speak to the point.

More over, we can add explode as a sql function

Yes, I like sql function more.

select explode(split(str_col,';'))

This way more clear and better. cc @CosmosNi WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: we should add LATERAL VIEW keywords too. Please refer https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-lateral-view.html. It is reasonable to split it into multiple PRs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liunaijie @Hisoka-X But SQLTransform only support one to one. So edit SQLTransform to support flatmap?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, some code in this pr can be reused.

* @param row the data need be transformed.
* @return transformed data.
*/
List<T> flatMap(T row);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataList.add(inputData);

for (SeaTunnelTransform<T> transformer : transform) {
log.info("transform test: {}", transformer.getPluginName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove.

Comment on lines +95 to +98
if (!outputDataList.isEmpty()) {
// todo log metrics
for (T outputData : outputDataList) {
collector.collect(new Record<>(outputData));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hisoka-X I have a question. Do I need to record the metrics information of transform once before transform and again after the result?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the metric needs to be recorded before and after each transform.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, our metrics only record source output and sink input. We should support record transform input/output too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants