-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature][transform] transform support explode #7928
base: dev
Are you sure you want to change the base?
Conversation
cc @corgy-w |
cc @YuriyGavrilov as well. |
seatunnel-api/src/main/java/org/apache/seatunnel/api/transform/SeaTunnelMultiRowTransform.java
Outdated
Show resolved
Hide resolved
docs/en/transform-v2/explode.md
Outdated
Explode { | ||
source_table_name = "fake" | ||
result_table_name = "fake1" | ||
explode_fields = {"name":",","card":";"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should support list type value too. For example,
name | age | value list |
---|---|---|
Joy Ding,May Ding | 20 | [123, 234] |
Kin Dom,Joy Dom | 20 | [123, 345] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should support list type value too. For example,
| name | age | value list |
|-------------------|-----|---------|
| Joy Ding,May Ding | 20 | [123, 234] |
| Kin Dom,Joy Dom | 20 | [123, 345] |
Just don't understand how names are grouped. List type for values great idea but if we lose link between column name and values not so good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
## Description | ||
|
||
Split a row of data into multiple data according to the specified delimiter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, this code basic is looks good to me.
But I have a thoughts about this explode feature, can we only support explode a list type column? the string split phase move to another phase, like add a sql function, split(str_col, ';')
More over, we can add explode
as a sql function, then the transform all use Sql
transfrom. the sql is looks like
select explode(split(str_col,';')) from
(Just discussion, WDYT @Hisoka-X )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @liunaijie , I think you speak to the point.
More over, we can add explode as a sql function
Yes, I like sql function more.
select explode(split(str_col,';'))
This way more clear and better. cc @CosmosNi WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS: we should add LATERAL VIEW
keywords too. Please refer https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-lateral-view.html. It is reasonable to split it into multiple PRs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liunaijie @Hisoka-X But SQLTransform only support one to one. So edit SQLTransform to support flatmap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, some code in this pr can be reused.
* @param row the data need be transformed. | ||
* @return transformed data. | ||
*/ | ||
List<T> flatMap(T row); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @hailin0
dataList.add(inputData); | ||
|
||
for (SeaTunnelTransform<T> transformer : transform) { | ||
log.info("transform test: {}", transformer.getPluginName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove.
if (!outputDataList.isEmpty()) { | ||
// todo log metrics | ||
for (T outputData : outputDataList) { | ||
collector.collect(new Record<>(outputData)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Hisoka-X I have a question. Do I need to record the metrics information of transform once before transform
and again after the result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the metric needs to be recorded before and after each transform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, our metrics only record source output and sink input. We should support record transform input/output too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get
#7926
support explode transform function
Purpose of this pull request
Does this PR introduce any user-facing change?
How was this patch tested?
Check list
New License Guide
release-note
.