Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature][Connector-V2]Jdbc chunk split add “snapshot.split.column” params #7794 #7840

Open
wants to merge 26 commits into
base: dev
Choose a base branch
from

Conversation

XenosK
Copy link
Contributor

@XenosK XenosK commented Oct 15, 2024

Purpose of this pull request

fix: #7794

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add test case.

@github-actions github-actions bot added the e2e label Oct 15, 2024
@XenosK
Copy link
Contributor Author

XenosK commented Oct 15, 2024

Please add test case.

Test case have been added, take a look.

Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Table table = dialect.queryTableSchema(jdbc, tableId).getTable();

// first , compare user defined split column is in the primary key or unique key
Properties splitColumnProperties = new Properties();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use Map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use Map?

org.apache.seatunnel.connectors.cdc.base.config.BaseSourceConfig , I see dbzProperties use Properties, so do the same

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @XenosK

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Map, properties are used because they interact with debezium, I think we should use Map in here just like other connectors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Map, properties are used because they interact with debezium, I think we should use Map in here just like other connectors.

ok

XenosK and others added 7 commits October 16, 2024 11:01
…ain/java/org/apache/seatunnel/connectors/cdc/base/source/enumerator/splitter/AbstractJdbcSourceChunkSplitter.java

Co-authored-by: Jia Fan <[email protected]>
…ain/java/org/apache/seatunnel/connectors/cdc/base/dialect/JdbcDataSourceDialect.java

Co-authored-by: Jia Fan <[email protected]>
…ain/java/org/apache/seatunnel/connectors/cdc/base/dialect/JdbcDataSourceDialect.java

Co-authored-by: Jia Fan <[email protected]>
@XenosK XenosK requested a review from Hisoka-X October 21, 2024 01:34
Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @hailin0 take a look.

Hisoka-X
Hisoka-X previously approved these changes Oct 21, 2024
Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -184,6 +184,7 @@ When an initial consistent snapshot is made for large databases, your establishe
| stop.specific-offset.file | String | No | - | Stop from the specified binlog file name. **Note, This option is required when the `stop.mode` option used `specific`.** |
| stop.specific-offset.pos | Long | No | - | Stop from the specified binlog file position. **Note, This option is required when the `stop.mode` option used `specific`.** |
| snapshot.split.size | Integer | No | 8096 | The split size (number of rows) of table snapshot, captured tables are split into multiple splits when read the snapshot of table. |
| snapshot.split.column | Config | No | - | The table name and split column (must be unique key) of table snapshot, captured tables are split into multiple splits when read the snapshot of table. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@hailin0 hailin0 Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link
#6106

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@hailin0 hailin0 Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, snapshot.split.column requires binding to a specific table

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, snapshot.split.column requires binding to a specific table

Now snapshot.split.column config like this, has binded to a specific table, you mean snapshot.split.column into table-name-config?
截屏2024-10-22 10 15 16

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

table-names-config must have primaryKeys , otherwise, an error will be reported, now I only want to config table and snapshot.split.column

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants